NASA Astrophysics Data System (ADS)
Niu, Yingli; Li, Wenqiang; Peng, Qian; Geng, Hua; Yi, Yuanping; Wang, Linjun; Nan, Guangjun; Wang, Dong; Shuai, Zhigang
2018-04-01
MOlecular MAterials Property Prediction Package (MOMAP) is a software toolkit for molecular materials property prediction. It focuses on luminescent properties and charge mobility properties. This article contains a brief descriptive introduction of key features, theoretical models and algorithms of the software, together with examples that illustrate the performance. First, we present the theoretical models and algorithms for molecular luminescent properties calculation, which includes the excited-state radiative/non-radiative decay rate constant and the optical spectra. Then, a multi-scale simulation approach and its algorithm for the molecular charge mobility are described. This approach is based on hopping model and combines with Kinetic Monte Carlo and molecular dynamics simulations, and it is especially applicable for describing a large category of organic semiconductors, whose inter-molecular electronic coupling is much smaller than intra-molecular charge reorganisation energy.
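The hopping picture in this abstract can be illustrated with a small kinetic Monte Carlo sketch. It is not MOMAP code: the Marcus rate expression is assumed here as one common choice of hopping rate in the weak-coupling regime the abstract describes, and the function names, the 1D lattice, and all parameter values are hypothetical.

```python
import math
import random

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant, eV*s
KB_EV_K = 8.617333262e-5     # Boltzmann constant, eV/K


def marcus_rate(coupling_ev, reorg_ev, temperature_k, delta_g_ev=0.0):
    """Marcus hopping rate in 1/s (assumed rate model, not taken from the abstract)."""
    kt = KB_EV_K * temperature_k
    prefactor = coupling_ev ** 2 / HBAR_EV_S * math.sqrt(math.pi / (reorg_ev * kt))
    return prefactor * math.exp(-(delta_g_ev + reorg_ev) ** 2 / (4.0 * reorg_ev * kt))


def kmc_diffusion_1d(rate, spacing_cm, n_hops, n_walkers, seed=0):
    """Estimate D (cm^2/s) for a walker hopping left/right with equal rates."""
    rng = random.Random(seed)
    total_rate = 2.0 * rate  # two possible hops from every site
    sum_sq = 0.0
    sum_time = 0.0
    for _ in range(n_walkers):
        x = 0.0
        t = 0.0
        for _ in range(n_hops):
            # Exponentially distributed waiting time before the next hop.
            t += -math.log(1.0 - rng.random()) / total_rate
            x += spacing_cm if rng.random() < 0.5 else -spacing_cm
        sum_sq += x * x
        sum_time += t
    return sum_sq / (2.0 * sum_time)  # 1D relation: <x^2> = 2 D t


def einstein_mobility(diffusion_cm2_s, temperature_k):
    """Mobility in cm^2/(V s) from the Einstein relation mu = e D / (k_B T)."""
    return diffusion_cm2_s / (KB_EV_K * temperature_k)


if __name__ == "__main__":
    # Illustrative values only: 10 meV coupling, 0.2 eV reorganisation energy.
    k = marcus_rate(coupling_ev=0.01, reorg_ev=0.2, temperature_k=300.0)
    d = kmc_diffusion_1d(k, spacing_cm=5e-8, n_hops=100, n_walkers=2000)
    print(k, d, einstein_mobility(d, 300.0))
```

For this symmetric 1D lattice the estimate should approach the analytic value D = k a^2, which gives a quick sanity check before moving to realistic molecular packings.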
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Tong; Gu, YuanTong, E-mail: yuantong.gu@qut.edu.au
As all-atom molecular dynamics method is limited by its enormous computational cost, various coarse-grained strategies have been developed to extend the length scale of soft matters in the modeling of mechanical behaviors. However, the classical thermostat algorithm in highly coarse-grained molecular dynamics method would underestimate the thermodynamic behaviors of soft matters (e.g. microfilaments in cells), which can weaken the ability of materials to overcome local energy traps in granular modeling. Based on all-atom molecular dynamics modeling of microfilament fragments (G-actin clusters), a new stochastic thermostat algorithm is developed to retain the representation of thermodynamic properties of microfilaments at extra coarse-grained level. The accuracy of this stochastic thermostat algorithm is validated by all-atom MD simulation. This new stochastic thermostat algorithm provides an efficient way to investigate the thermomechanical properties of large-scale soft matters.
A fast recursive algorithm for molecular dynamics simulation
NASA Technical Reports Server (NTRS)
Jain, A.; Vaidehi, N.; Rodriguez, G.
1993-01-01
The present recursive algorithm for solving molecular systems' dynamical equations of motion employs internal variable models that reduce such simulations' computation time by an order of magnitude, relative to Cartesian models. Extensive use is made of spatial operator methods recently developed for analysis and simulation of the dynamics of multibody systems. A factor-of-450 speedup over the conventional O(N-cubed) algorithm is demonstrated for the case of a polypeptide molecule with 400 residues.
Xiao, Li; Cai, Qin; Li, Zhilin; Zhao, Hongkai; Luo, Ray
2014-11-25
A multi-scale framework is proposed for more realistic molecular dynamics simulations in continuum solvent models by coupling a molecular mechanics treatment of solute with a fluid mechanics treatment of solvent. This article reports our initial efforts to formulate the physical concepts necessary for coupling the two mechanics and develop a 3D numerical algorithm to simulate the solvent fluid via the Navier-Stokes equation. The numerical algorithm was validated with multiple test cases. The validation shows that the algorithm is effective and stable, with observed accuracy consistent with our design.
NASA Technical Reports Server (NTRS)
Jain, Abhinandan
2011-01-01
Ndarts software provides algorithms for computing quantities associated with the dynamics of articulated, rigid-link, multibody systems. It is designed as a general-purpose dynamics library that can be used for the modeling of robotic platforms, space vehicles, molecular dynamics, and other such applications. The architecture and algorithms in Ndarts are based on the Spatial Operator Algebra (SOA) theory for computational multibody and robot dynamics developed at JPL. It uses minimal, internal coordinate models. The algorithms are low-order, recursive scatter/gather algorithms. In comparison with the earlier Darts++ software, this version has a more general and cleaner design needed to support a larger class of computational dynamics needs. It includes a frames infrastructure, allows algorithms to operate on subgraphs of the system, and implements lazy and deferred computation for better efficiency. Dynamics modeling modules such as Ndarts are core building blocks of control and simulation software for space, robotic, mechanism, bio-molecular, and material systems modeling.
Xiao, Li; Cai, Qin; Li, Zhilin; Zhao, Hongkai; Luo, Ray
2014-01-01
A multi-scale framework is proposed for more realistic molecular dynamics simulations in continuum solvent models by coupling a molecular mechanics treatment of solute with a fluid mechanics treatment of solvent. This article reports our initial efforts to formulate the physical concepts necessary for coupling the two mechanics and develop a 3D numerical algorithm to simulate the solvent fluid via the Navier-Stokes equation. The numerical algorithm was validated with multiple test cases. The validation shows that the algorithm is effective and stable, with observed accuracy consistent with our design. PMID:25404761
Modeling of diatomic molecule using the Morse potential and the Verlet algorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fidiani, Elok
Performing molecular modeling usually uses special software for Molecular Dynamics (MD) such as: GROMACS, NAMD, JMOL etc. Molecular dynamics is a computational method to calculate the time dependent behavior of a molecular system. In this work, MATLAB was used as numerical method for a simple modeling of some diatomic molecules: HCl, H₂ and O₂. MATLAB is a matrix based numerical software, so in order to do numerical analysis, all the functions and equations describing properties of atoms and molecules must be developed manually in MATLAB. In this work, a Morse potential was generated to describe the bond interaction between the two atoms. In order to analyze the simultaneous motion of molecules, the Verlet Algorithm derived from Newton’s Equations of Motion (classical mechanics) was operated. Both the Morse potential and the Verlet algorithm were integrated using MATLAB to derive physical properties and the trajectory of the molecules. The data computed by MATLAB is always in the form of a matrix. To visualize it, Visual Molecular Dynamics (VMD) was used. Such method is useful for development and testing some types of interaction on a molecular scale. Besides, this can be very helpful for describing some basic principles of molecular interaction for educational purposes.
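The combination of a Morse potential with Verlet integration can be sketched in a few lines. The example below is written in Python rather than the MATLAB used in the paper, uses the velocity Verlet variant, and reduces the diatomic to a single bond coordinate with a reduced mass. All parameter values are in arbitrary illustrative units and are not the constants of HCl, H₂ or O₂.

```python
import math


def morse_energy(r, d_e, a, r_e):
    """Morse potential V(r) = D_e * (1 - exp(-a (r - r_e)))^2."""
    return d_e * (1.0 - math.exp(-a * (r - r_e))) ** 2


def morse_force(r, d_e, a, r_e):
    """Force along the bond, F = -dV/dr."""
    e = math.exp(-a * (r - r_e))
    return -2.0 * d_e * a * (1.0 - e) * e


def velocity_verlet(r0, v0, mass, dt, n_steps, d_e=1.0, a=1.0, r_e=1.0):
    """Integrate the bond length; returns a list of (time, r, v, total_energy)."""
    r, v = r0, v0
    f = morse_force(r, d_e, a, r_e)
    energy = 0.5 * mass * v * v + morse_energy(r, d_e, a, r_e)
    trajectory = [(0.0, r, v, energy)]
    for step in range(1, n_steps + 1):
        v_half = v + 0.5 * dt * f / mass   # half kick with the old force
        r = r + dt * v_half                # drift
        f = morse_force(r, d_e, a, r_e)    # force at the new position
        v = v_half + 0.5 * dt * f / mass   # half kick with the new force
        energy = 0.5 * mass * v * v + morse_energy(r, d_e, a, r_e)
        trajectory.append((step * dt, r, v, energy))
    return trajectory


if __name__ == "__main__":
    # Start with the bond stretched by 0.2 length units and at rest.
    traj = velocity_verlet(r0=1.2, v0=0.0, mass=1.0, dt=0.01, n_steps=5000)
    energies = [row[3] for row in traj]
    print("relative energy drift:", (max(energies) - min(energies)) / energies[0])
```

Monitoring the total energy, as the last lines do, is a simple way to check that the time step is small enough for the chosen potential parameters.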
ePMV embeds molecular modeling into professional animation software environments.
Johnson, Graham T; Autin, Ludovic; Goodsell, David S; Sanner, Michel F; Olson, Arthur J
2011-03-09
Increasingly complex research has made it more difficult to prepare data for publication, education, and outreach. Many scientists must also wade through black-box code to interface computational algorithms from diverse sources to supplement their bench work. To reduce these barriers we have developed an open-source plug-in, embedded Python Molecular Viewer (ePMV), that runs molecular modeling software directly inside of professional 3D animation applications (hosts) to provide simultaneous access to the capabilities of these newly connected systems. Uniting host and scientific algorithms into a single interface allows users from varied backgrounds to assemble professional quality visuals and to perform computational experiments with relative ease. By enabling easy exchange of algorithms, ePMV can facilitate interdisciplinary research, smooth communication between broadly diverse specialties, and provide a common platform to frame and visualize the increasingly detailed intersection(s) of cellular and molecular biology. Copyright © 2011 Elsevier Ltd. All rights reserved.
ePMV Embeds Molecular Modeling into Professional Animation Software Environments
Johnson, Graham T.; Autin, Ludovic; Goodsell, David S.; Sanner, Michel F.; Olson, Arthur J.
2011-01-01
SUMMARY Increasingly complex research has made it more difficult to prepare data for publication, education, and outreach. Many scientists must also wade through black-box code to interface computational algorithms from diverse sources to supplement their bench work. To reduce these barriers, we have developed an open-source plug-in, embedded Python Molecular Viewer (ePMV), that runs molecular modeling software directly inside of professional 3D animation applications (hosts) to provide simultaneous access to the capabilities of these newly connected systems. Uniting host and scientific algorithms into a single interface allows users from varied backgrounds to assemble professional quality visuals and to perform computational experiments with relative ease. By enabling easy exchange of algorithms, ePMV can facilitate interdisciplinary research, smooth communication between broadly diverse specialties and provide a common platform to frame and visualize the increasingly detailed intersection(s) of cellular and molecular biology. PMID:21397181
Logic circuits based on molecular spider systems.
Mo, Dandan; Lakin, Matthew R; Stefanovic, Darko
2016-08-01
Spatial locality brings the advantages of computation speed-up and sequence reuse to molecular computing. In particular, molecular walkers that undergo localized reactions are of interest for implementing logic computations at the nanoscale. We use molecular spider walkers to implement logic circuits. We develop an extended multi-spider model with a dynamic environment wherein signal transmission is triggered via localized reactions, and use this model to implement three basic gates (AND, OR, NOT) and a cascading mechanism. We develop an algorithm to automatically generate the layout of the circuit. We use a kinetic Monte Carlo algorithm to simulate circuit computations, and we analyze circuit complexity: our design scales linearly with formula size and has a logarithmic time complexity. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
The graph neural network model.
Scarselli, Franco; Gori, Marco; Tsoi, Ah Chung; Hagenbuchner, Markus; Monfardini, Gabriele
2009-01-01
Many underlying relationships among data in several areas of science and engineering, e.g., computer vision, molecular chemistry, molecular biology, pattern recognition, and data mining, can be represented in terms of graphs. In this paper, we propose a new neural network model, called graph neural network (GNN) model, that extends existing neural network methods for processing the data represented in graph domains. This GNN model, which can directly process most of the practically useful types of graphs, e.g., acyclic, cyclic, directed, and undirected, implements a function τ(G, n) ∈ ℝ^m that maps a graph G and one of its nodes n into an m-dimensional Euclidean space. A supervised learning algorithm is derived to estimate the parameters of the proposed GNN model. The computational cost of the proposed algorithm is also considered. Some experimental results are shown to validate the proposed learning algorithm, and to demonstrate its generalization capabilities.
Stochastic algorithm for simulating gas transport coefficients
NASA Astrophysics Data System (ADS)
Rudyak, V. Ya.; Lezhnev, E. V.
2018-02-01
The aim of this paper is to create a molecular algorithm for modeling the transport processes in gases that will be more efficient than molecular dynamics method. To this end, the dynamics of molecules are modeled stochastically. In a rarefied gas, it is sufficient to consider the evolution of molecules only in the velocity space, whereas for a dense gas it is necessary to model the dynamics of molecules also in the physical space. Adequate integral characteristics of the studied system are obtained by averaging over a sufficiently large number of independent phase trajectories. The efficiency of the proposed algorithm was demonstrated by modeling the coefficients of self-diffusion and the viscosity of several gases. It was shown that the accuracy comparable to the experimental one can be obtained on a relatively small number of molecules. The modeling accuracy increases with the growth of used number of molecules and phase trajectories.
Validating clustering of molecular dynamics simulations using polymer models.
Phillips, Joshua L; Colvin, Michael E; Newsam, Shawn
2011-11-14
Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. 
To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers.
Validating clustering of molecular dynamics simulations using polymer models
2011-01-01
Background Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. Results We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. Conclusions We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. 
To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers. PMID:22082218
NASA Astrophysics Data System (ADS)
Vijaykumar, Adithya; Ouldridge, Thomas E.; ten Wolde, Pieter Rein; Bolhuis, Peter G.
2017-03-01
The modeling of complex reaction-diffusion processes in, for instance, cellular biochemical networks or self-assembling soft matter can be tremendously sped up by employing a multiscale algorithm which combines the mesoscopic Green's Function Reaction Dynamics (GFRD) method with explicit stochastic Brownian, Langevin, or deterministic molecular dynamics to treat reactants at the microscopic scale [A. Vijaykumar, P. G. Bolhuis, and P. R. ten Wolde, J. Chem. Phys. 143, 214102 (2015)]. Here we extend this multiscale MD-GFRD approach to include the orientational dynamics that is crucial to describe the anisotropic interactions often prevalent in biomolecular systems. We present the novel algorithm focusing on Brownian dynamics only, although the methodology is generic. We illustrate the novel algorithm using a simple patchy particle model. After validation of the algorithm, we discuss its performance. The rotational Brownian dynamics MD-GFRD multiscale method will open up the possibility for large scale simulations of protein signalling networks.
Kulasiri, Don
2011-01-01
We discuss the quantification of molecular fluctuations in biochemical reaction systems within the context of intracellular processes associated with gene expression. In this chapter, we take the molecular reactions pertaining to circadian rhythms to develop models of molecular fluctuations. There are a significant number of studies on stochastic fluctuations in intracellular genetic regulatory networks based on single cell-level experiments. In order to understand the fluctuations associated with gene expression in circadian rhythm networks, it is important to model the interactions of transcriptional factors with the E-boxes in the promoter regions of some of the genes. The pertinent aspects of a near-equilibrium theory that integrates the thermodynamical and particle dynamic characteristics of intracellular molecular fluctuations are discussed, and the theory is extended by using the theory of stochastic differential equations. We then model the fluctuations associated with the promoter regions using general mathematical settings. We implemented the ubiquitous Gillespie algorithm, which is used to simulate stochasticity in biochemical networks, for each of the motifs. Both the theory and the Gillespie algorithm gave the same results in terms of the time evolution of means and variances of molecular numbers. As biochemical reactions occur far away from equilibrium (hence the use of the Gillespie algorithm), these results suggest that the near-equilibrium theory should be a good approximation for some of the biochemical reactions. © 2011 Elsevier Inc. All rights reserved.
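For readers unfamiliar with the Gillespie algorithm mentioned in this abstract, a minimal sketch of the direct method follows. The birth-death reaction network, the function names, and the rate constants are assumptions chosen for illustration; they are not the circadian promoter motifs modelled in the chapter.

```python
import math
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, seed=0):
    """Direct-method SSA for  0 -> X  (rate k_prod)  and  X -> 0  (rate k_deg * n)."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_birth = k_prod
        a_death = k_deg * n
        a_total = a_birth + a_death
        if a_total == 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate a_total.
        t += -math.log(1.0 - rng.random()) / a_total
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_birth:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


def time_average(times, counts, t_end):
    """Time-weighted mean copy number over [0, t_end]."""
    total = 0.0
    for i, count in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += count * (t_next - times[i])
    return total / t_end


if __name__ == "__main__":
    times, counts = gillespie_birth_death(k_prod=10.0, k_deg=0.5, n0=0, t_end=2000.0)
    print("mean copy number:", time_average(times, counts, 2000.0))
```

For this particular network the stationary mean is k_prod / k_deg, so the time average can be compared against a known value, in the same spirit as the comparison of means and variances described in the abstract.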
Concepts and applications of "natural computing" techniques in de novo drug and peptide design.
Hiss, Jan A; Hartenfeller, Markus; Schneider, Gisbert
2010-05-01
Evolutionary algorithms, particle swarm optimization, and ant colony optimization have emerged as robust optimization methods for molecular modeling and peptide design. Such algorithms mimic combinatorial molecule assembly by using molecular fragments as building-blocks for compound construction, and relying on adaptation and emergence of desired pharmacological properties in a population of virtual molecules. Nature-inspired algorithms might be particularly suited for bioisosteric replacement or scaffold-hopping from complex natural products to synthetically more easily accessible compounds that are amenable to optimization by medicinal chemistry. The theory and applications of selected nature-inspired algorithms for drug design are reviewed, together with practical applications and a discussion of their advantages and limitations.
Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji
2015-07-01
GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310-323. doi: 10.1002/wcms.1220.
An improved molecular dynamics algorithm to study thermodiffusion in binary hydrocarbon mixtures
NASA Astrophysics Data System (ADS)
Antoun, Sylvie; Saghir, M. Ziad; Srinivasan, Seshasai
2018-03-01
In multicomponent liquid mixtures, the diffusion flow of chemical species can be induced by temperature gradients, which leads to a separation of the constituent components. This cross effect between temperature and concentration is known as thermodiffusion or the Ludwig-Soret effect. The performance of boundary driven non-equilibrium molecular dynamics along with the enhanced heat exchange (eHEX) algorithm was studied by assessing the thermodiffusion process in n-pentane/n-decane (nC5-nC10) binary mixtures. The eHEX algorithm consists of an extended version of the HEX algorithm with an improved energy conservation property. In addition to this, the transferable potentials for phase equilibria-united atom force field were employed in all molecular dynamics (MD) simulations to precisely model the molecular interactions in the fluid. The Soret coefficients of the n-pentane/n-decane (nC5-nC10) mixture for three different compositions (at 300.15 K and 0.1 MPa) were calculated and compared with the experimental data and other MD results available in the literature. Results of our newly employed MD algorithm showed great agreement with experimental data and a better accuracy compared to other MD procedures.
Lapierre-Landry, Maryse; Tucker-Schwartz, Jason M.; Skala, Melissa C.
2016-01-01
Photothermal OCT (PT-OCT) is an emerging molecular imaging technique that occupies a spatial imaging regime between microscopy and whole body imaging. PT-OCT would benefit from a theoretical model to optimize imaging parameters and test image processing algorithms. We propose the first analytical PT-OCT model to replicate an experimental A-scan in homogeneous and layered samples. We also propose the PT-CLEAN algorithm to reduce phase-accumulation and shadowing, two artifacts found in PT-OCT images, and demonstrate it on phantoms and in vivo mouse tumors. PMID:27446693
Žuvela, Petar; Liu, J Jay; Macur, Katarzyna; Bączek, Tomasz
2015-10-06
In this work, performance of five nature-inspired optimization algorithms, genetic algorithm (GA), particle swarm optimization (PSO), artificial bee colony (ABC), firefly algorithm (FA), and flower pollination algorithm (FPA), was compared in molecular descriptor selection for development of quantitative structure-retention relationship (QSRR) models for 83 peptides that originate from eight model proteins. The matrix with 423 descriptors was used as input, and QSRR models based on selected descriptors were built using partial least squares (PLS), whereas root mean square error of prediction (RMSEP) was used as a fitness function for their selection. Three performance criteria, prediction accuracy, computational cost, and the number of selected descriptors, were used to evaluate the developed QSRR models. The results show that all five variable selection methods outperform interval PLS (iPLS), sparse PLS (sPLS), and the full PLS model, whereas GA is superior because of its lowest computational cost and higher accuracy (RMSEP of 5.534%) with a smaller number of variables (nine descriptors). The GA-QSRR model was validated initially through Y-randomization. In addition, it was successfully validated with an external testing set out of 102 peptides originating from Bacillus subtilis proteomes (RMSEP of 22.030%). Its applicability domain was defined, from which it was evident that the developed GA-QSRR exhibited strong robustness. All the sources of the model's error were identified, thus allowing for further application of the developed methodology in proteomics.
Simulation of aerosol flow interaction with a solid body on molecular level
NASA Astrophysics Data System (ADS)
Amelyushkin, Ivan A.; Stasenko, Albert L.
2018-05-01
Physico-mathematical models and numerical algorithm of two-phase flow interaction with a solid body are developed. Results of droplet motion and its impingement upon a rough surface in real gas boundary layer simulation on the molecular level obtained via molecular dynamics technique are presented.
Markov-modulated Markov chains and the covarion process of molecular evolution.
Galtier, N; Jean-Marie, A
2004-01-01
The covarion (or site specific rate variation, SSRV) process of biological sequence evolution is a process by which the evolutionary rate of a nucleotide/amino acid/codon position can change in time. In this paper, we introduce time-continuous, space-discrete, Markov-modulated Markov chains as a model for representing SSRV processes, generalizing existing theory to any model of rate change. We propose a fast algorithm for diagonalizing the generator matrix of relevant Markov-modulated Markov processes. This algorithm makes phylogeny likelihood calculation tractable even for a large number of rate classes and a large number of states, so that SSRV models become applicable to amino acid or codon sequence datasets. Using this algorithm, we investigate the accuracy of the discrete approximation to the Gamma distribution of evolutionary rates, widely used in molecular phylogeny. We show that a relatively large number of classes is required to achieve accurate approximation of the exact likelihood when the number of analyzed sequences exceeds 20, both under the SSRV and among site rate variation (ASRV) models.
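The structure of a Markov-modulated Markov chain can be made concrete by assembling its generator matrix on the joint state space (rate class, character state). The sketch below assumes a Jukes-Cantor base process and a two-class on/off switching process purely for illustration. It only builds the generator and does not reproduce the fast diagonalization algorithm proposed in the paper; all function names are hypothetical.

```python
def jukes_cantor(n_states=4):
    """Base substitution generator with equal off-diagonal rates and zero row sums."""
    off = 1.0 / (n_states - 1)
    return [[-1.0 if i == j else off for j in range(n_states)]
            for i in range(n_states)]


def modulated_generator(base_q, class_rates, switch_q):
    """Generator on joint states (c, s), indexed as c * n + s.

    Within rate class c the character evolves with generator class_rates[c] * base_q;
    the class itself switches according to switch_q while the character stays fixed.
    """
    n = len(base_q)
    m = len(class_rates)
    size = m * n
    g = [[0.0] * size for _ in range(size)]
    for c in range(m):
        for s in range(n):
            row = c * n + s
            for s2 in range(n):      # substitutions within the same rate class
                g[row][c * n + s2] += class_rates[c] * base_q[s][s2]
            for c2 in range(m):      # rate-class switches, same character state
                g[row][c2 * n + s] += switch_q[c][c2]
    return g


if __name__ == "__main__":
    # Class 0 is "off" (no substitutions), class 1 evolves at twice the base rate.
    g = modulated_generator(jukes_cantor(), [0.0, 2.0], [[-0.1, 0.1], [0.1, -0.1]])
    for row in g:
        print(["%6.3f" % value for value in row])
```

Written this way, the generator is the sum of two Kronecker-product terms, one for substitution and one for rate switching. With m rate classes and n character states the matrix is (m n) x (m n), so its size grows quickly with the number of classes and states.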
PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta.
Chaudhury, Sidhartha; Lyskov, Sergey; Gray, Jeffrey J
2010-03-01
PyRosetta is a stand-alone Python-based implementation of the Rosetta molecular modeling package that allows users to write custom structure prediction and design algorithms using the major Rosetta sampling and scoring functions. PyRosetta contains Python bindings to libraries that define Rosetta functions including those for accessing and manipulating protein structure, calculating energies and running Monte Carlo-based simulations. PyRosetta can be used in two ways: (i) interactively, using iPython and (ii) script-based, using Python scripting. Interactive mode contains a number of help features and is ideal for beginners while script-mode is best suited for algorithm development. PyRosetta has similar computational performance to Rosetta, can be easily scaled up for cluster applications and has been implemented for algorithms demonstrating protein docking, protein folding, loop modeling and design. PyRosetta is a stand-alone package available at http://www.pyrosetta.org under the Rosetta license which is free for academic and non-profit users. A tutorial, user's manual and sample scripts demonstrating usage are also available on the web site.
PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta
Chaudhury, Sidhartha; Lyskov, Sergey; Gray, Jeffrey J.
2010-01-01
Summary: PyRosetta is a stand-alone Python-based implementation of the Rosetta molecular modeling package that allows users to write custom structure prediction and design algorithms using the major Rosetta sampling and scoring functions. PyRosetta contains Python bindings to libraries that define Rosetta functions including those for accessing and manipulating protein structure, calculating energies and running Monte Carlo-based simulations. PyRosetta can be used in two ways: (i) interactively, using iPython and (ii) script-based, using Python scripting. Interactive mode contains a number of help features and is ideal for beginners while script-mode is best suited for algorithm development. PyRosetta has similar computational performance to Rosetta, can be easily scaled up for cluster applications and has been implemented for algorithms demonstrating protein docking, protein folding, loop modeling and design. Availability: PyRosetta is a stand-alone package available at http://www.pyrosetta.org under the Rosetta license which is free for academic and non-profit users. A tutorial, user's manual and sample scripts demonstrating usage are also available on the web site. Contact: pyrosetta@graylab.jhu.edu PMID:20061306
System Design for Nano-Network Communications
NASA Astrophysics Data System (ADS)
ShahMohammadian, Hoda
The potential applications of nanotechnology in a wide range of areas necessitate nano-networking research. Nano-networking is a new type of networking which has emerged by applying nanotechnology to communication theory. Therefore, this dissertation presents a framework for physical layer communications in a nano-network and addresses some of the pressing unsolved challenges in designing a molecular communication system. The contribution of this dissertation is proposing well-justified models for signal propagation, noise sources, optimum receiver design and synchronization in molecular communication channels. The design of any communication system is primarily based on the signal propagation channel and noise models. Using the Brownian motion and advection molecular statistics, separate signal propagation and noise models are presented for diffusion-based and flow-based molecular communication channels. It is shown that the corrupting noise of molecular channels is uncorrelated and non-stationary with a signal dependent magnitude. The next key component of any communication system is the reception and detection process. This dissertation provides a detailed analysis of the effect of the ligand-receptor binding mechanism on the received signal, and develops the first optimal receiver design for molecular communications. The bit error rate performance of the proposed receiver is evaluated and the impact of medium motion on the receiver performance is investigated. Another important feature of any communication system is synchronization. In this dissertation, the first blind synchronization algorithm is presented for the molecular communication channels. The proposed algorithm uses a non-decision directed maximum likelihood criterion for estimating the channel delay. The Cramer-Rao lower bound is also derived and the performance of the proposed synchronization algorithm is evaluated by investigating its mean square error.
Machine learning of molecular properties: Locality and active learning
NASA Astrophysics Data System (ADS)
Gubaev, Konstantin; Podryabinkin, Evgeny V.; Shapeev, Alexander V.
2018-06-01
In recent years, machine learning techniques have shown great potential in various problems from a multitude of disciplines, including materials design and drug discovery. The high computational speed on the one hand and the accuracy comparable to that of density functional theory on the other hand make machine learning algorithms efficient for high-throughput screening through chemical and configurational space. However, the machine learning algorithms available in the literature require large training datasets to reach chemical accuracy and also show large errors for the so-called outliers, i.e., out-of-sample molecules that are not well represented in the training set. In the present paper, we propose a new machine learning algorithm for predicting molecular properties that addresses these two issues: it is based on a local model of interatomic interactions providing high accuracy when trained on relatively small training sets and an active learning algorithm of optimally choosing the training set that significantly reduces the errors for the outliers. We compare our model to the other state-of-the-art algorithms from the literature on the widely used benchmark tests.
NASA Astrophysics Data System (ADS)
Kumar, Rohit; Puri, Rajeev K.
2018-03-01
Employing the quantum molecular dynamics (QMD) approach for nucleus-nucleus collisions, we test the predictive power of the energy-based clusterization algorithm, i.e., the simulated annealing clusterization algorithm (SACA), to describe the experimental data of charge distribution and various event-by-event correlations among fragments. The calculations are constrained to the Fermi-energy domain and/or mildly excited nuclear matter. Our detailed study, which spans different system masses and system-mass asymmetries of the colliding partners, shows the importance of the energy-based clusterization algorithm for understanding multifragmentation. The present calculations are also compared with the other available calculations, which use one-body models, statistical models, and/or hybrid models.
Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji
2015-01-01
GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310–323. doi: 10.1002/wcms.1220 PMID:26753008
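The T-REMD variant mentioned in the abstract above relies on a Metropolis-style exchange test between neighbouring temperature replicas. The following is a minimal sketch of that textbook criterion, not code from GENESIS; the function name, the choice of kcal/mol units, and the example values are assumptions made for illustration.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); unit choice is an assumption


def remd_swap_probability(temp_i, temp_j, energy_i, energy_j):
    """Textbook T-REMD acceptance probability for swapping replicas i and j.

    P = min(1, exp[(beta_i - beta_j) * (E_i - E_j)])
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # A non-negative exponent means the swap is always accepted.
    return 1.0 if delta >= 0.0 else math.exp(delta)


# Hypothetical potential energies (kcal/mol) for replicas at 300 K and 310 K.
p_always = remd_swap_probability(300.0, 310.0, -100.0, -105.0)
p_partial = remd_swap_probability(300.0, 310.0, -105.0, -100.0)
```

When the colder replica holds the higher energy, the swap is accepted with certainty; otherwise it is accepted only with a Boltzmann-weighted probability, which is what keeps each replica sampling its own canonical ensemble.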
Veliz-Cuba, Alan; Aguilar, Boris; Hinkelmann, Franziska; Laubenbacher, Reinhard
2014-06-26
A key problem in the analysis of mathematical models of molecular networks is the determination of their steady states. The present paper addresses this problem for Boolean network models, an increasingly popular modeling paradigm for networks lacking detailed kinetic information. For small models, the problem can be solved by exhaustive enumeration of all state transitions. But for larger models this is not feasible, since the size of the phase space grows exponentially with the dimension of the network. The dimension of published models is growing to over 100, so that efficient methods for steady state determination are essential. Several methods have been proposed for large networks, some of them heuristic. While these methods represent a substantial improvement in scalability over exhaustive enumeration, the problem for large networks is still unsolved in general. This paper presents an algorithm that consists of two main parts. The first is a graph theoretic reduction of the wiring diagram of the network, while preserving all information about steady states. The second part formulates the determination of all steady states of a Boolean network as a problem of finding all solutions to a system of polynomial equations over the finite number system with two elements. This problem can be solved with existing computer algebra software. This algorithm compares favorably with several existing algorithms for steady state determination. One advantage is that it is not heuristic or reliant on sampling, but rather determines algorithmically and exactly all steady states of a Boolean network. The code for the algorithm, as well as the test suite of benchmark networks, is available upon request from the corresponding author. The algorithm presented in this paper reliably determines all steady states of sparse Boolean networks with up to 1000 nodes. The algorithm is effective at analyzing virtually all published models even those of moderate connectivity. 
The problem for large Boolean networks with high average connectivity remains an open problem. PMID:24965213
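To make the polynomial formulation concrete: over the two-element field, AND becomes a product, OR becomes x + y + xy, and a steady state is any state x with f(x) = x. The sketch below uses a hypothetical three-node network and plain exhaustive enumeration, which is exactly the approach the abstract says does not scale; it is not the reduction-plus-computer-algebra algorithm of the paper.

```python
from itertools import product


def update(state):
    """Hypothetical 3-node Boolean network written as polynomials over GF(2)."""
    x1, x2, x3 = state
    f1 = (x2 * x3) % 2              # x2 AND x3
    f2 = (x1 + x3 + x1 * x3) % 2    # x1 OR x3
    f3 = x2 % 2                     # copy of x2
    return (f1, f2, f3)


def steady_states(update_fn, n_nodes):
    """Enumerate all 2**n states and keep those that satisfy f(x) = x."""
    return [s for s in product((0, 1), repeat=n_nodes) if update_fn(s) == s]


fixed_points = steady_states(update, 3)
# fixed_points == [(0, 0, 0), (1, 1, 1)]
```

The enumeration visits 2^n states, so it is only usable for very small n; the appeal of the polynomial-system view is that the same condition f(x) + x = 0 can be handed to algebra software instead.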
Bayesian estimation of realized stochastic volatility model by Hybrid Monte Carlo algorithm
NASA Astrophysics Data System (ADS)
Takaishi, Tetsuya
2014-03-01
The hybrid Monte Carlo algorithm (HMCA) is applied for Bayesian parameter estimation of the realized stochastic volatility (RSV) model. Using the 2nd order minimum norm integrator (2MNI) for the molecular dynamics (MD) simulation in the HMCA, we find that the 2MNI is more efficient than the conventional leapfrog integrator. We also find that the autocorrelation time of the volatility variables sampled by the HMCA is very short. Thus it is concluded that the HMCA with the 2MNI is an efficient algorithm for parameter estimations of the RSV model.
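The integrator comparison can be illustrated on a toy problem. The sketch below applies a leapfrog step and a second-order minimum norm step to a one-dimensional harmonic oscillator and records the largest energy error, which is the quantity that controls the Metropolis acceptance in hybrid Monte Carlo. The oscillator, the step size, the function names, and the parameter value 0.1931833 (the value commonly quoted for this scheme) are assumptions for illustration; this is not the RSV model or the author's code.

```python
LAMBDA = 0.1931833  # commonly quoted 2MN parameter (assumed here)


def force(q):
    return -q  # harmonic oscillator, H = p^2/2 + q^2/2


def energy(q, p):
    return 0.5 * (p * p + q * q)


def leapfrog_step(q, p, h):
    p += 0.5 * h * force(q)
    q += h * p
    p += 0.5 * h * force(q)
    return q, p


def minimum_norm_step(q, p, h):
    # Five-stage symmetric splitting; uses two force evaluations per step
    # once the last kick is merged with the first kick of the next step.
    p += LAMBDA * h * force(q)
    q += 0.5 * h * p
    p += (1.0 - 2.0 * LAMBDA) * h * force(q)
    q += 0.5 * h * p
    p += LAMBDA * h * force(q)
    return q, p


def max_energy_error(step, h, n_steps):
    q, p = 1.0, 0.0
    e0 = energy(q, p)
    worst = 0.0
    for _ in range(n_steps):
        q, p = step(q, p, h)
        worst = max(worst, abs(energy(q, p) - e0))
    return worst


err_leapfrog = max_energy_error(leapfrog_step, 0.2, 200)
err_min_norm = max_energy_error(minimum_norm_step, 0.2, 200)
```

At equal step size the minimum norm scheme shows a much smaller energy error on this toy system, so a larger step can be used for the same acceptance rate; whether that outweighs the extra force evaluation per step depends on the model.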
NASA Astrophysics Data System (ADS)
Waldmann, I. P.
2016-04-01
Here, we introduce the RobERt (Robotic Exoplanet Recognition) algorithm for the classification of exoplanetary emission spectra. Spectral retrieval of exoplanetary atmospheres frequently requires the preselection of molecular/atomic opacities to be defined by the user. In the era of open-source, automated, and self-sufficient retrieval algorithms, manual input should be avoided. User dependent input could, in worst-case scenarios, lead to incomplete models and biases in the retrieval. The RobERt algorithm is based on deep-belief neural (DBN) networks trained to accurately recognize molecular signatures for a wide range of planets, atmospheric thermal profiles, and compositions. Reconstructions of the learned features, also referred to as the “dreams” of the network, indicate good convergence and an accurate representation of molecular features in the DBN. Using these deep neural networks, we work toward retrieval algorithms that themselves understand the nature of the observed spectra, are able to learn from current and past data, and make sensible qualitative preselections of atmospheric opacities to be used for the quantitative stage of the retrieval process.
Modelling the spread of innovation in wild birds.
Shultz, Thomas R; Montrey, Marcel; Aplin, Lucy M
2017-06-01
We apply three plausible algorithms in agent-based computer simulations to recent experiments on social learning in wild birds. Although some of the phenomena are simulated by all three learning algorithms, several manifestations of social conformity bias are simulated by only the approximate majority (AM) algorithm, which has roots in chemistry, molecular biology and theoretical computer science. The simulations generate testable predictions and provide several explanatory insights into the diffusion of innovation through a population. The AM algorithm's success raises the possibility of its usefulness in studying group dynamics more generally, in several different scientific domains. Our differential-equation model matches simulation results and provides mathematical insights into the dynamics of these algorithms.
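For readers unfamiliar with the approximate majority algorithm, the sketch below shows its generic population-protocol form: agents hold opinion X, opinion Y, or are blank (B); an agent meeting the opposite opinion blanks it, and an agent meeting a blank recruits it. This is an assumed, simplified rendering of the generic protocol, not the bird-learning model of the paper, and all names and population sizes are illustrative.

```python
import random


def approximate_majority(n_x, n_y, rng, max_interactions=100_000):
    """Run pairwise interactions until every agent shares one state.

    Requires at least two agents in total.
    """
    population = ["X"] * n_x + ["Y"] * n_y
    for _ in range(max_interactions):
        if len(set(population)) == 1:
            break
        # Pick an ordered pair: the initiator acts on the responder.
        i, j = rng.sample(range(len(population)), 2)
        initiator, responder = population[i], population[j]
        if initiator != "B" and responder != "B" and initiator != responder:
            population[j] = "B"          # opposing opinion is blanked
        elif initiator != "B" and responder == "B":
            population[j] = initiator    # blank agent is recruited
    return population


unanimous = approximate_majority(10, 0, random.Random(0))
mixed = approximate_majority(30, 10, random.Random(1))
```

Because the initiator always keeps its opinion, the population can never become entirely blank, and once one opinion dies out the remaining blanks are recruited until consensus is reached.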
Damrath, Martin; Korte, Sebastian; Hoeher, Peter Adam
2017-01-01
This paper introduces the equivalent discrete-time channel model (EDTCM) to the area of diffusion-based molecular communication (DBMC). Emphasis is on an absorbing receiver, which is based on the so-called first passage time concept. In the wireless communications community the EDTCM is well known. Therefore, it is anticipated that the EDTCM improves the accessibility of DBMC and supports the adaptation of classical wireless communication algorithms to the area of DBMC. Furthermore, the EDTCM has the capability to provide a remarkable reduction of computational complexity compared to random walk based DBMC simulators. Besides the exact EDTCM, three approximations thereof based on binomial, Gaussian, and Poisson approximation are proposed and analyzed in order to further reduce computational complexity. In addition, the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm is adapted to all four channel models. Numerical results show the performance of the exact EDTCM, illustrate the performance of the adapted BCJR algorithm, and demonstrate the accuracy of the approximations.
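A discrete-time channel of this kind can be sketched from a first passage time distribution. The example below assumes a point transmitter and a fully absorbing spherical receiver in unbounded three-dimensional space, for which the hitting probability up to time t has the closed form (r/d) erfc((d - r) / sqrt(4 D t)) that is widely used in the DBMC literature. Whether this matches the exact setting of the paper is an assumption, and the geometry, diffusion coefficient, and function names are illustrative.

```python
import math


def hit_probability(t, distance, radius, diffusion):
    """Probability that a molecule released at t=0 is absorbed by time t."""
    if t <= 0.0:
        return 0.0
    arg = (distance - radius) / math.sqrt(4.0 * diffusion * t)
    return (radius / distance) * math.erfc(arg)


def channel_coefficients(n_taps, symbol_time, distance, radius, diffusion):
    """Tap k is the probability of absorption during the k-th symbol interval."""
    cdf = [hit_probability(k * symbol_time, distance, radius, diffusion)
           for k in range(n_taps + 1)]
    return [cdf[k + 1] - cdf[k] for k in range(n_taps)]


def received_count_moments(n_molecules, tap):
    """(mean, variance) of the received count under each approximation."""
    mean = n_molecules * tap
    return {
        "binomial": (mean, mean * (1.0 - tap)),
        "gaussian": (mean, mean * (1.0 - tap)),  # moment-matched to the binomial
        "poisson": (mean, mean),
    }


# Illustrative values: lengths in micrometres, D in um^2/s, time in seconds.
taps = channel_coefficients(5, 0.1, 10.0, 5.0, 79.4)
moments = received_count_moments(1000, taps[0])
```

The later taps describe molecules that arrive in subsequent symbol intervals, that is, the inter-symbol interference that a sequence detector such as BCJR has to account for.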
Szostak, Justyna; Martin, Florian; Talikka, Marja; Peitsch, Manuel C; Hoeng, Julia
2016-01-01
The cellular and molecular mechanisms behind the process of atherosclerotic plaque destabilization are complex, and molecular data from aortic plaques are difficult to interpret. Biological network models may overcome these difficulties and precisely quantify the molecular mechanisms impacted during disease progression. The atherosclerosis plaque destabilization biological network model was constructed with the semiautomated curation pipeline, BELIEF. Cellular and molecular mechanisms promoting plaque destabilization or rupture were captured in the network model. Public transcriptomic data sets were used to demonstrate the specificity of the network model and to capture the different mechanisms that were impacted in ApoE -/- mouse aorta at 6 and 32 weeks. We concluded that network models combined with the network perturbation amplitude algorithm provide a sensitive, quantitative method to follow disease progression at the molecular level. This approach can be used to investigate and quantify molecular mechanisms during plaque progression.
A conceptually and computationally simple method for the definition, display, quantification, and comparison of the shapes of three-dimensional mathematical molecular models is presented. Molecular or solvent-accessible volume and surface area can also be calculated. Algorithms, ...
Coarse-grained molecular dynamics simulations for giant protein-DNA complexes
NASA Astrophysics Data System (ADS)
Takada, Shoji
Biomolecules are highly hierarchic and intrinsically flexible. Thus, computational modeling calls for multi-scale methodologies. We have been developing a coarse-grained biomolecular model where, on average, 10-20 atoms are grouped into one coarse-grained (CG) particle. Interactions among CG particles are tuned based on atomistic interactions and the fluctuation matching algorithm. CG molecular dynamics methods enable us to simulate much longer time scale motions of much larger molecular systems than fully atomistic models. After broad sampling of structures with CG models, we can easily reconstruct atomistic models, from which one can continue conventional molecular dynamics simulations if desired. Here, we describe our CG modeling methodology for protein-DNA complexes, together with various biological applications, such as the DNA duplication initiation complex, model chromatins, and transcription factor dynamics in a chromatin-like environment.
WAM: an improved algorithm for modelling antibodies on the WEB.
Whitelegg, N R; Rees, A R
2000-12-01
An improved antibody modelling algorithm has been developed which incorporates significant improvements to the earlier versions developed by Martin et al. (1989, 1991), Pedersen et al. (1992) and Rees et al. (1996) and known as AbM (Oxford Molecular). The new algorithm, WAM (for Web Antibody Modelling), has been launched as an online modelling service and is located at URL http://antibody.bath.ac.uk. Here we provide a summary only of the important features of WAM. Readers interested in further details are directed to the website, which gives extensive background information on the methods employed. A brief description of the rationale behind some of the newer methodology (specifically, the knowledge-based screens) is also given.
Kast, Stefan M
2004-03-08
An argument brought forward by Sholl and Fichthorn against the stochastic collision-based constant temperature algorithm for molecular dynamics simulations developed by Kast et al. is refuted. It is demonstrated that the large temperature fluctuations noted by Sholl and Fichthorn are due to improperly chosen initial conditions within their formulation of the algorithm. With the original form or by suitable initialization of their variant no deficient behavior is observed.
Developments in the CCP4 molecular-graphics project.
Potterton, Liz; McNicholas, Stuart; Krissinel, Eugene; Gruber, Jan; Cowtan, Kevin; Emsley, Paul; Murshudov, Garib N; Cohen, Serge; Perrakis, Anastassis; Noble, Martin
2004-12-01
Progress towards structure determination that is both high-throughput and high-value is dependent on the development of integrated and automatic tools for electron-density map interpretation and for the analysis of the resulting atomic models. Advances in map-interpretation algorithms are extending the resolution regime in which fully automatic tools can work reliably, but at present human intervention is required to interpret poor regions of macromolecular electron density, particularly where crystallographic data is only available to modest resolution [for example, I/σ(I) < 2.0 for minimum resolution 2.5 Å]. In such cases, a set of manual and semi-manual model-building molecular-graphics tools is needed. At the same time, converting the knowledge encapsulated in a molecular structure into understanding is dependent upon visualization tools, which must be able to communicate that understanding to others by means of both static and dynamic representations. CCP4mg is a program designed to meet these needs in a way that is closely integrated with the ongoing development of CCP4 as a program suite suitable for both low- and high-intervention computational structural biology. As well as providing a carefully designed user interface to advanced algorithms of model building and analysis, CCP4mg is intended to present a graphical toolkit to developers of novel algorithms in these fields.
Density-based cluster algorithms for the identification of core sets
NASA Astrophysics Data System (ADS)
Lemke, Oliver; Keller, Bettina G.
2016-10-01
The core-set approach is a discretization method for Markov state models of complex molecular dynamics. Core sets are disjoint metastable regions in the conformational space, which need to be known prior to the construction of the core-set model. We propose to use density-based cluster algorithms to identify the cores. We compare three different density-based cluster algorithms: the CNN, the DBSCAN, and the Jarvis-Patrick algorithm. While the core-set models based on the CNN and DBSCAN clustering are well-converged, constructing core-set models based on the Jarvis-Patrick clustering cannot be recommended. In a well-converged core-set model, the number of core sets is up to an order of magnitude smaller than the number of states in a conventional Markov state model with comparable approximation error. Moreover, using the density-based clustering one can extend the core-set method to systems which are not strongly metastable. This is important for the practical application of the core-set method because most biologically interesting systems are only marginally metastable. The key point is to perform a hierarchical density-based clustering while monitoring the structure of the metric matrix which appears in the core-set method. We test this approach on a molecular-dynamics simulation of a highly flexible 14-residue peptide. The resulting core-set models have a high spatial resolution and can distinguish between conformationally similar yet chemically different structures, such as register-shifted hairpin structures.
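As an illustration of how a density-based clustering separates dense cores from sparsely populated transition regions, here is a minimal pure-Python DBSCAN. It is a generic textbook version with hypothetical data and parameters, not the implementation or the hierarchical variant used in the paper; points labelled -1 (noise) would be the ones left outside every core set.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    n = len(points)
    # Neighbour lists include the point itself.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # core point: keep growing the cluster
    return labels


# Two dense groups and one isolated point (hypothetical 2-D conformations).
data = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
        (5.0, 5.0), (5.0, 5.1), (5.1, 5.0),
        (10.0, 10.0)]
labels = dbscan(data, eps=0.5, min_pts=3)
# labels == [0, 0, 0, 1, 1, 1, -1]
```

The neighbour search here is quadratic in the number of points, which is acceptable for a demonstration but not for full trajectories.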
Analyzing milestoning networks for molecular kinetics: definitions, algorithms, and examples.
Viswanath, Shruthi; Kreuzer, Steven M; Cardenas, Alfredo E; Elber, Ron
2013-11-07
Network representations are becoming increasingly popular for analyzing kinetic data from techniques like Milestoning, Markov State Models, and Transition Path Theory. Mapping continuous phase space trajectories into a relatively small number of discrete states helps in visualization of the data and in dissecting complex dynamics to concrete mechanisms. However, not only are molecular networks derived from molecular dynamics simulations growing in number, they are also getting increasingly complex, owing partly to the growth in computer power that allows us to generate longer and better converged trajectories. The increased complexity of the networks makes simple interpretation and qualitative insight of the molecular systems more difficult to achieve. In this paper, we focus on various network representations of kinetic data and algorithms to identify important edges and pathways in these networks. The kinetic data can be local and partial (such as the value of rate coefficients between states) or an exact solution to kinetic equations for the entire system (such as the stationary flux between vertices). In particular, we focus on the Milestoning method that provides fluxes as the main output. We proposed Global Maximum Weight Pathways as a useful tool for analyzing molecular mechanism in Milestoning networks. A closely related definition was made in the context of Transition Path Theory. We consider three algorithms to find Global Maximum Weight Pathways: Recursive Dijkstra's, Edge-Elimination, and Edge-List Bisection. The asymptotic efficiency of the algorithms is analyzed and numerical tests on finite networks show that Edge-List Bisection and Recursive Dijkstra's algorithms are most efficient for sparse and dense networks, respectively. Pathways are illustrated for two examples: helix unfolding and membrane permeation. Finally, we illustrate that networks based on local kinetic information can lead to incorrect interpretation of molecular mechanisms.
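One way to picture a maximum weight pathway is as the path whose weakest edge carries the largest possible flux (a max-min or "widest" path). The sketch below finds such a path with a Dijkstra-style search. Treating this as the building block of the Global Maximum Weight Pathway is an assumption about the definition, and the code is a generic widest-path routine with a hypothetical flux network, not the Recursive Dijkstra's, Edge-Elimination, or Edge-List Bisection algorithms from the paper.

```python
import heapq


def max_bottleneck_path(graph, source, target):
    """graph: {u: {v: flux}}. Returns (bottleneck flux, path as list of nodes)."""
    best = {source: float("inf")}
    previous = {}
    heap = [(-float("inf"), source)]  # max-heap via negated widths
    while heap:
        neg_width, u = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(u, 0.0):
            continue  # stale heap entry
        if u == target:
            break
        for v, flux in graph.get(u, {}).items():
            candidate = min(width, flux)
            if candidate > best.get(v, 0.0):
                best[v] = candidate
                previous[v] = u
                heapq.heappush(heap, (-candidate, v))
    if target not in best:
        return 0.0, []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


# Hypothetical stationary fluxes between four milestones.
fluxes = {"A": {"B": 5.0, "C": 3.0}, "B": {"D": 2.0}, "C": {"D": 4.0}}
bottleneck, pathway = max_bottleneck_path(fluxes, "A", "D")
# bottleneck == 3.0, pathway == ["A", "C", "D"]
```

In this example the route through B starts with the largest flux but is throttled by the B to D edge, so the route through C is the one with the larger bottleneck.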
Chen, Yunjie; Roux, Benoît
2015-08-11
Molecular dynamics (MD) trajectories based on a classical equation of motion provide a straightforward, albeit somewhat inefficient approach, to explore and sample the configurational space of a complex molecular system. While a broad range of techniques can be used to accelerate and enhance the sampling efficiency of classical simulations, only algorithms that are consistent with the Boltzmann equilibrium distribution yield a proper statistical mechanical computational framework. Here, a multiscale hybrid algorithm relying simultaneously on all-atom fine-grained (FG) and coarse-grained (CG) representations of a system is designed to improve sampling efficiency by combining the strength of nonequilibrium molecular dynamics (neMD) and Metropolis Monte Carlo (MC). This CG-guided hybrid neMD-MC algorithm comprises six steps: (1) a FG configuration of an atomic system is dynamically propagated for some period of time using equilibrium MD; (2) the resulting FG configuration is mapped onto a simplified CG model; (3) the CG model is propagated for a brief time interval to yield a new CG configuration; (4) the resulting CG configuration is used as a target to guide the evolution of the FG system; (5) the FG configuration (from step 1) is driven via a nonequilibrium MD (neMD) simulation toward the CG target; (6) the resulting FG configuration at the end of the neMD trajectory is then accepted or rejected according to a Metropolis criterion before returning to step 1. A symmetric two-ends momentum reversal prescription is used for the neMD trajectories of the FG system to guarantee that the CG-guided hybrid neMD-MC algorithm obeys microscopic detailed balance and rigorously yields the equilibrium Boltzmann distribution. The enhanced sampling achieved with the method is illustrated with a model system with hindered diffusion and explicit-solvent peptide simulations. 
Illustrative tests indicate that the method can yield a speedup of about 80 times for the model system and up to 21 times for polyalanine and (AAQAA)3 in water. PMID:26574442
Li, Yan; Dong, Zigang
2016-06-27
Recently, the Markov state model has been applied for kinetic analysis of molecular dynamics simulations. However, discretization of the conformational space remains a primary challenge in model building, and it is not clear how the space decomposition by distinct clustering strategies exerts influence on the model output. In this work, different clustering algorithms are employed to partition the conformational space sampled in opening and closing of fatty acid binding protein 4 as well as inactivation and activation of the epidermal growth factor receptor. Various classifications are achieved, and Markov models are set up accordingly. On the basis of the models, the total net flux and transition rate are calculated between two distinct states. Our results indicate that geometric and kinetic clustering perform equally well. The construction and outcome of Markov models are heavily dependent on the data traits. Compared to other methods, a combination of Bayesian and hierarchical clustering is feasible in identification of metastable states.
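Once the conformational space has been partitioned by any clustering method, the simplest Markov model is obtained by counting transitions between cluster labels at a chosen lag time and normalising each row. The sketch below shows that basic estimator on a hypothetical discrete trajectory; it is a plain row-normalised count estimate, assumed here for illustration, not the estimator or software used by the authors.

```python
def transition_matrix(dtraj, n_states, lag=1):
    """Row-normalised transition counts from a discrete trajectory."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States that are never left within the data get an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical cluster labels along a trajectory (0 = closed, 1 = open).
dtraj = [0, 0, 1, 1, 0, 1]
T = transition_matrix(dtraj, n_states=2)
# T == [[1/3, 2/3], [0.5, 0.5]]
```

Changing the clustering changes `dtraj` and therefore every entry of the matrix, which is why the choice of discretization influences fluxes and rates derived from the model.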
NASA Astrophysics Data System (ADS)
Kum, Oyeon; Dickson, Brad M.; Stuart, Steven J.; Uberuaga, Blas P.; Voter, Arthur F.
2004-11-01
Parallel replica dynamics simulation methods appropriate for the simulation of chemical reactions in molecular systems with many conformational degrees of freedom have been developed and applied to study the microsecond-scale pyrolysis of n-hexadecane in the temperature range of 2100-2500 K. The algorithm uses a transition detection scheme that is based on molecular topology, rather than energetic basins. This algorithm allows efficient parallelization of small systems even when using more processors than particles (in contrast to more traditional parallelization algorithms), and even when there are frequent conformational transitions (in contrast to previous implementations of the parallel replica algorithm). The parallel efficiency for pyrolysis initiation reactions was over 90% on 61 processors for this 50-atom system. The parallel replica dynamics technique results in reaction probabilities that are statistically indistinguishable from those obtained from direct molecular dynamics, under conditions where both are feasible, but allows simulations at temperatures as much as 1000 K lower than direct molecular dynamics simulations. The rate of initiation displayed Arrhenius behavior over the entire temperature range, with an activation energy and frequency factor of Ea = 79.7 kcal/mol and log(A/s^-1) = 14.8, respectively, in reasonable agreement with experiment and empirical kinetic models. Several interesting unimolecular reaction mechanisms were observed in simulations of the chain propagation reactions above 2000 K, which are not included in most coarse-grained kinetic models. More studies are needed in order to determine whether these mechanisms are experimentally relevant, or specific to the potential energy surface used.
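The reported Arrhenius parameters can be turned into rate constants with k = A exp(-Ea / RT). The short sketch below evaluates this at the two ends of the simulated temperature range; it assumes the logarithm of the frequency factor is base 10, which the abstract does not state explicitly, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def arrhenius_rate(temperature, log10_a=14.8, ea_kcal=79.7):
    """Rate constant in 1/s from k = A * exp(-Ea / (R * T)).

    Defaults are the values quoted in the abstract; base-10 log is assumed.
    """
    prefactor = 10.0 ** log10_a
    return prefactor * math.exp(-ea_kcal / (R_KCAL * temperature))


k_2100 = arrhenius_rate(2100.0)
k_2500 = arrhenius_rate(2500.0)
```

Under these assumptions the rate constants fall between roughly 10^6 and 10^8 per second, so the mean waiting time for initiation is of the order of a microsecond or shorter, consistent with the microsecond-scale simulations described above.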
ERIC Educational Resources Information Center
Rodrigues, João P. G. L. M.; Melquiond, Adrien S. J.; Bonvin, Alexandre M. J. J.
2016-01-01
Molecular modelling and simulations are nowadays an integral part of research in areas ranging from physics to chemistry to structural biology, as well as pharmaceutical drug design. This popularity is due to the development of high-performance hardware and of accurate and efficient molecular mechanics algorithms by the scientific community. These…
Alexiadis, Orestis; Daoulas, Kostas Ch; Mavrantzas, Vlasis G
2008-01-31
A new Monte Carlo algorithm is presented for the simulation of atomistically detailed alkanethiol self-assembled monolayers (R-SH) on a Au(111) surface. Built on a set of simpler but also more complex (sometimes nonphysical) moves, the new algorithm is capable of efficiently driving all alkanethiol molecules to the Au(111) surface, thereby leading to full surface coverage, irrespective of the initial setup of the system. This circumvents a significant limitation of previous methods in which the simulations typically started from optimally packed structures on the substrate close to thermal equilibrium. Further, by considering an extended ensemble of configurations each one of which corresponds to a different value of the sulfur-sulfur repulsive core potential, σ_ss, and by allowing for configurations to swap between systems characterized by different σ_ss values, the new algorithm can adequately simulate model R-SH/Au(111) systems for values of σ_ss ranging from 4.25 Å corresponding to the Hautman-Klein molecular model (J. Chem. Phys. 1989, 91, 4994; 1990, 93, 7483) to 4.97 Å corresponding to the Siepmann-McDonald model (Langmuir 1993, 9, 2351), and practically any chain length. Detailed results are presented quantifying the efficiency and robustness of the new method. Representative simulation data for the dependence of the structural and conformational properties of the formed monolayer on the details of the employed molecular model are reported and discussed; an investigation of the variation of molecular organization and ordering on the Au(111) substrate for three CH3-(CH2)n-SH/Au(111) systems with n=9, 15, and 21 is also included.
GPU-Accelerated Molecular Modeling Coming Of Age
Stone, John E.; Hardy, David J.; Ufimtsev, Ivan S.
2010-01-01
Graphics processing units (GPUs) have traditionally been used in molecular modeling solely for visualization of molecular structures and animation of trajectories resulting from molecular dynamics simulations. Modern GPUs have evolved into fully programmable, massively parallel co-processors that can now be exploited to accelerate many scientific computations, typically providing about one order of magnitude speedup over CPU code and in special cases providing speedups of two orders of magnitude. This paper surveys the development of molecular modeling algorithms that leverage GPU computing, the advances already made and remaining issues to be resolved, and the continuing evolution of GPU technology that promises to become even more useful to molecular modeling. Hardware acceleration with commodity GPUs is expected to benefit the overall computational biology community by bringing teraflops performance to desktop workstations and in some cases potentially changing what were formerly batch-mode computational jobs into interactive tasks. PMID:20675161
GPU-accelerated molecular modeling coming of age.
Stone, John E; Hardy, David J; Ufimtsev, Ivan S; Schulten, Klaus
2010-09-01
Graphics processing units (GPUs) have traditionally been used in molecular modeling solely for visualization of molecular structures and animation of trajectories resulting from molecular dynamics simulations. Modern GPUs have evolved into fully programmable, massively parallel co-processors that can now be exploited to accelerate many scientific computations, typically providing about one order of magnitude speedup over CPU code and in special cases providing speedups of two orders of magnitude. This paper surveys the development of molecular modeling algorithms that leverage GPU computing, the advances already made and remaining issues to be resolved, and the continuing evolution of GPU technology that promises to become even more useful to molecular modeling. Hardware acceleration with commodity GPUs is expected to benefit the overall computational biology community by bringing teraflops performance to desktop workstations and in some cases potentially changing what were formerly batch-mode computational jobs into interactive tasks. (c) 2010 Elsevier Inc. All rights reserved.
Karayiannis, Nikos Ch.; Kröger, Martin
2009-01-01
We review the methodology, algorithmic implementation and performance characteristics of a hierarchical modeling scheme for the generation, equilibration and topological analysis of polymer systems at various levels of molecular description: from atomistic polyethylene samples to random packings of freely-jointed chains of tangent hard spheres of uniform size. Our analysis focuses on hitherto less discussed algorithmic details of the implementation of both the Monte Carlo (MC) procedure for the system generation and equilibration, and a postprocessing step, where we identify the underlying topological structure of the simulated systems in the form of primitive paths. In order to demonstrate our arguments, we study how molecular length and packing density (volume fraction) affect the performance of the MC scheme built around chain-connectivity altering moves. In parallel, we quantify the effect of finite system size, of polydispersity, and of the definition of the number of entanglements (and related entanglement molecular weight) on the results about the primitive path network. Along these lines we confirm the main concepts which had been previously proposed in the literature. PMID:20087477
SHOCKFIND - an algorithm to identify magnetohydrodynamic shock waves in turbulent clouds
NASA Astrophysics Data System (ADS)
Lehmann, Andrew; Federrath, Christoph; Wardle, Mark
2016-11-01
The formation of stars occurs in the dense molecular cloud phase of the interstellar medium. Observations and numerical simulations of molecular clouds have shown that supersonic magnetized turbulence plays a key role for the formation of stars. Simulations have also shown that a large fraction of the turbulent energy dissipates in shock waves. The three families of MHD shocks - fast, intermediate and slow - distinctly compress and heat up the molecular gas, and so provide an important probe of the physical conditions within a turbulent cloud. Here, we introduce the publicly available algorithm, SHOCKFIND, to extract and characterize the mixture of shock families in MHD turbulence. The algorithm is applied to a three-dimensional simulation of a magnetized turbulent molecular cloud, and we find that both fast and slow MHD shocks are present in the simulation. We give the first prediction of the mixture of turbulence-driven MHD shock families in this molecular cloud, and present their distinct distributions of sonic and Alfvénic Mach numbers. Using subgrid one-dimensional models of MHD shocks we estimate that ~0.03 per cent of the volume of a typical molecular cloud in the Milky Way will be shock heated above 50 K, at any time during the lifetime of the cloud. We discuss the impact of this shock heating on the dynamical evolution of molecular clouds.
A Scalable O(N) Algorithm for Large-Scale Parallel First-Principles Molecular Dynamics Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Osei-Kuffuor, Daniel; Fattebert, Jean-Luc
2014-01-01
Traditional algorithms for first-principles molecular dynamics (FPMD) simulations only gain a modest capability increase from current petascale computers, due to their O(N^3) complexity and their heavy use of global communications. To address this issue, we are developing a truly scalable O(N) complexity FPMD algorithm, based on density functional theory (DFT), which avoids global communications. The computational model uses a general nonorthogonal orbital formulation for the DFT energy functional, which requires knowledge of selected elements of the inverse of the associated overlap matrix. We present a scalable algorithm for approximately computing selected entries of the inverse of the overlap matrix, based on an approximate inverse technique, by inverting local blocks corresponding to principal submatrices of the global overlap matrix. The new FPMD algorithm exploits sparsity and uses nearest neighbor communication to provide a computational scheme capable of extreme scalability. Accuracy is controlled by the mesh spacing of the finite difference discretization, the size of the localization regions in which the electronic orbitals are confined, and a cutoff beyond which the entries of the overlap matrix can be omitted when computing selected entries of its inverse. We demonstrate the algorithm's excellent parallel scaling for up to O(100K) atoms on O(100K) processors, with a wall-clock time of O(1) minute per molecular dynamics time step.
Jeong, Hyundoo; Yoon, Byung-Jun
2017-03-14
Network querying algorithms provide computational means to identify conserved network modules in large-scale biological networks that are similar to known functional modules, such as pathways or molecular complexes. Two main challenges for network querying algorithms are the high computational complexity of detecting potential isomorphism between the query and the target graphs and ensuring the biological significance of the query results. In this paper, we propose SEQUOIA, a novel network querying algorithm that effectively addresses these issues by utilizing a context-sensitive random walk (CSRW) model for network comparison and minimizing the network conductance of potential matches in the target network. The CSRW model, inspired by the pair hidden Markov model (pair-HMM) that has been widely used for sequence comparison and alignment, can accurately assess the node-to-node correspondence between different graphs by accounting for node insertions and deletions. The proposed algorithm identifies high-scoring network regions based on the CSRW scores, which are subsequently extended by maximally reducing the network conductance of the identified subnetworks. Performance assessment based on real PPI networks and known molecular complexes shows that SEQUOIA outperforms existing methods and clearly enhances the biological significance of the query results. The source code and datasets can be downloaded from http://www.ece.tamu.edu/~bjyoon/SEQUOIA .
Multipole Algorithms for Molecular Dynamics Simulation on High Performance Computers.
NASA Astrophysics Data System (ADS)
Elliott, William Dewey
1995-01-01
A fundamental problem in modeling large molecular systems with molecular dynamics (MD) simulations is the underlying N-body problem of computing the interactions between all pairs of N atoms. The simplest algorithm to compute pair-wise atomic interactions scales in runtime O(N^2), making it impractical for interesting biomolecular systems, which can contain millions of atoms. Recently, several algorithms have become available that solve the N-body problem by computing the effects of all pair-wise interactions while scaling in runtime less than O(N^2). One algorithm, which scales O(N) for a uniform distribution of particles, is called the Greengard-Rokhlin Fast Multipole Algorithm (FMA). This work describes an FMA-like algorithm called the Molecular Dynamics Multipole Algorithm (MDMA). The algorithm contains several features that are new to N-body algorithms. MDMA uses new, efficient series expansion equations to compute general 1/r^n potentials to arbitrary accuracy. In particular, the 1/r Coulomb potential and the 1/r^6 portion of the Lennard-Jones potential are implemented. The new equations are based on multivariate Taylor series expansions. In addition, MDMA uses a cell-to-cell interaction region of cells that is closely tied to worst case error bounds. The worst case error bounds for MDMA are also derived in this work. These bounds apply to other multipole algorithms as well. Several implementation enhancements are described which apply to MDMA as well as other N-body algorithms such as FMA and tree codes. The mathematics of the cell-to-cell interactions are converted to the Fourier domain for reduced operation count and faster computation. A relative indexing scheme was devised to locate cells in the interaction region which allows efficient pre-computation of redundant information and prestorage of much of the cell-to-cell interaction.
Also, MDMA was integrated into the MD program SIgMA to demonstrate the performance of the program over several simulation timesteps. One MD application described here highlights the utility of including long range contributions to Lennard-Jones potential in constant pressure simulations. Another application shows the time dependence of long range forces in a multiple time step MD simulation.
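For orientation, the O(N^2) baseline that the abstract above contrasts with multipole methods can be sketched in a few lines. This is only the naive direct sum, not MDMA or FMA; the function names and the reduced units (Coulomb constant set to 1) are assumptions made for this illustration.

```python
import math


def coulomb_energy_direct(positions, charges):
    """Direct sum of q_i * q_j / r_ij over all unique pairs (reduced units).

    Hypothetical helper for illustration: the double loop visits every
    pair once, which is the O(N^2) cost that multipole algorithms avoid.
    """
    n = len(positions)
    energy = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            energy += charges[i] * charges[j] / r
    return energy


def pair_count(n):
    """Number of unique pairs the direct sum has to visit."""
    return n * (n - 1) // 2
```

For a million atoms, pair_count gives roughly 5 x 10^11 pair evaluations per energy call, which is why the quadratic baseline becomes impractical for large biomolecular systems.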
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes
NASA Astrophysics Data System (ADS)
Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; Stuehn, Torsten
2017-11-01
Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the capabilities of the new approach by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. These two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
Bindewald, Eckart; Grunewald, Calvin; Boyle, Brett; O'Connor, Mary; Shapiro, Bruce A
2008-10-01
One approach to designing RNA nanoscale structures is to use known RNA structural motifs such as junctions, kissing loops or bulges and to construct a molecular model by connecting these building blocks with helical struts. We previously developed an algorithm for detecting internal loops, junctions and kissing loops in RNA structures. Here we present algorithms for automating or assisting many of the steps that are involved in creating RNA structures from building blocks: (1) assembling building blocks into nanostructures using either a combinatorial search or constraint satisfaction; (2) optimizing RNA 3D ring structures to improve ring closure; (3) sequence optimisation; (4) creating a unique non-degenerate RNA topology descriptor. This effectively creates a computational pipeline for generating molecular models of RNA nanostructures and more specifically RNA ring structures with optimized sequences from RNA building blocks. We show several examples of how the algorithms can be utilized to generate RNA tecto-shapes.
Bindewald, Eckart; Grunewald, Calvin; Boyle, Brett; O’Connor, Mary; Shapiro, Bruce A.
2013-01-01
One approach to designing RNA nanoscale structures is to use known RNA structural motifs such as junctions, kissing loops or bulges and to construct a molecular model by connecting these building blocks with helical struts. We previously developed an algorithm for detecting internal loops, junctions and kissing loops in RNA structures. Here we present algorithms for automating or assisting many of the steps that are involved in creating RNA structures from building blocks: (1) assembling building blocks into nanostructures using either a combinatorial search or constraint satisfaction; (2) optimizing RNA 3D ring structures to improve ring closure; (3) sequence optimisation; (4) creating a unique non-degenerate RNA topology descriptor. This effectively creates a computational pipeline for generating molecular models of RNA nanostructures and more specifically RNA ring structures with optimized sequences from RNA building blocks. We show several examples of how the algorithms can be utilized to generate RNA tecto-shapes. PMID:18838281
Inferring phenomenological models of Markov processes from data
NASA Astrophysics Data System (ADS)
Rivera, Catalina; Nemenman, Ilya
Microscopically accurate modeling of stochastic dynamics of biochemical networks is hard due to the extremely high dimensionality of the state space of such networks. Here we propose an algorithm for inference of phenomenological, coarse-grained models of Markov processes describing the network dynamics directly from data, without the intermediate step of microscopically accurate modeling. The approach relies on the linear nature of the Chemical Master Equation and uses Bayesian Model Selection for identification of parsimonious models that fit the data. When applied to synthetic data from the Kinetic Proofreading process (KPR), a common mechanism used by cells for increasing specificity of molecular assembly, the algorithm successfully uncovers the known coarse-grained description of the process. This phenomenological description has been noticed previously, but this time it is derived in an automated manner by the algorithm. James S. McDonnell Foundation Grant No. 220020321.
Ni, Jingchao; Koyuturk, Mehmet; Tong, Hanghang; Haines, Jonathan; Xu, Rong; Zhang, Xiang
2016-11-10
Accurately prioritizing candidate disease genes is an important and challenging problem. Various network-based methods have been developed to predict potential disease genes by utilizing the disease similarity network and molecular networks such as protein interaction or gene co-expression networks. Although successful, a common limitation of the existing methods is that they assume all diseases share the same molecular network and a single generic molecular network is used to predict candidate genes for all diseases. However, different diseases tend to manifest in different tissues, and the molecular networks in different tissues are usually different. An ideal method should be able to incorporate tissue-specific molecular networks for different diseases. In this paper, we develop a robust and flexible method to integrate tissue-specific molecular networks for disease gene prioritization. Our method allows each disease to have its own tissue-specific network(s). We formulate the problem of candidate gene prioritization as an optimization problem based on network propagation. When there are multiple tissue-specific networks available for a disease, our method can automatically infer the relative importance of each tissue-specific network. Thus it is robust to the noisy and incomplete network data. To solve the optimization problem, we develop fast algorithms which have linear time complexities in the number of nodes in the molecular networks. We also provide rigorous theoretical foundations for our algorithms in terms of their optimality and convergence properties. Extensive experimental results show that our method can significantly improve the accuracy of candidate gene prioritization compared with the state-of-the-art methods. In our experiments, we compare our methods with 7 popular network-based disease gene prioritization algorithms on diseases from Online Mendelian Inheritance in Man (OMIM) database. 
The experimental results demonstrate that our methods recover true associations more accurately than other methods in terms of AUC values, and the performance differences are significant (with paired t-test p-values less than 0.05). This validates the importance of integrating tissue-specific molecular networks for studying disease gene prioritization and shows the superiority of our network models and ranking algorithms toward this purpose. The source code and datasets are available at http://nijingchao.github.io/CRstar/ .
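As a rough illustration of what "network propagation" means in the abstract above, the sketch below implements a generic random walk with restart on a single network. It is not the authors' tissue-specific method or their code; the function name, the dense adjacency-matrix input, and the restart value of 0.5 are all assumptions chosen for brevity.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5,
                             tol=1e-12, max_iter=10_000):
    """Propagate seed scores over a graph until they stop changing.

    Hypothetical helper: iterates p <- (1 - restart) * W p + restart * p0,
    where W is the column-normalised adjacency matrix and p0 places equal
    weight on the seed nodes (e.g. known disease genes).
    """
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        new_p = []
        for i in range(n):
            # Score flowing into node i from each neighbour j.
            flow = sum(adjacency[i][j] * p[j] / col_sums[j]
                       for j in range(n) if col_sums[j] > 0)
            new_p.append((1.0 - restart) * flow + restart * p0[i])
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p
```

Nodes with higher final scores are closer, in the random-walk sense, to the seed set. Each iteration of this dense version costs O(n^2); a sparse adjacency list would bring the cost per iteration down to the number of edges.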
Treatment Algorithms Based on Tumor Molecular Profiling: The Essence of Precision Medicine Trials.
Le Tourneau, Christophe; Kamal, Maud; Tsimberidou, Apostolia-Maria; Bedard, Philippe; Pierron, Gaëlle; Callens, Céline; Rouleau, Etienne; Vincent-Salomon, Anne; Servant, Nicolas; Alt, Marie; Rouzier, Roman; Paoletti, Xavier; Delattre, Olivier; Bièche, Ivan
2016-04-01
With the advent of high-throughput molecular technologies, several precision medicine (PM) studies are currently ongoing that include molecular screening programs and PM clinical trials. Molecular profiling programs establish the molecular profile of patients' tumors with the aim to guide therapy based on identified molecular alterations. The aim of prospective PM clinical trials is to assess the clinical utility of tumor molecular profiling and to determine whether treatment selection based on molecular alterations produces superior outcomes compared with unselected treatment. These trials use treatment algorithms to assign patients to specific targeted therapies based on tumor molecular alterations. These algorithms should be governed by fixed rules to ensure standardization and reproducibility. Here, we summarize key molecular, biological, and technical criteria that, in our view, should be addressed when establishing treatment algorithms based on tumor molecular profiling for PM trials. © The Author 2015. Published by Oxford University Press.
Applications of modern statistical methods to analysis of data in physical science
NASA Astrophysics Data System (ADS)
Wicker, James Eric
Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plague this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, several shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcome the strong dependence on initial seed values.
In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic algorithm based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.
Parallel, stochastic measurement of molecular surface area.
Juba, Derek; Varshney, Amitabh
2008-08-01
Biochemists often wish to compute surface areas of proteins. A variety of algorithms have been developed for this task, but they are designed for traditional single-processor architectures. The current trend in computer hardware is towards increasingly parallel architectures for which these algorithms are not well suited. We describe a parallel, stochastic algorithm for molecular surface area computation that maps well to the emerging multi-core architectures. Our algorithm is also progressive, providing a rough estimate of surface area immediately and refining this estimate as time goes on. Furthermore, the algorithm generates points on the molecular surface which can be used for point-based rendering. We demonstrate a GPU implementation of our algorithm and show that it compares favorably with several existing molecular surface computation programs, giving fast estimates of the molecular surface area with good accuracy.
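The general idea of a stochastic surface-area estimate can be sketched as follows. This is a serial toy version under simple assumptions (atoms as hard spheres, uniform random points on each sphere), not the authors' parallel GPU algorithm; the function names, the sample count, and the fixed seed are hypothetical choices for this illustration.

```python
import math
import random


def random_unit_vector(rng):
    """Uniform random direction, obtained by normalising a Gaussian triple."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:
            return (x / norm, y / norm, z / norm)


def estimate_surface_area(spheres, samples_per_sphere=2000, seed=0):
    """Monte Carlo estimate of the exposed area of a union of spheres.

    spheres is a list of (center, radius) pairs. For each sphere, random
    points are placed on its surface; the fraction not buried inside any
    other sphere scales that sphere's full area 4*pi*r^2.
    """
    rng = random.Random(seed)
    total = 0.0
    for i, (ci, ri) in enumerate(spheres):
        exposed = 0
        for _ in range(samples_per_sphere):
            u = random_unit_vector(rng)
            point = tuple(c + ri * d for c, d in zip(ci, u))
            buried = any(math.dist(point, cj) < rj
                         for j, (cj, rj) in enumerate(spheres) if j != i)
            if not buried:
                exposed += 1
        total += 4.0 * math.pi * ri * ri * exposed / samples_per_sphere
    return total
```

Because every sample point is tested independently, the loop over points is trivially parallel, and a running estimate is available after any number of samples. The exposed points themselves could also be kept for point-based rendering, as the abstract notes.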
Diffusion-Based Model for Synaptic Molecular Communication Channel.
Khan, Tooba; Bilgin, Bilgesu A; Akan, Ozgur B
2017-06-01
Computational methods have been extensively used to understand the underlying dynamics of molecular communication methods employed by nature. One very effective and popular approach is to utilize a Monte Carlo simulation. Although it is very reliable, this method can have a very high computational cost, which in some cases renders the simulation impractical. Therefore, in this paper, for the special case of an excitatory synaptic molecular communication channel, we present a novel mathematical model for the diffusion and binding of neurotransmitters that takes into account the effects of synaptic geometry in 3-D space and re-absorption of neurotransmitters by the transmitting neuron. Based on this model we develop a fast deterministic algorithm, which calculates expected value of the output of this channel, namely, the amplitude of excitatory postsynaptic potential (EPSP), for given synaptic parameters. We validate our algorithm by a Monte Carlo simulation, which shows total agreement between the results of the two methods. Finally, we utilize our model to quantify the effects of variation in synaptic parameters, such as position of release site, receptor density, size of postsynaptic density, diffusion coefficient, uptake probability, and number of neurotransmitters in a vesicle, on maximum number of bound receptors that directly affect the peak amplitude of EPSP.
Computationally Efficient Multiconfigurational Reactive Molecular Dynamics
Yamashita, Takefumi; Peng, Yuxing; Knight, Chris; Voth, Gregory A.
2012-01-01
It is a computationally demanding task to explicitly simulate the electronic degrees of freedom in a system to observe the chemical transformations of interest, while at the same time sampling the time and length scales required to converge statistical properties and thus reduce artifacts due to initial conditions, finite-size effects, and limited sampling. One solution that significantly reduces the computational expense consists of molecular models in which effective interactions between particles govern the dynamics of the system. If the interaction potentials in these models are developed to reproduce calculated properties from electronic structure calculations and/or ab initio molecular dynamics simulations, then one can calculate accurate properties at a fraction of the computational cost. Multiconfigurational algorithms model the system as a linear combination of several chemical bonding topologies to simulate chemical reactions, also sometimes referred to as “multistate”. These algorithms typically utilize energy and force calculations already found in popular molecular dynamics software packages, thus facilitating their implementation without significant changes to the structure of the code. However, the evaluation of energies and forces for several bonding topologies per simulation step can lead to poor computational efficiency if redundancy is not efficiently removed, particularly with respect to the calculation of long-ranged Coulombic interactions. This paper presents accurate approximations (effective long-range interaction and resulting hybrid methods) and multiple-program parallelization strategies for the efficient calculation of electrostatic interactions in reactive molecular simulations. PMID:25100924
Protein Structure Prediction with Evolutionary Algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hart, W.E.; Krasnogor, N.; Pelta, D.A.
1999-02-08
Evolutionary algorithms have been successfully applied to a variety of molecular structure prediction problems. In this paper we reconsider the design of genetic algorithms that have been applied to a simple protein structure prediction problem. Our analysis considers the impact of several algorithmic factors for this problem: the conformational representation, the energy formulation and the way in which infeasible conformations are penalized. Further, we empirically evaluated the impact of these factors on a small set of polymer sequences. Our analysis leads to specific recommendations for GAs as well as other heuristic methods for solving PSP on the HP model.
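To make the HP model mentioned above concrete, here is a minimal sketch of its standard energy function on a 2D square lattice: each pair of hydrophobic (H) residues that are lattice neighbours but not chain neighbours contributes -1. The overlap penalty is one simple way to score infeasible conformations; its form and default value, like the function name, are assumptions for this illustration and not the formulation studied in the paper.

```python
def hp_energy(sequence, coords, overlap_penalty=10.0):
    """Energy of an HP-model conformation on the 2D square lattice.

    sequence is a string of 'H' and 'P'; coords is a list of (x, y) integer
    tuples, one per residue. Sites occupied more than once make the
    conformation infeasible and are charged overlap_penalty each
    (a hypothetical penalty scheme).
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    # Chain connectivity: consecutive residues must be lattice neighbours.
    for a, b in zip(coords, coords[1:]):
        if abs(a[0] - b[0]) + abs(a[1] - b[1]) != 1:
            raise ValueError("consecutive residues must be adjacent")
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                dx = abs(coords[i][0] - coords[j][0])
                dy = abs(coords[i][1] - coords[j][1])
                if dx + dy == 1:
                    energy -= 1.0
    overlaps = len(coords) - len(set(coords))
    return energy + overlap_penalty * overlaps
```

A genetic algorithm or other heuristic would use a function like this as its fitness measure, with lower values indicating more compact hydrophobic cores.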
Multiscale equation-free algorithms for molecular dynamics
NASA Astrophysics Data System (ADS)
Abi Mansour, Andrew
Molecular dynamics is a physics-based computational tool that has been widely employed to study the dynamics and structure of macromolecules and their assemblies at the atomic scale. However, the efficiency of molecular dynamics simulation is limited because of the broad spectrum of timescales involved. To overcome this limitation, an equation-free algorithm is presented for simulating these systems using a multiscale model cast in terms of atomistic and coarse-grained variables. Both variables are evolved in time in such a way that the cross-talk between short and long scales is preserved. In this way, the coarse-grained variables guide the evolution of the atom-resolved states, while the latter provide the Newtonian physics for the former. While the atomistic variables are evolved using short molecular dynamics runs, time advancement at the coarse-grained level is achieved with a scheme that uses information from past and future states of the system while accounting for both the stochastic and deterministic features of the coarse-grained dynamics. To complete the multiscale cycle, an atom-resolved state consistent with the updated coarse-grained variables is recovered using algorithms from mathematical optimization. This multiscale paradigm is extended to nanofluidics using concepts from hydrodynamics, and it is demonstrated for macromolecular and nanofluidic systems. A toolkit is developed for prototyping these algorithms, which are then implemented within the GROMACS simulation package and released as an open source multiscale simulator.
NASA Astrophysics Data System (ADS)
Lilichenko, Mark; Kelley, Anne Myers
2001-04-01
A novel approach is presented for finding the vibrational frequencies, Franck-Condon factors, and vibronic linewidths that best reproduce typical, poorly resolved electronic absorption (or fluorescence) spectra of molecules in condensed phases. While calculation of the theoretical spectrum from the molecular parameters is straightforward within the harmonic oscillator approximation for the vibrations, "inversion" of an experimental spectrum to deduce these parameters is not. Standard nonlinear least-squares fitting methods such as Levenberg-Marquardt are highly susceptible to becoming trapped in local minima in the error function unless very good initial guesses for the molecular parameters are made. Here we employ a genetic algorithm to force a broad search through parameter space and couple it with the Levenberg-Marquardt method to speed convergence to each local minimum. In addition, a neural network trained on a large set of synthetic spectra is used to provide an initial guess for the fitting parameters and to narrow the range searched by the genetic algorithm. The combined algorithm provides excellent fits to a variety of single-mode absorption spectra with experimentally negligible errors in the parameters. It converges more rapidly than the genetic algorithm alone and more reliably than the Levenberg-Marquardt method alone, and is robust in the presence of spectral noise. Extensions to multimode systems, and/or to include other spectroscopic data such as resonance Raman intensities, are straightforward.
Wang, Zhaocai; Ji, Zuwen; Wang, Xiaoming; Wu, Tunhua; Huang, Wei
2017-12-01
As a promising approach to solve the computationally intractable problem, the method based on DNA computing is an emerging research area including mathematics, computer science and molecular biology. The task scheduling problem, as a well-known NP-complete problem, arranges n jobs to m individuals and finds the minimum execution time of last finished individual. In this paper, we use a biologically inspired computational model and describe a new parallel algorithm to solve the task scheduling problem by basic DNA molecular operations. In turn, we skillfully design flexible length DNA strands to represent elements of the allocation matrix, take appropriate biological experiment operations and get solutions of the task scheduling problem in proper length range with less than O(n^2) time complexity. Copyright © 2017. Published by Elsevier B.V.
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt
Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach and paper, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the capabilities of the new approach by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. Finally, these two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
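The idea of rearranging subdomain walls a priori according to resolution can be pictured with a toy one-dimensional example. The sketch below is not the authors' scheme. It assumes each particle carries a known cost weight (higher in high-resolution regions) and places walls so that each subdomain receives roughly the same total cost. All names are hypothetical.

```python
def place_walls(positions, costs, n_domains):
    """Place up to n_domains - 1 walls along x so cost per subdomain is balanced.

    positions[k] and costs[k] describe particle k; a wall is put midway
    between two neighbouring particles once the running cost reaches the
    next multiple of total / n_domains.
    """
    order = sorted(range(len(positions)), key=lambda k: positions[k])
    total = sum(costs)
    walls, running, next_cut = [], 0.0, 1
    for idx, k in enumerate(order[:-1]):
        running += costs[k]
        while next_cut < n_domains and running >= next_cut * total / n_domains:
            neighbour = order[idx + 1]
            walls.append(0.5 * (positions[k] + positions[neighbour]))
            next_cut += 1
    return walls
```

With uniform costs the single wall for two subdomains falls at the geometric midpoint. When the left half is three times as expensive per particle, the wall shifts left so that both processors do comparable work.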
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes
Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; ...
2017-11-27
Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach and paper, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the capabilities of the new approach by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. Finally, these two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
The contour-buildup algorithm to calculate the analytical molecular surface.
Totrov, M; Abagyan, R
1996-01-01
A new algorithm is presented to calculate the analytical molecular surface defined as a smooth envelope traced out by the surface of a probe sphere rolled over the molecule. The core of the algorithm is the sequential buildup of multi-arc contours on the van der Waals spheres. This algorithm yields a substantial reduction in both the memory and time requirements of surface calculations. Further, the contour-buildup principle is intrinsically "local", which makes calculations of partial molecular surfaces even more efficient. Additionally, the algorithm is applicable not only to convex patches, but also to concave triangular patches, which may have complex multiple intersections. The algorithm permits the rigorous calculation of the full analytical molecular surface for a 100-residue protein in about 2 seconds on an SGI Indigo with an R4400++ processor at 150 MHz, with the performance scaling almost linearly with protein size. The contour-buildup algorithm is faster than the original Connolly algorithm by an order of magnitude.
Graph-drawing algorithms geometries versus molecular mechanics in fullerenes
NASA Astrophysics Data System (ADS)
Kaufman, M.; Pisanski, T.; Lukman, D.; Borštnik, B.; Graovac, A.
1996-09-01
The algorithms of Kamada-Kawai (KK) and Fruchterman-Reingold (FR) have been recently generalized (Pisanski et al., Croat. Chem. Acta 68 (1995) 283) in order to draw molecular graphs in three-dimensional space. The quality of KK and FR geometries is studied here by comparing them with the molecular mechanics (MM) and the adjacency matrix eigenvectors (AME) algorithm geometries. In order to compare different layouts of the same molecule, an appropriate method has been developed. Its application to a series of experimentally detected fullerenes indicates that the KK, FR and AME algorithms are able to reproduce plausible molecular geometries.
Hou, Tingjun; Xu, Xiaojie
2002-12-01
In this study, the relationships between the brain-blood concentration ratio of 96 structurally diverse compounds with a large number of structurally derived descriptors were investigated. The linear models were based on molecular descriptors that can be calculated for any compound simply from a knowledge of its molecular structure. The linear correlation coefficients of the models were optimized by genetic algorithms (GAs), and the descriptors used in the linear models were automatically selected from 27 structurally derived descriptors. The GA optimizations resulted in a group of linear models with three or four molecular descriptors with good statistical significance. The change of descriptor use as the evolution proceeds demonstrates that the octane/water partition coefficient and the partial negative solvent-accessible surface area multiplied by the negative charge are crucial to brain-blood barrier permeability. Moreover, we found that the predictions using multiple QSPR models from GA optimization gave quite good results in spite of the diversity of structures, which was better than the predictions using the best single model. The predictions for the two external sets with 37 diverse compounds using multiple QSPR models indicate that the best linear models with four descriptors are sufficiently effective for predictive use. Considering the ease of computation of the descriptors, the linear models may be used as general utilities to screen the blood-brain barrier partitioning of drugs in a high-throughput fashion.
NASA Astrophysics Data System (ADS)
Barnes, Brian C.; Leiter, Kenneth W.; Becker, Richard; Knap, Jaroslaw; Brennan, John K.
2017-07-01
We describe the development, accuracy, and efficiency of an automation package for molecular simulation, the large-scale atomic/molecular massively parallel simulator (LAMMPS) integrated materials engine (LIME). Heuristics and algorithms employed for equation of state (EOS) calculation using a particle-based model of a molecular crystal, hexahydro-1,3,5-trinitro-s-triazine (RDX), are described in detail. The simulation method for the particle-based model is energy-conserving dissipative particle dynamics, but the techniques used in LIME are generally applicable to molecular dynamics simulations with a variety of particle-based models. The newly created tool set is tested through use of its EOS data in plate impact and Taylor anvil impact continuum simulations of solid RDX. The coarse-grain model results from LIME provide an approach to bridge the scales from atomistic simulations to continuum simulations.
ChemTS: an efficient python library for de novo molecular generation.
Yang, Xiufeng; Zhang, Jinzhe; Yoshizoe, Kazuki; Terayama, Kei; Tsuda, Koji
2017-01-01
Automatic design of organic materials requires black-box optimization in a vast chemical space. In conventional molecular design algorithms, a molecule is built as a combination of predetermined fragments. Recently, deep neural network models such as variational autoencoders and recurrent neural networks (RNNs) are shown to be effective in de novo design of molecules without any predetermined fragments. This paper presents a novel Python library ChemTS that explores the chemical space by combining Monte Carlo tree search and an RNN. In a benchmarking problem of optimizing the octanol-water partition coefficient and synthesizability, our algorithm showed superior efficiency in finding high-scoring molecules. ChemTS is available at https://github.com/tsudalab/ChemTS.
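The abstract does not say how the tree search chooses which branch to extend. A selection rule commonly used in Monte Carlo tree search is UCB1, sketched below as a generic illustration. Whether ChemTS uses exactly this form is an assumption. The `children` mapping from a token (for example, a SMILES symbol) to its accumulated reward and visit count is hypothetical.

```python
import math


def ucb1(total_reward, visits, parent_visits, c=1.0):
    """UCB1 score: mean reward plus an exploration bonus.

    Unvisited children get an infinite score so they are tried first.
    """
    if visits == 0:
        return float("inf")
    return total_reward / visits + c * math.sqrt(2.0 * math.log(parent_visits) / visits)


def select_child(children, parent_visits, c=1.0):
    """Pick the child token with the highest UCB1 score.

    children maps token -> (total_reward, visits).
    """
    return max(children, key=lambda tok: ucb1(*children[tok], parent_visits, c))
```

In a search of this kind, a learned sequence model such as an RNN would typically supply the rollout that completes a partial molecule before it is scored.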
ChemTS: an efficient python library for de novo molecular generation
NASA Astrophysics Data System (ADS)
Yang, Xiufeng; Zhang, Jinzhe; Yoshizoe, Kazuki; Terayama, Kei; Tsuda, Koji
2017-12-01
Automatic design of organic materials requires black-box optimization in a vast chemical space. In conventional molecular design algorithms, a molecule is built as a combination of predetermined fragments. Recently, deep neural network models such as variational autoencoders and recurrent neural networks (RNNs) are shown to be effective in de novo design of molecules without any predetermined fragments. This paper presents a novel Python library ChemTS that explores the chemical space by combining Monte Carlo tree search and an RNN. In a benchmarking problem of optimizing the octanol-water partition coefficient and synthesizability, our algorithm showed superior efficiency in finding high-scoring molecules. ChemTS is available at https://github.com/tsudalab/ChemTS.
Mori, Takaharu; Miyashita, Naoyuki; Im, Wonpil; Feig, Michael; Sugita, Yuji
2016-07-01
This paper reviews various enhanced conformational sampling methods and explicit/implicit solvent/membrane models, as well as their recent applications to the exploration of the structure and dynamics of membranes and membrane proteins. Molecular dynamics simulations have become an essential tool to investigate biological problems, and their success relies on proper molecular models together with efficient conformational sampling methods. The implicit representation of solvent/membrane environments is a reasonable approximation to the explicit all-atom models, considering the balance between computational cost and simulation accuracy. Implicit models can be easily combined with replica-exchange molecular dynamics methods to explore a wider conformational space of a protein. Other molecular models and enhanced conformational sampling methods are also briefly discussed. As application examples, we introduce recent simulation studies of glycophorin A, phospholamban, amyloid precursor protein, and mixed lipid bilayers and discuss the accuracy and efficiency of each simulation model and method. This article is part of a Special Issue entitled: Membrane Proteins edited by J.C. Gumbart and Sergei Noskov. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
In vitro molecular machine learning algorithm via symmetric internal loops of DNA.
Lee, Ji-Hoon; Lee, Seung Hwan; Baek, Christina; Chun, Hyosun; Ryu, Je-Hwan; Kim, Jin-Woo; Deaton, Russell; Zhang, Byoung-Tak
2017-08-01
Programmable biomolecules, such as DNA strands, deoxyribozymes, and restriction enzymes, have been used to solve computational problems, construct large-scale logic circuits, and program simple molecular games. Although studies have shown the potential of molecular computing, the capability of computational learning with DNA molecules, i.e., molecular machine learning, has yet to be experimentally verified. Here, we present a novel molecular learning in vitro model in which symmetric internal loops of double-stranded DNA are exploited to measure the differences between training instances, thus enabling the molecules to learn from small errors. The model was evaluated on a data set of twenty dialogue sentences obtained from the television shows Friends and Prison Break. The wet DNA-computing experiments confirmed that the molecular learning machine was able to generalize the dialogue patterns of each show and successfully identify the show from which the sentences originated. The molecular machine learning model described here opens the way for solving machine learning problems in computer science and biology using in vitro molecular computing with the data encoded in DNA molecules. Copyright © 2017. Published by Elsevier B.V.
Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data.
Barros, Rodrigo C; Winck, Ana T; Machado, Karina S; Basgalupp, Márcio P; de Carvalho, André C P L F; Ruiz, Duncan D; de Souza, Osmar Norberto
2012-11-21
This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, especially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible receptor.
Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data
2012-01-01
Background: This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, especially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible receptor. PMID:23171000
Physics Computing '92: Proceedings of the 4th International Conference
NASA Astrophysics Data System (ADS)
de Groot, Robert A.; Nadrchal, Jaroslav
1993-04-01
The Table of Contents for the book is as follows: * Preface * INVITED PAPERS * Ab Initio Theoretical Approaches to the Structural, Electronic and Vibrational Properties of Small Clusters and Fullerenes: The State of the Art * Neural Multigrid Methods for Gauge Theories and Other Disordered Systems * Multicanonical Monte Carlo Simulations * On the Use of the Symbolic Language Maple in Physics and Chemistry: Several Examples * Nonequilibrium Phase Transitions in Catalysis and Population Models * Computer Algebra, Symmetry Analysis and Integrability of Nonlinear Evolution Equations * The Path-Integral Quantum Simulation of Hydrogen in Metals * Digital Optical Computing: A New Approach of Systolic Arrays Based on Coherence Modulation of Light and Integrated Optics Technology * Molecular Dynamics Simulations of Granular Materials * Numerical Implementation of a K.A.M. Algorithm * Quasi-Monte Carlo, Quasi-Random Numbers and Quasi-Error Estimates * What Can We Learn from QMC Simulations * Physics of Fluctuating Membranes * Plato, Apollonius, and Klein: Playing with Spheres * Steady States in Nonequilibrium Lattice Systems * CONVODE: A REDUCE Package for Differential Equations * Chaos in Coupled Rotators * Symplectic Numerical Methods for Hamiltonian Problems * Computer Simulations of Surfactant Self Assembly * High-dimensional and Very Large Cellular Automata for Immunological Shape Space * A Review of the Lattice Boltzmann Method * Electronic Structure of Solids in the Self-interaction Corrected Local-spin-density Approximation * Dedicated Computers for Lattice Gauge Theory Simulations * Physics Education: A Survey of Problems and Possible Solutions * Parallel Computing and Electronic-Structure Theory * High Precision Simulation Techniques for Lattice Field Theory * CONTRIBUTED PAPERS * Case Study of Microscale Hydrodynamics Using Molecular Dynamics and Lattice Gas Methods * Computer Modelling of the Structural and Electronic Properties of the Supported Metal Catalysis * 
Ordered Particle Simulations for Serial and MIMD Parallel Computers * "NOLP" -- Program Package for Laser Plasma Nonlinear Optics * Algorithms to Solve Nonlinear Least Square Problems * Distribution of Hydrogen Atoms in Pd-H Computed by Molecular Dynamics * A Ray Tracing of Optical System for Protein Crystallography Beamline at Storage Ring-SIBERIA-2 * Vibrational Properties of a Pseudobinary Linear Chain with Correlated Substitutional Disorder * Application of the Software Package Mathematica in Generalized Master Equation Method * Linelist: An Interactive Program for Analysing Beam-foil Spectra * GROMACS: A Parallel Computer for Molecular Dynamics Simulations * GROMACS Method of Virial Calculation Using a Single Sum * The Interactive Program for the Solution of the Laplace Equation with the Elimination of Singularities for Boundary Functions * Random-Number Generators: Testing Procedures and Comparison of RNG Algorithms * Micro-TOPIC: A Tokamak Plasma Impurities Code * Rotational Molecular Scattering Calculations * Orthonormal Polynomial Method for Calibrating of Cryogenic Temperature Sensors * Frame-based System Representing Basis of Physics * The Role of Massively Data-parallel Computers in Large Scale Molecular Dynamics Simulations * Short-range Molecular Dynamics on a Network of Processors and Workstations * An Algorithm for Higher-order Perturbation Theory in Radiative Transfer Computations * Hydrostochastics: The Master Equation Formulation of Fluid Dynamics * HPP Lattice Gas on Transputers and Networked Workstations * Study on the Hysteresis Cycle Simulation Using Modeling with Different Functions on Intervals * Refined Pruning Techniques for Feed-forward Neural Networks * Random Walk Simulation of the Motion of Transient Charges in Photoconductors * The Optical Hysteresis in Hydrogenated Amorphous Silicon * Diffusion Monte Carlo Analysis of Modern Interatomic Potentials for He * A Parallel Strategy for Molecular Dynamics Simulations of Polar Liquids on 
Transputer Arrays * Distribution of Ions Reflected on Rough Surfaces * The Study of Step Density Distribution During Molecular Beam Epitaxy Growth: Monte Carlo Computer Simulation * Towards a Formal Approach to the Construction of Large-scale Scientific Applications Software * Correlated Random Walk and Discrete Modelling of Propagation through Inhomogeneous Media * Teaching Plasma Physics Simulation * A Theoretical Determination of the Au-Ni Phase Diagram * Boson and Fermion Kinetics in One-dimensional Lattices * Computational Physics Course on the Technical University * Symbolic Computations in Simulation Code Development and Femtosecond-pulse Laser-plasma Interaction Studies * Computer Algebra and Integrated Computing Systems in Education of Physical Sciences * Coordinated System of Programs for Undergraduate Physics Instruction * Program Package MIRIAM and Atomic Physics of Extreme Systems * High Energy Physics Simulation on the T_Node * The Chapman-Kolmogorov Equation as Representation of Huygens' Principle and the Monolithic Self-consistent Numerical Modelling of Lasers * Authoring System for Simulation Developments * Molecular Dynamics Study of Ion Charge Effects in the Structure of Ionic Crystals * A Computational Physics Introductory Course * Computer Calculation of Substrate Temperature Field in MBE System * Multimagnetical Simulation of the Ising Model in Two and Three Dimensions * Failure of the CTRW Treatment of the Quasicoherent Excitation Transfer * Implementation of a Parallel Conjugate Gradient Method for Simulation of Elastic Light Scattering * Algorithms for Study of Thin Film Growth * Algorithms and Programs for Physics Teaching in Romanian Technical Universities * Multicanonical Simulation of 1st order Transitions: Interface Tension of the 2D 7-State Potts Model * Two Numerical Methods for the Calculation of Periodic Orbits in Hamiltonian Systems * Chaotic Behavior in a Probabilistic Cellular Automata? 
* Wave Optics Computing by a Networked-based Vector Wave Automaton * Tensor Manipulation Package in REDUCE * Propagation of Electromagnetic Pulses in Stratified Media * The Simple Molecular Dynamics Model for the Study of Thermalization of the Hot Nucleon Gas * Electron Spin Polarization in PdCo Alloys Calculated by KKR-CPA-LSD Method * Simulation Studies of Microscopic Droplet Spreading * A Vectorizable Algorithm for the Multicolor Successive Overrelaxation Method * Tetragonality of the CuAu I Lattice and Its Relation to Electronic Specific Heat and Spin Susceptibility * Computer Simulation of the Formation of Metallic Aggregates Produced by Chemical Reactions in Aqueous Solution * Scaling in Growth Models with Diffusion: A Monte Carlo Study * The Nucleus as the Mesoscopic System * Neural Network Computation as Dynamic System Simulation * First-principles Theory of Surface Segregation in Binary Alloys * Data Smooth Approximation Algorithm for Estimating the Temperature Dependence of the Ice Nucleation Rate * Genetic Algorithms in Optical Design * Application of 2D-FFT in the Study of Molecular Exchange Processes by NMR * Advanced Mobility Model for Electron Transport in P-Si Inversion Layers * Computer Simulation for Film Surfaces and its Fractal Dimension * Parallel Computation Techniques and the Structure of Catalyst Surfaces * Educational SW to Teach Digital Electronics and the Corresponding Text Book * Primitive Trinomials (Mod 2) Whose Degree is a Mersenne Exponent * Stochastic Modelisation and Parallel Computing * Remarks on the Hybrid Monte Carlo Algorithm for the φ⁴ Model * An Experimental Computer Assisted Workbench for Physics Teaching * A Fully Implicit Code to Model Tokamak Plasma Edge Transport * EXPFIT: An Interactive Program for Automatic Beam-foil Decay Curve Analysis * Mapping Technique for Solving General, 1-D Hamiltonian Systems * Freeway Traffic, Cellular Automata, and Some (Self-Organizing) Criticality * Photonuclear Yield Analysis by Dynamic 
Programming * Incremental Representation of the Simply Connected Planar Curves * Self-convergence in Monte Carlo Methods * Adaptive Mesh Technique for Shock Wave Propagation * Simulation of Supersonic Coronal Streams and Their Interaction with the Solar Wind * The Nature of Chaos in Two Systems of Ordinary Nonlinear Differential Equations * Considerations of a Window-shopper * Interpretation of Data Obtained by RTP 4-Channel Pulsed Radar Reflectometer Using a Multi Layer Perceptron * Statistics of Lattice Bosons for Finite Systems * Fractal Based Image Compression with Affine Transformations * Algorithmic Studies on Simulation Codes for Heavy-ion Reactions * An Energy-Wise Computer Simulation of DNA-Ion-Water Interactions Explains the Abnormal Structure of Poly[d(A)]:Poly[d(T)] * Computer Simulation Study of Kosterlitz-Thouless-Like Transitions * Problem-oriented Software Package GUN-EBT for Computer Simulation of Beam Formation and Transport in Technological Electron-Optical Systems * Parallelization of a Boundary Value Solver and its Application in Nonlinear Dynamics * The Symbolic Classification of Real Four-dimensional Lie Algebras * Short, Singular Pulses Generation by a Dye Laser at Two Wavelengths Simultaneously * Quantum Monte Carlo Simulations of the Apex-Oxygen-Model * Approximation Procedures for the Axial Symmetric Static Einstein-Maxwell-Higgs Theory * Crystallization on a Sphere: Parallel Simulation on a Transputer Network * FAMULUS: A Software Product (also) for Physics Education * MathCAD vs. 
FAMULUS -- A Brief Comparison * First-principles Dynamics Used to Study Dissociative Chemisorption * A Computer Controlled System for Crystal Growth from Melt * A Time Resolved Spectroscopic Method for Short Pulsed Particle Emission * Green's Function Computation in Radiative Transfer Theory * Random Search Optimization Technique for One-criteria and Multi-criteria Problems * Hartley Transform Applications to Thermal Drift Elimination in Scanning Tunneling Microscopy * Algorithms of Measuring, Processing and Interpretation of Experimental Data Obtained with Scanning Tunneling Microscope * Time-dependent Atom-surface Interactions * Local and Global Minima on Molecular Potential Energy Surfaces: An Example of N3 Radical * Computation of Bifurcation Surfaces * Symbolic Computations in Quantum Mechanics: Energies in Next-to-solvable Systems * A Tool for RTP Reactor and Lamp Field Design * Modelling of Particle Spectra for the Analysis of Solid State Surface * List of Participants
Layers: A molecular surface peeling algorithm and its applications to analyze protein structures
Karampudi, Naga Bhushana Rao; Bahadur, Ranjit Prasad
2015-01-01
We present an algorithm ‘Layers’ to peel the atoms of proteins as layers. Using Layers we show an efficient way to transform protein structures into a 2D pattern, named residue transition pattern (RTP), which is independent of molecular orientations. RTP explains the folding patterns of proteins, and hence identification of similarity between proteins is simpler and more reliable using RTP than with the standard sequence or structure based methods. Moreover, Layers generates a fine-tunable coarse model for the molecular surface by using non-random sampling. The coarse model can be used for shape comparison, protein recognition and ligand design. Additionally, Layers can be used to develop biased initial configurations of molecules for protein folding simulations. We have developed a random forest classifier to predict the RTP of a given polypeptide sequence. Layers is a standalone application; however, it can be merged with other applications to reduce the computational load when working with large datasets of protein structures. Layers is available freely at http://www.csb.iitkgp.ernet.in/applications/mol_layers/main. PMID:26553411
TADSim: Discrete Event-based Performance Prediction for Temperature Accelerated Dynamics
Mniszewski, Susan M.; Junghans, Christoph; Voter, Arthur F.; ...
2015-04-16
Next-generation high-performance computing will require more scalable and flexible performance prediction tools to evaluate software-hardware co-design choices relevant to scientific applications and hardware architectures. Here, we present a new class of tools called application simulators: parameterized fast-running proxies of large-scale scientific applications using parallel discrete event simulation. Parameterized choices for the algorithmic method and hardware options provide a rich space for design exploration and allow us to quickly find well-performing software-hardware combinations. We demonstrate our approach with a TADSim simulator that models the temperature-accelerated dynamics (TAD) method, an algorithmically complex and parameter-rich member of the accelerated molecular dynamics (AMD) family of molecular dynamics methods. The essence of the TAD application is captured without the computational expense and resource usage of the full code. We accomplish this by identifying the time-intensive elements, quantifying algorithm steps in terms of those elements, abstracting them out, and replacing them by the passage of time. We use TADSim to quickly characterize the runtime performance and algorithmic behavior for the otherwise long-running simulation code. We extend TADSim to model algorithm extensions, such as speculative spawning of the compute-bound stages, and predict performance improvements without having to implement such a method. Validation against the actual TAD code shows close agreement for the evolution of an example physical system, a silver surface. Finally, focused parameter scans have allowed us to study algorithm parameter choices over far more scenarios than would be possible with the actual simulation. This has led to interesting performance-related insights and suggested extensions.
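The principle of replacing time-intensive elements with the passage of time can be shown with a minimal discrete event loop. This is a generic sketch, not TADSim itself. The stage names and modelled durations are hypothetical placeholders, and a real application simulator would draw them from measured or parameterized cost models.

```python
import heapq


def simulate(stages, first_stage, end_time):
    """Advance a simulated clock through a chain of stages without doing the work.

    stages maps a stage name to (modelled_duration, next_stage).
    Returns (final_clock, counts), where counts records how often each
    stage completed before end_time.
    """
    queue = [(0.0, 0, first_stage)]  # (start time, sequence number, stage)
    seq = 0
    counts = {}
    clock = 0.0
    while queue:
        start, _, stage = heapq.heappop(queue)
        duration, next_stage = stages[stage]
        if start + duration > end_time:
            break  # this stage would not finish within the simulated budget
        clock = start + duration  # the "work" is only a jump in time
        counts[stage] = counts.get(stage, 0) + 1
        seq += 1
        heapq.heappush(queue, (clock, seq, next_stage))
    return clock, counts
```

Because each stage costs only a queue operation, sweeping over many duration parameters is cheap, which is the kind of parameter scan the abstract describes.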
Clustering molecular dynamics trajectories for optimizing docking experiments.
De Paris, Renata; Quevedo, Christian V; Ruiz, Duncan D; Norberto de Souza, Osmar; Barros, Rodrigo C
2015-01-01
Molecular dynamics simulations of protein receptors have become an attractive tool for rational drug discovery. However, the high computational cost of employing molecular dynamics trajectories in virtual screening of large repositories threatens the feasibility of this task. Computational intelligence techniques have been applied in this context, with the ultimate goal of reducing the overall computational cost so the task can become feasible. In particular, clustering algorithms have been widely used as a means to reduce the dimensionality of molecular dynamics trajectories. In this paper, we develop a novel methodology for clustering entire trajectories using structural features from the substrate-binding cavity of the receptor in order to optimize docking experiments in a cloud-based environment. The resulting partition was selected based on three clustering validity criteria, and it was further validated by analyzing the interactions between 20 ligands and a fully flexible receptor (FFR) model containing a 20 ns molecular dynamics simulation trajectory. Our proposed methodology shows that taking into account features of the substrate-binding cavity as input for the k-means algorithm is a promising technique for accurately selecting ensembles of representative structures tailored to a specific ligand.
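The k-means step named in this abstract can be sketched in a few lines. The version below is plain Lloyd iteration on generic feature vectors. Treating each trajectory snapshot as a short tuple of cavity features, and seeding the centroids with the first k points, are assumptions made for illustration and not the paper's procedure.

```python
def kmeans(points, k, n_iter=50):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Centroids are seeded with the first k points (an assumed, deterministic
    choice). Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids
```

One representative snapshot per cluster, for instance the member closest to each centroid, could then be docked instead of every frame of the trajectory.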
Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik
2015-06-09
Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers, particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state the dependencies of each constituent part, algorithms only need to be described at a conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus.
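The dataflow idea in this abstract (state the dependencies, let the scheduler find the parallelism) can be sketched with Python's standard `graphlib` module. This is not the Copernicus API; the task names are hypothetical and the "execution" is only a grouping of tasks into batches that could run concurrently.

```python
from graphlib import TopologicalSorter

def parallel_batches(dependencies):
    """Group tasks into batches whose members have no unmet dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        batches.append(ready)
        sorter.done(*ready)                 # mark the whole batch finished
    return batches

# Hypothetical workflow: one setup step, three independent simulations,
# and a free-energy estimate that combines their results.
workflow = {
    "sim_1": {"setup"},
    "sim_2": {"setup"},
    "sim_3": {"setup"},
    "free_energy_estimate": {"sim_1", "sim_2", "sim_3"},
}
batches = parallel_batches(workflow)
```

Here the three simulations land in the same batch because nothing in the graph orders them relative to one another.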
Molecular Isotopic Distribution Analysis (MIDAs) with Adjustable Mass Accuracy
NASA Astrophysics Data System (ADS)
Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo
2014-01-01
In this paper, we present Molecular Isotopic Distribution Analysis (MIDAs), a new software tool designed to compute molecular isotopic distributions with adjustable accuracies. MIDAs offers two algorithms, one polynomial-based and one Fourier-transform-based, both of which compute molecular isotopic distributions accurately and efficiently. The polynomial-based algorithm contains few novel aspects, whereas the Fourier-transform-based algorithm consists mainly of improvements to other existing Fourier-transform-based algorithms. We have benchmarked the performance of the two algorithms implemented in MIDAs with that of eight software packages (BRAIN, Emass, Mercury, Mercury5, NeutronCluster, Qmass, JFC, IC) using a consensus set of benchmark molecules. Under the proposed evaluation criteria, MIDAs's algorithms, JFC, and Emass compute with comparable accuracy the coarse-grained (low-resolution) isotopic distributions and are more accurate than the other software packages. For fine-grained isotopic distributions, we compared IC, MIDAs's polynomial algorithm, and MIDAs's Fourier transform algorithm. Among the three, IC and MIDAs's polynomial algorithm compute isotopic distributions that better resemble their corresponding exact fine-grained (high-resolution) isotopic distributions. MIDAs can be accessed freely through a user-friendly web-interface at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/midas/index.html.
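For readers unfamiliar with the polynomial view of isotopic distributions, the sketch below shows the textbook version: each element contributes a polynomial whose coefficients are isotope abundances, and the coarse-grained distribution of a molecule is the product of those polynomials. It is not the MIDAs implementation, and the abundance values are approximate natural abundances supplied here as assumptions.

```python
def convolve(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_power(base, n):
    """Raise a polynomial to the n-th power by repeated squaring."""
    result = [1.0]
    while n:
        if n & 1:
            result = convolve(result, base)
        base = convolve(base, base)
        n >>= 1
    return result

# Approximate natural abundances; list index = number of extra neutrons.
ISOTOPES = {"C": [0.9893, 0.0107], "H": [0.999885, 0.000115]}

def coarse_distribution(formula):
    """Probability of each nominal mass shift (M+0, M+1, ...) for a formula."""
    dist = [1.0]
    for element, count in formula.items():
        dist = convolve(dist, poly_power(ISOTOPES[element], count))
    return dist

methane = coarse_distribution({"C": 1, "H": 4})
```

The entry `methane[0]` is the probability that every atom is its lightest isotope, and the entries sum to one.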
Molecular Isotopic Distribution Analysis (MIDAs) with adjustable mass accuracy.
Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo
2014-01-01
In this paper, we present Molecular Isotopic Distribution Analysis (MIDAs), a new software tool designed to compute molecular isotopic distributions with adjustable accuracies. MIDAs offers two algorithms, one polynomial-based and one Fourier-transform-based, both of which compute molecular isotopic distributions accurately and efficiently. The polynomial-based algorithm contains few novel aspects, whereas the Fourier-transform-based algorithm consists mainly of improvements to other existing Fourier-transform-based algorithms. We have benchmarked the performance of the two algorithms implemented in MIDAs with that of eight software packages (BRAIN, Emass, Mercury, Mercury5, NeutronCluster, Qmass, JFC, IC) using a consensus set of benchmark molecules. Under the proposed evaluation criteria, MIDAs's algorithms, JFC, and Emass compute with comparable accuracy the coarse-grained (low-resolution) isotopic distributions and are more accurate than the other software packages. For fine-grained isotopic distributions, we compared IC, MIDAs's polynomial algorithm, and MIDAs's Fourier transform algorithm. Among the three, IC and MIDAs's polynomial algorithm compute isotopic distributions that better resemble their corresponding exact fine-grained (high-resolution) isotopic distributions. MIDAs can be accessed freely through a user-friendly web-interface at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/midas/index.html.
Binomial tau-leap spatial stochastic simulation algorithm for applications in chemical kinetics.
Marquez-Lago, Tatiana T; Burrage, Kevin
2007-09-14
In cell biology, cell signaling pathway problems are often tackled with deterministic temporal models, well-mixed stochastic simulators, and/or hybrid methods. But, in fact, three-dimensional stochastic spatial modeling of reactions happening inside the cell is needed in order to fully understand these cell signaling pathways. This is because noise effects, low molecular concentrations, and spatial heterogeneity can all affect the cellular dynamics. However, there are ways in which important effects can be accounted for without going to the extent of using highly resolved spatial simulators (such as single-particle software), hence reducing the overall computation time significantly. We present a new coarse-grained modified version of the next subvolume method that allows the user to consider both diffusion and reaction events in relatively long simulation time spans as compared with the original method and other commonly used fully stochastic computational methods. Benchmarking of the simulation algorithm was performed through comparison with the next subvolume method and well-mixed models (MATLAB), as well as stochastic particle reaction and transport simulations (CHEMCELL, Sandia National Laboratories). Additionally, we construct a model based on a set of chemical reactions in the epidermal growth factor receptor pathway. For this particular application and a bistable chemical system example, we analyze and outline the advantages of our presented binomial tau-leap spatial stochastic simulation algorithm, in terms of efficiency and accuracy, in scenarios of both molecular homogeneity and heterogeneity.
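A minimal sketch of the binomial leaping idea may help: within a time step tau, the number of reaction (or diffusion) events is drawn from a binomial distribution bounded by the molecules actually present, so counts can never go negative. The two-subvolume layout, the single reaction A -> B, the rate constants, and the function names are all assumptions chosen for illustration, not the authors' algorithm.

```python
import random

def binomial(rng, n, p):
    """Draw from Binomial(n, p) by summing n Bernoulli trials (fine for small n)."""
    return sum(rng.random() < p for _ in range(n))

def tau_leap(a, b, k_react, k_diff, tau, steps, seed=0):
    """Leap A -> B reactions and A diffusion over a ring of subvolumes."""
    rng = random.Random(seed)
    a, b = list(a), list(b)
    p_react = min(1.0, k_react * tau)  # per-molecule reaction probability in tau
    p_diff = min(1.0, k_diff * tau)    # per-molecule jump probability in tau
    history = [(tuple(a), tuple(b))]
    for _ in range(steps):
        leaving = []
        for i in range(len(a)):
            reacted = binomial(rng, a[i], p_react)
            a[i] -= reacted
            b[i] += reacted
            leaving.append(binomial(rng, a[i], p_diff))
        for i, n in enumerate(leaving):
            a[i] -= n
            a[(i + 1) % len(a)] += n   # jump to the neighbouring subvolume
        history.append((tuple(a), tuple(b)))
    return history

# All 100 molecules of A start in subvolume 0.
history = tau_leap([100, 0], [0, 0], k_react=0.5, k_diff=1.0, tau=0.1, steps=50)
```

Because each draw is capped by the current count, total molecule number is conserved and no subvolume is ever overdrawn.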
Yi, Huangjian; Chen, Duofang; Li, Wei; Zhu, Shouping; Wang, Xiaorui; Liang, Jimin; Tian, Jie
2013-05-01
Fluorescence molecular tomography (FMT) is an important imaging technique of optical imaging. The major challenge of the reconstruction method for FMT is the ill-posed and underdetermined nature of the inverse problem. In past years, various regularization methods have been employed for fluorescence target reconstruction. A comparative study between the reconstruction algorithms based on l1-norm and l2-norm for two imaging models of FMT is presented. The first imaging model is adopted by most researchers, where the fluorescent target is of small size to mimic small tissue with fluorescent substance, as demonstrated by the early detection of a tumor. The second model is the reconstruction of distribution of the fluorescent substance in organs, which is essential to drug pharmacokinetics. Apart from numerical experiments, in vivo experiments were conducted on a dual-modality FMT/micro-computed tomography imaging system. The experimental results indicated that l1-norm regularization is more suitable for reconstructing the small fluorescent target, while l2-norm regularization performs better for the reconstruction of the distribution of fluorescent substance.
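The qualitative difference between the two penalties can be seen in a deliberately simplified case. Assuming an identity forward operator (which real FMT does not have; the true problem is underdetermined), both regularized problems have closed-form per-entry solutions: the l1 penalty gives soft thresholding and zeroes small entries, while the l2 penalty only scales everything down. The data values and function names below are illustrative.

```python
def soft_threshold(y, lam):
    """Minimizer of 0.5*(x - y)**2 + lam*|x| for each entry (l1 penalty)."""
    return [(1 if v > 0 else -1) * max(abs(v) - lam, 0.0) for v in y]

def ridge_shrink(y, lam):
    """Minimizer of 0.5*(x - y)**2 + 0.5*lam*x**2 for each entry (l2 penalty)."""
    return [v / (1.0 + lam) for v in y]

# Hypothetical signal: two strong sources plus small noise-like entries.
measurements = [0.05, -0.02, 1.0, 0.03, -0.8]
l1_solution = soft_threshold(measurements, lam=0.1)
l2_solution = ridge_shrink(measurements, lam=0.1)
```

The l1 result keeps only the two strong entries, which is consistent with the abstract's finding that l1 suits small, localized targets, whereas the l2 result stays dense and smooth.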
Molecular ping-pong Game of Life on a two-dimensional DNA origami array.
Jonoska, N; Seeman, N C
2015-07-28
We propose a design for programmed molecular interactions that continuously change molecular arrangements in a predesigned manner. We introduce a model where environmental control through laser illumination allows platform attachment/detachment oscillations between two floating molecular species. The platform is a two-dimensional DNA origami array of tiles decorated with strands that both let the floating molecular tiles attach and pass communicating signals to neighbouring array tiles. In particular, we show how algorithmic molecular interactions can control cyclic molecular arrangements by exhibiting a system that can simulate dynamics similar to two-dimensional cellular automata on a DNA origami array platform.
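For orientation, one synchronous update of a two-dimensional cellular automaton can be written in a few lines. The sketch assumes the standard Conway Game of Life rules on an unbounded grid; the rules realized on the DNA origami array in the paper may differ.

```python
from collections import Counter

def life_step(alive):
    """One Game of Life update; `alive` is a set of (x, y) live cells."""
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for x, y in alive
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {cell for cell, n in neighbour_counts.items()
            if n == 3 or (n == 2 and cell in alive)}

blinker = {(0, 1), (1, 1), (2, 1)}
after_one = life_step(blinker)
after_two = life_step(after_one)
```

The blinker flips between a horizontal and a vertical bar, a simple example of the kind of cyclic arrangement the abstract refers to.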
Empirical simulations of materials
NASA Astrophysics Data System (ADS)
Jogireddy, Vasantha
2011-12-01
Molecular dynamics is a specialized discipline of molecular modelling and computer techniques. In this work, first we presented simulation results from a study carried out on silicon nanowires. In the second part of the work, we presented an electrostatic screened coulomb potential developed for studying metal alloys and metal oxides. In particular, we have studied aluminum-copper alloys, aluminum oxides and copper oxides. Parameter optimization for the potential is done using multiobjective optimization algorithms.
Simulation of Devices with Molecular Potentials
2013-12-22
[10] W. R. Frensley, Wigner-function model of a resonant-tunneling semiconductor device, Phys. Rev. B, 36 (1987), pp. 1570–1580. [11] M. J...develop the principal investigator’s Wigner-Poisson code and extend that code to deal with longer devices and more complex barrier profiles. Over...Research Triangle Park, NC 27709-2211 Molecular Confirmation, Sparse Interpolation, Wigner-Poisson Equation, Parallel Algorithms
Discrete Event-based Performance Prediction for Temperature Accelerated Dynamics
NASA Astrophysics Data System (ADS)
Junghans, Christoph; Mniszewski, Susan; Voter, Arthur; Perez, Danny; Eidenbenz, Stephan
2014-03-01
We present an example of a new class of tools that we call application simulators, parameterized fast-running proxies of large-scale scientific applications using parallel discrete event simulation (PDES). We demonstrate our approach with a TADSim application simulator that models the Temperature Accelerated Dynamics (TAD) method, which is an algorithmically complex member of the Accelerated Molecular Dynamics (AMD) family. The essence of the TAD application is captured without the computational expense and resource usage of the full code. We use TADSim to quickly characterize the runtime performance and algorithmic behavior for the otherwise long-running simulation code. We further extend TADSim to model algorithm extensions to standard TAD, such as speculative spawning of the compute-bound stages of the algorithm, and predict performance improvements without having to implement such a method. Focused parameter scans have allowed us to study algorithm parameter choices over far more scenarios than would be possible with the actual simulation. This has led to interesting performance-related insights into the TAD algorithm behavior and suggested extensions to the TAD method.
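The core trick, replacing expensive computation by the passage of simulated time, can be sketched with a small discrete event loop. The stage names, the per-stage costs, and the rule that a saddle search follows every few MD blocks are hypothetical stand-ins, not TADSim's actual model.

```python
import heapq

# Hypothetical modelled cost (in seconds) of each stage of the real code.
COST = {"md_block": 5.0, "saddle_search": 20.0}

def simulate(n_blocks, transition_every):
    """Return the predicted wall time for n_blocks MD blocks."""
    queue = [(0.0, 0, "md_block")]  # (start time, sequence number, stage)
    seq = 0
    blocks_done = 0
    end_time = 0.0
    while queue:
        start, _, stage = heapq.heappop(queue)
        end_time = start + COST[stage]  # the stage "runs" by advancing time only
        if stage == "md_block":
            blocks_done += 1
            if blocks_done % transition_every == 0:
                nxt = "saddle_search"
            elif blocks_done < n_blocks:
                nxt = "md_block"
            else:
                nxt = None
        else:
            nxt = "md_block" if blocks_done < n_blocks else None
        if nxt is not None:
            seq += 1
            heapq.heappush(queue, (end_time, seq, nxt))
    return end_time

predicted = simulate(n_blocks=6, transition_every=3)
```

Because no physics is computed, scanning `transition_every` or the cost table over many values takes negligible time, which is the point of an application simulator.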
Molecular beacon sequence design algorithm.
Monroe, W Todd; Haselton, Frederick R
2003-01-01
A method based on Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Since molecular beacon performance is based on its sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.
Fouad, Marwa A; Tolba, Enas H; El-Shal, Manal A; El Kerdawy, Ahmed M
2018-05-11
The justified continuous emergence of new β-lactam antibiotics creates a need for suitable analytical methods that accelerate and facilitate their analysis. A face central composite experimental design was adopted using different levels of phosphate buffer pH, acetonitrile percentage at zero time and after 15 min in a gradient program to obtain the optimum chromatographic conditions for the elution of 31 β-lactam antibiotics. Retention factors were used as the target property to build two QSRR models utilizing the conventional forward selection and the advanced nature-inspired firefly algorithm for descriptor selection, coupled with multiple linear regression. The obtained models showed high performance in both internal and external validation, indicating their robustness and predictive ability. The Williams-Hotelling test and Student's t-test showed that there is no statistically significant difference between the models' results. Y-randomization validation showed that the obtained models are due to significant correlation between the selected molecular descriptors and the analytes' chromatographic retention. These results indicate that the generated FS-MLR and FFA-MLR models show comparable quality on both the training and validation levels. They also gave comparable information about the molecular features that influence the retention behavior of β-lactams under the current chromatographic conditions. We can conclude that in some cases a simple conventional feature selection algorithm can be used to generate robust and predictive models comparable to those generated using advanced ones.
ChemTS: an efficient python library for de novo molecular generation
Yang, Xiufeng; Zhang, Jinzhe; Yoshizoe, Kazuki; Terayama, Kei; Tsuda, Koji
2017-01-01
Automatic design of organic materials requires black-box optimization in a vast chemical space. In conventional molecular design algorithms, a molecule is built as a combination of predetermined fragments. Recently, deep neural network models such as variational autoencoders and recurrent neural networks (RNNs) have been shown to be effective in de novo design of molecules without any predetermined fragments. This paper presents a novel Python library ChemTS that explores the chemical space by combining Monte Carlo tree search and an RNN. In a benchmarking problem of optimizing the octanol-water partition coefficient and synthesizability, our algorithm showed superior efficiency in finding high-scoring molecules. ChemTS is available at https://github.com/tsudalab/ChemTS. PMID:29435094
Wolf, Antje; Kirschner, Karl N
2013-02-01
With improvements in computer speed and algorithm efficiency, MD simulations are sampling larger amounts of molecular and biomolecular conformations. Being able to qualitatively and quantitatively sift these conformations into meaningful groups is a difficult and important task, especially when considering the structure-activity paradigm. Here we present a study that combines two popular techniques, principal component (PC) analysis and clustering, for revealing major conformational changes that occur in molecular dynamics (MD) simulations. Specifically, we explored how clustering different PC subspaces affects the resulting clusters versus clustering the complete trajectory data. As a case example, we used the trajectory data from an explicitly solvated simulation of a bacterial L11·23S ribosomal subdomain, which is a target of thiopeptide antibiotics. Clustering was performed, using K-means and average-linkage algorithms, on data involving the first two to the first five PC subspace dimensions. For the average-linkage algorithm we found that data-point membership, cluster shape, and cluster size depended on the selected PC subspace data. In contrast, K-means provided very consistent results regardless of the selected subspace. Since we present results on a single model system, generalization concerning the clustering of different PC subspaces of other molecular systems is currently premature. However, our hope is that this study illustrates a) the complexities in selecting the appropriate clustering algorithm, b) the complexities in interpreting and validating their results, and c) the valuable dynamic and conformational information that can be obtained by combining PC analysis with subsequent clustering.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Dejun, E-mail: dejun.lin@gmail.com
2015-09-21
Accurate representation of intermolecular forces has been the central task of classical atomic simulations, known as molecular mechanics. Recent advancements in molecular mechanics models have put forward the explicit representation of permanent and/or induced electric multipole (EMP) moments. The formulas developed so far to calculate EMP interactions tend to have complicated expressions, especially in Cartesian coordinates, which can only be applied to a specific kernel potential function. For example, one needs to develop a new formula each time a new kernel function is encountered. The complication of these formalisms arises from an intriguing and yet obscured mathematical relation between the kernel functions and the gradient operators. Here, I uncover this relation via rigorous derivation and find that the formula to calculate EMP interactions is basically invariant to the potential kernel functions as long as they are of the form f(r), i.e., any Green’s function that depends on inter-particle distance. I provide an algorithm for efficient evaluation of EMP interaction energies, forces, and torques for any kernel f(r) up to any arbitrary rank of EMP moments in Cartesian coordinates. The working equations of this algorithm are essentially the same for any kernel f(r). Recently, a few recursive algorithms were proposed to calculate EMP interactions. Depending on the kernel functions, the algorithm here is about 4–16 times faster than these algorithms in terms of the required number of floating point operations and is much more memory efficient. I show that it is even faster than a theoretically ideal recursion scheme, i.e., one that requires 1 floating point multiplication and 1 addition per recursion step. This algorithm has a compact vector-based expression that is optimal for computer programming. The Cartesian nature of this algorithm makes it fit easily into modern molecular simulation packages as compared with spherical coordinate-based algorithms. A software library based on this algorithm has been implemented in C++11 and has been released.
We and others have shown that transition and maintenance of biological states is controlled by master regulator proteins, which can be inferred by interrogating tissue-specific regulatory models (interactomes) with transcriptional signatures, using the VIPER algorithm. Yet, some tissues may lack molecular profiles necessary for interactome inference (orphan tissues), or, as for single cells isolated from heterogeneous samples, their tissue context may be undetermined.
Zhang, Yufeng; Wang, Xiaoan; Wo, Siukwan; Ho, Hingman; Han, Quanbin; Fan, Xiaohui; Zuo, Zhong
2015-01-01
Resolving components and determining their pseudo-molecular ions (PMIs) are crucial steps in identifying complex herbal mixtures by liquid chromatography-mass spectrometry. To tackle such labor-intensive steps, we present here a novel algorithm for simultaneous detection of components and their PMIs. Our method consists of three steps: (1) obtaining a simplified dataset containing only mono-isotopic masses by removal of background noise and isotopic cluster ions based on the isotopic distribution model derived from all the reported natural compounds in the Dictionary of Natural Products; (2) stepwise resolving and removing all features of the most abundant component from the current simplified dataset and calculating the PMI of each component according to an adduct-ion model, in which all non-fragment ions in a mass spectrum are considered as PMI plus one or several neutral species; (3) visual classification of detected components by principal component analysis (PCA) to exclude possible non-natural compounds (such as pharmaceutical excipients). This algorithm has been successfully applied to a standard mixture and three herbal extract/preparations. The results indicated that our algorithm could detect components' features as a whole and report their PMI with an accuracy of more than 98%. Furthermore, components originating from excipients/contaminants could be easily separated from the natural components in the bi-plots of PCA.
2011-01-01
Background Network inference methods reconstruct mathematical models of molecular or genetic networks directly from experimental data sets. We have previously reported a mathematical method which is exclusively data-driven, does not involve any heuristic decisions within the reconstruction process, and delivers all possible alternative minimal networks in terms of simple place/transition Petri nets that are consistent with a given discrete time series data set. Results We fundamentally extended the previously published algorithm to consider catalysis and inhibition of the reactions that occur in the underlying network. The results of the reconstruction algorithm are encoded in the form of an extended Petri net involving control arcs. This allows the consideration of processes involving mass flow and/or regulatory interactions. As a non-trivial test case, the phosphate regulatory network of enterobacteria was reconstructed using in silico-generated time-series data sets on wild-type and in silico mutants. Conclusions The new exact algorithm reconstructs extended Petri nets from time series data sets by finding all alternative minimal networks that are consistent with the data. It suggested alternative molecular mechanisms for certain reactions in the network. The algorithm is useful to combine data from wild-type and mutant cells and may potentially integrate physiological, biochemical, pharmacological, and genetic data in the form of a single model. PMID:21762503
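To make the notion of an extended Petri net with control arcs concrete, the sketch below fires a single transition that consumes and produces tokens (mass flow) while also checking a catalyst place and an inhibitor place (regulation) that are tested but not consumed. The dictionary layout and the place names are assumptions for illustration and say nothing about how the reconstruction algorithm itself works.

```python
def enabled(marking, transition):
    """A transition may fire if inputs suffice, catalysts are present, inhibitors absent."""
    return (
        all(marking.get(p, 0) >= n for p, n in transition["consume"].items())
        and all(marking.get(p, 0) > 0 for p in transition.get("catalysts", ()))
        and all(marking.get(p, 0) == 0 for p in transition.get("inhibitors", ()))
    )

def fire(marking, transition):
    """Return the new marking, or None if the transition is not enabled."""
    if not enabled(marking, transition):
        return None
    new = dict(marking)
    for place, n in transition["consume"].items():
        new[place] -= n
    for place, n in transition["produce"].items():
        new[place] = new.get(place, 0) + n
    return new  # control-arc places (catalysts, inhibitors) are left untouched

# Hypothetical reaction S -> P, catalysed by E and inhibited by I.
reaction = {"consume": {"S": 1}, "produce": {"P": 1},
            "catalysts": ("E",), "inhibitors": ("I",)}
fired = fire({"S": 2, "E": 1, "I": 0}, reaction)
blocked = fire({"S": 2, "E": 1, "I": 1}, reaction)
```

The first call moves one token from S to P; the second returns `None` because the inhibitor place is marked.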
Ahmed, Shiek S. S. J.; Ramakrishnan, V.
2012-01-01
Background Poor oral bioavailability is an important parameter accounting for the failure of drug candidates. Approximately 50% of drugs in development fail because of unfavorable oral bioavailability. In silico prediction of oral bioavailability (%F) based on physicochemical properties is highly needed. Although many computational models have been developed to predict oral bioavailability, their accuracy remains low, with a significant number of false positives. In this study, we present an oral bioavailability model based on a systems biological approach, using a machine learning algorithm coupled with an optimal discriminative set of physicochemical properties. Results The models were developed based on 247 computationally derived physicochemical descriptors from 2279 molecules, among which 969, 605 and 705 molecules corresponded to the oral bioavailability, intestinal absorption (HIA) and caco-2 permeability data sets, respectively. The partial least squares discriminant analysis showed that 49 descriptors of HIA and 50 descriptors of caco-2 are the major contributing descriptors in classifying into groups. Of these descriptors, 47 were common to HIA and caco-2, which suggests that they play a vital role in classifying oral bioavailability. To determine the best machine learning algorithm, 21 classifiers were compared using a bioavailability data set of 969 molecules with 47 descriptors. Each molecule in the data set was represented by a set of 47 physicochemical properties with the functional relevance labeled as (+bioavailability/−bioavailability) to indicate good-bioavailability/poor-bioavailability molecules. The best-performing algorithm was the logistic algorithm. The correlation based feature selection (CFS) algorithm was implemented, which confirms that these 47 descriptors are the fundamental descriptors for oral bioavailability prediction.
Conclusion The logistic algorithm with 47 selected descriptors correctly predicted the oral bioavailability, with a predictive accuracy of more than 71%. Overall, the method captures the fundamental molecular descriptors that can be used as an entity to facilitate prediction of oral bioavailability. PMID:22815781
Ahmed, Shiek S S J; Ramakrishnan, V
2012-01-01
Poor oral bioavailability is an important parameter accounting for the failure of drug candidates. Approximately 50% of drugs in development fail because of unfavorable oral bioavailability. In silico prediction of oral bioavailability (%F) based on physicochemical properties is highly needed. Although many computational models have been developed to predict oral bioavailability, their accuracy remains low, with a significant number of false positives. In this study, we present an oral bioavailability model based on a systems biological approach, using a machine learning algorithm coupled with an optimal discriminative set of physicochemical properties. The models were developed based on 247 computationally derived physicochemical descriptors from 2279 molecules, among which 969, 605 and 705 molecules corresponded to the oral bioavailability, intestinal absorption (HIA) and caco-2 permeability data sets, respectively. The partial least squares discriminant analysis showed that 49 descriptors of HIA and 50 descriptors of caco-2 are the major contributing descriptors in classifying into groups. Of these descriptors, 47 were common to HIA and caco-2, which suggests that they play a vital role in classifying oral bioavailability. To determine the best machine learning algorithm, 21 classifiers were compared using a bioavailability data set of 969 molecules with 47 descriptors. Each molecule in the data set was represented by a set of 47 physicochemical properties with the functional relevance labeled as (+bioavailability/-bioavailability) to indicate good-bioavailability/poor-bioavailability molecules. The best-performing algorithm was the logistic algorithm. The correlation based feature selection (CFS) algorithm was implemented, which confirms that these 47 descriptors are the fundamental descriptors for oral bioavailability prediction.
The logistic algorithm with 47 selected descriptors correctly predicted the oral bioavailability, with a predictive accuracy of more than 71%. Overall, the method captures the fundamental molecular descriptors that can be used as an entity to facilitate prediction of oral bioavailability.
Enhanced Particle Swarm Optimization Algorithm: Efficient Training of ReaxFF Reactive Force Fields.
Furman, David; Carmeli, Benny; Zeiri, Yehuda; Kosloff, Ronnie
2018-06-12
Particle swarm optimization (PSO) is a powerful metaheuristic population-based global optimization algorithm. However, when it is applied to nonseparable objective functions, its performance on multimodal landscapes is significantly degraded. Here we show that a significant improvement in the search quality and efficiency on multimodal functions can be achieved by enhancing the basic rotation-invariant PSO algorithm with isotropic Gaussian mutation operators. The new algorithm demonstrates superior performance across several nonlinear, multimodal benchmark functions compared with the rotation-invariant PSO algorithm and the well-established simulated annealing and sequential one-parameter parabolic interpolation methods. A search for the optimal set of parameters for the dispersion interaction model in the ReaxFF-lg reactive force field was carried out with respect to accurate DFT-TS calculations. The resulting optimized force field accurately describes the equations of state of several high-energy molecular crystals where such interactions are of crucial importance. The improved algorithm also presents better performance compared to a genetic algorithm optimization method in the optimization of the parameters of a ReaxFF-lg correction model. The computational framework is implemented in a stand-alone C++ code that allows the straightforward development of ReaxFF reactive force fields.
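The general shape of a PSO loop with an isotropic Gaussian mutation can be sketched as follows. Note the swap: this uses the common per-dimension velocity update rather than the rotation-invariant variant the abstract builds on, and the inertia and acceleration constants, mutation probability, and Rastrigin test function are conventional choices assumed here, not the paper's settings.

```python
import math
import random

def rastrigin(x):
    """Standard multimodal benchmark; global minimum 0 at the origin."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)

def pso_gaussian(f, dim, n_particles=20, iters=200, sigma=0.1, p_mut=0.1, seed=1):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (0.72 * vel[i][d]
                             + 1.49 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.49 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if rng.random() < p_mut:
                # Isotropic Gaussian mutation: same sigma in every direction.
                pos[i] = [v + rng.gauss(0.0, sigma) for v in pos[i]]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history

best, best_val, history = pso_gaussian(rastrigin, dim=2)
```

In a force-field fitting setting, `f` would be replaced by an error function over reference data, with each particle position holding one candidate parameter set.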
ls1 mardyn: The Massively Parallel Molecular Dynamics Code for Large Systems.
Niethammer, Christoph; Becker, Stefan; Bernreuther, Martin; Buchholz, Martin; Eckhardt, Wolfgang; Heinecke, Alexander; Werth, Stephan; Bungartz, Hans-Joachim; Glass, Colin W; Hasse, Hans; Vrabec, Jadran; Horsch, Martin
2014-10-14
The molecular dynamics simulation code ls1 mardyn is presented. It is a highly scalable code, optimized for massively parallel execution on supercomputing architectures and currently holds the world record for the largest molecular simulation with over four trillion particles. It enables the application of pair potentials to length and time scales that were previously out of scope for molecular dynamics simulation. With an efficient dynamic load balancing scheme, it delivers high scalability even for challenging heterogeneous configurations. Presently, multicenter rigid potential models based on Lennard-Jones sites, point charges, and higher-order polarities are supported. Due to its modular design, ls1 mardyn can be extended to new physical models, methods, and algorithms, allowing future users to tailor it to suit their respective needs. Possible applications include scenarios with complex geometries, such as fluids at interfaces, as well as nonequilibrium molecular dynamics simulation of heat and mass transfer.
Exploiting molecular dynamics in Nested Sampling simulations of small peptides
NASA Astrophysics Data System (ADS)
Burkoff, Nikolas S.; Baldock, Robert J. N.; Várnai, Csilla; Wild, David L.; Csányi, Gábor
2016-04-01
Nested Sampling (NS) is a parameter space sampling algorithm which can be used for sampling the equilibrium thermodynamics of atomistic systems. NS has previously been used to explore the potential energy surface of a coarse-grained protein model and has significantly outperformed parallel tempering when calculating heat capacity curves of Lennard-Jones clusters. The original NS algorithm uses Monte Carlo (MC) moves; however, a variant, Galilean NS, has recently been introduced which allows NS to be incorporated into a molecular dynamics framework, so NS can be used for systems which lack efficient prescribed MC moves. In this work we demonstrate the applicability of Galilean NS to atomistic systems. We present an implementation of Galilean NS using the Amber molecular dynamics package and demonstrate its viability by sampling alanine dipeptide, both in vacuo and implicit solvent. Unlike previous studies of this system, we present the heat capacity curves of alanine dipeptide, whose calculation provides a stringent test for sampling algorithms. We also compare our results with those calculated using replica exchange molecular dynamics (REMD) and find good agreement. We show the computational effort required for accurate heat capacity estimation for small peptides. We also calculate the alanine dipeptide Ramachandran free energy surface for a range of temperatures and use it to compare the results using the latest Amber force field with previous theoretical and experimental results.
Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N.; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S.; Leswing, Karl
2017-01-01
Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm. PMID:29629118
Clustering Molecular Dynamics Trajectories for Optimizing Docking Experiments
De Paris, Renata; Quevedo, Christian V.; Ruiz, Duncan D.; Norberto de Souza, Osmar; Barros, Rodrigo C.
2015-01-01
Molecular dynamics simulations of protein receptors have become an attractive tool for rational drug discovery. However, the high computational cost of employing molecular dynamics trajectories in virtual screening of large repositories threatens the feasibility of this task. Computational intelligence techniques have been applied in this context, with the ultimate goal of reducing the overall computational cost so the task can become feasible. In particular, clustering algorithms have been widely used as a means to reduce the dimensionality of molecular dynamics trajectories. In this paper, we develop a novel methodology for clustering entire trajectories using structural features from the substrate-binding cavity of the receptor in order to optimize docking experiments in a cloud-based environment. The resulting partition was selected based on three clustering validity criteria, and it was further validated by analyzing the interactions between 20 ligands and a fully flexible receptor (FFR) model containing a 20 ns molecular dynamics simulation trajectory. Our proposed methodology shows that taking into account features of the substrate-binding cavity as input for the k-means algorithm is a promising technique for accurately selecting ensembles of representative structures tailored to a specific ligand. PMID:25873944
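The clustering step described in this abstract can be illustrated with a minimal sketch. It assumes each trajectory frame has already been reduced to a small tuple of cavity features; the feature values, the farthest-first initialization, and the function names are illustrative assumptions, not the authors' pipeline.

```python
def dist2(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    # Assumption: start from the first frame, then repeatedly add the frame
    # farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each frame joins its nearest center.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def representatives(points, labels, centers):
    """Index of the frame closest to each non-empty cluster's center."""
    reps = []
    for c, center in enumerate(centers):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            reps.append(min(idx, key=lambda i: dist2(points[i], center)))
    return reps


# Hypothetical cavity features for six frames (two well-separated groups).
frames = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (5.0, 5.1), (5.1, 5.0), (5.05, 5.05)]
labels, centers = kmeans(frames, k=2)
reps = representatives(frames, labels, centers)
```

The representative frames returned by `representatives` stand in for the reduced ensemble that would be passed on to docking; the choice of k and of validity criteria is left open here.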
Automated parameterization of intermolecular pair potentials using global optimization techniques
NASA Astrophysics Data System (ADS)
Krämer, Andreas; Hülsmann, Marco; Köddermann, Thorsten; Reith, Dirk
2014-12-01
In this work, different global optimization techniques are assessed for the automated development of molecular force fields, as used in molecular dynamics and Monte Carlo simulations. The quest of finding suitable force field parameters is treated as a mathematical minimization problem. Intricate problem characteristics such as extremely costly and even abortive simulations, noisy simulation results, and especially multiple local minima naturally lead to the use of sophisticated global optimization algorithms. Five diverse algorithms (pure random search, recursive random search, CMA-ES, differential evolution, and taboo search) are compared to our own tailor-made solution named CoSMoS. CoSMoS is an automated workflow. It models the parameters' influence on the simulation observables to detect a globally optimal set of parameters. It is shown how and why this approach is superior to other algorithms. Applied to suitable test functions and simulations for phosgene, CoSMoS effectively reduces the number of required simulations and real time for the optimization task.
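Of the algorithms compared in this abstract, pure random search is the simplest baseline and can be sketched in a few lines. The sketch below treats parameter fitting as minimization of a black-box objective; the sphere test function, the bounds, and the function names are assumptions for illustration and do not reproduce CoSMoS or the force-field objective used by the authors.

```python
import random


def pure_random_search(objective, bounds, n_evals, seed=0):
    """Sample parameters uniformly within bounds and keep the best one seen."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_evals):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        f = objective(x)  # in practice: one (costly) simulation per evaluation
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f


def sphere(x):
    """Hypothetical smooth test function with its minimum at the origin."""
    return sum(v * v for v in x)


bounds = [(-5.0, 5.0), (-5.0, 5.0)]
x_short, f_short = pure_random_search(sphere, bounds, n_evals=20)
x_long, f_long = pure_random_search(sphere, bounds, n_evals=200)
```

Because every evaluation corresponds to a full simulation in the force-field setting, the number of evaluations is the natural cost measure when comparing such a baseline with model-based approaches.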
Geometric Detection Algorithms for Cavities on Protein Surfaces in Molecular Graphics: A Survey
Simões, Tiago; Lopes, Daniel; Dias, Sérgio; Fernandes, Francisco; Pereira, João; Jorge, Joaquim; Bajaj, Chandrajit; Gomes, Abel
2017-01-01
Detecting and analyzing protein cavities provides significant information about active sites for biological processes (e.g., protein-protein or protein-ligand binding) in molecular graphics and modeling. Using the three-dimensional structure of a given protein (i.e., atom types and their locations in 3D) as retrieved from a PDB (Protein Data Bank) file, it is now computationally viable to determine a description of these cavities. Such cavities correspond to pockets, clefts, invaginations, voids, tunnels, channels, and grooves on the surface of a given protein. In this work, we survey the literature on protein cavity computation and classify algorithmic approaches into three categories: evolution-based, energy-based, and geometry-based. Our survey focuses on geometric algorithms, whose taxonomy is extended to include not only sphere-, grid-, and tessellation-based methods, but also surface-based, hybrid geometric, consensus, and time-varying methods. Finally, we detail those techniques that have been customized for GPU (Graphics Processing Unit) computing. PMID:29520122
Kobayashi, Chigusa; Jung, Jaewoon; Matsunaga, Yasuhiro; Mori, Takaharu; Ando, Tadashi; Tamura, Koichi; Kamiya, Motoshi; Sugita, Yuji
2017-09-30
GENeralized-Ensemble SImulation System (GENESIS) is a software package for molecular dynamics (MD) simulation of biological systems. It is designed to extend limitations in system size and accessible time scale by adopting highly parallelized schemes and enhanced conformational sampling algorithms. In this new version, GENESIS 1.1, new functions and advanced algorithms have been added. The all-atom and coarse-grained potential energy functions used in AMBER and GROMACS packages now become available in addition to CHARMM energy functions. The performance of MD simulations has been greatly improved by further optimization, multiple time-step integration, and hybrid (CPU + GPU) computing. The string method and replica-exchange umbrella sampling with flexible collective variable choice are used for finding the minimum free-energy pathway and obtaining free-energy profiles for conformational changes of a macromolecule. These new features increase the usefulness and power of GENESIS for modeling and simulation in biological research. © 2017 Wiley Periodicals, Inc.
A Graph-Algorithmic Approach for the Study of Metastability in Markov Chains
NASA Astrophysics Data System (ADS)
Gan, Tingyue; Cameron, Maria
2017-06-01
Large continuous-time Markov chains with exponentially small transition rates arise in modeling complex systems in physics, chemistry, and biology. We propose a constructive graph-algorithmic approach to determine the sequence of critical timescales at which the qualitative behavior of a given Markov chain changes, and give an effective description of the dynamics on each of them. This approach is valid for both time-reversible and time-irreversible Markov processes, with or without symmetry. Central to this approach are two graph algorithms, Algorithm 1 and Algorithm 2, for obtaining the sequences of the critical timescales and the hierarchies of Typical Transition Graphs or T-graphs indicating the most likely transitions in the system without and with symmetry, respectively. The sequence of critical timescales includes the subsequence of the reciprocals of the real parts of eigenvalues. Under a certain assumption, we prove sharp asymptotic estimates for eigenvalues (including pre-factors) and show how one can extract them from the output of Algorithm 1. We discuss the relationship between Algorithms 1 and 2 and explain how one needs to interpret the output of Algorithm 1 if it is applied in the case with symmetry instead of Algorithm 2. Finally, we analyze an example motivated by R. D. Astumian's model of the dynamics of kinesin, a molecular motor, by means of Algorithm 2.
O'Hagan, Steve; Knowles, Joshua; Kell, Douglas B.
2012-01-01
Comparatively few studies have addressed directly the question of quantifying the benefits to be had from using molecular genetic markers in experimental breeding programmes (e.g. for improved crops and livestock), nor the question of which organisms should be mated with each other to best effect. We argue that this requires in silico modelling, an approach for which there is a large literature in the field of evolutionary computation (EC), but which has not really been applied in this way to experimental breeding programmes. EC seeks to optimise measurable outcomes (phenotypic fitnesses) by optimising in silico the mutation, recombination and selection regimes that are used. We review some of the approaches from EC, and compare experimentally, using a biologically relevant in silico landscape, some algorithms that have knowledge of where they are in the (genotypic) search space (G-algorithms) with some (albeit well-tuned ones) that do not (F-algorithms). For the present kinds of landscapes, F- and G-algorithms were broadly comparable in quality and effectiveness, although we recognise that the G-algorithms were not equipped with any ‘prior knowledge’ of epistatic pathway interactions. This use of algorithms based on machine learning has important implications for the optimisation of experimental breeding programmes in the post-genomic era when we shall potentially have access to the full genome sequence of every organism in a breeding population. The non-proprietary code that we have used is made freely available (via Supplementary information). PMID:23185279
NASA Astrophysics Data System (ADS)
Needham, Perri J.; Bhuiyan, Ashraf; Walker, Ross C.
2016-04-01
We present an implementation of explicit solvent particle mesh Ewald (PME) classical molecular dynamics (MD) within the PMEMD molecular dynamics engine, that forms part of the AMBER v14 MD software package, that makes use of Intel Xeon Phi coprocessors by offloading portions of the PME direct summation and neighbor list build to the coprocessor. We refer to this implementation as pmemd MIC offload and in this paper present the technical details of the algorithm, including basic models for MPI and OpenMP configuration, and analyze the resultant performance. The algorithm provides the best performance improvement for large systems (>400,000 atoms), achieving a ∼35% performance improvement for satellite tobacco mosaic virus (1,067,095 atoms) when 2 Intel E5-2697 v2 processors (2 ×12 cores, 30M cache, 2.7 GHz) are coupled to an Intel Xeon Phi coprocessor (Model 7120P-1.238/1.333 GHz, 61 cores). The implementation utilizes a two-fold decomposition strategy: spatial decomposition using an MPI library and thread-based decomposition using OpenMP. We also present compiler optimization settings that improve the performance on Intel Xeon processors, while retaining simulation accuracy.
NASA Astrophysics Data System (ADS)
Jensen, Christian H.; Nerukh, Dmitry; Glen, Robert C.
2008-03-01
We investigate the sensitivity of a Markov model with states and transition probabilities obtained from clustering a molecular dynamics trajectory. We have examined a 500 ns molecular dynamics trajectory of the peptide valine-proline-alanine-leucine in explicit water. The sensitivity is quantified by varying the boundaries of the clusters and investigating the resulting variation in transition probabilities and the average transition time between states. In this way, we represent the effect of clustering using different clustering algorithms. It is found that in terms of the investigated quantities, the peptide dynamics described by the Markov model is sensitive to the clustering; in particular, the average transition times are found to vary up to 46%. Moreover, inclusion of nonphysical sparsely populated clusters can lead to serious errors of up to 814%. In the investigation, the time step used in the transition matrix is determined by the minimum time scale on which the system behaves approximately Markovian. This time step is found to be about 100 ps. It is concluded that the description of peptide dynamics with transition matrices should be performed with care, and that using standard clustering algorithms to obtain states and transition probabilities may not always produce reliable results.
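How a transition matrix follows from a clustered trajectory can be shown with a short sketch. It assumes the trajectory has already been converted to a sequence of integer cluster labels sampled at the Markov time step; the label sequence and the dwell-time helper are illustrative assumptions, not the quantities or data of the study.

```python
def transition_matrix(labels, n_states, lag=1):
    """Row-normalized transition counts between cluster labels at a given lag."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(labels[:-lag], labels[lag:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left or visited gets an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def mean_dwell_time(matrix, state, lag_time):
    """Expected residence time in a state, lag_time / (1 - T_ii)."""
    stay = matrix[state][state]
    return float("inf") if stay >= 1.0 else lag_time / (1.0 - stay)


# Hypothetical label sequence; each step stands for one 100 ps lag.
labels = [0, 0, 0, 1, 1, 0, 0, 1]
T = transition_matrix(labels, n_states=2)
dwell_0 = mean_dwell_time(T, state=0, lag_time=100.0)  # in ps
```

Re-running `transition_matrix` after relabeling a few frames near a cluster boundary gives a direct, if crude, view of the sensitivity the abstract discusses.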
NASA Astrophysics Data System (ADS)
Sastry, Kumara Narasimha
2007-03-01
Effective and efficient multiscale modeling is essential to advance both the science and synthesis in a wide array of fields such as physics, chemistry, materials science, biology, biotechnology and pharmacology. This study investigates the efficacy and potential of using genetic algorithms for multiscale materials modeling and addresses some of the challenges involved in designing competent algorithms that solve hard problems quickly, reliably and accurately. In particular, this thesis demonstrates the use of genetic algorithms (GAs) and genetic programming (GP) in multiscale modeling with the help of two non-trivial case studies in materials science and chemistry. The first case study explores the utility of genetic programming (GP) in multi-timescaling alloy kinetics simulations. In essence, GP is used to bridge molecular dynamics and kinetic Monte Carlo methods to span orders-of-magnitude in simulation time. Specifically, GP is used to regress symbolically an inline barrier function from a limited set of molecular dynamics simulations to enable kinetic Monte Carlo simulations that cover seconds of real time. Results on a non-trivial example of vacancy-assisted migration on a surface of a face-centered cubic (fcc) Copper-Cobalt (CuxCo1-x) alloy show that GP predicts all barriers with 0.1% error from calculations for less than 3% of active configurations, independent of type of potentials used to obtain the learning set of barriers via molecular dynamics. The resulting method enables 2-9 orders-of-magnitude increase in real-time dynamics simulations taking 4-7 orders-of-magnitude less CPU time. The second case study presents the application of multiobjective genetic algorithms (MOGAs) in multiscaling quantum chemistry simulations. Specifically, MOGAs are used to bridge high-level quantum chemistry and semiempirical methods to provide accurate representation of complex molecular excited-state and ground-state behavior.
Results on ethylene and benzene, two common building blocks in organic chemistry, indicate that MOGAs produce high-quality semiempirical methods that (1) are stable to small perturbations, (2) yield accurate configuration energies on untested and critical excited states, and (3) yield ab initio quality excited-state dynamics. The proposed method enables simulations of more complex systems to realistic, multi-picosecond timescales, well beyond previous attempts or expectation of human experts, and 2-3 orders-of-magnitude reduction in computational cost. While the two applications use simple evolutionary operators, in order to tackle more complex systems, their scalability and limitations have to be investigated. The second part of the thesis addresses some of the challenges involved with a successful design of genetic algorithms and genetic programming for multiscale modeling. The first issue addressed is the scalability of genetic programming, where facetwise models are built to assess the population size required by GP to ensure adequate supply of raw building blocks and also to ensure accurate decision-making between competing building blocks. This study also presents a design of competent genetic programming, where traditional fixed recombination operators are replaced by building and sampling probabilistic models of promising candidate programs. The proposed scalable GP, called extended compact GP (eCGP), combines the ideas from extended compact genetic algorithm (eCGA) and probabilistic incremental program evolution (PIPE) and adaptively identifies, propagates and exchanges important subsolutions of a search problem. Results show that eCGP scales cubically with problem size on both GP-easy and GP-hard problems. Finally, facetwise models are developed to explore limitations of scalability of MOGAs, where the scalability of multiobjective algorithms in reliably maintaining Pareto-optimal solutions is addressed.
The results show that even when the building blocks are accurately identified, massive multimodality of the search problems can easily overwhelm the nicher (diversity preserving operator) and lead to exponential scale-up. Facetwise models are developed, which incorporate the combined effects of model accuracy, decision making, and sub-structure supply, as well as the effect of niching on the population sizing, to predict a limit on the growth rate of a maximum number of sub-structures that can compete in the two objectives to circumvent the failure of the niching method. The results show that if the number of competing building blocks between multiple objectives is less than the proposed limit, multiobjective GAs scale-up polynomially with the problem size on boundedly-difficult problems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, R.; Harrison, D. E. Jr.
A variable time step integration algorithm for carrying out molecular dynamics simulations of atomic collision cascades is proposed which evaluates the interaction forces only once per time step. The algorithm is tested on some model problems which have exact solutions and is compared against other common methods. These comparisons show that the method has good stability and accuracy. Applications to Ar+ bombardment of Cu and Si show good accuracy and improved speed relative to the original method (D. E. Harrison, W. L. Gay, and H. M. Effron, J. Math. Phys. 10, 1179 (1969)).
An adaptive interpolation scheme for molecular potential energy surfaces
NASA Astrophysics Data System (ADS)
Kowalewski, Markus; Larsson, Elisabeth; Heryudono, Alfa
2016-08-01
The calculation of potential energy surfaces for quantum dynamics can be a time-consuming task, especially when a high level of theory for the electronic structure calculation is required. We propose an adaptive interpolation algorithm based on polyharmonic splines combined with a partition of unity approach. The adaptive node refinement greatly reduces the number of sample points by employing a local error estimate. The algorithm and its scaling behavior are evaluated for a model function in 2, 3, and 4 dimensions. The developed algorithm allows for a more rapid and reliable interpolation of a potential energy surface within a given accuracy compared to the non-adaptive version.
Hybrid stochastic simulations of intracellular reaction-diffusion systems.
Kalantzis, Georgios
2009-06-01
With the observation that stochasticity is important in biological systems, chemical kinetics have begun to receive wider interest. While Monte Carlo discrete event simulations most accurately capture the variability of molecular species, they become computationally costly for complex reaction-diffusion systems with large populations of molecules. On the other hand, continuous time models are computationally efficient but they fail to capture any variability in the molecular species. In this study a hybrid stochastic approach is introduced for simulating reaction-diffusion systems. We developed an adaptive partitioning strategy in which processes with high frequency are simulated with deterministic rate-based equations, and those with low frequency using the exact stochastic algorithm of Gillespie. The stochastic behavior of cellular pathways is therefore preserved while the method remains applicable to large populations of molecules. We describe our method and demonstrate its accuracy and efficiency compared with the Gillespie algorithm for two different systems: first, a model of intracellular viral kinetics with two steady states, and second, a compartmental model of the postsynaptic spine head for studying the dynamics of Ca2+ and NMDA receptors.
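The exact stochastic algorithm of Gillespie that this abstract uses for low-frequency processes can be sketched for a single species. The example below is the plain direct method applied to a birth-death process (constant production, first-order degradation); the rate constants and names are assumed for illustration, and the hybrid partitioning into deterministic and stochastic processes is not implemented.

```python
import math
import random


def gillespie_birth_death(x0, k_prod, k_deg, t_end, seed=0):
    """Direct-method SSA for  0 -> X (rate k_prod)  and  X -> 0 (rate k_deg * X)."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod        # propensity of the production reaction
        a_deg = k_deg * x      # propensity of the degradation reaction
        a_total = a_prod + a_deg
        if a_total == 0.0:
            break              # no reaction can fire any more
        # Exponentially distributed waiting time until the next reaction.
        t += -math.log(1.0 - rng.random()) / a_total
        if t > t_end:
            break
        # Pick which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


# Hypothetical parameters: the deterministic steady state is k_prod / k_deg = 20.
times, counts = gillespie_birth_death(x0=0, k_prod=10.0, k_deg=0.5, t_end=20.0)
```

Each reaction event costs one loop iteration, which is why the exact method becomes expensive for frequent reactions and motivates treating those deterministically, as the abstract describes.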
Cubarsi, R; Carrió, M M; Villaverde, A
2005-09-01
The in vivo proteolytic digestion of bacterial inclusion bodies (IBs) and the kinetic analysis of the resulting protein fragments is an interesting approach to investigate the molecular organization of these unconventional protein aggregates. In this work, we describe a set of mathematical instruments useful for such analysis and interpretation of observed data. These methods combine numerical estimation of digestion rate and approximation of its high-order derivatives, modelling of fragmentation events from a mixture of Poisson processes associated with differentiated protein species, differential equations techniques in order to estimate the mixture parameters, an iterative predictor-corrector algorithm for describing the flow diagram along the cascade process, as well as least squares procedures with minimum variance estimates. The models are formulated and compared with data, and successively refined to better match experimental observations. By applying such procedures as well as newer improved algorithms of formerly developed equations, it has been possible to model, for two kinds of bacterially produced aggregation prone recombinant proteins, their cascade digestion process that has revealed intriguing features of the IB-forming polypeptides.
A study on the application of topic models to motif finding algorithms.
Basha Gutierrez, Josep; Nakai, Kenta
2016-12-22
Topic models are statistical algorithms which try to discover the structure of a set of documents according to the abstract topics contained in them. Here we try to apply this approach to the discovery of the structure of the transcription factor binding sites (TFBS) contained in a set of biological sequences, which is a fundamental problem in molecular biology research for the understanding of transcriptional regulation. Here we present two methods that make use of topic models for motif finding. First, we developed an algorithm in which first a set of biological sequences are treated as text documents, and the k-mers contained in them as words, to then build a correlated topic model (CTM) and iteratively reduce its perplexity. We also used the perplexity measurement of CTMs to improve our previous algorithm based on a genetic algorithm and several statistical coefficients. The algorithms were tested with 56 data sets from four different species and compared to 14 other methods by the use of several coefficients both at nucleotide and site level. The results of our first approach showed a performance comparable to the other methods studied, especially at site level and in sensitivity scores, in which it scored better than any of the 14 existing tools. In the case of our previous algorithm, the new approach with the addition of the perplexity measurement clearly outperformed all of the other methods in sensitivity, both at nucleotide and site level, and in overall performance at site level. The statistics obtained show that the performance of a motif finding method based on the use of a CTM is satisfying enough to conclude that the application of topic models is a valid method for developing motif finding algorithms. Moreover, the addition of topic models to a previously developed method dramatically increased its performance, suggesting that this combined algorithm can be a useful tool to successfully predict motifs in different kinds of sets of DNA sequences.
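The first step described in this abstract, treating sequences as documents and their k-mers as words, can be sketched as follows. The sequences and function names are hypothetical; fitting a correlated topic model on the resulting counts is outside the scope of this sketch.

```python
from collections import Counter


def kmers(sequence, k):
    """All overlapping k-mers of a sequence, in order of appearance."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def document_term_counts(sequences, k):
    """Treat each sequence as a document and each k-mer as a word.

    Returns the sorted k-mer vocabulary and one count row per sequence.
    """
    docs = [Counter(kmers(seq.upper(), k)) for seq in sequences]
    vocab = sorted(set().union(*docs))
    matrix = [[doc[word] for word in vocab] for doc in docs]
    return vocab, matrix


# Hypothetical toy sequences.
vocab, matrix = document_term_counts(["ACGT", "cgta"], k=2)
```

The count matrix has the same shape as a document-term matrix in text mining, which is what allows a topic model to be applied to sequence data.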
A parallel algorithm for step- and chain-growth polymerization in molecular dynamics.
de Buyl, Pierre; Nies, Erik
2015-04-07
Classical Molecular Dynamics (MD) simulations provide insight into the properties of many soft-matter systems. In some situations, it is interesting to model the creation of chemical bonds, a process that is not part of the MD framework. In this context, we propose a parallel algorithm for step- and chain-growth polymerization that is based on a generic reaction scheme, works at a given intrinsic rate and produces continuous trajectories. We present an implementation in the ESPResSo++ simulation software and compare it with the corresponding feature in LAMMPS. For chain growth, our results are compared to the existing simulation literature. For step growth, a rate equation is proposed for the evolution of the crosslinker population that compares well to the simulations for low crosslinker functionality or for short times.
Towards de novo identification of metabolites by analyzing tandem mass spectra.
Böcker, Sebastian; Rasche, Florian
2008-08-15
Mass spectrometry is among the most widely used technologies in proteomics and metabolomics. Being a high-throughput method, it produces large amounts of data that necessitate an automated analysis of the spectra. Clearly, database search methods for protein analysis can easily be adapted to analyze metabolite mass spectra. But for metabolites, de novo interpretation of spectra is even more important than for protein data, because metabolite spectra databases cover only a small fraction of naturally occurring metabolites: even the model plant Arabidopsis thaliana has a large number of enzymes whose substrates and products remain unknown. The field of bio-prospection searches biologically diverse areas for metabolites which might serve as pharmaceuticals. De novo identification of metabolite mass spectra requires new concepts and methods since, unlike proteins, metabolites possess a non-linear molecular structure. In this work, we introduce a method for fully automated de novo identification of metabolites from tandem mass spectra. Mass spectrometry data is usually assumed to be insufficient for identification of molecular structures, so we want to estimate the molecular formula of the unknown metabolite, a crucial step for its identification. The method first calculates all molecular formulas that explain the parent peak mass. Then, a graph is built where vertices correspond to molecular formulas of all peaks in the fragmentation mass spectra, whereas edges correspond to hypothetical fragmentation steps. Our algorithm afterwards calculates the maximum scoring subtree of this graph: each peak in the spectra must be scored at most once, so the subtree shall contain only one explanation per peak. Unfortunately, finding this subtree is NP-hard. We suggest three exact algorithms (including one fixed parameter tractable algorithm) as well as two heuristics to solve the problem.
Tests on real mass spectra show that the FPT algorithm and the heuristics solve the problem suitably fast and provide excellent results: for all 32 test compounds the correct solution was among the top five suggestions, for 26 compounds the first suggestion of the exact algorithm was correct. http://www.bio.inf.uni-jena.de/tandemms
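The first step of the method in this abstract, listing all molecular formulas that explain the parent peak mass, can be sketched by brute-force enumeration. The sketch assumes the element set C, H, N, O, neutral monoisotopic masses rounded to six decimals, and an absolute tolerance in daltons; it applies no chemical plausibility filters and is not the authors' decomposition algorithm.

```python
# Monoisotopic masses in daltons, rounded to six decimals.
MONO_MASS = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def formulas_for_mass(target, tol=0.002):
    """Return (c, h, n, o) counts whose monoisotopic mass is within tol of target."""
    m = MONO_MASS
    hits = []
    limit = target + tol
    for c in range(int(limit // m["C"]) + 1):
        left_c = limit - c * m["C"]
        for n in range(int(left_c // m["N"]) + 1):
            left_n = left_c - n * m["N"]
            for o in range(int(left_n // m["O"]) + 1):
                base = c * m["C"] + n * m["N"] + o * m["O"]
                # Hydrogen is ~1 Da, far above tol, so only the nearest
                # hydrogen count can possibly match.
                h = round((target - base) / m["H"])
                if h < 0:
                    continue
                mass = base + h * m["H"]
                if abs(mass - target) <= tol:
                    hits.append((c, h, n, o))
    return hits


water = formulas_for_mass(18.010565)      # H2O
glucose = formulas_for_mass(180.06339)    # should include C6H12O6
```

For larger masses the list of candidates grows quickly, which is one reason the fragmentation spectrum is needed to rank the candidates.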
Interaction sorting method for molecular dynamics on multi-core SIMD CPU architecture.
Matvienko, Sergey; Alemasov, Nikolay; Fomin, Eduard
2015-02-01
Molecular dynamics (MD) is widely used in computational biology for studying binding mechanisms of molecules, molecular transport, conformational transitions, protein folding, etc. The method is computationally expensive; thus, the demand for the development of novel, much more efficient algorithms is still high. Therefore, the new algorithm designed in 2007 and called interaction sorting (IS) clearly attracted interest, as it outperformed the most efficient MD algorithms. In this work, a new IS modification is proposed which allows the algorithm to utilize SIMD processor instructions. This paper shows that the improvement provides an additional gain in performance, 9% to 45% in comparison to the original IS method.
Wehmeyer, Christoph; Falk von Rudorff, Guido; Wolf, Sebastian; Kabbe, Gabriel; Schärf, Daniel; Kühne, Thomas D; Sebastiani, Daniel
2012-11-21
We present a stochastic, swarm intelligence-based optimization algorithm for the prediction of global minima on potential energy surfaces of molecular cluster structures. Our optimization approach is a modification of the artificial bee colony (ABC) algorithm which is inspired by the foraging behavior of honey bees. We apply our modified ABC algorithm to the problem of global geometry optimization of molecular cluster structures and show its performance for clusters with 2-57 particles and different interatomic interaction potentials.
Attributed relational graphs for cell nucleus segmentation in fluorescence microscopy images.
Arslan, Salim; Ersahin, Tulin; Cetin-Atalay, Rengul; Gunduz-Demir, Cigdem
2013-06-01
More rapid and accurate high-throughput screening in molecular cellular biology research has become possible with the development of automated microscopy imaging, for which cell nucleus segmentation commonly constitutes the core step. Although several promising methods exist for segmenting the nuclei of monolayer isolated and less-confluent cells, it still remains an open problem to segment the nuclei of more-confluent cells, which tend to grow in overlayers. To address this problem, we propose a new model-based nucleus segmentation algorithm. This algorithm models how a human locates a nucleus by identifying the nucleus boundaries and piecing them together. In this algorithm, we define four types of primitives to represent nucleus boundaries at different orientations and construct an attributed relational graph on the primitives to represent their spatial relations. Then, we reduce the nucleus identification problem to finding predefined structural patterns in the constructed graph and also use the primitives in region growing to delineate the nucleus borders. Working with fluorescence microscopy images, our experiments demonstrate that the proposed algorithm identifies nuclei better than previous nucleus segmentation algorithms.
Algorithms and physical parameters involved in the calculation of model stellar atmospheres
NASA Astrophysics Data System (ADS)
Merlo, D. C.
This contribution summarizes the Doctoral Thesis presented at Facultad de Matemática, Astronomía y Física, Universidad Nacional de Córdoba for the degree of PhD in Astronomy. We analyze some algorithms and physical parameters involved in the calculation of model stellar atmospheres, such as atomic partition functions, functional relations connecting gaseous and electronic pressure, molecular formation, temperature distribution, chemical compositions, Gaunt factors, atomic cross-sections and scattering sources, as well as computational codes for calculating models. Special attention is paid to the integration of hydrostatic equation. We compare our results with those obtained by other authors, finding reasonable agreement. We make efforts on the implementation of methods that modify the originally adopted temperature distribution in the atmosphere, in order to obtain constant energy flux throughout. We find limitations and we correct numerical instabilities. We integrate the transfer equation solving directly the integral equation involving the source function. As a by-product, we calculate updated atomic partition functions of the light elements. Also, we discuss and enumerate carefully selected formulae for the monochromatic absorption and dispersion of some atomic and molecular species. Finally, we obtain a flexible code to calculate model stellar atmospheres.
Verhoye, E; Vandecandelaere, P; De Beenhouwer, H; Coppens, G; Cartuyvels, R; Van den Abeele, A; Frans, J; Laffut, W
2015-10-01
Despite thorough analyses of the analytical performance of Clostridium difficile tests and test algorithms, the financial impact at hospital level has not been well described. Such a model should take institution-specific variables into account, such as incidence, request behaviour and infection control policies. To calculate the total hospital costs of different test algorithms, accounting for days on which infected patients with toxigenic strains were not isolated and therefore posed an infectious risk for new/secondary nosocomial infections. A mathematical algorithm was developed to gather the above parameters using data from seven Flemish hospital laboratories (Bilulu Microbiology Study Group) (number of tests, local prevalence and hospital hygiene measures). Measures of sensitivity and specificity for the evaluated tests were taken from the literature. List prices and costs of assays were provided by the manufacturer or the institutions. The calculated cost included reagent costs, personnel costs and the financial burden following due and undue isolations and antibiotic therapies. Five different test algorithms were compared. A dynamic calculation model was constructed to evaluate the cost:benefit ratio of each algorithm for a set of institution- and time-dependent inputted variables (prevalence, cost fluctuations and test performances), making it possible to choose the most advantageous algorithm for its setting. A two-step test algorithm with concomitant glutamate dehydrogenase and toxin testing, followed by a rapid molecular assay was found to be the most cost-effective algorithm. This enabled resolution of almost all cases on the day of arrival, minimizing the number of unnecessary or missing isolations. Copyright © 2015 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.
Toward Measuring Galactic Dense Molecular Gas Properties and 3D Distribution with Hi-GAL
NASA Astrophysics Data System (ADS)
Zetterlund, Erika; Glenn, Jason; Maloney, Phil
2016-01-01
The Herschel Space Observatory's submillimeter dust continuum survey Hi-GAL provides a powerful new dataset for characterizing the structure of the dense interstellar medium of the Milky Way. Hi-GAL observed a 2° wide strip covering the entire 360° of the Galactic plane in broad bands centered at 70, 160, 250, 350, and 500 μm, with angular resolution ranging from 10 to 40 arcseconds. We are adapting a molecular cloud clump-finding algorithm and a distance probability density function distance-determination method developed for the Bolocam Galactic Plane Survey (BGPS) to the Hi-GAL data. Using these methods we expect to generate a database of 10^5 cloud clumps, derive distance information for roughly half the clumps, and derive precise distances for approximately 20% of them. With five-color photometry and distances, we will measure the cloud clump properties, such as luminosities, physical sizes, and masses, and construct a three-dimensional map of the Milky Way's dense molecular gas distribution. The cloud clump properties and the dense gas distribution will provide critical ground truths for comparison to theoretical models of molecular cloud structure formation and galaxy evolution models that seek to emulate spiral galaxies. For example, such models cannot resolve star formation and use prescriptive recipes, such as converting a fixed fraction of interstellar gas to stars at a specified interstellar medium density threshold. The models should be compared to observed dense molecular gas properties and galactic distributions. As a pilot survey to refine the clump-finding and distance measurement algorithms developed for BGPS, we have identified molecular cloud clumps in six 2° × 2° patches of the Galactic plane, including one in the inner Galaxy along the line of sight through the Molecular Ring and the termination of the Galactic bar and one toward the outer Galaxy.
Distances have been derived for the inner Galaxy clumps and compared to Bolocam Galactic Plane Survey results. We present the pilot survey clump catalog, distances, clump properties, and a comparison to BGPS.
A comparison of machine learning and Bayesian modelling for molecular serotyping.
Newton, Richard; Wernisch, Lorenz
2017-08-11
Streptococcus pneumoniae is a human pathogen that is a major cause of infant mortality. Identifying the pneumococcal serotype is an important step in monitoring the impact of vaccines used to protect against disease. Genomic microarrays provide an effective method for molecular serotyping. Previously we developed an empirical Bayesian model for the classification of serotypes from a molecular serotyping array. With only a few samples available, a model-driven approach was the only option. In the meantime, several thousand samples have been made available to us, providing an opportunity to investigate serotype classification by machine learning methods, which could complement the Bayesian model. We compare the performance of the original Bayesian model with two machine learning algorithms: Gradient Boosting Machines and Random Forests. We present our results as an example of a generic strategy whereby a preliminary probabilistic model is complemented or replaced by a machine learning classifier once enough data are available. Despite the availability of thousands of serotyping arrays, a problem encountered when applying machine learning methods is the lack of training data containing mixtures of serotypes, due to the large number of possible combinations. Most of the available training data comprises samples with only a single serotype. To overcome the lack of training data we implemented an iterative analysis, creating artificial training data of serotype mixtures by combining raw data from single serotype arrays. With the enhanced training set the machine learning algorithms outperform the original Bayesian model. However, for serotypes currently lacking sufficient training data the best performing implementation was a combination of the results of the Bayesian Model and the Gradient Boosting Machine.
As well as being an effective method for classifying biological data, machine learning can also be used as an efficient method for revealing subtle biological insights, which we illustrate with an example.
Application of JAERI quantum molecular dynamics model for collisions of heavy nuclei
NASA Astrophysics Data System (ADS)
Ogawa, Tatsuhiko; Hashimoto, Shintaro; Sato, Tatsuhiko; Niita, Koji
2016-06-01
The quantum molecular dynamics (QMD) model incorporated into the general-purpose radiation transport code PHITS was revised for accurate prediction of fragment yields in peripheral collisions. For more accurate simulation of peripheral collisions, stability of the nuclei at their ground state was improved and the algorithm to reject invalid events was modified. In-medium correction on nucleon-nucleon cross sections was also considered. To clarify the effect of this improvement on fragmentation of heavy nuclei, the new QMD model coupled with a statistical decay model was used to calculate fragment production cross sections of Ag and Au targets and compared with the data of earlier measurement. It is shown that the revised version can predict cross section more accurately.
NASA Astrophysics Data System (ADS)
Waldmann, Ingo
2016-10-01
Radiative transfer retrievals have become the standard in modelling of exoplanetary transmission and emission spectra. Analysing currently available observations of exoplanetary atmospheres often invokes large and correlated parameter spaces that can be difficult to map or constrain. To address these issues, we have developed the Tau-REx (tau-retrieval of exoplanets) retrieval and the RobERt spectral recognition algorithms. Tau-REx is a Bayesian atmospheric retrieval framework using Nested Sampling and cluster computing to fully map these large correlated parameter spaces. Nonetheless, data volumes can become prohibitively large and we must often select a subset of potential molecular/atomic absorbers in an atmosphere. In the era of open-source, automated and self-sufficient retrieval algorithms, such manual input should be avoided. User-dependent input could, in worst case scenarios, lead to incomplete models and biases in the retrieval. The RobERt algorithm is built to address these issues. RobERt is a deep belief network (DBN) trained to accurately recognise molecular signatures for a wide range of planets, atmospheric thermal profiles and compositions. Using these deep neural networks, we work towards retrieval algorithms that themselves understand the nature of the observed spectra, are able to learn from current and past data and make sensible qualitative preselections of atmospheric opacities to be used for the quantitative stage of the retrieval process. In this talk I will discuss how neural networks and Bayesian Nested Sampling can be used to solve highly degenerate spectral retrieval problems and what 'dreaming' neural networks can tell us about atmospheric characteristics.
Tabletop Molecular Communication: Text Messages through Chemical Signals
Farsad, Nariman; Guo, Weisi; Eckford, Andrew W.
2013-01-01
In this work, we describe the first modular and programmable platform capable of transmitting a text message using chemical signalling, a method also known as molecular communication. This form of communication is attractive for applications where conventional wireless systems perform poorly, from nanotechnology to urban health monitoring. Using examples, we demonstrate the use of our platform as a testbed for molecular communication, and illustrate the features of these communication systems using experiments. By providing a simple and inexpensive means of performing experiments, our system fills an important gap in the molecular communication literature, where much current work is done in simulation with simplified system models. A key finding in this paper is that these systems are often nonlinear in practice, whereas current simulations and analysis often assume that the system is linear. However, as we show in this work, despite the nonlinearity, reliable communication is still possible. Furthermore, this work motivates future studies on more realistic modelling, analysis, and design of theoretical models and algorithms for these systems. PMID:24367571
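To make the idea of sending text through a chemical channel concrete, the following is a minimal sketch, not the platform's actual protocol. It assumes 8-bit ASCII framing, on-off keying (a release of chemical for a 1 bit, nothing for a 0 bit), a hypothetical square-root sensor response standing in for the nonlinearity the abstract mentions, and a fixed detection threshold. All function names and constants are illustrative.

```python
import math

def text_to_bits(message):
    """Encode text as a flat list of bits, 8 bits per ASCII character (assumed framing)."""
    bits = []
    for byte in message.encode("ascii"):
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits

def bits_to_text(bits):
    """Inverse of text_to_bits: pack each group of 8 bits back into a character."""
    chars = []
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        chars.append(chr(byte))
    return "".join(chars)

def transmit(bits, dose=1.0):
    """On-off keying: release `dose` of chemical for a 1 bit, nothing for a 0 bit."""
    return [dose if bit else 0.0 for bit in bits]

def channel(concentrations, gain=0.8, offset=0.1):
    """Hypothetical nonlinear sensor response (square root of concentration)."""
    return [offset + gain * math.sqrt(c) for c in concentrations]

def detect(readings, threshold=0.5):
    """Threshold detector: a reading above the threshold is decoded as a 1 bit."""
    return [1 if r > threshold else 0 for r in readings]

def receive(readings):
    return bits_to_text(detect(readings))

print(receive(channel(transmit(text_to_bits("HELLO")))))
```

Because the detector only compares each reading with a threshold, the message survives the nonlinear response in this toy setting, which is in the spirit of the abstract's observation that nonlinearity does not rule out reliable communication.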
Coastal Zone Color Scanner atmospheric correction algorithm - Multiple scattering effects
NASA Technical Reports Server (NTRS)
Gordon, Howard R.; Castano, Diego J.
1987-01-01
Errors due to multiple scattering which are expected to be encountered in application of the current Coastal Zone Color Scanner (CZCS) atmospheric correction algorithm are analyzed. The analysis is based on radiative transfer computations in model atmospheres, in which the aerosols and molecules are distributed vertically in an exponential manner, with most of the aerosol scattering located below the molecular scattering. A unique feature of the analysis is that it is carried out in scan coordinates rather than typical earth-sun coordinates, making it possible to determine the errors along typical CZCS scan lines. Information provided by the analysis makes it possible to judge the efficacy of the current algorithm with the current sensor and to estimate the impact of the algorithm-induced errors on a variety of applications.
Takeshima, T; Takahashi, T; Yamashita, J; Okada, Y; Watanabe, S
2018-05-25
Multi-emitter fitting algorithms have been developed to improve the temporal resolution of single-molecule switching nanoscopy, but the molecular density range they can analyse is narrow and the computation required is intensive, significantly limiting their practical application. Here, we propose a computationally fast method, wedged template matching (WTM), an algorithm that uses a template matching technique to localise molecules at any overlapping molecular density from sparse to ultrahigh density with subdiffraction resolution. WTM achieves the localization of overlapping molecules at densities up to 600 molecules μm⁻² with a high detection sensitivity and fast computational speed. WTM also shows localization precision comparable with that of DAOSTORM (an algorithm for high-density super-resolution microscopy) at densities up to 20 molecules μm⁻², and better than DAOSTORM at higher molecular densities. The application of WTM to a high-density biological sample image demonstrated that it resolved protein dynamics from live cell images with subdiffraction resolution and a temporal resolution of several hundred milliseconds or less through a significant reduction in the number of camera images required for a high-density reconstruction. The WTM algorithm is a computationally fast, multi-emitter fitting algorithm that can analyse a wide range of molecular densities. The algorithm is available at https://doi.org/10.17632/bf3z6xpn5j.1. © 2018 The Authors. Journal of Microscopy published by John Wiley & Sons Ltd on behalf of Royal Microscopical Society.
SHARPEN-systematic hierarchical algorithms for rotamers and proteins on an extended network.
Loksha, Ilya V; Maiolo, James R; Hong, Cheng W; Ng, Albert; Snow, Christopher D
2009-04-30
Algorithms for discrete optimization of proteins play a central role in recent advances in protein structure prediction and design. We wish to improve the resources available for computational biologists to rapidly prototype such algorithms and to easily scale these algorithms to many processors. To that end, we describe the implementation and use of two new open source resources, citing potential benefits over existing software. We discuss CHOMP, a new object-oriented library for macromolecular optimization, and SHARPEN, a framework for scaling CHOMP scripts to many computers. These tools allow users to develop new algorithms for a variety of applications including protein repacking, protein-protein docking, loop rebuilding, or homology model remediation. Particular care was taken to allow modular energy function design; protein conformations may currently be scored using either the OPLSaa molecular mechanical energy function or an all-atom semiempirical energy function employed by Rosetta. (c) 2009 Wiley Periodicals, Inc.
Simakov, Nikolay A.
2010-01-01
A soft repulsion (SR) model of short range interactions between mobile ions and protein atoms is introduced in the framework of continuum representation of the protein and solvent. The Poisson-Nernst-Planck (PNP) theory of ion transport through biological channels is modified to incorporate this soft wall protein model. Two sets of SR parameters are introduced: the first is parameterized for all essential amino acid residues using all-atom molecular dynamics simulations; the second is a truncated Lennard-Jones potential. We have further designed an energy based algorithm for the determination of the ion accessible volume, which is appropriate for a particular system discretization. The effects of these models of short-range interaction were tested by computing current-voltage characteristics of the α-hemolysin channel. The introduced SR potentials significantly improve prediction of channel selectivity. In addition, we studied the effect of choice of some space-dependent diffusion coefficient distributions on the predicted current-voltage properties. We conclude that the diffusion coefficient distributions largely affect total currents and have little effect on rectifications, selectivity or reversal potential. The PNP-SR algorithm is implemented in a new efficient parallel Poisson, Poisson-Boltzmann and PNP equation solver, also incorporated in a graphical molecular modeling package HARLEM. PMID:21028776
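The abstract does not say how the Lennard-Jones potential is truncated. As an illustration only, the sketch below uses one common convention (cut at the potential minimum and shift upward by epsilon, in the style of Weeks-Chandler-Andersen), which yields a purely repulsive, continuous soft wall. The function names and the choice of truncation are assumptions, not the parameters used in the PNP-SR work.

```python
def lennard_jones(r, epsilon, sigma):
    """Standard 12-6 Lennard-Jones energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def soft_repulsion(r, epsilon, sigma):
    """Lennard-Jones cut at its minimum and shifted so it is zero there.

    Assumed truncation: for r below r_min = 2**(1/6) * sigma the energy is
    LJ(r) + epsilon (always >= 0); at and beyond r_min it is exactly zero.
    """
    r_min = 2.0 ** (1.0 / 6.0) * sigma
    if r >= r_min:
        return 0.0
    return lennard_jones(r, epsilon, sigma) + epsilon

for r in (0.9, 1.0, 1.1, 1.2):
    print(r, soft_repulsion(r, epsilon=2.0, sigma=1.0))
```

With this convention the energy equals epsilon at r = sigma and rises steeply at shorter range, so an ion is pushed away from a protein atom without any attractive well being added to the continuum model.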
Bio-inspired algorithms applied to molecular docking simulations.
Heberlé, G; de Azevedo, W F
2011-01-01
Nature as a source of inspiration has been shown to have a great beneficial impact on the development of new computational methodologies. In this scenario, analyses of the interactions between a protein target and a ligand can be simulated by biologically inspired algorithms (BIAs). These algorithms mimic biological systems to create new paradigms for computation, such as neural networks, evolutionary computing, and swarm intelligence. This review provides a description of the main concepts behind BIAs applied to molecular docking simulations. Special attention is devoted to evolutionary algorithms, guided-directed evolutionary algorithms, and Lamarckian genetic algorithms. Recent applications of these methodologies to protein targets identified in the Mycobacterium tuberculosis genome are described.
Efficient implementation of constant pH molecular dynamics on modern graphics processors.
Arthur, Evan J; Brooks, Charles L
2016-09-15
The treatment of pH sensitive ionization states for titratable residues in proteins is often omitted from molecular dynamics (MD) simulations. While static charge models can answer many questions regarding protein conformational equilibrium and protein-ligand interactions, pH-sensitive phenomena such as acid-activated chaperones and amyloidogenic protein aggregation are inaccessible to such models. Constant pH molecular dynamics (CPHMD) coupled with the Generalized Born with a Simple sWitching function (GBSW) implicit solvent model provides an accurate framework for simulating pH sensitive processes in biological systems. Although this combination has demonstrated success in predicting pKa values of protein structures, and in exploring dynamics of ionizable side-chains, its speed has been an impediment to routine application. The recent availability of low-cost graphics processing unit (GPU) chipsets with thousands of processing cores, together with the implementation of the accurate GBSW implicit solvent model on those chipsets (Arthur and Brooks, J. Comput. Chem. 2016, 37, 927), provides an opportunity to improve the speed of CPHMD and ionization modeling greatly. Here, we present a first implementation of GPU-enabled CPHMD within the CHARMM-OpenMM simulation package interface. Depending on the system size and nonbonded force cutoff parameters, we find speed increases of between one and three orders of magnitude. Additionally, the algorithm scales better with system size than the CPU-based algorithm, thus allowing for larger systems to be modeled in a cost effective manner. We anticipate that the improved performance of this methodology will open the door for widespread application of CPHMD in modeling pH-mediated biological processes. © 2016 Wiley Periodicals, Inc.
Algorithmic commonalities in the parallel environment
NASA Technical Reports Server (NTRS)
Mcanulty, Michael A.; Wainer, Michael S.
1987-01-01
The ultimate aim of this project was to analyze procedures from substantially different application areas to discover what is either common or peculiar in the process of conversion to the Massively Parallel Processor (MPP). Three areas were identified: molecular dynamic simulation, production systems (rule systems), and various graphics and vision algorithms. To date, only selected graphics procedures have been investigated. They are the most readily available, and produce the most visible results. These include simple polygon patch rendering, raycasting against a constructive solid geometric model, and stochastic or fractal based textured surface algorithms. Only the simplest of conversion strategies, mapping a major loop to the array, has been investigated so far. It is not entirely satisfactory.
An adaptive interpolation scheme for molecular potential energy surfaces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kowalewski, Markus, E-mail: mkowalew@uci.edu; Larsson, Elisabeth; Heryudono, Alfa
The calculation of potential energy surfaces for quantum dynamics can be a time consuming task, especially when a high level of theory for the electronic structure calculation is required. We propose an adaptive interpolation algorithm based on polyharmonic splines combined with a partition of unity approach. The adaptive node refinement greatly reduces the number of sample points by employing a local error estimate. The algorithm and its scaling behavior are evaluated for a model function in 2, 3, and 4 dimensions. The developed algorithm allows for a more rapid and reliable interpolation of a potential energy surface within a given accuracy compared to the non-adaptive version.
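As a rough illustration of the interpolation building block named above, here is a minimal polyharmonic spline sketch in pure Python. It assumes the cubic kernel phi(r) = r^3 with an appended linear polynomial, and it deliberately leaves out the partition of unity and the adaptive node refinement, so it is not the authors' algorithm. The helper names (`solve`, `fit_spline`, `evaluate`) are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_spline(nodes, values):
    """Fit s(x) = sum_i w_i |x - x_i|^3 + c_0 + c . x through all nodes."""
    n, d = len(nodes), len(nodes[0])
    size = n + d + 1
    a = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            a[i][j] = math.dist(nodes[i], nodes[j]) ** 3
        # Polynomial block and its transpose (side conditions on the weights).
        for k, p in enumerate([1.0] + list(nodes[i])):
            a[i][n + k] = p
            a[n + k][i] = p
    coeffs = solve(a, list(values) + [0.0] * (d + 1))
    return coeffs[:n], coeffs[n:]

def evaluate(x, nodes, weights, poly):
    radial = sum(w * math.dist(x, p) ** 3 for w, p in zip(weights, nodes))
    return radial + poly[0] + sum(c * xi for c, xi in zip(poly[1:], x))

# Toy "surface" sampled at five 2D nodes (nodes must not all lie on one line).
nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
values = [x * x + y for x, y in nodes]
weights, poly = fit_spline(nodes, values)
print(evaluate((0.25, 0.75), nodes, weights, poly))
```

An adaptive scheme in the spirit of the abstract would compare such an interpolant against a local error estimate and add nodes only where the estimate is too large. That loop is omitted here.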
An Introduction to Computational Physics
NASA Astrophysics Data System (ADS)
Pang, Tao
2010-07-01
Preface to first edition; Preface; Acknowledgements; 1. Introduction; 2. Approximation of a function; 3. Numerical calculus; 4. Ordinary differential equations; 5. Numerical methods for matrices; 6. Spectral analysis; 7. Partial differential equations; 8. Molecular dynamics simulations; 9. Modeling continuous systems; 10. Monte Carlo simulations; 11. Genetic algorithm and programming; 12. Numerical renormalization; References; Index.
Systems Biology-Driven Hypotheses Tested In Vivo: The Need to Advancing Molecular Imaging Tools.
Verma, Garima; Palombo, Alessandro; Grigioni, Mauro; La Monaca, Morena; D'Avenio, Giuseppe
2018-01-01
Processing and interpretation of biological images may provide invaluable insights on complex, living systems because images capture the overall dynamics as a "whole." Therefore, "extraction" of key, quantitative morphological parameters could be, at least in principle, helpful in building a reliable systems biology approach in understanding living objects. Molecular imaging tools for system biology models have attained widespread usage in modern experimental laboratories. Here, we provide an overview on advances in the computational technology and different instrumentations focused on molecular image processing and analysis. Quantitative data analysis through various open source software and algorithmic protocols will provide a novel approach for modeling the experimental research program. Besides this, we also highlight the predictable future trends regarding methods for automatically analyzing biological data. Such tools will be very useful to understand the detailed biological and mathematical expressions under in-silico system biology processes with modeling properties.
Silletta, Emilia V; Franzoni, María B; Monti, Gustavo A; Acosta, Rodolfo H
2018-01-01
Two-dimensional (2D) Nuclear Magnetic Resonance relaxometry experiments are a powerful tool extensively used to probe the interaction among different pore structures, mostly in inorganic systems. The analysis of the collected experimental data generally consists of a 2D numerical inversion of time-domain data where T2-T2 maps are generated. Through the years, different algorithms for the numerical inversion have been proposed. In this paper, two different algorithms for numerical inversion are tested and compared under different conditions of exchange dynamics: the method based on the Butler-Reeds-Dawson (BRD) algorithm and the fast iterative shrinkage-thresholding algorithm (FISTA) method. By constructing a theoretical model, the algorithms were tested for two- and three-site porous media, varying the exchange rate parameters, the pore sizes and the signal to noise ratio. In order to test the methods under realistic experimental conditions, a challenging organic system was chosen. The molecular exchange rates of water confined in hierarchical porous polymeric networks were obtained for two- and three-site porous media. Data processed with the BRD method was found to be accurate only under certain conditions of the exchange parameters, while data processed with the FISTA method is precise for all the studied parameters, except when SNR conditions are extreme. Copyright © 2017 Elsevier Inc. All rights reserved.
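For readers unfamiliar with FISTA, the sketch below shows the generic iteration for an L1-regularised least-squares problem, min 0.5*||Ax - b||^2 + lam*||x||_1, in pure Python. It is a textbook form, not the inversion code compared in the paper. The step size uses the squared Frobenius norm of A as a safe (assumed, conservative) bound on the Lipschitz constant, and the small matrix in the usage lines is illustrative only.

```python
import math

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def rmatvec(a, r):
    """Compute A^T r."""
    return [sum(a[i][j] * r[i] for i in range(len(a))) for j in range(len(a[0]))]

def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def fista(a, b, lam, iterations=300):
    n = len(a[0])
    # Squared Frobenius norm bounds the largest eigenvalue of A^T A from above.
    lipschitz = sum(v * v for row in a for v in row)
    x = [0.0] * n
    y = x[:]
    t = 1.0
    for _ in range(iterations):
        residual = [p - q for p, q in zip(matvec(a, y), b)]
        grad = rmatvec(a, residual)
        # Gradient step on the smooth part, then shrinkage for the L1 part.
        x_new = [soft_threshold(yj - gj / lipschitz, lam / lipschitz)
                 for yj, gj in zip(y, grad)]
        # Momentum update that distinguishes FISTA from plain ISTA.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [xn + (t - 1.0) / t_new * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(fista(identity, [3.0, 0.05, -2.0], lam=0.1))
```

In a relaxometry inversion the matrix A would hold exponential decay kernels evaluated on a grid of relaxation times and x would be the sought distribution. With the identity matrix used above, the minimiser is simply the soft-thresholded data, which makes the result easy to check by hand.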
Structure and atomic correlations in molecular systems probed by XAS reverse Monte Carlo refinement
NASA Astrophysics Data System (ADS)
Di Cicco, Andrea; Iesari, Fabio; Trapananti, Angela; D'Angelo, Paola; Filipponi, Adriano
2018-03-01
The Reverse Monte Carlo (RMC) algorithm for structure refinement has been applied to x-ray absorption spectroscopy (XAS) multiple-edge data sets for six gas phase molecular systems (SnI2, CdI2, BBr3, GaI3, GeBr4, GeI4). Sets of thousands of molecular replicas were involved in the refinement process, driven by the XAS data and constrained by available electron diffraction results. The equilibrated configurations were analysed to determine the average tridimensional structure and obtain reliable bond and bond-angle distributions. Detectable deviations from Gaussian models were found in some cases. This work shows that a RMC refinement of XAS data is able to provide geometrical models for molecular structures compatible with present experimental evidence. The validation of this approach on simple molecular systems is particularly important in view of its possible simple extension to more complex and extended systems including metal-organic complexes, biomolecules, or nanocrystalline systems.
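The acceptance rule at the heart of Reverse Monte Carlo can be sketched with a deliberately small toy. Here the "experimental data" are just a target mean and variance of a bond-length distribution, whereas the work above fits full XAS signals with electron-diffraction constraints. The replicas, targets, error bars and step size are all illustrative assumptions.

```python
import math
import random

def chi_squared(total, total_sq, n, target_mean, target_var, err_mean, err_var):
    """Misfit between the replica ensemble and the two toy 'data points'."""
    mean = total / n
    var = total_sq / n - mean * mean
    return ((mean - target_mean) / err_mean) ** 2 + ((var - target_var) / err_var) ** 2

def rmc_refine(lengths, target_mean, target_var, steps, seed=0,
               max_shift=0.05, err_mean=0.01, err_var=0.0001):
    """Refine bond lengths in place; return (initial chi2, final chi2)."""
    rng = random.Random(seed)
    n = len(lengths)
    total = sum(lengths)
    total_sq = sum(x * x for x in lengths)
    args = (n, target_mean, target_var, err_mean, err_var)
    chi2 = initial = chi_squared(total, total_sq, *args)
    for _ in range(steps):
        i = rng.randrange(n)                      # pick one molecular replica
        old = lengths[i]
        new = old + rng.uniform(-max_shift, max_shift)
        new_total = total - old + new
        new_total_sq = total_sq - old * old + new * new
        new_chi2 = chi_squared(new_total, new_total_sq, *args)
        delta = new_chi2 - chi2
        # RMC rule: always accept improvements, otherwise accept with
        # probability exp(-delta / 2).
        if delta <= 0 or rng.random() < math.exp(-delta / 2.0):
            lengths[i] = new
            total, total_sq, chi2 = new_total, new_total_sq, new_chi2
    return initial, chi2

replicas = [2.0] * 100                            # all replicas start identical
first, last = rmc_refine(replicas, target_mean=2.5, target_var=0.0025, steps=100000)
print(first, last, sum(replicas) / len(replicas))
```

After the run the replica ensemble is itself the result: a histogram of `replicas` plays the role of the bond-length distribution that the abstract describes extracting from the equilibrated configurations.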
3D molecular models of whole HIV-1 virions generated with cellPACK
Goodsell, David S.; Autin, Ludovic; Forli, Stefano; Sanner, Michel F.; Olson, Arthur J.
2014-01-01
As knowledge of individual biological processes grows, it becomes increasingly useful to frame new findings within their larger biological contexts in order to generate new systems-scale hypotheses. This report highlights two major iterations of a whole virus model of HIV-1, generated with the cellPACK software. cellPACK integrates structural and systems biology data with packing algorithms to assemble comprehensive 3D models of cell-scale structures in molecular detail. This report describes the biological data, modeling parameters and cellPACK methods used to specify and construct editable models for HIV-1. Anticipating that cellPACK interfaces under development will enable researchers from diverse backgrounds to critique and improve the biological models, we discuss how cellPACK can be used as a framework to unify different types of data across all scales of biology. PMID:25253262
GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation.
Hess, Berk; Kutzner, Carsten; van der Spoel, David; Lindahl, Erik
2008-03-01
Molecular simulation is an extremely useful, but computationally very expensive tool for studies of chemical and biomolecular systems. Here, we present a new implementation of our molecular simulation toolkit GROMACS which now both achieves extremely high performance on single processors from algorithmic optimizations and hand-coded routines and simultaneously scales very well on parallel machines. The code encompasses a minimal-communication domain decomposition algorithm, full dynamic load balancing, a state-of-the-art parallel constraint solver, and efficient virtual site algorithms that allow removal of hydrogen atom degrees of freedom to enable integration time steps up to 5 fs for atomistic simulations also in parallel. To improve the scaling properties of the common particle mesh Ewald electrostatics algorithms, we have in addition used a Multiple-Program, Multiple-Data approach, with separate node domains responsible for direct and reciprocal space interactions. Not only does this combination of algorithms enable extremely long simulations of large systems but also it provides that simulation performance on quite modest numbers of standard cluster nodes.
Simulation of dense amorphous polymers by generating representative atomistic models
NASA Astrophysics Data System (ADS)
Curcó, David; Alemán, Carlos
2003-08-01
A method for generating atomistic models of dense amorphous polymers is presented. The generated models can be used as starting structures of Monte Carlo and molecular dynamics simulations, but are also suitable for the direct evaluation of physical properties. The method is organized in a two-step procedure. First, structures are generated using an algorithm that minimizes the torsional strain. After this, an iterative algorithm is applied to relax the nonbonding interactions. In order to check the performance of the method we examined structure-dependent properties for three polymeric systems: polyethylene (ρ=0.85 g/cm3), poly(L,D-lactic) acid (ρ=1.25 g/cm3), and polyglycolic acid (ρ=1.50 g/cm3). The method successfully generated representative packings for such dense systems using minimum computational resources.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trędak, Przemysław, E-mail: przemyslaw.tredak@fuw.edu.pl; Rudnicki, Witold R.; Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Pawińskiego 5a, 02-106 Warsaw
The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronization issues that arise in computations of multi-body potentials. Techniques developed for this problem may also be used to achieve efficient solutions of different problems. The performance of the proposed algorithm is assessed using a range of model systems. It is compared to the highly optimized CPU implementation (both single core and OpenMP) available in the LAMMPS package. These experiments show up to 6x improvement in force computation time using a single processor of the NVIDIA Tesla K80 compared to a high end 16-core Intel Xeon processor.
Prosperi, Mattia C F; De Luca, Andrea; Di Giambenedetto, Simona; Bracciale, Laura; Fabbiani, Massimiliano; Cauda, Roberto; Salemi, Marco
2010-10-25
Phylogenetic methods produce hierarchies of molecular species, inferring knowledge about taxonomy and evolution. However, there is not yet a consensus methodology that provides a crisp partition of taxa, desirable when considering the problem of intra/inter-patient quasispecies classification or infection transmission event identification. We introduce the threshold bootstrap clustering (TBC), a new methodology for partitioning molecular sequences that does not require a phylogenetic tree estimation. The TBC is an incremental partition algorithm, inspired by the stochastic Chinese restaurant process, and takes advantage of resampling techniques and models of sequence evolution. TBC uses as input a multiple alignment of molecular sequences and its output is a crisp partition of the taxa into an automatically determined number of clusters. By varying initial conditions, the algorithm can produce different partitions. We describe a procedure that selects a prime partition among a set of candidate ones and calculates a measure of cluster reliability. TBC was successfully tested for the identification of type-1 human immunodeficiency and hepatitis C virus subtypes, and compared with previously established methodologies. It was also evaluated in the problem of HIV-1 intra-patient quasispecies clustering, and for transmission cluster identification, using a set of sequences from patients with known transmission event histories. TBC has been shown to be effective for the subtyping of HIV and HCV, and for identifying intra-patient quasispecies. To some extent, the algorithm was also able to infer clusters corresponding to events of infection transmission. The computational complexity of TBC is quadratic in the number of taxa, lower than other established methods; in addition, TBC has been enhanced with a measure of cluster reliability. The TBC can be useful to characterise molecular quasispecies in a broad context.
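The incremental, tree-free flavour of such a partition can be sketched as follows. This is a simplified stand-in, not TBC itself: it uses the plain proportion of differing alignment columns (p-distance) and a fixed threshold, and it omits the bootstrap resampling, the evolutionary models and the reliability measure described above. Function names and the toy alignment are hypothetical.

```python
def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def threshold_partition(sequences, threshold):
    """Assign each sequence to the first cluster whose members are all within
    `threshold`; otherwise open a new cluster (like seating a customer at a
    new table). The result can depend on input order."""
    clusters = []
    for name, seq in sequences:
        for cluster in clusters:
            if all(p_distance(seq, other) <= threshold for _, other in cluster):
                cluster.append((name, seq))
                break
        else:
            clusters.append([(name, seq)])
    return [[name for name, _ in cluster] for cluster in clusters]

alignment = [
    ("a", "AAAAAAAAAA"),
    ("b", "AAAAAAAAAT"),
    ("c", "TTTTTTTTTT"),
    ("d", "TTTTTTTTTA"),
]
print(threshold_partition(alignment, threshold=0.2))
```

Each new sequence is compared with the members of existing clusters, so the worst case costs a number of distance evaluations that grows quadratically with the number of taxa, in line with the complexity quoted in the abstract. The order dependence mirrors the remark that different initial conditions can yield different partitions.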
Nucleotide Interdependency in Transcription Factor Binding Sites in the Drosophila Genome.
Dresch, Jacqueline M; Zellers, Rowan G; Bork, Daniel K; Drewell, Robert A
2016-01-01
A long-standing objective in modern biology is to characterize the molecular components that drive the development of an organism. At the heart of eukaryotic development lies gene regulation. On the molecular level, much of the research in this field has focused on the binding of transcription factors (TFs) to regulatory regions in the genome known as cis-regulatory modules (CRMs). However, relatively little is known about the sequence-specific binding preferences of many TFs, especially with respect to the possible interdependencies between the nucleotides that make up binding sites. A particular limitation of many existing algorithms that aim to predict binding site sequences is that they do not allow for dependencies between nonadjacent nucleotides. In this study, we use a recently developed computational algorithm, MARZ, to compare binding site sequences using 32 distinct models in a systematic and unbiased approach to explore nucleotide dependencies within binding sites for 15 distinct TFs known to be critical to Drosophila development. Our results indicate that many of these proteins have varying levels of nucleotide interdependencies within their DNA recognition sequences, and that, in some cases, models that account for these dependencies greatly outperform traditional models that are used to predict binding sites. We also directly compare the ability of different models to identify the known KRUPPEL TF binding sites in CRMs and demonstrate that a more complex model that accounts for nucleotide interdependencies performs better when compared with simple models. This ability to identify TFs with critical nucleotide interdependencies in their binding sites will lead to a deeper understanding of how these molecular characteristics contribute to the architecture of CRMs and the precise regulation of transcription during organismal development.
Nucleotide Interdependency in Transcription Factor Binding Sites in the Drosophila Genome
Dresch, Jacqueline M.; Zellers, Rowan G.; Bork, Daniel K.; Drewell, Robert A.
2016-01-01
A long-standing objective in modern biology is to characterize the molecular components that drive the development of an organism. At the heart of eukaryotic development lies gene regulation. On the molecular level, much of the research in this field has focused on the binding of transcription factors (TFs) to regulatory regions in the genome known as cis-regulatory modules (CRMs). However, relatively little is known about the sequence-specific binding preferences of many TFs, especially with respect to the possible interdependencies between the nucleotides that make up binding sites. A particular limitation of many existing algorithms that aim to predict binding site sequences is that they do not allow for dependencies between nonadjacent nucleotides. In this study, we use a recently developed computational algorithm, MARZ, to compare binding site sequences using 32 distinct models in a systematic and unbiased approach to explore nucleotide dependencies within binding sites for 15 distinct TFs known to be critical to Drosophila development. Our results indicate that many of these proteins have varying levels of nucleotide interdependencies within their DNA recognition sequences, and that, in some cases, models that account for these dependencies greatly outperform traditional models that are used to predict binding sites. We also directly compare the ability of different models to identify the known KRUPPEL TF binding sites in CRMs and demonstrate that a more complex model that accounts for nucleotide interdependencies performs better when compared with simple models. This ability to identify TFs with critical nucleotide interdependencies in their binding sites will lead to a deeper understanding of how these molecular characteristics contribute to the architecture of CRMs and the precise regulation of transcription during organismal development. PMID:27330274
LiCABEDS II. Modeling of ligand selectivity for G-protein-coupled cannabinoid receptors.
Ma, Chao; Wang, Lirong; Yang, Peng; Myint, Kyaw Z; Xie, Xiang-Qun
2013-01-28
The cannabinoid receptor subtype 2 (CB2) is a promising therapeutic target for blood cancer, pain relief, osteoporosis, and immune system disease. The recent withdrawal of Rimonabant, which targets another closely related cannabinoid receptor (CB1), accentuates the importance of selectivity for the development of CB2 ligands in order to minimize their effects on the CB1 receptor. In our previous study, LiCABEDS (Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps) was reported as a generic ligand classification algorithm for the prediction of categorical molecular properties. Here, we report extension of the application of LiCABEDS to the modeling of cannabinoid ligand selectivity with molecular fingerprints as descriptors. The performance of LiCABEDS was systematically compared with another popular classification algorithm, support vector machine (SVM), according to prediction precision and recall rate. In addition, the examination of LiCABEDS models revealed the difference in structure diversity of CB1 and CB2 selective ligands. The structure determination from data mining could be useful for the design of novel cannabinoid lead compounds. More importantly, the potential of LiCABEDS was demonstrated through successful identification of newly synthesized CB2 selective compounds.
Advances in molecular quantum chemistry contained in the Q-Chem 4 program package
NASA Astrophysics Data System (ADS)
Shao, Yihan; Gan, Zhengting; Epifanovsky, Evgeny; Gilbert, Andrew T. B.; Wormit, Michael; Kussmann, Joerg; Lange, Adrian W.; Behn, Andrew; Deng, Jia; Feng, Xintian; Ghosh, Debashree; Goldey, Matthew; Horn, Paul R.; Jacobson, Leif D.; Kaliman, Ilya; Khaliullin, Rustam Z.; Kuś, Tomasz; Landau, Arie; Liu, Jie; Proynov, Emil I.; Rhee, Young Min; Richard, Ryan M.; Rohrdanz, Mary A.; Steele, Ryan P.; Sundstrom, Eric J.; Woodcock, H. Lee, III; Zimmerman, Paul M.; Zuev, Dmitry; Albrecht, Ben; Alguire, Ethan; Austin, Brian; Beran, Gregory J. O.; Bernard, Yves A.; Berquist, Eric; Brandhorst, Kai; Bravaya, Ksenia B.; Brown, Shawn T.; Casanova, David; Chang, Chun-Min; Chen, Yunqing; Chien, Siu Hung; Closser, Kristina D.; Crittenden, Deborah L.; Diedenhofen, Michael; DiStasio, Robert A., Jr.; Do, Hainam; Dutoi, Anthony D.; Edgar, Richard G.; Fatehi, Shervin; Fusti-Molnar, Laszlo; Ghysels, An; Golubeva-Zadorozhnaya, Anna; Gomes, Joseph; Hanson-Heine, Magnus W. D.; Harbach, Philipp H. P.; Hauser, Andreas W.; Hohenstein, Edward G.; Holden, Zachary C.; Jagau, Thomas-C.; Ji, Hyunjun; Kaduk, Benjamin; Khistyaev, Kirill; Kim, Jaehoon; Kim, Jihan; King, Rollin A.; Klunzinger, Phil; Kosenkov, Dmytro; Kowalczyk, Tim; Krauter, Caroline M.; Lao, Ka Un; Laurent, Adèle D.; Lawler, Keith V.; Levchenko, Sergey V.; Lin, Ching Yeh; Liu, Fenglai; Livshits, Ester; Lochan, Rohini C.; Luenser, Arne; Manohar, Prashant; Manzer, Samuel F.; Mao, Shan-Ping; Mardirossian, Narbe; Marenich, Aleksandr V.; Maurer, Simon A.; Mayhall, Nicholas J.; Neuscamman, Eric; Oana, C. Melania; Olivares-Amaya, Roberto; O'Neill, Darragh P.; Parkhill, John A.; Perrine, Trilisa M.; Peverati, Roberto; Prociuk, Alexander; Rehn, Dirk R.; Rosta, Edina; Russ, Nicholas J.; Sharada, Shaama M.; Sharma, Sandeep; Small, David W.; Sodt, Alexander; Stein, Tamar; Stück, David; Su, Yu-Chuan; Thom, Alex J. W.; Tsuchimochi, Takashi; Vanovschi, Vitalii; Vogt, Leslie; Vydrov, Oleg; Wang, Tao; Watson, Mark A.; Wenzel, Jan; White, Alec; Williams, Christopher F.; Yang, Jun; Yeganeh, Sina; Yost, Shane R.; You, Zhi-Qiang; Zhang, Igor Ying; Zhang, Xing; Zhao, Yan; Brooks, Bernard R.; Chan, Garnet K. L.; Chipman, Daniel M.; Cramer, Christopher J.; Goddard, William A., III; Gordon, Mark S.; Hehre, Warren J.; Klamt, Andreas; Schaefer, Henry F., III; Schmidt, Michael W.; Sherrill, C. David; Truhlar, Donald G.; Warshel, Arieh; Xu, Xin; Aspuru-Guzik, Alán; Baer, Roi; Bell, Alexis T.; Besley, Nicholas A.; Chai, Jeng-Da; Dreuw, Andreas; Dunietz, Barry D.; Furlani, Thomas R.; Gwaltney, Steven R.; Hsu, Chao-Ping; Jung, Yousung; Kong, Jing; Lambrecht, Daniel S.; Liang, WanZhen; Ochsenfeld, Christian; Rassolov, Vitaly A.; Slipchenko, Lyudmila V.; Subotnik, Joseph E.; Van Voorhis, Troy; Herbert, John M.; Krylov, Anna I.; Gill, Peter M. W.; Head-Gordon, Martin
2015-01-01
A summary of the technical advances that are incorporated in the fourth major release of the Q-Chem quantum chemistry program is provided, covering approximately the last seven years. These include developments in density functional theory methods and algorithms, nuclear magnetic resonance (NMR) property evaluation, coupled cluster and perturbation theories, methods for electronically excited and open-shell species, tools for treating extended environments, algorithms for walking on potential surfaces, analysis tools, energy and electron transfer modelling, parallel computing capabilities, and graphical user interfaces. In addition, a selection of example case studies that illustrate these capabilities is given. These include extensive benchmarks of the comparative accuracy of modern density functionals for bonded and non-bonded interactions, tests of attenuated second order Møller-Plesset (MP2) methods for intermolecular interactions, a variety of parallel performance benchmarks, and tests of the accuracy of implicit solvation models. Some specific chemical examples include calculations on the strongly correlated Cr2 dimer, exploring zeolite-catalysed ethane dehydrogenation, energy decomposition analysis of a charged ter-molecular complex arising from glycerol photoionisation, and natural transition orbitals for a Frenkel exciton state in a nine-unit model of a self-assembling nanotube.
Distributed-Memory Fast Maximal Independent Set
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kanewala Appuhamilage, Thejaka Amila J.; Zalewski, Marcin J.; Lumsdaine, Andrew
The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All those algorithms are designed focusing on shared-memory machines and are analyzed using the PRAM model. These algorithms do not have direct efficient distributed-memory implementations. In this paper, we extend two of Luby’s seminal MIS algorithms, “Luby(A)” and “Luby(B),” to distributed-memory execution, and we evaluate their performance. We compare our results with the “Filtered MIS” implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.
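To make the MIS problem in the abstract above concrete, the following is a minimal serial sketch of a randomized, round-based MIS computation in the spirit of Luby's algorithms. It uses the widely taught random-priority formulation and is not claimed to match the paper's "Luby(A)" or "Luby(B)" variants or their distributed-memory implementation; the function name `luby_mis` and the adjacency-dictionary input are assumptions made for illustration.

```python
import random


def luby_mis(adjacency, seed=0):
    """Return a maximal independent set of an undirected graph.

    adjacency: dict mapping each vertex to the set of its neighbours
    (hypothetical input format chosen for this sketch).
    """
    rng = random.Random(seed)
    active = set(adjacency)  # vertices whose status is still undecided
    mis = set()
    while active:
        # Every active vertex draws a random priority; the vertex id
        # breaks the (very unlikely) ties between equal random numbers.
        priority = {v: (rng.random(), v) for v in active}
        # A vertex joins the set if it beats all of its active neighbours.
        winners = {
            v for v in active
            if all(priority[v] < priority[u]
                   for u in adjacency[v] if u in active)
        }
        mis |= winners
        # Winners and their neighbours are decided and leave the graph.
        removed = set(winners)
        for v in winners:
            removed |= adjacency[v]
        active -= removed
    return mis


if __name__ == "__main__":
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
    print(sorted(luby_mis(graph)))
```

Within a round, each vertex only compares itself with its neighbours, which is what makes the scheme attractive for parallel execution; in a distributed-memory setting those comparisons would turn into messages between the processes that own neighbouring vertices.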
Li, Mengshan; Zhang, Huaijing; Chen, Bingsheng; Wu, Yan; Guan, Lixin
2018-03-05
The pKa value of drugs is an important parameter in drug design and pharmacology. In this paper, an improved particle swarm optimization (PSO) algorithm was proposed based on the population entropy diversity. In the improved algorithm, when the population entropy was higher than the set maximum threshold, the convergence strategy was adopted; when the population entropy was lower than the set minimum threshold the divergence strategy was adopted; when the population entropy was between the maximum and minimum threshold, the self-adaptive adjustment strategy was maintained. The improved PSO algorithm was applied in the training of radial basis function artificial neural network (RBF ANN) model and the selection of molecular descriptors. A quantitative structure-activity relationship model based on RBF ANN trained by the improved PSO algorithm was proposed to predict the pKa values of 74 kinds of neutral and basic drugs and then validated by another database containing 20 molecules. The validation results showed that the model had a good prediction performance. The absolute average relative error, root mean square error, and squared correlation coefficient were 0.3105, 0.0411, and 0.9685, respectively. The model can be used as a reference for exploring other quantitative structure-activity relationships.
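The three-way switching rule described in the abstract above can be sketched as follows. The abstract does not define the population entropy, so the histogram-based Shannon entropy used here, the bin count, and the names `population_entropy` and `choose_strategy` are all assumptions made for illustration rather than the authors' definitions.

```python
import math
from collections import Counter


def population_entropy(positions, bins=10, lo=0.0, hi=1.0):
    """Shannon entropy of 1-D particle positions binned into equal cells.

    This definition is an assumption; the abstract does not specify one.
    """
    width = (hi - lo) / bins
    counts = Counter(
        min(int((p - lo) / width), bins - 1) for p in positions
    )
    n = len(positions)
    return -sum(c / n * math.log(c / n) for c in counts.values())


def choose_strategy(entropy, h_min, h_max):
    """Pick the update strategy from the entropy thresholds in the abstract."""
    if entropy > h_max:
        return "convergence"    # swarm too spread out: pull particles together
    if entropy < h_min:
        return "divergence"     # swarm has collapsed: push particles apart
    return "self-adaptive"      # in between: keep the adaptive adjustment


if __name__ == "__main__":
    swarm = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
    h = population_entropy(swarm)
    print(choose_strategy(h, h_min=0.5, h_max=2.0))
```

The thresholds `h_min` and `h_max` are free parameters in this sketch; the abstract only states that a maximum and a minimum threshold are set.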
NASA Astrophysics Data System (ADS)
Yang, Qian; Sing-Long, Carlos; Chen, Enze; Reed, Evan
2017-06-01
Complex chemical processes, such as the decomposition of energetic materials and the chemistry of planetary interiors, are typically studied using large-scale molecular dynamics simulations that run for weeks on high performance parallel machines. These computations may involve thousands of atoms forming hundreds of molecular species and undergoing thousands of reactions. It is natural to wonder whether this wealth of data can be utilized to build more efficient, interpretable, and predictive models. In this talk, we will use techniques from statistical learning to develop a framework for constructing Kinetic Monte Carlo (KMC) models from molecular dynamics data. We will show that our KMC models can not only extrapolate the behavior of the chemical system by as much as an order of magnitude in time, but can also be used to study the dynamics of entirely different chemical trajectories with a high degree of fidelity. Then, we will discuss three different methods for reducing our learned KMC models, including a new and efficient data-driven algorithm using L1-regularization. We demonstrate our framework throughout on a system of high-temperature high-pressure liquid methane, thought to be a major component of gas giant planetary interiors.
Karr, Jonathan R; Williams, Alex H; Zucker, Jeremy D; Raue, Andreas; Steiert, Bernhard; Timmer, Jens; Kreutz, Clemens; Wilkinson, Simon; Allgood, Brandon A; Bot, Brian M; Hoff, Bruce R; Kellen, Michael R; Covert, Markus W; Stolovitzky, Gustavo A; Meyer, Pablo
2015-05-01
Whole-cell models that explicitly represent all cellular components at the molecular level have the potential to predict phenotype from genotype. However, even for simple bacteria, whole-cell models will contain thousands of parameters, many of which are poorly characterized or unknown. New algorithms are needed to estimate these parameters and enable researchers to build increasingly comprehensive models. We organized the Dialogue for Reverse Engineering Assessments and Methods (DREAM) 8 Whole-Cell Parameter Estimation Challenge to develop new parameter estimation algorithms for whole-cell models. We asked participants to identify a subset of parameters of a whole-cell model given the model's structure and in silico "experimental" data. Here we describe the challenge, the best performing methods, and new insights into the identifiability of whole-cell models. We also describe several valuable lessons we learned toward improving future challenges. Going forward, we believe that collaborative efforts supported by inexpensive cloud computing have the potential to solve whole-cell model parameter estimation.
NASA Astrophysics Data System (ADS)
Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.
2013-08-01
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, $f$ (e.g., Verlet algorithm), is available to propagate the system from time $t_i$ (trajectory positions and velocities $x_i = (r_i, v_i)$) to time $t_{i+1}$ ($x_{i+1}$) by $x_{i+1} = f_i(x_i)$, the dynamics problem spanning an interval from $t_0 \ldots t_M$ can be transformed into a root finding problem, $F(X) = [x_i - f(x_{i-1})]_{i=1,M} = 0$, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of $F(X)$. The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
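The reformulation of a trajectory as a root finding problem, described in the abstract above, can be illustrated with a small sketch. It assumes a unit-mass harmonic oscillator propagated by velocity Verlet and solves F(X) = 0 with a plain fixed-point sweep, which is a deliberately simpler stand-in for the quasi-Newton schemes the authors report; the function names and the test system are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def verlet_step(state, dt=0.05):
    """One velocity-Verlet step for a unit-mass harmonic oscillator (force = -r)."""
    r, v = state
    v_half = v - 0.5 * dt * r
    r_new = r + dt * v_half
    v_new = v_half - 0.5 * dt * r_new
    return np.array([r_new, v_new])


def residual(X, x0, f):
    """F(X)_i = x_i - f(x_{i-1}) for i = 1..M, with x_0 held fixed."""
    prev = np.vstack([x0, X[:-1]])
    return X - np.array([f(p) for p in prev])


def solve_parallel_in_time(x0, M, f, tol=1e-12):
    """Solve F(X) = 0 by fixed-point sweeps X_i <- f(X_{i-1})."""
    X = np.tile(x0, (M, 1))  # initial guess: every time step equals x0
    for sweep in range(1, M + 1):
        prev = np.vstack([x0, X[:-1]])
        # The M calls to f below do not depend on one another, so each
        # could be handed to its own processor.
        X = np.array([f(p) for p in prev])
        if np.max(np.abs(residual(X, x0, f))) < tol:
            break
    return X, sweep


if __name__ == "__main__":
    x0 = np.array([1.0, 0.0])
    X, sweeps = solve_parallel_in_time(x0, 20, verlet_step)
    print(sweeps, X[-1])
```

Because x_0 is fixed, each sweep makes at least one more time step exact, so the iteration always converges within M sweeps. With this naive update it also needs essentially all M sweeps, so it gains nothing over serial propagation; reducing the sweep count is the role of the quasi-Newton and preconditioned updates mentioned in the abstract.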
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, $f$ (e.g. Verlet algorithm), is available to propagate the system from time $t_i$ (trajectory positions and velocities $x_i = (r_i, v_i)$) to time $t_{i+1}$ ($x_{i+1}$) by $x_{i+1} = f_i(x_i)$, the dynamics problem spanning an interval from $t_0 \ldots t_M$ can be transformed into a root finding problem, $F(X) = [x_i - f(x_{i-1})]_{i=1,M} = 0$, for the trajectory variables. The root finding problem is solved using a variety of optimization techniques, including quasi-Newton and preconditioned quasi-Newton optimization schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of $F(X)$. The relation of this approach to other recently proposed parallel in time methods is discussed and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl+4H2O AIMD simulation at the MP2 level. The maximum speedup obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow TCP/IP networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl+4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. By using these algorithms we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 seconds per time step to 6.9 seconds per time step.
Bylaska, Eric J; Weare, Jonathan Q; Weare, John H
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, $f$ (e.g., Verlet algorithm), is available to propagate the system from time $t_i$ (trajectory positions and velocities $x_i = (r_i, v_i)$) to time $t_{i+1}$ ($x_{i+1}$) by $x_{i+1} = f_i(x_i)$, the dynamics problem spanning an interval from $t_0 \ldots t_M$ can be transformed into a root finding problem, $F(X) = [x_i - f(x_{i-1})]_{i=1,M} = 0$, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of $F(X)$. The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
An Introduction to Computational Physics - 2nd Edition
NASA Astrophysics Data System (ADS)
Pang, Tao
2006-01-01
Preface to first edition; Preface; Acknowledgements; 1. Introduction; 2. Approximation of a function; 3. Numerical calculus; 4. Ordinary differential equations; 5. Numerical methods for matrices; 6. Spectral analysis; 7. Partial differential equations; 8. Molecular dynamics simulations; 9. Modeling continuous systems; 10. Monte Carlo simulations; 11. Genetic algorithm and programming; 12. Numerical renormalization; References; Index.
NASA Astrophysics Data System (ADS)
Walter, Nathan; Zhang, Yang
Nucleation and crystal growth are understood to be activated processes involving the crossing of free-energy barriers. Attempts to capture the entire crystallization process over long timescales with molecular dynamics simulations have met major obstacles because of molecular dynamics' temporal constraints. Herein, we circumvent this temporal limitation by using a brute-force, metadynamics-like, adaptive basin-climbing algorithm and directly sample the free-energy landscape of a model liquid Argon. The algorithm biases the system to evolve from an amorphous liquid-like structure towards an FCC crystal through inherent structure, and then traces back the energy barriers. Consequently, the sampled timescale is macroscopically long. We observe that the formation of a crystal involves two processes, each with a unique temperature-dependent energy barrier. One barrier corresponds to the crystal nucleus formation; the other barrier corresponds to the crystal growth. We find the two processes dominate in different temperature regimes. Compared to other computation techniques, our method requires no assumptions about the shape or chemical potential of the critical crystal nucleus. The success of this method is encouraging for studying the crystallization of more complex
Karr, Jonathan R.; Williams, Alex H.; Zucker, Jeremy D.; Raue, Andreas; Steiert, Bernhard; Timmer, Jens; Kreutz, Clemens; Wilkinson, Simon; Allgood, Brandon A.; Bot, Brian M.; Hoff, Bruce R.; Kellen, Michael R.; Covert, Markus W.; Stolovitzky, Gustavo A.; Meyer, Pablo
2015-01-01
Whole-cell models that explicitly represent all cellular components at the molecular level have the potential to predict phenotype from genotype. However, even for simple bacteria, whole-cell models will contain thousands of parameters, many of which are poorly characterized or unknown. New algorithms are needed to estimate these parameters and enable researchers to build increasingly comprehensive models. We organized the Dialogue for Reverse Engineering Assessments and Methods (DREAM) 8 Whole-Cell Parameter Estimation Challenge to develop new parameter estimation algorithms for whole-cell models. We asked participants to identify a subset of parameters of a whole-cell model given the model’s structure and in silico “experimental” data. Here we describe the challenge, the best performing methods, and new insights into the identifiability of whole-cell models. We also describe several valuable lessons we learned toward improving future challenges. Going forward, we believe that collaborative efforts supported by inexpensive cloud computing have the potential to solve whole-cell model parameter estimation. PMID:26020786
Panda, Subhamay; Kumari, Leena
2017-01-01
Serine proteases are a group of enzymes that hydrolyze the peptide bonds in proteins. In mammals, these enzymes help in the regulation of several major physiological functions such as digestion, blood clotting, responses of the immune system, reproductive functions and the complement system. Serine proteases obtained from the venom of the Octopodidae family are a relatively unexplored area of research. In the present work, we applied a comparative composite molecular modeling technique. Our key aim was to propose the first molecular model structure of the unexplored serine protease 5 derived from the big blue octopus. The other objective of this study was to analyze the distribution of negatively and positively charged amino acids over the modeled structure, the distribution of secondary structural elements, the hydrophobicity of the molecular surface and the electrostatic potential with the aid of different bioinformatic tools. In the present study, the molecular model was generated with the help of the I-TASSER suite. Afterwards the refined structural model was validated with standard methods. For functional annotation of the protein molecule we used the Protein Information Resource (PIR) database. Serine protease 5 of the big blue octopus was analyzed with different bioinformatic algorithms for the distribution of negatively and positively charged amino acids over the modeled structure, the distribution of secondary structural elements, the hydrophobicity of the molecular surface and the electrostatic potential. The functionally critical amino acids and ligand-binding site (LBS) of the modeled protein were determined using the COACH program. The molecular model data, together with other pertinent post-model analysis data, provide molecular insight into the proteolytic activity of serine protease 5, which helps in the clear understanding of the procoagulant and anticoagulant characteristics of this natural lead molecule. Our approach was to investigate the octopus venom protein, as a whole or as a part of its structure, that may result in the development of a new lead molecule.
Kirkilionis, Markus; Janus, Ulrich; Sbano, Luca
2011-09-01
We model in detail a simple synthetic genetic clock that was engineered in Atkinson et al. (Cell 113(5):597-607, 2003) using Escherichia coli as a host organism. Based on this engineered clock its theoretical description uses the modelling framework presented in Kirkilionis et al. (Theory Biosci. doi: 10.1007/s12064-011-0125-0, 2011, this volume). The main goal of this accompanying article was to illustrate that parts of the modelling process can be algorithmically automatised once the model framework we called 'average dynamics' is accepted (Sbano and Kirkilionis, WMI Preprint 7/2007, 2008c; Kirkilionis and Sbano, Adv Complex Syst 13(3):293-326, 2010). The advantage of the 'average dynamics' framework is that system components (especially in genetics) can be easier represented in the model. In particular, if once discovered and characterised, specific molecular players together with their function can be incorporated. This means that, for example, the 'gene' concept becomes more clear, for example, in the way the genetic component would react under different regulatory conditions. Using the framework it has become a realistic aim to link mathematical modelling to novel tools of bioinformatics in the future, at least if the number of regulatory units can be estimated. This should hold in any case in synthetic environments due to the fact that the different synthetic genetic components are simply known (Elowitz and Leibler, Nature 403(6767):335-338, 2000; Gardner et al., Nature 403(6767):339-342, 2000; Hasty et al., Nature 420(6912):224-230, 2002). The paper illustrates therefore as a necessary first step how a detailed modelling of molecular interactions with known molecular components leads to a dynamic mathematical model that can be compared to experimental results on various levels or scales. The different genetic modules or components are represented in different detail by model variants. We explain how the framework can be used for investigating other more complex genetic systems in terms of regulation and feedback.
Polarizable Molecular Dynamics in a Polarizable Continuum Solvent
Lipparini, Filippo; Lagardère, Louis; Raynaud, Christophe; Stamm, Benjamin; Cancès, Eric; Mennucci, Benedetta; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip
2015-01-01
We present for the first time scalable polarizable molecular dynamics (MD) simulations within a polarizable continuum solvent with molecular shape cavities and exact solution of the mutual polarization. The key ingredients are a very efficient algorithm for solving the equations associated with the polarizable continuum, in particular, the domain decomposition Conductor-like Screening Model (ddCOSMO), a rigorous coupling of the continuum with the polarizable force field achieved through a robust variational formulation and an effective strategy to solve the coupled equations. The coupling of ddCOSMO with non variational force fields, including AMOEBA, is also addressed. The MD simulations are feasible, for real life systems, on standard cluster nodes; a scalable parallel implementation allows for further speed up in the context of a newly developed module in Tinker, named Tinker-HP. NVE simulations are stable and long term energy conservation can be achieved. This paper is focused on the methodological developments, on the analysis of the algorithm and on the stability of the simulations; a proof-of-concept application is also presented to attest the possibilities of this newly developed technique. PMID:26516318
Diffusion in liquid Germanium using ab initio molecular dynamics
NASA Astrophysics Data System (ADS)
Kulkarni, R. V.; Aulbur, W. G.; Stroud, D.
1996-03-01
We describe the results of calculations of the self-diffusion constant of liquid Ge over a range of temperatures. The calculations are carried out using an ab initio molecular dynamics scheme which combines an LDA model for the electronic structure with the Bachelet-Hamann-Schlüter norm-conserving pseudopotentials^1. The energies associated with electronic degrees of freedom are minimized using the Williams-Soler algorithm, and ionic moves are carried out using the Verlet algorithm. We use an energy cutoff of 10 Ry, which is sufficient to give results for the lattice constant and bulk modulus of crystalline Ge to within 1% and 12% of experiment. The program output includes not only the self-diffusion constant but also the structure factor, electronic density of states, and low-frequency electrical conductivity. We will compare our results with other ab initio and semi-empirical calculations, and discuss extension to impurity diffusion. ^1 We use the ab initio molecular dynamics code fhi94md, developed at the Fritz-Haber Institute, Berlin. ^2 Work supported by NASA, Grant NAG3-1437.
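The Verlet step mentioned in the abstract above can be illustrated with a minimal sketch. It is not taken from fhi94md; the function names, the one-dimensional harmonic test force, and the time step are all assumptions chosen for illustration. It uses the position-Verlet recurrence x(n+1) = 2 x(n) - x(n-1) + a(x(n)) dt^2.

```python
import math


def verlet_trajectory(x0, v0, accel, dt, n_steps):
    """Integrate one coordinate with the position-Verlet recurrence.

    x_{n+1} = 2*x_n - x_{n-1} + a(x_n) * dt**2
    Returns the list of positions, including the starting point.
    """
    xs = [x0]
    # The recurrence needs two previous positions, so the first step
    # is bootstrapped with a second-order Taylor expansion.
    x_prev = x0
    x_curr = x0 + v0 * dt + 0.5 * accel(x0) * dt * dt
    xs.append(x_curr)
    for _ in range(n_steps - 1):
        x_next = 2.0 * x_curr - x_prev + accel(x_curr) * dt * dt
        x_prev, x_curr = x_curr, x_next
        xs.append(x_curr)
    return xs


def harmonic_accel(x, k=1.0, m=1.0):
    """Acceleration of a unit harmonic oscillator (hypothetical test force)."""
    return -k * x / m


dt = 0.01
n_steps = round(2 * math.pi / dt)  # roughly one oscillation period
traj = verlet_trajectory(1.0, 0.0, harmonic_accel, dt, n_steps)
```

For this test force the trajectory stays bounded and returns close to its starting point after one period, which is the long-time stability that makes Verlet-type integrators a common choice for ionic motion.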
Rajendiran, Nivedita; Durrant, Jacob D
2018-05-05
Molecular dynamics (MD) simulations provide critical insights into many biological mechanisms. Programs such as VMD, Chimera, and PyMOL can produce impressive simulation visualizations, but they lack many advanced rendering algorithms common in the film and video-game industries. In contrast, the modeling program Blender includes such algorithms but cannot import MD-simulation data. MD trajectories often require many gigabytes of memory/disk space, complicating Blender import. We present Pyrite, a Blender plugin that overcomes these limitations. Pyrite allows researchers to visualize MD simulations within Blender, with full access to Blender's cutting-edge rendering techniques. We expect Pyrite-generated images to appeal to students and non-specialists alike. A copy of the plugin is available at http://durrantlab.com/pyrite/, released under the terms of the GNU General Public License Version 3. © 2017 Wiley Periodicals, Inc.
Discrete Biogeography Based Optimization for Feature Selection in Molecular Signatures.
Liu, Bo; Tian, Meihong; Zhang, Chunhua; Li, Xiangtao
2015-04-01
Biomarker discovery from high-dimensional data is a complex task in the development of efficient cancer diagnoses and classification. However, these data are usually redundant and noisy, and only a subset of them present distinct profiles for different classes of samples. Thus, selecting highly discriminative genes from gene expression data has become increasingly interesting in the field of bioinformatics. In this paper, a discrete biogeography based optimization is proposed to select a good subset of informative genes relevant to the classification. In the proposed algorithm, firstly, the Fisher-Markov selector is used to choose a fixed number of genes. Secondly, to make biogeography based optimization suitable for the feature selection problem, a discrete migration model and a discrete mutation model are proposed to balance the exploration and exploitation ability. Then, discrete biogeography based optimization, which we call DBBO, is proposed by integrating the discrete migration model and the discrete mutation model. Finally, the DBBO method is used for feature selection, and three classifiers are used with the 10-fold cross-validation method. In order to show the effectiveness and efficiency of the algorithm, the proposed algorithm is tested on four breast cancer dataset benchmarks. In comparison with the genetic algorithm, particle swarm optimization, the differential evolution algorithm and hybrid biogeography based optimization, experimental results demonstrate that the proposed method is better than or at least comparable with previous methods from the literature when considering the quality of the solutions obtained. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Analyzing gene expression time-courses based on multi-resolution shape mixture model.
Li, Ying; He, Ye; Zhang, Yu
2016-11-01
Biological processes are dynamic molecular processes that unfold over time. Time-course gene expression experiments provide opportunities to explore patterns of gene expression change over time and to understand the dynamic behavior of gene expression, which is crucial for studying the development and progression of biology and disease. Analysis of gene expression time-course profiles has not been fully exploited so far and remains a challenging problem. We propose a novel shape-based mixture model clustering method for gene expression time-course profiles to explore the significant gene groups. Based on multi-resolution fractal features and a mixture clustering model, we propose a multi-resolution shape mixture model algorithm. The multi-resolution fractal features are computed by wavelet decomposition, which explores patterns of change over time of gene expression at different resolutions. Our proposed multi-resolution shape mixture model algorithm is a probabilistic framework which offers a more natural and robust way of clustering time-course gene expression. We assessed the performance of our proposed algorithm using yeast time-course gene expression profiles compared with several popular clustering methods for gene expression profiles. The grouped genes identified by different methods are evaluated by enrichment analysis of biological pathways and known protein-protein interactions from experimental evidence. The grouped genes identified by our proposed algorithm have stronger biological significance. A novel multi-resolution shape mixture model algorithm based on multi-resolution fractal features is proposed. Our proposed model provides a novel horizon and an alternative tool for visualization and analysis of time-course gene expression profiles. The R and Matlab programs are available upon request. Copyright © 2016 Elsevier Inc. All rights reserved.
Nasiri, Jaber; Naghavi, Mohammad Reza; Kayvanjoo, Amir Hossein; Nasiri, Mojtaba; Ebrahimi, Mansour
2015-03-07
For the first time, prediction accuracies of some supervised and unsupervised algorithms were evaluated in an SSR-based DNA fingerprinting study of a pea collection containing 20 cultivars and 57 wild samples. In general, according to the 10 attribute weighting models, the SSR alleles of PEAPHTAP-2 and PSBLOX13.2-1 were the two most important attributes to generate discrimination among eight different species and subspecies of genus Pisum. In addition, K-Medoids unsupervised clustering run on Chi squared dataset exhibited the best prediction accuracy (83.12%), while the lowest accuracy (25.97%) gained as K-Means model ran on FCdb database. Irrespective of some fluctuations, the overall accuracies of tree induction models were significantly high for many algorithms, and the attributes PSBLOX13.2-3 and PEAPHTAP could successfully detach Pisum fulvum accessions and cultivars from the others when two selected decision trees were taken into account. Meanwhile, the other used supervised algorithms exhibited overall reliable accuracies, even though in some rare cases, they gave us low amounts of accuracies. Our results, altogether, demonstrate promising applications of both supervised and unsupervised algorithms to provide suitable data mining tools regarding accurate fingerprinting of different species and subspecies of genus Pisum, as a fundamental priority task in breeding programs of the crop. Copyright © 2015 Elsevier Ltd. All rights reserved.
Zhang, Baofeng; Kilburg, Denise; Eastman, Peter; Pande, Vijay S; Gallicchio, Emilio
2017-04-15
We present an algorithm to efficiently compute accurate volumes and surface areas of macromolecules on graphical processing unit (GPU) devices using an analytic model which represents atomic volumes by continuous Gaussian densities. The volume of the molecule is expressed by means of the inclusion-exclusion formula, which is based on the summation of overlap integrals among multiple atomic densities. The surface area of the molecule is obtained by differentiation of the molecular volume with respect to atomic radii. The many-body nature of the model makes a port to GPU devices challenging. To our knowledge, this is the first reported full implementation of this model on GPU hardware. To accomplish this, we have used recursive strategies to construct the tree of overlaps and to accumulate volumes and their gradients on the tree data structures so as to minimize memory contention. The algorithm is used in the formulation of a surface area-based non-polar implicit solvent model implemented as an open source plug-in (named GaussVol) for the popular OpenMM library for molecular mechanics modeling. GaussVol is 50 to 100 times faster than our best optimized implementation for the CPUs, achieving speeds in excess of 100 ns/day with 1 fs time-step for protein-sized systems on commodity GPUs. © 2017 Wiley Periodicals, Inc.
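The inclusion-exclusion construction described in the abstract above can be written out as a worked formula. The notation is an assumption made here for illustration and is not taken from the abstract: \rho_i is the Gaussian density of atom i, R_i is its radius, and V_{ij\ldots} denotes an overlap integral.

```latex
% Molecular volume as an alternating sum of overlap integrals
V = \sum_{i} V_{i} - \sum_{i<j} V_{ij} + \sum_{i<j<k} V_{ijk} - \cdots ,
\qquad
V_{ij\ldots} = \int \rho_i(\mathbf{r})\, \rho_j(\mathbf{r}) \cdots \, d\mathbf{r}

% Surface area from differentiation with respect to the atomic radii
A = \sum_{i} \frac{\partial V}{\partial R_i}
```

The alternating signs are what make the model many-body: each higher-order overlap corrects the double counting left by the term before it, so the sum is naturally organised as a tree of overlaps.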
MULTIGRAIN: a smoothed particle hydrodynamic algorithm for multiple small dust grains and gas
NASA Astrophysics Data System (ADS)
Hutchison, Mark; Price, Daniel J.; Laibe, Guillaume
2018-05-01
We present a new algorithm, MULTIGRAIN, for modelling the dynamics of an entire population of small dust grains immersed in gas, typical of conditions that are found in molecular clouds and protoplanetary discs. The MULTIGRAIN method is more accurate than single-phase simulations because the gas experiences a backreaction from each dust phase and communicates this change to the other phases, thereby indirectly coupling the dust phases together. The MULTIGRAIN method is fast, explicit and low storage, requiring only an array of dust fractions and their derivatives defined for each resolution element.
Mean field analysis of algorithms for scale-free networks in molecular biology
2017-01-01
The sampling of scale-free networks in Molecular Biology is usually achieved by growing networks from a seed using recursive algorithms with elementary moves which include the addition and deletion of nodes and bonds. These algorithms include the Barabási-Albert algorithm. Later algorithms, such as the Duplication-Divergence algorithm, the Solé algorithm and the iSite algorithm, were inspired by biological processes underlying the evolution of protein networks, and the networks they produce differ essentially from networks grown by the Barabási-Albert algorithm. In this paper the mean field analysis of these algorithms is reconsidered, and extended to variant and modified implementations of the algorithms. The degree sequences of scale-free networks decay according to a power-law distribution, namely P(k) ∼ k^(−γ), where γ is a scaling exponent. We derive mean field expressions for γ, and test these by numerical simulations. Generally, good agreement is obtained. We also found that some algorithms do not produce scale-free networks (for example some variant Barabási-Albert and Solé networks). PMID:29272285
Mean field analysis of algorithms for scale-free networks in molecular biology.
Konini, S; Janse van Rensburg, E J
2017-01-01
The sampling of scale-free networks in Molecular Biology is usually achieved by growing networks from a seed using recursive algorithms with elementary moves which include the addition and deletion of nodes and bonds. These algorithms include the Barabási-Albert algorithm. Later algorithms, such as the Duplication-Divergence algorithm, the Solé algorithm and the iSite algorithm, were inspired by biological processes underlying the evolution of protein networks, and the networks they produce differ essentially from networks grown by the Barabási-Albert algorithm. In this paper the mean field analysis of these algorithms is reconsidered, and extended to variant and modified implementations of the algorithms. The degree sequences of scale-free networks decay according to a power-law distribution, namely P(k) ∼ k^(−γ), where γ is a scaling exponent. We derive mean field expressions for γ, and test these by numerical simulations. Generally, good agreement is obtained. We also found that some algorithms do not produce scale-free networks (for example some variant Barabási-Albert and Solé networks).
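A minimal sketch of network growth by preferential attachment, in the spirit of the Barabási-Albert algorithm named in the abstract above, may help make the "elementary moves" concrete. It is not the authors' implementation: the seed graph (a complete graph on m + 1 nodes), the function names and the parameter values are assumptions for illustration, and only node and bond addition is shown.

```python
import random
from collections import Counter


def barabasi_albert(n_nodes, m, seed=0):
    """Grow a network by preferential attachment and return its bonds.

    Each new node attaches to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    rng = random.Random(seed)
    # Assumed seed: a complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `stubs` once per incident bond, so a uniform
    # draw from `stubs` selects a node proportionally to its degree.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.append((new, target))
            stubs.extend((new, target))
    return edges


def degree_counts(edges):
    """Return a Counter mapping each node to its degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


edges = barabasi_albert(n_nodes=2000, m=2)
degree = degree_counts(edges)
histogram = Counter(degree.values())  # number of nodes with each degree k
```

A log-log plot of `histogram` against k gives a rough numerical estimate of γ; for the standard Barabási-Albert model the mean field value usually quoted is γ = 3.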
Mori, Yoshiharu; Okumura, Hisashi
2015-12-05
Simulated tempering (ST) is a useful method to enhance sampling of molecular simulations. When ST is used, the Metropolis algorithm, which satisfies the detailed balance condition, is usually applied to calculate the transition probability. Recently, an alternative method that satisfies the global balance condition instead of the detailed balance condition has been proposed by Suwa and Todo. In this study, ST method with the Suwa-Todo algorithm is proposed. Molecular dynamics simulations with ST are performed with three algorithms (the Metropolis, heat bath, and Suwa-Todo algorithms) to calculate the transition probability. Among the three algorithms, the Suwa-Todo algorithm yields the highest acceptance ratio and the shortest autocorrelation time. These suggest that sampling by a ST simulation with the Suwa-Todo algorithm is most efficient. In addition, because the acceptance ratio of the Suwa-Todo algorithm is higher than that of the Metropolis algorithm, the number of temperature states can be reduced by 25% for the Suwa-Todo algorithm when compared with the Metropolis algorithm. © 2015 Wiley Periodicals, Inc.
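As an illustration of the Metropolis transition probability between temperature states that the abstract above takes as its baseline, here is a minimal sketch. It assumes the usual simulated tempering weight exp(-beta_m * E + a_m) for state m, where a_m is a dimensionless weight factor; the function names and numerical values are hypothetical, and the heat bath and Suwa-Todo variants are not shown.

```python
import math
import random


def metropolis_st_acceptance(energy, beta_old, beta_new, a_old, a_new):
    """Metropolis probability of moving from temperature state old to new.

    Assumes the state weight exp(-beta * energy + a) for each temperature.
    """
    exponent = -(beta_new - beta_old) * energy + (a_new - a_old)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


def attempt_temperature_move(energy, state, betas, weights, rng):
    """Propose a neighbouring temperature state and accept or reject it."""
    proposal = state + rng.choice((-1, 1))
    if proposal < 0 or proposal >= len(betas):
        return state  # proposals outside the temperature ladder are rejected
    p = metropolis_st_acceptance(
        energy, betas[state], betas[proposal], weights[state], weights[proposal]
    )
    return proposal if rng.random() < p else state


rng = random.Random(1)
betas = [1.0, 1.1, 1.2]    # hypothetical inverse temperatures
weights = [0.0, 0.5, 1.0]  # hypothetical weight factors a_m
new_state = attempt_temperature_move(-3.0, 1, betas, weights, rng)
```

This rule satisfies detailed balance with respect to the assumed weights, which is the property the Suwa-Todo algorithm relaxes to global balance.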
Macro and micro analysis of small molecule diffusion in amorphous polymers
NASA Astrophysics Data System (ADS)
Putta, Santosh Krishna
In this study, both macroscopic and microscopic numerical techniques have been explored to model and understand the diffusion behavior of small molecules in amorphous polymers, which very often do not follow the classical Fickian law. It was attempted to understand the influence of various aspects of the molecular structure of a polymer on its macroscopic diffusion behavior. At the macroscopic level, a hybrid finite-element/finite-difference model is developed to implement the coupled diffusion and deformation constitutive equations. A viscoelasticity theory, combined with time-free volume superposition, is used to model the deformation processes. A free volume-based model is used to model the diffusion processes. The free volume in the polymer is used as a coupling factor between the deformation and the diffusion processes. The model is shown to qualitatively describe some of the typical non-Fickian diffusion behavior in polymers. However, it does not directly involve the microstructure of a polymer. Further, some of the input parameters to the model are difficult to obtain experimentally. A numerical microscopic approach is therefore adopted to study the molecular structure of polymers. A molecular mechanics and dynamics technique, combined with a modified Rotational Isomeric State (RIS) approach, is followed to generate the molecular structure for two types of polycarbonates and two types of polyacrylates, starting only with their chemical structures. A new efficient 3-D algorithm for Delaunay tessellation is developed and then applied to discretize the molecular structure into Delaunay tetrahedra. Using the discretized molecular structure, the size, shape, and connectivity of free spaces for small molecule diffusion in the above-mentioned polymers are then studied in relation to their diffusion properties.
The influence of polymer and side chain flexibility, and diffusant-diffusant and diffusant-polymer molecular interactions, is also discussed with respect to the diffusion properties.
Widlak, Piotr; Mrukwa, Grzegorz; Kalinowska, Magdalena; Pietrowska, Monika; Chekan, Mykola; Wierzgon, Janusz; Gawin, Marta; Drazek, Grzegorz; Polanska, Joanna
2016-06-01
Intra-tumor heterogeneity is a vivid problem of molecular oncology that could be addressed by imaging mass spectrometry. Here we aimed to assess molecular heterogeneity of oral squamous cell carcinoma and to detect signatures discriminating normal and cancerous epithelium. Tryptic peptides were analyzed by MALDI-IMS in tissue specimens from five patients with oral cancer. Novel algorithm of IMS data analysis was developed and implemented, which included Gaussian mixture modeling for detection of spectral components and iterative k-means algorithm for unsupervised spectra clustering performed in domain reduced to a subset of the most dispersed components. About 4% of the detected peptides showed significantly different abundances between normal epithelium and tumor, and could be considered as a molecular signature of oral cancer. Moreover, unsupervised clustering revealed two major sub-regions within expert-defined tumor areas. One of them showed molecular similarity with histologically normal epithelium. The other one showed similarity with connective tissue, yet was markedly different from normal epithelium. Pathologist's re-inspection of tissue specimens confirmed distinct features in both tumor sub-regions: foci of actual cancer cells or cancer microenvironment-related cells prevailed in corresponding areas. Hence, molecular differences detected during automated segmentation of IMS data had an apparent reflection in real structures present in tumor. © 2016 The Authors. Proteomics Published by Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Steiner, Florian; Poelking, Carl; Niedzialek, Dorota; Andrienko, Denis; Nelson, Jenny
2017-05-03
We present a multi-scale model for charge transport across grain boundaries in molecular electronic materials that incorporates packing disorder, electrostatic and polarisation effects. We choose quasi two-dimensional films of tri-isopropylsilylethynyl pentacene (TIPS-P) as a model system representative of technologically relevant crystalline organic semiconductors. We use atomistic molecular dynamics, with a force-field specific for TIPS-P, to generate and equilibrate polycrystalline two-dimensional thin films. The energy landscape is obtained by calculating contributions from electrostatic interactions and polarization. The variation in these contributions leads to energetic barriers between grains. Subsequently, charge transport is simulated using a kinetic Monte-Carlo algorithm. Two-grain systems with varied mutual orientation are studied. We find relatively little effect of long grain boundaries due to the presence of low impedance pathways. However, effects could be more pronounced for systems with limited inter-grain contact areas. Furthermore, we present a lattice model to generalize the model for small molecular systems. In the general case, depending on molecular architecture and packing, grain boundaries can result in interfacial energy barriers, traps or a combination of both with qualitatively different effects on charge transport.
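The kinetic Monte-Carlo stage mentioned in the abstract above can be sketched as a generic rejection-free step. This is a textbook scheme and not the authors' code; the function name and the hopping rates are hypothetical, and in a real transport simulation the rates would come from the computed energy landscape and electronic couplings.

```python
import math
import random


def kmc_step(rates, rng):
    """One rejection-free kinetic Monte Carlo step.

    Chooses hop `index` with probability rates[index] / sum(rates) and
    draws the waiting time from an exponential distribution whose mean
    is 1 / sum(rates). Returns (index, waiting_time).
    """
    total = sum(rates)
    threshold = rng.random() * total
    cumulative = 0.0
    index = 0
    for index, rate in enumerate(rates):
        cumulative += rate
        if threshold < cumulative:
            break
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    waiting_time = -math.log(1.0 - rng.random()) / total
    return index, waiting_time


rng = random.Random(42)
hop_rates = [1.0, 3.0]  # hypothetical rates to two neighbouring sites
samples = [kmc_step(hop_rates, rng) for _ in range(20000)]
fraction_second = sum(1 for i, _ in samples if i == 1) / len(samples)
mean_wait = sum(t for _, t in samples) / len(samples)
```

With these rates the second hop should be chosen about three quarters of the time and the mean waiting time should be close to 1/4, which gives a quick sanity check on the sampling.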
D'Onofrio, David J; Abel, David L; Johnson, Donald E
2012-03-14
The fields of molecular biology and computer science have cooperated over recent years to create a synergy between the cybernetic and biosemiotic relationship found in cellular genomics to that of information and language found in computational systems. Biological information frequently manifests its "meaning" through instruction or actual production of formal bio-function. Such information is called prescriptive information (PI). PI programs organize and execute a prescribed set of choices. Closer examination of this term in cellular systems has led to a dichotomy in its definition suggesting both prescribed data and prescribed algorithms are constituents of PI. This paper looks at this dichotomy as expressed in both the genetic code and in the central dogma of protein synthesis. An example of a genetic algorithm is modeled after the ribosome, and an examination of the protein synthesis process is used to differentiate PI data from PI algorithms.
Computational Workbench for Multibody Dynamics
NASA Technical Reports Server (NTRS)
Edmonds, Karina
2007-01-01
PyCraft is a computer program that provides an interactive, workbench-like computing environment for developing and testing algorithms for multibody dynamics. Examples of multibody dynamic systems amenable to analysis with the help of PyCraft include land vehicles, spacecraft, robots, and molecular models. PyCraft is based on the Spatial-Operator-Algebra (SOA) formulation for multibody dynamics. The SOA operators enable construction of simple and compact representations of complex multibody dynamical equations. Within the PyCraft computational workbench, users can, essentially, use the high-level SOA operator notation to represent the variety of dynamical quantities and algorithms and to perform computations interactively. PyCraft provides a Python-language interface to underlying C++ code. Working with SOA concepts, a user can create and manipulate Python-level operator classes in order to implement and evaluate new dynamical quantities and algorithms. During use of PyCraft, virtually all SOA-based algorithms are available for computational experiments.
Pteros: fast and easy to use open-source C++ library for molecular analysis.
Yesylevskyy, Semen O
2012-07-15
An open-source Pteros library for molecular modeling and analysis of molecular dynamics trajectories for C++ programming language is introduced. Pteros provides a number of routine analysis operations ranging from reading and writing trajectory files and geometry transformations to structural alignment and computation of nonbonded interaction energies. The library features asynchronous trajectory reading and parallel execution of several analysis routines, which greatly simplifies development of computationally intensive trajectory analysis algorithms. Pteros programming interface is very simple and intuitive while the source code is well documented and easily extendible. Pteros is available for free under open-source Artistic License from http://sourceforge.net/projects/pteros/. Copyright © 2012 Wiley Periodicals, Inc.
'Inter-Arrival Time' Inspired Algorithm and its Application in Clustering and Molecular Phylogeny
NASA Astrophysics Data System (ADS)
Kolekar, Pandurang S.; Kale, Mohan M.; Kulkarni-Kale, Urmila
2010-10-01
Bioinformatics, being a multidisciplinary field, involves applications of various methods from allied areas of Science for data mining using computational approaches. Clustering and molecular phylogeny are key areas in Bioinformatics, which help in the study of classification and evolution of organisms. Molecular phylogeny algorithms can be divided into distance-based and character-based methods. But most of these methods are dependent on pre-alignment of sequences and become computationally intensive with increase in size of data and hence demand alternative efficient approaches. 'Inter-arrival time distribution' (IATD) is a popular concept in the theory of stochastic system modeling but its potential in molecular data analysis has not been fully explored. The present study reports application of IATD in Bioinformatics for clustering and molecular phylogeny. The proposed method provides IATDs of nucleotides in genomic sequences. A distance function based on statistical parameters of IATDs is proposed, and the distance matrix thus obtained is used for the purpose of clustering and molecular phylogeny. The method is applied on a dataset of 3' non-coding region sequences (NCR) of Dengue virus type 3 (DENV-3), subtype III, reported in 2008. The phylogram thus obtained revealed the geographical distribution of DENV-3 isolates. Sri Lankan DENV-3 isolates were further observed to be clustered in two sub-clades corresponding to pre and post Dengue hemorrhagic fever emergence groups. These results are consistent with those reported earlier, which were obtained using pre-aligned sequence data as an input. These findings encourage applications of the IATD-based method in molecular phylogenetic analysis in particular and data mining in general.
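To make the idea of nucleotide inter-arrival times concrete, here is a minimal sketch. It assumes that an inter-arrival time is the gap in sequence positions between successive occurrences of the same nucleotide, and it summarises each distribution by its mean and variance; the abstract does not specify which statistical parameters or which distance function the authors use, so those choices, the function names and the toy sequence are illustrative only.

```python
from statistics import mean, pvariance


def inter_arrival_times(sequence, nucleotide):
    """Gaps between successive positions at which `nucleotide` occurs."""
    positions = [i for i, base in enumerate(sequence) if base == nucleotide]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def iatd_summary(sequence, alphabet="ACGT"):
    """Mean and population variance of the gaps for each nucleotide.

    Nucleotides that occur fewer than twice have no gaps and are omitted.
    """
    summary = {}
    for base in alphabet:
        gaps = inter_arrival_times(sequence, base)
        if gaps:
            summary[base] = (mean(gaps), pvariance(gaps))
    return summary


toy_sequence = "ACGTAACCA"  # hypothetical sequence for illustration
gaps_a = inter_arrival_times(toy_sequence, "A")
summary = iatd_summary(toy_sequence)
```

Because the summary is computed per sequence, two sequences can be compared without aligning them first, which is the property the abstract highlights.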
Optimizing legacy molecular dynamics software with directive-based offload
NASA Astrophysics Data System (ADS)
Michael Brown, W.; Carrillo, Jan-Michael Y.; Gavhane, Nitin; Thakkar, Foram M.; Plimpton, Steven J.
2015-10-01
Directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In this paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We demonstrate that standard molecular dynamics algorithms can run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU; in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel® Xeon Phi™ coprocessors and NVIDIA GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS.
Noureldine, Salem I; Najafian, Alireza; Aragon Han, Patricia; Olson, Matthew T; Genther, Dane J; Schneider, Eric B; Prescott, Jason D; Agrawal, Nishant; Mathur, Aarti; Zeiger, Martha A; Tufano, Ralph P
2016-07-01
Diagnostic molecular testing is used in the workup of thyroid nodules. While these tests appear to be promising in more definitively assigning a risk of malignancy, their effect on surgical decision making has yet to be demonstrated. To investigate the effect of diagnostic molecular profiling of thyroid nodules on the surgical decision-making process. A surgical management algorithm was developed and published after peer review that incorporated individual Bethesda System for Reporting Thyroid Cytopathology classifications with clinical, laboratory, and radiological results. This algorithm was created to formalize the decision-making process selected herein in managing patients with thyroid nodules. Between April 1, 2014, and March 31, 2015, a prospective study of patients who had undergone diagnostic molecular testing of a thyroid nodule before being seen for surgical consultation was performed. The recommended management undertaken by the surgeon was then prospectively compared with the corresponding one in the algorithm. Patients with thyroid nodules who did not undergo molecular testing and were seen for surgical consultation during the same period served as a control group. All pertinent treatment options were presented to each patient, and any deviation from the algorithm was recorded prospectively. To evaluate the appropriateness of any change (deviation) in management, the surgical histopathology diagnosis was correlated with the surgery performed. The study cohort comprised 140 patients who underwent molecular testing. Their mean (SD) age was 50.3 (14.6) years, and 75.0% (105 of 140) were female. Over a 1-year period, 20.3% (140 of 688) had undergone diagnostic molecular testing before surgical consultation, and 79.7% (548 of 688) had not undergone molecular testing. The surgical management deviated from the treatment algorithm in 12.9% (18 of 140) with molecular testing and in 10.2% (56 of 548) without molecular testing (P = .37). 
In the group with molecular testing, the surgical management plan of only 7.9% (11 of 140) was altered as a result of the molecular test. All but 1 of those patients were found to be overtreated relative to the surgical histopathology analysis. Molecular testing did not significantly affect the surgical decision-making process in this study. Among patients whose treatment was altered based on these markers, there was evidence of overtreatment.
Kros, Johan M; Huizer, Karin; Hernández-Laín, Aurelio; Marucci, Gianluca; Michotte, Alex; Pollo, Bianca; Rushing, Elisabeth J; Ribalta, Teresa; French, Pim; Jaminé, David; Bekka, Nawal; Lacombe, Denis; van den Bent, Martin J; Gorlia, Thierry
2015-06-10
With the rapid discovery of prognostic and predictive molecular parameters for glioma, the status of histopathology in the diagnostic process should be scrutinized. Our project aimed to construct a diagnostic algorithm for gliomas based on molecular and histologic parameters with independent prognostic values. The pathology slides of 636 patients with gliomas who had been included in EORTC 26951 and 26882 trials were reviewed using virtual microscopy by a panel of six neuropathologists who independently scored 18 histologic features and provided an overall diagnosis. The molecular data for IDH1, 1p/19q loss, EGFR amplification, loss of chromosome 10 and chromosome arm 10q, gain of chromosome 7, and hypermethylation of the promoter of MGMT were available for some of the cases. The slides were divided in discovery (n = 426) and validation sets (n = 210). The diagnostic algorithm resulting from analysis of the discovery set was validated in the latter. In 66% of cases, consensus of overall diagnosis was present. A diagnostic algorithm consisting of two molecular markers and one consensus histologic feature was created by conditional inference tree analysis. The order of prognostic significance was: 1p/19q loss, EGFR amplification, and astrocytic morphology, which resulted in the identification of four diagnostic nodes. Validation of the nodes in the validation set confirmed the prognostic value (P < .001). We succeeded in the creation of a timely diagnostic algorithm for anaplastic glioma based on multivariable analysis of consensus histopathology and molecular parameters. © 2015 by American Society of Clinical Oncology.
NASA Astrophysics Data System (ADS)
Liang, Wenkel
This dissertation consists of two general parts: (I) developments of optimization algorithms (both nuclear and electronic degrees of freedom) for time-independent molecules and (II) novel methods, first-principle theories and applications in time-dependent molecular structure modeling. In the first part, we discuss specifically two new algorithms for static geometry optimization: the eigenspace update (ESU) method in nonredundant internal coordinates, which exhibits enhanced performance with up to a factor of 3 savings in computational cost for large-sized molecular systems, and the Car-Parrinello density matrix search (CP-DMS) method, which enables direct minimization of the SCF energy as an effective alternative to the conventional diagonalization approach. For the second part, we consider the time dependence and first present two nonadiabatic dynamic studies that model laser-controlled molecular photo-dissociation for qualitative understanding of intense laser-molecule interaction, using the ab initio direct Ehrenfest dynamics scheme implemented with the real-time time-dependent density functional theory (RT-TDDFT) approach developed in our group. Furthermore, we place special interest on the nonadiabatic electronic dynamics in the ultrafast time scale, and present (1) a novel technique that can obtain not only the energies but also the electron densities of doubly excited states within a single determinant framework, by combining the CP-DMS and RT-TDDFT methods; (2) a solvated first-principles electronic dynamics method that incorporates the polarizable continuum solvation model (PCM) into RT-TDDFT, which is found to be very effective in describing the dynamical solvation effect in the charge transfer process and yields a consistent absorption spectrum in comparison to the conventional linear response results in solution; (3) applications of the PCM-RT-TDDFT method to study the intramolecular charge-transfer (CT) dynamics in a C60 derivative.
Such work provides insights into the characteristics of ultrafast dynamics in photoexcited fullerene derivatives, and aids in the rational design for pre-dissociative exciton in the intramolecular CT process in organic solar cells.
Xie, Huiding; Chen, Lijun; Zhang, Jianqiang; Xie, Xiaoguang; Qiu, Kaixiong; Fu, Jijun
2015-01-01
B-Raf kinase is an important target in treatment of cancers. In order to design and find potent B-Raf inhibitors (BRIs), 3D pharmacophore models were created using the Genetic Algorithm with Linear Assignment of Hypermolecular Alignment of Database (GALAHAD). The best pharmacophore model obtained which was used in effective alignment of the data set contains two acceptor atoms, three donor atoms and three hydrophobes. In succession, comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) were performed on 39 imidazopyridine BRIs to build three dimensional quantitative structure-activity relationship (3D QSAR) models based on both pharmacophore and docking alignments. The CoMSIA model based on the pharmacophore alignment shows the best result (q2 = 0.621, r2pred = 0.885). This 3D QSAR approach provides significant insights that are useful for designing potent BRIs. In addition, the obtained best pharmacophore model was used for virtual screening against the NCI2000 database. The hit compounds were further filtered with molecular docking, and their biological activities were predicted using the CoMSIA model, and three potential BRIs with new skeletons were obtained. PMID:26035757
Xie, Huiding; Chen, Lijun; Zhang, Jianqiang; Xie, Xiaoguang; Qiu, Kaixiong; Fu, Jijun
2015-05-29
B-Raf kinase is an important target in treatment of cancers. In order to design and find potent B-Raf inhibitors (BRIs), 3D pharmacophore models were created using the Genetic Algorithm with Linear Assignment of Hypermolecular Alignment of Database (GALAHAD). The best pharmacophore model obtained which was used in effective alignment of the data set contains two acceptor atoms, three donor atoms and three hydrophobes. In succession, comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) were performed on 39 imidazopyridine BRIs to build three dimensional quantitative structure-activity relationship (3D QSAR) models based on both pharmacophore and docking alignments. The CoMSIA model based on the pharmacophore alignment shows the best result (q(2) = 0.621, r(2)(pred) = 0.885). This 3D QSAR approach provides significant insights that are useful for designing potent BRIs. In addition, the obtained best pharmacophore model was used for virtual screening against the NCI2000 database. The hit compounds were further filtered with molecular docking, and their biological activities were predicted using the CoMSIA model, and three potential BRIs with new skeletons were obtained.
Multi-scale genetic dynamic modelling I : an algorithm to compute generators.
Kirkilionis, Markus; Janus, Ulrich; Sbano, Luca
2011-09-01
We present a new framework to model dynamic regulatory genetic activity. The framework uses a multi-scale analysis based upon generic assumptions on the relative time scales attached to the different transitions of molecular states defining the genetic system. At the micro-level such systems are regulated by the interaction of two kinds of molecular players: macro-molecules like DNA or polymerases, and smaller molecules acting as transcription factors. The proposed genetic model represents the larger, less abundant molecules with a finite discrete state space, for example describing different conformations of these molecules. This is in contrast to the transcription factors, which are, as in classical reaction kinetics, represented by their particle number only. We illustrate the method by considering the genetic activity associated with certain configurations of interacting genes that are fundamental to modelling (synthetic) genetic clocks. A largely open question is how different molecular details incorporated via this more realistic modelling approach lead to different macroscopic regulatory genetic models whose dynamical behaviour might, in general, be different for different model choices. The theory will be applied to a real synthetic clock in a second accompanying article (Kirkilionis et al., Theory Biosci, 2011).
Fast parallel molecular algorithms for DNA-based computation: factoring integers.
Chang, Weng-Long; Guo, Minyi; Ho, Michael Shan-Hui
2005-06-01
The RSA public-key cryptosystem is an algorithm that converts input data into an unrecognizable encrypted form and converts the encrypted data back into its original decrypted form. The security of the RSA public-key cryptosystem is based on the difficulty of factoring the product of two large prime numbers. This paper demonstrates how to factor the product of two large prime numbers, and is a breakthrough in basic biological operations using a molecular computer. In order to achieve this, we propose three DNA-based algorithms for a parallel subtractor, a parallel comparator, and parallel modular arithmetic that formally verify our designed molecular solutions for factoring the product of two large prime numbers. Furthermore, this work indicates that public-key cryptosystems are perhaps insecure and also presents clear evidence of the ability of molecular computing to perform complicated mathematical operations.
Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU
NASA Astrophysics Data System (ADS)
Trędak, Przemysław; Rudnicki, Witold R.; Majewski, Jacek A.
2016-09-01
The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronization issues that arise in computations of multi-body potentials. Techniques developed for this problem may also be used to achieve efficient solutions of different problems. The performance of the proposed algorithm is assessed using a range of model systems. It is compared to a highly optimized CPU implementation (both single core and OpenMP) available in the LAMMPS package. These experiments show up to 6x improvement in force computation time using a single processor of the NVIDIA Tesla K80 compared to a high-end 16-core Intel Xeon processor.
Network-based machine learning and graph theory algorithms for precision oncology.
Zhang, Wei; Chien, Jeremy; Yong, Jeongsik; Kuang, Rui
2017-01-01
Network-based analytics plays an increasingly important role in precision oncology. Growing evidence in recent studies suggests that cancer can be better understood through mutated or dysregulated pathways or networks rather than individual mutations and that the efficacy of repositioned drugs can be inferred from disease modules in molecular networks. This article reviews network-based machine learning and graph theory algorithms for integrative analysis of personal genomic data and biomedical knowledge bases to identify tumor-specific molecular mechanisms, candidate targets and repositioned drugs for personalized treatment. The review focuses on the algorithmic design and mathematical formulation of these methods to facilitate applications and implementations of network-based analysis in the practice of precision oncology. We review the methods applied in three scenarios to integrate genomic data and network models in different analysis pipelines, and we examine three categories of network-based approaches for repositioning drugs in drug-disease-gene networks. In addition, we perform a comprehensive subnetwork/pathway analysis of mutations in 31 cancer genome projects in the Cancer Genome Atlas and present a detailed case study on ovarian cancer. Finally, we discuss interesting observations, potential pitfalls and future directions in network-based precision oncology.
Fast Dating Using Least-Squares Criteria and Algorithms.
To, Thu-Hien; Jung, Matthieu; Lycett, Samantha; Gascuel, Olivier
2016-01-01
Phylogenies provide a useful way to understand the evolutionary history of genetic samples, and data sets with more than a thousand taxa are becoming increasingly common, notably with viruses (e.g., human immunodeficiency virus (HIV)). Dating ancestral events is one of the first, essential goals with such data. However, current sophisticated probabilistic approaches struggle to handle data sets of this size. Here, we present very fast dating algorithms, based on a Gaussian model closely related to the Langley-Fitch molecular-clock model. We show that this model is robust to uncorrelated violations of the molecular clock. Our algorithms apply to serial data, where the tips of the tree have been sampled through time. They estimate the substitution rate and the dates of all ancestral nodes. When the input tree is unrooted, they can provide an estimate for the root position, thus representing a new, practical alternative to the standard rooting methods (e.g., midpoint). Our algorithms exploit the tree (recursive) structure of the problem at hand, and the close relationships between least-squares and linear algebra. We distinguish between an unconstrained setting and the case where the temporal precedence constraint (i.e., an ancestral node must be older than its daughter nodes) is accounted for. With rooted trees, the former is solved using linear algebra in linear computing time (i.e., proportional to the number of taxa), while the resolution of the latter, constrained setting, is based on an active-set method that runs in nearly linear time. With unrooted trees the computing time becomes (nearly) quadratic (i.e., proportional to the square of the number of taxa). In all cases, very large input trees (>10,000 taxa) can easily be processed and transformed into time-scaled trees. We compare these algorithms to standard methods (root-to-tip, r8s version of Langley-Fitch method, and BEAST).
Using simulated data, we show that their estimation accuracy is similar to that of the most sophisticated methods, while their computing time is much faster. We apply these algorithms on a large data set comprising 1194 strains of Influenza virus from the pdm09 H1N1 Human pandemic. Again the results show that these algorithms provide a very fast alternative with results similar to those of other computer programs. These algorithms are implemented in the LSD software (least-squares dating), which can be downloaded from http://www.atgc-montpellier.fr/LSD/, along with all our data sets and detailed results. An Online Appendix, providing additional algorithm descriptions, tables, and figures can be found in the Supplementary Material available on Dryad at http://dx.doi.org/10.5061/dryad.968t3. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
Fast Dating Using Least-Squares Criteria and Algorithms
To, Thu-Hien; Jung, Matthieu; Lycett, Samantha; Gascuel, Olivier
2016-01-01
Phylogenies provide a useful way to understand the evolutionary history of genetic samples, and data sets with more than a thousand taxa are becoming increasingly common, notably with viruses (e.g., human immunodeficiency virus (HIV)). Dating ancestral events is one of the first, essential goals with such data. However, current sophisticated probabilistic approaches struggle to handle data sets of this size. Here, we present very fast dating algorithms, based on a Gaussian model closely related to the Langley–Fitch molecular-clock model. We show that this model is robust to uncorrelated violations of the molecular clock. Our algorithms apply to serial data, where the tips of the tree have been sampled through time. They estimate the substitution rate and the dates of all ancestral nodes. When the input tree is unrooted, they can provide an estimate for the root position, thus representing a new, practical alternative to the standard rooting methods (e.g., midpoint). Our algorithms exploit the tree (recursive) structure of the problem at hand, and the close relationships between least-squares and linear algebra. We distinguish between an unconstrained setting and the case where the temporal precedence constraint (i.e., an ancestral node must be older than its daughter nodes) is accounted for. With rooted trees, the former is solved using linear algebra in linear computing time (i.e., proportional to the number of taxa), while the resolution of the latter, constrained setting, is based on an active-set method that runs in nearly linear time. With unrooted trees the computing time becomes (nearly) quadratic (i.e., proportional to the square of the number of taxa). In all cases, very large input trees (>10,000 taxa) can easily be processed and transformed into time-scaled trees. We compare these algorithms to standard methods (root-to-tip, r8s version of Langley–Fitch method, and BEAST).
Using simulated data, we show that their estimation accuracy is similar to that of the most sophisticated methods, while their computing time is much faster. We apply these algorithms on a large data set comprising 1194 strains of Influenza virus from the pdm09 H1N1 Human pandemic. Again the results show that these algorithms provide a very fast alternative with results similar to those of other computer programs. These algorithms are implemented in the LSD software (least-squares dating), which can be downloaded from http://www.atgc-montpellier.fr/LSD/, along with all our data sets and detailed results. An Online Appendix, providing additional algorithm descriptions, tables, and figures can be found in the Supplementary Material available on Dryad at http://dx.doi.org/10.5061/dryad.968t3. PMID:26424727
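The abstract above names root-to-tip regression as one of the standard methods it compares against. As a rough illustration of that baseline only (not of the LSD algorithms themselves), the sketch below fits a straight line of root-to-tip distance against sampling date: the slope estimates the substitution rate and the x-intercept estimates the root date. The function name and the toy numbers are assumptions made for this example.

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of distance = rate * (date - root_date).

    dates:     sampling date of each tip (e.g. in years)
    distances: root-to-tip distance of each tip (substitutions per site)
    Returns (rate, root_date). Hypothetical helper, not part of LSD.
    """
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two tips with matching inputs")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all tips share one date; the rate is not identifiable")
    rate = sxy / sxx                      # slope of the regression line
    if rate == 0:
        raise ValueError("zero slope; the root date is undefined")
    intercept = mean_d - rate * mean_t    # distance extrapolated to date 0
    root_date = -intercept / rate         # date at which distance reaches 0
    return rate, root_date


# Toy, perfectly clock-like data: rate 0.01 per year, root in 1990.
toy_dates = [2000.0, 2005.0, 2010.0]
toy_distances = [0.10, 0.15, 0.20]
rate, root_date = root_to_tip_regression(toy_dates, toy_distances)
```

This baseline treats tips as independent points and ignores the tree structure, which is one reason tree-aware least-squares methods are of interest.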
A fast reconstruction algorithm for fluorescence optical diffusion tomography based on preiteration.
Song, Xiaolei; Xiong, Xiaoyun; Bai, Jing
2007-01-01
Fluorescence optical diffusion tomography in the near-infrared (NIR) bandwidth is considered to be one of the most promising ways for noninvasive molecular-based imaging. Many reconstructive approaches to it utilize iterative methods for data inversion. However, they are time-consuming and they are far from meeting the real-time imaging demands. In this work, a fast preiteration algorithm based on the generalized inverse matrix is proposed. This method needs only one step of matrix-vector multiplication online, by pushing the iteration process to be executed offline. In the preiteration process, the second-order iterative format is employed to exponentially accelerate the convergence. Simulations based on an analytical diffusion model show that the distribution of fluorescent yield can be well estimated by this algorithm and the reconstructed speed is remarkably increased.
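The offline/online split described above can be sketched as follows. The abstract does not spell out its "second-order iterative format"; this example assumes the classical Newton-Schulz iteration X <- X(2I - AX), which converges quadratically to the generalized inverse, as a stand-in. The small matrix, the function names, and the iteration count are illustrative assumptions.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def precompute_generalized_inverse(a, iterations=30):
    """Offline stage: Newton-Schulz iteration (an assumed stand-in scheme)."""
    rows, cols = len(a), len(a[0])
    norm_1 = max(sum(abs(a[i][j]) for i in range(rows)) for j in range(cols))
    norm_inf = max(sum(abs(v) for v in row) for row in a)
    scale = 1.0 / (norm_1 * norm_inf)
    # Starting guess A^T / (||A||_1 ||A||_inf), a standard convergent choice.
    x = [[a[i][j] * scale for i in range(rows)] for j in range(cols)]
    for _ in range(iterations):
        ax = matmul(a, x)
        correction = [[(2.0 if i == j else 0.0) - ax[i][j]
                       for j in range(rows)] for i in range(rows)]
        x = matmul(x, correction)         # X <- X (2I - A X)
    return x


def reconstruct(inverse, measurement):
    """Online stage: a single matrix-vector multiplication."""
    return [sum(w * m for w, m in zip(row, measurement)) for row in inverse]


# Toy forward model (hypothetical numbers, not a diffusion model).
forward = [[4.0, 1.0], [2.0, 3.0]]
inverse = precompute_generalized_inverse(forward)
estimate = reconstruct(inverse, [9.0, 11.0])   # measurement for yield (1.6, 2.6)
```

All iterative work happens once in `precompute_generalized_inverse`; each new measurement then costs only the product in `reconstruct`.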
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bylaska, Eric J., E-mail: Eric.Bylaska@pnnl.gov; Weare, Jonathan Q., E-mail: weare@uchicago.edu; Weare, John H., E-mail: jweare@ucsd.edu
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, $f$ (e.g., Verlet algorithm), is available to propagate the system from time $t_i$ (trajectory positions and velocities $x_i = (r_i, v_i)$) to time $t_{i+1}$ ($x_{i+1}$) by $x_{i+1} = f_i(x_i)$, the dynamics problem spanning an interval from $t_0 \ldots t_M$ can be transformed into a root finding problem, $F(X) = [x_i - f(x_{i-1})]_{i=1,M} = 0$, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of $F(X)$. The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H$_2$O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0.
For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H$_2$O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
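To make the root-finding reformulation concrete, the sketch below treats a whole trajectory as the unknown X and drives the residual F(X) = [x_i - f(x_{i-1})] to zero. It uses a plain fixed-point sweep rather than the quasi-Newton schemes of the abstract, and a one-dimensional harmonic oscillator with a velocity Verlet step as the integrator f; both are assumptions chosen for brevity. Every entry of a sweep depends only on the previous sweep, which is the property that lets each time step be assigned to its own processor.

```python
def verlet_step(state, dt=0.1, k=1.0, m=1.0):
    """One velocity Verlet step for a 1D harmonic oscillator (toy integrator f)."""
    r, v = state
    a = -k * r / m
    r_new = r + v * dt + 0.5 * a * dt * dt
    a_new = -k * r_new / m
    v_new = v + 0.5 * (a + a_new) * dt
    return (r_new, v_new)


def serial_trajectory(x0, n_steps, step=verlet_step):
    """Conventional time stepping: x_i = f(x_{i-1})."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(step(traj[-1]))
    return traj


def residual_norm(traj, step=verlet_step):
    """Largest component of F(X) = [x_i - f(x_{i-1})], i = 1..M."""
    worst = 0.0
    for i in range(1, len(traj)):
        pred = step(traj[i - 1])
        worst = max(worst, abs(traj[i][0] - pred[0]), abs(traj[i][1] - pred[1]))
    return worst


def fixed_point_in_time(x0, n_steps, step=verlet_step, tol=1e-12):
    """Solve F(X) = 0 by sweeps; each sweep updates all time steps at once."""
    traj = [x0] * (n_steps + 1)           # crude initial guess for the whole path
    sweeps = 0
    while residual_norm(traj, step) > tol:
        # Each new entry uses only the previous sweep, so the list
        # comprehension below could be evaluated on n_steps processors.
        traj = [x0] + [step(traj[i - 1]) for i in range(1, n_steps + 1)]
        sweeps += 1
    return traj, sweeps


parallel_traj, n_sweeps = fixed_point_in_time((1.0, 0.0), 8)
```

This naive sweep needs up to M sweeps for M time steps, so by itself it gives no speedup; it only shows the structure of the problem that faster root finders, such as the quasi-Newton schemes mentioned above, are applied to.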
Adaptive selection and validation of models of complex systems in the presence of uncertainty
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farrell-Maupin, Kathryn; Oden, J. T.
This study describes versions of OPAL, the Occam-Plausibility Algorithm in which the use of Bayesian model plausibilities is replaced with information theoretic methods, such as the Akaike Information Criterion and the Bayes Information Criterion. Applications to complex systems of coarse-grained molecular models approximating atomistic models of polyethylene materials are described. All of these model selection methods take into account uncertainties in the model, the observational data, the model parameters, and the predicted quantities of interest. A comparison of the models chosen by Bayesian model selection criteria and those chosen by the information-theoretic criteria is given.
Adaptive selection and validation of models of complex systems in the presence of uncertainty
Farrell-Maupin, Kathryn; Oden, J. T.
2017-08-01
This study describes versions of OPAL, the Occam-Plausibility Algorithm in which the use of Bayesian model plausibilities is replaced with information theoretic methods, such as the Akaike Information Criterion and the Bayes Information Criterion. Applications to complex systems of coarse-grained molecular models approximating atomistic models of polyethylene materials are described. All of these model selection methods take into account uncertainties in the model, the observational data, the model parameters, and the predicted quantities of interest. A comparison of the models chosen by Bayesian model selection criteria and those chosen by the information-theoretic criteria is given.
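For readers unfamiliar with the two information-theoretic criteria named above, the sketch below computes them from a model's maximized log-likelihood using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, and picks the candidate with the lowest score. It is not an implementation of OPAL; the candidate names and numbers are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayes (Schwarz) Information Criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_model(candidates, n_obs, criterion="aic"):
    """Return (best_name, scores) for the lowest-scoring candidate.

    candidates maps a model name to (maximized log-likelihood, parameter count).
    """
    scores = {}
    for name, (log_l, k) in candidates.items():
        scores[name] = aic(log_l, k) if criterion == "aic" else bic(log_l, k, n_obs)
    return min(scores, key=scores.get), scores


# Hypothetical coarse-grained candidates: the finer model fits slightly
# better but pays for five extra parameters.
toy_candidates = {"coarse": (-120.0, 3), "fine": (-118.5, 8)}
best_aic, aic_scores = select_model(toy_candidates, n_obs=50, criterion="aic")
best_bic, bic_scores = select_model(toy_candidates, n_obs=50, criterion="bic")
```

Both criteria penalize parameter count; BIC penalizes it more strongly once the number of observations exceeds about eight, so it tends to favor simpler models.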
Quantum Dynamics in Continuum for Proton Transport I: Basic Formulation.
Chen, Duan; Wei, Guo-Wei
2013-01-01
Proton transport is one of the most important and interesting phenomena in living cells. The present work proposes a multiscale/multiphysics model for the understanding of the molecular mechanism of proton transport in transmembrane proteins. We describe proton dynamics quantum mechanically via a density functional approach while implicitly modelling other solvent ions as a dielectric continuum to reduce the number of degrees of freedom. The densities of all other ions in the solvent are assumed to obey the Boltzmann distribution. The impact of protein molecular structure and its charge polarization on the proton transport is considered explicitly at the atomic level. We formulate a total free energy functional to put proton kinetic and potential energies as well as electrostatic energy of all ions on an equal footing. The variational principle is employed to derive nonlinear governing equations for the proton transport system. A generalized Poisson-Boltzmann equation and a Kohn-Sham equation are obtained from the variational framework. Theoretical formulations for the proton density and proton conductance are constructed based on fundamental principles. The molecular surface of the channel protein is utilized to split the discrete protein domain and the continuum solvent domain, and to facilitate the multiscale discrete/continuum/quantum descriptions. A number of mathematical algorithms, including the Dirichlet to Neumann mapping, matched interface and boundary method, Gummel iteration, and Krylov space techniques are utilized to implement the proposed model in a computationally efficient manner. The Gramicidin A (GA) channel is used to demonstrate the performance of the proposed proton transport model and validate the efficiency of the proposed mathematical algorithms. The electrostatic characteristics of the GA channel are analyzed with a wide range of model parameters. The proton conductances are studied over a number of applied voltages and reference concentrations.
A comparison with experimental data verifies the present model predictions and validates the proposed model.
The Spike-and-Slab Lasso Generalized Linear Models for Prediction and Associated Genes Detection.
Tang, Zaixiang; Shen, Yueping; Zhang, Xinyan; Yi, Nengjun
2017-01-01
Large-scale "omics" data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, there are considerable challenges in analyzing high-dimensional molecular data, including the large number of potential molecular predictors, limited number of samples, and small effect of each predictor. We propose new Bayesian hierarchical generalized linear models, called spike-and-slab lasso GLMs, for prognostic prediction and detection of associated genes using large-scale molecular data. The proposed model employs a spike-and-slab mixture double-exponential prior for coefficients that can induce weak shrinkage on large coefficients, and strong shrinkage on irrelevant coefficients. We have developed a fast and stable algorithm to fit large-scale hierarchal GLMs by incorporating expectation-maximization (EM) steps into the fast cyclic coordinate descent algorithm. The proposed approach integrates nice features of two popular methods, i.e., penalized lasso and Bayesian spike-and-slab variable selection. The performance of the proposed method is assessed via extensive simulation studies. The results show that the proposed approach can provide not only more accurate estimates of the parameters, but also better prediction. We demonstrate the proposed procedure on two cancer data sets: a well-known breast cancer data set consisting of 295 tumors, and expression data of 4919 genes; and the ovarian cancer data set from TCGA with 362 tumors, and expression data of 5336 genes. Our analyses show that the proposed procedure can generate powerful models for predicting outcomes and detecting associated genes. The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/). Copyright © 2017 by the Genetics Society of America.
Oscillatory regulation of Hes1: Discrete stochastic delay modelling and simulation.
Barrio, Manuel; Burrage, Kevin; Leier, André; Tian, Tianhai
2006-09-08
Discrete stochastic simulations are a powerful tool for understanding the dynamics of chemical kinetics when there are small-to-moderate numbers of certain molecular species. In this paper we introduce delays into the stochastic simulation algorithm, thus mimicking delays associated with transcription and translation. We then show that this process may well explain more faithfully than continuous deterministic models the observed sustained oscillations in expression levels of hes1 mRNA and Hes1 protein.
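The idea of adding delays to the stochastic simulation algorithm can be sketched with a toy model. The example below is one common rejection-style scheme and is not claimed to be the authors' exact algorithm: a production event fires at a constant rate but its product only appears after a fixed delay, mimicking transcription and translation time, while degradation is instantaneous. The model, rate constants, and function name are assumptions for illustration and are not the Hes1 network.

```python
import heapq
import math
import random


def delay_ssa(k_prod, k_deg, delay, t_end, seed=0):
    """Toy delay SSA: delayed production, immediate first-order degradation.

    Returns a list of (time, copy_number) pairs.
    """
    rng = random.Random(seed)
    t, n = 0.0, 0
    pending = []                       # min-heap of scheduled completion times
    history = [(t, n)]
    while True:
        a_prod = k_prod                # initiation propensity
        a_deg = k_deg * n              # degradation propensity
        a_total = a_prod + a_deg
        wait = rng.expovariate(a_total) if a_total > 0 else math.inf
        if pending and pending[0] <= t + wait:
            # A delayed product matures first: jump to that time, update the
            # state, and discard the sampled reaction (valid because the
            # exponential waiting time is memoryless).
            t = heapq.heappop(pending)
            if t > t_end:
                break
            n += 1
        else:
            t += wait
            if t > t_end:
                break
            if rng.random() * a_total < a_prod:
                heapq.heappush(pending, t + delay)   # product appears later
            else:
                n -= 1
        history.append((t, n))
    return history


trajectory = delay_ssa(k_prod=5.0, k_deg=0.1, delay=2.0, t_end=20.0, seed=1)
```

Because nothing can mature before the first delay has elapsed, the copy number stays at zero for the first `delay` time units, a feature that a delay-free simulation of the same rates would not show.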
Hybrid deterministic/stochastic simulation of complex biochemical systems.
Lecca, Paola; Bagagiolo, Fabio; Scarpa, Marina
2017-11-21
In a biological cell, cellular functions and the genetic regulatory apparatus are implemented and controlled by complex networks of chemical reactions involving genes, proteins, and enzymes. Accurate computational models are an indispensable means of understanding the mechanisms behind the evolution of a complex system, which are not always explored with wet lab experiments. To serve their purpose, computational models, however, should be able to describe and simulate the complexity of a biological system in many of its aspects. Moreover, they should be implemented by efficient algorithms requiring the shortest possible execution time, to avoid excessively lengthening the time elapsing between data analysis and any subsequent experiment. Besides the features of their topological structure, the complexity of biological networks also refers to their dynamics, which are often non-linear and stiff. The stiffness is due to the presence of molecular species whose abundance fluctuates by many orders of magnitude. A fully stochastic simulation of a stiff system is computationally time-expensive. On the other hand, continuous models are less costly, but they fail to capture the stochastic behaviour of small populations of molecular species. We introduce a new efficient hybrid stochastic-deterministic computational model and the software tool MoBioS (MOlecular Biology Simulator) implementing it. The mathematical model of MoBioS uses continuous differential equations to describe the deterministic reactions and a Gillespie-like algorithm to describe the stochastic ones. Unlike the majority of current hybrid methods, the MoBioS algorithm divides the set of reactions into fast reactions, moderate reactions, and slow reactions and implements a hysteresis switching between the stochastic model and the deterministic model. Fast reactions are approximated as continuous-deterministic processes and modelled by deterministic rate equations.
Moderate reactions are those whose reaction waiting time is greater than the fast reaction waiting time but smaller than the slow reaction waiting time. A moderate reaction is approximated as a stochastic (deterministic) process if it was classified as a stochastic (deterministic) process at the time at which it crosses the threshold of low (high) waiting time. A Gillespie First Reaction Method is implemented to select and execute the slow reactions. The performance of MoBioS was tested on a typical example of hybrid dynamics, namely DNA transcription regulation. The simulated dynamic profile of the reagents' abundance and the estimate of the error introduced by the fully deterministic approach were used to evaluate the consistency of the computational model and that of the software tool.
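The Gillespie First Reaction Method mentioned above can be sketched in a few lines: draw a tentative exponential waiting time for every reaction from its propensity, then fire the reaction with the smallest time. The sketch covers only this stochastic selection step, not the hybrid switching or the deterministic part of MoBioS; the toy transcription model and all names are assumptions for illustration.

```python
import math
import random


def first_reaction_step(state, reactions, rng):
    """Pick the reaction with the smallest tentative waiting time.

    reactions is a list of (propensity_function, state_change) pairs.
    Returns (waiting_time, new_state); the waiting time is infinite when
    no reaction can fire.
    """
    best_wait, best_change = math.inf, None
    for propensity, change in reactions:
        a = propensity(state)
        if a <= 0.0:
            continue                       # this reaction cannot fire now
        wait = rng.expovariate(a)          # tentative time ~ Exp(a)
        if wait < best_wait:
            best_wait, best_change = wait, change
    if best_change is None:
        return math.inf, state
    new_state = dict(state)
    for species, delta in best_change.items():
        new_state[species] += delta
    return best_wait, new_state


def simulate(state, reactions, t_end, seed=0):
    """Repeat first-reaction steps until t_end; returns [(time, state), ...]."""
    rng = random.Random(seed)
    t, trajectory = 0.0, [(0.0, dict(state))]
    while True:
        wait, new_state = first_reaction_step(state, reactions, rng)
        if t + wait > t_end:
            return trajectory
        t, state = t + wait, new_state
        trajectory.append((t, dict(state)))


# Toy transcription model (hypothetical rates): gene -> gene + mRNA, mRNA -> 0.
TOY_REACTIONS = [
    (lambda s: 0.5 * s["gene"], {"mRNA": +1}),
    (lambda s: 0.1 * s["mRNA"], {"mRNA": -1}),
]
toy_run = simulate({"gene": 1, "mRNA": 0}, TOY_REACTIONS, t_end=50.0, seed=3)
```

Drawing one random number per reaction at every step is simple but costly for large networks, which is part of the motivation for treating fast reactions deterministically in hybrid schemes.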
Simulated quantum computation of molecular energies.
Aspuru-Guzik, Alán; Dutoi, Anthony D; Love, Peter J; Head-Gordon, Martin
2005-09-09
The calculation time for the energy of atoms and molecules scales exponentially with system size on a classical computer but polynomially using quantum algorithms. We demonstrate that such algorithms can be applied to problems of chemical interest using modest numbers of quantum bits. Calculations of the water and lithium hydride molecular ground-state energies have been carried out on a quantum computer simulator using a recursive phase-estimation algorithm. The recursive algorithm reduces the number of quantum bits required for the readout register from about 20 to 4. Mappings of the molecular wave function to the quantum bits are described. An adiabatic method for the preparation of a good approximate ground-state wave function is described and demonstrated for a stretched hydrogen molecule. The number of quantum bits required scales linearly with the number of basis functions, and the number of gates required grows polynomially with the number of quantum bits.
Single-pass incremental force updates for adaptively restrained molecular dynamics.
Singh, Krishna Kant; Redon, Stephane
2018-03-30
Adaptively restrained molecular dynamics (ARMD) allows users to perform more integration steps in wall-clock time by switching positional degrees of freedom on and off. This article presents new, single-pass incremental force update algorithms to efficiently simulate a system using ARMD. We assessed different algorithms for speedup measurements and implemented them in the LAMMPS MD package. We validated the single-pass incremental force update algorithm on four different benchmarks using diverse pair potentials. The proposed algorithm allows us to simulate a system faster than traditional MD in both NVE and NVT ensembles. Moreover, ARMD using the new single-pass algorithm speeds up the convergence of observables in wall-clock time. © 2017 Wiley Periodicals, Inc.
Molecular dynamics simulations on networks of heparin and collagen.
Kulke, Martin; Geist, Norman; Friedrichs, Wenke; Langel, Walter
2017-06-01
Synthetic scaffolds containing collagen (Type I) are of increasing interest for bone tissue engineering, especially for highly porous biomaterials in combination with glycosaminoglycans. In experiments the integration of heparin during the fibrillogenesis resulted in different types of collagen fibrils, but models for this aggregation on a molecular scale were only tentative. We conducted molecular dynamics simulations investigating the binding of heparin to collagen and the influence of the telopeptides during collagen aggregation. This aims at explaining experimental findings on a molecular level. Novel structures for N- and C-telopeptides were developed with the TIGER2 replica exchange algorithm and dihedral principal component analysis. We present an extended statistical analysis of the mainly electrostatic interaction between heparin and collagen and identify several binding sites. Finally, we propose a molecular mechanism for the influence of glycosaminoglycans on the morphology of collagen fibrils. Proteins 2017; 85:1119-1130. © 2017 Wiley Periodicals, Inc.
Cho-Vega, Jeong Hee
2016-07-01
Atypical spitzoid tumors are a morphologically diverse group of rare melanocytic lesions most frequently seen in children and young adults. As atypical spitzoid tumors bear a striking resemblance to Spitz nevus and spitzoid melanomas clinically and histopathologically, it is crucial to determine their malignant potential and predict their clinical behavior. To date, many researchers have attempted to differentiate atypical spitzoid tumors from unequivocal melanomas based on morphological, immunohistochemical, and molecular diagnostic differences. A diagnostic algorithm is proposed here to assess the malignant potential of atypical spitzoid tumors by using a combination of immunohistochemical and cytogenetic/molecular tests. Together with classical morphological evaluation, this algorithm includes a set of immunohistochemistry assays (p16(Ink4a), a dual-color Ki67/MART-1, and HMB45), fluorescence in situ hybridization (FISH) with five probes (6p25, 8q24, 11q13, CEN9, and 9p21), and an array-based comparative genomic hybridization. This review discusses details of the algorithm, the rationale of each test used in the algorithm, and the utility of this algorithm in routine dermatopathology practice. This algorithmic approach will provide a comprehensive diagnostic tool that complements conventional histological criteria and will significantly contribute to improving the diagnosis and prediction of the clinical behavior of atypical spitzoid tumors.
Chattopadhyay, Aditya; Zheng, Min; Waller, Mark Paul; Priyakumar, U Deva
2018-05-23
Knowledge of the structure and dynamics of biomolecules is essential for elucidating the underlying mechanisms of biological processes. Given the stochastic nature of many biological processes, like protein unfolding, it's almost impossible that two independent simulations will generate the exact same sequence of events, which makes direct analysis of simulations difficult. Statistical models like Markov Chains, transition networks etc. help in shedding some light on the mechanistic nature of such processes by predicting long-time dynamics of these systems from short simulations. However, such methods fall short in analyzing trajectories with partial or no temporal information, for example, replica exchange molecular dynamics or Monte Carlo simulations. In this work we propose a probabilistic algorithm, borrowing concepts from graph theory and machine learning, to extract reactive pathways from molecular trajectories in the absence of temporal data. A suitable vector representation was chosen to represent each frame in the macromolecular trajectory (as a series of interaction and conformational energies) and dimensionality reduction was performed using principal component analysis (PCA). The trajectory was then clustered using a density-based clustering algorithm, where each cluster represents a metastable state on the potential energy surface (PES) of the biomolecule under study. A graph was created with these clusters as nodes with the edges learnt using an iterative expectation maximization algorithm. The most reactive path is conceived as the widest path along this graph. We have tested our method on RNA hairpin unfolding trajectory in aqueous urea solution. Our method makes the understanding of the mechanism of unfolding in RNA hairpin molecule more tractable. As this method doesn't rely on temporal data it can be used to analyze trajectories from Monte Carlo sampling techniques and replica exchange molecular dynamics (REMD).
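The "widest path" step mentioned in the abstract above can be illustrated with a small sketch. It assumes the graph is given as a plain dictionary of edge weights (for example, learned transition probabilities between clusters); the function name `widest_path` and the toy weights are illustrative and not taken from the paper. The search is a Dijkstra-style traversal that maximizes the smallest edge weight along the path.

```python
import heapq


def widest_path(graph, source, target):
    """Return (bottleneck, path) for the widest path from source to target.

    graph maps node -> {neighbor: positive edge weight}. The width of a
    path is its smallest edge weight; the widest path maximizes that width.
    """
    best = {source: float("inf")}   # best known bottleneck per node
    parent = {source: None}
    heap = [(-best[source], source)]  # max-heap via negated widths
    done = set()
    while heap:
        neg_width, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for nbr, weight in graph.get(node, {}).items():
            width = min(-neg_width, weight)
            if width > best.get(nbr, 0.0):
                best[nbr] = width
                parent[nbr] = node
                heapq.heappush(heap, (-width, nbr))
    if target not in best:
        return 0.0, []
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return best[target], path[::-1]


# Hypothetical four-state graph; weights are made up for illustration.
toy_graph = {
    "A": {"B": 0.9, "C": 0.4},
    "B": {"D": 0.3, "C": 0.8},
    "C": {"D": 0.7},
    "D": {},
}
bottleneck, route = widest_path(toy_graph, "A", "D")
```

In this toy graph the route through B and C is preferred over the direct A to C edge because its weakest link is stronger.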
Modeling and simulation of the debonding process of composite solid propellants
NASA Astrophysics Data System (ADS)
Feng, Tao; Xu, Jin-sheng; Han, Long; Chen, Xiong
2017-07-01
In order to study the damage evolution law of composite solid propellants, the molecular dynamics particle filled algorithm was used to establish the mesoscopic structure model of HTPB (hydroxyl-terminated polybutadiene) propellants. The cohesive element method was employed for the adhesion interface between the AP (ammonium perchlorate) particles and the HTPB matrix, and the bilinear cohesive zone model was used to describe the mechanical response of the interface elements. The inversion analysis method based on the Hooke-Jeeves optimization algorithm was employed to identify the parameters of the cohesive zone model (CZM) of the particle/binder interface. Then, the optimized parameters were applied to the commercial finite element software ABAQUS to simulate the damage evolution process for the AP particles and HTPB matrix, including damage initiation, development, gathering, and macroscopic cracking. Finally, the simulated stress-strain curve was compared with the experimental curves. The results show that the bilinear cohesive zone model can accurately describe the debonding and fracture process between the AP particles and the HTPB matrix under uniaxial tension loading.
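For readers unfamiliar with the bilinear cohesive zone model named above, the following is a minimal sketch of its traction-separation law under monotonic opening. The parameter names (`delta0`, `delta_f`, `t_max`) and the numerical values are assumptions chosen for illustration; they are not the parameters identified in the paper, and unloading is not modeled.

```python
def bilinear_traction(delta, delta0, delta_f, t_max):
    """Traction for a bilinear cohesive law under monotonic opening.

    delta   : current interface separation
    delta0  : separation at damage initiation (peak traction)
    delta_f : separation at complete debonding
    t_max   : peak traction (interface strength)
    """
    if delta <= 0.0:
        return 0.0
    if delta <= delta0:
        # Linear elastic branch up to the peak traction.
        return t_max * delta / delta0
    if delta < delta_f:
        # Linear softening branch down to zero traction.
        return t_max * (delta_f - delta) / (delta_f - delta0)
    return 0.0  # fully debonded


def fracture_energy(delta_f, t_max):
    """Area under the bilinear curve (a triangle)."""
    return 0.5 * t_max * delta_f


# Hypothetical parameters, in arbitrary consistent units.
peak = bilinear_traction(0.01, delta0=0.01, delta_f=0.05, t_max=2.0)
half = bilinear_traction(0.03, delta0=0.01, delta_f=0.05, t_max=2.0)
```

An inverse analysis such as the one described would adjust these three parameters until a simulated stress-strain curve matches the measured one.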
Camacho, Carlos J
2005-08-01
The CAPRI-II experiment added an extra level of complexity to the problem of predicting protein-protein interactions by including 5 targets for which participants had to build or complete the 3-dimensional (3D) structure of either the receptor or ligand based on the structure of a close homolog. In this article, we describe how modeling key side-chains using molecular dynamics (MD) in explicit solvent improved the recognition of the binding region of a free energy-based computational docking method. In particular, we show that MD is able to predict with relatively high accuracy the rotamer conformation of the anchor side-chains important for molecular recognition as suggested by Rajamani et al. (Proc Natl Acad Sci USA 2004;101:11287-11292). As expected, the conformations are some of the most common rotamers for the given residue, while latch side-chains that undergo induced fit upon binding are forced into less common conformations. Using these models as starting conformations in conjunction with the rigid-body docking server ClusPro and the flexible docking algorithm SmoothDock, we produced valuable predictions for 6 of the 9 targets in CAPRI-II, missing only the 3 targets that underwent significant structural rearrangements upon binding. We also show that our free energy-based scoring function, consisting of the sum of van der Waals, Coulombic electrostatic with a distance-dependent dielectric, and desolvation free energy, successfully discriminates the nativelike conformation of our submitted predictions. The latter emphasizes the critical role that thermodynamics plays in our methodology, and validates the generality of the algorithm to predict protein interactions.
Engelhardt, Benjamin; Kschischo, Maik; Fröhlich, Holger
2017-06-01
Ordinary differential equations (ODEs) are a popular approach to quantitatively model molecular networks based on biological knowledge. However, such knowledge is typically restricted. Wrongly modelled biological mechanisms as well as relevant external influence factors that are not included in the model are likely to manifest in major discrepancies between model predictions and experimental data. Finding the exact reasons for such observed discrepancies can be quite challenging in practice. In order to address this issue, we suggest a Bayesian approach to estimate hidden influences in ODE-based models. The method can distinguish between exogenous and endogenous hidden influences. Thus, we can detect wrongly specified as well as missed molecular interactions in the model. We demonstrate the performance of our Bayesian dynamic elastic-net with several ordinary differential equation models from the literature, such as human JAK-STAT signalling, information processing at the erythropoietin receptor, isomerization of liquid α-pinene, G protein cycling in yeast and UV-B triggered signalling in plants. Moreover, we investigate a set of commonly known network motifs and a gene-regulatory network. Altogether our method supports the modeller in an algorithmic manner to identify possible sources of errors in ODE-based models on the basis of experimental data. © 2017 The Author(s).
Visualizing functional motions of membrane transporters with molecular dynamics simulations.
Shaikh, Saher A; Li, Jing; Enkavi, Giray; Wen, Po-Chao; Huang, Zhijian; Tajkhorshid, Emad
2013-01-29
Computational modeling and molecular simulation techniques have become an integral part of modern molecular research. Various areas of molecular sciences continue to benefit from, indeed rely on, the unparalleled spatial and temporal resolutions offered by these technologies, to provide a more complete picture of the molecular problems at hand. Because of the continuous development of more efficient algorithms harvesting ever-expanding computational resources, and the emergence of more advanced and novel theories and methodologies, the scope of computational studies has expanded significantly over the past decade, now including much larger molecular systems and far more complex molecular phenomena. Among the various computer modeling techniques, the application of molecular dynamics (MD) simulation and related techniques has particularly drawn attention in biomolecular research, because of the ability of the method to describe the dynamical nature of the molecular systems and thereby to provide a more realistic representation, which is often needed for understanding fundamental molecular properties. The method has proven to be remarkably successful in capturing molecular events and structural transitions highly relevant to the function and/or physicochemical properties of biomolecular systems. Herein, after a brief introduction to the method of MD, we use a number of membrane transport proteins studied in our laboratory as examples to showcase the scope and applicability of the method and its power in characterizing molecular motions of various magnitudes and time scales that are involved in the function of this important class of membrane proteins.
PMID:23298176
Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm
Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong
2016-01-01
In single-particle cryo-electron microscopy (cryo-EM), the K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, the traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development of clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alternative to the traditional K-means algorithm in single-particle cryo-EM analysis. PMID:27959895
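The abstract does not give the form of the adaptive constraint term, so the sketch below uses a deliberately simple stand-in: during assignment, each point pays its squared distance to a centroid plus a penalty `lam` times the number of points already placed in that class. This is an assumption made only to show how a size term in the objective can even out class membership; it is not the authors' constraint, and the function name and toy data are hypothetical.

```python
def size_penalized_kmeans(points, centroids, lam, n_iter=10):
    """K-means variant whose assignment cost includes a class-size penalty.

    points    : list of coordinate tuples
    centroids : list of initial centroid tuples (one per class)
    lam       : penalty per point already assigned to a class (0 = plain K-means)
    """
    k = len(centroids)
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    sizes = [0] * k
    for _ in range(n_iter):
        sizes = [0] * k
        # Assignment step: squared distance plus size penalty.
        for i, p in enumerate(points):
            costs = [
                sum((a - b) ** 2 for a, b in zip(p, c)) + lam * sizes[j]
                for j, c in enumerate(centroids)
            ]
            j_best = min(range(k), key=costs.__getitem__)
            labels[i] = j_best
            sizes[j_best] += 1
        # Update step: centroid = mean of members (empty classes keep theirs).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                centroids[j] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return labels, centroids, sizes


# Toy 1D data: four points near 0 and two near 10.
data = [(0.0,), (0.1,), (0.2,), (0.3,), (10.0,), (10.1,)]
start = [(0.0,), (10.0,)]
_, _, plain_sizes = size_penalized_kmeans(data, start, lam=0.0)
_, _, balanced_sizes = size_penalized_kmeans(data, start, lam=1e6)
```

With `lam=0` the routine reduces to ordinary K-means; a very large `lam` forces nearly equal class sizes at the expense of geometric fit, so a useful constraint has to sit between these extremes.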
NASA Astrophysics Data System (ADS)
Ahn, Surl-Hee; Grate, Jay W.; Darve, Eric F.
2017-08-01
Molecular dynamics simulations are useful in obtaining thermodynamic and kinetic properties of bio-molecules, but they are limited by the time scale barrier. That is, we may not obtain properties efficiently because we need to run microseconds or longer simulations using femtosecond time steps. To overcome this time scale barrier, we can use the weighted ensemble (WE) method, a powerful enhanced sampling method that efficiently samples thermodynamic and kinetic properties. However, the WE method requires an appropriate partitioning of phase space into discrete macrostates, which can be problematic when we have a high-dimensional collective space or when little is known a priori about the molecular system. Hence, we developed a new WE-based method, called the "Concurrent Adaptive Sampling (CAS) algorithm," to tackle these issues. The CAS algorithm is not constrained to use only one or two collective variables, unlike most reaction coordinate-dependent methods. Instead, it can use a large number of collective variables and adaptive macrostates to enhance the sampling in the high-dimensional space. This is especially useful for systems in which we do not know what the right reaction coordinates are, in which case we can use many collective variables to sample conformations and pathways. In addition, a clustering technique based on the committor function is used to accelerate sampling the slowest process in the molecular system. In this paper, we introduce the new method and show results from two-dimensional models and bio-molecules, specifically penta-alanine and a triazine trimer.
Good, Andrew C; Hermsmeier, Mark A
2007-01-01
Research into the advancement of computer-aided molecular design (CAMD) has a tendency to focus on the discipline of algorithm development. Such efforts are often wrought to the detriment of the data set selection and analysis used in said algorithm validation. Here we highlight the potential problems this can cause in the context of druglikeness classification. More rigorous efforts are applied to the selection of decoy (nondruglike) molecules from the ACD. Comparisons are made between model performance using the standard technique of random test set creation with test sets derived from explicit ontological separation by drug class. The dangers of viewing druglike space as sufficiently coherent to permit simple classification are highlighted. In addition the issues inherent in applying unfiltered data and random test set selection to (Q)SAR models utilizing large and supposedly heterogeneous databases are discussed.
Quantum autoencoders for efficient compression of quantum data
NASA Astrophysics Data System (ADS)
Romero, Jonathan; Olson, Jonathan P.; Aspuru-Guzik, Alan
2017-12-01
Classical autoencoders are neural networks that can learn efficient low-dimensional representations of data in higher-dimensional space. The task of an autoencoder is, given an input x, to map x to a lower dimensional point y such that x can likely be recovered from y. The structure of the underlying autoencoder network can be chosen to represent the data on a smaller dimension, effectively compressing the input. Inspired by this idea, we introduce the model of a quantum autoencoder to perform similar tasks on quantum data. The quantum autoencoder is trained to compress a particular data set of quantum states, where a classical compression algorithm cannot be employed. The parameters of the quantum autoencoder are trained using classical optimization algorithms. We show an example of a simple programmable circuit that can be trained as an efficient autoencoder. We apply our model in the context of quantum simulation to compress ground states of the Hubbard model and molecular Hamiltonians.
Pérez-Garrido, Alfonso; Morales Helguera, Aliuska; Abellán Guillén, Adela; Cordeiro, M Natália D S; Garrido Escudero, Amalio
2009-01-15
This paper reports a QSAR study for predicting the complexation of a large and heterogeneous variety of substances (233 organic compounds) with beta-cyclodextrins (beta-CDs). Several different theoretical molecular descriptors, calculated solely from the molecular structure of the compounds under investigation, and an efficient variable selection procedure, the Genetic Algorithm, led to models with satisfactory global accuracy and predictivity. The best final QSAR model is based on topological descriptors and also offers a reasonable interpretation. This QSAR model was able to explain ca. 84% of the variance in the experimental activity, and displayed very good internal cross-validation statistics and predictivity on external data. It shows that the driving forces for CD complexation are mainly hydrophobic and steric (van der Waals) interactions. Thus, the results of our study provide a valuable tool for future screening and priority testing of beta-CD guest molecules.
NASA Astrophysics Data System (ADS)
Meng, Luming; Sheong, Fu Kit; Zeng, Xiangze; Zhu, Lizhe; Huang, Xuhui
2017-07-01
Constructing Markov state models from large-scale molecular dynamics simulation trajectories is a promising approach to dissect the kinetic mechanisms of complex chemical and biological processes. Combined with transition path theory, Markov state models can be applied to identify all pathways connecting any conformational states of interest. However, the identified pathways can be too complex to comprehend, especially for multi-body processes where numerous parallel pathways with comparable flux probability often coexist. Here, we have developed a path lumping method to group these parallel pathways into metastable path channels for analysis. We define the similarity between two pathways as the intercrossing flux between them and then apply the spectral clustering algorithm to lump these pathways into groups. We demonstrate the power of our method by applying it to two systems: a 2D-potential consisting of four metastable energy channels and the hydrophobic collapse process of two hydrophobic molecules. In both cases, our algorithm successfully reveals the metastable path channels. We expect this path lumping algorithm to be a promising tool for revealing unprecedented insights into the kinetic mechanisms of complex multi-body processes.
Zhang, Daqing; Xiao, Jianfeng; Zhou, Nannan; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian
2015-01-01
The blood-brain barrier (BBB) is a highly complex physical barrier determining what substances are allowed to enter the brain. Support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR studies. For a successful SVM model, the kernel parameters for SVM and feature subset selection are the most important factors affecting prediction accuracy. In most studies, they are treated as two independent problems, but it has been proven that they could affect each other. We designed and implemented a genetic algorithm (GA) to optimize kernel parameters and feature subset selection for SVM regression and applied it to the BBB penetration prediction. The results show that our GA/SVM model is more accurate than other currently available log BB models. Therefore, optimizing both the SVM parameters and the feature subset simultaneously with a genetic algorithm is a better approach than other methods that treat the two problems separately. Analysis of our log BB model suggests that the carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important roles in BBB penetration. Among those properties relevant to BBB penetration, lipophilicity could enhance the BBB penetration while all the others are negatively correlated with BBB penetration. PMID:26504797
An infrastructure to mine molecular descriptors for ligand selection on virtual screening.
Seus, Vinicius Rosa; Perazzo, Giovanni Xavier; Winck, Ana T; Werhli, Adriano V; Machado, Karina S
2014-01-01
The evaluation of receptor-ligand interactions is one important step in rational drug design. The databases that provide the structures of the ligands are growing on a daily basis. This makes it impossible to test all the ligands for a target receptor. Hence, a ligand selection step before testing is needed. One possible approach is to evaluate a set of molecular descriptors. With the aim of describing the characteristics of promising compounds for a specific receptor, we introduce a data warehouse-based infrastructure to mine molecular descriptors for virtual screening (VS). We performed experiments that take the HIV-1 protease receptor as the target, together with different compounds for this protein. A set of 9 molecular descriptors is taken as the predictive attributes and the free energy of binding (FEB) is taken as the target attribute. By applying the J48 algorithm to the data we obtain decision tree models that achieve up to 84% accuracy. The models indicate which molecular descriptors, and which of their values, are relevant to good FEB results. Using their rules we performed ligand selection on the ZINC database. Our results show an important reduction in the number of ligands selected for VS experiments; for instance, the best selection model picked only 0.21% of the total amount of drug-like ligands.
On-the-fly Numerical Surface Integration for Finite-Difference Poisson-Boltzmann Methods.
Cai, Qin; Ye, Xiang; Wang, Jun; Luo, Ray
2011-11-01
Most implicit solvation models require the definition of a molecular surface as the interface that separates the solute in atomic detail from the solvent approximated as a continuous medium. Commonly used surface definitions include the solvent accessible surface (SAS), the solvent excluded surface (SES), and the van der Waals surface. In this study, we present an efficient numerical algorithm to compute the SES and SAS areas to facilitate the applications of finite-difference Poisson-Boltzmann methods in biomolecular simulations. Different from previous numerical approaches, our algorithm is physics-inspired and intimately coupled to the finite-difference Poisson-Boltzmann methods to fully take advantage of its existing data structures. Our analysis shows that the algorithm can achieve very good agreement with the analytical method in the calculation of the SES and SAS areas. Specifically, in our comprehensive test of 1,555 molecules, the average unsigned relative error is 0.27% in the SES area calculations and 1.05% in the SAS area calculations at the grid spacing of 1/2Å. In addition, a systematic correction analysis can be used to improve the accuracy for the coarse-grid SES area calculations, with the average unsigned relative error in the SES areas reduced to 0.13%. These validation studies indicate that the proposed algorithm can be applied to biomolecules over a broad range of sizes and structures. Finally, the numerical algorithm can also be adapted to evaluate the surface integral of either a vector field or a scalar field defined on the molecular surface for additional solvation energetics and force calculations.
Krishnaraj, R Navanietha; Chandran, Saravanan; Pal, Parimal; Berchmans, Sheela
2013-12-01
There is an immense interest among the researchers to identify new herbicides which are effective against the herbs without affecting the environment. In this work, photosynthetic pigments are used as the ligands to predict their herbicidal activity. The enzyme 5-enolpyruvylshikimate-3-phosphate (EPSP) synthase is a good target for the herbicides. Homology modeling of the target enzyme is done using Modeler 9.11 and the model is validated. Docking studies were performed with AutoDock Vina algorithm to predict the binding of the natural pigments such as β-carotene, chlorophyll a, chlorophyll b, phycoerythrin and phycocyanin to the target. β-carotene, phycoerythrin and phycocyanin have higher binding energies indicating the herbicidal activity of the pigments. This work reports a procedure to screen herbicides with computational molecular approach. These pigments will serve as potential bioherbicides in the future.
A generalization of algebraic surface drawing
NASA Technical Reports Server (NTRS)
Blinn, J. F.
1982-01-01
An implicit surface mathematical description of three-dimensional space is defined in terms of all points which satisfy some equation F(x, y, z) equals 0. This form is ideal for space-shaded picture drawing, where the coordinates are substituted for x and y and the equation is solved for z. A new algorithm is presented which is applicable to functional forms other than those of first- and second-order polynomial functions, such as the summation of several Gaussian density distributions. The algorithm was created in order to model electron density maps of molecular structures, but is shown to be capable of generating shapes of esthetic interest.
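The idea of substituting pixel coordinates for x and y and solving F(x, y, z) = 0 for z can be sketched as follows for a sum of Gaussian densities. This is a simple scan-and-bisect illustration, not the algorithm from the paper; the parameterization of each "atom" as `(cx, cy, cz, a, b)`, the threshold, and the scan range are all assumptions.

```python
import math


def density(x, y, z, atoms):
    """Sum of Gaussian blobs; each atom is (cx, cy, cz, a, b)."""
    return sum(
        b * math.exp(-a * ((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2))
        for cx, cy, cz, a, b in atoms
    )


def first_surface_z(x, y, atoms, threshold,
                    z_near=5.0, z_far=-5.0, steps=200, tol=1e-9):
    """First z (scanning from z_near toward z_far) where density = threshold.

    Returns None if the ray through pixel (x, y) never enters the surface.
    """
    def f(z):
        return density(x, y, z, atoms) - threshold

    dz = (z_far - z_near) / steps
    z_prev, f_prev = z_near, f(z_near)
    for i in range(1, steps + 1):
        z_cur = z_near + i * dz
        f_cur = f(z_cur)
        if f_prev < 0.0 <= f_cur:
            # Bracketed a crossing from outside to inside: refine by bisection.
            lo, hi = z_prev, z_cur  # f(lo) < 0, f(hi) >= 0
            while abs(hi - lo) > tol:
                mid = 0.5 * (lo + hi)
                if f(mid) < 0.0:
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)
        z_prev, f_prev = z_cur, f_cur
    return None


# One blob at the origin; with this threshold the surface is a unit sphere.
single_atom = [(0.0, 0.0, 0.0, 1.0, 1.0)]
z_hit = first_surface_z(0.0, 0.0, single_atom, threshold=math.exp(-1.0))
z_miss = first_surface_z(2.0, 0.0, single_atom, threshold=math.exp(-1.0))
```

Adding more entries to the atom list makes neighboring blobs merge smoothly, which is what makes this functional form attractive for molecular shapes.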
Coarse-grained molecular dynamics simulations of polymerization with forward and backward reactions.
Krajniak, Jakub; Zhang, Zidan; Pandiyan, Sudharsan; Nies, Eric; Samaey, Giovanni
2018-06-11
We develop novel parallel algorithms that allow molecular dynamics simulations in which byproduct molecules are created and removed because of the chemical reactions during the molecular dynamics simulation. To prevent large increases in the potential energy, we introduce the byproduct molecules smoothly by changing the non-bonded interactions gradually. To simulate complete equilibrium reactions, we allow the byproduct molecules to attack and destroy created bonds. Modeling of such reactions is, for instance, important to study the pore formation due to the presence of e.g. water molecules or the development of polymer morphology during the process of splitting off byproduct molecules. Another concept that could be studied is the degradation of polymeric materials, a very important topic in the recycling of polymer waste. We illustrate the method by simulating the polymerization of polyethylene terephthalate (PET) at the coarse-grained level as an example of a polycondensation reaction with water as a byproduct. The algorithms are implemented in a publicly available software package and are easily accessible using a domain-specific language that describes chemical reactions in an input configuration file. © 2018 Wiley Periodicals, Inc.
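One common way to switch on non-bonded interactions gradually is to scale the pair potential by a coupling parameter that ramps from 0 to 1 over a number of steps. The sketch below assumes a linear ramp and a plain Lennard-Jones potential in reduced units; the abstract does not state which functional form the authors use, so both choices and the function names are illustrative only.

```python
def coupling_lambda(step, n_ramp):
    """Linear ramp from 0 to 1 over n_ramp steps after the molecule appears."""
    if n_ramp <= 0:
        return 1.0
    return min(1.0, max(0.0, step / n_ramp))


def scaled_lj(r, lam, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy scaled by the coupling parameter lam."""
    sr6 = (sigma / r) ** 6
    return lam * 4.0 * epsilon * (sr6 * sr6 - sr6)


# Energy at the LJ minimum, halfway through a 100-step ramp.
r_min = 2.0 ** (1.0 / 6.0)
lam_half = coupling_lambda(50, 100)
e_half = scaled_lj(r_min, lam_half)
```

Because the energy scales with the coupling parameter, a newly created byproduct that overlaps a neighbor contributes only a small force at first, which avoids a sudden jump in potential energy.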
Drawert, Brian; Engblom, Stefan; Hellander, Andreas
2012-06-22
Experiments in silico using stochastic reaction-diffusion models have emerged as an important tool in molecular systems biology. Designing computational software for such applications poses several challenges. Firstly, realistic lattice-based modeling for biological applications requires a consistent way of handling complex geometries, including curved inner and outer boundaries. Secondly, spatiotemporal stochastic simulations are computationally expensive due to the fast time scales of individual reaction and diffusion events when compared to the biological phenomena of actual interest. We therefore argue that simulation software needs to be both computationally efficient, employing sophisticated algorithms, and at the same time flexible in order to meet present and future needs of increasingly complex biological modeling. We have developed URDME, a flexible software framework for general stochastic reaction-transport modeling and simulation. URDME uses Unstructured triangular and tetrahedral meshes to resolve general geometries, and relies on the Reaction-Diffusion Master Equation formalism to model the processes under study. An interface to a mature geometry and mesh handling external software (Comsol Multiphysics) provides for a stable and interactive environment for model construction. The core simulation routines are logically separated from the model building interface and written in a low-level language for computational efficiency. The connection to the geometry handling software is realized via a Matlab interface which facilitates script computing, data management, and post-processing. For practitioners, the software therefore behaves much as an interactive Matlab toolbox. At the same time, it is possible to modify and extend URDME with newly developed simulation routines.
Since the overall design effectively hides the complexity of managing the geometry and meshes, this means that newly developed methods may be tested in a realistic setting already at an early stage of development. In this paper we demonstrate, in a series of examples with high relevance to the molecular systems biology community, that the proposed software framework is a useful tool for both practitioners and developers of spatial stochastic simulation algorithms. Through the combined efforts of algorithm development and improved modeling accuracy, increasingly complex biological models become feasible to study through computational methods. URDME is freely available at http://www.urdme.org.
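To make the Reaction-Diffusion Master Equation formalism concrete, here is a minimal sketch of a stochastic simulation of pure diffusion on a 1D lattice of voxels. It is far simpler than URDME (which works on unstructured meshes and is not written in Python); the function name, the reflecting boundaries, and the per-direction jump rate `d_jump` (which would be D/h² for diffusion constant D and voxel width h on a uniform lattice) are assumptions for illustration.

```python
import random


def rdme_diffusion_1d(counts, d_jump, t_end, rng):
    """Gillespie simulation of diffusion jumps between neighboring voxels.

    counts : initial number of molecules in each voxel
    d_jump : jump rate per molecule per direction
    t_end  : simulated time horizon
    rng    : random.Random instance (passed in for reproducibility)
    """
    counts = list(counts)
    n = len(counts)
    t = 0.0
    while True:
        # Enumerate jump events and their propensities.
        props, events = [], []
        for i, c in enumerate(counts):
            if c == 0:
                continue
            if i > 0:
                props.append(d_jump * c)
                events.append((i, i - 1))
            if i < n - 1:
                props.append(d_jump * c)
                events.append((i, i + 1))
        total = sum(props)
        if total == 0.0:
            break
        # Exponentially distributed waiting time to the next event.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose one event with probability proportional to its propensity.
        r = rng.random() * total
        acc = 0.0
        src, dst = events[-1]
        for p, (s, d) in zip(props, events):
            acc += p
            if r < acc:
                src, dst = s, d
                break
        counts[src] -= 1
        counts[dst] += 1
    return counts


# 100 molecules start in the leftmost of five voxels.
final_counts = rdme_diffusion_1d([100, 0, 0, 0, 0], d_jump=1.0,
                                 t_end=2.0, rng=random.Random(0))
```

Reactions would be added as further events within each voxel, with propensities computed from the local copy numbers.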
Gruendling, Till; Guilhaus, Michael; Barner-Kowollik, Christopher
2008-09-15
We report on the successful application of size exclusion chromatography (SEC) combined with electrospray ionization mass spectrometry (ESI-MS) and refractive index (RI) detection for the determination of accurate molecular weight distributions of synthetic polymers, corrected for chromatographic band broadening. The presented method makes use of the ability of ESI-MS to accurately depict the peak profiles and retention volumes of individual oligomers eluting from the SEC column, whereas quantitative information on the absolute concentration of oligomers is obtained from the RI-detector only. A sophisticated computational algorithm based on the maximum entropy principle is used to process the data gained by both detectors, yielding an accurate molecular weight distribution, corrected for chromatographic band broadening. Poly(methyl methacrylate) standards with molecular weights up to 10 kDa serve as model compounds. Molecular weight distributions (MWDs) obtained by the maximum entropy procedure are compared to MWDs, which were calculated by a conventional calibration of the SEC-retention time axis with peak retention data obtained from the mass spectrometer. Comparison showed that for the employed chromatographic system, distributions below 7 kDa were only weakly influenced by chromatographic band broadening. However, the maximum entropy algorithm could successfully correct the MWD of a 10 kDa standard for band broadening effects. Molecular weight averages were between 5 and 14% lower than the manufacturer stated data obtained by classical means of calibration. The presented method demonstrates a consistent approach for analyzing data obtained by coupling mass spectrometric detectors and concentration sensitive detectors to polymer liquid chromatography.
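The molecular weight averages compared in the abstract are moments of the distribution. As a short illustration, the standard number-average (Mn) and weight-average (Mw) definitions can be computed from a discrete distribution as below; the function name and the two-species example are made up and unrelated to the paper's data.

```python
def molecular_weight_averages(masses, numbers):
    """Return (Mn, Mw, dispersity) for a discrete distribution.

    masses  : molar mass M_i of each oligomer species
    numbers : number (or molar amount) N_i of chains of each species
    Mn = sum(N_i * M_i) / sum(N_i)
    Mw = sum(N_i * M_i**2) / sum(N_i * M_i)
    """
    s0 = sum(numbers)
    s1 = sum(n * m for n, m in zip(numbers, masses))
    s2 = sum(n * m * m for n, m in zip(numbers, masses))
    mn = s1 / s0
    mw = s2 / s1
    return mn, mw, mw / mn


# Hypothetical mixture: equal numbers of 100 Da and 300 Da chains.
mn, mw, dispersity = molecular_weight_averages([100.0, 300.0], [1.0, 1.0])
```

Band broadening smears the measured distribution, which shifts these averages; this is why a broadening correction changes the reported values.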
Mixed QM/MM molecular electrostatic potentials.
Hernández, B; Luque, F J; Orozco, M
2000-05-01
A new method is presented for the calculation of the Molecular Electrostatic Potential (MEP) in large systems. Based on the mixed Quantum Mechanics/Molecular Mechanics (QM/MM) approach, the method assumes both a quantum and classical description for the molecule, and the calculation of the MEP in the space surrounding the molecule is made using this dual treatment. The MEP at points close to the molecule is computed using a full QM formalism, while a pure classical evaluation of the MEP is used for points located at large distances from the molecule. The algorithm allows the user to select the desired level of accuracy in the MEP, so that the definition of the regions where the MEP is computed at the classical or QM levels is adjusted automatically. The potential use of this QM/MM MEP in molecular modeling studies is discussed.
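The dual treatment can be sketched as a function that evaluates the MEP classically from point charges at distant points and defers to a quantum mechanical evaluation close to the molecule. In this sketch the QM evaluation is a user-supplied callable and the switch is a fixed distance cutoff; both are simplifying assumptions, since the abstract describes regions that are adjusted automatically from a requested accuracy. Atomic units are assumed.

```python
import math


def classical_mep(point, charges):
    """Point-charge MEP: V(r) = sum_i q_i / |r - R_i| (atomic units).

    charges is a list of ((x, y, z), q) pairs.
    """
    return sum(q / math.dist(point, pos) for pos, q in charges)


def hybrid_mep(point, charges, qm_mep, cutoff):
    """Use qm_mep(point) near the molecule, the classical MEP elsewhere.

    qm_mep is a hypothetical callable standing in for a full QM evaluation.
    """
    nearest = min(math.dist(point, pos) for pos, _ in charges)
    if nearest < cutoff:
        return qm_mep(point)
    return classical_mep(point, charges)


# Single unit charge at the origin; the "QM" function is a placeholder.
unit_charge = [((0.0, 0.0, 0.0), 1.0)]
v_far = hybrid_mep((5.0, 0.0, 0.0), unit_charge,
                   qm_mep=lambda p: 42.0, cutoff=3.0)
v_near = hybrid_mep((2.0, 0.0, 0.0), unit_charge,
                    qm_mep=lambda p: 42.0, cutoff=3.0)
```

The saving comes from the fact that most grid points around a large system lie far from every atom, where the point-charge sum is cheap and adequate.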
Molecular Phylogenetics: Concepts for a Newcomer.
Ajawatanawong, Pravech
Molecular phylogenetics is the study of evolutionary relationships among organisms using molecular sequence data. The aim of this review is to introduce the important terminology and general concepts of tree reconstruction to biologists who lack a strong background in the field of molecular evolution. Some modern phylogenetic programs are easy to use because of their user-friendly interfaces, but understanding the phylogenetic algorithms and substitution models, which are based on advanced statistics, is still important for the analysis and interpretation without a guide. Briefly, there are five general steps in carrying out a phylogenetic analysis: (1) sequence data preparation, (2) sequence alignment, (3) choosing a phylogenetic reconstruction method, (4) identification of the best tree, and (5) evaluating the tree. Concepts in this review enable biologists to grasp the basic ideas behind phylogenetic analysis and also help provide a sound basis for discussions with expert phylogeneticists.
Hierarchical graphs for rule-based modeling of biochemical systems
2011-01-01
Background In rule-based modeling, graphs are used to represent molecules: a colored vertex represents a component of a molecule, a vertex attribute represents the internal state of a component, and an edge represents a bond between components. Components of a molecule share the same color. Furthermore, graph-rewriting rules are used to represent molecular interactions. A rule that specifies addition (removal) of an edge represents a class of association (dissociation) reactions, and a rule that specifies a change of a vertex attribute represents a class of reactions that affect the internal state of a molecular component. A set of rules comprises an executable model that can be used to determine, through various means, the system-level dynamics of molecular interactions in a biochemical system. Results For purposes of model annotation, we propose the use of hierarchical graphs to represent structural relationships among components and subcomponents of molecules. We illustrate how hierarchical graphs can be used to naturally document the structural organization of the functional components and subcomponents of two proteins: the protein tyrosine kinase Lck and the T cell receptor (TCR) complex. We also show that computational methods developed for regular graphs can be applied to hierarchical graphs. In particular, we describe a generalization of Nauty, a graph isomorphism and canonical labeling algorithm. The generalized version of the Nauty procedure, which we call HNauty, can be used to assign canonical labels to hierarchical graphs or more generally to graphs with multiple edge types. The difference between the Nauty and HNauty procedures is minor, but for completeness, we provide an explanation of the entire HNauty algorithm. 
Conclusions Hierarchical graphs provide more intuitive formal representations of proteins and other structured molecules with multiple functional components than do the regular graphs of current languages for specifying rule-based models, such as the BioNetGen language (BNGL). Thus, the proposed use of hierarchical graphs should promote clarity and better understanding of rule-based models. PMID:21288338
A continuous stochastic model for non-equilibrium dense gases
NASA Astrophysics Data System (ADS)
Sadr, M.; Gorji, M. H.
2017-12-01
While accurate simulations of dense gas flows far from the equilibrium can be achieved by direct simulation adapted to the Enskog equation, the significant computational demand required for collisions appears as a major constraint. In order to cope with that, an efficient yet accurate solution algorithm based on the Fokker-Planck approximation of the Enskog equation is devised in this paper; the approximation is very much associated with the Fokker-Planck model derived from the Boltzmann equation by Jenny et al. ["A solution algorithm for the fluid dynamic equations based on a stochastic model for molecular motion," J. Comput. Phys. 229, 1077-1098 (2010)] and Gorji et al. ["Fokker-Planck model for computational studies of monatomic rarefied gas flows," J. Fluid Mech. 680, 574-601 (2011)]. The idea behind these Fokker-Planck descriptions is to project the dynamics of discrete collisions implied by the molecular encounters into a set of continuous Markovian processes subject to the drift and diffusion. Thereby, the evolution of particles representing the governing stochastic process becomes independent from each other and thus very efficient numerical schemes can be constructed. By close inspection of the Enskog operator, it is observed that the dense gas effects contribute further to the advection of molecular quantities. That motivates a modelling approach where the dense gas corrections can be cast in the extra advection of particles. Therefore, the corresponding Fokker-Planck approximation is derived such that the evolution in the physical space accounts for the dense effects present in the pressure, stress tensor, and heat fluxes. Hence the consistency between the devised Fokker-Planck approximation and the Enskog operator is shown for the velocity moments up to the heat fluxes. For validation studies, a homogeneous gas inside a box besides Fourier, Couette, and lid-driven cavity flow setups is considered. 
The results based on the Fokker-Planck model are compared with respect to benchmark simulations, where good agreement is found for the flow field along with the transport properties.
Recent progress and future directions in protein-protein docking.
Ritchie, David W
2008-02-01
This article gives an overview of recent progress in protein-protein docking and it identifies several directions for future research. Recent results from the CAPRI blind docking experiments show that docking algorithms are steadily improving in both reliability and accuracy. Current docking algorithms employ a range of efficient search and scoring strategies, including e.g. fast Fourier transform correlations, geometric hashing, and Monte Carlo techniques. These approaches can often produce a relatively small list of up to a few thousand orientations, amongst which a near-native binding mode is often observed. However, despite the use of improved scoring functions which typically include models of desolvation, hydrophobicity, and electrostatics, current algorithms still have difficulty in identifying the correct solution from the list of false positives, or decoys. Nonetheless, significant progress is being made through better use of bioinformatics, biochemical, and biophysical information such as e.g. sequence conservation analysis, protein interaction databases, alanine scanning, and NMR residual dipolar coupling restraints to help identify key binding residues. Promising new approaches to incorporate models of protein flexibility during docking are being developed, including the use of molecular dynamics snapshots, rotameric and off-rotamer searches, internal coordinate mechanics, and principal component analysis based techniques. Some investigators now use explicit solvent models in their docking protocols. Many of these approaches can be computationally intensive, although new silicon chip technologies such as programmable graphics processor units are beginning to offer competitive alternatives to conventional high performance computer systems. As cryo-EM techniques improve apace, docking NMR and X-ray protein structures into low resolution EM density maps is helping to bridge the resolution gap between these complementary techniques. 
The use of symmetry and fragment assembly constraints is also helping to make possible docking-based predictions of large multimeric protein complexes. In the near future, the closer integration of docking algorithms with protein interface prediction software, structural databases, and sequence analysis techniques should help produce better predictions of protein interaction networks and more accurate structural models of the fundamental molecular interactions within the cell.
Bertaccini, Edward J.; Yoluk, Ozge; Lindahl, Erik R.; Trudell, James R.
2013-01-01
Background Anesthetics mediate portions of their activity via modulation of the γ-aminobutyric acid receptor (GABAaR). While its molecular structure remains unknown, significant progress has been made towards understanding its interactions with anesthetics via molecular modeling. Methods The structure of the torpedo acetylcholine receptor (nAChRα), the structures of the α4 and β2 subunits of the human nAChR, the structures of the eukaryotic glutamate-gated chloride channel (GluCl), and the prokaryotic pH sensing channels, from Gloeobacter violaceus and Erwinia chrysanthemi, were aligned with the SAlign and 3DMA algorithms. A multiple sequence alignment from these structures and those of the GABAaR was performed with ClustalW. The Modeler and Rosetta algorithms independently created three-dimensional constructs of the GABAaR from the GluCl template. The CDocker algorithm docked a congeneric series of propofol derivatives into the binding pocket and scored calculated binding affinities for correlation with known GABAaR potentiation EC50’s. Results Multiple structure alignments of templates revealed a clear consensus of residue locations relevant to anesthetic effects except for torpedo nAChR. Within the GABAaR models generated from GluCl, the residues notable for modulating anesthetic action within transmembrane segments 1, 2, and 3 converged on the intersubunit interface between alpha and beta subunits. Docking scores of a propofol derivative series into this binding site showed strong linear correlation with GABAaR potentiation EC50. Conclusion Consensus structural alignment based on homologous templates revealed an intersubunit anesthetic binding cavity within the transmembrane domain of the GABAaR, which showed correlation of ligand docking scores with experimentally measured GABAaR potentiation. PMID:23770602
Bertaccini, Edward J; Yoluk, Ozge; Lindahl, Erik R; Trudell, James R
2013-11-01
Anesthetics mediate portions of their activity via modulation of the γ-aminobutyric acid receptor (GABAaR). Although its molecular structure remains unknown, significant progress has been made toward understanding its interactions with anesthetics via molecular modeling. The structure of the torpedo acetylcholine receptor (nAChRα), the structures of the α4 and β2 subunits of the human nAChR, the structures of the eukaryotic glutamate-gated chloride channel (GluCl), and the prokaryotic pH-sensing channels, from Gloeobacter violaceus and Erwinia chrysanthemi, were aligned with the SAlign and 3DMA algorithms. A multiple sequence alignment from these structures and those of the GABAaR was performed with ClustalW. The Modeler and Rosetta algorithms independently created three-dimensional constructs of the GABAaR from the GluCl template. The CDocker algorithm docked a congeneric series of propofol derivatives into the binding pocket and scored calculated binding affinities for correlation with known GABAaR potentiation EC50s. Multiple structure alignments of templates revealed a clear consensus of residue locations relevant to anesthetic effects except for torpedo nAChR. Within the GABAaR models generated from GluCl, the residues notable for modulating anesthetic action within transmembrane segments 1, 2, and 3 converged on the intersubunit interface between α and β subunits. Docking scores of a propofol derivative series into this binding site showed strong linear correlation with GABAaR potentiation EC50. Consensus structural alignment based on homologous templates revealed an intersubunit anesthetic binding cavity within the transmembrane domain of the GABAaR, which showed a correlation of ligand docking scores with experimentally measured GABAaR potentiation.
Oscillatory Regulation of Hes1: Discrete Stochastic Delay Modelling and Simulation
Barrio, Manuel; Burrage, Kevin; Leier, André; Tian, Tianhai
2006-01-01
Discrete stochastic simulations are a powerful tool for understanding the dynamics of chemical kinetics when there are small-to-moderate numbers of certain molecular species. In this paper we introduce delays into the stochastic simulation algorithm, thus mimicking delays associated with transcription and translation. We then show that this process may well explain more faithfully than continuous deterministic models the observed sustained oscillations in expression levels of hes1 mRNA and Hes1 protein. PMID:16965175
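One common way to add delays to a stochastic simulation is sketched below for a two-species mRNA/protein feedback loop. All rate constants, the Hill-type repression term, the initial counts, and the function name are illustrative assumptions rather than values or code from the paper. A delayed transcription event is started like any other reaction, but its product only appears `tau` time units later.

```python
import heapq
import math
import random


def delay_ssa(t_end, tau, alpha_m=1.0, alpha_p=2.0, mu_m=0.03, mu_p=0.03,
              p0=100.0, hill=4.1, seed=0):
    """Gillespie-style simulation in which transcription completes after a delay `tau`."""
    rng = random.Random(seed)
    t, mrna, protein = 0.0, 3, 100
    pending = []  # min-heap of completion times for transcriptions in progress
    history = [(t, mrna, protein)]
    while t < t_end:
        rates = [
            mu_m * mrna,                               # mRNA degradation
            mu_p * protein,                            # protein degradation
            alpha_p * mrna,                            # translation (no delay)
            alpha_m / (1.0 + (protein / p0) ** hill),  # repressed transcription start
        ]
        total = sum(rates)
        dt = -math.log(1.0 - rng.random()) / total  # exponential waiting time
        if pending and pending[0] <= t + dt:
            # A delayed transcription finishes first: jump to it and redraw.
            t = heapq.heappop(pending)
            mrna += 1
        else:
            t += dt
            pick = rng.random() * total
            if pick < rates[0]:
                mrna -= 1
            elif pick < rates[0] + rates[1]:
                protein -= 1
            elif pick < rates[0] + rates[1] + rates[2]:
                protein += 1
            else:
                heapq.heappush(pending, t + tau)  # product appears after the delay
        history.append((t, mrna, protein))
    return history


history = delay_ssa(t_end=50.0, tau=20.0)
```

Discarding the drawn waiting time when a delayed event intervenes is valid because exponential waiting times are memoryless; the propensities are simply recomputed from the updated state.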
Xu, Dong; Zhang, Jian; Roy, Ambrish; Zhang, Yang
2011-01-01
I-TASSER is an automated pipeline for protein tertiary structure prediction using multiple threading alignments and iterative structure assembly simulations. In CASP9 experiments, two new algorithms, QUARK and FG-MD, were added to the I-TASSER pipeline for improving the structural modeling accuracy. QUARK is a de novo structure prediction algorithm used for structure modeling of proteins that lack detectable template structures. For distantly homologous targets, QUARK models are found useful as reference structures for selecting good threading alignments and guiding the I-TASSER structure assembly simulations. FG-MD is an atomic-level structural refinement program that uses structural fragments collected from the PDB structures to guide molecular dynamics simulation and improve the local structure of the predicted model, including hydrogen-bonding networks, torsion angles and steric clashes. Despite considerable progress in both template-based and template-free structure modeling, significant improvements in protein target classification, domain parsing, model selection, and ab initio folding of beta-proteins are still needed to further improve the I-TASSER pipeline. PMID:22069036
Jacob, Alexandre; Pratuangdejkul, Jaturong; Buffet, Sébastien; Launay, Jean-Marie; Manivet, Philippe
2009-04-01
We have broken old surviving dogmas and concepts used in computational chemistry and created an efficient in silico ADME-T pharmacological properties modeling and prediction toolbox for any xenobiotic. With the help of an innovative and pragmatic approach combining various in silico techniques, like molecular modeling, quantum chemistry and in-house developed algorithms, the interactions between drugs and those enzymes, transporters and receptors involved in their biotransformation can be studied. ADME-T pharmacological parameters can then be predicted after in vitro and in vivo validations of in silico models.
Kazachenko, Sergey; Giovinazzo, Mark; Hall, Kyle Wm; Cann, Natalie M
2015-09-15
A custom code for molecular dynamics simulations has been designed to run on CUDA-enabled NVIDIA graphics processing units (GPUs). The double-precision code simulates multicomponent fluids, with intramolecular and intermolecular forces, coarse-grained and atomistic models, holonomic constraints, Nosé-Hoover thermostats, and the generation of distribution functions. Algorithms to compute Lennard-Jones and Gay-Berne interactions, and the electrostatic force using Ewald summations, are discussed. A neighbor list is introduced to improve scaling with respect to system size. Three test systems are examined: SPC/E water; an n-hexane/2-propanol mixture; and a liquid crystal mesogen, 2-(4-butyloxyphenyl)-5-octyloxypyrimidine. Code performance is analyzed for each system. With one GPU, a 33-119 fold increase in performance is achieved compared with the serial code while the use of two GPUs leads to a 69-287 fold improvement and three GPUs yield a 101-377 fold speedup. © 2015 Wiley Periodicals, Inc.
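The neighbor-list idea mentioned above can be sketched in a few lines. This is a CPU-side illustration in plain Python, not the CUDA code described in the abstract; the function names, the skin parameter, and the absence of periodic boundaries are simplifying assumptions. The list is built once with a slightly enlarged radius and then reused for several steps, so the energy loop only visits nearby pairs.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_neighbor_list(positions, cutoff, skin):
    """Return pairs (i, j), i < j, closer than cutoff + skin (brute-force build)."""
    reach2 = (cutoff + skin) ** 2
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if squared_distance(positions[i], positions[j]) < reach2:
                pairs.append((i, j))
    return pairs


def lennard_jones_energy(positions, pairs, cutoff, epsilon=1.0, sigma=1.0):
    """Sum 4*eps*[(sigma/r)^12 - (sigma/r)^6] over listed pairs within the cutoff."""
    energy = 0.0
    for i, j in pairs:
        r2 = squared_distance(positions[i], positions[j])
        if r2 < cutoff * cutoff:
            s6 = (sigma * sigma / r2) ** 3
            energy += 4.0 * epsilon * (s6 * s6 - s6)
    return energy


# Two particles at the potential minimum plus one far outside the cutoff.
positions = [(0.0, 0.0, 0.0), (2.0 ** (1.0 / 6.0), 0.0, 0.0), (10.0, 0.0, 0.0)]
pairs = build_neighbor_list(positions, cutoff=2.5, skin=0.3)
energy = lennard_jones_energy(positions, pairs, cutoff=2.5)
```

The build step here is still quadratic in the number of particles; the saving comes from reusing the list across steps, and production codes typically pair it with a cell decomposition.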
NASA Astrophysics Data System (ADS)
Arias, E.; Florez, E.; Pérez-Torres, J. F.
2017-06-01
A new algorithm for the determination of equilibrium structures suitable for metal nanoclusters is proposed. The algorithm performs a stochastic search of the minima associated with the nuclear potential energy function restricted to a sphere (similar to the Thomson problem), in order to guess configurations of the nuclear positions. Subsequently, the guessed configurations are further optimized driven by the total energy function using the conventional gradient descent method. This methodology is equivalent to using the valence shell electron pair repulsion model in guessing initial configurations in the traditional molecular quantum chemistry. The framework is illustrated in several clusters of increasing complexity: Cu7, Cu9, and Cu11 as benchmark systems, and Cu38 and Ni9 as novel systems. New equilibrium structures for Cu9, Cu11, Cu38, and Ni9 are reported.
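The first stage described above, guessing configurations on a sphere, can be sketched with a Thomson-like model. The sketch assumes a pure 1/r repulsion between points on a unit sphere, random restarts as the stochastic element, and projected gradient descent as the local minimizer; the paper's actual search procedure and the subsequent total-energy optimization are not reproduced, and all names and parameters are hypothetical.

```python
import math
import random


def random_on_sphere(rng):
    """Draw a uniformly distributed point on the unit sphere."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def repulsion_energy(points):
    """Sum of 1/r over all pairs of points."""
    return sum(1.0 / math.dist(points[i], points[j])
               for i in range(len(points)) for j in range(i + 1, len(points)))


def relax_on_sphere(points, steps=500, rate=0.1):
    """Projected gradient descent: step along the force, then renormalize."""
    for _ in range(steps):
        updated = []
        for i, p in enumerate(points):
            force = [0.0, 0.0, 0.0]
            for j, q in enumerate(points):
                if i != j:
                    d = math.dist(p, q)
                    for k in range(3):
                        force[k] += (p[k] - q[k]) / d ** 3
            moved = [p[k] + rate * force[k] for k in range(3)]
            norm = math.sqrt(sum(c * c for c in moved))
            updated.append([c / norm for c in moved])
        points = updated
    return points


def guess_configuration(n, trials=5, seed=0):
    """Keep the lowest-energy relaxed configuration over several random starts."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        candidate = relax_on_sphere([random_on_sphere(rng) for _ in range(n)])
        if best is None or repulsion_energy(candidate) < repulsion_energy(best):
            best = candidate
    return best


pair = guess_configuration(2)
quad = guess_configuration(4)
```

For two points the relaxed guess is the antipodal arrangement; a guess obtained this way would then serve as the starting geometry for an optimization driven by the total energy.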
Zhou, Ruhong
2004-05-01
A highly parallel replica exchange method (REM) that couples with a newly developed molecular dynamics algorithm particle-particle particle-mesh Ewald (P3ME)/RESPA has been proposed for efficient sampling of protein folding free energy landscape. The algorithm is then applied to two separate protein systems, beta-hairpin and a designed protein Trp-cage. The all-atom OPLSAA force field with an explicit solvent model is used for both protein folding simulations. Up to 64 replicas of solvated protein systems are simulated in parallel over a wide range of temperatures. The combined trajectories in temperature and configurational space allow a replica to overcome free energy barriers present at low temperatures. These large scale simulations reveal detailed results on folding mechanisms, intermediate state structures, thermodynamic properties and the temperature dependences for both protein systems.
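The exchange step at the heart of a replica exchange simulation can be sketched with the standard Metropolis criterion, in which a swap between replicas i and j is accepted with probability min(1, exp[(beta_i - beta_j)(E_i - E_j)]). The function names, the energy units (kcal/mol), and the example values are illustrative assumptions; the P3ME/RESPA dynamics that would run between swap attempts is not reproduced.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for exchanging two replicas."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swaps(energies, temperatures, rng, offset=0):
    """Try swaps between neighboring temperatures; return the new replica order.

    `order[k]` is the index of the configuration now held at `temperatures[k]`.
    """
    order = list(range(len(temperatures)))
    for k in range(offset, len(temperatures) - 1, 2):
        a, b = order[k], order[k + 1]
        p = swap_probability(energies[a], temperatures[k],
                             energies[b], temperatures[k + 1])
        if rng.random() < p:
            order[k], order[k + 1] = b, a
    return order


# Hypothetical potential energies (kcal/mol) on a four-temperature ladder.
p_up = swap_probability(-100.0, 300.0, -110.0, 320.0)
p_down = swap_probability(-110.0, 300.0, -100.0, 320.0)
order = attempt_swaps([-100.0, -110.0, -95.0, -90.0],
                      [300.0, 320.0, 340.0, 360.0], random.Random(0))
```

A swap that moves the lower-energy configuration to the lower temperature is always accepted, which is how configurations trapped at low temperature get a chance to cross barriers at higher temperature.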
Arias, E; Florez, E; Pérez-Torres, J F
2017-06-28
A new algorithm for the determination of equilibrium structures suitable for metal nanoclusters is proposed. The algorithm performs a stochastic search of the minima associated with the nuclear potential energy function restricted to a sphere (similar to the Thomson problem), in order to guess configurations of the nuclear positions. Subsequently, the guessed configurations are further optimized driven by the total energy function using the conventional gradient descent method. This methodology is equivalent to using the valence shell electron pair repulsion model in guessing initial configurations in the traditional molecular quantum chemistry. The framework is illustrated in several clusters of increasing complexity: Cu7, Cu9, and Cu11 as benchmark systems, and Cu38 and Ni9 as novel systems. New equilibrium structures for Cu9, Cu11, Cu38, and Ni9 are reported.
NASA Astrophysics Data System (ADS)
Hadjidoukas, P. E.; Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P.
2015-03-01
We present Π4U, an extensible framework, for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models, that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.
High Performance Parallel Computational Nanotechnology
NASA Technical Reports Server (NTRS)
Saini, Subhash; Craw, James M. (Technical Monitor)
1995-01-01
At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms, thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capacity does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics, and simulation methods for diamondoid structures. Inasmuch as it seems clear that the application of such methods in nanotechnology will require highly powerful systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems.
We will describe system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided designs (NanoCAD) techniques; visualization using virtual reality techniques of structural models and assembly sequences; software required to control mini robotic manipulators for positional control; scalable numerical algorithms for reliability, verifications and testability. There appears no fundamental obstacle to simulating molecular compilers and molecular computers on high performance parallel computers, just as the Boeing 777 was simulated on a computer before manufacturing it.
Algorithms of GPU-enabled reactive force field (ReaxFF) molecular dynamics.
Zheng, Mo; Li, Xiaoxia; Guo, Li
2013-04-01
Reactive force field (ReaxFF), a recent and novel bond order potential, allows for reactive molecular dynamics (ReaxFF MD) simulations for modeling larger and more complex molecular systems involving chemical reactions when compared with computation-intensive quantum mechanical methods. However, ReaxFF MD can be approximately 10-50 times slower than classical MD due to its explicit modeling of bond forming and breaking, the dynamic charge equilibration at each time-step, and a time-step one order of magnitude smaller than that of classical MD, all of which pose significant computational challenges for reaching spatio-temporal scales of nanometers and nanoseconds. The very recent advances of graphics processing units (GPUs) provide not only highly favorable performance for GPU-enabled MD programs compared with CPU implementations but also an opportunity to cope with the computing power and memory demands that ReaxFF MD places on computer hardware. In this paper, we present the algorithms of GMD-Reax, the first GPU-enabled ReaxFF MD program with significantly improved performance surpassing CPU implementations on desktop workstations. The performance of GMD-Reax has been benchmarked on a PC equipped with an NVIDIA C2050 GPU for coal pyrolysis simulation systems with atoms ranging from 1378 to 27,283. GMD-Reax achieved speedups as high as 12 times over Duin et al.'s FORTRAN codes in LAMMPS on 8 CPU cores and 6 times over the LAMMPS C codes based on PuReMD in terms of the simulation time per time-step averaged over 100 steps. GMD-Reax could be used as a new and efficient computational tool for exploiting very complex molecular reactions via ReaxFF MD simulation on desktop workstations. Copyright © 2013 Elsevier Inc. All rights reserved.
Li, Hongzhi; Zhong, Ziyan; Li, Lin; Gao, Rui; Cui, Jingxia; Gao, Ting; Hu, Li Hong; Lu, Yinghua; Su, Zhong-Min; Li, Hui
2015-05-30
A cascaded model is proposed to establish the quantitative structure-activity relationship (QSAR) between the overall power conversion efficiency (PCE) and quantum chemical molecular descriptors of all-organic dye sensitizers. The cascaded model is a two-level network in which the outputs of the first level (JSC, VOC, and FF) are the inputs of the second level, and the ultimate end-point is the overall PCE of dye-sensitized solar cells (DSSCs). The model combines quantum chemical methods and machine learning methods, further including quantum chemical calculations, data division, feature selection, regression, and validation steps. To improve the efficiency of the model and reduce the redundancy and noise of the molecular descriptors, six feature selection methods (multiple linear regression, genetic algorithms, mean impact value, forward selection, backward elimination, and +n-m algorithm) are used with the support vector machine. The best established cascaded model predicts the PCE values of DSSCs with an MAE of 0.57 (%), which is about 10% of the mean PCE value (5.62%). The validation parameters according to the OECD principles are R² (0.75), Q² (0.77), and Q²cv (0.76), which demonstrate the goodness-of-fit, predictivity, and robustness of the model. Additionally, the applicability domain of the cascaded QSAR model is defined for further application. This study demonstrates that the established cascaded model is able to effectively predict the PCE for organic dye sensitizers with very low cost and relatively high accuracy, providing a useful tool for the design of dye sensitizers with high PCE. © 2015 Wiley Periodicals, Inc.
An algorithm for automated layout of process description maps drawn in SBGN.
Genc, Begum; Dogrusoz, Ugur
2016-01-01
Evolving technology has increased the focus on genomics. The combination of today's advanced techniques with decades of molecular biology research has yielded huge amounts of pathway data. A standard, named the Systems Biology Graphical Notation (SBGN), was recently introduced to allow scientists to represent biological pathways in an unambiguous, easy-to-understand and efficient manner. Although there are a number of automated layout algorithms for various types of biological networks, currently none specialize on process description (PD) maps as defined by SBGN. We propose a new automated layout algorithm for PD maps drawn in SBGN. Our algorithm is based on a force-directed automated layout algorithm called Compound Spring Embedder (CoSE). On top of the existing force scheme, additional heuristics employing new types of forces and movement rules are defined to address SBGN-specific rules. Our algorithm is the only automatic layout algorithm that properly addresses all SBGN rules for drawing PD maps, including placement of substrates and products of process nodes on opposite sides, compact tiling of members of molecular complexes and extensively making use of nested structures (compound nodes) to properly draw cellular locations and molecular complex structures. As demonstrated experimentally, the algorithm results in significant improvements over use of a generic layout algorithm such as CoSE in addressing SBGN rules on top of commonly accepted graph drawing criteria. An implementation of our algorithm in Java is available within ChiLay library (https://github.com/iVis-at-Bilkent/chilay). Contact: ugur@cs.bilkent.edu.tr or dogrusoz@cbio.mskcc.org. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
An algorithm for automated layout of process description maps drawn in SBGN
Genc, Begum; Dogrusoz, Ugur
2016-01-01
Motivation: Evolving technology has increased the focus on genomics. The combination of today’s advanced techniques with decades of molecular biology research has yielded huge amounts of pathway data. A standard, named the Systems Biology Graphical Notation (SBGN), was recently introduced to allow scientists to represent biological pathways in an unambiguous, easy-to-understand and efficient manner. Although there are a number of automated layout algorithms for various types of biological networks, currently none specialize on process description (PD) maps as defined by SBGN. Results: We propose a new automated layout algorithm for PD maps drawn in SBGN. Our algorithm is based on a force-directed automated layout algorithm called Compound Spring Embedder (CoSE). On top of the existing force scheme, additional heuristics employing new types of forces and movement rules are defined to address SBGN-specific rules. Our algorithm is the only automatic layout algorithm that properly addresses all SBGN rules for drawing PD maps, including placement of substrates and products of process nodes on opposite sides, compact tiling of members of molecular complexes and extensively making use of nested structures (compound nodes) to properly draw cellular locations and molecular complex structures. As demonstrated experimentally, the algorithm results in significant improvements over use of a generic layout algorithm such as CoSE in addressing SBGN rules on top of commonly accepted graph drawing criteria. Availability and implementation: An implementation of our algorithm in Java is available within ChiLay library (https://github.com/iVis-at-Bilkent/chilay). Contact: ugur@cs.bilkent.edu.tr or dogrusoz@cbio.mskcc.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26363029
Optimizing legacy molecular dynamics software with directive-based offload
Brown, W. Michael; Carrillo, Jan-Michael Y.; Gavhane, Nitin; ...
2015-05-14
Directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In our paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We also demonstrate that standard molecular dynamics algorithms can run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU; in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel Xeon Phi coprocessors and NVIDIA GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS. (C) 2015 Elsevier B.V. All rights reserved.
Discrete Optimization of Electronic Hyperpolarizabilities in a Chemical Subspace
2009-05-01
molecular design. Methods for optimization in discrete spaces have been studied extensively and recently reviewed (5). Optimization methods include...integer programming, as in branch-and-bound techniques (including dead-end elimination [6]), simulated annealing (7), and genetic algorithms (8...These algorithms have found renewed interest and application in molecular and materials design (9-12). Recently, new approaches have been
Efficient molecular dynamics simulations with many-body potentials on graphics processing units
NASA Astrophysics Data System (ADS)
Fan, Zheyong; Chen, Wei; Vierimaa, Ville; Harju, Ari
2017-09-01
Graphics processing units have been extensively used to accelerate classical molecular dynamics simulations. However, there is much less progress on the acceleration of force evaluations for many-body potentials compared to pairwise ones. In the conventional force evaluation algorithm for many-body potentials, the force, virial stress, and heat current for a given atom are accumulated within different loops, which could result in write conflict between different threads in a CUDA kernel. In this work, we provide a new force evaluation algorithm, which is based on an explicit pairwise force expression for many-body potentials derived recently (Fan et al., 2015). In our algorithm, the force, virial stress, and heat current for a given atom can be accumulated within a single thread and is free of write conflicts. We discuss the formulations and algorithms and evaluate their performance. A new open-source code, GPUMD, is developed based on the proposed formulations. For the Tersoff many-body potential, the double precision performance of GPUMD using a Tesla K40 card is equivalent to that of the LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) molecular dynamics code running with about 100 CPU cores (Intel Xeon CPU X5670 @ 2.93 GHz).
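The write-conflict issue described above can be illustrated with a minimal sketch. It assumes a plain Lennard-Jones pair potential in reduced units rather than the Tersoff potential, and it is not the GPUMD implementation; the function names are hypothetical. The "gather" version accumulates everything for atom i in a local variable and writes only to entry i, which is the pattern a one-thread-per-atom kernel can use safely. The "scatter" version also writes to entry j, which is where concurrent threads could collide.

```python
import numpy as np

def lj_pair_force(r_ij, eps=1.0, sigma=1.0):
    """Force on atom i due to atom j for a Lennard-Jones pair, with r_ij = r_j - r_i."""
    r2 = np.dot(r_ij, r_ij)
    s6 = (sigma * sigma / r2) ** 3
    # F_i = -(24 eps / r^2) * (2 s^12 - s^6) * r_ij  (repulsive at short range)
    return -24.0 * eps * (2.0 * s6 * s6 - s6) / r2 * r_ij

def forces_gather(pos):
    """Each atom i (one 'thread') sums its own force and writes only forces[i]."""
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):
        f_i = np.zeros(pos.shape[1])      # thread-local accumulator
        for j in range(n):
            if j != i:
                f_i += lj_pair_force(pos[j] - pos[i])
        forces[i] = f_i                   # single write, to atom i only
    return forces

def forces_scatter(pos):
    """Newton's-third-law version: each pair writes to both i and j."""
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            f = lj_pair_force(pos[j] - pos[i])
            forces[i] += f
            forces[j] -= f                # write to another atom: conflict-prone in parallel
    return forces

# Hypothetical four-atom configuration in reduced units
pos = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.2, 0.0], [0.3, 0.4, 1.5]])
print(np.allclose(forces_gather(pos), forces_scatter(pos)))
```

The gather form evaluates each pair twice, trading extra arithmetic for the absence of cross-atom writes.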
In Silico Synthesis of Synthetic Receptors: A Polymerization Algorithm.
Cowen, Todd; Busato, Mirko; Karim, Kal; Piletsky, Sergey A
2016-12-01
Molecularly imprinted polymer (MIP) synthetic receptors have proposed and demonstrated applications in chemical extraction, sensors, assays, catalysis, targeted drug delivery, and direct inhibition of harmful chemicals and pathogens. However, they rely heavily on effective design for success. An algorithm has been written which mimics radical polymerization atomistically, accounting for chemical and spatial discrimination, hybridization, and geometric optimization. Synthetic ephedrine receptors were synthesized in silico to demonstrate the accuracy of the algorithm in reproducing polymer structures at the atomic level. Comparative analysis in the design of a synthetic ephedrine receptor demonstrates that the new method can effectively identify affinity trends and binding site selectivities where commonly used alternative methods cannot. This new method is believed to generate the most realistic models of MIPs thus produced. This suggests that the algorithm could be a powerful new tool in the design and analysis of various polymers, including MIPs, with significant implications in areas of biotechnology, biomimetics, and the materials sciences more generally. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Technical Reports Server (NTRS)
Wan, Zhengming; Dozier, Jeff
1992-01-01
The effect of temperature-dependent molecular absorption coefficients on thermal infrared spectral signatures measured from satellite sensors is investigated by comparing results from the atmospheric transmission and radiance codes LOWTRAN and MODTRAN and the accurate multiple scattering radiative transfer model ATRAD for different atmospheric profiles. The sensors considered include the operational NOAA AVHRR and two research instruments planned for NASA's Earth Observing System (EOS): MODIS-N (Moderate Resolution Imaging Spectrometer-Nadir-Mode) and ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer). The difference in band transmittance is as large as 6 percent for some thermal bands within atmospheric windows and more than 30 percent near the edges of these atmospheric windows. The effect of temperature-dependent molecular absorption coefficients on satellite measurements of sea-surface temperature can exceed 0.6 K. Quantitative comparison and factor analysis indicate that more accurate measurements of molecular absorption coefficients and better radiative transfer simulation methods are needed to achieve SST accuracy of 0.3 K, as required for global numerical models of climate, and to develop land-surface temperature algorithms at the 1-K accuracy level.
Charting molecular free-energy landscapes with an atlas of collective variables
NASA Astrophysics Data System (ADS)
Hashemian, Behrooz; Millán, Daniel; Arroyo, Marino
2016-11-01
Collective variables (CVs) are a fundamental tool to understand molecular flexibility, to compute free energy landscapes, and to enhance sampling in molecular dynamics simulations. However, identifying suitable CVs is challenging, and is increasingly addressed with systematic data-driven manifold learning techniques. Here, we provide a flexible framework to model molecular systems in terms of a collection of locally valid and partially overlapping CVs: an atlas of CVs. The specific motivation for such a framework is to enhance the applicability and robustness of CVs based on manifold learning methods, which fail in the presence of periodicities in the underlying conformational manifold. More generally, using an atlas of CVs rather than a single chart may help us better describe different regions of conformational space. We develop the statistical mechanics foundation for our multi-chart description and propose an algorithmic implementation. The resulting atlas of data-based CVs is then used to enhance sampling and compute free energy surfaces in two model systems, alanine dipeptide and β-D-glucopyranose, whose conformational manifolds have toroidal and spherical topologies.
Kärkkäinen, Hanni P; Sillanpää, Mikko J
2013-09-04
Because of the increased availability of genome-wide sets of molecular markers along with reduced cost of genotyping large samples of individuals, genomic estimated breeding values have become an essential resource in plant and animal breeding. Bayesian methods for breeding value estimation have proven to be accurate and efficient; however, the ever-increasing data sets are placing heavy demands on the parameter estimation algorithms. Although a commendable number of fast estimation algorithms are available for Bayesian models of continuous Gaussian traits, there is a shortage of corresponding models for discrete or censored phenotypes. In this work, we consider a threshold approach of binary, ordinal, and censored Gaussian observations for Bayesian multilocus association models and Bayesian genomic best linear unbiased prediction and present a high-speed generalized expectation maximization algorithm for parameter estimation under these models. We demonstrate our method with simulated and real data. Our example analyses suggest that the use of the extra information present in an ordered categorical or censored Gaussian data set, instead of dichotomizing the data into case-control observations, increases the accuracy of genomic breeding values predicted by Bayesian multilocus association models or by Bayesian genomic best linear unbiased prediction. Furthermore, the example analyses indicate that the correct threshold model is more accurate than the directly used Gaussian model with censored Gaussian data, while with binary or ordinal data the superiority of the threshold model could not be confirmed.
Kärkkäinen, Hanni P.; Sillanpää, Mikko J.
2013-01-01
Because of the increased availability of genome-wide sets of molecular markers along with reduced cost of genotyping large samples of individuals, genomic estimated breeding values have become an essential resource in plant and animal breeding. Bayesian methods for breeding value estimation have proven to be accurate and efficient; however, the ever-increasing data sets are placing heavy demands on the parameter estimation algorithms. Although a commendable number of fast estimation algorithms are available for Bayesian models of continuous Gaussian traits, there is a shortage of corresponding models for discrete or censored phenotypes. In this work, we consider a threshold approach of binary, ordinal, and censored Gaussian observations for Bayesian multilocus association models and Bayesian genomic best linear unbiased prediction and present a high-speed generalized expectation maximization algorithm for parameter estimation under these models. We demonstrate our method with simulated and real data. Our example analyses suggest that the use of the extra information present in an ordered categorical or censored Gaussian data set, instead of dichotomizing the data into case-control observations, increases the accuracy of genomic breeding values predicted by Bayesian multilocus association models or by Bayesian genomic best linear unbiased prediction. Furthermore, the example analyses indicate that the correct threshold model is more accurate than the directly used Gaussian model with censored Gaussian data, while with binary or ordinal data the superiority of the threshold model could not be confirmed. PMID:23821618
Post-processing interstitialcy diffusion from molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Bhardwaj, U.; Bukkuru, S.; Warrier, M.
2016-01-01
An algorithm to rigorously trace the interstitialcy diffusion trajectory in crystals is developed. The algorithm incorporates unsupervised learning and graph optimization which obviate the need to input extra domain specific information depending on crystal or temperature of the simulation. The algorithm is implemented in a flexible framework as a post-processor to molecular dynamics (MD) simulations. We describe in detail the reduction of interstitialcy diffusion into known computational problems of unsupervised clustering and graph optimization. We also discuss the steps, computational efficiency and key components of the algorithm. Using the algorithm, thermal interstitialcy diffusion from low to near-melting point temperatures is studied. We encapsulate the algorithms in a modular framework with functionality to calculate diffusion coefficients, migration energies and other trajectory properties. The study validates the algorithm by establishing the conformity of output parameters with experimental values and provides detailed insights for the interstitialcy diffusion mechanism. The algorithm along with the help of supporting visualizations and analysis gives convincing details and a new approach to quantifying diffusion jumps, jump-lengths, time between jumps and to identify interstitials from lattice atoms.
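The diffusion coefficients and migration energies mentioned above are commonly obtained from two textbook relations: the Einstein relation, MSD(t) = 2 d D t in d dimensions, and an Arrhenius fit, D(T) = D0 exp(-Em / (kB T)). The following is a minimal sketch of those two fits, not the authors' post-processor; the function names and the synthetic input values are assumptions made for illustration only.

```python
import numpy as np

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def diffusion_coefficient(times, msd, dim=3):
    """Einstein relation: slope of MSD versus time equals 2 * dim * D."""
    slope, _intercept = np.polyfit(times, msd, 1)
    return slope / (2 * dim)

def migration_energy(temperatures, diff_coeffs):
    """Arrhenius fit of ln D against 1/T; returns (Em in eV, prefactor D0)."""
    inv_t = 1.0 / np.asarray(temperatures, dtype=float)
    slope, intercept = np.polyfit(inv_t, np.log(diff_coeffs), 1)
    return -slope * K_B_EV, np.exp(intercept)

# Synthetic (made-up) data: D = 0.5 in arbitrary units, Em = 0.3 eV, D0 = 2.0
t = np.linspace(0.0, 10.0, 50)
d = diffusion_coefficient(t, 6 * 0.5 * t)
temps = np.array([300.0, 600.0, 900.0])
em, d0 = migration_energy(temps, 2.0 * np.exp(-0.3 / (K_B_EV * temps)))
print(d, em, d0)
```

In practice the MSD fit should be restricted to the linear (diffusive) regime of the trajectory, which this sketch does not check.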
Post-processing interstitialcy diffusion from molecular dynamics simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhardwaj, U., E-mail: haptork@gmail.com; Bukkuru, S.; Warrier, M.
2016-01-15
An algorithm to rigorously trace the interstitialcy diffusion trajectory in crystals is developed. The algorithm incorporates unsupervised learning and graph optimization which obviate the need to input extra domain specific information depending on crystal or temperature of the simulation. The algorithm is implemented in a flexible framework as a post-processor to molecular dynamics (MD) simulations. We describe in detail the reduction of interstitialcy diffusion into known computational problems of unsupervised clustering and graph optimization. We also discuss the steps, computational efficiency and key components of the algorithm. Using the algorithm, thermal interstitialcy diffusion from low to near-melting point temperatures is studied. We encapsulate the algorithms in a modular framework with functionality to calculate diffusion coefficients, migration energies and other trajectory properties. The study validates the algorithm by establishing the conformity of output parameters with experimental values and provides detailed insights for the interstitialcy diffusion mechanism. The algorithm along with the help of supporting visualizations and analysis gives convincing details and a new approach to quantifying diffusion jumps, jump-lengths, time between jumps and to identify interstitials from lattice atoms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
He, Hongxing; Fang, Hengrui; Miller, Mitchell D.
2016-07-15
An iterative transform algorithm is proposed to improve the conventional molecular-replacement method for solving the phase problem in X-ray crystallography. Several examples of successful trial calculations carried out with real diffraction data are presented. An iterative transform method proposed previously for direct phasing of high-solvent-content protein crystals is employed for enhancing the molecular-replacement (MR) algorithm in protein crystallography. Target structures that are resistant to conventional MR due to insufficient similarity between the template and target structures might be tractable with this modified phasing method. Trial calculations involving three different structures are described to test and illustrate the methodology. The relationship of the approach to PHENIX Phaser-MR and MR-Rosetta is discussed.
Tham, S Y; Agatonovic-Kustrin, S
2002-05-15
A quantitative structure-retention relationship (QSRR) method was used to model reversed-phase high-performance liquid chromatography (RP-HPLC) separation of 18 selected amino acids. Retention data for phenylthiocarbamyl (PTC) amino acid derivatives were obtained using gradient elution on an ODS column with a mobile phase of varying acetonitrile and acetate buffer composition containing 0.5 ml/l of triethylamine (TEA). The molecular structure of each amino acid was encoded with 36 calculated molecular descriptors. The correlation between the molecular descriptors and the retention time of the compounds in the calibration set was established using the genetic neural network method. A genetic algorithm (GA) was used to select important molecular descriptors and a supervised artificial neural network (ANN) was used to correlate mobile phase composition and selected descriptors with the experimentally derived retention times. Retention time values were used as the network's output and calculated molecular descriptors and mobile phase composition as the inputs. The best model with five input descriptors was chosen, and the significance of the selected descriptors for amino acid separation was examined. Results confirmed the dominant role of the organic modifier in such chromatographic systems in addition to lipophilicity (log P) and molecular size and shape (topological indices) of investigated solutes.
ATK-ForceField: a new generation molecular dynamics software package
NASA Astrophysics Data System (ADS)
Schneider, Julian; Hamaekers, Jan; Chill, Samuel T.; Smidstrup, Søren; Bulin, Johannes; Thesen, Ralph; Blom, Anders; Stokbro, Kurt
2017-12-01
ATK-ForceField is a software package for atomistic simulations using classical interatomic potentials. It is implemented as a part of the Atomistix ToolKit (ATK), which is a Python programming environment that makes it easy to create and analyze both standard and highly customized simulations. This paper focuses on the atomic interaction potentials, molecular dynamics, and geometry optimization features of the software; however, many more advanced modeling features are available. The implementation details of these algorithms and their computational performance are shown. We present three illustrative examples of the types of calculations that are possible with ATK-ForceField: modeling thermal transport properties in a silicon germanium crystal, vapor deposition of selenium molecules on a selenium surface, and a simulation of creep in a copper polycrystal.
Visibility Equalizer Cutaway Visualization of Mesoscopic Biological Models.
Le Muzic, M; Mindek, P; Sorger, J; Autin, L; Goodsell, D; Viola, I
2016-06-01
In scientific illustrations and visualization, cutaway views are often employed as an effective technique for occlusion management in densely packed scenes. We propose a novel method for authoring cutaway illustrations of mesoscopic biological models. In contrast to the existing cutaway algorithms, we take advantage of the specific nature of the biological models. These models consist of thousands of instances with a comparably smaller number of different types. Our method constitutes a two-stage process. In the first step, clipping objects are placed in the scene, creating a cutaway visualization of the model. During this process, a hierarchical list of stacked bars informs the user about the instance visibility distribution of each individual molecular type in the scene. In the second step, the visibility of each molecular type is fine-tuned through these bars, which at this point act as interactive visibility equalizers. An evaluation of our technique with domain experts confirmed that our equalizer-based approach for visibility specification was valuable and effective for both scientific and educational purposes.
Visibility Equalizer Cutaway Visualization of Mesoscopic Biological Models
Le Muzic, M.; Mindek, P.; Sorger, J.; Autin, L.; Goodsell, D.; Viola, I.
2017-01-01
In scientific illustrations and visualization, cutaway views are often employed as an effective technique for occlusion management in densely packed scenes. We propose a novel method for authoring cutaway illustrations of mesoscopic biological models. In contrast to the existing cutaway algorithms, we take advantage of the specific nature of the biological models. These models consist of thousands of instances with a comparably smaller number of different types. Our method constitutes a two-stage process. In the first step, clipping objects are placed in the scene, creating a cutaway visualization of the model. During this process, a hierarchical list of stacked bars informs the user about the instance visibility distribution of each individual molecular type in the scene. In the second step, the visibility of each molecular type is fine-tuned through these bars, which at this point act as interactive visibility equalizers. An evaluation of our technique with domain experts confirmed that our equalizer-based approach for visibility specification was valuable and effective for both scientific and educational purposes. PMID:28344374
Dutheil, Julien; Gaillard, Sylvain; Bazin, Eric; Glémin, Sylvain; Ranwez, Vincent; Galtier, Nicolas; Belkhir, Khalid
2006-04-04
A large number of bioinformatics applications in the fields of bio-sequence analysis, molecular evolution and population genetics typically share input/output methods, data storage requirements and data analysis algorithms. Such common features may be conveniently bundled into re-usable libraries, which enable the rapid development of new methods and robust applications. We present Bio++, a set of Object Oriented libraries written in C++. Available components include classes for data storage and handling (nucleotide/amino-acid/codon sequences, trees, distance matrices, population genetics datasets), various input/output formats, basic sequence manipulation (concatenation, transcription, translation, etc.), phylogenetic analysis (maximum parsimony, Markov models, distance methods, likelihood computation and maximization), population genetics/genomics (diversity statistics, neutrality tests, various multi-locus analyses) and various algorithms for numerical calculus. Implementation of methods aims at being both efficient and user-friendly. Special attention was given to the library design to enable easy extension and development of new methods. We defined a general hierarchy of classes that allows developers to implement their own algorithms while remaining compatible with the rest of the libraries. Bio++ source code is distributed free of charge under the CeCILL general public licence from its website http://kimura.univ-montp2.fr/BioPP.
Mocellin, Simone; Shrager, Jeff; Scolyer, Richard; Pasquali, Sandro; Verdi, Daunia; Marincola, Francesco M.; Briarava, Marta; Gobbel, Randy; Rossi, Carlo; Nitti, Donato
2010-01-01
Background: The efficacy of current anticancer treatments is far from satisfactory and many patients still die of their disease. A general agreement exists on the urgency of developing molecularly targeted therapies, although their implementation in the clinical setting is in its infancy. In fact, despite the wealth of preclinical studies addressing these issues, the difficulty of testing each targeted therapy hypothesis in the clinical arena represents an intrinsic obstacle. As a consequence, we are witnessing a paradoxical situation where most hypotheses about the molecular and cellular biology of cancer remain clinically untested and therefore do not translate into a therapeutic benefit for patients. Objective: To present a computational method aimed to comprehensively exploit the scientific knowledge in order to foster the development of personalized cancer treatment by matching the patient's molecular profile with the available evidence on targeted therapy. Methods: To this aim we focused on melanoma, an increasingly diagnosed malignancy for which the need for novel therapeutic approaches is paradigmatic since no effective treatment is available in the advanced setting. Relevant data were manually extracted from peer-reviewed full-text original articles describing any type of anti-melanoma targeted therapy tested in any type of experimental or clinical model. To this purpose, Medline, Embase, Cancerlit and the Cochrane databases were searched. Results and Conclusions: We created a manually annotated database (Targeted Therapy Database, TTD) where the relevant data are gathered in a formal representation that can be computationally analyzed. Dedicated algorithms were set up for the identification of the prevalent therapeutic hypotheses based on the available evidence and for ranking treatments based on the molecular profile of individual patients.
In this essay we describe the principles and computational algorithms of an original method developed to fully exploit the available knowledge on cancer biology with the ultimate goal of fruitfully driving both preclinical and clinical research on anticancer targeted therapy. In the light of its theoretical nature, the prediction performance of this model must be validated before it can be implemented in the clinical setting. PMID:20706624
Mocellin, Simone; Shrager, Jeff; Scolyer, Richard; Pasquali, Sandro; Verdi, Daunia; Marincola, Francesco M; Briarava, Marta; Gobbel, Randy; Rossi, Carlo; Nitti, Donato
2010-08-10
The efficacy of current anticancer treatments is far from satisfactory and many patients still die of their disease. A general agreement exists on the urgency of developing molecularly targeted therapies, although their implementation in the clinical setting is in its infancy. In fact, despite the wealth of preclinical studies addressing these issues, the difficulty of testing each targeted therapy hypothesis in the clinical arena represents an intrinsic obstacle. As a consequence, we are witnessing a paradoxical situation where most hypotheses about the molecular and cellular biology of cancer remain clinically untested and therefore do not translate into a therapeutic benefit for patients. We present a computational method aimed at comprehensively exploiting the scientific knowledge in order to foster the development of personalized cancer treatment by matching the patient's molecular profile with the available evidence on targeted therapy. To this aim we focused on melanoma, an increasingly diagnosed malignancy for which the need for novel therapeutic approaches is paradigmatic since no effective treatment is available in the advanced setting. Relevant data were manually extracted from peer-reviewed full-text original articles describing any type of anti-melanoma targeted therapy tested in any type of experimental or clinical model. To this purpose, Medline, Embase, Cancerlit and the Cochrane databases were searched. We created a manually annotated database (Targeted Therapy Database, TTD) where the relevant data are gathered in a formal representation that can be computationally analyzed. Dedicated algorithms were set up for the identification of the prevalent therapeutic hypotheses based on the available evidence and for ranking treatments based on the molecular profile of individual patients.
In this essay we describe the principles and computational algorithms of an original method developed to fully exploit the available knowledge on cancer biology with the ultimate goal of fruitfully driving both preclinical and clinical research on anticancer targeted therapy. In the light of its theoretical nature, the prediction performance of this model must be validated before it can be implemented in the clinical setting.
Simulation of meso-damage of refractory based on cohesion model and molecular dynamics method
NASA Astrophysics Data System (ADS)
Zhao, Jiuling; Shang, Hehao; Zhu, Zhaojun; Zhang, Guoxing; Duan, Leiguang; Sun, Xinya
2018-06-01
To describe the meso-damage of refractories more accurately, and to study the relationship between the mesostructure of refractories and their macro-mechanics, this paper takes magnesia-carbon refractories as the research object and uses the molecular dynamics method, instead of the traditional sequential algorithm, to establish a meso-particle filling model that includes small and large particles. The finite element software ABAQUS is then used to conduct numerical simulation of the meso-damage evolution process of the refractory materials. From the results, the initiation and propagation of microscopic interface cracks can be observed intuitively, and the macroscopic stress-strain curve of the refractory material is obtained. The results show that combining molecular dynamics modeling with the use of Python to insert cohesive elements at the interfaces for numerical simulation, and obtaining more accurate interface parameters through parameter inversion, makes it possible to observe the meso-damage evolution at the interfaces more accurately and to account effectively for the effect of the mesostructure of the refractory material on its macroscopic mechanical properties.
Perspective: Reaches of chemical physics in biology.
Gruebele, Martin; Thirumalai, D
2013-09-28
Chemical physics as a discipline contributes many experimental tools, algorithms, and fundamental theoretical models that can be applied to biological problems. This is especially true now as the molecular level and the systems level descriptions begin to connect, and multi-scale approaches are being developed to solve cutting edge problems in biology. In some cases, the concepts and tools got their start in non-biological fields, and migrated over, such as the idea of glassy landscapes, fluorescence spectroscopy, or master equation approaches. In other cases, the tools were specifically developed with biological physics applications in mind, such as modeling of single molecule trajectories or super-resolution laser techniques. In this introduction to the special topic section on chemical physics of biological systems, we consider a wide range of contributions, all the way from the molecular level, to molecular assemblies, chemical physics of the cell, and finally systems-level approaches, based on the contributions to this special issue. Chemical physicists can look forward to an exciting future where computational tools, analytical models, and new instrumentation will push the boundaries of biological inquiry.
Density functional simulations as a tool to probe molecular interactions in wet supercritical CO2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Glezakou, Vassiliki Alexandra; McGrail, B. Peter
2013-06-03
Recent advances in mixed Gaussian and plane wave algorithms have made possible the effective use of density functional theory (DFT) in ab initio molecular dynamics (AIMD) simulations for large and chemically complex models of condensed phase materials. In this chapter, we review recent progress on the modeling and characterization of co-sequestration processes and reactivity in wet supercritical CO2 (sc-CO2). We examine the molecular transformations of mineral and metal components of a sequestration system in contact with water-bearing sc-CO2 media and aim to establish a reliable correspondence between experimental observations and theory models with predictive ability and transferability of results in large scale geomechanical simulators. This work is funded by the Department of Energy, Office of Fossil Energy. A portion of the research was performed using EMSL, a national scientific user facility sponsored by the Department of Energy’s Office of Biological and Environmental Research located at Pacific Northwest National Laboratory. The Pacific Northwest National Laboratory (PNNL) is operated by Battelle for DOE under contract DE-AC06-76RL01830.
Perspective: Reaches of chemical physics in biology
Gruebele, Martin; Thirumalai, D.
2013-01-01
Chemical physics as a discipline contributes many experimental tools, algorithms, and fundamental theoretical models that can be applied to biological problems. This is especially true now as the molecular level and the systems level descriptions begin to connect, and multi-scale approaches are being developed to solve cutting edge problems in biology. In some cases, the concepts and tools got their start in non-biological fields, and migrated over, such as the idea of glassy landscapes, fluorescence spectroscopy, or master equation approaches. In other cases, the tools were specifically developed with biological physics applications in mind, such as modeling of single molecule trajectories or super-resolution laser techniques. In this introduction to the special topic section on chemical physics of biological systems, we consider a wide range of contributions, all the way from the molecular level, to molecular assemblies, chemical physics of the cell, and finally systems-level approaches, based on the contributions to this special issue. Chemical physicists can look forward to an exciting future where computational tools, analytical models, and new instrumentation will push the boundaries of biological inquiry. PMID:24089712
Gupta, Rishi R; Gifford, Eric M; Liston, Ted; Waller, Chris L; Hohman, Moses; Bunin, Barry A; Ekins, Sean
2010-11-01
Ligand-based computational models could be more readily shared between researchers and organizations if they were generated with open source molecular descriptors [e.g., chemistry development kit (CDK)] and modeling algorithms, because this would negate the requirement for proprietary commercial software. We initially evaluated open source descriptors and model building algorithms using a training set of approximately 50,000 molecules and a test set of approximately 25,000 molecules with human liver microsomal metabolic stability data. A C5.0 decision tree model demonstrated that CDK descriptors together with a set of SMILES Arbitrary Target Specification (SMARTS) keys had good statistics [κ = 0.43, sensitivity = 0.57, specificity = 0.91, and positive predictive value (PPV) = 0.64], equivalent to those of models built with commercial Molecular Operating Environment 2D (MOE2D) and the same set of SMARTS keys (κ = 0.43, sensitivity = 0.58, specificity = 0.91, and PPV = 0.63). Extending the dataset to ∼193,000 molecules and generating a continuous model using Cubist with a combination of CDK and SMARTS keys or MOE2D and SMARTS keys confirmed this observation. When the continuous predictions and actual values were binned to get a categorical score we observed a similar κ statistic (0.42). The same combination of descriptor set and modeling method was applied to passive permeability and P-glycoprotein efflux data with similar model testing statistics. In summary, open source tools demonstrated predictive results comparable to those of commercial software with attendant cost savings. We discuss the advantages and disadvantages of open source descriptors and the opportunity for their use as a tool for organizations to share data precompetitively, avoiding repetition and assisting drug discovery.
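The statistics quoted above (κ, sensitivity, specificity, PPV) all follow from a 2x2 confusion matrix. The sketch below shows the standard definitions, including Cohen's kappa as observed agreement corrected for chance agreement. The function name and the example counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Kappa, sensitivity, specificity and PPV from 2x2 confusion-matrix counts."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return {
        "kappa": (observed - expected) / (1 - expected),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
    }

# Made-up counts for illustration only
m = binary_metrics(tp=40, fp=10, fn=20, tn=30)
print(m)
```

For these illustrative counts the observed agreement is 0.7 and the chance agreement is 0.5, so kappa is 0.4, alongside a sensitivity of about 0.67, a specificity of 0.75 and a PPV of 0.8.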
GARN: Sampling RNA 3D Structure Space with Game Theory and Knowledge-Based Scoring Strategies.
Boudard, Mélanie; Bernauer, Julie; Barth, Dominique; Cohen, Johanne; Denise, Alain
2015-01-01
Cellular processes involve large numbers of RNA molecules. The functions of these RNA molecules and their binding to molecular machines are highly dependent on their 3D structures. One of the key challenges in RNA structure prediction and modeling is predicting the spatial arrangement of the various structural elements of RNA. As RNA folding is generally hierarchical, methods involving coarse-grained models hold great promise for this purpose. We present here a novel coarse-grained method for sampling, based on game theory and knowledge-based potentials. This strategy, GARN (Game Algorithm for RNa sampling), is often much faster than previously described techniques and generates large sets of solutions closely resembling the native structure. GARN is thus a suitable starting point for the molecular modeling of large RNAs, particularly those with experimental constraints. GARN is available from: http://garn.lri.fr/.
Multiresolution molecular mechanics: Implementation and efficiency
NASA Astrophysics Data System (ADS)
Biyikli, Emre; To, Albert C.
2017-01-01
Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called the Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the software implemented along with the associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and measurement-based dynamic balancing of parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. Speed-up of the MMM method is shown to be directly proportional to the reduction of the number of the atoms visited in force computation. Finally, an adaptive MMM analysis on a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3-8.5 times in efficiency, over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.
Fast surface-based travel depth estimation algorithm for macromolecule surface shape description.
Giard, Joachim; Alface, Patrice Rondao; Gala, Jean-Luc; Macq, Benoît
2011-01-01
Travel Depth, introduced by Coleman and Sharp in 2006, is a physical interpretation of molecular depth, a term frequently used to describe the shape of a molecular active site or binding site. Travel Depth can be seen as the physical distance a solvent molecule would have to travel from a point of the surface, i.e., the Solvent-Excluded Surface (SES), to its convex hull. Existing algorithms providing an estimation of the Travel Depth are based on a regular sampling of the molecule volume and the use of Dijkstra's shortest path algorithm. Since Travel Depth is only defined on the molecular surface, this volume-based approach is characterized by a large computational complexity due to the processing of unnecessary samples lying inside or outside the molecule. In this paper, we propose a surface-based approach that restricts the processing to data defined on the SES. This algorithm significantly reduces the complexity of Travel Depth estimation and makes high-resolution shape description of large macromolecule surfaces possible. Experimental results show that, compared to existing methods, the proposed algorithm achieves accurate estimations with considerably reduced processing times.
A multilevel-skin neighbor list algorithm for molecular dynamics simulation
NASA Astrophysics Data System (ADS)
Zhang, Chenglong; Zhao, Mingcan; Hou, Chaofeng; Ge, Wei
2018-01-01
Searching for the interaction pairs and organizing the interaction processes are important steps in molecular dynamics (MD) algorithms and are critical to the overall efficiency of the simulation. Neighbor lists are widely used for these steps, where a thicker skin reduces the frequency of list updates at the cost of more distance checks for the particle pairs. In this paper, we propose a new neighbor-list-based algorithm with a precisely designed multilevel skin which can reduce unnecessary computation on inter-particle distances. The performance advantages over traditional methods are then analyzed against the main simulation parameters on Intel CPUs and MICs (many integrated cores), and are clearly demonstrated. The algorithm can be generalized for various discrete simulations using neighbor lists.
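The trade-off behind the skin can be illustrated with the classic single-skin Verlet neighbor list. This sketch is not the multilevel algorithm proposed in the paper. All function names are illustrative, and the brute-force O(N²) list build is an assumption made for brevity.

```python
import math

def build_neighbor_list(positions, cutoff, skin):
    """List all pairs (i, j), i < j, closer than cutoff + skin."""
    r_list = cutoff + skin
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < r_list:
                pairs.append((i, j))
    return pairs

def needs_rebuild(positions, reference, skin):
    """True once any particle has moved more than skin/2 since the list was built.

    Two particles each moving skin/2 toward one another can close a gap of
    one full skin, so beyond this point the list may miss interacting pairs.
    """
    return any(math.dist(p, q) > 0.5 * skin for p, q in zip(positions, reference))

def interacting_pairs(positions, pairs, cutoff):
    """Distance check restricted to listed pairs; only these enter the force loop."""
    return [(i, j) for i, j in pairs
            if math.dist(positions[i], positions[j]) < cutoff]
```

A larger `skin` makes `needs_rebuild` fire less often but lengthens `pairs`, so each step performs more distance checks. That extra work is what a multilevel skin is meant to reduce.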
Accelerating molecular dynamic simulation on the cell processor and Playstation 3.
Luttmann, Edgar; Ensign, Daniel L; Vaidyanathan, Vishal; Houston, Mike; Rimon, Noam; Øland, Jeppe; Jayachandran, Guha; Friedrichs, Mark; Pande, Vijay S
2009-01-30
Implementation of molecular dynamics (MD) calculations on novel architectures will vastly increase its power to calculate the physical properties of complex systems. Herein, we detail algorithmic advances developed to accelerate MD simulations on the Cell processor, a commodity processor found in PlayStation 3 (PS3). In particular, we discuss issues regarding memory access versus computation and the types of calculations which are best suited for streaming processors such as the Cell, focusing on implicit solvation models. We conclude with a comparison of improved performance on the PS3's Cell processor over more traditional processors. (c) 2008 Wiley Periodicals, Inc.
Reactivity of fluoroalkanes in reactions of coordinated molecular decomposition
NASA Astrophysics Data System (ADS)
Pokidova, T. S.; Denisov, E. T.
2017-08-01
Experimental results on the coordinated molecular decomposition of RF fluoroalkanes to olefin and HF are analyzed using the model of intersecting parabolas (IPM). The kinetic parameters are calculated to allow estimates of the activation energy (E) and rate constant (k) of these reactions, based on enthalpy and IPM algorithms. Parameters E and k are found for the first time for eight RF decomposition reactions. The factors that affect activation energy E of RF decomposition (the enthalpy of the reaction, the electronegativity of the atoms of reaction centers, and the dipole-dipole interaction of polar groups) are determined. The values of E and k for reverse reactions of addition are estimated.
NASA Astrophysics Data System (ADS)
Mazidi, Hesam; Nehorai, Arye; Lew, Matthew D.
2018-02-01
In single-molecule (SM) super-resolution microscopy, the complexity of a biological structure, high molecular density, and a low signal-to-background ratio (SBR) may lead to imaging artifacts without a robust localization algorithm. Moreover, engineered point spread functions (PSFs) for 3D imaging pose difficulties due to their intricate features. We develop a Robust Statistical Estimation algorithm, called RoSE, that enables joint estimation of the 3D location and photon counts of SMs accurately and precisely using various PSFs under conditions of high molecular density and low SBR.
NASA Astrophysics Data System (ADS)
Isobe, Masaharu
Hard sphere/disk systems are among the simplest models and have been used to address numerous fundamental problems in the field of statistical physics. The pioneering numerical works on the solid-fluid phase transition based on Monte Carlo (MC) and molecular dynamics (MD) methods published in 1957 represent historical milestones, which have had a significant influence on the development of computer algorithms and novel tools to obtain physical insights. This chapter addresses Alder's breakthrough works on hard sphere/disk simulation: (i) event-driven molecular dynamics, (ii) long-time tail, (iii) molasses tail, and (iv) two-dimensional melting/crystallization. From a numerical viewpoint, there are serious issues that must be overcome for further breakthroughs. Here, we present a brief review of recent progress in this area.
Pteros 2.0: Evolution of the fast parallel molecular analysis library for C++ and python.
Yesylevskyy, Semen O
2015-07-15
Pteros is a high-performance open-source library for molecular modeling and analysis of molecular dynamics trajectories. Starting from version 2.0, Pteros is available for the C++ and Python programming languages with very similar interfaces. This makes it suitable for writing complex reusable programs in C++ and simple interactive scripts in Python alike. The new version improves the facilities for asynchronous trajectory reading and parallel execution of analysis tasks by introducing analysis plugins, which can be written in either C++ or Python in a completely uniform way. The high level of abstraction provided by analysis plugins greatly simplifies prototyping and implementation of complex analysis algorithms. Pteros is available for free under Artistic License from http://sourceforge.net/projects/pteros/. © 2015 Wiley Periodicals, Inc.
A novel integrated framework and improved methodology of computer-aided drug design.
Chen, Calvin Yu-Chian
2013-01-01
Computer-aided drug design (CADD) is a critical initiating step of drug development, but a single model capable of covering all design aspects has yet to be established. Hence, we developed a drug design modeling framework that integrates multiple approaches, including machine learning based quantitative structure-activity relationship (QSAR) analysis, 3D-QSAR, Bayesian network, pharmacophore modeling, and structure-based docking algorithm. Restrictions for each model were defined for improved individual and overall accuracy. An integration method was applied to join the results from each model to minimize bias and errors. In addition, the integrated model adopts both static and dynamic analysis to validate the intermolecular stabilities of the receptor-ligand conformation. The proposed protocol was applied to identifying HER2 inhibitors from traditional Chinese medicine (TCM) as an example for validating our new protocol. Eight potent leads were identified from six TCM sources. A joint validation system comprising comparative molecular field analysis, comparative molecular similarity indices analysis, and molecular dynamics simulation further characterized the candidates into three potential binding conformations and validated the binding stability of each protein-ligand complex. A ligand pathway analysis was also performed to predict how the ligand enters ("in") and leaves ("exit") the binding site. In summary, we propose a novel systematic CADD methodology for the identification, analysis, and characterization of drug-like candidates.
Applications of genetic programming in cancer research.
Worzel, William P; Yu, Jianjun; Almal, Arpit A; Chinnaiyan, Arul M
2009-02-01
The theory of Darwinian evolution is a fundamental keystone of modern biology. Late in the last century, computer scientists began adapting its principles, in particular natural selection, to complex computational challenges, leading to the emergence of evolutionary algorithms. The conceptual model of selective pressure and recombination in evolutionary algorithms allows scientists to efficiently search high dimensional space for solutions to complex problems. In the last decade, genetic programming has been developed and extensively applied for analysis of molecular data to classify cancer subtypes and characterize the mechanisms of cancer pathogenesis and development. This article reviews current successes using genetic programming and discusses its potential impact in cancer research and treatment in the near future.
Differential correlation for sequencing data.
Siska, Charlotte; Kechris, Katerina
2017-01-19
Several methods have been developed to identify differential correlation (DC) between pairs of molecular features from -omics studies. Most DC methods have only been tested with microarrays and other platforms producing continuous and Gaussian-like data. Sequencing data is in the form of counts, often modeled with a negative binomial distribution making it difficult to apply standard correlation metrics. We have developed an R package for identifying DC called Discordant which uses mixture models for correlations between features and the Expectation Maximization (EM) algorithm for fitting parameters of the mixture model. Several correlation metrics for sequencing data are provided and tested using simulations. Other extensions in the Discordant package include additional modeling for different types of differential correlation, and faster implementation, using a subsampling routine to reduce run-time and address the assumption of independence between molecular feature pairs. With simulations and breast cancer miRNA-Seq and RNA-Seq data, we find that Spearman's correlation has the best performance among the tested correlation methods for identifying differential correlation. Application of Spearman's correlation in the Discordant method demonstrated the most power in ROC curves and sensitivity/specificity plots, and improved ability to identify experimentally validated breast cancer miRNA. We also considered including additional types of differential correlation, which showed a slight reduction in power due to the additional parameters that need to be estimated, but more versatility in applications. Finally, subsampling within the EM algorithm considerably decreased run-time with negligible effect on performance. A new method and R package called Discordant is presented for identifying differential correlation with sequencing data. 
Based on comparisons with different correlation metrics, this study suggests Spearman's correlation is appropriate for sequencing data, but other correlation metrics are available to the user depending on the application and data type. The Discordant method can also be extended to investigate additional DC types and subsampling with the EM algorithm is now available for reduced run-time. These extensions to the R package make Discordant more robust and versatile for multiple -omics studies.
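Spearman's correlation, which the abstract recommends for count data, is the Pearson correlation of the ranks, so it is unaffected by any monotone transformation of the counts. The sketch below shows only that metric and a naive difference in correlation between two groups. It is not the Discordant mixture model or its EM fitting, and every function name is hypothetical.

```python
import math

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def correlation_difference(x1, y1, x2, y2):
    """Naive differential-correlation score: rho in group 1 minus rho in group 2."""
    return spearman(x1, y1) - spearman(x2, y2)
```

A feature pair that rises together in one group and moves in opposite directions in the other gives a score near 2, the most extreme difference this simple measure can report.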
Grindon, Christina; Harris, Sarah; Evans, Tom; Novik, Keir; Coveney, Peter; Laughton, Charles
2004-07-15
Molecular modelling played a central role in the discovery of the structure of DNA by Watson and Crick. Today, such modelling is done on computers: the more powerful these computers are, the more detailed and extensive can be the study of the dynamics of such biological macromolecules. To fully harness the power of modern massively parallel computers, however, we need to develop and deploy algorithms which can exploit the structure of such hardware. The Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a scalable molecular dynamics code including long-range Coulomb interactions, which has been specifically designed to function efficiently on parallel platforms. Here we describe the implementation of the AMBER98 force field in LAMMPS and its validation for molecular dynamics investigations of DNA structure and flexibility against the benchmark of results obtained with the long-established code AMBER6 (Assisted Model Building with Energy Refinement, version 6). Extended molecular dynamics simulations on the hydrated DNA dodecamer d(CTTTTGCAAAAG)(2), which has previously been the subject of extensive dynamical analysis using AMBER6, show that it is possible to obtain excellent agreement in terms of static, dynamic and thermodynamic parameters between AMBER6 and LAMMPS. In comparison with AMBER6, LAMMPS shows greatly improved scalability in massively parallel environments, opening up the possibility of efficient simulations of order-of-magnitude larger systems and/or for order-of-magnitude greater simulation times.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kevrekidis, Ioannis G.
The work explored the linking of modern developing machine learning techniques (manifold learning and in particular diffusion maps) with traditional PDE modeling/discretization/scientific computation techniques via the equation-free methodology developed by the PI. The result (in addition to several PhD degrees, two of them by CSGF Fellows) was a sequence of strong developments - in part on the algorithmic side, linking data mining with scientific computing, and in part on applications, ranging from PDE discretizations to molecular dynamics and complex network dynamics.
NASA Astrophysics Data System (ADS)
Huang, Junqi; Goltz, Mark N.
2017-06-01
To greatly simplify their solution, the equations describing radial advective/dispersive transport to an extraction well in a porous medium typically neglect molecular diffusion. While this simplification is appropriate to simulate transport in the saturated zone, it can result in significant errors when modeling gas phase transport in the vadose zone, as might be applied when simulating a soil vapor extraction (SVE) system to remediate vadose zone contamination. A new analytical solution for the equations describing radial gas phase transport of a sorbing contaminant to an extraction well is presented. The equations model advection, dispersion (including both mechanical dispersion and molecular diffusion), and rate-limited mass transfer of dissolved, separate phase, and sorbed contaminants into the gas phase. The model equations are analytically solved by using the Laplace transform with respect to time. The solutions are represented by confluent hypergeometric functions in the Laplace domain. The Laplace domain solutions are then evaluated using a numerical Laplace inversion algorithm. The solutions can be used to simulate the spatial distribution and the temporal evolution of contaminant concentrations during operation of a soil vapor extraction well. Results of model simulations show that the effect of gas phase molecular diffusion upon concentrations at the extraction well is relatively small, although the effect upon the distribution of concentrations in space is significant. This study provides a tool that can be useful in designing SVE remediation strategies, as well as verifying numerical models used to simulate SVE system performance.
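The abstract does not name the numerical Laplace inversion algorithm that was used. As one common choice, the Gaver-Stehfest method approximates f(t) from values of the transform F(s) on the real axis only. The sketch below assumes that method, and the function names and the default of 12 terms are illustrative.

```python
import math

def stehfest_coefficients(n):
    """Gaver-Stehfest weights V_1..V_n for an even number of terms n."""
    if n % 2:
        raise ValueError("n must be even")
    half = n // 2
    weights = []
    for k in range(1, n + 1):
        total = 0.0
        for j in range((k + 1) // 2, min(k, half) + 1):
            total += (j ** half * math.factorial(2 * j)) / (
                math.factorial(half - j) * math.factorial(j)
                * math.factorial(j - 1) * math.factorial(k - j)
                * math.factorial(2 * j - k))
        weights.append((-1) ** (k + half) * total)
    return weights

def invert_laplace(F, t, n=12):
    """Approximate f(t) as (ln 2 / t) * sum_k V_k * F(k * ln 2 / t), for t > 0."""
    a = math.log(2.0) / t
    weights = stehfest_coefficients(n)
    return a * sum(v * F(k * a) for k, v in enumerate(weights, start=1))
```

As a sanity check, `invert_laplace(lambda s: 1.0 / (s + 1.0), 1.0)` should land close to exp(-1). The method suits smooth, non-oscillatory solutions, and in double precision accuracy degrades if `n` is made much larger, because the weights grow rapidly and alternate in sign.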
Coutinho, Rita; Clear, Andrew James; Owen, Andrew; Wilson, Andrew; Matthews, Janet; Lee, Abigail; Alvarez, Rute; da Silva, Maria Gomes; Cabeçadas, José; Calaminici, Maria; Gribben, John G.
2014-01-01
Purpose: The opportunity to improve therapeutic choices on the basis of molecular features of the tumour cells is on the horizon in Diffuse Large B-cell Lymphoma (DLBCL). Agents such as bortezomib exhibit selective activity against the poor outcome activated B-cell type DLBCL. In order for targeted therapies to succeed in this disease, robust strategies that segregate patients into molecular groups with high reliability are needed. While molecular studies are considered gold standard, several immunohistochemistry (IHC) algorithms have been published that claim to be able to stratify patients according to their cell-of-origin and to be relevant for patient outcome. However, results are poorly reproducible by independent groups. Experimental design: We investigated nine IHC algorithms for molecular classification in a dataset of DLBCL diagnostic biopsies, incorporating immunostaining for CD10, BCL6, BCL2, MUM1, FOXP1, GCET1 and LMO2. IHC profiles were assessed and agreed among three expert observers. A consensus matrix based on all scoring combinations and the number of subjects for each combination allowed us to assess reliability. The survival impact of individual markers and classifiers was evaluated using Kaplan-Meier curves and the log-rank test. Results: The concordance in patient’s classification across the different algorithms was low. Only 4% of the tumors have been classified as GCB and 21% as ABC/non-GCB by all methods. None of the algorithms provided prognostic information in the R-CHOP treated cohort. Conclusion: Further work is required to standardize IHC algorithms for DLBCL cell-of-origin classification for these to be considered reliable alternatives to molecular-based methods to be used for clinical decisions. PMID:24122791
Coutinho, Rita; Clear, Andrew James; Owen, Andrew; Wilson, Andrew; Matthews, Janet; Lee, Abigail; Alvarez, Rute; Gomes da Silva, Maria; Cabeçadas, José; Calaminici, Maria; Gribben, John G
2013-12-15
The opportunity to improve therapeutic choices on the basis of molecular features of the tumor cells is on the horizon in diffuse large B-cell lymphoma (DLBCL). Agents such as bortezomib exhibit selective activity against the poor outcome activated B-cell type (ABC) DLBCL. In order for targeted therapies to succeed in this disease, robust strategies that segregate patients into molecular groups with high reliability are needed. Although molecular studies are considered gold standard, several immunohistochemistry (IHC) algorithms have been published that claim to be able to stratify patients according to their cell-of-origin and to be relevant for patient outcome. However, results are poorly reproducible by independent groups. We investigated nine IHC algorithms for molecular classification in a dataset of DLBCL diagnostic biopsies, incorporating immunostaining for CD10, BCL6, BCL2, MUM1, FOXP1, GCET1, and LMO2. IHC profiles were assessed and agreed among three expert observers. A consensus matrix based on all scoring combinations and the number of subjects for each combination allowed us to assess reliability. The survival impact of individual markers and classifiers was evaluated using Kaplan-Meier curves and the log-rank test. The concordance in patient's classification across the different algorithms was low. Only 4% of the tumors have been classified as germinal center B-cell type (GCB) and 21% as ABC/non-GCB by all methods. None of the algorithms provided prognostic information in the R-CHOP (rituximab plus cyclophosphamide-adriamycin-vincristine-prednisone)-treated cohort. Further work is required to standardize IHC algorithms for DLBCL cell-of-origin classification for these to be considered reliable alternatives to molecular-based methods to be used for clinical decisions. ©2013 AACR.
Molecular surface mesh generation by filtering electron density map.
Giard, Joachim; Macq, Benoît
2010-01-01
Bioinformatics methods applied to macromolecules are now widespread and continue to expand. In this context, representing an external molecular surface such as the Van der Waals Surface or the Solvent Excluded Surface can be useful for several applications. We propose a fast and parameterizable algorithm that produces meshes of good visual quality representing molecular surfaces. The mesh is obtained by isosurfacing a filtered electron density map. The density map is the maximum of Gaussian functions placed around atom centers. This map is filtered by an ideal low-pass filter applied to the Fourier Transform of the density map. Applying the marching cubes algorithm to the inverse transform provides a mesh representation of the molecular surface.
Service-based analysis of biological pathways
Zheng, George; Bouguettaya, Athman
2009-01-01
Background: Computer-based pathway discovery is concerned with two important objectives: pathway identification and analysis. Conventional mining and modeling approaches aimed at pathway discovery are often effective at achieving either objective, but not both. Such limitations can be effectively tackled by leveraging a Web service-based modeling and mining approach. Results: Inspired by molecular recognition and drug discovery processes, we developed a Web service mining tool, named PathExplorer, to discover potentially interesting biological pathways linking service models of biological processes. The tool uses an innovative approach to identify useful pathways based on graph-based hints and service-based simulation verifying the user's hypotheses. Conclusion: Web service modeling of biological processes allows easy access to and invocation of these processes on the Web. The Web service mining techniques described in this paper enable the discovery of biological pathways linking these process service models. The algorithms presented in this paper for automatically highlighting interesting subgraphs within an identified pathway network enable the user to formulate hypotheses, which can be tested using our simulation algorithm, which is also described in this paper. PMID:19796403
NASA Astrophysics Data System (ADS)
Zimoń, M. J.; Prosser, R.; Emerson, D. R.; Borg, M. K.; Bray, D. J.; Grinberg, L.; Reese, J. M.
2016-11-01
Filtering of particle-based simulation data can lead to reduced computational costs and enable more efficient information transfer in multi-scale modelling. This paper compares the effectiveness of various signal processing methods to reduce numerical noise and capture the structures of nano-flow systems. In addition, a novel combination of these algorithms is introduced, showing the potential of hybrid strategies to improve further the de-noising performance for time-dependent measurements. The methods were tested on velocity and density fields, obtained from simulations performed with molecular dynamics and dissipative particle dynamics. Comparisons between the algorithms are given in terms of performance, quality of the results and sensitivity to the choice of input parameters. The results provide useful insights on strategies for the analysis of particle-based data and the reduction of computational costs in obtaining ensemble solutions.
Structural alignment of protein descriptors - a combinatorial model.
Antczak, Maciej; Kasprzak, Marta; Lukasiak, Piotr; Blazewicz, Jacek
2016-09-17
Structural alignment of proteins is one of the most challenging problems in molecular biology. The tertiary structure of a protein strictly correlates with its function and computationally predicted structures are nowadays a main premise for understanding the latter. However, computationally derived 3D models often exhibit deviations from the native structure. A way to confirm a model is a comparison with other structures. The structural alignment of a pair of proteins can be defined with the use of a concept of protein descriptors. The protein descriptors are local substructures of protein molecules, which allow us to divide the original problem into a set of subproblems and, consequently, to propose a more efficient algorithmic solution. In the literature, one can find many applications of the descriptors concept that prove its usefulness for insight into protein 3D structures, but the proposed approaches are presented rather from the biological perspective than from the computational or algorithmic point of view. Efficient algorithms for identification and structural comparison of descriptors can become crucial components of methods for structural quality assessment as well as tertiary structure prediction. In this paper, we propose a new combinatorial model and new polynomial-time algorithms for the structural alignment of descriptors. The model is based on the maximum-size assignment problem, which we define here and prove that it can be solved in polynomial time. We demonstrate suitability of this approach by comparison with an exact backtracking algorithm. Besides a simplification coming from the combinatorial modeling, both on the conceptual and complexity level, we gain with this approach high quality of obtained results, in terms of 3D alignment accuracy and processing efficiency. 
All the proposed algorithms were developed and integrated in a computationally efficient tool descs-standalone, which allows the user to identify and structurally compare descriptors of biological molecules, such as proteins and RNAs. Both PDB (Protein Data Bank) and mmCIF (macromolecular Crystallographic Information File) formats are supported. The proposed tool is available as an open source project stored on GitHub ( https://github.com/mantczak/descs-standalone ).
Design Principles of Regulatory Networks: Searching for the Molecular Algorithms of the Cell
Lim, Wendell A.; Lee, Connie M.; Tang, Chao
2013-01-01
A challenge in biology is to understand how complex molecular networks in the cell execute sophisticated regulatory functions. Here we explore the idea that there are common and general principles that link network structures to biological functions, principles that constrain the design solutions that evolution can converge upon for accomplishing a given cellular task. We describe approaches for classifying networks based on abstract architectures and functions, rather than on the specific molecular components of the networks. For any common regulatory task, can we define the space of all possible molecular solutions? Such inverse approaches might ultimately allow the assembly of a design table of core molecular algorithms that could serve as a guide for building synthetic networks and modulating disease networks. PMID:23352241
Al Nasr, Kamal; Ranjan, Desh; Zubair, Mohammad; Chen, Lin; He, Jing
2014-01-01
Electron cryomicroscopy is becoming a major experimental technique in solving the structures of large molecular assemblies. More and more three-dimensional images have been obtained at medium resolutions between 5 and 10 Å. At this resolution range, major α-helices can be detected as cylindrical sticks and β-sheets can be detected as plane-like regions. A critical question in de novo modeling from cryo-EM images is to determine the match between the detected secondary structures from the image and those on the protein sequence. We formulate this matching problem as a constrained graph problem and present an O(Δ²N²2^N) algorithm for this NP-hard problem. The algorithm incorporates the dynamic programming approach into a constrained K-shortest path algorithm. Our method, DP-TOSS, has been tested using α-proteins with a maximum of 33 helices and α-β proteins with up to five helices and 12 β-strands. The correct match was ranked within the top 35 for 19 of the 20 α-proteins and all nine α-β proteins tested. The results demonstrate that DP-TOSS improves accuracy, time and memory space in deriving the topologies of the secondary structure elements for proteins with a large number of secondary structures and a complex skeleton.
A fast parallel clustering algorithm for molecular simulation trajectories.
Zhao, Yutong; Sheong, Fu Kit; Sun, Jian; Sander, Pedro; Huang, Xuhui
2013-01-15
We implemented a GPU-powered parallel k-centers algorithm to perform clustering on the conformations of molecular dynamics (MD) simulations. The algorithm is up to two orders of magnitude faster than the CPU implementation. We tested our algorithm on four protein MD simulation datasets ranging from the small Alanine Dipeptide to a 370-residue Maltose Binding Protein (MBP). It is capable of grouping 250,000 conformations of the MBP into 4000 clusters within 40 seconds. To achieve this, we effectively parallelized the code on the GPU and utilized the triangle inequality of metric spaces. Furthermore, the algorithm's running time is linear with respect to the number of cluster centers. In addition, we found the triangle inequality to be less effective in higher dimensions and provide a mathematical rationale. Finally, using Alanine Dipeptide as an example, we show a strong correlation between cluster populations resulting from the k-centers algorithm and the underlying density. © 2012 Wiley Periodicals, Inc.
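How the triangle inequality saves distance evaluations in k-centers clustering can be shown with a serial sketch of the standard farthest-first traversal. This is not the authors' GPU code. The specific pruning test, the Euclidean metric and all names are assumptions chosen for illustration.

```python
import math

def k_centers(points, k, first=0):
    """Farthest-first k-centers with triangle-inequality pruning.

    Returns (center indices, cluster id per point, skipped distance checks).
    """
    centers = [first]
    assign = [0] * len(points)                     # position in `centers`
    dist = [math.dist(p, points[first]) for p in points]
    skipped = 0
    while len(centers) < k:
        # The next center is the point farthest from its current center
        new = max(range(len(points)), key=dist.__getitem__)
        # Distance from every existing center to the new one
        gap = [math.dist(points[c], points[new]) for c in centers]
        centers.append(new)
        new_id = len(centers) - 1
        for i, p in enumerate(points):
            # If d(c, new) >= 2 d(x, c), then d(x, new) >= d(c, new) - d(x, c)
            # >= d(x, c): the new center cannot be closer, so skip the check.
            if gap[assign[i]] >= 2.0 * dist[i]:
                skipped += 1
                continue
            d = math.dist(p, points[new])
            if d < dist[i]:
                dist[i], assign[i] = d, new_id
    return centers, assign, skipped
```

Each round adds one center and makes one pass over the points, which matches the linear dependence on the number of centers noted in the abstract. The `skipped` counter shows how many point-to-center distances the bound avoided.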
Evaluating data mining algorithms using molecular dynamics trajectories.
Tatsis, Vasileios A; Tjortjis, Christos; Tzirakis, Panagiotis
2013-01-01
Molecular dynamics simulations provide a sample of a molecule's conformational space. Experiments on the μs time scale, resulting in large amounts of data, are nowadays routine. Data mining techniques such as classification provide a way to analyse such data. In this work, we evaluate and compare several classification algorithms using three data sets which resulted from computer simulations of a potential enzyme mimetic biomolecule. We evaluated 65 classifiers available in the well-known data mining toolkit Weka, using 'classification' errors to assess algorithmic performance. Results suggest that: (i) 'meta' classifiers perform better than the other groups, when applied to molecular dynamics data sets; (ii) Random Forest and Rotation Forest are the best classifiers for all three data sets; and (iii) classification via clustering yields the highest classification error. Our findings are consistent with bibliographic evidence, suggesting a 'roadmap' for dealing with such data.
Tsai, Chen-An; Lee, Kuan-Ting; Liu, Jen-Pei
2016-01-01
A key feature of precision medicine is that it takes individual variability at the genetic or molecular level into account in determining the best treatment for patients diagnosed with diseases detected by recently developed novel biotechnologies. The enrichment design is an efficient design that enrolls only the patients testing positive for specific molecular targets and randomly assigns them to the targeted treatment or the concurrent control. However, there is no diagnostic device with perfect accuracy and precision for detecting molecular targets. In particular, the positive predictive value (PPV) can be quite low for rare diseases with low prevalence. Under the enrichment design, some patients testing positive for specific molecular targets may not have the molecular targets. As a result, the efficacy of the targeted therapy may be underestimated in the patients who actually do have the molecular targets. To address the loss of efficiency due to misclassification error, we apply the discrete mixture modeling for time-to-event data proposed by Eng and Hanlon [8] to develop an inferential procedure, based on the Cox proportional hazard model, for the effect of the targeted treatment in the true-positive patients with the molecular targets. Our proposed procedure incorporates both inaccuracy of diagnostic devices and uncertainty of estimated accuracy measures. We employed the expectation-maximization algorithm in conjunction with the bootstrap technique for estimation of the hazard ratio and its estimated variance. We report the results of simulation studies which empirically investigated the performance of the proposed method. Our proposed method is illustrated by a numerical example.
Bastida, Jose Maria; González-Porras, Jose Ramon; Jiménez, Cristina; Benito, Rocio; Ordoñez, Gonzalo R; Álvarez-Román, Maria Teresa; Fontecha, M Elena; Janusz, Kamila; Castillo, David; Fisac, Rosa María; García-Frade, Luis Javier; Aguilar, Carlos; Martínez, María Paz; Bermejo, Nuria; Herrero, Sonia; Balanzategui, Ana; Martin-Antorán, Jose Manuel; Ramos, Rafael; Cebeiro, Maria Jose; Pardal, Emilia; Aguilera, Carmen; Pérez-Gutierrez, Belen; Prieto, Manuel; Riesco, Susana; Mendoza, Maria Carmen; Benito, Ana; Hortal Benito-Sendin, Ana; Jiménez-Yuste, Víctor; Hernández-Rivas, Jesus Maria; García-Sanz, Ramon; González-Díaz, Marcos; Sarasquete, Maria Eugenia
2017-01-05
Currently, molecular diagnosis of haemophilia A and B (HA and HB) highlights the excess risk of inhibitor development associated with specific mutations, and enables carrier testing of female relatives and prenatal or preimplantation genetic diagnosis. Molecular testing for HA also helps distinguish it from von Willebrand disease (VWD). Next-generation sequencing (NGS) allows simultaneous investigation of several complete genes, even though they may span very extensive regions. This study aimed to evaluate the usefulness of a molecular algorithm employing an NGS approach for sequencing the complete F8, F9 and VWF genes. The proposed algorithm includes the detection of inversions of introns 1 and 22, an NGS custom panel (the entire F8, F9 and VWF genes), and multiplex ligation-dependent probe amplification (MLPA) analysis. A total of 102 samples (97 FVIII- and FIX-deficient patients, and five female carriers) were studied. IVS-22 screening identified 11 out of 20 severe HA patients and one female carrier. IVS-1 analysis did not reveal any alterations. The NGS approach gave positive results in 88 cases, allowing the differential diagnosis of mild/moderate HA and VWD in eight cases. MLPA confirmed one large exon deletion. Only one case had no pathogenic variants. The proposed algorithm had an overall success rate of 99%. In conclusion, our evaluation demonstrates that this algorithm can reliably identify pathogenic variants and diagnose patients with HA, HB or VWD.
Understanding phylogenetic incongruence: lessons from phyllostomid bats
Dávalos, Liliana M; Cirranello, Andrea L; Geisler, Jonathan H; Simmons, Nancy B
2012-01-01
All characters and trait systems in an organism share a common evolutionary history that can be estimated using phylogenetic methods. However, differential rates of change and the evolutionary mechanisms driving those rates result in pervasive phylogenetic conflict. These drivers need to be uncovered because mismatches between evolutionary processes and phylogenetic models can lead to high confidence in incorrect hypotheses. Incongruence between phylogenies derived from morphological versus molecular analyses, and between trees based on different subsets of molecular sequences has become pervasive as datasets have expanded rapidly in both characters and species. For more than a decade, evolutionary relationships among members of the New World bat family Phyllostomidae inferred from morphological and molecular data have been in conflict. Here, we develop and apply methods to minimize systematic biases, uncover the biological mechanisms underlying phylogenetic conflict, and outline data requirements for future phylogenomic and morphological data collection. We introduce new morphological data for phyllostomids and outgroups and expand previous molecular analyses to eliminate methodological sources of phylogenetic conflict such as taxonomic sampling, sparse character sampling, or use of different algorithms to estimate the phylogeny. We also evaluate the impact of biological sources of conflict: saturation in morphological changes and molecular substitutions, and other processes that result in incongruent trees, including convergent morphological and molecular evolution. Methodological sources of incongruence play some role in generating phylogenetic conflict, and are relatively easy to eliminate by matching taxa, collecting more characters, and applying the same algorithms to optimize phylogeny. 
The evolutionary patterns uncovered are consistent with multiple biological sources of conflict, including saturation in morphological and molecular changes, adaptive morphological convergence among nectar-feeding lineages, and incongruent gene trees. Applying methods to account for nucleotide sequence saturation reduces, but does not completely eliminate, phylogenetic conflict. We ruled out paralogy, lateral gene transfer, and poor taxon sampling and outgroup choices among the processes leading to incongruent gene trees in phyllostomid bats. Uncovering and countering the possible effects of introgression and lineage sorting of ancestral polymorphism on gene trees will require great leaps in genomic and allelic sequencing in this species-rich mammalian family. We also found evidence for adaptive molecular evolution leading to convergence in mitochondrial proteins among nectar-feeding lineages. In conclusion, the biological processes that generate phylogenetic conflict are ubiquitous, and overcoming incongruence requires better models and more data than have been collected even in well-studied organisms such as phyllostomid bats. PMID:22891620
Loeffler, Troy David; Chan, Henry; Narayanan, Badri; Cherukara, Mathew J; Gray, Stephen K; Sankaranarayanan, Subramanian K R S
2018-06-20
Coarse-grained molecular dynamics (MD) simulations represent a powerful approach to simulate longer time scale and larger length scale phenomena than those accessible to all-atom models. The gain in efficiency, however, comes at the cost of atomistic details. The reverse transformation, also known as back-mapping, of coarse-grained beads into their atomistic constituents represents a major challenge. Most existing approaches are limited to specific molecules or specific force-fields and often rely on running a long atomistic MD simulation of the back-mapped configuration to arrive at an optimal solution. Such approaches are problematic when dealing with systems with high diffusion barriers. Here, we introduce a new extension of the configurational-bias-Monte-Carlo (CBMC) algorithm, which we term the crystalline-configurational-bias-Monte-Carlo (C-CBMC) algorithm, that allows rapid and efficient conversion of a coarse-grained model back into its atomistic representation. Although the method is generic, we use a coarse-grained water model as a representative example and demonstrate the back-mapping or reverse transformation for model systems ranging from the ice-liquid water interface to amorphous and crystalline ice configurations. A series of simulations using the TIP4P/Ice model are performed to compare the new CBMC method to several other standard Monte Carlo and Molecular Dynamics based back-mapping techniques. In all the cases, the C-CBMC algorithm is able to find the optimal hydrogen-bonded configuration many thousand evaluations/steps sooner than the other methods compared within this paper. For crystalline ice structures such as hexagonal, cubic, and cubic-hexagonal stacking disorder structures, the C-CBMC was able to find structures that were between 0.05 and 0.1 eV/water molecule lower in energy than the ground state energies predicted by the other methods. 
Detailed analysis of the atomistic structures shows a significantly better global hydrogen positioning when contrasted with the existing simpler back-mapping methods. Our results demonstrate the efficiency and efficacy of our new back-mapping approach, especially for crystalline systems where simple force-field based relaxations have a tendency to get trapped in local minima.
Predicting Protein Structure Using Parallel Genetic Algorithms.
1994-12-01
Molecular dynamics attempts to simulate the protein folding process. However, the time steps required for this simulation are on the order of one...harmonics. These two factors have limited molecular dynamics simulations to less than a few nanoseconds (10^-9 sec), even on today’s fastest supercomputers...
NASA Astrophysics Data System (ADS)
Doytchinova, Irini A.; Walshe, Valerie; Borrow, Persephone; Flower, Darren R.
2005-03-01
The affinities of 177 nonameric peptides binding to the HLA-A*0201 molecule were measured using a FACS-based MHC stabilisation assay and analysed using chemometrics. Their structures were described by global and local descriptors, and QSAR models were derived by genetic algorithm, stepwise regression and PLS. The global molecular descriptors included molecular connectivity χ indices, κ shape indices, E-state indices, molecular properties like molecular weight and log P, and three-dimensional descriptors like polarizability, surface area and volume. The local descriptors were of two types. The first used a binary string to indicate the presence of each amino acid type at each position of the peptide. The second was also position-dependent but used five z-scales to describe the main physicochemical properties of the amino acids forming the peptides. The models were developed using a representative training set of 131 peptides and validated using an independent test set of 46 peptides. It was found that the global descriptors could neither explain the variance in the training set nor predict the affinities of the test set accurately. Both types of local descriptors gave QSAR models with better explained variance and predictive ability. The results suggest that, in its interactions with the MHC molecule, the peptide acts as a complicated ensemble of multiple amino acids mutually potentiating each other.
Cao, Qi; Leung, K M
2014-09-22
Reliable computer models for the prediction of chemical biodegradability from molecular descriptors and fingerprints are very important for making health and environmental decisions. Coupling of the differential evolution (DE) algorithm with the support vector classifier (SVC) in order to optimize the main parameters of the classifier resulted in an improved classifier called the DE-SVC, which is introduced in this paper for use in chemical biodegradability studies. The DE-SVC was applied to predict the biodegradation of chemicals on the basis of extensive sample data sets and known structural features of molecules. Our optimization experiments showed that DE can efficiently find the proper parameters of the SVC. The resulting classifier possesses strong robustness and reliability compared with grid search, genetic algorithm, and particle swarm optimization methods. The classification experiments conducted here showed that the DE-SVC exhibits better classification performance than models previously used for such studies. It is a more effective and efficient prediction model for chemical biodegradability.
Gueddida, Saber; Yan, Zeyin; Kibalin, Iurii; Voufack, Ariste Bolivard; Claiser, Nicolas; Souhassou, Mohamed; Lecomte, Claude; Gillon, Béatrice; Gillet, Jean-Michel
2018-04-28
In this paper, we propose a simple cluster model with limited basis sets to reproduce the unpaired electron distributions in a YTiO3 ferromagnetic crystal. The spin-resolved one-electron-reduced density matrix is reconstructed simultaneously from theoretical magnetic structure factors and directional magnetic Compton profiles using our joint refinement algorithm. This algorithm is guided by the rescaling of basis functions and the adjustment of the spin population matrix. The resulting spin electron density in both position and momentum spaces from the joint refinement model is in agreement with theoretical and experimental results. Benefits brought from magnetic Compton profiles to the entire spin density matrix are illustrated. We studied the magnetic properties of the YTiO3 crystal along the Ti-O1-Ti bonding. We found that the basis functions are mostly rescaled by means of magnetic Compton profiles, while the molecular occupation numbers are mainly modified by the magnetic structure factors.
Neural network error correction for solving coupled ordinary differential equations
NASA Technical Reports Server (NTRS)
Shelton, R. O.; Darsey, J. A.; Sumpter, B. G.; Noid, D. W.
1992-01-01
A neural network is presented to learn errors generated by a numerical algorithm for solving coupled nonlinear differential equations. The method is based on using a neural network to correctly learn the error generated by, for example, Runge-Kutta on a model molecular dynamics (MD) problem. The neural network programs used in this study were developed by NASA. Comparisons are made for training the neural network using backpropagation and a new method which was found to converge with fewer iterations. The neural net programs, the MD model and the calculations are discussed.
Equation-free multiscale computation: algorithms and applications.
Kevrekidis, Ioannis G; Samaey, Giovanni
2009-01-01
In traditional physicochemical modeling, one derives evolution equations at the (macroscopic, coarse) scale of interest; these are used to perform a variety of tasks (simulation, bifurcation analysis, optimization) using an arsenal of analytical and numerical techniques. For many complex systems, however, although one observes evolution at a macroscopic scale of interest, accurate models are only given at a more detailed (fine-scale, microscopic) level of description (e.g., lattice Boltzmann, kinetic Monte Carlo, molecular dynamics). Here, we review a framework for computer-aided multiscale analysis, which enables macroscopic computational tasks (over extended spatiotemporal scales) using only appropriately initialized microscopic simulation on short time and length scales. The methodology bypasses the derivation of macroscopic evolution equations when these equations conceptually exist but are not available in closed form, hence the term equation-free. We selectively discuss basic algorithms and underlying principles and illustrate the approach through representative applications. We also discuss potential difficulties and outline areas for future research.
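One of the basic algorithms in this family, coarse projective integration, can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not code from the review: the "microscopic simulator" `micro_burst` is a hypothetical stand-in (explicit Euler on dx/dt = -x), and the function names, burst length, and leap size are all chosen for the example. The outer integrator never sees the equation; it only runs short bursts, estimates the coarse time derivative from their tail, and extrapolates.

```python
def micro_burst(x, dt, steps):
    """Hypothetical fine-scale simulator (stand-in: explicit Euler on dx/dt = -x)."""
    traj = [x]
    for _ in range(steps):
        x = x + dt * (-x)
        traj.append(x)
    return traj

def projective_step(x, dt, burst_steps, leap):
    """Run a short burst, estimate the coarse slope, then leap forward in time."""
    traj = micro_burst(x, dt, burst_steps)
    slope = (traj[-1] - traj[-2]) / dt     # finite-difference estimate from the burst tail
    return traj[-1] + leap * slope         # forward-Euler extrapolation over the leap

def integrate(x0, t_end, dt=1e-3, burst_steps=10, leap=0.05):
    """Advance the coarse variable to t_end using only short bursts of simulation."""
    t, x = 0.0, x0
    while t < t_end - 1e-12:
        x = projective_step(x, dt, burst_steps, leap)
        t += burst_steps * dt + leap       # simulated time plus extrapolated time
    return t, x
```

With these illustrative parameters, only one sixth of the elapsed time is actually simulated at the fine scale; the rest is covered by extrapolation, at the cost of the usual forward-Euler accuracy and stability limits on the leap size.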
NASA Astrophysics Data System (ADS)
Artrith, Nongnuch; Urban, Alexander; Ceder, Gerbrand
2018-06-01
The atomistic modeling of amorphous materials requires structure sizes and sampling statistics that are challenging to achieve with first-principles methods. Here, we propose a methodology to speed up the sampling of amorphous and disordered materials using a combination of a genetic algorithm and a specialized machine-learning potential based on artificial neural networks (ANNs). We show for the example of the amorphous LiSi alloy that around 1000 first-principles calculations are sufficient for the ANN-potential assisted sampling of low-energy atomic configurations in the entire amorphous Li_xSi phase space. The obtained phase diagram is validated by comparison with the results from an extensive sampling of Li_xSi configurations using molecular dynamics simulations and a general ANN potential trained to ~45 000 first-principles calculations. This demonstrates the utility of the approach for the first-principles modeling of amorphous materials.
Deutsch, Maxime; Claiser, Nicolas; Pillet, Sébastien; Chumakov, Yurii; Becker, Pierre; Gillet, Jean Michel; Gillon, Béatrice; Lecomte, Claude; Souhassou, Mohamed
2012-11-01
New crystallographic tools were developed to access a more precise description of the spin-dependent electron density of magnetic crystals. The method combines experimental information coming from high-resolution X-ray diffraction (XRD) and polarized neutron diffraction (PND) in a unified model. A new algorithm that allows for a simultaneous refinement of the charge- and spin-density parameters against XRD and PND data is described. The resulting software MOLLYNX is based on the well-known Hansen-Coppens multipolar model, and makes it possible to differentiate the electron spins. This algorithm is validated and demonstrated with a molecular crystal formed by a bimetallic chain, MnCu(pba)(H2O)3·2H2O, for which XRD and PND data are available. The joint refinement provides a more detailed description of the spin density than the refinement from PND data alone.
Predicting DNA hybridization kinetics from sequence
NASA Astrophysics Data System (ADS)
Zhang, Jinny X.; Fang, John Z.; Duan, Wei; Wu, Lucia R.; Zhang, Angela W.; Dalchau, Neil; Yordanov, Boyan; Petersen, Rasmus; Phillips, Andrew; Zhang, David Yu
2018-01-01
Hybridization is a key molecular process in biology and biotechnology, but so far there is no predictive model for accurately determining hybridization rate constants based on sequence information. Here, we report a weighted neighbour voting (WNV) prediction algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on similar reactions with known rate constants. To construct this algorithm we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (36 nt sub-sequences of the CYCS and VEGF genes) at temperatures ranging from 28 to 55 °C. Automated feature selection and weighting optimization resulted in a final six-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 3 with ∼91% accuracy, based on leave-one-out cross-validation. Accurate prediction of hybridization kinetics allows the design of efficient probe sequences for genomics research.
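The general shape of a weighted neighbour voting predictor can be sketched as follows. The abstract does not give the voting formula, so everything specific here is an assumption made for illustration: the exponential kernel, the `bandwidth` parameter, the per-feature weights, and the function names are hypothetical, and predictions are made on log10 rate constants. Each known reaction votes for its own rate constant, with a vote that decays with its distance from the query in weighted feature space.

```python
import math

def wnv_predict(query, known, feature_weights, bandwidth=1.0):
    """Predict log10(k) for `query` from (features, log10_k) pairs in `known`.

    Hypothetical kernel: vote = exp(-distance / bandwidth). The published
    model's exact weighting scheme is not stated in the abstract.
    """
    num = 0.0
    den = 0.0
    for features, log_k in known:
        # Weighted Euclidean distance between feature vectors.
        d = math.sqrt(sum(w * (q - f) ** 2
                          for w, q, f in zip(feature_weights, query, features)))
        vote = math.exp(-d / bandwidth)
        num += vote * log_k
        den += vote
    return num / den

def within_factor(pred_log_k, true_log_k, factor=3.0):
    """True if the predicted rate constant is within `factor` of the true one."""
    return abs(pred_log_k - true_log_k) <= math.log10(factor)
```

A leave-one-out check of the kind the abstract describes would call `wnv_predict` for each reaction with that reaction removed from `known`, and report the fraction for which `within_factor` holds.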
Cost-effectiveness of WHO-Recommended Algorithms for TB Case Finding at Ethiopian HIV Clinics.
Adelman, Max W; McFarland, Deborah A; Tsegaye, Mulugeta; Aseffa, Abraham; Kempker, Russell R; Blumberg, Henry M
2018-01-01
The World Health Organization (WHO) recommends active tuberculosis (TB) case finding and a rapid molecular diagnostic test (Xpert MTB/RIF) to detect TB among people living with HIV (PLHIV) in high-burden settings. Information on the cost-effectiveness of these recommended strategies is crucial for their implementation. We conducted a model-based cost-effectiveness analysis comparing 2 algorithms for TB screening and diagnosis at Ethiopian HIV clinics: (1) WHO-recommended symptom screen combined with Xpert for PLHIV with a positive symptom screen and (2) current recommended practice algorithm (CRPA; based on symptom screening, smear microscopy, and clinical TB diagnosis). Our primary outcome was US$ per disability-adjusted life-year (DALY) averted. Secondary outcomes were additional true-positive diagnoses, and false-negative and false-positive diagnoses averted. Compared with CRPA, combining a WHO-recommended symptom screen with Xpert was highly cost-effective (incremental cost of $5 per DALY averted). Among a cohort of 15 000 PLHIV with a TB prevalence of 6% (900 TB cases), this algorithm detected 8 more true-positive cases than CRPA, and averted 2045 false-positive and 8 false-negative diagnoses compared with CRPA. The WHO-recommended algorithm was marginally costlier ($240 000) than CRPA ($239 000). In sensitivity analysis, the symptom screen/Xpert algorithm was dominated at low Xpert sensitivity (66%). In this model-based analysis, combining a WHO-recommended symptom screen with Xpert for TB diagnosis among PLHIV was highly cost-effective ($5 per DALY averted) and more sensitive than CRPA in a high-burden, resource-limited setting.
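The headline metric here, incremental cost per DALY averted, is a simple ratio, and a short worked sketch may help when reading the figures. The cohort size, prevalence, and the two programme costs are taken from the abstract; the number of DALYs averted is not reported there, so the value used below is an assumption chosen only to be consistent with the rounded figures quoted ($1 000 extra cost at $5 per DALY averted).

```python
def incremental_cost_effectiveness_ratio(cost_new, cost_old, dalys_averted):
    """Extra dollars spent per additional DALY averted by the new strategy."""
    if dalys_averted <= 0:
        raise ValueError("dalys_averted must be positive")
    return (cost_new - cost_old) / dalys_averted

# Figures quoted in the abstract.
cohort_size = 15_000
tb_prevalence = 0.06
tb_cases = round(cohort_size * tb_prevalence)   # 900 TB cases in the cohort

cost_xpert = 240_000   # WHO symptom screen + Xpert
cost_crpa = 239_000    # current recommended practice algorithm

# ASSUMPTION: not reported in the abstract; back-calculated from the rounded
# costs and the quoted $5 per DALY averted, for illustration only.
assumed_dalys_averted = 200

icer = incremental_cost_effectiveness_ratio(cost_xpert, cost_crpa, assumed_dalys_averted)
```

Because both costs are rounded to the nearest thousand dollars, the back-calculated DALY figure should be read as an order-of-magnitude illustration rather than a result of the study.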
NASA Astrophysics Data System (ADS)
Feskov, Serguei V.; Ivanov, Anatoly I.
2018-03-01
An approach to the construction of diabatic free energy surfaces (FESs) for ultrafast electron transfer (ET) in a supramolecule with an arbitrary number of electron localization centers (redox sites) is developed, supposing that the reorganization energies for the charge transfers and shifts between all these centers are known. Dimensionality of the coordinate space required for the description of multistage ET in this supramolecular system is shown to be equal to N - 1, where N is the number of the molecular centers involved in the reaction. The proposed algorithm of FES construction employs metric properties of the coordinate space, namely, the relation between the solvent reorganization energy and the distance between the two FES minima. In this space, the ET reaction coordinate z_{nn'} associated with electron transfer between the nth and n'th centers is calculated through the projection onto the direction connecting the FES minima. The energy-gap reaction coordinates z_{nn'} corresponding to different ET processes are not in general orthogonal, so that ET between two molecular centers can create a nonequilibrium distribution, not only along its own reaction coordinate but along other reaction coordinates too. This results in the influence of the preceding ET steps on the kinetics of the ensuing ET. It is important for the ensuing reaction to be ultrafast to proceed in parallel with relaxation along the ET reaction coordinates. Efficient algorithms for numerical simulation of multistage ET within the stochastic point-transition model are developed. The algorithms are based on the Brownian simulation technique with the recrossing-event detection procedure. The main advantages of the numerical method are (i) its computational complexity is linear with respect to the number of electronic states involved and (ii) calculations can be naturally parallelized up to the level of individual trajectories. 
The efficiency of the proposed approach is demonstrated for a model supramolecular system involving four redox centers.
Learning reduced kinetic Monte Carlo models of complex chemistry from molecular dynamics.
Yang, Qian; Sing-Long, Carlos A; Reed, Evan J
2017-08-01
We propose a novel statistical learning framework for automatically and efficiently building reduced kinetic Monte Carlo (KMC) models of large-scale elementary reaction networks from data generated by a single or few molecular dynamics simulations (MD). Existing approaches for identifying species and reactions from molecular dynamics typically use bond length and duration criteria, where bond duration is a fixed parameter motivated by an understanding of bond vibrational frequencies. In contrast, we show that for highly reactive systems, bond duration should be a model parameter that is chosen to maximize the predictive power of the resulting statistical model. We demonstrate our method on a high temperature, high pressure system of reacting liquid methane, and show that the learned KMC model is able to extrapolate more than an order of magnitude in time for key molecules. Additionally, our KMC model of elementary reactions enables us to isolate the most important set of reactions governing the behavior of key molecules found in the MD simulation. We develop a new data-driven algorithm to reduce the chemical reaction network which can be solved either as an integer program or efficiently using L1 regularization, and compare our results with simple count-based reduction. For our liquid methane system, we discover that rare reactions do not play a significant role in the system, and find that less than 7% of the approximately 2000 reactions observed from molecular dynamics are necessary to reproduce the molecular concentration over time of methane. The framework described in this work paves the way towards a genomic approach to studying complex chemical systems, where expensive MD simulation data can be reused to contribute to an increasingly large and accurate genome of elementary reactions and rates.
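For readers unfamiliar with how a kinetic Monte Carlo model is run once its reactions and rate constants are known, a minimal sketch of the standard Gillespie stochastic simulation algorithm follows. This is not the learning or network-reduction framework of the paper; the function name, the tuple layout of `reactions`, and the mass-action propensity with one molecule per listed reactant are assumptions made for the example.

```python
import random

def gillespie(counts, reactions, t_end, rng):
    """Simulate a reaction network with the Gillespie algorithm.

    counts:    dict mapping species name -> molecule count
    reactions: list of (rate_constant, reactant_names, product_names)
    Assumes mass-action propensities: k times the product of reactant counts.
    """
    counts = dict(counts)
    t = 0.0
    while True:
        props = []
        for k, reactants, _ in reactions:
            a = k
            for s in reactants:
                a *= counts[s]
            props.append(a)
        total = sum(props)
        if total == 0.0:
            break                          # nothing can fire any more
        t += rng.expovariate(total)        # waiting time to the next event
        if t > t_end:
            break
        # Pick a reaction with probability proportional to its propensity.
        r = rng.random() * total
        acc = 0.0
        for a, (_, reactants, products) in zip(props, reactions):
            acc += a
            if r < acc:
                for s in reactants:
                    counts[s] -= 1
                for s in products:
                    counts[s] += 1
                break
    return counts
```

In a reduced model of the kind the abstract describes, `reactions` would contain only the small subset of elementary reactions retained after reduction, which is what makes long-time extrapolation cheap compared with the original MD.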
Learning reduced kinetic Monte Carlo models of complex chemistry from molecular dynamics
Sing-Long, Carlos A.
2017-01-01
We propose a novel statistical learning framework for automatically and efficiently building reduced kinetic Monte Carlo (KMC) models of large-scale elementary reaction networks from data generated by a single or few molecular dynamics simulations (MD). Existing approaches for identifying species and reactions from molecular dynamics typically use bond length and duration criteria, where bond duration is a fixed parameter motivated by an understanding of bond vibrational frequencies. In contrast, we show that for highly reactive systems, bond duration should be a model parameter that is chosen to maximize the predictive power of the resulting statistical model. We demonstrate our method on a high temperature, high pressure system of reacting liquid methane, and show that the learned KMC model is able to extrapolate more than an order of magnitude in time for key molecules. Additionally, our KMC model of elementary reactions enables us to isolate the most important set of reactions governing the behavior of key molecules found in the MD simulation. We develop a new data-driven algorithm to reduce the chemical reaction network which can be solved either as an integer program or efficiently using L1 regularization, and compare our results with simple count-based reduction. For our liquid methane system, we discover that rare reactions do not play a significant role in the system, and find that less than 7% of the approximately 2000 reactions observed from molecular dynamics are necessary to reproduce the molecular concentration over time of methane. The framework described in this work paves the way towards a genomic approach to studying complex chemical systems, where expensive MD simulation data can be reused to contribute to an increasingly large and accurate genome of elementary reactions and rates. PMID:28989618
Learning reduced kinetic Monte Carlo models of complex chemistry from molecular dynamics
Yang, Qian; Sing-Long, Carlos A.; Reed, Evan J.
2017-06-19
Here, we propose a novel statistical learning framework for automatically and efficiently building reduced kinetic Monte Carlo (KMC) models of large-scale elementary reaction networks from data generated by a single or few molecular dynamics simulations (MD). Existing approaches for identifying species and reactions from molecular dynamics typically use bond length and duration criteria, where bond duration is a fixed parameter motivated by an understanding of bond vibrational frequencies. Conversely, we show that for highly reactive systems, bond duration should be a model parameter that is chosen to maximize the predictive power of the resulting statistical model. We demonstrate our method on a high temperature, high pressure system of reacting liquid methane, and show that the learned KMC model is able to extrapolate more than an order of magnitude in time for key molecules. Additionally, our KMC model of elementary reactions enables us to isolate the most important set of reactions governing the behavior of key molecules found in the MD simulation. We develop a new data-driven algorithm to reduce the chemical reaction network which can be solved either as an integer program or efficiently using L1 regularization, and compare our results with simple count-based reduction. For our liquid methane system, we discover that rare reactions do not play a significant role in the system, and find that less than 7% of the approximately 2000 reactions observed from molecular dynamics are necessary to reproduce the molecular concentration over time of methane. Furthermore, we describe a framework in this work that paves the way towards a genomic approach to studying complex chemical systems, where expensive MD simulation data can be reused to contribute to an increasingly large and accurate genome of elementary reactions and rates.
Schiffmann, Christoph; Sebastiani, Daniel
2011-05-10
We present an algorithmic extension of a numerical optimization scheme for analytic capping potentials for use in mixed quantum-classical (quantum mechanical/molecular mechanical, QM/MM) ab initio calculations. Our goal is to minimize bond-cleavage-induced perturbations in the electronic structure, measured by means of a suitable penalty functional. The optimization algorithm, a variant of the artificial bee colony (ABC) algorithm that relies on swarm intelligence, couples deterministic (downhill gradient) and stochastic elements to avoid local minimum trapping. The ABC algorithm outperforms the conventional downhill gradient approach if the penalty hypersurface exhibits wiggles that prevent a straight minimization pathway. We characterize the optimized capping potentials by computing NMR chemical shifts. This approach will increase the accuracy of QM/MM calculations of complex biomolecules.
Global modeling of thermospheric airglow in the far ultraviolet
NASA Astrophysics Data System (ADS)
Solomon, Stanley C.
2017-07-01
The Global Airglow (GLOW) model has been updated and extended to calculate thermospheric emissions in the far ultraviolet, including sources from daytime photoelectron-driven processes, nighttime recombination radiation, and auroral excitation. It can be run using inputs from empirical models of the neutral atmosphere and ionosphere or from numerical general circulation models of the coupled ionosphere-thermosphere system. It uses a solar flux module, photoelectron generation routine, and the Nagy-Banks two-stream electron transport algorithm to simultaneously handle energetic electron distributions from photon and auroral electron sources. It contains an ion-neutral chemistry module that calculates excited and ionized species densities and the resulting airglow volume emission rates. This paper describes the inputs, algorithms, and code structure of the model and demonstrates example outputs for daytime and auroral cases. Simulations of far ultraviolet emissions by the atomic oxygen doublet at 135.6 nm and the molecular nitrogen Lyman-Birge-Hopfield bands, as viewed from geostationary orbit, are shown, and model calculations are compared to limb-scan observations by the Global Ultraviolet Imager on the TIMED satellite. The GLOW model code is provided to the community through an open-source academic research license.
Li, Kenli; Zou, Shuting; Xv, Jin
2008-01-01
Elliptic curve cryptographic algorithms convert input data to unrecognizable encryption and the unrecognizable data back again into its original decrypted form. The security of this form of encryption hinges on the enormous difficulty of solving the elliptic curve discrete logarithm problem (ECDLP), especially over GF(2^n), n ∈ Z+. This paper describes an effective method to find solutions to the ECDLP by means of a molecular computer. We propose that this research accomplishment would represent a breakthrough for applied biological computation and this paper demonstrates that in principle this is possible. Three DNA-based algorithms are described: a parallel adder, a parallel multiplier, and a parallel inverse over GF(2^n). The biological operation time of all of these algorithms is polynomial with respect to n. Considering this analysis, cryptography using a public key might be less secure. In this respect, a principal contribution of this paper is to provide enhanced evidence of the potential of molecular computing to tackle such ambitious computations. PMID:18431451
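For orientation, the three field operations named in the abstract (addition, multiplication and inversion over GF(2^n)) can be sketched in conventional software. This is not the DNA-based procedure of the paper. It is a minimal illustration assuming field elements are stored as integers whose bits are polynomial coefficients, and assuming the caller supplies an irreducible polynomial `modulus` of degree `n`. The function names are hypothetical.

```python
def gf2n_add(a: int, b: int) -> int:
    """Addition in GF(2^n): coefficient-wise XOR, with no carries."""
    return a ^ b


def gf2n_mul(a: int, b: int, modulus: int, n: int) -> int:
    """Shift-and-add multiplication, reduced modulo the field polynomial.

    Assumes a and b are already reduced (both below 2**n).
    """
    result = 0
    while b:
        if b & 1:              # current bit of b set: add the shifted a
            result ^= a
        b >>= 1
        a <<= 1                # multiply a by x
        if (a >> n) & 1:       # degree reached n: reduce by the modulus
            a ^= modulus
    return result


def gf2n_inv(a: int, modulus: int, n: int) -> int:
    """Inverse via a^(2^n - 2), since a^(2^n - 1) = 1 for every nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    result, base, exponent = 1, a, (1 << n) - 2
    while exponent:            # square-and-multiply
        if exponent & 1:
            result = gf2n_mul(result, base, modulus, n)
        base = gf2n_mul(base, base, modulus, n)
        exponent >>= 1
    return result
```

With n = 8 and the modulus x^8 + x^4 + x^3 + x + 1 (0x11B, the polynomial used by AES), `gf2n_mul(0x57, 0x83, 0x11B, 8)` gives 0xC1, the worked example in the AES specification.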
Liu, Xiaofeng; Bai, Fang; Ouyang, Sisheng; Wang, Xicheng; Li, Honglin; Jiang, Hualiang
2009-03-31
Conformation generation is a ubiquitous problem in molecule modelling. Many applications require sampling the broad molecular conformational space or perceiving the bioactive conformers to ensure success. Numerous in silico methods have been proposed in an attempt to resolve the problem, ranging from deterministic to non-deterministic and systemic to stochastic ones. In this work, we describe an efficient conformation sampling method named Cyndi, which is based on a multi-objective evolutionary algorithm (MOEA). The conformational perturbation is subjected to evolutionary operation on the genome encoded with dihedral torsions. Various objectives are designated to render the generated Pareto optimal conformers energy-favoured as well as evenly scattered across the conformational space. An optional objective concerning the degree of molecular extension is added to achieve geometrically extended or compact conformations, which have been observed to impact the molecular bioactivity (J Comput-Aided Mol Des 2002, 16: 105-112). Testing the performance of Cyndi against a test set consisting of 329 small molecules reveals an average minimum RMSD of 0.864 Å to the corresponding bioactive conformations, indicating that Cyndi is highly competitive against other conformation generation methods. Meanwhile, the high-speed performance (0.49 ± 0.18 seconds per molecule) makes Cyndi a practical toolkit for conformational database preparation and facilitates subsequent pharmacophore mapping or rigid docking. A copy of the precompiled Cyndi executable and the test set molecules in mol2 format are accessible in Additional file 1. On the basis of the MOEA, we present a new, highly efficient conformation generation method, Cyndi, and report the results of validation and performance studies comparing it with four other methods. The results reveal that Cyndi is capable of generating geometrically diverse conformers and outperforms four other multiple conformer generators in reproducing the bioactive conformations of 329 structures. The speed advantage indicates that Cyndi is a powerful alternative method for extensive conformational sampling and large-scale conformer database preparation.
Chira, Camelia; Horvath, Dragos; Dumitrescu, D
2011-07-30
Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic-polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm (the hill-climbing mechanism and the diversification strategy) are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
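To make the objective concrete, the sketch below scores a conformation in the two-dimensional HP lattice model under the usual convention: each pair of H residues that are lattice neighbours but not adjacent in the chain contributes -1. It illustrates only the energy function an evolutionary search would minimize, not the authors' operators. The move encoding (`U`, `D`, `L`, `R`) and the function name are assumptions made for this example.

```python
def hp_energy(sequence: str, moves: str) -> int:
    """Energy of a 2D HP-lattice conformation.

    sequence: string over {'H', 'P'}.
    moves: one of 'U', 'D', 'L', 'R' per bond (hypothetical encoding).
    Each non-consecutive H-H lattice contact contributes -1.
    """
    steps = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")

    # Walk the chain from the origin to get lattice coordinates.
    coords = [(0, 0)]
    for move in moves:
        dx, dy = steps[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    index_at = {position: i for i, position in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy
```

For example, `hp_energy("HPPH", "RUL")` folds the chain around a unit square, bringing the two H residues into contact, and returns -1. The fully extended `hp_energy("HPPH", "RRR")` returns 0.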
NASA Astrophysics Data System (ADS)
Masoumi, Massoud; Raissi, Farshid; Ahmadian, Mahmoud; Keshavarzi, Parviz
2006-01-01
We propose that the recently introduced semiconductor-nanowire-molecular architecture (CMOL) is an optimum platform to realize encryption algorithms. The basic modules for the advanced encryption standard algorithm (Rijndael) have been designed using the CMOL architecture. The performance of this design has been evaluated with respect to chip area and speed. It is observed that CMOL provides considerable improvement over implementation with regular CMOS architecture even with a 20% defect rate. Pseudo-optimum gate placement and routing are provided for Rijndael building blocks, and the possibility of designing high-speed, attack-tolerant, long-key encryption is discussed.
MaMiCo: Transient multi-instance molecular-continuum flow simulation on supercomputers
NASA Astrophysics Data System (ADS)
Neumann, Philipp; Bian, Xin
2017-11-01
We present extensions of the macro-micro-coupling tool MaMiCo, which was designed to couple continuum fluid dynamics solvers with discrete particle dynamics. To enable local extraction of smooth flow field quantities especially on rather short time scales, sampling over an ensemble of molecular dynamics simulations is introduced. We provide details on these extensions including the transient coupling algorithm, open boundary forcing, and multi-instance sampling. Furthermore, we validate the coupling in Couette flow using different particle simulation software packages and particle models, i.e. molecular dynamics and dissipative particle dynamics. Finally, we demonstrate the parallel scalability of the molecular-continuum simulations by using up to 65 536 compute cores of the supercomputer Shaheen II located at KAUST.
Program Files doi: http://dx.doi.org/10.17632/w7rgdrhb85.1
Licensing provisions: BSD 3-clause
Programming language: C, C++
External routines/libraries: For compiling: SCons, MPI (optional)
Subprograms used: ESPResSo, LAMMPS, ls1 mardyn, waLBerla. For installation procedures of the MaMiCo interfaces, see the README files in the respective code directories located in coupling/interface/impl.
Journal reference of previous version: P. Neumann, H. Flohr, R. Arora, P. Jarmatz, N. Tchipev, H.-J. Bungartz. MaMiCo: Software design for parallel molecular-continuum flow simulations, Computer Physics Communications 200: 324-335, 2016
Does the new version supersede the previous version?: Yes. The functionality of the previous version is completely retained in the new version.
Nature of problem: Coupled molecular-continuum simulation for multi-resolution fluid dynamics: parts of the domain are resolved by molecular dynamics or another particle-based solver whereas large parts are covered by a mesh-based CFD solver, e.g. a lattice Boltzmann automaton.
Solution method: We couple existing MD and CFD solvers via MaMiCo (macro-micro coupling tool). Data exchange and coupling algorithmics are abstracted and incorporated in MaMiCo. Once an algorithm is set up in MaMiCo, it can be used and extended, even if other solvers are used (as soon as the respective interfaces are implemented/available).
Reasons for the new version: We have incorporated a new algorithm to simulate transient molecular-continuum systems and to automatically sample data over multiple MD runs that can be executed simultaneously (on, e.g., a compute cluster). MaMiCo has further been extended by an interface to incorporate boundary forcing to account for open molecular dynamics boundaries. Besides support for coupling with various MD and CFD frameworks, the new version contains a test case that allows running molecular-continuum Couette flow simulations out of the box. No external tools or simulation codes are required anymore. However, the user is free to switch from the included MD simulation package to LAMMPS. For details on how to run the transient Couette problem, see the file README in the folder coupling/tests, Remark on MaMiCo V1.1.
Summary of revisions: Open boundary forcing; Multi-instance MD sampling; support for transient molecular-continuum systems
Restrictions: Currently, only single-centered systems are supported. For access to the LAMMPS-based implementation of DPD boundary forcing, please contact Xin Bian, xin.bian@tum.de.
Additional comments: Please see file license_mamico.txt for further details regarding distribution and advertising of this software.
Samant, Asawari; Ogunnaike, Babatunde A; Vlachos, Dionisios G
2007-05-24
The fundamental role that intrinsic stochasticity plays in cellular functions has been shown via numerous computational and experimental studies. In the face of such evidence, it is important that intracellular networks are simulated with stochastic algorithms that can capture molecular fluctuations. However, separation of time scales and disparity in species population, two common features of intracellular networks, make stochastic simulation of such networks computationally prohibitive. While recent work has addressed each of these challenges separately, a generic algorithm that can simultaneously tackle disparity in time scales and population scales in stochastic systems is currently lacking. In this paper, we propose the hybrid, multiscale Monte Carlo (HyMSMC) method that fills this void. The proposed HyMSMC method blends stochastic singular perturbation concepts, to deal with potential stiffness, with a hybrid of exact and coarse-grained stochastic algorithms, to cope with separation in population sizes. In addition, we introduce the computational singular perturbation (CSP) method as a means of systematically partitioning fast and slow networks and computing relaxation times for convergence. We also propose a new criterion for convergence of fast networks to stochastic low-dimensional manifolds, which further accelerates the algorithm. We use several prototype and biological examples, including a gene expression model displaying bistability, to demonstrate the efficiency, accuracy and applicability of the HyMSMC method. Bistable models serve as stringent tests for the success of multiscale MC methods and illustrate limitations of some literature methods.
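As background for the "exact" stochastic algorithms that HyMSMC combines with coarse-grained ones, the following is a minimal sketch of the standard Gillespie direct method for a single-species birth-death process (constant production, first-order degradation). It is not the HyMSMC method. The function name, rate names and the choice of model are assumptions made for illustration.

```python
import random


def gillespie_birth_death(k_prod, k_deg, x0, t_end, seed=0):
    """Exact stochastic simulation of  0 -> X  (rate k_prod)  and
    X -> 0  (rate k_deg * x), using the Gillespie direct method.

    Returns a list of (time, copy_number) pairs, one per reaction event.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    trajectory = [(t, x)]
    while True:
        a_prod = k_prod              # propensity of production
        a_deg = k_deg * x            # propensity of degradation
        a_total = a_prod + a_deg
        if a_total == 0:
            break                    # no reaction can fire any more
        t += rng.expovariate(a_total)    # exponential waiting time
        if t > t_end:
            break
        # Choose which reaction fires, proportionally to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        trajectory.append((t, x))
    return trajectory
```

Every reaction event costs one loop iteration, so the cost of this exact method grows with the total propensity. Fast reactions and large populations therefore make it expensive, which is the difficulty the abstract attributes to separated time scales and disparate species populations.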
Network Security via Biometric Recognition of Patterns of Gene Expression
NASA Technical Reports Server (NTRS)
Shaw, Harry C.
2016-01-01
Molecular biology provides the ability to implement forms of information and network security completely outside the bounds of legacy security protocols and algorithms. This paper addresses an approach which instantiates the power of gene expression for security. Molecular biology provides a rich source of gene expression and regulation mechanisms, which can be adopted to use in the information and electronic communication domains. Conventional security protocols are becoming increasingly vulnerable due to more intensive, highly capable attacks on the underlying mathematics of cryptography. Security protocols are being undermined by social engineering and substandard implementations by IT (Information Technology) organizations. Molecular biology can provide countermeasures to these weak points with the current security approaches. Future advances in instruments for analyzing assays will also enable this protocol to advance from one of cryptographic algorithms to an integrated system of cryptographic algorithms and real-time assays of gene expression products.
The Metropolis Monte Carlo method with CUDA enabled Graphic Processing Units
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hall, Clifford; School of Physics, Astronomy, and Computational Sciences, George Mason University, 4400 University Dr., Fairfax, VA 22030; Ji, Weixiao
2014-02-01
We present a CPU–GPU system for runtime acceleration of large molecular simulations using GPU computation and memory swaps. The memory architecture of the GPU can be used both as a container for simulation data stored on the graphics card and as a floating-point code target, providing an effective means for the manipulation of atomistic or molecular data on the GPU. To fully take advantage of this mechanism, efficient GPU realizations of algorithms used to perform atomistic and molecular simulations are essential. Our system implements a versatile molecular engine, including inter-molecule interactions and orientational variables for performing the Metropolis Monte Carlo (MMC) algorithm, which is one type of Markov chain Monte Carlo. By combining memory objects with floating-point code fragments we have implemented an MMC parallel engine that entirely avoids the communication time of molecular data at runtime. Our runtime acceleration system is a forerunner of a new class of CPU–GPU algorithms exploiting memory concepts combined with threading for avoiding bus bandwidth and communication. The testbed molecular system used here is a condensed phase system of oligopyrrole chains. A benchmark shows a size scaling speedup of 60 for systems with 210,000 pyrrole monomers. Our implementation can easily be combined with MPI to connect in parallel several CPU–GPU duets. Highlights: • We parallelize the Metropolis Monte Carlo (MMC) algorithm on one CPU–GPU duet. • The Adaptive Tempering Monte Carlo employs MMC and profits from this CPU–GPU implementation. • Our benchmark shows a size scaling-up speedup of 62 for systems with 225,000 particles. • The testbed involves a polymeric system of oligopyrroles in the condensed phase. • The CPU–GPU parallelization includes dipole–dipole and Mie–Jones classic potentials.
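The Metropolis Monte Carlo rule that the engine parallelizes can be stated in a few lines. The sketch below is a serial, one-dimensional illustration and not the CPU–GPU implementation described in the abstract. The function names, the uniform trial move and the caller-supplied `energy` function are assumptions made for this example.

```python
import math
import random


def acceptance_probability(delta_e: float, kT: float) -> float:
    """Metropolis criterion: always accept downhill moves, and accept
    uphill moves with probability exp(-delta_e / kT)."""
    return 1.0 if delta_e <= 0 else math.exp(-delta_e / kT)


def metropolis_chain(energy, x0, n_steps, step_size, kT, seed=0):
    """Sample a 1D coordinate from the Boltzmann distribution of `energy`."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)   # symmetric trial move
        e_new = energy(x_new)
        if rng.random() < acceptance_probability(e_new - e, kT):
            x, e = x_new, e_new                          # accept the move
        samples.append(x)      # a rejected move repeats the current state
    return samples
```

A call such as `metropolis_chain(lambda x: 0.5 * x * x, 0.0, 10000, 1.0, 1.0)` samples a harmonic well at kT = 1. In a molecular engine the trial move would instead displace or rotate one molecule, and `energy` would be the change in interaction energy.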
Amber Plug-In for Protein Shop
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oliva, Ricardo
2004-05-10
The Amber Plug-in for ProteinShop has two main components: an AmberEngine library to compute the protein energy models, and a module to solve the energy minimization problem using an optimization algorithm in the OPT++ library. Together, these components allow the visualization of the protein folding process in ProteinShop. AmberEngine is an object-oriented library to compute molecular energies based on the Amber model. The main class is called ProteinEnergy. Its main interface methods are (1) "init" to initialize internal variables needed to compute the energy, and (2) "eval" to evaluate the total energy given a vector of coordinates. Additional methods allow the user to evaluate the individual components of the energy model (bond, angle, dihedral, non-bonded-1-4, and non-bonded energies) and to obtain the energy of each individual atom. The AmberEngine library source code includes examples and test routines that illustrate the use of the library in stand-alone programs. The energy minimization module uses the AmberEngine library and the nonlinear optimization library OPT++. OPT++ is open source software available under the GNU Lesser General Public License. The minimization module currently makes use of the LBFGS optimization algorithm in OPT++ to perform the energy minimization. Future releases may give the user a choice of other algorithms available in OPT++.
The Aquarius Salinity Retrieval Algorithm
NASA Technical Reports Server (NTRS)
Meissner, Thomas; Wentz, Frank; Hilburn, Kyle; Lagerloef, Gary; Le Vine, David
2012-01-01
The first part of this presentation gives an overview of the Aquarius salinity retrieval algorithm. The instrument calibration [2] converts Aquarius radiometer counts into antenna temperatures (TA). The salinity retrieval algorithm converts those TA into brightness temperatures (TB) at a flat ocean surface. As a first step, contributions arising from the intrusion of solar, lunar and galactic radiation are subtracted. The antenna pattern correction (APC) removes the effects of cross-polarization contamination and spillover. The Aquarius radiometer measures the 3rd Stokes parameter in addition to vertical (v) and horizontal (h) polarizations, which allows for an easy removal of ionospheric Faraday rotation. The atmospheric absorption at L-band is almost entirely due to molecular oxygen, which can be calculated based on auxiliary input fields from numerical weather prediction models and then successively removed from the TB. The final step in the TA to TB conversion is the correction for the roughness of the sea surface due to wind, which is addressed in more detail in section 3. The TB of the flat ocean surface can now be matched to a salinity value using a surface emission model that is based on a model for the dielectric constant of sea water [3], [4] and an auxiliary field for the sea surface temperature. In the current processing, only v-pol TB are used for this last step.
Reactivity of bromoalkanes in reactions of coordinated molecular decay
NASA Astrophysics Data System (ADS)
Pokidova, T. S.; Denisov, E. T.
2016-09-01
The results from experiments on reactions of the coordinated molecular decay of RBr bromoalkanes into olefin and HBr are analyzed using the model of intersecting parabolas (MIP). Kinetic parameters within the MIP are calculated from the experimental data, enabling calculation of the activation energies (E) and rate constants (k) of such reactions, based on the enthalpy of the reaction and the MIP algorithms. The factors affecting the E of the RBr decay reaction are established: the enthalpy of the reaction, triplet repulsion, the energy of radical R• stabilization, the presence of a π bond adjacent to the reaction center, and the dipole-dipole interaction of polar groups. The energy spectrum of the partial energies of activation is constructed for the reaction of coordinated molecular decay of RBr, and the E and k of inverse addition reactions are evaluated.
On models of the genetic code generated by binary dichotomic algorithms.
Gumbel, Markus; Fimmel, Elena; Danielli, Alberto; Strüngmann, Lutz
2015-02-01
In this paper we introduce the concept of a BDA-generated model of the genetic code, which is based on binary dichotomic algorithms (BDAs). Such a BDA partitions the set of 64 codons into two disjoint classes of size 32 each and provides a generalization of known partitions like the Rumer dichotomy. We investigate what partitions can be generated when a set of different BDAs is applied sequentially to the set of codons. The search revealed that these models are able to generate code tables with very different numbers of classes ranging from 2 to 64. We have analyzed whether there are models that map the codons to their amino acids. A perfect matching is not possible. However, we present models that describe the standard genetic code with only a few errors. There are also models that map all 64 codons uniquely to 64 classes, showing that BDAs can be used to identify codons precisely. This could serve as a basis for further mathematical analysis using coding theory, for example. The hypothesis that BDAs might reflect a molecular mechanism taking place in the decoding center of the ribosome is discussed. The scan demonstrated that binary dichotomic partitions are able to model different aspects of the genetic code very well. The search was performed with our tool Beady-A. This software is freely available at http://mi.informatik.hs-mannheim.de/beady-a. It requires a JVM version 6 or higher. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
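The Rumer dichotomy, which the abstract cites as a known partition that BDAs generalize, is easy to reproduce. The sketch below does not implement the BDA formalism or the Beady-A tool. It only builds the 32/32 split under the standard genetic code, in which a codon belongs to class 1 when its first two bases alone fix the amino acid. It also checks the classical Rumer transformation (U↔G, C↔A), which swaps the two classes. The identifier names are illustrative.

```python
from itertools import product

BASES = "UCAG"

# First-two-base prefixes whose four codons all encode one amino acid in the
# standard genetic code (Ser, Leu, Pro, Arg, Thr, Val, Ala, Gly families).
FOURFOLD_PREFIXES = {"UC", "CU", "CC", "CG", "AC", "GU", "GC", "GG"}

# Rumer transformation: U <-> G and C <-> A, applied to every base.
RUMER_MAP = str.maketrans("UCAG", "GACU")


def rumer_class(codon: str) -> int:
    """Return 1 if the first two bases determine the amino acid, else 2."""
    return 1 if codon[:2] in FOURFOLD_PREFIXES else 2


def rumer_transform(codon: str) -> str:
    """Apply the Rumer transformation to a codon."""
    return codon.translate(RUMER_MAP)


CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CLASS_1 = [c for c in CODONS if rumer_class(c) == 1]
CLASS_2 = [c for c in CODONS if rumer_class(c) == 2]
```

Both classes contain 32 codons, and `rumer_transform` sends every codon of one class to a codon of the other, for example GGA (class 1) to UUC (class 2).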
He, Dan; Kuhn, David; Parida, Laxmi
2016-06-15
Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other. Therefore, modeling all the multiple traits together may improve the prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether different traits share the same genotype matrix or not. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to improve the least square error of the prediction from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits. The programs we used are either public or obtained directly from the referenced authors, such as the MALSAR package (http://www.public.asu.edu/~jye02/Software/MALSAR/). The Avocado data set has not been published yet and is available upon request. Contact: dhe@us.ibm.com. © The Author 2016. Published by Oxford University Press.
Finite element model updating using the shadow hybrid Monte Carlo technique
NASA Astrophysics Data System (ADS)
Boulkaibet, I.; Mthembu, L.; Marwala, T.; Friswell, M. I.; Adhikari, S.
2015-02-01
Recent research in the field of finite element model (FEM) updating advocates the adoption of Bayesian analysis techniques for dealing with the uncertainties associated with these models. However, Bayesian formulations require the evaluation of the Posterior Distribution Function, which may not be available in analytical form. This is the case in FEM updating. In such cases sampling methods can provide good approximations of the Posterior distribution when implemented in the Bayesian context. Markov Chain Monte Carlo (MCMC) algorithms are the most popular sampling tools used to sample probability distributions. However, the efficiency of these algorithms is affected by the complexity of the systems (the size of the parameter space). The Hybrid Monte Carlo (HMC) offers a very important MCMC approach to dealing with higher-dimensional complex problems. The HMC uses the molecular dynamics (MD) steps as the global Monte Carlo (MC) moves to reach areas of high probability, where the gradient of the log-density of the Posterior acts as a guide during the search process. However, the acceptance rate of HMC is sensitive to the system size as well as the time step used to evaluate the MD trajectory. To overcome this limitation we propose the use of the Shadow Hybrid Monte Carlo (SHMC) algorithm. The SHMC algorithm is a modified version of the Hybrid Monte Carlo (HMC) and is designed to improve sampling for large system sizes and time steps. This is done by sampling from a modified Hamiltonian function instead of the normal Hamiltonian function. In this paper, the efficiency and accuracy of the SHMC method are tested on the updating of two real structures, an unsymmetrical H-shaped beam structure and a GARTEUR SM-AG19 structure, and compared to the application of the HMC algorithm on the same structures.
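The HMC step described above (an MD trajectory used as a Monte Carlo proposal, accepted or rejected on the change in the Hamiltonian) can be sketched for a one-dimensional target. This is plain HMC with a leapfrog integrator and unit mass. It is not the shadow-Hamiltonian variant (SHMC) proposed in the paper. The function names and default step settings are assumptions, and `u` stands for the negative log posterior density.

```python
import math
import random


def leapfrog(q, p, grad_u, step, n_steps):
    """Integrate Hamiltonian dynamics for H = u(q) + p^2 / 2 (leapfrog)."""
    p -= 0.5 * step * grad_u(q)            # initial half kick
    for i in range(n_steps):
        q += step * p                      # full drift
        if i < n_steps - 1:
            p -= step * grad_u(q)          # full kick between drifts
    p -= 0.5 * step * grad_u(q)            # final half kick
    return q, p


def hmc_sample(u, grad_u, q0, n_samples, step=0.2, n_steps=10, seed=0):
    """Draw samples from the density proportional to exp(-u(q))."""
    rng = random.Random(seed)
    q = q0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)            # resample the momentum
        h_old = u(q) + 0.5 * p * p
        q_new, p_new = leapfrog(q, p, grad_u, step, n_steps)
        h_new = u(q_new) + 0.5 * p_new * p_new
        # Metropolis test on the integration error in the Hamiltonian.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples
```

The accept test depends only on `h_old - h_new`, the energy error accumulated by the integrator. This is why, as the abstract notes, the acceptance rate is sensitive to the time step and to the system size.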
Prior knowledge guided active modules identification: an integrated multi-objective approach.
Chen, Weiqi; Liu, Jing; He, Shan
2017-03-14
An active module, defined as an area in a biological network that shows striking changes in molecular activity or phenotypic signatures, is important for revealing dynamic and process-specific information that is correlated with cellular or disease states. A prior-information-guided active module identification approach is proposed to detect modules that are both active and enriched by prior knowledge. We formulate the active module identification problem as a multi-objective optimisation problem, which consists of two conflicting objective functions: maximising the coverage of known biological pathways and maximising the activity of the active module. The network is constructed from a protein-protein interaction database. A beta-uniform-mixture model is used to estimate the distribution of p-values and generate scores for activity measurement from microarray data. A multi-objective evolutionary algorithm is used to search for Pareto optimal solutions. We also incorporate a novel constraint based on algebraic connectivity to ensure the connectedness of the identified active modules. Application of the proposed algorithm to a small yeast molecular network shows that it can identify modules with high activities and with more cross-talk nodes between related functional groups. The Pareto solutions generated by the algorithm provide different trade-offs between prior knowledge and novel information from data. The approach is then applied to microarray data from diclofenac-treated yeast cells to build a network and identify modules to elucidate the molecular mechanisms of diclofenac toxicity and resistance. Gene ontology analysis is applied to the identified modules for biological interpretation. Integrating knowledge of functional groups into the identification of active modules is an effective method and provides flexible control of the balance between a purely data-driven method and prior information guidance.
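The notion of Pareto optimality used here can be illustrated with a small filter over candidate modules scored on the two objectives, both to be maximised. This is a brute-force sketch for clarity, not the multi-objective evolutionary algorithm of the paper. The tuple layout (pathway coverage, activity) and the function names are assumptions made for this example.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every objective
    and strictly better on at least one (all objectives maximised)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that no other point dominates (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (pathway coverage, activity) scores for five candidate modules.
candidates = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4), (0.5, 0.2)]
front = pareto_front(candidates)
```

Here `front` keeps the first three candidates. The last two are dominated by (0.5, 0.5), so they offer no trade-off worth considering. The surviving points correspond to the different balances between prior knowledge and data-driven activity that the abstract describes.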
Wang, Nanyi; Wang, Lirong; Xie, Xiang-Qun
2017-11-27
Molecular docking is widely applied to computer-aided drug design and has become relatively mature in recent decades. Application of docking in modeling varies from single lead compound optimization to large-scale virtual screening. The performance of molecular docking is highly dependent on the protein structures selected. It is especially challenging for large-scale target prediction research when multiple structures are available for a single target. Therefore, we have established ProSelection, a docking preferred-protein selection algorithm, in order to generate the proper structure subset(s). With the ProSelection algorithm, protein structures of "weak selectors" are filtered out whereas structures of "strong selectors" are kept. Specifically, a structure that shows good statistical performance in distinguishing active ligands from inactive ligands is defined as a strong selector. In this study, 249 protein structures of 14 autophagy-related targets are investigated. Surflex-dock was used as the docking engine to distinguish active and inactive compounds against these protein structures. Both the t test and the Mann-Whitney U test were used to distinguish the strong from the weak selectors based on the normality of the docking score distribution. The suggested docking score threshold for active ligands (SDA) was generated for each strong selector structure according to the receiver operating characteristic (ROC) curve. The performance of ProSelection was further validated by predicting the potential off-targets of 43 U.S. Food and Drug Administration approved small molecule antineoplastic drugs. Overall, ProSelection will accelerate the computational work in protein structure selection and could be a useful tool for molecular docking, target prediction, and protein-chemical database establishment research.
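The abstract says only that the score threshold (SDA) is derived from the ROC curve, without naming the criterion. One common choice, assumed here purely for illustration, is the cutoff that maximises Youden's J statistic (true positive rate minus false positive rate). The sketch further assumes that higher docking scores indicate stronger predicted binding. The function name and the example scores are hypothetical.

```python
def youden_threshold(active_scores, inactive_scores):
    """Return (threshold, J) maximising J = TPR - FPR, where a ligand is
    predicted active when its score is >= threshold.

    Ties are resolved in favour of the lowest threshold.
    """
    best_threshold, best_j = None, -1.0
    for t in sorted(set(active_scores) | set(inactive_scores)):
        tpr = sum(s >= t for s in active_scores) / len(active_scores)
        fpr = sum(s >= t for s in inactive_scores) / len(inactive_scores)
        j = tpr - fpr
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical docking scores for one protein structure.
threshold, j_stat = youden_threshold([7.0, 8.0, 9.0], [3.0, 4.0, 7.5])
```

For these made-up scores the selected threshold is 7.0 with J = 2/3. A structure whose best J stays near zero separates actives from inactives poorly, which matches the informal description of a "weak selector" above.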
Helaers, Raphaël; Milinkovitch, Michel C
2010-07-15
The development, in the last decade, of stochastic heuristics implemented in robust application software has made large phylogeny inference a key step in most comparative studies involving molecular sequences. Still, the choice of phylogeny inference software is often dictated by a combination of parameters not related to the raw performance of the implemented algorithm(s) but rather by practical issues such as ergonomics and/or the availability of specific functionalities. Here, we present MetaPIGA v2.0, a robust implementation of several stochastic heuristics for large phylogeny inference (under maximum likelihood), including a Simulated Annealing algorithm, a classical Genetic Algorithm, and the Metapopulation Genetic Algorithm (metaGA) together with complex substitution models, discrete Gamma rate heterogeneity, and the possibility to partition data. MetaPIGA v2.0 also implements the Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion for automated selection of substitution models that best fit the data. Heuristics and substitution models are highly customizable through manual batch files and command line processing. However, MetaPIGA v2.0 also offers an extensive graphical user interface for parameter setting, generating and running batch files, following run progress, and manipulating result trees. MetaPIGA v2.0 uses standard formats for data sets and trees, is platform independent, runs on 32- and 64-bit systems, and takes advantage of multiprocessor and multicore computers. The metaGA resolves the major problem inherent to classical Genetic Algorithms by maintaining high inter-population variation even under strong intra-population selection. Implementation of the metaGA together with additional stochastic heuristics in a single software package will allow rigorous optimization of each heuristic as well as a meaningful comparison of performances among these algorithms. 
MetaPIGA v2.0 gives access both to high customization for the phylogeneticist, as well as to an ergonomic interface and functionalities assisting the non-specialist for sound inference of large phylogenetic trees using nucleotide sequences. MetaPIGA v2.0 and its extensive user-manual are freely available to academics at http://www.metapiga.org.
2010-01-01
Background: The development, in the last decade, of stochastic heuristics implemented in robust application softwares has made large phylogeny inference a key step in most comparative studies involving molecular sequences. Still, the choice of a phylogeny inference software is often dictated by a combination of parameters not related to the raw performance of the implemented algorithm(s) but rather by practical issues such as ergonomics and/or the availability of specific functionalities. Results: Here, we present MetaPIGA v2.0, a robust implementation of several stochastic heuristics for large phylogeny inference (under maximum likelihood), including a Simulated Annealing algorithm, a classical Genetic Algorithm, and the Metapopulation Genetic Algorithm (metaGA) together with complex substitution models, discrete Gamma rate heterogeneity, and the possibility to partition data. MetaPIGA v2.0 also implements the Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion for automated selection of substitution models that best fit the data. Heuristics and substitution models are highly customizable through manual batch files and command line processing. However, MetaPIGA v2.0 also offers an extensive graphical user interface for parameters setting, generating and running batch files, following run progress, and manipulating result trees. MetaPIGA v2.0 uses standard formats for data sets and trees, is platform independent, runs in 32 and 64-bits systems, and takes advantage of multiprocessor and multicore computers. Conclusions: The metaGA resolves the major problem inherent to classical Genetic Algorithms by maintaining high inter-population variation even under strong intra-population selection. Implementation of the metaGA together with additional stochastic heuristics into a single software will allow rigorous optimization of each heuristic as well as a meaningful comparison of performances among these algorithms.
MetaPIGA v2.0 gives access both to high customization for the phylogeneticist, as well as to an ergonomic interface and functionalities assisting the non-specialist for sound inference of large phylogenetic trees using nucleotide sequences. MetaPIGA v2.0 and its extensive user-manual are freely available to academics at http://www.metapiga.org. PMID:20633263
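The criterion-based model selection mentioned in the abstract above can be illustrated with a minimal sketch. It uses the textbook definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. The function names, the log-likelihood values, and the parameter counts below are illustrative assumptions; they are not taken from MetaPIGA and do not describe its interface.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood

def select_model(fits, n_sites, criterion="aic"):
    """Return the name of the model with the lowest criterion value.

    fits maps a model name to a (log_likelihood, n_params) pair.
    """
    def score(item):
        _, (lnl, k) = item
        return aic(lnl, k) if criterion == "aic" else bic(lnl, k, n_sites)
    return min(fits.items(), key=score)[0]

# Hypothetical maximised log-likelihoods and free substitution parameters
# for three nucleotide models on a 1000-site alignment (made-up numbers).
example_fits = {
    "JC69": (-5230.0, 0),
    "HKY85": (-5105.0, 4),
    "GTR": (-5102.5, 8),
}
best = select_model(example_fits, n_sites=1000, criterion="aic")
```

Both criteria penalise extra parameters, so a richer model is preferred only when its likelihood gain outweighs the penalty; BIC penalises more heavily as the alignment grows.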
Symbol interval optimization for molecular communication with drift.
Kim, Na-Rae; Eckford, Andrew W; Chae, Chan-Byoung
2014-09-01
In this paper, we propose a symbol interval optimization algorithm for molecular communication with drift. Proper symbol intervals are important in practical communication systems since information needs to be sent as fast as possible with low error rates. There is a trade-off, however, between symbol intervals and inter-symbol interference (ISI) from Brownian motion. Thus, we find proper symbol interval values considering the ISI inside two kinds of blood vessels, and also suggest a no-ISI system for strong drift models. Finally, isomer-based molecule shift keying (IMoSK) is applied to calculate achievable data transmission rates (achievable rates, hereafter). Normalized achievable rates are also obtained and compared in one-symbol-ISI and no-ISI systems.
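The trade-off between symbol interval and ISI can be illustrated with a minimal sketch. It assumes a one-dimensional channel in which a molecule undergoes Brownian motion with constant positive drift, so that its first arrival time at the receiver follows an inverse Gaussian distribution. The function names, the 99% arrival target, and the parameter values are illustrative assumptions; the blood-vessel models and the optimization criterion of the paper are not reproduced here.

```python
import math

def _phi(x):
    """Standard normal CDF, written with erfc for accuracy in both tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def arrival_cdf(t, distance, drift, diffusion):
    """P(first arrival time <= t) for 1-D Brownian motion with drift.

    The first-passage time is inverse Gaussian with mean distance/drift
    and shape distance**2 / (2 * diffusion). The exp() term can overflow
    when drift * distance / diffusion is very large.
    """
    if t <= 0.0:
        return 0.0
    mu = distance / drift
    lam = distance ** 2 / (2.0 * diffusion)
    a = math.sqrt(lam / t)
    return (_phi(a * (t / mu - 1.0))
            + math.exp(2.0 * lam / mu) * _phi(-a * (t / mu + 1.0)))

def min_symbol_interval(distance, drift, diffusion, target=0.99, tol=1e-9):
    """Smallest interval T with arrival_cdf(T) >= target, found by bisection."""
    lo, hi = 0.0, distance / drift
    while arrival_cdf(hi, distance, drift, diffusion) < target:
        hi *= 2.0  # expand the bracket until the target is reached
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if arrival_cdf(mid, distance, drift, diffusion) >= target:
            hi = mid
        else:
            lo = mid
    return hi

# Illustrative values in arbitrary consistent units.
T = min_symbol_interval(distance=1.0, drift=1.0, diffusion=0.1)
```

Molecules that arrive after T spill into the next symbol, so a shorter interval raises the data rate at the cost of more ISI; stronger drift narrows the arrival-time distribution and shortens the interval needed to reach the same target.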
O'Hara, F. Patrick; Suaya, Jose A.; Ray, G. Thomas; Baxter, Roger; Brown, Megan L.; Mera, Robertino M.; Close, Nicole M.; Thomas, Elizabeth
2016-01-01
A number of molecular typing methods have been developed for characterization of Staphylococcus aureus isolates. The utility of these systems depends on the nature of the investigation for which they are used. We compared two commonly used methods of molecular typing, multilocus sequence typing (MLST) (and its clustering algorithm, Based Upon Related Sequence Type [BURST]) with the staphylococcal protein A (spa) typing (and its clustering algorithm, Based Upon Repeat Pattern [BURP]), to assess the utility of these methods for macroepidemiology and evolutionary studies of S. aureus in the United States. We typed a total of 366 clinical isolates of S. aureus by these methods and evaluated indices of diversity and concordance values. Our results show that, when combined with the BURP clustering algorithm to delineate clonal lineages, spa typing produces results that are highly comparable with those produced by MLST/BURST. Therefore, spa typing is appropriate for use in macroepidemiology and evolutionary studies and, given its lower implementation cost, this method appears to be more efficient. The findings are robust and are consistent across different settings, patient ages, and specimen sources. Our results also support a model in which the methicillin-resistant S. aureus (MRSA) population in the United States comprises two major lineages (USA300 and USA100), which each consist of closely related variants. PMID:26669861
O'Hara, F Patrick; Suaya, Jose A; Ray, G Thomas; Baxter, Roger; Brown, Megan L; Mera, Robertino M; Close, Nicole M; Thomas, Elizabeth; Amrine-Madsen, Heather
2016-01-01
A number of molecular typing methods have been developed for characterization of Staphylococcus aureus isolates. The utility of these systems depends on the nature of the investigation for which they are used. We compared two commonly used methods of molecular typing, multilocus sequence typing (MLST) (and its clustering algorithm, Based Upon Related Sequence Type [BURST]) with the staphylococcal protein A (spa) typing (and its clustering algorithm, Based Upon Repeat Pattern [BURP]), to assess the utility of these methods for macroepidemiology and evolutionary studies of S. aureus in the United States. We typed a total of 366 clinical isolates of S. aureus by these methods and evaluated indices of diversity and concordance values. Our results show that, when combined with the BURP clustering algorithm to delineate clonal lineages, spa typing produces results that are highly comparable with those produced by MLST/BURST. Therefore, spa typing is appropriate for use in macroepidemiology and evolutionary studies and, given its lower implementation cost, this method appears to be more efficient. The findings are robust and are consistent across different settings, patient ages, and specimen sources. Our results also support a model in which the methicillin-resistant S. aureus (MRSA) population in the United States comprises two major lineages (USA300 and USA100), which each consist of closely related variants.
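An index of diversity for a typing method can be sketched as follows. The example uses Simpson's index of diversity in the form commonly applied to typing systems, D = 1 - sum(n_j (n_j - 1)) / (N (N - 1)), where n_j is the number of isolates of type j and N is the total number of isolates. Whether the study used exactly this index is an assumption, and the type labels below are invented for illustration.

```python
from collections import Counter

def simpson_diversity(type_labels):
    """Simpson's index of diversity for a list of type assignments.

    Returns the probability that two isolates drawn at random without
    replacement belong to different types (0 = one type, 1 = all distinct).
    """
    n = len(type_labels)
    if n < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(type_labels)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))

# Hypothetical type assignments for six isolates.
example_types = ["t008", "t008", "t002", "t002", "t002", "t121"]
diversity = simpson_diversity(example_types)
```

Computing the index once per typing method on the same isolates gives a simple way to compare their discriminatory power.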
Deeth, Robert J
2008-08-04
A general molecular mechanics method is presented for modeling the symmetric bidentate, asymmetric bidentate, and bridging modes of metal-carboxylates with a single parameter set by using a double-minimum M-O-C angle-bending potential. The method is implemented within the Molecular Operating Environment (MOE) with parameters based on the Merck molecular force field although, with suitable modifications, other MM packages and force fields could easily be used. Parameters for high-spin d⁵ manganese(II) bound to carboxylate and water plus amine, pyridyl, imidazolyl, and pyrazolyl donors are developed based on 26 mononuclear and 29 dinuclear crystallographically characterized complexes. The average rmsd for Mn-L distances is 0.08 Å, which is comparable to the experimental uncertainty required to cover multiple binding modes, and the average rmsd in heavy atom positions is around 0.5 Å. In all cases, whatever binding mode is reported is also computed to be a stable local minimum. In addition, the structure-based parametrization implicitly captures the energetics and gives the same relative energies of symmetric and asymmetric coordination modes as density functional theory calculations in model and "real" complexes. Molecular dynamics simulations show that carboxylate rotation is favored over "flipping" while a stochastic search algorithm is described for randomly searching conformational space. The model reproduces Mn-Mn distances in dinuclear systems especially accurately, and this feature is employed to illustrate how MM calculations on models for the dimanganese active site of methionine aminopeptidase can help determine some of the details which may be missing from the experimental structure.
Molecular Signature for Lymphatic Invasion Associated with Survival of Epithelial Ovarian Cancer.
Paik, E Sun; Choi, Hyun Jin; Kim, Tae-Joong; Lee, Jeong-Won; Kim, Byoung-Gie; Bae, Duk-Soo; Choi, Chel Hun
2018-04-01
We aimed to develop a molecular classifier that can predict lymphatic invasion and to assess its clinical significance in epithelial ovarian cancer (EOC) patients. We analyzed gene expression (mRNA, methylated DNA) data from The Cancer Genome Atlas. To identify molecular signatures for lymphatic invasion, we identified differentially expressed genes. The performance of the classifier was validated by receiver operating characteristics analysis, logistic regression, linear discriminant analysis (LDA), and support vector machine (SVM). We assessed the prognostic role of the classifier using a random survival forest (RSF) model and pathway deregulation score (PDS). For external validation, we analyzed microarray data from 26 EOC samples of Samsung Medical Center and the curatedOvarianData database. We identified 21 mRNAs and seven methylated DNAs from primary EOC tissues that predicted lymphatic invasion and created prognostic models. The classifier predicted lymphatic invasion well, which was validated by logistic regression, LDA, and SVM algorithms (C-index of 0.90, 0.71, and 0.74 for mRNA and C-index of 0.64, 0.68, and 0.69 for DNA methylation). Using the RSF model, incorporating molecular data with clinical variables improved prediction of progression-free survival compared with using only clinical variables (p < 0.001 and p=0.008). Similarly, PDS enabled us to classify patients into high-risk and low-risk groups, which resulted in a survival difference in mRNA profiles (log-rank p-value=0.011). In external validation, the gene signature was well correlated with prediction of lymphatic invasion and patients' survival. The molecular signature model predicting lymphatic invasion performed well and was also associated with survival of EOC patients.
Visual verification and analysis of cluster detection for molecular dynamics.
Grottel, Sebastian; Reina, Guido; Vrabec, Jadran; Ertl, Thomas
2007-01-01
A current research topic in molecular thermodynamics is the condensation of vapor to liquid and the investigation of this process at the molecular level. Condensation is found in many physical phenomena, e.g. the formation of atmospheric clouds or the processes inside steam turbines, where a detailed knowledge of the dynamics of condensation processes will help to optimize energy efficiency and avoid problems with droplets of macroscopic size. The key properties of these processes are the nucleation rate and the critical cluster size. For the calculation of these properties it is essential to make use of a meaningful definition of molecular clusters, which is currently not a completely resolved issue. In this paper a framework capable of interactively visualizing molecular datasets of such nucleation simulations is presented, with an emphasis on the detected molecular clusters. To check the quality of the results of the cluster detection, our framework introduces the concept of flow groups to highlight potential cluster evolution over time which is not detected by the employed algorithm. To confirm the findings of the visual analysis, we coupled the rendering view with a schematic view of the clusters' evolution. This makes it possible to rapidly assess the quality of the molecular cluster detection algorithm and to identify locations in the simulation data in space as well as in time where the cluster detection fails. Thus, thermodynamics researchers can eliminate weaknesses in their cluster detection algorithms. Several examples for the effective and efficient usage of our tool are presented.
Clinical librarian support for rapid review of clinical utility of cancer molecular biomarkers.
Geng, Yimin; Fowler, Clara S; Fulton, Stephanie
2015-01-01
The clinical librarian used a restricted literature searching and quality-filtering approach to provide relevant clinical evidence for the use of cancer molecular biomarkers by institutional policy makers and clinicians in the rapid review process. The librarian-provided evidence was compared with the cited references in the institutional molecular biomarker algorithm. The overall incorporation rate of the librarian-provided references into the algorithm was above 80%. This study suggests the usefulness of clinical librarian expertise for clinical practice. The searching and filtering methods for high-level evidence can be adopted by information professionals who are involved in the rapid literature review.
Patel, Saumya K; Khedkar, Vijay M; Jha, Prakash C; Jasrai, Yogesh T; Pandya, Himanshu A; George, Linz-Buoy; Highland, Hyacinth N; Skelton, Adam A
2016-01-01
Phytochemicals of Catharanthus roseus Linn. and Tylophora indica have been known for their inhibition of the malarial parasite Plasmodium falciparum in cell culture. Resistance to chloroquine (CQ), a widely used antimalarial drug, is due to the CQ resistance transporter (CRT) system. The present study deals with computational modeling of the Plasmodium falciparum chloroquine resistance transporter (PfCRT) protein and development of a charged environment to mimic a condition of resistance. The model of PfCRT was developed using Protein homology/analogy engine (PHYRE ver 0.2) and was validated based on the results obtained using PSI-PRED. Subsequently, molecular interactions of selected phytochemicals extracted from C. roseus Linn. and T. indica were studied using a multiple-iterated genetic algorithm-based docking protocol in order to investigate the translocation of these ligands across the PfCRT protein. Further, molecular dynamics studies exhibiting interaction energy estimates of these compounds within the active site of the protein showed that the compounds are more selective toward PfCRT. Clusters of conformations with the free energy of binding were estimated, which clearly demonstrated the potential channel, and by this means the translocation across the PfCRT is anticipated.
Biophysical Discovery through the Lens of a Computational Microscope
NASA Astrophysics Data System (ADS)
Amaro, Rommie
With exascale computing power on the horizon, improvements in the underlying algorithms and available structural experimental data are enabling new paradigms for chemical discovery. My work has provided key insights for the systematic incorporation of structural information resulting from state-of-the-art biophysical simulations into protocols for inhibitor and drug discovery. We have shown that many disease targets have druggable pockets that are otherwise "hidden" in high resolution x-ray structures, and that this is a common theme across a wide range of targets in different disease areas. We continue to push the limits of computational biophysical modeling by expanding the time and length scales accessible to molecular simulation. My sights are set on, ultimately, the development of detailed physical models of cells, as the fundamental unit of life, and two recent achievements highlight our efforts in this arena. First is the development of a molecular and Brownian dynamics multi-scale modeling framework, which allows us to investigate drug binding kinetics in addition to thermodynamics. In parallel, we have made significant progress developing new tools to extend molecular structure to cellular environments. Collectively, these achievements are enabling the investigation of the chemical and biophysical nature of cells at unprecedented scales.
Desai, Bhargav; Hsu, Ying; Schneller, Benjamin; Hobbs, Jonathan G; Mehta, Ankit I; Linninger, Andreas
2016-09-01
Aquaporin-4 (AQP4) channels play an important role in brain water homeostasis. Water transport across plasma membranes has a critical role in brain water exchange of the normal and the diseased brain. AQP4 channels are implicated in the pathophysiology of hydrocephalus, a disease of water imbalance that leads to CSF accumulation in the ventricular system. Many molecular aspects of fluid exchange during hydrocephalus have yet to be firmly elucidated, but review of the literature suggests that modulation of AQP4 channel activity is a potentially attractive future pharmaceutical therapy. Drug therapy targeting AQP channels may enable control over water exchange to remove excess CSF through a molecular intervention instead of by mechanical shunting. This article is a review of a vast body of literature on the current understanding of AQP4 channels in relation to hydrocephalus, details regarding molecular aspects of AQP4 channels, possible drug development strategies, and limitations. Advances in medical imaging and computational modeling of CSF dynamics in the setting of hydrocephalus are summarized. Algorithmic developments in computational modeling continue to deepen the understanding of the hydrocephalus disease process and display promising potential benefit as a tool for physicians to evaluate patients with hydrocephalus.
Li, Qingli; Zhang, Jingfa; Wang, Yiting; Xu, Guoteng
2009-12-01
A molecular spectral imaging system has been developed based on microscopy and spectral imaging technology. The system is capable of acquiring molecular spectral images from 400 nm to 800 nm with 2 nm wavelength increments. The basic principles, instrumental systems, and system calibration method as well as its applications for the calculation of the stain-uptake by tissues are introduced. As a case study, the system is used for determining the pathogenesis of diabetic retinopathy and evaluating the therapeutic effects of erythropoietin. Some molecular spectral images of retinal sections of normal, diabetic, and treated rats were collected and analyzed. The typical transmittance curves of positive spots stained for albumin and advanced glycation end products are retrieved from molecular spectral data with the spectral response calibration algorithm. To explore and evaluate the protective effect of erythropoietin (EPO) on retinal albumin leakage of streptozotocin-induced diabetic rats, an algorithm based on Beer-Lambert's law is presented. The algorithm can assess the uptake by histologic retinal sections of stains used in quantitative pathology to label albumin leakage and advanced glycation end products formation. Experimental results show that the system is helpful for the ophthalmologist to reveal the pathogenesis of diabetic retinopathy and explore the protective effect of erythropoietin on retinal cells of diabetic rats. It also highlights the potential of molecular spectral imaging technology to provide more effective and reliable diagnostic criteria in pathology.
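The Beer-Lambert relation behind the stain-uptake estimate can be sketched as follows. For transmittance T, absorbance is A = -log10(T) = epsilon * l * c, so at fixed molar absorptivity and section thickness the stain concentration is proportional to A. The function names and the transmittance values are illustrative assumptions; the calibration steps and the exact uptake measure of the described system are not reproduced.

```python
import math

def absorbance(transmittance):
    """Absorbance A = -log10(T) for a transmittance T in (0, 1]."""
    if not 0.0 < transmittance <= 1.0:
        raise ValueError("transmittance must lie in (0, 1]")
    return -math.log10(transmittance)

def integrated_absorbance(transmittance_curve):
    """Sum of absorbances over the sampled wavelengths of one pixel."""
    return sum(absorbance(t) for t in transmittance_curve)

def relative_uptake(sample_curve, reference_curve):
    """Ratio of integrated absorbances; proportional to relative stain
    concentration when absorptivity and path length are the same."""
    return integrated_absorbance(sample_curve) / integrated_absorbance(reference_curve)

# Hypothetical transmittance curves sampled at three wavelengths.
treated = [0.80, 0.70, 0.75]
diabetic = [0.60, 0.45, 0.55]
ratio = relative_uptake(treated, diabetic)
```

A ratio below 1 would indicate less stain, and hence less of the labelled target, in the first section than in the reference section.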
Ab initio molecular simulations with numeric atom-centered orbitals
NASA Astrophysics Data System (ADS)
Blum, Volker; Gehrke, Ralf; Hanke, Felix; Havu, Paula; Havu, Ville; Ren, Xinguo; Reuter, Karsten; Scheffler, Matthias
2009-11-01
We describe a complete set of algorithms for ab initio molecular simulations based on numerically tabulated atom-centered orbitals (NAOs) to capture a wide range of molecular and materials properties from quantum-mechanical first principles. The full algorithmic framework described here is embodied in the Fritz Haber Institute "ab initio molecular simulations" (FHI-aims) computer program package. Its comprehensive description should be relevant to any other first-principles implementation based on NAOs. The focus here is on density-functional theory (DFT) in the local and semilocal (generalized gradient) approximations, but an extension to hybrid functionals, Hartree-Fock theory, and MP2/GW electron self-energies for total energies and excited states is possible within the same underlying algorithms. An all-electron/full-potential treatment that is both computationally efficient and accurate is achieved for periodic and cluster geometries on equal footing, including relaxation and ab initio molecular dynamics. We demonstrate the construction of transferable, hierarchical basis sets, allowing the calculation to range from qualitative tight-binding like accuracy to meV-level total energy convergence with the basis set. Since all basis functions are strictly localized, the otherwise computationally dominant grid-based operations scale as O(N) with system size N. Together with a scalar-relativistic treatment, the basis sets provide access to all elements from light to heavy. Both low-communication parallelization of all real-space grid based algorithms and a ScaLapack-based, customized handling of the linear algebra for all matrix operations are possible, guaranteeing efficient scaling (CPU time and memory) up to massively parallel computer systems with thousands of CPUs.
Shi, Junwei; Zhang, Bin; Liu, Fei; Luo, Jianwen; Bai, Jing
2013-09-15
For the ill-posed fluorescent molecular tomography (FMT) inverse problem, L1 regularization can preserve high-frequency information such as edges while effectively reducing image noise. However, state-of-the-art L1 regularization-based algorithms for FMT reconstruction are expensive in memory, especially for large-scale problems. An efficient L1 regularization-based reconstruction algorithm based on nonlinear conjugate gradient with a restart strategy is proposed to increase the computational speed with low memory consumption. The reconstruction results from phantom experiments demonstrate that the proposed algorithm can obtain high spatial resolution and high signal-to-noise ratio, as well as high localization accuracy for fluorescence targets.
Ebrahimi, Mansour; Aghagolzadeh, Parisa; Shamabadi, Narges; Tahmasebi, Ahmad; Alsharifi, Mohammed; Adelson, David L; Hemmatzadeh, Farhid; Ebrahimie, Esmaeil
2014-01-01
The evolution of the influenza A virus to increase its host range is a major concern worldwide. Molecular mechanisms of increasing host range are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physico-chemical attributes which govern HA subtyping, we performed a large scale functional analysis of over 7000 sequences of 16 different HA subtypes. A large number (896) of physico-chemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine learning models which can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on the calculated protein characteristics dataset as well as 10 trimmed datasets generated by attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features that are associated with HA subtyping. Random Forest tree induction algorithm and RBF kernel function of SVM (scaled by grid search) showed high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption.
Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represent a new avenue for understanding and predicting possible future structure of influenza pandemics.
Ebrahimi, Mansour; Aghagolzadeh, Parisa; Shamabadi, Narges; Tahmasebi, Ahmad; Alsharifi, Mohammed; Adelson, David L.
2014-01-01
The evolution of the influenza A virus to increase its host range is a major concern worldwide. Molecular mechanisms of increasing host range are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physico-chemical attributes which govern HA subtyping, we performed a large scale functional analysis of over 7000 sequences of 16 different HA subtypes. A large number (896) of physico-chemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine learning models which can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on the calculated protein characteristics dataset as well as 10 trimmed datasets generated by attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features that are associated with HA subtyping. Random Forest tree induction algorithm and RBF kernel function of SVM (scaled by grid search) showed high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption.
Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represent a new avenue for understanding and predicting possible future structure of influenza pandemics. PMID:24809455
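The 10-fold cross validation used to evaluate the classifiers can be sketched as an index splitter. This is a minimal, library-free illustration with contiguous folds; in practice the samples are usually shuffled (and often stratified by class) first, and how the study built its folds is not stated in the abstract. The function name is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds.

    Every sample appears in exactly one test fold; the first
    n_samples % k folds receive one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Example: 25 samples split into 10 folds of size 3 or 2.
folds = list(k_fold_indices(25, k=10))
```

A model is trained on each `train` list and scored on the matching `test` list; the reported accuracy is then the average over the k test folds.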
A machine learning approach to computer-aided molecular design
NASA Astrophysics Data System (ADS)
Bolis, Giorgio; Di Pace, Luigi; Fabrocini, Filippo
1991-12-01
Preliminary results of a machine learning application concerning computer-aided molecular design applied to drug discovery are presented. The artificial intelligence techniques of machine learning use a sample of active and inactive compounds, which is viewed as a set of positive and negative examples, to allow the induction of a molecular model characterizing the interaction between the compounds and a target molecule. The algorithm is based on a twofold phase. In the first one — the specialization step — the program identifies a number of active/inactive pairs of compounds which appear to be the most useful in order to make the learning process as effective as possible and generates a dictionary of molecular fragments, deemed to be responsible for the activity of the compounds. In the second phase — the generalization step — the fragments thus generated are combined and generalized in order to select the most plausible hypothesis with respect to the sample of compounds. A knowledge base concerning physical and chemical properties is utilized during the inductive process.
NASA Astrophysics Data System (ADS)
Tuckerman, Mark
2006-03-01
One of the computational grand challenge problems is to develop methodology capable of sampling conformational equilibria in systems with rough energy landscapes. If met, many important problems, most notably protein folding, could be significantly impacted. In this talk, two new approaches for addressing this problem will be presented. First, it will be shown how molecular dynamics can be combined with a novel variable transformation designed to warp configuration space in such a way that barriers are reduced and attractive basins stretched. This method rigorously preserves equilibrium properties while leading to very large enhancements in sampling efficiency. Extensions of this approach to the calculation/exploration of free energy surfaces will be discussed. Next, a new very large time-step molecular dynamics method will be introduced that overcomes the resonances which plague many molecular dynamics algorithms. The performance of the methods is demonstrated on a variety of systems including liquid water, long polymer chains, simple protein models, and oligopeptides.
Genetic Network Inference: From Co-Expression Clustering to Reverse Engineering
NASA Technical Reports Server (NTRS)
Dhaeseleer, Patrik; Liang, Shoudan; Somogyi, Roland
2000-01-01
Advances in molecular biological, analytical, and computational technologies are enabling us to systematically investigate the complex molecular processes underlying biological systems. In particular, using high-throughput gene expression assays, we are able to measure the output of the gene regulatory network. We aim here to review data mining and modeling approaches for conceptualizing and unraveling the functional relationships implicit in these datasets. Clustering of co-expression profiles allows us to infer shared regulatory inputs and functional pathways. We discuss various aspects of clustering, ranging from distance measures to clustering algorithms and multiple-cluster memberships. More advanced analysis aims to infer causal connections between genes directly, i.e., who is regulating whom and how. We discuss several approaches to the problem of reverse engineering of genetic networks, from discrete Boolean networks, to continuous linear and non-linear models. We conclude that the combination of predictive modeling with systematic experimental verification will be required to gain a deeper insight into living organisms, therapeutic targeting, and bioengineering.
Pogliani, Lionello
2010-01-30
Twelve properties of a highly heterogeneous class of organic solvents have been modeled with a modified graph-theoretical molecular connectivity (MC) method, which allows the core electrons and the hydrogen atoms to be encoded. The graph-theoretical method uses the concepts of simple, general, and complete graphs, where the last type of graph is used to encode the core electrons. The hydrogen atoms have been encoded with the aid of a graph-theoretical perturbation parameter, which contributes to the definition of the valence delta, delta(v), a key parameter in molecular connectivity studies. The model of the twelve properties obtained with a stepwise search algorithm is always satisfactory, and it allows the influence of the hydrogen content of the solvent molecules on the choice of the type of descriptor to be checked. A similar argument holds for the influence of the halogen atoms on the type of core electron representation. In some cases the molar mass and, to a lesser extent, special "ad hoc" parameters have been used to improve the model. A very good model of the surface tension could be obtained with the aid of five experimental parameters. A mixed model method based on experimental parameters plus molecular connectivity indices instead consistently improved the model quality of five properties. The importance of the boiling point temperatures as descriptors in these last two model methodologies should be underlined. Copyright 2009 Wiley Periodicals, Inc.
Beleut, Manfred; Soeldner, Robert; Egorov, Mark; Guenther, Rolf; Dehler, Silvia; Morys-Wortmann, Corinna; Moch, Holger; Henco, Karsten; Schraml, Peter
2016-01-01
Despite the individually different molecular alterations in tumors, the malignancy-associated biological traits are strikingly similar. Results of a previous study using renal cell carcinoma (RCC) as a model pointed towards cancer-related features, which could be visualized as three groups by microarray-based gene expression analysis. In this study, we used a mathematical model to verify the presence of these groups in RCC as well as in other cancer types. We developed an algorithm for gene-expression deviation profiling for analyzing gene expression data of a total of 8397 patients with 13 different cancer types and normal tissues. We revealed three common Cancer Transcriptomic Profiles (CTPs) which recurred in all investigated tumors. Additionally, CTPs remained robust regardless of the functions or numbers of genes analyzed. CTPs may represent common genetic fingerprints, which potentially reflect the closely related biological traits of human cancers.
Cine: Line excitation by infrared fluorescence in cometary atmospheres
NASA Astrophysics Data System (ADS)
de Val-Borro, Miguel; Cordiner, Martin A.; Milam, Stefanie N.; Charnley, Steven B.
2017-03-01
CINE is a Python module for calculating infrared pumping efficiencies that can be applied to the most common molecules found in cometary comae such as water, hydrogen cyanide or methanol. Excitation by solar radiation of vibrational bands followed by radiative decay to the ground vibrational state is one of the main mechanisms for molecular excitation in comets. This code calculates the effective pumping rates for rotational levels in the ground vibrational state scaled by the heliocentric distance of the comet. Line transitions are queried from the latest version of the HITRAN spectroscopic repository using the astroquery affiliated package of astropy. Molecular data are obtained from the LAMDA database. These coefficients are useful for modeling rotational emission lines observed in cometary spectra at sub-millimeter wavelengths. Combined with computational methods to solve the radiative transfer equations based, e.g., on the Monte Carlo algorithm, this model can retrieve production rates and rotational temperatures from the observed emission spectrum.
Learning molecular energies using localized graph kernels.
Ferré, Grégoire; Haut, Terry; Barros, Kipton
2017-03-21
Recent machine learning methods make it possible to model potential energy of atomic configurations with chemical-level accuracy (as calculated from ab initio calculations) and at speeds suitable for molecular dynamics simulation. Best performance is achieved when the known physical constraints are encoded in the machine learning models. For example, the atomic energy is invariant under global translations and rotations; it is also invariant to permutations of same-species atoms. Although simple to state, these symmetries are complicated to encode into machine learning algorithms. In this paper, we present a machine learning approach based on graph theory that naturally incorporates translation, rotation, and permutation symmetries. Specifically, we use a random walk graph kernel to measure the similarity of two adjacency matrices, each of which represents a local atomic environment. This Graph Approximated Energy (GRAPE) approach is flexible and admits many possible extensions. We benchmark a simple version of GRAPE by predicting atomization energies on a standard dataset of organic molecules.
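The random walk graph kernel mentioned in this abstract can be illustrated with a minimal sketch. It is not the GRAPE implementation: it assumes unweighted adjacency matrices given as nested lists, a truncated geometric series, and hypothetical names (`kronecker`, `random_walk_kernel`, `decay`, `max_steps`). It only shows why such a kernel is unchanged when the atoms of a graph are relabeled.

```python
def kronecker(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [
        [a[i][j] * b[k][l] for j in range(n) for l in range(m)]
        for i in range(n) for k in range(m)
    ]


def random_walk_kernel(a, b, decay=0.1, max_steps=10):
    """Truncated geometric random-walk kernel between two graphs.

    Sums decay**k times the number of common walks of length k,
    for k = 0..max_steps, counted on the direct-product graph.
    """
    product = kronecker(a, b)
    size = len(product)
    walks = [1.0] * size  # walks of length 0 ending at each product node
    total = sum(walks)
    weight = 1.0
    for _ in range(max_steps):
        walks = [sum(row[j] * walks[j] for j in range(size)) for row in product]
        weight *= decay
        total += weight * sum(walks)
    return total


# Illustrative graphs (assumptions, not data from the paper).
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]       # chain 0-1-2
relabeled = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # same chain, nodes renumbered
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

k_original = random_walk_kernel(path, triangle)
k_relabeled = random_walk_kernel(relabeled, triangle)
print(k_original, k_relabeled)  # agree up to floating-point rounding
```

Because the kernel only counts walks, renumbering the nodes of either graph leaves the value unchanged, which is the permutation symmetry the abstract refers to. Translation and rotation invariance would come from building the adjacency matrices from interatomic distances, which this sketch does not attempt.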
Learning molecular energies using localized graph kernels
NASA Astrophysics Data System (ADS)
Ferré, Grégoire; Haut, Terry; Barros, Kipton
2017-03-01
Recent machine learning methods make it possible to model potential energy of atomic configurations with chemical-level accuracy (as calculated from ab initio calculations) and at speeds suitable for molecular dynamics simulation. Best performance is achieved when the known physical constraints are encoded in the machine learning models. For example, the atomic energy is invariant under global translations and rotations; it is also invariant to permutations of same-species atoms. Although simple to state, these symmetries are complicated to encode into machine learning algorithms. In this paper, we present a machine learning approach based on graph theory that naturally incorporates translation, rotation, and permutation symmetries. Specifically, we use a random walk graph kernel to measure the similarity of two adjacency matrices, each of which represents a local atomic environment. This Graph Approximated Energy (GRAPE) approach is flexible and admits many possible extensions. We benchmark a simple version of GRAPE by predicting atomization energies on a standard dataset of organic molecules.
2016-01-01
Semiempirical (SE) methods can be derived from either Hartree–Fock or density functional theory by applying systematic approximations, leading to efficient computational schemes that are several orders of magnitude faster than ab initio calculations. Such numerical efficiency, in combination with modern computational facilities and linear scaling algorithms, allows application of SE methods to very large molecular systems with extensive conformational sampling. To reliably model the structure, dynamics, and reactivity of biological and other soft matter systems, however, good accuracy for the description of noncovalent interactions is required. In this review, we analyze popular SE approaches in terms of their ability to model noncovalent interactions, especially in the context of describing biomolecules, water solution, and organic materials. We discuss the most significant errors and proposed correction schemes, and we review their performance using standard test sets of molecular systems for quantum chemical methods and several recent applications. The general goal is to highlight both the value and limitations of SE methods and stimulate further developments that allow them to effectively complement ab initio methods in the analysis of complex molecular systems. PMID:27074247
De Nicola, Antonio; Kawakatsu, Toshihiro; Milano, Giuseppe
2014-12-09
A procedure, named MD-SCF, is proposed that is based on Molecular Dynamics (MD) simulations employing soft potentials derived from self-consistent field (SCF) theory and is able to generate well-relaxed all-atom structures of polymer melts. All-atom structures having structural correlations indistinguishable from those obtained by long MD relaxations have been obtained for poly(methyl methacrylate) (PMMA) and poly(ethylene oxide) (PEO) melts. The computational cost of the proposed procedure depends mainly on system size rather than on chain length. Several advantages of the proposed procedure over current coarse-graining/reverse mapping strategies are apparent. No parametrization is needed to generate relaxed structures of different polymers at different scales or resolutions. There is no need for special algorithms or back-mapping schemes to change the resolution of the models. This characteristic makes the procedure general and its extension to other polymer architectures straightforward. A similar procedure can be easily extended to the generation of all-atom structures of block copolymer melts and polymer nanocomposites.
Modelling of internal architecture of kinesin nanomotor as a machine language.
Khataee, H R; Ibrahim, M Y
2012-09-01
Kinesin is a protein-based natural nanomotor that transports molecular cargoes within cells by walking along microtubules. Kinesin nanomotor is considered as a bio-nanoagent which is able to sense the cell through its sensors (i.e. its heads and tail), make the decision internally and perform actions on the cell through its actuator (i.e. its motor domain). The study maps the agent-based architectural model of internal decision-making process of kinesin nanomotor to a machine language using an automata algorithm. The applied automata algorithm receives the internal agent-based architectural model of kinesin nanomotor as a deterministic finite automaton (DFA) model and generates a regular machine language. The generated regular machine language was acceptable by the architectural DFA model of the nanomotor and also in good agreement with its natural behaviour. The internal agent-based architectural model of kinesin nanomotor indicates the degree of autonomy and intelligence of the nanomotor interactions with its cell. Thus, our developed regular machine language can model the degree of autonomy and intelligence of kinesin nanomotor interactions with its cell as a language. Modelling of internal architectures of autonomous and intelligent bio-nanosystems as machine languages can lay the foundation towards the concept of bio-nanoswarms and next phases of the bio-nanorobotic systems development.
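As a rough illustration of what it means for a string to be accepted by a deterministic finite automaton (DFA), the sketch below runs a word through a transition table. The states, symbols, and the `TOY_TRANSITIONS` table are hypothetical and are not the kinesin model from the paper; they only show the mechanism.

```python
def accepts(transitions, start, accepting, word):
    """Return True if the DFA ends in an accepting state after reading word.

    transitions maps (state, symbol) to the next state; a missing entry
    means the word is rejected.
    """
    state = start
    for symbol in word:
        key = (state, symbol)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting


# Hypothetical toy automaton (an assumption for illustration only):
# "a" = bind fuel, "s" = take a step; the motor alternates between
# the states "waiting" and "primed".
TOY_TRANSITIONS = {
    ("waiting", "a"): "primed",
    ("primed", "s"): "waiting",
}

for word in ["", "as", "asas", "aas", "a"]:
    print(repr(word), accepts(TOY_TRANSITIONS, "waiting", {"waiting"}, word))
```

The language accepted by this toy automaton is the regular language (as)*, that is, any number of complete bind-then-step cycles.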
NASA Astrophysics Data System (ADS)
Babbush, Ryan; Berry, Dominic W.; Sanders, Yuval R.; Kivlichan, Ian D.; Scherer, Artur; Wei, Annie Y.; Love, Peter J.; Aspuru-Guzik, Alán
2018-01-01
We present a quantum algorithm for the simulation of molecular systems that is asymptotically more efficient than all previous algorithms in the literature in terms of the main problem parameters. As in Babbush et al (2016 New Journal of Physics 18, 033032), we employ a recently developed technique for simulating Hamiltonian evolution using a truncated Taylor series to obtain logarithmic scaling with the inverse of the desired precision. The algorithm of this paper involves simulation under an oracle for the sparse, first-quantized representation of the molecular Hamiltonian known as the configuration interaction (CI) matrix. We construct and query the CI matrix oracle to allow for on-the-fly computation of molecular integrals in a way that is exponentially more efficient than classical numerical methods. Whereas second-quantized representations of the wavefunction require $\widetilde{O}(N)$ qubits, where $N$ is the number of single-particle spin-orbitals, the CI matrix representation requires $\widetilde{O}(\eta)$ qubits, where $\eta \ll N$ is the number of electrons in the molecule of interest. We show that the gate count of our algorithm scales at most as $\widetilde{O}(\eta^{2} N^{3} t)$.
Kim, Hyoungrae; Jang, Cheongyun; Yadav, Dharmendra K; Kim, Mi-Hyun
2017-03-23
The accuracy of any 3D-QSAR, pharmacophore, or 3D-similarity based chemometric target fishing model is highly dependent on a reasonable sample of active conformations. A number of diverse conformational sampling algorithms exist that exhaustively generate enough conformers; however, model building methods rely on an explicit number of common conformers. In this work, we have attempted to make clustering algorithms that can find a reasonable number of representative conformer ensembles automatically from an asymmetric dissimilarity matrix generated with the OpenEye toolkit. RMSD was the important descriptor (variable) of each column of the N × N matrix, considered as N variables describing the relationship (network) between the conformer (in a row) and the other N conformers. This approach was used to evaluate the performance of well-known clustering algorithms by comparing them in terms of generating representative conformer ensembles and to test them over different matrix transformation functions with regard to stability. In the network, the representative conformer group could be resampled for four kinds of algorithms with implicit parameters. The directed dissimilarity matrix becomes the only input to the clustering algorithms. The Dunn index, Davies-Bouldin index, eta-squared values, and omega-squared values were used to evaluate the clustering algorithms with respect to compactness and explanatory power. The evaluation also includes the reduction (abstraction) rate of the data, the correlation between the sizes of the population and the samples, the computational complexity, and the memory usage. Every algorithm could find representative conformers automatically without any user intervention, and they reduced the data to 14-19% of the original values within 1.13 s per sample at most. The clustering methods are simple and practical, as they are fast and do not ask for any explicit parameters.
RCDTC presented the maximum Dunn and omega-squared values of the four algorithms in addition to consistent reduction rate between the population size and the sample size. The performance of the clustering algorithms was consistent over different transformation functions. Moreover, the clustering method can also be applied to molecular dynamics sampling simulation results.
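The Dunn index used above to score the clusterings can be sketched as follows. This is the textbook definition (smallest between-cluster dissimilarity divided by largest within-cluster dissimilarity), not the authors' code; the function name `dunn_index` and the toy data are assumptions. For a directed (asymmetric) matrix, the sketch simply reads both directions of every pair.

```python
def dunn_index(dissimilarity, labels):
    """Dunn index from a pairwise dissimilarity matrix and cluster labels.

    Higher values indicate compact, well-separated clusters.
    """
    n = len(labels)
    between = [
        dissimilarity[i][j]
        for i in range(n) for j in range(n)
        if labels[i] != labels[j]
    ]
    within = [
        dissimilarity[i][j]
        for i in range(n) for j in range(n)
        if i != j and labels[i] == labels[j]
    ]
    if not between or not within or max(within) == 0:
        raise ValueError("need at least two clusters and one non-trivial cluster")
    return min(between) / max(within)


# Toy data (an assumption): four "conformers" placed on a line, with
# absolute differences standing in for RMSD values.
coords = [0.0, 1.0, 10.0, 11.0]
matrix = [[abs(a - b) for b in coords] for a in coords]

print(dunn_index(matrix, [0, 0, 1, 1]))  # tight, well-separated clusters
print(dunn_index(matrix, [0, 1, 0, 1]))  # poor clustering scores much lower
```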
Signature molecular descriptor : advanced applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Visco, Donald Patrick, Jr.
In this work we report on the development of the Signature Molecular Descriptor (or Signature) for use in the solution of inverse design problems as well as in high-throughput screening applications. The ultimate goal of using Signature is to identify novel and non-intuitive chemical structures with optimal predicted properties for a given application. We demonstrate this in three studies: green solvent design, glucocorticoid receptor ligand design and the design of inhibitors for Factor XIa. In many areas of engineering, compounds are designed and/or modified in incremental ways which rely upon heuristics or institutional knowledge. Often multiple experiments are performed and the optimal compound is identified in this brute-force fashion. Perhaps a traditional chemical scaffold is identified and movement of a substituent group around a ring constitutes the whole of the design process. Also notably, a chemical being evaluated in one area might demonstrate properties very attractive in another area and serendipity was the mechanism for solution. In contrast to such approaches, computer-aided molecular design (CAMD) looks to encompass both experimental and heuristic-based knowledge into a strategy that will design a molecule on a computer to meet a given target. Depending on the algorithm employed, the molecule which is designed might be quite novel (re: no CAS registration number) and/or non-intuitive relative to what is known about the problem at hand. While CAMD is a fairly recent strategy (dating to the early 1980s), it contains a variety of bottlenecks and limitations which have prevented the technique from garnering more attention in the academic, governmental and industrial institutions. A main reason for this is how the molecules are described in the computer. This step can control how models are developed for the properties of interest on a given problem as well as how to go from an output of the algorithm to an actual chemical structure. 
This report provides details on a technique to describe molecules on a computer, called Signature, as well as the computer-aided molecule design algorithm built around Signature. Two applications are provided of the CAMD algorithm with Signature. The first describes the design of green solvents based on data in the GlaxoSmithKline (GSK) Solvent Selection Guide. The second provides novel non-steroidal glucocorticoid receptor ligands with some optimally predicted properties. In addition to using the CAMD algorithm with Signature, it is demonstrated how to employ Signature in a high-throughput screening study. Here, after classifying both active and inactive inhibitors for the protein Factor XIa using Signature, the model developed is used to screen a large, publicly-available database called PubChem for the most active compounds.
Fixman compensating potential for general branched molecules
NASA Astrophysics Data System (ADS)
Jain, Abhinandan; Kandel, Saugat; Wagner, Jeffrey; Larsen, Adrien; Vaidehi, Nagarajan
2013-12-01
The technique of constraining high frequency modes of molecular motion is an effective way to increase simulation time scale and improve conformational sampling in molecular dynamics simulations. However, it has been shown that constraints on higher frequency modes such as bond lengths and bond angles stiffen the molecular model, thereby introducing systematic biases in the statistical behavior of the simulations. Fixman proposed a compensating potential to remove such biases in the thermodynamic and kinetic properties calculated from dynamics simulations. Previous implementations of the Fixman potential have been limited to only short serial chain systems. In this paper, we present a spatial operator algebra based algorithm to calculate the Fixman potential and its gradient within constrained dynamics simulations for branched topology molecules of any size. Our numerical studies on molecules of increasing complexity validate our algorithm by demonstrating recovery of the dihedral angle probability distribution function for systems that range in complexity from serial chains to protein molecules. We observe that the Fixman compensating potential recovers the free energy surface of a serial chain polymer, thus annulling the biases caused by constraining the bond lengths and bond angles. The inclusion of Fixman potential entails only a modest increase in the computational cost in these simulations. We believe that this work represents the first instance where the Fixman potential has been used for general branched systems, and establishes the viability for its use in constrained dynamics simulations of proteins and other macromolecules.
Yang, L. H.; Brooks III, E. D.; Belak, J.
1992-01-01
A molecular dynamics algorithm for performing large-scale simulations using the Parallel C Preprocessor (PCP) programming paradigm on the BBN TC2000, a massively parallel computer, is discussed. The algorithm uses a linked-cell data structure to obtain the near neighbors of each atom as time evolves. Each processor is assigned to a geometric domain containing many subcells and the storage for that domain is private to the processor. Within this scheme, the interdomain (i.e., interprocessor) communication is minimized.
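The linked-cell idea can be shown with a small serial sketch. It is not the PCP/TC2000 code: it assumes a cubic periodic box, positions inside [0, box), and hypothetical function names. Atoms are binned into cells at least one cutoff wide, so each atom only needs to be compared against atoms in its own and adjacent cells.

```python
import math
from collections import defaultdict
from itertools import product


def minimum_image_distance(p, q, box):
    """Distance between two points in a periodic cubic box of side box."""
    total = 0.0
    for a, b in zip(p, q):
        d = abs(a - b) % box
        d = min(d, box - d)
        total += d * d
    return math.sqrt(total)


def neighbor_pairs(positions, box, cutoff):
    """Return all index pairs (i, j), i < j, closer than cutoff."""
    ncell = max(1, int(box // cutoff))  # cells per side, each >= cutoff wide
    size = box / ncell
    cells = defaultdict(list)
    for index, point in enumerate(positions):
        key = tuple(int(c / size) % ncell for c in point)
        cells[key].append(index)

    pairs = set()
    for (cx, cy, cz), members in cells.items():
        # Visit the cell itself and its 26 periodic neighbors.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            other = ((cx + dx) % ncell, (cy + dy) % ncell, (cz + dz) % ncell)
            for i in members:
                for j in cells.get(other, ()):
                    if i < j and minimum_image_distance(
                        positions[i], positions[j], box
                    ) < cutoff:
                        pairs.add((i, j))
    return pairs


# Deterministic toy coordinates (an assumption for illustration).
points = [((7 * i) % 10 + 0.5, (3 * i) % 10 + 0.25, (i * i) % 10 + 0.1)
          for i in range(40)]
print(len(neighbor_pairs(points, 10.0, 2.5)))
```

In a domain-decomposed version such as the one the abstract describes, each processor would own a block of these cells and exchange only the boundary cells with its neighbors.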
Dynamic load balancing algorithm for molecular dynamics based on Voronoi cells domain decompositions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fattebert, J.-L.; Richards, D.F.; Glosli, J.N.
2012-12-01
We present a new algorithm for automatic parallel load balancing in classical molecular dynamics. It assumes a spatial domain decomposition of particles into Voronoi cells. It is a gradient method which attempts to minimize a cost function by displacing Voronoi sites associated with each processor/sub-domain along steepest descent directions. Excellent load balance has been obtained for quasi-2D and 3D practical applications, with up to 440·10^6 particles on 65,536 MPI tasks.
The Structure and Evolution of Self-Gravitating Molecular Clouds
NASA Astrophysics Data System (ADS)
Holliman, John Herbert, II
1995-01-01
We present a theoretical formalism to evaluate the structure of molecular clouds and to determine precollapse conditions in star-forming regions. Models consist of pressure-bounded, self-gravitating spheres of a single-fluid ideal gas. We treat the case without rotation. The analysis is generalized to consider states in hydrostatic equilibrium maintained by multiple pressure components. Individual pressures vary with density as $P_i(r) \propto \rho^{\gamma_{\mathrm{p},i}}(r)$, where $\gamma_{\mathrm{p},i}$ is the polytropic index. Evolution depends additionally on whether conduction occurs on a dynamical time scale and on the adiabatic index $\gamma_i$ of each component, which is modified to account for the effects of any thermal coupling to the environment of the cloud. Special attention is given to properly representing the major contributors to dynamical support in molecular clouds: the pressures due to static magnetic fields, Alfvén waves, and thermal motions. Straightforward adjustments to the model allow us to treat the intrinsically anisotropic support provided by the static fields. We derive structure equations, as well as perturbation equations for performing a linear stability analysis. The analysis provides insight on the nature of dynamical motions due to collapse from an equilibrium state and estimates the mass of condensed objects that form in such a process. After presenting a set of general results, we describe models of star-forming regions that include the major pressure components. We parameterize the extent of ambipolar diffusion. The analysis contributes to the physical understanding of several key results from observations of these regions. Commonly observed quantities are explicitly cross-referenced with model results. We theoretically determine density and linewidth profiles on scales ranging from that of molecular cloud cores to that of giant molecular clouds (GMCs). 
The model offers an explanation of the mean pressures in GMCs, which are observed to be high relative to that in the intercloud medium. We estimate what fraction of a cloud on the verge of gravitational collapse will ultimately form a condensed object, and we predict the qualitative appearance of the collapse. Finally, we simulate fragmentation, a key step in the star-forming process whereby molecular clouds or clumps within more massive clouds break up into substantially less massive cores that can in turn condense into stars. Fragmentation occurs in the context of dynamical collapse, a highly nonlinear process, so it has been difficult to reach a consensus on its specific appearance or on the influence of initial conditions. Increases in density by several orders of magnitude and the unknown, time-dependent positions of the rapidly evolving fragments present difficulties for the simulation of fragmentation. In order to increase the efficiency and effective resolution with which we can model this process, we have assembled an adaptive mesh refinement (AMR) hydrodynamics algorithm and an adaptive elliptical solver for self-gravity. The code is adaptive in the sense that it can dynamically and automatically alter the configuration of a recursively finer mesh in the computational domain. A test suite helps confirm the proper operation of the algorithm. Using initial conditions adopted in previous fragmentation studies, we simulate the collapse of a molecular cloud core. (Abstract shortened by UMI.)
Terahertz Technology and Molecular Interactions
2010-12-16
numerical identification algorithm, based on a simple threshold model, showed that the probability for false alarm (PFA) for the least favorable of...Briefly put, Phase I of MACS was to develop in 18 months a sensor system in a 1 cu ft volume that could correctly identify with a PFA < 10^-4 gases in a...observe spectral lines that have fractional absorptions of 10^-7 there are six orders of magnitude in sensitivity at stake. If spectral lines have
Pharmacotherapy of Essential Tremor
Hedera, Peter; Cibulčík, František; Davis, Thomas L.
2013-01-01
Essential tremor (ET) is a common movement disorder but its pathogenesis remains poorly understood. This has limited the development of effective pharmacotherapy. The current therapeutic armamentaria for ET represent the product of careful clinical observation rather than targeted molecular modeling. Here we review their pharmacokinetics, metabolism, dosing, and adverse effect profiles and propose a treatment algorithm. We also discuss the concept of medically refractory tremor, as therapeutic trials should be limited unless invasive therapy is contraindicated or not desired by patients. PMID:24385718
Stargate GTM: Bridging Descriptor and Activity Spaces.
Gaspar, Héléna A; Baskin, Igor I; Marcou, Gilles; Horvath, Dragos; Varnek, Alexandre
2015-11-23
Predicting the activity profile of a molecule or discovering structures possessing a specific activity profile are two important goals in chemoinformatics, which could be achieved by bridging activity and molecular descriptor spaces. In this paper, we introduce the "Stargate" version of the Generative Topographic Mapping approach (S-GTM) in which two different multidimensional spaces (e.g., structural descriptor space and activity space) are linked through a common 2D latent space. In the S-GTM algorithm, the manifolds are trained simultaneously in two initial spaces using the probabilities in the 2D latent space calculated as a weighted geometric mean of probability distributions in both spaces. S-GTM has the following interesting features: (1) activities are involved during the training procedure; therefore, the method is supervised, unlike conventional GTM; (2) using molecular descriptors of a given compound as input, the model predicts a whole activity profile, and (3) using an activity profile as input, areas populated by relevant chemical structures can be detected. To assess the performance of S-GTM prediction models, a descriptor space (ISIDA descriptors) of a set of 1325 GPCR ligands was related to a B-dimensional (B = 1 or 8) activity space corresponding to pKi values for eight different targets. S-GTM outperforms conventional GTM for individual activities and performs similarly to the Lasso multitask learning algorithm, although it is still slightly less accurate than the Random Forest method.
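The combination step described above, a weighted geometric mean of the probability distributions from the two spaces, can be sketched in a few lines. This is not the S-GTM implementation: the function name, the single `weight` parameter, and the renormalization so that the result sums to one are assumptions made for illustration.

```python
def combine_responsibilities(p_descriptor, p_activity, weight):
    """Weighted geometric mean of two distributions over latent-grid nodes.

    weight = 1.0 uses only the descriptor space, weight = 0.0 only the
    activity space; the result is renormalized to sum to one.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    raw = [
        a ** weight * b ** (1.0 - weight)
        for a, b in zip(p_descriptor, p_activity)
    ]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("distributions have no overlapping support")
    return [value / total for value in raw]


# Toy example with two latent nodes (values are illustrative only).
combined = combine_responsibilities([0.5, 0.5], [0.9, 0.1], 0.5)
print(combined)
```

A node receives a high combined probability only if both spaces assign it appreciable probability, which is one way to read the claim that the two manifolds are trained through a shared latent space.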
Logic integer programming models for signaling networks.
Haus, Utz-Uwe; Niermann, Kathrin; Truemper, Klaus; Weismantel, Robert
2009-05-01
We propose a static and a dynamic approach to model biological signaling networks, and show how each can be used to answer relevant biological questions. For this, we use the two different mathematical tools of Propositional Logic and Integer Programming. The power of discrete mathematics for handling qualitative as well as quantitative data has so far not been exploited in molecular biology, which is mostly driven by experimental research, relying on first-order or statistical models. The arising logic statements and integer programs are analyzed and can be solved with standard software. For a restricted class of problems the logic models reduce to a polynomial-time solvable satisfiability algorithm. Additionally, a more dynamic model enables enumeration of possible time resolutions in poly-logarithmic time. Computational experiments are included.
ELF: An Extended-Lagrangian Free Energy Calculation Module for Multiple Molecular Dynamics Engines.
Chen, Haochuan; Fu, Haohao; Shao, Xueguang; Chipot, Christophe; Cai, Wensheng
2018-06-18
Extended adaptive biasing force (eABF), a collective variable (CV)-based importance-sampling algorithm, has proven to be very robust and efficient compared with the original ABF algorithm. Its implementation in Colvars, a software addition to molecular dynamics (MD) engines, is, however, currently limited to NAMD and LAMMPS. To broaden the scope of eABF and its variants, like its generalized form (egABF), and make them available to other MD engines, e.g., GROMACS, AMBER, CP2K, and openMM, we present a PLUMED-based implementation, called extended-Lagrangian free energy calculation (ELF). This implementation can be used as a stand-alone gradient estimator for other CV-based sampling algorithms, such as temperature-accelerated MD (TAMD) and extended-Lagrangian metadynamics (MtD). ELF provides the end user with a convenient framework to help select the best-suited importance-sampling algorithm for a given application without any commitment to a particular MD engine.
Wu, Xiao-Lin; Sun, Chuanyu; Beissinger, Timothy M; Rosa, Guilherme Jm; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel
2012-09-25
Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs.
2012-01-01
Background Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Results Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Conclusions Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs. PMID:23009363
Xiao, Li; Luo, Ray
2017-12-07
We explored a multi-scale algorithm for the Poisson-Boltzmann continuum solvent model for more robust simulations of biomolecules. In this method, the continuum solvent/solute interface is explicitly simulated with a numerical fluid dynamics procedure, which is tightly coupled to the solute molecular dynamics simulation. There are multiple benefits to adopt such a strategy as presented below. At this stage of the development, only nonelectrostatic interactions, i.e., van der Waals and hydrophobic interactions, are included in the algorithm to assess the quality of the solvent-solute interface generated by the new method. Nevertheless, numerical challenges exist in accurately interpolating the highly nonlinear van der Waals term when solving the finite-difference fluid dynamics equations. We were able to bypass the challenge rigorously by merging the van der Waals potential and pressure together when solving the fluid dynamics equations and by considering its contribution in the free-boundary condition analytically. The multi-scale simulation method was first validated by reproducing the solute-solvent interface of a single atom with analytical solution. Next, we performed the relaxation simulation of a restrained symmetrical monomer and observed a symmetrical solvent interface at equilibrium with detailed surface features resembling those found on the solvent excluded surface. Four typical small molecular complexes were then tested, both volume and force balancing analyses showing that these simple complexes can reach equilibrium within the simulation time window. Finally, we studied the quality of the multi-scale solute-solvent interfaces for the four tested dimer complexes and found that they agree well with the boundaries as sampled in the explicit water simulations.
MoCha: Molecular Characterization of Unknown Pathways.
Lobo, Daniel; Hammelman, Jennifer; Levin, Michael
2016-04-01
Automated methods for the reverse-engineering of complex regulatory networks are paving the way for the inference of mechanistic comprehensive models directly from experimental data. These novel methods can infer not only the relations and parameters of the known molecules defined in their input datasets, but also unknown components and pathways identified as necessary by the automated algorithms. Identifying the molecular nature of these unknown components is a crucial step for making testable predictions and experimentally validating the models, yet no specific and efficient tools exist to aid in this process. To this end, we present here MoCha (Molecular Characterization), a tool optimized for the search of unknown proteins and their pathways from a given set of known interacting proteins. MoCha uses the comprehensive dataset of protein-protein interactions provided by the STRING database, which currently includes more than a billion interactions from over 2,000 organisms. MoCha is highly optimized, performing typical searches within seconds. We demonstrate the use of MoCha with the characterization of unknown components from reverse-engineered models from the literature. MoCha is useful for working on network models by hand or as a downstream step of a model inference engine workflow and represents a valuable and efficient tool for the characterization of unknown pathways using known data from thousands of organisms. MoCha and its source code are freely available online under the GPLv3 license.
SIFTER search: a web server for accurate phylogeny-based protein function prediction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.
2015-05-15
We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.
Computational approaches for the classification of seed storage proteins.
Radhika, V; Rao, V Sree Hari
2015-07-01
Seed storage proteins comprise a major part of the protein content of the seed and have an important role in the quality of the seed. These storage proteins are important because they determine the total protein content and have an effect on the nutritional quality and functional properties for food processing. Transgenic plants are being used to develop improved lines for incorporation into plant breeding programs, and the nutrient composition of seeds is a major target of molecular breeding programs. Hence, classification of these proteins is crucial for the development of superior varieties with improved nutritional quality. In this study we have applied machine learning algorithms for classification of seed storage proteins. We have presented an algorithm based on the nearest neighbor approach for classification of seed storage proteins and compared its performance with the J48 decision tree, a multilayer perceptron (MLP) neural network, and the libSVM support vector machine (SVM). The model based on our algorithm gives higher classification accuracy than the other methods.
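As an illustration of the nearest neighbor approach named in this abstract, the following is a minimal sketch, not the authors' algorithm. The feature vectors, class labels, and the choice of k = 3 with Euclidean distance are all hypothetical and chosen only to make the example run.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training vector,
    # sorted so the closest training points come first.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy two-feature descriptors (hypothetical values, not from the study).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
           (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
train_y = ["storage", "storage", "storage", "other", "other", "other"]

print(knn_predict(train_X, train_y, (0.2, 0.2)))
print(knn_predict(train_X, train_y, (0.7, 0.9)))
```

In practice the feature vectors would be sequence-derived descriptors, and k and the distance metric would be tuned by cross-validation.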
Parallel algorithms for the molecular conformation problem
NASA Astrophysics Data System (ADS)
Rajan, Kumar
Given a set of objects, and some of the pairwise distances between them, the problem of identifying the positions of the objects in the Euclidean space is referred to as the molecular conformation problem. This problem is known to be computationally difficult. One of the most important applications of this problem is the determination of the structure of molecules. In the case of molecular structure determination, usually only the lower and upper bounds on some of the interatomic distances are available. The process of obtaining a tighter set of bounds between all pairs of atoms, using the available interatomic distance bounds, is referred to as bound-smoothing. One method for bound-smoothing is to use the limits imposed by the triangle inequality. The distance bounds so obtained can often be tightened further by applying the tetrangle inequality, that is, the limits imposed on the six pairwise distances among a set of four atoms (instead of three for the triangle inequalities). The tetrangle inequality is expressed by the Cayley-Menger determinants. The sequential tetrangle-inequality bound-smoothing algorithm considers a quadruple of atoms at a time, and tightens the bounds on each of its six distances. The sequential algorithm is computationally expensive, and its application is limited to molecules with up to a few hundred atoms. Here, we conduct an experimental study of tetrangle-inequality bound-smoothing and reduce the sequential time by identifying the most computationally expensive portions of the process. We also present a simple criterion to determine which of the quadruples of atoms are likely to be tightened the most by tetrangle-inequality bound-smoothing. This test could be used to enhance the applicability of this process to large molecules. We map the problem of parallelizing tetrangle-inequality bound-smoothing to that of generating disjoint packing designs of a certain kind.
We map this, in turn, to a regular-graph coloring problem, and present a simple, parallel algorithm for tetrangle-inequality bound-smoothing. We implement the parallel algorithm on the Intel Paragon X/PS, and apply it to real-life molecules. Our results show that with this parallel algorithm, tetrangle inequality can be applied to large molecules in a reasonable amount of time. We extend the regular graph to represent more general packing designs, and present a coloring algorithm for this graph. This can be used to generate constant-weight binary codes in parallel. Once a tighter set of distance bounds is obtained, the molecular conformation problem is usually formulated as a non-linear optimization problem, and a global optimization algorithm is then used to solve the problem. Here we present a parallel, deterministic algorithm for the optimization problem based on Interval Analysis. We implement our algorithm, using dynamic load balancing, on a network of Sun Ultra-Sparc workstations. Our experience with this algorithm shows that its application is limited to small instances of the molecular conformation problem, where the number of measured, pairwise distances is close to the maximum value. However, since the interval method eliminates a substantial portion of the initial search space very quickly, it can be used to prune the search space before any of the more efficient, nondeterministic methods can be applied.
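The triangle-inequality bound-smoothing step mentioned in this abstract can be sketched as follows. This is a generic, sequential illustration and not the thesis code: upper bounds are tightened with an all-pairs shortest-path pass (u_ij <= u_ik + u_kj), and lower bounds with the inverse triangle inequality (l_ij >= l_ik - u_kj). The three-atom input is a made-up example, and the tetrangle step is not shown.

```python
INF = float("inf")

def triangle_smooth(lower, upper):
    """Return (lower, upper) bound matrices tightened by triangle inequalities."""
    n = len(upper)
    U = [row[:] for row in upper]
    L = [row[:] for row in lower]
    # Upper bounds: Floyd-Warshall style pass, u_ij <= u_ik + u_kj.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if U[i][k] + U[k][j] < U[i][j]:
                    U[i][j] = U[i][k] + U[k][j]
    # Lower bounds: l_ij >= max(l_ik - u_kj, l_kj - u_ik), using smoothed U.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                cand = max(L[i][k] - U[k][j], L[k][j] - U[i][k])
                if cand > L[i][j]:
                    L[i][j] = cand
    return L, U

# Hypothetical input: d(0,1) = 1.0 and d(1,2) = 3.0 are known exactly,
# while d(0,2) is unknown (lower bound 0, upper bound infinite).
lower = [[0.0, 1.0, 0.0],
         [1.0, 0.0, 3.0],
         [0.0, 3.0, 0.0]]
upper = [[0.0, 1.0, INF],
         [1.0, 0.0, 3.0],
         [INF, 3.0, 0.0]]

L, U = triangle_smooth(lower, upper)
print(L[0][2], U[0][2])
```

For this input the unknown distance d(0,2) ends up bracketed between 3.0 - 1.0 and 1.0 + 3.0.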
Optimizing Tissue Sampling for the Diagnosis, Subtyping, and Molecular Analysis of Lung Cancer
Ofiara, Linda Marie; Navasakulpong, Asma; Beaudoin, Stephane; Gonzalez, Anne Valerie
2014-01-01
Lung cancer has entered the era of personalized therapy with histologic subclassification and the presence of molecular biomarkers becoming increasingly important in therapeutic algorithms. At the same time, biopsy specimens are becoming increasingly smaller as diagnostic algorithms seek to establish diagnosis and stage with the least invasive techniques. Here, we review techniques used in the diagnosis of lung cancer including bronchoscopy, ultrasound-guided bronchoscopy, transthoracic needle biopsy, and thoracoscopy. In addition to discussing indications and complications, we focus our discussion on diagnostic yields and the feasibility of testing for molecular biomarkers such as epidermal growth factor receptor and anaplastic lymphoma kinase, emphasizing the importance of a sufficient tumor biopsy. PMID:25295226
Fakhar, Zeynab; Naiker, Suhashni; Alves, Claudio N; Govender, Thavendran; Maguire, Glenn E M; Lameira, Jeronimo; Lamichhane, Gyanu; Kruger, Hendrik G; Honarparvar, Bahareh
2016-11-01
An alarming rise of multidrug-resistant Mycobacterium tuberculosis strains and the continuous high global morbidity of tuberculosis have reinvigorated the need to identify novel targets to combat the disease. The enzymes that catalyze the biosynthesis of peptidoglycan in M. tuberculosis are essential and noteworthy therapeutic targets. In this study, the biochemical function and homology modeling of MurI, MurG, MraY, DapE, DapA, Alr, and Ddl enzymes of the CDC1551 M. tuberculosis strain involved in the biosynthesis of peptidoglycan cell wall are reported. Generation of the 3D structures was achieved with Modeller 9.13. To assess the structural quality of the obtained homology modeled targets, the models were validated using PROCHECK, PDBsum, QMEAN, and ERRAT scores. Molecular dynamics simulations were performed to calculate root mean square deviation (RMSD) and radius of gyration (Rg) of MurI and MurG target proteins and their corresponding templates. For further model validation, RMSD and Rg for selected targets/templates were investigated to compare the close proximity of their dynamic behavior in terms of protein stability and average distances. To identify the potential binding mode required for molecular docking, binding site information of all modeled targets was obtained using two prediction algorithms. A docking study was performed for MurI to determine the potential mode of interaction between the inhibitor and the active site residues. This study presents the first accounts of the 3D structural information for the selected M. tuberculosis targets involved in peptidoglycan biosynthesis.
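The two trajectory metrics named in this abstract, RMSD and radius of gyration, have simple definitions that can be sketched as below. The sketch assumes the two coordinate sets are already superimposed and that all atoms have equal mass; real MD analysis tools typically perform an optimal alignment and mass weighting first. The coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two pre-aligned coordinate sets."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    sq = sum((p - q) ** 2
             for a, b in zip(coords_a, coords_b)
             for p, q in zip(a, b))
    return math.sqrt(sq / len(coords_a))

def radius_of_gyration(coords):
    """Unweighted radius of gyration about the geometric center."""
    n = len(coords)
    center = [sum(c[i] for c in coords) / n for i in range(3)]
    sq = sum((c[i] - center[i]) ** 2 for c in coords for i in range(3))
    return math.sqrt(sq / n)

# Hypothetical four-atom structure and a copy shifted by 1 along x.
frame = [(1.0, 1.0, 0.0), (1.0, -1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in frame]

print(rmsd(frame, shifted))
print(radius_of_gyration(frame))
```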
NASA Astrophysics Data System (ADS)
Hashemian, Behrooz; Millán, Daniel; Arroyo, Marino
2013-12-01
Collective variables (CVs) are low-dimensional representations of the state of a complex system, which help us rationalize molecular conformations and sample free energy landscapes with molecular dynamics simulations. Given their importance, there is need for systematic methods that effectively identify CVs for complex systems. In recent years, nonlinear manifold learning has shown its ability to automatically characterize molecular collective behavior. Unfortunately, these methods fail to provide a differentiable function mapping high-dimensional configurations to their low-dimensional representation, as required in enhanced sampling methods. We introduce a methodology that, starting from an ensemble representative of molecular flexibility, builds smooth and nonlinear data-driven collective variables (SandCV) from the output of nonlinear manifold learning algorithms. We demonstrate the method with a standard benchmark molecule, alanine dipeptide, and show how it can be non-intrusively combined with off-the-shelf enhanced sampling methods, here the adaptive biasing force method. We illustrate how enhanced sampling simulations with SandCV can explore regions that were poorly sampled in the original molecular ensemble. We further explore the transferability of SandCV from a simpler system, alanine dipeptide in vacuum, to a more complex system, alanine dipeptide in explicit water.
Vashishta, Priya; Kalia, Rajiv K; Nakano, Aiichiro
2006-03-02
We have developed a first-principles-based hierarchical simulation framework, which seamlessly integrates (1) a quantum mechanical description based on the density functional theory (DFT), (2) multilevel molecular dynamics (MD) simulations based on a reactive force field (ReaxFF) that describes chemical reactions and polarization, a nonreactive force field that employs dynamic atomic charges, and an effective force field (EFF), and (3) an atomistically informed continuum model to reach macroscopic length scales. For scalable hierarchical simulations, we have developed parallel linear-scaling algorithms for (1) DFT calculation based on a divide-and-conquer algorithm on adaptive multigrids, (2) chemically reactive MD based on a fast ReaxFF (F-ReaxFF) algorithm, and (3) EFF-MD based on a space-time multiresolution MD (MRMD) algorithm. On 1920 Intel Itanium2 processors, we have demonstrated 1.4 million atom (0.12 trillion grid points) DFT, 0.56 billion atom F-ReaxFF, and 18.9 billion atom MRMD calculations, with parallel efficiency as high as 0.953. Through the use of these algorithms, multimillion atom MD simulations have been performed to study the oxidation of an aluminum nanoparticle. Structural and dynamic correlations in the oxide region are calculated as well as the evolution of charges, surface oxide thickness, diffusivities of atoms, and local stresses. In the microcanonical ensemble, the oxidizing reaction becomes explosive in both molecular and atomic oxygen environments, due to the enormous energy release associated with Al-O bonding. In the canonical ensemble, an amorphous oxide layer of a thickness of approximately 40 angstroms is formed after 466 ps, in good agreement with experiments. Simulations have been performed to study nanoindentation on crystalline, amorphous, and nanocrystalline silicon nitride and silicon carbide. 
Simulation on nanocrystalline silicon carbide reveals unusual deformation mechanisms in brittle nanophase materials, due to coexistence of brittle grains and soft amorphous-like grain boundary phases. Simulations predict a crossover from intergranular continuous deformation to intragrain discrete deformation at a critical indentation depth.
On the fragmentation of filaments in a molecular cloud simulation
NASA Astrophysics Data System (ADS)
Chira, R.-A.; Kainulainen, J.; Ibáñez-Mejía, J. C.; Henning, Th.; Mac Low, M.-M.
2018-03-01
Context. The fragmentation of filaments in molecular clouds has attracted a lot of attention recently as there seems to be a close relation between the evolution of filaments and star formation. The study of the fragmentation process has been motivated by simple analytical models. However, only a few comprehensive studies have analysed the evolution of filaments using numerical simulations where the filaments form self-consistently as part of large-scale molecular cloud evolution. Aims. We address the early evolution of parsec-scale filaments that form within individual clouds. In particular, we focus on three questions: How do the line masses of filaments evolve? How and when do the filaments fragment? How does the fragmentation relate to the line masses of the filaments? Methods. We examine three simulated molecular clouds formed in kiloparsec-scale numerical simulations performed with the FLASH adaptive mesh refinement magnetohydrodynamic code. The simulations model a self-gravitating, magnetised, stratified, supernova-driven interstellar medium, including photoelectric heating and radiative cooling. We follow the evolution of the clouds for 6 Myr from the time self-gravity starts to act. We identify filaments using the DisPerSe algorithm, and compare the results to other filament-finding algorithms. We determine the properties of the identified filaments and compare them with the predictions of analytic filament stability models. Results. The average line masses of the identified filaments, as well as the fraction of mass in filamentary structures, increase fairly continuously after the onset of self-gravity. The filaments show fragmentation starting relatively early: the first fragments appear when the line masses lie well below the critical line mass of Ostriker's isolated hydrostatic equilibrium solution (16 M⊙ pc⁻¹), commonly used as a fragmentation criterion.
The average line masses of filaments identified in three-dimensional volume density cubes increase far more quickly than those identified in two-dimensional column density maps. Conclusions. Our results suggest that hydrostatic or dynamic compression from the surrounding cloud has a significant impact on the early dynamical evolution of filaments. A simple model of an isolated, isothermal cylinder may not provide a good approach for fragmentation analysis. Caution must be exercised in interpreting distributions of properties of filaments identified in column density maps, especially in the case of low-mass filaments. Comparing or combining results from studies that use different filament finding techniques is strongly discouraged.
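For orientation, the critical line mass of an isolated, isothermal, hydrostatic cylinder is M_line,crit = 2 c_s^2 / G, where c_s is the isothermal sound speed. The sketch below evaluates it under two assumptions that the abstract does not state: a gas temperature of 10 K and a mean molecular weight of 2.33. With those values the result lands near the quoted 16 M⊙ per parsec.

```python
# Physical constants in SI units.
K_B = 1.380649e-23       # Boltzmann constant, J/K
M_H = 1.6735575e-27      # mass of a hydrogen atom, kg
G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
PARSEC = 3.0856775814913673e16   # m
M_SUN = 1.98847e30       # kg

def critical_line_mass(temperature_k, mu=2.33):
    """Critical line mass 2 c_s^2 / G of an isothermal cylinder, in Msun/pc.

    `mu` is the mean molecular weight per particle (assumed value).
    """
    sound_speed_sq = K_B * temperature_k / (mu * M_H)   # c_s^2 in m^2/s^2
    line_mass_si = 2.0 * sound_speed_sq / G             # kg per metre
    return line_mass_si * PARSEC / M_SUN

print(round(critical_line_mass(10.0), 1))
```

Because the expression is linear in temperature, doubling the assumed temperature doubles the critical line mass.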
Optimizing Likelihood Models for Particle Trajectory Segmentation in Multi-State Systems.
Young, Dylan Christopher; Scrimgeour, Jan
2018-06-19
Particle tracking offers significant insight into the molecular mechanics that govern the behavior of living cells. The analysis of molecular trajectories that transition between different motive states, such as diffusive, driven and tethered modes, is of considerable importance, with even single trajectories containing significant amounts of information about a molecule's environment and its interactions with cellular structures. Hidden Markov models (HMM) have been widely adopted to perform the segmentation of such complex tracks. In this paper, we show that extensive analysis of hidden Markov model outputs using data derived from multi-state Brownian dynamics simulations can be used both for the optimization of the likelihood models used to describe the states of the system and for characterization of the technique's failure mechanisms. This analysis was made possible by the implementation of a parallelized adaptive direct search algorithm on an Nvidia graphics processing unit. This approach provides critical information for the visualization of HMM failure and successful design of particle tracking experiments where trajectories contain multiple mobile states. © 2018 IOP Publishing Ltd.
Kinetic rate constant prediction supports the conformational selection mechanism of protein binding.
Moal, Iain H; Bates, Paul A
2012-01-01
The prediction of protein-protein kinetic rate constants provides a fundamental test of our understanding of molecular recognition, and will play an important role in the modeling of complex biological systems. In this paper, a feature selection and regression algorithm is applied to mine a large set of molecular descriptors and construct simple models for association and dissociation rate constants using empirical data. Using separate test data for validation, the predicted rate constants can be combined to calculate binding affinity with accuracy matching that of state of the art empirical free energy functions. The models show that the rate of association is linearly related to the proportion of unbound proteins in the bound conformational ensemble relative to the unbound conformational ensemble, indicating that the binding partners must adopt a geometry near to that of the bound prior to binding. Mirroring the conformational selection and population shift mechanism of protein binding, the models provide a strong separate line of evidence for the preponderance of this mechanism in protein-protein binding, complementing structural and theoretical studies.
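The step of combining rate constants into a binding affinity follows from the standard relations K_d = k_off / k_on and ΔG = RT ln(K_d / c°), with c° = 1 M. The sketch below shows only this arithmetic; the rate constants are hypothetical and the temperature of 298.15 K is an assumption, not a value from the paper.

```python
import math

R_KJ = 8.314462618e-3   # gas constant, kJ mol^-1 K^-1

def dissociation_constant(k_on, k_off):
    """K_d in molar units, from k_on (M^-1 s^-1) and k_off (s^-1)."""
    return k_off / k_on

def binding_free_energy(k_on, k_off, temperature_k=298.15):
    """Standard binding free energy in kJ/mol (1 M standard state)."""
    k_d = dissociation_constant(k_on, k_off)
    return R_KJ * temperature_k * math.log(k_d)

# Hypothetical rate constants for illustration only.
k_on, k_off = 1.0e6, 1.0e-3

print(dissociation_constant(k_on, k_off))
print(round(binding_free_energy(k_on, k_off), 2))
```

A more negative free energy corresponds to tighter binding, so faster association or slower dissociation both lower the value.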
Predicting drug-induced liver injury using ensemble learning methods and molecular fingerprints.
Ai, Haixin; Chen, Wen; Zhang, Li; Huang, Liangchao; Yin, Zimo; Hu, Huan; Zhao, Qi; Zhao, Jian; Liu, Hongsheng
2018-05-21
Drug-induced liver injury (DILI) is a major safety concern in the drug-development process, and various methods have been proposed to predict the hepatotoxicity of compounds during the early stages of drug trials. In this study, we developed an ensemble model using three machine learning algorithms and 12 molecular fingerprints from a dataset containing 1,241 diverse compounds. The ensemble model achieved an average accuracy of 71.1±2.6%, sensitivity of 79.9±3.6%, specificity of 60.3±4.8%, and area under the receiver operating characteristic curve (AUC) of 0.764±0.026 in five-fold cross-validation and an accuracy of 84.3%, sensitivity of 86.9%, specificity of 75.4%, and AUC of 0.904 in an external validation dataset of 286 compounds collected from the Liver Toxicity Knowledge Base (LTKB). Compared with previous methods, the ensemble model achieved relatively high accuracy and sensitivity. We also identified several substructures related to DILI. In addition, we provide a web server offering access to our models (http://ccsipb.lnu.edu.cn/toxicity/HepatoPred-EL/).
Labourier, Emmanuel; Shifrin, Alexander; Busseniers, Anne E; Lupo, Mark A; Manganelli, Monique L; Andruss, Bernard; Wylie, Dennis; Beaudenon-Huibregtse, Sylvie
2015-07-01
Molecular testing for oncogenic mutations or gene expression in fine-needle aspirations (FNAs) from thyroid nodules with indeterminate cytology identifies a subset of benign or malignant lesions with high predictive value. This study aimed to evaluate a novel diagnostic algorithm combining mutation detection and miRNA expression to improve the diagnostic yield of molecular cytology. Surgical specimens and preoperative FNAs (n = 638) were tested for 17 validated gene alterations using the miRInform Thyroid test and with a 10-miRNA gene expression classifier generating positive (malignant) or negative (benign) results. Cross-sectional sampling of thyroid nodules with atypia of undetermined significance/follicular lesion of undetermined significance (AUS/FLUS) or follicular neoplasm/suspicious for a follicular neoplasm (FN/SFN) cytology (n = 109) was conducted at 12 endocrinology centers across the United States. Qualitative molecular results were compared with surgical histopathology to determine diagnostic performance and model clinical effect. Mutations were detected in 69% of nodules with malignant outcome. Among mutation-negative specimens, miRNA testing correctly identified 64% of malignant cases and 98% of benign cases. The diagnostic sensitivity and specificity of the combined algorithm were 89% (95% confidence interval [CI], 73-97%) and 85% (95% CI, 75-92%), respectively. At 32% cancer prevalence, 61% of the molecular results were benign with a negative predictive value of 94% (95% CI, 85-98%). Independently of variations in cancer prevalence, the test increased the yield of true benign results by 65% relative to mRNA-based gene expression classification and decreased the rate of avoidable diagnostic surgeries by 69%.
Multiplatform testing for DNA, mRNA, and miRNA can accurately classify benign and malignant thyroid nodules, increase the diagnostic yield of molecular cytology, and further improve the preoperative risk-based management of benign nodules with AUS/FLUS or FN/SFN cytology.
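The predictive values quoted in this abstract follow from sensitivity, specificity, and prevalence through Bayes' rule. The sketch below redoes that arithmetic with the reported point estimates (sensitivity 0.89, specificity 0.85, prevalence 0.32); the function name and output keys are illustrative only.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return PPV, NPV, and the fraction of negative (benign) calls."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return {
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
        "negative_call_rate": true_neg + false_neg,
    }

result = predictive_values(sensitivity=0.89, specificity=0.85, prevalence=0.32)
print(round(result["npv"], 2), round(result["negative_call_rate"], 2))
```

With these inputs the calculation gives a negative predictive value of about 0.94 and a benign call rate of about 0.61, in line with the figures stated in the abstract.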
Analysis of A Drug Target-based Classification System using Molecular Descriptors.
Lu, Jing; Zhang, Pin; Bi, Yi; Luo, Xiaomin
2016-01-01
Drug-target interaction is an important topic in drug discovery and drug repositioning. The KEGG database offers drug annotation and classification using a target-based classification system. In this study, we investigated five target-based classes, (I) G protein-coupled receptors, (II) nuclear receptors, (III) ion channels, (IV) enzymes, and (V) pathogens, using molecular descriptors to represent each drug compound. Two popular feature selection methods, maximum relevance minimum redundancy and incremental feature selection, were adopted to extract the important descriptors. An optimal prediction model based on the nearest neighbor algorithm was then constructed, which gave the best result in identifying drug target-based classes. Finally, some key descriptors were discussed to uncover their important roles in the identification of drug-target classes.
Molecular dynamics simulations of field emission from a planar nanodiode
NASA Astrophysics Data System (ADS)
Torfason, Kristinn; Valfells, Agust; Manolescu, Andrei
2015-03-01
High resolution molecular dynamics simulations with full Coulomb interactions of electrons are used to investigate field emission in planar nanodiodes. The effects of space-charge and emitter radius are examined and compared to previous results concerning transition from Fowler-Nordheim to Child-Langmuir current [Y. Y. Lau, Y. Liu, and R. K. Parker, Phys. Plasmas 1, 2082 (1994) and Y. Feng and J. P. Verboncoeur, Phys. Plasmas 13, 073105 (2006)]. The Fowler-Nordheim law is used to determine the current density injected into the system and the Metropolis-Hastings algorithm to find a favourable point of emission on the emitter surface. A simple fluid-like model is also developed and its results are in qualitative agreement with the simulations.
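To show how a Metropolis-Hastings walk can pick favourable points on a surface, here is a minimal one-dimensional sketch. It is not the authors' implementation: the emitter is reduced to the interval [0, 1], and the weight function is a hypothetical Gaussian standing in for the local emission probability, which in the paper would come from the Fowler-Nordheim current density.

```python
import math
import random

def metropolis_hastings(weight, x0, n_steps, step, rng, lo=0.0, hi=1.0):
    """Sample positions in [lo, hi] with probability proportional to weight(x)."""
    x, w = x0, weight(x0)
    samples = []
    for _ in range(n_steps):
        cand = x + rng.uniform(-step, step)   # symmetric proposal
        if lo <= cand <= hi:                  # outside the emitter: reject
            w_cand = weight(cand)
            # Accept with probability min(1, w_cand / w).
            if rng.random() < w_cand / w:
                x, w = cand, w_cand
        samples.append(x)
    return samples

def emission_weight(x):
    """Hypothetical emission weight peaked at the emitter center."""
    return math.exp(-((x - 0.5) ** 2) / (2.0 * 0.1 ** 2))

rng = random.Random(0)   # fixed seed so the run is reproducible
samples = metropolis_hastings(emission_weight, 0.5, 20000, 0.2, rng)
mean_position = sum(samples) / len(samples)
print(len(samples), min(samples) >= 0.0, max(samples) <= 1.0)
```

Because the proposal is symmetric, the acceptance ratio reduces to the ratio of weights, so points with a higher emission weight are visited proportionally more often.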
Emperador, Agustí; Sfriso, Pedro; Villarreal, Marcos Ariel; Gelpí, Josep Lluis; Orozco, Modesto
2015-12-08
Molecular dynamics simulations of proteins are usually performed on a single molecule, and coarse-grained protein models are calibrated using single-molecule simulations, therefore ignoring intermolecular interactions. We present here a new coarse-grained force field for the study of many-protein systems. The force field, which is implemented in the context of the discrete molecular dynamics algorithm, is able to reproduce the properties of folded and unfolded proteins, whether in isolation, complexed into well-defined quaternary structures, or aggregated, thanks to its proper evaluation of protein-protein interactions. The accuracy and computational efficiency of the method make it a universal tool for the study of the structure, dynamics, and association/dissociation of proteins.
Mniszewski, S M; Cawkwell, M J; Wall, M E; Mohd-Yusof, J; Bock, N; Germann, T C; Niklasson, A M N
2015-10-13
We present an algorithm for the calculation of the density matrix that for insulators scales linearly with system size and parallelizes efficiently on multicore, shared memory platforms with small and controllable numerical errors. The algorithm is based on an implementation of the second-order spectral projection (SP2) algorithm [Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115] in sparse matrix algebra with the ELLPACK-R data format. We illustrate the performance of the algorithm within self-consistent tight binding theory by total energy calculations of gas phase poly(ethylene) molecules and periodic liquid water systems containing up to 15,000 atoms on up to 16 CPU cores. We consider algorithm-specific performance aspects, such as local vs nonlocal memory access and the degree of matrix sparsity. Comparisons to sparse matrix algebra implementations using off-the-shelf libraries on multicore CPUs, graphics processing units (GPUs), and the Intel many integrated core (MIC) architecture are also presented. The accuracy and stability of the algorithm are illustrated with long duration Born-Oppenheimer molecular dynamics simulations of 1000 water molecules and a 303 atom Trp cage protein solvated by 2682 water molecules.
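The core SP2 recursion can be illustrated with a small dense-matrix sketch. This is a teaching example under stated assumptions, not the sparse ELLPACK-R implementation described above: it uses plain Python lists, Gershgorin estimates for the spectral bounds, no numerical thresholding, and a four-site Hückel chain with hopping -1 as a hypothetical Hamiltonian.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def sp2_density_matrix(H, n_occ, tol=1e-10, max_iter=100):
    """Density matrix with n_occ occupied states via SP2 purification."""
    n = len(H)
    # Gershgorin circle estimates of the lowest and highest eigenvalues.
    radii = [sum(abs(H[i][j]) for j in range(n) if j != i) for i in range(n)]
    e_min = min(H[i][i] - radii[i] for i in range(n))
    e_max = max(H[i][i] + radii[i] for i in range(n))
    # Map the spectrum into [0, 1] with the lowest states near 1.
    X = [[((e_max if i == j else 0.0) - H[i][j]) / (e_max - e_min)
          for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        X2 = matmul(X, X)
        if abs(trace(X) - trace(X2)) < tol:   # X is (nearly) idempotent
            break
        if trace(X) > n_occ:
            X = X2                            # X^2 lowers the trace
        else:                                 # 2X - X^2 raises the trace
            X = [[2.0 * X[i][j] - X2[i][j] for j in range(n)]
                 for i in range(n)]
    return X

# Hypothetical 4-site chain; the two lowest levels are occupied.
H = [[0.0, -1.0, 0.0, 0.0],
     [-1.0, 0.0, -1.0, 0.0],
     [0.0, -1.0, 0.0, -1.0],
     [0.0, 0.0, -1.0, 0.0]]
P = sp2_density_matrix(H, n_occ=2)
band_energy = sum(P[i][j] * H[j][i] for i in range(4) for j in range(4))
print(round(trace(P), 6), round(band_energy, 6))
```

For this chain the two lowest eigenvalues sum to minus the square root of 5, which gives an independent check on the trace of the product of the density matrix and the Hamiltonian. Linear scaling in the real method comes from keeping the matrices sparse, which this dense sketch does not attempt.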
Molecular Monte Carlo Simulations Using Graphics Processing Units: To Waste Recycle or Not?
Kim, Jihan; Rodgers, Jocelyn M; Athènes, Manuel; Smit, Berend
2011-10-11
In the waste recycling Monte Carlo (WRMC) algorithm, (1) multiple trial states may be simultaneously generated and utilized during Monte Carlo moves to improve the statistical accuracy of the simulations, suggesting that such an algorithm may be well posed for implementation in parallel on graphics processing units (GPUs). In this paper, we implement two waste recycling Monte Carlo algorithms in CUDA (Compute Unified Device Architecture) using uniformly distributed random trial states and trial states based on displacement random-walk steps, and we test the methods on a methane-zeolite MFI framework system to evaluate their utility. We discuss the specific implementation details of the waste recycling GPU algorithm and compare the methods to other parallel algorithms optimized for the framework system. We analyze the relationship between the statistical accuracy of our simulations and the CUDA block size to determine the efficient allocation of the GPU hardware resources. We make comparisons between the GPU and the serial CPU Monte Carlo implementations to assess speedup over conventional microprocessors. Finally, we apply our optimized GPU algorithms to the important problem of determining free energy landscapes, in this case for molecular motion through the zeolite LTA.
Path statistics, memory, and coarse-graining of continuous-time random walks on networks
Kion-Crosby, Willow; Morozov, Alexandre V.
2015-01-01
Continuous-time random walks (CTRWs) on discrete state spaces, ranging from regular lattices to complex networks, are ubiquitous across physics, chemistry, and biology. Models with coarse-grained states (for example, those employed in studies of molecular kinetics) or spatial disorder can give rise to memory and non-exponential distributions of waiting times and first-passage statistics. However, existing methods for analyzing CTRWs on complex energy landscapes do not address these effects. Here we use statistical mechanics of the nonequilibrium path ensemble to characterize first-passage CTRWs on networks with arbitrary connectivity, energy landscape, and waiting time distributions. Our approach can be applied to calculating higher moments (beyond the mean) of path length, time, and action, as well as statistics of any conservative or non-conservative force along a path. For homogeneous networks, we derive exact relations between length and time moments, quantifying the validity of approximating a continuous-time process with its discrete-time projection. For more general models, we obtain recursion relations, reminiscent of transfer matrix and exact enumeration techniques, to efficiently calculate path statistics numerically. We have implemented our algorithm in PathMAN (Path Matrix Algorithm for Networks), a Python script that users can apply to their model of choice. We demonstrate the algorithm on a few representative examples which underscore the importance of non-exponential distributions, memory, and coarse-graining in CTRWs. PMID:26646868
Li, Bai; Lin, Mu; Liu, Qiao; Li, Ya; Zhou, Changjun
2015-10-01
Protein folding is a fundamental topic in molecular biology. Conventional experimental techniques for protein structure identification or protein folding recognition require strict laboratory requirements and heavy operating burdens, which have largely limited their applications. Alternatively, computer-aided techniques have been developed to optimize protein structures or to predict the protein folding process. In this paper, we utilize a 3D off-lattice model to describe the original protein folding scheme as a simplified energy-optimal numerical problem, where all types of amino acid residues are binarized into hydrophobic and hydrophilic ones. We apply a balance-evolution artificial bee colony (BE-ABC) algorithm as the minimization solver, which is featured by the adaptive adjustment of search intensity to cater for the varying needs during the entire optimization process. In this work, we establish a benchmark case set with 13 real protein sequences from the Protein Data Bank database and evaluate the convergence performance of BE-ABC algorithm through strict comparisons with several state-of-the-art ABC variants in short-term numerical experiments. Besides that, our obtained best-so-far protein structures are compared to the ones in comprehensive previous literature. This study also provides preliminary insights into how artificial intelligence techniques can be applied to reveal the dynamics of protein folding. Graphical Abstract Protein folding optimization using 3D off-lattice model and advanced optimization techniques.
Chen, Chun-Teh; Martin-Martinez, Francisco J.; Jung, Gang Seob
2017-01-01
A set of computational methods that contains a brute-force algorithmic generation of chemical isomers, molecular dynamics (MD) simulations, and density functional theory (DFT) calculations is reported and applied to investigate nearly 3000 probable molecular structures of polydopamine (PDA) and eumelanin. All probable early-polymerized 5,6-dihydroxyindole (DHI) oligomers, ranging from dimers to tetramers, have been systematically analyzed to find the most stable geometry connections as well as to propose a set of molecular models that represents the chemically diverse nature of PDA and eumelanin. Our results indicate that more planar oligomers have a tendency to be more stable. This finding is in good agreement with recent experimental observations, which suggested that PDA and eumelanin are composed of nearly planar oligomers that appear to be stacked together via π–π interactions to form graphite-like layered aggregates. We also show that there is a group of tetramers notably more stable than the others, implying that even though there is an inherent chemical diversity in PDA and eumelanin, the molecular structures of the majority of the species are quite repetitive. Our results also suggest that larger oligomers are less likely to form. This observation is also consistent with experimental measurements, supporting the existence of small oligomers instead of large polymers as main components of PDA and eumelanin. In summary, this work brings an insight into the controversial structure of PDA and eumelanin, explaining some of the most important structural features, and providing a set of molecular models for more accurate modeling of eumelanin-like materials. PMID:28451292
On the Genealogy of Asexual Diploids
NASA Astrophysics Data System (ADS)
Lam, Fumei; Langley, Charles H.; Song, Yun S.
Given molecular genetic data from diploid individuals that, at present, reproduce mostly or exclusively asexually without recombination, an important problem in evolutionary biology is detecting evidence of past sexual reproduction (i.e., meiosis and mating) and recombination (both meiotic and mitotic). However, currently there is a lack of computational tools for carrying out such a study. In this paper, we formulate a new problem of reconstructing diploid genealogies under the assumption of no sexual reproduction or recombination, with the ultimate goal being to devise genealogy-based tools for testing deviation from these assumptions. We first consider the infinite-sites model of mutation and develop linear-time algorithms to test the existence of an asexual diploid genealogy compatible with the infinite-sites model of mutation, and to construct one if it exists. Then, we relax the infinite-sites assumption and develop an integer linear programming formulation to reconstruct asexual diploid genealogies with the minimum number of homoplasy (back or recurrent mutation) events. We apply our algorithms on simulated data sets with sizes of biological interest.
Cerebellar supervised learning revisited: biophysical modeling and degrees-of-freedom control.
Kawato, Mitsuo; Kuroda, Shinya; Schweighofer, Nicolas
2011-10-01
The biophysical models of spike-timing-dependent plasticity have explored dynamics with molecular basis for such computational concepts as coincidence detection, synaptic eligibility trace, and Hebbian learning. They overall support different learning algorithms in different brain areas, especially supervised learning in the cerebellum. Because a single spine is physically very small, chemical reactions at it are essentially stochastic, and thus a sensitivity-longevity dilemma exists in the synaptic memory. Here, the cascade of excitable and bistable dynamics is proposed to overcome this difficulty. All kinds of learning algorithms in different brain regions confront difficult generalization problems. For resolution of this issue, the control of the degrees-of-freedom can be realized by changing synchronicity of neural firing. Especially, for cerebellar supervised learning, the triangle closed-loop circuit consisting of Purkinje cells, the inferior olive nucleus, and the cerebellar nucleus is proposed as a circuit to optimally control synchronous firing and degrees-of-freedom in learning. Copyright © 2011 Elsevier Ltd. All rights reserved.
Rodrigues, João P G L M; Melquiond, Adrien S J; Bonvin, Alexandre M J J
2016-01-01
Molecular modelling and simulations are nowadays an integral part of research in areas ranging from physics to chemistry to structural biology, as well as pharmaceutical drug design. This popularity is due to the development of high-performance hardware and of accurate and efficient molecular mechanics algorithms by the scientific community. These improvements are also benefitting scientific education. Molecular simulations, their underlying theory, and their applications are particularly difficult to grasp for undergraduate students. Having hands-on experience with the methods contributes to a better understanding and solidification of the concepts taught during the lectures. To this end, we have created a computer practical class, which has been running for the past five years, composed of several sessions where students characterize the conformational landscape of small peptides using molecular dynamics simulations in order to gain insights on their binding to protein receptors. In this report, we detail the ingredients and recipe necessary to establish and carry out this practical, as well as some of the questions posed to the students and their expected results. Further, we cite some examples of the students' written reports, provide statistics, and share their feedback on the structure and execution of the sessions. These sessions were implemented alongside a theoretical molecular modelling course but have also been used successfully as a standalone tutorial during specialized workshops. The availability of the material on our web page also facilitates this integration and dissemination and lends strength to the thesis of open-source science and education. © 2016 The International Union of Biochemistry and Molecular Biology.
Multiscale geometric modeling of macromolecules I: Cartesian representation
NASA Astrophysics Data System (ADS)
Xia, Kelin; Feng, Xin; Chen, Zhan; Tong, Yiying; Wei, Guo-Wei
2014-01-01
This paper focuses on the geometric modeling and computational algorithm development of biomolecular structures from two data sources: Protein Data Bank (PDB) and Electron Microscopy Data Bank (EMDB) in the Eulerian (or Cartesian) representation. Molecular surface (MS) contains non-smooth geometric singularities, such as cusps, tips and self-intersecting facets, which often lead to computational instabilities in molecular simulations, and violate the physical principle of surface free energy minimization. Variational multiscale surface definitions are proposed based on geometric flows and solvation analysis of biomolecular systems. Our approach leads to geometric and potential driven Laplace-Beltrami flows for biomolecular surface evolution and formation. The resulting surfaces are free of geometric singularities and minimize the total free energy of the biomolecular system. High order partial differential equation (PDE)-based nonlinear filters are employed for EMDB data processing. We show the efficacy of this approach in feature-preserving noise reduction. After the construction of protein multiresolution surfaces, we explore the analysis and characterization of surface morphology by using a variety of curvature definitions. Apart from the classical Gaussian curvature and mean curvature, maximum curvature, minimum curvature, shape index, and curvedness are also applied to macromolecular surface analysis for the first time. Our curvature analysis is uniquely coupled to the analysis of electrostatic surface potential, which is a by-product of our variational multiscale solvation models. As an expository investigation, we particularly emphasize the numerical algorithms and computational protocols for practical applications of the above multiscale geometric models. Such information may otherwise be scattered over the vast literature on this topic. 
Based on the curvature and electrostatic analysis from our multiresolution surfaces, we introduce a new concept, the polarized curvature, for the prediction of protein binding sites.
Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways
Seyler, Sean L.; Kumar, Avishek; Thorpe, M. F.; Beckstein, Oliver
2015-01-01
Diverse classes of proteins function through large-scale conformational changes and various sophisticated computational algorithms have been proposed to enhance sampling of these macromolecular transition paths. Because such paths are curves in a high-dimensional space, it has been difficult to quantitatively compare multiple paths, a necessary prerequisite to, for instance, assess the quality of different algorithms. We introduce a method named Path Similarity Analysis (PSA) that enables us to quantify the similarity between two arbitrary paths and extract the atomic-scale determinants responsible for their differences. PSA utilizes the full information available in 3N-dimensional configuration space trajectories by employing the Hausdorff or Fréchet metrics (adopted from computational geometry) to quantify the degree of similarity between piecewise-linear curves. It thus completely avoids relying on projections into low dimensional spaces, as used in traditional approaches. To elucidate the principles of PSA, we quantified the effect of path roughness induced by thermal fluctuations using a toy model system. Using, as an example, the closed-to-open transitions of the enzyme adenylate kinase (AdK) in its substrate-free form, we compared a range of protein transition path-generating algorithms. Molecular dynamics-based dynamic importance sampling (DIMS) MD and targeted MD (TMD) and the purely geometric FRODA (Framework Rigidity Optimized Dynamics Algorithm) were tested along with seven other methods publicly available on servers, including several based on the popular elastic network model (ENM). PSA with clustering revealed that paths produced by a given method are more similar to each other than to those from another method and, for instance, that the ENM-based methods produced relatively similar paths. 
PSA applied to ensembles of DIMS MD and FRODA trajectories of the conformational transition of diphtheria toxin, a particularly challenging example, showed that the geometry-based FRODA occasionally sampled the pathway space of force field-based DIMS MD. For the AdK transition, the new concept of a Hausdorff-pair map enabled us to extract the molecular structural determinants responsible for differences in pathways, namely a set of conserved salt bridges whose charge-charge interactions are fully modelled in DIMS MD but not in FRODA. PSA has the potential to enhance our understanding of transition path sampling methods, validate them, and to provide a new approach to analyzing conformational transitions. PMID:26488417
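The two path metrics named in this abstract can be sketched in a few lines. The version below is an illustration only and makes several assumptions: paths are lists of points, distances are evaluated at the vertices only (the discrete variants of both metrics), and a plain Euclidean point distance stands in for whatever structural distance a real analysis would use. The function names are hypothetical.

```python
import math


def hausdorff(path_p, path_q, dist=math.dist):
    """Symmetric discrete Hausdorff distance between two paths (lists of points)."""
    def directed(p, q):
        # Largest distance from a point of p to its nearest neighbour in q.
        return max(min(dist(a, b) for b in q) for a in p)
    return max(directed(path_p, path_q), directed(path_q, path_p))


def discrete_frechet(path_p, path_q, dist=math.dist):
    """Discrete Frechet distance via the standard dynamic-programming table."""
    n, m = len(path_p), len(path_q)
    table = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(path_p[i], path_q[j])
            if i == 0 and j == 0:
                table[i][j] = d
            elif i == 0:
                table[i][j] = max(table[0][j - 1], d)
            elif j == 0:
                table[i][j] = max(table[i - 1][0], d)
            else:
                best_previous = min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
                table[i][j] = max(best_previous, d)
    return table[-1][-1]


# Hypothetical 2-D toy paths.
forward = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
shifted = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
backward = list(reversed(forward))
```

Comparing `forward` with `backward` shows the practical difference between the two metrics: the Hausdorff distance treats each path as an unordered set of points and sees no difference, whereas the Fréchet distance respects the order in which the points are visited.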
Evaluation of an Inverse Molecular Design Algorithm in a Model Binding Site
Huggins, David J.; Altman, Michael D.; Tidor, Bruce
2008-01-01
Computational molecular design is a useful tool in modern drug discovery. Virtual screening is an approach that docks and then scores individual members of compound libraries. In contrast to this forward approach, inverse approaches construct compounds from fragments, such that the computed affinity, or a combination of relevant properties, is optimized. We have recently developed a new inverse approach to drug design based on the dead-end elimination and A* algorithms employing a physical potential function. This approach has been applied to combinatorially constructed libraries of small-molecule ligands to design high-affinity HIV-1 protease inhibitors [M. D. Altman et al. J. Am. Chem. Soc. 130: 6099–6013, 2008]. Here we have evaluated the new method using the well studied W191G mutant of cytochrome c peroxidase. This mutant possesses a charged binding pocket and has been used to evaluate other design approaches. The results show that overall the new inverse approach does an excellent job of separating binders from non-binders. For a few individual cases, scoring inaccuracies led to false positives. The majority of these involve erroneous solvation energy estimation for charged amines, anilinium ions and phenols, which has been observed previously for a variety of scoring algorithms. Interestingly, although inverse approaches are generally expected to identify some but not all binders in a library, due to limited conformational searching, these results show excellent coverage of the known binders while still showing strong discrimination of the non-binders. PMID:18831031
Evaluation of an inverse molecular design algorithm in a model binding site.
Huggins, David J; Altman, Michael D; Tidor, Bruce
2009-04-01
Computational molecular design is a useful tool in modern drug discovery. Virtual screening is an approach that docks and then scores individual members of compound libraries. In contrast to this forward approach, inverse approaches construct compounds from fragments, such that the computed affinity, or a combination of relevant properties, is optimized. We have recently developed a new inverse approach to drug design based on the dead-end elimination and A* algorithms employing a physical potential function. This approach has been applied to combinatorially constructed libraries of small-molecule ligands to design high-affinity HIV-1 protease inhibitors (Altman et al., J Am Chem Soc 2008;130:6099-6013). Here we have evaluated the new method using the well-studied W191G mutant of cytochrome c peroxidase. This mutant possesses a charged binding pocket and has been used to evaluate other design approaches. The results show that overall the new inverse approach does an excellent job of separating binders from nonbinders. For a few individual cases, scoring inaccuracies led to false positives. The majority of these involve erroneous solvation energy estimation for charged amines, anilinium ions, and phenols, which has been observed previously for a variety of scoring algorithms. Interestingly, although inverse approaches are generally expected to identify some but not all binders in a library, due to limited conformational searching, these results show excellent coverage of the known binders while still showing strong discrimination of the nonbinders. (c) 2008 Wiley-Liss, Inc.
NASA Astrophysics Data System (ADS)
Colarco, P. R.; Gasso, S.; Jethva, H. T.; Buchard, V.; Ahn, C.; Torres, O.; daSilva, A.
2016-12-01
Output from the NASA Goddard Earth Observing System, version 5 (GEOS-5) Earth system model is used to simulate the top-of-atmosphere 354 and 388 nm radiances observed by the Ozone Monitoring Instrument (OMI) onboard the Aura spacecraft. The principal purpose of developing this simulator tool is to compute from the modeled fields the so-called OMI Aerosol Index (AI), which is a more fundamental retrieval product than higher level products such as the aerosol optical depth (AOD) or absorbing aerosol optical depth (AAOD). This lays the groundwork for eventually developing a capability to assimilate either the OMI AI or its radiances, which would provide further constraint on aerosol loading and absorption properties for global models. We extend the use of the simulator capability to understand the nature of the OMI aerosol retrieval algorithms themselves in an Observing System Simulation Experiment (OSSE). The simulated radiances are used to calculate the AI from the modeled fields. These radiances are also provided to the OMI aerosol algorithms, which return their own retrievals of the AI, AOD, and AAOD. Our assessment reveals that the OMI-retrieved AI can be mostly harmonized with the model-derived AI given the same radiances provided a common surface pressure field is assumed. This is important because the operational OMI algorithms presently assume a fixed pressure field, while the contribution of molecular scattering to the actual OMI signal in fact responds to the actual atmospheric pressure profile, which is accounted for in our OSSE by using GEOS-5 produced atmospheric reanalyses. Other differences between the model and OMI AI are discussed, and we present a preliminary assessment of the OMI AOD and AAOD products with respect to the known inputs from the GEOS-5 simulation.
Scemama, Anthony; Renon, Nicolas; Rapacioli, Mathias
2014-06-10
We present an algorithm and its parallel implementation for solving a self-consistent problem as encountered in Hartree-Fock or density functional theory. The algorithm takes advantage of the sparsity of matrices through the use of local molecular orbitals. The implementation allows one to exploit efficiently modern symmetric multiprocessing (SMP) computer architectures. As a first application, the algorithm is used within the density-functional-based tight binding method, for which most of the computational time is spent in the linear algebra routines (diagonalization of the Fock/Kohn-Sham matrix). We show that with this algorithm (i) single point calculations on very large systems (millions of atoms) can be performed on large SMP machines, (ii) calculations involving intermediate size systems (1000-100 000 atoms) are also strongly accelerated and can run efficiently on standard servers, and (iii) the error on the total energy due to the use of a cutoff in the molecular orbital coefficients can be controlled such that it remains smaller than the SCF convergence criterion.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Salloum, Maher N.; Sargsyan, Khachik; Jones, Reese E.
2015-08-11
We present a methodology to assess the predictive fidelity of multiscale simulations by incorporating uncertainty in the information exchanged between the components of an atomistic-to-continuum simulation. We account for both the uncertainty due to finite sampling in molecular dynamics (MD) simulations and the uncertainty in the physical parameters of the model. Using Bayesian inference, we represent the expensive atomistic component by a surrogate model that relates the long-term output of the atomistic simulation to its uncertain inputs. We then present algorithms to solve for the variables exchanged across the atomistic-continuum interface in terms of polynomial chaos expansions (PCEs). We also consider a simple Couette flow where velocities are exchanged between the atomistic and continuum components, while accounting for uncertainty in the atomistic model parameters and the continuum boundary conditions. Results show convergence of the coupling algorithm at a reasonable number of iterations. As a result, the uncertainty in the obtained variables significantly depends on the amount of data sampled from the MD simulations and on the width of the time averaging window used in the MD simulations.
A statistical framework for genetic association studies of power curves in bird flight
Lin, Min; Zhao, Wei
2006-01-01
How the power required for bird flight varies as a function of forward speed can be used to predict the flight style and behavioral strategy of a bird for feeding and migration. A U-shaped curve was observed between the power and flight velocity in many birds, which is consistent with the theoretical prediction by aerodynamic models. In this article, we present a general genetic model for fine mapping of quantitative trait loci (QTL) responsible for power curves in a sample of birds drawn from a natural population. This model is developed within the maximum likelihood context, implemented with the EM algorithm for estimating the population genetic parameters of QTL and the simplex algorithm for estimating the QTL genotype-specific parameters of power curves. Using Monte Carlo simulation derived from empirical observations of power curves in the European starling (Sturnus vulgaris), we demonstrate how the underlying QTL for power curves can be detected from molecular markers and how the QTL detected affect the most appropriate flight speeds used to design an optimal migration strategy. The results from our model can be directly integrated into a conceptual framework for understanding flight origin and evolution. PMID:17066123
Huang, Yu-Ming M; McCammon, J Andrew; Miao, Yinglong
2018-04-10
Through adding a harmonic boost potential to smooth the system potential energy surface, Gaussian accelerated molecular dynamics (GaMD) provides enhanced sampling and free energy calculation of biomolecules without the need of predefined reaction coordinates. This work continues to improve the acceleration power and energy reweighting of the GaMD by combining the GaMD with replica exchange algorithms. Two versions of replica exchange GaMD (rex-GaMD) are presented: force constant rex-GaMD and threshold energy rex-GaMD. During simulations of force constant rex-GaMD, the boost potential can be exchanged between replicas of different harmonic force constants with fixed threshold energy. However, the algorithm of threshold energy rex-GaMD tends to switch the threshold energy between lower and upper bounds for generating different levels of boost potential. Testing simulations on three model systems, including the alanine dipeptide, chignolin, and HIV protease, demonstrate that through continuous exchanges of the boost potential, the rex-GaMD simulations not only enhance the conformational transitions of the systems but also narrow down the distribution width of the applied boost potential for accurate energetic reweighting to recover biomolecular free energy profiles.
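The harmonic boost itself is compact enough to sketch. The abstract does not spell out the formula; the form used below, a boost of 0.5 k (E - V)^2 applied only where the potential V lies below the threshold energy E, is the one commonly given for GaMD and is treated here as an assumption. The energies, threshold, and force constants are hypothetical numbers in arbitrary units, and the replica-exchange step is not shown.

```python
def gamd_boost(potential, threshold, force_constant):
    """Harmonic boost 0.5 * k * (E - V)**2, applied only where V < E."""
    if potential < threshold:
        return 0.5 * force_constant * (threshold - potential) ** 2
    return 0.0


# Hypothetical potential energies sampled along a path (arbitrary units).
energies = [-10.0, -8.0, -6.0, -4.0, -2.0]
threshold = -4.0

# Two hypothetical replicas sharing one threshold but using different
# force constants, in the spirit of force constant rex-GaMD.
for k in (0.05, 0.10):
    boosted = [v + gamd_boost(v, threshold, k) for v in energies]
    print(k, boosted)
```

Because the boost grows quadratically with the depth below the threshold, deep minima are raised more than shallow ones, which is how the surface is smoothed while energies above the threshold are left untouched.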
An atomistic fingerprint algorithm for learning ab initio molecular force fields
NASA Astrophysics Data System (ADS)
Tang, Yu-Hang; Zhang, Dongkun; Karniadakis, George Em
2018-01-01
Molecular fingerprints, i.e., feature vectors describing atomistic neighborhood configurations, are an important abstraction and a key ingredient for data-driven modeling of potential energy surface and interatomic force. In this paper, we present the density-encoded canonically aligned fingerprint algorithm, which is robust and efficient, for fitting per-atom scalar and vector quantities. The fingerprint is essentially a continuous density field formed through the superimposition of smoothing kernels centered on the atoms. Rotational invariance of the fingerprint is achieved by aligning, for each fingerprint instance, the neighboring atoms onto a local canonical coordinate frame computed from a kernel minisum optimization procedure. We show that this approach is superior to principal components analysis-based methods especially when the atomistic neighborhood is sparse and/or contains symmetry. We propose that the "distance" between the density fields be measured using a volume integral of their pointwise difference. This can be efficiently computed using optimal quadrature rules, which only require discrete sampling at a small number of grid points. We also experiment on the choice of weight functions for constructing the density fields and characterize their performance for fitting interatomic potentials. The applicability of the fingerprint is demonstrated through a set of benchmark problems.
Clustering the Orion B giant molecular cloud based on its molecular emission.
Bron, Emeric; Daudon, Chloé; Pety, Jérôme; Levrier, François; Gerin, Maryvonne; Gratier, Pierre; Orkisz, Jan H; Guzman, Viviana; Bardeau, Sébastien; Goicoechea, Javier R; Liszt, Harvey; Öberg, Karin; Peretto, Nicolas; Sievers, Albrecht; Tremblin, Pascal
2018-02-01
Previous attempts at segmenting molecular line maps of molecular clouds have focused on using position-position-velocity data cubes of a single molecular line to separate the spatial components of the cloud. In contrast, wide field spectral imaging over a large spectral bandwidth in the (sub)mm domain now allows one to combine multiple molecular tracers to understand the different physical and chemical phases that constitute giant molecular clouds (GMCs). We aim at using multiple tracers (sensitive to different physical processes and conditions) to segment a molecular cloud into physically/chemically similar regions (rather than spatially connected components), thus disentangling the different physical/chemical phases present in the cloud. We use a machine learning clustering method, namely the Meanshift algorithm, to cluster pixels with similar molecular emission, ignoring spatial information. Clusters are defined around each maximum of the multidimensional Probability Density Function (PDF) of the line integrated intensities. Simple radiative transfer models were used to interpret the astrophysical information uncovered by the clustering analysis. A clustering analysis based only on the J = 1-0 lines of three isotopologues of CO proves sufficient to reveal distinct density/column density regimes (n_H ~ 100 cm^-3, ~ 500 cm^-3, and > 1000 cm^-3), closely related to the usual definitions of diffuse, translucent and high-column-density regions. Adding two UV-sensitive tracers, the J = 1-0 line of HCO+ and the N = 1-0 line of CN, allows us to distinguish two clearly distinct chemical regimes, characteristic of UV-illuminated and UV-shielded gas. The UV-illuminated regime shows overbright HCO+ and CN emission, which we relate to a photochemical enrichment effect. We also find a tail of high CN/HCO+ intensity ratio in UV-illuminated regions.
Finer distinctions in density classes (n_H ~ 7 × 10^3 cm^-3, ~ 4 × 10^4 cm^-3) for the densest regions are also identified, likely related to the higher critical density of the CN and HCO+ (1-0) lines. These distinctions are only possible because the high-density regions are spatially resolved. Molecules are versatile tracers of GMCs because their line intensities bear the signature of the physics and chemistry at play in the gas. The association of simultaneous multi-line, wide-field mapping and powerful machine learning methods such as the Meanshift clustering algorithm reveals how to decode the complex information available in these molecular tracers.
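The idea of defining clusters around the maxima of a probability density function can be sketched with a bare-bones mean-shift loop. This is an illustration under stated assumptions, not the pipeline of the study: a Gaussian kernel with a fixed `bandwidth` is assumed, the helper names are hypothetical, and the six two-dimensional "pixels" are made-up feature vectors standing in for line integrated intensities.

```python
import math


def mean_shift(points, bandwidth, n_iter=50, tol=1e-6):
    """Move a copy of every point uphill on a Gaussian kernel density estimate."""
    modes = [list(p) for p in points]
    for _ in range(n_iter):
        largest_move = 0.0
        for i, m in enumerate(modes):
            weights = [
                math.exp(-sum((a - b) ** 2 for a, b in zip(m, p)) / (2 * bandwidth ** 2))
                for p in points
            ]
            total = sum(weights)
            # The kernel-weighted mean of the data is the next position of this mode.
            new = [
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(m))
            ]
            largest_move = max(largest_move, math.dist(m, new))
            modes[i] = new
        if largest_move < tol:
            break
    return modes


def label_modes(modes, merge_radius):
    """Give the same label to points whose modes ended up within merge_radius."""
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if math.dist(m, c) < merge_radius:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


# Hypothetical feature vectors: two well-separated groups of "pixels".
pixels = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels, centers = label_modes(mean_shift(pixels, bandwidth=0.5), merge_radius=0.5)
```

Note that the positions of the pixels on the sky never enter the calculation; only the feature vectors do, which matches the abstract's statement that spatial information is ignored. The number of clusters is not fixed in advance but follows from the bandwidth.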
Shape-Based Virtual Screening with Volumetric Aligned Molecular Shapes
Koes, David Ryan; Camacho, Carlos J.
2014-01-01
Shape-based virtual screening is an established and effective method for identifying small molecules that are similar in shape and function to a reference ligand. We describe a new method of shape-based virtual screening, volumetric aligned molecular shapes (VAMS). VAMS uses efficient data structures to encode and search molecular shapes. We demonstrate that VAMS is an effective method for shape-based virtual screening and that it can be successfully used as a pre-filter to accelerate more computationally demanding search algorithms. Unique to VAMS is a novel minimum/maximum shape constraint query for precisely specifying the desired molecular shape. Shape constraint searches in VAMS are particularly efficient and millions of shapes can be searched in a fraction of a second. We compare the performance of VAMS with two other shape-based virtual screening algorithms on a benchmark of 102 protein targets consisting of more than 32 million molecular shapes and find that VAMS provides a competitive trade-off between run-time performance and virtual screening performance. PMID:25049193
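One way to read a minimum/maximum shape constraint is as a pair of containment tests on voxelized shapes: a molecule passes if it fully covers the minimum shape and stays entirely inside the maximum shape. The sketch below rests on that reading, which is an assumption, and it uses plain Python sets rather than the efficient data structures the abstract refers to. The helper names, the grid spacing, and the toy "molecules" are all hypothetical.

```python
def voxelize(atoms, radius, spacing=1.0):
    """Set of grid cells whose centres lie within `radius` of any atom centre."""
    cells = set()
    reach = int(radius / spacing) + 1
    for ax, ay, az in atoms:
        ci, cj, ck = round(ax / spacing), round(ay / spacing), round(az / spacing)
        for i in range(ci - reach, ci + reach + 1):
            for j in range(cj - reach, cj + reach + 1):
                for k in range(ck - reach, ck + reach + 1):
                    dx, dy, dz = i * spacing - ax, j * spacing - ay, k * spacing - az
                    if dx * dx + dy * dy + dz * dz <= radius * radius:
                        cells.add((i, j, k))
    return cells


def satisfies(shape, minimum, maximum):
    """True if the shape covers `minimum` and stays inside `maximum`."""
    return minimum <= shape <= maximum


# Hypothetical query: must fill a unit sphere at the origin and must stay
# within a sphere of radius 3 centred at (1, 0, 0).
minimum = voxelize([(0.0, 0.0, 0.0)], radius=1.0)
maximum = voxelize([(1.0, 0.0, 0.0)], radius=3.0)

compact = voxelize([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], radius=1.0)
extended = voxelize([(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)], radius=1.0)
```

Here `compact` passes both tests, while `extended` covers the minimum shape but protrudes beyond the maximum shape and is rejected. All shapes are assumed to be pre-aligned to a common frame, as the word "aligned" in the method's name suggests.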
Avoiding Defect Nucleation during Equilibration in Molecular Dynamics Simulations with ReaxFF
2015-04-01
respectively. All simulations are performed using the LAMMPS computer code.12 Fig. 1 a) Initial and b) final configurations of the molecular centers...Plimpton S. Fast parallel algorithms for short-range molecular dynamics. J Comput Phys. 1995;117:1–19. (Software available at http://lammps.sandia.gov
Merritt, M.L.
1993-01-01
The simulation of the transport of injected freshwater in a thin brackish aquifer, overlain and underlain by confining layers containing more saline water, is shown to be influenced by the choice of the finite-difference approximation method, the algorithm for representing vertical advective and dispersive fluxes, and the values assigned to parametric coefficients that specify the degree of vertical dispersion and molecular diffusion that occurs. Computed potable water recovery efficiencies will differ depending upon the choice of algorithm and approximation method, as will dispersion coefficients estimated based on the calibration of simulations to match measured data. A comparison of centered and backward finite-difference approximation methods shows that substantially different transition zones between injected and native waters are depicted by the different methods, and computed recovery efficiencies vary greatly. Standard and experimental algorithms and a variety of values for molecular diffusivity, transverse dispersivity, and vertical scaling factor were compared in simulations of freshwater storage in a thin brackish aquifer. Computed recovery efficiencies vary considerably, and appreciable differences are observed in the distribution of injected freshwater in the various cases tested. The results demonstrate both a qualitatively different description of transport using the experimental algorithms and the interrelated influences of molecular diffusion and transverse dispersion on simulated recovery efficiency. When simulating natural aquifer flow in cross-section, flushing of the aquifer occurred for all tested coefficient choices using both standard and experimental algorithms. © 1993.
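Why the choice of difference approximation changes the depicted transition zone can be seen in a one-dimensional toy problem. The sketch below is not the model used in the report: it assumes that "backward" means upstream-weighted differencing of the advective term, uses a simple explicit time step, and all parameter values are hypothetical. A sharp front between injected water (concentration 1) and native water (concentration 0) is advected with a small physical diffusivity.

```python
def step_front(n_cells=60, front=10):
    """Initial profile: 1.0 upstream of the front, 0.0 downstream."""
    return [1.0 if i < front else 0.0 for i in range(n_cells)]


def advance(conc, velocity, diffusivity, dx, dt, n_steps, scheme):
    """Explicit 1-D advection-diffusion with fixed end values."""
    cfl = velocity * dt / dx
    r = diffusivity * dt / dx ** 2
    c = list(conc)
    for _ in range(n_steps):
        new = list(c)
        for i in range(1, len(c) - 1):
            if scheme == "centered":
                advection = 0.5 * cfl * (c[i + 1] - c[i - 1])
            elif scheme == "backward":
                advection = cfl * (c[i] - c[i - 1])
            else:
                raise ValueError(scheme)
            new[i] = c[i] - advection + r * (c[i + 1] - 2 * c[i] + c[i - 1])
        c = new
    return c


def transition_width(conc, lo=0.05, hi=0.95):
    """Number of cells with a mixed concentration between lo and hi."""
    return sum(1 for x in conc if lo < x < hi)


# Hypothetical parameters: the cell Peclet number velocity*dx/diffusivity is 10.
params = dict(velocity=1.0, diffusivity=0.1, dx=1.0, dt=0.1)
start = step_front()
centered_one_step = advance(start, n_steps=1, scheme="centered", **params)
backward_late = advance(start, n_steps=200, scheme="backward", **params)
```

With these numbers the backward scheme forms each new value as a weighted average of its neighbours with non-negative weights, so the front stays bounded between 0 and 1 but is smeared by numerical dispersion. The centered scheme carries a negative weight on the downstream neighbour and overshoots the injected concentration after a single step. Dispersion coefficients calibrated against data would therefore absorb different amounts of numerical error depending on the scheme, which is consistent with the sensitivity the abstract reports.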
Wang, Zhaocai; Pu, Jun; Cao, Liling; Tan, Jian
2015-10-23
The unbalanced assignment problem (UAP) is to optimally resolve the problem of assigning n jobs to m individuals (m < n), such that minimum cost or maximum profit is obtained. It is a vitally important Non-deterministic Polynomial (NP) complete problem in operation management and applied mathematics, having numerous real life applications. In this paper, we present a new parallel DNA algorithm for solving the unbalanced assignment problem using DNA molecular operations. We reasonably design flexible-length DNA strands representing different jobs and individuals, take appropriate steps, and get the solutions of the UAP in the proper length range and O(mn) time. We extend the application of DNA molecular operations and simultaneity to simplify the complexity of the computation.
Basic primitives for molecular diagram sketching
2010-01-01
A collection of primitive operations for molecular diagram sketching has been developed. These primitives compose a concise set of operations which can be used to construct publication-quality 2D coordinates for molecular structures using a bare minimum of input bandwidth. The input requirements for each primitive consist of a small number of discrete choices, which means that these primitives can be used to form the basis of a user interface which does not require an accurate pointing device. This is particularly relevant to software designed for contemporary mobile platforms. The reduction of input bandwidth is accomplished by using algorithmic methods for anticipating probable geometries during the sketching process, and by intelligent use of template grafting. The algorithms and their uses are described in detail. PMID:20923555
Adaptively restrained molecular dynamics in LAMMPS
NASA Astrophysics Data System (ADS)
Kant Singh, Krishna; Redon, Stephane
2017-07-01
Adaptively restrained molecular dynamics (ARMD) is a recently introduced particle simulation method that switches positional degrees of freedom on and off during simulation in order to speed up calculations. In the NVE ensemble, ARMD allows users to trade between precision and speed while, in the NVT ensemble, it makes it possible to compute statistical averages faster. Despite the conceptual simplicity of the approach, however, integrating it in existing molecular dynamics packages is non-trivial, in particular since implemented potentials should a priori be rewritten to take advantage of frozen particles and achieve a speed-up. In this paper, we present novel algorithms for integrating ARMD in LAMMPS, a popular multi-purpose molecular simulation package. In particular, we demonstrate how to enable ARMD in LAMMPS without having to re-implement all available force fields. The proposed algorithms are assessed on four different benchmarks, and we show that they speed up simulations by up to one order of magnitude.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel; ...
2017-03-08
Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMMs to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance, tasking and bulk synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
The structural bioinformatics library: modeling in biomolecular science and beyond.
Cazals, Frédéric; Dreyfus, Tom
2017-04-01
Software in structural bioinformatics has mainly been application driven. To favor practitioners seeking off-the-shelf applications, but also developers seeking advanced building blocks to develop novel applications, we undertook the design of the Structural Bioinformatics Library (SBL, http://sbl.inria.fr), a generic C++/Python cross-platform software library targeting complex problems in structural bioinformatics. Its tenet is based on a modular design offering a rich and versatile framework allowing the development of novel applications requiring well specified complex operations, without compromising robustness and performance. The SBL involves four software components (numbered 1-4 hereafter). For end-users, the SBL provides ready to use, state-of-the-art (1) applications to handle molecular models defined by unions of balls, to deal with molecular flexibility, and to model macro-molecular assemblies. These applications can also be combined to tackle integrated analysis problems. For developers, the SBL provides a broad C++ toolbox with modular design, involving core (2) algorithms, (3) biophysical models and (4) modules, the latter being especially suited to develop novel applications. The SBL comes with thorough documentation consisting of user and reference manuals, and a bugzilla platform to handle community feedback. The SBL is available from http://sbl.inria.fr. Contact: Frederic.Cazals@inria.fr. Supplementary data are available at Bioinformatics online.
A Geometric Method for Model Reduction of Biochemical Networks with Polynomial Rate Functions.
Samal, Satya Swarup; Grigoriev, Dima; Fröhlich, Holger; Weber, Andreas; Radulescu, Ovidiu
2015-12-01
Model reduction of biochemical networks relies on the knowledge of slow and fast variables. We provide a geometric method, based on the Newton polytope, to identify slow variables of a biochemical network with polynomial rate functions. The gist of the method is the notion of tropical equilibration that provides approximate descriptions of slow invariant manifolds. Compared to extant numerical algorithms such as the intrinsic low-dimensional manifold method, our approach is symbolic and utilizes orders of magnitude instead of precise values of the model parameters. Application of this method to a large collection of biochemical network models supports the idea that the number of dynamical variables in minimal models of cell physiology can be small, in spite of the large number of molecular regulatory actors.
Massively Parallel Simulations of Diffusion in Dense Polymeric Structures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faulon, Jean-Loup; Wilcox, R.T.
1997-11-01
An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases is studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon (1840 processors). Compared to the current state-of-the-art equilibration methods, this new technique appears to be faster by some orders of magnitude. The main advantage of the technique is that one can circumvent the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in the center of a regular tetrahedron and that regular tetrahedrons can be used to mesh the three-dimensional space. Thus, the problem of polymer equilibration described by continuous equations in molecular dynamics is reduced to a discrete problem where solutions are approximated by simple algorithms. Practical modeling applications include the construction of butyl rubber and ethylene-propylene-diene-monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally on both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. In order to better understand the diffusion phenomena, 10,000-atom models were generated and equilibrated for butyl and EPDM. The models were submitted to a massively parallel molecular dynamics simulation to monitor the trajectories of the diffusing species.
[Standard algorithm of molecular typing of Yersinia pestis strains].
Eroshenko, G A; Odinokov, G N; Kukleva, L M; Pavlova, A I; Krasnov, Ia M; Shavina, N Iu; Guseva, N P; Vinogradova, N A; Kutyrev, V V
2012-01-01
The aim was to develop a standard algorithm for molecular typing of Yersinia pestis that establishes the subspecies, biovar and focus membership of the studied isolate, and to determine the characteristic genotypes of plague agent strains of the main and nonmain subspecies from various natural plague foci of the Russian Federation and the near abroad. Genotyping of 192 natural Y. pestis strains of the main and nonmain subspecies was performed using PCR methods, multilocus sequencing and multilocus analysis of variable tandem repeat number. A standard algorithm for molecular typing of the plague agent was developed that includes several stages of Y. pestis differentiation by membership: in the main and nonmain subspecies, in the various biovars of the main subspecies, in specific subspecies, and in natural foci and geographic territories. The algorithm is based on 3 typing methods (PCR, multilocus sequence typing and multilocus analysis of variable tandem repeat number) using standard DNA targets: life support genes (terC, ilvN, inv, glpD, napA, rhaS and araC) and 7 loci of variable tandem repeats (ms01, ms04, ms06, ms07, ms46, ms62, ms70). The effectiveness of the developed algorithm was shown on a large number of natural Y. pestis strains. Characteristic sequence types of Y. pestis strains of various subspecies and biovars, as well as MLVA7 genotypes of strains from natural plague foci of the Russian Federation and the near abroad, were established. Application of the developed algorithm will increase the effectiveness of epidemiologic monitoring of the plague agent and of the analysis of plague epidemics and outbreaks, including establishing the source of origin of a strain and the routes of introduction of the infection.
NASA Astrophysics Data System (ADS)
Al-Refaie, Ahmed F.; Tennyson, Jonathan
2017-12-01
Construction and diagonalization of the Hamiltonian matrix is the rate-limiting step in most low-energy electron-molecule collision calculations. Tennyson (1996) implemented a novel algorithm for Hamiltonian construction which took advantage of the structure of the wavefunction in such calculations. This algorithm is re-engineered to make use of modern computer architectures and the use of appropriate diagonalizers is considered. Test calculations demonstrate that significant speed-ups can be gained using multiple CPUs. This opens the way to calculations which consider higher collision energies, larger molecules and/or more target states. The methodology, which is implemented as part of the UK molecular R-matrix codes (UKRMol and UKRMol+), can also be used for studies of bound molecular Rydberg states, photoionization and positron-molecule collisions.
Efficient and Flexible Computation of Many-Electron Wave Function Overlaps.
Plasser, Felix; Ruckenbauer, Matthias; Mai, Sebastian; Oppel, Markus; Marquetand, Philipp; González, Leticia
2016-03-08
A new algorithm for the computation of the overlap between many-electron wave functions is described. This algorithm allows for the extensive use of recurring intermediates and thus provides high computational efficiency. Because of the general formalism employed, overlaps can be computed for varying wave function types, molecular orbitals, basis sets, and molecular geometries. This paves the way for efficiently computing nonadiabatic interaction terms for dynamics simulations. In addition, other application areas can be envisaged, such as the comparison of wave functions constructed at different levels of theory. Aside from explaining the algorithm and evaluating the performance, a detailed analysis of the numerical stability of wave function overlaps is carried out, and strategies for overcoming potential severe pitfalls due to displaced atoms and truncated wave functions are presented.
Rusu, Mirabela; Birmanns, Stefan
2010-04-01
A structural characterization of multi-component cellular assemblies is essential to explain the mechanisms governing biological function. Macromolecular architectures may be revealed by integrating information collected from various biophysical sources, for instance by interpreting low-resolution electron cryomicroscopy reconstructions in relation to the crystal structures of the constituent fragments. A simultaneous registration of multiple components is beneficial when building atomic models as it introduces additional spatial constraints to facilitate the native placement inside the map. The high-dimensional nature of such a search problem prevents the exhaustive exploration of all possible solutions. Here we introduce a novel method based on genetic algorithms for the efficient exploration of the multi-body registration search space. The classic scheme of a genetic algorithm was enhanced with new genetic operations, tabu search and parallel computing strategies and validated on a benchmark of synthetic and experimental cryo-EM datasets. Even at a low level of detail, for example 35-40 Å, the technique successfully registered multiple component biomolecules, measuring accuracies within one order of magnitude of the nominal resolutions of the maps. The algorithm was implemented using the Sculptor molecular modeling framework, which also provides a user-friendly graphical interface and enables an instantaneous, visual exploration of intermediate solutions.
NASA Astrophysics Data System (ADS)
Nikitin, A. V.; Rey, M.; Champion, J. P.; Tyuterev, Vl. G.
2012-07-01
The MIRS software for the modeling of ro-vibrational spectra of polyatomic molecules was considerably extended and improved. The original version [Nikitin AV, Champion JP, Tyuterev VlG. The MIRS computer package for modeling the rovibrational spectra of polyatomic molecules. J Quant Spectrosc Radiat Transf 2003;82:239-49.] was especially designed for separate or simultaneous treatments of complex band systems of polyatomic molecules. It was set up in the frame of effective polyad models by using algorithms based on advanced group theory algebra to take full account of symmetry properties. It has been successfully used for predictions and data fitting (positions and intensities) of numerous spectra of symmetric and spherical top molecules within the vibration extrapolation scheme. The new version offers more advanced possibilities for spectra calculations and modeling by removing several previous limitations, particularly on the size of polyads and the number of tensors involved. It allows dealing with overlapping polyads and includes more efficient and faster algorithms for the calculation of coefficients related to molecular symmetry properties (6C, 9C and 12C symbols for C3v, Td, and Oh point groups) as well as for better convergence of least-squares-fit iterations. The new version is not limited to polyad effective models. It also allows direct predictions using full ab initio ro-vibrational normal mode Hamiltonians converted into the irreducible tensor form. Illustrative examples on CH3D, CH4, CH3Cl, CH3F and PH3 are reported, reflecting the present status of data available. It is written in C++ for standard PCs running Windows. The full package, including on-line documentation and recent data, is freely available at http://www.iao.ru/mirs/mirs.htm or http://xeon.univ-reims.fr/Mirs/ or http://icb.u-bourgogne.fr/OMR/SMA/SHTDS/MIRS.html and as supplementary data from the online version of the article.
An ensemble predictive modeling framework for breast cancer classification.
Nagarajan, Radhakrishnan; Upreti, Meenakshi
2017-12-01
Molecular changes often precede clinical presentation of diseases and can be useful surrogates with potential to assist in informed clinical decision making. Recent studies have demonstrated the usefulness of modeling approaches such as classification that can predict the clinical outcomes from molecular expression profiles. While useful, a majority of these approaches implicitly use all molecular markers as features in the classification process, often resulting in a sparse high-dimensional projection of the samples whose dimension is comparable to the sample size. In this study, a variant of the recently proposed ensemble classification approach is used for predicting good and poor-prognosis breast cancer samples from their molecular expression profiles. In contrast to traditional single and ensemble classifiers, the proposed approach uses multiple base classifiers with varying feature sets obtained from two-dimensional projection of the samples in conjunction with a majority voting strategy for predicting the class labels. In contrast to our earlier implementation, base classifiers in the ensembles are chosen based on maximal sensitivity and minimal redundancy by choosing only those with low average cosine distance. The resulting ensemble sets are subsequently modeled as undirected graphs. Performance of four different classification algorithms is shown to be better within the proposed ensemble framework than when they are used as traditional single classifier systems. Significance of a subset of genes with high-degree centrality in the network abstractions across the poor-prognosis samples is also discussed.
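The majority voting step mentioned above can be sketched in a few lines. This is only an illustration of plain majority voting, not the authors' ensemble selection procedure. The function name and the example labels are hypothetical, and an odd number of base classifiers is assumed so that binary votes cannot tie.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine base-classifier outputs by majority vote.

    predictions: one list of predicted labels per base classifier,
    all lists covering the same samples in the same order.
    """
    voted = []
    for sample_labels in zip(*predictions):  # labels for one sample
        voted.append(Counter(sample_labels).most_common(1)[0][0])
    return voted

# Hypothetical output of three base classifiers on four samples.
predictions = [
    ["good", "poor", "poor", "good"],
    ["good", "good", "poor", "poor"],
    ["poor", "poor", "poor", "good"],
]
print(majority_vote(predictions))  # ['good', 'poor', 'poor', 'good']
```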
Model Order Reduction Algorithm for Estimating the Absorption Spectrum
DOE Office of Scientific and Technical Information (OSTI.GOV)
Van Beeumen, Roel; Williams-Young, David B.; Kasper, Joseph M.
The ab initio description of the spectral interior of the absorption spectrum poses both a theoretical and computational challenge for modern electronic structure theory. Due to the often spectrally dense character of this domain in the quantum propagator’s eigenspectrum for medium-to-large sized systems, traditional approaches based on the partial diagonalization of the propagator often encounter oscillatory and stagnating convergence. Electronic structure methods which solve the molecular response problem through the solution of spectrally shifted linear systems, such as the complex polarization propagator, offer an alternative approach which is agnostic to the underlying spectral density or domain location. This generality comes at a seemingly high computational cost associated with solving a large linear system for each spectral shift in some discretization of the spectral domain of interest. In this work, we present a novel, adaptive solution to this high computational overhead based on model order reduction techniques via interpolation. Model order reduction reduces the computational complexity of mathematical models and is ubiquitous in the simulation of dynamical systems and control theory. The efficiency and effectiveness of the proposed algorithm in the ab initio prediction of X-ray absorption spectra is demonstrated using a test set of challenging water clusters which are spectrally dense in the neighborhood of the oxygen K-edge. On the basis of a single, user defined tolerance we automatically determine the order of the reduced models and approximate the absorption spectrum up to the given tolerance. We also illustrate that, for the systems studied, the automatically determined model order increases logarithmically with the problem dimension, compared to a linear increase of the number of eigenvalues within the energy window. Furthermore, we observed that the computational cost of the proposed algorithm only scales quadratically with respect to the problem dimension.
Efficiency in nonequilibrium molecular dynamics Monte Carlo simulations
Radak, Brian K.; Roux, Benoît
2016-10-07
Hybrid algorithms combining nonequilibrium molecular dynamics and Monte Carlo (neMD/MC) offer a powerful avenue for improving the sampling efficiency of computer simulations of complex systems. These neMD/MC algorithms are also increasingly finding use in applications where conventional approaches are impractical, such as constant-pH simulations with explicit solvent. However, selecting an optimal nonequilibrium protocol for maximum efficiency often represents a non-trivial challenge. This work evaluates the efficiency of a broad class of neMD/MC algorithms and protocols within the theoretical framework of linear response theory. The approximations are validated against constant-pH MD simulations and shown to provide accurate predictions of neMD/MC performance. An assessment of a large set of protocols confirms (both theoretically and empirically) that a linear work protocol gives the best neMD/MC performance. Lastly, a well-defined criterion for optimizing the time parameters of the protocol is proposed and demonstrated with an adaptive algorithm that improves the performance on-the-fly with minimal cost.
LOR-interleaving image reconstruction for PET imaging with fractional-crystal collimation
NASA Astrophysics Data System (ADS)
Li, Yusheng; Matej, Samuel; Karp, Joel S.; Metzler, Scott D.
2015-01-01
Positron emission tomography (PET) has become an important modality in medical and molecular imaging. However, in most PET applications, the resolution is still mainly limited by the physical crystal sizes or the detector’s intrinsic spatial resolution. To achieve images with better spatial resolution in a central region of interest (ROI), we have previously proposed using collimation in PET scanners. The collimator is designed to partially mask detector crystals to detect lines of response (LORs) within fractional crystals. A sequence of collimator-encoded LORs is measured with different collimation configurations. This novel collimated scanner geometry makes the reconstruction problem challenging, as both detector and collimator effects need to be modeled to reconstruct high-resolution images from collimated LORs. In this paper, we present a LOR-interleaving (LORI) algorithm, which incorporates these effects and has the advantage of reusing existing reconstruction software, to reconstruct high-resolution images for PET with fractional-crystal collimation. We also develop a 3D ray-tracing model incorporating both the collimator and crystal penetration for simulations and reconstructions of the collimated PET. By registering the collimator-encoded LORs with the collimator configurations, high-resolution LORs are restored based on the modeled transfer matrices using the non-negative least-squares method and EM algorithm. The resolution-enhanced images are then reconstructed from the high-resolution LORs using the MLEM or OSEM algorithm. For validation, we applied the LORI method to a small-animal PET scanner, A-PET, with a specially designed collimator. We demonstrate through simulated reconstructions with a hot-rod phantom and MOBY phantom that the LORI reconstructions can substantially improve spatial resolution and quantification compared to the uncollimated reconstructions. The LORI algorithm is crucial to improve overall image quality of collimated PET, which can have significant implications in preclinical and clinical ROI imaging applications.
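The MLEM reconstruction step named above follows a well-known multiplicative update. The sketch below shows that generic update on a tiny dense system. It does not implement the LORI method or its transfer-matrix modeling, and the system matrix `A`, the counts `y` and the function name `mlem` are hypothetical.

```python
def mlem(A, y, n_iter=50):
    """Generic MLEM update for emission tomography on a small dense system.

    A[i][j]: probability that an emission in pixel j is detected along LOR i.
    y[i]:    measured counts along LOR i.
    Returns the estimated activity per pixel.
    """
    n_lors, n_pix = len(A), len(A[0])
    # Sensitivity of each pixel: total detection probability over all LORs.
    sens = [sum(A[i][j] for i in range(n_lors)) for j in range(n_pix)]
    x = [1.0] * n_pix  # uniform, strictly positive starting image
    for _ in range(n_iter):
        # Forward-project the current image into LOR space.
        proj = [sum(A[i][j] * x[j] for j in range(n_pix)) for i in range(n_lors)]
        ratio = [y[i] / proj[i] if proj[i] > 0 else 0.0 for i in range(n_lors)]
        # Back-project the measured/expected ratio and rescale each pixel.
        x = [x[j] / sens[j] * sum(A[i][j] * ratio[i] for i in range(n_lors))
             for j in range(n_pix)]
    return x

# Hypothetical 2-LOR, 2-pixel system.
A = [[0.6, 0.2],
     [0.4, 0.8]]
y = [10.0, 20.0]
print(mlem(A, y))
```

The update keeps the image non-negative and, after every iteration, preserves the total counts weighted by pixel sensitivity.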
Preserving the Boltzmann ensemble in replica-exchange molecular dynamics.
Cooke, Ben; Schmidler, Scott C
2008-10-28
We consider the convergence behavior of replica-exchange molecular dynamics (REMD) [Sugita and Okamoto, Chem. Phys. Lett. 314, 141 (1999)] based on properties of the numerical integrators in the underlying isothermal molecular dynamics (MD) simulations. We show that a variety of deterministic algorithms favored by molecular dynamics practitioners for constant-temperature simulation of biomolecules fail either to be measure invariant or irreducible, and are therefore not ergodic. We then show that REMD using these algorithms also fails to be ergodic. As a result, the entire configuration space may not be explored even in an infinitely long simulation, and the simulation may not converge to the desired equilibrium Boltzmann ensemble. Moreover, our analysis shows that for initial configurations with unfavorable energy, it may be impossible for the system to reach a region surrounding the minimum energy configuration. We demonstrate these failures of REMD algorithms for three small systems: a Gaussian distribution (simple harmonic oscillator dynamics), a bimodal mixture of Gaussians distribution, and the alanine dipeptide. Examination of the resulting phase plots and equilibrium configuration densities indicates significant errors in the ensemble generated by REMD simulation. We describe a simple modification to address these failures based on a stochastic hybrid Monte Carlo correction, and prove that this is ergodic.
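For context, the exchange step of standard REMD uses a Metropolis criterion on the energies and inverse temperatures of the two replicas. The sketch below shows only that textbook acceptance rule, not the hybrid Monte Carlo correction proposed in the abstract. The function name, the energy units and the example values are assumptions made for illustration.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); unit choice is an assumption

def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis probability of exchanging configurations between two replicas.

    Replica i holds a configuration with potential energy energy_i at
    temperature temp_i; likewise for replica j.
    Acceptance: min(1, exp[(beta_i - beta_j) * (E_i - E_j)]).
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

# Hypothetical energies (kcal/mol) at 300 K and 320 K.
print(swap_probability(-100.0, 300.0, -105.0, 320.0))  # cold replica has higher energy
print(swap_probability(-105.0, 300.0, -100.0, 320.0))  # cold replica has lower energy
```

A swap that hands the lower-energy configuration to the colder replica is always accepted, and the reverse swap is accepted with probability below one. This rule assumes each replica samples its own Boltzmann distribution between exchanges, which is the property the abstract shows can fail for some deterministic thermostats.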
Trajectory NG: portable, compressed, general molecular dynamics trajectories.
Spångberg, Daniel; Larsson, Daniel S D; van der Spoel, David
2011-10-01
We present general algorithms for the compression of molecular dynamics trajectories. The standard ways to store MD trajectories as text or as raw binary floating point numbers result in very large files when efficient simulation programs are used on supercomputers. Our algorithms are based on the observation that differences in atomic coordinates/velocities, in either time or space, are generally smaller than the absolute values of the coordinates/velocities. Also, it is often possible to store values at a lower precision. We apply several compression schemes to compress the resulting differences further. The most efficient algorithms developed here use a block sorting algorithm in combination with Huffman coding. Depending on the frequency of storage of frames in the trajectory, either space, time, or combinations of space and time differences are usually the most efficient. We compare the efficiency of our algorithms with each other and with other algorithms present in the literature for various systems: liquid argon, water, a virus capsid solvated in 15 mM aqueous NaCl, and solid magnesium oxide. We perform tests to determine how much precision is necessary to obtain accurate structural and dynamic properties, as well as benchmark a parallelized implementation of the algorithms. We obtain compression ratios (compared to single precision floating point) of 1:3.3-1:35 depending on the frequency of storage of frames and the system studied.
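The core idea, storing reduced-precision differences between frames and then entropy-coding them, can be sketched with the Python standard library. This is not the Trajectory NG format or its actual coders. Here `bz2` (which itself combines block sorting with Huffman coding) stands in for the compression stage, and the function names, the 0.001 precision and the sample frames are hypothetical.

```python
import bz2
import struct

def compress_frames(frames, precision=0.001):
    """Quantize coordinates, take differences in time, then compress with bz2.

    frames: list of equal-length coordinate lists, one per stored time step.
    """
    quantized = [[round(v / precision) for v in frame] for frame in frames]
    deltas = list(quantized[0])  # first frame is stored as absolute integers
    for prev, cur in zip(quantized, quantized[1:]):
        deltas.extend(c - p for p, c in zip(prev, cur))
    raw = struct.pack("<%di" % len(deltas), *deltas)  # 32-bit little-endian ints
    return bz2.compress(raw)

def decompress_frames(blob, n_coords, precision=0.001):
    """Invert compress_frames; values are recovered to within precision / 2."""
    raw = bz2.decompress(blob)
    values = struct.unpack("<%di" % (len(raw) // 4), raw)
    frames, current = [], [0] * n_coords
    for start in range(0, len(values), n_coords):
        chunk = values[start:start + n_coords]
        current = [c + d for c, d in zip(current, chunk)]  # undo the differencing
        frames.append([q * precision for q in current])
    return frames

# Hypothetical three-frame trajectory of a single atom (x, y, z).
frames = [[1.000, 2.000, 3.000],
          [1.001, 2.002, 2.999],
          [1.003, 2.001, 3.000]]
restored = decompress_frames(compress_frames(frames), n_coords=3)
print(restored)
```

Because consecutive frames differ by small amounts, the differenced integers cluster near zero, which is what makes the subsequent compression stage effective.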
Sanibel Symposium in the Petascale-Exascale Computational Era
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cheng, Hai-Ping
The 56th Sanibel Symposium was held February 14-19, 2016 at the King and Prince Hotel, St. Simons Island, GA. It successfully brought quantum chemists and chemical and condensed matter physicists together in presentations, posters, and informal discussions bridging those two communities. The Symposium has had a significant role in preparing generations of quantum theorists. As computational potency and algorithmic sophistication have grown, the Symposium has evolved to emphasize more heavily computationally oriented method development in chemistry and materials physics, including nanoscience, complex molecular phenomena, and even bio-molecular methods and problems. Given this context, the 56th Sanibel meeting systematically and deliberately had sessions focused on exascale computation. A selection of outstanding theoretical problems that need serious attention was included. Five invited sessions, two contributed sessions (hot topics), and a poster session were organized with the exascale theme. This was a historic milestone in the evolution of the Symposia. Just as years ago linear algebra, perturbation theory, density matrices, and band-structure methods dominated early Sanibel Symposia, the exascale sessions of the 56th meeting contributed a transformative influence to add structure and strength to the computational physical science community in an unprecedented way. A copy of the full program of the 56th Symposium is attached. The exascale sessions were Linear Scaling, Non-Adiabatic Dynamics, Interpretive Theory and Models, Computation, Software, and Algorithms, and Quantum Monte Carlo. The Symposium Proceedings will be published in Molecular Physics (2017). Note that the Sanibel proceedings from 2015 and 2014 were published as Molecular Physics vol. 114, issue 3-4 (2016) and vol. 113, issue 3-4 (2015) respectively.
A relational learning approach to Structure-Activity Relationships in drug design toxicity studies.
Camacho, Rui; Pereira, Max; Costa, Vítor Santos; Fonseca, Nuno A; Adriano, Carlos; Simões, Carlos J V; Brito, Rui M M
2011-09-16
It has been recognized that the development of new therapeutic drugs is a complex and expensive process. A large number of factors affect the activity in vivo of putative candidate molecules, and the propensity for causing adverse and toxic effects is recognized as one of the major hurdles behind the current "target-rich, lead-poor" scenario. Structure-Activity Relationship (SAR) studies, using relational Machine Learning (ML) algorithms, have already been shown to be very useful in the complex process of rational drug design. Despite the ML successes, human expertise is still of the utmost importance in the drug development process. An iterative process and tight integration between the models developed by ML algorithms and the know-how of medicinal chemistry experts would be a very useful symbiotic approach. In this paper we describe a software tool that achieves that goal: iLogCHEM. The tool allows the use of Relational Learners in the task of identifying molecules or molecular fragments with potential to produce toxic effects, and thus helps in streamlining drug design in silico. It also allows the expert to guide the search for useful molecules without the need to know the details of the algorithms used. The models produced by the algorithms may be visualized using a graphical interface that is in common use amongst researchers in structural biology and medicinal chemistry. The graphical interface enables the expert to provide feedback to the learning system. The developed tool also has facilities to handle the similarity bias typical of large chemical databases. For that purpose the user can filter out similar compounds when assembling a data set. Additionally, we propose ways of providing background knowledge for Relational Learners using the results of Graph Mining algorithms.
Ferreira, Leonardo L G; Ferreira, Rafaela S; Palomino, David L; Andricopulo, Adriano D
2018-04-27
The glycolytic enzyme fructose-1,6-bisphosphate aldolase is a validated molecular target in drug discovery for human African trypanosomiasis (HAT), a neglected tropical disease (NTD) caused by the protozoan Trypanosoma brucei. Herein, a structure-based virtual screening (SBVS) approach to the identification of novel T. brucei aldolase inhibitors is described. Distinct molecular docking algorithms were used to screen more than 500,000 compounds against the X-ray structure of the enzyme. This SBVS strategy led to the selection of a series of molecules which were evaluated for their activity on recombinant T. brucei aldolase. The effort led to the discovery of structurally new ligands able to inhibit the catalytic activity of the enzyme. The predicted binding conformations were additionally investigated in molecular dynamics simulations, which provided useful insights into the enzyme-inhibitor intermolecular interactions. The molecular modeling results along with the enzyme inhibition data generated practical knowledge to be explored in further structure-based drug design efforts in HAT drug discovery. Copyright © Bentham Science Publishers.
Variational Identification of Markovian Transition States
NASA Astrophysics Data System (ADS)
Martini, Linda; Kells, Adam; Covino, Roberto; Hummer, Gerhard; Buchete, Nicolae-Viorel; Rosta, Edina
2017-07-01
We present a method that enables the identification and analysis of conformational Markovian transition states from atomistic or coarse-grained molecular dynamics (MD) trajectories. Our algorithm is presented by using both analytical models and examples from MD simulations of the benchmark system helix-forming peptide Ala5, and of larger, biomedically important systems: the 15-lipoxygenase-2 enzyme (15-LOX-2), the epidermal growth factor receptor (EGFR) protein, and the Mga2 fungal transcription factor. The analysis of 15-LOX-2 uses data generated exclusively from biased umbrella sampling simulations carried out at the hybrid ab initio density functional theory (DFT) quantum mechanics/molecular mechanics (QM/MM) level of theory. In all cases, our method automatically identifies the corresponding transition states and metastable conformations in a variationally optimal way, with the input of a set of relevant coordinates, by accurately reproducing the intrinsic slowest relaxation rate of each system. Our approach offers a general yet easy-to-implement analysis method that provides unique insight into the molecular mechanism and the rare but crucial (i.e., rate-limiting) transition states occurring along conformational transition paths in complex dynamical systems such as molecular trajectories.
Molecular dynamics simulations of aqueous solutions of ethanolamines.
López-Rendón, Roberto; Mora, Marco A; Alejandre, José; Tuckerman, Mark E
2006-08-03
We report on molecular dynamics simulations performed at constant temperature and pressure to study ethanolamines as pure components and in aqueous solutions. A new geometric integration algorithm that preserves the correct phase space volume is employed to study molecules having up to three ethanol chains. The most stable geometry, rotational barriers, and atomic charges were obtained by ab initio calculations in the gas phase. The calculated dipole moments agree well with available experimental data. The most stable conformation, due to intramolecular hydrogen bonding interactions, has a ringlike structure in one of the ethanol chains, leading to high molecular stability. All molecular dynamics simulations were performed in the liquid phase. The interaction parameters are the same for the atoms in the ethanol chains, reducing the number of variables in the potential model. Intermolecular hydrogen bonding is also analyzed, and it is shown that water associates at low water mole fractions. The force field reproduced (within 1%) the experimental liquid densities at different temperatures of pure components and aqueous solutions at 313 K. The excess and partial molar volumes are analyzed as a function of ethanolamine concentration.
GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit
Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik
2013-01-01
Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358
Artificial neural networks and the study of the psychoactivity of cannabinoid compounds.
Honório, Káthia M; de Lima, Emmanuela F; Quiles, Marcos G; Romero, Roseli A F; Molfetta, Fábio A; da Silva, Albérico B F
2010-06-01
Cannabinoid compounds have been widely employed because of their medicinal and psychotropic properties. These compounds are isolated from Cannabis sativa (or marijuana) and are used in several medical treatments, such as for glaucoma, nausea associated with chemotherapy, pain and many other conditions. More recently, their use as appetite stimulants has been indicated in patients with cachexia or AIDS. In this work, the influence of several molecular descriptors on the psychoactivity of 50 cannabinoid compounds is analyzed with the aim of obtaining a model able to predict the psychoactivity of new cannabinoids. For this purpose, the selection of descriptors was initially carried out using the Fisher's weight, the correlation matrix among the calculated variables and principal component analysis. From these analyses, the following descriptors were considered most relevant: E(LUMO) (energy of the lowest unoccupied molecular orbital), Log P (logarithm of the partition coefficient), VC4 (volume of the substituent at the C4 position) and LP1 (Lovasz-Pelikan index, a molecular branching index). Next, two neural network models were used to construct a more adequate model for classifying new cannabinoid compounds. The first model employed was the multi-layer perceptron, trained with the back-propagation algorithm, and the second was the Kohonen network. The results obtained from both networks were compared and showed that both techniques achieved a high percentage of correct classifications in discriminating psychoactive from psychoinactive compounds. However, the Kohonen network was superior to the multi-layer perceptron.
The Temperature and Distribution of Organic Molecules in the Inner Regions of T Tauri Disks
NASA Technical Reports Server (NTRS)
Mandell, Avi
2012-01-01
"High-resolution NIR spectroscopic observations of warm molecular gas emission from young circumstellar disks allow us to constrain the temperature and composition of material in the inner planet-forming region. By combining advanced data reduction algorithms with accurate modeling of the terrestrial atmospheric spectrum and a novel double-differencing data analysis technique, we have achieved very high-contrast measurements (S/N approx. 500-1000) of molecular emission at 3 microns. In disks around low-mass stars, we have achieved the first detections of emission from HCN and C2H2 at near-infrared wavelengths from several bright T Tauri stars using the CRIRES spectrograph on the Very Large Telescope and NIRSPEC spectrograph on the Keck Telescope. We spectrally resolve the line shape, showing that the emission has both a Keplerian and non-Keplerian component as observed previously for CO emission. We used a simplified single-temperature local thermal equilibrium (LTE) slab model with a Gaussian line profile to make line identifications and determine a best-fit temperature and initial abundance ratios, and we then compared these values with constraints derived from a detailed disk radiative transfer model assuming LTE excitation but utilizing a realistic temperature and density structure. Abundance ratios from both sets of models are consistent with each other and consistent with expected values from theoretical chemical models, and analysis of the line shapes suggests that the molecular emission originates from within a narrow region in the inner disk (R < 1 AU)."
A computational kinetic model of diffusion for molecular systems.
Teo, Ivan; Schulten, Klaus
2013-09-28
Regulation of biomolecular transport in cells involves intra-protein steps like gating and passage through channels, but these steps are preceded by extra-protein steps, namely, diffusive approach and admittance of solutes. The extra-protein steps develop over a 10-100 nm length scale typically in a highly particular environment, characterized through the protein's geometry, surrounding electrostatic field, and location. In order to account for solute energetics and mobility of solutes in this environment at a relevant resolution, we propose a particle-based kinetic model of diffusion based on a Markov State Model framework. Prerequisite input data consist of diffusion coefficient and potential of mean force maps generated from extensive molecular dynamics simulations of proteins and their environment that sample multi-nanosecond durations. The suggested diffusion model can describe transport processes beyond microsecond duration, relevant for biological function and beyond the realm of molecular dynamics simulation. For this purpose the systems are represented by a discrete set of states specified by the positions, volumes, and surface elements of Voronoi grid cells distributed according to a density function resolving the often intricate relevant diffusion space. Validation tests carried out for generic diffusion spaces show that the model and the associated Brownian motion algorithm are viable over a large range of parameter values such as time step, diffusion coefficient, and grid density. A concrete application of the method is demonstrated for ion diffusion around and through the Escherichia coli mechanosensitive channel of small conductance ecMscS.
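The idea of turning a potential of mean force map and a diffusion coefficient into a discrete-state kinetic model can be illustrated with a much simpler setting than the one in the abstract. The sketch below assumes a uniform one-dimensional grid instead of Voronoi cells, a constant diffusion coefficient, and a commonly used symmetric rate formula that satisfies detailed balance; none of these choices is taken from the paper, and the names `hop_rates` and `propagate` are hypothetical.

```python
import math

def hop_rates(pmf, diff_coeff, spacing):
    """Nearest-neighbour hop rates on a uniform 1D grid.

    pmf holds the potential of mean force of each cell in units of kT.
    The symmetric form below is an assumed discretisation that obeys
    detailed balance with respect to exp(-pmf).
    """
    base = diff_coeff / spacing ** 2
    rates = {}
    for i in range(len(pmf)):
        for j in (i - 1, i + 1):
            if 0 <= j < len(pmf):
                rates[(i, j)] = base * math.exp(-(pmf[j] - pmf[i]) / 2.0)
    return rates

def propagate(prob, rates, dt, steps):
    """Explicit Euler integration of the master equation for the cell populations."""
    p = list(prob)
    for _ in range(steps):
        change = [0.0] * len(p)
        for (i, j), k in rates.items():
            moved = k * p[i] * dt      # population hopping from cell i to cell j
            change[i] -= moved
            change[j] += moved
        p = [a + b for a, b in zip(p, change)]
    return p

# Toy barrier of height 2 kT between two wells (illustrative values only).
pmf = [0.0, 1.0, 2.0, 1.0, 0.0]
rates = hop_rates(pmf, diff_coeff=1.0, spacing=1.0)
final = propagate([1.0, 0.0, 0.0, 0.0, 0.0], rates, dt=0.01, steps=20000)

# Reference: the Boltzmann distribution the populations should relax to.
weights = [math.exp(-u) for u in pmf]
boltzmann = [w / sum(weights) for w in weights]
```

Starting with all population in the left well, the populations relax across the barrier toward the Boltzmann distribution, which is the behaviour a rate model of this kind is expected to reproduce at long times.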
Cross-Platform Toxicogenomics for the Prediction of Non-Genotoxic Hepatocarcinogenesis in Rat
Metzger, Ute; Templin, Markus F.; Plummer, Simon; Ellinger-Ziegelbauer, Heidrun; Zell, Andreas
2014-01-01
In the area of omics profiling in toxicology, i.e. toxicogenomics, characteristic molecular profiles have previously been incorporated into prediction models for early assessment of a carcinogenic potential and mechanism-based classification of compounds. Traditionally, the biomarker signatures used for model construction were derived from individual high-throughput techniques, such as microarrays designed for monitoring global mRNA expression. In this study, we built predictive models by integrating omics data across complementary microarray platforms and introduced new concepts for modeling of pathway alterations and molecular interactions between multiple biological layers. We trained and evaluated diverse machine learning-based models, differing in the incorporated features and learning algorithms on a cross-omics dataset encompassing mRNA, miRNA, and protein expression profiles obtained from rat liver samples treated with a heterogeneous set of substances. Most of these compounds could be unambiguously classified as genotoxic carcinogens, non-genotoxic carcinogens, or non-hepatocarcinogens based on evidence from published studies. Since mixed characteristics were reported for the compounds Cyproterone acetate, Thioacetamide, and Wy-14643, we reclassified these compounds as either genotoxic or non-genotoxic carcinogens based on their molecular profiles. Evaluating our toxicogenomics models in a repeated external cross-validation procedure, we demonstrated that the prediction accuracy of our models could be increased by joining the biomarker signatures across multiple biological layers and by adding complex features derived from cross-platform integration of the omics data. Furthermore, we found that adding these features resulted in a better separation of the compound classes and a more confident reclassification of the three undefined compounds as non-genotoxic carcinogens. PMID:24830643
Molecular Modeling of Thermodynamic and Transport Properties for CO2 and Aqueous Brines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Hao; Economou, Ioannis G.; Panagiotopoulos, Athanassios Z.
Molecular simulation techniques using classical force-fields occupy the space between ab initio quantum mechanical methods and phenomenological correlations. In particular, Monte Carlo and molecular dynamics algorithms can be used to provide quantitative predictions of thermodynamic and transport properties of fluids relevant for geologic carbon sequestration at conditions for which experimental data are uncertain or not available. These methods can cover time and length scales far exceeding those of quantum chemical methods, while maintaining transferability and predictive power lacking from phenomenological correlations. The accuracy of predictions depends sensitively on the quality of the molecular models used. Many existing fixed-point-charge models for water and aqueous mixtures fail to represent accurately these fluid properties, especially when descriptions covering broad ranges of thermodynamic conditions are needed. Recent work on development of accurate models for water, CO2, and dissolved salts, as well as their mixtures, is summarized in this Account. Polarizable models that can respond to the different dielectric environments in aqueous versus nonaqueous phases are necessary for predictions of properties over extended ranges of temperatures and pressures. Phase compositions and densities, activity coefficients of the dissolved salts, interfacial tensions, viscosities and diffusivities can be obtained in near-quantitative agreement to available experimental data, using relatively modest computational resources. In some cases, for example, for the composition of the CO2-rich phase in coexistence with an aqueous phase, recent results from molecular simulations have helped discriminate among conflicting experimental data sets. The sensitivity of properties on the quality of the intermolecular interaction model varies significantly.
Properties such as the phase compositions or electrolyte activity coefficients are much more sensitive than phase densities, viscosities, or component diffusivities. Strong confinement effects on physical properties in nanoscale media can also be directly obtained from molecular simulations. Future work on molecular modeling for CO2 and aqueous brines is likely to be focused on more systematic generation of interaction models by utilizing quantum chemical as well as direct experimental measurements. New ion models need to be developed for use with the current generation of polarizable water models, including ion–ion interactions that will allow for accurate description of dense, mixed brines. Methods will need to be devised that go beyond the use of effective potentials for incorporation of quantum effects known to be important for water, and reactive force fields developed that can handle bond creation and breaking in systems with carbonate and silicate minerals. Lastly, another area of potential future work is the integration of molecular simulation methods in multiscale models for the chemical reactions leading to mineral dissolution and flow within the porous media in underground formations.
Molecular Modeling of Thermodynamic and Transport Properties for CO2 and Aqueous Brines.
Jiang, Hao; Economou, Ioannis G; Panagiotopoulos, Athanassios Z
2017-04-18
Molecular simulation techniques using classical force-fields occupy the space between ab initio quantum mechanical methods and phenomenological correlations. In particular, Monte Carlo and molecular dynamics algorithms can be used to provide quantitative predictions of thermodynamic and transport properties of fluids relevant for geologic carbon sequestration at conditions for which experimental data are uncertain or not available. These methods can cover time and length scales far exceeding those of quantum chemical methods, while maintaining transferability and predictive power lacking from phenomenological correlations. The accuracy of predictions depends sensitively on the quality of the molecular models used. Many existing fixed-point-charge models for water and aqueous mixtures fail to represent accurately these fluid properties, especially when descriptions covering broad ranges of thermodynamic conditions are needed. Recent work on development of accurate models for water, CO2, and dissolved salts, as well as their mixtures, is summarized in this Account. Polarizable models that can respond to the different dielectric environments in aqueous versus nonaqueous phases are necessary for predictions of properties over extended ranges of temperatures and pressures. Phase compositions and densities, activity coefficients of the dissolved salts, interfacial tensions, viscosities and diffusivities can be obtained in near-quantitative agreement to available experimental data, using relatively modest computational resources. In some cases, for example, for the composition of the CO2-rich phase in coexistence with an aqueous phase, recent results from molecular simulations have helped discriminate among conflicting experimental data sets. The sensitivity of properties on the quality of the intermolecular interaction model varies significantly.
Properties such as the phase compositions or electrolyte activity coefficients are much more sensitive than phase densities, viscosities, or component diffusivities. Strong confinement effects on physical properties in nanoscale media can also be directly obtained from molecular simulations. Future work on molecular modeling for CO2 and aqueous brines is likely to be focused on more systematic generation of interaction models by utilizing quantum chemical as well as direct experimental measurements. New ion models need to be developed for use with the current generation of polarizable water models, including ion-ion interactions that will allow for accurate description of dense, mixed brines. Methods will need to be devised that go beyond the use of effective potentials for incorporation of quantum effects known to be important for water, and reactive force fields developed that can handle bond creation and breaking in systems with carbonate and silicate minerals. Another area of potential future work is the integration of molecular simulation methods in multiscale models for the chemical reactions leading to mineral dissolution and flow within the porous media in underground formations.
NASA Astrophysics Data System (ADS)
Li, Hongzhi; Min, Donghong; Liu, Yusong; Yang, Wei
2007-09-01
To overcome the possible pseudoergodicity problem, molecular dynamics simulation can be accelerated via the realization of an energy space random walk. To achieve this, a biased free energy function (BFEF) needs to be obtained a priori. Although the quality of the BFEF is essential for sampling efficiency, its generation is usually tedious and nontrivial. In this work, we present an energy space metadynamics algorithm to efficiently and robustly obtain BFEFs. Moreover, in order to deal with the associated diffusion sampling problem caused by the random walk in the total energy space, the idea in the original umbrella sampling method is generalized to a random walk in the essential energy space, which only includes the energy terms determining the conformation of a region of interest. This essential energy space generalization allows the realization of efficient localized enhanced sampling and also offers the possibility of further sampling efficiency improvement when high frequency energy terms irrelevant to the target events are free of activation. The energy space metadynamics method and its generalization in the essential energy space for the molecular dynamics acceleration are demonstrated in the simulation of a pentanelike system, the blocked alanine dipeptide model, and the leucine model.
Asymmetric bagging and feature selection for activities prediction of drug molecules.
Li, Guo-Zheng; Meng, Hao-Hua; Lu, Wen-Cong; Yang, Jack Y; Yang, Mary Qu
2008-05-28
Activities of drug molecules can be predicted by QSAR (quantitative structure-activity relationship) models, which avoid the high cost and long cycle of the traditional experimental method. Because the number of drug molecules with positive activity is much smaller than the number of negatives, it is important to predict molecular activities with this unbalanced situation in mind. Here, asymmetric bagging and feature selection are introduced into the problem, and asymmetric bagging of support vector machines (asBagging) is proposed for predicting drug activities on unbalanced data. At the same time, the features extracted from the structures of drug molecules affect the prediction accuracy of QSAR models. Therefore, a novel algorithm named PRIFEAB is proposed, which applies an embedded feature selection method to remove redundant and irrelevant features for asBagging. Numerical experimental results on a data set of molecular activities show that asBagging improves the AUC and sensitivity values of molecular activity prediction, and that PRIFEAB with feature selection further improves the prediction ability. Asymmetric bagging can help to improve the prediction accuracy of activities of drug molecules, which can be further improved by performing feature selection to select relevant features from the drug molecule data sets.
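The resampling idea behind asymmetric bagging can be sketched as follows. The abstract does not spell out the procedure, so this assumes the usual formulation: every bag keeps all positive examples and draws an equally sized bootstrap sample from the much larger negative class, and the bag models are combined by averaging their votes. The base learner here is a nearest-centroid classifier standing in for the support vector machines of the paper, and all function names and the toy data are hypothetical.

```python
import random

def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(r[d] for r in rows) / len(rows) for d in range(len(rows[0]))]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_centroid(pos, neg):
    """Stand-in base learner (not an SVM): remember one centroid per class."""
    return mean_vector(pos), mean_vector(neg)

def centroid_vote(model, x):
    """Vote 1 if x lies closer to the positive centroid, otherwise 0."""
    c_pos, c_neg = model
    return 1 if squared_distance(x, c_pos) < squared_distance(x, c_neg) else 0

def asymmetric_bagging(pos, neg, n_bags, rng):
    """Train one model per bag; only the negative class is resampled."""
    models = []
    for _ in range(n_bags):
        neg_bag = rng.choices(neg, k=len(pos))   # bootstrap draw, balanced with positives
        models.append(train_centroid(pos, neg_bag))
    return models

def predict_score(models, x):
    """Fraction of bag models voting 'active' for x."""
    return sum(centroid_vote(m, x) for m in models) / len(models)

# Toy unbalanced data: 10 actives near (2, 2), 100 inactives near (0, 0).
rng = random.Random(0)
positives = [[rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)] for _ in range(10)]
negatives = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(100)]
models = asymmetric_bagging(positives, negatives, n_bags=25, rng=rng)
```

Because each bag is balanced, no single model is dominated by the majority class, while the ensemble as a whole still sees most of the negative examples across bags.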
Fast flexible modeling of RNA structure using internal coordinates.
Flores, Samuel Coulbourn; Sherman, Michael A; Bruns, Christopher M; Eastman, Peter; Altman, Russ Biagio
2011-01-01
Modeling the structure and dynamics of large macromolecules remains a critical challenge. Molecular dynamics (MD) simulations are expensive because they model every atom independently, and are difficult to combine with experimentally derived knowledge. Assembly of molecules using fragments from libraries relies on the database of known structures and thus may not work for novel motifs. Coarse-grained modeling methods have yielded good results on large molecules but can suffer from difficulties in creating more detailed full atomic realizations. There is therefore a need for molecular modeling algorithms that remain chemically accurate and economical for large molecules, do not rely on fragment libraries, and can incorporate experimental information. RNABuilder works in the internal coordinate space of dihedral angles and thus has time requirements proportional to the number of moving parts rather than the number of atoms. It provides accurate physics-based response to applied forces, but also allows user-specified forces for incorporating experimental information. A particular strength of RNABuilder is that all Leontis-Westhof basepairs can be specified as primitives by the user to be satisfied during model construction. We apply RNABuilder to predict the structure of an RNA molecule with 160 bases from its secondary structure, as well as experimental information. Our model matches the known structure to 10.2 Angstroms RMSD and has low computational expense.
A study of metaheuristic algorithms for high dimensional feature selection on microarray data
NASA Astrophysics Data System (ADS)
Dankolo, Muhammad Nasiru; Radzi, Nor Haizan Mohamed; Sallehuddin, Roselina; Mustaffa, Noorfa Haszlinna
2017-11-01
Microarray systems enable experts to examine gene profiles at the molecular level using machine learning algorithms. This increases the potential for classification and diagnosis of many diseases at the gene expression level. However, numerous difficulties may affect the efficiency of machine learning algorithms, including the vast number of gene features contained in the original data. Many of these features may be unrelated to the intended analysis. Therefore, feature selection needs to be performed during data pre-processing. Many feature selection algorithms have been developed and applied to microarray data, including metaheuristic optimization algorithms. This paper discusses the application of metaheuristic algorithms for feature selection on microarray datasets. The study reveals that the algorithms have yielded interesting results with limited resources, thereby saving the computational expense of machine learning algorithms.
Coding considerations for standalone molecular dynamics simulations of atomistic structures
NASA Astrophysics Data System (ADS)
Ocaya, R. O.; Terblans, J. J.
2017-10-01
The laws of Newtonian mechanics allow ab-initio molecular dynamics to model and simulate particle trajectories in material science by defining a differentiable potential function. This paper discusses some considerations for the coding of ab-initio programs for simulation on a standalone computer and illustrates the approach with C language codes in the context of embedded metallic atoms in the face-centred cubic structure. The algorithms use velocity-time integration to determine particle parameter evolution for up to several thousands of particles in a thermodynamical ensemble. Such functions are reusable and can be placed in a redistributable header library file. While there are both commercial and free packages available, their heuristic nature prevents dissection. In addition, developing one's own codes has the obvious advantage of teaching techniques applicable to new problems.
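A time-stepping loop of the kind the abstract alludes to can be sketched in a few lines. The abstract does not name the integrator, so the example below assumes the widely used velocity Verlet scheme, and it replaces the embedded-atom potential with a one-dimensional harmonic potential so that the result can be checked against the analytical solution. The function name `velocity_verlet` and the constants are illustrative, and Python is used here rather than the C of the paper.

```python
import math

def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance position x and velocity v by `steps` velocity Verlet updates."""
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update with current acceleration
        a_new = force(x) / mass              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with the averaged acceleration
        a = a_new
    return x, v

K = 1.0  # spring constant of the assumed toy potential

def potential(x):
    return 0.5 * K * x * x

def force(x):
    return -K * x  # negative derivative of the potential

# Start at x = 1 with zero velocity and integrate to t = 10.
x_end, v_end = velocity_verlet(1.0, 0.0, force, mass=1.0, dt=0.01, steps=1000)
energy = 0.5 * v_end ** 2 + potential(x_end)
```

For this oscillator the exact trajectory is x(t) = cos(t) and the total energy is 0.5, so the drift in `energy` and the deviation of `x_end` from `math.cos(10.0)` give a quick check on the chosen time step.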
FESetup: Automating Setup for Alchemical Free Energy Simulations.
Loeffler, Hannes H; Michel, Julien; Woods, Christopher
2015-12-28
FESetup is a new pipeline tool which can be used flexibly within larger workflows. The tool aims to support fast and easy setup of alchemical free energy simulations for molecular simulation packages such as AMBER, GROMACS, Sire, or NAMD. Post-processing methods like MM-PBSA and LIE can be set up as well. Ligands are automatically parametrized with AM1-BCC, and atom mappings for a single topology description are computed with a maximum common substructure search (MCSS) algorithm. An abstract molecular dynamics (MD) engine can be used for equilibration prior to free energy setup or standalone. Currently, all modern AMBER force fields are supported. Ease of use, robustness of the code, and automation where it is feasible are the main development goals. The project follows an open development model, and we welcome contributions.
Integrated Multiscale Modeling of Molecular Computing Devices
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gregory Beylkin
2012-03-23
Significant advances were made on all objectives of the research program. We have developed fast multiresolution methods for performing electronic structure calculations with emphasis on constructing efficient representations of functions and operators. We extended our approach to problems of scattering in solids, i.e. constructing fast algorithms for computing above the Fermi energy level. Part of the work was done in collaboration with Robert Harrison and George Fann at ORNL. Specific results (in part supported by this grant) are listed here and are described in greater detail. (1) We have implemented a fast algorithm to apply the Green's function for the free space (oscillatory) Helmholtz kernel. The algorithm maintains its speed and accuracy when the kernel is applied to functions with singularities. (2) We have developed a fast algorithm for applying periodic and quasi-periodic, oscillatory Green's functions and those with boundary conditions on simple domains. Importantly, the algorithm maintains its speed and accuracy when applied to functions with singularities. (3) We have developed a fast algorithm for obtaining and applying multiresolution representations of periodic and quasi-periodic Green's functions and Green's functions with boundary conditions on simple domains. (4) We have implemented modifications to improve the speed of adaptive multiresolution algorithms for applying operators which are represented via a Gaussian expansion. (5) We have constructed new nearly optimal quadratures for the sphere that are invariant under the icosahedral rotation group. (6) We obtained new results on approximation of functions by exponential sums and/or rational functions, one of the key methods that allows us to construct separated representations for Green's functions. (7) We developed a new fast and accurate reduction algorithm for obtaining optimal approximation of functions by exponential sums and/or their rational representations.
Narimani, Zahra; Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger
2017-01-01
Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.
Andrews, Steven S
2017-03-01
Smoldyn is a spatial and stochastic biochemical simulator. It treats each molecule of interest as an individual particle in continuous space, simulating molecular diffusion, molecule-membrane interactions and chemical reactions, all with good accuracy. This article presents several new features. Smoldyn now supports two types of rule-based modeling. These are a wildcard method, which is very convenient, and the BioNetGen package with extensions for spatial simulation, which is better for complicated models. Smoldyn also includes new algorithms for simulating the diffusion of surface-bound molecules and molecules with excluded volume. Both are exact in the limit of short time steps and reasonably good with longer steps. In addition, Smoldyn supports single-molecule tracking simulations. Finally, the Smoldyn source code can be accessed through a C/C++ language library interface. Smoldyn software, documentation, code, and examples are at http://www.smoldyn.org.
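The particle-based picture can be illustrated with a generic Brownian dynamics step, in which each molecule receives an independent Gaussian displacement per axis with standard deviation sqrt(2 D dt). This is a minimal sketch of free diffusion only and does not use Smoldyn's own API; the function name `diffuse` and all parameter values are assumptions.

```python
import numpy as np

def diffuse(positions, diff_coeff, dt, rng):
    """Advance each particle by one Brownian step of duration dt."""
    sigma = np.sqrt(2.0 * diff_coeff * dt)          # rms displacement per axis
    return positions + rng.normal(0.0, sigma, size=positions.shape)

rng = np.random.default_rng(0)
positions = np.zeros((20000, 3))                    # all particles start at the origin
diff_coeff, dt, n_steps = 1.0, 0.01, 100
for _ in range(n_steps):
    positions = diffuse(positions, diff_coeff, dt, rng)

# For free diffusion in 3D the mean squared displacement should approach 6 * D * t.
msd = np.mean(np.sum(positions**2, axis=1))
```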
Quantitative kinetic theory of active matter
NASA Astrophysics Data System (ADS)
Ihle, Thomas; Chou, Yen-Liang
2014-03-01
Models of self-driven agents similar to the Vicsek model [Phys. Rev. Lett. 75 (1995) 1226] are studied by means of kinetic theory. In these models, particles try to align their travel directions with the average direction of their neighbours. At strong alignment a globally ordered state of collective motion forms. An Enskog-like kinetic theory is derived from the exact Chapman-Kolmogorov equation in phase space using Boltzmann's mean-field approximation of molecular chaos. The kinetic equation is solved numerically by a nonlocal Lattice-Boltzmann-like algorithm. Steep soliton-like waves are observed that lead to an abrupt jump of the global order parameter if the noise level is changed. The shape of the wave is shown to follow a novel scaling law and to quantitatively agree within 3 % with agent-based simulations at large particle speeds. This provides a mean-field mechanism to change the second-order character of the flocking transition to first order. Diagrammatic techniques are used to investigate small particle speeds, where the mean-field assumption of Molecular Chaos is invalid and where correlation effects need to be included.
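The alignment rule studied here can be made concrete with a small agent-based sketch of a Vicsek-type update in two dimensions. It is an illustration of the microscopic model, not of the kinetic theory or the Lattice-Boltzmann-like solver; the noise convention (a fraction of the full angle range) and all names are assumptions.

```python
import numpy as np

def vicsek_step(pos, theta, speed, radius, noise, box, rng):
    """One update of a 2D Vicsek-type model in a periodic square box."""
    # Pairwise separations with the minimum-image convention.
    delta = pos[:, None, :] - pos[None, :, :]
    delta -= box * np.round(delta / box)
    neighbours = np.sum(delta**2, axis=-1) <= radius**2     # includes the particle itself
    # Average heading of the neighbours, computed from summed unit vectors.
    mean_sin = neighbours @ np.sin(theta)
    mean_cos = neighbours @ np.cos(theta)
    new_theta = np.arctan2(mean_sin, mean_cos)
    # Assumed noise convention: noise in [0, 1] scales a uniform random angle.
    new_theta = new_theta + noise * rng.uniform(-np.pi, np.pi, size=theta.shape)
    velocity = speed * np.column_stack((np.cos(new_theta), np.sin(new_theta)))
    return (pos + velocity) % box, new_theta

def order_parameter(theta):
    """Magnitude of the mean heading vector: 0 is disordered, 1 is fully aligned."""
    return np.hypot(np.mean(np.cos(theta)), np.mean(np.sin(theta)))

rng = np.random.default_rng(1)
n, box = 200, 5.0
pos = rng.uniform(0.0, box, size=(n, 2))
theta = rng.uniform(-np.pi, np.pi, size=n)
for _ in range(50):
    pos, theta = vicsek_step(pos, theta, speed=0.1, radius=1.0, noise=0.1, box=box, rng=rng)
phi = order_parameter(theta)
```

Scanning `noise` and recording `phi` is the usual way to locate the flocking transition in agent-based runs of this kind.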
Cao, D-S; Zhao, J-C; Yang, Y-N; Zhao, C-X; Yan, J; Liu, S; Hu, Q-N; Xu, Q-S; Liang, Y-Z
2012-01-01
There is a great need to assess the harmful effects or toxicities of chemicals to which man is exposed. In the present paper, the simplified molecular input line entry specification (SMILES) representation-based string kernel, together with the state-of-the-art support vector machine (SVM) algorithm, was used to classify the toxicity of chemicals from the US Environmental Protection Agency Distributed Structure-Searchable Toxicity (DSSTox) database network. In this method, the molecular structure can be directly encoded by a series of SMILES substrings that represent the presence of some chemical elements and different kinds of chemical bonds (double, triple and stereochemistry) in the molecules. Thus, the SMILES string kernel can accurately and directly measure the similarities of molecules through the local information contained in the molecules. Two model validation approaches, five-fold cross-validation and an independent validation set, were used for assessing the predictive capability of our developed models. The results obtained indicate that SVM based on the SMILES string kernel can be regarded as a very promising alternative modelling approach for potential toxicity prediction of chemicals.
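One simple way to realise a substring-based similarity is a spectrum-style kernel that counts shared contiguous substrings. The sketch below is a generic example of that idea and not necessarily the kernel used in the paper; the maximum substring length and the cosine normalisation are assumptions.

```python
from collections import Counter
from math import sqrt

def substring_counts(smiles, max_len=3):
    """Count every contiguous substring of length 1..max_len."""
    counts = Counter()
    for k in range(1, max_len + 1):
        for i in range(len(smiles) - k + 1):
            counts[smiles[i:i + k]] += 1
    return counts

def string_kernel(a, b, max_len=3):
    """Normalised spectrum kernel: cosine similarity of substring counts."""
    ca, cb = substring_counts(a, max_len), substring_counts(b, max_len)
    dot = sum(ca[s] * cb[s] for s in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

ethanol, propanol, benzene = "CCO", "CCCO", "c1ccccc1"
k_close = string_kernel(ethanol, propanol)   # two short alcohols share most substrings
k_far = string_kernel(ethanol, benzene)      # no substring in common (case-sensitive)
```

A matrix of such kernel values between all pairs of training molecules could then be passed to any SVM implementation that accepts a precomputed kernel.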
A Fast Algorithm for Massively Parallel, Long-Term, Simulation of Complex Molecular Dynamics Systems
NASA Technical Reports Server (NTRS)
Jaramillo-Botero, Andres; Goddard, William A, III; Fijany, Amir
1997-01-01
The advances in theory and computing technology over the last decade have led to enormous progress in applying atomistic molecular dynamics (MD) methods to the characterization, prediction, and design of chemical, biological, and material systems.
Modeling the Mechanism of Action of a DGAT1 Inhibitor Using a Causal Reasoning Platform
Enayetallah, Ahmed E.; Ziemek, Daniel; Leininger, Michael T.; Randhawa, Ranjit; Yang, Jianxin; Manion, Tara B.; Mather, Dawn E.; Zavadoski, William J.; Kuhn, Max; Treadway, Judith L.; des Etages, Shelly Ann G.; Gibbs, E. Michael; Greene, Nigel; Steppan, Claire M.
2011-01-01
Triglyceride accumulation is associated with obesity and type 2 diabetes. Genetic disruption of diacylglycerol acyltransferase 1 (DGAT1), which catalyzes the final reaction of triglyceride synthesis, confers dramatic resistance to high-fat diet induced obesity. Hence, DGAT1 is considered a potential therapeutic target for treating obesity and related metabolic disorders. However, the molecular events shaping the mechanism of action of DGAT1 pharmacological inhibition have not been fully explored yet. Here, we investigate the metabolic molecular mechanisms induced in response to pharmacological inhibition of DGAT1 using a recently developed computational systems biology approach, the Causal Reasoning Engine (CRE). The CRE algorithm utilizes microarray transcriptomic data and causal statements derived from the biomedical literature to infer upstream molecular events driving these transcriptional changes. The inferred upstream events (also called hypotheses) are aggregated into biological models using a set of analytical tools that allow for evaluation and integration of the hypotheses in the context of their supporting evidence. In comparison to gene ontology enrichment analysis, which pointed to high-level changes in metabolic processes, the CRE results provide detailed molecular hypotheses to explain the measured transcriptional changes. CRE analysis of gene expression changes in high fat habituated rats treated with a potent and selective DGAT1 inhibitor demonstrates that the majority of transcriptomic changes support a metabolic network indicative of reversal of high fat diet effects that includes a number of molecular hypotheses such as PPARG, HNF4A and SREBPs. Finally, the CRE-generated molecular hypotheses from DGAT1 inhibitor treated rats were found to capture the major molecular characteristics of DGAT1 deficient mice, supporting a phenotype of decreased lipid and increased insulin sensitivity. PMID:22073239
Stanke, Monika; Palikot, Ewa; Kȩdziera, Dariusz; Adamowicz, Ludwik
2016-12-14
An algorithm for calculating the first-order electronic orbit-orbit magnetic interaction correction for an electronic wave function expanded in terms of all-electron explicitly correlated molecular Gaussian (ECG) functions with shifted centers is derived and implemented. The algorithm is tested in calculations concerning the H2 molecule. It is also applied in calculations for LiH and H3+ molecular systems. The implementation completes our work on the leading relativistic correction for ECGs and paves the way for very accurate ECG calculations of ground and excited potential energy surfaces (PESs) of small molecules with two and more nuclei and two and more electrons, such as HeH-, H3+, HeH2+, and LiH2+. The PESs will be used to determine rovibrational spectra of the systems.
Wang, Zhaocai; Pu, Jun; Cao, Liling; Tan, Jian
2015-01-01
The unbalanced assignment problem (UAP) is the problem of optimally assigning n jobs to m individuals (m < n) such that minimum cost or maximum profit is obtained. It is a vitally important Non-deterministic Polynomial (NP) complete problem in operation management and applied mathematics, with numerous real-life applications. In this paper, we present a new parallel DNA algorithm for solving the unbalanced assignment problem using DNA molecular operations. We design flexible-length DNA strands representing different jobs and individuals, take appropriate steps, and obtain the solutions of the UAP in the proper length range and in O(mn) time. We extend the application of DNA molecular operations and exploit their simultaneity to reduce the complexity of the computation. PMID:26512650
Learning molecular energies using localized graph kernels
Ferré, Grégoire; Haut, Terry Scot; Barros, Kipton Marcos
2017-03-21
We report that recent machine learning methods make it possible to model potential energy of atomic configurations with chemical-level accuracy (as calculated from ab initio calculations) and at speeds suitable for molecular dynamics simulation. Best performance is achieved when the known physical constraints are encoded in the machine learning models. For example, the atomic energy is invariant under global translations and rotations; it is also invariant to permutations of same-species atoms. Although simple to state, these symmetries are complicated to encode into machine learning algorithms. In this paper, we present a machine learning approach based on graph theory that naturally incorporates translation, rotation, and permutation symmetries. Specifically, we use a random walk graph kernel to measure the similarity of two adjacency matrices, each of which represents a local atomic environment. This Graph Approximated Energy (GRAPE) approach is flexible and admits many possible extensions. Finally, we benchmark a simple version of GRAPE by predicting atomization energies on a standard dataset of organic molecules.
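To see why a random walk graph kernel inherits the stated symmetries, consider the following sketch. It builds a distance-weighted adjacency matrix for a local environment and evaluates a geometric random-walk kernel on the direct-product graph. The Gaussian edge weights, the `decay` value and the function names are assumptions made for illustration, not the GRAPE definitions.

```python
import numpy as np

def adjacency_from_positions(pos, width=1.0):
    """Weighted adjacency that depends only on interatomic distances (assumed form)."""
    d2 = np.sum((pos[:, None, :] - pos[None, :, :])**2, axis=-1)
    adj = np.exp(-d2 / (2.0 * width**2))
    np.fill_diagonal(adj, 0.0)                  # no self-loops
    return adj

def random_walk_kernel(adj_a, adj_b, decay=0.05):
    """Geometric random-walk kernel between two adjacency matrices.

    Sums decay**n times the weight of common walks of length n over all n,
    in closed form on the direct-product graph. The series converges only
    when decay * spectral_radius(kron(adj_a, adj_b)) < 1.
    """
    product = np.kron(adj_a, adj_b)
    ones = np.ones(product.shape[0])
    return ones @ np.linalg.solve(np.eye(product.shape[0]) - decay * product, ones)

rng = np.random.default_rng(6)
env_a = rng.normal(size=(4, 3))                 # hypothetical 4-atom environment
env_b = rng.normal(size=(3, 3))                 # hypothetical 3-atom environment
similarity = random_walk_kernel(adjacency_from_positions(env_a),
                                adjacency_from_positions(env_b))
```

Distances are unchanged by translations and rotations, and relabelling atoms only permutes rows and columns of the product matrix, so the summed walk weights stay the same in all three cases.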
DOE Office of Scientific and Technical Information (OSTI.GOV)
Endres, Florian, E-mail: florian.endres@ltm.uni-erlangen.de; Steinmann, Paul, E-mail: paul.steinmann@ltm.uni-erlangen.de
2016-01-14
Ferroelectric functional materials are of great interest in science and technology due to their electromechanically coupled material properties. Therefore, ferroelectrics, such as barium titanate, are modeled and simulated at the continuum scale as well as at the atomistic scale. Due to recent advancements in related manufacturing technologies the modeling and simulation of smart materials at the nanometer length scale is getting more important not only to predict but also fundamentally understand the complex material behavior of such materials. In this study, we analyze the size effects of 109° nanodomain walls in ferroelectric barium titanate single crystals in the rhombohedral phase using a recently proposed extended molecular statics algorithm. We study the impact of domain thicknesses on the spontaneous polarization, the coercive field, and the lattice constants. Moreover, we discuss how the electromechanical coupling of an applied electric field and the introduced strain in the converse piezoelectric effect is affected by the thickness of nanodomains.
NASA Astrophysics Data System (ADS)
Davies, Michael; Ganapathysubramanian, Baskar; Balasubramanian, Ganesh
2017-03-01
We present results from a computational framework integrating genetic algorithm and molecular dynamics simulations to systematically design isotope engineered graphene structures for reduced thermal conductivity. In addition to the effect of mass disorder, our results reveal the importance of atomic distribution on thermal conductivity for the same isotopic concentration. Distinct groups of isotope-substituted graphene sheets are identified based on the atomic composition and distribution. Our results show that in structures with equiatomic compositions, the enhanced scattering by lattice vibrations results in lower thermal conductivities due to the absence of isotopic clusters.
Using collective variables to drive molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Fiorin, Giacomo; Klein, Michael L.; Hénin, Jérôme
2013-12-01
A software framework is introduced that facilitates the application of biasing algorithms to collective variables of the type commonly employed to drive massively parallel molecular dynamics (MD) simulations. The modular framework that is presented enables one to combine existing collective variables into new ones, and combine any chosen collective variable with available biasing methods. The latter include the classic time-dependent biases referred to as steered MD and targeted MD, the temperature-accelerated MD algorithm, as well as the adaptive free-energy biases called metadynamics and adaptive biasing force. The present modular software is extensible, and portable between commonly used MD simulation engines.
Correlating Free-Volume Hole Distribution to the Glass Transition Temperature of Epoxy Polymers.
Aramoon, Amin; Breitzman, Timothy D; Woodward, Christopher; El-Awady, Jaafar A
2017-09-07
A new algorithm is developed to quantify the free-volume hole distribution and its evolution in coarse-grained molecular dynamics simulations of polymeric networks. This is achieved by analyzing the geometry of the network rather than a voxelized image of the structure to accurately and efficiently find and quantify free-volume hole distributions within large scale simulations of polymer networks. The free-volume holes are quantified by fitting the largest ellipsoids and spheres in the free volumes between polymer chains. The free-volume hole distributions calculated from this algorithm are shown to be in excellent agreement with those measured from positron annihilation lifetime spectroscopy (PALS) experiments at different temperatures and pressures. Based on the results predicted using this algorithm, an evolution model is proposed for the thermal behavior of an individual free-volume hole. This model is calibrated such that the average radius of free-volume holes mimics the one predicted from the simulations. The model is then employed to predict the glass-transition temperature of epoxy polymers with different degrees of cross-linking and lengths of prepolymers. Comparison between the predicted glass-transition temperatures and those measured from simulations or experiments implies that this model is capable of successfully predicting the glass-transition temperature of the material using only a PDF of the initial free-volume hole radii of each microstructure. This provides an effective approach for the optimized design of polymeric systems on the basis of the glass-transition temperature, degree of cross-linking, and average length of prepolymers.
Hybrid stochastic simplifications for multiscale gene networks.
Crudu, Alina; Debussche, Arnaud; Radulescu, Ovidiu
2009-09-07
Stochastic simulation of gene networks by Markov processes has important applications in molecular biology. The complexity of exact simulation algorithms scales with the number of discrete jumps to be performed. Approximate schemes reduce the computational time by reducing the number of simulated discrete events. Also, answering important questions about the relation between network topology and intrinsic noise generation and propagation should be based on general mathematical results. These general results are difficult to obtain for exact models. We propose a unified framework for hybrid simplifications of Markov models of multiscale stochastic gene networks dynamics. We discuss several possible hybrid simplifications, and provide algorithms to obtain them from pure jump processes. In hybrid simplifications, some components are discrete and evolve by jumps, while other components are continuous. Hybrid simplifications are obtained by partial Kramers-Moyal expansion [1-3] which is equivalent to the application of the central limit theorem to a sub-model. By averaging and variable aggregation we drastically reduce simulation time and eliminate non-critical reactions. Hybrid and averaged simplifications can be used for more effective simulation algorithms and for obtaining general design principles relating noise to topology and time scales. The simplified models reproduce with good accuracy the stochastic properties of the gene networks, including waiting times in intermittence phenomena, fluctuation amplitudes and stationary distributions. The methods are illustrated on several gene network examples. Hybrid simplifications can be used for onion-like (multi-layered) approaches to multi-scale biochemical systems, in which various descriptions are used at various scales. Sets of discrete and continuous variables are treated with different methods and are coupled together in a physically justified approach.
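The statement that the cost of exact simulation scales with the number of discrete jumps is easy to see in a direct-method stochastic simulation (Gillespie-type) loop, where every reaction event costs one iteration. The sketch below simulates a single gene product with constant production and first-order degradation. It shows only the exact baseline, not the hybrid simplifications, and the rate values are assumptions.

```python
import numpy as np

def gillespie_birth_death(k_prod, k_deg, t_end, rng, n0=0):
    """Exact SSA for 0 -> X (rate k_prod) and X -> 0 (rate k_deg * n).

    Returns the time-averaged copy number over [0, t_end].
    """
    t, n, integral = 0.0, n0, 0.0
    while True:
        total_rate = k_prod + k_deg * n
        wait = rng.exponential(1.0 / total_rate)    # time to the next jump
        if t + wait >= t_end:
            integral += n * (t_end - t)
            return integral / t_end
        integral += n * wait
        t += wait
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * total_rate < k_prod:
            n += 1
        else:
            n -= 1

rng = np.random.default_rng(2)
mean_copies = gillespie_birth_death(k_prod=10.0, k_deg=1.0, t_end=2000.0, rng=rng)
# The stationary mean of this process is k_prod / k_deg.
```

Raising both rates by the same factor leaves the stationary distribution unchanged but multiplies the number of loop iterations, which is the cost that approximate and hybrid schemes aim to reduce.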
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Zhihui; Ma, Qiang; Wu, Junlin
2014-12-09
Based on the Gas-Kinetic Unified Algorithm (GKUA) directly solving the Boltzmann model equation, the effect of rotational non-equilibrium is investigated by resorting to the kinetic Rykov model with the relaxation property of rotational degrees of freedom. The spin movement of the diatomic molecule is described by its moment of inertia, and the conservation of total angular momentum is taken as a new Boltzmann collision invariant. The molecular velocity distribution function is integrated by the weight factor on the internal energy, and the closed system of two kinetic controlling equations is obtained with inelastic and elastic collisions. The optimization selection technique of discrete velocity ordinate points and numerical quadrature rules for macroscopic flow variables with dynamic updating evolvement are developed to simulate hypersonic flows, and the gas-kinetic numerical scheme is constructed to capture the time evolution of the discretized velocity distribution functions. The gas-kinetic boundary conditions in thermodynamic non-equilibrium and numerical procedures are studied and implemented by directly acting on the velocity distribution function, and then the unified algorithm of the Boltzmann model equation involving the non-equilibrium effect is presented for the whole range of flow regimes. The hypersonic flows involving the non-equilibrium effect are numerically simulated, including the inner flows of shock wave structures in nitrogen with different Mach numbers of 1.5 ≤ Ma ≤ 25, the planar ramp flow with the whole range of Knudsen numbers of 0.0009 ≤ Kn ≤ 10, and the three-dimensional re-entering flows around a tine double-cone body.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Levine, Benjamin G., E-mail: ben.levine@temple.ed; Stone, John E., E-mail: johns@ks.uiuc.ed; Kohlmeyer, Axel, E-mail: akohlmey@temple.ed
2011-05-01
The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU's memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 s per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis.
Stone, John E.; Kohlmeyer, Axel
2011-01-01
The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU’s memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 seconds per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis. PMID:21547007
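The histogramming step that dominates the cost can be written compactly on the CPU. The sketch below is a plain NumPy reference for a single frame in a cubic periodic box, not the tiled GPU implementation in VMD; the function name, bin count and test system are assumptions.

```python
import numpy as np

def radial_distribution(pos, box, n_bins=25):
    """g(r) for points in a cubic periodic box, for r up to box / 2."""
    n = len(pos)
    r_max = box / 2.0
    i, j = np.triu_indices(n, k=1)                  # each unordered pair once
    delta = pos[i] - pos[j]
    delta -= box * np.round(delta / box)            # minimum-image convention
    dist = np.linalg.norm(delta, axis=1)
    counts, edges = np.histogram(dist, bins=n_bins, range=(0.0, r_max))
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:]**3 - edges[:-1]**3)
    # Pair counts expected per shell for uncorrelated particles at the same density.
    ideal = 0.5 * n * (n - 1) * shell_vol / box**3
    centres = 0.5 * (edges[1:] + edges[:-1])
    return centres, counts / ideal

rng = np.random.default_rng(3)
pos = rng.uniform(0.0, 1.0, size=(400, 3))          # uncorrelated points, so g(r) is near 1
r, g = radial_distribution(pos, box=1.0)
```

The number of pair distances grows quadratically with the number of atoms, which is why selections of a million atoms call for the parallel histogramming described in the abstract.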
DOE Office of Scientific and Technical Information (OSTI.GOV)
Du, Qiang
The rational design of materials, the development of accurate and efficient material simulation algorithms, and the determination of the response of materials to environments and loads occurring in practice all require an understanding of mechanics at disparate spatial and temporal scales. The project addresses mathematical and numerical analyses for material problems for which relevant scales range from those usually treated by molecular dynamics all the way up to those most often treated by classical elasticity. The prevalent approach towards developing a multiscale material model couples two or more well known models, e.g., molecular dynamics and classical elasticity, each of which is useful at a different scale, creating a multiscale multi-model. However, the challenges behind such a coupling are formidable and largely arise because the atomistic and continuum models employ nonlocal and local models of force, respectively. The project focuses on a multiscale analysis of the peridynamics materials model. Peridynamics can be used as a transition between molecular dynamics and classical elasticity so that the difficulties encountered when directly coupling those two models are mitigated. In addition, in some situations, peridynamics can be used all by itself as a material model that accurately and efficiently captures the behavior of materials over a wide range of spatial and temporal scales. Peridynamics is well suited to these purposes because it employs a nonlocal model of force, analogous to that of molecular dynamics; furthermore, at sufficiently large length scales and assuming smooth deformation, peridynamics can be approximated by classical elasticity. The project will extend the emerging mathematical and numerical analysis of peridynamics.
One goal is to develop a peridynamics-enabled multiscale multi-model that potentially provides a new and more extensive mathematical basis for coupling classical elasticity and molecular dynamics, thus enabling next generation atomistic-to-continuum multiscale simulations. In addition, a rigorous study of finite element discretizations of peridynamics will be considered. Using the fact that peridynamics is spatially derivative free, we will also characterize the space of admissible peridynamic solutions and carry out systematic analyses of the models, in particular rigorously showing how peridynamics encompasses fracture and other failure phenomena. Additional aspects of the project include the mathematical and numerical analysis of peridynamics applied to stochastic peridynamics models. In summary, the project will make feasible mathematically consistent multiscale models for the analysis and design of advanced materials.
Shang, Shang; Bai, Jing; Song, Xiaolei; Wang, Hongkai; Lau, Jaclyn
2007-01-01
The conjugate gradient method has been verified to be efficient for nonlinear optimization problems with large-dimension data. In this paper, a penalized linear and nonlinear combined conjugate gradient method for the reconstruction of fluorescence molecular tomography (FMT) is presented. The algorithm combines the linear conjugate gradient method and the nonlinear conjugate gradient method on the basis of a restart strategy, in order to take advantage of both kinds of conjugate gradient methods and compensate for their disadvantages. A quadratic penalty method is adopted to impose a nonnegativity constraint and reduce the ill-posedness of the problem. Simulation studies show that the presented algorithm is accurate, stable, and fast. It performs better than the conventional conjugate gradient-based reconstruction algorithms. It offers an effective approach to reconstruct fluorochrome information for FMT.
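For reference, the linear building block of such a scheme is the standard conjugate gradient iteration for a symmetric positive-definite system. The sketch below shows only that textbook component, applied to a regularised least-squares problem. The combined linear and nonlinear method, the restart strategy and the nonnegativity penalty of the paper are not reproduced, and the matrix `jac`, the data `y` and the weight `lam` are hypothetical.

```python
import numpy as np

def conjugate_gradient(a, b, tol=1e-10, max_iter=None):
    """Solve a @ x = b for a symmetric positive-definite matrix a."""
    x = np.zeros_like(b, dtype=float)
    r = b - a @ x                    # residual
    p = r.copy()                     # search direction
    rs_old = r @ r
    for _ in range(max_iter or 10 * len(b)):
        if np.sqrt(rs_old) < tol:
            break
        ap = a @ p
        alpha = rs_old / (p @ ap)    # exact line search along p
        x += alpha * p
        r -= alpha * ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

rng = np.random.default_rng(8)
jac = rng.normal(size=(40, 20))      # hypothetical forward (sensitivity) matrix
y = rng.normal(size=40)              # hypothetical measurements
lam = 0.1                            # assumed quadratic regularisation weight
normal_matrix = jac.T @ jac + lam * np.eye(20)
x_rec = conjugate_gradient(normal_matrix, jac.T @ y)
```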
Function-Based Algorithms for Biological Sequences
ERIC Educational Resources Information Center
Mohanty, Pragyan Sheela P.
2015-01-01
Two problems at two different abstraction levels of computational biology are studied. At the molecular level, efficient pattern matching algorithms in DNA sequences are presented. For gene order data, an efficient data structure is presented capable of storing all gene re-orderings in a systematic manner. A common characteristic of presented…
Free Energy Perturbation Calculations of the Thermodynamics of Protein Side-Chain Mutations.
Steinbrecher, Thomas; Abel, Robert; Clark, Anthony; Friesner, Richard
2017-04-07
Protein side-chain mutation is fundamental both to natural evolutionary processes and to the engineering of protein therapeutics, which constitute an increasing fraction of important medications. Molecular simulation enables the prediction of the effects of mutation on properties such as binding affinity, secondary and tertiary structure, conformational dynamics, and thermal stability. A number of widely differing approaches have been applied to these predictions, including sequence-based algorithms, knowledge-based potential functions, and all-atom molecular mechanics calculations. Free energy perturbation theory, employing all-atom and explicit-solvent molecular dynamics simulations, is a rigorous physics-based approach for calculating thermodynamic effects of, for example, protein side-chain mutations. Over the past several years, we have initiated an investigation of the ability of our most recent free energy perturbation methodology to model the thermodynamics of protein mutation for two specific problems: protein-protein binding affinities and protein thermal stability. We highlight recent advances in the field and outline current and future challenges.
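The core identity of free energy perturbation is the exponential-averaging (Zwanzig) formula, dF = -kT ln <exp(-dU/kT)>_0, where the average is taken over configurations sampled in the reference state. The toy sketch below checks it on two one-dimensional harmonic wells, for which the exact answer is (kT/2) ln(k1/k0). It is a generic textbook illustration and not the authors' methodology; the spring constants and sample size are assumptions.

```python
import numpy as np

def fep_free_energy(delta_u, kT=1.0):
    """Zwanzig estimator: dF = -kT * ln < exp(-dU / kT) >_0."""
    x = -np.asarray(delta_u, dtype=float) / kT
    shift = x.max()                                  # log-sum-exp trick for stability
    return -kT * (shift + np.log(np.mean(np.exp(x - shift))))

rng = np.random.default_rng(4)
k0, k1, kT = 1.0, 2.0, 1.0
# Boltzmann samples of the reference state U0 = 0.5 * k0 * x**2.
samples = rng.normal(0.0, np.sqrt(kT / k0), size=200_000)
delta_u = 0.5 * (k1 - k0) * samples**2               # U1 - U0 at each sample
estimate = fep_free_energy(delta_u, kT)
exact = 0.5 * kT * np.log(k1 / k0)
```

In production calculations the transformation is usually split into many intermediate states because the estimator converges poorly when the two states overlap weakly.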
GW100: Benchmarking G0W0 for Molecular Systems.
van Setten, Michiel J; Caruso, Fabio; Sharifzadeh, Sahar; Ren, Xinguo; Scheffler, Matthias; Liu, Fang; Lischner, Johannes; Lin, Lin; Deslippe, Jack R; Louie, Steven G; Yang, Chao; Weigend, Florian; Neaton, Jeffrey B; Evers, Ferdinand; Rinke, Patrick
2015-12-08
We present the GW100 set. GW100 is a benchmark set of the ionization potentials and electron affinities of 100 molecules computed with the GW method using three independent GW codes and different GW methodologies. The quasi-particle energies of the highest-occupied molecular orbitals (HOMO) and lowest-unoccupied molecular orbitals (LUMO) are calculated for the GW100 set at the G0W0@PBE level using the software packages TURBOMOLE, FHI-aims, and BerkeleyGW. The use of these three codes allows for a quantitative comparison of the type of basis set (plane wave or local orbital) and handling of unoccupied states, the treatment of core and valence electrons (all electron or pseudopotentials), the treatment of the frequency dependence of the self-energy (full frequency or more approximate plasmon-pole models), and the algorithm for solving the quasi-particle equation. Primary results include reference values for future benchmarks, best practices for convergence within a particular approach, and average error bars for the most common approximations.
Molecular Simulations of The Formation of Gold-Molecule-Gold Junctions
NASA Astrophysics Data System (ADS)
Wang, Huachuan
2013-03-01
We perform classical molecular simulations by combining grand canonical Monte Carlo (GCMC) sampling with molecular dynamics (MD) simulation to explore the dynamic gold nanojunctions in an Alkenedithiol (ADT) solvent. With the aid of a simple driving-spring model, which can reasonably represent the long-range elasticity of the gold electrode, the spring forces are obtained during the dynamic stretching procedure. A specific multi-time-scale double reversible reference system propagator (double-RESPA) algorithm has been designed for the metal-organic complex in MD simulations to identify the detailed metal-molecule bonding geometry at the metal-molecule-metal interface. We investigate the variations of bonding sites of ADT molecules on gold nanojunctions at the Au (111) surface at a constant chemical potential. Simulation results show that an Au-ADT-Au interface is formed on Au nanojunctions and that the bond-breaking intersection is at the 1-1 bond of the monatomic chain of the cross-section, instead of at the Au-S bond. The breaking force is around 1.5 nN. These results are consistent with the experimental measurements.
Classical Molecular Dynamics with Mobile Protons.
Lazaridis, Themis; Hummer, Gerhard
2017-11-27
An important limitation of standard classical molecular dynamics simulations is the inability to make or break chemical bonds. This severely restricts our ability to study processes that involve even the simplest of chemical reactions, the transfer of a proton. Existing approaches for allowing proton transfer in the context of classical mechanics are rather cumbersome and have not achieved widespread use and routine status. Here we reconsider the combination of molecular dynamics with periodic stochastic proton hops. To ensure computational efficiency, we propose a non-Boltzmann acceptance criterion that is heuristically adjusted to maintain the correct or desirable thermodynamic equilibria between different protonation states and proton transfer rates. Parameters are proposed for hydronium, Asp, Glu, and His. The algorithm is implemented in the program CHARMM and tested on proton diffusion in bulk water and carbon nanotubes and on proton conductance in the gramicidin A channel. Using hopping parameters determined from proton diffusion in bulk water, the model reproduces the enhanced proton diffusivity in carbon nanotubes and gives a reasonable estimate of the proton conductance in gramicidin A.
Pasta Nucleosynthesis: Molecular dynamics simulations of nuclear statistical equilibrium
NASA Astrophysics Data System (ADS)
Caplan, Matthew; Horowitz, Charles; da Silva Schneider, Andre; Berry, Donald
2014-09-01
We simulate the decompression of cold dense nuclear matter, near the nuclear saturation density, in order to study the role of nuclear pasta in r-process nucleosynthesis in neutron star mergers. Our simulations are performed using a classical molecular dynamics model with 51 200 and 409 600 nucleons, and are run on GPUs. We expand our simulation region to decompress systems from initial densities of 0.080 fm^-3 down to 0.00125 fm^-3. We study proton fractions of Y_P = 0.05, 0.10, 0.20, 0.30, and 0.40 at T = 0.5, 0.75, and 1 MeV. We calculate the composition of the resulting systems using a cluster algorithm. This composition is in good agreement with nuclear statistical equilibrium models for temperatures of 0.75 and 1 MeV. However, for proton fractions greater than Y_P = 0.2 at a temperature of T = 0.5 MeV, the MD simulations produce non-equilibrium results with large rod-like nuclei. Our MD model is valid at higher densities than simple nuclear statistical equilibrium models and may help determine the initial temperatures and proton fractions of matter ejected in mergers.
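The abstract does not say which cluster algorithm was used. As a rough illustration only, one common choice is distance-cutoff ("friends-of-friends") clustering, sketched below with a union-find structure. The function name, the cutoff value, and the neglect of periodic boundaries are all assumptions made for this sketch, not details taken from the paper.

```python
import math


def find_clusters(positions, cutoff):
    """Group points transitively: two points share a cluster if a chain of
    neighbours, each closer than `cutoff`, connects them (union-find)."""
    parent = list(range(len(positions)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(N^2) pair loop; fine for a sketch, too slow for 409 600 nucleons.
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(positions)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical coordinates (arbitrary units), open boundaries assumed.
points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (10.5, 0, 0)]
print(find_clusters(points, cutoff=1.5))
```

A production version would typically use cell lists and minimum-image distances; the cluster sizes and proton counts would then give the composition that is compared with nuclear statistical equilibrium.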
Zhu, Dianwen; Li, Changqing
2014-12-01
Fluorescence molecular tomography (FMT) is a promising imaging modality and has been actively studied in the past two decades since it can locate the specific tumor position three-dimensionally in small animals. However, it remains a challenging task to obtain fast, robust and accurate reconstruction of fluorescent probe distribution in small animals due to the large computational burden, the noisy measurement and the ill-posed nature of the inverse problem. In this paper we propose a nonuniform preconditioning method in combination with L1 regularization and ordered subsets technique (NUMOS) to take care of the different updating needs at different pixels, to enhance sparsity and suppress noise, and to further boost convergence of approximate solutions for fluorescence molecular tomography. Using both simulated data and a phantom experiment, we found that the proposed nonuniform updating method outperforms its popular uniform counterpart by obtaining a more localized, less noisy, more accurate image. The computational cost was greatly reduced as well. The ordered subset (OS) technique provided additional 5-fold and 3-fold speed enhancements for simulation and phantom experiments, respectively, without degrading image quality. When compared with popular L1 algorithms such as the iterative soft-thresholding algorithm (ISTA) and the fast iterative soft-thresholding algorithm (FISTA), NUMOS also outperforms them by obtaining a better image in a much shorter period of time.
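For orientation, the baseline that NUMOS is compared against can be sketched in a few lines. The block below is plain ISTA for the L1-regularized least-squares problem min 0.5*||Ax - b||^2 + lam*||x||_1, with a single uniform step size; it is not the NUMOS method, and the function names and the toy system are assumptions made for illustration. The step size is assumed to be no larger than 1/L, where L is the largest eigenvalue of A^T A.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t (proximal operator of t*|x|)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def ista(A, b, lam, step, n_iter=500):
    """Iterative soft-thresholding with one uniform step for all unknowns."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual r = A x - b.
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        # Gradient of the smooth data term: g = A^T r.
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by shrinkage, which promotes sparsity.
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x


# Toy system (identity matrix), chosen so the answer can be checked by hand:
# each entry of b is simply shrunk toward zero by lam.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [3.0, 0.5, -2.0]
print(ista(A, b, lam=1.0, step=1.0))
```

A nonuniform scheme in the spirit of the abstract would replace the scalar `step` with a per-pixel weight; how NUMOS chooses those weights is not stated here, so it is left out.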
The threshold algorithm: Description of the methodology and new developments
NASA Astrophysics Data System (ADS)
Neelamraju, Sridhar; Oligschleger, Christina; Schön, J. Christian
2017-10-01
Understanding the dynamics of complex systems requires the investigation of their energy landscape. In particular, the flow of probability on such landscapes is a central feature in visualizing the time evolution of complex systems. To obtain such flows, and the concomitant stable states of the systems and the generalized barriers among them, the threshold algorithm has been developed. Here, we describe the methodology of this approach starting from the fundamental concepts in complex energy landscapes and present recent new developments, the threshold-minimization algorithm and the molecular dynamics threshold algorithm. For applications of these new algorithms, we draw on landscape studies of three disaccharide molecules: lactose, maltose, and sucrose.
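The core idea, a random walk that accepts every move staying below an energy lid, with quenches to identify the minima reachable under that lid, can be illustrated on a toy landscape. The sketch below uses a made-up one-dimensional grid of energies and hypothetical function names; it is a minimal illustration of the concept, not the authors' implementation.

```python
import random


def quench(energy, i):
    """Steepest descent on a 1D grid: move to the lowest neighbour until
    no neighbour is lower than the current site."""
    while True:
        best = i
        for j in (i - 1, i + 1):
            if 0 <= j < len(energy) and energy[j] < energy[best]:
                best = j
        if best == i:
            return i
        i = best


def threshold_run(energy, start, lid, n_steps, rng):
    """Random walk below the lid: every move with energy <= lid is accepted,
    regardless of whether it goes uphill or downhill. Each visited site is
    quenched to record which minima are reachable under this lid."""
    pos = start
    minima = {quench(energy, pos)}
    for _ in range(n_steps):
        trial = pos + rng.choice((-1, 1))
        if 0 <= trial < len(energy) and energy[trial] <= lid:
            pos = trial
            minima.add(quench(energy, pos))
    return minima


# Made-up landscape: minima at sites 2 and 6, separated by a barrier of 2.5.
ENERGY = [3.0, 1.0, 0.0, 1.0, 2.5, 1.2, 0.5, 1.5, 3.0]
for lid in (2.0, 2.7):
    found = threshold_run(ENERGY, start=2, lid=lid, n_steps=2000,
                          rng=random.Random(0))
    print(lid, sorted(found))
```

Raising the lid in steps and noting when a new minimum first appears gives an estimate of the barrier between basins, here between 2.0 and 2.7 for the toy landscape.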
Ngo, Trieu-Du; Tran, Thanh-Dao; Le, Minh-Tri; Thai, Khac-Minh
2016-11-01
The human P-glycoprotein (P-gp) efflux pump is of great interest for medicinal chemists because of its important role in multidrug resistance (MDR). Because of the high polyspecificity as well as the unavailability of high-resolution X-ray crystal structures of this transmembrane protein, ligand-based and structure-based approaches (machine learning, homology modeling, and molecular docking) were combined for this study. In the ligand-based approach, individual two-dimensional quantitative structure-activity relationship models were developed using different machine learning algorithms and subsequently combined into the Ensemble model, which showed good performance on both the diverse training set and the validation sets. The applicability domain and the prediction quality of the developed models were also judged using state-of-the-art methods and tools. In the structure-based approach, the P-gp structure and its binding region were predicted for a docking study to determine possible interactions between the ligands and the receptor. Based on these in silico tools, hit compounds for reversing MDR were discovered from the in-house and DrugBank databases through virtual screening using prediction models and molecular docking in an attempt to restore cancer cell sensitivity to cytotoxic drugs.
Method of identifying hairpin DNA probes by partial fold analysis
Miller, Benjamin L [Penfield, NY]; Strohsahl, Christopher M [Saugerties, NY]
2009-10-06
Method of identifying molecular beacons in which a secondary structure prediction algorithm is employed to identify oligonucleotide sequences within a target gene having the requisite hairpin structure. Isolated oligonucleotides, molecular beacons prepared from those oligonucleotides, and their use are also disclosed.
Method of identifying hairpin DNA probes by partial fold analysis
Miller, Benjamin L.; Strohsahl, Christopher M.
2008-10-28
Methods of identifying molecular beacons in which a secondary structure prediction algorithm is employed to identify oligonucleotide sequences within a target gene having the requisite hairpin structure. Isolated oligonucleotides, molecular beacons prepared from those oligonucleotides, and their use are also disclosed.
Cabreira, Verónica; Pinto, Carla; Pinheiro, Manuela; Lopes, Paula; Peixoto, Ana; Santos, Catarina; Veiga, Isabel; Rocha, Patrícia; Pinto, Pedro; Henrique, Rui; Teixeira, Manuel R
2017-01-01
Lynch syndrome (LS) accounts for up to 4 % of all colorectal cancers (CRC). Detection of a pathogenic germline mutation in one of the mismatch repair genes is the definitive criterion for LS diagnosis, but it is time-consuming and expensive. Immunohistochemistry is the most sensitive prescreening test and its predictive value is very high for loss of expression of MSH2, MSH6, and (isolated) PMS2, but not for MLH1. We evaluated whether LS predictive models have a role in improving the molecular testing algorithm in this specific setting by studying 38 individuals who were referred for molecular testing and subsequently shown to have loss of MLH1 immunoexpression in their tumors. For each proband we calculated a risk score, which represents the probability that the patient with CRC carries a pathogenic MLH1 germline mutation, using the PREMM 1,2,6 and MMRpro predictive models. Of the 38 individuals, 18.4 % had a pathogenic MLH1 germline mutation. MMRpro performed better for the purpose of this study, presenting an AUC of 0.83 (95 % CI 0.67-0.9; P < 0.001) compared with an AUC of 0.68 (95 % CI 0.51-0.82; P = 0.09) for PREMM 1,2,6. Considering a threshold of 5 %, MMRpro would eliminate unnecessary germline mutation analysis in a significant proportion of cases while keeping very high sensitivity. We conclude that MMRpro is useful to correctly predict who should be screened for a germline MLH1 gene mutation and propose an algorithm to improve the cost-effectiveness of LS diagnosis.
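To make the two quantities in this abstract concrete, the sketch below computes an AUC as the probability that a randomly chosen carrier receives a higher risk score than a randomly chosen non-carrier, and applies a 5 % referral threshold. The risk scores are invented for illustration and are not data from the study; the function names are likewise hypothetical.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison: the fraction of
    (carrier, non-carrier) pairs in which the carrier scores higher,
    counting ties as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def refer_for_testing(scores, threshold=0.05):
    """Flag probands whose predicted carrier probability meets the threshold."""
    return [s >= threshold for s in scores]


# Invented risk scores (probabilities), for illustration only.
carriers = [0.90, 0.40]
non_carriers = [0.50, 0.10, 0.03]
print(round(auc(carriers, non_carriers), 3))
print(refer_for_testing(carriers + non_carriers))
```

With these made-up numbers, the non-carrier scoring 0.03 would be spared germline analysis while both carriers are still referred, which is the kind of trade-off the 5 % threshold is meant to capture.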
Salient aspects of PBP2A-inhibition; A QSAR Study.
Ogunleye, Adewale J; Eniafe, Gabriel O; Inyang, Olumide K; Adewumi, Benjamin; Omotuyi, Olaposi I
2018-05-15
Background: Inhibition of penicillin binding protein 2A (PBP2A) represents a sound drug design strategy in combatting methicillin-resistant Staphylococcus aureus (MRSA). Considering the urgent need for effective antimicrobials in combatting MRSA infections, we have developed a statistically robust ensemble of molecular descriptors (1, 2, & 3-D) from compounds targeting PBP2A in vivo. 37 (training set: 26, test set: 11) PBP2A-inhibitors were submitted for descriptor generation, after which an unsupervised, non-exhaustive genetic algorithm (GA) was deployed for fishing out the best descriptor subset. Assignment of descriptors to a regression model was accomplished with the Partial Least Squares (PLS) algorithm. At the end, an ensemble of 30 descriptors accurately predicted the ligand bioactivity, IC50 (R = 0.9996, R2 = 0.9992, R2a = 0.9949, SEE = 0.2297, Q2LOO = 0.9741). Inferentially, we noticed that the overall efficacy of this model greatly depends on atomic polarizability and negative charge (electron) density. Besides the formula derived, the high dimensional model also offers critical insights into salient cheminformatics parameters to note during hit-to-lead PBP2A-antagonist optimization.
NASA Astrophysics Data System (ADS)
Davis, Anthony B.; Kalashnikova, Olga V.; Diner, David J.; Garay, Michael J.; Lyapustin, Alexei I.; Korkin, Sergey V.; Martonchik, John V.; Natraj, Vijay; Sanghavi, Suniti V.; Xu, Feng; Zhai, Pengwang; Rozanov, Vladimir V.; Kokhanovsky, Alexander A.
2014-05-01
Quantification and characterization of the omnipresent atmospheric aerosol by remote sensing methods is key to answering many challenging questions in atmospheric science, in climate modeling and in air quality monitoring foremost. In recent years, accurate measurement of the state of polarization of photon fluxes at optical sensors in the visible and near-IR spectrum has been hailed as a very promising approach to aerosol remote sensing. Consequently, there has been a flurry of activity in polarized or 'vector' radiative transfer (vRT) model development. This covers the multiple scattering and ground reflection aspects of sensor signal prediction that complement single-particle scattering computation, and lies at the core of all physics-based retrieval algorithms. One can legitimately ask: What level of model fidelity (representativeness of natural scenes) and what computational accuracy should be achieved for this task in view of the practical constraints that apply? These constraints are, at a minimum: (i) the desired accuracy of the retrieved aerosol properties, (ii) observational uncertainties, and (iii) operational efficiency requirements as determined by throughput. We offer a rational and balanced approach to address these questions and illustrate it with a systematic inter-comparison of the performance of a diverse set of 1D vRT models using a small but representative set of test cases. This 'JPL' benchmarking suite of cases is naturally divided into two parts. First the emphasis is on stratified atmospheres with a continuous mixture of molecular and aerosol scattering and absorption over a black surface, with the corresponding pure cases treated for diagnostic purposes. Then the emphasis shifts to the variety of surfaces, both polarizing and not, that can be encountered in real observations and may confuse the aerosol retrieval algorithm if not properly treated.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Biyikli, Emre; To, Albert C., E-mail: albertto@pitt.edu
Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called the Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the software implemented along with the associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and measurement-based dynamic balancing of parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. Speed-up of the MMM method is shown to be directly proportional to the reduction of the number of the atoms visited in force computation. Finally, an adaptive MMM analysis on a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3–8.5 times in efficiency, over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.
NASA Astrophysics Data System (ADS)
Cubillos, Patricio; Harrington, Joseph; Blecic, Jasmina; Stemm, Madison M.; Lust, Nate B.; Foster, Andrew S.; Rojo, Patricio M.; Loredo, Thomas J.
2014-11-01
Multi-wavelength secondary-eclipse and transit depths probe the thermo-chemical properties of exoplanets. In recent years, several research groups have developed retrieval codes to analyze the existing data and study the prospects of future facilities. However, the scientific community has limited access to these packages. Here we premiere the open-source Bayesian Atmospheric Radiative Transfer (BART) code. We discuss the key aspects of the radiative-transfer algorithm and the statistical package. The radiation code includes line databases for all HITRAN molecules, high-temperature H2O, TiO, and VO, and includes a preprocessor for adding line databases without recompiling the radiation code. Collision-induced absorption lines are available for H2-H2 and H2-He. The parameterized thermal and molecular abundance profiles can be modified arbitrarily without recompilation. The generated spectra are integrated over arbitrary bandpasses for comparison to data. BART's statistical package, Multi-core Markov-chain Monte Carlo (MC3), is a general-purpose MCMC module. MC3 implements the Differential-evolution Markov-chain Monte Carlo algorithm (ter Braak 2006, 2009). MC3 converges 20-400 times faster than the usual Metropolis-Hastings MCMC algorithm, and in addition uses the Message Passing Interface (MPI) to parallelize the MCMC chains. We apply the BART retrieval code to the HD 209458b data set to estimate the planet's temperature profile and molecular abundances. This work was supported by NASA Planetary Atmospheres grant NNX12AI69G and NASA Astrophysics Data Analysis Program grant NNX13AF38G. JB holds a NASA Earth and Space Science Fellowship.
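The differential-evolution idea behind the sampler can be sketched briefly: each chain proposes a jump along the difference of two other randomly chosen chains, then applies the usual Metropolis accept/reject test. The block below is a minimal illustration under that description, not MC3's actual interface; the function names, the noise scale, and the Gaussian test posterior are assumptions, while the jump scale 2.38/sqrt(2d) follows ter Braak (2006). At least three chains are required.

```python
import math
import random


def de_proposal(chains, i, a, b, gamma, noise):
    """Differential-evolution jump for chain i, built from chains a and b."""
    return [x + gamma * (xa - xb) + e
            for x, xa, xb, e in zip(chains[i], chains[a], chains[b], noise)]


def demc_step(chains, log_post, rng, noise_scale=1e-4):
    """Update every chain once (in place) and return the chain list."""
    dim = len(chains[0])
    gamma = 2.38 / math.sqrt(2 * dim)  # jump scale suggested by ter Braak (2006)
    for i in range(len(chains)):
        # Pick two distinct chains other than chain i.
        a, b = rng.sample([k for k in range(len(chains)) if k != i], 2)
        noise = [rng.gauss(0.0, noise_scale) for _ in range(dim)]
        trial = de_proposal(chains, i, a, b, gamma, noise)
        # Metropolis test on the log-posterior difference.
        log_ratio = log_post(trial) - log_post(chains[i])
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            chains[i] = trial
    return chains


def log_gaussian(x):
    """Hypothetical target: standard normal log-density up to a constant."""
    return -0.5 * sum(v * v for v in x)


rng = random.Random(1)
chains = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(6)]
for _ in range(200):
    demc_step(chains, log_gaussian, rng)
print(len(chains), len(chains[0]))
```

Because the jump direction and size come from the current spread of the chains, the proposal adapts to the scale and correlation of the posterior without hand tuning, which is the usual explanation for the faster convergence relative to a fixed-width Metropolis-Hastings proposal.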
Cannon, Tyrone D; Thompson, Paul M; van Erp, Theo G M; Huttunen, Matti; Lonnqvist, Jouko; Kaprio, Jaakko; Toga, Arthur W
2006-01-01
There is an urgent need to decipher the complex nature of genotype-phenotype relationships within the multiple dimensions of brain structure and function that are compromised in neuropsychiatric syndromes such as schizophrenia. Doing so requires sophisticated methodologies to represent population variability in neural traits and to probe their heritable and molecular genetic bases. We have recently developed and applied computational algorithms to map the heritability of, as well as genetic linkage and association to, neural features encoded using brain imaging in the context of three-dimensional (3D), population-based, statistical brain atlases. One set of algorithms builds on our prior work using classical twin study methods to estimate heritability by fitting biometrical models for additive genetic, unique, and common environmental influences. Another set of algorithms performs regression-based (Haseman-Elston) identical-by-descent linkage analysis and genetic association analysis of DNA polymorphisms in relation to neural traits of interest in the same 3D population-based brain atlas format. We demonstrate these approaches using samples of healthy monozygotic (MZ) and dizygotic (DZ) twin pairs, as well as MZ and DZ twin pairs discordant for schizophrenia, but the methods can be generalized to other classes of relatives and to other diseases. The results confirm prior evidence of genetic influences on gray matter density in frontal brain regions. They also provide converging evidence that the chromosome 1q42 region is relevant to schizophrenia by demonstrating linkage and association of markers of the Translin-Associated-Factor-X and Disrupted-In-Schizophrenia-1 genes with prefrontal cortical gray matter deficits in twins discordant for schizophrenia.