parallel simulator lammps: Topics by Science.gov

Sample records for parallel simulator lammps

Implementation of Shifted Periodic Boundary Conditions in the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) Software

DTIC Science & Technology

2015-08-01

Atomic/Molecular Massively Parallel Simulator (LAMMPS) Software by N Scott Weingarten and James P Larentzos Approved for...Massively Parallel Simulator (LAMMPS) Software by N Scott Weingarten Weapons and Materials Research Directorate, ARL James P Larentzos Engility...Shifted Periodic Boundary Conditions in the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) Software
Exploring the Ability of a Coarse-grained Potential to Describe the Stress-strain Response of Glassy Polystyrene

DTIC Science & Technology

2012-10-01

using the open-source code Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) (http://lammps.sandia.gov) (23). The commercial...parameters are proprietary and cannot be ported to the LAMMPS 4 simulation code. In our molecular dynamics simulations at the atomistic resolution, we...IBI iterative Boltzmann inversion LAMMPS Large-scale Atomic/Molecular Massively Parallel Simulator MAPS Materials Processes and Simulations MS
Porting LAMMPS to GPUs.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, William Michael; Plimpton, Steven James; Wang, Peng

2010-03-01

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for soft materials (biomolecules, polymers) and solid-state materials (metals, semiconductors) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial-decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.
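The spatial decomposition mentioned in this abstract can be illustrated with a minimal sketch. It assumes a rectangular box split into a regular grid of equal subdomains, one per process, with periodic wrapping of coordinates. The function name `owner_rank`, the rank ordering, and the periodic wrap are illustrative assumptions, not LAMMPS's actual implementation.

```python
def owner_rank(position, box_lengths, proc_grid):
    """Return the rank that owns `position` in a regular 3D grid of subdomains.

    Hypothetical helper for illustration only: the box starts at the origin,
    is periodic in all directions, and ranks are numbered with z varying fastest.
    """
    coords = []
    for x, length, nproc in zip(position, box_lengths, proc_grid):
        # Wrap the coordinate into [0, length) and locate its subdomain index.
        frac = (x % length) / length
        # Guard against frac == 1.0 caused by floating-point rounding.
        coords.append(min(int(frac * nproc), nproc - 1))
    ix, iy, iz = coords
    _, py, pz = proc_grid
    return (ix * py + iy) * pz + iz


# Example: a 10 x 10 x 10 box divided among 2 x 2 x 1 = 4 processes.
box = (10.0, 10.0, 10.0)
grid = (2, 2, 1)
ranks = [owner_rank(p, box, grid) for p in [(1.0, 1.0, 1.0), (6.0, 6.0, 1.0)]]
```

Each process would then compute forces only for the atoms it owns and exchange boundary atoms with neighbouring subdomains by message passing.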
Substructured multibody molecular dynamics.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grest, Gary Stephen; Stevens, Mark Jackson; Plimpton, Steven James

2006-11-01

We have enhanced our parallel molecular dynamics (MD) simulation software LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator, lammps.sandia.gov) to include many new features for accelerated simulation including articulated rigid body dynamics via coupling to the Rensselaer Polytechnic Institute code POEMS (Parallelizable Open-source Efficient Multibody Software). We use new features of the LAMMPS software package to investigate rhodopsin photoisomerization, and water model surface tension and capillary waves at the vapor-liquid interface. Finally, we motivate the recipes of MD for practitioners and researchers in numerical analysis and computational mechanics.
Modeling Nanocomposites for Molecular Dynamics (MD) Simulations

DTIC Science & Technology

2015-01-01

Parallel Simulator (LAMMPS) is used as the MD simulator [9], the coordinates must be formatted for use in LAMMPS. VMD has a set of tools (TopoTools...that can be used to generate a LAMMPS-readable format [6]. 3 Figure 4. Ethylene Monomer Produced From Coordinates in PDB and Rendered Using...where, i and j are the atom subscripts. Simulations are performed using LAMMPS simulation software. Periodic boundary conditions are
Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) Simulations of the Molecular Crystal alphaRDX

DTIC Science & Technology

2013-08-01

potential for HMX / RDX (3, 9). ...................................................................................8 1 1. Purpose This work...6 dispersion and electrostatic interactions. Constants for the SB potential are given in table 1. 8 Table 1. SB potential for HMX / RDX (3, 9...modeling dislocations in the energetic molecular crystal RDX using the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) molecular
Preparation of Entangled Polymer Melts of Various Architecture for Coarse-Grained Models

DTIC Science & Technology

2011-09-01

Simulator (LAMMPS). This report presents a theory overview and a manual how to use the method. 15. SUBJECT TERMS Ammunition, coarse-grained model...polymer builder, LAMMPS 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT UU 18. NUMBER OF PAGES 26 19a. NAME OF RESPONSIBLE PERSON...scale Atomic/Molecular Massively Parallel Simulator (LAMMPS). Gel is an in house written C program of coarse-grained polymer builder, and LAMMPS is
Large-scale molecular dynamics simulation of DNA: implementation and validation of the AMBER98 force field in LAMMPS.

PubMed

Grindon, Christina; Harris, Sarah; Evans, Tom; Novik, Keir; Coveney, Peter; Laughton, Charles

2004-07-15

Molecular modelling played a central role in the discovery of the structure of DNA by Watson and Crick. Today, such modelling is done on computers: the more powerful these computers are, the more detailed and extensive can be the study of the dynamics of such biological macromolecules. To fully harness the power of modern massively parallel computers, however, we need to develop and deploy algorithms which can exploit the structure of such hardware. The Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a scalable molecular dynamics code including long-range Coulomb interactions, which has been specifically designed to function efficiently on parallel platforms. Here we describe the implementation of the AMBER98 force field in LAMMPS and its validation for molecular dynamics investigations of DNA structure and flexibility against the benchmark of results obtained with the long-established code AMBER6 (Assisted Model Building with Energy Refinement, version 6). Extended molecular dynamics simulations on the hydrated DNA dodecamer d(CTTTTGCAAAAG)(2), which has previously been the subject of extensive dynamical analysis using AMBER6, show that it is possible to obtain excellent agreement in terms of static, dynamic and thermodynamic parameters between AMBER6 and LAMMPS. In comparison with AMBER6, LAMMPS shows greatly improved scalability in massively parallel environments, opening up the possibility of efficient simulations of order-of-magnitude larger systems and/or for order-of-magnitude greater simulation times.
Avoiding Defect Nucleation during Equilibration in Molecular Dynamics Simulations with ReaxFF

DTIC Science & Technology

2015-04-01

respectively. All simulations are performed using the LAMMPS computer code.12 2 Fig. 1 a) Initial and b) final configurations of the molecular centers...Plimpton S. Fast parallel algorithms for short-range molecular dynamics. J Comput Phys. 1995;117:1–19. (Software available at http://lammps.sandia.gov
An Overview of Mesoscale Modeling Software for Energetic Materials Research

DTIC Science & Technology

2010-03-01

12 2.9 Large-scale Atomic/Molecular Massively Parallel Simulator ( LAMMPS ...13 Table 10. LAMMPS summary...extensive reviews, lectures and workshops are available on multiscale modeling of materials applications (76-78). • Multi-phase mixtures of
GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations

NASA Astrophysics Data System (ADS)

Nguyen, Trung Dac

2017-03-01

The Tersoff potential is one of the empirical many-body potentials that has been widely used in simulation studies at atomic scales. Unlike pair-wise potentials, the Tersoff potential involves three-body terms, which require much more arithmetic operations and data dependency. In this contribution, we have implemented the GPU-accelerated version of several variants of the Tersoff potential for LAMMPS, an open-source massively parallel Molecular Dynamics code. Compared to the existing MPI implementation in LAMMPS, the GPU implementation exhibits a better scalability and offers a speedup of 2.2X when run on 1000 compute nodes on the Titan supercomputer. On a single node, the speedup ranges from 2.0 to 8.0 times, depending on the number of atoms per GPU and hardware configurations. The most notable features of our GPU-accelerated version include its design for MPI/accelerator heterogeneous parallelism, its compatibility with other functionalities in LAMMPS, its ability to give deterministic results and to support both NVIDIA CUDA- and OpenCL-enabled accelerators. Our implementation is now part of the GPU package in LAMMPS and accessible for public use.
Branched Polymers for Enhancing Polymer Gel Strength and Toughness

DTIC Science & Technology

2013-02-01

Molecular Massively Parallel Simulator (LAMMPS) program and the stress-strain relations were calculated with varying strain-rates (figure 6). A...Acronyms ARL U.S. Army Research Laboratory D3 hexamethylcyclotrisiloxane FTIR Fourier transform infrared GPC gel permeation chromatography LAMMPS
Lennard-Jones type pair-potential method for coarse-grained lipid bilayer membrane simulations in LAMMPS

NASA Astrophysics Data System (ADS)

Fu, S.-P.; Peng, Z.; Yuan, H.; Kfoury, R.; Young, Y.-N.

2017-01-01

Lipid bilayer membranes have been extensively studied by coarse-grained molecular dynamics simulations. Numerical efficiencies have been reported in the cases of aggressive coarse-graining, where several lipids are coarse-grained into a particle of size 4-6 nm so that there is only one particle in the thickness direction. Yuan et al. (2010) proposed a pair-potential between these one-particle-thick coarse-grained lipid particles to capture the mechanical properties of a lipid bilayer membrane, such as gel-fluid-gas phase transitions of lipids, diffusion, and bending rigidity. In this work we implement such an interaction potential in LAMMPS to simulate large-scale lipid systems such as a giant unilamellar vesicle (GUV) and red blood cells (RBCs). We also consider the effect of the cytoskeleton on the lipid membrane dynamics as a model for RBC dynamics, and incorporate coarse-grained water molecules to account for hydrodynamic interactions. The interaction between the coarse-grained water molecules (explicit solvent molecules) is modeled as a Lennard-Jones (L-J) potential. To demonstrate that the proposed methods do capture the observed dynamics of vesicles and RBCs, we focus on two sets of LAMMPS simulations: 1. Vesicle shape transitions with enclosed volume; 2. RBC shape transitions with different enclosed volume. Finally, utilizing the parallel computing capability in LAMMPS, we provide some timing results for parallel coarse-grained simulations to illustrate that it is possible to use LAMMPS to simulate large-scale realistic complex biological membranes for more than 1 ms.
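For reference, the Lennard-Jones interaction named in this abstract can be written as a short function. This is a minimal sketch that assumes the standard 12-6 form, E(r) = 4ε[(σ/r)^12 − (σ/r)^6], in reduced units; the abstract does not state the exact form or parameters used for the coarse-grained water, so `epsilon` and `sigma` here are placeholder values.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones pair energy at separation r.

    epsilon (well depth) and sigma (zero-crossing distance) are placeholder
    values in reduced units, not parameters taken from the paper.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# The energy is zero at r = sigma and reaches its minimum, -epsilon,
# at r = 2**(1/6) * sigma.
r_min = 2.0 ** (1.0 / 6.0)
e_min = lennard_jones(r_min)
```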
SediFoam: A general-purpose, open-source CFD-DEM solver for particle-laden flow with emphasis on sediment transport

NASA Astrophysics Data System (ADS)

Sun, Rui; Xiao, Heng

2016-04-01

With the growth of available computational resources, CFD-DEM (computational fluid dynamics-discrete element method) has become an increasingly promising and feasible approach for the study of sediment transport. Several existing CFD-DEM solvers are applied in chemical engineering and the mining industry. However, a robust CFD-DEM solver for the simulation of sediment transport is still desirable. In this work, the development of a three-dimensional, massively parallel, and open-source CFD-DEM solver SediFoam is detailed. This solver is built based on open-source solvers OpenFOAM and LAMMPS. OpenFOAM is a CFD toolbox that can perform three-dimensional fluid flow simulations on unstructured meshes; LAMMPS is a massively parallel DEM solver for molecular dynamics. Several validation tests of SediFoam are performed using cases of a wide range of complexities. The results obtained in the present simulations are consistent with those in the literature, which demonstrates the capability of SediFoam for sediment transport applications. In addition to the validation test, the parallel efficiency of SediFoam is studied to test the performance of the code for large-scale and complex simulations. The parallel efficiency tests show that the scalability of SediFoam is satisfactory in the simulations using up to O(10^7) particles.
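Parallel efficiency of the kind reported in this abstract is commonly computed from wall-clock timings at different core counts. The sketch below assumes a strong-scaling test (fixed problem size) and measures speedup and efficiency relative to the smallest core count. The timing values are made up for illustration and are not SediFoam results.

```python
def strong_scaling(timings):
    """Map each core count to (speedup, efficiency) relative to the smallest run.

    `timings` is a dict of core count -> wall-clock time for a fixed problem size.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    result = {}
    for cores, time in sorted(timings.items()):
        speedup = base_time / time
        # Ideal speedup is cores / base_cores; efficiency is the fraction achieved.
        efficiency = speedup * base_cores / cores
        result[cores] = (speedup, efficiency)
    return result


# Hypothetical timings in seconds, for illustration only.
scaling = strong_scaling({1: 100.0, 2: 50.0, 4: 40.0})
```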
Coarse-grained simulation of DNA using LAMMPS : An implementation of the oxDNA model and its applications.

PubMed

Henrich, Oliver; Gutiérrez Fosado, Yair Augusto; Curk, Tine; Ouldridge, Thomas E

2018-05-10

During the last decade coarse-grained nucleotide models have emerged that allow us to study DNA and RNA on unprecedented time and length scales. Among them is oxDNA, a coarse-grained, sequence-specific model that captures the hybridisation transition of DNA and many structural properties of single- and double-stranded DNA. oxDNA was previously only available as standalone software, but has now been implemented into the popular LAMMPS molecular dynamics code. This article describes the new implementation and analyses its parallel performance. Practical applications are presented that focus on single-stranded DNA, an area of research which has been so far under-investigated. The LAMMPS implementation of oxDNA lowers the entry barrier for using the oxDNA model significantly, facilitates future code development and interfacing with existing LAMMPS functionality as well as other coarse-grained and atomistic DNA models.
Coding coarse grained polymer model for LAMMPS and its application to polymer crystallization

NASA Astrophysics Data System (ADS)

Luo, Chuanfu; Sommer, Jens-Uwe

2009-08-01

We present a patch code for LAMMPS to implement a coarse grained (CG) model of poly(vinyl alcohol) (PVA). LAMMPS is a powerful molecular dynamics (MD) simulator developed at Sandia National Laboratories. Our patch code implements tabulated angular potential and Lennard-Jones-9-6 (LJ96) style interaction for PVA. Benefiting from the excellent parallel efficiency of LAMMPS, our patch code is suitable for large-scale simulations. This CG-PVA code is used to study polymer crystallization, which is a long-standing unsolved problem in polymer physics. By using parallel computing, cooling and heating processes for long chains are simulated. The results show that chain-folded structures resembling the lamellae of polymer crystals are formed during the cooling process. The evolution of the static structure factor during the crystallization transition indicates that long-range density order appears before local crystalline packing. This is consistent with some experimental observations by small/wide angle X-ray scattering (SAXS/WAXS). During the heating process, it is found that the crystalline regions are still growing until they are fully melted, which can be confirmed by the evolution of both the static structure factor and the average stem length formed by the chains. This two-stage behavior indicates that melting of polymer crystals is far from thermodynamic equilibrium. Our results concur with various experiments. It is the first time that such growth/reorganization behavior is clearly observed by MD simulations. Our code can be easily used to model other types of polymers by providing a file containing the tabulated angle potential data and a set of appropriate parameters.
Program summary
Program title: lammps-cgpva
Catalogue identifier: AEDE_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDE_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU's GPL
No. of lines in distributed program, including test data, etc.: 940 798
No. of bytes in distributed program, including test data, etc.: 12 536 245
Distribution format: tar.gz
Programming language: C++/MPI
Computer: Tested on Intel-x86 and AMD64 architectures. Should run on any architecture providing a C++ compiler
Operating system: Tested under Linux. Any other OS with C++ compiler and MPI library should suffice
Has the code been vectorized or parallelized?: Yes
RAM: Depends on system size and how many CPUs are used
Classification: 7.7
External routines: LAMMPS (http://lammps.sandia.gov/), FFTW (http://www.fftw.org/)
Nature of problem: Implementing special tabular angle potentials and Lennard-Jones-9-6 style interactions of a coarse grained polymer model for LAMMPS code.
Solution method: Cubic spline interpolation of input tabulated angle potential data.
Restrictions: The code is based on a former version of LAMMPS.
Unusual features: Any special angular potential can be used if it can be tabulated.
Running time: Seconds to weeks, depending on system size, speed of CPU and how many CPUs are used. The test run provided with the package takes about 5 minutes on 4 AMD Opteron (2.6 GHz) CPUs.
References: D. Reith, H. Meyer, F. Müller-Plathe, Macromolecules 34 (2001) 2335-2345. H. Meyer, F. Müller-Plathe, J. Chem. Phys. 115 (2001) 7807. H. Meyer, F. Müller-Plathe, Macromolecules 35 (2002) 1241-1252.
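The solution method listed for this program, cubic spline interpolation of tabulated angle potential data, can be sketched in a few lines. The version below assumes a natural cubic spline (zero second derivative at both ends), which may differ from the boundary conditions the patch actually uses. The function names and the angle/energy table are hypothetical.

```python
import bisect


def natural_spline_second_derivs(x, y):
    """Second derivatives of the natural cubic spline through (x[i], y[i])."""
    n = len(x)
    m = [0.0] * n
    if n < 3:
        return m
    # Tridiagonal system; first and last rows enforce zero end curvature.
    a, b, c, d = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        h0, h1 = x[i] - x[i - 1], x[i + 1] - x[i]
        a[i], b[i], c[i] = h0, 2.0 * (h0 + h1), h1
        d[i] = 6.0 * ((y[i + 1] - y[i]) / h1 - (y[i] - y[i - 1]) / h0)
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    m[n - 1] = d[n - 1] / b[n - 1]
    for i in range(n - 2, -1, -1):
        m[i] = (d[i] - c[i] * m[i + 1]) / b[i]
    return m


def spline_eval(x, y, m, t):
    """Evaluate the spline at t, clamping to the first or last interval."""
    i = max(0, min(bisect.bisect_right(x, t) - 1, len(x) - 2))
    h = x[i + 1] - x[i]
    left, right = x[i + 1] - t, t - x[i]
    return ((m[i] * left ** 3 + m[i + 1] * right ** 3) / (6.0 * h)
            + (y[i] / h - m[i] * h / 6.0) * left
            + (y[i + 1] / h - m[i + 1] * h / 6.0) * right)


# Hypothetical angle table: angles in degrees, energies in arbitrary units.
theta = [60.0, 90.0, 120.0, 150.0, 180.0]
energy = [4.0, 1.0, 0.0, 1.0, 4.0]
curv = natural_spline_second_derivs(theta, energy)
e_105 = spline_eval(theta, energy, curv, 105.0)
```

The spline reproduces the tabulated values at the table points and gives a smooth energy in between, so the angle force can be obtained by differentiating the same piecewise cubic.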
LAMMPS strong scaling performance optimization on Blue Gene/Q

DOE Office of Scientific and Technical Information (OSTI.GOV)

Coffman, Paul; Jiang, Wei; Romero, Nichols A.

2014-11-12

LAMMPS "Large-scale Atomic/Molecular Massively Parallel Simulator" is an open-source molecular dynamics package from Sandia National Laboratories. Significant performance improvements in strong-scaling and time-to-solution for this application on IBM's Blue Gene/Q have been achieved through computational optimizations of the OpenMP versions of the short-range Lennard-Jones term of the CHARMM force field and the long-range Coulombic interaction implemented with the PPPM (particle-particle-particle mesh) algorithm, enhanced by runtime parameter settings controlling thread utilization. Additionally, MPI communication performance improvements were made to the PPPM calculation by re-engineering the parallel 3D FFT to use MPICH collectives instead of point-to-point. Performance testing was done using an 8.4-million atom simulation scaling up to 16 racks on the Mira system at Argonne Leadership Computing Facility (ALCF). Speedups resulting from this effort were in some cases over 2x.
An analytical benchmark and a Mathematica program for MD codes: Testing LAMMPS on the 2nd generation Brenner potential

NASA Astrophysics Data System (ADS)

Favata, Antonino; Micheletti, Andrea; Ryu, Seunghwa; Pugno, Nicola M.

2016-10-01

An analytical benchmark and a simple consistent Mathematica program are proposed for graphene and carbon nanotubes, which may serve to test any molecular dynamics code implemented with REBO potentials. By exploiting the benchmark, we checked results produced by LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) when adopting the second-generation Brenner potential. We show that this code, in its current implementation, produces results that are offset from those of the benchmark by a significant amount, and we provide evidence of the reason.
LAMMPS integrated materials engine (LIME) for efficient automation of particle-based simulations: application to equation of state generation

NASA Astrophysics Data System (ADS)

Barnes, Brian C.; Leiter, Kenneth W.; Becker, Richard; Knap, Jaroslaw; Brennan, John K.

2017-07-01

We describe the development, accuracy, and efficiency of an automation package for molecular simulation, the large-scale atomic/molecular massively parallel simulator (LAMMPS) integrated materials engine (LIME). Heuristics and algorithms employed for equation of state (EOS) calculation using a particle-based model of a molecular crystal, hexahydro-1,3,5-trinitro-s-triazine (RDX), are described in detail. The simulation method for the particle-based model is energy-conserving dissipative particle dynamics, but the techniques used in LIME are generally applicable to molecular dynamics simulations with a variety of particle-based models. The newly created tool set is tested through use of its EOS data in plate impact and Taylor anvil impact continuum simulations of solid RDX. The coarse-grain model results from LIME provide an approach to bridge the scales from atomistic simulations to continuum simulations.
Exploiting Data Similarity to Reduce Memory Footprints

DTIC Science & Technology

2011-01-01

leslie3d Fortran Computational Fluid Dynamics (CFD) application 122.tachyon C Parallel Ray Tracing application 128.GAPgeofem C and Fortran Simulates...benefits most from SBLLmalloc; LAMMPS, which shows moderate similarity from primarily zero pages; and 122.tachyon, a parallel ray-tracing application...similarity across MPI tasks. They primarily are zero-pages although a small fraction (≈10%) are non-zero pages. 122.tachyon is an image rendering

Computational modeling of magnetic particle margination within blood flow through LAMMPS

NASA Astrophysics Data System (ADS)

Ye, Huilin; Shen, Zhiqiang; Li, Ying

2017-11-01

We develop a multiscale and multiphysics computational method to investigate the transport of magnetic particles as drug carriers in blood flow under influence of hydrodynamic interaction and external magnetic field. A hybrid coupling method is proposed to handle red blood cell (RBC)-fluid interface (CFI) and magnetic particle-fluid interface (PFI), respectively. Immersed boundary method (IBM)-based velocity coupling is used to account for CFI, which is validated by tank-treading and tumbling behaviors of a single RBC in simple shear flow. While PFI is captured by IBM-based force coupling, which is verified through movement of a single magnetic particle under non-uniform external magnetic field and breakup of a magnetic chain in rotating magnetic field. These two components are seamlessly integrated within the LAMMPS framework, which is a highly parallelized molecular dynamics solver. In addition, we also implement a parallelized lattice Boltzmann simulator within LAMMPS to handle the fluid flow simulation. Based on the proposed method, we explore the margination behaviors of magnetic particles and magnetic chains within blood flow. We find that the external magnetic field can be used to guide the motion of these magnetic materials and promote their margination to the vascular wall region. Moreover, the scaling performance and speedup test further confirm the high efficiency and robustness of proposed computational method. Therefore, it provides an efficient way to simulate the transport of nanoparticle-based drug carriers within blood flow in a large scale. The simulation results can be applied in the design of efficient drug delivery vehicles that optimally accumulate within diseased tissue, thus providing better imaging sensitivity, therapeutic efficacy and lower toxicity.
Coupling LAMMPS with Lattice Boltzmann fluid solver: theory, implementation, and applications

NASA Astrophysics Data System (ADS)

Tan, Jifu; Sinno, Talid; Diamond, Scott

2016-11-01

The study of fluid flow coupled with solids has many applications in biological and engineering problems, e.g., blood cell transport, particulate flow, and drug delivery. We present a partitioned approach to solve the coupled multiphysics problem. The fluid motion is solved by the Lattice Boltzmann method, while the solid displacement and deformation is simulated by Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS). The coupling is achieved through the immersed boundary method so that the expensive remeshing step is eliminated. The code can model both rigid and deformable solids. The code also shows very good scaling results. It was validated with classic problems such as the migration of rigid particles and an ellipsoidal particle's orbit in shear flow. Examples of the applications in blood flow, drug delivery, platelet adhesion and rupture are also given in the paper. NIH.
Thermalized Drude Oscillators with the LAMMPS Molecular Dynamics Simulator.

PubMed

Dequidt, Alain; Devémy, Julien; Pádua, Agílio A H

2016-01-25

LAMMPS is a very customizable molecular dynamics simulation software, which can be used to simulate a large diversity of systems. We introduce a new package for simulation of polarizable systems with LAMMPS using thermalized Drude oscillators. The implemented functionalities are described and are illustrated by examples. The implementation was validated by comparing simulation results with published data and using a reference software. Computational performance is also analyzed.
LAMMPS Project Report for the Trinity KNL Open Science Period.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moore, Stan Gerald; Thompson, Aidan P.; Wood, Mitchell

LAMMPS is a classical molecular dynamics code (lammps.sandia.gov) used to model materials science problems at Sandia National Laboratories and around the world. LAMMPS was one of three Sandia codes selected to participate in the Trinity KNL (TR2) Open Science period. During this period, three different problems of interest were investigated using LAMMPS. The first was benchmarking KNL performance using different force field models. The second was simulating void collapse in shocked HNS energetic material using an all-atom model. The third was simulating shock propagation through poly-crystalline RDX energetic material using a coarse-grain model, the results of which were used in an ACM Gordon Bell Prize submission. This report describes the results of these simulations, lessons learned, and some hardware issues found on Trinity KNL as part of this work.
Parallel algorithm for multiscale atomistic/continuum simulations using LAMMPS

NASA Astrophysics Data System (ADS)

Pavia, F.; Curtin, W. A.

2015-07-01

Deformation and fracture processes in engineering materials often require simultaneous descriptions over a range of length and time scales, with each scale using a different computational technique. Here we present a high-performance parallel 3D computing framework for executing large multiscale studies that couple an atomic domain, modeled using molecular dynamics and a continuum domain, modeled using explicit finite elements. We use the robust Coupled Atomistic/Discrete-Dislocation (CADD) displacement-coupling method, but without the transfer of dislocations between atoms and continuum. The main purpose of the work is to provide a multiscale implementation within an existing large-scale parallel molecular dynamics code (LAMMPS) that enables use of all the tools associated with this popular open-source code, while extending CADD-type coupling to 3D. Validation of the implementation includes the demonstration of (i) stability in finite-temperature dynamics using Langevin dynamics, (ii) elimination of wave reflections due to large dynamic events occurring in the MD region and (iii) the absence of spurious forces acting on dislocations due to the MD/FE coupling, for dislocations further than 10 Å from the coupling boundary. A first non-trivial example application of dislocation glide and bowing around obstacles is shown, for dislocation lengths of ∼50 nm using fewer than 1 000 000 atoms but reproducing results of extremely large atomistic simulations at much lower computational cost.
LAMMPS framework for dynamic bonding and an application modeling DNA

NASA Astrophysics Data System (ADS)

Svaneborg, Carsten

2012-08-01

We have extended the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) to support directional bonds and dynamic bonding. The framework supports stochastic formation of new bonds, breakage of existing bonds, and conversion between bond types. Bond formation can be controlled to limit the maximal functionality of a bead with respect to various bond types. Concomitant with the bond dynamics, angular and dihedral interactions are dynamically introduced between newly connected triplets and quartets of beads, where the interaction type is determined from the local pattern of bead and bond types. When breaking bonds, all angular and dihedral interactions involving broken bonds are removed. The framework allows chemical reactions to be modeled, and we use it to simulate a simplistic, coarse-grained DNA model. The resulting DNA dynamics illustrates the power of the present framework.
Catalogue identifier: AEME_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEME_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU General Public Licence
No. of lines in distributed program, including test data, etc.: 2 243 491
No. of bytes in distributed program, including test data, etc.: 771
Distribution format: tar.gz
Programming language: C++
Computer: Single and multiple core servers
Operating system: Linux/Unix/Windows
Has the code been vectorized or parallelized?: Yes. The code has been parallelized by the use of MPI directives.
RAM: 1 Gb
Classification: 16.11, 16.12
Nature of problem: Simulating coarse-grain models capable of chemistry e.g. DNA hybridization dynamics.
Solution method: Extending LAMMPS to handle dynamic bonding and directional bonds.
Unusual features: Allows bonds to be created and broken while angular and dihedral interactions are kept consistent.
Additional comments: The distribution file for this program is approximately 36 Mbytes and therefore is not delivered directly when download or E-mail is requested. Instead an html file giving details of how the program can be obtained is sent.
Running time: Hours to days. The examples provided in the distribution take just seconds to run.
Crystal MD: The massively parallel molecular dynamics software for metal with BCC structure

NASA Astrophysics Data System (ADS)

Hu, Changjun; Bai, He; He, Xinfu; Zhang, Boyao; Nie, Ningming; Wang, Xianmeng; Ren, Yingwen

2017-02-01

The irradiation effect in materials is one of the most important issues in the use of nuclear power. However, the lack of high-throughput irradiation facilities and of knowledge about the evolution process has led to little understanding of these issues. With the help of high-performance computing, we can gain a further understanding of materials at the micro level. In this paper, a new data structure is proposed for the massively parallel simulation of the evolution of metal materials under an irradiation environment. Based on the proposed data structure, we developed new molecular dynamics software named Crystal MD. The simulation with Crystal MD achieved over 90% parallel efficiency in test cases, and it takes more than 25% less memory on multi-core clusters than LAMMPS and IMD, which are two popular molecular dynamics simulation packages. Using Crystal MD, a simulation of two trillion particles has been performed on the Tianhe-2 cluster.
CCC7-119 Reactive Molecular Dynamics Simulations of Hot Spot Growth in Shocked Energetic Materials

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thompson, Aidan P.

2015-03-01

The purpose of this work is to understand how defects control initiation in energetic materials used in stockpile components. Sequoia gives us the core count to run very large-scale simulations of up to 10 million atoms. Using an OpenMP threaded implementation of the ReaxFF package in LAMMPS, we have been able to get good parallel efficiency running on 16k nodes of Sequoia, with 1 hardware thread per core.
A parallel algorithm for step- and chain-growth polymerization in molecular dynamics.

PubMed

de Buyl, Pierre; Nies, Erik

2015-04-07

Classical Molecular Dynamics (MD) simulations provide insight into the properties of many soft-matter systems. In some situations, it is interesting to model the creation of chemical bonds, a process that is not part of the MD framework. In this context, we propose a parallel algorithm for step- and chain-growth polymerization that is based on a generic reaction scheme, works at a given intrinsic rate and produces continuous trajectories. We present an implementation in the ESPResSo++ simulation software and compare it with the corresponding feature in LAMMPS. For chain growth, our results are compared to the existing simulation literature. For step growth, a rate equation is proposed for the evolution of the crosslinker population that compares well to the simulations for low crosslinker functionality or for short times.
A parallel algorithm for step- and chain-growth polymerization in molecular dynamics

NASA Astrophysics Data System (ADS)

de Buyl, Pierre; Nies, Erik

2015-04-01

Classical Molecular Dynamics (MD) simulations provide insight into the properties of many soft-matter systems. In some situations, it is interesting to model the creation of chemical bonds, a process that is not part of the MD framework. In this context, we propose a parallel algorithm for step- and chain-growth polymerization that is based on a generic reaction scheme, works at a given intrinsic rate and produces continuous trajectories. We present an implementation in the ESPResSo++ simulation software and compare it with the corresponding feature in LAMMPS. For chain growth, our results are compared to the existing simulation literature. For step growth, a rate equation is proposed for the evolution of the crosslinker population that compares well to the simulations for low crosslinker functionality or for short times.
Atomistic Simulations of Surface Cross-Slip Nucleation in Face-Centered Cubic Nickel and Copper (Postprint)

DTIC Science & Technology

2013-02-15

molecular dynamics code, LAMMPS [9], developed at Sandia National Laboratory. The simulation cell is a rectangular parallelepiped, with the z-axis...with assigned energies within LAMMPS of greater than 4.42 eV (Ni) or 3.52 eV (Cu) (the energy of atoms in the stacking fault region), the partial...molecular dynamics code LAMMPS, which was developed at Sandia National Laboratory by Dr. Steve Plimpton and co-workers. This work was supported by the
DOE Office of Scientific and Technical Information (OSTI.GOV)

Rizzi, Silvio; Hereld, Mark; Insley, Joseph

In this work we perform in-situ visualization of molecular dynamics simulations, which can help scientists to visualize simulation output on-the-fly, without incurring storage overheads. We present a case study to couple LAMMPS, the large-scale molecular dynamics simulation code, with vl3, our parallel framework for large-scale visualization and analysis. Our motivation is to identify effective approaches for covisualization and exploration of large-scale atomistic simulations at interactive frame rates. We propose a system of coupled libraries and describe its architecture, with an implementation that runs on GPU-based clusters. We present the results of strong and weak scalability experiments, as well as future research avenues based on our results.
Parallel multiscale simulations of a brain aneurysm

PubMed Central

Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

2012-01-01

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.
PMID:23734066
Parallel multiscale simulations of a brain aneurysm.

PubMed

Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em

2013-07-01

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.
Parallel multiscale simulations of a brain aneurysm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em, E-mail: george_karniadakis@brown.edu

2013-07-01

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.
Update 0.2 to "pysimm: A python package for simulation of molecular systems"

NASA Astrophysics Data System (ADS)

Demidov, Alexander G.; Fortunato, Michael E.; Colina, Coray M.

2018-01-01

An update to the pysimm Python molecular simulation API is presented. A major part of the update is the implementation of a new interface with CASSANDRA - a modern, versatile Monte Carlo molecular simulation program. Several significant improvements in the LAMMPS communication module that allow better and more versatile simulation setup are reported as well. An example of an application implementing iterative CASSANDRA-LAMMPS interaction is illustrated.
Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aktulga, Hasan Metin; Coffman, Paul; Shan, Tzu-Ray

2015-12-01

Hybrid parallelism allows high performance computing applications to better leverage the increasing on-node parallelism of modern supercomputers. In this paper, we present a hybrid parallel implementation of the widely used LAMMPS/ReaxC package, where the construction of bonded and nonbonded lists and evaluation of complex ReaxFF interactions are implemented efficiently using OpenMP parallelism. Additionally, the performance of the QEq charge equilibration scheme is examined and a dual-solver is implemented. We present the performance of the resulting ReaxC-OMP package on a state-of-the-art multi-core architecture Mira, an IBM BlueGene/Q supercomputer. For system sizes ranging from 32 thousand to 16.6 million particles, speedups in the range of 1.5-4.5x are observed using the new ReaxC-OMP software. Sustained performance improvements have been observed for up to 262,144 cores (1,048,576 processes) of Mira with a weak scaling efficiency of 91.5% in larger simulations containing 16.6 million particles.
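For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how speedup and weak scaling efficiency are commonly defined. The function names and the timings in the usage lines are hypothetical and are not taken from the paper; the abstract does not state which reference run its 91.5% figure is measured against.

```python
def speedup(t_baseline, t_optimized):
    """Speedup of an optimized run relative to a baseline run of the same problem."""
    if t_baseline <= 0 or t_optimized <= 0:
        raise ValueError("timings must be positive")
    return t_baseline / t_optimized


def weak_scaling_efficiency(t_reference, t_scaled):
    """Weak scaling efficiency: work per core is held fixed while cores increase.

    t_reference is the wall time on the smallest core count, t_scaled the wall
    time on the larger core count with a proportionally larger problem.
    Ideal weak scaling gives 1.0.
    """
    if t_reference <= 0 or t_scaled <= 0:
        raise ValueError("timings must be positive")
    return t_reference / t_scaled


# Made-up timings in seconds, for illustration only.
print(speedup(90.0, 30.0))
print(weak_scaling_efficiency(10.0, 12.5))
```

Under these definitions, a weak scaling efficiency below 1.0 means the larger run took longer per step than the reference run even though each core held the same amount of work.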
Implementation of EAM and FS potentials in HOOMD-blue

NASA Astrophysics Data System (ADS)

Yang, Lin; Zhang, Feng; Travesset, Alex; Wang, Caizhuang; Ho, Kaiming

HOOMD-blue is a general-purpose software to perform classical molecular dynamics simulations entirely on GPUs. We provide full support for EAM and FS type potentials in HOOMD-blue, and report accuracy and efficiency benchmarks, including comparisons with the LAMMPS GPU package. Two problems were selected to test the accuracy: the determination of the glass transition temperature of Cu64.5Zr35.5 alloy using an FS potential and the calculation of pair distribution functions of Ni3Al using an EAM potential. In both cases, the results using HOOMD-blue are indistinguishable from those obtained by the GPU package in LAMMPS within statistical uncertainties. As tests for time efficiency, we benchmark time-steps per second using LAMMPS GPU and HOOMD-blue on one NVIDIA Tesla GPU. Compared to our typical LAMMPS simulations on one CPU cluster node which has 16 CPUs, LAMMPS GPU can be 3-3.5 times faster, and HOOMD-blue can be 4-5.5 times faster. We acknowledge the support from Laboratory Directed Research and Development (LDRD) of Ames Laboratory.
Voxel based parallel post processor for void nucleation and growth analysis of atomistic simulations of material fracture.

PubMed

Hemani, H; Warrier, M; Sakthivel, N; Chaturvedi, S

2014-05-01

Molecular dynamics (MD) simulations are used in the study of void nucleation and growth in crystals that are subjected to tensile deformation. These simulations are run for typically several hundred thousand time steps depending on the problem. We output the atom positions at a required frequency for post processing to determine the void nucleation, growth and coalescence due to tensile deformation. The simulation volume is broken up into voxels of size equal to the unit cell size of crystal. In this paper, we present the algorithm to identify the empty unit cells (voids), their connections (void size) and dynamic changes (growth and coalescence of voids) for MD simulations of large atomic systems (multi-million atoms). We discuss the parallel algorithms that were implemented and discuss their relative applicability in terms of their speedup and scalability. We also present the results on scalability of our algorithm when it is incorporated into MD software LAMMPS. Copyright © 2014 Elsevier Inc. All rights reserved.
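The voxel idea in this abstract can be illustrated with a short serial sketch: bin atoms into voxels of one unit-cell edge, mark the empty voxels, and group face-adjacent empty voxels into voids. This is not the authors' parallel algorithm. The function name `find_voids`, the 6-neighbour connectivity, the cubic voxels, and the absence of periodic wrapping are all assumptions made for illustration.

```python
from collections import deque

import numpy as np


def find_voids(positions, box, cell):
    """Label clusters of empty voxels; returns (labels, sizes).

    positions: (N, 3) atom coordinates inside [0, box) along each axis
    box:       box edge lengths (lx, ly, lz)
    cell:      voxel edge, taken here as the crystal unit-cell size
    """
    shape = tuple(int(round(b / cell)) for b in box)
    occupied = np.zeros(shape, dtype=bool)
    idx = np.floor(np.asarray(positions, dtype=float) / cell).astype(int)
    idx = np.clip(idx, 0, np.array(shape) - 1)
    occupied[idx[:, 0], idx[:, 1], idx[:, 2]] = True

    labels = np.zeros(shape, dtype=int)  # 0 means "not part of any void"
    sizes = []                           # sizes[k - 1] = voxel count of void k
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    for start in zip(*np.nonzero(~occupied)):
        if labels[start]:
            continue
        sizes.append(0)
        label = len(sizes)
        labels[start] = label
        queue = deque([start])
        while queue:  # breadth-first flood fill over empty neighbours
            x, y, z = queue.popleft()
            sizes[-1] += 1
            for dx, dy, dz in steps:
                n = (x + dx, y + dy, z + dz)
                inside = all(0 <= n[i] < shape[i] for i in range(3))
                if inside and not occupied[n] and not labels[n]:
                    labels[n] = label
                    queue.append(n)
    return labels, sizes


# Toy crystal: one atom per voxel on a 4x4x4 grid, with three atoms removed.
removed = {(1, 1, 1), (2, 1, 1), (3, 3, 3)}
atoms = [(i + 0.5, j + 0.5, k + 0.5)
         for i in range(4) for j in range(4) for k in range(4)
         if (i, j, k) not in removed]
labels, sizes = find_voids(atoms, box=(4.0, 4.0, 4.0), cell=1.0)
print(sorted(sizes))
```

Running the same labelling on successive output frames and comparing the void sizes would give a crude picture of growth and coalescence; a parallel version would additionally have to merge labels across subdomain boundaries.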
Molecular Dynamics Simulations of an Idealized Shock Tube: N2 in Ar Bath Driven by He

NASA Astrophysics Data System (ADS)

Piskulich, Ezekiel Ashe; Sewell, Thomas D.; Thompson, Donald L.

2015-06-01

The dynamics of 10% N2 in Ar initially at 298 K in an idealized shock tube driven by He was studied using molecular dynamics. The simulations were performed using the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) code. Nitrogen was modeled as a Morse oscillator and non-covalent interactions were approximated by the Buckingham exponential-6 pair potential. The initial pressures in the He driver gas and the driven N2/Ar gas were 1000 atm and 20 atm, respectively. Microcanonical trajectories were followed for 2 ns following release of the driver gas. Results for excitation and subsequent relaxation of the N2, as well as properties of the gas during the simulations, will be reported.
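As a reminder of the two functional forms named in this abstract, here is a minimal sketch of a Morse bond potential and a Buckingham-type exponential-6 pair potential. The abstract does not give the parameterization, so the A, rho, C form of the exponential-6 potential and the convention that the Morse energy is zero at the equilibrium bond length are assumptions, and the numbers in the usage lines are placeholders rather than values from the study.

```python
import math


def morse(r, d_e, a, r_e):
    """Morse potential, shifted so that the energy is 0 at r = r_e.

    d_e: well depth, a: width parameter, r_e: equilibrium bond length.
    """
    return d_e * (1.0 - math.exp(-a * (r - r_e))) ** 2


def buckingham(r, a_rep, rho, c6):
    """Exponential-6 pair potential: exponential repulsion minus r^-6 dispersion."""
    return a_rep * math.exp(-r / rho) - c6 / r**6


# Placeholder parameters in arbitrary units, for illustration only.
for r in (1.0, 1.1, 1.5, 3.0):
    print(r, morse(r, d_e=9.9, a=2.7, r_e=1.1))
print(buckingham(3.5, a_rep=1.0e5, rho=0.3, c6=100.0))
```

The Morse form lets the bond stretch anharmonically and dissociate at large separation, which is why it is a natural choice when vibrational excitation behind a shock is of interest.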

Spontaneous Athermal Cross-Slip Nucleation at Screw Dislocation Intersections in FCC Metals and L1(2) Intermetallics Investigated via Atomistic Simulations

DTIC Science & Technology

2013-01-01

LAMMPS [12], developed at Sandia National Laboratory. The simulation cell is a rectangular parallelepiped having the x-axis oriented along the [1 1 0...cross-slip during deformation. Acknowledgements The authors acknowledge use of the 3d molecular dynamics code, LAMMPS, which was developed at Sandia
Hierarchical Petascale Simulation Framework For Stress Corrosion Cracking

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grama, Ananth

2013-12-18

A number of major accomplishments resulted from the project. These include: • Data Structures, Algorithms, and Numerical Methods for Reactive Molecular Dynamics. We have developed a range of novel data structures, algorithms, and solvers (amortized ILU, Spike) for use with ReaxFF and charge equilibration. • Parallel Formulations of ReactiveMD (Purdue Reactive Molecular Dynamics Package, PuReMD, PuReMD-GPU, and PG-PuReMD) for Messaging, GPU, and GPU Cluster Platforms. We have developed efficient serial, parallel (MPI), GPU (Cuda), and GPU Cluster (MPI/Cuda) implementations. Our implementations have been demonstrated to be significantly better than the state of the art, both in terms of performance and scalability. • Comprehensive Validation in the Context of Diverse Applications. We have demonstrated the use of our software in diverse systems, including silica-water, silicon-germanium nanorods, and as part of other projects, extended it to applications ranging from explosives (RDX) to lipid bilayers (biomembranes under oxidative stress). • Open Source Software Packages for Reactive Molecular Dynamics. All versions of our software have been released over the public domain. There are over 100 major research groups worldwide using our software. • Implementation into the Department of Energy LAMMPS Software Package. We have also integrated our software into the Department of Energy LAMMPS software package.
Automated Algorithms for Quantum-Level Accuracy in Atomistic Simulations: LDRD Final Report.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thompson, Aidan Patrick; Schultz, Peter Andrew; Crozier, Paul

2014-09-01

This report summarizes the result of LDRD project 12-0395, titled "Automated Algorithms for Quantum-level Accuracy in Atomistic Simulations." During the course of this LDRD, we have developed an interatomic potential for solids and liquids called Spectral Neighbor Analysis Potential (SNAP). The SNAP potential has a very general form and uses machine-learning techniques to reproduce the energies, forces, and stress tensors of a large set of small configurations of atoms, which are obtained using high-accuracy quantum electronic structure (QM) calculations. The local environment of each atom is characterized by a set of bispectrum components of the local neighbor density projected on to a basis of hyperspherical harmonics in four dimensions. The SNAP coefficients are determined using weighted least-squares linear regression against the full QM training set. This allows the SNAP potential to be fit in a robust, automated manner to large QM data sets using many bispectrum components. The calculation of the bispectrum components and the SNAP potential are implemented in the LAMMPS parallel molecular dynamics code. Global optimization methods in the DAKOTA software package are used to seek out good choices of hyperparameters that define the overall structure of the SNAP potential. FitSnap.py, a Python-based software package interfacing to both LAMMPS and DAKOTA is used to formulate the linear regression problem, solve it, and analyze the accuracy of the resultant SNAP potential. We describe a SNAP potential for tantalum that accurately reproduces a variety of solid and liquid properties. Most significantly, in contrast to existing tantalum potentials, SNAP correctly predicts the Peierls barrier for screw dislocation motion. We also present results from SNAP potentials generated for indium phosphide (InP) and silica (SiO2). We describe efficient algorithms for calculating SNAP forces and energies in molecular dynamics simulations using massively parallel computers and advanced processor architectures. Finally, we briefly describe the MSM method for efficient calculation of electrostatic interactions on massively parallel computers.
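The fitting step described in this abstract, weighted least-squares linear regression of coefficients against reference data, can be sketched generically with NumPy. This is not FitSnap.py and does not compute bispectrum components: the descriptor matrix below is random synthetic data standing in for them, and the function name, weights, and coefficient values are hypothetical.

```python
import numpy as np


def weighted_least_squares(descriptors, targets, weights):
    """Solve min_beta sum_i w_i * (descriptors[i] @ beta - targets[i])**2.

    Scaling each row by sqrt(w_i) turns the weighted problem into an
    ordinary least-squares problem.
    """
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))
    a = np.asarray(descriptors, dtype=float) * sqrt_w[:, None]
    b = np.asarray(targets, dtype=float) * sqrt_w
    coeffs, *_ = np.linalg.lstsq(a, b, rcond=None)
    return coeffs


# Synthetic stand-in for a training set: each row is one reference quantity
# (an energy, force component, or stress component) expressed as a linear
# combination of descriptor values.
rng = np.random.default_rng(0)
n_rows, n_desc = 200, 5
descriptors = rng.normal(size=(n_rows, n_desc))
true_coeffs = np.array([0.5, -1.0, 2.0, 0.0, 0.25])
targets = descriptors @ true_coeffs

# Hypothetical weighting: the first 50 rows count 100 times more than the rest.
weights = np.where(np.arange(n_rows) < 50, 100.0, 1.0)

fitted = weighted_least_squares(descriptors, targets, weights)
print(np.round(fitted, 6))
```

Because the model is linear in its coefficients, the fit reduces to a single linear solve, which is consistent with the abstract's remark that the potential can be fit in an automated manner to large data sets. The weights control how strongly each class of reference data influences the result.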
Adaptively restrained molecular dynamics in LAMMPS

NASA Astrophysics Data System (ADS)

Kant Singh, Krishna; Redon, Stephane

2017-07-01

Adaptively restrained molecular dynamics (ARMD) is a recently introduced particle simulation method that switches positional degrees of freedom on and off during simulation in order to speed up calculations. In the NVE ensemble, ARMD allows users to trade between precision and speed while, in the NVT ensemble, it makes it possible to compute statistical averages faster. Despite the conceptual simplicity of the approach, however, integrating it in existing molecular dynamics packages is non-trivial, in particular since implemented potentials should a priori be rewritten to take advantage of frozen particles and achieve a speed-up. In this paper, we present novel algorithms for integrating ARMD in LAMMPS, a popular multi-purpose molecular simulation package. In particular, we demonstrate how to enable ARMD in LAMMPS without having to re-implement all available force fields. The proposed algorithms are assessed on four different benchmarks and are shown to speed up simulations by up to one order of magnitude.
MEAM interatomic force calculation subroutine for LAMMPS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stukowski, A.

2010-10-25

Interatomic force and energy calculation subroutine to be used with the molecular dynamics simulation code LAMMPS (Ref a.). The code evaluates the total energy and atomic forces (energy gradient) according to a cubic spline-based variant (Ref b.) of the Modified Embedded Atom Method (MEAM).
MaMiCo: Software design for parallel molecular-continuum flow simulations

NASA Astrophysics Data System (ADS)

Neumann, Philipp; Flohr, Hanno; Arora, Rahul; Jarmatz, Piet; Tchipev, Nikola; Bungartz, Hans-Joachim

2016-03-01

The macro-micro-coupling tool (MaMiCo) was developed to ease the development of and modularize molecular-continuum simulations, retaining sequential and parallel performance. We demonstrate the functionality and performance of MaMiCo by coupling the spatially adaptive Lattice Boltzmann framework waLBerla with four molecular dynamics (MD) codes: the light-weight Lennard-Jones-based implementation SimpleMD, the node-level optimized software ls1 mardyn, and the community codes ESPResSo and LAMMPS. We detail interface implementations to connect each solver with MaMiCo. The coupling for each waLBerla-MD setup is validated in three-dimensional channel flow simulations which are solved by means of a state-based coupling method. We provide sequential and strong scaling measurements for the four molecular-continuum simulations. The overhead of MaMiCo is found to come at 10%-20% of the total (MD) runtime. The measurements further show that scalability of the hybrid simulations is reached on up to 500 Intel SandyBridge, and more than 1000 AMD Bulldozer compute cores.
Molecular dynamics simulations of collision-induced absorption: Implementation in LAMMPS

NASA Astrophysics Data System (ADS)

Fakhardji, W.; Gustafsson, M.

2017-02-01

We pursue simulations of collision-induced absorption in a mixture of argon and xenon gas at room temperature by means of classical molecular dynamics. The established theoretical approach (Hartmann et al. 2011 J. Chem. Phys. 134 094316) is implemented with the molecular dynamics package LAMMPS. The bound state features in the absorption spectrum are well reproduced with the molecular dynamics simulation in comparison with a laboratory measurement. The magnitude of the computed absorption, however, is underestimated in a large part of the spectrum. We suggest some aspects of the simulation that could be improved.
Bond order potential module for LAMMPS

DOE Office of Scientific and Technical Information (OSTI.GOV)

2012-09-11

pair_bop is a module for performing energy calculations using the Bond Order Potential (BOP) in the parallel molecular dynamics code LAMMPS. The bop pair style computes the BOP based upon quantum mechanical theory, incorporating both sigma and pi bonding. Because the BOP is derived analytically from quantum mechanical theory, its transferability to different phases can approach that of quantum mechanical methods. This potential is extremely effective at modeling III-V and II-VI compounds such as GaAs and CdTe. It is similar to the original BOP developed by Pettifor and later updated by Murdick et al. and Ward et al.
Models for twistable elastic polymers in Brownian dynamics, and their implementation for LAMMPS.

PubMed

Brackley, C A; Morozov, A N; Marenduzzo, D

2014-04-07

An elastic rod model for semi-flexible polymers is presented. Theory for a continuum rod is reviewed, and it is shown that a popular discretised model used in numerical simulations gives the correct continuum limit. Correlation functions relating to both bending and twisting of the rod are derived for both continuous and discrete cases, and results are compared with numerical simulations. Finally, two possible implementations of the discretised model in the multi-purpose molecular dynamics software package LAMMPS are described.
An Elastic Plastic Contact Model with Strain Hardening for the LAMMPS Granular Package

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kuhr, Bryan; Brake, Matthew Robert; Lechman, Jeremy B.

2015-03-01

The following details the implementation of an analytical elastic plastic contact model with strain hardening for normal impacts into the LAMMPS granular package. The model assumes that, upon impact, the collision has a period of elastic loading followed by a period of mixed elastic plastic loading, with contributions to each mechanism estimated by a hyperbolic secant weight function. This function is implemented in the LAMMPS source code as the pair style gran/ep/history. Preliminary tests, simulating the pouring of pure nickel spheres, showed the elastic/plastic model took 1.66x as long as similar runs using gran/hertz/history.
Coupling molecular dynamics with lattice Boltzmann method based on the immersed boundary method

NASA Astrophysics Data System (ADS)

Tan, Jifu; Sinno, Talid; Diamond, Scott

2017-11-01

The study of viscous fluid flow coupled with rigid or deformable solids has many applications in biological and engineering problems, e.g., blood cell transport, drug delivery, and particulate flow. We developed a partitioned approach to solve this coupled multiphysics problem. The fluid motion was solved by Palabos (Parallel Lattice Boltzmann Solver), while the solid displacement and deformation was simulated by LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator). The coupling was achieved through the immersed boundary method (IBM). The code modeled both rigid and deformable solids exposed to flow. The code was validated with the classic problem of rigid ellipsoid particle orbit in shear flow, blood cell stretching test and effective blood viscosity, and demonstrated essentially linear scaling over 16 cores. An example of the fluid-solid coupling was given for the transport of flexible filaments (drug carriers) in a flowing blood cell suspension, highlighting the advantages and capabilities of the developed code. NIH 1U01HL131053-01A1.
Sensitivity Analysis and Uncertainty Quantification for the LAMMPS Molecular Dynamics Simulation Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Picard, Richard Roy; Bhat, Kabekode Ghanasham

2017-07-18

We examine sensitivity analysis and uncertainty quantification for molecular dynamics simulation. Extreme (large or small) output values for the LAMMPS code often occur at the boundaries of input regions, and uncertainties in those boundary values are overlooked by common SA methods. Similarly, input values for which code outputs are consistent with calibration data can also occur near boundaries. Upon applying approaches in the literature for imprecise probabilities (IPs), much more realistic results are obtained than for the complacent application of standard SA and code calibration.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Antonelli, Perry Edward

A low-level model-to-model interface is presented that will enable independent models to be linked into an integrated system of models. The interface is based on a standard set of functions that contain appropriate export and import schemas that enable models to be linked with no changes to the models themselves. These ideas are presented in the context of a specific multiscale material problem that couples atomistic-based molecular dynamics calculations to continuum calculations of fluid flow. These simulations will be used to examine the influence of interactions of the fluid with an adjacent solid on the fluid flow. The interface will also be examined by adding it to an already existing modeling code, Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) and comparing it with our own molecular dynamics code.
Optimizing legacy molecular dynamics software with directive-based offload

NASA Astrophysics Data System (ADS)

Michael Brown, W.; Carrillo, Jan-Michael Y.; Gavhane, Nitin; Thakkar, Foram M.; Plimpton, Steven J.

2015-10-01

Directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In this paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We demonstrate that standard molecular dynamics algorithms can run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU; in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel® Xeon Phi™ coprocessors and NVIDIA GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS.
Brownian dynamics simulations of lipid bilayer membrane with hydrodynamic interactions in LAMMPS

NASA Astrophysics Data System (ADS)

Fu, Szu-Pei; Young, Yuan-Nan; Peng, Zhangli; Yuan, Hongyan

2016-11-01

Lipid bilayer membranes have been extensively studied by coarse-grained molecular dynamics simulations. Numerical efficiencies have been reported in the cases of aggressive coarse-graining, where several lipids are coarse-grained into a particle of size 4-6 nm so that there is only one particle in the thickness direction. Yuan et al. proposed a pair-potential between these one-particle-thick coarse-grained lipid particles to capture the mechanical properties of a lipid bilayer membrane (such as gel-fluid-gas phase transitions of lipids, diffusion, and bending rigidity). In this work we implement such interaction potential in LAMMPS to simulate large-scale lipid systems such as vesicles and red blood cells (RBCs). We also consider the effect of cytoskeleton on the lipid membrane dynamics as a model for red blood cell (RBC) dynamics, and incorporate coarse-grained water molecules to account for hydrodynamic interactions. The interaction between the coarse-grained water molecules (explicit solvent molecules) is modeled as a Lennard-Jones (L-J) potential. We focus on two sets of LAMMPS simulations: 1. Vesicle shape transitions with varying enclosed volume; 2. RBC shape transitions with different enclosed volume. This work is funded by NSF under Grant DMS-1222550.
Brownian dynamics simulations of lipid bilayer membrane with hydrodynamic interactions in LAMMPS

NASA Astrophysics Data System (ADS)

Fu, Szu-Pei; Young, Yuan-Nan; Peng, Zhangli; Yuan, Hongyan

Lipid bilayer membranes have been extensively studied by coarse-grained molecular dynamics simulations. Numerical efficiency has been reported in the cases of aggressive coarse-graining, where several lipids are coarse-grained into a particle of size 4-6 nm so that there is only one particle in the thickness direction. Yuan et al. proposed a pair-potential between these one-particle-thick coarse-grained lipid particles to capture the mechanical properties of a lipid bilayer membrane (such as gel-fluid-gas phase transitions of lipids, diffusion, and bending rigidity). In this work we implement such interaction potential in LAMMPS to simulate large-scale lipid systems such as vesicles and red blood cells (RBCs). We also consider the effect of cytoskeleton on the lipid membrane dynamics as a model for red blood cell (RBC) dynamics, and incorporate coarse-grained water molecules to account for hydrodynamic interactions. The interaction between the coarse-grained water molecules (explicit solvent molecules) is modeled as a Lennard-Jones (L-J) potential. We focus on two sets of LAMMPS simulations: 1. Vesicle shape transitions with varying enclosed volume; 2. RBC shape transitions with different enclosed volume.
SpecTAD

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zamora, Richard; Voter, Arthur; Uberuaga, Bla

2017-10-23

The SpecTAD software represents a refactoring of the Temperature Accelerated Dynamics (TAD2) code authored by Arthur F. Voter and Blas P. Uberuaga (LA-CC-02-05). SpecTAD extends the capabilities of TAD2, by providing algorithms for both temporal and spatial parallelism. The novel algorithms for temporal parallelism include both speculation and replication based techniques. SpecTAD also offers the optional capability to dynamically link to the open-source LAMMPS package.
MaMiCo: Transient multi-instance molecular-continuum flow simulation on supercomputers

NASA Astrophysics Data System (ADS)

Neumann, Philipp; Bian, Xin

2017-11-01

We present extensions of the macro-micro-coupling tool MaMiCo, which was designed to couple continuum fluid dynamics solvers with discrete particle dynamics. To enable local extraction of smooth flow field quantities especially on rather short time scales, sampling over an ensemble of molecular dynamics simulations is introduced. We provide details on these extensions including the transient coupling algorithm, open boundary forcing, and multi-instance sampling. Furthermore, we validate the coupling in Couette flow using different particle simulation software packages and particle models, i.e. molecular dynamics and dissipative particle dynamics. Finally, we demonstrate the parallel scalability of the molecular-continuum simulations by using up to 65 536 compute cores of the supercomputer Shaheen II located at KAUST.
Program Files doi: http://dx.doi.org/10.17632/w7rgdrhb85.1
Licensing provisions: BSD 3-clause
Programming language: C, C++
External routines/libraries: For compiling: SCons, MPI (optional)
Subprograms used: ESPResSo, LAMMPS, ls1 mardyn, waLBerla. For installation procedures of the MaMiCo interfaces, see the README files in the respective code directories located in coupling/interface/impl.
Journal reference of previous version: P. Neumann, H. Flohr, R. Arora, P. Jarmatz, N. Tchipev, H.-J. Bungartz. MaMiCo: Software design for parallel molecular-continuum flow simulations, Computer Physics Communications 200: 324-335, 2016
Does the new version supersede the previous version?: Yes. The functionality of the previous version is completely retained in the new version.
Nature of problem: Coupled molecular-continuum simulation for multi-resolution fluid dynamics: parts of the domain are resolved by molecular dynamics or another particle-based solver whereas large parts are covered by a mesh-based CFD solver, e.g. a lattice Boltzmann automaton.
Solution method: We couple existing MD and CFD solvers via MaMiCo (macro-micro coupling tool). Data exchange and coupling algorithmics are abstracted and incorporated in MaMiCo. Once an algorithm is set up in MaMiCo, it can be used and extended, even if other solvers are used (as soon as the respective interfaces are implemented/available).
Reasons for the new version: We have incorporated a new algorithm to simulate transient molecular-continuum systems and to automatically sample data over multiple MD runs that can be executed simultaneously (on, e.g., a compute cluster). MaMiCo has further been extended by an interface to incorporate boundary forcing to account for open molecular dynamics boundaries. Besides support for coupling with various MD and CFD frameworks, the new version contains a test case that allows molecular-continuum Couette flow simulations to be run out of the box. No external tools or simulation codes are required anymore. However, the user is free to switch from the included MD simulation package to LAMMPS. For details on how to run the transient Couette problem, see the file README in the folder coupling/tests, Remark on MaMiCo V1.1.
Summary of revisions: Open boundary forcing; Multi-instance MD sampling; support for transient molecular-continuum systems
Restrictions: Currently, only single-centered systems are supported. For access to the LAMMPS-based implementation of DPD boundary forcing, please contact Xin Bian, xin.bian@tum.de.
Additional comments: Please see file license_mamico.txt for further details regarding distribution and advertising of this software.
QMMMW: A wrapper for QM/MM simulations with QUANTUM ESPRESSO and LAMMPS

NASA Astrophysics Data System (ADS)

Ma, Changru; Martin-Samos, Layla; Fabris, Stefano; Laio, Alessandro; Piccinin, Simone

2015-10-01

We present QMMMW, a new program aimed at performing Quantum Mechanics/Molecular Mechanics (QM/MM) molecular dynamics. The package operates as a wrapper that patches PWscf code included in the QUANTUM ESPRESSO distribution and LAMMPS Molecular Dynamics Simulator. It is designed with a paradigm based on three guidelines: (i) minimal amount of modifications on the parent codes, (ii) flexibility and computational efficiency of the communication layer and (iii) accuracy of the Hamiltonian describing the interaction between the QM and MM subsystems. These three features are seldom present simultaneously in other implementations of QMMM. The QMMMW project is hosted by qe-forge at
Fast Model Generalized Pseudopotential Theory Interatomic Potential Routine

DOE Office of Scientific and Technical Information (OSTI.GOV)

2015-03-18

MGPT is an unclassified source code for the fast evaluation and application of quantum-based MGPT interatomic potentials for metals. The present version of MGPT has been developed entirely at LLNL, but is specifically designed for implementation in the open-source molecular-dynamics code LAMMPS maintained by Sandia National Laboratories. Using MGPT in LAMMPS, with separate input potential data, one can perform large-scale atomistic simulations of the structural, thermodynamic, defect and mechanical properties of transition metals with quantum-mechanical realism.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Srivastava, Ashish Kumar, E-mail: ashish.memech@gmail.com; Singh, Akhileshwar; Mokhalingam, A.

Atomistic simulations were conducted to estimate the effect of the carbon nanotube (CNT) reinforcement on the mechanical behavior of CNT-reinforced aluminum (Al) nanocomposite. The periodic system of CNT-Al nanocomposite was built and simulated using molecular dynamics (MD) software LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator). The mechanical properties of the nanocomposite were investigated by the application of uniaxial load on one end of the representative volume element (RVE) and fixing the other end. The interactions between the atoms of Al were modeled using embedded atom method (EAM) potentials, whereas Adaptive Intermolecular Reactive Empirical Bond Order (AIREBO) potential was used for the interactions among carbon atoms; these pair potentials are coupled with the Lennard-Jones (LJ) potential. The results show that the incorporation of CNT into the Al matrix can increase the Young’s modulus of the nanocomposite substantially. In the present case, approximately 9 wt% CNT reinforcement increases the axial Young’s modulus of the Al matrix by up to 77% compared with pure Al.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Biyikli, Emre; To, Albert C., E-mail: albertto@pitt.edu

Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called the Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the software implemented along with the associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and measurement-based dynamic balancing of parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. Speed-up of the MMM method is shown to be directly proportional to the reduction of the number of the atoms visited in force computation. Finally, an adaptive MMM analysis on a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3–8.5 times in efficiency, over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.
Multiresolution molecular mechanics: Implementation and efficiency

NASA Astrophysics Data System (ADS)

Biyikli, Emre; To, Albert C.

2017-01-01

Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called the Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the software implemented along with the associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and measurement-based dynamic balancing of parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. Speed-up of the MMM method is shown to be directly proportional to the reduction of the number of the atoms visited in force computation. Finally, an adaptive MMM analysis on a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3-8.5 times in efficiency, over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.
GPU-Accelerated Molecular Dynamics Simulation to Study Liquid Crystal Phase Transition Using Coarse-Grained Gay-Berne Anisotropic Potential.

PubMed

Chen, Wenduo; Zhu, Youliang; Cui, Fengchao; Liu, Lunyang; Sun, Zhaoyan; Chen, Jizhong; Li, Yunqi

2016-01-01

The Gay-Berne (GB) potential is regarded as an accurate model in the simulation of anisotropic particles, especially for liquid crystal (LC) mesogens. However, its computational complexity leads to an extremely time-consuming process for large systems. Here, we developed a GPU-accelerated molecular dynamics (MD) simulation with coarse-grained GB potential implemented in the GALAMOST package to investigate the LC phase transitions for mesogens in small molecules, main-chain or side-chain polymers. For identical mesogens in three different molecules, on cooling from fully isotropic melts, the small molecules form a single-domain smectic-B phase, while the main-chain LC polymers prefer a single-domain nematic phase as a result of connective restraints in neighboring mesogens. The phase transition of side-chain LC polymers undergoes a two-step process: nucleation of nematic islands and formation of multi-domain nematic texture. The particular behavior originates in the fact that the rotational orientation of the mesogens is hindered by the polymer backbones. Both the global distribution and the local orientation of mesogens are critical for the phase transition of anisotropic particles. Furthermore, compared with the MD simulation in LAMMPS, our GPU-accelerated code is about 4 times faster than the GPU version of LAMMPS and at least 200 times faster than the CPU version of LAMMPS. This study clearly shows that GPU-accelerated MD simulation with GB potential in GALAMOST can efficiently handle systems with anisotropic particles and interactions, and accurately explore phase differences originating from molecular structures.
GPU-Accelerated Molecular Dynamics Simulation to Study Liquid Crystal Phase Transition Using Coarse-Grained Gay-Berne Anisotropic Potential

PubMed Central

Cui, Fengchao; Liu, Lunyang; Sun, Zhaoyan; Chen, Jizhong; Li, Yunqi

2016-01-01

The Gay-Berne (GB) potential is regarded as an accurate model in the simulation of anisotropic particles, especially for liquid crystal (LC) mesogens. However, its computational complexity leads to an extremely time-consuming process for large systems. Here, we developed a GPU-accelerated molecular dynamics (MD) simulation with coarse-grained GB potential implemented in the GALAMOST package to investigate the LC phase transitions for mesogens in small molecules, main-chain or side-chain polymers. For identical mesogens in three different molecules, on cooling from fully isotropic melts, the small molecules form a single-domain smectic-B phase, while the main-chain LC polymers prefer a single-domain nematic phase as a result of connective restraints in neighboring mesogens. The phase transition of side-chain LC polymers undergoes a two-step process: nucleation of nematic islands and formation of multi-domain nematic texture. The particular behavior originates in the fact that the rotational orientation of the mesogens is hindered by the polymer backbones. Both the global distribution and the local orientation of mesogens are critical for the phase transition of anisotropic particles. Furthermore, compared with the MD simulation in LAMMPS, our GPU-accelerated code is about 4 times faster than the GPU version of LAMMPS and at least 200 times faster than the CPU version of LAMMPS. This study clearly shows that GPU-accelerated MD simulation with GB potential in GALAMOST can efficiently handle systems with anisotropic particles and interactions, and accurately explore phase differences originating from molecular structures. PMID:26986851
Molecular dynamics study of mechanical properties of carbon nanotube reinforced aluminum composites

NASA Astrophysics Data System (ADS)

Srivastava, Ashish Kumar; Mokhalingam, A.; Singh, Akhileshwar; Kumar, Dinesh

2016-05-01

Atomistic simulations were conducted to estimate the effect of the carbon nanotube (CNT) reinforcement on the mechanical behavior of CNT-reinforced aluminum (Al) nanocomposite. The periodic system of CNT-Al nanocomposite was built and simulated using molecular dynamics (MD) software LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator). The mechanical properties of the nanocomposite were investigated by the application of uniaxial load on one end of the representative volume element (RVE) and fixing the other end. The interactions between the atoms of Al were modeled using embedded atom method (EAM) potentials, whereas Adaptive Intermolecular Reactive Empirical Bond Order (AIREBO) potential was used for the interactions among carbon atoms; these pair potentials are coupled with the Lennard-Jones (LJ) potential. The results show that the incorporation of CNT into the Al matrix can increase the Young's modulus of the nanocomposite substantially. In the present case, approximately 9 wt% CNT reinforcement increases the axial Young's modulus of the Al matrix by up to 77% compared with pure Al.
LAMMPS Implementation of Constant Energy Dissipative Particle Dynamics (DPD-E)

DTIC Science & Technology

2014-03-01

LAMMPS Implementation of Constant Energy Dissipative Particle Dynamics (DPD-E) by James P. Larentzos, John K. Brennan, Joshua D. Moore, and...MD 21005-5069 ARL-TR-6863 March 2014 LAMMPS Implementation of Constant Energy Dissipative Particle Dynamics (DPD-E) James P...13 September 2013 4. TITLE AND SUBTITLE LAMMPS Implementation of Constant Energy Dissipative Particle Dynamics (DPD-E) 5a. CONTRACT NUMBER 5b
User Manual and Source Code for a LAMMPS Implementation of Constant Energy Dissipative Particle Dynamics (DPD-E)

DTIC Science & Technology

2014-06-01

User Manual and Source Code for a LAMMPS Implementation of Constant Energy Dissipative Particle Dynamics (DPD-E) by James P. Larentzos...Laboratory Aberdeen Proving Ground, MD 21005-5069 ARL-SR-290 June 2014 User Manual and Source Code for a LAMMPS Implementation of Constant...3. DATES COVERED (From - To) September 2013–February 2014 4. TITLE AND SUBTITLE User Manual and Source Code for a LAMMPS Implementation of
ASC-ATDM Performance Portability Requirements for 2015-2019

DOE Office of Scientific and Technical Information (OSTI.GOV)

Edwards, Harold C.; Trott, Christian Robert

This report outlines the research, development, and support requirements for the Advanced Simulation and Computing (ASC) Advanced Technology, Development, and Mitigation (ATDM) Performance Portability (a.k.a. Kokkos) project for 2015-2019. The research and development (R&D) goal for Kokkos (v2) has been to create and demonstrate a thread-parallel programming model and standard C++ library-based implementation that enables performance portability across diverse manycore architectures such as multicore CPU, Intel Xeon Phi, and NVIDIA Kepler GPU. This R&D goal has been achieved for algorithms that use data parallel patterns including parallel-for, parallel-reduce, and parallel-scan. Current R&D is focusing on hierarchical parallel patterns such as a directed acyclic graph (DAG) of asynchronous tasks where each task contains nested data parallel algorithms. This five-year plan includes R&D required to fully and performance portably exploit thread parallelism across current and anticipated next generation platforms (NGP). The Kokkos library is being evaluated by many projects exploring algorithms and code design for NGP. Some production libraries and applications such as Trilinos and LAMMPS have already committed to Kokkos as their foundation for manycore parallelism and performance portability. These five-year requirements include support required for current and anticipated ASC projects to be effective and productive in their use of Kokkos on NGP. The greatest risk to the success of Kokkos and ASC projects relying upon Kokkos is a lack of staffing resources to support Kokkos to the degree needed by these ASC projects. This support includes up-to-date tutorials, documentation, multi-platform (hardware and software stack) testing, minor feature enhancements, thread-scalable algorithm consulting, and managing collaborative R&D.
Efficient molecular dynamics simulations with many-body potentials on graphics processing units

NASA Astrophysics Data System (ADS)

Fan, Zheyong; Chen, Wei; Vierimaa, Ville; Harju, Ari

2017-09-01

Graphics processing units have been extensively used to accelerate classical molecular dynamics simulations. However, there is much less progress on the acceleration of force evaluations for many-body potentials compared to pairwise ones. In the conventional force evaluation algorithm for many-body potentials, the force, virial stress, and heat current for a given atom are accumulated within different loops, which could result in write conflict between different threads in a CUDA kernel. In this work, we provide a new force evaluation algorithm, which is based on an explicit pairwise force expression for many-body potentials derived recently (Fan et al., 2015). In our algorithm, the force, virial stress, and heat current for a given atom can be accumulated within a single thread and is free of write conflicts. We discuss the formulations and algorithms and evaluate their performance. A new open-source code, GPUMD, is developed based on the proposed formulations. For the Tersoff many-body potential, the double precision performance of GPUMD using a Tesla K40 card is equivalent to that of the LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) molecular dynamics code running with about 100 CPU cores (Intel Xeon CPU X5670 @ 2.93 GHz).
Modeling of crack growth under mixed-mode loading by a molecular dynamics method and a linear fracture mechanics approach

NASA Astrophysics Data System (ADS)

Stepanova, L. V.

2017-12-01

Atomistic simulations of the central crack growth process in an infinite plane medium under mixed-mode loading using Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), a classical molecular dynamics code, are performed. The inter-atomic potential used in this investigation is the Embedded Atom Method (EAM) potential. Plane specimens with an initial central crack are subjected to mixed-mode loadings. The simulation cell contains 400,000 atoms. The crack propagation direction angles under different values of the mixity parameter in a wide range of values from pure tensile loading to pure shear loading in a wide range of temperatures (from 0.1 K to 800 K) are obtained and analyzed. It is shown that the crack propagation direction angles obtained by molecular dynamics coincide with the crack propagation direction angles given by the multi-parameter fracture criteria based on the strain energy density and the multi-parameter description of the crack-tip fields. The multi-parameter fracture criteria are based on the multi-parameter stress field description taking into account the higher order terms of the Williams series expansion of the crack tip fields.
Structural properties of atactic polystyrene adsorbed onto solid surfaces.

PubMed

Tatek, Yergou B; Tsige, Mesfin

2011-11-07

In the present work, we are studying the local conformation of chains in a thin film of polystyrene adsorbed on a solid substrate by using atomistically detailed simulations. The simulations are carried out by using the readily available and massively parallel molecular dynamics code known as LAMMPS. In particular, a special emphasis is given to the density and orientation of side chains (which consist of phenyl groups and methylene units) at solid/polymer and polymer/vacuum interfaces. Three types of substrates were used in our study: α-quartz, graphite, and amorphous silica. Our investigation was restricted to atactic polystyrene. Our results show that the density and structural properties of side chains depend on the type of surface. An excess of phenyl rings is observed near the α-quartz substrate while the film adsorbed on graphite is depleted in C(6)H(5). Moreover, the orientation of the rings and methylene units on the substrate/film interface show a strong dependence on the type of the substrate, while the rings at the film/vacuum interface show a marked tendency to point outward, away from the film. The results we obtained are in a large part in good agreement with previous experimental and simulation results.
Particle-based simulations of self-motile suspensions

NASA Astrophysics Data System (ADS)

Hinz, Denis F.; Panchenko, Alexander; Kim, Tae-Yeon; Fried, Eliot

2015-11-01

A simple model for simulating flows of active suspensions is investigated. The approach is based on dissipative particle dynamics. While the model is potentially applicable to a wide range of self-propelled particle systems, the specific class of self-motile bacterial suspensions is considered as a modeling scenario. To mimic the rod-like geometry of a bacterium, two dissipative particle dynamics particles are connected by a stiff harmonic spring to form an aggregate dissipative particle dynamics molecule. Bacterial motility is modeled through a constant self-propulsion force applied along the axis of each such aggregate molecule. The model accounts for hydrodynamic interactions between self-propelled agents through the pairwise dissipative interactions conventional to dissipative particle dynamics. Numerical simulations are performed using a customized version of the open-source software package LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator). Detailed studies of the influence of agent concentration, pairwise dissipative interactions, and Stokes friction on the statistics of the system are provided. The simulations are used to explore the influence of hydrodynamic interactions in active suspensions. For high agent concentrations in combination with dominating pairwise dissipative forces, strongly correlated motion patterns and a fluid-like spectral distribution of kinetic energy are found. In contrast, systems dominated by Stokes friction exhibit weaker spatial correlations of the velocity field. These results indicate that hydrodynamic interactions may play an important role in the formation of spatially extended structures in active suspensions.
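The two-particle "bacterium" described above can be sketched as a force routine: a harmonic spring between the two particles plus a constant propulsion force along the axis joining them. This is only an illustration under stated assumptions. The spring constant `k`, rest length `r0`, propulsion magnitude `f_prop`, and the choice to split the propulsion force equally between the two particles are all hypothetical, and the dissipative and random pair forces of dissipative particle dynamics are omitted.

```python
import math

def dumbbell_forces(r1, r2, k=100.0, r0=1.0, f_prop=1.0):
    """Forces on the two particles of one self-propelled dumbbell.

    r1, r2 : 3-component positions of the tail and head particle.
    k, r0  : assumed spring constant and rest length (illustrative values).
    f_prop : assumed total propulsion force, directed from tail to head
             and split equally between the particles (an assumption).
    """
    d = [b - a for a, b in zip(r1, r2)]
    dist = math.sqrt(sum(c * c for c in d))
    if dist == 0.0:
        raise ValueError("particles coincide; axis is undefined")
    axis = [c / dist for c in d]      # unit vector from tail to head
    stretch = k * (dist - r0)         # positive when the spring is stretched
    # A stretched spring pulls the tail toward the head and vice versa.
    f1 = [stretch * a + 0.5 * f_prop * a for a in axis]
    f2 = [-stretch * a + 0.5 * f_prop * a for a in axis]
    return f1, f2
```

The spring contributions cancel in the sum, so the net force on the aggregate is always `f_prop` along its axis, regardless of how far the spring is stretched.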
Optimizing legacy molecular dynamics software with directive-based offload

DOE PAGES

Michael Brown, W.; Carrillo, Jan-Michael Y.; Gavhane, Nitin; ...

2015-05-14

Directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In our paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We also demonstrate that standard molecular dynamics algorithms can run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU; in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel(R) Xeon Phi(TM) coprocessors and NVIDIA GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS. (C) 2015 Elsevier B.V. All rights reserved.
Peridynamics with LAMMPS : a user guide.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lehoucq, Richard B.; Silling, Stewart Andrew; Plimpton, Steven James

2008-01-01

Peridynamics is a nonlocal formulation of continuum mechanics. The discrete peridynamic model has the same computational structure as a molecular dynamic model. This document details the implementation of a discrete peridynamic model within the LAMMPS molecular dynamic code. This document provides a brief overview of the peridynamic model of a continuum, then discusses how the peridynamic model is discretized, and overviews the LAMMPS implementation. A nontrivial example problem is also included.
Mathematical modeling of the crack growth in linear elastic isotropic materials by conventional fracture mechanics approaches and by molecular dynamics method: crack propagation direction angle under mixed mode loading

NASA Astrophysics Data System (ADS)

Stepanova, Larisa; Bronnikov, Sergej

2018-03-01

The crack growth direction angles in an isotropic linear elastic plane with a central crack under mixed-mode loading conditions are found for the full range of the mixity parameter. Two fracture criteria of traditional linear fracture mechanics (the maximum tangential stress and minimum strain energy density criteria) are used. Atomistic simulations of the central crack growth process in an infinite plane medium under mixed-mode loading using Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), a classical molecular dynamics code, are performed. The inter-atomic potential used in this investigation is the Embedded Atom Method (EAM) potential. The plane specimens with an initial central crack were subjected to mixed-mode loadings. The simulation cell contains 400,000 atoms. The crack propagation direction angles under different values of the mixity parameter in a wide range of values from pure tensile loading to pure shear loading in a wide range of temperatures (from 0.1 K to 800 K) are obtained and analyzed. It is shown that the crack propagation direction angles obtained by the molecular dynamics method coincide with the crack propagation direction angles given by the multi-parameter fracture criteria based on the strain energy density and the multi-parameter description of the crack-tip fields.
Amorphous Carbon Nanospheres

DOE Office of Scientific and Technical Information (OSTI.GOV)

None

An amorphous carbon nanosphere used as the anode material for Li intercalation in lithium-ion energy storage. This structure was obtained through a thermal annealing process at a temperature of 3000 K, simulated using the LAMMPS molecular dynamics code on the LCRC Fusion resource. Science: Kah Chun Lau and Larry Curtiss. Visualization: Aaron Knoll, Mark Hereld and Michael E. Papka.
EON: software for long time simulations of atomic scale systems

NASA Astrophysics Data System (ADS)

Chill, Samuel T.; Welborn, Matthew; Terrell, Rye; Zhang, Liang; Berthet, Jean-Claude; Pedersen, Andreas; Jónsson, Hannes; Henkelman, Graeme

2014-07-01

The EON software is designed for simulations of the state-to-state evolution of atomic scale systems over timescales greatly exceeding that of direct classical dynamics. States are defined as collections of atomic configurations from which a minimization of the potential energy gives the same inherent structure. The time evolution is assumed to be governed by rare events, where transitions between states are uncorrelated and infrequent compared with the timescale of atomic vibrations. Several methods for calculating the state-to-state evolution have been implemented in EON, including parallel replica dynamics, hyperdynamics and adaptive kinetic Monte Carlo. Global optimization methods, including simulated annealing, basin hopping and minima hopping are also implemented. The software has a client/server architecture where the computationally intensive evaluations of the interatomic interactions are calculated on the client-side and the state-to-state evolution is managed by the server. The client supports optimization for different computer architectures to maximize computational efficiency. The server is written in Python so that developers have access to the high-level functionality without delving into the computationally intensive components. Communication between the server and clients is abstracted so that calculations can be deployed on a single machine, clusters using a queuing system, large parallel computers using a message passing interface, or within a distributed computing environment. A generic interface to the evaluation of the interatomic interactions is defined so that empirical potentials, such as in LAMMPS, and density functional theory as implemented in VASP and GPAW can be used interchangeably. Examples are given to demonstrate the range of systems that can be modeled, including surface diffusion and island ripening of adsorbed atoms on metal surfaces, molecular diffusion on the surface of ice and global structural optimization of nanoparticles.
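The kinetic Monte Carlo idea mentioned above can be illustrated with one rejection-free step: pick a transition with probability proportional to its rate, then advance the clock by an exponentially distributed waiting time. This is a generic textbook sketch, not EON's implementation. The function name is hypothetical, and the rate table is assumed to be given, whereas adaptive kinetic Monte Carlo builds it on the fly by searching for transitions out of the current state.

```python
import math
import random

def kmc_step(rates, rng):
    """Perform one rejection-free kinetic Monte Carlo step.

    rates : list of non-negative transition rates out of the current state
            (assumed known here).
    rng   : a random.Random instance.
    Returns (index of the chosen transition, elapsed time).
    """
    total = sum(rates)
    if total <= 0.0:
        raise ValueError("at least one rate must be positive")
    # Choose a transition with probability rate_i / total.
    target = rng.random() * total
    cumulative = 0.0
    chosen = len(rates) - 1           # fallback guards against round-off
    for index, rate in enumerate(rates):
        cumulative += rate
        if target < cumulative:
            chosen = index
            break
    # Waiting time is exponential with mean 1 / total.
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt
```

Because the time increment scales with the inverse of the total escape rate, states separated by high barriers advance the simulated clock by large amounts per step, which is how such methods reach timescales far beyond direct dynamics.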
Molecular dynamic simulation of Ar-Kr mixture across a rough walled nanochannel: Velocity and temperature profiles

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pooja, E-mail: pupooja16@gmail.com; Ahluwalia, P. K., E-mail: pk-ahluwalia7@yahoo.com; Pathania, Y.

2015-05-15

This paper presents the results of a molecular dynamics simulation of a mixture of argon and krypton in Poiseuille flow across a rough-walled nanochannel. The effect of roughness on liquid nanoflows has recently drawn attention. The molecular dynamics simulations are carried out with LAMMPS. The fluid flows between two parallel plates; it is bounded by horizontal rough walls in one direction, and periodic boundary conditions are imposed in the other two directions. Each fluid atom interacts with other fluid atoms and with wall atoms through a Lennard-Jones (LJ) potential with a cutoff distance of 5.0. To drive the flow, a constant force is applied whose value is varied from 0.1 to 0.3, and velocity and temperature profiles are recorded for each value. The velocity and temperature profiles are also examined for different channel widths and mixture densities. The profiles for the rough-walled nanochannel are compared with those for a smooth-walled nanochannel. It is concluded that the mean velocity increases with increasing channel width, with increasing applied force, and with decreasing density; that introducing wall roughness increases the mean velocity further; and that the results agree with the analytical solution for Poiseuille flow.
Molecular dynamic simulation of Ar-Kr mixture across a rough walled nanochannel: Velocity & temperature profiles

NASA Astrophysics Data System (ADS)

Pooja; Pathania, Y.; Ahluwalia, P. K.

2015-05-01

This paper presents the results of a molecular dynamics simulation of a mixture of argon and krypton in Poiseuille flow across a rough-walled nanochannel. The effect of roughness on liquid nanoflows has recently drawn attention. The molecular dynamics simulations are carried out with LAMMPS. The fluid flows between two parallel plates; it is bounded by horizontal rough walls in one direction, and periodic boundary conditions are imposed in the other two directions. Each fluid atom interacts with other fluid atoms and with wall atoms through a Lennard-Jones (LJ) potential with a cutoff distance of 5.0. To drive the flow, a constant force is applied whose value is varied from 0.1 to 0.3, and velocity and temperature profiles are recorded for each value. The velocity and temperature profiles are also examined for different channel widths and mixture densities. The profiles for the rough-walled nanochannel are compared with those for a smooth-walled nanochannel. It is concluded that the mean velocity increases with increasing channel width, with increasing applied force, and with decreasing density; that introducing wall roughness increases the mean velocity further; and that the results agree with the analytical solution for Poiseuille flow.
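The analytical Poiseuille solution that the abstract compares against can be sketched as follows. This is a minimal illustration, assuming no-slip walls at z = 0 and z = h, a uniform force per atom, and a constant viscosity; the function names and the numerical values of channel width, number density and viscosity are hypothetical and are not taken from the paper. Only the force values 0.1 to 0.3 come from the abstract.

```python
def poiseuille_velocity(z, h, force, number_density, viscosity):
    """Parabolic velocity at height z for force-driven flow between
    no-slip walls at z = 0 and z = h (continuum, constant viscosity)."""
    return number_density * force * z * (h - z) / (2.0 * viscosity)


def poiseuille_mean_velocity(h, force, number_density, viscosity):
    """Velocity averaged across the channel: n * F * h^2 / (12 * mu)."""
    return number_density * force * h ** 2 / (12.0 * viscosity)


if __name__ == "__main__":
    # Hypothetical reduced-unit parameters, chosen only for illustration.
    h, n, mu = 20.0, 0.8, 2.0
    for force in (0.1, 0.2, 0.3):
        u_mid = poiseuille_velocity(h / 2.0, h, force, n, mu)
        u_mean = poiseuille_mean_velocity(h, force, n, mu)
        print(f"F={force:.1f}  u(h/2)={u_mid:.3f}  mean={u_mean:.3f}")
```

In this idealized form the mean velocity grows linearly with the applied force and quadratically with the channel width, which matches the direction of the trends reported in the abstract. The density dependence is not captured reliably here, because the viscosity itself changes with density.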

Molecular Dynamics Simulation of Hydrogen Trapping on Sigma 5 Tungsten Grain Boundaries

NASA Astrophysics Data System (ADS)

Al-Shalash, Aws Mohammed Taha

Tungsten is the leading candidate plasma-facing material for future tokamak reactor environments. The interaction between plasma particles and tungsten must be understood in order to design tungsten plasma-facing components that ensure the reliability and longevity of fusion reactors. The bombardment of sigma 5 polycrystalline tungsten was modeled by molecular dynamics simulation using the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) code and a Tersoff-type interatomic potential. By simulating the operational conditions of tokamak reactors, the hydrogen trapping rate, implantation distribution, and bubble formation were investigated at various temperatures (300-1200 K) and hydrogen incident energies (20-100 eV). Raising the substrate temperature increases the number of deflected H atoms and increases the penetration depth of those that enter the substrate; lower-temperature tungsten substrates also retain more H atoms. Increasing the energy of the bombarding hydrogen increases the trapping and retention rate and the depth of penetration. Additional experiments were conducted to determine whether the location of the sigma 5 grain boundary (GB) affects the H trapping profiles. The effect on deflection rates ranges from small at low H energies to none at high H energies. However, moving the GB closer to the surface considerably shifts the trapping depth profile toward the surface. Hydrogen atoms are highly mobile on the tungsten substrate, yet no bubble formation was observed.
Peridynamics with LAMMPS : a user guide.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lehoucq, Richard B.; Silling, Stewart Andrew; Seleson, Pablo

Peridynamics is a nonlocal extension of classical continuum mechanics. The discrete peridynamic model has the same computational structure as a molecular dynamics model. This document provides a brief overview of the peridynamic model of a continuum, then discusses how the peridynamic model is discretized within LAMMPS. An example problem is also included.
Accelerating Calculations of Reaction Dissipative Particle Dynamics in LAMMPS

DTIC Science & Technology

2017-05-17

order reaction mechanism, the best acceleration was 6.1 times. For a larger, more chemically detailed mechanism, the best acceleration exceeded 60 times...simulations at previously inaccessible scales. A principal feature of DPD-RX is its ability to model chemical reactions within each CG particle. The...change in composition due to chemical reactions is described by a system of ordinary differential equations (ODEs) that are evaluated at each DPD time
Molecular dynamics simulation of metal nanoislands growth

NASA Astrophysics Data System (ADS)

Kapralov, N. V.; Babich, E. S.; Redkov, A. V.

2017-11-01

We present an atomistic model and simulation of the self-assembled growth of a silver nanoisland film and of small groups of nanoislands on a glass substrate after thermal poling of the glass with a profiled electrode. The calculations were performed with the molecular dynamics simulator LAMMPS, taking into account the diffusion of the metal atoms towards and along the glass surface and their clustering. A Lennard-Jones potential was used to describe metal-metal and metal-glass interactions. The potential parameters were chosen to give qualitative agreement between the simulated configurations of the metal nanostructures and the experimental ones, such as an isolated nanoisland, a pair and a set of three nanoislands, and a “plasmonic molecule”.
Using LAMMPS Software on the Peregrine System | High-Performance Computing

Science.gov Websites

-l walltime=4:00:00 # WALLTIME #PBS -l nodes=2:ppn=16 # Number of nodes and processes per node #PBS module purge module load impi-intel/2017.0.5 mkl/2017.0.5 lammps/11Aug17 mpirun -np 32 lmp -in lmp.in -l
UV-activated ZnO films on a flexible substrate for room temperature O2 and H2O sensing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jacobs, Christopher B.; Maksov, Artem B.; Muckley, Eric S.

Here, we demonstrate that UV-light activation of polycrystalline ZnO films on flexible polyimide (Kapton) substrates can be used to detect and differentiate between environmental changes in oxygen and water vapor. The in-plane resistive and impedance properties of ZnO films, fabricated from bacteria-derived ZnS nanoparticles, exhibit unique resistive and capacitive responses to changes in O2 and H2O. We also propose that the distinctive responses to O2 and H2O adsorption on ZnO could be utilized to statistically discriminate between the two analytes. Molecular dynamics (MD) simulations of O2 and H2O adsorption energy on ZnO surfaces were performed using the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) with a reactive force field (ReaxFF). Furthermore, these simulations suggest that the adsorption mechanisms differ for O2 and H2O adsorption on ZnO, and are governed by the surface termination and the extent of surface hydroxylation. Electrical response measurements, using DC resistance, AC impedance spectroscopy, and Kelvin Probe Force Microscopy (KPFM), demonstrate differences in response to O2 and H2O, confirming that different adsorption mechanisms are involved. Statistical and machine learning approaches were applied to demonstrate that by integrating the electrical and kinetic responses the flexible ZnO sensor can be used for detection and discrimination between O2 and H2O at low temperature.
UV-activated ZnO films on a flexible substrate for room temperature O2 and H2O sensing

DOE PAGES

Jacobs, Christopher B.; Maksov, Artem B.; Muckley, Eric S.; ...

2017-07-20

Here, we demonstrate that UV-light activation of polycrystalline ZnO films on flexible polyimide (Kapton) substrates can be used to detect and differentiate between environmental changes in oxygen and water vapor. The in-plane resistive and impedance properties of ZnO films, fabricated from bacteria-derived ZnS nanoparticles, exhibit unique resistive and capacitive responses to changes in O2 and H2O. We also propose that the distinctive responses to O2 and H2O adsorption on ZnO could be utilized to statistically discriminate between the two analytes. Molecular dynamics (MD) simulations of O2 and H2O adsorption energy on ZnO surfaces were performed using the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) with a reactive force field (ReaxFF). Furthermore, these simulations suggest that the adsorption mechanisms differ for O2 and H2O adsorption on ZnO, and are governed by the surface termination and the extent of surface hydroxylation. Electrical response measurements, using DC resistance, AC impedance spectroscopy, and Kelvin Probe Force Microscopy (KPFM), demonstrate differences in response to O2 and H2O, confirming that different adsorption mechanisms are involved. Statistical and machine learning approaches were applied to demonstrate that by integrating the electrical and kinetic responses the flexible ZnO sensor can be used for detection and discrimination between O2 and H2O at low temperature.
Statistical study of defects caused by primary knock-on atoms in fcc Cu and bcc W using molecular dynamics

NASA Astrophysics Data System (ADS)

Warrier, M.; Bhardwaj, U.; Hemani, H.; Schneider, R.; Mutzke, A.; Valsakumar, M. C.

2015-12-01

We report on molecular dynamics (MD) simulations carried out in fcc Cu and bcc W using the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) code to study (i) the statistical variations in the number of interstitials and vacancies produced by energetic primary knock-on atoms (PKA) (0.1-5 keV) directed in random directions and (ii) the in-cascade cluster size distributions. Around 60-80 random directions have to be explored for the average number of displaced atoms to become steady in fcc Cu, whereas around 50-60 random directions are needed for bcc W. The number of Frenkel pairs produced in the MD simulations is compared with that from the Binary Collision Approximation Monte Carlo (BCA-MC) code SDTRIM-SP and with the results of the NRT model. A proper choice of the damage energy, i.e. the energy required to create a stable interstitial, is essential for the BCA-MC results to match the MD results. On the computational front, in-situ processing avoids the need to input/output (I/O) several terabytes of atomic position data when exploring a large number of random directions, and there is no difference in run-time because the extra time spent processing data is offset by the time saved in I/O.
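For orientation, the NRT (Norgett-Robinson-Torrens) estimate referred to above can be sketched in a few lines, together with one common way of drawing uniformly random PKA directions. This is a hedged illustration, not the authors' workflow: the function names are hypothetical, the threshold energies of 30 eV (Cu) and 90 eV (W) are commonly quoted values assumed here rather than taken from the abstract, and the PKA energy is passed directly as the damage energy, which ignores electronic losses.

```python
import math
import random


def nrt_frenkel_pairs(damage_energy_ev, threshold_ev):
    """Standard NRT estimate of the number of Frenkel pairs."""
    if damage_energy_ev < threshold_ev:
        return 0.0
    if damage_energy_ev < 2.0 * threshold_ev / 0.8:
        return 1.0
    return 0.8 * damage_energy_ev / (2.0 * threshold_ev)


def random_unit_vector(rng):
    """Direction drawn uniformly on the unit sphere
    (uniform z in [-1, 1], uniform azimuth)."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    print("example PKA direction:", random_unit_vector(rng))
    # Assumed threshold energies; not values reported in the abstract.
    for label, e_d in (("Cu", 30.0), ("W", 90.0)):
        for energy in (100.0, 1000.0, 5000.0):
            pairs = nrt_frenkel_pairs(energy, e_d)
            print(f"{label}: E={energy:.0f} eV -> N_NRT={pairs:.2f}")
```

Because the NRT count scales inversely with the threshold energy, the comparison with MD is sensitive to that choice, which is consistent with the abstract's remark about choosing the energy parameter carefully.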
UV-activated ZnO films on a flexible substrate for room temperature O2 and H2O sensing.

PubMed

Jacobs, Christopher B; Maksov, Artem B; Muckley, Eric S; Collins, Liam; Mahjouri-Samani, Masoud; Ievlev, Anton; Rouleau, Christopher M; Moon, Ji-Won; Graham, David E; Sumpter, Bobby G; Ivanov, Ilia N

2017-07-20

We demonstrate that UV-light activation of polycrystalline ZnO films on flexible polyimide (Kapton) substrates can be used to detect and differentiate between environmental changes in oxygen and water vapor. The in-plane resistive and impedance properties of ZnO films, fabricated from bacteria-derived ZnS nanoparticles, exhibit unique resistive and capacitive responses to changes in O2 and H2O. We propose that the distinctive responses to O2 and H2O adsorption on ZnO could be utilized to statistically discriminate between the two analytes. Molecular dynamic simulations (MD) of O2 and H2O adsorption energy on ZnO surfaces were performed using the large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) with a reactive force-field (ReaxFF). These simulations suggest that the adsorption mechanisms differ for O2 and H2O adsorption on ZnO, and are governed by the surface termination and the extent of surface hydroxylation. Electrical response measurements, using DC resistance, AC impedance spectroscopy, and Kelvin Probe Force Microscopy (KPFM), demonstrate differences in response to O2 and H2O, confirming that different adsorption mechanisms are involved. Statistical and machine learning approaches were applied to demonstrate that by integrating the electrical and kinetic responses the flexible ZnO sensor can be used for detection and discrimination between O2 and H2O at low temperature.
Algorithms of GPU-enabled reactive force field (ReaxFF) molecular dynamics.

PubMed

Zheng, Mo; Li, Xiaoxia; Guo, Li

2013-04-01

Reactive force field (ReaxFF), a recent bond order potential, allows reactive molecular dynamics (ReaxFF MD) simulations to model larger and more complex molecular systems involving chemical reactions than computationally intensive quantum mechanical methods can handle. However, ReaxFF MD can be approximately 10-50 times slower than classical MD because of its explicit modeling of bond forming and breaking, its dynamic charge equilibration at each time-step, and its time-step one order of magnitude smaller than that of classical MD. All of these pose significant computational challenges to reaching spatio-temporal scales of nanometers and nanoseconds. Recent advances in graphics processing units (GPUs) provide not only highly favorable performance for GPU-enabled MD programs compared with CPU implementations but also an opportunity to cope with the computing power and memory demands that ReaxFF MD places on computer hardware. In this paper, we present the algorithms of GMD-Reax, the first GPU-enabled ReaxFF MD program, whose performance significantly surpasses CPU implementations on desktop workstations. The performance of GMD-Reax has been benchmarked on a PC equipped with an NVIDIA C2050 GPU for coal pyrolysis simulation systems ranging from 1378 to 27,283 atoms. In terms of simulation time per time-step averaged over 100 steps, GMD-Reax achieved speedups of up to 12 times over Duin et al.'s FORTRAN codes in LAMMPS on 8 CPU cores and up to 6 times over LAMMPS' C codes based on PuReMD. GMD-Reax could be used as a new and efficient computational tool for exploiting very complex molecular reactions via ReaxFF MD simulation on desktop workstations. Copyright © 2013 Elsevier Inc. All rights reserved.
A molecular dynamics study on sI hydrogen hydrate.

PubMed

Mondal, S; Ghosh, S; Chattaraj, P K

2013-07-01

A molecular dynamics simulation is carried out to explore the possibility of using sI clathrate hydrate as hydrogen storage material. Metastable hydrogen hydrate structures are generated using the LAMMPS software. Different binding energies and radial distribution functions provide important insights into the behavior of the various types of hydrogen and oxygen atoms present in the system. Clathrate hydrate cages become more stable in the presence of guest molecules like hydrogen.
A mechanistic Individual-based Model of microbial communities.

PubMed

Jayathilake, Pahala Gedara; Gupta, Prashant; Li, Bowen; Madsen, Curtis; Oyebamiji, Oluwole; González-Cabaleiro, Rebeca; Rushton, Steve; Bridgens, Ben; Swailes, David; Allen, Ben; McGough, A Stephen; Zuliani, Paolo; Ofiteru, Irina Dana; Wilkinson, Darren; Chen, Jinju; Curtis, Tom

2017-01-01

Accurate predictive modelling of the growth of microbial communities requires the credible representation of the interactions of biological, chemical and mechanical processes. However, although biological and chemical processes are represented in a number of Individual-based Models (IbMs) the interaction of growth and mechanics is limited. Conversely, there are mechanically sophisticated IbMs with only elementary biology and chemistry. This study focuses on addressing these limitations by developing a flexible IbM that can robustly combine the biological, chemical and physical processes that dictate the emergent properties of a wide range of bacterial communities. This IbM is developed by creating a microbiological adaptation of the open source Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS). This innovation should provide the basis for "bottom up" prediction of the emergent behaviour of entire microbial systems. In the model presented here, bacterial growth, division, decay, mechanical contact among bacterial cells, and adhesion between the bacteria and extracellular polymeric substances are incorporated. In addition, fluid-bacteria interaction is implemented to simulate biofilm deformation and erosion. The model predicts that the surface morphology of biofilms becomes smoother with increased nutrient concentration, which agrees well with previous literature. In addition, the results show that increased shear rate results in smoother and more compact biofilms. The model can also predict shear rate dependent biofilm deformation, erosion, streamer formation and breakup.
A mechanistic Individual-based Model of microbial communities

PubMed Central

Gupta, Prashant; Li, Bowen; Madsen, Curtis; Oyebamiji, Oluwole; González-Cabaleiro, Rebeca; Rushton, Steve; Bridgens, Ben; Swailes, David; Allen, Ben; McGough, A. Stephen; Zuliani, Paolo; Ofiteru, Irina Dana; Wilkinson, Darren; Chen, Jinju; Curtis, Tom

2017-01-01

Accurate predictive modelling of the growth of microbial communities requires the credible representation of the interactions of biological, chemical and mechanical processes. However, although biological and chemical processes are represented in a number of Individual-based Models (IbMs) the interaction of growth and mechanics is limited. Conversely, there are mechanically sophisticated IbMs with only elementary biology and chemistry. This study focuses on addressing these limitations by developing a flexible IbM that can robustly combine the biological, chemical and physical processes that dictate the emergent properties of a wide range of bacterial communities. This IbM is developed by creating a microbiological adaptation of the open source Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS). This innovation should provide the basis for “bottom up” prediction of the emergent behaviour of entire microbial systems. In the model presented here, bacterial growth, division, decay, mechanical contact among bacterial cells, and adhesion between the bacteria and extracellular polymeric substances are incorporated. In addition, fluid-bacteria interaction is implemented to simulate biofilm deformation and erosion. The model predicts that the surface morphology of biofilms becomes smoother with increased nutrient concentration, which agrees well with previous literature. In addition, the results show that increased shear rate results in smoother and more compact biofilms. The model can also predict shear rate dependent biofilm deformation, erosion, streamer formation and breakup. PMID:28771505
Pizza.py Toolkit

DOE Office of Scientific and Technical Information (OSTI.GOV)

Plimpton, Steve; Jones, Matt; Crozier, Paul

2006-01-01

Pizza.py is a loosely integrated collection of tools, many of which provide support for the LAMMPS molecular dynamics and ChemCell cell modeling packages. There are tools to create input files, convert between file formats, process log and dump files, create plots, and visualize and animate simulation snapshots. Software packages that are wrapped by Pizza.py, so they can be invoked from within Python, include GnuPlot, MatLab, Raster3d, and RasMol. Pizza.py is written in Python and runs on any platform that supports Python. Pizza.py enhances the standard Python interpreter in a few simple ways. Its tools are Python modules which can be invoked interactively, from scripts, or from GUIs when appropriate. Some of the tools require additional Python packages to be installed as part of the user's Python. Others are wrappers on software packages (as listed above) which must be available on the user's system. It is easy to modify or extend Pizza.py with new functionality or new tools, which need not have anything to do with LAMMPS or ChemCell.
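To give a feel for the kind of log processing described above, here is a minimal standalone sketch that extracts thermodynamic output from LAMMPS log text. It does not use or reproduce the Pizza.py API; the function name `parse_thermo` is hypothetical, and the sketch assumes the default one-line thermo format, in which each run prints a header row beginning with `Step`, numeric rows, and then a line beginning with `Loop time`.

```python
def parse_thermo(log_text):
    """Return one dict per run, mapping column name -> list of floats."""
    runs = []
    columns = None
    rows = []
    for line in log_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Step":
            # Header row: start collecting a new thermo block.
            columns = fields
            rows = []
        elif columns is not None and fields[0] == "Loop":
            # "Loop time of ..." closes the block.
            runs.append({name: [row[i] for row in rows]
                         for i, name in enumerate(columns)})
            columns = None
        elif columns is not None and len(fields) == len(columns):
            try:
                rows.append([float(f) for f in fields])
            except ValueError:
                pass  # skip warnings or other text inside the block
    return runs


if __name__ == "__main__":
    # Made-up log excerpt, used only to exercise the parser.
    sample = """Step Temp E_pair TotEng Press
0 3.0 -6.77 -2.27 -3.70
100 1.65 -4.75 -2.27 5.83
Loop time of 0.5 on 1 procs for 100 steps with 500 atoms
"""
    for run in parse_thermo(sample):
        print(run["Step"], run["Temp"])
```

A run that is interrupted before its `Loop time` line is silently dropped by this sketch; a more careful tool would also flush the rows collected so far.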
Coarse-graining to the meso and continuum scales with molecular-dynamics-like models

NASA Astrophysics Data System (ADS)

Plimpton, Steve

Many engineering-scale problems that industry or the national labs try to address with particle-based simulations occur at length and time scales well beyond the most optimistic hopes of traditional coarse-graining methods for molecular dynamics (MD), which typically start at the atomic scale and build upward. However classical MD can be viewed as an engine for simulating particles at literally any length or time scale, depending on the models used for individual particles and their interactions. To illustrate I'll highlight several coarse-grained (CG) materials models, some of which are likely familiar to molecular-scale modelers, but others probably not. These include models for water droplet freezing on surfaces, dissipative particle dynamics (DPD) models of explosives where particles have internal state, CG models of nano or colloidal particles in solution, models for aspherical particles, Peridynamics models for fracture, and models of granular materials at the scale of industrial processing. All of these can be implemented as MD-style models for either soft or hard materials; in fact they are all part of our LAMMPS MD package, added either by our group or contributed by collaborators. Unlike most all-atom MD simulations, CG simulations at these scales often involve highly non-uniform particle densities. So I'll also discuss a load-balancing method we've implemented for these kinds of models, which can improve parallel efficiencies. From the physics point-of-view, these models may be viewed as non-traditional or ad hoc. But because they are MD-style simulations, there's an opportunity for physicists to add statistical mechanics rigor to individual models. Or, in keeping with a theme of this session, to devise methods that more accurately bridge models from one scale to the next.
Thermal conductivity of cross-linked polyethylene from molecular dynamics simulation

NASA Astrophysics Data System (ADS)

Xiong, Xue; Yang, Ming; Liu, Changlin; Li, Xiaobo; Tang, Dawei

2017-07-01

The thermal conductivity of cross-linked bulk polyethylene is studied using molecular dynamics simulation. The atomic structure of the cross-linked polyethylene (PEX) is generated through simulated bond formation using LAMMPS. The thermal conductivity of PEX is studied with different degrees of crosslinking, chain length, and tensile strain. Generally, the thermal conductivity increases with the increasing degree of crosslinking. When the length of the primitive chain increases, the thermal conductivity increases linearly. When the polymer is stretched along one direction, the thermal conductivity increases in the stretched direction and decreases in the direction perpendicular to it. However, the thermal conductivity varies slightly when the polymer is stretched in three directions simultaneously.
Quantum nuclear effects in water using centroid molecular dynamics

NASA Astrophysics Data System (ADS)

Kondratyuk, N. D.; Norman, G. E.; Stegailov, V. V.

2018-01-01

Quantum nuclear effects in water are studied using the method of centroid molecular dynamics (CMD). The aim is to calibrate the CMD implementation in LAMMPS. The calculated intramolecular energy, atomic gyration radii and radial distribution functions are compared with previous works. This work is intended as a step toward resolving the discrepancy between the simulation results and the experimental data on liquid n-alkane properties found in our previous works.
pysimm: A Python Package for Simulation of Molecular Systems

NASA Astrophysics Data System (ADS)

Fortunato, Michael; Colina, Coray

pysimm, short for Python simulation interface for molecular modeling, is a Python package designed to facilitate the structure generation and simulation of molecular systems through convenient and programmatic access to object-oriented representations of molecular system data. This poster presents core features of pysimm and design philosophies that highlight a generalized methodology for incorporating third-party software packages through their APIs. The integration with the LAMMPS simulation package is explained to demonstrate this methodology. pysimm began as a back-end Python library that powered a cloud-based application on nanohub.org for amorphous polymer simulation. The extension from a specific application library to a general-purpose simulation interface is explained. Additionally, this poster highlights the rapid development of new applications that construct polymer chains while controlling chain morphology, such as molecular weight distribution and monomer composition.
Development of a Charge-Implicit ReaxFF Potential for Hydrocarbon Systems.

PubMed

Kański, Michał; Maciążek, Dawid; Postawa, Zbigniew; Ashraf, Chowdhury M; van Duin, Adri C T; Garrison, Barbara J

2018-01-18

Molecular dynamics (MD) simulations continue to make important contributions to understanding chemical and physical processes. Concomitant with the growth of MD simulations is the need to have interaction potentials that both represent the chemistry of the system and are computationally efficient. We propose a modification to the ReaxFF potential for carbon and hydrogen that eliminates the time-consuming charge equilibration, eliminates the acknowledged flaws of the electronegativity equalization method, includes an expanded training set for condensed phases, has a repulsive wall for simulations of energetic particle bombardment, and is compatible with the LAMMPS code. This charge-implicit ReaxFF potential is five times faster than the conventional ReaxFF potential for a simulation of keV particle bombardment with a sample size of over 800 000 atoms.
Investigation of the effect of wall friction on the flow rate in 2D and 3D Granular Flow

NASA Astrophysics Data System (ADS)

Carballo-Ramirez, Brenda; Pleau, Mollie; Easwar, Nalini; Birwa, Sumit; Shah, Neil; Tewari, Shubha

We have measured the mass flow rate of steel spheres under gravity in vertical, straight-walled 2- and 3-dimensional hoppers, where the flow velocity is controlled by the opening size. Our measurements focus on the role of friction and its placement along the walls of the hopper. In the 2D case, an increase in the coefficient of static friction from μ = 0.2 to 0.6 is seen to decrease the flow rate significantly. We have changed the placement of frictional boundaries/regions from the front and back walls of the 2D hopper to the side walls and floor to investigate the relative importance of the different regions in determining the flow rate. Fits to the Beverloo equation show significant departure from the expected exponent of 1.5 in the case of 2D flow. In contrast, 3D flow rates do not show much dependence on wall friction and its placement. We compare the experimental data to numerical simulations of gravity-driven hopper granular flow with varying frictional walls constructed using LAMMPS (http://lammps.sandia.gov). Supported by NSF MRSEC DMR 0820506.
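The exponent test mentioned above can be illustrated with a small fit. The sketch assumes a Beverloo-type law of the form W = A (D - k d)^n, where D is the opening size, d the grain diameter, A a prefactor that lumps density and gravity, and k a fixed empirical constant; it estimates n by a least-squares line through log W versus log(D - k d). The function name, the value k = 1.5 and the synthetic data are assumptions made for illustration and are not taken from the abstract.

```python
import math


def fit_beverloo_exponent(openings, flow_rates, grain_diameter, k=1.5):
    """Least-squares fit of log W = log A + n * log(D - k*d).
    Returns (n, A). Requires D > k*d for every opening."""
    xs = [math.log(D - k * grain_diameter) for D in openings]
    ys = [math.log(W) for W in flow_rates]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    prefactor = math.exp(mean_y - exponent * mean_x)
    return exponent, prefactor


if __name__ == "__main__":
    # Synthetic 2D-like data generated with exponent 1.5 (illustration only).
    d = 1.0
    openings = [4.0, 5.0, 6.0, 8.0, 10.0]
    rates = [2.0 * (D - 1.5 * d) ** 1.5 for D in openings]
    n, A = fit_beverloo_exponent(openings, rates, d)
    print(f"fitted exponent n = {n:.3f}, prefactor A = {A:.3f}")
```

With measured data, a fitted n that differs clearly from 1.5 in 2D would correspond to the departure reported in the abstract, although the result also depends on the value chosen for k.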

Calculations of lattice vibrational mode lifetimes using Jazz: a Python wrapper for LAMMPS

NASA Astrophysics Data System (ADS)

Gao, Y.; Wang, H.; Daw, M. S.

2015-06-01

Jazz is a new Python wrapper for LAMMPS [1], implemented to calculate the lifetimes of vibrational normal modes from forces calculated with any interatomic potential available in that package. The anharmonic character of the normal modes is analyzed via the Monte Carlo-based moments approximation described in Gao and Daw [2]. It is distributed as open-source software and can be downloaded from http://jazz.sourceforge.net/.
Structure and dynamics of complex liquid water: Molecular dynamics simulation

NASA Astrophysics Data System (ADS)

S, Indrajith V.; Natesan, Baskaran

2015-06-01

We have carried out detailed structural and dynamical studies of complex liquid water using molecular dynamics simulations. Three different model potentials, namely TIP3P, TIP4P and SPC-E, have been used in the simulations in order to identify the potential function that best reproduces the experimental structure of bulk water. All the simulations were performed in the NVE (microcanonical) ensemble using LAMMPS. The radial distribution functions gOO, gOH and gHH and the self-diffusion coefficient Ds were calculated for all three models. We conclude from our results that the structural and dynamical parameters obtained for the SPC-E model match the experimental values well, suggesting that, among the models studied here, the SPC-E model gives the best structure and dynamics of bulk water.
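A self-diffusion coefficient such as Ds is commonly obtained from the mean squared displacement (MSD) through the Einstein relation, Ds = slope of MSD(t) / 6 in three dimensions. The abstract does not say which route the authors used, so the following is only a sketch under that assumption; the function names are hypothetical and the coordinates are assumed to be unwrapped (not folded back into the periodic box).

```python
def mean_squared_displacement(frames):
    """MSD relative to the first frame.
    frames: list of frames, each a list of (x, y, z) unwrapped positions."""
    reference = frames[0]
    msd = []
    for frame in frames:
        total = 0.0
        for (x, y, z), (x0, y0, z0) in zip(frame, reference):
            total += (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
        msd.append(total / len(reference))
    return msd


def diffusion_coefficient(times, msd):
    """Einstein relation in 3D: least-squares slope of MSD vs time, over 6."""
    count = len(times)
    mean_t = sum(times) / count
    mean_m = sum(msd) / count
    num = sum((t - mean_t) * (m - mean_m) for t, m in zip(times, msd))
    den = sum((t - mean_t) ** 2 for t in times)
    return (num / den) / 6.0


if __name__ == "__main__":
    # Made-up MSD values that grow linearly in time, for illustration only.
    times = [0.0, 1.0, 2.0, 3.0]
    msd = [0.0, 6.0, 12.0, 18.0]
    print("D =", diffusion_coefficient(times, msd))
```

In practice the fit should be restricted to the long-time linear regime of the MSD, excluding the short-time ballistic part.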
Segregation formation, thermal and electronic properties of ternary cubic CdZnTe clusters: MD simulations and DFT calculations

NASA Astrophysics Data System (ADS)

Kurban, Mustafa; Erkoç, Şakir

2017-04-01

Surface and core formation and the thermal and electronic properties of ternary cubic CdZnTe clusters are investigated using classical molecular dynamics (MD) simulations and density functional theory (DFT) calculations. In this work, MD simulations of the CdZnTe clusters are performed with LAMMPS using a bond order potential (BOP). MD simulations are carried out at different temperatures to study the segregation phenomena of Cd, Zn and Te atoms, and deviation of clusters and heat capacity. After that, using the optimized geometries obtained, the excess charge on atoms, dipole moments, highest occupied molecular orbitals, lowest unoccupied molecular orbitals, HOMO-LUMO gaps (Eg), total energies, spin density and the density of states (DOS) have been calculated with DFT. Simulation results such as heat capacity and segregation formation are compared with experimental bulk and theoretical results.
SC'11 Poster: A Highly Efficient MGPT Implementation for LAMMPS; with Strong Scaling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oppelstrup, T; Stukowski, A; Marian, J

2011-12-07

The MGPT potential has been implemented as a drop-in package for the general molecular dynamics code LAMMPS. We implement an improved communication scheme that shrinks the communication layer thickness and improves the load balancing. This results in unprecedented strong scaling, with speedup continuing beyond 1/8 atom/core. In addition, we have optimized the small matrix linear algebra with generic blocking (for all processors) and specific SIMD intrinsics for vectorization on Intel, AMD, and BlueGene CPUs.
Analysis of Decomposition for Structure I Methane Hydrate by Molecular Dynamics Simulation

NASA Astrophysics Data System (ADS)

Wei, Na; Sun, Wan-Tong; Meng, Ying-Feng; Liu, An-Qi; Zhou, Shou-Wei; Guo, Ping; Fu, Qiang; Lv, Xin

2018-05-01

The microscopic decomposition mechanisms of structure I methane hydrate in contact with bulk water molecules have been studied by molecular dynamics simulation with the LAMMPS software at multiple temperatures and pressures. The simulation system consists of 482 methane molecules in hydrate and 3027 randomly distributed bulk water molecules. From the simulation results, the number of decomposed hydrate cages, the density of methane molecules, the radial distribution function for oxygen atoms, and the mean square displacement and diffusion coefficient of methane molecules have been analyzed. A significant result is that structure I methane hydrate decomposes from the hydrate-bulk water interface toward the hydrate interior. As temperature rises and pressure drops, the stability of the hydrate weakens, the extent of decomposition deepens, and the mean square displacement and diffusion coefficient of methane molecules increase. These studies can inform analyses of the microscopic decomposition mechanisms of methane hydrate.
Incremental update of electrostatic interactions in adaptively restrained particle simulations.

PubMed

Edorh, Semeho Prince A; Redon, Stéphane

2018-04-06

The computation of long-range potentials is one of the demanding tasks in Molecular Dynamics. During the last decades, an inventive panoply of methods was developed to reduce the CPU time of this task. In this work, we propose a fast method dedicated to the computation of the electrostatic potential in adaptively restrained systems. We exploit the fact that, in such systems, only some particles are allowed to move at each timestep. We developed an incremental algorithm derived from a multigrid-based alternative to traditional Fourier-based methods. Our algorithm was implemented inside LAMMPS, a popular molecular dynamics simulation package. We evaluated the method on different systems. We showed that the new algorithm's computational complexity scales with the number of active particles in the simulated system, and is able to outperform the well-established Particle Particle Particle Mesh (P3M) for adaptively restrained simulations. © 2018 Wiley Periodicals, Inc.
Development and application of a particle-particle particle-mesh Ewald method for dispersion interactions.

PubMed

Isele-Holder, Rolf E; Mitchell, Wayne; Ismail, Ahmed E

2012-11-07

For inhomogeneous systems with interfaces, the inclusion of long-range dispersion interactions is necessary to achieve consistency between molecular simulation calculations and experimental results. For accurate and efficient incorporation of these contributions, we have implemented a particle-particle particle-mesh Ewald solver for dispersion (r^-6) interactions into the LAMMPS molecular dynamics package. We demonstrate that the solver's O(N log N) scaling behavior allows its application to large-scale simulations. We carefully determine a set of parameters for the solver that provides accurate results and efficient computation. We perform a series of simulations with Lennard-Jones particles, SPC/E water, and hexane to show that with our choice of parameters the dependence of physical results on the chosen cutoff radius is removed. Physical results and computation time of these simulations are compared to results obtained using either a plain cutoff or a traditional Ewald sum for dispersion.
Single-pass incremental force updates for adaptively restrained molecular dynamics.

PubMed

Singh, Krishna Kant; Redon, Stephane

2018-03-30

Adaptively restrained molecular dynamics (ARMD) allows users to perform more integration steps in wall-clock time by switching positional degrees of freedom on and off. This article presents new, single-pass incremental force update algorithms to efficiently simulate a system using ARMD. We assessed different algorithms for speedup measurements and implemented them in the LAMMPS MD package. We validated the single-pass incremental force update algorithm on four different benchmarks using diverse pair potentials. The proposed algorithm allows us to simulate a system faster than traditional MD in both NVE and NVT ensembles. Moreover, ARMD using the new single-pass algorithm speeds up the convergence of observables in wall-clock time. © 2017 Wiley Periodicals, Inc.
Study of percolation behavior depending on molecular structure design

NASA Astrophysics Data System (ADS)

Yu, Ji Woong; Lee, Won Bo

Differently designed anisotropic nano-crystals (ANCs) are studied using Langevin dynamics simulation and their percolation behaviors are presented. The popular molecular dynamics software LAMMPS was used to design the system and perform the simulation. We calculated the minimum number density at which percolation occurs (i.e. the percolation threshold), the radial distribution function, and the average number of ANCs per cluster. Electrical conductivity improves when the number of electron transfers between ANCs, the so-called "inter-hopping process", which contributes considerably to resistance, decreases; the number of inter-hopping processes is directly related to the concentration of ANCs. Therefore, by investigating the relationship between molecular architecture and percolation behavior, an optimal design of ANCs can be achieved.
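The abstract does not describe how clusters were identified. One common way to obtain cluster sizes from particle positions is to link any two particles closer than a contact distance and group them with a union-find structure. The sketch below assumes point-like particle centres, a single hypothetical `cutoff`, and no periodic boundaries, all of which are simplifications relative to anisotropic nano-crystals.

```python
import math


def cluster_sizes(points, cutoff):
    """Return cluster sizes (largest first) for points linked within `cutoff`.

    Uses union-find; periodic boundaries are NOT handled (a simplification).
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < cutoff:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two clusters

    sizes = {}
    for i in range(len(points)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


# Hypothetical particle centres, for illustration only
points = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0), (1.8, 0.0, 0.0), (5.0, 5.0, 5.0)]
sizes = cluster_sizes(points, cutoff=1.0)
mean_size = sum(sizes) / len(sizes)
```

Deciding whether a system percolates would additionally require checking whether some cluster spans the simulation box, which this sketch does not attempt.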
Precise calculation of the local pressure tensor in Cartesian and spherical coordinates in LAMMPS

NASA Astrophysics Data System (ADS)

Nakamura, Takenobu; Kawamoto, Shuhei; Shinoda, Wataru

2015-05-01

An accurate and efficient algorithm for calculating the 3D pressure field has been developed and implemented in the open-source molecular dynamics package, LAMMPS. Additionally, an algorithm to compute the pressure profile along the radial direction in spherical coordinates has also been implemented. The latter is particularly useful for systems showing a spherical symmetry such as micelles and vesicles. These methods yield precise pressure fields based on the Irving-Kirkwood contour integration and are particularly useful for biomolecular force fields. The present methods are applied to several systems including a buckled membrane and a vesicle.
Study of Effect of Impacting Direction on Abrasive Nanometric Cutting Process with Molecular Dynamics

NASA Astrophysics Data System (ADS)

Li, Junye; Meng, Wenqing; Dong, Kun; Zhang, Xinming; Zhao, Weihong

2018-01-01

Abrasive flow polishing plays an important part in modern ultra-precision machining. Ultrafine particles suspended in the medium of abrasive flow remove material at the nanoscale. In this paper, three-dimensional molecular dynamics (MD) simulations are performed to investigate the effect of impacting direction on the abrasive cutting process during abrasive flow polishing. The molecular dynamics simulation software LAMMPS was used to simulate the cutting of single crystal copper with SiC abrasive grains at different cutting angles (0°-45°). At a constant friction coefficient, we found a direct relation between cutting angle and cutting force, which ultimately increases the number of dislocations during abrasive flow machining. Our theoretical study reveals that a small cutting angle is beneficial for improving surface quality and reducing internal defects in the workpiece. However, there is no obvious relationship between cutting angle and friction coefficient.
Study of Effect of Impacting Direction on Abrasive Nanometric Cutting Process with Molecular Dynamics.

PubMed

Li, Junye; Meng, Wenqing; Dong, Kun; Zhang, Xinming; Zhao, Weihong

2018-01-11

Abrasive flow polishing plays an important part in modern ultra-precision machining. Ultrafine particles suspended in the medium of abrasive flow remove material at the nanoscale. In this paper, three-dimensional molecular dynamics (MD) simulations are performed to investigate the effect of impacting direction on the abrasive cutting process during abrasive flow polishing. The molecular dynamics simulation software LAMMPS was used to simulate the cutting of single crystal copper with SiC abrasive grains at different cutting angles (0°-45°). At a constant friction coefficient, we found a direct relation between cutting angle and cutting force, which ultimately increases the number of dislocations during abrasive flow machining. Our theoretical study reveals that a small cutting angle is beneficial for improving surface quality and reducing internal defects in the workpiece. However, there is no obvious relationship between cutting angle and friction coefficient.
Stability of Granular Packings Jammed under Gravity: Avalanches and Unjamming

NASA Astrophysics Data System (ADS)

Merrigan, Carl; Birwa, Sumit; Tewari, Shubha; Chakraborty, Bulbul

Granular avalanches indicate the sudden destabilization of a jammed state due to a perturbation. We propose that the perturbation needed depends on the entire force network of the jammed configuration. Some networks are stable, while others are fragile, leading to the unpredictability of avalanches. To test this claim, we simulated an ensemble of jammed states in a hopper using LAMMPS. These simulations were motivated by experiments with vibrated hoppers where the unjamming times followed power-law distributions. We compare the force networks for these simulated states with respect to their overall stability. The states are classified by how long they remain stable when subject to continuous vibrations. We characterize the force networks through both their real space geometry and representations in the associated force-tile space, extending this tool to jammed states with body forces. Supported by NSF Grant DMR1409093 and DGE1068620.
Simulating the dynamics of complex plasmas.

PubMed

Schwabe, M; Graves, D B

2013-08-01

Complex plasmas are low-temperature plasmas that contain micrometer-size particles in addition to the neutral gas particles and the ions and electrons that make up the plasma. The microparticles interact strongly and display a wealth of collective effects. Here we report on linked numerical simulations that reproduce many of the experimental results of complex plasmas. We model a capacitively coupled plasma with a fluid code written for the commercial package COMSOL. The output of this model is used to calculate forces on microparticles. The microparticles are modeled using the molecular dynamics package LAMMPS, which we extended to include the forces from the plasma. Using this method, we are able to reproduce void formation, the separation of particles of different sizes into layers, lane formation, vortex formation, and other effects.
Experiment and simulation of the fabrication process of lithium-ion battery cathodes for determining microstructure and mechanical properties

NASA Astrophysics Data System (ADS)

Forouzan, Mehdi M.; Chao, Chien-Wei; Bustamante, Danilo; Mazzeo, Brian A.; Wheeler, Dean R.

2016-04-01

The fabrication process of Li-ion battery electrodes plays a prominent role in the microstructure and corresponding cell performance. Here, a mesoscale particle dynamics simulation is developed to relate the manufacturing process of a cathode containing Toda NCM-523 active material to physical and structural properties of the dried film. Particle interactions are simulated with shifted-force Lennard-Jones and granular Hertzian functions. LAMMPS, a freely available particle simulator, is used to generate particle trajectories and resulting predicted properties. To make simulations of the full film thickness feasible, the carbon binder domain (CBD) is approximated with μm-scale particles, each representing about 1000 carbon black particles and associated binder. Metrics for model parameterization and validation are measured experimentally and include the following: slurry viscosity, elasticity of the dried film, shrinkage ratio during drying, volume fraction of phases, slurry and dried film densities, and microstructure cross sections. Simulation results are in substantial agreement with experiment, showing that the simulations reasonably reproduce the relevant physics of particle arrangement during fabrication.
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, W Michael; Wang, Peng; Plimpton, Steven J

The use of accelerators such as general-purpose graphics processing units (GPGPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines - 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory, 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.
Molecular dynamic simulation of Copper and Platinum nanoparticles Poiseuille flow in a nanochannels

NASA Astrophysics Data System (ADS)

Toghraie, Davood; Mokhtari, Majid; Afrand, Masoud

2016-10-01

In this paper, Poiseuille flow within a nanochannel containing Copper and Platinum particles is simulated using molecular dynamics (MD). The LAMMPS code is used to simulate three-dimensional Poiseuille flow. The atomic interaction is governed by the modified Lennard-Jones potential. To study the wall effects on the surface tension and density profile, we placed two solid walls, one at the bottom boundary and the other at the top boundary. For solid-liquid interactions, the modified Lennard-Jones potential function was used. Velocity profiles and distributions of temperature and density have been obtained, and agglutination of nanoparticles has been discussed. It is also shown that with more particles, less time is required for the particles to fuse or agglutinate. We can also conclude that agglutination in a nanochannel is faster with Copper particles than with Platinum nanoparticles. Finally, it is demonstrated that using nanoparticles raises thermal conduction in the channel.
Implementation of metal-friendly EAM/FS-type semi-empirical potentials in HOOMD-blue: A GPU-accelerated molecular dynamics software

NASA Astrophysics Data System (ADS)

Yang, Lin; Zhang, Feng; Wang, Cai-Zhuang; Ho, Kai-Ming; Travesset, Alex

2018-04-01

We present an implementation of EAM and FS interatomic potentials, which are widely used in simulating metallic systems, in HOOMD-blue, a software designed to perform classical molecular dynamics simulations using GPU accelerations. We first discuss the details of our implementation and then report extensive benchmark tests. We demonstrate that single-precision floating point operations efficiently implemented on GPUs can produce sufficient accuracy when compared against double-precision codes, as demonstrated in test simulations of calculations of the glass-transition temperature of Cu64.5Zr35.5, and pair correlation function g(r) of liquid Ni3Al. Our code scales well with the size of the simulating system on NVIDIA Tesla M40 and P100 GPUs. Compared with another popular software LAMMPS running on 32 cores of AMD Opteron 6220 processors, the GPU/CPU performance ratio can reach as high as 4.6. The source code can be accessed through the HOOMD-blue web page for free by any interested user.
Energy-free machine learning force field for aluminum.

PubMed

Kruglov, Ivan; Sergeev, Oleg; Yanilkin, Alexey; Oganov, Artem R

2017-08-17

We used the machine learning technique of Li et al. (PRL 114, 2015) for molecular dynamics simulations. Atomic configurations were described by feature matrix based on internal vectors, and linear regression was used as a learning technique. We implemented this approach in the LAMMPS code. The method was applied to crystalline and liquid aluminum and uranium at different temperatures and densities, and showed the highest accuracy among different published potentials. Phonon density of states, entropy and melting temperature of aluminum were calculated using this machine learning potential. The results are in excellent agreement with experimental data and results of full ab initio calculations.
Physics of Shock Compression and Release: NEMD Simulations of Tantalum and Silicon

NASA Astrophysics Data System (ADS)

Hahn, Eric; Meyers, Marc; Zhao, Shiteng; Remington, Bruce; Bringa, Eduardo; Germann, Tim; Ravelo, Ramon; Hammerberg, James

2015-06-01

Shock compression and release allow us to evaluate physical deformation and damage mechanisms occurring in extreme environments. SPaSM and LAMMPS molecular dynamics codes were employed to simulate single and polycrystalline tantalum and silicon at strain rates above 10^8 s^-1. Visualization and analysis were accomplished using OVITO, Crystal Analysis Tool, and a redesigned orientation imaging function implemented into SPaSM. A comparison between interatomic potentials for both Si and Ta (as pertaining to shock conditions) is conducted and the influence on phase transformation and plastic relaxation is discussed. Partial dislocations, shear induced disordering, and metastable phase changes are observed in compressed silicon. For tantalum, grain boundary and twin intersections are evaluated for their role in ductile spallation. Finally, the temperature dependent response of both Ta and Si is investigated.

Accurate and general treatment of electrostatic interaction in Hamiltonian adaptive resolution simulations

NASA Astrophysics Data System (ADS)

Heidari, M.; Cortes-Huerto, R.; Donadio, D.; Potestio, R.

2016-10-01

In adaptive resolution simulations the same system is concurrently modeled with different resolution in different subdomains of the simulation box, thereby enabling an accurate description in a small but relevant region, while the rest is treated with a computationally parsimonious model. In this framework, electrostatic interaction, whose accurate treatment is a crucial aspect in the realistic modeling of soft matter and biological systems, represents a particularly acute problem due to the intrinsic long-range nature of Coulomb potential. In the present work we propose and validate the usage of a short-range modification of Coulomb potential, the Damped shifted force (DSF) model, in the context of the Hamiltonian adaptive resolution simulation (H-AdResS) scheme. This approach, which is here validated on bulk water, ensures a reliable reproduction of the structural and dynamical properties of the liquid, and enables a seamless embedding in the H-AdResS framework. The resulting dual-resolution setup is implemented in the LAMMPS simulation package, and its customized version employed in the present work is made publicly available.
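The abstract names the damped shifted force (DSF) model but does not write it out. In its commonly used form, the damped Coulomb term erfc(αr)/r is shifted so that both the energy and the force vanish at the cutoff Rc. The sketch below assumes that form and units in which 1/(4πε0) = 1; the function name and the parameter values are hypothetical and are not taken from the paper.

```python
import math


def dsf_energy(r, qi, qj, alpha, rc):
    """Damped shifted force (DSF) pair energy, in units where 1/(4*pi*eps0) = 1."""
    if r >= rc:
        return 0.0
    erfc_rc = math.erfc(alpha * rc)
    # -d/dr [erfc(alpha*r)/r] evaluated at r = rc (a positive number)
    force_shift = (erfc_rc / rc**2
                   + (2.0 * alpha / math.sqrt(math.pi))
                   * math.exp(-(alpha * rc) ** 2) / rc)
    return qi * qj * (math.erfc(alpha * r) / r
                      - erfc_rc / rc
                      + force_shift * (r - rc))


# Illustrative values only (not the settings used in the paper)
alpha, rc = 0.2, 10.0
e_mid = dsf_energy(1.0, 1.0, -1.0, alpha, rc)           # opposite charges
e_near_cutoff = dsf_energy(rc - 1e-3, 1.0, -1.0, alpha, rc)
```

Because the shifted terms cancel both the value and the slope of the damped potential at Rc, the interaction goes to zero smoothly there, which is what makes a purely short-range treatment usable without a reciprocal-space sum.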
Sensitivity of Force Fields on Mechanical Properties of Metals Predicted by Atomistic Simulations

NASA Astrophysics Data System (ADS)

Rassoulinejad-Mousavi, Seyed Moein; Zhang, Yuwen

The increasing number of micro/nanoscale studies for scientific and engineering applications has led to wide deployment of atomistic simulations such as molecular dynamics and Monte Carlo simulation. Many users in the simulation community complain of obtaining wrong results despite correct simulation procedures and conditions. An improper choice of force field, also known as the interatomic potential, is the likely cause. For the sake of users' assurance, convenience and time saving, several interatomic potentials are evaluated by molecular dynamics. Elastic properties of multiple FCC and BCC pure metallic species are obtained with LAMMPS, using different interatomic potentials designed for pure species and their alloys at different temperatures. The potentials, based on the Embedded Atom Method (EAM), Modified EAM (MEAM) and ReaX force fields, are adopted from available open databases. Independent elastic stiffness constants of cubic single crystals for different metals are obtained. The results are compared with the experimental ones available in the literature, and deviations for each force field are provided at each temperature. Using the current work, users of these force fields can easily judge which one to select for their problem.
Employing multi-GPU power for molecular dynamics simulation: an extension of GALAMOST

NASA Astrophysics Data System (ADS)

Zhu, You-Liang; Pan, Deng; Li, Zhan-Wei; Liu, Hong; Qian, Hu-Jun; Zhao, Yang; Lu, Zhong-Yuan; Sun, Zhao-Yan

2018-04-01

We describe an algorithm for employing multi-GPU power on the basis of Message Passing Interface (MPI) domain decomposition in a molecular dynamics code, GALAMOST, which is designed for the coarse-grained simulation of soft matter. The multi-GPU version of the code is developed from our previous single-GPU version. In multi-GPU runs, one GPU takes charge of one domain and runs the single-GPU code path. The communication between neighbouring domains uses an algorithm similar to that of the CPU-based code LAMMPS, but is optimised specifically for GPUs. We employ a memory-saving design which can enlarge the maximum system size under the same device conditions. An optimisation algorithm is employed to prolong the update period of the neighbour list. We demonstrate good performance of multi-GPU runs on the simulation of a Lennard-Jones liquid, a dissipative particle dynamics liquid, a polymer and nanoparticle composite, and two-patch particles on a workstation. Good scaling over many nodes on a cluster is presented for two-patch particles.
Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications

DOE PAGES

Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael

2015-04-08

The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.
OpenKIM - Building a Knowledgebase of Interatomic Models

NASA Astrophysics Data System (ADS)

Bierbaum, Matthew; Tadmor, Ellad; Elliott, Ryan; Wennblom, Trevor; Alemi, Alexander; Chen, Yan-Jiun; Karls, Daniel; Ludvik, Adam; Sethna, James

2014-03-01

The Knowledgebase of Interatomic Models (KIM) is an effort by the computational materials community to provide a standard interface for the development, characterization, and use of interatomic potentials. The KIM project has developed an API between simulation codes and interatomic models written in several different languages including C, Fortran, and Python. This interface is already supported in popular simulation environments such as LAMMPS and ASE, giving quick access to over a hundred compatible potentials that have been contributed so far. To compare and characterize models, we have developed a computational processing pipeline which automatically runs a series of tests for each model in the system, such as phonon dispersion relations and elastic constant calculations. To view the data from these tests, we created a rich set of interactive visualization tools located online. Finally, we created a Web repository to store and share these potentials, tests, and visualizations which can be found at https://openkim.org along with further information.
Parametrization of Stillinger-Weber potential based on valence force field model: application to single-layer MoS2 and black phosphorus

NASA Astrophysics Data System (ADS)

Jiang, Jin-Wu

2015-08-01

We propose parametrizing the Stillinger-Weber potential for covalent materials starting from the valence force-field model. All geometrical parameters in the Stillinger-Weber potential are determined analytically according to the equilibrium condition for each individual potential term, while the energy parameters are derived from the valence force-field model. This parametrization approach transfers the accuracy of the valence force field model to the Stillinger-Weber potential. Furthermore, the resulting Stillinger-Weber potential supports stable molecular dynamics simulations, as each potential term is at an energy-minimum state separately at the equilibrium configuration. We employ this procedure to parametrize Stillinger-Weber potentials for single-layer MoS2 and black phosphorus. The obtained Stillinger-Weber potentials predict an accurate phonon spectrum and mechanical behaviors. We also provide input scripts of these Stillinger-Weber potentials used by publicly available simulation packages including GULP and LAMMPS.
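For readers unfamiliar with the functional form, the sketch below writes out the standard two-body and three-body Stillinger-Weber terms. It illustrates the property the abstract relies on: the three-body term vanishes, and is therefore at its minimum, when the bond angle equals its equilibrium value. The parameter values are placeholders chosen for illustration and are not the fitted MoS2 or black phosphorus parameters from the paper.

```python
import math


def sw_two_body(r, A, B, eps, sigma, p, q, a):
    """Two-body Stillinger-Weber term; smoothly zero at the cutoff a*sigma."""
    if r >= a * sigma:
        return 0.0
    s = sigma / r
    return A * eps * (B * s**p - s**q) * math.exp(sigma / (r - a * sigma))


def sw_three_body(rij, rik, cos_theta, lam, eps, sigma, gamma, a, cos_theta0):
    """Three-body Stillinger-Weber term for the angle between bonds ij and ik."""
    rc = a * sigma
    if rij >= rc or rik >= rc:
        return 0.0
    # Each exponential decays to zero as its bond length approaches the cutoff
    decay = (math.exp(gamma * sigma / (rij - rc))
             * math.exp(gamma * sigma / (rik - rc)))
    return lam * eps * (cos_theta - cos_theta0) ** 2 * decay


# Placeholder parameters in reduced units (eps = sigma = 1), illustration only
params3 = dict(lam=21.0, eps=1.0, sigma=1.0, gamma=1.2, a=1.8, cos_theta0=-1.0 / 3.0)
e3_equilibrium = sw_three_body(1.1, 1.1, -1.0 / 3.0, **params3)
e3_bent = sw_three_body(1.1, 1.1, 0.0, **params3)
e2 = sw_two_body(1.12, A=7.049556277, B=0.6022245584, eps=1.0, sigma=1.0, p=4, q=0, a=1.8)
```

Since the angular factor is a square, any deviation from the equilibrium angle raises the energy, so each three-body term can be set to its minimum independently of the two-body terms.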
Parametrization of Stillinger-Weber potential based on valence force field model: application to single-layer MoS2 and black phosphorus.

PubMed

Jiang, Jin-Wu

2015-08-07

We propose parametrizing the Stillinger-Weber potential for covalent materials starting from the valence force-field model. All geometrical parameters in the Stillinger-Weber potential are determined analytically according to the equilibrium condition for each individual potential term, while the energy parameters are derived from the valence force-field model. This parametrization approach transfers the accuracy of the valence force field model to the Stillinger-Weber potential. Furthermore, the resulting Stillinger-Weber potential supports stable molecular dynamics simulations, as each potential term is at an energy-minimum state separately at the equilibrium configuration. We employ this procedure to parametrize Stillinger-Weber potentials for single-layer MoS2 and black phosphorus. The obtained Stillinger-Weber potentials predict an accurate phonon spectrum and mechanical behaviors. We also provide input scripts of these Stillinger-Weber potentials used by publicly available simulation packages including GULP and LAMMPS.
Strong scaling of general-purpose molecular dynamics simulations on GPUs

NASA Astrophysics Data System (ADS)

Glaser, Jens; Nguyen, Trung Dac; Anderson, Joshua A.; Lui, Pak; Spiga, Filippo; Millan, Jaime A.; Morse, David C.; Glotzer, Sharon C.

2015-07-01

We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, 2013). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson et al., 2008). The software supports short-ranged pair force and bond force fields and achieves optimal GPU performance using an autotuning algorithm. We are able to demonstrate equivalent or superior scaling on up to 3375 GPUs in Lennard-Jones and dissipative particle dynamics (DPD) simulations of up to 108 million particles. GPUDirect RDMA capabilities in recent GPU generations provide better performance in full double precision calculations. For a representative polymer physics application, HOOMD-blue 1.0 provides an effective GPU vs. CPU node speed-up of 12.5×.
Implementing Molecular Dynamics on Hybrid High Performance Computers - Particle-Particle Particle-Mesh

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, W Michael; Kohlmeyer, Axel; Plimpton, Steven J

The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. In this paper, we present a continuation of previous work implementing algorithms for using accelerators into the LAMMPS molecular dynamics software for distributed memory parallel hybrid machines. In our previous work, we focused on acceleration for short-range models with an approach intended to harness the processing power of both the accelerator and (multi-core) CPUs. To augment the existing implementations, we present an efficient implementation of long-range electrostatic force calculation for molecular dynamics. Specifically, we present an implementation of the particle-particle particle-mesh method based on the work by Harvey and De Fabritiis. We present benchmark results on the Keeneland InfiniBand GPU cluster. We provide a performance comparison of the same kernels compiled with both CUDA and OpenCL. We discuss limitations to parallel efficiency and future directions for improving performance on hybrid or heterogeneous computers.
Implementation of metal-friendly EAM/FS-type semi-empirical potentials in HOOMD-blue: A GPU-accelerated molecular dynamics software

DOE PAGES

Yang, Lin; Zhang, Feng; Wang, Cai-Zhuang; ...

2018-01-12

We present an implementation of EAM and FS interatomic potentials, which are widely used in simulating metallic systems, in HOOMD-blue, a software designed to perform classical molecular dynamics simulations using GPU accelerations. We first discuss the details of our implementation and then report extensive benchmark tests. We demonstrate that single-precision floating point operations efficiently implemented on GPUs can produce sufficient accuracy when compared against double-precision codes, as demonstrated in test simulations of calculations of the glass-transition temperature of Cu64.5Zr35.5, and pair correlation function of liquid Ni3Al. Our code scales well with the size of the simulating system on NVIDIA Tesla M40 and P100 GPUs. Compared with another popular software LAMMPS running on 32 cores of AMD Opteron 6220 processors, the GPU/CPU performance ratio can reach as high as 4.6. In conclusion, the source code can be accessed through the HOOMD-blue web page for free by any interested user.
Implementation of metal-friendly EAM/FS-type semi-empirical potentials in HOOMD-blue: A GPU-accelerated molecular dynamics software

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yang, Lin; Zhang, Feng; Wang, Cai-Zhuang

We present an implementation of EAM and FS interatomic potentials, which are widely used in simulating metallic systems, in HOOMD-blue, a software package designed to perform classical molecular dynamics simulations using GPU acceleration. We first discuss the details of our implementation and then report extensive benchmark tests. We demonstrate that single-precision floating point operations efficiently implemented on GPUs can produce sufficient accuracy when compared against double-precision codes, as demonstrated in test calculations of the glass-transition temperature of Cu64.5Zr35.5 and the pair correlation function of liquid Ni3Al. Our code scales well with the size of the simulated system on NVIDIA Tesla M40 and P100 GPUs. Compared with LAMMPS, another popular package, running on 32 cores of AMD Opteron 6220 processors, the GPU/CPU performance ratio can reach as high as 4.6. In conclusion, the source code can be accessed through the HOOMD-blue web page for free by any interested user.
Large-Scale Reactive Atomistic Simulation of Shock-induced Initiation Processes in Energetic Materials

NASA Astrophysics Data System (ADS)

Thompson, Aidan

2013-06-01

Initiation in energetic materials is fundamentally dependent on the interaction between a host of complex chemical and mechanical processes, occurring on scales ranging from intramolecular vibrations through molecular crystal plasticity up to hydrodynamic phenomena at the mesoscale. A variety of methods (e.g. quantum electronic structure methods (QM), non-reactive classical molecular dynamics (MD), mesoscopic continuum mechanics) exist to study processes occurring on each of these scales in isolation, but cannot describe how these processes interact with each other. In contrast, the ReaxFF reactive force field, implemented in the LAMMPS parallel MD code, allows us to routinely perform multimillion-atom reactive MD simulations of shock-induced initiation in a variety of energetic materials. This is done either by explicitly driving a shock-wave through the structure (NEMD) or by imposing thermodynamic constraints on the collective dynamics of the simulation cell e.g. using the Multiscale Shock Technique (MSST). These MD simulations allow us to directly observe how energy is transferred from the shockwave into other processes, including intramolecular vibrational modes, plastic deformation of the crystal, and hydrodynamic jetting at interfaces. These processes in turn cause thermal excitation of chemical bonds leading to initial chemical reactions, and ultimately to exothermic formation of product species. Results will be presented on the application of this approach to several important energetic materials, including pentaerythritol tetranitrate (PETN) and ammonium nitrate/fuel oil (ANFO). In both cases, we validate the ReaxFF parameterizations against QM and experimental data. For PETN, we observe initiation occurring via different chemical pathways, depending on the shock direction. For PETN containing spherical voids, we observe enhanced sensitivity due to jetting, void collapse, and hotspot formation, with sensitivity increasing with void size. 
For ANFO, we examine the effect of reaction rates on shock direction, fuel oil fraction, and crystal/fuel oil/void microstructural arrangement. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Dept. of Energy's National Nuclear Security Admin. under contract DE-AC04-94AL85000.
GneimoSim: A Modular Internal Coordinates Molecular Dynamics Simulation Package

PubMed Central

Larsen, Adrien B.; Wagner, Jeffrey R.; Kandel, Saugat; Salomon-Ferrer, Romelia; Vaidehi, Nagarajan; Jain, Abhinandan

2014-01-01

The Generalized Newton Euler Inverse Mass Operator (GNEIMO) method is an advanced method for internal coordinates molecular dynamics (ICMD). GNEIMO includes several theoretical and algorithmic advancements that address longstanding challenges with ICMD simulations. In this paper we describe the GneimoSim ICMD software package that implements the GNEIMO method. We believe that GneimoSim is the first software package to include advanced features such as the equipartition principle derived for internal coordinates, and a method for including the Fixman potential to eliminate systematic statistical biases introduced by the use of hard constraints. Moreover, by design, GneimoSim is extensible and can be easily interfaced with third party force field packages for ICMD simulations. Currently, GneimoSim includes interfaces to the LAMMPS, OpenMM, and Rosetta force field calculation packages. The availability of a comprehensive Python interface to the underlying C++ classes and their methods provides a powerful and versatile mechanism for users to develop simulation scripts to configure the simulation and control the simulation flow. GneimoSim has been used extensively for studying the dynamics of protein structures, refinement of protein homology models, and for simulating large scale protein conformational changes with enhanced sampling methods. GneimoSim is not limited to proteins and can also be used for the simulation of polymeric materials. PMID:25263538
GneimoSim: a modular internal coordinates molecular dynamics simulation package.

PubMed

Larsen, Adrien B; Wagner, Jeffrey R; Kandel, Saugat; Salomon-Ferrer, Romelia; Vaidehi, Nagarajan; Jain, Abhinandan

2014-12-05

The generalized Newton-Euler inverse mass operator (GNEIMO) method is an advanced method for internal coordinates molecular dynamics (ICMD). GNEIMO includes several theoretical and algorithmic advancements that address longstanding challenges with ICMD simulations. In this article, we describe the GneimoSim ICMD software package that implements the GNEIMO method. We believe that GneimoSim is the first software package to include advanced features such as the equipartition principle derived for internal coordinates, and a method for including the Fixman potential to eliminate systematic statistical biases introduced by the use of hard constraints. Moreover, by design, GneimoSim is extensible and can be easily interfaced with third party force field packages for ICMD simulations. Currently, GneimoSim includes interfaces to LAMMPS, OpenMM, and Rosetta force field calculation packages. The availability of a comprehensive Python interface to the underlying C++ classes and their methods provides a powerful and versatile mechanism for users to develop simulation scripts to configure the simulation and control the simulation flow. GneimoSim has been used extensively for studying the dynamics of protein structures, refinement of protein homology models, and for simulating large scale protein conformational changes with enhanced sampling methods. GneimoSim is not limited to proteins and can also be used for the simulation of polymeric materials. © 2014 Wiley Periodicals, Inc.
PuReMD-GPU: A reactive molecular dynamics simulation package for GPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kylasa, S.B., E-mail: skylasa@purdue.edu; Aktulga, H.M., E-mail: hmaktulga@lbl.gov; Grama, A.Y., E-mail: ayg@cs.purdue.edu

2014-09-01

We present an efficient and highly accurate GP-GPU implementation of our community code, PuReMD, for reactive molecular dynamics simulations using the ReaxFF force field. PuReMD and its incorporation into LAMMPS (Reax/C) are used by a large number of research groups worldwide for simulating diverse systems ranging from biomembranes to explosives (RDX) at atomistic level of detail. The sub-femtosecond time-steps associated with ReaxFF strongly motivate significant improvements to per-timestep simulation time through effective use of GPUs. This paper presents, in detail, the design and implementation of PuReMD-GPU, which enables ReaxFF simulations on GPUs, as well as various performance optimization techniques we developed to obtain high performance on state-of-the-art hardware. Comprehensive experiments on model systems (bulk water and amorphous silica) are presented to quantify the performance improvements achieved by PuReMD-GPU and to verify its accuracy. In particular, our experiments show up to 16× improvement in runtime compared to our highly optimized CPU-only single-core ReaxFF implementation. PuReMD-GPU is a unique production code, and is currently available on request from the authors.
Extended asymmetric hot region formation due to shockwave interactions following void collapse in shocked high explosive

NASA Astrophysics Data System (ADS)

Shan, Tzu-Ray; Wixom, Ryan R.; Thompson, Aidan P.

2016-08-01

In both continuum hydrodynamics simulations and also multimillion atom reactive molecular dynamics simulations of shockwave propagation in single crystal pentaerythritol tetranitrate (PETN) containing a cylindrical void, we observed the formation of an initial radially symmetric hot spot. By extending the simulation time to the nanosecond scale, however, we observed the transformation of the small symmetric hot spot into a longitudinally asymmetric hot region extending over a much larger volume. Performing reactive molecular dynamics shock simulations using the reactive force field (ReaxFF) as implemented in the LAMMPS molecular dynamics package, we showed that the longitudinally asymmetric hot region was formed by coalescence of the primary radially symmetric hot spot with a secondary triangular hot zone. We showed that the triangular hot zone coincided with a double-shocked region where the primary planar shockwave was overtaken by a secondary cylindrical shockwave. The secondary cylindrical shockwave originated in void collapse after the primary planar shockwave had passed over the void. A similar phenomenon was observed in continuum hydrodynamics shock simulations using the CTH hydrodynamics package. The formation and growth of extended asymmetric hot regions on nanosecond timescales has important implications for shock initiation thresholds in energetic materials.
Simulation of Initiation in Hexanitrostilbene

NASA Astrophysics Data System (ADS)

Thompson, Aidan; Shan, Tzu-Ray; Yarrington, Cole; Wixom, Ryan

We report on the effect of isolated voids and pairs of nearby voids on hot spot formation, growth and chemical reaction initiation in hexanitrostilbene (HNS) crystals subjected to shock loading. Large-scale, reactive molecular dynamics simulations are performed using the reactive force field (ReaxFF) as implemented in the LAMMPS software. The ReaxFF force field description for HNS has been validated previously by comparing the isothermal equation of state to available diamond anvil cell (DAC) measurements and density functional theory (DFT) calculations. Micron-scale molecular dynamics simulations of a supported shockwave propagating in HNS crystal along the [010] orientation are performed (up = 1.25 km/s, Us = 4.0 km/s, P = 11 GPa). We compare the effect on hot spot formation and growth rate of isolated cylindrical voids up to 0.1 µm in size with that of two 50 nm voids set 100 nm apart. Results from the micron-scale atomistic simulations are compared with hydrodynamics simulations. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. DOE National Nuclear Security Administration under Contract DE-AC04-94AL85000.
Dynamics and Solubility of He and CO2 in Brine

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ho, Tuan Anh; Tenney, Craig M.

2016-09-01

Molecular dynamics simulation was implemented using the LAMMPS simulation package (1) to study the diffusivity of He 3 and CO2 in NaCl aqueous solution. To simulate at infinitely dilute gas concentration, we placed one He 3 or CO2 molecule in an initial simulation box of 24x24x33 Å^3 containing 512 water molecules and a certain number of NaCl molecules depending on the concentration. The initial configuration was set up by placing water, NaCl, and gas molecules into different regions in the simulation box. Calculating the diffusion coefficient for one He or CO2 molecule consistently yields poor results. To overcome this, for each simulation at specific conditions (i.e., temperature, pressure, and NaCl concentration), we conducted 50 simulations initiated from 50 different configurations. These configurations are obtained by performing the simulation starting from the initial configuration mentioned above in the NVE ensemble (i.e., constant number of particles, volume, and energy) for 100,000 time steps and collecting one configuration every 2,000 time steps. The output temperature of this simulation is about 500 K. The collected configurations were then equilibrated for 2 ns in the NPT ensemble (i.e., constant number of particles, pressure, and temperature) followed by 9 ns simulations in the NVT ensemble (i.e., constant number of particles, volume, and temperature). The time step is 1 fs for all simulations.
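The bookkeeping in the protocol above (100,000 steps sampled every 2,000 steps, then 2 ns and 9 ns runs at 1 fs) and the averaging over independent replicas can be sketched in a few lines. This is only an illustrative sketch: the abstract does not say how the diffusion coefficient was extracted, so the three-dimensional Einstein relation (MSD = 6Dt) is an assumption, and every function name below is hypothetical.

```python
import math

def n_snapshots(total_steps, stride):
    """Number of configurations collected when saving one every `stride` steps."""
    return total_steps // stride

def steps_for(duration_ns, timestep_fs):
    """Number of MD steps needed to cover `duration_ns` (1 ns = 1,000,000 fs)."""
    return round(duration_ns * 1_000_000 / timestep_fs)

def diffusion_from_msd(times, msd):
    """Assumed analysis: least-squares slope of MSD(t), divided by 6 (3D Einstein relation)."""
    n = len(times)
    t_mean = sum(times) / n
    m_mean = sum(msd) / n
    num = sum((t - t_mean) * (m - m_mean) for t, m in zip(times, msd))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den / 6.0

def mean_and_stderr(values):
    """Average a quantity over independent replicas and report its standard error."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)
```

Averaging over many replicas is what makes the single-solute estimate usable: the standard error of the mean shrinks roughly as one over the square root of the number of independent runs.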
Molecular Simulation of the Free Energy for the Accurate Determination of Phase Transition Properties of Molecular Solids

NASA Astrophysics Data System (ADS)

Sellers, Michael; Lisal, Martin; Brennan, John

2015-06-01

Investigating the ability of a molecular model to accurately represent a real material is crucial to model development and use. When the model simulates materials in extreme conditions, one such property worth evaluating is the phase transition point. However, phase transitions are often overlooked or approximated because of difficulty or inaccuracy when simulating them. Techniques such as super-heating or super-squeezing a material to induce a phase change suffer from inherent timescale limitations leading to "over-driving," and dual-phase simulations require many long-time runs to seek out what frequently results in an inexact location of phase-coexistence. We present a compilation of methods for the determination of solid-solid and solid-liquid phase transition points through the accurate calculation of the chemical potential. The methods are applied to the Smith-Bharadwaj atomistic potential's representation of cyclotrimethylene trinitramine (RDX) to accurately determine its melting point (Tm) and the alpha to gamma solid phase transition pressure. We also determine Tm for a coarse-grain model of RDX, and compare its value to experiment and atomistic counterpart. All methods are employed via the LAMMPS simulator, resulting in 60-70 simulations that total 30-50 ns. Approved for public release. Distribution is unlimited.
Lessons learned from comparing molecular dynamics engines on the SAMPL5 dataset.

PubMed

Shirts, Michael R; Klein, Christoph; Swails, Jason M; Yin, Jian; Gilson, Michael K; Mobley, David L; Case, David A; Zhong, Ellen D

2017-01-01

We describe our efforts to prepare common starting structures and models for the SAMPL5 blind prediction challenge. We generated the starting input files and single configuration potential energies for the host-guest in the SAMPL5 blind prediction challenge for the GROMACS, AMBER, LAMMPS, DESMOND and CHARMM molecular simulation programs. All conversions were fully automated from the originally prepared AMBER input files using a combination of the ParmEd and InterMol conversion programs. We find that the energy calculations for all molecular dynamics engines for this molecular set agree to better than 0.1 % relative absolute energy for all energy components, and in most cases an order of magnitude better, when reasonable choices are made for different cutoff parameters. However, there are some surprising sources of statistically significant differences. Most importantly, different choices of Coulomb's constant between programs are one of the largest sources of discrepancies in energies. We discuss the measures required to get good agreement in the energies for equivalent starting configurations between the simulation programs, and the energy differences that occur when simulations are run with program-specific default simulation parameter values. Finally, we discuss what was required to automate this conversion and comparison.
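To see why the choice of Coulomb's constant matters at the 0.1 % agreement level quoted above, the following sketch evaluates one point-charge pair with three slightly different electrostatic prefactors. The engine labels are placeholders and the numerical prefactors are illustrative assumptions of a plausible magnitude (in kcal·Å/(mol·e²)); they are not values taken from the paper.

```python
# Illustrative prefactors only (assumed values, hypothetical engine labels).
COULOMB_PREFACTOR = {
    "engine_a": 332.0522,
    "engine_b": 332.0637,
    "engine_c": 332.0716,
}

def coulomb_energy(q1, q2, r_angstrom, prefactor):
    """Point-charge Coulomb energy: prefactor * q1 * q2 / r."""
    return prefactor * q1 * q2 / r_angstrom

def relative_spread(energies):
    """(max - min) normalized by the largest magnitude in the set."""
    return (max(energies) - min(energies)) / max(abs(e) for e in energies)

if __name__ == "__main__":
    energies = [
        coulomb_energy(1.0, -1.0, 3.0, k) for k in COULOMB_PREFACTOR.values()
    ]
    print(f"relative spread: {relative_spread(energies):.2e}")
```

Because the prefactor multiplies every electrostatic term, the relative discrepancy is the same for every charge pair and carries straight through to the total Coulomb energy, so it does not average out over a large system.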

Lessons learned from comparing molecular dynamics engines on the SAMPL5 dataset

PubMed Central

Shirts, Michael R.; Klein, Christoph; Swails, Jason M.; Yin, Jian; Gilson, Michael K.; Mobley, David L.; Case, David A.; Zhong, Ellen D.

2017-01-01

We describe our efforts to prepare common starting structures and models for the SAMPL5 blind prediction challenge. We generated the starting input files and single configuration potential energies for the host-guest in the SAMPL5 blind prediction challenge for the GROMACS, AMBER, LAMMPS, DESMOND and CHARMM molecular simulation programs. All conversions were fully automated from the originally prepared AMBER input files using a combination of the ParmEd and InterMol conversion programs. We find that the energy calculations for all molecular dynamics engines for this molecular set agree to better than 0.1% relative absolute energy for all energy components, and in most cases an order of magnitude better, when reasonable choices are made for different cutoff parameters. However, there are some surprising sources of statistically significant differences. Most importantly, different choices of Coulomb’s constant between programs are one of the largest sources of discrepancies in energies. We discuss the measures required to get good agreement in the energies for equivalent starting configurations between the simulation programs, and the energy differences that occur when simulations are run with program-specific default simulation parameter values. Finally, we discuss what was required to automate this conversion and comparison. PMID:27787702
Lessons learned from comparing molecular dynamics engines on the SAMPL5 dataset

NASA Astrophysics Data System (ADS)

Shirts, Michael R.; Klein, Christoph; Swails, Jason M.; Yin, Jian; Gilson, Michael K.; Mobley, David L.; Case, David A.; Zhong, Ellen D.

2017-01-01

We describe our efforts to prepare common starting structures and models for the SAMPL5 blind prediction challenge. We generated the starting input files and single configuration potential energies for the host-guest in the SAMPL5 blind prediction challenge for the GROMACS, AMBER, LAMMPS, DESMOND and CHARMM molecular simulation programs. All conversions were fully automated from the originally prepared AMBER input files using a combination of the ParmEd and InterMol conversion programs. We find that the energy calculations for all molecular dynamics engines for this molecular set agree to better than 0.1 % relative absolute energy for all energy components, and in most cases an order of magnitude better, when reasonable choices are made for different cutoff parameters. However, there are some surprising sources of statistically significant differences. Most importantly, different choices of Coulomb's constant between programs are one of the largest sources of discrepancies in energies. We discuss the measures required to get good agreement in the energies for equivalent starting configurations between the simulation programs, and the energy differences that occur when simulations are run with program-specific default simulation parameter values. Finally, we discuss what was required to automate this conversion and comparison.
Self-assembly of polyelectrolyte surfactant complexes using large scale MD simulation

NASA Astrophysics Data System (ADS)

Goswami, Monojoy; Sumpter, Bobby

2014-03-01

Polyelectrolytes (PE) and surfactants are known to form interesting structures with varied properties in aqueous solutions. The morphological details of the PE-surfactant complexes depend on a combination of polymer backbone, electrostatic interactions and hydrophobic interactions. We study the self-assembly of complexes of cationic PE and anionic surfactants under dilute conditions. The importance of such complexes of PE with oppositely charged surfactants can be found in biological systems, such as immobilization of enzymes in polyelectrolyte complexes or nonspecific association of DNA with protein. Many useful properties of PE-surfactant complexes come from the highly ordered structures of surfactant self-assembly inside the PE aggregate, which has applications in industry. We perform large-scale molecular dynamics simulations using LAMMPS to understand the structure and dynamics of PE-surfactant systems. Our investigation shows highly ordered pearl-necklace structures that have been observed experimentally in biological systems. We investigate many different properties of PE-surfactant complexation for different parameter ranges that are useful for pharmaceutical, engineering and biological applications.
Thermal Conductivity of Twisted Bilayer Graphene Nanoribbons from Non-equilibrium Molecular Dynamics Study.

NASA Astrophysics Data System (ADS)

Li, Chenyang; Su, Shanshan; Ge, Supeng; Lake, Roger

Misorientation of the two layers of bilayer graphene affects both the electronic properties and the vibrational modes or phonons. The phonon density of modes is little affected by misorientation; however, zone-folding can allow new Umklapp scattering processes that could affect the phonon transport and thermal conductivity. To investigate this, we use non-equilibrium molecular dynamics (NEMD) simulations as implemented in LAMMPS to study the thermal conductivity of the misoriented graphene bilayers. Seven commensurate misorientation angles varying from 6.01° to 48.36° have been modeled and analyzed to understand how the misorientation angle affects the thermal conductivity of relatively wide (10 nm) misoriented bilayer graphene nanoribbons (m-BLGNRs). Within numerical accuracy, we find that the m-BLGNRs at all of the simulated commensurate angles have the same thermal conductivity as AB-stacked and AA-stacked BLGNRs. These results indicate that neither the misorientation angle nor the stacking order affects the thermal conductivity of BLGNRs. This work was supported in part by the NSF #1307671.
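In a typical NEMD thermal-conductivity calculation, a steady heat flux is imposed along the ribbon and the conductivity follows from Fourier's law, kappa = -J / (dT/dx), using the slope of the steady-state temperature profile. The abstract does not give the analysis details, so the sketch below is only an assumed post-processing step with hypothetical function names and synthetic numbers.

```python
def linear_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    num = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    den = sum((xi - x_mean) ** 2 for xi in x)
    return num / den

def thermal_conductivity(heat_flux, positions, temperatures):
    """Fourier's law: kappa = -J / (dT/dx).

    heat_flux    : imposed flux J in W/m^2 (positive along +x)
    positions    : bin centers along the transport direction, in m
    temperatures : steady-state bin temperatures, in K
    """
    return -heat_flux / linear_slope(positions, temperatures)

if __name__ == "__main__":
    # Synthetic profile: 5 K drop per nanometer under a flux of 1e10 W/m^2.
    x = [0.0, 1e-9, 2e-9, 3e-9]
    temp = [310.0, 305.0, 300.0, 295.0]
    print(thermal_conductivity(1e10, x, temp), "W/(m K)")
```

In practice only the linear region between the hot and cold reservoirs should be passed to the fit, since the profile is usually nonlinear close to the thermostatted regions.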
Extended asymmetric hot region formation due to shockwave interactions following void collapse in shocked high explosive

DOE PAGES

Shan, Tzu-Ray; Wixom, Ryan R.; Thompson, Aidan P.

2016-08-01

In both continuum hydrodynamics simulations and also multimillion atom reactive molecular dynamics simulations of shockwave propagation in single crystal pentaerythritol tetranitrate (PETN) containing a cylindrical void, we observed the formation of an initial radially symmetric hot spot. By extending the simulation time to the nanosecond scale, however, we observed the transformation of the small symmetric hot spot into a longitudinally asymmetric hot region extending over a much larger volume. Performing reactive molecular dynamics shock simulations using the reactive force field (ReaxFF) as implemented in the LAMMPS molecular dynamics package, we showed that the longitudinally asymmetric hot region was formed by coalescence of the primary radially symmetric hot spot with a secondary triangular hot zone. We showed that the triangular hot zone coincided with a double-shocked region where the primary planar shockwave was overtaken by a secondary cylindrical shockwave. The secondary cylindrical shockwave originated in void collapse after the primary planar shockwave had passed over the void. A similar phenomenon was observed in continuum hydrodynamics shock simulations using the CTH hydrodynamics package. Furthermore, the formation and growth of extended asymmetric hot regions on nanosecond timescales has important implications for shock initiation thresholds in energetic materials.
Extended asymmetric hot region formation due to shockwave interactions following void collapse in shocked high explosive

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shan, Tzu-Ray; Wixom, Ryan R.; Thompson, Aidan P.

In both continuum hydrodynamics simulations and also multimillion atom reactive molecular dynamics simulations of shockwave propagation in single crystal pentaerythritol tetranitrate (PETN) containing a cylindrical void, we observed the formation of an initial radially symmetric hot spot. By extending the simulation time to the nanosecond scale, however, we observed the transformation of the small symmetric hot spot into a longitudinally asymmetric hot region extending over a much larger volume. Performing reactive molecular dynamics shock simulations using the reactive force field (ReaxFF) as implemented in the LAMMPS molecular dynamics package, we showed that the longitudinally asymmetric hot region was formed by coalescence of the primary radially symmetric hot spot with a secondary triangular hot zone. We showed that the triangular hot zone coincided with a double-shocked region where the primary planar shockwave was overtaken by a secondary cylindrical shockwave. The secondary cylindrical shockwave originated in void collapse after the primary planar shockwave had passed over the void. A similar phenomenon was observed in continuum hydrodynamics shock simulations using the CTH hydrodynamics package. Furthermore, the formation and growth of extended asymmetric hot regions on nanosecond timescales has important implications for shock initiation thresholds in energetic materials.
Free-energy calculations using classical molecular simulation: application to the determination of the melting point and chemical potential of a flexible RDX model.

PubMed

Sellers, Michael S; Lísal, Martin; Brennan, John K

2016-03-21

We present an extension of various free-energy methodologies to determine the chemical potential of the solid and liquid phases of a fully-flexible molecule using classical simulation. The methods are applied to the Smith-Bharadwaj atomistic potential representation of cyclotrimethylene trinitramine (RDX), a well-studied energetic material, to accurately determine the solid and liquid phase Gibbs free energies, and the melting point (Tm). We outline an efficient technique to find the absolute chemical potential and melting point of a fully-flexible molecule using one set of simulations to compute the solid absolute chemical potential and one set of simulations to compute the solid-liquid free energy difference. With this combination, only a handful of simulations are needed, whereby the absolute quantities of the chemical potentials are obtained, for use in other property calculations, such as the characterization of crystal polymorphs or the determination of the entropy. Using the LAMMPS molecular simulator, the Frenkel and Ladd and pseudo-supercritical path techniques are adapted to generate 3rd order fits of the solid and liquid chemical potentials. Results yield the thermodynamic melting point Tm = 488.75 K at 1.0 atm. We also validate these calculations and compare this melting point to one obtained from a typical superheated simulation technique.
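Once the solid and liquid chemical potentials have each been fitted with a third-order polynomial in temperature, the thermodynamic melting point is the temperature at which the two curves cross. The sketch below locates that crossing by bisection. The coefficients in the usage example are synthetic placeholders, not the fits from the paper, and the function names are hypothetical.

```python
def poly(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + c3*x**3 (ascending order) by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def melting_point(mu_solid, mu_liquid, t_low, t_high, tol=1e-9):
    """Temperature in [t_low, t_high] where mu_liquid(T) equals mu_solid(T)."""
    def gap(t):
        return poly(mu_liquid, t) - poly(mu_solid, t)

    g_low, g_high = gap(t_low), gap(t_high)
    if g_low * g_high > 0:
        raise ValueError("bracket does not contain a crossing")
    while t_high - t_low > tol:
        t_mid = 0.5 * (t_low + t_high)
        g_mid = gap(t_mid)
        if g_low * g_mid <= 0:
            t_high = t_mid                # crossing lies in the lower half
        else:
            t_low, g_low = t_mid, g_mid   # crossing lies in the upper half
    return 0.5 * (t_low + t_high)

if __name__ == "__main__":
    # Synthetic cubic fits (higher-order terms set to zero for clarity).
    solid = [0.0, -0.050, 0.0, 0.0]
    liquid = [10.0, -0.075, 0.0, 0.0]
    print(melting_point(solid, liquid, 300.0, 500.0))
```

Below the crossing the solid has the lower chemical potential and is the stable phase; above it the liquid is, which is why the sign change of the gap identifies Tm.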
Simulation and experimental study of rheological properties of CeO2-water nanofluid

NASA Astrophysics Data System (ADS)

Loya, Adil; Stair, Jacqueline L.; Ren, Guogang

2015-10-01

Metal oxide nanoparticles offer great advantages for controlling the rheological, thermal, chemical and physical properties of solutions. The effectiveness of a nanoparticle in modifying the properties of a fluid depends on its diffusive properties with respect to the fluid. In this study, the rheological properties of aqueous fluids (i.e. water) were enhanced with the addition of CeO2 nanoparticles. The study combines simulation and experimental results for the nanofluids. The movement of nanoparticles in the fluidic media was simulated by a large-scale molecular dynamics program (i.e. LAMMPS). The COMPASS force field was employed with smoothed particle hydrodynamic potential (SPH) and discrete particle dynamics potential (DPD). This study develops the understanding of how the rheological properties are affected by the addition of nanoparticles to a fluid and of the way DPD and SPH can be used for accurately estimating the rheological properties with Brownian effect. The rheological results of the simulation were confirmed by the convergence of the stress autocorrelation function, whereas experimental properties were measured using a rheometer. The rheological values obtained from simulation agreed within 5 % of the experimental values; they were identified and treated with a number of iterations and experimental tests. The results of the experiment and simulation show that a 10 % CeO2 nanoparticle dispersion in water has a viscosity of 2.0-3.3 mPa·s.
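One common way to turn a stress autocorrelation function into a viscosity is the Green-Kubo relation, eta = V/(kB T) times the time integral of the autocorrelation of an off-diagonal stress component. The abstract does not state that this exact formula was used, so the sketch below is an assumption about the analysis; all function names are hypothetical. The running integral is returned so that its plateau, which signals convergence, can be inspected.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K

def autocorrelation(series, max_lag):
    """Time-averaged autocorrelation <s(0) s(lag)> for lag = 0..max_lag."""
    n = len(series)
    return [
        sum(series[i] * series[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(max_lag + 1)
    ]

def running_integral(acf, dt):
    """Cumulative trapezoidal integral of the ACF; a plateau indicates convergence."""
    total, out = 0.0, [0.0]
    for a, b in zip(acf, acf[1:]):
        total += 0.5 * (a + b) * dt
        out.append(total)
    return out

def green_kubo_viscosity(acf, dt, volume, temperature):
    """eta = V / (kB * T) * integral of the stress ACF (SI units: Pa^2, s, m^3, K)."""
    return volume / (K_B * temperature) * running_integral(acf, dt)[-1]
```

If the running integral keeps drifting instead of flattening, the trajectory is too short or the maximum lag too long for a reliable estimate.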
Understanding the interfacial chain dynamics of fiber-reinforced polymer composite

NASA Astrophysics Data System (ADS)

Goswami, Monojoy; Carrillo, Jan-Michael; Naskar, Amit; Sumpter, Bobby

The polymer-fiber interface plays a major role in determining the structural and dynamical properties of fiber-reinforced composite materials. We utilized the LAMMPS MD package to understand the interfacial properties at the nanoscale. Coarse-grained flexible polymer chains are introduced to compare the various structures and dynamics of the polymer chains. Our preliminary simulation study shows that the rigidity of the polymer chain affects the interfacial morphology and dynamics of the chain on a flat surface. In this work, we identified the 'immobile inter-phase' morphology and related it to rheological properties. We calculated the viscoelastic properties, e.g., shear modulus and storage modulus, which are compared with experiments. MD simulations are used to show the variation of viscoelastic properties with polymer volume fraction. The nanoscale segmental and chain relaxation are calculated from the MD simulations and compared to the experimental data. These observations will help identify the fundamental physics behind the effect of the polymer-fiber interactions and the orientation of the fiber on the overall rheological properties of the fiber-reinforced polymer matrix. Funding for the project was provided by ORNL's Laboratory Directed Research and Development (LDRD) program.
Molecular-dynamic simulations of the thermophysical properties of hexanitrohexaazaisowurtzitane single crystal at high pressures and temperatures

NASA Astrophysics Data System (ADS)

Kozlova, S. A.; Gubin, S. A.; Maklashova, I. V.; Selezenev, A. A.

2017-11-01

Molecular dynamics simulations of isothermal compression parameters are performed for a hexanitrohexaazaisowurtzitane single crystal (C6H6O12N12) using a modified ReaxFF-lg reactive force field. It is shown that the pressure-compression ratio curve for a single C6H6O12N12 crystal at constant temperature T = 300 K in the pressure range P = 0.05-40 GPa is in satisfactory agreement with experimental compression isotherms obtained for a single C6H6O12N12 crystal. Hugoniot molecular dynamics simulations of the shock-wave hydrostatic compression of a single C6H6O12N12 crystal are performed. Along with Hugoniot temperature-pressure curves, calculated shock-wave pressure-compression ratios for a single C6H6O12N12 crystal are obtained for a wide pressure range of P = 1-40 GPa. It is established that the shock adiabat obtained for a single C6H6O12N12 crystal is in good agreement with the experimental data. All calculations are performed using the LAMMPS molecular dynamics simulation software package, which provides a ReaxFF-lg reactive force field to support the approach.
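States on a shock adiabat satisfy the Rankine-Hugoniot jump conditions. As a rough sketch of how a simulated state can be checked against them, the helpers below evaluate the energy condition E - E0 = (P + P0)(V0 - V)/2 and recover the shock and particle velocities from pressure and specific volume. The function names are hypothetical, the numbers in the usage example are synthetic, and nothing here is taken from the paper's own workflow.

```python
import math

def hugoniot_residual(e, p, v, e0, p0, v0):
    """Energy condition residual; zero for a state on the Hugoniot.

    e, e0 : specific internal energy (J/kg)
    p, p0 : pressure (Pa)
    v, v0 : specific volume (m^3/kg)
    """
    return (e - e0) - 0.5 * (p + p0) * (v0 - v)

def particle_velocity(p, p0, v, v0):
    """up = sqrt((P - P0) * (V0 - V)), from mass and momentum conservation."""
    return math.sqrt((p - p0) * (v0 - v))

def shock_velocity(p, p0, v, v0):
    """Us = V0 * sqrt((P - P0) / (V0 - V))."""
    return v0 * math.sqrt((p - p0) / (v0 - v))

if __name__ == "__main__":
    # Synthetic state: initial density 1000 kg/m^3 compressed to 75 % of V0 at 4 GPa.
    v0, v, p0, p = 1.0e-3, 7.5e-4, 0.0, 4.0e9
    print(shock_velocity(p, p0, v, v0), particle_velocity(p, p0, v, v0))
```

The two velocity formulas are consistent with the momentum condition P - P0 = Us * up / V0, which gives a quick sanity check on any tabulated Hugoniot point.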
Young's moduli of carbon materials investigated by various classical molecular dynamics schemes

NASA Astrophysics Data System (ADS)

Gayk, Florian; Ehrens, Julian; Heitmann, Tjark; Vorndamme, Patrick; Mrugalla, Andreas; Schnack, Jürgen

2018-05-01

For many applications classical carbon potentials together with classical molecular dynamics are employed to calculate structures and physical properties of such carbon-based materials where quantum mechanical methods fail either due to the excessive size, irregular structure or long-time dynamics. Although such potentials, as for instance implemented in LAMMPS, yield reasonably accurate bond lengths and angles for several carbon materials such as graphene, it is not clear how accurate they are in terms of mechanical properties such as for instance Young's moduli. We performed large-scale classical molecular dynamics investigations of three carbon-based materials using the various potentials implemented in LAMMPS as well as the EDIP potential of Marks. We show how the Young's moduli vary with classical potentials and compare to experimental results. Since classical descriptions of carbon are bound to be approximations it is not astonishing that different realizations yield differing results. One should therefore carefully check for which observables a certain potential is suited. Our aim is to contribute to such a clarification.
An analytical bond-order potential for carbon

DOE PAGES

Zhou, Xiaowang; Ward, Donald K.; Foster, Michael E.

2015-05-27

Carbon is the most widely studied material today because it exhibits special properties not seen in any other materials when in nano dimensions such as nanotube and graphene. Reduction of material defects created during synthesis has become critical to realize the full potential of carbon structures. Molecular dynamics (MD) simulations, in principle, allow defect formation mechanisms to be studied with high fidelity, and can, therefore, help guide experiments for defect reduction. Such MD simulations must satisfy a set of stringent requirements. First, they must employ an interatomic potential formalism that is transferable to a variety of carbon structures. Second, the potential needs to be appropriately parameterized to capture the property trends of important carbon structures, in particular, diamond, graphite, graphene, and nanotubes. The potential must predict the crystalline growth of the correct phases during direct MD simulations of synthesis to achieve a predictive simulation of defect formation. An unlimited number of structures not included in the potential parameterization are encountered, thus the literature carbon potentials are often not sufficient for growth simulations. We have developed an analytical bond order potential for carbon, and have made it available through the public MD simulation package LAMMPS. We also demonstrate that our potential reasonably captures the property trends of important carbon phases. As a result, stringent MD simulations convincingly show that our potential accounts not only for the crystalline growth of graphene, graphite, and carbon nanotubes but also for the transformation of graphite to diamond at high pressure.
An analytical bond-order potential for carbon.

PubMed

Zhou, X W; Ward, D K; Foster, M E

2015-09-05

Carbon is the most widely studied material today because it exhibits special properties not seen in any other materials when in nano dimensions such as nanotube and graphene. Reduction of material defects created during synthesis has become critical to realize the full potential of carbon structures. Molecular dynamics (MD) simulations, in principle, allow defect formation mechanisms to be studied with high fidelity, and can, therefore, help guide experiments for defect reduction. Such MD simulations must satisfy a set of stringent requirements. First, they must employ an interatomic potential formalism that is transferable to a variety of carbon structures. Second, the potential needs to be appropriately parameterized to capture the property trends of important carbon structures, in particular, diamond, graphite, graphene, and nanotubes. Most importantly, the potential must predict the crystalline growth of the correct phases during direct MD simulations of synthesis to achieve a predictive simulation of defect formation. Because an unlimited number of structures not included in the potential parameterization are encountered, the literature carbon potentials are often not sufficient for growth simulations. We have developed an analytical bond order potential for carbon, and have made it available through the public MD simulation package LAMMPS. We demonstrate that our potential reasonably captures the property trends of important carbon phases. Stringent MD simulations convincingly show that our potential accounts not only for the crystalline growth of graphene, graphite, and carbon nanotubes but also for the transformation of graphite to diamond at high pressure. © 2015 Wiley Periodicals, Inc.
Novel 3D/VR interactive environment for MD simulations, visualization and analysis.

PubMed

Doblack, Benjamin N; Allis, Tim; Dávila, Lilian P

2014-12-18

The increasing development of computing (hardware and software) in the last decades has impacted scientific research in many fields including materials science, biology, chemistry and physics among many others. A new computational system for the accurate and fast simulation and 3D/VR visualization of nanostructures is presented here, using the open-source molecular dynamics (MD) computer program LAMMPS. This alternative computational method uses modern graphics processors, NVIDIA CUDA technology and specialized scientific codes to overcome processing speed barriers common to traditional computing methods. In conjunction with a virtual reality system used to model materials, this enhancement allows the addition of accelerated MD simulation capability. The motivation is to provide a novel research environment which simultaneously allows visualization, simulation, modeling and analysis. The research goal is to investigate the structure and properties of inorganic nanostructures (e.g., silica glass nanosprings) under different conditions using this innovative computational system. The work presented outlines a description of the 3D/VR Visualization System and basic components, an overview of important considerations such as the physical environment, details on the setup and use of the novel system, a general procedure for the accelerated MD enhancement, technical information, and relevant remarks. The impact of this work is the creation of a unique computational system combining nanoscale materials simulation, visualization and interactivity in a virtual environment, which is both a research and teaching instrument at UC Merced.
Empirical force field-based kinetic Monte Carlo simulation of precipitate evolution and growth in Al-Cu alloys

NASA Astrophysics Data System (ADS)

Joshi, Kaushik; Chaudhuri, Santanu

2016-10-01

The ability to accelerate the morphological evolution of nanoscale precipitates is a fundamental challenge for atomistic simulations. Kinetic Monte Carlo (KMC) methodology is an effective approach for accelerating the evolution of nanoscale systems that are dominated by so-called rare events. The quality and accuracy of the energy landscape used in KMC calculations can be significantly improved using DFT-informed interatomic potentials. Using a newly developed computational framework that uses the molecular simulator LAMMPS as a library function inside the KMC solver SPPARKS, we investigated the formation and growth of Guinier-Preston (GP) zones in dilute Al-Cu alloys at different temperatures and copper concentrations. The KMC simulations with an angular dependent potential (ADP) predict the formation of coherent disc-shaped monolayers of copper atoms (GPI zones) in the early stage. Such monolayers are then gradually transformed into the energetically favored GPII phase, which has two aluminum layers sandwiched between copper layers. We analyzed the growth kinetics of the KMC trajectory using Johnson-Mehl-Avrami (JMA) theory and obtained a phase transformation index close to 1.0. In the presence of grain boundaries, the KMC calculations predict the segregation of copper atoms near the grain boundaries instead of the formation of GP zones. The computational framework presented in this work is based on open source potentials and an open source MD simulator, and can predict morphological changes during the evolution of the alloys in the bulk and around grain boundaries.
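The JMA analysis mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the textbook Avrami form f(t) = 1 - exp(-k t^n) and fits the index n from the linearized relation ln(-ln(1 - f)) = ln k + n ln t. The function name `fit_avrami` and the synthetic data are hypothetical and are not taken from the paper, whose actual fitting procedure is not described in the abstract.

```python
import math

def fit_avrami(times, fractions):
    """Fit the Avrami form f(t) = 1 - exp(-k * t**n) by linear least squares.

    Uses the linearization ln(-ln(1 - f)) = ln(k) + n * ln(t).
    Returns (n, k). Requires t > 0 and 0 < f < 1 for every point.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - f)) for f in fractions]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    n = sxy / sxx                        # slope = Avrami index
    k = math.exp(y_mean - n * x_mean)    # intercept = ln(k)
    return n, k

# Synthetic transformed-fraction data (hypothetical, arbitrary time units)
# generated with n = 1.0 and k = 0.3, only to exercise the fit.
times = [0.5, 1.0, 2.0, 4.0, 8.0]
fractions = [1.0 - math.exp(-0.3 * t ** 1.0) for t in times]

n_fit, k_fit = fit_avrami(times, fractions)
print(f"Avrami index n = {n_fit:.3f}, rate constant k = {k_fit:.3f}")
```

With real KMC output, `fractions` would be the transformed fraction extracted from the trajectory at each time; how that fraction is defined is a modeling choice not specified here.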
Atomistic Simulation of Initiation in Hexanitrostilbene

NASA Astrophysics Data System (ADS)

Shan, Tzu-Ray; Wixom, Ryan; Yarrington, Cole; Thompson, Aidan

2015-06-01

We report on the effect of cylindrical voids on hot spot formation, growth and chemical reaction initiation in hexanitrostilbene (HNS) crystals subjected to shock. Large-scale, reactive molecular dynamics simulations are performed using the reactive force field (ReaxFF) as implemented in the LAMMPS software. The ReaxFF force field description for HNS has been validated previously by comparing the isothermal equation of state to available diamond anvil cell (DAC) measurements and density functional theory (DFT) calculations and by comparing the primary dissociation pathway to ab initio calculations. Micron-scale molecular dynamics simulations of a supported shockwave propagating through the HNS crystal along the [010] orientation are performed with an impact velocity (or particle velocity) of 1.25 km/s, resulting in shockwave propagation at 4.0 km/s in the bulk material and a bulk shock pressure of ~11 GPa. The effect of cylindrical void sizes varying from 0.02 to 0.1 μm on hot spot formation and growth rate has been studied. Interaction between multiple voids in the HNS crystal and its effect on hot spot formation will also be addressed. Results from the micron-scale atomistic simulations are compared with hydrodynamics simulations. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. DOE National Nuclear Security Administration under Contract DE-AC04-94AL85000.
Novel 3D/VR Interactive Environment for MD Simulations, Visualization and Analysis

PubMed Central

Doblack, Benjamin N.; Allis, Tim; Dávila, Lilian P.

2014-01-01

The increasing development of computing (hardware and software) in the last decades has impacted scientific research in many fields including materials science, biology, chemistry and physics among many others. A new computational system for the accurate and fast simulation and 3D/VR visualization of nanostructures is presented here, using the open-source molecular dynamics (MD) computer program LAMMPS. This alternative computational method uses modern graphics processors, NVIDIA CUDA technology and specialized scientific codes to overcome processing speed barriers common to traditional computing methods. In conjunction with a virtual reality system used to model materials, this enhancement allows the addition of accelerated MD simulation capability. The motivation is to provide a novel research environment which simultaneously allows visualization, simulation, modeling and analysis. The research goal is to investigate the structure and properties of inorganic nanostructures (e.g., silica glass nanosprings) under different conditions using this innovative computational system. The work presented outlines a description of the 3D/VR Visualization System and basic components, an overview of important considerations such as the physical environment, details on the setup and use of the novel system, a general procedure for the accelerated MD enhancement, technical information, and relevant remarks. The impact of this work is the creation of a unique computational system combining nanoscale materials simulation, visualization and interactivity in a virtual environment, which is both a research and teaching instrument at UC Merced. PMID:25549300
Generalized ensemble method applied to study systems with strong first order transitions

DOE PAGES

Malolepsza, E.; Kim, J.; Keyes, T.

2015-09-28

At strong first-order phase transitions, the entropy versus energy or, at constant pressure, enthalpy, exhibits convex behavior, and the statistical temperature curve correspondingly exhibits an S-loop or back-bending. In the canonical and isothermal-isobaric ensembles, with temperature as the control variable, the probability density functions become bimodal with peaks localized outside of the S-loop region. Inside, states are unstable, and as a result simulation of equilibrium phase coexistence becomes impossible. To overcome this problem, a method was proposed by Kim, Keyes and Straub, where optimally designed generalized ensemble sampling was combined with replica exchange, and denoted generalized replica exchange method (gREM). This new technique uses parametrized effective sampling weights that lead to a unimodal energy distribution, transforming unstable states into stable ones. In the present study, the gREM, originally developed as a Monte Carlo algorithm, was implemented to work with molecular dynamics in an isobaric ensemble and coded into LAMMPS, a highly optimized open source molecular simulation package. Lastly, the method is illustrated in a study of the very strong solid/liquid transition in water.
A bond-order potential for the Al–Cu–H ternary system

DOE PAGES

Zhou, X. W.; Ward, D. K.; Foster, M. E.

2018-02-27

Al-based Al–Cu alloys have a very high strength to density ratio, and are therefore important materials for transportation systems including vehicles and aircraft. These alloys also appear to have a high resistance to hydrogen embrittlement, and as a result, are being explored for hydrogen related applications. To enable fundamental studies of mechanical behavior of Al–Cu alloys under hydrogen environments, we have developed an Al–Cu–H bond-order potential according to the formalism implemented in the molecular dynamics code LAMMPS. Our potential not only fits well to properties of a variety of elemental and compound configurations (with coordination varying from 1 to 12) including small clusters, bulk lattices, defects, and surfaces, but also passes stringent molecular dynamics simulation tests that sample chaotic configurations. Careful studies verified that this Al–Cu–H potential predicts structural property trends close to experimental results and quantum-mechanical calculations; in addition, it properly captures Al–Cu, Al–H, and Cu–H phase diagrams and enables simulations of H2 dissociation, chemisorption, and absorption on Al–Cu surfaces.
A bond-order potential for the Al–Cu–H ternary system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, X. W.; Ward, D. K.; Foster, M. E.

Al-based Al–Cu alloys have a very high strength to density ratio, and are therefore important materials for transportation systems including vehicles and aircraft. These alloys also appear to have a high resistance to hydrogen embrittlement, and as a result, are being explored for hydrogen related applications. To enable fundamental studies of mechanical behavior of Al–Cu alloys under hydrogen environments, we have developed an Al–Cu–H bond-order potential according to the formalism implemented in the molecular dynamics code LAMMPS. Our potential not only fits well to properties of a variety of elemental and compound configurations (with coordination varying from 1 to 12) including small clusters, bulk lattices, defects, and surfaces, but also passes stringent molecular dynamics simulation tests that sample chaotic configurations. Careful studies verified that this Al–Cu–H potential predicts structural property trends close to experimental results and quantum-mechanical calculations; in addition, it properly captures Al–Cu, Al–H, and Cu–H phase diagrams and enables simulations of H2 dissociation, chemisorption, and absorption on Al–Cu surfaces.

Generalized ensemble method applied to study systems with strong first order transitions

NASA Astrophysics Data System (ADS)

Małolepsza, E.; Kim, J.; Keyes, T.

2015-09-01

At strong first-order phase transitions, the entropy versus energy or, at constant pressure, enthalpy, exhibits convex behavior, and the statistical temperature curve correspondingly exhibits an S-loop or back-bending. In the canonical and isothermal-isobaric ensembles, with temperature as the control variable, the probability density functions become bimodal with peaks localized outside of the S-loop region. Inside, states are unstable, and as a result simulation of equilibrium phase coexistence becomes impossible. To overcome this problem, a method was proposed by Kim, Keyes and Straub [1], where optimally designed generalized ensemble sampling was combined with replica exchange, and denoted generalized replica exchange method (gREM). This new technique uses parametrized effective sampling weights that lead to a unimodal energy distribution, transforming unstable states into stable ones. In the present study, the gREM, originally developed as a Monte Carlo algorithm, was implemented to work with molecular dynamics in an isobaric ensemble and coded into LAMMPS, a highly optimized open source molecular simulation package. The method is illustrated in a study of the very strong solid/liquid transition in water.
He bubble growth and interaction in W nano-tendrils

NASA Astrophysics Data System (ADS)

Smirnov, R. D.; Krasheninnikov, S. I.

2015-11-01

Tungsten plasma-facing components (PFCs) in fusion devices are exposed to a variety of extreme plasma conditions, which can lead to alteration of the tungsten micro-structure and degradation of the PFCs. In particular, it is known that filamentary nano-structures called fuzz can grow on tungsten surfaces exposed to helium plasma. However, the mechanism of fuzz growth is still not fully understood. Existing experimental observations indicate that the formation of helium nano-bubbles in tungsten plays an essential role in fuzz formation and growth. In this work we investigate mechanisms of growth and interaction of helium bubbles in fuzz-like nano-tendrils using molecular dynamics simulations with the LAMMPS code. We show that growth of the bubbles has an anisotropic character, producing a complex stress field in the nano-tendrils with distinct compression and tension regions. We found that the formation of large inter-bubble tension regions can cause lateral stretching and bending of the tendrils, which consequently leads to their elongation and thinning at the stretching sites. The rate of nano-tendril growth due to the described mechanism is also evaluated from the simulations.
Behavior of a nano-particle and a polymer molecule in a nano-scale four-roll mill

NASA Astrophysics Data System (ADS)

Vo, Minh; Papavassiliou, Dimitrios

2016-11-01

The four-roll mill device can be used to create a mixed flow ranging from purely extensional to completely rotational through the proper selection of the speed and direction of each of the four cylindrical rollers. Considerable research has been done with this device for macroscale rheological studies. In our study, the dissipative particle dynamics (DPD) method was employed to investigate the behavior of a nano-sphere and a polymer molecule in different conditions within a four-roll mill device. Hydrophilic properties of each roll were generated by adjusting interaction parameters and using a bounce-back boundary condition at the solid surface. All simulations were run up to 4 × 10^6 time steps at room temperature using the open source LAMMPS package. After the flow in the system reached equilibrium, a nano-sphere and then a polymer chain were released at the center of the simulation box. Their trajectories were recorded at different shear rate conditions. The propagation of the nano-sphere in different rotational flows will be discussed. Additionally, the deformation of polymer chains will be compared to that in a simple shear flow.
CHARMM-GUI 10 Years for Biomolecular Modeling and Simulation

PubMed Central

Jo, Sunhwan; Cheng, Xi; Lee, Jumin; Kim, Seonghoon; Park, Sang-Jun; Patel, Dhilon S.; Beaven, Andrew H.; Lee, Kyu Il; Rui, Huan; Roux, Benoît; MacKerell, Alexander D.; Klauda, Jeffrey B.; Qi, Yifei

2017-01-01

CHARMM-GUI, http://www.charmm-gui.org, is a web-based graphical user interface that prepares complex biomolecular systems for molecular simulations. CHARMM-GUI creates input files for a number of programs including CHARMM, NAMD, GROMACS, AMBER, GENESIS, LAMMPS, Desmond, OpenMM, and CHARMM/OpenMM. Since its original development in 2006, CHARMM-GUI has been widely adopted for various purposes and now contains a number of different modules designed to set up a broad range of simulations: (1) PDB Reader & Manipulator, Glycan Reader, and Ligand Reader & Modeler for reading and modifying molecules; (2) Quick MD Simulator, Membrane Builder, Nanodisc Builder, HMMM Builder, Monolayer Builder, Micelle Builder, and Hex Phase Builder for building all-atom simulation systems in various environments; (3) PACE CG Builder and Martini Maker for building coarse-grained simulation systems; (4) DEER Facilitator and MDFF/xMDFF Utilizer for experimentally guided simulations; (5) Implicit Solvent Modeler, PBEQ-Solver, and GCMC/BD Ion Simulator for implicit solvent related calculations; (6) Ligand Binder for ligand solvation and binding free energy simulations; and (7) Drude Prepper for preparation of simulations with the CHARMM Drude polarizable force field. Recently, new modules have been integrated into CHARMM-GUI, such as Glycolipid Modeler for generation of various glycolipid structures, and LPS Modeler for generation of lipopolysaccharide structures from various Gram-negative bacteria. These new features together with existing modules are expected to facilitate advanced molecular modeling and simulation thereby leading to an improved understanding of the molecular details of the structure and dynamics of complex biomolecular systems. Here, we briefly review these capabilities and discuss potential future directions in the CHARMM-GUI development project. PMID:27862047
CHARMM-GUI 10 years for biomolecular modeling and simulation.

PubMed

Jo, Sunhwan; Cheng, Xi; Lee, Jumin; Kim, Seonghoon; Park, Sang-Jun; Patel, Dhilon S; Beaven, Andrew H; Lee, Kyu Il; Rui, Huan; Park, Soohyung; Lee, Hui Sun; Roux, Benoît; MacKerell, Alexander D; Klauda, Jeffrey B; Qi, Yifei; Im, Wonpil

2017-06-05

CHARMM-GUI, http://www.charmm-gui.org, is a web-based graphical user interface that prepares complex biomolecular systems for molecular simulations. CHARMM-GUI creates input files for a number of programs including CHARMM, NAMD, GROMACS, AMBER, GENESIS, LAMMPS, Desmond, OpenMM, and CHARMM/OpenMM. Since its original development in 2006, CHARMM-GUI has been widely adopted for various purposes and now contains a number of different modules designed to set up a broad range of simulations: (1) PDB Reader & Manipulator, Glycan Reader, and Ligand Reader & Modeler for reading and modifying molecules; (2) Quick MD Simulator, Membrane Builder, Nanodisc Builder, HMMM Builder, Monolayer Builder, Micelle Builder, and Hex Phase Builder for building all-atom simulation systems in various environments; (3) PACE CG Builder and Martini Maker for building coarse-grained simulation systems; (4) DEER Facilitator and MDFF/xMDFF Utilizer for experimentally guided simulations; (5) Implicit Solvent Modeler, PBEQ-Solver, and GCMC/BD Ion Simulator for implicit solvent related calculations; (6) Ligand Binder for ligand solvation and binding free energy simulations; and (7) Drude Prepper for preparation of simulations with the CHARMM Drude polarizable force field. Recently, new modules have been integrated into CHARMM-GUI, such as Glycolipid Modeler for generation of various glycolipid structures, and LPS Modeler for generation of lipopolysaccharide structures from various Gram-negative bacteria. These new features together with existing modules are expected to facilitate advanced molecular modeling and simulation thereby leading to an improved understanding of the structure and dynamics of complex biomolecular systems. Here, we briefly review these capabilities and discuss potential future directions in the CHARMM-GUI development project. © 2016 Wiley Periodicals, Inc.
TopoGromacs: Automated Topology Conversion from CHARMM to GROMACS within VMD.

PubMed

Vermaas, Josh V; Hardy, David J; Stone, John E; Tajkhorshid, Emad; Kohlmeyer, Axel

2016-06-27

Molecular dynamics (MD) simulation engines use a variety of different approaches for modeling molecular systems with force fields that govern their dynamics and describe their topology. These different approaches introduce incompatibilities between engines, and previously published software bridges the gaps between many popular MD packages, such as between CHARMM and AMBER or GROMACS and LAMMPS. While there are many structure building tools available that generate topologies and structures in CHARMM format, only recently have mechanisms been developed to convert their results into GROMACS input. We present an approach to convert CHARMM-formatted topology and parameters into a format suitable for simulation with GROMACS by expanding the functionality of TopoTools, a plugin integrated within the widely used molecular visualization and analysis software VMD. The conversion process was diligently tested on a comprehensive set of biological molecules in vacuo. The resulting comparison between energy terms shows that the translation performed was lossless as the energies were unchanged for identical starting configurations. By applying the conversion process to conventional benchmark systems that mimic typical modestly sized MD systems, we explore the effect of the implementation choices made in CHARMM, NAMD, and GROMACS. The newly available automatic conversion capability breaks down barriers between simulation tools and user communities and allows users to easily compare simulation programs and leverage their unique features without the tedium of constructing a topology twice.
CHARMM-GUI ligand reader and modeler for CHARMM force field generation of small molecules.

PubMed

Kim, Seonghoon; Lee, Jumin; Jo, Sunhwan; Brooks, Charles L; Lee, Hui Sun; Im, Wonpil

2017-06-05

Reading ligand structures into any simulation program is often nontrivial and time consuming, especially when the force field parameters and/or structure files of the corresponding molecules are not available. To address this problem, we have developed Ligand Reader & Modeler in CHARMM-GUI. Users can upload ligand structure information in various forms (using PDB ID, ligand ID, SMILES, MOL/MOL2/SDF file, or PDB/mmCIF file), and the uploaded structure is displayed on a sketchpad for verification and further modification. Based on the displayed structure, Ligand Reader & Modeler generates the ligand force field parameters and necessary structure files by searching for the ligand in the CHARMM force field library or using the CHARMM general force field (CGenFF). In addition, users can define chemical substitution sites and draw substituents in each site on the sketchpad to generate a set of combinatorial structure files and corresponding force field parameters for throughput or alchemical free energy simulations. Finally, the output from Ligand Reader & Modeler can be used in other CHARMM-GUI modules to build a protein-ligand simulation system for all supported simulation programs, such as CHARMM, NAMD, GROMACS, AMBER, GENESIS, LAMMPS, Desmond, OpenMM, and CHARMM/OpenMM. Ligand Reader & Modeler is available as a functional module of CHARMM-GUI at http://www.charmm-gui.org/input/ligandrm. © 2017 Wiley Periodicals, Inc.
Study of silicon crystal surface formation based on molecular dynamics simulation results

NASA Astrophysics Data System (ADS)

Barinovs, G.; Sabanskis, A.; Muiznieks, A.

2014-04-01

The equilibrium shape of a <110>-oriented single crystal silicon nanowire, 8 nm in cross-section, was found from molecular dynamics simulations using the LAMMPS molecular dynamics package. The calculated shape agrees well with the shape predicted from experimental observations of nanocavities in silicon crystals. By parametrization of the shape and scaling to a known value of the {111} surface energy, the Wulff form for the solid-vapor interface was obtained. The Wulff form for the solid-liquid interface was constructed using the same model of the shape as for the solid-vapor interface. The parameters describing the solid-liquid interface shape were found using values of surface energies in low-index directions known from published molecular dynamics simulations. Using an experimental value of the liquid-vapor interface energy for silicon and a graphical solution of Herring's equation, we constructed an angular diagram showing the relative equilibrium orientation of the solid-liquid, liquid-vapor and solid-vapor interfaces at the triple phase line. The diagram gives quantitative predictions about growth angles for different growth directions and the formation of facets on the solid-liquid and solid-vapor interfaces. The diagram can be used to describe growth ridges appearing on the surface of a crystal grown from a melt. A qualitative comparison to the ridges of a float zone silicon crystal cone is given.
Micron-scale Reactive Atomistic Simulation of Void Collapse and Hotspot Growth in PETN

NASA Astrophysics Data System (ADS)

Thompson, Aidan; Shan, Tzu-Ray; Wixom, Ryan

2015-06-01

Material defects and other heterogeneities such as dislocations, micro-porosity, and grain boundaries play key roles in the shock-induced initiation of detonation in energetic materials. We performed non-equilibrium molecular dynamics simulations to explore the effect of nanoscale voids on hotspot growth and initiation in micron-scale pentaerythritol tetranitrate (PETN) crystals under weak shock loading (Up = 1.25 km/s; Us = 4.5 km/s). We used the ReaxFF potential implemented in LAMMPS. We built a pseudo-2D PETN crystal with dimensions 0.3 μm × 0.22 μm × 1.3 nm containing a 20 nm cylindrical void. Once the initial shockwave traversed the entire sample, the shock-front absorbing boundary condition was applied, allowing the simulation to continue beyond 1 nanosecond. Results show an exponentially increasing hotspot growth rate. The hotspot morphology is initially symmetric about the void axis, but strong asymmetry develops at later times, due to strong coupling between exothermic chemistry, temperature, and divergent secondary shockwaves emanating from the collapsing void. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. DOE National Nuclear Security Administration under Contract DE-AC04-94AL85000.
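The shock and particle velocities quoted in the abstract above (Us = 4.5 km/s, Up = 1.25 km/s) can be related to pressure and compression through the standard Rankine-Hugoniot jump conditions for a steady planar shock into material at rest: P - P0 = ρ0 Us Up and V/V0 = 1 - Up/Us. The following is a minimal sketch of that arithmetic only. The initial density is an assumed, approximate value for crystalline PETN and does not appear in the abstract; the function name `hugoniot_state` is hypothetical.

```python
def hugoniot_state(rho0, shock_velocity, particle_velocity, p0=0.0):
    """Rankine-Hugoniot mass and momentum jump conditions for a planar shock.

    rho0 in kg/m^3, velocities in m/s, pressures in Pa.
    Returns (pressure behind the shock, volume ratio V/V0).
    """
    if not 0.0 <= particle_velocity < shock_velocity:
        raise ValueError("expected 0 <= particle velocity < shock velocity")
    pressure = p0 + rho0 * shock_velocity * particle_velocity   # momentum
    volume_ratio = 1.0 - particle_velocity / shock_velocity     # mass
    return pressure, volume_ratio

# ASSUMPTION: approximate PETN crystal density, not given in the abstract.
RHO0_PETN_ASSUMED = 1770.0  # kg/m^3

# Velocities from the abstract: Us = 4.5 km/s, Up = 1.25 km/s.
pressure, volume_ratio = hugoniot_state(RHO0_PETN_ASSUMED, 4500.0, 1250.0)
print(f"shock pressure ~ {pressure / 1e9:.2f} GPa, V/V0 ~ {volume_ratio:.3f}")
```

The pressure scales linearly with the assumed density, so the printed value is only as good as that assumption; the volume ratio depends on the two velocities alone.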
A Numerical Modeling Framework for Cohesive Sediment Transport Driven by Waves and Tidal Currents

DTIC Science & Technology

2012-09-30

for sediment transport. The successful extension to multi-dimensions is benefited from an open-source CFD package, OpenFOAM (www.openfoam.org). This...linz.at/Drupal/), which couples the fluid solver OpenFOAM with the Discrete Element Model (DEM) solver LIGGGHTS (an improved LAMMPS for granular flow
A LAMMPS implementation of volume-temperature replica exchange molecular dynamics

NASA Astrophysics Data System (ADS)

Liu, Liang-Chun; Kuo, Jer-Lai

2015-04-01

A driver module for executing volume-temperature replica exchange molecular dynamics (VTREMD) was developed for the LAMMPS package. As a patch code, the VTREMD module performs classical molecular dynamics (MD) with Monte Carlo (MC) decisions between MD runs. The goal of inserting the MC step was to increase the breadth of sampled configurational space. In this method, states receive better sampling by making temperature or density swaps with their neighboring states. As an accelerated sampling method, VTREMD is particularly useful to explore states at low temperatures, where systems are easily trapped in local potential wells. As functional examples, TIP4P/Ew and TIP4P/2005 water models were analyzed using VTREMD. The phase diagram in this study covered the deeply supercooled regime, and this test served as a suitable demonstration of the usefulness of VTREMD in overcoming the slow dynamics problem. To facilitate using the current code, attention was also paid to how to optimize the exchange efficiency by using grid allocation. VTREMD was useful for studying systems with rough energy landscapes, such as those with numerous local minima or multiple characteristic time scales.
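The "Monte Carlo decisions between MD runs" described above can be illustrated with the textbook Metropolis criterion for a temperature swap between two replicas at the same volume: the swap is accepted with probability min(1, exp[(β_i - β_j)(E_i - E_j)]), where β = 1/(k_B T). This is a generic sketch, not the VTREMD module's code; the volume (density) swaps mentioned in the abstract require additional terms that are not shown. The function names and the choice of kcal/mol energy units are assumptions made for illustration.

```python
import math
import random

# ASSUMPTION: energies in kcal/mol, temperatures in K (approximate k_B).
K_B = 0.0019872  # kcal/(mol K)

def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for exchanging the temperatures
    of replica i (energy_i at temp_i) and replica j (energy_j at temp_j)."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_swap(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the proposed exchange is accepted."""
    return rng.random() < swap_probability(energy_i, temp_i, energy_j, temp_j)

# Hypothetical potential energies for two neighboring replicas.
p_favorable = swap_probability(-100.0, 300.0, -120.0, 320.0)
p_unfavorable = swap_probability(-120.0, 300.0, -100.0, 320.0)
print(p_favorable, p_unfavorable)
```

When the colder replica currently holds the higher energy, the exchange is always accepted; otherwise it is accepted with a Boltzmann-like probability, which is what lets low-temperature replicas escape local potential wells.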
Atomistic modeling of thermomechanical properties of SWNT/Epoxy nanocomposites

NASA Astrophysics Data System (ADS)

Fasanella, Nicholas; Sundararaghavan, Veera

2015-09-01

Molecular dynamics simulations are performed to compute thermomechanical properties of cured epoxy resins reinforced with pristine and covalently functionalized carbon nanotubes. A DGEBA-DDS epoxy network was built using the ‘dendrimer’ growth approach where 75% of available epoxy sites were cross-linked. The epoxy model is verified through comparisons to experiments, and simulations are performed on a nanotube-reinforced cross-linked epoxy matrix using the CVFF force field in LAMMPS. Full stiffness matrices and linear coefficient of thermal expansion vectors are obtained for the nanocomposite. Large increases in stiffness and large decreases in thermal expansion were seen along the direction of the nanotube for both nanocomposite systems when compared to neat epoxy. The direction transverse to the nanotube saw a 40% increase in stiffness due to covalent functionalization over neat epoxy at 1 K, whereas the pristine nanotube system only saw a 7% increase due to van der Waals effects. The functionalized SWNT/epoxy nanocomposite showed an additional 42% decrease in thermal expansion along the nanotube direction when compared to the pristine SWNT/epoxy nanocomposite. The stiffness matrices are rotated over every possible orientation to simulate the effects of an isotropic system of randomly oriented nanotubes in the epoxy. The randomly oriented covalently functionalized SWNT/Epoxy nanocomposites showed substantial improvements over the plain epoxy in terms of higher stiffness (200% increase) and lower thermal expansion (32% reduction). Through MD simulations, we develop means to build simulation cells, perform annealing to reach correct densities, compute thermomechanical properties and compare with experiments.
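Averaging a stiffness matrix uniformly over all orientations has a closed form, the Voigt average, which the sketch below uses to obtain isotropic bulk and shear moduli from a 6×6 stiffness matrix in Voigt notation. The abstract does not say how the authors carried out the rotation average, so this is one common way to do it rather than their procedure; the function names are hypothetical.

```python
import numpy as np


def voigt_isotropic_average(stiffness):
    """Bulk and shear moduli of the orientation-averaged stiffness (Voigt average).

    `stiffness` is a 6x6 matrix in Voigt notation (order 11, 22, 33, 23, 13, 12).
    """
    c = np.asarray(stiffness, dtype=float)
    diag = c[0, 0] + c[1, 1] + c[2, 2]
    off = c[0, 1] + c[1, 2] + c[0, 2]
    shear = c[3, 3] + c[4, 4] + c[5, 5]
    bulk_modulus = (diag + 2.0 * off) / 9.0
    shear_modulus = (diag - off + 3.0 * shear) / 15.0
    return bulk_modulus, shear_modulus


def youngs_modulus(bulk_modulus, shear_modulus):
    """Isotropic Young's modulus from bulk and shear moduli."""
    return 9.0 * bulk_modulus * shear_modulus / (3.0 * bulk_modulus + shear_modulus)
```

For a matrix that is already isotropic the average returns the input moduli unchanged, which is a convenient sanity check before applying it to an anisotropic nanocomposite cell.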
A Coarse Grained Model for Methylcellulose: Spontaneous Ring Formation at Elevated Temperature

NASA Astrophysics Data System (ADS)

Huang, Wenjun; Larson, Ronald

Methylcellulose (MC) is widely used in food additives and pharmaceutical applications, where its thermo-reversible gelation behavior plays an important role. To date the gelation mechanism is not well understood, and it therefore attracts great research interest. In this study, we adopted coarse-grained (CG) molecular dynamics simulations to model the MC chains, including the homopolymers and random copolymers that model commercial METHOCEL A, in an implicit water environment, where each MC monomer is modeled with a single bead. The simulations are carried out using LAMMPS. We parameterized our CG model using the radial distribution functions from atomistic simulations of short MC oligomers, extrapolating the results to long chains. We used dissociation free energy to validate our CG model against the atomistic model. The CG model captured the effects of monomer substitution type and temperature from the atomistic simulations. We applied this CG model to simulate single chains up to 1000 monomers long and obtained persistence lengths that are close to those determined from experiment. We observed a chain collapse transition for a random copolymer 600 monomers long at 50 °C. The chain collapsed into a stable ring structure with an outer diameter of around 14 nm, which appears to be a precursor to the fibril structure observed in methylcellulose gels by Lodge et al. in recent studies. Our CG model can be extended to other MC derivatives for studying the interaction between these polymers and small molecules, such as hydrophobic drugs.
Parallelization and automatic data distribution for nuclear reactor simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liebrock, L.M.

1997-07-01

Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine cannot run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.
A scalable parallel black oil simulator on distributed memory parallel computers

NASA Astrophysics Data System (ADS)

Wang, Kun; Liu, Hui; Chen, Zhangxin

2015-11-01

This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
Parallelized direct execution simulation of message-passing parallel programs

NASA Technical Reports Server (NTRS)

Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

1994-01-01

As massively parallel computers proliferate, there is growing interest in finding ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Molecular-dynamics simulations of crosslinking and confinement effects on structure, segmental mobility and mechanics of filled elastomers

NASA Astrophysics Data System (ADS)

Davris, Theodoros; Lyulin, Alexey V.

2016-05-01

The significant drop of the storage modulus under uniaxial deformation (Payne effect) restrains the performance of elastomer-based composites and the development of possible new applications. In this paper, molecular-dynamics (MD) computer simulations using the LAMMPS MD package have been performed to study the mechanical properties of a coarse-grained model of this family of nanocomposite materials. Our goal is to provide simulational insights into the viscoelastic properties of filled elastomers, and to connect the macroscopic mechanics with the composite microstructure, the strength of the polymer-filler interactions and the polymer mobility at different scales. To this end we simulate random copolymer films capped between two infinite solid (filler aggregate) walls. We systematically vary the strength of the polymer-substrate adhesion interactions, the degree of polymer confinement (film thickness), and the polymer crosslinking density, and study their influence on the equilibrium and non-equilibrium structure, segmental dynamics, and the mechanical properties of the simulated systems. The glass-transition temperature increased once the mesh size became smaller than the chain radius of gyration; otherwise it remained invariant to mesh-size variations. This increase in the glass-transition temperature was accompanied by a monotonic slowing-down of segmental dynamics on all studied length scales. This observation is attributed to the correspondingly decreased width of the bulk density layer that was obtained in films whose thickness was larger than the end-to-end distance of the bulk polymer chains. To test this hypothesis, additional simulations were performed in which the crystalline walls were replaced with amorphous or rough walls.
Simulating Fiber Ordering and Aggregation In Shear Flow Using Dissipative Particle Dynamics

NASA Astrophysics Data System (ADS)

Stimatze, Justin T.

We have developed a mesoscale simulation of fiber aggregation in shear flow using LAMMPS and its implementation of dissipative particle dynamics. Understanding fiber aggregation in shear flow and flow-induced microstructural fiber networks is critical to our interest in high-performance composite materials. Dissipative particle dynamics enables the consideration of hydrodynamic interactions between fibers through the coarse-grained simulation of the matrix fluid. Correctly simulating hydrodynamic interactions and accounting for fluid forces on the microstructure is required to correctly model the shear-induced aggregation process. We are able to determine stresses, viscosity, and fiber forces while simulating the evolution of a model fiber system undergoing shear flow. Fiber-fiber contact interactions are approximated by combinations of common pairwise forces, allowing the exploration of interaction-influenced fiber behaviors such as aggregation and bundling. We are then able to quantify aggregate structure and effective volume fraction for a range of relevant system and fiber-fiber interaction parameters. Our simulations have demonstrated several aggregate types dependent on system parameters such as shear rate, short-range attractive forces, and a resistance to relative rotation while in contact. A resistance to relative rotation at fiber-fiber contact points has been found to strongly contribute to an increased angle between neighboring aggregated fibers and therefore an increase in average aggregate volume fraction. This increase in aggregate volume fraction is strongly correlated with a significant enhancement of system viscosity, leading us to hypothesize that controlling the resistance to relative rotation during manufacturing processes is important when optimizing for desired composite material characteristics.
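For readers unfamiliar with dissipative particle dynamics, the pairwise force in its standard (Groot-Warren) form is the sum of a soft conservative repulsion, a velocity-dependent dissipative term, and a random term tied to the dissipation through the fluctuation-dissipation relation sigma^2 = 2 gamma kT. The sketch below evaluates that force for one pair. It is a generic illustration, not the fiber model from this abstract; the function name, argument names, and the parameter values in the usage note are assumptions.

```python
import numpy as np


def dpd_pair_force(r_ij, v_ij, a, gamma, kT, r_c, dt, xi):
    """Standard DPD force on particle i due to particle j.

    r_ij = r_i - r_j and v_ij = v_i - v_j are 3-vectors; xi is one sample from a
    zero-mean, unit-variance Gaussian drawn by the caller for this pair and step.
    """
    r_ij = np.asarray(r_ij, dtype=float)
    v_ij = np.asarray(v_ij, dtype=float)
    r = np.linalg.norm(r_ij)
    if r >= r_c or r == 0.0:
        return np.zeros(3)  # no interaction beyond the cutoff
    e_ij = r_ij / r                      # unit vector from j to i
    w = 1.0 - r / r_c                    # linear weight function
    sigma = np.sqrt(2.0 * kT * gamma)    # fluctuation-dissipation relation
    f_conservative = a * w
    f_dissipative = -gamma * w * w * np.dot(e_ij, v_ij)
    f_random = sigma * w * xi / np.sqrt(dt)
    return (f_conservative + f_dissipative + f_random) * e_ij
```

With xi set to zero the random term drops out, so a call such as `dpd_pair_force([0.5, 0, 0], [1, 0, 0], a=25.0, gamma=4.5, kT=1.0, r_c=1.0, dt=0.01, xi=0.0)` isolates the conservative and dissipative contributions along the line of centers.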
MDWiZ: a platform for the automated translation of molecular dynamics simulations.

PubMed

Rusu, Victor H; Horta, Vitor A C; Horta, Bruno A C; Lins, Roberto D; Baron, Riccardo

2014-03-01

A variety of popular molecular dynamics (MD) simulation packages were independently developed in the last decades to reach diverse scientific goals. However, such non-coordinated development of software, force fields, and analysis tools for molecular simulations gave rise to an array of software formats and arbitrary conventions for routine preparation and analysis of simulation input and output data. Different formats and/or parameter definitions are used at each stage of the modeling process, even though alternative software tools contain largely redundant information. Such a Babel of languages that cannot be easily and univocally translated one into another poses one of the major technical obstacles to the preparation, translation, and comparison of molecular simulation data that users face on a daily basis. Here, we present the MDWiZ platform, a freely accessible online portal designed to aid the fast and reliable preparation and conversion of file formats that allows researchers to reproduce or generate data from MD simulations using different setups, including force fields and models with different underlying potential forms. The general structure of MDWiZ is presented, the features of version 1.0 are detailed, and an extensive validation based on GROMACS to LAMMPS conversion is presented. We believe that MDWiZ will be largely useful to the molecular dynamics community. Such fast format and force field exchange for a given system allows tailoring the chosen system to a given computer platform and/or taking advantage of the specific capabilities offered by different software engines. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
Extension of NHWAVE to Couple LAMMPS for Modeling Wave Interactions with Arctic Ice Floes

DTIC Science & Technology

2014-09-30

baroclinic non-hydrostatic model”, Ocean Modelling [SUBMITTED]. Bateman , S. Shi, F., Orzech, M., Veeramony, J., and Calantoni, J., 2014, “Discrete...M., Shi, F., Calantoni, J., Bateman , S., and Veeramony, J., “Small-scale modeling of waves and floes in the Marginal Ice Zone”, 2014 Fall Meeting of the American Geophysical Union, [SUBMITTED].

Preliminary Evaluation Report on the Los Angeles City Schools SB 28 Demonstration Program in Mathematics. CSE Working Paper No. 1.

ERIC Educational Resources Information Center

Gordon, C. Wayne

The objectives of the Los Angeles Model Mathematics Project (LAMMP) are stated by the administration as improvement of mathematical skills and understanding of mathematical concepts; improvement of the pupils' self-image; identification of specific assets and limitations relating to the learning process; development and use of special…
Preliminary Evaluation Report on the Los Angeles City Schools, SB 28 Demonstration Program in Mathematics.

ERIC Educational Resources Information Center

Gordon, C. Wayne

The purpose of this preliminary report is to describe and evaluate the Los Angeles Model Mathematics Project (LAMMP). The objectives of this project include the improvement of mathematical skills and understanding of mathematical concepts, the improvement of students' self-image, the development of instructional materials and the assessment of…
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Rao, Hariprasad Nannapaneni

1989-01-01

The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
Parallel simulation today

NASA Technical Reports Server (NTRS)

Nicol, David; Fujimoto, Richard

1992-01-01

This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.
Synchronization Of Parallel Discrete Event Simulations

NASA Technical Reports Server (NTRS)

Steinman, Jeffrey S.

1992-01-01

Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Polymorphic improvement of Stillinger-Weber potential for InGaN

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, Xiaowang W.; Jones, Reese E.; Chu, Kevin

A Stillinger-Weber potential is computationally very efficient for molecular dynamics simulations. Despite its simple mathematical form, the Stillinger-Weber potential can be easily parameterized to ensure that crystal structures with tetrahedral bond angles (e.g., diamond-cubic, zinc-blende, and wurtzite) are stable and have the lowest energy. As a result, the Stillinger-Weber potential has been widely used to study a variety of semiconductor elements and alloys. When studying an A-B binary system, however, the Stillinger-Weber potential is associated with two major drawbacks. First, it significantly overestimates the elastic constants of elements A and B, limiting its use for systems involving both compounds and elements (e.g., an A/AB multilayer). Second, it prescribes equal energy for zinc-blende and wurtzite crystals, limiting its use for compounds with large stacking fault energies. Here in this paper, we utilize the polymorphic potential style recently implemented in LAMMPS to develop a modified Stillinger-Weber potential for InGaN that overcomes these two problems.
Polymorphic improvement of Stillinger-Weber potential for InGaN

NASA Astrophysics Data System (ADS)

Zhou, X. W.; Jones, R. E.; Chu, K.

2017-12-01

A Stillinger-Weber potential is computationally very efficient for molecular dynamics simulations. Despite its simple mathematical form, the Stillinger-Weber potential can be easily parameterized to ensure that crystal structures with tetrahedral bond angles (e.g., diamond-cubic, zinc-blende, and wurtzite) are stable and have the lowest energy. As a result, the Stillinger-Weber potential has been widely used to study a variety of semiconductor elements and alloys. When studying an A-B binary system, however, the Stillinger-Weber potential is associated with two major drawbacks. First, it significantly overestimates the elastic constants of elements A and B, limiting its use for systems involving both compounds and elements (e.g., an A/AB multilayer). Second, it prescribes equal energy for zinc-blende and wurtzite crystals, limiting its use for compounds with large stacking fault energies. Here, we utilize the polymorphic potential style recently implemented in LAMMPS to develop a modified Stillinger-Weber potential for InGaN that overcomes these two problems.
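The statement that tetrahedral structures are stabilized follows from the three-body term of the standard Stillinger-Weber form, which penalizes any bond angle whose cosine differs from -1/3. The sketch below evaluates that term for a single triplet. The default parameters are the widely used silicon values and serve only as an illustration; they are not the InGaN parameters developed in the paper, and the function name is hypothetical.

```python
import math


def sw_three_body(r_ij, r_ik, cos_theta,
                  epsilon=2.1683, sigma=2.0951, a=1.80,
                  lam=21.0, gamma=1.20, cos_theta0=-1.0 / 3.0):
    """Standard Stillinger-Weber three-body energy for one i-j-k triplet.

    Distances are in Angstrom and the energy in eV for the default
    (silicon-like, illustrative) parameters.
    """
    cutoff = a * sigma
    if r_ij >= cutoff or r_ik >= cutoff:
        return 0.0  # the term vanishes smoothly at the cutoff
    radial = math.exp(gamma * sigma / (r_ij - cutoff)
                      + gamma * sigma / (r_ik - cutoff))
    return lam * epsilon * (cos_theta - cos_theta0) ** 2 * radial
```

At the tetrahedral angle the term is exactly zero, and it grows quadratically in the cosine deviation for any other angle. Because diamond-cubic, zinc-blende, and wurtzite all have ideal tetrahedral angles, this term alone cannot separate their energies, which is the second drawback the abstract mentions.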
Extended Tersoff potential for boron nitride: Energetics and elastic properties of pristine and defective h-BN

NASA Astrophysics Data System (ADS)

Los, J. H.; Kroes, J. M. H.; Albe, K.; Gordillo, R. M.; Katsnelson, M. I.; Fasolino, A.

2017-11-01

We present an extended Tersoff potential for boron nitride (BN-ExTeP) for application in large scale atomistic simulations. BN-ExTeP accurately describes the main low energy B, N, and BN structures and yields quantitatively correct trends in the bonding as a function of coordination. The proposed extension of the bond order, added to improve the dependence of bonding on the chemical environment, leads to an accurate description of point defects in hexagonal BN (h-BN) and cubic BN (c-BN). We have implemented this potential in the molecular dynamics LAMMPS code and used it to determine some basic properties of pristine 2D h-BN and the elastic properties of defective h-BN as a function of defect density at zero temperature. Our results show that there is a strong correlation between the size of the static corrugation induced by the defects and the weakening of the in-plane elastic moduli.
Polymorphic improvement of Stillinger-Weber potential for InGaN

DOE PAGES

Zhou, Xiaowang W.; Jones, Reese E.; Chu, Kevin

2017-12-21

A Stillinger-Weber potential is computationally very efficient for molecular dynamics simulations. Despite its simple mathematical form, the Stillinger-Weber potential can be easily parameterized to ensure that crystal structures with tetrahedral bond angles (e.g., diamond-cubic, zinc-blende, and wurtzite) are stable and have the lowest energy. As a result, the Stillinger-Weber potential has been widely used to study a variety of semiconductor elements and alloys. When studying an A-B binary system, however, the Stillinger-Weber potential is associated with two major drawbacks. First, it significantly overestimates the elastic constants of elements A and B, limiting its use for systems involving both compounds and elements (e.g., an A/AB multilayer). Second, it prescribes equal energy for zinc-blende and wurtzite crystals, limiting its use for compounds with large stacking fault energies. Here in this paper, we utilize the polymorphic potential style recently implemented in LAMMPS to develop a modified Stillinger-Weber potential for InGaN that overcomes these two problems.
Molecular dynamics simulation of temperature effects on deposition of Cu film on Si by magnetron sputtering

NASA Astrophysics Data System (ADS)

Zhu, Guo; Sun, Jiangping; Zhang, Libin; Gan, Zhiyin

2018-06-01

The temperature effects on the growth of Cu thin film on Si (0 0 1) in the context of magnetron sputtering deposition were systematically studied using the molecular dynamics (MD) method. To improve the comparability of simulation results at varying temperatures, the initial status data of incident Cu atoms used in all simulations were read from an identical file via the LAMMPS-Python interface. In particular, the crystalline microstructure, interface mixing and internal stress of Cu thin film deposited at different temperatures were investigated in detail. As the substrate temperature was raised, the interspecies mixed volume and the proportion of face-centered cubic (fcc) structure in the deposited film both increased, while the internal compressive stress decreased. It was found that the fcc structure in the deposited Cu thin films was 〈1 1 1〉 oriented, which was reasonably explained by surface energy minimization and the selectivity of bombardment energy to the crystalline planes. The quantified analysis of interface mixing revealed that the diffusion of Cu atoms dominated the interface mixing, and the injection of incident Cu atoms resulted in the densification of the phase near the film-substrate interface. More importantly, the distribution of atomic stress indicated that the compressive stress originated mainly from the film-substrate interface, which might be attributed to the densification of the interfacial phase at the initial stage of film deposition.
Rule-based spatial modeling with diffusing, geometrically constrained molecules.

PubMed

Gruenert, Gerd; Ibrahim, Bashar; Lenser, Thorsten; Lohel, Maiko; Hinze, Thomas; Dittrich, Peter

2010-06-07

We suggest a new type of modeling approach for the coarse grained, particle-based spatial simulation of combinatorially complex chemical reaction systems. In our approach molecules possess a location in the reactor as well as an orientation and geometry, while the reactions are carried out according to a list of implicitly specified reaction rules. Because the reaction rules can contain patterns for molecules, a combinatorially complex or even infinitely sized reaction network can be defined. For our implementation (based on LAMMPS), we have chosen an already existing formalism (BioNetGen) for the implicit specification of the reaction network. This compatibility allows existing models to be imported easily, i.e., only additional geometry data files have to be provided. Our simulations show that the obtained dynamics can be fundamentally different from those simulations that use classical reaction-diffusion approaches like Partial Differential Equations or Gillespie-type spatial stochastic simulation. We show, for example, that the combination of combinatorial complexity and geometric effects leads to the emergence of complex self-assemblies and transportation phenomena happening faster than diffusion (using a model of molecular walkers on microtubules). When the mentioned classical simulation approaches are applied, these aspects of modeled systems cannot be observed without very special treatment. Furthermore, we show that the geometric information can even change the organizational structure of the reaction system. That is, a set of chemical species that can in principle form a stationary state in a Differential Equation formalism is potentially unstable when geometry is considered, and vice versa.
We conclude that our approach provides a new general framework filling a gap in between approaches with no or rigid spatial representation like Partial Differential Equations and specialized coarse-grained spatial simulation systems like those for DNA or virus capsid self-assembly.
Rule-based spatial modeling with diffusing, geometrically constrained molecules

PubMed Central

2010-01-01

Background: We suggest a new type of modeling approach for the coarse grained, particle-based spatial simulation of combinatorially complex chemical reaction systems. In our approach molecules possess a location in the reactor as well as an orientation and geometry, while the reactions are carried out according to a list of implicitly specified reaction rules. Because the reaction rules can contain patterns for molecules, a combinatorially complex or even infinitely sized reaction network can be defined. For our implementation (based on LAMMPS), we have chosen an already existing formalism (BioNetGen) for the implicit specification of the reaction network. This compatibility allows existing models to be imported easily, i.e., only additional geometry data files have to be provided.
Results: Our simulations show that the obtained dynamics can be fundamentally different from those simulations that use classical reaction-diffusion approaches like Partial Differential Equations or Gillespie-type spatial stochastic simulation. We show, for example, that the combination of combinatorial complexity and geometric effects leads to the emergence of complex self-assemblies and transportation phenomena happening faster than diffusion (using a model of molecular walkers on microtubules). When the mentioned classical simulation approaches are applied, these aspects of modeled systems cannot be observed without very special treatment. Furthermore, we show that the geometric information can even change the organizational structure of the reaction system. That is, a set of chemical species that can in principle form a stationary state in a Differential Equation formalism is potentially unstable when geometry is considered, and vice versa.
Conclusions: We conclude that our approach provides a new general framework filling a gap in between approaches with no or rigid spatial representation like Partial Differential Equations and specialized coarse-grained spatial simulation systems like those for DNA or virus capsid self-assembly. PMID:20529264
Simulation Exploration through Immersive Parallel Planes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brunhart-Lupo, Nicholas J; Bush, Brian W; Gruchalla, Kenny M

We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest: a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used both to explore existing data and to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
Simulation Exploration through Immersive Parallel Planes: Preprint

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brunhart-Lupo, Nicholas; Bush, Brian W.; Gruchalla, Kenny

We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest: a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used both to explore existing data and to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
A path-level exact parallelization strategy for sequential simulation

NASA Astrophysics Data System (ADS)

Peredo, Oscar F.; Baeza, Daniel; Ortiz, Julián M.; Herrero, José R.

2018-01-01

Sequential Simulation is a well known method in geostatistical modelling. Following the Bayesian approach for simulation of conditionally dependent random events, Sequential Indicator Simulation (SIS) method draws simulated values for K categories (categorical case) or classes defined by K different thresholds (continuous case). Similarly, Sequential Gaussian Simulation (SGS) method draws simulated values from a multivariate Gaussian field. In this work, a path-level approach to parallelize SIS and SGS methods is presented. A first stage of re-arrangement of the simulation path is performed, followed by a second stage of parallel simulation for non-conflicting nodes. A key advantage of the proposed parallelization method is to generate identical realizations as with the original non-parallelized methods. Case studies are presented using two sequential simulation codes from GSLIB: SISIM and SGSIM. Execution time and speedup results are shown for large-scale domains, with many categories and maximum kriging neighbours in each case, achieving high speedup results in the best scenarios using 16 threads of execution in a single machine.
A compositional reservoir simulator on distributed memory parallel computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rame, M.; Delshad, M.

1995-12-31

This paper presents the application of distributed memory parallel computers to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/960 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
Parallel STEPS: Large Scale Stochastic Spatial Reaction-Diffusion Simulation with High Performance Computers

PubMed Central

Chen, Weiliang; De Schutter, Erik

2017-01-01

Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation. PMID:28239346
Parallel STEPS: Large Scale Stochastic Spatial Reaction-Diffusion Simulation with High Performance Computers.

PubMed

Chen, Weiliang; De Schutter, Erik

2017-01-01

Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation.
DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics

NASA Astrophysics Data System (ADS)

Wang, Han; Zhang, Linfeng; Han, Jiequn; E, Weinan

2018-07-01

Recent developments in many-body potential energy representation via deep learning have brought new hopes to addressing the accuracy-versus-efficiency dilemma in molecular simulations. Here we describe DeePMD-kit, a package written in Python/C++ that has been designed to minimize the effort required to build deep learning based representation of potential energy and force field and to perform molecular dynamics. Potential applications of DeePMD-kit span from finite molecules to extended systems and from metallic systems to chemically bonded systems. DeePMD-kit is interfaced with TensorFlow, one of the most popular deep learning frameworks, making the training process highly automatic and efficient. On the other end, DeePMD-kit is interfaced with high-performance classical molecular dynamics and quantum (path-integral) molecular dynamics packages, i.e., LAMMPS and i-PI, respectively. Thus, upon training, the potential energy and force field models can be used to perform efficient molecular simulations for different purposes. As an example of the many potential applications of the package, we use DeePMD-kit to learn the interatomic potential energy and forces of a water model using data obtained from density functional theory. We demonstrate that the resulting molecular dynamics model accurately reproduces the structural information contained in the original model.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jin, Shuangshuang; Chen, Yousu; Wu, Di

2015-12-09

Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.

Parallel tempering for the traveling salesman problem

DOE Office of Scientific and Technical Information (OSTI.GOV)

Percus, Allon; Wang, Richard; Hyman, Jeffrey

We explore the potential of parallel tempering as a combinatorial optimization method, applying it to the traveling salesman problem. We compare simulation results of parallel tempering with a benchmark implementation of simulated annealing, and study how different choices of parameters affect the relative performance of the two methods. We find that a straightforward implementation of parallel tempering can outperform simulated annealing in several crucial respects. When parameters are chosen appropriately, both methods yield close approximations to the actual minimum distance for an instance with 200 nodes. However, parallel tempering yields more consistently accurate results when a series of independent simulations are performed. Our results suggest that parallel tempering might offer a simple but powerful alternative to simulated annealing for combinatorial optimization problems.
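The replica-exchange step that distinguishes parallel tempering from simulated annealing can be sketched as follows. This is a minimal illustration of the standard Metropolis swap criterion, not the authors' implementation; the function names, the use of tour length as the "energy", and the neighbour-only swap schedule are assumptions made for the example.

```python
import math
import random

def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Standard Metropolis probability of exchanging the states held by two
    replicas at temperatures temp_i and temp_j (energy = tour length here)."""
    delta = (1.0 / temp_i - 1.0 / temp_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swaps(tour_lengths, temperatures, rng=random):
    """Sweep over neighbouring replicas and exchange their entries in place
    whenever the swap is accepted. tour_lengths stands in for the full
    replica state (hypothetical simplification). Returns the swap count."""
    swaps = 0
    for k in range(len(temperatures) - 1):
        p = swap_probability(tour_lengths[k], tour_lengths[k + 1],
                             temperatures[k], temperatures[k + 1])
        if rng.random() < p:
            tour_lengths[k], tour_lengths[k + 1] = (
                tour_lengths[k + 1], tour_lengths[k])
            swaps += 1
    return swaps
```

With this criterion a shorter tour found at a high temperature always migrates to the colder neighbour, while the reverse move is accepted only with exponentially small probability.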
Round Robin Study: Molecular Simulation of Thermodynamic Properties from Models with Internal Degrees of Freedom.

PubMed

Schappals, Michael; Mecklenfeld, Andreas; Kröger, Leif; Botan, Vitalie; Köster, Andreas; Stephan, Simon; García, Edder J; Rutkai, Gabor; Raabe, Gabriele; Klein, Peter; Leonhard, Kai; Glass, Colin W; Lenhard, Johannes; Vrabec, Jadran; Hasse, Hans

2017-09-12

Thermodynamic properties are often modeled by classical force fields which describe the interactions on the atomistic scale. Molecular simulations are used for retrieving thermodynamic data from such models, and many simulation techniques and computer codes are available for that purpose. In the present round robin study, the following fundamental question is addressed: Will different user groups working with different simulation codes obtain coinciding results within the statistical uncertainty of their data? A set of 24 simple simulation tasks is defined and solved by five user groups working with eight molecular simulation codes: DL_POLY, GROMACS, IMC, LAMMPS, ms2, NAMD, Tinker, and TOWHEE. Each task consists of the definition of (1) a pure fluid that is described by a force field and (2) the conditions under which that property is to be determined. The fluids are four simple alkanes: ethane, propane, n-butane, and iso-butane. All force fields consider internal degrees of freedom: OPLS, TraPPE, and a modified OPLS version with bond stretching vibrations. Density and potential energy are determined as a function of temperature and pressure on a grid which is specified such that all states are liquid. The user groups worked independently and reported their results to a central instance. The full set of results was disclosed to all user groups only at the end of the study. During the study, the central instance gave only qualitative feedback. The results reveal the challenges of carrying out molecular simulations. Several iterations were needed to eliminate gross errors. For most simulation tasks, the remaining deviations between the results of the different groups are acceptable from a practical standpoint, but they are often outside of the statistical errors of the individual simulation data. However, there are also cases where the deviations are unacceptable. 
This study highlights similarities between computer experiments and laboratory experiments, which are both subject not only to statistical error but also to systematic error.
LiquidLib: A comprehensive toolbox for analyzing classical and ab initio molecular dynamics simulations of liquids and liquid-like matter with applications to neutron scattering experiments

NASA Astrophysics Data System (ADS)

Walter, Nathan P.; Jaiswal, Abhishek; Cai, Zhikun; Zhang, Yang

2018-07-01

Neutron scattering is a powerful experimental technique for characterizing the structure and dynamics of materials on the atomic or molecular scale. However, the interpretation of experimental data from neutron scattering is oftentimes not trivial, partly because scattering methods probe ensemble-averaged information in the reciprocal space. Therefore, computer simulations, such as classical and ab initio molecular dynamics, are frequently used to unravel the time-dependent atomistic configurations that can reproduce the scattering patterns and thus assist in the understanding of the microscopic origin of certain properties of materials. LiquidLib is a post-processing package for analyzing the trajectory of atomistic simulations of liquids and liquid-like matter with application to neutron scattering experiments. From an atomistic simulation, LiquidLib provides the computation of various statistical quantities including the pair distribution function, the weighted and unweighted structure factors, the mean squared displacement, the non-Gaussian parameter, the four-point correlation function, the velocity auto correlation function, the self and collective van Hove correlation functions, the self and collective intermediate scattering functions, and the bond orientational order parameter. LiquidLib analyzes atomistic trajectories generated from packages such as LAMMPS, GROMACS, and VASP. It also offers an extendable platform to conveniently integrate new quantities into the library and integrate simulation trajectories of other file formats for analysis. Weighting the quantities by element-specific neutron-scattering lengths provides results directly comparable to neutron scattering measurements. Lastly, LiquidLib is independent of dimensionality, which allows analysis of trajectories in two, three, and higher dimensions. The code is beginning to find worldwide use.
Parallel Signal Processing and System Simulation using aCe

NASA Technical Reports Server (NTRS)

Dorband, John E.; Aburdene, Maurice F.

2003-01-01

Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C based parallel language (ace C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of ace C and present a signal processing application (FFT).
On the suitability of the connection machine for direct particle simulation

NASA Technical Reports Server (NTRS)

Dagum, Leonard

1990-01-01

The algorithmic structure of the vectorizable Stanford particle simulation (SPS) method was examined and reformulated in data parallel form. Some of the SPS algorithms can be directly translated to data parallel, but several of the vectorizable algorithms have no direct data parallel equivalent. This requires the development of new, strictly data parallel algorithms. In particular, a new sorting algorithm is developed to identify collision candidates in the simulation and a master/slave algorithm is developed to minimize communication cost in large table look up. Validation of the method is undertaken through test calculations for thermal relaxation of a gas, shock wave profiles, and shock reflection from a stationary wall. A qualitative measure is provided of the performance of the Connection Machine for direct particle simulation. The massively parallel architecture of the Connection Machine is found quite suitable for this type of calculation. However, there are difficulties in taking full advantage of this architecture because of the lack of a broad-based tradition of data parallel programming. An important outcome of this work has been new data parallel algorithms specifically of use for direct particle simulation but which also expand the data parallel diction.
Large-scale three-dimensional phase-field simulations for phase coarsening at ultrahigh volume fraction on high-performance architectures

NASA Astrophysics Data System (ADS)

Yan, Hui; Wang, K. G.; Jones, Jim E.

2016-06-01

A parallel algorithm for large-scale three-dimensional phase-field simulations of phase coarsening is developed and implemented on high-performance architectures. From the large-scale simulations, a new kinetics in phase coarsening in the region of ultrahigh volume fraction is found. The parallel implementation is capable of harnessing the greater computer power available from high-performance architectures. The parallelized code enables an increase in three-dimensional simulation system size up to a 512^3 grid cube. Through the parallelized code, practical runtime can be achieved for three-dimensional large-scale simulations, and the statistical significance of the results from these high resolution parallel simulations is greatly improved over that obtainable from serial simulations. A detailed performance analysis on speed-up and scalability is presented, showing good scalability which improves with increasing problem size. In addition, a model for prediction of runtime is developed, which shows good agreement with actual run time from numerical tests.
Suppressing correlations in massively parallel simulations of lattice models

NASA Astrophysics Data System (ADS)

Kelling, Jeffrey; Ódor, Géza; Gemming, Sibylle

2017-11-01

For lattice Monte Carlo simulations parallelization is crucial to make studies of large systems and long simulation time feasible, while sequential simulations remain the gold-standard for correlation-free dynamics. Here, various domain decomposition schemes are compared, concluding with one which delivers virtually correlation-free simulations on GPUs. Extensive simulations of the octahedron model for 2 + 1 dimensional Kardar-Parisi-Zhang surface growth, which is very sensitive to correlation in the site-selection dynamics, were performed to show self-consistency of the parallel runs and agreement with the sequential algorithm. We present a GPU implementation providing a speedup of about 30 × over a parallel CPU implementation on a single socket and at least 180 × with respect to the sequential reference.
SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation

NASA Technical Reports Server (NTRS)

Steinman, Jeff S.

1992-01-01

Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
Influence of equilibrium shear flow in the parallel magnetic direction on edge localized mode crash

DOE Office of Scientific and Technical Information (OSTI.GOV)

Luo, Y.; Xiong, Y. Y.; Chen, S. Y., E-mail: sychen531@163.com

2016-04-15

The influence of the parallel shear flow on the evolution of peeling-ballooning (P-B) modes is studied with the BOUT++ four-field code in this paper. The parallel shear flow has different effects in linear simulation and nonlinear simulation. In the linear simulations, the growth rate of the edge localized mode (ELM) can be increased by the Kelvin-Helmholtz term, which can be caused by the parallel shear flow. In the nonlinear simulations, the results accord with the linear simulations in the linear phase. However, the ELM size is reduced by the parallel shear flow in the beginning of the turbulence phase, which is recognized as the P-B filaments' structure. Then during the turbulence phase, the ELM size is decreased by the shear flow.
Random number generators for large-scale parallel Monte Carlo simulations on FPGA

NASA Astrophysics Data System (ADS)

Lin, Y.; Wang, F.; Liu, B.

2018-05-01

Through parallelization, field programmable gate array (FPGA) can achieve unprecedented speeds in large-scale parallel Monte Carlo (LPMC) simulations. FPGA presents both new constraints and new opportunities for the implementations of random number generators (RNGs), which are key elements of any Monte Carlo (MC) simulation system. Using empirical and application based tests, this study evaluates all of the four RNGs used in previous FPGA based MC studies and newly proposed FPGA implementations for two well-known high-quality RNGs that are suitable for LPMC studies on FPGA. One of the newly proposed FPGA implementations: a parallel version of additive lagged Fibonacci generator (Parallel ALFG) is found to be the best among the evaluated RNGs in fulfilling the needs of LPMC simulations on FPGA.
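For readers unfamiliar with the generator family named above, a serial additive lagged Fibonacci generator follows the recurrence x[n] = (x[n-j] + x[n-k]) mod 2^m with lags j < k. The sketch below is a plain software illustration only: the class name, the default lag pair (5, 17), and the 32-bit word size are assumptions for the example, and it does not attempt to reproduce the parallel FPGA design evaluated in the study.

```python
from collections import deque

class AdditiveLaggedFibonacci:
    """Serial ALFG: x[n] = (x[n - short_lag] + x[n - long_lag]) mod 2**bits."""

    def __init__(self, seed_words, short_lag=5, long_lag=17, bits=32):
        if len(seed_words) != long_lag:
            raise ValueError("need exactly long_lag seed words")
        if not any(w & 1 for w in seed_words):
            # An all-even seed keeps the lowest bit stuck at zero.
            raise ValueError("at least one seed word must be odd")
        self.mask = (1 << bits) - 1
        self.short_lag = short_lag
        # The deque holds x[n - long_lag] ... x[n - 1], oldest first.
        self.state = deque((w & self.mask for w in seed_words),
                           maxlen=long_lag)

    def next(self):
        value = (self.state[0] + self.state[-self.short_lag]) & self.mask
        self.state.append(value)  # maxlen drops the oldest word
        return value
```

With lags (1, 2) and seeds [1, 1] the recurrence reduces to the ordinary Fibonacci sequence, which gives a quick sanity check of the indexing.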
A sweep algorithm for massively parallel simulation of circuit-switched networks

NASA Technical Reports Server (NTRS)

Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.

1992-01-01

A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.
Parallelization of sequential Gaussian, indicator and direct simulation algorithms

NASA Astrophysics Data System (ADS)

Nunes, Ruben; Almeida, José A.

2010-08-01

Improving the performance and robustness of algorithms on new high-performance parallel computing architectures is a key issue in efficiently performing 2D and 3D studies with large amount of data. In geostatistics, sequential simulation algorithms are good candidates for parallelization. When compared with other computational applications in geosciences (such as fluid flow simulators), sequential simulation software is not extremely computationally intensive, but parallelization can make it more efficient and creates alternatives for its integration in inverse modelling approaches. This paper describes the implementation and benchmarking of a parallel version of the three classic sequential simulation algorithms: direct sequential simulation (DSS), sequential indicator simulation (SIS) and sequential Gaussian simulation (SGS). For this purpose, the source used was GSLIB, but the entire code was extensively modified to take into account the parallelization approach and was also rewritten in the C programming language. The paper also explains in detail the parallelization strategy and the main modifications. Regarding the integration of secondary information, the DSS algorithm is able to perform simple kriging with local means, kriging with an external drift and collocated cokriging with both local and global correlations. SIS includes a local correction of probabilities. Finally, a brief comparison is presented of simulation results using one, two and four processors. All performance tests were carried out on 2D soil data samples. The source code is completely open source and easy to read. It should be noted that the code is only fully compatible with Microsoft Visual C and should be adapted for other systems/compilers.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Underwood, Keith D; Ulmer, Craig D.; Thompson, David

Field programmable gate arrays (FPGAs) have been used as alternative computational devices for over a decade; however, they have not been used for traditional scientific computing due to their perceived lack of floating-point performance. In recent years, there has been a surge of interest in alternatives to traditional microprocessors for high performance computing. Sandia National Labs began two projects to determine whether FPGAs would be a suitable alternative to microprocessors for high performance scientific computing and, if so, how they should be integrated into the system. We present results that indicate that FPGAs could have a significant impact on future systems. FPGAs have the potential to have order of magnitude levels of performance wins on several key algorithms; however, there are serious questions as to whether the system integration challenge can be met. Furthermore, there remain challenges in FPGA programming and system level reliability when using FPGA devices. Acknowledgment: Arun Rodrigues provided valuable support and assistance in the use of the Structural Simulation Toolkit within an FPGA context. Curtis Janssen and Steve Plimpton provided valuable insights into the workings of two Sandia applications (MPQC and LAMMPS, respectively).
Relation of Parallel Discrete Event Simulation algorithms with physical models

NASA Astrophysics Data System (ADS)

Shchur, L. N.; Shchur, L. V.

2015-09-01

We extend the concept of local simulation times in parallel discrete event simulation (PDES) in order to take into account the architecture of current hardware and software in high-performance computing. We briefly review previous research on the mapping of PDES onto physical problems, and emphasise how physical results may help to predict the behaviour of parallel algorithms.
Parallel Simulation of Subsonic Fluid Dynamics on a Cluster of Workstations.

DTIC Science & Technology

1994-11-01

inside wind musical instruments. Typical simulations achieve 80% parallel efficiency (speedup/processors) using 20 HP-Apollo workstations. Detailed...TERMS AI, MIT, Artificial Intelligence, Distributed Computing, Workstation Cluster, Network, Fluid Dynamics, Musical Instruments 17. SECURITY...for example, the flow of air inside wind musical instruments. Typical simulations achieve 80% parallel efficiency (speedup/processors) using 20 HP
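The record defines parallel efficiency as speedup divided by the number of processors. As a small worked example under that definition (the helper names are hypothetical), 80% efficiency on 20 workstations corresponds to a speedup of about 16 over a single workstation:

```python
def parallel_efficiency(speedup, processors):
    """Efficiency as defined in the record: speedup / processors."""
    return speedup / processors

def speedup_from_efficiency(efficiency, processors):
    """Invert the definition to recover the implied speedup."""
    return efficiency * processors

# Figures quoted in the record: 80% efficiency on 20 workstations.
implied_speedup = speedup_from_efficiency(0.80, 20)
```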
Xyce parallel electronic simulator users guide, version 6.1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase (a message passing parallel implementation), which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users' guide, Version 6.0.1.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase (a message passing parallel implementation), which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users guide, version 6.0.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase (a message passing parallel implementation), which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Data parallel sorting for particle simulation

NASA Technical Reports Server (NTRS)

Dagum, Leonardo

1992-01-01

Sorting on a parallel architecture is a communications-intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O(N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimum performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
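The O(N) bound for sequential integer sorting mentioned in this abstract can be illustrated with a counting sort that groups particles by cell index. This is a minimal sequential sketch only, not the data parallel merging algorithm of the paper; the function name `sort_particles_by_cell` and the sample data are assumptions made for illustration.

```python
def sort_particles_by_cell(cell_of, n_cells):
    """Order particle indices by cell using a stable counting sort.

    cell_of[i] is the integer cell index of particle i (0 <= index < n_cells).
    The work is O(N + n_cells): one counting pass, one prefix sum, one scatter.
    """
    counts = [0] * n_cells
    for c in cell_of:
        counts[c] += 1

    # Exclusive prefix sum: first output slot belonging to each cell.
    starts = [0] * n_cells
    total = 0
    for c in range(n_cells):
        starts[c] = total
        total += counts[c]

    # Scatter each particle index into the next free slot of its cell.
    order = [0] * len(cell_of)
    next_slot = list(starts)
    for i, c in enumerate(cell_of):
        order[next_slot[c]] = i
        next_slot[c] += 1
    return order, starts


if __name__ == "__main__":
    order, starts = sort_particles_by_cell([2, 0, 1, 2, 0], n_cells=3)
    print(order, starts)
```

After the call, the particles of cell c occupy `order[starts[c]:starts[c + 1]]` (or up to the end of `order` for the last cell), which is the kind of immediate per-cell access a particle simulation needs.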
A new deadlock resolution protocol and message matching algorithm for the extreme-scale simulator

DOE PAGES

Engelmann, Christian; Naughton, III, Thomas J.

2016-03-22

Investigating the performance of parallel applications at scale on future high-performance computing (HPC) architectures and the performance impact of different HPC architecture choices is an important component of HPC hardware/software co-design. The Extreme-scale Simulator (xSim) is a simulation toolkit for investigating the performance of parallel applications at scale. xSim scales to millions of simulated Message Passing Interface (MPI) processes. The overhead introduced by a simulation tool is an important performance and productivity aspect. This paper documents two improvements to xSim: (1) a new deadlock resolution protocol to reduce the parallel discrete event simulation overhead and (2) a new simulated MPI message matching algorithm to reduce the oversubscription management overhead. The results clearly show a significant performance improvement. The simulation overhead for running the NAS Parallel Benchmark suite was reduced from 102% to 0% for the embarrassingly parallel (EP) benchmark and from 1,020% to 238% for the conjugate gradient (CG) benchmark. xSim offers a highly accurate simulation mode for better tracking of injected MPI process failures. Furthermore, with highly accurate simulation, the overhead was reduced from 3,332% to 204% for EP and from 37,511% to 13,808% for CG.

A Systems Approach to Scalable Transportation Network Modeling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Perumalla, Kalyan S

2006-01-01

Emerging needs in transportation network modeling and simulation are raising new challenges with respect to scalability of network size and vehicular traffic intensity, speed of simulation for simulation-based optimization, and fidelity of vehicular behavior for accurate capture of event phenomena. Parallel execution is warranted to sustain the required detail, size and speed. However, few parallel simulators exist for such applications, partly due to the challenges underlying their development. Moreover, many simulators are based on time-stepped models, which can be computationally inefficient for the purposes of modeling evacuation traffic. Here an approach is presented to designing a simulator with memory and speed efficiency as the goals from the outset, and, specifically, scalability via parallel execution. The design makes use of discrete event modeling techniques as well as parallel simulation methods. Our simulator, called SCATTER, is being developed, incorporating such design considerations. Preliminary performance results are presented on benchmark road networks, showing scalability to one million vehicles simulated on one processor.
ANNarchy: a code generation approach to neural simulations on parallel hardware

PubMed Central

Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.

2015-01-01

Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which makes it easy to define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957
Xyce parallel electronic simulator : users' guide.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.

2011-05-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.
A fast sorting algorithm for a hypersonic rarefied flow particle simulation on the connection machine

NASA Technical Reports Server (NTRS)

Dagum, Leonardo

1989-01-01

The data parallel implementation of a particle simulation for hypersonic rarefied flow described by Dagum associates a single parallel data element with each particle in the simulation. The simulated space is divided into discrete regions called cells containing a variable and constantly changing number of particles. The implementation requires a global sort of the parallel data elements so as to arrange them in an order that allows immediate access to the information associated with cells in the simulation. Described here is a very fast algorithm for performing the necessary ranking of the parallel data elements. The performance of the new algorithm is compared with that of the microcoded instruction for ranking on the Connection Machine.
Symplectic molecular dynamics simulations on specially designed parallel computers.

PubMed

Borstnik, Urban; Janezic, Dusanka

2005-01-01

We have developed a computer program for molecular dynamics (MD) simulation that implements the Split Integration Symplectic Method (SISM) and is designed to run on specialized parallel computers. The MD integration is performed by the SISM, which analytically treats high-frequency vibrational motion and thus enables the use of longer simulation time steps. The low-frequency motion is treated numerically on specially designed parallel computers, which decreases the computational time of each simulation time step. The combination of these approaches means that less time is required and fewer steps are needed and so enables fast MD simulations. We study the computational performance of MD simulation of molecular systems on specialized computers and provide a comparison to standard personal computers. The combination of the SISM with two specialized parallel computers is an effective way to increase the speed of MD simulations up to 16-fold over a single PC processor.
Parallel discrete-event simulation of FCFS stochastic queueing networks

NASA Technical Reports Server (NTRS)

Nicol, David M.

1988-01-01

Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments), which has proven itself to be effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction of a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations, performance data are given that demonstrate the method's effectiveness under moderate to heavy loads, and performance tradeoffs between the quality of lookahead and the cost of computing it are discussed.
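One way to see how lookahead can arise in an FCFS queue is to pre-sample the next service time, so that a lower bound on the next departure is known before the job arrives. The sketch below assumes this pre-sampling approach and exponentially distributed service times; the class `FcfsServer` and its method names are hypothetical and are not taken from the paper.

```python
import random


class FcfsServer:
    """Single FCFS server whose next service time is drawn in advance."""

    def __init__(self, mean_service, seed=0):
        self._rng = random.Random(seed)
        self._mean = mean_service
        self._next_service = self._draw()
        self.last_departure = 0.0

    def _draw(self):
        # Assumed service-time distribution: exponential with the given mean.
        return self._rng.expovariate(1.0 / self._mean)

    def lookahead_bound(self, now):
        """Lower bound on the departure time of any job arriving at or after `now`."""
        return max(now, self.last_departure) + self._next_service

    def accept(self, arrival_time):
        """Schedule an arriving job; return its departure time."""
        departure = max(arrival_time, self.last_departure) + self._next_service
        self.last_departure = departure
        self._next_service = self._draw()
        return departure


if __name__ == "__main__":
    server = FcfsServer(mean_service=1.0, seed=1)
    bound = server.lookahead_bound(now=2.0)
    departure = server.accept(arrival_time=3.5)
    print(bound, departure)
```

A downstream process can safely simulate up to `lookahead_bound(now)` without waiting, because no job that has not yet arrived can leave this server earlier than that time.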
Progress in Unsteady Turbopump Flow Simulations

NASA Technical Reports Server (NTRS)

Kiris, Cetin C.; Chan, William; Kwak, Dochan; Williams, Robert

2002-01-01

This viewgraph presentation discusses unsteady flow simulations for a turbopump intended for a reusable launch vehicle (RLV). The simulation process makes use of computational grids and parallel processing. The architecture of the parallel computers used is discussed, as is the scripting of turbopump simulations.
A derivation and scalable implementation of the synchronous parallel kinetic Monte Carlo method for simulating long-time dynamics

NASA Astrophysics Data System (ADS)

Byun, Hye Suk; El-Naggar, Mohamed Y.; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

2017-10-01

Kinetic Monte Carlo (KMC) simulations are used to study long-time dynamics of a wide variety of systems. Unfortunately, the conventional KMC algorithm is not scalable to larger systems, since its time scale is inversely proportional to the simulated system size. A promising approach to resolving this issue is the synchronous parallel KMC (SPKMC) algorithm, which makes the time scale size-independent. This paper introduces a formal derivation of the SPKMC algorithm based on local transition-state and time-dependent Hartree approximations, as well as its scalable parallel implementation based on a dual linked-list cell method. The resulting algorithm has achieved a weak-scaling parallel efficiency of 0.935 on 1024 Intel Xeon processors for simulating biological electron transfer dynamics in a 4.2 billion-heme system, as well as decent strong-scaling parallel efficiency. The parallel code has been used to simulate a lattice of cytochrome complexes on a bacterial-membrane nanowire, and it is broadly applicable to other problems such as computational synthesis of new materials.
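The scaling problem stated in this abstract can be made concrete with the standard residence-time form of conventional KMC, in which each event advances the clock by dt = -ln(u) / R_total. The sketch below assumes N identical sites that each contribute the same rate, so R_total = N * rate; that uniform-rate model and the function names are illustrative assumptions, and the code does not implement SPKMC.

```python
import math
import random


def kmc_time_increment(total_rate, rng):
    """Draw one residence-time increment, dt = -ln(u) / R_total."""
    u = 1.0 - rng.random()  # u lies in (0, 1], which avoids log(0)
    return -math.log(u) / total_rate


def mean_time_step(n_sites, rate_per_site, n_events=20000, seed=0):
    """Average simulated time advanced per event for n_sites identical sites."""
    rng = random.Random(seed)
    total_rate = n_sites * rate_per_site
    elapsed = sum(kmc_time_increment(total_rate, rng) for _ in range(n_events))
    return elapsed / n_events


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # The mean step shrinks roughly as 1 / (n * rate_per_site).
        print(n, mean_time_step(n, rate_per_site=1.0))
```

Under these assumptions, a tenfold larger system covers one tenth of the simulated time per event, which is the size dependence that a synchronous parallel scheme aims to remove.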
A hybrid parallel framework for the cellular Potts model simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jiang, Yi; He, Kejing; Dong, Shoubin

2009-01-01

The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, and so cannot be used for large-scale complex 3D simulations. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for large-scale simulations (~10^8 sites) of the complex collective behavior of numerous cells (~10^6).
Parallel discrete event simulation: A shared memory approach

NASA Technical Reports Server (NTRS)

Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

1987-01-01

With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
Effects of wall friction on flow in a quasi-2D hopper

NASA Astrophysics Data System (ADS)

Shah, Neil; Birwa, Sumit; Carballo-Ramirez, Brenda; Pleau, Mollie; Easwar, Nalini; Tewari, Shubha

Our experiments on the gravity-driven flow of spherical particles in a vertical hopper examine how the flow rate varies with opening size and wall friction. We report here on a model simulation using LAMMPS of the experimental geometry, a quasi-2D hopper. Keeping inter-particle friction fixed, the coefficient of friction at the walls is varied from 0.0 to 0.9 for a range of opening sizes. Our simulations find a steady rate of flow at each wall friction and outlet size. The Janssen effect attributes the constant rate of flow of a granular column to the column height independence of the pressure at the base, since the weight of the grains is borne in part by friction at the walls. However, we observe a constant flow regime even in the absence of wall friction, suggesting that wall friction may not be a necessary condition for pressure saturation. The observed velocities of particles near the opening are used to extrapolate their starting positions had they been in free fall. In contrast to scaling predictions, our data suggest that the height of this free-fall arch does not vary with opening size for higher frictional coefficients. We analyze the velocity traces of particles to see the range over which contact interactions remain collisional as they approach the hopper outlet.
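For reference, the pressure saturation that the Janssen effect describes is usually written as the textbook expression below. It assumes a vertical cylindrical column of diameter D, which differs from the quasi-2D hopper of this abstract. The symbols \rho (bulk density), g (gravitational acceleration), \mu_w (wall friction coefficient), K (ratio of horizontal to vertical stress), and z (depth below the free surface) are not defined in the abstract and are introduced here only for illustration.

```latex
\sigma_{zz}(z) = \frac{\rho g D}{4 \mu_w K}
  \left[ 1 - \exp\!\left( -\frac{4 \mu_w K z}{D} \right) \right],
\qquad
\lim_{\mu_w \to 0} \sigma_{zz}(z) = \rho g z .
```

In this idealized form the base pressure grows linearly with depth when \mu_w = 0, so the constant flow rate observed without wall friction is not explained by this expression alone.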
Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P

DOE Office of Scientific and Technical Information (OSTI.GOV)

Candel, A.; Kabel, A.; Lee, L.

In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).
Xyce Parallel Electronic Simulator Users' Guide Version 6.8

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase (a message-passing parallel implementation), which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
A hybrid algorithm for parallel molecular dynamics simulations

NASA Astrophysics Data System (ADS)

Mangiardi, Chris M.; Meyer, R.

2017-10-01

This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.
A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

NASA Technical Reports Server (NTRS)

Jones, Mark Howard

1987-01-01

A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
Methods of parallel computation applied on granular simulations

NASA Astrophysics Data System (ADS)

Martins, Gustavo H. B.; Atman, Allbens P. F.

2017-06-01

Every year, parallel computing becomes cheaper and more accessible. As a consequence, its applications have spread across all research areas. Granular materials are a promising area for parallel computing. To support this statement, we study the impact of parallel computing on simulations of the BNE (Brazil Nut Effect). This effect is the remarkable rise of an intruder confined in a granular medium when it is shaken vertically against gravity. By means of DEM (Discrete Element Method) simulations, we study code performance, testing different methods to reduce clock time. A comparison between serial and parallel algorithms using OpenMP® is also shown. The best improvement was obtained by optimizing the function that finds contacts using Verlet's cells.
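The contact search credited with the best improvement can be sketched with a cell list: particles are binned into square cells at least one contact distance wide, so each particle is only tested against particles in its own and adjacent cells. The sketch below assumes equal-radius disks in a non-periodic square box; `find_contacts` and its parameters are hypothetical names, and the authors' own routine may differ.

```python
from collections import defaultdict
from itertools import product


def find_contacts(positions, radius, box):
    """Return sorted pairs (i, j), i < j, of overlapping equal-radius disks.

    positions: list of (x, y) with 0 <= x, y < box (no periodic wrap).
    Cell side is >= 2 * radius, so contacts can only occur within a cell
    or between neighbouring cells.
    """
    cutoff = 2.0 * radius
    n_side = max(1, int(box // cutoff))
    cell_size = box / n_side

    # Bin every particle into its cell.
    cells = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        cx = min(int(x / cell_size), n_side - 1)
        cy = min(int(y / cell_size), n_side - 1)
        cells[(cx, cy)].append(i)

    contacts = set()
    for (cx, cy), members in cells.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            # .get does not insert missing keys, so iteration stays safe.
            neighbours = cells.get((cx + dx, cy + dy), ())
            for i in members:
                xi, yi = positions[i]
                for j in neighbours:
                    if i < j:
                        xj, yj = positions[j]
                        if (xi - xj) ** 2 + (yi - yj) ** 2 < cutoff ** 2:
                            contacts.add((i, j))
    return sorted(contacts)


if __name__ == "__main__":
    pts = [(0.5, 0.5), (1.2, 0.5), (5.0, 5.0), (5.0, 5.9), (9.5, 9.5)]
    print(find_contacts(pts, radius=0.5, box=10.0))
```

For roughly uniform packings the number of distance tests grows linearly with the number of particles, instead of quadratically as in an all-pairs search.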
Turbomachinery CFD on parallel computers

NASA Technical Reports Server (NTRS)

Blech, Richard A.; Milner, Edward J.; Quealy, Angela; Townsend, Scott E.

1992-01-01

The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized. Parallel turbomachinery CFD research at the NASA Lewis Research Center is then highlighted. These efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations.
Massively parallel multicanonical simulations

NASA Astrophysics Data System (ADS)

Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

2018-03-01

Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with on the order of 10^4 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as a starting point and reference for practitioners in the field.
Solar wind interaction with Venus and Mars in a parallel hybrid code

NASA Astrophysics Data System (ADS)

Jarvinen, Riku; Sandroos, Arto

2013-04-01

We discuss the development and applications of a new parallel hybrid simulation, where ions are treated as particles and electrons as a charge-neutralizing fluid, for the interaction between the solar wind and Venus and Mars. The new simulation code under construction is based on the algorithm of the sequential global planetary hybrid model developed at the Finnish Meteorological Institute (FMI) and on the Corsair parallel simulation platform also developed at the FMI. The FMI's sequential hybrid model has been used for studies of plasma interactions of several unmagnetized and weakly magnetized celestial bodies for more than a decade. In particular, the model has been used to interpret in situ particle and magnetic field observations from the plasma environments of Mars, Venus and Titan. Corsair is an open source MPI (Message Passing Interface) particle and mesh simulation platform, mainly aimed at simulations of diffusive shock acceleration in the solar corona and interplanetary space, which is now also being extended for global planetary hybrid simulations. In this presentation we discuss challenges and strategies of parallelizing a legacy simulation code as well as possible applications and prospects of a scalable parallel hybrid model for the solar wind interactions of Venus and Mars.
Acoustic simulation in architecture with parallel algorithm

NASA Astrophysics Data System (ADS)

Li, Xiaohong; Zhang, Xinrong; Li, Dan

2004-03-01

To address the complexity of architectural environments and the need for real-time simulation of architectural acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in a scene is solved with this method. The impulse responses between sources and receivers in each frequency band, which are calculated by multiple processes, are then combined into the full frequency response. The numerical experiment shows that the parallel algorithm can improve the efficiency of acoustic simulation for complex scenes.

Surface Modification Engineered Assembly of Novel Quantum Dot Architectures for Advanced Applications

DTIC Science & Technology

2008-02-09

Campbell, S. Ogata, and F. Shimojo, “Multimillion atom simulations of nanosystems on parallel computers,” in Proceedings of the International...nanomesas: multimillion-atom molecular dynamics simulations on parallel computers,” J. Appl. Phys. 94, 6762 (2003). 21. P. Vashishta, R. K. Kalia...and A. Nakano, “Multimillion atom molecular dynamics simulations of nanoparticles on parallel computers,” Journal of Nanoparticle Research 5, 119-135
Massively parallel simulator of optical coherence tomography of inhomogeneous turbid media.

PubMed

Malektaji, Siavash; Lima, Ivan T; Escobar I, Mauricio R; Sherif, Sherif S

2017-10-01

An accurate and practical simulator for Optical Coherence Tomography (OCT) could be an important tool to study the underlying physical phenomena in OCT such as multiple light scattering. Recently, many researchers have investigated simulation of OCT of turbid media, e.g., tissue, using Monte Carlo methods. The main drawback of these earlier simulators is the long computational time required to produce accurate results. We developed a massively parallel simulator of OCT of inhomogeneous turbid media that obtains both Class I diffusive reflectivity, due to ballistic and quasi-ballistic scattered photons, and Class II diffusive reflectivity due to multiply scattered photons. This Monte Carlo-based simulator is implemented on graphic processing units (GPUs), using the Compute Unified Device Architecture (CUDA) platform and programming model, to exploit the parallel nature of propagation of photons in tissue. It models an arbitrary shaped sample medium as a tetrahedron-based mesh and uses an advanced importance sampling scheme. This new simulator speeds up simulations of OCT of inhomogeneous turbid media by about two orders of magnitude. To demonstrate this result, we have compared the computation times of our new parallel simulator and its serial counterpart using two samples of inhomogeneous turbid media. We have shown that our parallel implementation reduced simulation time of OCT of the first sample medium from 407 min to 92 min by using a single GPU card, to 12 min by using 8 GPU cards and to 7 min by using 16 GPU cards. For the second sample medium, the OCT simulation time was reduced from 209 h to 35.6 h by using a single GPU card, and to 4.65 h by using 8 GPU cards, and to only 2 h by using 16 GPU cards. Therefore our new parallel simulator is considerably more practical to use than its central processing unit (CPU)-based counterpart. 
Our new parallel OCT simulator could be a practical tool to study the different physical phenomena underlying OCT, or to design OCT systems with improved performance. Copyright © 2017 Elsevier B.V. All rights reserved.
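The speedups implied by the timings quoted in this abstract follow from simple ratios. The short sketch below uses only the numbers reported above, with the serial time as the baseline; the helper name `speedups` is an assumption for illustration.

```python
def speedups(baseline, timings):
    """Return baseline / time for each labelled timing (same units as baseline)."""
    return {label: baseline / t for label, t in timings.items()}


# First sample medium, times in minutes (serial baseline: 407 min).
sample1 = speedups(407.0, {"1 GPU": 92.0, "8 GPUs": 12.0, "16 GPUs": 7.0})
# Second sample medium, times in hours (serial baseline: 209 h).
sample2 = speedups(209.0, {"1 GPU": 35.6, "8 GPUs": 4.65, "16 GPUs": 2.0})

if __name__ == "__main__":
    for name, result in (("sample 1", sample1), ("sample 2", sample2)):
        for label, factor in result.items():
            print(f"{name}, {label}: {factor:.1f}x")
```

With 16 GPU cards the ratios are about 58 for the first sample and 104.5 for the second.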
Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

NASA Technical Reports Server (NTRS)

Hsieh, Shang-Hsien

1993-01-01

The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
Crashworthiness simulations with DYNA3D

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schauer, D.A.; Hoover, C.G.; Kay, G.J.

1996-04-01

Current progress in parallel algorithm research and applications in vehicle crash simulation is described for the explicit, finite element algorithms in DYNA3D. Problem partitioning methods and parallel algorithms for contact at material interfaces are the two challenging algorithm research problems that are addressed. Two prototype parallel contact algorithms have been developed for treating the cases of local and arbitrary contact. Demonstration problems for local contact are crashworthiness simulations with 222 locally defined contact surfaces and a vehicle/barrier collision modeled with arbitrary contact. A simulation of crash tests conducted for a vehicle impacting a U-channel small sign post embedded in soil has been run on both the serial and parallel versions of DYNA3D. A significant reduction in computational time has been observed when running these problems on the parallel version. However, to achieve maximum efficiency, complex problems must be appropriately partitioned, especially when contact dominates the computation.
pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.

PubMed

Halic, Tansel; Ahn, Woojin; De, Suvranu

2014-01-01

This work presents pWeb, a new language and compiler for parallelization of client-side, compute-intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled the creation of unprecedented applications on the web. Low performance of the web browser, however, remains the bottleneck for computationally intensive applications, including visualization of complex scenes, real-time physical simulations and image processing, when compared to native ones. The proposed language is built upon web workers for multithreaded programming in HTML5. The language provides the fundamental functionalities of parallel programming languages as well as the fork/join parallel model, which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions.
n-body simulations using message passing parallel computers.

NASA Astrophysics Data System (ADS)

Grama, A. Y.; Kumar, V.; Sameh, A.

The authors present new parallel formulations of the Barnes-Hut method for n-body simulations on message passing computers. These parallel formulations partition the domain efficiently incurring minimal communication overhead. This is in contrast to existing schemes that are based on sorting a large number of keys or on the use of global data structures. The new formulations are augmented by alternate communication strategies which serve to minimize communication overhead. The impact of these communication strategies is experimentally studied. The authors report on experimental results obtained from an astrophysical simulation on an nCUBE2 parallel computer.
A conservative approach to parallelizing the Sharks World simulation

NASA Technical Reports Server (NTRS)

Nicol, David M.; Riffe, Scott E.

1990-01-01

Parallelizing a benchmark problem for parallel simulation, the Sharks World, is described. The described solution is conservative, in the sense that no state information is saved, and no 'rollbacks' occur. The used approach illustrates both the principal advantage and principal disadvantage of conservative parallel simulation. The advantage is that by exploiting lookahead an approach was found that dramatically improves the serial execution time, and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.
AC losses in horizontally parallel HTS tapes for possible wireless power transfer applications

NASA Astrophysics Data System (ADS)

Shen, Boyang; Geng, Jianzhao; Zhang, Xiuchang; Fu, Lin; Li, Chao; Zhang, Heng; Dong, Qihuan; Ma, Jun; Gawith, James; Coombs, T. A.

2017-12-01

This paper presents the concept of using horizontally parallel HTS tapes, together with an AC loss study and an investigation of possible wireless power transfer (WPT) applications. An example of three parallel HTS tapes is proposed, whose AC loss was studied both experimentally, using the electrical method, and by simulation, using the 2D H-formulation on the FEM platform of COMSOL Multiphysics. The electromagnetic induction around the three parallel tapes was monitored using COMSOL simulation. The electromagnetic induction and AC losses generated by a conventional three-turn coil were simulated as well, and then compared to the case of three parallel tapes with the same AC transport current. The analysis demonstrates that parallel HTS tapes could potentially be used in wireless power transfer systems, and could have lower total AC losses than conventional HTS coils.
Xyce™ Parallel Electronic Simulator Users' Guide, Version 6.5.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik V.; Mei, Ting

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.
Extension of NHWAVE to Couple LAMMPS for Modeling Wave Interactions with Arctic Ice Floes

DTIC Science & Technology

2015-09-30

Modelling, in press. Orzech, M., Shi, F., Veeramony, J., Bateman, S., Calantoni, J., and Kirby, J. T., 2015, “Incorporating floating surface...objects into a fully dispersive surface wave model”, Ocean Modelling, submitted. Bateman, S., Shi, F., Orzech, M., Veeramony, J., and Calantoni, J., 2014...Orzech, M., Shi, F., Calantoni, J., Bateman, S., and Veeramony, J., “Small-scale modeling of waves and floes in the Marginal Ice Zone”, 2014 Fall Meeting of the American Geophysical Union.
On efficiency of fire simulation realization: parallelization with greater number of computational meshes

NASA Astrophysics Data System (ADS)

Valasek, Lukas; Glasa, Jan

2017-12-01

Current fire simulation systems are capable of exploiting the high-performance computing (HPC) platforms available and of modeling fires efficiently in parallel. In this paper, the efficiency of a corridor fire simulation on an HPC cluster is discussed. The parallel MPI version of Fire Dynamics Simulator is used to test the efficiency of selected strategies for allocating the cluster's computational resources when a greater number of computational cores is used. Simulation results indicate that if the number of cores used is not equal to a multiple of the total number of cluster node cores, there are allocation strategies which provide more efficient calculations.
Development of a parallel FE simulator for modeling the whole trans-scale failure process of rock from meso- to engineering-scale

NASA Astrophysics Data System (ADS)

Li, Gen; Tang, Chun-An; Liang, Zheng-Zhao

2017-01-01

Multi-scale high-resolution modeling of rock failure process is a powerful means in modern rock mechanics studies to reveal the complex failure mechanism and to evaluate engineering risks. However, multi-scale continuous modeling of rock, from deformation, damage to failure, has raised high requirements on the design, implementation scheme and computation capacity of the numerical software system. This study is aimed at developing the parallel finite element procedure, a parallel rock failure process analysis (RFPA) simulator that is capable of modeling the whole trans-scale failure process of rock. Based on the statistical meso-damage mechanical method, the RFPA simulator is able to construct heterogeneous rock models with multiple mechanical properties, deal with and represent the trans-scale propagation of cracks, in which the stress and strain fields are solved for the damage evolution analysis of representative volume element by the parallel finite element method (FEM) solver. This paper describes the theoretical basis of the approach and provides the details of the parallel implementation on a Windows - Linux interactive platform. A numerical model is built to test the parallel performance of FEM solver. Numerical simulations are then carried out on a laboratory-scale uniaxial compression test, and field-scale net fracture spacing and engineering-scale rock slope examples, respectively. The simulation results indicate that relatively high speedup and computation efficiency can be achieved by the parallel FEM solver with a reasonable boot process. In laboratory-scale simulation, the well-known physical phenomena, such as the macroscopic fracture pattern and stress-strain responses, can be reproduced. In field-scale simulation, the formation process of net fracture spacing from initiation, propagation to saturation can be revealed completely. In engineering-scale simulation, the whole progressive failure process of the rock slope can be well modeled. 
It is shown that the parallel FE simulator developed in this study is an efficient tool for modeling the whole trans-scale failure process of rock from meso- to engineering-scale.
Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schmalz, Mark S

2011-07-24

Statement of Problem - Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation $\underline{G}$ for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping $G \rightarrow \underline{G}$, which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. 
Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in solution of problems related to efficient parallel computation of particle and fluid dynamics simulations. These problems occur throughout DOE, military and commercial sectors: the potential payoff is high. We plan to license or sell the solution to contractors for military and domestic applications such as disaster simulation (aerodynamic and hydrodynamic), Government agencies (hydrological and environmental simulations), and medical applications (e.g., in tomographic image reconstruction). Keywords - High-performance Computing, Graphic Processing Unit, Fluid/Particle Simulation. Summary for Members of Congress - Department of Energy has many simulation codes that must compute faster, to be effective. The Phase I research parallelized particle/fluid simulations for rocket combustion, for high-performance computing systems.
Program For Parallel Discrete-Event Simulation

NASA Technical Reports Server (NTRS)

Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.

1991-01-01

User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.
Xyce

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thornquist, Heidi K.; Fixel, Deborah A.; Fett, David Brian

The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) models, including several age- and radiation-aware devices. It supports a variety of computing platforms, both serial and parallel. Lastly, it uses a variety of modern solution algorithms, including dynamic parallel load balancing and iterative solvers.
Tutorial: Parallel Computing of Simulation Models for Risk Analysis.

PubMed

Reilly, Allison C; Staid, Andrea; Gao, Michael; Guikema, Seth D

2016-10-01

Simulation models are widely used in risk analysis to study the effects of uncertainties on outcomes of interest in complex problems. Often, these models are computationally complex and time consuming to run. This latter point may be at odds with time-sensitive evaluations or may limit the number of parameters that are considered. In this article, we give an introductory tutorial focused on parallelizing simulation code to better leverage modern computing hardware, enabling risk analysts to better utilize simulation-based methods for quantifying uncertainty in practice. This article is aimed primarily at risk analysts who use simulation methods but do not yet utilize parallelization to decrease the computational burden of these models. The discussion is focused on conceptual aspects of embarrassingly parallel computer code and software considerations. Two complementary examples are shown using the languages MATLAB and R. A brief discussion of hardware considerations is located in the Appendix. © 2016 Society for Risk Analysis.
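The "embarrassingly parallel" pattern this tutorial describes can be illustrated with a minimal sketch. The record mentions MATLAB and R examples; the Python version below is a swapped-in illustration, and the toy model (`simulate_once`, its threshold of 1.5, and the helper `run_replicates`) is hypothetical rather than taken from the article. The key idea is that each replicate gets its own seed and shares no state with the others, so replicates can be farmed out to worker processes in any order.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def simulate_once(seed, n_draws=10_000):
    """One independent replicate of a toy (hypothetical) risk model.

    Estimates the probability that the sum of two uniform(0, 1) 'loads'
    exceeds a threshold of 1.5. Each replicate owns its own RNG, so
    replicates share no state and can run in any order.
    """
    rng = random.Random(seed)
    exceed = sum(1 for _ in range(n_draws) if rng.random() + rng.random() > 1.5)
    return exceed / n_draws


def run_replicates(seeds, parallel=True, workers=None):
    """Run one replicate per seed, serially or across worker processes."""
    seeds = list(seeds)
    if not parallel:
        return [simulate_once(s) for s in seeds]
    # Each seed is an independent task; pool.map preserves input order.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate_once, seeds))
```

On platforms that start worker processes by spawning a fresh interpreter, the parallel call (for example `run_replicates(range(8))`) should be made from inside an `if __name__ == "__main__":` guard. Because each result depends only on its seed, the serial and parallel paths should return the same list.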
Real-time electron dynamics for massively parallel excited-state simulations

NASA Astrophysics Data System (ADS)

Andrade, Xavier

The simulation of the real-time dynamics of electrons, based on time dependent density functional theory (TDDFT), is a powerful approach to study electronic excited states in molecular and crystalline systems. What makes the method attractive is its flexibility to simulate different kinds of phenomena beyond the linear-response regime, including strongly-perturbed electronic systems and non-adiabatic electron-ion dynamics. Electron-dynamics simulations are also attractive from a computational point of view. They can run efficiently on massively parallel architectures due to the low communication requirements. Our implementations of electron dynamics, based on the codes Octopus (real-space) and Qball (plane-waves), allow us to simulate systems composed of thousands of atoms and to obtain good parallel scaling up to 1.6 million processor cores. Due to the versatility of real-time electron dynamics and its parallel performance, we expect it to become the method of choice to apply the capabilities of exascale supercomputers for the simulation of electronic excited states.
Reversible Parallel Discrete-Event Execution of Large-scale Epidemic Outbreak Models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Perumalla, Kalyan S; Seal, Sudip K

2010-01-01

The spatial scale, runtime speed and behavioral detail of epidemic outbreak simulations together require the use of large-scale parallel processing. In this paper, an optimistic parallel discrete event execution of a reaction-diffusion simulation model of epidemic outbreaks is presented, with an implementation over the μsik simulator. Rollback support is achieved with the development of a novel reversible model that combines reverse computation with a small amount of incremental state saving. Parallel speedup and other runtime performance metrics of the simulation are tested on a small (8,192-core) Blue Gene / P system, while scalability is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes (up to several hundred million individuals in the largest case) are exercised.
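The combination of reverse computation with a small amount of incremental state saving can be sketched on a toy event handler. This is an illustrative assumption, not the authors' model or the μsik API: the `Cell` class and the `forward_infect` / `reverse_infect` functions are hypothetical names. Additions and subtractions are undone by running them backwards, while the one destructive step (clamping to the available population) is undone by saving a single integer.

```python
class Cell:
    """Toy population cell holding susceptible and infected counts."""

    def __init__(self, susceptible, infected):
        self.susceptible = susceptible
        self.infected = infected


def forward_infect(cell, n):
    """Forward event: move up to n individuals from susceptible to infected.

    The clamp min(n, susceptible) loses information, so the amount actually
    moved is returned as the incremental state to save for rollback.
    """
    moved = min(n, cell.susceptible)
    cell.susceptible -= moved
    cell.infected += moved
    return moved


def reverse_infect(cell, moved):
    """Reverse event: undo forward_infect using only the saved integer."""
    cell.infected -= moved
    cell.susceptible += moved
```

For example, applying `forward_infect` to a cell with 5 susceptible and 1 infected individuals and `n = 8` moves 5 individuals; passing the returned value to `reverse_infect` restores the original counts without copying the whole cell state.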
Modelling and simulation of parallel triangular triple quantum dots (TTQD) by using SIMON 2.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fathany, Maulana Yusuf, E-mail: myfathany@gmail.com; Fuada, Syifaul, E-mail: fsyifaul@gmail.com; Lawu, Braham Lawas, E-mail: bram-labs@rocketmail.com

2016-04-19

This research presents an analysis of the modeling of parallel Triple Quantum Dots (TQD) using SIMON (SIMulation Of Nano-structures). The Single Electron Transistor (SET) is used as the basic modeling concept. We design the parallel TQD structure from metallic material with a triangular geometry, called Triangular Triple Quantum Dots (TTQD). We simulate it under several scenarios with different parameters, such as different capacitance values, various gate voltages, and different thermal conditions.
Spectral neighbor analysis method for automated generation of quantum-accurate interatomic potentials

NASA Astrophysics Data System (ADS)

Thompson, A. P.; Swiler, L. P.; Trott, C. R.; Foiles, S. M.; Tucker, G. J.

2015-03-01

We present a new interatomic potential for solids and liquids called Spectral Neighbor Analysis Potential (SNAP). The SNAP potential has a very general form and uses machine-learning techniques to reproduce the energies, forces, and stress tensors of a large set of small configurations of atoms, which are obtained using high-accuracy quantum electronic structure (QM) calculations. The local environment of each atom is characterized by a set of bispectrum components of the local neighbor density projected onto a basis of hyperspherical harmonics in four dimensions. The bispectrum components are the same bond-orientational order parameters employed by the GAP potential [1]. The SNAP potential, unlike GAP, assumes a linear relationship between atom energy and bispectrum components. The linear SNAP coefficients are determined using weighted least-squares linear regression against the full QM training set. This allows the SNAP potential to be fit in a robust, automated manner to large QM data sets using many bispectrum components. The calculation of the bispectrum components and the SNAP potential are implemented in the LAMMPS parallel molecular dynamics code. We demonstrate that a previously unnoticed symmetry property can be exploited to reduce the computational cost of the force calculations by more than one order of magnitude. We present results for a SNAP potential for tantalum, showing that it accurately reproduces a range of commonly calculated properties of both the crystalline solid and the liquid phases. In addition, unlike simpler existing potentials, SNAP correctly predicts the energy barrier for screw dislocation migration in BCC tantalum.
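The linear relationship and the weighted least-squares fit described in this abstract can be written out as a short sketch. The notation is assumed for illustration and is not taken from the record: $E_i$ is the energy assigned to atom $i$, $B_k^{i}$ its $k$-th bispectrum component, $\beta_k$ the linear coefficients, and $w_s$, $a_s$, $y_s$ the weight, descriptor row and QM reference value for training datum $s$.

```latex
% Linear SNAP form: atom energy as a linear function of bispectrum components
E_i = \beta_0 + \sum_{k=1}^{K} \beta_k B_k^{i}

% Weighted least-squares fit against QM reference data y_s
% (energies, forces, stress components), one row a_s per datum
\hat{\beta} = \arg\min_{\beta} \sum_{s} w_s^{2} \left( a_s^{\top} \beta - y_s \right)^{2}
```

Because the model is linear in $\beta$, forces and stresses are also linear in $\beta$, which is what allows all three kinds of reference data to enter a single linear regression.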

Parallel-Processing Test Bed For Simulation Software

NASA Technical Reports Server (NTRS)

Blech, Richard; Cole, Gary; Townsend, Scott

1996-01-01

Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).
The Structure and Properties of Silica Glass Nanostructures using Novel Computational Systems

NASA Astrophysics Data System (ADS)

Doblack, Benjamin N.

The structure and properties of silica glass nanostructures are examined using computational methods in this work. Standard synthesis methods of silica and its associated material properties are first discussed in brief. A review of prior experiments on this amorphous material is also presented. Background and methodology for the simulation of mechanical tests on amorphous bulk silica and nanostructures are later presented. A new computational system for the accurate and fast simulation of silica glass is also presented, using an appropriate interatomic potential for this material within the open-source molecular dynamics computer program LAMMPS. This alternative computational method uses modern graphics processors, Nvidia CUDA technology and specialized scientific codes to overcome processing speed barriers common to traditional computing methods. In conjunction with a virtual reality system used to model select materials, this enhancement allows the addition of accelerated molecular dynamics simulation capability. The motivation is to provide a novel research environment which simultaneously allows visualization, simulation, modeling and analysis. The research goal of this project is to investigate the structure and size dependent mechanical properties of silica glass nanohelical structures under tensile MD conditions using the innovative computational system. Specifically, silica nanoribbons and nanosprings are evaluated which revealed unique size dependent elastic moduli when compared to the bulk material. For the nanoribbons, the tensile behavior differed widely between the models simulated, with distinct characteristic extended elastic regions. In the case of the nanosprings simulated, more clear trends are observed. In particular, larger nanospring wire cross-sectional radii (r) lead to larger Young's moduli, while larger helical diameters (2R) resulted in smaller Young's moduli. 
Structural transformations and theoretical models are also analyzed to identify possible factors which might affect the mechanical response of silica nanostructures under tension. The work presented outlines an innovative simulation methodology, and discusses how results can be validated against prior experimental and simulation findings. The ultimate goal is to develop new computational methods for the study of nanostructures which will make the field of materials science more accessible, cost effective and efficient.
A parallel finite element simulator for ion transport through three-dimensional ion channel systems.

PubMed

Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo

2013-09-15

A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain to get the surface and volume mesh for ion channel systems, which consists of a set of mesh generation tools. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG) developed by one of the authors, which provides the capability of doing large scale parallel computations with high parallel efficiency and the flexibility of choosing high order elements to achieve high order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I - V curve, with which both primitive and transformed PNP equations are studied and their numerical performances are compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage dependent anion channel (VDAC) and α-Hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite size effects can be included in PNP model now, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL. 
It is shown that the size effects in SMPNP can effectively lead to reduced current in the channel, and the results are closer to BD simulation results. Copyright © 2013 Wiley Periodicals, Inc.
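For orientation, the Poisson-Nernst-Planck system that the solver addresses is commonly written in the following form. This is a sketch in assumed notation, not copied from the paper: $\phi$ is the electrostatic potential, $c_i$, $q_i$ and $D_i$ are the concentration, charge and diffusion coefficient of ion species $i$, $\epsilon$ is the dielectric coefficient, and $\rho^{f}$ is the fixed charge distribution of the protein.

```latex
% Poisson equation for the electrostatic potential
-\nabla \cdot \left( \epsilon \nabla \phi \right) = \rho^{f} + \sum_{i} q_i c_i

% Nernst-Planck equation for each ion species i (diffusion plus electromigration)
\frac{\partial c_i}{\partial t}
  = \nabla \cdot \left[ D_i \left( \nabla c_i + \frac{q_i}{k_B T}\, c_i \nabla \phi \right) \right]
```

Steady-state currents such as the I - V curve follow from setting the time derivative to zero and integrating the flux over a channel cross-section.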
Applications of New Surrogate Global Optimization Algorithms including Efficient Synchronous and Asynchronous Parallelism for Calibration of Expensive Nonlinear Geophysical Simulation Models.

NASA Astrophysics Data System (ADS)

Shoemaker, C. A.; Pang, M.; Akhtar, T.; Bindel, D.

2016-12-01

New parallel surrogate global optimization algorithms are developed and applied to objective functions that are expensive simulations (possibly with multiple local minima). The algorithms can be applied to most geophysical simulations, including those with nonlinear partial differential equations. The optimization does not require that the simulations be parallelized. Asynchronous (and synchronous) parallel execution is available in the optimization toolbox "pySOT". The parallel algorithms are modified from serial to eliminate fine-grained parallelism. The optimization is computed with the open source software pySOT, a Surrogate Global Optimization Toolbox that allows the user to pick the type of surrogate (or ensembles), the search procedure on the surrogate, and the type of parallelism (synchronous or asynchronous). pySOT also allows the user to develop new algorithms by modifying parts of the code. In the applications here, the objective function takes up to 30 minutes for one simulation, and serial optimization can take over 200 hours. Results from Yellowstone (NSF) and NCSS (Singapore) supercomputers are given for groundwater contaminant hydrology simulations with applications to model parameter estimation and decontamination management. All results are compared with alternatives. The first results are for optimization of pumping at many wells to reduce cost for decontamination of groundwater at a superfund site. The optimization runs with up to 128 processors. Superlinear speedup is obtained for up to 16 processors, and efficiency with 64 processors is over 80%. Each evaluation of the objective function requires the solution of nonlinear partial differential equations to describe the impact of spatially distributed pumping and model parameters on model predictions for the spatial and temporal distribution of groundwater contaminants. The second application uses an asynchronous parallel global optimization for groundwater quality model calibration. 
The time for a single objective function evaluation varies unpredictably, so efficiency is improved with asynchronous parallel calculations to improve load balancing. The third application (done at NCSS) incorporates new global surrogate multi-objective parallel search algorithms into pySOT and applies it to a large watershed calibration problem.
Look-ahead Dynamic Simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

2015-10-20

Look-ahead dynamic simulation software system incorporates the high performance parallel computing technologies, significantly reduces the solution time for each transient simulation case, and brings the dynamic simulation analysis into on-line applications to enable more transparency for better reliability and asset utilization. It takes the snapshot of the current power grid status, functions in parallel computing the system dynamic simulation, and outputs the transient response of the power system in real time.
A modified Stillinger-Weber potential for TlBr and its polymorphic extension

DOE PAGES

Zhou, Xiaowang; Foster, Michael E.; Jones, Reese E.; ...

2015-04-30

TlBr is promising for γ- and x-radiation detection, but suffers from rapid performance degradation under the operating external electric fields. To enable molecular dynamics (MD) studies of this degradation, we have developed a Stillinger-Weber type of TlBr interatomic potential. During this process, we have also addressed two problems of wider interest. First, the conventional Stillinger-Weber potential format is only applicable for tetrahedral structures (e.g., diamond-cubic, zinc-blende, or wurtzite). Here we have modified the analytical functions of the Stillinger-Weber potential so that it can now be used for other crystal structures. Second, past modifications of interatomic potentials cannot always be applied by a broad community because any new analytical functions of the potential would require corresponding changes in the molecular dynamics codes. Here we have developed a polymorphic potential model that simultaneously incorporates Stillinger-Weber, Tersoff, embedded-atom method, and any variations (i.e., modified functions) of these potentials. As a result, we have implemented this polymorphic model in the MD code LAMMPS, and demonstrated that our TlBr potential enables stable MD simulations under external electric fields.
Structure of jammed configurations and their relation to unjamming times

NASA Astrophysics Data System (ADS)

Birwa, Sumit Kumar; Merrigan, Carl; Chakraborty, Bulbul; Tewari, Shubha

The distribution of the times for the cessation of flow of grains falling under gravity in a vertical hopper is known to be exponential. Recent experiments have shown, however, that the time lapse between avalanches follows a power-law distribution when the hopper is unjammed using periodic vertical vibrations. The reasons for this distribution of the unjamming times, which indicates the time needed for an applied continuous perturbation to induce another avalanche, are not well understood. We report on a numerical simulation of granular hopper flow using LAMMPS in which we seek to understand the origin and scope of this behavior. We find that cessation of flow is related to the formation of a stable arch that spans the system. However, the actual structure of the jammed configuration varies and is closely related to the unjamming time. We find that the symmetry of the arches is an important parameter in determining the strength of the jammed configurations. Using different force thresholds, we have characterized the contact networks around the arches which provides stability to the packed structure and analyzed the strength of various jammed configurations. Supported by NSF Grant DMR1409093 and DGE1068620.
Numerical integration of the extended variable generalized Langevin equation with a positive Prony representable memory kernel.

PubMed

Baczewski, Andrew D; Bond, Stephen D

2013-07-28

Generalized Langevin dynamics (GLD) arise in the modeling of a number of systems, ranging from structured fluids that exhibit a viscoelastic mechanical response, to biological systems, and other media that exhibit anomalous diffusive phenomena. Molecular dynamics (MD) simulations that include GLD in conjunction with external and/or pairwise forces require the development of numerical integrators that are efficient, stable, and have known convergence properties. In this article, we derive a family of extended variable integrators for the Generalized Langevin equation with a positive Prony series memory kernel. Using stability and error analysis, we identify a superlative choice of parameters and implement the corresponding numerical algorithm in the LAMMPS MD software package. Salient features of the algorithm include exact conservation of the first and second moments of the equilibrium velocity distribution in some important cases, stable behavior in the limit of conventional Langevin dynamics, and the use of a convolution-free formalism that obviates the need for explicit storage of the time history of particle velocities. Capability is demonstrated with respect to accuracy in numerous canonical examples, stability in certain limits, and an exemplary application in which the effect of a harmonic confining potential is mapped onto a memory kernel.
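The equation being integrated can be sketched as follows. The notation and the normalization of the Prony terms are assumptions chosen for illustration and may differ from the article's: $K(t)$ is the memory kernel, $R(t)$ the random force, and $c_k$, $\tau_k$ the positive Prony coefficients and relaxation times.

```latex
% Generalized Langevin equation with memory kernel K and random force R
m\,\dot{v}(t) = F\big(x(t)\big) - \int_{0}^{t} K(t-s)\, v(s)\, ds + R(t)

% Positive Prony series representation of the kernel (t >= 0)
K(t) = \sum_{k=1}^{N_k} \frac{c_k}{\tau_k}\, e^{-t/\tau_k},
\qquad c_k > 0,\ \tau_k > 0

% Fluctuation-dissipation relation tying the noise to the kernel
\langle R(t)\, R(s) \rangle = k_B T\, K(|t-s|)
```

Each exponential term can be represented by one auxiliary (extended) variable that obeys its own first-order stochastic equation, which is why the convolution integral, and hence the stored velocity history, can be avoided.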
Real-world hydrologic assessment of a fully-distributed hydrological model in a parallel computing environment

NASA Astrophysics Data System (ADS)

Vivoni, Enrique R.; Mascaro, Giuseppe; Mniszewski, Susan; Fasel, Patricia; Springer, Everett P.; Ivanov, Valeriy Y.; Bras, Rafael L.

2011-10-01

A major challenge in the use of fully-distributed hydrologic models has been the lack of computational capabilities for high-resolution, long-term simulations in large river basins. In this study, we present the parallel model implementation and real-world hydrologic assessment of the Triangulated Irregular Network (TIN)-based Real-time Integrated Basin Simulator (tRIBS). Our parallelization approach is based on the decomposition of a complex watershed using the channel network as a directed graph. The resulting sub-basin partitioning divides effort among processors and handles hydrologic exchanges across boundaries. Through numerical experiments in a set of nested basins, we quantify parallel performance relative to serial runs for a range of processors, simulation complexities and lengths, and sub-basin partitioning methods, while accounting for inter-run variability on a parallel computing system. In contrast to serial simulations, the parallel model speed-up depends on the variability of hydrologic processes. Load balancing significantly improves parallel speed-up with proportionally faster runs as simulation complexity (domain resolution and channel network extent) increases. The best strategy for large river basins is to combine a balanced partitioning with an extended channel network, with potential savings through a lower TIN resolution. Based on these advances, a wider range of applications for fully-distributed hydrologic models is now possible. This is illustrated through a set of ensemble forecasts that account for precipitation uncertainty derived from a statistical downscaling model.
Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

NASA Astrophysics Data System (ADS)

Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

2013-08-01

Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, $f$ (e.g., Verlet algorithm), is available to propagate the system from time $t_i$ (trajectory positions and velocities $x_i = (r_i, v_i)$) to time $t_{i+1}$ ($x_{i+1}$) by $x_{i+1} = f_i(x_i)$, the dynamics problem spanning an interval from $t_0 \ldots t_M$ can be transformed into a root finding problem, $F(X) = [x_i - f(x_{i-1})]_{i=1,\ldots,M} = 0$, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of $F(X)$. The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3.
The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
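The root-finding view of a trajectory can be illustrated with a very small Python sketch. Everything here is an assumption made for illustration: the harmonic-oscillator integrator, the function names, and above all the solver, which is a plain fixed-point sweep standing in for the quasi-Newton schemes the abstract describes. The point is only that each entry of F(X) depends on a single neighbouring time step, so all entries can be updated independently.

```python
# Sketch: a trajectory as the root of F(X) = [x_i - f(x_{i-1})], i = 1..M.
# Hypothetical names; a fixed-point sweep replaces the paper's quasi-Newton solvers.

def verlet_step(state, dt=0.05):
    """One velocity-Verlet step for a unit-mass harmonic oscillator (force = -r)."""
    r, v = state
    v_half = v + 0.5 * dt * (-r)
    r_new = r + dt * v_half
    v_new = v_half + 0.5 * dt * (-r_new)
    return (r_new, v_new)

def serial_trajectory(x0, n_steps):
    """Reference: ordinary step-by-step time integration."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(verlet_step(traj[-1]))
    return traj

def residual_norm(traj):
    """Largest absolute component of F(X) over all time steps."""
    worst = 0.0
    for i in range(1, len(traj)):
        fr, fv = verlet_step(traj[i - 1])
        worst = max(worst, abs(traj[i][0] - fr), abs(traj[i][1] - fv))
    return worst

def fixed_point_sweep(traj):
    """Update every x_i from the previous iterate's x_{i-1}.
    The entries are independent, so one worker per time step could do this."""
    return [traj[0]] + [verlet_step(traj[i - 1]) for i in range(1, len(traj))]

def solve_parallel_in_time(x0, n_steps, tol=1e-12):
    traj = [x0] * (n_steps + 1)      # crude initial guess: constant trajectory
    sweeps = 0
    while residual_norm(traj) > tol:
        traj = fixed_point_sweep(traj)
        sweeps += 1
    return traj, sweeps

reference = serial_trajectory((1.0, 0.0), 20)
trajectory, sweeps = solve_parallel_in_time((1.0, 0.0), 20)
```

This simple sweep can need as many iterations as there are time steps, so on its own it offers no speedup. The quasi-Newton and preconditioned schemes in the abstract exist precisely to converge in far fewer iterations.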
Xyce Parallel Electronic Simulator : users' guide, version 2.0.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont

2004-06-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), which includes support for most popular parallel and serial computers; improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices; a client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI); and object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce Parallel Electronic Simulator is designed to support a variety of device model inputs. These input formats include standard analytical models, behavioral models, look-up tables, and mesh-level PDE device models. Combined with this flexible interface is an architectural design that greatly simplifies the addition of circuit models. One of the most important features of Xyce is in providing a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia now has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods) research and development can be performed. Ultimately, these capabilities are migrated to end users.
Satisfiability Test with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor

NASA Technical Reports Server (NTRS)

Sohn, Andrew; Biswas, Rupak

1996-01-01

Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations

NASA Astrophysics Data System (ADS)

Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.

2016-07-01

Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.
Parallel simulation of tsunami inundation on a large-scale supercomputer

NASA Astrophysics Data System (ADS)

Oishi, Y.; Imamura, F.; Sugawara, D.

2013-12-01

An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. 
In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the finite difference calculation, (2) communication between adjacent layers for the calculations to connect each layer, and (3) global communication to obtain the time step which satisfies the CFL condition in the whole domain. A preliminary test on the K computer showed the parallel efficiency on 1024 cores was 57% relative to 64 cores. We estimate that the parallel efficiency will be considerably improved by applying a 2-D domain decomposition instead of the present 1-D domain decomposition in future work. The present parallel tsunami model was applied to the 2011 Great Tohoku tsunami. The coarsest resolution layer covers a 758 km × 1155 km region with a 405 m grid spacing. A nesting of five layers was used with the resolution ratio of 1/3 between nested layers. The finest resolution region has 5 m resolution and covers most of the coastal region of Sendai city. To complete 2 hours of simulation time, the serial (non-parallel) computation took approximately 4 days on a workstation. To complete the same simulation on 1024 cores of the K computer, it took 45 minutes which is more than two times faster than real-time. This presentation discusses the updated parallel computational performance and the efficient use of the K computer when considering the characteristics of the tsunami inundation simulation model in relation to the characteristics and capabilities of the K computer.
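The performance figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated above; the variable names are illustrative, and "approximately 4 days" is taken as exactly 4 days for the purpose of the estimate.

```python
# Figures quoted in the abstract above.
cores_ref, cores_big = 64, 1024
efficiency = 0.57                    # parallel efficiency on 1024 cores vs. 64

# Relative efficiency E = S / (cores_big / cores_ref), so S = E * ratio.
relative_speedup = efficiency * (cores_big / cores_ref)

simulated_minutes = 2 * 60           # 2 hours of simulated time
wall_minutes_k = 45                  # wall-clock time on 1024 cores
serial_minutes = 4 * 24 * 60         # "approximately 4 days" on a workstation

real_time_factor = simulated_minutes / wall_minutes_k
speedup_vs_workstation = serial_minutes / wall_minutes_k

# Five nested layers, each refining the grid spacing by a factor of 3.
grid_spacings_m = [405 / 3**level for level in range(5)]

print(relative_speedup, real_time_factor, speedup_vs_workstation)
print(grid_spacings_m)
```

Going from 64 to 1024 cores therefore buys roughly a 9-fold speedup rather than the ideal 16-fold, the run is about 2.7 times faster than real time, and the nested spacings step down from 405 m to the 5 m quoted for the finest layer. The workstation comparison mixes different hardware, so it is a rough indication rather than a parallel speedup in the strict sense.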
Methodology of modeling and measuring computer architectures for plasma simulations

NASA Technical Reports Server (NTRS)

Wang, L. P. T.

1977-01-01

A brief introduction to plasma simulation using computers and the difficulties on currently available computers is given. Through the use of an analyzing and measuring methodology - SARA, the control flow and data flow of a particle simulation model REM2-1/2D are exemplified. After recursive refinements the total execution time may be greatly shortened and a fully parallel data flow can be obtained. From this data flow, a matched computer architecture or organization could be configured to achieve the computation bound of an application problem. A sequential type simulation model, an array/pipeline type simulation model, and a fully parallel simulation model of a code REM2-1/2D are proposed and analyzed. This methodology can be applied to other application problems which have implicitly parallel nature.
Massively parallel quantum computer simulator

NASA Astrophysics Data System (ADS)

De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

2007-01-01

We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
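The pairing of 36 qubits with 1 TB of memory follows from the size of a full state vector. The sketch below assumes each amplitude is stored as a double-precision complex number (16 bytes); the abstract does not state the storage format, so that constant is an assumption.

```python
BYTES_PER_AMPLITUDE = 16   # assumption: complex number as two 64-bit floats

def state_vector_bytes(n_qubits):
    """Memory needed to hold all 2**n complex amplitudes of an n-qubit state."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE

for n in (30, 33, 36):
    tib = state_vector_bytes(n) / 2**40
    print(f"{n} qubits: {tib:g} TiB")
```

Under that assumption a 36-qubit state occupies 2^36 × 16 = 2^40 bytes, and every additional qubit doubles the requirement.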
PENTACLE: Parallelized particle-particle particle-tree code for planet formation

NASA Astrophysics Data System (ADS)

Iwasawa, Masaki; Oshino, Shoichi; Fujii, Michiko S.; Hori, Yasunori

2017-10-01

We have newly developed a parallelized particle-particle particle-tree code for planet formation, PENTACLE, which is a parallelized hybrid N-body integrator executed on a CPU-based (super)computer. PENTACLE uses a fourth-order Hermite algorithm to calculate gravitational interactions between particles within a cut-off radius and a Barnes-Hut tree method for gravity from particles beyond. It also implements an open-source library designed for full automatic parallelization of particle simulations, FDPS (Framework for Developing Particle Simulator), to parallelize a Barnes-Hut tree algorithm for a memory-distributed supercomputer. These allow us to handle 1-10 million particles in a high-resolution N-body simulation on CPU clusters for collisional dynamics, including physical collisions in a planetesimal disc. In this paper, we show the performance and the accuracy of PENTACLE in terms of $\tilde{R}_{\mathrm{cut}}$ and a time-step $\Delta t$. It turns out that the accuracy of a hybrid N-body simulation is controlled through $\Delta t / \tilde{R}_{\mathrm{cut}}$, and $\Delta t / \tilde{R}_{\mathrm{cut}} \sim 0.1$ is necessary to simulate accurately the accretion process of a planet for $\geq 10^6$ yr. For all those interested in large-scale particle simulations, PENTACLE, customized for planet formation, will be freely available from https://github.com/PENTACLE-Team/PENTACLE under the MIT licence.
Numerical characteristics of quantum computer simulation

NASA Astrophysics Data System (ADS)

Chernyavskiy, A.; Khamitov, K.; Teplov, A.; Voevodin, V.; Voevodin, Vl.

2016-12-01

The simulation of quantum circuits is important for the implementation of quantum information technologies. The main difficulty of such modeling is the exponential growth of dimensionality, so the use of modern high-performance parallel computation is relevant. As is well known, an arbitrary quantum computation in the circuit model can be performed using only single- and two-qubit gates, and we analyze the computational structure and properties of the simulation of such gates. We investigate how the unique properties of quantum nature lead to the computational properties of the considered algorithms: quantum parallelism makes the simulation of quantum gates highly parallel, while quantum entanglement leads to the problem of computational locality during simulation. We use the methodology of the AlgoWiki project (algowiki-project.org) to analyze the algorithm. This methodology consists of theoretical (sequential and parallel complexity, macro structure, and visual informational graph) and experimental (locality and memory access, scalability and more specific dynamic characteristics) parts. The experimental part was carried out on the petascale Lomonosov supercomputer (Moscow State University, Russia). We show that the simulation of quantum gates is a good basis for the research and testing of development methods for data-intensive parallel software, and that the considered methodology of analysis can be successfully used for the improvement of algorithms in quantum information science.
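To make the computational structure of a single-qubit gate concrete, here is a minimal pure-Python state-vector sketch. It is an illustration under stated assumptions, not the authors' code: the function name, the list-of-complex representation, and the bit ordering (qubit `target` corresponds to bit `target` of the amplitude index) are all choices made here.

```python
import math

def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector (list of complex).
    Amplitudes are paired at indices that differ only in bit `target`."""
    stride = 1 << target
    new_state = list(state)
    for base in range(len(state)):
        if base & stride:          # visit each pair once, from its bit-0 member
            continue
        a0, a1 = state[base], state[base | stride]
        new_state[base] = gate[0][0] * a0 + gate[0][1] * a1
        new_state[base | stride] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state

h = 1 / math.sqrt(2)
HADAMARD = [[h, h], [h, -h]]

state = [1 + 0j, 0j, 0j, 0j]                 # two qubits in |00>
state = apply_single_qubit_gate(state, HADAMARD, target=1)
```

Every amplitude pair is updated independently, which is where the high degree of parallelism comes from. The two members of a pair sit `2**target` entries apart, so for high-numbered qubits they can be far apart in memory or on different nodes, which is one common way locality problems show up in state-vector simulators.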
Visualization and Tracking of Parallel CFD Simulations

NASA Technical Reports Server (NTRS)

Vaziri, Arsi; Kremenetsky, Mark

1995-01-01

We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, is handled by CM/AVS. Partitioning of the visualization task between the CM-5 and the workstation can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate → store → visualize' post-processing approach.
Design of object-oriented distributed simulation classes

NASA Technical Reports Server (NTRS)

Schoeffler, James D. (Principal Investigator)

1995-01-01

Distributed simulation of aircraft engines as part of a computer aided design package is being developed by NASA Lewis Research Center for the aircraft industry. The project is called NPSS, an acronym for 'Numerical Propulsion Simulation System'. NPSS is a flexible object-oriented simulation of aircraft engines requiring high computing speed. It is desirable to run the simulation on a distributed computer system with multiple processors executing portions of the simulation in parallel. The purpose of this research was to investigate object-oriented structures such that individual objects could be distributed. The set of classes used in the simulation must be designed to facilitate parallel computation. Since the portions of the simulation carried out in parallel are not independent of one another, there is the need for communication among the parallel executing processors which in turn implies need for their synchronization. Communication and synchronization can lead to decreased throughput as parallel processors wait for data or synchronization signals from other processors. As a result of this research, the following have been accomplished. The design and implementation of a set of simulation classes which result in a distributed simulation control program have been completed. The design is based upon MIT 'Actor' model of a concurrent object and uses 'connectors' to structure dynamic connections between simulation components. Connectors may be dynamically created according to the distribution of objects among machines at execution time without any programming changes. Measurements of the basic performance have been carried out with the result that communication overhead of the distributed design is swamped by the computation time of modules unless modules have very short execution times per iteration or time step. An analytical performance model based upon queuing network theory has been designed and implemented. 
Its application to realistic configurations has not been carried out.

Partitioning and packing mathematical simulation models for calculation on parallel computers

NASA Technical Reports Server (NTRS)

Arpasi, D. J.; Milner, E. J.

1986-01-01

The development of multiprocessor simulations from a serial set of ordinary differential equations describing a physical system is described. Degrees of parallelism (i.e., coupling between the equations) and their impact on parallel processing are discussed. The problem of identifying computational parallelism within sets of closely coupled equations that require the exchange of current values of variables is described. A technique is presented for identifying this parallelism and for partitioning the equations for parallel solution on a multiprocessor. An algorithm which packs the equations into a minimum number of processors is also described. The results of the packing algorithm when applied to a turbojet engine model are presented in terms of processor utilization.
Vectorization for Molecular Dynamics on Intel Xeon Phi Coprocessors

NASA Astrophysics Data System (ADS)

Yi, Hongsuk

2014-03-01

Many modern processors are capable of exploiting data-level parallelism through the use of single instruction multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512-bit vector registers for high-performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Tersoff potentials for covalently bonded solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multi-level parallelism, combining tightly coupled thread-level and task-level parallelism with the 512-bit vector registers. The simulation results show that the parallel performance of the SIMD implementations on Xeon Phi is apparently superior to that on the x86 CPU architecture.
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zuo, Wangda; McNeil, Andrew; Wetter, Michael

2011-09-06

We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
Streaming parallel GPU acceleration of large-scale filter-based spiking neural networks.

PubMed

Slażyński, Leszek; Bohte, Sander

2012-01-01

The arrival of graphics processing (GPU) cards suitable for massively parallel computing promises affordable large-scale neural network simulation previously only available at supercomputing facilities. While the raw numbers suggest that GPUs may outperform CPUs by at least an order of magnitude, the challenge is to develop fine-grained parallel algorithms to fully exploit the particulars of GPUs. Computation in a neural network is inherently parallel and thus a natural match for GPU architectures: given inputs, the internal state for each neuron can be updated in parallel. We show that for filter-based spiking neurons, like the Spike Response Model, the additive nature of membrane potential dynamics enables additional update parallelism. This also reduces the accumulation of numerical errors when using single precision computation, the native precision of GPUs. We further show that optimizing simulation algorithms and data structures to the GPU's architecture has a large pay-off: for example, matching iterative neural updating to the memory architecture of the GPU speeds up this simulation step by a factor of three to five. With such optimizations, we can simulate in better-than-realtime plausible spiking neural networks of up to 50 000 neurons, processing over 35 million spiking events per second.
A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in (131)I SPECT.

PubMed

Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F

2002-02-01

This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
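The parallelization strategy described here, equal partitioning of photon histories with an independent random stream per processor, can be sketched in a few lines of Python. This is a toy under loud assumptions: the "transport" is a single Bernoulli draw per photon, the ranks are looped over serially instead of using MPI, and a per-rank seeded `random.Random` merely stands in for an SPRNG stream (seeding by rank does not carry the stream-independence guarantees that SPRNG is designed to provide).

```python
import random

def partition_photons(n_photons, n_procs):
    """Split n_photons as evenly as possible across n_procs ranks."""
    base, extra = divmod(n_photons, n_procs)
    return [base + (1 if rank < extra else 0) for rank in range(n_procs)]

def simulate_rank(rank, n_local, p_detect=0.3):
    """Toy 'transport': count photons detected with probability p_detect.
    A per-rank seeded generator stands in for an SPRNG stream."""
    rng = random.Random(1000 + rank)
    return sum(1 for _ in range(n_local) if rng.random() < p_detect)

def run(n_photons=100_000, n_procs=8):
    counts = partition_photons(n_photons, n_procs)
    # In an MPI code each rank would call simulate_rank on its own share and
    # a reduction would sum the tallies; here the ranks are looped serially.
    detected = sum(simulate_rank(r, n) for r, n in enumerate(counts))
    return counts, detected / n_photons

counts, detected_fraction = run()
```

Because photon histories are independent, the only communication in such a scheme is the final reduction of tallies, which is consistent with the near-linear speed-up the abstract reports.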
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

NASA Technical Reports Server (NTRS)

Carroll, Chester C.; Owen, Jeffrey E.

1988-01-01

A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
Accelerating the Gillespie Exact Stochastic Simulation Algorithm using hybrid parallel execution on graphics processing units.

PubMed

Komarov, Ivan; D'Souza, Roshan M

2012-01-01

The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.
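For readers unfamiliar with the algorithm being accelerated, the following is a minimal serial sketch of the classic direct-method SSA in Python. It is not the GPU variant described in the abstract; the function name, the data layout, and the two-reaction toy model are all hypothetical and chosen only to show the two random draws made per step (waiting time, then reaction choice).

```python
import math
import random

def gillespie_direct(x, reactions, t_end, rng):
    """Direct-method SSA. `x` maps species -> count; each reaction is
    (propensity_fn, state_change_dict). Returns a list of (time, state)."""
    t = 0.0
    x = dict(x)
    history = [(t, dict(x))]
    while t < t_end:
        props = [rate(x) for rate, _ in reactions]
        total = sum(props)
        if total <= 0.0:
            break                                   # nothing can fire
        t += -math.log(1.0 - rng.random()) / total  # exponential waiting time
        if t > t_end:
            break
        pick = rng.random() * total                 # choose reaction by propensity
        chosen = len(reactions) - 1
        acc = 0.0
        for idx, p in enumerate(props):
            acc += p
            if pick < acc:
                chosen = idx
                break
        for species, delta in reactions[chosen][1].items():
            x[species] += delta
        history.append((t, dict(x)))
    return history

# Hypothetical toy model: A -> B with rate constant 0.5, B -> 0 with 0.1.
reactions = [
    (lambda s: 0.5 * s["A"], {"A": -1, "B": +1}),
    (lambda s: 0.1 * s["B"], {"B": -1}),
]
trajectory = gillespie_direct({"A": 50, "B": 0}, reactions, t_end=10.0,
                              rng=random.Random(42))
```

Each call produces one stochastic trajectory, and statistics require many of them, which is why the abstract distinguishes between parallelizing within a single realization and running many realizations at once.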
Traffic Simulations on Parallel Computers Using Domain Decomposition Techniques

DOT National Transportation Integrated Search

1995-01-01

Large scale simulations of Intelligent Transportation Systems (ITS) can only be achieved by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic...
GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations.

PubMed

Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

2015-07-01

GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310-323. doi: 10.1002/wcms.1220.
On extending parallelism to serial simulators

NASA Technical Reports Server (NTRS)

Nicol, David; Heidelberger, Philip

1994-01-01

This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops submodels using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permits automatically synchronized communication between submodels; however, the automation requires that any such communication must take a nonzero amount of simulation time. Within this modeling paradigm, a variety of conservative synchronization protocols can transparently support conservative execution of submodels on potentially different processors. A specific implementation of this approach, U.P.S. (Utilitarian Parallel Simulator), is described, along with performance results on the Intel Paragon.
Global Magnetohydrodynamic Simulation Using High Performance FORTRAN on Parallel Computers

NASA Astrophysics Data System (ADS)

Ogino, T.

High Performance Fortran (HPF) is one of the modern and common techniques for achieving high performance parallel computation. We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer; the MHD code had been fully vectorized and fully parallelized in VPP Fortran. The overall performance and capability of the HPF MHD code were shown to be almost comparable to those of the VPP Fortran code. A 3-dimensional global MHD simulation of the Earth's magnetosphere was performed at a speed of over 400 Gflops with an efficiency of 76.5% on the VPP5000/56 in vector and parallel computation, which permitted comparison with catalog values. We have concluded that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
The cost of conservative synchronization in parallel discrete event simulations

NASA Technical Reports Server (NTRS)

Nicol, David M.

1990-01-01

The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approaches the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.
A New Parallel Boundary Condition for Turbulence Simulations in Stellarators

NASA Astrophysics Data System (ADS)

Martin, Mike F.; Landreman, Matt; Dorland, William; Xanthopoulos, Pavlos

2017-10-01

For gyrokinetic simulations of core turbulence, the "twist-and-shift" parallel boundary condition (Beer et al., PoP, 1995), which involves a shift in radial wavenumber proportional to the global shear and a quantization of the simulation domain's aspect ratio, is the standard choice. But as this condition was derived under the assumption of axisymmetry, "twist-and-shift" as it stands is formally incorrect for turbulence simulations in stellarators. Moreover, for low-shear stellarators like W7X and HSX, the use of a global shear in the traditional boundary condition places an inflexible constraint on the aspect ratio of the domain, requiring more grid points to fully resolve its extent. Here, we present a parallel boundary condition for "stellarator-symmetric" simulations that relies on the local shear along a field line. This boundary condition is similar to "twist-and-shift", but has an added flexibility in choosing the parallel length of the domain based on local shear consideration in order to optimize certain parameters such as the aspect ratio of the simulation domain.
The Distributed Diagonal Force Decomposition Method for Parallelizing Molecular Dynamics Simulations

PubMed Central

Boršnik, Urban; Miller, Benjamin T.; Brooks, Bernard R.; Janežič, Dušanka

2011-01-01

Parallelization is an effective way to reduce the computational time needed for molecular dynamics simulations. We describe a new parallelization method, the distributed-diagonal force decomposition method, with which we extend and improve the existing force decomposition methods. Our new method requires less data communication during molecular dynamics simulations than replicated data and current force decomposition methods, increasing the parallel efficiency. It also dynamically load-balances the processors' computational load throughout the simulation. The method is readily implemented in existing molecular dynamics codes and it has been incorporated into the CHARMM program, allowing its immediate use in conjunction with the many molecular dynamics simulation techniques that are already present in the program. We also present the design of the Force Decomposition Machine, a cluster of personal computers and networks that is tailored to running molecular dynamics simulations using the distributed diagonal force decomposition method. The design is expandable and provides various degrees of fault resilience. This approach is easily adaptable to computers with Graphics Processing Units because it is independent of the processor type being used. PMID:21793007
Parallelizing Timed Petri Net simulations

NASA Technical Reports Server (NTRS)

Nicol, David M.

1993-01-01

The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.
Parallel computing method for simulating hydrological processes of large rivers under climate change

NASA Astrophysics Data System (ADS)

Wang, H.; Chen, Y.

2016-12-01

Climate change is one of the most widely recognized global environmental problems. It has altered the temporal and spatial distribution of watershed hydrological processes, especially in the world's large rivers. Watershed hydrological process simulation based on a physically based distributed hydrological model can give better results than lumped models. However, such simulation involves a large amount of calculation, especially for large rivers, and thus needs huge computing resources that may not be steadily available to researchers or are available only at high expense; this has seriously restricted research and application. To solve this problem, current parallel methods mostly parallelize the computation in the space and time dimensions. They calculate the natural features in order, based on a distributed hydrological model, grid by grid (unit, basin) from upstream to downstream. This article proposes a high-performance computing method for hydrological process simulation with a high speedup ratio and parallel efficiency. It combines the temporal and spatial runoff characteristics of the distributed hydrological model with distributed data storage, an in-memory database, distributed computing, and parallel computing based on computing power units. The method has strong adaptability and extensibility, which means it can make full use of computing and storage resources when computing resources are limited, and the computing efficiency improves linearly as computing resources increase. This method can satisfy the parallel computing requirements of hydrological process simulation in small, medium, and large rivers.
Research in parallel computing

NASA Technical Reports Server (NTRS)

Ortega, James M.; Henderson, Charles

1994-01-01

This report summarizes work on parallel computations for NASA Grant NAG-1-1529 for the period 1 Jan. - 30 June 1994. Short summaries on highly parallel preconditioners, target-specific parallel reductions, and simulation of delta-cache protocols are provided.
Measuring movement towards improved emergency obstetric care in rural Kenya with implementation of the PRONTO simulation and team training program.

PubMed

Dettinger, Julia C; Kamau, Stephen; Calkins, Kimberly; Cohen, Susanna R; Cranmer, John; Kibore, Minnie; Gachuno, Onesmus; Walker, Dilys

2018-02-01

As the proportion of facility-based births increases, so does the need to ensure that mothers and their newborns receive quality care. Developing facility-oriented obstetric and neonatal training programs grounded in principles of teamwork utilizing simulation-based training for emergency response is an important strategy for improving the quality of care. This study uses 3 dimensions of the Kirkpatrick Model to measure the impact of PRONTO International (PRONTO) simulation-based training as part of the Linda Afya ya Mama na Mtoto (LAMMP, Protect the Health of mother and child) in Kenya. Changes in knowledge of obstetric and neonatal emergency response, self-efficacy, and teamwork were analyzed using longitudinal, fixed-effects, linear regression models. Participants from 26 facilities participated in the training between 2013 and 2014. The results demonstrate improvements in knowledge, self-efficacy, and teamwork self-assessment. When comparing pre-Module I scores with post-training scores, improvements range from 9 to 24 percentage points (p values < .0001 to .026). Compared to baseline, post-Module I and post-Module II (3 months later) scores in these domains were similar. The intervention not only improved participant teamwork skills, obstetric and neonatal knowledge, and self-efficacy but also fostered sustained changes at 3 months. The proportion of facilities achieving self-defined strategic goals was high: 95.8% of the 192 strategic goals. Participants rated the PRONTO intervention as extremely useful, with an overall score of 1.4 out of 5 (1, extremely useful; 5, not at all useful). Evaluation of how these improvements affect maternal and perinatal clinical outcomes is forthcoming. © 2018 John Wiley & Sons Ltd.
Evaluation of copper, aluminum, and nickel interatomic potentials on predicting the elastic properties

NASA Astrophysics Data System (ADS)

Rassoulinejad-Mousavi, Seyed Moein; Mao, Yijin; Zhang, Yuwen

2016-06-01

Choice of an appropriate force field is one of the main concerns of any atomistic simulation and needs to be seriously considered in order to yield reliable results. Since investigations of the mechanical behavior of materials at the micro/nanoscale have become much more widespread, it is necessary to determine an adequate potential which accurately models the interaction of the atoms for the desired applications. In this framework, the reliability of multiple embedded atom method based interatomic potentials for predicting the elastic properties was investigated. Assessments were carried out for different copper, aluminum, and nickel interatomic potentials at room temperature, which is considered the most applicable case. The examined force fields for the three species were taken from the online repositories of the National Institute of Standards and Technology, as well as the Sandia National Laboratories LAMMPS database. Using molecular dynamics simulations, the three independent elastic constants, C11, C12, and C44, were found for Cu, Al, and Ni cubic single crystals. The Voigt-Reuss-Hill approximation was then implemented to convert the elastic constants of the single crystals into isotropic polycrystalline elastic moduli including bulk modulus, shear modulus, and Young's modulus as well as Poisson's ratio. Results from large-scale molecular dynamics simulations were compared with available experimental data in the literature to justify the robustness of each potential for each species. Finally, accurate interatomic potentials have been recommended for finding each of the elastic properties of the pure species. The accuracy of the elastic properties was found to be sensitive to the choice of the force fields. Those potentials that were fitted for a specific compound may not necessarily work accurately for all the existing pure species. Tabulated results in this paper might be used as a benchmark to increase assurance of using the interatomic potential that was designated for a problem.
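The Voigt-Reuss-Hill conversion mentioned in this abstract can be sketched for a cubic crystal as below. The formulas are the standard cubic-symmetry expressions; the function name and the input values are illustrative assumptions, not results from the paper.

```python
def vrh_cubic(c11, c12, c44):
    """Voigt-Reuss-Hill isotropic averages for a cubic single crystal.

    Inputs and returned moduli share the same units (e.g. GPa).
    Returns (bulk modulus, shear modulus, Young's modulus, Poisson's ratio).
    """
    # For cubic symmetry the Voigt and Reuss bulk moduli coincide.
    bulk = (c11 + 2.0 * c12) / 3.0
    # Voigt (uniform strain) and Reuss (uniform stress) shear bounds.
    g_voigt = (c11 - c12 + 3.0 * c44) / 5.0
    g_reuss = 5.0 * (c11 - c12) * c44 / (4.0 * c44 + 3.0 * (c11 - c12))
    # Hill average: arithmetic mean of the two bounds.
    shear = 0.5 * (g_voigt + g_reuss)
    young = 9.0 * bulk * shear / (3.0 * bulk + shear)
    poisson = (3.0 * bulk - 2.0 * shear) / (2.0 * (3.0 * bulk + shear))
    return bulk, shear, young, poisson
```

As a usage example, `vrh_cubic(168.4, 121.4, 75.4)` takes commonly quoted room-temperature values for copper in GPa (an assumption here, not taken from the paper's tables) and returns the four isotropic quantities in one call.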

Parallel computational fluid dynamics '91; Conference Proceedings, Stuttgart, Germany, Jun. 10-12, 1991

NASA Technical Reports Server (NTRS)

Reinsch, K. G. (Editor); Schmidt, W. (Editor); Ecer, A. (Editor); Haeuser, Jochem (Editor); Periaux, J. (Editor)

1992-01-01

A conference was held on parallel computational fluid dynamics and produced related papers. Topics discussed in these papers include: parallel implicit and explicit solvers for compressible flow, parallel computational techniques for Euler and Navier-Stokes equations, grid generation techniques for parallel computers, and aerodynamic simulation on massively parallel systems.
Parallel implementation of the particle simulation method with dynamic load balancing: Toward realistic geodynamical simulation

NASA Astrophysics Data System (ADS)

Furuichi, M.; Nishiura, D.

2015-12-01

Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and the Discrete Element Method (DEM) have been widely used to solve continuum and particle motions in the computational geodynamics field. These mesh-free methods are suitable for problems with complex geometry and boundaries. In addition, their Lagrangian nature allows non-diffusive advection, which is useful for tracking history-dependent properties (e.g. rheology) of the material. These potential advantages over mesh-based methods offer effective numerical applications to geophysical flow and tectonic processes, for example, tsunami with free surface and floating body, magma intrusion with fracture of rock, and shear zone pattern generation in granular deformation. In order to investigate such geodynamical problems with particle-based methods, millions to billions of particles are required for a realistic simulation. Parallel computing is therefore important for handling such a huge computational cost. An efficient parallel implementation of the SPH and DEM methods is, however, known to be difficult, especially for distributed-memory architectures. Lagrangian methods inherently show a workload imbalance problem when parallelized with a fixed domain in space, because particles move around and workloads change during the simulation. Therefore dynamic load balancing is a key technique for performing large-scale SPH and DEM simulations. In this work, we present a parallel implementation technique for the SPH and DEM methods utilizing dynamic load balancing algorithms, aimed at high-resolution simulation over large domains using massively parallel supercomputer systems. Our method utilizes the imbalances in the execution time of each MPI process as the nonlinear term of parallel domain decomposition and minimizes them with a Newton-like iteration method. In order to perform flexible domain decomposition in space, the slice-grid algorithm is used. Numerical tests show that our approach is suitable for handling particles with different calculation costs (e.g. boundary particles) as well as heterogeneous computer architectures. We analyze the parallel efficiency and scalability on supercomputer systems (K-computer, Earth Simulator 3, etc.).
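The idea of moving slice boundaries so that each process carries a similar cost can be illustrated in one dimension. The sketch below is a deliberately simpler stand-in for the paper's method: it cuts the sorted particle list at equal shares of cumulative cost instead of running a Newton-like iteration on measured execution times. All names are hypothetical.

```python
def balanced_slices(xs, costs, n_slices):
    """Assign particles to 1D slices carrying roughly equal total cost.

    xs       -- particle coordinates along the slicing axis
    costs    -- per-particle cost estimates (e.g. measured time)
    n_slices -- number of slices (one per process)
    Returns a list giving the owning slice index of each particle.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    target = sum(costs) / n_slices  # ideal cost per slice
    owners = [0] * len(xs)
    acc = 0.0
    for i in order:
        # Slice index follows from the cost accumulated before this particle.
        owners[i] = min(int(acc / target), n_slices - 1)
        acc += costs[i]
    return owners


def slice_loads(owners, costs, n_slices):
    """Total cost carried by each slice, for checking the balance."""
    loads = [0.0] * n_slices
    for owner, cost in zip(owners, costs):
        loads[owner] += cost
    return loads
```

In a time-stepping code this assignment would be recomputed periodically as particles move, using costs measured during the previous steps.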
A tool for simulating parallel branch-and-bound methods

NASA Astrophysics Data System (ADS)

Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail

2016-01-01

The Branch-and-Bound method is known as one of the most powerful but very resource consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in the parallel B&B method is the need for dynamic load redistribution. Therefore the design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating the parallel Branch-and-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, and characteristics of the supercomputer's interconnect, thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by the B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
Parallel replica dynamics with a heterogeneous distribution of barriers: Application to n-hexadecane pyrolysis

NASA Astrophysics Data System (ADS)

Kum, Oyeon; Dickson, Brad M.; Stuart, Steven J.; Uberuaga, Blas P.; Voter, Arthur F.

2004-11-01

Parallel replica dynamics simulation methods appropriate for the simulation of chemical reactions in molecular systems with many conformational degrees of freedom have been developed and applied to study the microsecond-scale pyrolysis of n-hexadecane in the temperature range of 2100-2500 K. The algorithm uses a transition detection scheme that is based on molecular topology, rather than energetic basins. This algorithm allows efficient parallelization of small systems even when using more processors than particles (in contrast to more traditional parallelization algorithms), and even when there are frequent conformational transitions (in contrast to previous implementations of the parallel replica algorithm). The parallel efficiency for pyrolysis initiation reactions was over 90% on 61 processors for this 50-atom system. The parallel replica dynamics technique results in reaction probabilities that are statistically indistinguishable from those obtained from direct molecular dynamics, under conditions where both are feasible, but allows simulations at temperatures as much as 1000 K lower than direct molecular dynamics simulations. The rate of initiation displayed Arrhenius behavior over the entire temperature range, with an activation energy and frequency factor of E_a = 79.7 kcal/mol and log(A/s^-1) = 14.8, respectively, in reasonable agreement with experiment and empirical kinetic models. Several interesting unimolecular reaction mechanisms were observed in simulations of the chain propagation reactions above 2000 K, which are not included in most coarse-grained kinetic models. More studies are needed in order to determine whether these mechanisms are experimentally relevant, or specific to the potential energy surface used.
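To get a feel for the reported Arrhenius parameters, the rate constant k = A exp(-Ea / (R T)) can be evaluated directly. The sketch below assumes that the quoted logarithm is base 10 and uses the gas constant in kcal/(mol K); both are assumptions made here, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def arrhenius_rate(temp_k, ea_kcal=79.7, log10_a=14.8):
    """Rate constant in 1/s from k = A * exp(-Ea / (R * T)).

    ea_kcal and log10_a default to the values quoted in the abstract;
    reading the logarithm as base 10 is an assumption.
    """
    prefactor = 10.0 ** log10_a
    return prefactor * math.exp(-ea_kcal / (R_KCAL * temp_k))
```

Evaluating `arrhenius_rate(2100.0)` and `arrhenius_rate(2500.0)` shows how steeply the initiation rate grows across the simulated temperature range. Under these assumptions the low end of the range corresponds to a mean waiting time on the order of a fraction of a microsecond.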
Parallel, Asynchronous Executive (PAX): System concepts, facilities, and architecture

NASA Technical Reports Server (NTRS)

Jones, W. H.

1983-01-01

The Parallel, Asynchronous Executive (PAX) is a software operating system simulation that allows many computers to work on a single problem at the same time. PAX is currently implemented on a UNIVAC 1100/42 computer system. Independent UNIVAC runstreams are used to simulate independent computers. Data are shared among independent UNIVAC runstreams through shared mass-storage files. PAX has achieved the following: (1) applied several computing processes simultaneously to a single, logically unified problem; (2) resolved most parallel processor conflicts by careful work assignment; (3) resolved by means of worker requests to PAX all conflicts not resolved by work assignment; (4) provided fault isolation and recovery mechanisms to meet the problems of an actual parallel, asynchronous processing machine. Additionally, one real-life problem has been constructed for the PAX environment. This is CASPER, a collection of aerodynamic and structural dynamic problem simulation routines. CASPER is not discussed in this report except to provide examples of parallel-processing techniques.
A Metascalable Computing Framework for Large Spatiotemporal-Scale Atomistic Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nomura, K; Seymour, R; Wang, W

2009-02-17

A metascalable (or 'design once, scale on new architectures') parallel computing framework has been developed for large spatiotemporal-scale atomistic simulations of materials based on spatiotemporal data locality principles, which is expected to scale on emerging multipetaflops architectures. The framework consists of: (1) an embedded divide-and-conquer (EDC) algorithmic framework based on spatial locality to design linear-scaling algorithms for high complexity problems; (2) a space-time-ensemble parallel (STEP) approach based on temporal locality to predict long-time dynamics, while introducing multiple parallelization axes; and (3) a tunable hierarchical cellular decomposition (HCD) parallelization framework to map these O(N) algorithms onto a multicore cluster based on hybrid implementation combining message passing and critical section-free multithreading. The EDC-STEP-HCD framework exposes maximal concurrency and data locality, thereby achieving: (1) inter-node parallel efficiency well over 0.95 for 218 billion-atom molecular-dynamics and 1.68 trillion electronic-degrees-of-freedom quantum-mechanical simulations on 212,992 IBM BlueGene/L processors (superscalability); (2) high intra-node, multithreading parallel efficiency (nanoscalability); and (3) nearly perfect time/ensemble parallel efficiency (eon-scalability). The spatiotemporal scale covered by MD simulation on a sustained petaflops computer per day (i.e. petaflops · day of computing) is estimated as NT = 2.14 (e.g. N = 2.14 million atoms for T = 1 microseconds).
PRATHAM: Parallel Thermal Hydraulics Simulations using Advanced Mesoscopic Methods

DOE Office of Scientific and Technical Information (OSTI.GOV)

Joshi, Abhijit S; Jain, Prashant K; Mudrich, Jaime A

2012-01-01

At the Oak Ridge National Laboratory, efforts are under way to develop a 3D, parallel LBM code called PRATHAM (PaRAllel Thermal Hydraulic simulations using Advanced Mesoscopic Methods) to demonstrate the accuracy and scalability of LBM for turbulent flow simulations in nuclear applications. The code has been developed using FORTRAN-90, and parallelized using the message passing interface (MPI) library. The Silo library is used to compact and write the data files, and VisIt visualization software is used to post-process the simulation data in parallel. Both the single relaxation time (SRT) and multi relaxation time (MRT) LBM schemes have been implemented in PRATHAM. To capture turbulence without prohibitively increasing the grid resolution requirements, an LES approach [5] is adopted allowing large scale eddies to be numerically resolved while modeling the smaller (subgrid) eddies. In this work, a Smagorinsky model has been used, which modifies the fluid viscosity by an additional eddy viscosity depending on the magnitude of the rate-of-strain tensor. In LBM, this is achieved by locally varying the relaxation time of the fluid.
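The link between the Smagorinsky eddy viscosity and a locally varying relaxation time can be sketched as follows. The sketch assumes a single-relaxation-time scheme in lattice units (unit time step, squared lattice sound speed of 1/3) and takes the strain-rate tensor as an input; the function name and the default Smagorinsky constant are illustrative and are not taken from PRATHAM.

```python
import math


def smagorinsky_tau(tau0, strain, c_smag=0.1, delta=1.0):
    """Effective relaxation time with a Smagorinsky eddy viscosity.

    tau0   -- molecular relaxation time (lattice units)
    strain -- rate-of-strain tensor S_ij as a nested list
    c_smag -- Smagorinsky constant (assumed value)
    delta  -- filter width, one lattice spacing by default
    """
    # Magnitude of the strain rate: |S| = sqrt(2 * S_ij * S_ij).
    s_mag = math.sqrt(2.0 * sum(s * s for row in strain for s in row))
    # Molecular viscosity implied by tau0: nu = (tau - 1/2) / 3.
    nu0 = (tau0 - 0.5) / 3.0
    # Eddy viscosity: nu_t = (C_s * delta)^2 * |S|.
    nu_t = (c_smag * delta) ** 2 * s_mag
    # Map the total viscosity back to a relaxation time.
    return 3.0 * (nu0 + nu_t) + 0.5
```

With zero strain the function returns `tau0` unchanged, and any nonzero strain raises the local relaxation time. Production codes usually estimate the strain rate from non-equilibrium distribution moments rather than passing it in explicitly.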
Dust Dynamics in Protoplanetary Disks: Parallel Computing with PVM

NASA Astrophysics Data System (ADS)

de La Fuente Marcos, Carlos; Barge, Pierre; de La Fuente Marcos, Raúl

2002-03-01

We describe a parallel version of our high-order-accuracy particle-mesh code for the simulation of collisionless protoplanetary disks. We use this code to carry out a massively parallel, two-dimensional, time-dependent, numerical simulation, which includes dust particles, to study the potential role of large-scale, gaseous vortices in protoplanetary disks. This noncollisional problem is easy to parallelize on message-passing multicomputer architectures. We performed the simulations on a cache-coherent nonuniform memory access Origin 2000 machine, using both the parallel virtual machine (PVM) and message-passing interface (MPI) message-passing libraries. Our performance analysis suggests that, for our problem, PVM is about 25% faster than MPI. Using PVM and MPI made it possible to reduce CPU time and increase code performance. This allows for simulations with a large number of particles (N ~ 10^5-10^6) in reasonable CPU times. The performance of our implementation of the parallel code on an Origin 2000 supercomputer is presented and discussed. It exhibits very good speedup behavior and low load imbalance. Our results confirm that giant gaseous vortices can play a dominant role in giant planet formation.
Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bylaska, Eric J., E-mail: Eric.Bylaska@pnnl.gov; Weare, Jonathan Q., E-mail: weare@uchicago.edu; Weare, John H., E-mail: jweare@ucsd.edu

2013-08-21

Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0 … t_M can be transformed into a root finding problem, F(X) = [x_i − f(x_{i−1})]_{i=1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
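The recasting of time integration as a root finding problem can be made concrete with a toy example. The sketch below uses a velocity Verlet step for a one-dimensional harmonic oscillator and solves F(X) = 0 with a plain fixed-point sweep, which is substituted here for the quasi-Newton schemes of the paper. The sweep yields no speedup by itself and only shows the structure; all names and parameters are illustrative assumptions.

```python
def verlet_step(state, dt=0.05, k=1.0, m=1.0):
    """One velocity Verlet step for a 1D harmonic oscillator."""
    r, v = state
    a = -k * r / m
    r_new = r + v * dt + 0.5 * a * dt * dt
    a_new = -k * r_new / m
    v_new = v + 0.5 * (a + a_new) * dt
    return (r_new, v_new)


def serial_trajectory(x0, n_steps, step):
    """Reference trajectory from ordinary step-by-step integration."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(step(traj[-1]))
    return traj


def residual(traj, step):
    """Largest component of F(X) = [x_i - f(x_{i-1})] over i = 1..M."""
    return max(
        abs(a - b)
        for i in range(1, len(traj))
        for a, b in zip(traj[i], step(traj[i - 1]))
    )


def fixed_point_sweeps(x0, n_steps, step, tol=1e-12):
    """Solve F(X) = 0 by sweeps x_i <- f(x_{i-1}) on the old iterate."""
    traj = [x0] * (n_steps + 1)  # crude initial guess for every time step
    sweeps = 0
    while residual(traj, step) > tol:
        # Every entry depends only on the previous iterate, so the M
        # evaluations of f in a sweep could run on separate processors.
        traj = [x0] + [step(traj[i - 1]) for i in range(1, n_steps + 1)]
        sweeps += 1
    return traj, sweeps
```

Because each sweep fixes at least one further time step exactly, this iteration needs up to M sweeps, which is as slow as serial integration. That is why faster-converging root finders are needed before assigning one processor per time step pays off.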
The NAS parallel benchmarks

NASA Technical Reports Server (NTRS)

Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.

1991-01-01

A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification: all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
PARALLEL HOP: A SCALABLE HALO FINDER FOR MASSIVE COSMOLOGICAL DATA SETS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Skory, Stephen; Turk, Matthew J.; Norman, Michael L.

2010-11-15

Modern N-body cosmological simulations contain billions ($10^9$) of dark matter particles. These simulations require hundreds to thousands of gigabytes of memory and employ hundreds to tens of thousands of processing cores on many compute nodes. In order to study the distribution of dark matter in a cosmological simulation, the dark matter halos must be identified using a halo finder, which establishes the halo membership of every particle in the simulation. The resources required for halo finding are similar to the requirements for the simulation itself. In particular, simulations have become too extensive to use commonly employed halo finders, such that the computational requirements to identify halos must now be spread across multiple nodes and cores. Here, we present a scalable-parallel halo finding method called Parallel HOP for large-scale cosmological simulation data. Based on the halo finder HOP, it utilizes message passing interface and domain decomposition to distribute the halo finding workload across multiple compute nodes, enabling analysis of much larger data sets than is possible with the strictly serial or previous parallel implementations of HOP. We provide a reference implementation of this method as a part of the toolkit yt, an analysis toolkit for adaptive mesh refinement data that includes complementary analysis modules. Additionally, we discuss a suite of benchmarks that demonstrate that this method scales well up to several hundred tasks and data sets in excess of $2000^3$ particles. The Parallel HOP method and our implementation can be readily applied to any kind of N-body simulation data and is therefore widely applicable.
Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

2013-08-21

Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time $t_i$ (trajectory positions and velocities $x_i = (r_i, v_i)$) to time $t_{i+1}$ ($x_{i+1}$) by $x_{i+1} = f_i(x_i)$, the dynamics problem spanning an interval from $t_0 \ldots t_M$ can be transformed into a root finding problem, $F(X) = [x_i - f(x_{i-1})]_{i=1,\ldots,M} = 0$, for the trajectory variables. The root finding problem is solved using a variety of optimization techniques, including quasi-Newton and preconditioned quasi-Newton optimization schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow TCP/IP networks.
Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. By using these algorithms we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 seconds per time step to 6.9 seconds per time step.
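The root finding formulation in this abstract can be illustrated with a small sketch. It is not the authors' quasi-Newton scheme: it uses a plain fixed-point sweep, $x_i \leftarrow f(x_{i-1})$ for all i at once, on a toy 1D harmonic oscillator. The names `verlet_step`, `serial_trajectory`, and `parallel_in_time`, the force constants, and the thread pool are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def verlet_step(state, dt=0.05, k=1.0, m=1.0):
    """One velocity-Verlet step f(x) for a toy 1D harmonic oscillator (assumed model)."""
    r, v = state
    a = -k * r / m
    r_new = r + v * dt + 0.5 * a * dt * dt
    a_new = -k * r_new / m
    v_new = v + 0.5 * (a + a_new) * dt
    return (r_new, v_new)


def serial_trajectory(x0, n_steps):
    """Reference: propagate step by step, x_i = f(x_{i-1})."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(verlet_step(traj[-1]))
    return traj


def parallel_in_time(x0, n_steps, tol=1e-12, workers=4):
    """Drive F(X) = [x_i - f(x_{i-1})], i = 1..M, to zero by fixed-point sweeps.

    Within one sweep the M evaluations of f are independent of each other,
    which is the part that could be handed to M processors.
    """
    traj = [x0] * (n_steps + 1)  # crude initial guess for the whole trajectory
    sweeps = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # f(x_{i-1}) for every time step, evaluated concurrently
            propagated = list(pool.map(verlet_step, traj[:-1]))
            # Residual of the current guess: max over i of |x_i - f(x_{i-1})|
            residual = max(
                max(abs(a - b) for a, b in zip(x, fx))
                for x, fx in zip(traj[1:], propagated)
            )
            traj = [x0] + propagated
            sweeps += 1
            if residual <= tol:
                return traj, sweeps
```

Calling `parallel_in_time((1.0, 0.0), 20)` reproduces `serial_trajectory((1.0, 0.0), 20)`. In this plain form the iteration needs about as many sweeps as there are time steps, so it brings no speedup on its own. Reducing the number of sweeps is the job of the quasi-Newton and preconditioned schemes the abstract describes.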
Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations.

PubMed

Bylaska, Eric J; Weare, Jonathan Q; Weare, John H

2013-08-21

Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time $t_i$ (trajectory positions and velocities $x_i = (r_i, v_i)$) to time $t_{i+1}$ ($x_{i+1}$) by $x_{i+1} = f_i(x_i)$, the dynamics problem spanning an interval from $t_0 \ldots t_M$ can be transformed into a root finding problem, $F(X) = [x_i - f(x_{i-1})]_{i=1,\ldots,M} = 0$, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3.
The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
Fully Parallel MHD Stability Analysis Tool

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2014-10-01

Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. The solution of the eigenvalue problem is parallelized by repeating the steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.
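For readers unfamiliar with inverse iteration, the following is a minimal serial sketch of the textbook method on a small real symmetric matrix. It makes no claim about how MARS formulates or solves its eigenvalue problem. The helper names `solve` and `inverse_iteration`, the starting vector, and the test matrix are assumptions chosen for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def inverse_iteration(a, shift, iters=50):
    """Approximate the eigenpair of `a` whose eigenvalue lies closest to `shift`.

    Each pass solves (a - shift*I) y = x and normalizes y. The shift must not
    coincide exactly with an eigenvalue, or the shifted matrix is singular.
    """
    n = len(a)
    shifted = [[a[i][j] - (shift if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    x = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-symmetric start vector
    for _ in range(iters):
        y = solve(shifted, x)  # the expensive step in large problems
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    # Rayleigh quotient gives the eigenvalue estimate for the unit vector x
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(x[i] * ax[i] for i in range(n))
    return eigenvalue, x
```

For the matrix `[[2, -1, 0], [-1, 2, -1], [0, -1, 2]]` with `shift=0.5`, the estimate approaches the smallest eigenvalue, 2 − √2. In a large-scale code the repeated linear solve dominates the cost, which is presumably why that step is the one delegated to parallel libraries.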
A hybrid parallel architecture for electrostatic interactions in the simulation of dissipative particle dynamics

NASA Astrophysics Data System (ADS)

Yang, Sheng-Chun; Lu, Zhong-Yuan; Qian, Hu-Jun; Wang, Yong-Lei; Han, Jie-Ping

2017-11-01

In this work, we upgraded the electrostatic interaction method of CU-ENUF (Yang, et al., 2016) which first applied CUNFFT (nonequispaced Fourier transforms based on CUDA) to the reciprocal-space electrostatic computation and made the computation of electrostatic interaction done thoroughly in GPU. The upgraded edition of CU-ENUF runs concurrently in a hybrid parallel way that enables the computation parallelizing on multiple computer nodes firstly, then further on the installed GPU in each computer. By this parallel strategy, the size of simulation system will be never restricted to the throughput of a single CPU or GPU. The most critical technical problem is how to parallelize a CUNFFT in the parallel strategy, which is conquered effectively by deep-seated research of basic principles and some algorithm skills. Furthermore, the upgraded method is capable of computing electrostatic interactions for both the atomistic molecular dynamics (MD) and the dissipative particle dynamics (DPD). Finally, the benchmarks conducted for validation and performance indicate that the upgraded method is able to not only present a good precision when setting suitable parameters, but also give an efficient way to compute electrostatic interactions for huge simulation systems. Program Files doi:http://dx.doi.org/10.17632/zncf24fhpv.1 Licensing provisions: GNU General Public License 3 (GPL) Programming language: C, C++, and CUDA C Supplementary material: The program is designed for effective electrostatic interactions of large-scale simulation systems, which runs on particular computers equipped with NVIDIA GPUs. It has been tested on (a) single computer node with Intel(R) Core(TM) i7-3770@ 3.40 GHz (CPU) and GTX 980 Ti (GPU), and (b) MPI parallel computer nodes with the same configurations. 
Nature of problem: For molecular dynamics simulation, the electrostatic interaction is the most time-consuming computation because of its long-range feature and slow convergence in simulation space, and it takes up most of the total simulation time. Although the parallel method CU-ENUF (Yang et al., 2016) based on GPU has achieved a qualitative leap compared with previous methods in electrostatic interactions computation, the computation capability is limited to the throughput capacity of a single GPU for super-scale simulation systems. Therefore, we should look for an effective method to handle the calculation of electrostatic interactions efficiently for a simulation system with super-scale size. Solution method: We constructed a hybrid parallel architecture, in which CPU and GPU are combined to accelerate the electrostatic computation effectively. Firstly, the simulation system is divided into many subtasks via the domain-decomposition method. Then MPI (Message Passing Interface) is used to implement the CPU-parallel computation with each computer node corresponding to a particular subtask, and furthermore each subtask in one computer node will be executed in GPU in parallel efficiently. In this hybrid parallel method, the most critical technical problem is how to parallelize a CUNFFT (nonequispaced fast Fourier transform based on CUDA) in the parallel strategy, which is conquered effectively by deep-seated research of basic principles and some algorithm skills. Restrictions: The HP-ENUF is mainly oriented to super-scale system simulations, in which the performance superiority is shown adequately. However, for a small simulation system containing fewer than $10^6$ particles, the mode of multiple computer nodes has no apparent efficiency advantage over the mode of a single computer node, or even lower efficiency, due to the serious network delay among computer nodes. References: (1) S.-C. Yang, H.-J. Qian, Z.-Y. Lu, Appl. Comput. Harmon. Anal. 2016, http://dx.doi.org/10.1016/j.acha.2016.04.009. (2) S.-C. Yang, Y.-L. Wang, G.-S. Jiao, H.-J. Qian, Z.-Y. Lu, J. Comput. Chem. 37 (2016) 378. (3) S.-C. Yang, Y.-L. Zhu, H.-J. Qian, Z.-Y. Lu, Appl. Chem. Res. Chin. Univ., 2017, http://dx.doi.org/10.1007/s40242-016-6354-5. (4) Y.-L. Zhu, H. Liu, Z.-W. Li, H.-J. Qian, G. Milano, Z.-Y. Lu, J. Comput. Chem. 34 (2013) 2197.
GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations

PubMed Central

Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

2015-01-01

GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310–323. doi: 10.1002/wcms.1220 PMID:26753008
Application of integration algorithms in a parallel processing environment for the simulation of jet engines

NASA Technical Reports Server (NTRS)

Krosel, S. M.; Milner, E. J.

1982-01-01

The application of predictor-corrector integration algorithms developed for the digital parallel processing environment is investigated. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real time performance, interprocessor communication, and algorithm startup are also discussed.
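The abstract does not say which predictor-corrector formulas were used. As a generic illustration only, the sketch below pairs an explicit Euler predictor with a trapezoidal corrector (Heun's method) and applies it to a linear test equation. The function name `predictor_corrector` and the decay model are assumptions, not the report's engine model.

```python
import math


def predictor_corrector(f, t0, x0, dt, n_steps):
    """Integrate x' = f(t, x) with an Euler predictor and a trapezoidal corrector.

    `x0` is a list of state values; returns the list of (t, state) pairs visited.
    """
    t, x = t0, list(x0)
    history = [(t, x)]
    for _ in range(n_steps):
        k1 = f(t, x)
        # Predictor: explicit Euler estimate of the state at t + dt
        predicted = [xi + dt * ki for xi, ki in zip(x, k1)]
        k2 = f(t + dt, predicted)
        # Corrector: trapezoidal rule using the slope at the predicted point
        x = [xi + 0.5 * dt * (a + b) for xi, a, b in zip(x, k1, k2)]
        t += dt
        history.append((t, x))
    return history


def decay(t, x):
    """Linear test model x' = -x (a stand-in for a linear plant model)."""
    return [-xi for xi in x]


def final_error(dt, t_end=1.0):
    """Absolute error at t_end against the exact solution exp(-t_end)."""
    steps = round(t_end / dt)
    _, state = predictor_corrector(decay, 0.0, [1.0], dt, steps)[-1]
    return abs(state[0] - math.exp(-t_end))
```

Comparing `final_error(0.1)` with `final_error(0.01)` shows the kind of step-size sensitivity the report examines: the error of this second-order pair shrinks by roughly a factor of 100 when the step is reduced tenfold.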
Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

PubMed

Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

2010-10-01

Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel CPU of Core (TM) 2 Quad Q6600 and a GPU of Geforce 8800GT, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
Dependability analysis of parallel systems using a simulation-based approach. M.S. Thesis

NASA Technical Reports Server (NTRS)

Sawyer, Darren Charles

1994-01-01

The analysis of dependability in large, complex, parallel systems executing real applications or workloads is examined in this thesis. To effectively demonstrate the wide range of dependability problems that can be analyzed through simulation, the analysis of three case studies is presented. For each case, the organization of the simulation model used is outlined, and the results from simulated fault injection experiments are explained, showing the usefulness of this method in dependability modeling of large parallel systems. The simulation models are constructed using DEPEND and C++. Where possible, methods to increase dependability are derived from the experimental results. Another interesting facet of all three cases is the presence of some kind of workload or application executing in the simulation while faults are injected. This provides a completely new dimension to this type of study, not possible to model accurately with analytical approaches.
Constitutive Model Calibration via Autonomous Multiaxial Experimentation (Postprint)

DTIC Science & Technology

2016-09-17

test machine. Experimental data is reduced and finite element simulations are conducted in parallel with the test based on experimental strain...data is reduced and finite element simulations are conducted in parallel with the test based on experimental strain conditions. Optimization methods...be used directly in finite element simulations of more complex geometries. Keywords Axial/torsional experimentation • Plasticity • Constitutive model

Parallel Implementation of the Discontinuous Galerkin Method

NASA Technical Reports Server (NTRS)

Baggag, Abdalkader; Atkins, Harold; Keyes, David

1999-01-01

This paper describes a parallel implementation of the discontinuous Galerkin method. Discontinuous Galerkin is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in an object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem due to cache effects.
Providing a parallel and distributed capability for JMASS using SPEEDES

NASA Astrophysics Data System (ADS)

Valinski, Maria; Driscoll, Jonathan; McGraw, Robert M.; Meyer, Bob

2002-07-01

The Joint Modeling And Simulation System (JMASS) is a Tri-Service simulation environment that supports engineering and engagement-level simulations. As JMASS is expanded to support other Tri-Service domains, the current set of modeling services must be expanded for High Performance Computing (HPC) applications by adding support for advanced time-management algorithms, parallel and distributed topologies, and high speed communications. By providing support for these services, JMASS can better address modeling domains requiring parallel, computationally intense calculations such as clutter, vulnerability and lethality calculations, and underwater-based scenarios. A risk reduction effort implementing some HPC services for JMASS using the SPEEDES (Synchronous Parallel Environment for Emulation and Discrete Event Simulation) Simulation Framework has recently concluded. As an artifact of the JMASS-SPEEDES integration, not only can HPC functionality be brought to the JMASS program through SPEEDES, but an additional HLA-based capability can be demonstrated that further addresses interoperability issues. The JMASS-SPEEDES integration provided a means of adding HLA capability to preexisting JMASS scenarios through an implementation of the standard JMASS port communication mechanism that allows players to communicate.
CUBE: Information-optimized parallel cosmological N-body simulation code

NASA Astrophysics Data System (ADS)

Yu, Hao-Ran; Pen, Ue-Li; Wang, Xin

2018-05-01

CUBE, written in Coarray Fortran, is a particle-mesh based parallel cosmological N-body simulation code. The memory usage of CUBE can approach as low as 6 bytes per particle. Particle pairwise (PP) force, cosmological neutrinos, spherical overdensity (SO) halofinder are included.
Efficient parallel simulation of CO2 geologic sequestration in saline aquifers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Keni; Doughty, Christine; Wu, Yu-Shu

2007-01-01

An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling single- and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, as well as salt precipitation or dissolution. The new parallel simulator uses MPI for parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations by multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that a linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2.
Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.

PubMed

Bhandarkar, S M; Chirravuri, S; Arnold, J

1996-01-01

Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
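The "multiple independent searches" strategy named in this abstract can be sketched as follows. This is a toy version, not the paper's algorithm: it anneals a small Optimal Linear Arrangement instance with a pairwise-swap move and a geometric cooling schedule, runs several differently seeded chains, and keeps the best result. All function names, the schedule parameters, and the serial loop standing in for separate processors are assumptions.

```python
import math
import random


def arrangement_cost(order, edges):
    """Sum of |position(u) - position(v)| over all edges (linear arrangement cost)."""
    pos = {node: i for i, node in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def anneal(nodes, edges, seed, steps=5000, t_start=2.0, cooling=0.999):
    """One simulated annealing chain with its own random stream."""
    rng = random.Random(seed)
    order = list(nodes)
    rng.shuffle(order)
    cost = arrangement_cost(order, edges)
    best_order, best_cost = order[:], cost
    temp = t_start
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]  # perturbation: swap two positions
        new_cost = arrangement_cost(order, edges)
        # Metropolis rule: always accept improvements, sometimes accept worse moves
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best_order, best_cost = order[:], cost
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
        temp *= cooling
    return best_cost, best_order


def independent_searches(nodes, edges, n_searches=4):
    """Run several chains that never communicate and keep the best outcome.

    The chains share no state, so each could run on its own processor;
    here they run one after another for simplicity.
    """
    results = [anneal(nodes, edges, seed) for seed in range(n_searches)]
    return min(results, key=lambda r: r[0])
```

A periodically interacting variant, which the abstract reports as the better choice on the SIMD machine, would additionally exchange the best ordering among chains at fixed intervals. That exchange is omitted here.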
Acceleration of the matrix multiplication of Radiance three phase daylighting simulations with parallel computing on heterogeneous hardware of personal computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zuo, Wangda; McNeil, Andrew; Wetter, Michael

2013-05-23

Building designers are increasingly relying on complex fenestration systems to reduce energy consumed for lighting and HVAC in low energy buildings. Radiance, a lighting simulation program, has been used to conduct daylighting simulations for complex fenestration systems. Depending on the configurations, the simulation can take hours or even days using a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight simulation by conducting parallel computing on heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in parallel using OpenCL. The speed of the new approach was evaluated using various daylighting simulation cases on a multicore central processing unit and a graphics processing unit. Based on the measurements and analysis of the time usage for the Radiance daylighting simulation, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.
Using parallel computing for the display and simulation of the space debris environment

NASA Astrophysics Data System (ADS)

Möckel, M.; Wiedemann, C.; Flegel, S.; Gelhaus, J.; Vörsmann, P.; Klinkrad, H.; Krag, H.

2011-07-01

Parallelism is becoming the leading paradigm in today's computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating and displaying three-dimensional computer generated imagery in real time requires complex numerical operations to be performed at high speed on a large number of objects. Since most of these objects can be processed independently, parallel computing is applicable in this field. Modern graphics processing units (GPUs) have become capable of performing millions of matrix and vector operations per second on multiple objects simultaneously. As a side project, a software tool is currently being developed at the Institute of Aerospace Systems that provides an animated, three-dimensional visualization of both actual and simulated space debris objects. Due to the nature of these objects it is possible to process them individually and independently from each other. Therefore, an analytical orbit propagation algorithm has been implemented to run on a GPU. By taking advantage of all its processing power a huge performance increase, compared to its CPU-based counterpart, could be achieved. For several years efforts have been made to harness this computing power for applications other than computer graphics. Software tools for the simulation of space debris are among those that could profit from embracing parallelism. With recently emerged software development tools such as OpenCL it is possible to transfer the new algorithms used in the visualization outside the field of computer graphics and implement them, for example, into the space debris simulation environment. This way they can make use of parallel hardware such as GPUs and Multi-Core-CPUs for faster computation. 
In this paper the visualization software will be introduced, including a comparison between the serial and the parallel method of orbit propagation. Ways of how to use the benefits of the latter method for space debris simulation will be discussed. An introduction to OpenCL will be given as well as an exemplary algorithm from the field of space debris simulation.
Using parallel computing for the display and simulation of the space debris environment

NASA Astrophysics Data System (ADS)

Moeckel, Marek; Wiedemann, Carsten; Flegel, Sven Kevin; Gelhaus, Johannes; Klinkrad, Heiner; Krag, Holger; Voersmann, Peter

Parallelism is becoming the leading paradigm in today's computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating and displaying three-dimensional computer generated imagery in real time requires complex numerical operations to be performed at high speed on a large number of objects. Since most of these objects can be processed independently, parallel computing is applicable in this field. Modern graphics processing units (GPUs) have become capable of performing millions of matrix and vector operations per second on multiple objects simultaneously. As a side project, a software tool is currently being developed at the Institute of Aerospace Systems that provides an animated, three-dimensional visualization of both actual and simulated space debris objects. Due to the nature of these objects it is possible to process them individually and independently from each other. Therefore, an analytical orbit propagation algorithm has been implemented to run on a GPU. By taking advantage of all its processing power a huge performance increase, compared to its CPU-based counterpart, could be achieved. For several years efforts have been made to harness this computing power for applications other than computer graphics. Software tools for the simulation of space debris are among those that could profit from embracing parallelism. With recently emerged software development tools such as OpenCL it is possible to transfer the new algorithms used in the visualization outside the field of computer graphics and implement them, for example, into the space debris simulation environment. This way they can make use of parallel hardware such as GPUs and Multi-Core-CPUs for faster computation. 
In this paper the visualization software will be introduced, including a comparison between the serial and the parallel method of orbit propagation. Ways to use the benefits of the latter method for space debris simulation will be discussed. An introduction to OpenCL will be given, as well as an exemplary algorithm from the field of space debris simulation.
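The abstract above relies on each debris object being propagated independently of the others. A minimal sketch of that idea follows, assuming simple two-body Kepler propagation (the abstract does not say which analytical propagator is used); the function names, the tuple layout, and the value of Earth's gravitational parameter are illustrative assumptions.

```python
import math

def solve_kepler(mean_anomaly, ecc, tol=1e-12, max_iter=50):
    """Solve Kepler's equation M = E - e*sin(E) for E by Newton iteration."""
    E = mean_anomaly if ecc < 0.8 else math.pi
    for _ in range(max_iter):
        step = (E - ecc * math.sin(E) - mean_anomaly) / (1.0 - ecc * math.cos(E))
        E -= step
        if abs(step) < tol:
            break
    return E

def propagate(obj, dt, mu=398600.4418):
    """Return the true anomaly (rad) of one object after dt seconds.

    obj is a hypothetical (semi-major axis in km, eccentricity,
    mean anomaly at epoch in rad) tuple; mu is in km^3/s^2.
    """
    a, ecc, m0 = obj
    n = math.sqrt(mu / a**3)                      # mean motion, rad/s
    M = (m0 + n * dt) % (2.0 * math.pi)           # mean anomaly at epoch + dt
    E = solve_kepler(M, ecc)                      # eccentric anomaly
    nu = 2.0 * math.atan2(math.sqrt(1.0 + ecc) * math.sin(E / 2.0),
                          math.sqrt(1.0 - ecc) * math.cos(E / 2.0))
    return nu % (2.0 * math.pi)

def propagate_all(objects, dt):
    """Propagate every object; no iteration depends on any other."""
    return [propagate(o, dt) for o in objects]
```

Because `propagate` reads only its own object, the loop in `propagate_all` is the part that could be mapped onto one GPU work-item per object.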
Xyce Parallel Electronic Simulator Users' Guide Version 6.7.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2017 Sandia Corporation. All rights reserved. Trademarks: Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation.
Amtec and TecPlot are trademarks of Amtec Engineering, Inc. All other trademarks are property of their respective owners. Contacts: World Wide Web http://xyce.sandia.gov, https://info.sandia.gov/xyce (Sandia only); Email xyce@sandia.gov (outside Sandia), xyce-sandia@sandia.gov (Sandia only); Bug Reports (Sandia only) http://joseki-vm.sandia.gov/bugzilla, http://morannon.sandia.gov/bugzilla
Developing parallel GeoFEST(P) using the PYRAMID AMR library

NASA Technical Reports Server (NTRS)

Norton, Charles D.; Lyzenga, Greg; Parker, Jay; Tisdale, Robert E.

2004-01-01

The PYRAMID parallel unstructured adaptive mesh refinement (AMR) library has been coupled with the GeoFEST geophysical finite element simulation tool to support parallel active tectonics simulations. Specifically, we have demonstrated modeling of coseismic and postseismic surface displacement due to a simulated earthquake for the Landers system of interacting faults in Southern California. The new software demonstrated a 25-times resolution improvement and a 4-times reduction in time to solution over the sequential baseline milestone case. Simulations on workstations using a few tens of thousands of stress displacement finite elements can now be expanded to multiple millions of elements with greater than 98% scaled efficiency on various parallel platforms over many hundreds of processors. Our most recent work has demonstrated that we can dynamically adapt the computational grid as stress grows on a fault. In this paper, we will describe the major issues and challenges associated with coupling these two programs to create GeoFEST(P). Performance and visualization results will also be described.
Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment

PubMed Central

Khan, Md. Ashfaquzzaman; Herbordt, Martin C.

2011-01-01

Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations. PMID:21822327
Parallel processing of real-time dynamic systems simulation on OSCAR (Optimally SCheduled Advanced multiprocessoR)

NASA Technical Reports Server (NTRS)

Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke

1989-01-01

Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, generally, the same calculations are repeated every time step. However, the Do-all and Do-across techniques cannot be applied to parallel processing of the simulation, since there are data dependencies from the end of one iteration to the beginning of the next, and furthermore data input and data output are required every sampling period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near fine grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and are assigned to processors by using optimal static scheduling at compile time, in order to reduce the large run time overhead caused by the use of near fine grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR), which has been developed to extract advantageous features of static scheduling algorithms to the maximum extent.
A parallel implementation of an off-lattice individual-based model of multicellular populations

NASA Astrophysics Data System (ADS)

Harvey, Daniel G.; Fletcher, Alexander G.; Osborne, James M.; Pitt-Francis, Joe

2015-07-01

As computational models of multicellular populations include ever more detailed descriptions of biophysical and biochemical processes, the computational cost of simulating such models limits their ability to generate novel scientific hypotheses and testable predictions. While developments in microchip technology continue to increase the power of individual processors, parallel computing offers an immediate increase in available processing power. To make full use of parallel computing technology, it is necessary to develop specialised algorithms. To this end, we present a parallel algorithm for a class of off-lattice individual-based models of multicellular populations. The algorithm divides the spatial domain between computing processes and comprises communication routines that ensure the model is correctly simulated on multiple processors. The parallel algorithm is shown to accurately reproduce the results of a deterministic simulation performed using a pre-existing serial implementation. We test the scaling of computation time, memory use and load balancing as more processes are used to simulate a cell population of fixed size. We find approximate linear scaling of both speed-up and memory consumption on up to 32 processor cores. Dynamic load balancing is shown to provide speed-up for non-regular spatial distributions of cells in the case of a growing population.
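The abstract above describes dividing the spatial domain between processes and communicating cells near the boundaries. A minimal single-process sketch of that bookkeeping follows, assuming equal-width strips along the x axis and a fixed interaction radius; the strip layout, the `halo` parameter, and the function names are illustrative assumptions rather than the authors' implementation.

```python
def owner_of(x, x_min, x_max, n_procs):
    """Rank that owns a cell at coordinate x, using equal-width strips."""
    width = (x_max - x_min) / n_procs
    rank = int((x - x_min) / width)
    return min(max(rank, 0), n_procs - 1)  # clamp cells on the domain edges

def decompose(cells, x_min, x_max, n_procs, halo):
    """Split cells into per-rank owned lists and halo (ghost) lists.

    cells is a list of (x, y) tuples. A cell within `halo` of a strip
    boundary is also listed as a ghost on the neighbouring rank, which
    is the data a real code would exchange by message passing.
    """
    width = (x_max - x_min) / n_procs
    owned = [[] for _ in range(n_procs)]
    ghosts = [[] for _ in range(n_procs)]
    for cell in cells:
        x = cell[0]
        rank = owner_of(x, x_min, x_max, n_procs)
        owned[rank].append(cell)
        left_edge = x_min + rank * width
        right_edge = left_edge + width
        if rank > 0 and x - left_edge < halo:
            ghosts[rank - 1].append(cell)
        if rank < n_procs - 1 and right_edge - x < halo:
            ghosts[rank + 1].append(cell)
    return owned, ghosts
```

With fixed strips, a growing or clustered population leaves some ranks with far more cells than others, which is the situation the dynamic load balancing mentioned in the abstract addresses.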
Octree-based, GPU implementation of a continuous cellular automaton for the simulation of complex, evolving surfaces

NASA Astrophysics Data System (ADS)

Ferrando, N.; Gosálvez, M. A.; Cerdá, J.; Gadea, R.; Sato, K.

2011-03-01

Presently, dynamic surface-based models are required to contain increasingly larger numbers of points and to propagate them over longer time periods. For large numbers of surface points, the octree data structure can be used as a balance between low memory occupation and relatively rapid access to the stored data. For evolution rules that depend on neighborhood states, extended simulation periods can be obtained by using simplified atomistic propagation models, such as the Cellular Automata (CA). This method, however, has an intrinsic parallel updating nature and the corresponding simulations are highly inefficient when performed on classical Central Processing Units (CPUs), which are designed for the sequential execution of tasks. In this paper, a series of guidelines is presented for the efficient adaptation of octree-based, CA simulations of complex, evolving surfaces into massively parallel computing hardware. A Graphics Processing Unit (GPU) is used as a cost-efficient example of the parallel architectures. For the actual simulations, we consider the surface propagation during anisotropic wet chemical etching of silicon as a computationally challenging process with a wide-spread use in microengineering applications. A continuous CA model that is intrinsically parallel in nature is used for the time evolution. Our study strongly indicates that parallel computations of dynamically evolving surfaces simulated using CA methods are significantly benefited by the incorporation of octrees as support data structures, substantially decreasing the overall computational time and memory usage.
TOUGH2_MP: A parallel version of TOUGH2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Keni; Wu, Yu-Shu; Ding, Chris

2003-04-09

TOUGH2_MP is a massively parallel version of TOUGH2. It was developed for running on distributed-memory parallel computers to simulate large problems that may not be solved by the standard, single-CPU TOUGH2 code. The new code implements an efficient massively parallel scheme, while preserving the full capacity and flexibility of the original TOUGH2 code. The new software uses the METIS software package for grid partitioning and the AZTEC software package for linear-equation solving. The standard message-passing interface is adopted for communication among processors. Numerical performance of the current version of the code has been tested on CRAY-T3E and IBM RS/6000 SP platforms. In addition, the parallel code has been successfully applied to real field problems of multi-million-cell simulations for three-dimensional multiphase and multicomponent fluid and heat flow, as well as solute transport. In this paper, we will review the development of TOUGH2_MP, and discuss its basic features, modules, and applications.
Parallel VLSI architecture emulation and the organization of APSA/MPP

NASA Technical Reports Server (NTRS)

O'Donnell, John T.

1987-01-01

The Applicative Programming System Architecture (APSA) combines an applicative language interpreter with a novel parallel computer architecture that is well suited for Very Large Scale Integration (VLSI) implementation. The Massively Parallel Processor (MPP) can simulate VLSI circuits by allocating one processing element in its square array to an area on a square VLSI chip. As long as there are not too many long data paths, the MPP can simulate a VLSI clock cycle very rapidly. The APSA circuit contains a binary tree with a few long paths and many short ones. A skewed H-tree layout allows every processing element to simulate a leaf cell and up to four tree nodes, with no loss in parallelism. Emulation of a key APSA algorithm on the MPP resulted in performance 16,000 times faster than a Vax. This speed will make it possible for the APSA language interpreter to run fast enough to support research in parallel list processing algorithms.
A real-time, dual processor simulation of the rotor system research aircraft

NASA Technical Reports Server (NTRS)

Mackie, D. B.; Alderete, T. S.

1977-01-01

A real-time, man-in-the-loop simulation of the rotor system research aircraft (RSRA) was conducted. The unique feature of this simulation was that two digital computers were used in parallel to solve the equations of the RSRA mathematical model. The design, development, and implementation of the simulation are documented. Program validation is discussed, and examples of data recordings are given. This simulation provided an important research tool for the RSRA project in terms of safe and cost-effective design analysis. In addition, valuable knowledge concerning parallel processing and a powerful simulation hardware and software system was gained.
Capabilities of Fully Parallelized MHD Stability Code MARS

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2016-10-01

Results of full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. A parallel version of MARS, named PMARS, has recently been developed at FAR-TECH. Parallelized MARS is an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models implemented in MARS. Parallelization of the code included parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse vector iteration algorithm implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is achieved by repeating the steps of the MARS algorithm using parallel libraries and procedures. Parallelized MARS is capable of calculating eigenmodes with significantly increased spatial resolution: up to 5,000 adapted radial grid points with up to 500 poloidal harmonics. Such resolution is sufficient for simulation of kink, tearing and peeling-ballooning instabilities with physically relevant parameters. Work is supported by the U.S. DOE SBIR program.
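The abstract above names inverse vector iteration as the eigenvalue solver. The following is a minimal serial sketch of the textbook shifted inverse iteration for a small dense standard eigenproblem, written in plain Python; it is an illustration of the general technique, not the MARS or PMARS implementation, and the function names, starting vector, and use of dense Gaussian elimination are assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def inverse_iteration(A, shift, iters=50):
    """Approximate the eigenpair of A whose eigenvalue is closest to shift.

    The shift must not coincide exactly with an eigenvalue, otherwise
    the shifted matrix is singular.
    """
    n = len(A)
    shifted = [[A[i][j] - (shift if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    x = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-symmetric start
    for _ in range(iters):
        y = solve(shifted, x)              # y = (A - shift*I)^-1 x
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(x[i] * Ax[i] for i in range(n))  # Rayleigh quotient
    return eigenvalue, x
```

The linear solve inside the loop dominates the cost, which is why a parallel code would hand that step to parallel libraries, as the abstract indicates.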
Fully Parallel MHD Stability Analysis Tool

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2015-11-01

Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iteration algorithm implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is achieved by repeating the steps of the present MARS algorithm using parallel libraries and procedures. Results of MARS parallelization and of the development of a new fixed-boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.

Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

NASA Astrophysics Data System (ADS)

Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian

2018-01-01

We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
A fully coupled method for massively parallel simulation of hydraulically driven fractures in 3-dimensions

DOE PAGES

Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.; ...

2016-09-18

This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
Spectral neighbor analysis method for automated generation of quantum-accurate interatomic potentials

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thompson, Aidan P.; Swiler, Laura P.; Trott, Christian R.

2015-03-15

Here, we present a new interatomic potential for solids and liquids called Spectral Neighbor Analysis Potential (SNAP). The SNAP potential has a very general form and uses machine-learning techniques to reproduce the energies, forces, and stress tensors of a large set of small configurations of atoms, which are obtained using high-accuracy quantum electronic structure (QM) calculations. The local environment of each atom is characterized by a set of bispectrum components of the local neighbor density projected onto a basis of hyperspherical harmonics in four dimensions. The bispectrum components are the same bond-orientational order parameters employed by the GAP potential [1]. The SNAP potential, unlike GAP, assumes a linear relationship between atom energy and bispectrum components. The linear SNAP coefficients are determined using weighted least-squares linear regression against the full QM training set. This allows the SNAP potential to be fit in a robust, automated manner to large QM data sets using many bispectrum components. The calculation of the bispectrum components and the SNAP potential are implemented in the LAMMPS parallel molecular dynamics code. We demonstrate that a previously unnoticed symmetry property can be exploited to reduce the computational cost of the force calculations by more than one order of magnitude. We present results for a SNAP potential for tantalum, showing that it accurately reproduces a range of commonly calculated properties of both the crystalline solid and the liquid phases. In addition, unlike simpler existing potentials, SNAP correctly predicts the energy barrier for screw dislocation migration in BCC tantalum.
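The abstract above states that the linear SNAP coefficients come from weighted least-squares regression. A minimal sketch of that fitting step follows, assuming energies only (the abstract also fits forces and stress tensors), a handful of made-up descriptor values standing in for bispectrum components, and a plain normal-equations solve; none of the names below come from LAMMPS or the SNAP code.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_weighted_linear(descriptors, energies, weights):
    """Minimize sum_i w_i * (E_i - beta0 - sum_k beta_k * B_ik)^2.

    descriptors[i] holds the (hypothetical) descriptor values B_ik of
    configuration i. Returns [beta0, beta_1, ..., beta_K].
    """
    rows = [[1.0] + list(d) for d in descriptors]  # prepend constant term
    k = len(rows[0])
    # Normal equations: (X^T W X) beta = X^T W y
    xtwx = [[sum(w * r[a] * r[b] for r, w in zip(rows, weights))
             for b in range(k)] for a in range(k)]
    xtwy = [sum(w * r[a] * e for r, w, e in zip(rows, weights, energies))
            for a in range(k)]
    return solve(xtwx, xtwy)
```

Normal equations are used here only for brevity; for large or ill-conditioned training sets a QR- or SVD-based least-squares routine would be the more robust choice.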
Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing

NASA Technical Reports Server (NTRS)

Fricker, David M.

1997-01-01

The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.
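The abstract above distinguishes two kinds of test: distributing a constant-size problem over more processors (strong scaling) and growing the problem with the processor count (weak scaling). A minimal sketch of the usual efficiency definitions for each case follows; the function names and the timings in the usage note are illustrative, not results from the report.

```python
def strong_scaling(t_serial, t_parallel, n_procs):
    """Fixed problem size: return (speedup, parallel efficiency)."""
    speedup = t_serial / t_parallel
    return speedup, speedup / n_procs

def weak_scaling_efficiency(t_one_proc, t_n_procs):
    """Problem size grows with processor count: ideal run time stays flat,
    so efficiency is the ratio of the one-processor time to the N-processor time."""
    return t_one_proc / t_n_procs
```

For example, a run that takes 100 s on one processor and 25 s on eight has a speedup of 4 and a parallel efficiency of 0.5; a slow interconnect lowers both figures by adding communication time to the parallel run.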
Applying Parallel Processing Techniques to Tether Dynamics Simulation

NASA Technical Reports Server (NTRS)

Wells, B. Earl

1996-01-01

The focus of this research has been to determine the effectiveness of applying parallel processing techniques to a sizable real-world problem, the simulation of the dynamics associated with a tether which connects two objects in low earth orbit, and to explore the degree to which the parallelization process can be automated through the creation of new software tools. The goal has been to utilize this specific application problem as a base to develop more generally applicable techniques.
Parallel discrete event simulation using shared memory

NASA Technical Reports Server (NTRS)

Reed, Daniel A.; Malony, Allen D.; McCredie, Bradley D.

1988-01-01

With traditional event-list techniques, evaluating a detailed discrete-event simulation model can often require hours or even days of computation time. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared-memory experiments that use the Chandy-Misra distributed-simulation algorithm to simulate networks of queues is presented. Parameters of the study include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Earth Sciences Division; Zhang, Keni

TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator is to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on TOUGH2 Version 1.4 with the EOS3, EOS9, and T2R3D modules, software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4.
This report provides a quick starting guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the standard version of the TOUGH2 code. The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, and the mathematical and numerical methods used. To familiarize users with the parallel code, illustrative sample problems are presented.
Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2

NASA Technical Reports Server (NTRS)

Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad

1995-01-01

The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers is documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that the PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs to match the actual costs relative to changes in the number of grid points. By increasing the number of processors, slower than linear speedups are achieved with optimized (machine-dependent library) routines. This slower than linear speedup results because the computational cost is dominated by the FFT routine, which yields less than ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 Mflops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32 nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32-node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: Spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.
Parallel-plate transmission line type of EMP simulators: Systematic review and recommendations

NASA Astrophysics Data System (ADS)

Giri, D. V.; Liu, T. K.; Tesche, F. M.; King, R. W. P.

1980-05-01

This report presents various aspects of the two-parallel-plate transmission line type of EMP simulator. Much of the work is the result of research efforts conducted during the last two decades at the Air Force Weapons Laboratory, and in industries/universities as well. The principal features of individual simulator components are discussed. The report also emphasizes that it is imperative to hybridize our understanding of individual components so that we can draw meaningful conclusions of simulator performance as a whole.
MMS Observations and Hybrid Simulations of Surface Ripples at a Marginally Quasi-Parallel Shock

NASA Astrophysics Data System (ADS)

Gingell, Imogen; Schwartz, Steven J.; Burgess, David; Johlander, Andreas; Russell, Christopher T.; Burch, James L.; Ergun, Robert E.; Fuselier, Stephen; Gershman, Daniel J.; Giles, Barbara L.; Goodrich, Katherine A.; Khotyaintsev, Yuri V.; Lavraud, Benoit; Lindqvist, Per-Arne; Strangeway, Robert J.; Trattner, Karlheinz; Torbert, Roy B.; Wei, Hanying; Wilder, Frederick

2017-11-01

Simulations and observations of collisionless shocks have shown that deviations of the nominal local shock normal orientation, that is, surface waves or ripples, are expected to propagate in the ramp and overshoot of quasi-perpendicular shocks. Here we identify signatures of a surface ripple propagating during a crossing of Earth's marginally quasi-parallel (θBn ~ 45°) or quasi-parallel bow shock on 27 November 2015 06:01:44 UTC by the Magnetospheric Multiscale (MMS) mission and determine the ripple's properties using multispacecraft methods. Using two-dimensional hybrid simulations, we confirm that surface ripples are a feature of marginally quasi-parallel and quasi-parallel shocks under the observed solar wind conditions. In addition, since these marginally quasi-parallel and quasi-parallel shocks are expected to undergo a cyclic reformation of the shock front, we discuss the impact of multiple sources of nonstationarity on shock structure. Importantly, ripples are shown to be transient phenomena, developing faster than an ion gyroperiod and only during the period of the reformation cycle when a newly developed shock ramp is unaffected by turbulence in the foot. We conclude that the change in properties of the ripple observed by MMS is consistent with the reformation of the shock front over a time scale of an ion gyroperiod.
Parallel Transport with Sheath and Collisional Effects in Global Electrostatic Turbulent Transport in FRCs

NASA Astrophysics Data System (ADS)

Bao, Jian; Lau, Calvin; Kuley, Animesh; Lin, Zhihong; Fulton, Daniel; Tajima, Toshiki; Tri Alpha Energy, Inc. Team

2017-10-01

Collisional and turbulent transport in a field reversed configuration (FRC) is studied in global particle simulation by using GTC (gyrokinetic toroidal code). The global FRC geometry is incorporated in GTC by using a field-aligned mesh in cylindrical coordinates, which enables global simulation coupling core and scrape-off layer (SOL) across the separatrix. Furthermore, fully kinetic ions are implemented in GTC to treat the magnetic-null point in the FRC core. Both global simulation coupling core and SOL regions and independent SOL region simulation have been carried out to study turbulence. In this work, the "logical sheath boundary condition" is implemented to study parallel transport in the SOL. This method helps to relax time and spatial steps without resolving electron plasma frequency and Debye length, which enables turbulent transport simulation with sheath effects. We will study collisional and turbulent SOL parallel transport with mirror geometry and sheath boundary condition in the C2-W divertor.
Simulated parallel annealing within a neighborhood for optimization of biomechanical systems.

PubMed

Higginson, J S; Neptune, R R; Anderson, F C

2005-09-01

Optimization problems for biomechanical systems have become extremely complex. Simulated annealing (SA) algorithms have performed well in a variety of test problems and biomechanical applications; however, despite advances in computer speed, convergence to optimal solutions for systems of even moderate complexity has remained prohibitive. The objective of this study was to develop a portable parallel version of a SA algorithm for solving optimization problems in biomechanics. The algorithm for simulated parallel annealing within a neighborhood (SPAN) was designed to minimize interprocessor communication time and closely retain the heuristics of the serial SA algorithm. The computational speed of the SPAN algorithm scaled linearly with the number of processors on different computer platforms for a simple quadratic test problem and for a more complex forward dynamic simulation of human pedaling.
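For orientation, a minimal serial simulated annealing loop on a quadratic test function might look like the sketch below. It is a generic textbook form, not the SPAN algorithm: the geometric cooling schedule, the step size, and all function and parameter names are assumptions made for illustration, and the interprocessor parallelization described in the abstract is not shown.

```python
import math
import random


def simulated_annealing(cost, x0, step=0.5, t0=1.0, cooling=0.95, iters=2000, seed=0):
    """Generic serial SA: perturb one coordinate, accept by the Metropolis rule."""
    rng = random.Random(seed)
    x, fx = list(x0), cost(x0)
    best, fbest = list(x), fx
    t = t0
    for _ in range(iters):
        # Propose a move inside a neighborhood of the current point.
        cand = list(x)
        i = rng.randrange(len(cand))
        cand[i] += rng.uniform(-step, step)
        fc = cost(cand)
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = list(x), fx
        # Geometric cooling (assumed schedule), floored to avoid division by zero.
        t = max(t * cooling, 1e-12)
    return best, fbest


def quadratic(x):
    """Simple quadratic test problem with its minimum at the origin."""
    return sum(v * v for v in x)


best, fbest = simulated_annealing(quadratic, [3.0, -2.0])
print(best, fbest)
```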
Simulation of Hypervelocity Impact on Aluminum-Nextel-Kevlar Orbital Debris Shields

NASA Technical Reports Server (NTRS)

Fahrenthold, Eric P.

2000-01-01

An improved hybrid particle-finite element method has been developed for hypervelocity impact simulation. The method combines the general contact-impact capabilities of particle codes with the true Lagrangian kinematics of large strain finite element formulations. Unlike some alternative schemes which couple Lagrangian finite element models with smooth particle hydrodynamics, the present formulation makes no use of slidelines or penalty forces. The method has been implemented in a parallel, three dimensional computer code. Simulations of three dimensional orbital debris impact problems using this parallel hybrid particle-finite element code show good agreement with experiment and good speedup in parallel computation. The simulations included single and multi-plate shields as well as aluminum and composite shielding materials, at an impact velocity of eleven kilometers per second.
Parallel Grand Canonical Monte Carlo (ParaGrandMC) Simulation Code

NASA Technical Reports Server (NTRS)

Yamakov, Vesselin I.

2016-01-01

This report provides an overview of the Parallel Grand Canonical Monte Carlo (ParaGrandMC) simulation code. This is a highly scalable parallel FORTRAN code for simulating the thermodynamic evolution of metal alloy systems at the atomic level, and predicting the thermodynamic state, phase diagram, chemical composition and mechanical properties. The code is designed to simulate multi-component alloy systems, predict solid-state phase transformations such as austenite-martensite transformations, precipitate formation, recrystallization, capillary effects at interfaces, surface absorption, etc., which can aid the design of novel metallic alloys. While the software is mainly tailored for modeling metal alloys, it can also be used for other types of solid-state systems, and to some degree for liquid or gaseous systems, including multiphase systems forming solid-liquid-gas interfaces.
An Empirical Development of Parallelization Guidelines for Time-Driven Simulation

DTIC Science & Technology

1989-12-01

wives, who though not Cub fans, put on a good show during our trip, to watch some games. I would also like to recognize the help of my professors at...program parallelization. In this research effort a Ballistic Missile Defense (BMD) time driven simulation program, developed by DESE Research and...continuously, or continuously with discrete changes superimposed. The distinguishing feature of these simulations is the interaction between discretely
Xyce Parallel Electronic Simulator : reference guide, version 2.0.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.
Predicting Protein Structure Using Parallel Genetic Algorithms.

DTIC Science & Technology

1994-12-01

Molecular dynamics attempts to simulate the protein folding process. However, the time steps required for this simulation are on the order of one...harmonics. These two factors have limited molecular dynamics simulations to less than a few nanoseconds (10^-9 sec), even on today’s fastest supercomputers...
Xyce™ Parallel Electronic Simulator Reference Guide Version 6.8

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

VizieR Online Data Catalog: Solar wind 3D magnetohydrodynamic simulation (Chhiber+, 2017)

NASA Astrophysics Data System (ADS)

Chhiber, R.; Subedi, P.; Usmanov, A. V.; Matthaeus, W. H.; Ruffolo, D.; Goldstein, M. L.; Parashar, T. N.

2017-08-01

We use a three-dimensional magnetohydrodynamic simulation of the solar wind to calculate cosmic-ray diffusion coefficients throughout the inner heliosphere (2 R☉-3 au). The simulation resolves large-scale solar wind flow, which is coupled to small-scale fluctuations through a turbulence model. Simulation results specify background solar wind fields and turbulence parameters, which are used to compute diffusion coefficients and study their behavior in the inner heliosphere. The parallel mean free path (mfp) is evaluated using quasi-linear theory, while the perpendicular mfp is determined from nonlinear guiding center theory with the random ballistic interpretation. Several runs examine varying turbulent energy and different solar source dipole tilts. We find that for most of the inner heliosphere, the radial mfp is dominated by diffusion parallel to the mean magnetic field; the parallel mfp remains at least an order of magnitude larger than the perpendicular mfp, except in the heliospheric current sheet, where the perpendicular mfp may be a few times larger than the parallel mfp. In the ecliptic region, the perpendicular mfp may influence the radial mfp at heliocentric distances larger than 1.5 au; our estimations of the parallel mfp in the ecliptic region at 1 au agree well with the Palmer "consensus" range of 0.08-0.3 au. Solar activity increases perpendicular diffusion and reduces parallel diffusion. The parallel mfp mostly varies with rigidity (P) as P^0.33, and the perpendicular mfp is weakly dependent on P. The mfps are weakly influenced by the choice of long-wavelength power spectra. (2 data files).
Smoldyn on graphics processing units: massively parallel Brownian dynamics simulations.

PubMed

Dematté, Lorenzo

2012-01-01

Space is a very important aspect in the simulation of biochemical systems; recently, the need for simulation algorithms able to cope with space is becoming more and more compelling. Complex and detailed models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transportation phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation can be time consuming, especially if we want to capture the system's behavior in a reliable way using stochastic methods in conjunction with a high spatial resolution. In order to deliver on the promise made by systems biology of understanding a system as a whole, we need to scale up the size of models we are able to simulate, moving from sequential to parallel simulation algorithms. In this paper, we analyze Smoldyn, a widely used algorithm for stochastic simulation of chemical reactions with spatial resolution and single molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of Graphics Processing Units (GPUs). The implementation executes the most computationally demanding steps (computation of diffusion, unimolecular, and bimolecular reaction, as well as the most common cases of molecule-surface interaction) on the GPU, computing them in parallel on each molecule of the system. The implementation offers good speed-ups and real time, high quality graphics output.
Ion and Electron Energization in Guide Field Reconnection Outflows with Kinetic Riemann Simulations and Parallel Shock Simulations

NASA Astrophysics Data System (ADS)

Zhang, Q.; Drake, J. F.; Swisdak, M.

2017-12-01

How ions and electrons are energized in magnetic reconnection outflows is an essential topic throughout the heliosphere. Here we carry out guide field PIC Riemann simulations to explore the ion and electron energization mechanisms far downstream of the x-line. Riemann simulations, with their simple magnetic geometry, facilitate the study of the reconnection outflow far downstream of the x-line in much more detail than is possible with conventional reconnection simulations. We find that the ions get accelerated at rotational discontinuities, counter stream, and give rise to two slow shocks. We demonstrate that the energization mechanism at the slow shocks is essentially the same as that of parallel electrostatic shocks. Also, the electron confining electric potential at the slow shocks is driven by the counterstreaming beams, which tend to break the quasi-neutrality. Based on this picture, we build a kinetic model to self-consistently predict the downstream ion and electron temperatures. Additional explorations using parallel shock simulations also imply that in a very low beta (0.001-0.01 for a modest guide field) regime, electron energization will be insignificant compared to the ion energization. Our model and the parallel shock simulations might be used as simple tools to understand and estimate the energization of ions and electrons and the energy partition far downstream of the x-line.
Parallel Tensor Compression for Large-Scale Scientific Data.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kolda, Tamara G.; Ballard, Grey; Austin, Woody Nathan

As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed-memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
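The 8 TB figure in this abstract can be checked with a few lines of arithmetic. The sketch below assumes 8 bytes per stored value (double precision), which the abstract does not state; the helper name `tensor_bytes` is illustrative.

```python
def tensor_bytes(dims, bytes_per_value=8):
    """Storage for a dense tensor: product of the mode sizes times bytes per entry."""
    count = 1
    for d in dims:
        count *= d
    return count * bytes_per_value


# Five-way tensor: 512 x 512 x 512 grid points, 64 variables, 128 time steps.
dims = (512, 512, 512, 64, 128)
size = tensor_bytes(dims)                 # assumes 8-byte doubles
size_tib = size / 2**40                   # binary terabytes
compressed = size / 10_000                # at the quoted best-case compression ratio

print(f"dense size: {size_tib} TiB")
print(f"compressed at 10000:1: about {compressed / 1e9:.2f} GB")
```

With these assumptions the dense tensor occupies exactly 2^43 bytes, and a compression ratio of 10000 would bring it below one gigabyte.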
Synchronous parallel system for emulation and discrete event simulation

NASA Technical Reports Server (NTRS)

Steinman, Jeffrey S. (Inventor)

1992-01-01

A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to state variables of the simulation object attributable to the event object, and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring the events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.
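The two horizon definitions in this abstract can be written down directly. The sketch below covers only that step: each node takes the earliest time stamp among its newly generated events as its local event horizon, and the earliest local horizon across nodes is the global event horizon. Function names and the list-of-lists layout are assumptions for illustration; message handling and state restoral are not modeled.

```python
def local_event_horizon(new_event_times):
    """Earliest time stamp among the new events generated on one node."""
    return min(new_event_times) if new_event_times else float("inf")


def global_event_horizon(per_node_new_event_times):
    """Earliest local event horizon over all nodes."""
    return min(
        (local_event_horizon(times) for times in per_node_new_event_times),
        default=float("inf"),
    )


# Hypothetical time stamps of newly generated events on three nodes.
nodes = [[12.5, 9.0, 15.2], [11.0, 14.0], []]
local_horizons = [local_event_horizon(times) for times in nodes]
geh = global_event_horizon(nodes)
print(local_horizons, geh)
```

A node that generated no new events is given an infinite local horizon here so that it never constrains the global minimum; that convention is a design choice of this sketch.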
GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation.

PubMed

Hess, Berk; Kutzner, Carsten; van der Spoel, David; Lindahl, Erik

2008-03-01

Molecular simulation is an extremely useful, but computationally very expensive tool for studies of chemical and biomolecular systems. Here, we present a new implementation of our molecular simulation toolkit GROMACS which now both achieves extremely high performance on single processors from algorithmic optimizations and hand-coded routines and simultaneously scales very well on parallel machines. The code encompasses a minimal-communication domain decomposition algorithm, full dynamic load balancing, a state-of-the-art parallel constraint solver, and efficient virtual site algorithms that allow removal of hydrogen atom degrees of freedom to enable integration time steps up to 5 fs for atomistic simulations also in parallel. To improve the scaling properties of the common particle mesh Ewald electrostatics algorithms, we have in addition used a Multiple-Program, Multiple-Data approach, with separate node domains responsible for direct and reciprocal space interactions. Not only does this combination of algorithms enable extremely long simulations of large systems, but it also provides that simulation performance on quite modest numbers of standard cluster nodes.
Synchronous Parallel System for Emulation and Discrete Event Simulation

NASA Technical Reports Server (NTRS)

Steinman, Jeffrey S. (Inventor)

2001-01-01

A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to the state variables of the simulation object attributable to the event object and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.
Nanodroplet impact onto solid platinum surface: Spreading and bouncing

NASA Astrophysics Data System (ADS)

Lussier, Daniel; Ventikos, Yiannis

2009-11-01

The impact of droplets onto solid surfaces is found in a huge variety of natural and technological applications, from rain drops splashing on the pavement, to material manufacturing by molten droplet deposition. Taking inspiration from existing microfluidic technologies (i.e. lab-on-chip), there is increasing interest in the use of nanodroplets (D < 100 nm) for a number of applications such as drug delivery and semiconductor device manufacturing. However, as the size of the droplet is reduced into the nanoscale, the direct use of previously obtained macroscopic results is not guaranteed. At the nanoscale, important effects due to the molecular nature of the fluid, thermal fluctuations and reduced dimensionality can play a critical role in determining system dynamics. In this paper we present the results of large-scale, fully atomistic, three-dimensional molecular dynamics (MD) simulation of an argon nanodroplet (D = 18 nm, 54 000 atoms) impact onto a solid platinum surface, using the LAMMPS software package. The fluid argon is modeled using the well-known Lennard-Jones (LJ) potential, while the embedded-atom model (EAM) potential is used for the solid platinum. By varying both the impact velocities (10-1000 m/s) and the wettability of the solid surface a wide range of impact behaviors is observed, from smooth spreading, to bouncing recoil, pointing towards a wide array of potential applications.
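The Lennard-Jones pair potential named in this abstract has the standard 12-6 form V(r) = 4ε[(σ/r)^12 − (σ/r)^6]. The sketch below evaluates it in reduced units (ε = σ = 1), which is an assumption made here; the paper's argon parameters, cutoff, and LAMMPS input are not reproduced.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lennard_jones_force(r, epsilon=1.0, sigma=1.0):
    """Radial force -dV/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


# The potential crosses zero at r = sigma and has its minimum (-epsilon) at r = 2^(1/6) sigma.
r_min = 2.0 ** (1.0 / 6.0)
print(lennard_jones(1.0), lennard_jones(r_min), lennard_jones_force(r_min))
```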
Cold welding of gold nanoparticles on mica substrate: Self-adjustment and enhanced diffusion

PubMed Central

Cha, Song-Hyun; Park, Youmie; Han, Jeong Woo; Kim, Kyeounghak; Kim, Hyun-Seok; Jang, Hong-Lae; Cho, Seonho

2016-01-01

From the images of HR-TEM, FE-SEM, and AFM, the cold welding of gold nanoparticles (AuNPs) on a mica substrate is observed. The cold-welded gold nanoparticles of 25 nm diameter are found on the mica substrate in AFM measurement, whereas the size of cold welding is limited to 10 nm for nanowires and 2~3 nm for nanofilms. In contrast to the nanowires, which require pressure, the AuNPs are able to rotate freely due to the attractive forces from the mica substrate, and thus the cold welding proceeds by adjusting lattice structures. The gold nanoparticles on the mica substrate are numerically modeled, and their physical characteristics are obtained from molecular dynamics simulations in LAMMPS. The potential and kinetic energies of AuNPs on the mica substrate provide sufficient energy to overcome the diffusion barrier of gold atoms. After the cold welding, the regularity of lattice structure is maintained since the rotation of AuNPs is allowed due to the presence of the mica substrate. It turns out that the growth of AuNPs can be controlled arbitrarily and the welded region is nearly perfect and provides the same crystal orientation and strength as the rest of the nanostructures. PMID:27597438
Cold welding of gold nanoparticles on mica substrate: Self-adjustment and enhanced diffusion

NASA Astrophysics Data System (ADS)

Cha, Song-Hyun; Park, Youmie; Han, Jeong Woo; Kim, Kyeounghak; Kim, Hyun-Seok; Jang, Hong-Lae; Cho, Seonho

2016-09-01

From the images of HR-TEM, FE-SEM, and AFM, the cold welding of gold nanoparticles (AuNPs) on a mica substrate is observed. The cold-welded gold nanoparticles of 25 nm diameter are found on the mica substrate in AFM measurement, whereas the size of cold welding is limited to 10 nm for nanowires and 2~3 nm for nanofilms. In contrast to the nanowires, which require pressure, the AuNPs are able to rotate freely due to the attractive forces from the mica substrate, and thus the cold welding proceeds by adjusting lattice structures. The gold nanoparticles on the mica substrate are numerically modeled, and their physical characteristics are obtained from molecular dynamics simulations in LAMMPS. The potential and kinetic energies of AuNPs on the mica substrate provide sufficient energy to overcome the diffusion barrier of gold atoms. After the cold welding, the regularity of lattice structure is maintained since the rotation of AuNPs is allowed due to the presence of the mica substrate. It turns out that the growth of AuNPs can be controlled arbitrarily and the welded region is nearly perfect and provides the same crystal orientation and strength as the rest of the nanostructures.
Cold welding of gold nanoparticles on mica substrate: Self-adjustment and enhanced diffusion.

PubMed

Cha, Song-Hyun; Park, Youmie; Han, Jeong Woo; Kim, Kyeounghak; Kim, Hyun-Seok; Jang, Hong-Lae; Cho, Seonho

2016-09-06

From the images of HR-TEM, FE-SEM, and AFM, the cold welding of gold nanoparticles (AuNPs) on a mica substrate is observed. The cold-welded gold nanoparticles of 25 nm diameter are found on the mica substrate in AFM measurement, whereas the size of cold welding is limited to 10 nm for nanowires and 2~3 nm for nanofilms. In contrast to the nanowires, which require pressure, the AuNPs are able to rotate freely due to the attractive forces from the mica substrate, and thus the cold welding proceeds by adjusting lattice structures. The gold nanoparticles on the mica substrate are numerically modeled, and their physical characteristics are obtained from molecular dynamics simulations in LAMMPS. The potential and kinetic energies of AuNPs on the mica substrate provide sufficient energy to overcome the diffusion barrier of gold atoms. After the cold welding, the regularity of lattice structure is maintained since the rotation of AuNPs is allowed due to the presence of the mica substrate. It turns out that the growth of AuNPs can be controlled arbitrarily and the welded region is nearly perfect and provides the same crystal orientation and strength as the rest of the nanostructures.
Graphene and its elemental analogue: A molecular dynamics view of fracture phenomenon

NASA Astrophysics Data System (ADS)

Rakib, Tawfiqur; Mojumder, Satyajit; Das, Sourav; Saha, Sourav; Motalab, Mohammad

2017-06-01

Graphene and some graphene-like two-dimensional materials, hexagonal boron nitride (hBN) and silicene, have unique mechanical properties which severely limit the suitability of conventional theories used for common brittle and ductile materials to predict the fracture response of these materials. This study reveals the fracture response of graphene, hBN and silicene nanosheets under different tiny crack lengths by molecular dynamics (MD) simulations using LAMMPS. The useful strength of these two-dimensional materials is determined by their fracture toughness. Our study shows a comparative analysis of mechanical properties among the elemental analogues of graphene and suggests that hBN can be a good substitute for graphene in terms of mechanical properties. We have also found that the pre-cracked sheets fail in a brittle manner and their failure is governed by the strength of the atomic bonds at the crack tip. The MD prediction of fracture toughness differs significantly from the fracture toughness determined by Griffith's theory of brittle failure, which restricts the applicability of Griffith's criterion for these materials in the case of nano-cracks. Moreover, the strengths measured in the armchair and zigzag directions of nanosheets of these materials imply that the bonds in the armchair direction have a stronger capability to resist crack propagation compared to the zigzag direction.
Parallelization of Program to Optimize Simulated Trajectories (POST3D)

NASA Technical Reports Server (NTRS)

Hammond, Dana P.; Korte, John J. (Technical Monitor)

2001-01-01

This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) approach on a distributed memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.
Design of a real-time wind turbine simulator using a custom parallel architecture

NASA Technical Reports Server (NTRS)

Hoffman, John A.; Gluck, R.; Sridhar, S.

1995-01-01

The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is very much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an IO operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's are interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors which usually have a throughput limit because of rigid bus architecture.
Accelerating Wright–Fisher Forward Simulations on the Graphics Processing Unit

PubMed Central

Lawrie, David S.

2017-01-01

Forward Wright–Fisher simulations are powerful in their ability to model complex demography and selection scenarios, but suffer from slow execution on the Central Processor Unit (CPU), thus limiting their usefulness. However, the single-locus Wright–Fisher forward algorithm is exceedingly parallelizable, with many steps that are so-called “embarrassingly parallel,” consisting of a vast number of individual computations that are all independent of each other and thus capable of being performed concurrently. The rise of modern Graphics Processing Units (GPUs) and programming languages designed to leverage the inherent parallel nature of these processors have allowed researchers to dramatically speed up many programs that have such high arithmetic intensity and intrinsic concurrency. The presented GPU Optimized Wright–Fisher simulation, or “GO Fish” for short, can be used to simulate arbitrary selection and demographic scenarios while running over 250-fold faster than its serial counterpart on the CPU. Even modest GPU hardware can achieve an impressive speedup of over two orders of magnitude. With simulations so accelerated, one can not only do quick parametric bootstrapping of previously estimated parameters, but also use simulated results to calculate the likelihoods and summary statistics of demographic and selection models against real polymorphism data, all without restricting the demographic and selection scenarios that can be modeled or requiring approximations to the single-locus forward algorithm for efficiency. Further, as many of the parallel programming techniques used in this simulation can be applied to other computationally intensive algorithms important in population genetics, GO Fish serves as an exciting template for future research into accelerating computation in evolution. GO Fish is part of the Parallel PopGen Package available at: http://dl42.github.io/ParallelPopGen/. PMID:28768689
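To make the "embarrassingly parallel" structure concrete, the sketch below runs a single-locus forward Wright–Fisher simulation independently for many loci on the CPU. It is not the GO Fish implementation: the haploid selection update, the Bernoulli-sum binomial draw, and every name and parameter value are assumptions chosen for a short, standard-library-only illustration. Each locus depends only on its own state and random stream, which is the property a GPU version would exploit.

```python
import random


def wright_fisher_step(count, n_chrom, s, rng):
    """One generation: deterministic selection, then binomial sampling (drift)."""
    p = count / n_chrom
    # Assumed haploid (genic) selection with coefficient s > -1.
    p_sel = p * (1.0 + s) / (1.0 + p * s)
    # Binomial(n_chrom, p_sel) drawn as a sum of Bernoulli trials.
    return sum(rng.random() < p_sel for _ in range(n_chrom))


def simulate_locus(count0, n_chrom, generations, s=0.0, seed=0):
    """Follow one locus forward in time until loss, fixation, or the last generation."""
    rng = random.Random(seed)
    count = count0
    for _ in range(generations):
        count = wright_fisher_step(count, n_chrom, s, rng)
        if count in (0, n_chrom):
            break
    return count


# Loci are independent, so this loop could be distributed one locus per thread.
final_counts = [simulate_locus(20, 200, 50, seed=k) for k in range(100)]
print(sum(final_counts) / len(final_counts))
```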
On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods

PubMed Central

Lee, Anthony; Yau, Christopher; Giles, Michael B.; Doucet, Arnaud; Holmes, Christopher C.

2011-01-01

We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276
Scalability study of parallel spatial direct numerical simulation code on IBM SP1 parallel supercomputer

NASA Technical Reports Server (NTRS)

Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad

1994-01-01

The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent transition in three-dimensional boundary-layer flows are computed with the PSDNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.
A Large Deviations Analysis of Certain Qualitative Properties of Parallel Tempering and Infinite Swapping Algorithms

DOE PAGES

Doll, J.; Dupuis, P.; Nyquist, P.

2017-02-08

Parallel tempering, or replica exchange, is a popular method for simulating complex systems. The idea is to run parallel simulations at different temperatures, and at a given swap rate exchange configurations between the parallel simulations. From the perspective of large deviations it is optimal to let the swap rate tend to infinity and it is possible to construct a corresponding simulation scheme, known as infinite swapping. In this paper we propose a novel use of large deviations for empirical measures for a more detailed analysis of the infinite swapping limit in the setting of continuous time jump Markov processes. Using the large deviations rate function and associated stochastic control problems we consider a diagnostic based on temperature assignments, which can be easily computed during a simulation. We show that the convergence of this diagnostic to its a priori known limit is a necessary condition for the convergence of infinite swapping. The rate function is also used to investigate the impact of asymmetries in the underlying potential landscape, and where in the state space poor sampling is most likely to occur.
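The exchange step of parallel tempering can be sketched as follows. This shows only the textbook discrete-time Metropolis swap between neighbouring replicas, not the continuous-time or infinite-swapping schemes analysed in the paper; the names `swap_probability` and `attempt_swaps` are hypothetical, and `betas` are assumed to be inverse temperatures.

```python
import math
import random

def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging two replicas.

    Standard rule: min(1, exp((beta_i - beta_j) * (E_i - E_j))).
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_swaps(betas, energies, configs, rng):
    """Sweep once over neighbouring temperature pairs, swapping in place.

    Returns the number of accepted exchanges.
    """
    accepted = 0
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], betas[k + 1], energies[k], energies[k + 1])
        if rng.random() < p:
            # Configurations (and their energies) trade temperatures.
            configs[k], configs[k + 1] = configs[k + 1], configs[k]
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            accepted += 1
    return accepted

if __name__ == "__main__":
    rng = random.Random(0)
    betas = [2.0, 1.0, 0.5]
    energies = [5.0, 3.0, 4.0]
    configs = ["a", "b", "c"]
    print(attempt_swaps(betas, energies, configs, rng), configs)
```

Raising the frequency of such sweeps relative to the within-temperature dynamics is the limit in which infinite swapping arises.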
Parallel programming with Easy Java Simulations

NASA Astrophysics Data System (ADS)

Esquembre, F.; Christian, W.; Belloni, M.

2018-01-01

Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we lower the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.
A parallel computational model for GATE simulations.

PubMed

Rannou, F R; Vega-Acevedo, N; El Bitar, Z

2013-12-01

GATE/Geant4 Monte Carlo simulations are computationally demanding applications, requiring thousands of processor hours to produce realistic results. The classical strategy of distributing the simulation of individual events does not apply efficiently for Positron Emission Tomography (PET) experiments, because it requires a centralized coincidence processing and large communication overheads. We propose a parallel computational model for GATE that handles event generation and coincidence processing in a simple and efficient way by decentralizing event generation and processing but maintaining a centralized event and time coordinator. The model is implemented with the inclusion of a new set of factory classes that can run the same executable in sequential or parallel mode. A Mann-Whitney test shows that the output produced by this parallel model in terms of number of tallies is equivalent (but not equal) to its sequential counterpart. Computational performance evaluation shows that the software is scalable and well balanced. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

Global MHD simulation of magnetosphere using HPF

NASA Astrophysics Data System (ADS)

Ogino, T.

We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer; the MHD code was fully vectorized and fully parallelized in VPP Fortran. The overall performance and capability of the HPF MHD code were shown to be almost comparable to those of the VPP Fortran code. A 3-dimensional global MHD simulation of the Earth's magnetosphere was performed at a speed of over 400 Gflops with an efficiency of 76.5% using 56 PEs of the Fujitsu VPP5000/56 in vector and parallel computation, which permitted comparison with catalog values. We have concluded that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
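As a rough consistency check on the figures quoted in this abstract, and assuming the 76.5% efficiency is measured against the aggregate peak rate of all 56 PEs (an interpretation the abstract does not state explicitly), the implied peak rates are:

```latex
\[
P_{\text{peak}} \approx \frac{400\ \text{Gflops}}{0.765} \approx 523\ \text{Gflops},
\qquad
\frac{P_{\text{peak}}}{56\ \text{PEs}} \approx 9.3\ \text{Gflops per PE}.
\]
```

Because the abstract reports a speed of "over" 400 Gflops, both values should be read as lower bounds.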
Parallel algorithms for modeling flow in permeable media. Annual report, February 15, 1995 - February 14, 1996

DOE Office of Scientific and Technical Information (OSTI.GOV)

G.A. Pope; K. Sephernoori; D.C. McKinney

1996-03-15

This report describes the application of distributed-memory parallel programming techniques to a compositional simulator called UTCHEM. The University of Texas Chemical Flooding reservoir simulator (UTCHEM) is a general-purpose vectorized chemical flooding simulator that models the transport of chemical species in three-dimensional, multiphase flow through permeable media. The parallel version of UTCHEM addresses solving large-scale problems by reducing the amount of time that is required to obtain the solution as well as providing a flexible and portable programming environment. In this work, the original parallel version of UTCHEM was modified and ported to CRAY T3D and CRAY T3E distributed-memory, multiprocessor computers using CRAY-PVM as the interprocessor communication library. Also, the data communication routines were modified such that the portability of the original code across different computer architectures was made possible.
Parallel Simulation of Unsteady Turbulent Flames

NASA Technical Reports Server (NTRS)

Menon, Suresh

1996-01-01

Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, their high cost and limited availability make practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable. This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used.
Recently, a new model for turbulent combustion was developed, in which the combustion is modeled, within the subgrid (small-scales) using a methodology that simulates the mixing and the molecular transport and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure and this approach actually provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion and chemical kinetics and, therefore, within each grid cell, a significant amount of computations must be carried out before the large-scale (LES resolved) effects are incorporated. Therefore, this approach is uniquely suited for parallel processing and has been implemented on various systems such as: Intel Paragon, IBM SP-2, Cray T3D and SGI Power Challenge (PC) using the system independent Message Passing Interface (MPI) compiler. In this paper, timing data on these machines is reported along with some characteristic results.
Visualization Co-Processing of a CFD Simulation

NASA Technical Reports Server (NTRS)

Vaziri, Arsi

1999-01-01

OVERFLOW, a widely used CFD simulation code, is combined with a visualization system, pV3, to experiment with an environment for simulation/visualization co-processing on an SGI Origin 2000 computer (O2K) system. The shared memory version of the solver is used with the O2K 'pfa' preprocessor invoked to automatically discover parallelism in the source code. No other explicit parallelism is enabled. In order to study the scaling and performance of the visualization co-processing system, sample runs are made with different processor groups in the range of 1 to 254 processors. The data exchange between the visualization system and the simulation system is rapid enough for user interactivity when the problem size is small. This shared memory version of OVERFLOW, with minimal parallelization, does not scale well to an increasing number of available processors. The visualization task takes about 18 to 30% of the total processing time and does not appear to be a major contributor to the poor scaling. Improper load balancing and inter-processor communication overhead are contributors to this poor performance. Work is in progress which is aimed at obtaining improved parallel performance of the solver and removing the limitations of serial data transfer to pV3 by examining various parallelization/communication strategies, including the use of explicit message passing.
Testing for carryover effects after cessation of treatments: a design approach.

PubMed

Sturdevant, S Gwynn; Lumley, Thomas

2016-08-02

Recently, trials addressing noisy measurements with diagnosis occurring by exceeding thresholds (such as diabetes and hypertension) have been published which attempt to measure carryover, the impact that treatment has on an outcome after cessation. The design of these trials has been criticised and simulations have been conducted which suggest that the parallel designs used are not adequate to test this hypothesis; two solutions are that either a differing parallel design or a cross-over design could allow for diagnosis of carryover. We undertook a systematic simulation study to determine the ability of a cross-over or a parallel-group trial design to detect carryover effects on incident hypertension in a population with prehypertension. We simulated blood pressure and focused on varying criteria to diagnose systolic hypertension. Using the difference in cumulative incidence of hypertension to analyse parallel-group or cross-over trials resulted in none of the designs having an acceptable Type I error rate. Under the null hypothesis of no carryover the difference is well above the nominal 5% error rate. When a treatment is effective during the intervention period, reliable testing for a carryover effect is difficult. Neither parallel-group nor cross-over designs using the difference in cumulative incidence appear to be a feasible approach. Future trials should ensure their design and analysis is validated by simulation.
Xyce parallel electronic simulator reference guide, Version 6.0.1.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

2014-01-01

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].
Xyce parallel electronic simulator reference guide, version 6.0.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

2013-08-01

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].
Parallel Monte Carlo transport modeling in the context of a time-dependent, three-dimensional multi-physics code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Procassini, R.J.

1997-12-31

The fine-scale, multi-space resolution that is envisioned for accurate simulations of complex weapons systems in three spatial dimensions implies flop-rate and memory-storage requirements that will only be obtained in the near future through the use of parallel computational techniques. Since the Monte Carlo transport models in these simulations usually stress both of these computational resources, they are prime candidates for parallelization. The MONACO Monte Carlo transport package, which is currently under development at LLNL, will utilize two types of parallelism within the context of a multi-physics design code: decomposition of the spatial domain across processors (spatial parallelism) and distribution of particles in a given spatial subdomain across additional processors (particle parallelism). This implementation of the package will utilize explicit data communication between domains (message passing). Such a parallel implementation of a Monte Carlo transport model will result in non-deterministic communication patterns. The communication of particles between subdomains during a Monte Carlo time step may require a significant level of effort to achieve a high parallel efficiency.
A Generic Mesh Data Structure with Parallel Applications

ERIC Educational Resources Information Center

Cochran, William Kenneth, Jr.

2009-01-01

High performance, massively-parallel multi-physics simulations are built on efficient mesh data structures. Most data structures are designed from the bottom up, focusing on the implementation of linear algebra routines. In this thesis, we explore a top-down approach to design, evaluating the various needs of many aspects of simulation, not just…
A three-dimensional spectral algorithm for simulations of transition and turbulence

NASA Technical Reports Server (NTRS)

Zang, T. A.; Hussaini, M. Y.

1985-01-01

A spectral algorithm for simulating three dimensional, incompressible, parallel shear flows is described. It applies to the channel, to the parallel boundary layer, and to other shear flows with one wall bounded and two periodic directions. Representative applications to the channel and to the heated boundary layer are presented.
Parallel Performance of a Combustion Chemistry Simulation

DOE PAGES

Skinner, Gregg; Eigenmann, Rudolf

1995-01-01

We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.
On Parallelizing Single Dynamic Simulation Using HPC Techniques and APIs of Commercial Software

DOE Office of Scientific and Technical Information (OSTI.GOV)

Diao, Ruisheng; Jin, Shuangshuang; Howell, Frederic

Time-domain simulations are heavily used in today’s planning and operation practices to assess power system transient stability and post-transient voltage/frequency profiles following severe contingencies to comply with industry standards. Because of the increased modeling complexity, state-of-the-art commercial packages run several times slower than real time when completing a dynamic simulation for a large-scale model. With the growing stochastic behavior introduced by emerging technologies, the power industry has seen a growing need for performing security assessment in real time. This paper presents a parallel implementation framework to speed up a single dynamic simulation by leveraging the existing stability model library in commercial tools through their application programming interfaces (APIs). Several high performance computing (HPC) techniques are explored such as parallelizing the calculation of generator current injection, identifying fast linear solvers for network solution, and parallelizing data outputs when interacting with APIs in the commercial package, TSAT. The proposed method has been tested on a WECC planning base case with detailed synchronous generator models and exhibits outstanding scalable performance with sufficient accuracy.
The 2nd Symposium on the Frontiers of Massively Parallel Computations

NASA Technical Reports Server (NTRS)

Mills, Ronnie (Editor)

1988-01-01

Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
Parallelization of a Monte Carlo particle transport simulation code

NASA Astrophysics Data System (ADS)

Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

2010-05-01

We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow the study of higher particle energies with the use of more accurate physical models, and improve statistics as more particle tracks can be simulated in low response time.
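A parallel Monte Carlo code needs statistically independent random streams for its workers, which is the role SPRNG and DCMT play in MC4. The sketch below illustrates the idea with NumPy's `SeedSequence.spawn` as a stand-in (it is not SPRNG, DCMT, or MC4 code). The toy tally, the fraction of unit-rate exponential path lengths exceeding 1, and the function names are assumptions chosen only for illustration; the workers run sequentially here, whereas a real code would place each stream on its own MPI rank.

```python
import numpy as np

def make_streams(seed, n_workers):
    """Create one independent generator per worker from a single root seed."""
    children = np.random.SeedSequence(seed).spawn(n_workers)
    return [np.random.default_rng(child) for child in children]

def worker_tally(rng, n_samples):
    """Toy tally: count sampled path lengths (unit-rate exponential) above 1."""
    return int(np.count_nonzero(rng.exponential(1.0, n_samples) > 1.0))

def parallel_estimate(seed, n_workers, n_samples):
    """Combine per-worker tallies into one estimate of P(length > 1)."""
    streams = make_streams(seed, n_workers)
    tallies = [worker_tally(rng, n_samples) for rng in streams]
    return sum(tallies) / (n_workers * n_samples)

if __name__ == "__main__":
    # The exact value for this toy problem is exp(-1), about 0.368.
    print(parallel_estimate(seed=42, n_workers=4, n_samples=100_000))
```

Because every stream derives from one root seed, a run is reproducible for a fixed worker count, which makes comparing serial and parallel results easier.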
Adaptive multi-GPU Exchange Monte Carlo for the 3D Random Field Ising Model

NASA Astrophysics Data System (ADS)

Navarro, Cristóbal A.; Huang, Wei; Deng, Youjin

2016-08-01

This work presents an adaptive multi-GPU Exchange Monte Carlo approach for the simulation of the 3D Random Field Ising Model (RFIM). The design is based on a two-level parallelization. The first level, spin-level parallelism, maps the parallel computation as optimal 3D thread-blocks that simulate blocks of spins in shared memory with minimal halo surface, assuming a constant block volume. The second level, replica-level parallelism, uses multi-GPU computation to handle the simulation of an ensemble of replicas. CUDA's concurrent kernel execution feature is used in order to fill the occupancy of each GPU with many replicas, providing a performance boost that is more noticeable at the smallest values of L. In addition to the two-level parallel design, the work proposes an adaptive multi-GPU approach that dynamically builds a proper temperature set free of exchange bottlenecks. The strategy is based on mid-point insertions at the temperature gaps where the exchange rate is most compromised. The extra work generated by the insertions is balanced across the GPUs independently of where the mid-point insertions were performed. Performance results show that spin-level performance is approximately two orders of magnitude faster than a single-core CPU version and one order of magnitude faster than a parallel multi-core CPU version running on 16 cores. Multi-GPU performance is highly favorable under a weak scaling setting, reaching up to 99% efficiency as long as the number of GPUs and L increase together. The combination of the adaptive approach with the parallel multi-GPU design has extended our possibilities of simulation to sizes of L = 32, 64 for a workstation with two GPUs. Sizes beyond L = 64 can eventually be studied using larger multi-GPU systems.
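The mid-point insertion idea can be sketched in a few lines. This is a simplified illustration, not the authors' algorithm: the function name `insert_midpoints`, the rule of picking the `n_insert` gaps with the lowest measured exchange rate, and the arithmetic (rather than, say, geometric) mid-point are all assumptions.

```python
def insert_midpoints(temps, exchange_rates, n_insert=1):
    """Add a temperature at the middle of the worst-exchanging gaps.

    temps          : sorted list of replica temperatures
    exchange_rates : measured swap acceptance rate for each neighbouring
                     pair, so len(exchange_rates) == len(temps) - 1
    n_insert       : how many gaps to refine in this pass
    """
    if len(exchange_rates) != len(temps) - 1:
        raise ValueError("need one exchange rate per temperature gap")
    # Indices of the gaps ordered from lowest to highest exchange rate.
    worst_first = sorted(range(len(exchange_rates)), key=lambda k: exchange_rates[k])
    new_temps = list(temps)
    for k in worst_first[:n_insert]:
        new_temps.append(0.5 * (temps[k] + temps[k + 1]))
    return sorted(new_temps)

if __name__ == "__main__":
    print(insert_midpoints([1.0, 2.0, 4.0], [0.5, 0.1]))
```

Repeating such passes until every gap reaches an acceptable exchange rate yields a temperature set without bottlenecks; the resulting replicas would then be redistributed across GPUs for load balance.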
Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baer, Thomas A.; Sackinger, Philip A.; Subia, Samuel R.

1999-10-14

Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact lines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a "pseudo-solid" mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to equally distribute computational work among the processors of an SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel computing environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speed ups for fixed problem size, a class of problems of immediate practical importance.
Xyce parallel electronic simulator reference guide, version 6.1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

2014-03-01

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].
Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube

NASA Technical Reports Server (NTRS)

Joslin, Ronald D.; Zubair, Mohammad

1993-01-01

The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors, nearly ideal linear speedups are achieved with nonoptimized routines; slower than linear speedups are achieved with optimized (machine-dependent library) routines. This slower than linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and because this routine exhibits less than ideal speedups. However, with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise, wall-normal, and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single processor time to complete a comparable simulation; however, it is estimated that a subgrid-scale model, which reduces the required number of grid points and turns the approach into a large-eddy simulation (PSLES), would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.
Computer Science Techniques Applied to Parallel Atomistic Simulation

NASA Astrophysics Data System (ADS)

Nakano, Aiichiro

1998-03-01

Recent developments in parallel processing technology and multiresolution numerical algorithms have established large-scale molecular dynamics (MD) simulations as a new research mode for studying materials phenomena such as fracture. However, this requires large system sizes and long simulated times. We have developed: i) space-time multiresolution schemes; ii) a fuzzy-clustering approach to hierarchical dynamics; iii) wavelet-based adaptive curvilinear-coordinate load balancing; iv) a multilevel preconditioned conjugate gradient method; and v) spacefilling-curve-based data compression for parallel I/O. Using these techniques, million-atom parallel MD simulations are performed for the oxidation dynamics of nanocrystalline Al. The simulations take into account the effect of dynamic charge transfer between Al and O using the electronegativity equalization scheme. The resulting long-range Coulomb interaction is calculated efficiently with the fast multipole method. Results for temperature and charge distributions, residual stresses, bond lengths and bond angles, and diffusivities of Al and O will be presented. The oxidation of nanocrystalline Al is elucidated through immersive visualization in virtual environments. A unique dual-degree education program at Louisiana State University will also be discussed in which students can obtain a Ph.D. in Physics & Astronomy and an M.S. from the Department of Computer Science in five years. This program fosters interdisciplinary research activities for interfacing High Performance Computing and Communications with large-scale atomistic simulations of advanced materials. This work was supported by NSF (CAREER Program), ARO, PRF, and Louisiana LEQSF.
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

NASA Technical Reports Server (NTRS)

Sun, Xian-He

1997-01-01

Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2000, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is to 1) develop highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, and 3) incorporate newly developed algorithms into actual simulation packages. The work plan has been achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) adopting a mathematical geometry which has a better capacity to describe the fluid, and (2) using a compact scheme to gain high order accuracy in numerical discretization.
The previously developed Parallel Diagonal Dominant (PDD) algorithm and Reduced Parallel Diagonal Dominant (RPDD) algorithm have been carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues has been parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper.

An intelligent processing environment for real-time simulation

NASA Technical Reports Server (NTRS)

Carroll, Chester C.; Wells, Buren Earl, Jr.

1988-01-01

The development of a highly efficient and thus truly intelligent processing environment for real-time general purpose simulation of continuous systems is described. Such an environment can be created by mapping the simulation process directly onto the University of Alabama's OPERA architecture. To facilitate this effort, the field of continuous simulation is explored, highlighting areas in which efficiency can be improved. Areas in which parallel processing can be applied are also identified, and several general OPERA type hardware configurations that support improved simulation are investigated. Three direct execution parallel processing environments are introduced, each of which greatly improves efficiency by exploiting distinct areas of the simulation process. These suggested environments are candidate architectures around which a highly intelligent real-time simulation configuration can be developed.
Particle simulation of plasmas on the massively parallel processor

NASA Technical Reports Server (NTRS)

Gledhill, I. M. A.; Storey, L. R. O.

1987-01-01

Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.
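The Eulerian storage scheme described in this abstract, where a particle's data follow it from one local memory to the next, can be illustrated with a minimal Python sketch. It assumes a hypothetical 1D periodic grid in which each cell stands in for one processor's local memory; the function name `migrate`, the unit time step, and the data layout are illustrative assumptions, not details of the original MPP code.

```python
def migrate(cells, dx):
    """Advance particles one unit time step and re-home each one.

    cells[i] holds (x, v) tuples for particles with i*dx <= x < (i+1)*dx.
    The grid is periodic, matching the boundary conditions in the abstract.
    """
    n = len(cells)
    length = n * dx
    new_cells = [[] for _ in range(n)]
    for cell in cells:
        for x, v in cell:
            x = (x + v) % length        # move, then wrap around the periodic box
            # "% n" guards the rare rounding case where x lands exactly on length
            owner = int(x // dx) % n    # cell (local memory) that now owns the particle
            new_cells[owner].append((x, v))
    return new_cells


cells = [[(0.5, 1.0)], [(1.5, -2.0)], [], [(3.9, 0.2)]]
cells = migrate(cells, dx=1.0)
```

In a real SIMD or distributed implementation the append into `new_cells[owner]` would be a transfer between neighbouring processor memories rather than a list operation.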
Use of Parallel Micro-Platform for the Simulation the Space Exploration

NASA Astrophysics Data System (ADS)

Velasco Herrera, Victor Manuel; Velasco Herrera, Graciela; Rosano, Felipe Lara; Rodriguez Lozano, Salvador; Lucero Roldan Serrato, Karen

The purpose of this work is to create a parallel micro-platform that simulates the virtual movements of a space exploration in 3D. One of the innovations presented in this design is the application of a lever mechanism for the transmission of the movement. The development of such a robot is a challenging task, very different from that of industrial manipulators because of a totally different set of target requirements. This work presents the computer-aided study and simulation of the movement of this parallel manipulator. The model was developed using the computer-aided design platform Unigraphics, in which we carried out the geometric modeling of each component and of the final assembly (CAD), the generation of files for the computer-aided manufacture (CAM) of each piece, and the kinematic simulation of the system under different driving schemes. We used the MATLAB aerospace toolbox and created an adaptive control module to simulate the system.
Institutional Computing Executive Group Review of Multi-programmatic & Institutional Computing, Fiscal Year 2005 and 2006

DOE Office of Scientific and Technical Information (OSTI.GOV)

Langer, S; Rotman, D; Schwegler, E

The Institutional Computing Executive Group (ICEG) review of FY05-06 Multiprogrammatic and Institutional Computing (M and IC) activities is presented in the attached report. In summary, we find that the M and IC staff does an outstanding job of acquiring and supporting a wide range of institutional computing resources to meet the programmatic and scientific goals of LLNL. The responsiveness and high quality of support given to users and the programs investing in M and IC reflects the dedication and skill of the M and IC staff. M and IC has successfully managed serial capacity, parallel capacity, and capability computing resources. Serial capacity computing supports a wide range of scientific projects which require access to a few high performance processors within a shared memory computer. Parallel capacity computing supports scientific projects that require a moderate number of processors (up to roughly 1000) on a parallel computer. Capability computing supports parallel jobs that push the limits of simulation science. M and IC has worked closely with Stockpile Stewardship, and together they have made LLNL a premier institution for computational and simulation science. Such a standing is vital to the continued success of laboratory science programs and to the recruitment and retention of top scientists. This report provides recommendations to build on M and IC's accomplishments and improve simulation capabilities at LLNL. 
We recommend that the institution fully fund (1) operation of the atlas cluster purchased in FY06 to support a few large projects; (2) operation of the thunder and zeus clusters to enable 'mid-range' parallel capacity simulations during normal operation and a limited number of large simulations during dedicated application time; (3) operation of the new yana cluster to support a wide range of serial capacity simulations; (4) improvements to the reliability and performance of the Lustre parallel file system; (5) support for the new GDO petabyte-class storage facility on the green network for use in data intensive external collaborations; and (6) continued support for visualization and other methods for analyzing large simulations. We also recommend that M and IC begin planning in FY07 for the next upgrade of its parallel clusters. LLNL investments in M and IC have resulted in a world-class simulation capability leading to innovative science. We thank the LLNL management for its continued support and thank the M and IC staff for its vision and dedicated efforts to make it all happen.
A foundation for initial attack simulation: the Fried and Fried fire containment model

Treesearch

Jeremy S. Fried; Burton D. Fried

2010-01-01

The Fried and Fried containment algorithm, which models the effect of suppression efforts on fire growth, allows simulation of any mathematically representable fire shape, provides for "head" and "tail" attack tactics as well as parallel attack (building fireline parallel to but at some offset distance from the free-burning fire perimeter, alone and...
Parallel 3D Finite Element Numerical Modelling of DC Electron Guns

DOE Office of Scientific and Technical Information (OSTI.GOV)

Prudencio, E.; Candel, A.; Ge, L.

2008-02-04

In this paper we present Gun3P, a parallel 3D finite element application that the Advanced Computations Department at the Stanford Linear Accelerator Center is developing for the analysis of beam formation in DC guns and beam transport in klystrons. Gun3P is targeted specially to complex geometries that cannot be described by 2D models and cannot be easily handled by finite difference discretizations. Its parallel capability allows simulations with more accuracy and less processing time than packages currently available. We present simulation results for the L-band Sheet Beam Klystron DC gun, in which case Gun3P is able to reduce simulation time from days to some hours.
Wake Encounter Analysis for a Closely Spaced Parallel Runway Paired Approach Simulation

NASA Technical Reports Server (NTRS)

Mckissick, Burnell T.; Rico-Cusi, Fernando J.; Murdoch, Jennifer; Oseguera-Lohr, Rosa M.; Stough, Harry P, III; O'Connor, Cornelius J.; Syed, Hazari I.

2009-01-01

A Monte Carlo simulation of simultaneous approaches performed by two transport category aircraft from the final approach fix to a pair of closely spaced parallel runways was conducted to explore the aft boundary of the safe zone in which separation assurance and wake avoidance are provided. The simulation included variations in runway centerline separation, initial longitudinal spacing of the aircraft, crosswind speed, and aircraft speed during the approach. The data from the simulation showed that the majority of the wake encounters occurred near or over the runway and the aft boundaries of the safe zones were identified for all simulation conditions.
Long-time atomistic simulations with the Parallel Replica Dynamics method

NASA Astrophysics Data System (ADS)

Perez, Danny

Molecular Dynamics (MD) -- the numerical integration of atomistic equations of motion -- is a workhorse of computational materials science. Indeed, MD can in principle be used to obtain any thermodynamic or kinetic quantity, without introducing any approximation or assumptions beyond the adequacy of the interaction potential. It is therefore an extremely powerful and flexible tool to study materials with atomistic spatio-temporal resolution. These enviable qualities however come at a steep computational price, hence limiting the system sizes and simulation times that can be achieved in practice. While the size limitation can be efficiently addressed with massively parallel implementations of MD based on spatial decomposition strategies, allowing for the simulation of trillions of atoms, the same approach usually cannot extend the timescales much beyond microseconds. In this article, we discuss an alternative parallel-in-time approach, the Parallel Replica Dynamics (ParRep) method, that aims at addressing the timescale limitation of MD for systems that evolve through rare state-to-state transitions. We review the formal underpinnings of the method and demonstrate that it can provide arbitrarily accurate results for any definition of the states. When an adequate definition of the states is available, ParRep can simulate trajectories with a parallel speedup approaching the number of replicas used. We demonstrate the usefulness of ParRep by presenting different examples of materials simulations where access to long timescales was essential to access the physical regime of interest and discuss practical considerations that must be addressed to carry out these simulations. Work supported by the United States Department of Energy (U.S. DOE), Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division.
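The time accounting behind the claimed speedup can be illustrated with a minimal Python sketch. It assumes an idealized setting that the abstract does not spell out: escape from a state is a first-order process with exponentially distributed waiting times, replicas are statistically independent, and the overheads of preparing replicas and detecting transitions are ignored. The function name `parrep_escape_time` and all parameter values are hypothetical.

```python
import random


def parrep_escape_time(rate, n_replicas, rng):
    """Return (simulated_time, wallclock_time) for one escape event.

    Every replica draws an independent exponential first-escape time.
    The first replica to escape stops the others, and the time accumulated
    by all replicas together is credited as simulated time.
    """
    wallclock = min(rng.expovariate(rate) for _ in range(n_replicas))
    simulated = n_replicas * wallclock
    return simulated, wallclock


rng = random.Random(0)
rate, n_replicas, n_events = 1.0, 8, 20000
samples = [parrep_escape_time(rate, n_replicas, rng) for _ in range(n_events)]
mean_simulated = sum(s for s, _ in samples) / n_events   # close to 1 / rate
mean_wallclock = sum(w for _, w in samples) / n_events   # close to 1 / (rate * n_replicas)
speedup = mean_simulated / mean_wallclock
```

Under these assumptions the credited time has the same mean as a single long trajectory would give, while the wall-clock cost per event shrinks by the number of replicas. In practice the overheads ignored here keep the speedup below that ideal.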
Xyce parallel electronic simulator : reference guide.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.

2011-05-01

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.
Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators

PubMed Central

Wang, Wei; Xu, Lifan; Cavazos, John; Huang, Howie H.; Kay, Matthew

2014-01-01

Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case-study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved more than speedup above the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs of at least faster than the sequential implementation and faster than a parallelized OpenMP implementation. An implementation of OpenMP on Intel MIC coprocessor provided speedups of with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs. 
Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media. PMID:24497950
Schnek: A C++ library for the development of parallel simulation codes on regular grids

NASA Astrophysics Data System (ADS)

Schmitz, Holger

2018-05-01

A large number of algorithms across the field of computational physics are formulated on grids with a regular topology. We present Schnek, a library that enables fast development of parallel simulations on regular grids. Schnek contains a number of easy-to-use modules that greatly reduce the amount of administrative code for large-scale simulation codes. The library provides an interface for reading simulation setup files with a hierarchical structure. The structure of the setup file is translated into a hierarchy of simulation modules that the developer can specify. The reader parses and evaluates mathematical expressions and initialises variables or grid data. This enables developers to write modular and flexible simulation codes with minimal effort. Regular grids of arbitrary dimension are defined as well as mechanisms for defining physical domain sizes, grid staggering, and ghost cells on these grids. Ghost cells can be exchanged between neighbouring processes using MPI with a simple interface. The grid data can easily be written into HDF5 files using serial or parallel I/O.
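The ghost-cell exchange mentioned in this abstract can be illustrated with a minimal Python sketch. This is not Schnek's API (Schnek is a C++ library that uses MPI). The sketch emulates, inside one process, a hypothetical 1D periodic domain split into subdomains that each carry one ghost cell per side, and the function name `exchange_ghosts` is an assumption.

```python
def exchange_ghosts(subdomains):
    """Fill the ghost cells of every subdomain from its periodic neighbours.

    Each subdomain is a list laid out as [ghost, interior..., ghost].
    Only ghost cells are written, so the update order does not matter.
    """
    n = len(subdomains)
    for rank, local in enumerate(subdomains):
        left = subdomains[(rank - 1) % n]
        right = subdomains[(rank + 1) % n]
        local[0] = left[-2]     # left ghost <- last interior cell of left neighbour
        local[-1] = right[1]    # right ghost <- first interior cell of right neighbour


# Three "processes", two interior cells each, ghost cells initialised to 0
subdomains = [[0, 1, 2, 0], [0, 3, 4, 0], [0, 5, 6, 0]]
exchange_ghosts(subdomains)
```

In an MPI code the two assignments would become a send/receive pair with each neighbouring rank. A stencil update can then read `local[i - 1]` and `local[i + 1]` for every interior cell without special cases at subdomain edges.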
Inflated speedups in parallel simulations via malloc()

NASA Technical Reports Server (NTRS)

Nicol, David M.

1990-01-01

Discrete-event simulation programs make heavy use of dynamic memory allocation in order to support simulation's very dynamic space requirements. When programming in C one is likely to use the malloc() routine. However, a parallel simulation which uses the standard Unix System V malloc() implementation may achieve an overly optimistic speedup, possibly superlinear. An alternate implementation provided on some (but not all) systems can avoid the speedup anomaly, but at the price of significantly reduced available free space. This is especially severe on most parallel architectures, which tend not to support virtual memory. It is shown how a simply implemented user-constructed interface to malloc() can both avoid artificially inflated speedups, and make efficient use of the dynamic memory space. The interface simply caches blocks on the basis of their size. The problem is demonstrated empirically, and the effectiveness of the solution is shown both empirically and analytically.
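The idea of a user-level interface that keeps freed blocks grouped by size can be modelled with a minimal Python sketch. The abstract gives no implementation details, so the class name `SizeCachedAllocator`, its methods, and the use of `bytearray` as a stand-in for malloc() are all assumptions made for illustration.

```python
from collections import defaultdict


class SizeCachedAllocator:
    """Toy model of a wrapper that reuses freed blocks of the same size."""

    def __init__(self, backend_alloc):
        self._backend_alloc = backend_alloc     # stand-in for malloc()
        self._free_lists = defaultdict(list)    # size -> freed blocks of that size
        self.backend_calls = 0                  # how often the backend was used

    def alloc(self, size):
        cached = self._free_lists[size]
        if cached:
            return cached.pop()                 # reuse a block; no backend search
        self.backend_calls += 1
        return self._backend_alloc(size)

    def free(self, block):
        # Keep the block for later requests of the same size
        self._free_lists[len(block)].append(block)


allocator = SizeCachedAllocator(bytearray)
a = allocator.alloc(64)
allocator.free(a)
b = allocator.alloc(64)    # served from the cache
c = allocator.alloc(128)   # new size, goes to the backend
```

Because a repeated request for a given size is served in constant time from the cache, the cost of an allocation no longer depends on how long the backend's free list has grown. That dependence is one plausible source of the inflated speedups the abstract describes.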
The Relation between Reconnected Flux, the Parallel Electric Field, and the Reconnection Rate in a Three-Dimensional Kinetic Simulation of Magnetic Reconnection

NASA Astrophysics Data System (ADS)

Wendel, D. E.; Olson, D. K.; Hesse, M.; Karimabadi, H.; Daughton, W. S.

2013-12-01

We investigate the distribution of parallel electric fields and their relationship to the location and rate of magnetic reconnection of a large particle-in-cell simulation of 3D turbulent magnetic reconnection with open boundary conditions. The simulation's guide field geometry inhibits the formation of topological features such as separators and null points. Therefore, we derive the location of potential changes in magnetic connectivity by finding the field lines that experience a large relative change between their endpoints, i.e., the quasi-separatrix layer. We find a correspondence between the locus of changes in magnetic connectivity, or the quasi-separatrix layer, and the map of large gradients in the integrated parallel electric field (or quasi-potential). Furthermore, we compare the distribution of parallel electric fields along field lines with the reconnection rate. We find the reconnection rate is controlled by only the low-amplitude, zeroth and first-order trends in the parallel electric field, while the contribution from high amplitude parallel fluctuations, such as electron holes, is negligible. The results impact the determination of reconnection sites within models of 3D turbulent reconnection as well as the inference of reconnection rates from in situ spacecraft measurements. It is difficult through direct observation to isolate the locus of the reconnection parallel electric field amidst the large amplitude fluctuations. However, we demonstrate that a positive slope of the partial sum of the parallel electric field along the field line as a function of field line length indicates where reconnection is occurring along the field line.
Petascale turbulence simulation using a highly parallel fast multipole method on GPUs

NASA Astrophysics Data System (ADS)

Yokota, Rio; Barba, L. A.; Narumi, Tetsu; Yasuoka, Kenji

2013-03-01

This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as numerical engine, and match the current record in mesh size for this application, a cube of 4096^3 computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as the numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method is able to achieve just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. The calculation time for one time step was 108 s for the vortex method and 154 s for the spectral method, under these conditions. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex-method calculations to date.
Biocellion: accelerating computer simulation of multicellular biological system models

PubMed Central

Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya

2014-01-01

Motivation: Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. Results: We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Availability and implementation: Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. Contact: seunghwa.kang@pnnl.gov PMID:25064572
LightForce Photon-Pressure Collision Avoidance: Updated Efficiency Analysis Utilizing a Highly Parallel Simulation Approach

DTIC Science & Technology

2014-09-01

simulation time frame from 30 days to one year. This was enabled by porting the simulation to the Pleiades supercomputer at NASA Ames Research Center, a...including the motivation for changes to our past approach. We then present the software implementation (3) on the NASA Ames Pleiades supercomputer...significantly updated since last year’s paper [25]. The main incentive for that was the shift to a highly parallel approach in order to utilize the Pleiades
Advances in Parallelization for Large Scale Oct-Tree Mesh Generation

NASA Technical Reports Server (NTRS)

O'Connell, Matthew; Karman, Steve L.

2015-01-01

Despite great advancements in the parallelization of numerical simulation codes over the last 20 years, it is still common to perform grid generation in serial. Generating large scale grids in serial often requires using special "grid generation" compute machines that can have more than ten times the memory of average machines. While some parallel mesh generation techniques have been proposed, generating very large meshes for LES or aeroacoustic simulations is still a challenging problem. An automated method for the parallel generation of very large scale off-body hierarchical meshes is presented here. This work enables large scale parallel generation of off-body meshes by using a novel combination of parallel grid generation techniques and a hybrid "top down" and "bottom up" oct-tree method. Meshes are generated using hardware commonly found in parallel compute clusters. The capability to generate very large meshes is demonstrated by the generation of off-body meshes surrounding complex aerospace geometries. Results are shown including a one billion cell mesh generated around a Predator Unmanned Aerial Vehicle geometry, which was generated on 64 processors in under 45 minutes.
On Designing Multicore-Aware Simulators for Systems Biology Endowed with OnLine Statistics

PubMed Central

Calcagno, Cristina; Coppo, Mario

2014-01-01

The paper discusses enabling methodologies for the design of a fully parallel, online, interactive tool to support bioinformatics scientists. In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool that performs the modeling, tuning, and sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories, which turn into big data that should be analysed by statistical and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed. PMID:25050327
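The pipelined simulation and analysis stages described in this abstract can be illustrated with a minimal Python sketch. It does not use FastFlow (a C++ framework) and runs sequentially. Python generators stand in for the streaming channel, a random walk stands in for a stochastic biological model, and the analysis stage uses Welford's online algorithm. All names and parameters are hypothetical.

```python
import random


def simulate(n_trajectories, n_steps, seed=0):
    """Simulation stage: stream out (step, value) samples as trajectories advance."""
    rng = random.Random(seed)
    states = [0.0] * n_trajectories
    for step in range(n_steps):
        for i in range(n_trajectories):
            states[i] += rng.gauss(0.0, 1.0)    # toy stochastic update (random walk)
            yield step, states[i]


def analyse(stream):
    """Analysis stage: online mean and sample variance, emitted per completed step."""
    def variance(m2, count):
        return m2 / (count - 1) if count > 1 else 0.0

    current, count, mean, m2 = None, 0, 0.0, 0.0
    for step, value in stream:
        if step != current:
            if current is not None:
                yield current, mean, variance(m2, count)   # partial result
            current, count, mean, m2 = step, 0, 0.0, 0.0
        count += 1
        delta = value - mean
        mean += delta / count                   # Welford update of the running mean
        m2 += delta * (value - mean)            # running sum of squared deviations
    if current is not None:
        yield current, mean, variance(m2, count)


results = list(analyse(simulate(n_trajectories=1000, n_steps=5)))
```

The analysis stage never stores the raw samples, which is the point of the streaming design: statistics for a time step become available as soon as that step's samples have passed through.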
On designing multicore-aware simulators for systems biology endowed with OnLine statistics.

PubMed

Aldinucci, Marco; Calcagno, Cristina; Coppo, Mario; Damiani, Ferruccio; Drocco, Maurizio; Sciacca, Eva; Spinella, Salvatore; Torquati, Massimo; Troina, Angelo

2014-01-01

The paper discusses enabling methodologies for the design of a fully parallel, online, interactive tool to support bioinformatics scientists. In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool that performs the modeling, tuning, and sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories, which turn into big data that should be analysed by statistical and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed.
A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows

NASA Technical Reports Server (NTRS)

Bui, Trong T.

1999-01-01

A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.

Fast I/O for Massively Parallel Applications

NASA Technical Reports Server (NTRS)

OKeefe, Matthew T.

1996-01-01

The two primary goals for this report were the design, construction and modeling of parallel disk arrays for scientific visualization and animation, and a study of the I/O requirements of highly parallel applications. In addition, the report covers further work in parallel display systems required to project and animate the very high-resolution frames resulting from our supercomputing simulations in ocean circulation and compressible gas dynamics.
Implementation and Assessment of a Virtual Laboratory of Parallel Robots Developed for Engineering Students

ERIC Educational Resources Information Center

Gil, Arturo; Peidró, Adrián; Reinoso, Óscar; Marín, José María

2017-01-01

This paper presents a tool, LABEL, oriented to the teaching of parallel robotics. The application, organized as a set of tools developed using Easy Java Simulations, enables the study of the kinematics of parallel robotics. A set of classical parallel structures was implemented such that LABEL can solve the inverse and direct kinematic problem of…
The Parallel System for Integrating Impact Models and Sectors (pSIMS)

NASA Technical Reports Server (NTRS)

Elliott, Joshua; Kelly, David; Chryssanthacopoulos, James; Glotter, Michael; Jhunjhnuwala, Kanika; Best, Neil; Wilde, Michael; Foster, Ian

2014-01-01

We present a framework for massively parallel climate impact simulations: the parallel System for Integrating Impact Models and Sectors (pSIMS). This framework comprises a) tools for ingesting and converting large amounts of data to a versatile datatype based on a common geospatial grid; b) tools for translating this datatype into custom formats for site-based models; c) a scalable parallel framework for performing large ensemble simulations, using any one of a number of different impacts models, on clusters, supercomputers, distributed grids, or clouds; d) tools and data standards for reformatting outputs to common datatypes for analysis and visualization; and e) methodologies for aggregating these datatypes to arbitrary spatial scales such as administrative and environmental demarcations. By automating many time-consuming and error-prone aspects of large-scale climate impacts studies, pSIMS accelerates computational research, encourages model intercomparison, and enhances reproducibility of simulation results. We present the pSIMS design and use example assessments to demonstrate its multi-model, multi-scale, and multi-sector versatility.
Accelerating population balance-Monte Carlo simulation for coagulation dynamics from the Markov jump model, stochastic algorithm and GPU parallel computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xu, Zuwei; Zhao, Haibo, E-mail: klinsmannzhb@163.com; Zheng, Chuguang

2015-01-15

This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining a Markov jump model, a weighted majorant kernel and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved. The Markov jump model constructs a coagulation-rule matrix of differentially-weighted simulation particles, so as to capture the time evolution of particle size distribution with low statistical noise over the full size range and as far as possible to reduce the number of time loopings. Here three coagulation rules are highlighted and it is found that constructing an appropriate coagulation rule provides a route to attain the compromise between accuracy and cost of PBMC methods. Further, in order to avoid double looping over all simulation particles when considering the two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates being used for acceptance–rejection processes by single-looping over all particles, and meanwhile the mean time-step of a coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially-weighted PBMC simulations (based on the Markov jump model) is reduced greatly to be proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are processed in parallel by multiple cores on a GPU that can implement the massively threaded data-parallel tasks to obtain a remarkable speedup ratio (compared with CPU computation, the speedup ratio of GPU parallel computing is as high as 200 in a case of 100 cells with 10 000 simulation particles per cell). 
These accelerating approaches of PBMC are demonstrated in a physically realistic Brownian coagulation case. The computational accuracy is validated with benchmark solution of discrete-sectional method. The simulation results show that the comprehensive approach can attain very favorable improvement in cost without sacrificing computational accuracy.« less
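The acceptance-rejection idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' differentially-weighted GPU scheme: it assumes equally weighted particles and a hypothetical additive kernel K(vi, vj) = vi + vj, chosen only because its upper bound 2 * max(v) can be found in a single loop over the particles. The function names are illustrative.

```python
import random


def sum_kernel(vi, vj):
    """Illustrative additive coagulation kernel (an assumption, not the paper's Brownian kernel)."""
    return vi + vj


def select_coagulation_pair(volumes, rng):
    """Pick a particle pair by acceptance-rejection against a majorant bound."""
    n = len(volumes)
    if n < 2:
        raise ValueError("need at least two particles")
    # Single loop over particles: K(vi, vj) = vi + vj <= 2 * max(v) for every pair,
    # so no double loop over pairs is needed to bound the coagulation rate.
    k_max = 2.0 * max(volumes)
    attempts = 0
    while True:
        attempts += 1
        i, j = rng.sample(range(n), 2)  # two distinct indices, chosen uniformly
        # Accept the candidate pair with probability K(i, j) / k_max.
        if rng.random() * k_max <= sum_kernel(volumes[i], volumes[j]):
            return i, j, attempts


def coagulate(volumes, i, j):
    """Return a new particle list in which particles i and j are merged."""
    merged = volumes[i] + volumes[j]
    rest = [v for k, v in enumerate(volumes) if k not in (i, j)]
    rest.append(merged)
    return rest
```

Rejected attempts are counted because, in schemes of this kind, both accepted and rejected candidates contribute to the estimate of the mean time between coagulation events; that step is omitted here.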
Special purpose parallel computer architecture for real-time control and simulation in robotic applications

NASA Technical Reports Server (NTRS)

Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

1993-01-01

This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
High-performance computational fluid dynamics: a custom-code approach

NASA Astrophysics Data System (ADS)

Fannon, James; Loiseau, Jean-Christophe; Valluri, Prashant; Bethune, Iain; Náraigh, Lennon Ó.

2016-07-01

We introduce a modified and simplified version of the pre-existing fully parallelized three-dimensional Navier-Stokes flow solver known as TPLS. We demonstrate how the simplified version can be used as a pedagogical tool for the study of computational fluid dynamics (CFDs) and parallel computing. TPLS is at its heart a two-phase flow solver, and uses calls to a range of external libraries to accelerate its performance. However, in the present context we narrow the focus of the study to basic hydrodynamics and parallel computing techniques, and the code is therefore simplified and modified to simulate pressure-driven single-phase flow in a channel, using only relatively simple Fortran 90 code with MPI parallelization, but no calls to any other external libraries. The modified code is analysed in order to both validate its accuracy and investigate its scalability up to 1000 CPU cores. Simulations are performed for several benchmark cases in pressure-driven channel flow, including a turbulent simulation, wherein the turbulence is incorporated via the large-eddy simulation technique. The work may be of use to advanced undergraduate and graduate students as an introductory study in CFDs, while also providing insight for those interested in more general aspects of high-performance computing.
Discrete Event Modeling and Massively Parallel Execution of Epidemic Outbreak Phenomena

DOE Office of Scientific and Technical Information (OSTI.GOV)

Perumalla, Kalyan S; Seal, Sudip K

2011-01-01

In complex phenomena such as epidemiological outbreaks, the intensity of inherent feedback effects and the significant role of transients in the dynamics make simulation the only effective method for proactive, reactive or post-facto analysis. The spatial scale, runtime speed, and behavioral detail needed in detailed simulations of epidemic outbreaks make it necessary to use large-scale parallel processing. Here, an optimistic parallel execution of a new discrete event formulation of a reaction-diffusion simulation model of epidemic propagation is presented to dramatically increase the fidelity and speed with which epidemiological simulations can be performed. Rollback support needed during optimistic parallel execution is achieved by combining reverse computation with a small amount of incremental state saving. Parallel speedup of over 5,500 and other runtime performance metrics of the system are observed with weak-scaling execution on a small (8,192-core) Blue Gene / P system, while scalability with a weak-scaling speedup of over 10,000 is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes exceeding several hundreds of millions of individuals in the largest cases are successfully exercised to verify model scalability.
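To illustrate how reverse computation and incremental state saving can combine to support rollback, here is a minimal Python sketch. The state fields, event type, and handler names are hypothetical and are not taken from the paper: invertible updates are undone arithmetically, while the two pieces of information that cannot be recomputed (the clamped count and the overwritten timestamp) are saved on the event itself.

```python
from dataclasses import dataclass


@dataclass
class RegionState:
    """Hypothetical per-region state of an epidemic model."""
    susceptible: int
    infected: int
    last_event_time: float = 0.0


@dataclass
class InfectionEvent:
    """Hypothetical event that moves individuals from susceptible to infected."""
    time: float
    count: int
    applied: int = 0              # saved: how many individuals were actually moved
    saved_last_time: float = 0.0  # saved: timestamp overwritten by the forward handler


def forward(state, event):
    """Process the event optimistically, recording only what cannot be recomputed."""
    moved = min(event.count, state.susceptible)  # min() loses information, so save it
    event.applied = moved
    event.saved_last_time = state.last_event_time
    state.susceptible -= moved
    state.infected += moved
    state.last_event_time = event.time


def reverse(state, event):
    """Undo forward() during a rollback, restoring the exact prior state."""
    state.last_event_time = event.saved_last_time
    state.infected -= event.applied
    state.susceptible += event.applied
```

The design point is that only a few words per event are stored, rather than a full copy of the state before every event.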
Parallel Implementation of Triangular Cellular Automata for Computing Two-Dimensional Elastodynamic Response on Arbitrary Domains

NASA Astrophysics Data System (ADS)

Leamy, Michael J.; Springer, Adam C.

In this research we report parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straight-forward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to cores present on a dual quad-core shared-memory system (eight total processors). We note that this message passing parallelization strategy is directly applicable to computer clustered computing, which will be the focus of follow-on research. Results on the shared memory platform indicate nearly-ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.
THC-MP: High performance numerical simulation of reactive transport and multiphase flow in porous media

NASA Astrophysics Data System (ADS)

Wei, Xiaohui; Li, Weishan; Tian, Hailong; Li, Hongliang; Xu, Haixiao; Xu, Tianfu

2015-07-01

The numerical simulation of multiphase flow and reactive transport in porous media for complex subsurface problems is a computationally intensive application. To meet the increasing computational requirements, this paper presents a parallel computing method and architecture. Starting from TOUGHREACT, a well-established code for simulating subsurface multiphase flow and reactive transport problems, we developed THC-MP, a high-performance computing code for massively parallel computers that greatly extends the computational capability of the original code. The domain decomposition method was applied to the coupled numerical computing procedure in THC-MP. We designed the distributed data structure, implemented the data initialization and exchange between the computing nodes, and built the core solving module using a hybrid parallel iterative and direct solver. The numerical accuracy of THC-MP was verified through a CO2 injection-induced reactive transport problem by comparing the results obtained from parallel computing and sequential computing (original code). Execution efficiency and code scalability were examined through field-scale carbon sequestration applications on a multicore cluster. The results demonstrate the enhanced performance of THC-MP on parallel computing facilities.
Parallel distributed, reciprocal Monte Carlo radiation in coupled, large eddy combustion simulations

NASA Astrophysics Data System (ADS)

Hunsaker, Isaac L.

Radiation is the dominant mode of heat transfer in high temperature combustion environments. Radiative heat transfer affects the gas and particle phases, including all the associated combustion chemistry. The radiative properties are in turn affected by the turbulent flow field. This bi-directional coupling of radiation turbulence interactions poses a major challenge in creating parallel-capable, high-fidelity combustion simulations. In this work, a new model was developed in which reciprocal monte carlo radiation was coupled with a turbulent, large-eddy simulation combustion model. A technique wherein domain patches are stitched together was implemented to allow for scalable parallelism. The combustion model runs in parallel on a decomposed domain. The radiation model runs in parallel on a recomposed domain. The recomposed domain is stored on each processor after information sharing of the decomposed domain is handled via the message passing interface. Verification and validation testing of the new radiation model were favorable. Strong scaling analyses were performed on the Ember cluster and the Titan cluster for the CPU-radiation model and GPU-radiation model, respectively. The model demonstrated strong scaling to over 1,700 and 16,000 processing cores on Ember and Titan, respectively.
Multi-threaded parallel simulation of non-local non-linear problems in ultrashort laser pulse propagation in the presence of plasma

NASA Astrophysics Data System (ADS)

Baregheh, Mandana; Mezentsev, Vladimir; Schmitz, Holger

2011-06-01

We describe a parallel multi-threaded approach for high-performance modelling of a wide class of phenomena in ultrafast nonlinear optics. A specific implementation has been performed using the highly parallel capabilities of a programmable graphics processor.
cellGPU: Massively parallel simulations of dynamic vertex models

NASA Astrophysics Data System (ADS)

Sussman, Daniel M.

2017-10-01

Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cells interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on the connectivity of the cellular network introduces several complications to performing molecular-dynamics-like simulations of vertex models, and in particular makes parallelizing the simulations difficult. cellGPU addresses this difficulty and lays the foundation for massively parallelized, GPU-based simulations of these models. This article discusses its implementation for a pair of two-dimensional models, and compares the typical performance that can be expected between running cellGPU entirely on the CPU versus its performance when running on a range of commercial and server-grade graphics cards. By implementing the calculation of topological changes and forces on cells in a highly parallelizable fashion, cellGPU enables researchers to simulate time- and length-scales previously inaccessible via existing single-threaded CPU implementations. Program Files doi:http://dx.doi.org/10.17632/6j2cj29t3r.1 Licensing provisions: MIT Programming language: CUDA/C++ Nature of problem: Simulations of off-lattice "vertex models" of cells, in which the interaction forces depend on both the geometry and the topology of the cellular aggregate. Solution method: Highly parallelized GPU-accelerated dynamical simulations in which the force calculations and the topological features can be handled on either the CPU or GPU. Additional comments: The code is hosted at https://gitlab.com/dmsussman/cellGPU, with documentation additionally maintained at http://dmsussman.gitlab.io/cellGPUdocumentation
Moose: An Open-Source Framework to Enable Rapid Development of Collaborative, Multi-Scale, Multi-Physics Simulation Tools

NASA Astrophysics Data System (ADS)

Slaughter, A. E.; Permann, C.; Peterson, J. W.; Gaston, D.; Andrs, D.; Miller, J.

2014-12-01

The Idaho National Laboratory (INL)-developed Multiphysics Object Oriented Simulation Environment (MOOSE; www.mooseframework.org), is an open-source, parallel computational framework for enabling the solution of complex, fully implicit multiphysics systems. MOOSE provides a set of computational tools that scientists and engineers can use to create sophisticated multiphysics simulations. Applications built using MOOSE have computed solutions for chemical reaction and transport equations, computational fluid dynamics, solid mechanics, heat conduction, mesoscale materials modeling, geomechanics, and others. To facilitate the coupling of diverse and highly-coupled physical systems, MOOSE employs the Jacobian-free Newton-Krylov (JFNK) method when solving the coupled nonlinear systems of equations arising in multiphysics applications. The MOOSE framework is written in C++, and leverages other high-quality, open-source scientific software packages such as LibMesh, Hypre, and PETSc. MOOSE uses a "hybrid parallel" model which combines both shared memory (thread-based) and distributed memory (MPI-based) parallelism to ensure efficient resource utilization on a wide range of computational hardware. MOOSE-based applications are inherently modular, which allows for simulation expansion (via coupling of additional physics modules) and the creation of multi-scale simulations. Any application developed with MOOSE supports running (in parallel) any other MOOSE-based application. Each application can be developed independently, yet easily communicate with other applications (e.g., conductivity in a slope-scale model could be a constant input, or a complete phase-field micro-structure simulation) without additional code being written. This method of development has proven effective at INL and expedites the development of sophisticated, sustainable, and collaborative simulation tools.
Parallel phase-shifting self-interference digital holography with faithful reconstruction using compressive sensing

NASA Astrophysics Data System (ADS)

Wan, Yuhong; Man, Tianlong; Wu, Fan; Kim, Myung K.; Wang, Dayong

2016-11-01

We present a new self-interference digital holographic approach that allows single-shot capture of the three-dimensional intensity distribution of spatially incoherent objects. Fresnel incoherent correlation holographic microscopy is combined with a parallel phase-shifting technique to instantaneously obtain spatially multiplexed phase-shifting holograms. A compressive-sensing-based reconstruction algorithm is implemented to reconstruct the original object from the undersampled demultiplexed holograms. The scheme is verified with simulations. The validity of the proposed method is experimentally demonstrated in an indirect way by simulating the use of a specific parallel phase-shifting recording device.
Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows

NASA Technical Reports Server (NTRS)

Moitra, Stuti; Gatski, Thomas B.

1997-01-01

A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
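For context on the tridiagonal systems mentioned above, the following is a minimal serial sketch of the standard Thomas algorithm in Python. It is not the authors' algorithm, which is described as avoiding a parallelized tridiagonal solver altogether; the sketch only shows the forward-elimination and back-substitution recurrences, whose step-by-step data dependence is what makes such solves awkward to parallelize. The argument layout is an assumption: `lower[i]` multiplies x[i-1] in row i (so `lower[0]` is unused) and `upper[i]` multiplies x[i+1] (so `upper[-1]` is unused).

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting)."""
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n

    # Forward elimination: each step depends on the result of the previous one.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Back substitution: again a sequential recurrence, now from the last row up.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x
```

The sketch assumes a well-conditioned (for example, diagonally dominant) matrix, since no pivoting is performed.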
Evaluation of copper, aluminum, and nickel interatomic potentials on predicting the elastic properties

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rassoulinejad-Mousavi, Seyed Moein; Mao, Yijin; Zhang, Yuwen, E-mail: zhangyu@missouri.edu

Choice of an appropriate force field is one of the main concerns of any atomistic simulation and needs to be seriously considered in order to yield reliable results. Since investigations of the mechanical behavior of materials at the micro/nanoscale have become much more widespread, it is necessary to determine an adequate potential which accurately models the interaction of the atoms for the desired applications. In this framework, the reliability of multiple embedded atom method based interatomic potentials for predicting the elastic properties was investigated. Assessments were carried out for different copper, aluminum, and nickel interatomic potentials at room temperature, which is considered the most applicable case. The examined force fields for the three species were taken from the online repositories of the National Institute of Standards and Technology, as well as the Sandia National Laboratories LAMMPS database. Using molecular dynamics simulations, the three independent elastic constants, C11, C12, and C44, were found for Cu, Al, and Ni cubic single crystals. The Voigt-Reuss-Hill approximation was then implemented to convert the elastic constants of the single crystals into isotropic polycrystalline elastic moduli including bulk modulus, shear modulus, and Young's modulus, as well as Poisson's ratio. Simulation results from massive molecular dynamics simulations were compared with available experimental data in the literature to assess the robustness of each potential for each species. Finally, accurate interatomic potentials have been recommended for finding each of the elastic properties of the pure species. The accuracy of the elastic properties was found to be sensitive to the choice of force field. Potentials that were fitted for a specific compound may not necessarily work accurately for all the existing pure species.
Tabulated results in this paper might be used as a benchmark to increase confidence in the interatomic potential chosen for a given problem.
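The Voigt-Reuss-Hill conversion described above can be written out for a cubic crystal in a few lines of Python. This is a sketch using the standard textbook formulas for cubic symmetry, not code from the paper; the function name and the returned dictionary keys are illustrative. Inputs and outputs share whatever unit the elastic constants are given in (for example GPa).

```python
def voigt_reuss_hill_cubic(c11, c12, c44):
    """Isotropic polycrystalline moduli from cubic single-crystal elastic constants."""
    # For cubic symmetry the Voigt and Reuss bulk moduli coincide.
    bulk = (c11 + 2.0 * c12) / 3.0

    # Voigt (uniform strain) and Reuss (uniform stress) bounds on the shear modulus.
    shear_voigt = (c11 - c12 + 3.0 * c44) / 5.0
    shear_reuss = 5.0 * (c11 - c12) * c44 / (4.0 * c44 + 3.0 * (c11 - c12))

    # Hill average: arithmetic mean of the two bounds.
    shear = 0.5 * (shear_voigt + shear_reuss)

    # Standard isotropic relations between K, G, E and Poisson's ratio.
    young = 9.0 * bulk * shear / (3.0 * bulk + shear)
    poisson = (3.0 * bulk - 2.0 * shear) / (2.0 * (3.0 * bulk + shear))

    return {
        "bulk": bulk,
        "shear_voigt": shear_voigt,
        "shear_reuss": shear_reuss,
        "shear": shear,
        "young": young,
        "poisson": poisson,
    }
```

A quick sanity check is the isotropic limit C44 = (C11 - C12) / 2, where the Voigt and Reuss shear moduli must both equal C44.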
Performance issues for domain-oriented time-driven distributed simulations

NASA Technical Reports Server (NTRS)

Nicol, David M.

1987-01-01

It has long been recognized that simulations form an interesting and important class of computations that may benefit from distributed or parallel processing. Since the point of parallel processing is improved performance, the recent proliferation of multiprocessors requires that we consider the performance issues that naturally arise when attempting to implement a distributed simulation. Three such issues are: (1) the problem of mapping the simulation onto the architecture, (2) the possibilities for performing redundant computation in order to reduce communication, and (3) the avoidance of deadlock due to distributed contention for message-buffer space. These issues are discussed in the context of a battlefield simulation implemented on a medium-scale multiprocessor message-passing architecture.
User's guide of TOUGH2-EGS-MP: A Massively Parallel Simulator with Coupled Geomechanics for Fluid and Heat Flow in Enhanced Geothermal Systems VERSION 1.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xiong, Yi; Fakcharoenphol, Perapon; Wang, Shihao

2013-12-01

TOUGH2-EGS-MP is a parallel numerical simulation program coupling geomechanics with fluid and heat flow in fractured and porous media, and is applicable for simulation of enhanced geothermal systems (EGS). TOUGH2-EGS-MP is based on the TOUGH2-MP code, the massively parallel version of TOUGH2. In TOUGH2-EGS-MP, the fully-coupled flow-geomechanics model is developed from linear elastic theory for thermo-poro-elastic systems and is formulated in terms of mean normal stress as well as pore pressure and temperature. Reservoir rock properties such as porosity and permeability depend on rock deformation, and the relationships between these two, obtained from poro-elasticity theories and empirical correlations, are incorporated into the simulation. This report provides the user with detailed information on the TOUGH2-EGS-MP mathematical model and instructions for using it for Thermal-Hydrological-Mechanical (THM) simulations. The mathematical model includes the fluid and heat flow equations, geomechanical equation, and discretization of those equations. In addition, the parallel aspects of the code, such as domain partitioning and communication between processors, are also included. Although TOUGH2-EGS-MP has the capability for simulating fluid and heat flows coupled with geomechanical effects, it is up to the user to select the specific coupling process, such as THM or only TH, in a simulation. There are several example problems illustrating applications of this program. These example problems are described in detail and their input data are presented. Their results demonstrate that this program can be used for field-scale geothermal reservoir simulation in porous and fractured media with fluid and heat flow coupled with geomechanical effects.
Modularized Parallel Neutron Instrument Simulation on the TeraGrid

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Meili; Cobb, John W; Hagen, Mark E

2007-01-01

In order to build a bridge between the TeraGrid (TG), a national scale cyberinfrastructure resource, and neutron science, the Neutron Science TeraGrid Gateway (NSTG) is focused on introducing productive HPC usage to the neutron science community, primarily the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL). Monte Carlo simulations are used as a powerful tool for instrument design and optimization at SNS. One of the successful efforts of a collaboration team composed of NSTG HPC experts and SNS instrument scientists is the development of a software facility named PSoNI, Parallelizing Simulations of Neutron Instruments. By parallelizing the traditional serial instrument simulation on TeraGrid resources, PSoNI quickly computes full instrument simulations at sufficient statistical levels for instrument design. Following the successful commissioning of SNS, by the end of 2007 three out of five commissioned instruments in the SNS target station will be available for initial users. Advanced instrument study, proposal feasibility evaluation, and experiment planning are on the immediate schedule of SNS, and these pose further requirements, such as flexibility and high runtime efficiency, on fast instrument simulation. PSoNI has been redesigned to meet the new challenges and a preliminary version has been developed on TeraGrid. This paper explores the motivation and goals of the new design, and the improved software structure. Further, it describes the new features realized with MPI-parallelized McStas running high-resolution design simulations of the SEQUOIA and BSS instruments at SNS. A discussion of future work, which is targeted at fast simulation for automated experiment adjustment and at comparing models to data in analysis, is also presented.
Glycan Reader is improved to recognize most sugar types and chemical modifications in the Protein Data Bank.

PubMed

Park, Sang-Jun; Lee, Jumin; Patel, Dhilon S; Ma, Hongjing; Lee, Hui Sun; Jo, Sunhwan; Im, Wonpil

2017-10-01

Glycans play a central role in many essential biological processes. Glycan Reader was originally developed to simplify the reading of Protein Data Bank (PDB) files containing glycans through the automatic detection and annotation of sugars and glycosidic linkages between sugar units and to proteins, all based on atomic coordinates and connectivity information. Carbohydrates can have various chemical modifications at different positions, making their chemical space highly diverse. Unfortunately, current PDB files do not provide exact annotations for most carbohydrate derivatives and more than 50% of PDB glycan chains have at least one carbohydrate derivative that could not be correctly recognized by the original Glycan Reader. Glycan Reader has been improved and now identifies most sugar types and chemical modifications (including various glycolipids) in the PDB, and both PDB and PDBx/mmCIF formats are supported. CHARMM-GUI Glycan Reader is updated to generate the simulation system and input of various glycoconjugates with most sugar types and chemical modifications. It also offers a new functionality to edit the glycan structures through addition/deletion/modification of glycosylation types, sugar types, chemical modifications, glycosidic linkages, and anomeric states. The simulation system and input files can be used for CHARMM, NAMD, GROMACS, AMBER, GENESIS, LAMMPS, Desmond, OpenMM, and CHARMM/OpenMM. Glycan Fragment Database in GlycanStructure.Org is also updated to provide an intuitive glycan sequence search tool for complex glycan structures with various chemical modifications in the PDB. http://www.charmm-gui.org/input/glycan and http://www.glycanstructure.org. wonpil@lehigh.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

NASA Astrophysics Data System (ADS)

Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro

2016-08-01

We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N^2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10^7) to 300 ms (N = 10^9).
These are currently limited by the time for the calculation of the domain decomposition and communication necessary for the interaction calculation. We discuss how we can overcome these bottlenecks.
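The "simple, sequential and unoptimized" O(N^2) program that the abstract uses as its baseline can be sketched in plain Python as a direct-sum gravitational acceleration loop. This is not FDPS code and uses none of its API; the function name, the softening length, and the unit gravitational constant are illustrative assumptions.

```python
import math


def direct_sum_accelerations(positions, masses, softening=1e-3, g=1.0):
    """Gravitational acceleration on every particle by direct O(N^2) summation."""
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Separation vector pointing from particle i to particle j.
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            # Softened squared distance avoids a singularity at close approach.
            r2 = sum(d * d for d in dx) + softening * softening
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += g * masses[j] * dx[k] * inv_r3
    return acc
```

Every particle visits every other particle, so the cost grows as N^2; tree or multipole methods of the kind named in the abstract replace the inner loop with an approximation of distant contributions.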
Specification and Analysis of Parallel Machine Architecture

DTIC Science & Technology

1990-03-17

Parallel Machine Architecture C.V. Ramamoorthy Computer Science Division Dept. of Electrical Engineering and Computer Science University of California...capacity. (4) Adaptive: The overhead in resolution of deadlocks, etc. should be in proportion to their frequency. (5) Avoid rollbacks: Rollbacks can be...snapshots of system state graphically at a rate proportional to simulation time. Some of the examples are as follow: (1) When the simulation clock of
Argonne Simulation Framework for Intelligent Transportation Systems

DOT National Transportation Integrated Search

1996-01-01

A simulation framework has been developed which defines a high-level architecture for a large-scale, comprehensive, scalable simulation of an Intelligent Transportation System (ITS). The simulator is designed to run on parallel computers and distribu...
Parallel DSMC Solution of Three-Dimensional Flow Over a Finite Flat Plate

NASA Technical Reports Server (NTRS)

Nance, Robert P.; Wilmoth, Richard G.; Moon, Bongki; Hassan, H. A.; Saltz, Joel

1994-01-01

This paper describes a parallel implementation of the direct simulation Monte Carlo (DSMC) method. Runtime library support is used for scheduling and execution of communication between nodes, and domain decomposition is performed dynamically to maintain a good load balance. Performance tests are conducted using the code to evaluate various remapping and remapping-interval policies, and it is shown that a one-dimensional chain-partitioning method works best for the problems considered. The parallel code is then used to simulate the Mach 20 nitrogen flow over a finite-thickness flat plate. It is shown that the parallel algorithm produces results which compare well with experimental data. Moreover, it yields significantly faster execution times than the scalar code, as well as very good load-balance characteristics.
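A one-dimensional chain partition of the kind mentioned above assigns each processor a contiguous run of an ordered list of cells so that the per-processor loads are roughly equal. The Python sketch below uses a simple greedy prefix-sum rule; it is an assumption for illustration, not the remapping policy evaluated in the paper, and the function name is hypothetical.

```python
def chain_partition(weights, parts):
    """Split an ordered list of cell weights into `parts` contiguous index ranges."""
    total = sum(weights)
    boundaries = [0]
    running = 0.0
    for idx, w in enumerate(weights):
        running += w
        # Cut after this cell once the running load reaches the next ideal share.
        while len(boundaries) < parts and running >= total * len(boundaries) / parts:
            boundaries.append(idx + 1)
    # Pad with empty trailing ranges if there were too few cells to cut.
    while len(boundaries) < parts:
        boundaries.append(len(weights))
    boundaries.append(len(weights))
    # Half-open ranges [start, stop) of cell indices, one per processor.
    return [(boundaries[k], boundaries[k + 1]) for k in range(parts)]
```

In a dynamic setting the weights would be re-measured (for example as particle counts per cell) and the partition recomputed at some interval. Note that this greedy rule can produce empty ranges when a single cell carries most of the load.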
STOCHSIMGPU: parallel stochastic simulation for the Systems Biology Toolbox 2 for MATLAB.

PubMed

Klingbeil, Guido; Erban, Radek; Giles, Mike; Maini, Philip K

2011-04-15

The importance of stochasticity in biological systems is becoming increasingly recognized and the computational cost of biologically realistic stochastic simulations urgently requires development of efficient software. We present a new software tool STOCHSIMGPU that exploits graphics processing units (GPUs) for parallel stochastic simulations of biological/chemical reaction systems and show that significant gains in efficiency can be made. It is integrated into MATLAB and works with the Systems Biology Toolbox 2 (SBTOOLBOX2) for MATLAB. The GPU-based parallel implementation of the Gillespie stochastic simulation algorithm (SSA), the logarithmic direct method (LDM) and the next reaction method (NRM) is approximately 85 times faster than the sequential implementation of the NRM on a central processing unit (CPU). Using our software does not require any changes to the user's models, since it acts as a direct replacement of the stochastic simulation software of the SBTOOLBOX2. The software is open source under the GPL v3 and available at http://www.maths.ox.ac.uk/cmb/STOCHSIMGPU. The web site also contains supplementary information. klingbeil@maths.ox.ac.uk Supplementary data are available at Bioinformatics online.
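As a point of reference for the algorithms named above, here is a minimal sequential sketch of the Gillespie direct method in Python. It is not STOCHSIMGPU code and does not cover the logarithmic direct method or the next reaction method; the function signature and the representation of reactions as (propensity function, state-change dictionary) pairs are assumptions made for illustration.

```python
import math
import random


def gillespie_direct(initial_state, reactions, t_end, rng):
    """Simulate one trajectory with the Gillespie direct method.

    `reactions` is a list of (propensity_fn, change) pairs, where
    propensity_fn(state) returns a non-negative rate and `change`
    maps species names to integer increments.
    """
    t = 0.0
    state = dict(initial_state)
    trajectory = [(t, dict(state))]
    while True:
        propensities = [fn(state) for fn, _ in reactions]
        total = sum(propensities)
        if total <= 0.0:
            break  # no reaction can fire any more
        # Exponentially distributed waiting time; 1 - random() lies in (0, 1].
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to its propensity.
        threshold = rng.random() * total
        cumulative = 0.0
        for a, (_, change) in zip(propensities, reactions):
            cumulative += a
            if threshold < cumulative:
                for species, delta in change.items():
                    state[species] += delta
                break
        trajectory.append((t, dict(state)))
    return trajectory
```

For example, a first-order decay A -> 0 can be expressed as `[(lambda s: 1.0 * s["A"], {"A": -1})]`. Independent trajectories of this kind share no data, which is why ensembles of them map naturally onto many GPU threads.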
Predicting Flows of Rarefied Gases

NASA Technical Reports Server (NTRS)

LeBeau, Gerald J.; Wilmoth, Richard G.

2005-01-01

DSMC Analysis Code (DAC) is a flexible, highly automated, easy-to-use computer program for predicting flows of rarefied gases -- especially flows of upper-atmospheric, propulsion, and vented gases impinging on spacecraft surfaces. DAC implements the direct simulation Monte Carlo (DSMC) method, which is widely recognized as standard for simulating flows at densities so low that the continuum-based equations of computational fluid dynamics are invalid. DAC enables users to model complex surface shapes and boundary conditions quickly and easily. The discretization of a flow field into computational grids is automated, thereby relieving the user of a traditionally time-consuming task while ensuring (1) appropriate refinement of grids throughout the computational domain, (2) determination of optimal settings for temporal discretization and other simulation parameters, and (3) satisfaction of the fundamental constraints of the method. In so doing, DAC ensures an accurate and efficient simulation. In addition, DAC can utilize parallel processing to reduce computation time. The domain decomposition needed for parallel processing is completely automated, and the software employs a dynamic load-balancing mechanism to ensure optimal parallel efficiency throughout the simulation.
Enabling parallel simulation of large-scale HPC network systems

DOE PAGES

Mubarak, Misbah; Carothers, Christopher D.; Ross, Robert B.; ...

2016-04-07

Here, with the increasing complexity of today’s high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems, in particular networks. In order to make effective design decisions, simulations of these systems must possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC network simulation frameworks, however, are constrained in one or more of these areas. In this work, we present a simulation framework for modeling two important classes of networks used in today’s IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies at flit-level detail using the Rensselaer Optimistic Simulation System (ROSS) for parallel discrete-event simulation. Our simulation framework meets all the requirements of a practical network simulation and can assist network designers in design space exploration. First, it uses validated and detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second, instead of relying on serial time-stepped or traditional conservative discrete-event simulations that limit simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to achieve efficient and scalable HPC network simulations on today’s high-performance cluster systems. Third, our models give network designers a choice in simulating a broad range of network workloads, including HPC application workloads using detailed network traces, an ability that is rarely offered in parallel with high-fidelity network simulations.
A method for data handling numerical results in parallel OpenFOAM simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Anton, Alin; Muntean, Sebastian

Parallel computational fluid dynamics simulations produce vast amounts of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM® toolkit [1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large-scale simulation results than the regular algorithms.
Parallel machine architecture and compiler design facilities

NASA Technical Reports Server (NTRS)

Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex

1990-01-01

The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of the Delta project (whose objective is to provide a facility to allow rapid prototyping of parallelizing compilers that can target different machine architectures) is summarized. Included are surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.
JETSPIN: A specific-purpose open-source software for simulations of nanofiber electrospinning

NASA Astrophysics Data System (ADS)

Lauricella, Marco; Pontrelli, Giuseppe; Coluzza, Ivan; Pisignano, Dario; Succi, Sauro

2015-12-01

We present the open-source computer program JETSPIN, specifically designed to simulate the electrospinning process of nanofibers. Its capabilities are shown with proper reference to the underlying model, as well as a description of the relevant input variables and associated test-case simulations. The various interactions included in the electrospinning model implemented in JETSPIN are discussed in detail. The code is designed to exploit different computational architectures, from single to parallel processor workstations. This paper provides an overview of JETSPIN, focusing primarily on its structure, parallel implementations, functionality, performance, and availability.
Xyce™ Parallel Electronic Simulator Reference Guide, Version 6.5

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik V.; Mei, Ting

2016-06-01

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users’ Guide. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users’ Guide. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.
Biocellion: accelerating computer simulation of multicellular biological system models.

PubMed

Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya

2014-11-01

Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoginath, Srikanth B; Perumalla, Kalyan S

2013-01-01

With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware, as has traditionally been done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the bad scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature in the sense that any fairness-based VM scheduler implementation would exhibit this mismatch with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in the run time of the PDES-optimized scheduler relative to the regular VM scheduler, with an over 20-fold reduction in run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.
On the relationship between parallel computation and graph embedding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gupta, A.K.

1989-01-01

The problem of efficiently simulating an algorithm designed for an n-processor parallel machine G on an m-processor parallel machine H with n > m arises when parallel algorithms designed for an ideal size machine are simulated on existing machines which are of a fixed size. The author studies this problem when every processor of H takes over the function of a number of processors in G, and he phrases the simulation problem as a graph embedding problem. New embeddings presented address relevant issues arising from the parallel computation environment. The main focus centers around embedding complete binary trees into smaller-sized binary trees, butterflies, and hypercubes. He also considers simultaneous embeddings of r source machines into a single hypercube. Constant factors play a crucial role in his embeddings since they are not only important in practice but also lead to interesting theoretical problems. All of his embeddings minimize dilation and load, which are the conventional cost measures in graph embeddings and determine the maximum amount of time required to simulate one step of G on H. His embeddings also optimize a new cost measure called (α,β)-utilization, which characterizes how evenly the processors of H are used by the processors of G. Ideally, the utilization should be balanced (i.e., every processor of H simulates at most (n/m) processors of G), and the (α,β)-utilization measures how far off from a balanced utilization the embedding is. He presents embeddings for the situation when some processors of G have different capabilities (e.g. memory or I/O) than others and the processors with different capabilities are to be distributed uniformly among the processors of H. Placing such conditions on an embedding results in an increase in some of the cost measures.
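The two conventional cost measures mentioned in this abstract can be computed directly for a small case. The Python sketch below is only an illustration, not the author's construction: the helper names, the 7-node guest tree, the 3-node host tree and the particular mapping are all assumptions. Load is the largest number of guest processors placed on one host processor, and dilation is the largest host distance between the images of two adjacent guest processors.

```python
from collections import Counter, deque

def bfs_distances(adj, src):
    """Hop distances from src to every reachable node of an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def load_and_dilation(guest_edges, host_adj, mapping):
    """Return (load, dilation) of an embedding given as guest node -> host node."""
    load = max(Counter(mapping.values()).values())
    dist = {h: bfs_distances(host_adj, h) for h in host_adj}
    dilation = max(dist[mapping[u]][mapping[v]] for u, v in guest_edges)
    return load, dilation

# Guest G: complete binary tree with 7 nodes (heap numbering, node k has children 2k, 2k+1).
guest_edges = [(k, c) for k in (1, 2, 3) for c in (2 * k, 2 * k + 1)]
# Host H: complete binary tree with 3 nodes.
host_adj = {1: [2, 3], 2: [1], 3: [1]}
# Hypothetical mapping: each depth-1 subtree of G collapses onto one leaf of H.
mapping = {1: 1, 2: 2, 4: 2, 5: 2, 3: 3, 6: 3, 7: 3}

load, dilation = load_and_dilation(guest_edges, host_adj, mapping)
print(load, dilation)
```

With n = 7 and m = 3, a balanced embedding places at most 3 guest processors on each host processor, which this mapping achieves, although host node 1 carries only one guest node.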
Parallelization of interpolation, solar radiation and water flow simulation modules in GRASS GIS using OpenMP

NASA Astrophysics Data System (ADS)

Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav

2017-10-01

In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and parallelization approaches used in the modules. Our approach includes the analysis of the module's functionality, identification of source code segments suitable for parallelization and proper application of OpenMP parallelization code to create efficient threads processing the subtasks. We document the efficiency of the solutions using the airborne laser scanning data representing land surface in the test area and derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speeds on a standard multi-core computer while maintaining the accuracy of results in comparison to the output from original modules. The presented parallelization approach showed the simplicity and efficiency of the parallelization of open-source GRASS GIS modules using OpenMP, leading to an increased performance of this geospatial software on standard multi-core computers.
A parallel finite element procedure for contact-impact problems using edge-based smooth triangular element and GPU

NASA Astrophysics Data System (ADS)

Cai, Yong; Cui, Xiangyang; Li, Guangyao; Liu, Wenyang

2018-04-01

The edge-smooth finite element method (ES-FEM) can improve the computational accuracy of triangular shell elements and the mesh partition efficiency of complex models. In this paper, an approach is developed to perform explicit finite element simulations of contact-impact problems with a graphics processing unit (GPU) using a special edge-smooth triangular shell element based on ES-FEM. Of critical importance for this problem is achieving finer-grained parallelism to enable efficient data loading and to minimize communication between the device and host. Four kinds of parallel strategies are then developed to efficiently solve these ES-FEM based shell element formulas, and various optimization methods are adopted to ensure aligned memory access. Special focus is dedicated to developing an approach for the parallel construction of edge systems. A parallel hierarchy-territory contact-searching algorithm (HITA) and a parallel penalty function calculation method are embedded in this parallel explicit algorithm. Finally, the program flow is well designed, and a GPU-based simulation system is developed using Nvidia's CUDA. Several numerical examples are presented to illustrate the high quality of the results obtained with the proposed methods. In addition, the GPU-based parallel computation is shown to significantly reduce the computing time.
Unbiased Rare Event Sampling in Spatial Stochastic Systems Biology Models Using a Weighted Ensemble of Trajectories

PubMed Central

Donovan, Rory M.; Tapia, Jose-Juan; Sullivan, Devin P.; Faeder, James R.; Murphy, Robert F.; Dittrich, Markus; Zuckerman, Daniel M.

2016-01-01

The long-term goal of connecting scales in biological simulation can be facilitated by scale-agnostic methods. We demonstrate that the weighted ensemble (WE) strategy, initially developed for molecular simulations, applies effectively to spatially resolved cell-scale simulations. The WE approach runs an ensemble of parallel trajectories with assigned weights and uses a statistical resampling strategy of replicating and pruning trajectories to focus computational effort on difficult-to-sample regions. The method can also generate unbiased estimates of non-equilibrium and equilibrium observables, sometimes with significantly less aggregate computing time than would be possible using standard parallelization. Here, we use WE to orchestrate particle-based kinetic Monte Carlo simulations, which include spatial geometry (e.g., of organelles, plasma membrane) and biochemical interactions among mobile molecular species. We study a series of models exhibiting spatial, temporal and biochemical complexity and show that although WE has important limitations, it can achieve performance significantly exceeding standard parallel simulation—by orders of magnitude for some observables. PMID:26845334
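The replicate-and-prune resampling step described in this abstract can be sketched in a few lines of Python. This is a simplified illustration and not the authors' implementation: the function `resample_bin`, the rule of splitting the heaviest walker and merging the two lightest, and the example weights are all assumptions. The property it demonstrates is that the total weight in a bin is unchanged by resampling, which is what keeps the estimates unbiased.

```python
import random

def resample_bin(walkers, target, rng):
    """Resample one non-empty bin of (state, weight) walkers to `target` walkers.

    Weights are assumed positive. Total weight is conserved.
    """
    walkers = sorted(walkers, key=lambda w: w[1])
    # Replicate: split the heaviest walker into two copies with half the weight each.
    while len(walkers) < target:
        state, weight = walkers.pop()
        walkers += [(state, weight / 2), (state, weight / 2)]
        walkers.sort(key=lambda w: w[1])
    # Prune: merge the two lightest walkers; the survivor is picked with
    # probability proportional to its weight and inherits the summed weight.
    while len(walkers) > target:
        (s1, w1), (s2, w2) = walkers[0], walkers[1]
        keep = s1 if rng.random() < w1 / (w1 + w2) else s2
        walkers = [(keep, w1 + w2)] + walkers[2:]
        walkers.sort(key=lambda w: w[1])
    return walkers

rng = random.Random(0)
sparse_bin = [("x1", 0.4), ("x2", 0.1)]                       # too few walkers
crowded_bin = [("y%d" % i, 0.02 * (i + 1)) for i in range(5)]  # too many walkers

grown = resample_bin(sparse_bin, 4, rng)
shrunk = resample_bin(crowded_bin, 2, rng)
print(len(grown), sum(w for _, w in grown))
print(len(shrunk), sum(w for _, w in shrunk))
```

In a full weighted ensemble run, a step like this would be applied to every bin after each short burst of independent trajectory propagation.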
Parallel-distributed mobile robot simulator

NASA Astrophysics Data System (ADS)

Okada, Hiroyuki; Sekiguchi, Minoru; Watanabe, Nobuo

1996-06-01

The aim of this project is to achieve an autonomous learning and growth function based on active interaction with the real world. It should also be able to autonomously acquire knowledge about the context in which jobs take place, and how the jobs are executed. This article describes a parallel distributed movable robot system simulator with an autonomous learning and growth function. The autonomous learning and growth function which we are proposing is characterized by its ability to learn and grow through interaction with the real world. When the movable robot interacts with the real world, the system compares the virtual environment simulation with the interaction result in the real world. The system then improves the virtual environment to match the real-world result more closely. Thus the system learns and grows. It is very important that such a simulation is time-realistic. The parallel distributed movable robot simulator was developed to simulate the space of a movable robot system with an autonomous learning and growth function. The simulator constructs a virtual space faithful to the real world and also integrates the interfaces between the user, the actual movable robot and the virtual movable robot. Using an ultrafast CG (computer graphics) system (FUJITSU AG series), time-realistic 3D CG is displayed.
Development of massive multilevel molecular dynamics simulation program, Platypus (PLATform for dYnamic Protein Unified Simulation), for the elucidation of protein functions.

PubMed

Takano, Yu; Nakata, Kazuto; Yonezawa, Yasushige; Nakamura, Haruki

2016-05-05

A massively parallel program for quantum mechanical-molecular mechanical (QM/MM) molecular dynamics simulation, called Platypus (PLATform for dYnamic Protein Unified Simulation), was developed to elucidate protein functions. The speedup and the parallelization ratio of Platypus in the QM and QM/MM calculations were assessed for a bacteriochlorophyll dimer in the photosynthetic reaction center (DIMER) on the K computer, a massively parallel computer achieving 10 PetaFLOPs with 705,024 cores. Platypus showed increasing speedup up to 20,000 cores for the HF/cc-pVDZ and B3LYP/cc-pVDZ calculations, and up to 10,000 cores for the CASCI(16,16)/6-31G** calculations. We also performed excited-state QM/MM-MD simulations on the chromophore of Sirius (SIRIUS) in water. Sirius is a pH-insensitive and photo-stable ultramarine fluorescent protein. Platypus accelerated on-the-fly excited-state QM/MM-MD simulations for SIRIUS in water, using over 4000 cores. In addition, it succeeded in 50-ps (200,000-step) on-the-fly excited-state QM/MM-MD simulations for SIRIUS in water. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.

Parallel Stochastic discrete event simulation of calcium dynamics in neuron.

PubMed

Ishlam Patoary, Mohammad Nazrul; Tropper, Carl; McDougal, Robert A; Zhongwei, Lin; Lytton, William W

2017-09-26

The intra-cellular calcium signaling pathways of a neuron depend on both biochemical reactions and diffusion. Some quasi-isolated compartments (e.g. spines) are so small and calcium concentrations are so low that one extra molecule diffusing in by chance can make a nontrivial difference in its concentration (percentage-wise). These rare events can affect dynamics discretely in such a way that they cannot be evaluated by a deterministic simulation. Stochastic models of such a system provide a more detailed understanding of these systems than existing deterministic models because they capture their behavior at a molecular level. Our research focuses on the development of a high performance parallel discrete event simulation environment, Neuron Time Warp (NTW), which is intended for use in the parallel simulation of stochastic reaction-diffusion systems such as intra-calcium signaling. NTW is integrated with NEURON, a simulator which is widely used within the neuroscience community. We simulate two models, a calcium buffer and a calcium wave model. The calcium buffer model is employed in order to verify the correctness and performance of NTW by comparing it to a serial deterministic simulation in NEURON. We also derived a discrete event calcium wave model from a deterministic model using the stochastic IP3R structure.
mdFoam+: Advanced molecular dynamics in OpenFOAM

NASA Astrophysics Data System (ADS)

Longshaw, S. M.; Borg, M. K.; Ramisetti, S. B.; Zhang, J.; Lockerby, D. A.; Emerson, D. R.; Reese, J. M.

2018-03-01

This paper introduces mdFoam+, which is an MPI parallelised molecular dynamics (MD) solver implemented entirely within the OpenFOAM software framework. It is open-source and released under the same GNU General Public License (GPL) as OpenFOAM. The source code is released as a publicly open software repository that includes detailed documentation and tutorial cases. Since mdFoam+ is designed entirely within the OpenFOAM C++ object-oriented framework, it inherits a number of key features. The code is designed for extensibility and flexibility, so it is aimed first and foremost as an MD research tool, in which new models and test cases can be developed and tested rapidly. Implementing mdFoam+ in OpenFOAM also enables easier development of hybrid methods that couple MD with continuum-based solvers. Setting up MD cases follows the standard OpenFOAM format, as mdFoam+ also relies upon the OpenFOAM dictionary-based directory structure. This ensures that useful pre- and post-processing capabilities provided by OpenFOAM remain available even though the fully Lagrangian nature of an MD simulation is not typical of most OpenFOAM applications. Results show that mdFoam+ compares well to another well-known MD code (e.g. LAMMPS) in terms of benchmark problems, although it also has additional functionality that does not exist in other open-source MD codes.
Design of a bounded wave EMP (Electromagnetic Pulse) simulator

NASA Astrophysics Data System (ADS)

Sevat, P. A. A.

1989-06-01

Electromagnetic Pulse (EMP) simulators are used to simulate the EMP generated by a nuclear weapon and to harden equipment against the effects of EMP. At present, DREO has a 1 m EMP simulator for testing computer terminal size equipment. To develop the R and D capability for testing larger objects, such as a helicopter, a much bigger threat level facility is required. This report concerns the design of a bounded wave EMP simulator suitable for testing large size equipment. Different types of simulators are described and their pros and cons are discussed. A bounded wave parallel plate type simulator is chosen for its efficiency and its low environmental impact. Detailed designs are given for 6 m and 10 m parallel plate type wire grid simulators. Electromagnetic fields inside and outside the simulators are computed. Preliminary specifications for a pulse generator required for the simulator are also given. Finally, the electromagnetic fields radiated from the simulator are computed and discussed.
Durham extremely large telescope adaptive optics simulation platform.

PubMed

Basden, Alastair; Butterley, Timothy; Myers, Richard; Wilson, Richard

2007-03-01

Adaptive optics systems are essential on all large telescopes for which image quality is important. These are complex systems with many design parameters requiring optimization before good performance can be achieved. The simulation of adaptive optics systems is therefore necessary to categorize the expected performance. We describe an adaptive optics simulation platform, developed at Durham University, which can be used to simulate adaptive optics systems on the largest proposed future extremely large telescopes as well as on current systems. This platform is modular, object oriented, and has the benefit of hardware application acceleration that can be used to improve the simulation performance, essential for ensuring that the run time of a given simulation is acceptable. The simulation platform described here can be highly parallelized using parallelization techniques suited for adaptive optics simulation, while still offering the user complete control while the simulation is running. The results from the simulation of a ground layer adaptive optics system are provided as an example to demonstrate the flexibility of this simulation platform.
Super and parallel computers and their impact on civil engineering

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kamat, M.P.

1986-01-01

This book presents the papers given at a conference on the use of supercomputers in civil engineering. Topics considered at the conference included solving nonlinear equations on a hypercube, a custom architectured parallel processing system, distributed data processing, algorithms, computer architecture, parallel processing, vector processing, computerized simulation, and cost benefit analysis.
Evaluation of Parallel Analysis Methods for Determining the Number of Factors

ERIC Educational Resources Information Center

Crawford, Aaron V.; Green, Samuel B.; Levy, Roy; Lo, Wen-Juo; Scott, Lietta; Svetina, Dubravka; Thompson, Marilyn S.

2010-01-01

Population and sample simulation approaches were used to compare the performance of parallel analysis using principal component analysis (PA-PCA) and parallel analysis using principal axis factoring (PA-PAF) to identify the number of underlying factors. Additionally, the accuracies of the mean eigenvalue and the 95th percentile eigenvalue criteria…
Design of a massively parallel computer using bit serial processing elements

NASA Technical Reports Server (NTRS)

Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing

1995-01-01

A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.
Scalable isosurface visualization of massive datasets on commodity off-the-shelf clusters

PubMed Central

Bajaj, Chandrajit

2009-01-01

Tomographic imaging and computer simulations are increasingly yielding massive datasets. Interactive and exploratory visualizations have rapidly become indispensable tools to study large volumetric imaging and simulation data. Our scalable isosurface visualization framework on commodity off-the-shelf clusters is an end-to-end parallel and progressive platform, from initial data access to the final display. Interactive browsing of extracted isosurfaces is made possible by using parallel isosurface extraction and rendering in conjunction with a new specialized piece of image compositing hardware called Metabuffer. In this paper, we focus on the back end scalability by introducing a fully parallel and out-of-core isosurface extraction algorithm. It achieves scalability by using both parallel and out-of-core processing and parallel disks. It statically partitions the volume data to parallel disks with a balanced workload spectrum, and builds I/O-optimal external interval trees to minimize the number of I/O operations of loading large data from disk. We also describe an isosurface compression scheme that is efficient for progressive extraction, transmission and storage of isosurfaces. PMID:19756231
Progress in Unsteady Turbopump Flow Simulations Using Overset Grid Systems

NASA Technical Reports Server (NTRS)

Kiris, Cetin C.; Chan, William; Kwak, Dochan

2002-01-01

This viewgraph presentation provides information on unsteady flow simulations for the Second Generation RLV (Reusable Launch Vehicle) baseline turbopump. Three impeller rotations were simulated using a model with 34.3 million grid points. MPI/OpenMP hybrid parallelism and MLP shared memory parallelism have been implemented and benchmarked in INS3D, an incompressible Navier-Stokes solver. For RLV turbopump simulations, a speedup of more than 30 times has been obtained. Moving boundary capability is obtained by using the DCF module. Scripting capability from CAD geometry to solution has been developed. Unsteady flow simulations for an advanced consortium impeller/diffuser using a model with 39 million grid points are currently underway; 1.2 impeller rotations have been completed. The fluid/structure coupling has been initiated.
Parallel 3D Multi-Stage Simulation of a Turbofan Engine

NASA Technical Reports Server (NTRS)

Turner, Mark G.; Topp, David A.

1998-01-01

A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. 20 processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scaleable with the number of blade rows. Enough flips are run (between 50 and 200) so the solution in the entire machine is not changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee the parallelization was done correctly. The domain decomposition is done only in the axial direction since the number of points axially is much larger than the other two directions. This code uses MPI for message passing. 
The parallel speed up of the solver portion (no I/O or body force calculation) for a grid which has 227 points axially.
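The axial-only domain decomposition mentioned in the abstract above can be pictured with a minimal Python sketch that cuts the 227 axial grid points into contiguous, near-equal slabs. The function name and the choice of five domains per blade row are assumptions for illustration only (the abstract states just that 20 processors are used for 4 blade rows); this is not APNASA code.

```python
import numpy as np

def axial_slabs(n_axial, n_domains):
    """Split axial indices 0..n_axial-1 into contiguous, near-equal slabs."""
    return np.array_split(np.arange(n_axial), n_domains)

# Hypothetical example: 227 axial points, 5 domains per blade row (assumed).
slabs = axial_slabs(227, 5)
for rank, slab in enumerate(slabs):
    print(f"domain {rank}: axial indices {slab[0]}..{slab[-1]} ({slab.size} points)")
```

Contiguous slabs keep each domain's neighbours limited to the slab before and after it, which is one reason a one-dimensional split is attractive when the axial direction has far more points than the other two.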
cuTauLeaping: A GPU-Powered Tau-Leaping Stochastic Simulator for Massive Parallel Analyses of Biological Systems

PubMed Central

Besozzi, Daniela; Pescini, Dario; Mauri, Giancarlo

2014-01-01

Tau-leaping is a stochastic simulation algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of simulations, leading to high computational costs. Since each simulation can be executed independently from the others, a massive parallelization of tau-leaping can lead to substantial reductions of the overall running time. The emerging field of General Purpose Graphic Processing Units (GPGPU) provides power-efficient high-performance computing at a relatively low cost. In this work we introduce cuTauLeaping, a stochastic simulator of biological systems that makes use of GPGPU computing to execute multiple parallel tau-leaping simulations, by fully exploiting Nvidia's Fermi GPU architecture. We show how a considerable computational speedup is achieved on GPU by partitioning the execution of tau-leaping into multiple separate phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms the CPU-based tau-leaping implementation when the number of parallel simulations increases, with a break-even directly depending on the size of the biological system and on the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a bidimensional parameter sweep analysis to study the oscillatory regimes in the Ras/cAMP/PKA pathway in S. cerevisiae. PMID:24663957
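To make the idea of many independent tau-leaping runs concrete, here is a minimal CPU-side Python sketch. It assumes a fixed step size tau and crudely clamps populations at zero; practical tau-leaping implementations, presumably including cuTauLeaping, select tau adaptively and handle negative populations more carefully. The function name and the toy decay model are hypothetical.

```python
import numpy as np

def tau_leap(x0, stoich, propensity, tau, n_steps, rng):
    """Fixed-step tau-leaping; returns the trajectory of state vectors.

    stoich has shape (n_reactions, n_species); propensity(x) returns the
    non-negative propensity of each reaction in state x.
    """
    x = np.array(x0, dtype=np.int64)
    trajectory = [x.copy()]
    for _ in range(n_steps):
        a = propensity(x)
        firings = rng.poisson(a * tau)            # firings per reaction in [t, t + tau)
        x = np.maximum(x + firings @ stoich, 0)   # crude clamp (simplifying assumption)
        trajectory.append(x.copy())
    return np.array(trajectory)

# Hypothetical toy model: A -> (nothing), propensity 0.1 * A.
stoich = np.array([[-1]])

def decay(x):
    return np.array([0.1 * x[0]])

# Each run is independent of the others, so the runs can execute in parallel.
finals = [tau_leap([100], stoich, decay, 0.1, 50, np.random.default_rng(seed))[-1, 0]
          for seed in range(8)]
print(finals)
```

Because the runs share nothing but the model definition, the list comprehension could be replaced by any parallel map without changing the per-run logic.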
Modeling the nanoscale viscoelasticity of fluids by bridging non-Markovian fluctuating hydrodynamics and molecular dynamics simulations

NASA Astrophysics Data System (ADS)

Voulgarakis, Nikolaos K.; Satish, Siddarth; Chu, Jhih-Wei

2009-12-01

A multiscale computational method is developed to model the nanoscale viscoelasticity of fluids by bridging non-Markovian fluctuating hydrodynamics (FHD) and molecular dynamics (MD) simulations. To capture the elastic responses that emerge at small length scales, we attach an additional rheological model parallel to the macroscopic constitutive equation of a fluid. The widely used linear Maxwell model is employed as a working choice; other models can be used as well. For a fluid that is Newtonian in the macroscopic limit, this approach results in a parallel Newtonian-Maxwell model. For water, argon, and an ionic liquid, the power spectrum of momentum field autocorrelation functions of the parallel Newtonian-Maxwell model agrees very well with those calculated from all-atom MD simulations. To incorporate thermal fluctuations, we generalize the equations of FHD to work with non-Markovian rheological models and colored noise. The fluctuating stress tensor (white noise) is integrated in time in the same manner as its dissipative counterpart and numerical simulations indicate that this approach accurately preserves the set temperature in a FHD simulation. By mapping position and velocity vectors in the molecular representation onto field variables, we bridge the non-Markovian FHD with atomistic MD simulations. Through this mapping, we quantitatively determine the transport coefficients of the parallel Newtonian-Maxwell model for water and argon from all-atom MD simulations. For both fluids, a significant enhancement in elastic responses is observed as the wave number of hydrodynamic modes is reduced to a few nanometers. The mapping from particle to field representations and the perturbative strategy of developing constitutive equations provide a useful framework for modeling the nanoscale viscoelasticity of fluids.
Progress on the Multiphysics Capabilities of the Parallel Electromagnetic ACE3P Simulation Suite

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kononenko, Oleksiy

2015-03-26

ACE3P is a 3D parallel simulation suite that is being developed at SLAC National Accelerator Laboratory. Effectively utilizing supercomputer resources, ACE3P has become a key tool for the coupled electromagnetic, thermal and mechanical research and design of particle accelerators. Based on the existing finite-element infrastructure, a massively parallel eigensolver is developed for modal analysis of mechanical structures. It complements a set of the multiphysics tools in ACE3P and, in particular, can be used for the comprehensive study of microphonics in accelerating cavities ensuring the operational reliability of a particle accelerator.
Wakefield Computations for the CLIC PETS using the Parallel Finite Element Time-Domain Code T3P

DOE Office of Scientific and Technical Information (OSTI.GOV)

Candel, A; Kabel, A.; Lee, L.

In recent years, SLAC's Advanced Computations Department (ACD) has developed the high-performance parallel 3D electromagnetic time-domain code, T3P, for simulations of wakefields and transients in complex accelerator structures. T3P is based on advanced higher-order Finite Element methods on unstructured grids with quadratic surface approximation. Optimized for large-scale parallel processing on leadership supercomputing facilities, T3P allows simulations of realistic 3D structures with unprecedented accuracy, aiding the design of the next generation of accelerator facilities. Applications to the Compact Linear Collider (CLIC) Power Extraction and Transfer Structure (PETS) are presented.
Re-forming supercritical quasi-parallel shocks. I - One- and two-dimensional simulations

NASA Technical Reports Server (NTRS)

Thomas, V. A.; Winske, D.; Omidi, N.

1990-01-01

The process of reforming supercritical quasi-parallel shocks is investigated using one-dimensional and two-dimensional hybrid (particle ion, massless fluid electron) simulations both of shocks and of simpler two-stream interactions. It is found that the supercritical quasi-parallel shock is not steady. Instead of a well-defined shock ramp between upstream and downstream states that remains at a fixed position in the flow, the ramp periodically steepens, broadens, and then reforms upstream of its former position. It is concluded that the wave generation process is localized at the shock ramp and that the reformation process proceeds in the absence of upstream perturbations intersecting the shock.
Limits to high-speed simulations of spiking neural networks using general-purpose computers.

PubMed

Zenke, Friedemann; Gerstner, Wulfram

2014-01-01

To understand how the central nervous system performs computations using recurrent neuronal circuitry, simulations have become an indispensable tool for theoretical neuroscience. To study neuronal circuits and their ability to self-organize, increasing attention has been directed toward synaptic plasticity. In particular spike-timing-dependent plasticity (STDP) creates specific demands for simulations of spiking neural networks. On the one hand a high temporal resolution is required to capture the millisecond timescale of typical STDP windows. On the other hand network simulations have to evolve over hours up to days, to capture the timescale of long-term plasticity. To do this efficiently, fast simulation speed is the crucial ingredient rather than large neuron numbers. Using different medium-sized network models consisting of several thousands of neurons and off-the-shelf hardware, we compare the simulation speed of the simulators: Brian, NEST and Neuron as well as our own simulator Auryn. Our results show that real-time simulations of different plastic network models are possible in parallel simulations in which numerical precision is not a primary concern. Even so, the speed-up margin of parallelism is limited and boosting simulation speeds beyond one tenth of real-time is difficult. By profiling simulation code we show that the run times of typical plastic network simulations encounter a hard boundary. This limit is partly due to latencies in the inter-process communications and thus cannot be overcome by increased parallelism. Overall, these results show that to study plasticity in medium-sized spiking neural networks, adequate simulation tools are readily available which run efficiently on small clusters. However, to run simulations substantially faster than real-time, special hardware is a prerequisite.
LASER APPLICATIONS AND OTHER TOPICS IN QUANTUM ELECTRONICS: Application of the stochastic parallel gradient descent algorithm for numerical simulation and analysis of the coherent summation of radiation from fibre amplifiers

NASA Astrophysics Data System (ADS)

Zhou, Pu; Wang, Xiaolin; Li, Xiao; Chen, Zilum; Xu, Xiaojun; Liu, Zejin

2009-10-01

Coherent summation of fibre laser beams, which can be scaled to a relatively large number of elements, is simulated by using the stochastic parallel gradient descent (SPGD) algorithm. The applicability of this algorithm for coherent summation is analysed and its optimisation parameters and bandwidth limitations are studied.
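A generic SPGD iteration can be sketched in Python as follows. The sketch assumes an idealised model in which each channel contributes a unit-amplitude field with an unknown static phase, and the metric is the normalised on-axis combined intensity; the gain, perturbation amplitude and function names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def combining_metric(phases):
    """Normalised combined intensity: 1.0 when all channels are in phase."""
    return np.abs(np.exp(1j * phases).sum()) ** 2 / phases.size ** 2

def spgd(channel_phases, gain=20.0, delta=0.1, n_iter=2000, seed=0):
    """Return phase corrections u that increase combining_metric(channel_phases + u)."""
    rng = np.random.default_rng(seed)
    u = np.zeros_like(channel_phases)
    for _ in range(n_iter):
        # Perturb all channels simultaneously with random +/- delta.
        du = delta * rng.choice([-1.0, 1.0], size=u.size)
        # Two-sided measurement of the resulting change in the metric.
        dj = (combining_metric(channel_phases + u + du)
              - combining_metric(channel_phases + u - du))
        u += gain * dj * du   # stochastic gradient-ascent step
    return u

phases = np.random.default_rng(1).uniform(0.0, 2.0 * np.pi, size=6)
u = spgd(phases)
print(combining_metric(phases), combining_metric(phases + u))
```

The appeal of the scheme is that it needs only a single scalar measurement per perturbation, however many channels are being controlled.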
Parallelization of a Fully-Distributed Hydrologic Model using Sub-basin Partitioning

NASA Astrophysics Data System (ADS)

Vivoni, E. R.; Mniszewski, S.; Fasel, P.; Springer, E.; Ivanov, V. Y.; Bras, R. L.

2005-12-01

A primary obstacle towards advances in watershed simulations has been the limited computational capacity available to most models. The growing trend of model complexity, data availability and physical representation has not been matched by adequate developments in computational efficiency. This situation has created a serious bottleneck which limits existing distributed hydrologic models to small domains and short simulations. In this study, we present novel developments in the parallelization of a fully-distributed hydrologic model. Our work is based on the TIN-based Real-time Integrated Basin Simulator (tRIBS), which provides continuous hydrologic simulation using a multiple resolution representation of complex terrain based on a triangulated irregular network (TIN). While the use of TINs reduces computational demand, the sequential version of the model is currently limited over large basins (>10,000 km2) and long simulation periods (>1 year). To address this, a parallel MPI-based version of the tRIBS model has been implemented and tested using high performance computing resources at Los Alamos National Laboratory. Our approach utilizes domain decomposition based on sub-basin partitioning of the watershed. A stream reach graph based on the channel network structure is used to guide the sub-basin partitioning. Individual sub-basins or sub-graphs of sub-basins are assigned to separate processors to carry out internal hydrologic computations (e.g. rainfall-runoff transformation). Routed streamflow from each sub-basin forms the major hydrologic data exchange along the stream reach graph. Individual sub-basins also share subsurface hydrologic fluxes across adjacent boundaries. We demonstrate how the sub-basin partitioning provides computational feasibility and efficiency for a set of test watersheds in northeastern Oklahoma. We compare the performance of the sequential and parallelized versions to highlight the efficiency gained as the number of processors increases. 
We also discuss how the coupled use of TINs and parallel processing can lead to feasible long-term simulations in regional watersheds while preserving basin properties at high-resolution.
Spontaneous Hot Flow Anomalies at Quasi-Parallel Shocks: 2. Hybrid Simulations

NASA Technical Reports Server (NTRS)

Omidi, N.; Zhang, H.; Sibeck, D.; Turner, D.

2013-01-01

Motivated by recent THEMIS observations, this paper uses 2.5-D electromagnetic hybrid simulations to investigate the formation of Spontaneous Hot Flow Anomalies (SHFA) upstream of quasi-parallel bow shocks during steady solar wind conditions and in the absence of discontinuities. The results show the formation of a large number of structures along and upstream of the quasi-parallel bow shock. Their outer edges exhibit density and magnetic field enhancements, while their cores exhibit drops in density, magnetic field, solar wind velocity and enhancements in ion temperature. Using virtual spacecraft in the simulation, we show that the signatures of these structures in the time series data are very similar to those of SHFAs seen in THEMIS data and conclude that they correspond to SHFAs. Examination of the simulation data shows that SHFAs form as the result of foreshock cavitons interacting with the bow shock. Foreshock cavitons in turn form due to the nonlinear evolution of ULF waves generated by the interaction of the solar wind with the backstreaming ions. Because foreshock cavitons are an inherent part of the shock dissipation process, the formation of SHFAs is also an inherent part of the dissipation process leading to a highly non-uniform plasma in the quasi-parallel magnetosheath including large scale density and magnetic field cavities.
Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers

NASA Technical Reports Server (NTRS)

Morgan, Philip E.

2004-01-01

This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers" and "Develop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications". The discussion of Scalable High Performance Computing reports on three objectives: validate, assess scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.

Implementation of molecular dynamics and its extensions with the coarse-grained UNRES force field on massively parallel systems; towards millisecond-scale simulations of protein structure, dynamics, and thermodynamics

PubMed Central

Liwo, Adam; Ołdziej, Stanisław; Czaplewski, Cezary; Kleinerman, Dana S.; Blood, Philip; Scheraga, Harold A.

2010-01-01

We report the implementation of our united-residue UNRES force field for simulations of protein structure and dynamics with massively parallel architectures. In addition to coarse-grained parallelism already implemented in our previous work, in which each conformation was treated by a different task, we introduce a fine-grained level in which energy and gradient evaluation are split between several tasks. The Message Passing Interface (MPI) libraries have been utilized to construct the parallel code. The parallel performance of the code has been tested on a professional Beowulf cluster (Xeon Quad Core), a Cray XT3 supercomputer, and two IBM BlueGene/P supercomputers with canonical and replica-exchange molecular dynamics. With IBM BlueGene/P, about 50 % efficiency and 120-fold speed-up of the fine-grained part was achieved for a single trajectory of a 767-residue protein with use of 256 processors/trajectory. Because of averaging over the fast degrees of freedom, UNRES provides an effective 1000-fold speed-up compared to the experimental time scale and, therefore, enables us to effectively carry out millisecond-scale simulations of proteins with 500 and more amino-acid residues in days of wall-clock time. PMID:20305729
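The quoted figures can be cross-checked with the usual definition of parallel efficiency, speed-up divided by processor count. The short Python sketch below is only an arithmetic illustration; the function name is hypothetical.

```python
def parallel_efficiency(speedup, n_processors):
    """Parallel efficiency as speed-up per processor."""
    return speedup / n_processors

# Figures quoted in the abstract: 120-fold speed-up on 256 processors.
print(parallel_efficiency(120, 256))  # 0.46875, i.e. "about 50 %"
```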
Acceleration of discrete stochastic biochemical simulation using GPGPU.

PubMed

Sumiyoshi, Kei; Hirata, Kazuki; Hiroi, Noriko; Funahashi, Akira

2015-01-01

For systems made up of a small number of molecules, such as a biochemical network in a single cell, a simulation requires a stochastic approach, instead of a deterministic approach. The stochastic simulation algorithm (SSA) simulates the stochastic behavior of a spatially homogeneous system. Since stochastic approaches produce different results each time they are used, multiple runs are required in order to obtain statistical results; this results in a large computational cost. We have implemented a parallel method for using SSA to simulate a stochastic model; the method uses a graphics processing unit (GPU), which enables multiple realizations at the same time, and thus reduces the computational time and cost. During the simulation, for the purpose of analysis, each time course is recorded at each time step. A straightforward implementation of this method on a GPU is about 16 times faster than a sequential simulation on a CPU with hybrid parallelization; each of the multiple simulations is run simultaneously, and the computational tasks within each simulation are parallelized. We also implemented an improvement to the memory access and reduced the memory footprint, in order to optimize the computations on the GPU. We also implemented an asynchronous data transfer scheme to accelerate the time course recording function. To analyze the acceleration of our implementation on various sizes of model, we performed SSA simulations on different model sizes and compared these computation times to those for sequential simulations with a CPU. When used with the improved time course recording function, our method was shown to accelerate the SSA simulation by a factor of up to 130.
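For orientation, a single SSA realization (Gillespie's direct method) can be sketched in Python as below, with every event recorded as part of the time course. This is a sequential CPU sketch under stated assumptions, not the authors' GPU implementation; the function name and the toy reversible model A <-> B are hypothetical.

```python
import numpy as np

def ssa(x0, stoich, propensity, t_end, rng):
    """One SSA realization; returns event times and the state after each event."""
    t, x = 0.0, np.array(x0, dtype=np.int64)
    times, states = [t], [x.copy()]
    while True:
        a = propensity(x)
        a0 = a.sum()
        if a0 <= 0.0:                      # no reaction can fire
            break
        t += rng.exponential(1.0 / a0)     # waiting time to the next event
        if t > t_end:
            break
        j = rng.choice(len(a), p=a / a0)   # which reaction fires
        x = x + stoich[j]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

# Hypothetical toy model: A -> B (rate 1.0 * A) and B -> A (rate 0.5 * B).
stoich = np.array([[-1, 1], [1, -1]])

def rates(x):
    return np.array([1.0 * x[0], 0.5 * x[1]])

# Multiple realizations differ only in their random stream.
runs = [ssa([50, 0], stoich, rates, 1.0, np.random.default_rng(seed)) for seed in range(4)]
print([len(times) for times, _ in runs])
```

Since each realization depends only on its own random stream, the realizations are natural units of parallel work, which is the property the abstract exploits on the GPU.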
Acceleration of discrete stochastic biochemical simulation using GPGPU

PubMed Central

Sumiyoshi, Kei; Hirata, Kazuki; Hiroi, Noriko; Funahashi, Akira

2015-01-01

For systems made up of a small number of molecules, such as a biochemical network in a single cell, a simulation requires a stochastic approach, instead of a deterministic approach. The stochastic simulation algorithm (SSA) simulates the stochastic behavior of a spatially homogeneous system. Since stochastic approaches produce different results each time they are used, multiple runs are required in order to obtain statistical results; this results in a large computational cost. We have implemented a parallel method for using SSA to simulate a stochastic model; the method uses a graphics processing unit (GPU), which enables multiple realizations at the same time, and thus reduces the computational time and cost. During the simulation, for the purpose of analysis, each time course is recorded at each time step. A straightforward implementation of this method on a GPU is about 16 times faster than a sequential simulation on a CPU with hybrid parallelization; each of the multiple simulations is run simultaneously, and the computational tasks within each simulation are parallelized. We also implemented an improvement to the memory access and reduced the memory footprint, in order to optimize the computations on the GPU. We also implemented an asynchronous data transfer scheme to accelerate the time course recording function. To analyze the acceleration of our implementation on various sizes of model, we performed SSA simulations on different model sizes and compared these computation times to those for sequential simulations with a CPU. When used with the improved time course recording function, our method was shown to accelerate the SSA simulation by a factor of up to 130. PMID:25762936
Simulating neural systems with Xyce.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schiek, Richard Louis; Thornquist, Heidi K.; Mei, Ting

2012-12-01

Sandia's parallel circuit simulator, Xyce, can address large-scale neuron simulations in a new way, extending the range within which one can perform high-fidelity, multi-compartment neuron simulations. This report documents the implementation of neuron devices in Xyce, their use in simulation and analysis of neuron systems.
Magnitude of parallel pseudo potential in a magnetosonic shock wave

NASA Astrophysics Data System (ADS)

Ohsawa, Yukiharu

2018-05-01

The parallel pseudo potential F, which is the integral of the parallel electric field along the magnetic field, in a large-amplitude magnetosonic pulse (shock wave) is theoretically studied. Particle simulations revealed in the late 1990's that the product of the elementary charge and F can be much larger than the electron temperature in shock waves, i.e., the parallel electric field can be quite strong. However, no theory was presented for this unexpected result. This paper first revisits the small-amplitude theory for F and then investigates the parallel pseudo potential F in large-amplitude pulses based on the two-fluid model with finite thermal pressures. It is found that the magnitude of F in a shock wave is determined by the wave amplitude, the electron temperature, and the kinetic energy of an ion moving with the Alfvén speed. This theoretically obtained expression for F is nearly identical to the empirical relation for F discovered in the previous simulation work.
Computational strategies for three-dimensional flow simulations on distributed computer systems. Ph.D. Thesis Semiannual Status Report, 15 Aug. 1993 - 15 Feb. 1994

NASA Technical Reports Server (NTRS)

Weed, Richard Allen; Sankar, L. N.

1994-01-01

An increasing amount of research activity in computational fluid dynamics has been devoted to the development of efficient algorithms for parallel computing systems. The increasing performance to price ratio of engineering workstations has led to research to develop procedures for implementing a parallel computing system composed of distributed workstations. This thesis proposal outlines an ongoing research program to develop efficient strategies for performing three-dimensional flow analysis on distributed computing systems. The PVM parallel programming interface was used to modify an existing three-dimensional flow solver, the TEAM code developed by Lockheed for the Air Force, to function as a parallel flow solver on clusters of workstations. Steady flow solutions were generated for three different wing and body geometries to validate the code and evaluate code performance. The proposed research will extend the parallel code development to determine the most efficient strategies for unsteady flow simulations.
Development Of A Parallel Performance Model For The THOR Neutral Particle Transport Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yessayan, Raffi; Azmy, Yousry; Schunert, Sebastian

The THOR neutral particle transport code enables simulation of complex geometries for various problems from reactor simulations to nuclear non-proliferation. It is undergoing a thorough V&V requiring computational efficiency. This has motivated various improvements including angular parallelization, outer iteration acceleration, and development of peripheral tools. For guiding future improvements to the code’s efficiency, better characterization of its parallel performance is useful. A parallel performance model (PPM) can be used to evaluate the benefits of modifications and to identify performance bottlenecks. Using INL’s Falcon HPC, the PPM development incorporates an evaluation of network communication behavior over heterogeneous links and a functional characterization of the per-cell/angle/group runtime of each major code component. After evaluating several possible sources of variability, this resulted in a communication model and a parallel portion model. The former’s accuracy is bounded by the variability of communication on Falcon while the latter has an error on the order of 1%.
Methods for design and evaluation of parallel computing systems (The PISCES project)

NASA Technical Reports Server (NTRS)

Pratt, Terrence W.; Wise, Robert; Haught, Mary JO

1989-01-01

The PISCES project started in 1984 under the sponsorship of the NASA Computational Structural Mechanics (CSM) program. A PISCES 1 programming environment and parallel FORTRAN were implemented in 1984 for the DEC VAX (using UNIX processes to simulate parallel processes). This system was used for experimentation with parallel programs for scientific applications and AI (dynamic scene analysis) applications. PISCES 1 was ported to a network of Apollo workstations by N. Fitzgerald.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Turner, A.; Davis, A.; University of Wisconsin-Madison, Madison, WI 53706

CCFE perform Monte-Carlo transport simulations on large and complex tokamak models such as ITER. Such simulations are challenging since streaming and deep penetration effects are equally important. In order to make such simulations tractable, both variance reduction (VR) techniques and parallel computing are used. It has been found that the application of VR techniques in such models significantly reduces the efficiency of parallel computation due to 'long histories'. VR in MCNP can be accomplished using energy-dependent weight windows. The weight window represents an 'average behaviour' of particles, and large deviations in the arriving weight of a particle give rise to extreme amounts of splitting being performed and a long history. When running on parallel clusters, a long history can have a detrimental effect on the parallel efficiency - if one process is computing the long history, the other CPUs complete their batch of histories and wait idle. Furthermore some long histories have been found to be effectively intractable. To combat this effect, CCFE has developed an adaptation of MCNP which dynamically adjusts the WW where a large weight deviation is encountered. The method effectively 'de-optimises' the WW, reducing the VR performance but this is offset by a significant increase in parallel efficiency. Testing with a simple geometry has shown the method does not bias the result. This 'long history method' has enabled CCFE to significantly improve the performance of MCNP calculations for ITER on parallel clusters, and will be beneficial for any geometry combining streaming and deep penetration effects. (authors)
Capturing Petascale Application Characteristics with the Sequoia Toolkit

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vetter, Jeffrey S; Bhatia, Nikhil; Grobelny, Eric M

2005-01-01

Characterization of the computation, communication, memory, and I/O demands of current scientific applications is crucial for identifying which technologies will enable petascale scientific computing. In this paper, we present the Sequoia Toolkit for characterizing HPC applications. The Sequoia Toolkit consists of the Sequoia trace capture library and the Sequoia Event Analysis Library, or SEAL, that facilitates the development of tools for analyzing Sequoia event traces. Using the Sequoia Toolkit, we have characterized the behavior of application runs with up to 2048 application processes. To illustrate the use of the Sequoia Toolkit, we present a preliminary characterization of LAMMPS, a molecular dynamics application of great interest to the computational biology community.
Xyce Parallel Electronic Simulator Users Guide Version 6.2.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright © 2002-2014 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. 
Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)
Xyce Parallel Electronic Simulator Users Guide Version 6.4

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright © 2002-2015 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. 
Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce 's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)« less
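The Xyce abstract's point that a DAE formulation separates device models from solver algorithms can be illustrated with a minimal sketch. It assumes the circuit equations are written as dq(x)/dt + f(x) = b, a common way to state a circuit DAE; the class `ParallelRC` and the functions `dc_analysis` and `transient_analysis` are hypothetical names for illustration and are not Xyce's API. The device supplies only f, q and their derivatives, and two different analyses reuse it unchanged.

```python
class ParallelRC:
    """Toy device (hypothetical): a resistor and a capacitor from one node to ground."""

    def __init__(self, R, C):
        self.R, self.C = R, C

    def f(self, v):   # static (resistive) current leaving the node
        return v / self.R

    def q(self, v):   # charge stored on the capacitor
        return self.C * v

    def df(self, v):  # derivative of f with respect to v
        return 1.0 / self.R

    def dq(self, v):  # derivative of q with respect to v
        return self.C


def newton(residual, jacobian, x, tol=1e-12, max_iter=50):
    """Scalar Newton iteration shared by both analyses."""
    for _ in range(max_iter):
        dx = -residual(x) / jacobian(x)
        x += dx
        if abs(dx) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


def dc_analysis(dev, b):
    """DC operating point: drop the time derivative and solve f(x) = b."""
    return newton(lambda v: dev.f(v) - b, dev.df, 0.0)


def transient_analysis(dev, b, v0, h, n_steps):
    """Backward Euler applied to dq(x)/dt + f(x) = b with fixed step h."""
    v = v0
    for _ in range(n_steps):
        v_prev = v
        v = newton(
            lambda x: (dev.q(x) - dev.q(v_prev)) / h + dev.f(x) - b,
            lambda x: dev.dq(x) / h + dev.df(x),
            v_prev,
        )
    return v


rc = ParallelRC(R=1e3, C=1e-6)          # illustrative values, time constant 1 ms
v_dc = dc_analysis(rc, b=1e-3)          # 1 mA source current
v_end = transient_analysis(rc, b=1e-3, v0=0.0, h=1e-5, n_steps=2000)
```

In this sketch a new analysis type only needs a new residual built from `f` and `q`; the device class itself is not touched.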
Modified current follower-based immittance function simulators

NASA Astrophysics Data System (ADS)

Alpaslan, Halil; Yuce, Erkan

2017-12-01

In this paper, four immittance function simulators, each consisting of a single modified current follower with a single Z-terminal and a minimum number of passive components, are proposed. The first proposed circuit can provide +L in parallel with +R and the second can realise -L in parallel with -R. The third proposed structure can provide +L in series with +R and the fourth can realise -L in series with -R. However, all the proposed immittance function simulators need a single resistive matching constraint. Parasitic impedance effects on all the proposed immittance function simulators are investigated. A second-order current-mode (CM) high-pass filter derived from the first proposed immittance function simulator is given as an application example. Also, a second-order CM low-pass filter derived from the third proposed immittance function simulator is given as an application example. A number of simulation results based on the SPICE programme and an experimental test result are given to verify the theory.
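As a reminder of what the four simulated immittances look like at the terminals, the sketch below evaluates the ideal impedances of R and L in parallel and in series, for positive and negative element values. The component values and function names are illustrative assumptions; the paper's actual circuits, matching constraint, and parasitics are not modelled.

```python
import math


def parallel_rl_impedance(R, L, freq_hz):
    """Ideal impedance of R in parallel with L: Z = (R * jwL) / (R + jwL)."""
    jwL = 1j * 2 * math.pi * freq_hz * L
    return (R * jwL) / (R + jwL)


def series_rl_impedance(R, L, freq_hz):
    """Ideal impedance of R in series with L: Z = R + jwL."""
    jwL = 1j * 2 * math.pi * freq_hz * L
    return R + jwL


R, L, f = 1e3, 1e-3, 100e3  # illustrative values: 1 kOhm, 1 mH, 100 kHz

z_pos_par = parallel_rl_impedance(R, L, f)    # +L parallel with +R
z_neg_par = parallel_rl_impedance(-R, -L, f)  # -L parallel with -R
z_pos_ser = series_rl_impedance(R, L, f)      # +L series with +R
z_neg_ser = series_rl_impedance(-R, -L, f)    # -L series with -R
```

In this ideal picture each negative immittance is the exact negative of its positive counterpart, which is why such simulators are typically used to cancel unwanted positive elements.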
Reusable Component Model Development Approach for Parallel and Distributed Simulation

PubMed Central

Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng

2014-01-01

Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have diverse interfaces, are tightly coupled, and are closely bound to simulation platforms. As a result, they are difficult to reuse across different simulation platforms and applications. To address the problem, this paper first proposes a reusable component model framework. Based on this framework, our reusable model development approach is then elaborated, which contains two phases: (1) domain experts create simulation computational modules observing three principles to achieve their independence; (2) the model developer encapsulates these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that the model developed using our approach has good reusability and is easy to use in different simulation platforms and applications. PMID:24729751
Adventures in Parallel Processing: Entry, Descent and Landing Simulation for the Genesis and Stardust Missions

NASA Technical Reports Server (NTRS)

Lyons, Daniel T.; Desai, Prasun N.

2005-01-01

This paper will describe the Entry, Descent and Landing simulation tradeoffs and techniques that were used to provide the Monte Carlo data required to approve entry during a critical period just before entry of the Genesis Sample Return Capsule. The same techniques will be used again when Stardust returns on January 15, 2006. Only one hour was available for the simulation which propagated 2000 dispersed entry states to the ground. Creative simulation tradeoffs combined with parallel processing were needed to provide the landing footprint statistics that were an essential part of the Go/NoGo decision that authorized release of the Sample Return Capsule a few hours before entry.
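The general pattern described here, a fixed batch of dispersed cases split across workers and reduced to footprint statistics, can be sketched as follows. Only the case count of 2000 comes from the abstract. The landing model, its sensitivities, the chunk size, and all names are toy assumptions and bear no relation to the actual Genesis or Stardust simulation. A thread pool is used to keep the sketch simple; CPU-bound work in CPython would normally use a process pool instead.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

N_CASES = 2000  # number of dispersed entry states, as in the abstract
CHUNK = 250     # cases per task (assumed value)


def propagate_chunk(chunk_index):
    """Propagate one chunk of dispersed states with a toy linear landing model."""
    rng = random.Random(1000 + chunk_index)  # reproducible, independent stream per chunk
    landings = []
    for _ in range(CHUNK):
        d_angle = rng.gauss(0.0, 0.05)  # hypothetical flight-path-angle error, deg
        d_speed = rng.gauss(0.0, 2.0)   # hypothetical entry-speed error, m/s
        downrange_km = 40.0 * d_angle + 0.5 * d_speed  # toy sensitivities
        crossrange_km = rng.gauss(0.0, 3.0)
        landings.append((downrange_km, crossrange_km))
    return landings


def run_monte_carlo(workers):
    """Run all chunks on a pool of workers and flatten the results in chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(propagate_chunk, range(N_CASES // CHUNK)))
    return [point for chunk in chunks for point in chunk]


landings = run_monte_carlo(workers=4)
downrange = [p[0] for p in landings]
footprint = (statistics.mean(downrange), statistics.stdev(downrange))
```

Seeding each chunk separately makes the result independent of how many workers are used, which helps when a run has to be reproduced under time pressure.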
Efficient parallelization for AMR MHD multiphysics calculations; implementation in AstroBEAR

NASA Astrophysics Data System (ADS)

Carroll-Nellenback, Jonathan J.; Shroyer, Brandon; Frank, Adam; Ding, Chen

2013-03-01

Current adaptive mesh refinement (AMR) simulations require algorithms that are highly parallelized and manage memory efficiently. As compute engines grow larger, AMR simulations will require algorithms that achieve new levels of efficient parallelization and memory management. We have attempted to employ new techniques to achieve both of these goals. Patch or grid based AMR often employs ghost cells to decouple the hyperbolic advances of each grid on a given refinement level. This decoupling allows each grid to be advanced independently. In AstroBEAR we utilize this independence by threading the grid advances on each level with preference going to the finer level grids. This allows for global load balancing instead of level by level load balancing and allows for greater parallelization across both physical space and AMR level. Threading of level advances can also improve performance by interleaving communication with computation, especially in deep simulations with many levels of refinement. While we see improvements of up to 30% on deep simulations run on a few cores, the speedup is typically more modest (5-20%) for larger scale simulations. To improve memory management we have employed a distributed tree algorithm that requires processors to only store and communicate local sections of the AMR tree structure with neighboring processors. Using this distributed approach we are able to get reasonable scaling efficiency (>80%) out to 12288 cores and up to 8 levels of AMR - independent of the use of threading.
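The role of ghost cells in decoupling grid advances can be shown with a small sketch, assuming a 1D first-order upwind advection step on a periodic domain split into two same-level patches. This is a toy illustration with hypothetical function names, not AstroBEAR code, and it leaves out refinement levels and threading.

```python
def upwind_step(u_with_ghost, c):
    """Advance interior cells one step; u_with_ghost[0] is the left ghost cell.

    c is the CFL number (0 < c <= 1) for advection to the right.
    """
    return [
        u_with_ghost[i] - c * (u_with_ghost[i] - u_with_ghost[i - 1])
        for i in range(1, len(u_with_ghost))
    ]


def step_single_grid(u, c):
    """Reference: one periodic grid, left ghost taken from the last cell."""
    return upwind_step([u[-1]] + u, c)


def step_two_patches(u, c):
    """Same step on two patches that only talk to each other through ghost cells."""
    half = len(u) // 2
    left, right = u[:half], u[half:]
    # Ghost fill: the only communication between the patches.
    left_g = [right[-1]] + left
    right_g = [left[-1]] + right
    # After the fill, each patch advance is independent and could run on its own thread.
    return upwind_step(left_g, c) + upwind_step(right_g, c)


u0 = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
same = step_two_patches(u0, 0.5) == step_single_grid(u0, 0.5)
```

Because the patch advances touch no shared state after the ghost fill, a scheduler is free to order them by priority, for example finer levels first as the abstract describes.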
GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit

PubMed Central

Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik

2013-01-01

Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on a massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358
Longitudinal train dynamics: an overview

NASA Astrophysics Data System (ADS)

Wu, Qing; Spiryagin, Maksym; Cole, Colin

2016-12-01

This paper discusses the evolution of longitudinal train dynamics (LTD) simulations, which covers numerical solvers, vehicle connection systems, air brake systems, wagon dumper systems and locomotives, resistance forces and gravitational components, vehicle in-train instabilities, and computing schemes. A number of potential research topics are suggested, such as modelling of friction, polymer, and transition characteristics for vehicle connection simulations, studies of wagon dumping operations, proper modelling of vehicle in-train instabilities, and computing schemes for LTD simulations. Evidence shows that LTD simulations have evolved with computing capabilities. Currently, advanced component models that directly describe the working principles of the operation of air brake systems, vehicle connection systems, and traction systems are available. Parallel computing is a good solution to combine and simulate all these advanced models. Parallel computing can also be used to conduct three-dimensional long train dynamics simulations.
Neural simulations on multi-core architectures.

PubMed

Eichner, Hubert; Klug, Tobias; Borst, Alexander

2009-01-01

Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing.
Neural Simulations on Multi-Core Architectures

PubMed Central

Eichner, Hubert; Klug, Tobias; Borst, Alexander

2009-01-01

Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing. PMID:19636393
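The abstract does not say how its user-transparent load balancing works. As a stand-in, the sketch below uses a greedy longest-processing-time heuristic that assigns neurons to cores using compartment count as the cost estimate. The cost model, the function name `balance`, and the sample data are all assumptions for illustration.

```python
import heapq


def balance(costs, n_cores):
    """Greedy LPT: place the most expensive neurons first on the least-loaded core."""
    heap = [(0, core) for core in range(n_cores)]  # (current load, core id)
    heapq.heapify(heap)
    assignment = {core: [] for core in range(n_cores)}
    # Visit neurons from most to least expensive.
    for neuron, cost in sorted(enumerate(costs), key=lambda kv: -kv[1]):
        load, core = heapq.heappop(heap)
        assignment[core].append(neuron)
        heapq.heappush(heap, (load + cost, core))
    loads = [sum(costs[n] for n in assignment[c]) for c in range(n_cores)]
    return assignment, loads


# Hypothetical compartment counts for eight model neurons.
compartments = [120, 30, 75, 75, 10, 200, 50, 40]
assignment, loads = balance(compartments, n_cores=4)
```

The heaviest neuron sets a lower bound on the busiest core's load, so balancing at the level of whole neurons stops helping once one cell dominates; splitting a single cell across cores would need a different scheme.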
