Sample records for parallel adaptive simulation

  1. Synchronization Of Parallel Discrete Event Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S.

    1992-01-01

    Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation is in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages of each. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
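
    As a rough, hypothetical sketch (not the SPEEDES implementation) of the cycle structure this abstract describes: each rank processes its pending events optimistically, tracks the earliest timestamp of any newly generated event (the local event horizon), and only commits events that fall below the global horizon before the next cycle begins. The global-minimum reduction is assumed to be provided by the runtime (here a generic `comm.allreduce`).

    ```python
    def breathing_time_buckets_cycle(pending, comm):
        """One adaptive time cycle. pending: list of (timestamp, handler) events."""
        pending.sort(key=lambda ev: ev[0])
        local_horizon = float("inf")
        processed, generated = [], []

        for ts, handler in pending:
            if ts >= local_horizon:      # events past the horizon cannot be trusted yet
                break
            new_events = handler(ts)     # optimistic execution; side effects are buffered
            generated.extend(new_events)
            processed.append((ts, handler))
            if new_events:
                local_horizon = min(local_horizon, min(t for t, _ in new_events))

        # the "breathing" time bucket: global horizon = minimum over all ranks
        global_horizon = comm.allreduce(local_horizon, op=min)

        committed = [ev for ev in processed if ev[0] < global_horizon]
        rolled_back = [ev for ev in processed if ev[0] >= global_horizon]
        return committed, rolled_back, generated, global_horizon
    ```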

  2. Durham extremely large telescope adaptive optics simulation platform.

    PubMed

    Basden, Alastair; Butterley, Timothy; Myers, Richard; Wilson, Richard

    2007-03-01

    Adaptive optics systems are essential on all large telescopes for which image quality is important. These are complex systems with many design parameters requiring optimization before good performance can be achieved. The simulation of adaptive optics systems is therefore necessary to categorize the expected performance. We describe an adaptive optics simulation platform, developed at Durham University, which can be used to simulate adaptive optics systems on the largest proposed future extremely large telescopes as well as on current systems. This platform is modular, object oriented, and has the benefit of hardware application acceleration that can be used to improve the simulation performance, essential for ensuring that the run time of a given simulation is acceptable. The simulation platform described here can be highly parallelized using parallelization techniques suited for adaptive optics simulation, while still offering the user complete control while the simulation is running. The results from the simulation of a ground layer adaptive optics system are provided as an example to demonstrate the flexibility of this simulation platform.

  3. Adaptive multi-GPU Exchange Monte Carlo for the 3D Random Field Ising Model

    NASA Astrophysics Data System (ADS)

    Navarro, Cristóbal A.; Huang, Wei; Deng, Youjin

    2016-08-01

    This work presents an adaptive multi-GPU Exchange Monte Carlo approach for the simulation of the 3D Random Field Ising Model (RFIM). The design is based on a two-level parallelization. The first level, spin-level parallelism, maps the parallel computation as optimal 3D thread-blocks that simulate blocks of spins in shared memory with minimal halo surface, assuming a constant block volume. The second level, replica-level parallelism, uses multi-GPU computation to handle the simulation of an ensemble of replicas. CUDA's concurrent kernel execution feature is used in order to fill the occupancy of each GPU with many replicas, providing a performance boost that is most pronounced at the smallest values of L. In addition to the two-level parallel design, the work proposes an adaptive multi-GPU approach that dynamically builds a proper temperature set free of exchange bottlenecks. The strategy is based on mid-point insertions at the temperature gaps where the exchange rate is most compromised. The extra work generated by the insertions is balanced across the GPUs independently of where the mid-point insertions were performed. Performance results show that spin-level performance is approximately two orders of magnitude faster than a single-core CPU version and one order of magnitude faster than a parallel multi-core CPU version running on 16 cores. Multi-GPU performance is highly convenient under a weak scaling setting, reaching up to 99% efficiency as long as the number of GPUs and L increase together. The combination of the adaptive approach with the parallel multi-GPU design has extended our simulations to sizes of L = 32, 64 for a workstation with two GPUs. Sizes beyond L = 64 can eventually be studied using larger multi-GPU systems.
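
    A minimal Python sketch of the mid-point insertion strategy described above (illustrative only; the threshold, data layout, and placement logic are assumptions, not taken from the paper):

    ```python
    def refine_temperature_set(temps, exchange_rates, min_rate=0.2):
        """temps: sorted temperatures; exchange_rates[i] is the measured replica-exchange
        acceptance rate between temps[i] and temps[i + 1]."""
        new_temps = [temps[0]]
        for t_lo, t_hi, rate in zip(temps, temps[1:], exchange_rates):
            if rate < min_rate:                        # exchange bottleneck detected
                new_temps.append(0.5 * (t_lo + t_hi))  # mid-point insertion
            new_temps.append(t_hi)
        return new_temps

    # Example: a bottleneck between 1.0 and 1.4 triggers one insertion.
    print(refine_temperature_set([0.8, 1.0, 1.4, 1.8], [0.45, 0.05, 0.30]))
    # -> [0.8, 1.0, 1.2, 1.4, 1.8]
    ```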

  4. Performance evaluation of GPU parallelization, space-time adaptive algorithms, and their combination for simulating cardiac electrophysiology.

    PubMed

    Sachetto Oliveira, Rafael; Martins Rocha, Bernardo; Burgarelli, Denise; Meira, Wagner; Constantinides, Christakis; Weber Dos Santos, Rodrigo

    2018-02-01

    The use of computer models as a tool for the study and understanding of the complex phenomena of cardiac electrophysiology has attained increased importance nowadays. At the same time, the increased complexity of the biophysical processes translates into complex computational and mathematical models. To speed up cardiac simulations and to allow more precise and realistic uses, two different techniques have been traditionally exploited: parallel computing and sophisticated numerical methods. In this work, we combine a modern parallel computing technique based on multicore and graphics processing units (GPUs) and a sophisticated numerical method based on a new space-time adaptive algorithm. We evaluate each technique alone and in different combinations: multicore and GPU, multicore and GPU and space adaptivity, multicore and GPU and space adaptivity and time adaptivity. All the techniques and combinations were evaluated under different scenarios: 3D simulations on slabs, 3D simulations on a ventricular mouse mesh, i.e., complex geometry, sinus-rhythm, and arrhythmic conditions. Our results suggest that multicore and GPU accelerate the simulations by an approximate factor of 33×, whereas the speedups attained by the space-time adaptive algorithms were approximately 48×. Nevertheless, by combining all the techniques, we obtained speedups that ranged between 165× and 498×. The tested methods were able to reduce the execution time of a simulation by more than 498× for a complex cellular model in a slab geometry and by 165× in a realistic heart geometry simulating spiral waves. The proposed methods will allow faster and more realistic simulations in a feasible time with no significant loss of accuracy. Copyright © 2017 John Wiley & Sons, Ltd.

  5. A parallel finite element simulator for ion transport through three-dimensional ion channel systems.

    PubMed

    Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo

    2013-09-15

    A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain to get the surface and volume mesh for ion channel systems, which consists of a set of mesh generation tools. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG) developed by one of the authors, which provides the capability of doing large scale parallel computations with high parallel efficiency and the flexibility of choosing high order elements to achieve high order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I-V curve, with which both primitive and transformed PNP equations are studied and their numerical performances are compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage dependent anion channel (VDAC) and α-Hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite-size effects can now be included in the PNP model, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL. It is shown that the size effects in SMPNP can effectively lead to reduced current in the channel, and the results are closer to BD simulation results. Copyright © 2013 Wiley Periodicals, Inc.
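
    For reference, a standard primitive form of the steady-state PNP system that such a solver discretizes is sketched below; the exact formulation, boundary conditions, and the transformed variables studied in the paper may differ.

    ```latex
    % Poisson equation for the electrostatic potential \phi
    -\nabla\cdot\bigl(\epsilon(\mathbf{r})\,\nabla\phi\bigr)
        \;=\; \sum_i q_i\,c_i(\mathbf{r}) \;+\; \rho_{\mathrm{fixed}}(\mathbf{r}),
    % steady-state Nernst-Planck equation for each ionic species i
    \nabla\cdot\mathbf{J}_i = 0, \qquad
    \mathbf{J}_i = -D_i\!\left(\nabla c_i + \frac{q_i}{k_B T}\,c_i\,\nabla\phi\right).
    ```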

  6. Developing parallel GeoFEST(P) using the PYRAMID AMR library

    NASA Technical Reports Server (NTRS)

    Norton, Charles D.; Lyzenga, Greg; Parker, Jay; Tisdale, Robert E.

    2004-01-01

    The PYRAMID parallel unstructured adaptive mesh refinement (AMR) library has been coupled with the GeoFEST geophysical finite element simulation tool to support parallel active tectonics simulations. Specifically, we have demonstrated modeling of coseismic and postseismic surface displacement due to a simulated earthquake for the Landers system of interacting faults in Southern California. The new software demonstrated a 25-times resolution improvement and a 4-times reduction in time to solution over the sequential baseline milestone case. Simulations on workstations using a few tens of thousands of stress-displacement finite elements can now be expanded to multiple millions of elements with greater than 98% scaled efficiency on various parallel platforms over many hundreds of processors. Our most recent work has demonstrated that we can dynamically adapt the computational grid as stress grows on a fault. In this paper, we will describe the major issues and challenges associated with coupling these two programs to create GeoFEST(P). Performance and visualization results will also be described.

  7. Parallel Signal Processing and System Simulation using aCe

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2003-01-01

    Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C-based parallel language (aCe C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of aCe C and present a signal processing application (FFT).

  8. Parallel adaptive wavelet collocation method for PDEs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nejadmalayeri, Alireza, E-mail: Alireza.Nejadmalayeri@gmail.com; Vezolainen, Alexei, E-mail: Alexei.Vezolainen@Colorado.edu; Brown-Dymkoski, Eric, E-mail: Eric.Browndymkoski@Colorado.edu

    2015-10-01

    A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using a tree-like structure with tree roots starting at an a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048^3 using as many as 2048 CPU cores.
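
    A small illustrative sketch of the tree-reassignment load balancing described above (the greedy strategy and ordering are assumptions for illustration, not the paper's exact algorithm):

    ```python
    def rebalance_trees(tree_points, n_procs):
        """tree_points: {tree_id: grid_point_count}. Returns {tree_id: process_rank}."""
        target = sum(tree_points.values()) / n_procs
        assignment, load, proc = {}, 0.0, 0
        # keep trees in a fixed (e.g. space-filling-curve) order so each
        # process receives a contiguous chunk of roughly `target` grid points
        for tree_id, npts in sorted(tree_points.items()):
            if load >= target and proc < n_procs - 1:
                proc, load = proc + 1, 0.0
            assignment[tree_id] = proc
            load += npts
        return assignment
    ```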

  9. Parallel computations and control of adaptive structures

    NASA Technical Reports Server (NTRS)

    Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)

    1991-01-01

    The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction to the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer, and the impact of the presented approach on applications in disciplines other than the aerospace industry is assessed.

  10. Optimal Design of Passive Power Filters Based on Pseudo-parallel Genetic Algorithm

    NASA Astrophysics Data System (ADS)

    Li, Pei; Li, Hongbo; Gao, Nannan; Niu, Lin; Guo, Liangfeng; Pei, Ying; Zhang, Yanyan; Xu, Minmin; Chen, Kerui

    2017-05-01

    The economic cost and the filtering efficiency are taken as the targets for optimizing the parameters of the passive filter. A method combining a pseudo-parallel genetic algorithm with an adaptive genetic algorithm is adopted in this paper. In the early stages, the pseudo-parallel genetic algorithm is used to increase population diversity, and in the late stages the adaptive genetic algorithm is used to reduce the workload. At the same time, the migration rate of the pseudo-parallel genetic algorithm is improved so that it changes adaptively with population diversity. Simulation results show that the filter designed by the proposed method has a better filtering effect at lower economic cost, and can be used in engineering.
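
    A generic sketch of an island-model (pseudo-parallel) GA with a diversity-driven migration rate, in the spirit of the adaptation described above (the diversity measure, ring topology, and rate bounds are illustrative assumptions):

    ```python
    import random

    def diversity(pop):
        """Mean pairwise Hamming distance of equal-length bit-string individuals, in [0, 1]."""
        n, L = len(pop), len(pop[0])
        total = sum(sum(a != b for a, b in zip(x, y))
                    for i, x in enumerate(pop) for y in pop[i + 1:])
        return 2.0 * total / (n * (n - 1) * L)

    def migrate(islands, base_rate=0.05, max_rate=0.30):
        """Ring migration between subpopulations; the rate rises as diversity drops."""
        for i, pop in enumerate(islands):
            rate = base_rate + (max_rate - base_rate) * (1.0 - diversity(pop))
            migrants = random.sample(pop, max(1, int(rate * len(pop))))
            islands[(i + 1) % len(islands)].extend(migrants)  # send copies to the next island
        return islands
    ```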

  11. Parallel Anisotropic Tetrahedral Adaptation

    NASA Technical Reports Server (NTRS)

    Park, Michael A.; Darmofal, David L.

    2008-01-01

    An adaptive method that robustly produces high aspect ratio tetrahedra to a general 3D metric specification without introducing hybrid semi-structured regions is presented. The elemental operators and higher-level logic are described along with their respective domain-decomposed parallelizations. An anisotropic tetrahedral grid adaptation scheme is demonstrated for 1000:1 stretching for a simple cube geometry. This form of adaptation is applicable to more complex domain boundaries via a cut-cell approach as demonstrated by a parallel 3D supersonic simulation of a complex fighter aircraft. To avoid the assumptions and approximations required to form a metric to specify adaptation, an approach is introduced that directly evaluates interpolation error. The grid is adapted to reduce and equidistribute this interpolation error calculation without the use of an intervening anisotropic metric. Direct interpolation error adaptation is illustrated for 1D and 3D domains.

  12. Adapting the serial Alpgen parton-interaction generator to simulate LHC collisions on millions of parallel threads

    NASA Astrophysics Data System (ADS)

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.; Papka, M. E.; Benjamin, D. P.

    2017-01-01

    As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. This paper details the process by which Alpgen was adapted from a single-processor serial-application to a large-scale parallel-application and the performance that was achieved.

  13. Visualization of Octree Adaptive Mesh Refinement (AMR) in Astrophysical Simulations

    NASA Astrophysics Data System (ADS)

    Labadens, M.; Chapon, D.; Pomaréde, D.; Teyssier, R.

    2012-09-01

    Computer simulations are important in current cosmological research. Those simulations run in parallel on thousands of processors and produce huge amounts of data. Adaptive mesh refinement is used to reduce the computing cost while keeping good numerical accuracy in regions of interest. RAMSES is a cosmological code developed by the Commissariat à l'énergie atomique et aux énergies alternatives (English: Atomic Energy and Alternative Energies Commission) which uses octree adaptive mesh refinement. Compared to grid-based AMR, octree AMR has the advantage of fitting the adaptive resolution of the grid very precisely to the local problem complexity. However, this specific octree data type needs dedicated software to be visualized, as generic visualization tools work on Cartesian grid data types. This is why the PYMSES software has also been developed by our team. It relies on the Python scripting language to ensure modular and easy access for exploring these specific data. In order to take advantage of the high-performance computer which runs the RAMSES simulation, it also uses MPI and multiprocessing to run parallel code. We present our PYMSES software in more detail, together with some performance benchmarks. PYMSES currently has two visualization techniques which work directly on the AMR. The first one is a splatting technique, and the second one is a custom ray tracing technique. Both have their own advantages and drawbacks. We have also compared two parallel programming techniques: the Python multiprocessing library versus MPI runs. The load balancing strategy has to be defined carefully in order to achieve a good speed-up in our computation. Results obtained with this software are illustrated in the context of a massive, 9000-processor parallel simulation of a Milky Way-like galaxy.

  14. Progress in the Simulation of Steady and Time-Dependent Flows with 3D Parallel Unstructured Cartesian Methods

    NASA Technical Reports Server (NTRS)

    Aftosmis, M. J.; Berger, M. J.; Murman, S. M.; Kwak, Dochan (Technical Monitor)

    2002-01-01

    The proposed paper will present recent extensions in the development of an efficient Euler solver for adaptively-refined Cartesian meshes with embedded boundaries. The paper will focus on extensions of the basic method to include solution adaptation, time-dependent flow simulation, and arbitrary rigid domain motion. The parallel multilevel method makes use of on-the-fly parallel domain decomposition to achieve extremely good scalability on large numbers of processors, and is coupled with an automatic coarse mesh generation algorithm for efficient processing by a multigrid smoother. Numerical results are presented demonstrating parallel speed-ups of up to 435 on 512 processors. Solution-based adaptation may be keyed off truncation error estimates using tau-extrapolation or a variety of feature detection based refinement parameters. The multigrid method is extended to time-dependent flows through the use of a dual-time approach. The extension to rigid domain motion uses an Arbitrary Lagrangian-Eulerian (ALE) formulation, and results will be presented for a variety of two- and three-dimensional example problems with both simple and complex geometry.

  15. Adapting the serial Alpgen parton-interaction generator to simulate LHC collisions on millions of parallel threads

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.

    As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. This paper details the process by which Alpgen was adapted from a single-processor serial-application to a large-scale parallel-application and the performance that was achieved.

  16. Adapting the serial Alpgen parton-interaction generator to simulate LHC collisions on millions of parallel threads

    DOE PAGES

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.; ...

    2016-09-29

    As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. Finally, this paper details the process by which Alpgen was adapted from a single-processor serial-application to a large-scale parallel-application and the performance that was achieved.

  17. Adapting the serial Alpgen parton-interaction generator to simulate LHC collisions on millions of parallel threads

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.

    As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. Finally, this paper details the process by which Alpgen was adapted from a single-processor serial-application to a large-scale parallel-application and the performance that was achieved.

  18. Numerical simulation of h-adaptive immersed boundary method for freely falling disks

    NASA Astrophysics Data System (ADS)

    Zhang, Pan; Xia, Zhenhua; Cai, Qingdong

    2018-05-01

    In this work, a freely falling disk with aspect ratio 1/10 is directly simulated by using an adaptive numerical model implemented on the parallel computation framework JASMIN. The adaptive numerical model is a combination of the h-adaptive mesh refinement technique and the implicit immersed boundary method (IBM). Our numerical results agree well with the experimental results in all of the six degrees of freedom of the disk. Furthermore, vortex structures very similar to those observed in the experiment were also obtained.

  19. Specification and Analysis of Parallel Machine Architecture

    DTIC Science & Technology

    1990-03-17

    Parallel Machine Architecture C.V. Ramamoorthy Computer Science Division Dept. of Electrical Engineering and Computer Science University of California ... capacity. (4) Adaptive: The overhead in resolution of deadlocks, etc. should be in proportion to their frequency. (5) Avoid rollbacks: Rollbacks can be ... snapshots of system state graphically at a rate proportional to simulation time. Some of the examples are as follows: (1) When the simulation clock of

  20. Large-Scale Parallel Simulations of Turbulent Combustion using Combined Dimension Reduction and Tabulation of Chemistry

    DTIC Science & Technology

    2012-05-22

    tabulation of the reduced space is performed using the In Situ Adaptive Tabulation (ISAT) algorithm. In addition, we use x2f mpi – a Fortran library ... for parallel vector-valued function evaluation (used with ISAT in this context) – to efficiently redistribute the chemistry workload among the ... Constrained-Equilibrium (RCCE) method, and tabulation of the reduced space is performed using the In Situ Adaptive Tabulation (ISAT) algorithm. In addition

  1. Locally adaptive parallel temperature accelerated dynamics method

    NASA Astrophysics Data System (ADS)

    Shim, Yunsic; Amar, Jacques G.

    2010-03-01

    The recently-developed temperature-accelerated dynamics (TAD) method [M. Sørensen and A.F. Voter, J. Chem. Phys. 112, 9599 (2000)] along with the more recently developed parallel TAD (parTAD) method [Y. Shim et al., Phys. Rev. B 76, 205439 (2007)] allow one to carry out non-equilibrium simulations over extended time and length scales. The basic idea behind TAD is to speed up transitions by carrying out a high-temperature MD simulation and then use the resulting information to obtain event times at the desired low temperature. In a typical implementation, a fixed high temperature T_high is used. However, in general one expects that for each configuration there exists an optimal value of T_high which depends on the particular transition pathways and activation energies for that configuration. Here we present a locally adaptive high-temperature TAD method in which, instead of using a fixed T_high, the high temperature is dynamically adjusted in order to maximize simulation efficiency. Preliminary results of the performance obtained from parTAD simulations of Cu/Cu(100) growth using the locally adaptive T_high method will also be presented.
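
    For context, the standard TAD extrapolation from the Sørensen-Voter paper cited above maps a transition observed at time t_high in the high-temperature run to its expected time at the low temperature through the Arrhenius factor; a sketch of that relation is:

    ```latex
    t_{\mathrm{low}} \;=\; t_{\mathrm{high}}
      \exp\!\left[\frac{E_a}{k_B}\left(\frac{1}{T_{\mathrm{low}}}-\frac{1}{T_{\mathrm{high}}}\right)\right],
    ```

    where E_a is the activation energy of the observed transition pathway, which is what makes the optimal choice of T_high configuration-dependent.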

  2. The island dynamics model on parallel quadtree grids

    NASA Astrophysics Data System (ADS)

    Mistani, Pouria; Guittet, Arthur; Bochkov, Daniil; Schneider, Joshua; Margetis, Dionisios; Ratsch, Christian; Gibou, Frederic

    2018-05-01

    We introduce an approach for simulating epitaxial growth by use of an island dynamics model on a forest of quadtree grids, and in a parallel environment. To this end, we use a parallel framework introduced in the context of the level-set method. This framework utilizes: discretizations that achieve a second-order accurate level-set method on non-graded adaptive Cartesian grids for solving the associated free boundary value problem for surface diffusion; and an established library for the partitioning of the grid. We consider the cases with: irreversible aggregation, which amounts to applying Dirichlet boundary conditions at the island boundary; and an asymmetric (Ehrlich-Schwoebel) energy barrier for attachment/detachment of atoms at the island boundary, which entails the use of a Robin boundary condition. We provide the scaling analyses performed on the Stampede supercomputer and numerical examples that illustrate the capability of our methodology to efficiently simulate different aspects of epitaxial growth. The combination of adaptivity and parallelism in our approach enables simulations that are several orders of magnitude faster than those reported in the recent literature and, thus, provides a viable framework for the systematic study of mound formation on crystal surfaces.
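
    One common form of the Robin condition alluded to above (standard in BCF-type island dynamics models; the paper's exact coefficients and sign conventions may differ) relates the diffusive adatom flux into the island boundary to the deviation from the equilibrium density:

    ```latex
    -D\,\nabla\rho\cdot\mathbf{n} \;=\; \kappa\,\bigl(\rho - \rho_{\mathrm{eq}}\bigr)
      \quad \text{on the island boundary},
    ```

    with different attachment coefficients \kappa on the upper and lower terraces encoding the Ehrlich-Schwoebel asymmetry; the limit \kappa \to \infty recovers a Dirichlet condition.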

  3. MADNESS: A Multiresolution, Adaptive Numerical Environment for Scientific Simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harrison, Robert J.; Beylkin, Gregory; Bischoff, Florian A.

    2016-01-01

    MADNESS (multiresolution adaptive numerical environment for scientific simulation) is a high-level software environment for solving integral and differential equations in many dimensions that uses adaptive and fast harmonic analysis methods with guaranteed precision based on multiresolution analysis and separated representations. Underpinning the numerical capabilities is a powerful petascale parallel programming environment that aims to increase both programmer productivity and code scalability. This paper describes the features and capabilities of MADNESS and briefly discusses some current applications in chemistry and several areas of physics.

  4. In-memory integration of existing software components for parallel adaptive unstructured mesh workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, Cameron W.; Granzow, Brian; Diamond, Gerrett

    Unstructured mesh methods, like finite elements and finite volumes, support the effective analysis of complex physical behaviors modeled by partial differential equations over general three-dimensional domains. The most reliable and efficient methods apply adaptive procedures with a-posteriori error estimators that indicate where and how the mesh is to be modified. Although adaptive meshes can have two to three orders of magnitude fewer elements than a more uniform mesh for the same level of accuracy, there are many complex simulations where the meshes required are so large that they can only be solved on massively parallel systems.

  5. In-memory integration of existing software components for parallel adaptive unstructured mesh workflows

    DOE PAGES

    Smith, Cameron W.; Granzow, Brian; Diamond, Gerrett; ...

    2017-01-01

    Unstructured mesh methods, like finite elements and finite volumes, support the effective analysis of complex physical behaviors modeled by partial differential equations over general three-dimensional domains. The most reliable and efficient methods apply adaptive procedures with a-posteriori error estimators that indicate where and how the mesh is to be modified. Although adaptive meshes can have two to three orders of magnitude fewer elements than a more uniform mesh for the same level of accuracy, there are many complex simulations where the meshes required are so large that they can only be solved on massively parallel systems.

  6. Progress in Computational Simulation of Earthquakes

    NASA Technical Reports Server (NTRS)

    Donnellan, Andrea; Parker, Jay; Lyzenga, Gregory; Judd, Michele; Li, P. Peggy; Norton, Charles; Tisdale, Edwin; Granat, Robert

    2006-01-01

    GeoFEST(P) is a computer program written for use in the QuakeSim project, which is devoted to development and improvement of means of computational simulation of earthquakes. GeoFEST(P) models interacting earthquake fault systems from the fault-nucleation to the tectonic scale. The development of GeoFEST(P) has involved the coupling of two programs: GeoFEST and the Pyramid Adaptive Mesh Refinement Library. GeoFEST is a message-passing-interface-parallel code that utilizes a finite-element technique to simulate evolution of stress, fault slip, and plastic/elastic deformation in realistic materials like those of faulted regions of the crust of the Earth. The products of such simulations are synthetic observable time-dependent surface deformations on time scales from days to decades. The Pyramid Adaptive Mesh Refinement Library is a software library that facilitates the generation of computational meshes for solving physical problems. In an application of GeoFEST(P), a computational grid can be dynamically adapted as stress grows on a fault. Simulations on workstations using a few tens of thousands of stress and displacement finite elements can now be expanded to multiple millions of elements with greater than 98-percent scaled efficiency on many hundreds of parallel processors.

  7. Analysis, preliminary design and simulation systems for control-structure interaction problems

    NASA Technical Reports Server (NTRS)

    Park, K. C.; Alvin, Kenneth F.

    1991-01-01

    Software aspects of control-structure interaction (CSI) analysis are discussed. The following subject areas are covered: (1) implementation of a partitioned algorithm for simulation of large CSI problems; (2) second-order discrete Kalman filtering equations for CSI simulations; and (3) parallel computations and control of adaptive structures.

  8. Applying Parallel Adaptive Methods with GeoFEST/PYRAMID to Simulate Earth Surface Crustal Dynamics

    NASA Technical Reports Server (NTRS)

    Norton, Charles D.; Lyzenga, Greg; Parker, Jay; Glasscoe, Margaret; Donnellan, Andrea; Li, Peggy

    2006-01-01

    This viewgraph presentation reviews the use of Adaptive Mesh Refinement (AMR) in simulating the crustal dynamics of the Earth's surface. AMR simultaneously improves solution quality, time to solution, and computer memory requirements when compared to generating/running on a globally fine mesh. The use of AMR in simulating the dynamics of the Earth's surface is spurred by proposed future NASA missions, such as InSAR, for Earth surface deformation and other measurements. These missions will require support for large-scale adaptive numerical methods using AMR to model observations. AMR was chosen because it has been successful in computational fluid dynamics for predictive simulation of complex flows around complex structures.

  9. Hybrid Parallelization of Adaptive MHD-Kinetic Module in Multi-Scale Fluid-Kinetic Simulation Suite

    DOE PAGES

    Borovikov, Sergey; Heerikhuisen, Jacob; Pogorelov, Nikolai

    2013-04-01

    The Multi-Scale Fluid-Kinetic Simulation Suite has a computational tool set for solving partially ionized flows. In this paper we focus on recent developments of the kinetic module which solves the Boltzmann equation using the Monte-Carlo method. The module has been recently redesigned to utilize intra-node hybrid parallelization. We describe in detail the redesign process, implementation issues, and modifications made to the code. Finally, we conduct a performance analysis.

  10. Spatial adaptive sampling in multiscale simulation

    NASA Astrophysics Data System (ADS)

    Rouet-Leduc, Bertrand; Barros, Kipton; Cieren, Emmanuel; Elango, Venmugil; Junghans, Christoph; Lookman, Turab; Mohd-Yusof, Jamaludin; Pavel, Robert S.; Rivera, Axel Y.; Roehm, Dominic; McPherson, Allen L.; Germann, Timothy C.

    2014-07-01

    In a common approach to multiscale simulation, an incomplete set of macroscale equations must be supplemented with constitutive data provided by fine-scale simulation. Collecting statistics from these fine-scale simulations is typically the overwhelming computational cost. We reduce this cost by interpolating the results of fine-scale simulation over the spatial domain of the macro-solver. Unlike previous adaptive sampling strategies, we do not interpolate on the potentially very high dimensional space of inputs to the fine-scale simulation. Our approach is local in space and time, avoids the need for a central database, and is designed to parallelize well on large computer clusters. To demonstrate our method, we simulate one-dimensional elastodynamic shock propagation using the Heterogeneous Multiscale Method (HMM); we find that spatial adaptive sampling requires only ≈ 50 × N^0.14 fine-scale simulations to reconstruct the stress field at all N grid points. Related multiscale approaches, such as Equation Free methods, may also benefit from spatial adaptive sampling.

  11. MADNESS: A Multiresolution, Adaptive Numerical Environment for Scientific Simulation

    DOE PAGES

    Harrison, Robert J.; Beylkin, Gregory; Bischoff, Florian A.; ...

    2016-01-01

    We present MADNESS (multiresolution adaptive numerical environment for scientific simulation), a high-level software environment for solving integral and differential equations in many dimensions that uses adaptive and fast harmonic analysis methods with guaranteed precision, based on multiresolution analysis and separated representations. Underpinning the numerical capabilities is a powerful petascale parallel programming environment that aims to increase both programmer productivity and code scalability. This paper describes the features and capabilities of MADNESS and briefly discusses some current applications in chemistry and several areas of physics.

  12. The Distributed Diagonal Force Decomposition Method for Parallelizing Molecular Dynamics Simulations

    PubMed Central

    Boršnik, Urban; Miller, Benjamin T.; Brooks, Bernard R.; Janežič, Dušanka

    2011-01-01

    Parallelization is an effective way to reduce the computational time needed for molecular dynamics simulations. We describe a new parallelization method, the distributed-diagonal force decomposition method, with which we extend and improve the existing force decomposition methods. Our new method requires less data communication during molecular dynamics simulations than replicated data and current force decomposition methods, increasing the parallel efficiency. It also dynamically load-balances the processors' computational load throughout the simulation. The method is readily implemented in existing molecular dynamics codes and it has been incorporated into the CHARMM program, allowing its immediate use in conjunction with the many molecular dynamics simulation techniques that are already present in the program. We also present the design of the Force Decomposition Machine, a cluster of personal computers and networks that is tailored to running molecular dynamics simulations using the distributed diagonal force decomposition method. The design is expandable and provides various degrees of fault resilience. This approach is easily adaptable to computers with Graphics Processing Units because it is independent of the processor type being used. PMID:21793007

  13. Parallelization of sequential Gaussian, indicator and direct simulation algorithms

    NASA Astrophysics Data System (ADS)

    Nunes, Ruben; Almeida, José A.

    2010-08-01

    Improving the performance and robustness of algorithms on new high-performance parallel computing architectures is a key issue in efficiently performing 2D and 3D studies with large amounts of data. In geostatistics, sequential simulation algorithms are good candidates for parallelization. When compared with other computational applications in geosciences (such as fluid flow simulators), sequential simulation software is not extremely computationally intensive, but parallelization can make it more efficient and creates alternatives for its integration in inverse modelling approaches. This paper describes the implementation and benchmarking of a parallel version of the three classic sequential simulation algorithms: direct sequential simulation (DSS), sequential indicator simulation (SIS) and sequential Gaussian simulation (SGS). For this purpose, the source used was GSLIB, but the entire code was extensively modified to take into account the parallelization approach and was also rewritten in the C programming language. The paper also explains in detail the parallelization strategy and the main modifications. Regarding the integration of secondary information, the DSS algorithm is able to perform simple kriging with local means, kriging with an external drift and collocated cokriging with both local and global correlations. SIS includes a local correction of probabilities. Finally, a brief comparison is presented of simulation results using one, two and four processors. All performance tests were carried out on 2D soil data samples. The source code is completely open source and easy to read. It should be noted that the code is only fully compatible with Microsoft Visual C and should be adapted for other systems/compilers.

  14. Parallel algorithms for simulating continuous time Markov chains

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Heidelberger, Philip

    1992-01-01

    We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
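
    As background for the entry above, uniformization replaces the continuous-time Markov chain with a discrete-time chain subordinated to a Poisson process; one common statement of the construction (not taken from the paper itself) is:

    ```latex
    % choose a uniformization rate \Lambda \ge \max_i |q_{ii}| for generator Q
    P \;=\; I + \frac{Q}{\Lambda}, \qquad
    \Pr\{X(t)=j \mid X(0)=i\}
      \;=\; \sum_{n=0}^{\infty} e^{-\Lambda t}\,\frac{(\Lambda t)^{n}}{n!}\,\bigl[P^{n}\bigr]_{ij},
    ```

    so that all processors can share a common Poisson event clock of rate \Lambda, which is what makes the technique a natural synchronization basis for parallel simulation.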

  15. High-resolution multi-code implementation of unsteady Navier-Stokes flow solver based on paralleled overset adaptive mesh refinement and high-order low-dissipation hybrid schemes

    NASA Astrophysics Data System (ADS)

    Li, Gaohua; Fu, Xiang; Wang, Fuxin

    2017-10-01

    The low-dissipation high-order accurate hybrid upwind/central scheme based on fifth-order weighted essentially non-oscillatory (WENO) and sixth-order central schemes, along with the Spalart-Allmaras (SA)-based delayed detached eddy simulation (DDES) turbulence model, and the flow feature-based adaptive mesh refinement (AMR), are implemented into a dual-mesh overset grid infrastructure with parallel computing capabilities, for the purpose of simulating vortex-dominated unsteady detached wake flows with high spatial resolutions. The overset grid assembly (OGA) process based on collection detection theory and an implicit hole-cutting algorithm achieves automatic coupling of the near-body and off-body solvers, and a trial-and-error method is used to obtain a globally balanced load distribution among the composed multiple codes. The results of flows over a high-Reynolds-number cylinder and a two-bladed helicopter rotor show that the combination of a high-order hybrid scheme, an advanced turbulence model, and overset adaptive mesh refinement can effectively enhance the spatial resolution for the simulation of turbulent wake eddies.

  16. Parallel Adaptive Simulation of Detonation Waves Using a Weighted Essentially Non-Oscillatory Scheme

    NASA Astrophysics Data System (ADS)

    McMahon, Sean

    The purpose of this thesis was to develop a code that could be used to develop a better understanding of the physics of detonation waves. First, a detonation was simulated in one dimension using ZND theory. Then, using the 1D solution as an initial condition, a detonation was simulated in two dimensions using a weighted essentially non-oscillatory scheme on an adaptive mesh with the smallest length scales equal to 2-3 flamelet lengths. The code development linking Chemkin for chemical kinetics to the adaptive mesh refinement flow solver was completed. The detonation evolved in a way that qualitatively matched the experimental observations; however, the simulation was unable to progress past the formation of the triple point.

  17. Parallel Adaptive High-Order CFD Simulations Characterizing Cavity Acoustics for the Complete SOFIA Aircraft

    NASA Technical Reports Server (NTRS)

    Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak

    2014-01-01

    This paper presents one-of-a-kind MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting of the telescope. A temporally fourth-order Runge-Kutta and spatially fifth-order WENO-5Z scheme was used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32,000 cores and 4 billion cells show excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregularities caused by the highly complex geometry. Limits to scaling beyond 32K cores are identified, and targeted code optimizations are discussed.

  18. Adaptive multiple super fast simulated annealing for stochastic microstructure reconstruction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ryu, Seun; Lin, Guang; Sun, Xin

    2013-01-01

    Fast image reconstruction from statistical information is critical for fusing images from multimodality chemical imaging instruments to create high-resolution images over large domains. Stochastic methods have been used widely for image reconstruction from the two-point correlation function. The main challenge is to increase the efficiency of the reconstruction. A novel simulated annealing method is proposed for fast image reconstruction. Combining the advantages of very fast cooling schedules, dynamic adaptation, and parallelization, the new simulated annealing algorithm increases efficiency by several orders of magnitude, making large-domain image fusion feasible.
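
    A generic sketch (not the paper's algorithm) of the kind of stochastic reconstruction this abstract refers to: anneal a binary image so its two-point correlation matches a target, using pixel swaps and a fast exponential cooling schedule.

    ```python
    import numpy as np

    def s2(img, max_r=16):
        """Two-point correlation along the x axis, up to lag max_r (simplified, periodic)."""
        return np.array([(img * np.roll(img, r, axis=1)).mean() for r in range(max_r)])

    def reconstruct(target_s2, shape=(64, 64), phase_fraction=0.3,
                    steps=20000, t0=1e-3, cooling=0.9995, rng=np.random.default_rng(0)):
        img = (rng.random(shape) < phase_fraction).astype(float)
        energy = np.sum((s2(img) - target_s2) ** 2)
        temp = t0
        for _ in range(steps):
            # swap one pixel of each phase so the volume fraction is preserved
            ones, zeros = np.argwhere(img == 1), np.argwhere(img == 0)
            a, b = ones[rng.integers(len(ones))], zeros[rng.integers(len(zeros))]
            img[tuple(a)], img[tuple(b)] = 0, 1
            new_energy = np.sum((s2(img) - target_s2) ** 2)
            accept = new_energy <= energy or rng.random() < np.exp((energy - new_energy) / temp)
            if accept:
                energy = new_energy
            else:
                img[tuple(a)], img[tuple(b)] = 1, 0   # undo the rejected swap
            temp *= cooling                           # very fast cooling schedule
        return img
    ```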

  19. A parallel adaptive mesh refinement algorithm

    NASA Technical Reports Server (NTRS)

    Quirk, James J.; Hanebutte, Ulf R.

    1993-01-01

    Over recent years, Adaptive Mesh Refinement (AMR) algorithms which dynamically match the local resolution of the computational grid to the numerical solution being sought have emerged as powerful tools for solving problems that contain disparate length and time scales. In particular, several workers have demonstrated the effectiveness of employing an adaptive, block-structured hierarchical grid system for simulations of complex shock wave phenomena. Unfortunately, from the parallel algorithm developer's viewpoint, this class of scheme is quite involved; these schemes cannot be distilled down to a small kernel upon which various parallelizing strategies may be tested. However, because of their block-structured nature such schemes are inherently parallel, so all is not lost. In this paper we describe the method by which Quirk's AMR algorithm has been parallelized. This method is built upon just a few simple message passing routines and so it may be implemented across a broad class of MIMD machines. Moreover, the method of parallelization is such that the original serial code is left virtually intact, and so we are left with just a single product to support. The importance of this fact should not be underestimated given the size and complexity of the original algorithm.

  20. Parallel adaptive discontinuous Galerkin approximation for thin layer avalanche modeling

    NASA Astrophysics Data System (ADS)

    Patra, A. K.; Nichita, C. C.; Bauer, A. C.; Pitman, E. B.; Bursik, M.; Sheridan, M. F.

    2006-08-01

    This paper describes the development of highly accurate adaptive discontinuous Galerkin schemes for the solution of the equations arising from a thin layer type model of debris flows. Such flows have wide applicability in the analysis of avalanches induced by many natural calamities, e.g. volcanoes, earthquakes, etc. These schemes are coupled with special parallel solution methodologies to produce a simulation tool capable of very high-order numerical accuracy. The methodology successfully replicates cold rock avalanches at Mount Rainier, Washington and hot volcanic particulate flows at Colima Volcano, Mexico.

  1. A Parallel, Multi-Scale Watershed-Hydrologic-Inundation Model with Adaptively Switching Mesh for Capturing Flooding and Lake Dynamics

    NASA Astrophysics Data System (ADS)

    Ji, X.; Shen, C.

    2017-12-01

    Flood inundation presents substantial societal hazards and also changes biogeochemistry for systems like the Amazon. It is often expensive to simulate high-resolution flood inundation and propagation in a long-term watershed-scale model. Due to the Courant-Friedrichs-Lewy (CFL) restriction, high resolution and large local flow velocity both demand prohibitively small time steps even for parallel codes. Here we develop a parallel surface-subsurface process-based model enhanced by multi-resolution meshes that are adaptively switched on or off. The high-resolution overland flow meshes are enabled only when the flood wave invades the floodplains. This model applies a semi-implicit, semi-Lagrangian (SISL) scheme in solving the dynamic wave equations and, with the assistance of the multi-mesh method, it also adaptively applies the dynamic wave equation only in areas of deep inundation. Therefore, the model achieves a balance between accuracy and computational cost.
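
    For reference, the CFL restriction mentioned above takes the familiar form below; the exact wave speed depends on the governing equations, and including the gravity wave speed for shallow-water-type overland flow is an assumption here, not a statement about this particular model:

    ```latex
    \Delta t \;\le\; C_{\max}\,\frac{\Delta x}{\,|u| + \sqrt{g\,h}\,},
    ```

    which shows why combining fine meshes (small \Delta x) with large local velocities forces prohibitively small explicit time steps, motivating the semi-implicit, semi-Lagrangian treatment.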

  2. Parallelization of GeoClaw code for modeling geophysical flows with adaptive mesh refinement on many-core systems

    USGS Publications Warehouse

    Zhang, S.; Yuen, D.A.; Zhu, A.; Song, S.; George, D.L.

    2011-01-01

    We parallelized the GeoClaw code on a one-level grid using OpenMP in March 2011 to meet the urgent need of simulating near-shore tsunami waves from Tohoku 2011, and achieved over 75% of the potential speed-up on an eight-core Dell Precision T7500 workstation [1]. After submitting that work to SC11 - the International Conference for High Performance Computing, we obtained an unreleased OpenMP version of GeoClaw from David George, who developed the GeoClaw code as part of his Ph.D. thesis. In this paper, we will show the complementary characteristics of the two approaches used in parallelizing GeoClaw and the speed-up obtained by combining the advantages of each of the two individual approaches with adaptive mesh refinement (AMR), demonstrating the capabilities of running GeoClaw efficiently on many-core systems. We will also show a novel simulation of the Tohoku 2011 tsunami waves inundating the Sendai airport and Fukushima Nuclear Power Plants, over which the finest grid distance of 20 meters is achieved through a 4-level AMR. This simulation yields quite good predictions about the wave heights and travel time of the tsunami waves. © 2011 IEEE.

  3. An FPGA-based DS-CDMA multiuser demodulator employing adaptive multistage parallel interference cancellation

    NASA Astrophysics Data System (ADS)

    Li, Xinhua; Song, Zhenyu; Zhan, Yongjie; Wu, Qiongzhi

    2009-12-01

    Since the system capacity is severely limited, reducing the multiple access interference (MAI) is necessary in the multiuser direct-sequence code division multiple access (DS-CDMA) system used in the data-transfer link system of telecommunication terminals. In this paper, after reviewing various multiuser detection schemes, we adopt an adaptive multistage parallel interference cancellation structure in the demodulator, based on the least mean square (LMS) algorithm, to eliminate the MAI. Neither a training sequence nor a pilot signal is needed in the proposed scheme, and its implementation complexity can be greatly reduced by an approximate LMS algorithm. The algorithm and its FPGA implementation are then derived. Simulation results show that the proposed adaptive PIC can outperform some of the existing interference cancellation methods in AWGN channels. The hardware setup of the multiuser demodulator is described, and the experimental results based on it demonstrate large performance gains over the conventional single-user demodulator, consistent with the simulations.
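
    A minimal, hypothetical sketch of one LMS-adapted parallel interference cancellation stage (the signal model, weight update, and variable names are illustrative assumptions, not taken from the paper or its FPGA design):

    ```python
    import numpy as np

    def lms_pic_stage(r, regenerated, weights, mu=0.01):
        """One adaptive PIC stage for K users.

        r:           received chip-rate samples, shape (N,)
        regenerated: respread signal estimates per user from the previous stage, shape (K, N)
        weights:     per-user cancellation weights, shape (K,)
        """
        K, N = regenerated.shape
        cleaned = np.empty_like(regenerated)
        for k in range(K):
            # subtract the weighted estimates of all *other* users (the MAI)
            mai = sum(weights[j] * regenerated[j] for j in range(K) if j != k)
            cleaned[k] = r - mai
            # LMS update: drive the residual toward the user's own regenerated signal
            err = cleaned[k] - weights[k] * regenerated[k]
            weights[k] += mu * np.dot(err, regenerated[k]) / N
        return cleaned, weights
    ```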

  4. A fast ultrasonic simulation tool based on massively parallel implementations

    NASA Astrophysics Data System (ADS)

    Lambert, Jason; Rougeron, Gilles; Lacassagne, Lionel; Chatillon, Sylvain

    2014-02-01

    This paper presents a CIVA-optimized ultrasonic inspection simulation tool which takes advantage of the power of massively parallel architectures: graphics processing units (GPU) and multi-core general purpose processors (GPP). This tool is based on the classical approach used in CIVA: the interaction model is based on Kirchhoff, and the ultrasonic field around the defect is computed by the pencil method. The model has been adapted and parallelized for both architectures. At this stage, the configurations addressed by the tool are: multi- and mono-element probes, planar specimens made of simple isotropic materials, and planar rectangular defects or side drilled holes of small diameter. Validations of the model accuracy and performance measurements are presented.

  5. Using collective variables to drive molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Fiorin, Giacomo; Klein, Michael L.; Hénin, Jérôme

    2013-12-01

    A software framework is introduced that facilitates the application of biasing algorithms to collective variables of the type commonly employed to drive massively parallel molecular dynamics (MD) simulations. The modular framework that is presented enables one to combine existing collective variables into new ones, and combine any chosen collective variable with available biasing methods. The latter include the classic time-dependent biases referred to as steered MD and targeted MD, the temperature-accelerated MD algorithm, as well as the adaptive free-energy biases called metadynamics and adaptive biasing force. The present modular software is extensible, and portable between commonly used MD simulation engines.
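
    As a generic illustration of the kind of collective-variable bias such a framework applies (a simple harmonic restraint on an interatomic distance; this is not the framework's actual API, and the function and parameter names are assumptions):

    ```python
    import numpy as np

    def harmonic_distance_bias(x_i, x_j, center, k):
        """Harmonic restraint U = 0.5*k*(d - center)**2 on the distance between two atoms.

        Returns the bias energy and the bias forces on atoms i and j.
        """
        diff = x_i - x_j
        d = np.linalg.norm(diff)
        energy = 0.5 * k * (d - center) ** 2
        # force = -dU/dx; dU/dx_i = k*(d - center) * (x_i - x_j)/d
        f_i = -k * (d - center) * diff / d
        f_j = -f_i
        return energy, f_i, f_j
    ```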

  6. Parallel computing method for simulating hydrological processesof large rivers under climate change

    NASA Astrophysics Data System (ADS)

    Wang, H.; Chen, Y.

    2016-12-01

    Climate change is one of the most widely recognized global environmental problems. It has altered the distribution of watershed hydrological processes in time and space, especially in the world's large rivers. Watershed hydrological process simulation based on a physically based distributed hydrological model can give better results than lumped models. However, such simulation involves a very large amount of computation, especially for large rivers, and thus needs huge computing resources that may not be steadily available to researchers, or only at high expense; this has seriously restricted research and application. Existing parallel methods mostly parallelize in the space and time dimensions: based on the distributed hydrological model, they calculate the natural features grid by grid (unit by unit, basin by basin) from upstream to downstream. This article proposes a high-performance computing method for hydrological process simulation with a high speedup ratio and parallel efficiency. It combines the time and space runoff characteristics of the distributed hydrological model with distributed data storage, an in-memory database, distributed computing, and parallel computing based on computing power units. The method has strong adaptability and extensibility, which means it can make full use of the available computing and storage resources under the condition of limited computing resources, and the computing efficiency improves linearly with the increase of computing resources. This method can satisfy the parallel computing requirements of hydrological process simulation in small, medium and large rivers.

  7. Particle simulation of plasmas on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Gledhill, I. M. A.; Storey, L. R. O.

    1987-01-01

    Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.

  8. Use of Parallel Micro-Platform for the Simulation the Space Exploration

    NASA Astrophysics Data System (ADS)

    Velasco Herrera, Victor Manuel; Velasco Herrera, Graciela; Rosano, Felipe Lara; Rodriguez Lozano, Salvador; Lucero Roldan Serrato, Karen

    The purpose of this work is to create a parallel micro-platform that simulates the virtual movements of a space exploration in 3D. One of the innovations presented in this design consists of the application of a lever mechanism for the transmission of the movement. The development of such a robot is a challenging task, very different from that of industrial manipulators due to a totally different target system of requirements. This work presents the computer-aided study and simulation of the movement of this parallel manipulator. The model has been developed using the computer-aided design platform Unigraphics, in which the geometric modeling of each of the components and the final assembly (CAD) was carried out, the files for the computer-aided manufacture (CAM) of each of the pieces were generated, and the kinematic simulation of the system was performed, evaluating different driving schemes. We used the MATLAB aerospace toolbox and created an adaptive control module to simulate the system.

  9. Capabilities of Fully Parallelized MHD Stability Code MARS

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2016-10-01

    Results of full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. Parallel version of MARS, named PMARS, has been recently developed at FAR-TECH. Parallelized MARS is an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, implemented in MARS. Parallelization of the code included parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse vector iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the MARS algorithm using parallel libraries and procedures. Parallelized MARS is capable of calculating eigenmodes with significantly increased spatial resolution: up to 5,000 adapted radial grid points with up to 500 poloidal harmonics. Such resolution is sufficient for simulation of kink, tearing and peeling-ballooning instabilities with physically relevant parameters. Work is supported by the U.S. DOE SBIR program.
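
    For illustration, the following is a serial Python sketch of shifted inverse (vector) iteration, the basic algorithm whose parallelization is described above; the toy dense matrix merely stands in for the large matrix MARS assembles over magnetic surfaces, and none of this reflects the PMARS code itself.

```python
# Minimal serial sketch of shifted inverse vector iteration: find the
# eigenpair of A closest to a given shift sigma.
import numpy as np

def inverse_iteration(A, sigma, tol=1e-10, max_iter=200):
    n = A.shape[0]
    shifted = A - sigma * np.eye(n)
    x = np.random.default_rng(0).standard_normal(n)
    x /= np.linalg.norm(x)
    lam = sigma
    for _ in range(max_iter):
        y = np.linalg.solve(shifted, x)     # one inverse-iteration step
        x_new = y / np.linalg.norm(y)
        lam_new = x_new @ A @ x_new         # Rayleigh quotient estimate
        if abs(lam_new - lam) < tol:
            return lam_new, x_new
        x, lam = x_new, lam_new
    return lam, x

A = np.diag([1.0, 2.5, 4.0]) + 0.01 * np.ones((3, 3))   # toy matrix
lam, vec = inverse_iteration(A, sigma=2.4)
print(lam)
```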

  10. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2015-11-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Results of MARS parallelization and of the development of a new fix boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.

  11. Parallel Adaptive High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics

    NASA Technical Reports Server (NTRS)

    Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak

    2015-01-01

    This paper presents large-scale MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta scheme and a spatially fifth-order accurate WENO-5Z scheme were used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion computational cells show excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses the irregular numerical cost associated with blocks containing boundaries. Limits to scaling beyond 32k cores are identified, and targeted code optimizations are discussed.
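
    The dynamic load balancing idea mentioned above can be illustrated with a hedged sketch: assign AMR blocks to ranks greedily using measured per-block execution times (a longest-processing-time heuristic). This is not the code used in the paper; the block costs below are invented.

```python
# Hedged sketch of cost-based block assignment: distribute AMR blocks across
# ranks using measured per-block execution times (greedy longest-processing-
# time heuristic).  Illustrative only.
import heapq

def balance_blocks(block_costs, n_ranks):
    """Return rank -> list of block ids, balancing total measured cost."""
    heap = [(0.0, rank) for rank in range(n_ranks)]   # (accumulated load, rank)
    heapq.heapify(heap)
    assignment = {rank: [] for rank in range(n_ranks)}
    # Place the most expensive blocks first.
    for block_id, cost in sorted(block_costs.items(), key=lambda kv: -kv[1]):
        load, rank = heapq.heappop(heap)
        assignment[rank].append(block_id)
        heapq.heappush(heap, (load + cost, rank))
    return assignment

# Hypothetical costs: blocks containing boundaries are pricier than interior ones.
costs = {b: (3.0 if b % 5 == 0 else 1.0) for b in range(20)}
print(balance_blocks(costs, n_ranks=4))
```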

  12. Parallel implementation of a Lagrangian-based model on an adaptive mesh in C++: Application to sea-ice

    NASA Astrophysics Data System (ADS)

    Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar

    2017-12-01

    We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian scheme (ALE). The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by the numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown to be sufficient to perform simulations for state-of-the-art sea-ice forecasting and geophysical process studies over a geographical domain of several million square kilometers, such as the Arctic region.

  13. A parallel row-based algorithm with error control for standard-cell placement on a hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Sargent, Jeff Scott

    1988-01-01

    A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel approaches to controlling error in parallel cell-placement algorithms: Heuristic Cell-Coloring and Adaptive (Parallel Move) Sequence Control. Heuristic Cell-Coloring identifies sets of noninteracting cells that can be moved repeatedly, and in parallel, with no buildup of error in the placement cost. Adaptive Sequence Control allows multiple parallel cell moves to take place between global cell-position updates. This feedback mechanism is based on an error bound derived analytically from the traditional annealing move-acceptance profile. Placement results are presented for real industry circuits, and the performance of an implementation on the Intel iPSC/2 Hypercube is summarized. This algorithm runs 5 to 16 times faster than a previous program developed for the Hypercube, while producing placements of equivalent quality. An integrated place and route program for the Intel iPSC/2 Hypercube is currently being developed.
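
    A hedged sketch of the idea behind Heuristic Cell-Coloring follows: greedily colour a cell-interaction graph so that cells sharing a colour interact with none of the others and can therefore be moved in parallel without error buildup. The graph and names are illustrative, not taken from the paper.

```python
# Illustrative greedy colouring of a cell-interaction graph: cells that get
# the same colour share no net, so their moves can be evaluated in parallel
# without interacting.  Sketch of the idea only, not the paper's algorithm.
def color_cells(interactions):
    """interactions: dict cell -> set of cells it shares a net with."""
    colors = {}
    for cell in sorted(interactions, key=lambda c: -len(interactions[c])):
        used = {colors[nbr] for nbr in interactions[cell] if nbr in colors}
        color = 0
        while color in used:
            color += 1
        colors[cell] = color
    return colors

graph = {
    "c0": {"c1", "c2"},
    "c1": {"c0"},
    "c2": {"c0", "c3"},
    "c3": {"c2"},
}
print(color_cells(graph))   # e.g. {'c0': 0, 'c2': 1, 'c1': 1, 'c3': 0}
```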

  14. Cartesian Off-Body Grid Adaption for Viscous Time-Accurate Flow Simulation

    NASA Technical Reports Server (NTRS)

    Buning, Pieter G.; Pulliam, Thomas H.

    2011-01-01

    An improved solution adaption capability has been implemented in the OVERFLOW overset grid CFD code. Building on the Cartesian off-body approach inherent in OVERFLOW and the original adaptive refinement method developed by Meakin, the new scheme provides for automated creation of multiple levels of finer Cartesian grids. Refinement can be based on the undivided second-difference of the flow solution variables, or on a specific flow quantity such as vorticity. Coupled with load-balancing and an in-memory solution interpolation procedure, the adaption process provides very good performance for time-accurate simulations on parallel compute platforms. A method of using refined, thin body-fitted grids combined with adaption in the off-body grids is presented, which maximizes the part of the domain subject to adaption. Two- and three-dimensional examples are used to illustrate the effectiveness and performance of the adaption scheme.
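
    A small sketch, under the stated assumption of a uniform 1D grid, of the kind of refinement sensor described above: flag cells whose undivided second difference of a solution variable exceeds a threshold. The threshold and test profile are made up.

```python
# Sketch of an undivided second-difference refinement sensor on a 1D grid.
import numpy as np

def flag_for_refinement(q, threshold):
    """q: 1D array of a flow variable on a uniform grid."""
    sensor = np.abs(q[2:] - 2.0 * q[1:-1] + q[:-2])   # undivided 2nd difference
    flags = np.zeros_like(q, dtype=bool)
    flags[1:-1] = sensor > threshold
    return flags

x = np.linspace(0.0, 1.0, 101)
q = np.tanh((x - 0.5) / 0.02)        # sharp layer mimicking a flow feature
print(np.where(flag_for_refinement(q, threshold=0.05))[0])
```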

  15. Multiscale Simulations of Magnetic Island Coalescence

    NASA Technical Reports Server (NTRS)

    Dorelli, John C.

    2010-01-01

    We describe a new interactive parallel Adaptive Mesh Refinement (AMR) framework written in the Python programming language. This new framework, PyAMR, hides the details of parallel AMR data structures and algorithms (e.g., domain decomposition, grid partition, and inter-process communication), allowing the user to focus on the development of algorithms for advancing the solution of a system of partial differential equations on a single uniform mesh. We demonstrate the use of PyAMR by simulating the pairwise coalescence of magnetic islands using the resistive Hall MHD equations. Techniques for coupling different physics models on different levels of the AMR grid hierarchy are discussed.
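
    Since the abstract does not give PyAMR's actual interface, the sketch below is hypothetical in every name; it only illustrates the stated design, in which the user supplies an update for a single uniform patch and the framework handles decomposition and communication.

```python
# Hypothetical sketch only: the function name and callback signature below are
# invented, not PyAMR's real API.  It illustrates the stated design: the user
# writes a single-uniform-mesh update, the framework handles the parallel AMR
# bookkeeping and would call this on every patch with ghost zones filled.
import numpy as np

def advance_patch(u, dt, dx):
    """User-supplied update for one uniform patch: toy 1D resistive diffusion."""
    eta = 0.01
    u_new = u.copy()
    u_new[1:-1] += dt * eta * (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
    return u_new

patch = np.exp(-(np.linspace(-1, 1, 64) ** 2) / 0.05)
for _ in range(100):
    patch = advance_patch(patch, dt=1e-4, dx=2.0 / 63)
print(patch.max())
```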

  16. Use of Hilbert Curves in Parallelized CUDA code: Interaction of Interstellar Atoms with the Heliosphere

    NASA Astrophysics Data System (ADS)

    Destefano, Anthony; Heerikhuisen, Jacob

    2015-04-01

    Fully 3D particle simulations can be a computationally and memory-expensive task, especially when high-resolution grid cells are required. The problem becomes further complicated when parallelization is needed. In this work we focus on computational methods to solve these difficulties. Hilbert curves are used to map the 3D particle space to the 1D contiguous memory space. This method of organization allows for minimized cache misses on the GPU as well as a sorted structure that is equivalent to an octree data structure. This type of sorted structure is attractive for use in adaptive mesh implementations due to the logarithmic search time. Implementations using the Message Passing Interface (MPI) library and NVIDIA's parallel computing platform CUDA will be compared, as MPI is commonly used on server nodes with many CPUs. We will also compare static grid structures with those of adaptive mesh structures. The physical test bed will simulate heavy interstellar atoms interacting with a background plasma, the heliosphere, computed with a fully consistent coupled MHD/kinetic particle code. It is known that charge exchange is an important factor in space plasmas; specifically, it modifies the structure of the heliosphere itself. We would like to thank the Alabama Supercomputer Authority for the use of their computational resources.
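
    The record above uses Hilbert curves; as a simpler stand-in, the sketch below uses a Morton (Z-order) key to map 3D cell coordinates to one contiguous index so that particles sorted by key sit near their spatial neighbours in memory. It is illustrative only, not the authors' code.

```python
# Morton (Z-order) ordering as a simple space-filling-curve stand-in for the
# Hilbert curve used in the work above: interleave coordinate bits into one
# key and sort particles by it to improve memory locality.
def morton3d(x, y, z, bits=10):
    """Interleave the bits of (x, y, z) into a single Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

particles = [(5, 3, 7), (5, 3, 6), (60, 1, 2), (4, 3, 7)]
particles.sort(key=lambda p: morton3d(*p))
print(particles)   # nearby cells tend to end up adjacent after sorting
```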

  17. Metascalable molecular dynamics simulation of nano-mechano-chemistry

    NASA Astrophysics Data System (ADS)

    Shimojo, F.; Kalia, R. K.; Nakano, A.; Nomura, K.; Vashishta, P.

    2008-07-01

    We have developed a metascalable (or 'design once, scale on new architectures') parallel application-development framework for first-principles based simulations of nano-mechano-chemical processes on emerging petaflops architectures based on spatiotemporal data locality principles. The framework consists of (1) an embedded divide-and-conquer (EDC) algorithmic framework based on spatial locality to design linear-scaling algorithms, (2) a space-time-ensemble parallel (STEP) approach based on temporal locality to predict long-time dynamics, and (3) a tunable hierarchical cellular decomposition (HCD) parallelization framework to map these scalable algorithms onto hardware. The EDC-STEP-HCD framework exposes and expresses maximal concurrency and data locality, thereby achieving parallel efficiency as high as 0.99 for 1.59-billion-atom reactive force field molecular dynamics (MD) and 17.7-million-atom (1.56 trillion electronic degrees of freedom) quantum mechanical (QM) MD in the framework of the density functional theory (DFT) on adaptive multigrids, in addition to 201-billion-atom nonreactive MD, on 196 608 IBM BlueGene/L processors. We have also used the framework for automated execution of adaptive hybrid DFT/MD simulation on a grid of six supercomputers in the US and Japan, in which the number of processors changed dynamically on demand and tasks were migrated according to unexpected faults. The paper presents the application of the framework to the study of nanoenergetic materials: (1) combustion of an Al/Fe2O3 thermite and (2) shock initiation and reactive nanojets at a void in an energetic crystal.

  18. Phase reconstruction using compressive two-step parallel phase-shifting digital holography

    NASA Astrophysics Data System (ADS)

    Ramachandran, Prakash; Alex, Zachariah C.; Nelleri, Anith

    2018-04-01

    The linear relationship between the sample complex object wave and its approximated complex Fresnel field obtained using single-shot parallel phase-shifting digital holography (PPSDH) is used in a compressive sensing framework, and accurate phase reconstruction is demonstrated. It is shown that the phase-reconstruction accuracy of this method is better than that of the compressive sensing adapted single-exposure inline holography (SEOL) method. It is derived that the measurement model of the PPSDH method retains both the real and imaginary parts of the Fresnel field, albeit with approximation noise, whereas the measurement model of SEOL retains only the real part of the complex Fresnel field and its imaginary part is not available at all. Numerical simulations are performed for CS-adapted PPSDH and CS-adapted SEOL, and it is demonstrated that the phase reconstruction is accurate for CS-adapted PPSDH and can be used for single-shot digital holographic reconstruction.

  19. PLUM: Parallel Load Balancing for Adaptive Unstructured Meshes

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Saini, Subhash (Technical Monitor)

    1998-01-01

    Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We present a novel method called PLUM to dynamically balance the processor workloads with a global view. This paper presents the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. A data redistribution model is also presented that predicts the remapping cost on the SP2. This model is required to determine whether the gain from a balanced workload distribution offsets the cost of data movement. Results presented in this paper demonstrate that PLUM is an effective dynamic load balancing strategy which remains viable on a large number of processors.
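
    A toy sketch of the decision such a remapping-cost model informs (not PLUM's actual model): repartition only when the predicted time saved by a balanced load over the remaining iterations exceeds the predicted cost of moving the data. All numbers are invented.

```python
# Hedged sketch of the gain-versus-cost decision behind dynamic remapping.
def should_remap(loads, iters_remaining, remap_cost_estimate):
    """loads: per-processor work per iteration (same time units as the cost)."""
    current_step = max(loads)                 # slowest processor sets the pace
    balanced_step = sum(loads) / len(loads)   # ideal step after rebalancing
    gain = (current_step - balanced_step) * iters_remaining
    return gain > remap_cost_estimate

print(should_remap(loads=[1.0, 1.0, 2.2, 1.1], iters_remaining=50,
                   remap_cost_estimate=10.0))   # True: imbalance is worth fixing
```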

  20. PARALLEL HOP: A SCALABLE HALO FINDER FOR MASSIVE COSMOLOGICAL DATA SETS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Skory, Stephen; Turk, Matthew J.; Norman, Michael L.

    2010-11-15

    Modern N-body cosmological simulations contain billions (10^9) of dark matter particles. These simulations require hundreds to thousands of gigabytes of memory and employ hundreds to tens of thousands of processing cores on many compute nodes. In order to study the distribution of dark matter in a cosmological simulation, the dark matter halos must be identified using a halo finder, which establishes the halo membership of every particle in the simulation. The resources required for halo finding are similar to the requirements for the simulation itself. In particular, simulations have become too extensive to use commonly employed halo finders, such that the computational requirements to identify halos must now be spread across multiple nodes and cores. Here, we present a scalable-parallel halo finding method called Parallel HOP for large-scale cosmological simulation data. Based on the halo finder HOP, it utilizes message passing interface and domain decomposition to distribute the halo finding workload across multiple compute nodes, enabling analysis of much larger data sets than is possible with the strictly serial or previous parallel implementations of HOP. We provide a reference implementation of this method as a part of the toolkit yt, an analysis toolkit for adaptive mesh refinement data that includes complementary analysis modules. Additionally, we discuss a suite of benchmarks that demonstrate that this method scales well up to several hundred tasks and data sets in excess of 2000^3 particles. The Parallel HOP method and our implementation can be readily applied to any kind of N-body simulation data and is therefore widely applicable.

  1. A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rajbhandari, Samyam; Kim, Jinsung; Krishnamoorthy, Sriram

    This paper describes the design and implementation of a layered domain-specific compiler to support MADNESS---Multiresolution ADaptive Numerical Environment for Scientific Simulation. MADNESS is a high-level software environment for the solution of integral and differential equations in many dimensions, using adaptive and fast harmonic analysis methods with guaranteed precision. MADNESS uses k-d trees to represent spatial functions and implements operators like addition, multiplication, differentiation, and integration on the numerical representation of functions. The MADNESS runtime system provides global namespace support and a task-based execution model including futures. MADNESS is currently deployed on massively parallel supercomputers and has enabled many science advances. Due to the highly irregular and statically unpredictable structure of the k-d trees representing the spatial functions encountered in MADNESS applications, only purely runtime approaches to optimization have previously been implemented in the MADNESS framework. This paper describes a layered domain-specific compiler developed to address some performance bottlenecks in MADNESS. The newly developed static compile-time optimizations, in conjunction with the MADNESS runtime support, enable significant performance improvement for the MADNESS framework.

  2. Repartitioning Strategies for Massively Parallel Simulation of Reacting Flow

    NASA Astrophysics Data System (ADS)

    Pisciuneri, Patrick; Zheng, Angen; Givi, Peyman; Labrinidis, Alexandros; Chrysanthis, Panos

    2015-11-01

    The majority of parallel CFD simulators partition the domain into equal regions and assign the calculations for a particular region to a unique processor. This type of domain decomposition is vital to the efficiency of the solver. However, as the simulation develops, the workload among the partitions often becomes uneven (e.g., due to adaptive mesh refinement or chemically reacting regions) and a new partition should be considered. The process of repartitioning adjusts the current partition to evenly distribute the load again. We compare two repartitioning tools: Zoltan, an architecture-agnostic graph repartitioner developed at the Sandia National Laboratories; and Paragon, an architecture-aware graph repartitioner developed at the University of Pittsburgh. The comparative assessment is conducted via simulation of the Taylor-Green vortex flow with chemical reaction.

  3. Octree-based, GPU implementation of a continuous cellular automaton for the simulation of complex, evolving surfaces

    NASA Astrophysics Data System (ADS)

    Ferrando, N.; Gosálvez, M. A.; Cerdá, J.; Gadea, R.; Sato, K.

    2011-03-01

    Presently, dynamic surface-based models are required to contain increasingly larger numbers of points and to propagate them over longer time periods. For large numbers of surface points, the octree data structure can be used as a balance between low memory occupation and relatively rapid access to the stored data. For evolution rules that depend on neighborhood states, extended simulation periods can be obtained by using simplified atomistic propagation models, such as the Cellular Automata (CA). This method, however, has an intrinsic parallel updating nature and the corresponding simulations are highly inefficient when performed on classical Central Processing Units (CPUs), which are designed for the sequential execution of tasks. In this paper, a series of guidelines is presented for the efficient adaptation of octree-based, CA simulations of complex, evolving surfaces into massively parallel computing hardware. A Graphics Processing Unit (GPU) is used as a cost-efficient example of the parallel architectures. For the actual simulations, we consider the surface propagation during anisotropic wet chemical etching of silicon as a computationally challenging process with a wide-spread use in microengineering applications. A continuous CA model that is intrinsically parallel in nature is used for the time evolution. Our study strongly indicates that parallel computations of dynamically evolving surfaces simulated using CA methods are significantly benefited by the incorporation of octrees as support data structures, substantially decreasing the overall computational time and memory usage.

  4. Computational algorithms for simulations in atmospheric optics.

    PubMed

    Konyaev, P A; Lukin, V P

    2016-04-20

    A computer simulation technique for atmospheric and adaptive optics based on parallel programming is discussed. A parallel propagation algorithm is designed and a modified spectral-phase method for computer generation of 2D time-variant random fields is developed. Temporal power spectra of Laguerre-Gaussian beam fluctuations are considered as an example to illustrate the applications discussed. Implementation of the proposed algorithms using Intel MKL and IPP libraries and NVIDIA CUDA technology is shown to be very fast and accurate. The hardware system for the computer simulation is an off-the-shelf desktop with an Intel Core i7-4790K CPU operating at a turbo-speed frequency up to 5 GHz and an NVIDIA GeForce GTX-960 graphics accelerator with 1024 processors running at 1.5 GHz.
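
    For orientation, the following is a hedged numpy sketch of the standard FFT/spectral approach that such phase-screen generators build on: colour complex white noise with a Kolmogorov-like power spectrum and transform to real space. The normalisation is schematic, and the paper's modified spectral-phase method for time-variant fields is not reproduced.

```python
# Schematic FFT-based generation of a 2D random phase screen with a
# Kolmogorov-like power-law spectrum (normalisation is only indicative).
import numpy as np

def phase_screen(n=256, dx=0.01, r0=0.1, seed=0):
    rng = np.random.default_rng(seed)
    fx = np.fft.fftfreq(n, d=dx)
    fxx, fyy = np.meshgrid(fx, fx)
    f = np.hypot(fxx, fyy)
    f[0, 0] = np.inf                                          # kill the DC term
    psd = 0.023 * r0 ** (-5.0 / 3.0) * f ** (-11.0 / 3.0)     # Kolmogorov-like PSD
    noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    spectrum = noise * np.sqrt(psd) / (n * dx)
    return np.real(np.fft.ifft2(spectrum)) * n * n

screen = phase_screen()
print(screen.std())
```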

  5. Parallel Adaptive Mesh Refinement for High-Order Finite-Volume Schemes in Computational Fluid Dynamics

    NASA Astrophysics Data System (ADS)

    Schwing, Alan Michael

    For computational fluid dynamics, the governing equations are solved on a discretized domain of nodes, faces, and cells. The quality of the grid or mesh can be a driving source for error in the results. While refinement studies can help guide the creation of a mesh, grid quality is largely determined by user expertise and understanding of the flow physics. Adaptive mesh refinement is a technique for enriching the mesh during a simulation based on metrics for error, impact on important parameters, or location of important flow features. This can offload from the user some of the difficult and ambiguous decisions necessary when discretizing the domain. This work explores the implementation of adaptive mesh refinement in an implicit, unstructured, finite-volume solver. Consideration is made for applying modern computational techniques in the presence of hanging nodes and refined cells. The approach is developed to be independent of the flow solver in order to provide a path for augmenting existing codes. It is designed to be applicable to unsteady simulations, and refinement and coarsening of the grid do not impact the conservatism of the underlying numerics. The effect on high-order numerical fluxes of fourth and sixth order is explored. Provided the criteria for refinement are appropriately selected, solutions obtained using adapted meshes have no additional error when compared to results obtained on traditional, unadapted meshes. In order to leverage large-scale computational resources common today, the methods are parallelized using MPI. Parallel performance is considered for several test problems in order to assess scalability of both adapted and unadapted grids. Dynamic repartitioning of the mesh during refinement is crucial for load balancing an evolving grid. Development of the methods outlined here depends on a dual-memory approach that is described in detail. Validation of the solver developed here against a number of motivating problems shows favorable comparisons across a range of regimes. Unsteady and steady applications are considered in both subsonic and supersonic flows. Inviscid and viscous simulations achieve similar results at a much reduced cost when employing dynamic mesh adaptation. Several techniques for guiding adaptation are compared. Detailed analysis of statistics from the instrumented solver enables understanding of the costs associated with adaptation. Adaptive mesh refinement shows promise for the test cases presented here. It can be considerably faster than using conventional grids and provides accurate results. The procedures for adapting the grid are light-weight enough to not require significant computational time and yield significant reductions in grid size.

  6. Robust adaptive antiswing control of underactuated crane systems with two parallel payloads and rail length constraint.

    PubMed

    Zhang, Zhongcai; Wu, Yuqiang; Huang, Jinming

    2016-11-01

    The antiswing control and accurate positioning are simultaneously investigated for underactuated crane systems in the presence of two parallel payloads on the trolley and rail length limitation. The equations of motion for the crane system in question are established via the Euler-Lagrange equation. An adaptive control strategy is proposed with the help of system energy function and energy shaping technique. Stability analysis shows that under the designed adaptive controller, the payload swings can be suppressed ultimately and the trolley can be regulated to the destination while not exceeding the pre-specified boundaries. Simulation results are provided to show the satisfactory control performances of the presented control method in terms of working efficiency as well as robustness with respect to external disturbances. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.

  7. Computer Science Techniques Applied to Parallel Atomistic Simulation

    NASA Astrophysics Data System (ADS)

    Nakano, Aiichiro

    1998-03-01

    Recent developments in parallel processing technology and multiresolution numerical algorithms have established large-scale molecular dynamics (MD) simulations as a new research mode for studying materials phenomena such as fracture. However, this requires large system sizes and long simulated times. We have developed: i) Space-time multiresolution schemes; ii) fuzzy-clustering approach to hierarchical dynamics; iii) wavelet-based adaptive curvilinear-coordinate load balancing; iv) multilevel preconditioned conjugate gradient method; and v) spacefilling-curve-based data compression for parallel I/O. Using these techniques, million-atom parallel MD simulations are performed for the oxidation dynamics of nanocrystalline Al. The simulations take into account the effect of dynamic charge transfer between Al and O using the electronegativity equalization scheme. The resulting long-range Coulomb interaction is calculated efficiently with the fast multipole method. Results for temperature and charge distributions, residual stresses, bond lengths and bond angles, and diffusivities of Al and O will be presented. The oxidation of nanocrystalline Al is elucidated through immersive visualization in virtual environments. A unique dual-degree education program at Louisiana State University will also be discussed in which students can obtain a Ph.D. in Physics & Astronomy and a M.S. from the Department of Computer Science in five years. This program fosters interdisciplinary research activities for interfacing High Performance Computing and Communications with large-scale atomistic simulations of advanced materials. This work was supported by NSF (CAREER Program), ARO, PRF, and Louisiana LEQSF.

  8. Phylogeographic differentiation versus transcriptomic adaptation to warm temperatures in Zostera marina, a globally important seagrass.

    PubMed

    Jueterbock, A; Franssen, S U; Bergmann, N; Gu, J; Coyer, J A; Reusch, T B H; Bornberg-Bauer, E; Olsen, J L

    2016-11-01

    Populations distributed across a broad thermal cline are instrumental in addressing adaptation to increasing temperatures under global warming. Using a space-for-time substitution design, we tested for parallel adaptation to warm temperatures along two independent thermal clines in Zostera marina, the most widely distributed seagrass in the temperate Northern Hemisphere. A North-South pair of populations was sampled along the European and North American coasts and exposed to a simulated heatwave in a common-garden mesocosm. Transcriptomic responses under control, heat stress and recovery were recorded in 99 RNAseq libraries with ~13 000 uniquely annotated, expressed genes. We corrected for phylogenetic differentiation among populations to discriminate neutral from adaptive differentiation. The two southern populations recovered faster from heat stress and showed parallel transcriptomic differentiation, as compared with northern populations. Among 2389 differentially expressed genes, 21 exceeded neutral expectations and were likely involved in parallel adaptation to warm temperatures. However, the strongest differentiation following phylogenetic correction was between the three Atlantic populations and the Mediterranean population with 128 of 4711 differentially expressed genes exceeding neutral expectations. Although adaptation to warm temperatures is expected to reduce sensitivity to heatwaves, the continued resistance of seagrass to further anthropogenic stresses may be impaired by heat-induced downregulation of genes related to photosynthesis, pathogen defence and stress tolerance. © 2016 John Wiley & Sons Ltd.

  9. Parallel grid library for rapid and flexible simulation development

    NASA Astrophysics Data System (ADS)

    Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.

    2013-04-01

    We present an easy to use and flexible grid library for developing highly scalable parallel simulations. The distributed cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously allowing one to overlap computation and communication. This enables excellent scalability at least up to 32 k cores in magnetohydrodynamic tests depending on the problem and hardware. In the version of dccrg presented here part of the mesh metadata is replicated between MPI processes reducing the scalability of adaptive mesh refinement (AMR) to between 200 and 600 processes. Dccrg is free software that anyone can use, study and modify and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg.
    Catalogue identifier: AEOM_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOM_v1_0.html
    Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
    Licensing provisions: GNU Lesser General Public License version 3
    No. of lines in distributed program, including test data, etc.: 54975
    No. of bytes in distributed program, including test data, etc.: 974015
    Distribution format: tar.gz
    Programming language: C++.
    Computer: PC, cluster, supercomputer.
    Operating system: POSIX. The code has been parallelized using MPI and tested with 1-32768 processes.
    RAM: 10 MB-10 GB per process
    Classification: 4.12, 4.14, 6.5, 19.3, 19.10, 20.
    External routines: MPI-2 [1], boost [2], Zoltan [3], sfc++ [4]
    Nature of problem: Grid library supporting arbitrary data in grid cells, parallel adaptive mesh refinement, transparent remote neighbor data updates and load balancing.
    Solution method: The simulation grid is represented by an adjacency list (graph) with vertices stored into a hash table and edges into contiguous arrays. Message Passing Interface standard is used for parallelization. Cell data is given as a template parameter when instantiating the grid.
    Restrictions: Logically cartesian grid.
    Running time: Running time depends on the hardware, problem and the solution method. Small problems can be solved in under a minute and very large problems can take weeks. The examples and tests provided with the package take less than about one minute using default options. In the version of dccrg presented here the speed of adaptive mesh refinement is at most of the order of 10^6 total created cells per second.
    [1] http://www.mpi-forum.org/
    [2] http://www.boost.org/
    [3] K. Devine, E. Boman, R. Heaphy, B. Hendrickson, C. Vaughan, Zoltan data management services for parallel dynamic applications, Comput. Sci. Eng. 4 (2002) 90-97. http://dx.doi.org/10.1109/5992.988653
    [4] https://gitorious.org/sfc++

  10. Adapted RF pulse design for SAR reduction in parallel excitation with experimental verification at 9.4 T.

    PubMed

    Wu, Xiaoping; Akgün, Can; Vaughan, J Thomas; Andersen, Peter; Strupp, John; Uğurbil, Kâmil; Van de Moortele, Pierre-François

    2010-07-01

    Parallel excitation holds strong promise to mitigate the impact of large transmit B1 (B1+) distortion at very high magnetic field. Accelerated RF pulses, however, inherently tend to require larger RF peak power values, which may result in a substantial increase in Specific Absorption Rate (SAR) in tissues, a constant concern for patient safety at very high field. In this study, we demonstrate adapted rate RF pulse design allowing for SAR reduction while preserving excitation target accuracy. Compared with other proposed implementations of adapted rate RF pulses, our approach is compatible with any k-space trajectories, does not require an analytical expression of the gradient waveform and can be used for large flip angle excitation. We demonstrate our method with numerical simulations based on electromagnetic modeling and we include an experimental verification of transmit pattern accuracy on an 8 transmit channel 9.4 T system.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Biyikli, Emre; To, Albert C., E-mail: albertto@pitt.edu

    Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called the Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the software implemented along with the associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and measurement-based dynamic balancing of parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. Speed-up of the MMM method is shown to be directly proportional to the reduction of the number of the atoms visited in force computation. Finally, an adaptive MMM analysis on a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3–8.5 times in efficiency, over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.

  12. Multiresolution molecular mechanics: Implementation and efficiency

    NASA Astrophysics Data System (ADS)

    Biyikli, Emre; To, Albert C.

    2017-01-01

    Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called the Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the software implemented along with the associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and measurement-based dynamic balancing of parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. Speed-up of the MMM method is shown to be directly proportional to the reduction of the number of the atoms visited in force computation. Finally, an adaptive MMM analysis on a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3-8.5 times in efficiency, over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.

  13. Parallel Programming Strategies for Irregular Adaptive Applications

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2001-01-01

    Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance for such computations. In this work, we examine two typical irregular adaptive applications, Dynamic Remeshing and N-Body, under competing programming methodologies and across various parallel architectures. The Dynamic Remeshing application simulates flow over an airfoil, and refines localized regions of the underlying unstructured mesh. The N-Body experiment models two neighboring Plummer galaxies that are about to undergo a merger. Both problems demonstrate dramatic changes in processor workloads and interprocessor communication with time; thus, dynamic load balancing is a required component.

  14. Predictive wind turbine simulation with an adaptive lattice Boltzmann method for moving boundaries

    NASA Astrophysics Data System (ADS)

    Deiterding, Ralf; Wood, Stephen L.

    2016-09-01

    Operating horizontal axis wind turbines create large-scale turbulent wake structures that affect the power output of downwind turbines considerably. The computational prediction of this phenomenon is challenging as efficient low dissipation schemes are necessary that represent the vorticity production by the moving structures accurately and that are able to transport wakes without significant artificial decay over distances of several rotor diameters. We have developed a parallel adaptive lattice Boltzmann method for large eddy simulation of turbulent weakly compressible flows with embedded moving structures that considers these requirements rather naturally and enables first principle simulations of wake-turbine interaction phenomena at reasonable computational costs. The paper describes the employed computational techniques and presents validation simulations for the Mexnext benchmark experiments as well as simulations of the wake propagation in the Scaled Wind Farm Technology (SWIFT) array consisting of three Vestas V27 turbines in triangular arrangement.

  15. Flexbar 3.0 - SIMD and multicore parallelization.

    PubMed

    Roehr, Johannes T; Dieterich, Christoph; Reinert, Knut

    2017-09-15

    High-throughput sequencing machines can process many samples in a single run. For Illumina systems, sequencing reads are barcoded with an additional DNA tag that is contained in the respective sequencing adapters. The recognition of barcode and adapter sequences is hence commonly needed for the analysis of next-generation sequencing data. Flexbar performs demultiplexing based on barcodes and adapter trimming for such data. The massive amounts of data generated on modern sequencing machines demand that this preprocessing is done as efficiently as possible. We present Flexbar 3.0, the successor of the popular program Flexbar. It employs now twofold parallelism: multi-threading and additionally SIMD vectorization. Both types of parallelism are used to speed-up the computation of pair-wise sequence alignments, which are used for the detection of barcodes and adapters. Furthermore, new features were included to cover a wide range of applications. We evaluated the performance of Flexbar based on a simulated sequencing dataset. Our program outcompetes other tools in terms of speed and is among the best tools in the presented quality benchmark. https://github.com/seqan/flexbar. johannes.roehr@fu-berlin.de or knut.reinert@fu-berlin.de. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
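
    An illustrative sketch of the basic preprocessing task described above follows: removing a 3' adapter by finding the longest read suffix that exactly matches an adapter prefix. Flexbar itself uses banded pairwise alignment with SIMD vectorisation and multithreading, none of which is shown here; the sequences are invented.

```python
# Toy 3' adapter trimming by exact suffix-prefix overlap (not Flexbar's
# alignment-based algorithm): keep only the read portion before the match.
def trim_adapter(read, adapter, min_overlap=3):
    """Return the read with a trailing adapter (or adapter prefix) removed."""
    for start in range(len(read)):
        tail = read[start:]
        if len(tail) < min_overlap:
            break
        if adapter.startswith(tail) or tail.startswith(adapter):
            return read[:start]
    return read

print(trim_adapter("ACGTACGTAGATCGGAAG", adapter="AGATCGGAAGAGC"))  # ACGTACGT
```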

  16. Virtual Petaflop Simulation: Parallel Potential Solvers and New Integrators for Gravitational Systems

    NASA Technical Reports Server (NTRS)

    Lake, George; Quinn, Thomas; Richardson, Derek C.; Stadel, Joachim

    1999-01-01

    "The orbit of any one planet depends on the combined motion of all the planets, not to mention the actions of all these on each other. To consider simultaneously all these causes of motion and to define these motions by exact laws allowing of convenient calculation exceeds, unless I am mistaken, the forces of the entire human intellect" -Isaac Newton 1687. Epochal surveys are throwing down the gauntlet for cosmological simulation. We describe three keys to meeting the challenge of N-body simulation: adaptive potential solvers, adaptive integrators and volume renormalization. With these techniques and a dedicated Teraflop facility, simulation can stay even with observation of the Universe. We also describe some problems in the formation and stability of planetary systems. Here, the challenge is to perform accurate integrations that retain Hamiltonian properties for 10(exp 13) timesteps.

  17. An adaptive front tracking technique for three-dimensional transient flows

    NASA Astrophysics Data System (ADS)

    Galaktionov, O. S.; Anderson, P. D.; Peters, G. W. M.; van de Vosse, F. N.

    2000-01-01

    An adaptive technique, based on both surface stretching and surface curvature analysis, for tracking strongly deforming fluid volumes in three-dimensional flows is presented. The efficiency and accuracy of the technique are demonstrated for two- and three-dimensional flow simulations. For the two-dimensional test example, the results are compared with results obtained using a different tracking approach based on the advection of a passive scalar. Although for both techniques roughly the same structures are found, the resolution for the front tracking technique is much higher. In the three-dimensional test example, a spherical blob is tracked in a chaotic mixing flow. For this problem, the accuracy of the adaptive tracking is demonstrated by the volume conservation for the advected blob. Adaptive front tracking is suitable for simulation of the initial stages of fluid mixing, where the interfacial area can grow exponentially with time. The efficiency of the algorithm significantly benefits from parallelization of the code.
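
    A hedged 2D sketch of the stretching-based part of such adaptive front tracking: after the marker points have been advected, a midpoint is inserted on any segment that has stretched beyond a maximum length. The paper's 3D algorithm also uses a curvature criterion, which is omitted here; points and thresholds are made up.

```python
# Insert midpoints on over-stretched segments of a marker-point front (2D toy).
import numpy as np

def refine_front(points, max_length):
    """points: (N, 2) array of ordered marker points on an open curve."""
    refined = [points[0]]
    for a, b in zip(points[:-1], points[1:]):
        if np.linalg.norm(b - a) > max_length:
            refined.append(0.5 * (a + b))   # insert a midpoint
        refined.append(b)
    return np.array(refined)

front = np.array([[0.0, 0.0], [0.1, 0.0], [0.6, 0.1], [0.7, 0.1]])
print(refine_front(front, max_length=0.2))
```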

  18. Parallel 3D Mortar Element Method for Adaptive Nonconforming Meshes

    NASA Technical Reports Server (NTRS)

    Feng, Huiyu; Mavriplis, Catherine; VanderWijngaart, Rob; Biswas, Rupak

    2004-01-01

    High order methods are frequently used in computational simulation for their high accuracy. An efficient way to avoid unnecessary computation in smooth regions of the solution is to use adaptive meshes which employ fine grids only in areas where they are needed. Nonconforming spectral elements allow the grid to be flexibly adjusted to satisfy the computational accuracy requirements. The method is suitable for computational simulations of unsteady problems with very disparate length scales or unsteady moving features, such as heat transfer, fluid dynamics or flame combustion. In this work, we select the Mortar Element Method (MEM) to handle the non-conforming interfaces between elements. A new technique is introduced to efficiently implement MEM in 3-D nonconforming meshes. By introducing an "intermediate mortar", the proposed method decomposes the projection between 3-D elements and mortars into two steps. In each step, projection matrices derived in 2-D are used. The two-step method avoids explicitly forming/deriving large projection matrices for 3-D meshes, and also helps to simplify the implementation. This new technique can be used for both h- and p-type adaptation. This method is applied to an unsteady 3-D moving heat source problem. With our new MEM implementation, mesh adaptation is able to efficiently refine the grid near the heat source and coarsen the grid once the heat source passes. The savings in computational work resulting from the dynamic mesh adaptation are demonstrated by the reduction of the number of elements used and CPU time spent. MEM and mesh adaptation, respectively, bring irregularity and dynamics to the computer memory access pattern. Hence, they provide a good way to gauge the performance of computer systems when running scientific applications whose memory access patterns are irregular and unpredictable. We select a 3-D moving heat source problem as the Unstructured Adaptive (UA) grid benchmark, a new component of the NAS Parallel Benchmarks (NPB). In this paper, we present some interesting performance results of our OpenMP parallel implementation on different architectures such as the SGI Origin2000, SGI Altix, and Cray MTA-2.

  19. Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes

    NASA Astrophysics Data System (ADS)

    Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; Stuehn, Torsten

    2017-11-01

    Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving the continual development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make productive use of computational resources for each simulation from the outset. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also demonstrate the capabilities of the new approach by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. These two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
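
    A hedged sketch of the wall-rearrangement idea (not the authors' implementation): place subdomain walls along one axis so that each rank receives roughly the same summed per-particle cost rather than the same volume. The cost model below is invented for illustration.

```python
# Place subdomain walls at equal quantiles of the cumulative per-particle cost
# along x, so that work (not volume) is shared evenly between ranks.
import numpy as np

def place_walls(x_positions, costs, n_ranks):
    """Return n_ranks-1 interior wall positions along x."""
    order = np.argsort(x_positions)
    x_sorted = x_positions[order]
    cum_cost = np.cumsum(costs[order])
    total = cum_cost[-1]
    walls = []
    for k in range(1, n_ranks):
        idx = np.searchsorted(cum_cost, k * total / n_ranks)
        walls.append(float(x_sorted[min(idx, len(x_sorted) - 1)]))
    return walls

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 4000)
cost = np.where(x < 3.0, 5.0, 1.0)   # e.g. a high-resolution region is 5x pricier
print(place_walls(x, cost, n_ranks=4))
```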

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Turner, A.; Davis, A.; University of Wisconsin-Madison, Madison, WI 53706

    CCFE perform Monte-Carlo transport simulations on large and complex tokamak models such as ITER. Such simulations are challenging since streaming and deep penetration effects are equally important. In order to make such simulations tractable, both variance reduction (VR) techniques and parallel computing are used. It has been found that the application of VR techniques in such models significantly reduces the efficiency of parallel computation due to 'long histories'. VR in MCNP can be accomplished using energy-dependent weight windows. The weight window represents an 'average behaviour' of particles, and large deviations in the arriving weight of a particle give rise to extreme amounts of splitting being performed and a long history. When running on parallel clusters, a long history can have a detrimental effect on the parallel efficiency - if one process is computing the long history, the other CPUs complete their batch of histories and wait idle. Furthermore, some long histories have been found to be effectively intractable. To combat this effect, CCFE has developed an adaptation of MCNP which dynamically adjusts the WW where a large weight deviation is encountered. The method effectively 'de-optimises' the WW, reducing the VR performance but this is offset by a significant increase in parallel efficiency. Testing with a simple geometry has shown the method does not bias the result. This 'long history method' has enabled CCFE to significantly improve the performance of MCNP calculations for ITER on parallel clusters, and will be beneficial for any geometry combining streaming and deep penetration effects. (authors)

  1. Global magnetosphere simulations using constrained-transport Hall-MHD with CWENO reconstruction

    NASA Astrophysics Data System (ADS)

    Lin, L.; Germaschewski, K.; Maynard, K. M.; Abbott, S.; Bhattacharjee, A.; Raeder, J.

    2013-12-01

    We present a new CWENO (Centrally-Weighted Essentially Non-Oscillatory) reconstruction based MHD solver for the OpenGGCM global magnetosphere code. The solver was built using libMRC, a library for creating efficient parallel PDE solvers on structured grids. The use of libMRC gives us access to its core functionality of providing an automated code generation framework which takes a user provided PDE right hand side in symbolic form to generate an efficient, computer architecture specific, parallel code. libMRC also supports block-structured adaptive mesh refinement and implicit-time stepping through integration with the PETSc library. We validate the new CWENO Hall-MHD solver against existing solvers both in standard test problems as well as in global magnetosphere simulations.
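
    A hedged sketch of the code-generation idea mentioned above (not libMRC itself): take a right-hand side supplied in symbolic form and generate a vectorised numerical kernel from it automatically, here via sympy.lambdify. The symbols and the toy right-hand side are assumptions.

```python
# Generate a numerical kernel from a user-supplied symbolic right-hand side.
import numpy as np
import sympy as sp

u, dudx = sp.symbols("u dudx")
rhs_symbolic = -u * dudx            # toy advection-type right-hand side
rhs_kernel = sp.lambdify((u, dudx), rhs_symbolic, modules="numpy")

u_vals = np.linspace(0.0, 1.0, 5)
dudx_vals = np.ones_like(u_vals)
print(rhs_kernel(u_vals, dudx_vals))   # vectorised evaluation of -u * du/dx
```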

  2. Efficient parallelization for AMR MHD multiphysics calculations; implementation in AstroBEAR

    NASA Astrophysics Data System (ADS)

    Carroll-Nellenback, Jonathan J.; Shroyer, Brandon; Frank, Adam; Ding, Chen

    2013-03-01

    Current adaptive mesh refinement (AMR) simulations require algorithms that are highly parallelized and manage memory efficiently. As compute engines grow larger, AMR simulations will require algorithms that achieve new levels of efficient parallelization and memory management. We have attempted to employ new techniques to achieve both of these goals. Patch or grid based AMR often employs ghost cells to decouple the hyperbolic advances of each grid on a given refinement level. This decoupling allows each grid to be advanced independently. In AstroBEAR we utilize this independence by threading the grid advances on each level with preference going to the finer level grids. This allows for global load balancing instead of level by level load balancing and allows for greater parallelization across both physical space and AMR level. Threading of level advances can also improve performance by interleaving communication with computation, especially in deep simulations with many levels of refinement. While we see improvements of up to 30% on deep simulations run on a few cores, the speedup is typically more modest (5-20%) for larger scale simulations. To improve memory management we have employed a distributed tree algorithm that requires processors to only store and communicate local sections of the AMR tree structure with neighboring processors. Using this distributed approach we are able to get reasonable scaling efficiency (>80%) out to 12288 cores and up to 8 levels of AMR - independent of the use of threading.

  3. PROTO-PLASM: parallel language for adaptive and scalable modelling of biosystems.

    PubMed

    Bajaj, Chandrajit; DiCarlo, Antonio; Paoluzzi, Alberto

    2008-09-13

    This paper discusses the design goals and the first developments of PROTO-PLASM, a novel computational environment to produce libraries of executable, combinable and customizable computer models of natural and synthetic biosystems, aiming to provide a supporting framework for predictive understanding of structure and behaviour through multiscale geometric modelling and multiphysics simulations. Admittedly, the PROTO-PLASM platform is still in its infancy. Its computational framework--language, model library, integrated development environment and parallel engine--intends to provide patient-specific computational modelling and simulation of organs and biosystem, exploiting novel functionalities resulting from the symbolic combination of parametrized models of parts at various scales. PROTO-PLASM may define the model equations, but it is currently focused on the symbolic description of model geometry and on the parallel support of simulations. Conversely, CellML and SBML could be viewed as defining the behavioural functions (the model equations) to be used within a PROTO-PLASM program. Here we exemplify the basic functionalities of PROTO-PLASM, by constructing a schematic heart model. We also discuss multiscale issues with reference to the geometric and physical modelling of neuromuscular junctions.

  4. Proto-Plasm: parallel language for adaptive and scalable modelling of biosystems

    PubMed Central

    Bajaj, Chandrajit; DiCarlo, Antonio; Paoluzzi, Alberto

    2008-01-01

    This paper discusses the design goals and the first developments of Proto-Plasm, a novel computational environment to produce libraries of executable, combinable and customizable computer models of natural and synthetic biosystems, aiming to provide a supporting framework for predictive understanding of structure and behaviour through multiscale geometric modelling and multiphysics simulations. Admittedly, the Proto-Plasm platform is still in its infancy. Its computational framework—language, model library, integrated development environment and parallel engine—intends to provide patient-specific computational modelling and simulation of organs and biosystem, exploiting novel functionalities resulting from the symbolic combination of parametrized models of parts at various scales. Proto-Plasm may define the model equations, but it is currently focused on the symbolic description of model geometry and on the parallel support of simulations. Conversely, CellML and SBML could be viewed as defining the behavioural functions (the model equations) to be used within a Proto-Plasm program. Here we exemplify the basic functionalities of Proto-Plasm, by constructing a schematic heart model. We also discuss multiscale issues with reference to the geometric and physical modelling of neuromuscular junctions. PMID:18559320

  5. Sliding mode control of direct coupled interleaved boost converter for fuel cell

    NASA Astrophysics Data System (ADS)

    Wang, W. Y.; Ding, Y. H.; Ke, X.; Ma, X.

    2017-12-01

    A three-phase direct-coupled interleaved boost converter (TP-DIBC) is proposed in this paper. This converter exhibits only a small current-sharing imbalance among its branches. An adaptive sliding mode control (SMC) law is designed for the TP-DIBC. The aims are to 1) reduce the output voltage and inductor current ripple and regulate the output voltage tightly, and 2) share the total current carried by the direct-coupled interleaved boost converter (DIBC) equally between the parallel branches. The efficacy and robustness of the proposed TP-DIBC and adaptive SMC are confirmed via computer simulations using the Matlab SimPowerSystems tools. The simulation results are in line with expectations.

  6. ALEGRA -- A massively parallel h-adaptive code for solid dynamics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Summers, R.M.; Wong, M.K.; Boucheron, E.A.

    1997-12-31

    ALEGRA is a multi-material, arbitrary-Lagrangian-Eulerian (ALE) code for solid dynamics designed to run on massively parallel (MP) computers. It combines the features of modern Eulerian shock codes, such as CTH, with modern Lagrangian structural analysis codes using an unstructured grid. ALEGRA is being developed for use on the teraflop supercomputers to conduct advanced three-dimensional (3D) simulations of shock phenomena important to a variety of systems. ALEGRA was designed with the Single Program Multiple Data (SPMD) paradigm, in which the mesh is decomposed into sub-meshes so that each processor gets a single sub-mesh with approximately the same number of elements. Using this approach the authors have been able to produce a single code that can scale from one processor to thousands of processors. A current major effort is to develop efficient, high precision simulation capabilities for ALEGRA, without the computational cost of using a global highly resolved mesh, through flexible, robust h-adaptivity of finite elements. H-adaptivity is the dynamic refinement of the mesh by subdividing elements, thus changing the characteristic element size and reducing numerical error. The authors are working on several major technical challenges that must be met to make effective use of HAMMER on MP computers.

  7. Molecular simulation workflows as parallel algorithms: the execution engine of Copernicus, a distributed high-performance computing platform.

    PubMed

    Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik

    2015-06-09

    Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers, particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on a conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus.
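
    The dataflow idea described above can be sketched in a few lines of Python; the function names, the undersampling test and the use of a local process pool are hypothetical stand-ins for illustration, not the actual Copernicus API.

      # Minimal dataflow-style adaptive sampling loop (not the Copernicus API; run_md,
      # analyze and is_undersampled are hypothetical stand-ins for real workflow tasks).
      import random
      from concurrent.futures import ProcessPoolExecutor

      def run_md(seed):
          """Stand-in for one loosely coupled MD simulation."""
          rng = random.Random(seed)
          return [rng.gauss(0.0, 1.0) for _ in range(1000)]   # fake trajectory samples

      def analyze(trajectories, n_regions=10):
          """Histogram all samples into regions; its inputs make the dependencies explicit."""
          counts = {r: 0 for r in range(n_regions)}
          for traj in trajectories:
              for x in traj:
                  counts[min(int(abs(x) * n_regions / 4.0), n_regions - 1)] += 1
          return counts

      def is_undersampled(region_counts, threshold=50):
          """The adaptive-sampling criterion: regions with too few samples."""
          return [r for r, n in region_counts.items() if n < threshold]

      if __name__ == "__main__":
          seeds, results = list(range(8)), []
          for generation in range(3):                        # outer adaptive loop
              with ProcessPoolExecutor() as pool:            # independent tasks run in parallel
                  results.extend(pool.map(run_md, seeds))
              lacking = is_undersampled(analyze(results))
              if not lacking:
                  break
              seeds = [1000 * generation + r for r in lacking]   # target undersampled regions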

  8. A Parallel Numerical Algorithm To Solve Linear Systems Of Equations Emerging From 3D Radiative Transfer

    NASA Astrophysics Data System (ADS)

    Wichert, Viktoria; Arkenberg, Mario; Hauschildt, Peter H.

    2016-10-01

    Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach by introducing especially adapted, parallel numerical methods and correspondingly parallelizing critical code passages. In the following, we present our respective work on PHOENIX/3D. With new parallel numerical algorithms, there is a big opportunity for improvement when iteratively solving the system of equations emerging from the operator splitting of the radiative transfer equation J = ΛS. The narrow-banded approximate Λ-operator Λ* , which is used in PHOENIX/3D, occurs in each iteration step. By implementing a numerical algorithm which takes advantage of its characteristic traits, the parallel code's efficiency is further increased and a speed-up in computational time can be achieved.
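
    For context, a standard form of the operator-split (approximate-Λ) iteration is sketched below; the two-level-atom source function used here is an illustrative assumption, not necessarily the source term treated in PHOENIX/3D.

      % Approximate-Lambda (operator splitting) iteration; the two-level-atom source
      % function S = (1 - eps) J + eps B is an illustrative assumption.
      \[
        J^{\,n+1} \;=\; \Lambda^{*}\!\bigl[S^{\,n+1}\bigr] + \bigl(\Lambda - \Lambda^{*}\bigr)\!\bigl[S^{\,n}\bigr]
        \quad\Longrightarrow\quad
        \bigl[\,\mathrm{I} - (1-\varepsilon)\,\Lambda^{*}\,\bigr]\, S^{\,n+1}
        \;=\; (1-\varepsilon)\bigl(\Lambda - \Lambda^{*}\bigr)\, S^{\,n} + \varepsilon B .
      \]
      Because \(\Lambda^{*}\) is narrow-banded, the operator on the left is a banded matrix, and it is this
      banded system that a specially adapted parallel solver can exploit in every iteration step.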

  9. Adaptive mesh refinement and load balancing based on multi-level block-structured Cartesian mesh

    NASA Astrophysics Data System (ADS)

    Misaka, Takashi; Sasaki, Daisuke; Obayashi, Shigeru

    2017-11-01

    We developed a framework for a distributed-memory parallel computer that enables dynamic data management for adaptive mesh refinement and load balancing. We employed the simple data structure of the building cube method (BCM) where a computational domain is divided into multi-level cubic domains and each cube has the same number of grid points inside, realising a multi-level block-structured Cartesian mesh. Solution adaptive mesh refinement, which works efficiently with the help of the dynamic load balancing, was implemented by dividing cubes based on mesh refinement criteria. The framework was investigated with the Laplace equation in terms of adaptive mesh refinement, load balancing and the parallel efficiency. It was then applied to the incompressible Navier-Stokes equations to simulate a turbulent flow around a sphere. We considered wall-adaptive cube refinement where a non-dimensional wall distance y+ near the sphere is used as a criterion for mesh refinement. The results showed that the load imbalance due to y+ adaptive mesh refinement was corrected by the present approach. To utilise the BCM framework more effectively, we also tested cube-wise algorithm switching, where explicit and implicit time integration schemes are switched depending on the local Courant-Friedrichs-Lewy (CFL) condition in each cube.
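
    A minimal sketch of the cube-wise algorithm switching mentioned above is given below; the CFL threshold, the placeholder advection scheme and the cube data layout are assumptions, not details of the BCM framework.

      # Cube-wise switching between explicit and implicit time integration based on the
      # local CFL number (threshold and placeholder upwind scheme are assumptions).
      import numpy as np

      CFL_LIMIT = 0.9   # above this local CFL number, fall back to an implicit update

      def local_cfl(u, dt, dx):
          """Maximum convective CFL number inside one cube."""
          return float(np.max(np.abs(u))) * dt / dx

      def explicit_step(u, dt, dx):
          """First-order explicit upwind advection (placeholder scheme)."""
          return u - u * dt / dx * (u - np.roll(u, 1))

      def implicit_step(u, dt, dx, sweeps=20):
          """Backward-Euler upwind advection via a few fixed-point sweeps (placeholder)."""
          v = u.copy()
          for _ in range(sweeps):
              v = u - v * dt / dx * (v - np.roll(v, 1))
          return v

      def advance_cube(u, dt, dx):
          """Pick the integrator cube by cube, once per time step."""
          if local_cfl(u, dt, dx) < CFL_LIMIT:
              return explicit_step(u, dt, dx)    # cheap where the mesh is coarse
          return implicit_step(u, dt, dx)        # stable where wall refinement makes dx small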

  10. ADAPT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reynolds, John; Jankovsky, Zachary; Metzroth, Kyle G

    2018-04-04

    The purpose of the ADAPT code is to generate Dynamic Event Trees (DET) using a user specified set of simulators. ADAPT can utilize any simulation tool which meets a minimal set of requirements. ADAPT is based on the concept of DET which uses explicit modeling of the deterministic dynamic processes that take place during a nuclear reactor plant system (or other complex system) evolution along with stochastic modeling. When DET are used to model various aspects of Probabilistic Risk Assessment (PRA), all accident progression scenarios starting from an initiating event are considered simultaneously. The DET branching occurs at user specified times and/or when an action is required by the system and/or the operator. These outcomes then decide how the dynamic system variables will evolve in time for each DET branch. Since two different outcomes at a DET branching may lead to completely different paths for system evolution, the next branching for these paths may occur not only at separate times, but can be based on different branching criteria. The computational infrastructure allows for flexibility in ADAPT to link with different system simulation codes, parallel processing of the scenarios under consideration, on-line scenario management (initiation as well as termination), analysis of results, and user friendly graphical capabilities. The ADAPT system is designed for a distributed computing environment; the scheduler can track multiple concurrent branches simultaneously. The scheduler is modularized so that the DET branching strategy can be modified (e.g. biasing towards the worst-case scenario/event). Independent database systems store data from the simulation tasks and the DET structure so that the event tree can be constructed and analyzed later. ADAPT is provided with a user-friendly client which can easily sort through and display the results of an experiment, precluding the need for the user to manually inspect individual simulator runs.
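
    The branching logic of a dynamic event tree can be illustrated with a toy driver such as the one below; the simulator stub, the branching rule and the branch probabilities are assumptions and bear no relation to the actual ADAPT scheduler or its simulator interface.

      # Toy dynamic-event-tree driver (the simulator stub, branching condition and
      # branch probabilities are illustrative assumptions, not the ADAPT interfaces).
      from collections import deque

      def simulator(state, t_end):
          """Stand-in for an external plant simulator: advance to t_end and report
          whether a branching condition (here, a threshold crossing) occurred."""
          state = dict(state)
          state["level"] += (t_end - state["t"]) * state["rate"]
          state["t"] = t_end
          return state, state["level"] > 1.0

      def run_det(initial_state, horizon=10.0, dt=1.0, prune_below=1e-3):
          scenarios, queue = [], deque([(initial_state, 1.0)])   # (state, branch probability)
          while queue:
              state, prob = queue.popleft()
              while state["t"] < horizon:
                  state, branch = simulator(state, state["t"] + dt)
                  if branch:
                      # two stochastic outcomes, e.g. a safety system that works or fails
                      for p, rate in ((0.95, -0.5), (0.05, 0.5)):
                          if prob * p >= prune_below:            # drop negligible branches
                              queue.append((dict(state, rate=rate), prob * p))
                      break
              else:
                  scenarios.append((state, prob))                # this path reached the horizon
          return scenarios

      print(run_det({"t": 0.0, "level": 0.0, "rate": 0.2}))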

  11. Fast Particle Methods for Multiscale Phenomena Simulations

    NASA Technical Reports Server (NTRS)

    Koumoutsakos, P.; Wray, A.; Shariff, K.; Pohorille, Andrew

    2000-01-01

    We are developing particle methods oriented at improving computational modeling capabilities of multiscale physical phenomena in : (i) high Reynolds number unsteady vortical flows, (ii) particle laden and interfacial flows, (iii)molecular dynamics studies of nanoscale droplets and studies of the structure, functions, and evolution of the earliest living cell. The unifying computational approach involves particle methods implemented in parallel computer architectures. The inherent adaptivity, robustness and efficiency of particle methods makes them a multidisciplinary computational tool capable of bridging the gap of micro-scale and continuum flow simulations. Using efficient tree data structures, multipole expansion algorithms, and improved particle-grid interpolation, particle methods allow for simulations using millions of computational elements, making possible the resolution of a wide range of length and time scales of these important physical phenomena.The current challenges in these simulations are in : [i] the proper formulation of particle methods in the molecular and continuous level for the discretization of the governing equations [ii] the resolution of the wide range of time and length scales governing the phenomena under investigation. [iii] the minimization of numerical artifacts that may interfere with the physics of the systems under consideration. [iv] the parallelization of processes such as tree traversal and grid-particle interpolations We are conducting simulations using vortex methods, molecular dynamics and smooth particle hydrodynamics, exploiting their unifying concepts such as : the solution of the N-body problem in parallel computers, highly accurate particle-particle and grid-particle interpolations, parallel FFT's and the formulation of processes such as diffusion in the context of particle methods. This approach enables us to transcend among seemingly unrelated areas of research.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jain, Atul K.

    The overall objective of this DOE-funded project is to combine scientific and computational challenges in climate modeling by expanding our understanding of the biogeophysical-biogeochemical processes and their interactions in the northern high latitudes (NHLs) using an earth system modeling (ESM) approach, and by adopting an adaptive parallel runtime system in an ESM to achieve efficient and scalable climate simulations through improved load balancing algorithms.

  13. MaMiCo: Software design for parallel molecular-continuum flow simulations

    NASA Astrophysics Data System (ADS)

    Neumann, Philipp; Flohr, Hanno; Arora, Rahul; Jarmatz, Piet; Tchipev, Nikola; Bungartz, Hans-Joachim

    2016-03-01

    The macro-micro-coupling tool (MaMiCo) was developed to ease the development of and modularize molecular-continuum simulations, retaining sequential and parallel performance. We demonstrate the functionality and performance of MaMiCo by coupling the spatially adaptive Lattice Boltzmann framework waLBerla with four molecular dynamics (MD) codes: the light-weight Lennard-Jones-based implementation SimpleMD, the node-level optimized software ls1 mardyn, and the community codes ESPResSo and LAMMPS. We detail interface implementations to connect each solver with MaMiCo. The coupling for each waLBerla-MD setup is validated in three-dimensional channel flow simulations which are solved by means of a state-based coupling method. We provide sequential and strong scaling measurements for the four molecular-continuum simulations. The overhead of MaMiCo is found to come at 10%-20% of the total (MD) runtime. The measurements further show that scalability of the hybrid simulations is reached on up to 500 Intel SandyBridge, and more than 1000 AMD Bulldozer compute cores.

  14. Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt

    Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make a productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach and paper, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach's capabilities by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. Finally, these two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.

  15. Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes

    DOE PAGES

    Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; ...

    2017-11-27

    Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make a productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach and paper, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach's capabilities by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. Finally, these two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
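
    The idea of placing subdomain walls a priori according to an estimated cost density, rather than equidistantly, can be sketched in one dimension as follows; the particle-weight cost model is an assumption, not the scaling law derived in the paper.

      # Sketch of a-priori placement of 1D subdomain walls at equal-cost quantiles of a
      # weighted particle distribution (the cost weights are an assumed model).
      import numpy as np

      def wall_positions(x_particles, weights, n_ranks, box_length):
          """Return n_ranks+1 wall positions such that each rank receives roughly the
          same integrated cost instead of the same volume."""
          order = np.argsort(x_particles)
          x, w = x_particles[order], weights[order]
          cum = np.cumsum(w) / np.sum(w)                    # normalized cumulative cost
          targets = np.arange(1, n_ranks) / n_ranks         # equal-cost fractions
          walls = np.interp(targets, cum, x)
          return np.concatenate(([0.0], walls, [box_length]))

      rng = np.random.default_rng(0)
      x = rng.uniform(0.0, 10.0, 10000)
      # e.g. an expensive atomistic region in the middle of an adaptive-resolution setup
      w = np.where(np.abs(x - 5.0) < 1.0, 5.0, 1.0)
      print(wall_positions(x, w, n_ranks=4, box_length=10.0))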

  16. Increased performance in the short-term water demand forecasting through the use of a parallel adaptive weighting strategy

    NASA Astrophysics Data System (ADS)

    Sardinha-Lourenço, A.; Andrade-Campos, A.; Antunes, A.; Oliveira, M. S.

    2018-03-01

    Recent research on water demand short-term forecasting has shown that models using univariate time series based on historical data are useful and can be combined with other prediction methods to reduce errors. The behavior of water demands in drinking water distribution networks focuses on their repetitive nature and, under meteorological conditions and similar consumers, allows the development of a heuristic forecast model that, in turn, combined with other autoregressive models, can provide reliable forecasts. In this study, a parallel adaptive weighting strategy of water consumption forecast for the next 24-48 h, using univariate time series of potable water consumption, is proposed. Two Portuguese potable water distribution networks are used as case studies where the only input data are the consumption of water and the national calendar. For the development of the strategy, the Autoregressive Integrated Moving Average (ARIMA) method and a short-term forecast heuristic algorithm are used. Simulations with the model showed that, when using a parallel adaptive weighting strategy, the prediction error can be reduced by 15.96% and the average error by 9.20%. This reduction is important in the control and management of water supply systems. The proposed methodology can be extended to other forecast methods, especially when it comes to the availability of multiple forecast models.
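
    A minimal sketch of weighting two parallel forecasters adaptively is shown below; the inverse recent-error weighting rule and the 24-hour window are assumptions, not necessarily the exact scheme evaluated in the study.

      # Adaptive combination of two parallel forecasters (e.g. ARIMA and a calendar
      # heuristic); the inverse recent-error weighting rule is an assumption.
      import numpy as np

      def adaptive_combine(obs, forecast_a, forecast_b, window=24):
          """Weight each model by the inverse of its recent mean absolute error."""
          combined = np.empty_like(obs, dtype=float)
          for t in range(len(obs)):
              lo = max(0, t - window)
              if t == 0:
                  w_a = w_b = 0.5
              else:
                  err_a = np.mean(np.abs(obs[lo:t] - forecast_a[lo:t])) + 1e-9
                  err_b = np.mean(np.abs(obs[lo:t] - forecast_b[lo:t])) + 1e-9
                  w_a, w_b = 1.0 / err_a, 1.0 / err_b
                  w_a, w_b = w_a / (w_a + w_b), w_b / (w_a + w_b)
              combined[t] = w_a * forecast_a[t] + w_b * forecast_b[t]
          return combined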

  17. AFFINE-CORRECTED PARADISE: FREE-BREATHING PATIENT-ADAPTIVE CARDIAC MRI WITH SENSITIVITY ENCODING

    PubMed Central

    Sharif, Behzad; Bresler, Yoram

    2013-01-01

    We propose a real-time cardiac imaging method with parallel MRI that allows for free breathing during imaging and does not require cardiac or respiratory gating. The method is based on the recently proposed PARADISE (Patient-Adaptive Reconstruction and Acquisition Dynamic Imaging with Sensitivity Encoding) scheme. The new acquisition method adapts the PARADISE k-t space sampling pattern according to an affine model of the respiratory motion. The reconstruction scheme involves multi-channel time-sequential imaging with time-varying channels. All model parameters are adapted to the imaged patient as part of the experiment and drive both data acquisition and cine reconstruction. Simulated cardiac MRI experiments using the realistic NCAT phantom show high quality cine reconstructions and robustness to modeling inaccuracies. PMID:24390159

  18. Local error estimates for adaptive simulation of the Reaction–Diffusion Master Equation via operator splitting

    PubMed Central

    Hellander, Andreas; Lawson, Michael J; Drawert, Brian; Petzold, Linda

    2015-01-01

    The efficiency of exact simulation methods for the reaction-diffusion master equation (RDME) is severely limited by the large number of diffusion events if the mesh is fine or if diffusion constants are large. Furthermore, inherent properties of exact kinetic-Monte Carlo simulation methods limit the efficiency of parallel implementations. Several approximate and hybrid methods have appeared that enable more efficient simulation of the RDME. A common feature to most of them is that they rely on splitting the system into its reaction and diffusion parts and updating them sequentially over a discrete timestep. This use of operator splitting enables more efficient simulation but it comes at the price of a temporal discretization error that depends on the size of the timestep. So far, existing methods have not attempted to estimate or control this error in a systematic manner. This makes the solvers hard to use for practitioners since they must guess an appropriate timestep. It also makes the solvers potentially less efficient than if the timesteps are adapted to control the error. Here, we derive estimates of the local error and propose a strategy to adaptively select the timestep when the RDME is simulated via a first order operator splitting. While the strategy is general and applicable to a wide range of approximate and hybrid methods, we exemplify it here by extending a previously published approximate method, the Diffusive Finite-State Projection (DFSP) method, to incorporate temporal adaptivity. PMID:26865735

  19. Local error estimates for adaptive simulation of the Reaction-Diffusion Master Equation via operator splitting.

    PubMed

    Hellander, Andreas; Lawson, Michael J; Drawert, Brian; Petzold, Linda

    2014-06-01

    The efficiency of exact simulation methods for the reaction-diffusion master equation (RDME) is severely limited by the large number of diffusion events if the mesh is fine or if diffusion constants are large. Furthermore, inherent properties of exact kinetic-Monte Carlo simulation methods limit the efficiency of parallel implementations. Several approximate and hybrid methods have appeared that enable more efficient simulation of the RDME. A common feature to most of them is that they rely on splitting the system into its reaction and diffusion parts and updating them sequentially over a discrete timestep. This use of operator splitting enables more efficient simulation but it comes at the price of a temporal discretization error that depends on the size of the timestep. So far, existing methods have not attempted to estimate or control this error in a systematic manner. This makes the solvers hard to use for practitioners since they must guess an appropriate timestep. It also makes the solvers potentially less efficient than if the timesteps are adapted to control the error. Here, we derive estimates of the local error and propose a strategy to adaptively select the timestep when the RDME is simulated via a first order operator splitting. While the strategy is general and applicable to a wide range of approximate and hybrid methods, we exemplify it here by extending a previously published approximate method, the Diffusive Finite-State Projection (DFSP) method, to incorporate temporal adaptivity.
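
    A generic accept/reject controller for a first-order splitting step is sketched below; the error estimator, safety factor and step-size bounds are assumptions, not the DFSP-specific estimates derived in the paper.

      # Accept/reject timestep controller for Lie (first-order) operator splitting.
      # reaction_step, diffusion_step and local_error are user-supplied callables;
      # the safety factor and the dt rescaling bounds are assumptions.
      def adaptive_split_integrate(state, t, t_end, dt, tol,
                                   reaction_step, diffusion_step, local_error,
                                   safety=0.9):
          while t < t_end:
              dt = min(dt, t_end - t)
              trial = diffusion_step(reaction_step(state, dt), dt)   # react, then diffuse
              err = local_error(state, trial, dt)                    # splitting-error estimate
              if err <= tol:
                  state, t = trial, t + dt                           # accept the step
              # local splitting error scales like dt^2, so rescale dt by sqrt(tol/err)
              dt *= safety * min(2.0, max(0.2, (tol / max(err, 1e-300)) ** 0.5))
          return state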

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

    The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in the ITER edge plasma on the largest US open-science computer, the Cray XK7 Titan, at its maximal heterogeneous capability. Such simulations were not possible before because the time-to-solution was more than a factor of 10 too long to complete one physics case within 5 days of wall-clock time. Frontier techniques employed include nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous application interactions, and dynamic repartitioning.

  1. Bi-Objective Optimal Control Modification Adaptive Control for Systems with Input Uncertainty

    NASA Technical Reports Server (NTRS)

    Nguyen, Nhan T.

    2012-01-01

    This paper presents a new model-reference adaptive control method based on a bi-objective optimal control formulation for systems with input uncertainty. A parallel predictor model is constructed to relate the predictor error to the estimation error of the control effectiveness matrix. In this work, we develop an optimal control modification adaptive control approach that seeks to minimize a bi-objective linear quadratic cost function of both the tracking error norm and predictor error norm simultaneously. The resulting adaptive laws for the parametric uncertainty and control effectiveness uncertainty are dependent on both the tracking error and predictor error, while the adaptive laws for the feedback gain and command feedforward gain are only dependent on the tracking error. The optimal control modification term provides robustness to the adaptive laws naturally from the optimal control framework. Simulations demonstrate the effectiveness of the proposed adaptive control approach.
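
    An illustrative form of such a bi-objective cost is given below; the weighting matrices and error symbols are generic placeholders rather than the exact notation of the paper.

      % Illustrative bi-objective linear quadratic cost; Q, R and the error symbols are
      % generic placeholders, not the paper's exact notation.
      \[
        J \;=\; \tfrac{1}{2} \int_{0}^{t_f}
          \Bigl( e^{\mathsf T} Q\, e \;+\; \tilde e^{\mathsf T} R\, \tilde e \Bigr)\, dt ,
        \qquad Q \succ 0,\; R \succ 0,
      \]
      where \(e\) is the tracking error and \(\tilde e\) is the predictor error of the parallel predictor
      model; minimizing both terms simultaneously yields adaptive laws that depend on \(e\) and
      \(\tilde e\), with the optimal control modification term supplying robustness.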

  2. A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes

    NASA Technical Reports Server (NTRS)

    Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.

    1999-01-01

    The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Demeure, I.M.

    The research presented here is concerned with representation techniques and tools to support the design, prototyping, simulation, and evaluation of message-based parallel, distributed computations. The author describes ParaDiGM (Parallel, Distributed computation Graph Model), a visual representation technique for parallel, message-based distributed computations. ParaDiGM provides several views of a computation depending on the aspect of concern. It is made of two complementary submodels, the DCPG (Distributed Computing Precedence Graph) model and the PAM (Process Architecture Model) model. DCPGs are precedence graphs used to express the functionality of a computation in terms of tasks, message-passing, and data. PAM graphs are used to represent the partitioning of a computation into schedulable units or processes, and the pattern of communication among those units. There is a natural mapping between the two models. He illustrates the utility of ParaDiGM as a representation technique by applying it to various computations (e.g., an adaptive global optimization algorithm, the client-server model). ParaDiGM representations are concise. They can be used in documenting the design and the implementation of parallel, distributed computations, in describing such computations to colleagues, and in comparing and contrasting various implementations of the same computation. He then describes VISA (VISual Assistant), a software tool to support the design, prototyping, and simulation of message-based parallel, distributed computations. VISA is based on the ParaDiGM model. In particular, it supports the editing of ParaDiGM graphs to describe the computations of interest, and the animation of these graphs to provide visual feedback during simulations. The graphs are supplemented with various attributes, simulation parameters, and interpretations, which are procedures that can be executed by VISA.

  4. Modern spandrels: the roles of genetic drift, gene flow and natural selection in the evolution of parallel clines.

    PubMed

    Santangelo, James S; Johnson, Marc T J; Ness, Rob W

    2018-05-16

    Urban environments offer the opportunity to study the role of adaptive and non-adaptive evolutionary processes on an unprecedented scale. While the presence of parallel clines in heritable phenotypic traits is often considered strong evidence for the role of natural selection, non-adaptive evolutionary processes can also generate clines, and this may be more likely when traits have a non-additive genetic basis due to epistasis. In this paper, we use spatially explicit simulations modelled according to the cyanogenesis (hydrogen cyanide, HCN) polymorphism in white clover ( Trifolium repens ) to examine the formation of phenotypic clines along urbanization gradients under varying levels of drift, gene flow and selection. HCN results from an epistatic interaction between two Mendelian-inherited loci. Our results demonstrate that the genetic architecture of this trait makes natural populations susceptible to decreases in HCN frequencies via drift. Gradients in the strength of drift across a landscape resulted in phenotypic clines with lower frequencies of HCN in strongly drifting populations, giving the misleading appearance of deterministic adaptive changes in the phenotype. Studies of heritable phenotypic change in urban populations should generate null models of phenotypic evolution based on the genetic architecture underlying focal traits prior to invoking selection's role in generating adaptive differentiation. © 2018 The Author(s).
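
    The susceptibility of the epistatic HCN phenotype to drift can be illustrated with a toy Wright-Fisher simulation like the one below; the deme sizes, starting allele frequencies and the assumption of free recombination between the two loci are illustrative choices, not the spatially explicit model used in the paper.

      # Toy Wright-Fisher drift of the two-locus epistatic HCN trait: the phenotype is
      # expressed only if a dominant allele is present at BOTH loci. Deme sizes and
      # starting allele frequencies are illustrative assumptions.
      import numpy as np

      def hcn_frequency(p_ac, p_li, n, generations, rng):
          """Drift two freely recombining loci in a deme of n diploids; return the
          expected frequency of the HCN+ phenotype at the end."""
          for _ in range(generations):
              p_ac = rng.binomial(2 * n, p_ac) / (2 * n)   # resample Ac allele frequency
              p_li = rng.binomial(2 * n, p_li) / (2 * n)   # resample Li allele frequency
          dom_ac = 1.0 - (1.0 - p_ac) ** 2                 # P(at least one dominant allele)
          dom_li = 1.0 - (1.0 - p_li) ** 2
          return dom_ac * dom_li                           # epistasis: both loci required

      rng = np.random.default_rng(1)
      for n in (20, 200, 2000):                            # stronger drift in smaller demes
          mean_freq = np.mean([hcn_frequency(0.5, 0.5, n, 100, rng) for _ in range(200)])
          print(n, round(float(mean_freq), 3))

    In such a toy run, smaller demes tend to lose HCN more often because fixation of the recessive allele at either locus is enough to eliminate the phenotype, so a gradient in drift alone can produce a phenotypic cline without any selection.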

  5. Strategies for Large Scale Implementation of a Multiscale, Multiprocess Integrated Hydrologic Model

    NASA Astrophysics Data System (ADS)

    Kumar, M.; Duffy, C.

    2006-05-01

    Distributed models simulate hydrologic state variables in space and time while taking into account the heterogeneities in terrain, surface, subsurface properties and meteorological forcings. The computational cost and complexity associated with these models increase with their tendency to accurately simulate the large number of interacting physical processes at fine spatio-temporal resolution in a large basin. A hydrologic model run on a coarse spatial discretization of the watershed with a limited number of physical processes needs a smaller computational load. But this negatively affects the accuracy of model results and restricts physical realization of the problem. So it is imperative to have an integrated modeling strategy (a) which can be universally applied at various scales in order to study the tradeoffs between computational complexity (determined by spatio-temporal resolution), accuracy and predictive uncertainty in relation to various approximations of physical processes, (b) which can be applied at adaptively different spatial scales in the same domain by taking into account the local heterogeneity of topography and hydrogeologic variables, and (c) which is flexible enough to incorporate different numbers and approximations of process equations depending on model purpose and computational constraints. An efficient implementation of this strategy becomes all the more important for the Great Salt Lake river basin, which is relatively large (~89,000 sq. km) and complex in terms of hydrologic and geomorphic conditions. The types and time scales of the hydrologic processes that are dominant in different parts of the basin also differ. Part of the snowmelt runoff generated in the Uinta Mountains infiltrates and contributes as base flow to the Great Salt Lake over a time scale of decades to centuries. The adaptive strategy helps capture the steep topographic and climatic gradient along the Wasatch front. Here we present the aforesaid modeling strategy along with an associated hydrologic modeling framework which facilitates a seamless, computationally efficient and accurate integration of the process model with the data model. The flexibility of this framework leads to implementation of multiscale, multiresolution, adaptive refinement/de-refinement and nested modeling simulations with the least computational burden. However, performing these simulations and the related calibration of these models over a large basin at higher spatio-temporal resolutions is computationally intensive and requires increasing computing power. With the advent of parallel processing architectures, high computing performance can be achieved by parallelization of existing serial integrated-hydrologic-model code. This translates to running the same model simulation on a network of a large number of processors, thereby reducing the time needed to obtain a solution. The paper also discusses the implementation of the integrated model on parallel processors, including the mapping of the problem onto a multi-processor environment, methods to incorporate coupling between hydrologic processes using interprocessor communication models, the model data structure, and parallel numerical algorithms to obtain high performance.

  6. Implementation of a flexible and scalable particle-in-cell method for massively parallel computations in the mantle convection code ASPECT

    NASA Astrophysics Data System (ADS)

    Gassmöller, Rene; Bangerth, Wolfgang

    2016-04-01

    Particle-in-cell methods have a long history and many applications in geodynamic modelling of mantle convection, lithospheric deformation and crustal dynamics. They are primarily used to track material information, the strain a material has undergone, the pressure-temperature history a certain material region has experienced, or the amount of volatiles or partial melt present in a region. However, their efficient parallel implementation - in particular combined with adaptive finite-element meshes - is complicated due to the complex communication patterns and frequent reassignment of particles to cells. Consequently, many current scientific software packages accomplish this efficient implementation by specifically designing particle methods for a single purpose, like the advection of scalar material properties that do not evolve over time (e.g., for chemical heterogeneities). Design choices for particle integration, data storage, and parallel communication are then optimized for this single purpose, making the code relatively rigid to changing requirements. Here, we present the implementation of a flexible, scalable and efficient particle-in-cell method for massively parallel finite-element codes with adaptively changing meshes. Using a modular plugin structure, we allow maximum flexibility of the generation of particles, the carried tracer properties, the advection and output algorithms, and the projection of properties to the finite-element mesh. We present scaling tests ranging up to tens of thousands of cores and tens of billions of particles. Additionally, we discuss efficient load-balancing strategies for particles in adaptive meshes with their strengths and weaknesses, local particle-transfer between parallel subdomains utilizing existing communication patterns from the finite element mesh, and the use of established parallel output algorithms like the HDF5 library. Finally, we show some relevant particle application cases, compare our implementation to a modern advection-field approach, and demonstrate under which conditions which method is more efficient. We implemented the presented methods in ASPECT (aspect.dealii.org), a freely available open-source community code for geodynamic simulations. The structure of the particle code is highly modular, and segregated from the PDE solver, and can thus be easily transferred to other programs, or adapted for various application cases.

  7. Final Report, DE-FG01-06ER25718 Domain Decomposition and Parallel Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Widlund, Olof B.

    2015-06-09

    The goal of this project is to develop and improve domain decomposition algorithms for a variety of partial differential equations such as those of linear elasticity and electro-magnetics. These iterative methods are designed for massively parallel computing systems and allow the fast solution of the very large systems of algebraic equations that arise in large scale and complicated simulations. A special emphasis is placed on problems arising from Maxwell's equations. The approximate solvers, the preconditioners, are combined with the conjugate gradient method and must always include a solver of a coarse model in order to have a performance which is independent of the number of processors used in the computer simulation. A recent development allows for an adaptive construction of this coarse component of the preconditioner.

  8. Scalability of a Low-Cost Multi-Teraflop Linux Cluster for High-End Classical Atomistic and Quantum Mechanical Simulations

    NASA Technical Reports Server (NTRS)

    Kikuchi, Hideaki; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya; Shimojo, Fuyuki; Saini, Subhash

    2003-01-01

    Scalability of a low-cost, Intel Xeon-based, multi-Teraflop Linux cluster is tested for two high-end scientific applications: Classical atomistic simulation based on the molecular dynamics method and quantum mechanical calculation based on the density functional theory. These scalable parallel applications use space-time multiresolution algorithms and feature computational-space decomposition, wavelet-based adaptive load balancing, and spacefilling-curve-based data compression for scalable I/O. Comparative performance tests are performed on a 1,024-processor Linux cluster and a conventional higher-end parallel supercomputer, 1,184-processor IBM SP4. The results show that the performance of the Linux cluster is comparable to that of the SP4. We also study various effects, such as the sharing of memory and L2 cache among processors, on the performance.

  9. Parallel software for lattice N = 4 supersymmetric Yang-Mills theory

    NASA Astrophysics Data System (ADS)

    Schaich, David; DeGrand, Thomas

    2015-05-01

    We present new parallel software, SUSY LATTICE, for lattice studies of four-dimensional N = 4 supersymmetric Yang-Mills theory with gauge group SU(N). The lattice action is constructed to exactly preserve a single supersymmetry charge at non-zero lattice spacing, up to additional potential terms included to stabilize numerical simulations. The software evolved from the MILC code for lattice QCD, and retains a similar large-scale framework despite the different target theory. Many routines are adapted from an existing serial code (Catterall and Joseph, 2012), which SUSY LATTICE supersedes. This paper provides an overview of the new parallel software, summarizing the lattice system, describing the applications that are currently provided and explaining their basic workflow for non-experts in lattice gauge theory. We discuss the parallel performance of the code, and highlight some notable aspects of the documentation for those interested in contributing to its future development.

  10. Nyx: Adaptive mesh, massively-parallel, cosmological simulation code

    NASA Astrophysics Data System (ADS)

    Almgren, Ann; Beckner, Vince; Friesen, Brian; Lukic, Zarija; Zhang, Weiqun

    2017-12-01

    The Nyx code solves the equations of compressible hydrodynamics on an adaptive grid hierarchy coupled with an N-body treatment of dark matter. The gas dynamics in Nyx use a finite volume methodology on an adaptive set of 3-D Eulerian grids; dark matter is represented as discrete particles moving under the influence of gravity. Particles are evolved via a particle-mesh method, using a Cloud-in-Cell deposition/interpolation scheme. Both baryonic and dark matter contribute to the gravitational field. In addition, Nyx includes physics for accurately modeling the intergalactic medium; in optically thin limits and assuming ionization equilibrium, the code calculates heating and cooling processes of the primordial-composition gas in an ionizing ultraviolet background radiation field.
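
    A one-dimensional version of the Cloud-in-Cell deposition step mentioned above is sketched here; the periodic, cell-centred grid and the unit particle masses are simplifying assumptions.

      # 1D Cloud-in-Cell (CIC) deposition of particle mass onto a periodic, cell-centred
      # grid, as used by particle-mesh gravity solvers (simplified illustration).
      import numpy as np

      def cic_deposit(positions, n_cells, box_size, masses=None):
          rho = np.zeros(n_cells)
          masses = np.ones_like(positions) if masses is None else masses
          dx = box_size / n_cells
          x = positions / dx - 0.5                  # position in cell units, cell-centred
          i_left = np.floor(x).astype(int)
          frac = x - i_left                         # share of the "cloud" in the right-hand cell
          np.add.at(rho, i_left % n_cells, masses * (1.0 - frac) / dx)
          np.add.at(rho, (i_left + 1) % n_cells, masses * frac / dx)
          return rho                                # force interpolation reuses the same weights

      print(cic_deposit(np.array([0.26, 0.5, 0.74]), n_cells=10, box_size=1.0))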

  11. ADAPTIVE REAL-TIME CARDIAC MRI USING PARADISE: VALIDATION BY THE PHYSIOLOGICALLY IMPROVED NCAT PHANTOM

    PubMed Central

    Sharif, Behzad; Bresler, Yoram

    2013-01-01

    Patient-Adaptive Reconstruction and Acquisition Dynamic Imaging with Sensitivity Encoding (PARADISE) is a dynamic MR imaging scheme that optimally combines parallel imaging and model-based adaptive acquisition. In this work, we propose the application of PARADISE to real-time cardiac MRI. We introduce a physiologically improved version of a realistic four-dimensional cardiac-torso (NCAT) phantom, which incorporates natural beat-to-beat heart rate and motion variations. Cardiac cine imaging using PARADISE is simulated and its performance is analyzed by virtue of the improved phantom. Results verify the effectiveness of PARADISE for high resolution un-gated real-time cardiac MRI and its superiority over conventional acquisition methods. PMID:24398475

  12. Analysis and improvements of Adaptive Particle Refinement (APR) through CPU time, accuracy and robustness considerations

    NASA Astrophysics Data System (ADS)

    Chiron, L.; Oger, G.; de Leffe, M.; Le Touzé, D.

    2018-02-01

    While smoothed-particle hydrodynamics (SPH) simulations are usually performed using uniform particle distributions, local particle refinement techniques have been developed to concentrate fine spatial resolutions in identified areas of interest. Although the formalism of this method is relatively easy to implement, its robustness at coarse/fine interfaces can be problematic. Analysis performed in [16] shows that the radius of refined particles should be greater than half the radius of unrefined particles to ensure robustness. In this article, the basics of an Adaptive Particle Refinement (APR) technique, inspired by AMR in mesh-based methods, are presented. This approach ensures robustness with alleviated constraints. Simulations applying the new formalism proposed achieve accuracy comparable to fully refined spatial resolutions, together with robustness, low CPU times and maintained parallel efficiency.

  13. A Dynamic Finite Element Method for Simulating the Physics of Faults Systems

    NASA Astrophysics Data System (ADS)

    Saez, E.; Mora, P.; Gross, L.; Weatherley, D.

    2004-12-01

    We introduce a dynamic Finite Element method using a novel high level scripting language to describe the physical equations, boundary conditions and time integration scheme. The library we use is the parallel Finley library: a finite element kernel library, designed for solving large-scale problems. It is incorporated as a differential equation solver into a more general library called escript, based on the scripting language Python. This library has been developed to facilitate the rapid development of 3D parallel codes, and is optimised for the Australian Computational Earth Systems Simulator Major National Research Facility (ACcESS MNRF) supercomputer, a 208 processor SGI Altix with a peak performance of 1.1 TFlops. Using the scripting approach we obtain a parallel FE code able to take advantage of the computational efficiency of the Altix 3700. We consider faults as material discontinuities (the displacement, velocity, and acceleration fields are discontinuous at the fault), with elastic behavior. The stress continuity at the fault is achieved naturally through the expression of the fault interactions in the weak formulation. The elasticity problem is solved explicitly in time, using the Saint Verlat scheme. Finally, we specify a suitable frictional constitutive relation and numerical scheme to simulate fault behaviour. Our model is based on previous work on modelling fault friction and multi-fault systems using lattice solid-like models. We adapt the 2D model for simulating the dynamics of parallel fault systems described to the Finite-Element method. The approach uses a frictional relation along faults that is slip and slip-rate dependent, and the numerical integration approach introduced by Mora and Place in the lattice solid model. In order to illustrate the new Finite Element model, single and multi-fault simulation examples are presented.

  14. Parallel Adaptive High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics

    NASA Technical Reports Server (NTRS)

    Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak

    2016-01-01

    This paper presents large-scale MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta and spatially fifth-order accurate WENO-5Z scheme was used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion computational cells show excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses the irregular numerical cost associated with blocks containing boundaries. Limits to scaling beyond 32k cores are identified, and targeted code optimizations are discussed.

  15. A Cartesian cut cell method for rarefied flow simulations around moving obstacles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dechristé, G., E-mail: Guillaume.Dechriste@math.u-bordeaux1.fr; CNRS, IMB, UMR 5251, F-33400 Talence; Mieussens, L., E-mail: Luc.Mieussens@math.u-bordeaux1.fr

    2016-06-01

    For accurate simulations of rarefied gas flows around moving obstacles, we propose a cut cell method on Cartesian grids: it allows exact conservation and accurate treatment of boundary conditions. Our approach is designed to treat Cartesian cells and various kinds of cut cells by the same algorithm, with no need to identify the specific shape of each cut cell. This makes the implementation quite simple, and allows a direct extension to 3D problems. Such simulations are also made possible by using an adaptive mesh refinement technique and a hybrid parallel implementation. This is illustrated by several test cases, including a 3D unsteady simulation of the Crookes radiometer.

  16. Exploring Discretization Error in Simulation-Based Aerodynamic Databases

    NASA Technical Reports Server (NTRS)

    Aftosmis, Michael J.; Nemec, Marian

    2010-01-01

    This work examines the level of discretization error in simulation-based aerodynamic databases and introduces strategies for error control. Simulations are performed using a parallel, multi-level Euler solver on embedded-boundary Cartesian meshes. Discretization errors in user-selected outputs are estimated using the method of adjoint-weighted residuals and we use adaptive mesh refinement to reduce these errors to specified tolerances. Using this framework, we examine the behavior of discretization error throughout a token database computed for a NACA 0012 airfoil consisting of 120 cases. We compare the cost and accuracy of two approaches for aerodynamic database generation. In the first approach, mesh adaptation is used to compute all cases in the database to a prescribed level of accuracy. The second approach conducts all simulations using the same computational mesh without adaptation. We quantitatively assess the error landscape and computational costs in both databases. This investigation highlights sensitivities of the database under a variety of conditions. The presence of transonic shocks or the stiffness in the governing equations near the incompressible limit is shown to dramatically increase discretization error, requiring additional mesh resolution to control. Results show that such pathologies lead to error levels that vary by over a factor of 40 when using a fixed mesh throughout the database. Alternatively, controlling this sensitivity through mesh adaptation leads to mesh sizes which span two orders of magnitude. We propose strategies to minimize simulation cost in sensitive regions and discuss the role of error-estimation in database quality.

  17. Impact of Load Balancing on Unstructured Adaptive Grid Computations for Distributed-Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak; Simon, Horst D.

    1996-01-01

    The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a new dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view. Whenever the computational mesh is adapted, JOVE is activated to eliminate the load imbalance. JOVE has been implemented on an IBM SP2 distributed-memory machine in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. We also show that JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.

  18. Efficient Load Balancing and Data Remapping for Adaptive Grid Calculations

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak

    1997-01-01

    Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We present a novel method to dynamically balance the processor workloads with a global view. This paper presents, for the first time, the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. Previous results indicated that mesh repartitioning and data remapping are potential bottlenecks for performing large-scale scientific calculations. We resolve these issues and demonstrate that our framework remains viable on a large number of processors.

  19. Laser Ray Tracing in a Parallel Arbitrary Lagrangian-Eulerian Adaptive Mesh Refinement Hydrocode

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Masters, N D; Kaiser, T B; Anderson, R W

    2009-09-28

    ALE-AMR is a new hydrocode that we are developing as a predictive modeling tool for debris and shrapnel formation in high-energy laser experiments. In this paper we present our approach to implementing laser ray-tracing in ALE-AMR. We present the equations of laser ray tracing, our approach to efficient traversal of the adaptive mesh hierarchy in which we propagate computational rays through a virtual composite mesh consisting of the finest resolution representation of the modeled space, and anticipate simulations that will be compared to experiments for code validation.

  20. Particle-in-cell simulations on graphic processing units

    NASA Astrophysics Data System (ADS)

    Ren, C.; Zhou, X.; Li, J.; Huang, M. C.; Zhao, Y.

    2014-10-01

    We will show our recent progress in using GPUs to accelerate the PIC code OSIRIS [Fonseca et al. LNCS 2331, 342 (2002)]. The OSIRIS parallel structure is retained and the computation-intensive kernels are shipped to GPUs. Algorithms for the kernels are adapted for the GPU, including high-order charge-conserving current deposition schemes with little branching and parallel particle sorting [Kong et al., JCP 230, 1676 (2011)]. These algorithms make efficient use of the GPU shared memory. This work was supported by U.S. Department of Energy under Grant No. DE-FC02-04ER54789 and by NSF under Grant No. PHY-1314734.
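
    The particle-sorting step that underpins shared-memory deposition kernels can be sketched on the host side as follows; the 1D cell layout and the use of a stable argsort are simplifying assumptions (a GPU implementation would typically use a counting or radix sort on the device).

      # Cell-wise sorting of particles so that all particles of a cell are contiguous in
      # memory; each GPU thread block can then stage one cell's particles and fields in
      # shared memory. The 1D layout is a simplifying assumption.
      import numpy as np

      def sort_particles_by_cell(positions, n_cells, box_size):
          """Return the particle permutation and per-cell start offsets."""
          cells = np.minimum((positions / box_size * n_cells).astype(int), n_cells - 1)
          counts = np.bincount(cells, minlength=n_cells)
          offsets = np.concatenate(([0], np.cumsum(counts)))   # cell c owns order[offsets[c]:offsets[c+1]]
          order = np.argsort(cells, kind="stable")             # stable sort keeps intra-cell order
          return order, offsets

      positions = np.random.default_rng(2).uniform(0.0, 1.0, 1000)
      order, offsets = sort_particles_by_cell(positions, n_cells=16, box_size=1.0)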

  1. Adaptive parallel logic networks

    NASA Technical Reports Server (NTRS)

    Martinez, Tony R.; Vidal, Jacques J.

    1988-01-01

    Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.

  2. PARAMESH: A Parallel Adaptive Mesh Refinement Community Toolkit

    NASA Technical Reports Server (NTRS)

    MacNeice, Peter; Olson, Kevin M.; Mobarry, Clark; deFainchtein, Rosalinda; Packer, Charles

    1999-01-01

    In this paper, we describe a community toolkit which is designed to provide parallel support with adaptive mesh capability for a large and important class of computational models, those using structured, logically cartesian meshes. The package of Fortran 90 subroutines, called PARAMESH, is designed to provide an application developer with an easy route to extend an existing serial code which uses a logically cartesian structured mesh into a parallel code with adaptive mesh refinement. Alternatively, in its simplest use, and with minimal effort, it can operate as a domain decomposition tool for users who want to parallelize their serial codes, but who do not wish to use adaptivity. The package can provide them with an incremental evolutionary path for their code, converting it first to uniformly refined parallel code, and then later if they so desire, adding adaptivity.

  3. An object-oriented approach for parallel self adaptive mesh refinement on block structured grids

    NASA Technical Reports Server (NTRS)

    Lemke, Max; Witsch, Kristian; Quinlan, Daniel

    1993-01-01

    Self-adaptive mesh refinement dynamically matches the computational demands of a solver for partial differential equations to the activity in the application's domain. In this paper we present two C++ class libraries, P++ and AMR++, which significantly simplify the development of sophisticated adaptive mesh refinement codes on (massively) parallel distributed memory architectures. The development is based on our previous research in this area. The C++ class libraries provide abstractions to separate the issues of developing parallel adaptive mesh refinement applications into those of parallelism, abstracted by P++, and adaptive mesh refinement, abstracted by AMR++. P++ is a parallel array class library to permit efficient development of architecture independent codes for structured grid applications, and AMR++ provides support for self-adaptive mesh refinement on block-structured grids of rectangular non-overlapping blocks. Using these libraries, the application programmers' work is greatly simplified to primarily specifying the serial single grid application and obtaining the parallel and self-adaptive mesh refinement code with minimal effort. Initial results for simple singular perturbation problems solved by self-adaptive multilevel techniques (FAC, AFAC), being implemented on the basis of prototypes of the P++/AMR++ environment, are presented. Singular perturbation problems frequently arise in large applications, e.g. in the area of computational fluid dynamics. They usually have solutions with layers which require adaptive mesh refinement and fast basic solvers in order to be resolved efficiently.

  4. The Feasibility of Adaptive Unstructured Computations On Petaflops Systems

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Heber, Gerd; Gao, Guang; Saini, Subhash (Technical Monitor)

    1999-01-01

    This viewgraph presentation covers the advantages of mesh adaptation, unstructured grids, and dynamic load balancing. It illustrates parallel adaptive communications, and explains PLUM (Parallel dynamic load balancing for adaptive unstructured meshes), and PSAW (Proper Self Avoiding Walks).

  5. Application of parallel distributed Lagrange multiplier technique to simulate coupled Fluid-Granular flows in pipes with varying Cross-Sectional area

    DOE PAGES

    Kanarska, Yuliya; Walton, Otis

    2015-11-30

    Fluid-granular flows are common phenomena in nature and industry. Here, an efficient computational technique based on the distributed Lagrange multiplier method is utilized to simulate complex fluid-granular flows. Each particle is explicitly resolved on an Eulerian grid as a separate domain, using solid volume fractions. The fluid equations are solved through the entire computational domain; however, Lagrange multiplier constraints are applied inside the particle domain such that the fluid within any volume associated with a solid particle moves as an incompressible rigid body. The particle–particle interactions are implemented using explicit force-displacement interactions for frictional inelastic particles, similar to the DEM method with some modifications using the volume of an overlapping region as an input to the contact forces. Here, a parallel implementation of the method is based on the SAMRAI (Structured Adaptive Mesh Refinement Application Infrastructure) library.

  6. A Multi-Level Parallelization Concept for High-Fidelity Multi-Block Solvers

    NASA Technical Reports Server (NTRS)

    Hatay, Ferhat F.; Jespersen, Dennis C.; Guruswamy, Guru P.; Rizk, Yehia M.; Byun, Chansup; Gee, Ken; VanDalsem, William R. (Technical Monitor)

    1997-01-01

    The integration of high-fidelity Computational Fluid Dynamics (CFD) analysis tools with the industrial design process benefits greatly from robust implementations that are portable across a wide range of computer architectures. In the present work, a hybrid domain-decomposition and parallelization concept was developed and implemented in the widely used NASA multi-block Computational Fluid Dynamics (CFD) packages ENSAERO and OVERFLOW. The new parallel solver concept, PENS (Parallel Euler Navier-Stokes Solver), employs both fine and coarse granularity in data partitioning as well as data coalescing to obtain the desired load-balance characteristics on the available computer platforms. This multi-level parallelism implementation itself introduces no changes to the numerical results, hence the original fidelity of the packages is identically preserved. The present implementation uses the Message Passing Interface (MPI) library for interprocessor message passing and memory accessing. By choosing an appropriate combination of the available partitioning and coalescing capabilities only during the execution stage, the PENS solver becomes adaptable to different computer architectures from shared-memory to distributed-memory platforms with varying degrees of parallelism. The PENS implementation on the IBM SP2 distributed memory environment at the NASA Ames Research Center obtains 85 percent scalable parallel performance using fine-grain partitioning of single-block CFD domains with up to 128 wide computational nodes. Multi-block CFD simulations of complete aircraft achieve 75 percent perfect load-balanced executions using data coalescing and the two levels of parallelism. SGI PowerChallenge, SGI Origin 2000, and a cluster of workstations are the other platforms where the robustness of the implementation is tested. The performance behavior on the other computer platforms with a variety of realistic problems will be included as this ongoing study progresses.

  7. RPYFMM: Parallel adaptive fast multipole method for Rotne-Prager-Yamakawa tensor in biomolecular hydrodynamics simulations

    NASA Astrophysics Data System (ADS)

    Guan, W.; Cheng, X.; Huang, J.; Huber, G.; Li, W.; McCammon, J. A.; Zhang, B.

    2018-06-01

    RPYFMM is a software package for the efficient evaluation of the potential field governed by the Rotne-Prager-Yamakawa (RPY) tensor interactions in biomolecular hydrodynamics simulations. In our algorithm, the RPY tensor is decomposed as a linear combination of four Laplace interactions, each of which is evaluated using the adaptive fast multipole method (FMM) (Greengard and Rokhlin, 1997) where the exponential expansions are applied to diagonalize the multipole-to-local translation operators. RPYFMM offers a unified execution on both shared and distributed memory computers by leveraging the DASHMM library (DeBuhr et al., 2016, 2018). Preliminary numerical results show that the interactions for a molecular system of 15 million particles (beads) can be computed within one second on a Cray XC30 cluster using 12,288 cores, while achieving approximately 54% strong-scaling efficiency.

  8. Near-Body Grid Adaption for Overset Grids

    NASA Technical Reports Server (NTRS)

    Buning, Pieter G.; Pulliam, Thomas H.

    2016-01-01

    A solution adaption capability for curvilinear near-body grids has been implemented in the OVERFLOW overset grid computational fluid dynamics code. The approach follows closely that used for the Cartesian off-body grids, but inserts refined grids in the computational space of original near-body grids. Refined curvilinear grids are generated using parametric cubic interpolation, with one-sided biasing based on curvature and stretching ratio of the original grid. Sensor functions, grid marking, and solution interpolation tasks are implemented in the same fashion as for off-body grids. A goal-oriented procedure, based on largest error first, is included for controlling growth rate and maximum size of the adapted grid system. The adaption process is almost entirely parallelized using MPI, resulting in a capability suitable for viscous, moving body simulations. Two- and three-dimensional examples are presented.

  9. Simple adaptive control system design for a quadrotor with an internal PFC

    NASA Astrophysics Data System (ADS)

    Mizumoto, Ikuro; Nakamura, Takuto; Kumon, Makoto; Takagi, Taro

    2014-12-01

    The paper deals with an adaptive control system design problem for a four rotor helicopter or quadrotor. A simple adaptive control design scheme with a parallel feedforward compensator (PFC) in the internal loop of the considered quadrotor will be proposed based on the backstepping strategy. As is well known, the backstepping control strategy is one of the advanced control strategies for nonlinear systems. However, the control algorithm becomes complex if the system has a higher-order relative degree. We will show that one can skip some design steps of the backstepping method by introducing a PFC in the inner loop of the considered quadrotor, so that the structure of the obtained controller will be simplified and a high gain based adaptive feedback control system will be designed. The effectiveness of the proposed method will be confirmed through numerical simulations.

  10. Genomics of parallel adaptation at two timescales in Drosophila

    PubMed Central

    Begun, David J.

    2017-01-01

    Two interesting unanswered questions are the extent to which both the broad patterns and genetic details of adaptive divergence are repeatable across species, and the timescales over which parallel adaptation may be observed. Drosophila melanogaster is a key model system for population and evolutionary genomics. Findings from genetics and genomics suggest that recent adaptation to latitudinal environmental variation (on the timescale of hundreds or thousands of years) associated with Out-of-Africa colonization plays an important role in maintaining biological variation in the species. Additionally, studies of interspecific differences between D. melanogaster and its sister species D. simulans have revealed that a substantial proportion of proteins and amino acid residues exhibit adaptive divergence on a timescale of roughly a few million years. Here we use population genomic approaches to attack the problem of parallelism between D. melanogaster and a highly diverged congener, D. hydei, on two timescales. D. hydei, a member of the repleta group of Drosophila, is similar to D. melanogaster in that it too appears to be a recently cosmopolitan species and a recent colonizer of high-latitude environments. We observed parallelism both for genes exhibiting latitudinal allele frequency differentiation within species and for genes exhibiting recurrent adaptive protein divergence between species. Greater parallelism was observed for long-term adaptive protein evolution and this parallelism includes not only the specific genes/proteins that exhibit adaptive evolution, but extends even to the magnitudes of the selective effects on interspecific protein differences. Thus, despite the roughly 50 million years of time separating D. melanogaster and D. hydei, and despite their considerably divergent biology, they exhibit substantial parallelism, suggesting the existence of a fundamental predictability of adaptive evolution in the genus. PMID:28968391

  11. A parallel algorithm for the initial screening of space debris collisions prediction using the SGP4/SDP4 models and GPU acceleration

    NASA Astrophysics Data System (ADS)

    Lin, Mingpei; Xu, Ming; Fu, Xiaoyu

    2017-05-01

    Currently, a tremendous amount of space debris in Earth's orbit imperils operational spacecraft. It is essential to undertake risk assessments of collisions and predict dangerous encounters in space. However, collision predictions for an enormous amount of space debris give rise to large-scale computations. In this paper, a parallel algorithm is established on the Compute Unified Device Architecture (CUDA) platform of NVIDIA Corporation for collision prediction. According to the parallel structure of NVIDIA graphics processors, a block decomposition strategy is adopted in the algorithm. Space debris is divided into batches, and the computation and data transfer operations of adjacent batches overlap. As a consequence, the latency to access shared memory during the entire computing process is significantly reduced, and a higher computing speed is reached. Theoretically, a simulation of collision prediction for space debris of any amount and for any time span can be executed. To verify this algorithm, a simulation example including 1382 pieces of debris, whose operational time scales vary from 1 min to 3 days, is conducted on Tesla C2075 of NVIDIA. The simulation results demonstrate that with the same computational accuracy as that of a CPU, the computing speed of the parallel algorithm on a GPU is 30 times that on a CPU. Based on this algorithm, collision prediction of over 150 Chinese spacecraft for a time span of 3 days can be completed in less than 3 h on a single computer, which meets the timeliness requirement of the initial screening task. Furthermore, the algorithm can be adapted for multiple tasks, including particle filtration, constellation design, and Monte-Carlo simulation of an orbital computation.
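
    The batching idea described above can be illustrated independently of CUDA or the SGP4/SDP4 propagators. The sketch below uses NumPy and random stand-in positions rather than propagated orbits, and only shows how a catalogue can be screened for close approaches batch by batch; the array names, batch size, and distance threshold are illustrative assumptions, not values from the paper.

        # Illustrative sketch: batched pairwise close-approach screening.
        # Propagated positions would come from SGP4/SDP4; random stand-ins are used here.
        import numpy as np

        rng = np.random.default_rng(0)
        N, BATCH = 1382, 256            # number of objects, batch size (illustrative)
        THRESHOLD_KM = 100.0            # screening threshold (illustrative, not operational)

        positions = rng.uniform(-7000.0, 7000.0, size=(N, 3))   # km, one epoch

        close_pairs = []
        for start in range(0, N, BATCH):
            batch = positions[start:start + BATCH]               # (B, 3)
            # distances from every object in this batch to the whole catalogue
            d = np.linalg.norm(batch[:, None, :] - positions[None, :, :], axis=-1)
            bi, j = np.nonzero(d < THRESHOLD_KM)
            for b, jj in zip(bi, j):
                i = start + b
                if i < jj:                                       # skip self-pairs and duplicates
                    close_pairs.append((i, int(jj)))

        print(f"{len(close_pairs)} candidate conjunction pairs below {THRESHOLD_KM} km")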

  12. Task 6 - Subtask 1: PNNL Visit by JAEA Researchers to Evaluate the Feasibility of the FLESCOT Code for the Future JAEA Use for the Fukushima Surface Water Environmental Assessment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Onishi, Yasuo

    Four Japan Atomic Energy Agency (JAEA) researchers visited Pacific Northwest National Laboratory (PNNL) for seven working days and have evaluated the suitability and adaptability of FLESCOT to JAEA's supercomputer system to effectively simulate cesium behavior in dam reservoirs, river mouths, and coastal areas in Fukushima contaminated by the Fukushima Daiichi nuclear accident. PNNL showed the following to the JAEA visitors during the seven working-day period: the FLESCOT source code; the user's manual; and a FLESCOT description covering program structure, algorithm, solver, boundary condition handling, data definition, input and output methods, and how to run the code. During the visit, JAEA had access to FLESCOT to run with an input data set to evaluate the capacity and feasibility of adapting it to a JAEA supercomputer with massively parallel processors. As a part of this evaluation, PNNL ran FLESCOT for sample cases of the contaminant migration simulation to further describe FLESCOT in action. JAEA and PNNL researchers also evaluated the time spent in each subroutine of FLESCOT, and a JAEA researcher implemented some initial parallelization schemes in FLESCOT. Based on this code evaluation, JAEA and PNNL determined that FLESCOT is: applicable to Fukushima lakes/dam reservoirs, river mouth areas, and coastal water; and feasible to parallelize for the JAEA supercomputer. In addition, PNNL and JAEA researchers discussed molecular modeling approaches on cesium adsorption mechanisms to enhance the JAEA molecular modeling activities. PNNL and JAEA also discussed specific collaboration of molecular and computational modeling activities.

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sierra Thermal /Fluid Team

    The SIERRA Low Mach Module: Fuego along with the SIERRA Participating Media Radiation Module: Syrinx, henceforth referred to as Fuego and Syrinx, respectively, are the key elements of the ASCI fire environment simulation project. The fire environment simulation project is directed at characterizing both open large-scale pool fires and building enclosure fires. Fuego represents the turbulent, buoyantly-driven incompressible flow, heat transfer, mass transfer, combustion, soot, and absorption coefficient model portion of the simulation software. Syrinx represents the participating-media thermal radiation mechanics. This project is an integral part of the SIERRA multi-mechanics software development project. Fuego depends heavily upon the core architecture developments provided by SIERRA for massively parallel computing, solution adaptivity, and mechanics coupling on unstructured grids.

  14. The adaptive parallel UKF inversion method for the shape of space objects based on the ground-based photometric data

    NASA Astrophysics Data System (ADS)

    Du, Xiaoping; Wang, Yang; Liu, Hao

    2018-04-01

    A space object in a highly elliptical orbit appears only as a point image on ground-based imaging equipment, so its shape and attitude are difficult to resolve and identify directly. In this paper a novel algorithm is presented for the estimation of spacecraft shape. An apparent magnitude model suitable for the inversion of object information such as shape and attitude is established based on an analysis of the photometric characteristics. A parallel adaptive shape-inversion algorithm based on the unscented Kalman filter (UKF) is designed once the dynamic equations of the nonlinear Gaussian system, including the influence of various drag forces, have been obtained. The results of a simulation study demonstrate the viability and robustness of the new filter and its fast convergence rate. It realizes the inversion of combination shapes with high accuracy, especially for cube- and cylinder-shaped buses. Even with sparse photometric data, it can still maintain a high inversion success rate.
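
    As a rough illustration of the filtering machinery the abstract refers to, the following is a minimal unscented-transform measurement update. The two-parameter state, the toy magnitude-style measurement function, and the tuning constants are assumptions made for the sketch; they are not the paper's dynamic or photometric models.

        # Minimal unscented Kalman filter measurement update (sketch).
        # The state, measurement model, and tuning constants are illustrative only.
        import numpy as np

        def sigma_points(x, P, alpha=1.0, beta=2.0, kappa=0.0):
            n = x.size
            lam = alpha**2 * (n + kappa) - n
            A = np.linalg.cholesky((n + lam) * P)
            pts = np.vstack([x, x + A.T, x - A.T])            # 2n+1 sigma points
            wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
            wc = wm.copy()
            wm[0] = lam / (n + lam)
            wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
            return pts, wm, wc

        def ukf_update(x, P, z, h, R):
            pts, wm, wc = sigma_points(x, P)
            Z = np.array([h(p) for p in pts])                  # propagate through measurement model
            z_hat = wm @ Z
            dZ, dX = Z - z_hat, pts - x
            S = (wc * dZ.T) @ dZ + R                           # innovation covariance
            C = (wc * dX.T) @ dZ                               # cross covariance
            K = C @ np.linalg.inv(S)                           # Kalman gain
            return x + K @ (z - z_hat), P - K @ S @ K.T

        # toy nonlinear "apparent magnitude"-style measurement of a 2-parameter shape state
        h = lambda s: np.array([-2.5 * np.log10(s[0]**2 + 0.5 * s[1]**2 + 1.0)])
        x, P = np.array([1.0, 1.0]), np.eye(2) * 0.5
        x, P = ukf_update(x, P, z=np.array([-0.9]), h=h, R=np.eye(1) * 0.01)
        print(x)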

  15. Π4U: A high performance computing framework for Bayesian uncertainty quantification of complex models

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P. E.; Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P.

    2015-03-01

    We present Π4U, an extensible framework for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.
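
    Sampling in such frameworks is commonly organized in tempering stages. The sketch below shows one TMCMC-style stage under a toy Gaussian likelihood: the next tempering exponent is chosen so that the coefficient of variation of the incremental weights stays near a target, and the population is then resampled. It is a schematic of the general algorithm only, not the Π4U implementation or its API.

        # Sketch of one TMCMC tempering stage: choose the next exponent beta so the
        # coefficient of variation (CoV) of the incremental weights is near a target,
        # then resample the population.  Toy log-likelihood; not the Pi4U code.
        import numpy as np

        rng = np.random.default_rng(1)

        def log_like(theta):                      # toy Gaussian log-likelihood
            return -0.5 * np.sum((theta - 2.0) ** 2, axis=-1) / 0.1

        def next_beta(loglik, beta, target_cov=1.0):
            lo, hi = beta, 1.0
            for _ in range(60):                   # bisection on the CoV of the weights
                mid = 0.5 * (lo + hi)
                w = np.exp((mid - beta) * (loglik - loglik.max()))
                cov = w.std() / w.mean()
                lo, hi = (mid, hi) if cov < target_cov else (lo, mid)
            return hi

        samples = rng.normal(0.0, 3.0, size=(2000, 2))   # stage-0 population (from the prior)
        beta = 0.0
        ll = log_like(samples)
        beta_new = next_beta(ll, beta)
        w = np.exp((beta_new - beta) * (ll - ll.max()))
        idx = rng.choice(len(samples), size=len(samples), p=w / w.sum())
        samples = samples[idx]                            # resampled population for the next stage
        print("next tempering exponent:", round(beta_new, 3))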

  16. Analysis of the thermal balance characteristics for multiple-connected piezoelectric transformers.

    PubMed

    Park, Joung-Hu; Cho, Bo-Hyung; Choi, Sung-Jin; Lee, Sang-Min

    2009-08-01

    Because the amount of power that a piezoelectric transformer (PT) can handle is limited, multiple connections of PTs are necessary for the power-capacity improvement of PT-applications. In the connection, thermal imbalance between the PTs should be prevented to avoid the thermal runaway of each PT. The thermal balance of the multiple-connected PTs is dominantly affected by the electrothermal characteristics of individual PTs. In this paper, the thermal balance of both parallel-parallel and parallel-series connections are analyzed by electrical model parameters. For quantitative analysis, the thermal-balance effects are estimated by the simulation of the mechanical loss ratio between the PTs. The analysis results show that with PTs of similar characteristics, the parallel-series connection has better thermal balance characteristics due to the reduced mechanical loss of the higher temperature PT. For experimental verification of the analysis, a hardware-prototype test of a Cs-Lp type 40 W adapter system with radial-vibration mode PTs has been performed.

  17. Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Sohn, Andrew

    1996-01-01

    Dynamic mesh adaption on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load imbalance among processors on a parallel machine. This paper describes the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution cost is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35% of the mesh is randomly adapted. For large-scale scientific computations, our load balancing strategy gives almost a sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remapper yields processor assignments that are less than 3% off the optimal solutions but requires only 1% of the computational time.
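
    A minimal sketch of the remapping idea, assuming the cost to be minimized is the number of cells that must move between processors: partitions are assigned greedily to the processors that already hold most of their data. The similarity matrix below is random, and the single greedy pass is a simplification of the heuristic described in the paper.

        # Sketch of a greedy remapping heuristic: assign each new partition to the
        # processor that already holds the largest share of its cells, so that the
        # redistribution (data movement) cost stays small.  Illustrative only.
        import numpy as np

        rng = np.random.default_rng(2)
        P = 8                                            # processors == partitions
        # similarity[p, q] = cells of new partition q currently resident on processor p
        similarity = rng.integers(0, 1000, size=(P, P))

        assignment = {}                                  # new partition -> processor
        free_procs = set(range(P))
        # visit (processor, partition) pairs from the largest overlap downwards
        order = np.dstack(np.unravel_index(np.argsort(-similarity, axis=None),
                                           similarity.shape))[0]
        for p, q in order:
            if int(p) in free_procs and int(q) not in assignment:
                assignment[int(q)] = int(p)
                free_procs.discard(int(p))

        moved = sum(similarity[:, q].sum() - similarity[assignment[q], q] for q in range(P))
        print("assignment:", assignment, "cells moved:", int(moved))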

  18. Systems-on-chip approach for real-time simulation of wheel-rail contact laws

    NASA Astrophysics Data System (ADS)

    Mei, T. X.; Zhou, Y. J.

    2013-04-01

    This paper presents the development of a systems-on-chip approach to speed up the simulation of wheel-rail contact laws, which can be used to reduce the requirement for high-performance computers and enable simulation in real time for the use of hardware-in-loop for experimental studies of the latest vehicle dynamic and control technologies. The wheel-rail contact laws are implemented using a field programmable gate array (FPGA) device with a design that substantially outperforms modern general-purpose PC platforms or fixed architecture digital signal processor devices in terms of processing time, configuration flexibility and cost. In order to utilise the FPGA's parallel-processing capability, the operations in the contact laws algorithms are arranged in a parallel manner and multi-contact patches are tackled simultaneously in the design. The interface between the FPGA device and the host PC is achieved by using a high-throughput and low-latency Ethernet link. The development is based on FASTSIM algorithms, although the design can be adapted and expanded for even more computationally demanding tasks.

  19. Earthquake Rupture Dynamics using Adaptive Mesh Refinement and High-Order Accurate Numerical Methods

    NASA Astrophysics Data System (ADS)

    Kozdon, J. E.; Wilcox, L.

    2013-12-01

    Our goal is to develop scalable and adaptive (spatial and temporal) numerical methods for coupled, multiphysics problems using high-order accurate numerical methods. To do so, we are developing an open-source, parallel library known as bfam (available at http://bfam.in). The first application to be developed on top of bfam is an earthquake rupture dynamics solver using high-order discontinuous Galerkin methods and summation-by-parts finite difference methods. In earthquake rupture dynamics, wave propagation in the Earth's crust is coupled to frictional sliding on fault interfaces. This coupling is two-way, requiring the simultaneous simulation of both processes. The use of laboratory-measured friction parameters requires near-fault resolution that is 4-5 orders of magnitude higher than that needed to resolve the frequencies of interest in the volume. This, along with earlier simulations using a low-order, finite volume based adaptive mesh refinement framework, suggests that adaptive mesh refinement is ideally suited for this problem. The use of high-order methods is motivated by the high level of resolution required off the fault in the earlier low-order finite volume simulations; we believe this need for resolution is a result of the excessive numerical dissipation of low-order methods. In bfam spatial adaptivity is handled using the p4est library and temporal adaptivity will be accomplished through local time stepping. In this presentation we will present the guiding principles behind the library as well as verification of the code against the Southern California Earthquake Center dynamic rupture code validation test problems.

  20. Optics Program Modified for Multithreaded Parallel Computing

    NASA Technical Reports Server (NTRS)

    Lou, John; Bedding, Dave; Basinger, Scott

    2006-01-01

    A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application programming interface that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS based on pthreads (POSIX Threads, where "POSIX" signifies a portable operating system interface for UNIX). In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.

  1. LightForce Photon-Pressure Collision Avoidance: Updated Efficiency Analysis Utilizing a Highly Parallel Simulation Approach

    NASA Technical Reports Server (NTRS)

    Stupl, Jan; Faber, Nicolas; Foster, Cyrus; Yang, Fan Yang; Nelson, Bron; Aziz, Jonathan; Nuttall, Andrew; Henze, Chris; Levit, Creon

    2014-01-01

    This paper provides an updated efficiency analysis of the LightForce space debris collision avoidance scheme. LightForce aims to prevent collisions on warning by utilizing photon pressure from ground based, commercial off the shelf lasers. Past research has shown that a few ground-based systems consisting of 10 kilowatt class lasers directed by 1.5 meter telescopes with adaptive optics could lower the expected number of collisions in Low Earth Orbit (LEO) by an order of magnitude. Our simulation approach utilizes the entire Two Line Element (TLE) catalogue in LEO for a given day as initial input. Least-squares fitting of a TLE time series is used for an improved orbit estimate. We then calculate the probability of collision for all LEO objects in the catalogue for a time step of the simulation. The conjunctions that exceed a threshold probability of collision are then engaged by a simulated network of laser ground stations. After those engagements, the perturbed orbits are used to re-assess the probability of collision and evaluate the efficiency of the system. This paper describes new simulations with three updated aspects: 1) By utilizing a highly parallel simulation approach employing hundreds of processors, we have extended our analysis to a much broader dataset. The simulation time is extended to one year. 2) We analyze not only the efficiency of LightForce on conjunctions that naturally occur, but also take into account conjunctions caused by orbit perturbations due to LightForce engagements. 3) We use a new simulation approach that is regularly updating the LightForce engagement strategy, as it would be during actual operations. In this paper we present our simulation approach to parallelize the efficiency analysis, its computational performance and the resulting expected efficiency of the LightForce collision avoidance system. Results indicate that utilizing a network of four LightForce stations with 20 kilowatt lasers, 85% of all conjunctions with a probability of collision Pc > 10^-6 can be mitigated.

  2. Massive parallel 3D PIC simulation of negative ion extraction

    NASA Astrophysics Data System (ADS)

    Revel, Adrien; Mochalskyy, Serhiy; Montellano, Ivar Mauricio; Wünderlich, Dirk; Fantz, Ursel; Minea, Tiberiu

    2017-09-01

    The 3D PIC-MCC code ONIX is dedicated to modeling Negative hydrogen/deuterium Ion (NI) extraction and co-extraction of electrons from radio-frequency driven, low pressure plasma sources. It provides valuable insight on the complex phenomena involved in the extraction process. In previous calculations, a mesh size larger than the Debye length was used, implying numerical electron heating. Important steps have been achieved in terms of computation performance and parallelization efficiency allowing successful massive parallel calculations (4096 cores), imperative to resolve the Debye length. In addition, the numerical algorithms have been improved in terms of grid treatment, i.e., the electric field near the complex geometry boundaries (plasma grid) is calculated more accurately. The revised model preserves the full 3D treatment, but can take advantage of a highly refined mesh. ONIX was used to investigate the role of the mesh size, the re-injection scheme for lost particles (extracted or wall absorbed), and the electron thermalization process on the calculated extracted current and plasma characteristics. It is demonstrated that all numerical schemes give the same NI current distribution for extracted ions. Concerning the electrons, the pair-injection technique is found well-adapted to simulate the sheath in front of the plasma grid.

  3. Simple adaptive control system design for a quadrotor with an internal PFC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mizumoto, Ikuro; Nakamura, Takuto; Kumon, Makoto

    2014-12-10

    The paper deals with an adaptive control system design problem for a four rotor helicopter or quadrotor. A simple adaptive control design scheme with a parallel feedforward compensator (PFC) in the internal loop of the considered quadrotor will be proposed based on the backstepping strategy. As is well known, the backstepping control strategy is one of the advanced control strategies for nonlinear systems. However, the control algorithm becomes complex if the system has a higher-order relative degree. We will show that one can skip some design steps of the backstepping method by introducing a PFC in the inner loop of the considered quadrotor, so that the structure of the obtained controller will be simplified and a high gain based adaptive feedback control system will be designed. The effectiveness of the proposed method will be confirmed through numerical simulations.

  4. Six degree-of-freedom scanning supports and manipulators based on parallel robots

    NASA Astrophysics Data System (ADS)

    Comin, Fabio

    1995-02-01

    The exploitation of third generation SR sources heavily relies on accurate and stable positioning and scanning of samples and optical elements. In some cases, active feedback is also necessary. Normally, these tasks are carried out by serial addition of individual components, each of them providing a well-defined excursion path. On the contrary, the exploitation of the concept of parallel robots, structures in a closed kinematic chain, permits us to follow any given trajectory in the six-dimensional space with a large increase in accuracy and stiffness. At ESRF, the parallel robot architecture conceived some tens of years ago for flight simulators has been adapted to both actively align and operate optical elements of considerable weight and position small samples in ultrahigh vacuum. The performance of these devices gives results far superior to the initial specifications, and a variety of drive mechanisms are being developed to fit the different needs of the ESRF beamlines.

  5. Multithreaded Model for Dynamic Load Balancing Parallel Adaptive PDE Computations

    NASA Technical Reports Server (NTRS)

    Chrisochoides, Nikos

    1995-01-01

    We present a multithreaded model for the dynamic load-balancing of numerical, adaptive computations required for the solution of Partial Differential Equations (PDE's) on multiprocessors. Multithreading is used as a means of exploring concurrency at the processor level in order to tolerate synchronization costs inherent to traditional (non-threaded) parallel adaptive PDE solvers. Our preliminary analysis for parallel, adaptive PDE solvers indicates that multithreading can be used as a mechanism to mask overheads required for the dynamic balancing of processor workloads with computations required for the actual numerical solution of the PDE's. Also, multithreading can simplify the implementation of dynamic load-balancing algorithms, a task that is very difficult for traditional data parallel adaptive PDE computations. Unfortunately, multithreading does not always simplify program complexity, often makes code reuse difficult, and increases software complexity.

  6. 2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation

    DOE PAGES

    Warren, Michael S.

    2014-01-01

    We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k (2^18) processors. We present error analysis and scientific application results from a series of more than ten 69-billion (4096^3) particle cosmological simulations, accounting for 4×10^20 floating point operations. These results include the first simulations using the new constraints on the standard model of cosmology from the Planck satellite. Our simulations set a new standard for accuracy and scientific throughput, while meeting or exceeding the computational efficiency of the latest generation of hybrid TreePM N-body methods.

  7. SCASim: A Flexible and Reusable Detector Simulator for the MIRI instrument of the JWST

    NASA Astrophysics Data System (ADS)

    Beard, S.; Morin, J.; Gastaud, R.; Azzollini, R.; Bouchet, P.; Chaintreuil, S.; Lahuis, F.; Littlejohns, O.; Nehme, C.; Pye, J.

    2012-09-01

    The JWST Mid Infrared Instrument (MIRI) operates in the 5-28μm wavelength range and can be configured for imaging, coronagraphic imaging, long-slit, low-resolution spectroscopy or medium resolution spectroscopy with an integral field unit. SCASim is one of a suite of simulators which operate together to simulate all the different modes of the instrument. These simulators are essential for the efficient operation of MIRI; allowing more accurate planning of MIRI observations on sky or during the pre-launch testing of the instrument. The data generated by the simulators are essential for testing the data pipeline software. The simulators not only need to reproduce the behaviour of the instrument faithfully, they also need to be adaptable so that information learned about the instrument during the pre-launch testing and in-orbit commissioning can be fed back into the simulation. SCASim simulates the behaviour of the MIRI detectors, taking into account cosmetic effects, quantum efficiency, shot noise, dark current, read noise, amplifier layout, cosmic ray hits, etc. The software has benefited from three major design choices. First, the development of a suite of MIRI simulators, rather than a single simulator, has allowed MIRI simulators to be developed in parallel by different teams, with each simulator able to concentrate on one particular area. SCASim provides a facility common to all the other simulators and saves duplication of effort. Second, SCASim has a Python-based object-oriented design which makes it easier to adapt as new information about the instrument is learned during testing. Third, all simulator parameters are maintained in external files, rather than being hard coded in the software. These design choices have made SCASim highly reusable. In its present form it can be used to simulate any JWST detector, and it can be adapted for future instruments with similar, photon-counting detectors.
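
    A minimal sketch of the kind of detector effects listed above (quantum efficiency, dark current, shot noise, read noise) applied to an ideal flux image is shown below. The parameter values are placeholders rather than MIRI calibration data, and the function is not part of the SCASim interface.

        # Sketch of simple detector effects: quantum efficiency, dark current,
        # Poisson shot noise, and Gaussian read noise.  Parameter values are
        # placeholders, not MIRI calibration data.
        import numpy as np

        rng = np.random.default_rng(3)

        def simulate_readout(flux, exptime=10.0, qe=0.7, dark=0.2, read_noise=15.0):
            """flux: photons/s per pixel -> simulated counts (electrons) per pixel."""
            expected_e = flux * qe * exptime + dark * exptime   # mean signal + dark (e-)
            signal = rng.poisson(expected_e)                     # shot noise
            return signal + rng.normal(0.0, read_noise, size=flux.shape)  # read noise

        ideal = np.full((32, 32), 100.0)                         # flat illumination, photons/s
        frame = simulate_readout(ideal)
        print(frame.mean(), frame.std())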

  8. Component Framework for Loosely Coupled High Performance Integrated Plasma Simulations

    NASA Astrophysics Data System (ADS)

    Elwasif, W. R.; Bernholdt, D. E.; Shet, A. G.; Batchelor, D. B.; Foley, S.

    2010-11-01

    We present the design and implementation of a component-based simulation framework for the execution of coupled time-dependent plasma modeling codes. The Integrated Plasma Simulator (IPS) provides a flexible lightweight component model that streamlines the integration of standalone codes into coupled simulations. Standalone codes are adapted to the IPS component interface specification using a thin wrapping layer implemented in the Python programming language. The framework provides services for inter-component method invocation, configuration, task, and data management, asynchronous event management, simulation monitoring, and checkpoint/restart capabilities. Services are invoked, as needed, by the computational components to coordinate the execution of different aspects of coupled simulations on Massively Parallel Processing (MPP) machines. A common plasma state layer serves as the foundation for inter-component, file-based data exchange. The IPS design principles, implementation details, and execution model will be presented, along with an overview of several use cases.

  9. Fast adaptive composite grid methods on distributed parallel architectures

    NASA Technical Reports Server (NTRS)

    Lemke, Max; Quinlan, Daniel

    1992-01-01

    The fast adaptive composite (FAC) grid method is compared with the asynchronous fast adaptive composite grid method (AFAC) under a variety of conditions, including vectorization and parallelization. Results are given for distributed memory multiprocessor architectures (SUPRENUM, Intel iPSC/2 and iPSC/860). It is shown that the good performance of AFAC and its superiority over FAC in a parallel environment is a property of the algorithm and not dependent on peculiarities of any machine.

  10. Progress on the development of FullWave, a Hot and Cold Plasma Parallel Full Wave Code

    NASA Astrophysics Data System (ADS)

    Spencer, J. Andrew; Svidzinski, Vladimir; Zhao, Liangji; Kim, Jin-Soo

    2017-10-01

    FullWave is being developed at FAR-TECH, Inc. to simulate RF waves in hot inhomogeneous magnetized plasmas without making small orbit approximations. FullWave is based on a meshless formulation in configuration space on non-uniform clouds of computational points (CCP) adapted to better resolve plasma resonances, antenna structures and complex boundaries. The linear frequency domain wave equation is formulated using two approaches: for cold plasmas the local cold plasma dielectric tensor is used (resolving resonances by particle collisions), while for hot plasmas the conductivity kernel is calculated. The details of FullWave and some preliminary results will be presented, including: 1) a monitor function based on analytic solutions of the cold-plasma dispersion relation; 2) an adaptive CCP based on the monitor function; 3) construction of the finite differences for approximation of derivatives on adaptive CCP; 4) results of 2-D full wave simulations in the cold plasma model in tokamak geometry using the formulated approach for ECRH, ICRH and Lower Hybrid range of frequencies. Work is supported by the U.S. DOE SBIR program.

  11. SAChES: Scalable Adaptive Chain-Ensemble Sampling.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Swiler, Laura Painton; Ray, Jaideep; Ebeida, Mohamed Salah

    We present the development of a parallel Markov Chain Monte Carlo (MCMC) method called SAChES, Scalable Adaptive Chain-Ensemble Sampling. This capability is targeted at Bayesian calibration of computationally expensive simulation models. SAChES involves a hybrid of two methods: Differential Evolution Monte Carlo followed by Adaptive Metropolis. Both methods involve parallel chains. Differential evolution allows one to explore high-dimensional parameter spaces using loosely coupled (i.e., largely asynchronous) chains. Loose coupling allows the use of large chain ensembles, with far more chains than the number of parameters to explore. This reduces per-chain sampling burden, enables high-dimensional inversions and the use of computationally expensive forward models. The large number of chains can also ameliorate the impact of silent errors, which may affect only a few chains. The chain ensemble can also be sampled to provide an initial condition when an aberrant chain is re-spawned. Adaptive Metropolis takes the best points from the differential evolution and efficiently homes in on the posterior density. The multitude of chains in SAChES is leveraged to (1) enable efficient exploration of the parameter space; and (2) ensure robustness to silent errors which may be unavoidable in extreme-scale computational platforms of the future. This report outlines SAChES, describes four papers that are the result of the project, and discusses some additional results.
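
    The differential-evolution move used in the first phase can be sketched compactly: each chain proposes a jump along the difference of two other randomly chosen chains and accepts it with a Metropolis test. The toy Gaussian posterior and the constants below are assumptions for illustration; this is not the SAChES code.

        # Sketch of a differential-evolution Monte Carlo sweep: each chain proposes a
        # jump along the difference of two other randomly chosen chains.  Toy target.
        import numpy as np

        rng = np.random.default_rng(4)
        DIM, CHAINS = 4, 32
        GAMMA, EPS = 2.38 / np.sqrt(2 * DIM), 1e-4       # common jump scaling, small jitter

        def log_post(theta):                              # toy Gaussian posterior
            return -0.5 * np.sum(theta ** 2)

        chains = rng.normal(0.0, 5.0, size=(CHAINS, DIM))
        logp = np.array([log_post(c) for c in chains])

        for sweep in range(200):
            for i in range(CHAINS):
                a, b = rng.choice([j for j in range(CHAINS) if j != i], size=2, replace=False)
                proposal = chains[i] + GAMMA * (chains[a] - chains[b]) \
                           + EPS * rng.normal(size=DIM)
                lp = log_post(proposal)
                if np.log(rng.uniform()) < lp - logp[i]:  # Metropolis acceptance
                    chains[i], logp[i] = proposal, lp

        print("posterior mean estimate:", chains.mean(axis=0).round(2))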

  12. On magnetic field amplification and particle acceleration near non-relativistic collisionless shocks: Particles in MHD Cells simulations

    NASA Astrophysics Data System (ADS)

    Casse, F.; van Marle, A. J.; Marcowith, A.

    2018-01-01

    We present simulations of magnetized astrophysical shocks taking into account the interplay between the thermal plasma of the shock and supra-thermal particles. Such an interaction is depicted by combining a grid-based magneto-hydrodynamics description of the thermal fluid with particle-in-cell techniques devoted to the dynamics of supra-thermal particles. This approach, which incorporates the use of adaptive mesh refinement features, is potentially a key to simulate astrophysical systems on spatial scales that are beyond the reach of pure particle-in-cell simulations. We consider non-relativistic super-Alfvénic shocks with various magnetic field obliquities. We recover all the features from previous studies when the magnetic field is parallel to the normal to the shock. In contrast with previous particle-in-cell and hybrid simulations, we find that particle acceleration and magnetic field amplification also occur when the magnetic field is oblique to the normal to the shock, but on larger timescales than in the parallel case. We show that in our oblique shock simulations the streaming of supra-thermal particles induces a corrugation of the shock front. Such oscillations of both the shock front and the magnetic field then locally help the particles to enter the upstream region, to initiate a non-resonant streaming instability, and finally to induce diffuse particle acceleration.

  13. Parallel Tetrahedral Mesh Adaptation with Dynamic Load Balancing

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Gabow, Harold N.

    1999-01-01

    The ability to dynamically adapt an unstructured grid is a powerful tool for efficiently solving computational problems with evolving physical features. In this paper, we report on our experience parallelizing an edge-based adaptation scheme, called 3D_TAG, using message passing. Results show excellent speedup when a realistic helicopter rotor mesh is randomly refined. However, performance deteriorates when the mesh is refined using a solution-based error indicator since mesh adaptation for practical problems occurs in a localized region, creating a severe load imbalance. To address this problem, we have developed PLUM, a global dynamic load balancing framework for adaptive numerical computations. Even though PLUM primarily balances processor workloads for the solution phase, it reduces the load imbalance problem within mesh adaptation by repartitioning the mesh after targeting edges for refinement but before the actual subdivision. This dramatically improves the performance of parallel 3D_TAG since refinement occurs in a more load balanced fashion. We also present optimal and heuristic algorithms that, when applied to the default mapping of a parallel repartitioner, significantly reduce the data redistribution overhead. Finally, portability is examined by comparing performance on three state-of-the-art parallel machines.

  14. Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Sohn, Andrew

    1996-01-01

    Dynamic mesh adaptation on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load imbalance among processors on a parallel machine. This paper describes the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution cost is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35 percent of the mesh is randomly adapted. For large-scale scientific computations, our load balancing strategy gives an almost sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remapper yields processor assignments that are less than 3 percent off the optimal solutions, but requires only 1 percent of the computational time.

  15. Bayesian Analysis for Exponential Random Graph Models Using the Adaptive Exchange Sampler.

    PubMed

    Jin, Ick Hoon; Yuan, Ying; Liang, Faming

    2013-10-01

    Exponential random graph models have been widely used in social network analysis. However, these models are extremely difficult to handle from a statistical viewpoint, because of the intractable normalizing constant and model degeneracy. In this paper, we consider a fully Bayesian analysis for exponential random graph models using the adaptive exchange sampler, which solves the intractable normalizing constant and model degeneracy issues encountered in Markov chain Monte Carlo (MCMC) simulations. The adaptive exchange sampler can be viewed as an MCMC extension of the exchange algorithm, and it generates auxiliary networks via an importance sampling procedure from an auxiliary Markov chain running in parallel. The convergence of this algorithm is established under mild conditions. The adaptive exchange sampler is illustrated using a few social networks, including the Florentine business network, molecule synthetic network, and dolphins network. The results indicate that the adaptive exchange algorithm can produce more accurate estimates than approximate exchange algorithms, while maintaining the same computational efficiency.
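
    The exchange move at the heart of such samplers can be written down for a generic exponential-family model p(x | theta) proportional to exp(theta · s(x)), where the intractable normalizing constants cancel in the acceptance ratio. In the sketch below, sample_network and suff_stats are hypothetical placeholders for the auxiliary-network simulation and the network statistics; the sketch shows a plain exchange step, not the adaptive importance-sampling variant of the paper.

        # Schematic exchange-algorithm step for a doubly-intractable exponential-family
        # model p(x | theta) proportional to exp(theta . s(x)).  `sample_network` and
        # `suff_stats` are hypothetical placeholders for the auxiliary-variable simulation.
        import numpy as np

        rng = np.random.default_rng(5)

        def suff_stats(x):
            return np.asarray(x, dtype=float)          # placeholder sufficient statistics

        def sample_network(theta):
            # placeholder: would draw an auxiliary network (near-)exactly from p(. | theta);
            # here it returns a noisy stand-in of the right shape
            return theta + rng.normal(size=theta.shape)

        def exchange_step(theta, x_obs, step=0.1):
            theta_new = theta + step * rng.normal(size=theta.shape)   # symmetric proposal
            y = sample_network(theta_new)                              # auxiliary draw
            # the intractable normalizers cancel in this ratio:
            log_alpha = (theta_new - theta) @ (suff_stats(x_obs) - suff_stats(y))
            return theta_new if np.log(rng.uniform()) < log_alpha else theta

        theta = np.zeros(3)
        x_obs = np.array([0.5, -0.2, 1.0])
        for _ in range(1000):
            theta = exchange_step(theta, x_obs)
        print(theta.round(2))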

  16. A wavelet approach to binary blackholes with asynchronous multitasking

    NASA Astrophysics Data System (ADS)

    Lim, Hyun; Hirschmann, Eric; Neilsen, David; Anderson, Matthew; Debuhr, Jackson; Zhang, Bo

    2016-03-01

    Highly accurate simulations of binary black holes and neutron stars are needed to address a variety of interesting problems in relativistic astrophysics. We present a new method for solving the Einstein equations (BSSN formulation) using iterated interpolating wavelets. Wavelet coefficients provide a direct measure of the local approximation error for the solution and place collocation points that naturally adapt to features of the solution. Further, they exhibit exponential convergence on unevenly spaced collocation points. The parallel implementation of the wavelet simulation framework presented here deviates from conventional practice in combining multi-threading with a form of message-driven computation sometimes referred to as asynchronous multitasking.
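
    A one-dimensional sketch of the interpolating-wavelet idea follows: a point on a finer level is kept only when its detail coefficient, the difference between the sampled value and an interpolation from the coarser level, exceeds a tolerance. Linear interpolation stands in here for the higher-order interpolating wavelets, and the test function and tolerance are illustrative, not taken from the solver.

        # 1-D sketch of interpolating-wavelet adaptivity: a point on the fine level is
        # kept only if its detail coefficient (value minus interpolation from the
        # coarse level) exceeds a tolerance.  Illustrative; not the production solver.
        import numpy as np

        def f(x):                                   # test function with a sharp feature
            return np.tanh(50.0 * (x - 0.5))

        levels, eps = 8, 1e-3
        keep = {0.0, 1.0}                           # always keep the coarsest points
        for lev in range(1, levels + 1):
            h = 1.0 / 2 ** lev
            odd = np.arange(1, 2 ** lev, 2) * h     # new points introduced at this level
            # linear interpolation from the two neighbouring coarse-level points
            detail = np.abs(f(odd) - 0.5 * (f(odd - h) + f(odd + h)))
            keep.update(x for x, d in zip(odd, detail) if d > eps)

        grid = np.array(sorted(keep))
        print(f"{grid.size} adaptive points instead of {2 ** levels + 1} uniform points")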

  17. Dynamic grid refinement for partial differential equations on parallel computers

    NASA Technical Reports Server (NTRS)

    Mccormick, S.; Quinlan, D.

    1989-01-01

    The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids to provide adaptive resolution and fast solution of PDEs. An asynchronous version of FAC, called AFAC, that completely eliminates the bottleneck to parallelism is presented. This paper describes the advantage that this algorithm has in adaptive refinement for moving singularities on multiprocessor computers. This work is applicable to the parallel solution of two- and three-dimensional shock tracking problems.

  18. Adding dynamic rules to self-organizing fuzzy systems

    NASA Technical Reports Server (NTRS)

    Buhusi, Catalin V.

    1992-01-01

    This paper develops a Dynamic Self-Organizing Fuzzy System (DSOFS) capable of adding, removing, and/or adapting the fuzzy rules and the fuzzy reference sets. The DSOFS background consists of a self-organizing neural structure with neuron relocation features which will develop a map of the input-output behavior. The relocation algorithm extends the topological ordering concept. Fuzzy rules (neurons) are dynamically added or released while the neural structure learns the pattern. The DSOFS advantages are the automatic synthesis and the possibility of parallel implementation. A high adaptation speed and a reduced number of neurons are needed in order to keep errors within limits. The computer simulation results are presented in a nonlinear systems modelling application.

  19. Mathematical and Numerical Aspects of the Adaptive Fast Multipole Poisson-Boltzmann Solver

    DOE PAGES

    Zhang, Bo; Lu, Benzhuo; Cheng, Xiaolin; ...

    2013-01-01

    This paper summarizes the mathematical and numerical theories and computational elements of the adaptive fast multipole Poisson-Boltzmann (AFMPB) solver. We introduce and discuss the following components in order: the Poisson-Boltzmann model, boundary integral equation reformulation, surface mesh generation, the node-patch discretization approach, Krylov iterative methods, the new version of fast multipole methods (FMMs), and a dynamic prioritization technique for scheduling parallel operations. For each component, we also remark on feasible approaches for further improvements in efficiency, accuracy and applicability of the AFMPB solver to large-scale long-time molecular dynamics simulations. Lastly, the potential of the solver is demonstrated with preliminary numerical results.

  20. Neural controller for adaptive movements with unforeseen payloads.

    PubMed

    Kuperstein, M; Wang, J

    1990-01-01

    A theory and computer simulation of a neural controller that learns to move and position a link carrying an unforeseen payload accurately are presented. The neural controller learns adaptive dynamic control from its own experience. It does not use information about link mass, link length, or direction of gravity, and it uses only indirect uncalibrated information about payload and actuator limits. Its average positioning accuracy across a large range of payloads after learning is 3% of the positioning range. This neural controller can be used as a basis for coordinating any number of sensory inputs with limbs of any number of joints. The feedforward nature of control allows parallel implementation in real time across multiple joints.

  1. Lattice Boltzmann model for simulation of magnetohydrodynamics

    NASA Technical Reports Server (NTRS)

    Chen, Shiyi; Chen, Hudong; Martinez, Daniel; Matthaeus, William

    1991-01-01

    A numerical method, based on a discrete Boltzmann equation, is presented for solving the equations of magnetohydrodynamics (MHD). The algorithm provides advantages similar to the cellular automaton method in that it is local and easily adapted to parallel computing environments. Because of much lower noise levels and less stringent requirements on lattice size, the method appears to be more competitive with traditional solution methods. Examples show that the model accurately reproduces both linear and nonlinear MHD phenomena.
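
    Although the MHD-specific distributions are beyond a short example, the local collide-and-stream structure that makes such lattice Boltzmann schemes easy to parallelize can be shown with a standard single-relaxation-time (BGK) D2Q9 step for plain hydrodynamics; the grid size, relaxation time, and initial condition below are arbitrary choices, and the magnetic-field distributions of the MHD model are not included.

        # Minimal D2Q9 BGK lattice Boltzmann step (plain hydrodynamics): collision is
        # purely local and streaming only touches nearest neighbours, which is what
        # makes the scheme easy to parallelize.  The MHD extension is not shown.
        import numpy as np

        NX, NY, TAU = 64, 64, 0.8
        e = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
        w = np.array([4/9] + [1/9]*4 + [1/36]*4)

        def equilibrium(rho, u):
            eu = np.einsum('qd,xyd->xyq', e, u)                 # e_q . u
            uu = np.einsum('xyd,xyd->xy', u, u)[..., None]
            return w * rho[..., None] * (1 + 3*eu + 4.5*eu**2 - 1.5*uu)

        def lbm_step(f):
            rho = f.sum(axis=-1)
            u = np.einsum('xyq,qd->xyd', f, e) / rho[..., None]
            f += -(f - equilibrium(rho, u)) / TAU               # BGK collision (local)
            for q in range(9):                                  # streaming (neighbour shifts)
                f[..., q] = np.roll(f[..., q], shift=tuple(e[q]), axis=(0, 1))
            return f

        rho0 = np.ones((NX, NY))
        u0 = np.zeros((NX, NY, 2))
        u0[..., 0] = 0.05 * np.sin(2*np.pi*np.arange(NY)/NY)    # decaying shear-wave test
        f = equilibrium(rho0, u0)
        for _ in range(100):
            f = lbm_step(f)
        print("mass:", f.sum().round(3))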

  2. A heuristic re-mapping algorithm reducing inter-level communication in SAMR applications.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Steensland, Johan; Ray, Jaideep

    2003-07-01

    This paper aims at decreasing execution time for large-scale structured adaptive mesh refinement (SAMR) applications by proposing a new heuristic re-mapping algorithm and experimentally showing its effectiveness in reducing inter-level communication. Tests were done for five different SAMR applications. The overall goal is to engineer a dynamically adaptive meta-partitioner capable of selecting and configuring the most appropriate partitioning strategy at run-time based on current system and application state. Such a meta-partitioner can significantly reduce execution times for general SAMR applications. Computer simulations of physical phenomena are becoming increasingly popular as they constitute an important complement to real-life testing. In many cases, such simulations are based on solving partial differential equations by numerical methods. Adaptive methods are crucial to efficiently utilize computer resources such as memory and CPU. But even with adaption, the simulations are computationally demanding and yield huge data sets. Thus parallelization and the efficient partitioning of data become issues of utmost importance. Adaption causes the workload to change dynamically, calling for dynamic (re-) partitioning to maintain efficient resource utilization. The proposed heuristic algorithm reduced inter-level communication substantially. Since the complexity of the proposed algorithm is low, this decrease comes at a relatively low cost. As a consequence, we draw the conclusion that the proposed re-mapping algorithm would be useful to lower overall execution times for many large SAMR applications. Due to its usefulness and its parameterization, the proposed algorithm would constitute a natural and important component of the meta-partitioner.

  3. A domain specific language for performance portable molecular dynamics algorithms

    NASA Astrophysics Data System (ADS)

    Saunders, William Robert; Grant, James; Müller, Eike Hermann

    2018-03-01

    Developers of Molecular Dynamics (MD) codes face significant challenges when adapting existing simulation packages to new hardware. In a continuously diversifying hardware landscape it becomes increasingly difficult for scientists to be experts both in their own domain (physics/chemistry/biology) and specialists in the low level parallelisation and optimisation of their codes. To address this challenge, we describe a "Separation of Concerns" approach for the development of parallel and optimised MD codes: the science specialist writes code at a high abstraction level in a domain specific language (DSL), which is then translated into efficient computer code by a scientific programmer. In a related context, an abstraction for the solution of partial differential equations with grid based methods has recently been implemented in the (Py)OP2 library. Inspired by this approach, we develop a Python code generation system for molecular dynamics simulations on different parallel architectures, including massively parallel distributed memory systems and GPUs. We demonstrate the efficiency of the auto-generated code by studying its performance and scalability on different hardware and compare it to other state-of-the-art simulation packages. With growing data volumes the extraction of physically meaningful information from the simulation becomes increasingly challenging and requires equally efficient implementations. A particular advantage of our approach is the easy expression of such analysis algorithms. We consider two popular methods for deducing the crystalline structure of a material from the local environment of each atom, show how they can be expressed in our abstraction and implement them in the code generation framework.
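
    The separation-of-concerns idea can be illustrated with a toy driver: the domain scientist writes only a per-pair kernel, and a separate looping layer (here a naive double loop standing in for the generated, parallelised code) applies it to all particle pairs. The Lennard-Jones kernel and the function names are illustrative and do not reflect the framework's actual DSL.

        # Toy illustration of the separation-of-concerns idea: the domain scientist
        # writes only a per-pair kernel; the "backend" (here a naive double loop, in
        # the real framework generated/parallelised code) applies it to all pairs.
        import numpy as np

        def lj_kernel(r_i, r_j, acc_i, eps=1.0, sigma=1.0):
            """Per-pair Lennard-Jones force contribution on particle i (the 'science' code)."""
            d = r_i - r_j
            r2 = np.dot(d, d)
            s6 = (sigma**2 / r2) ** 3
            acc_i += 24.0 * eps * (2.0 * s6**2 - s6) / r2 * d

        def execute_pairwise(kernel, positions, forces):
            """Stand-in for the generated looping/parallelisation layer."""
            n = len(positions)
            for i in range(n):
                for j in range(n):
                    if i != j:
                        kernel(positions[i], positions[j], forces[i])

        rng = np.random.default_rng(6)
        grid = np.mgrid[0:4, 0:4, 0:4].reshape(3, -1).T * 1.5    # 64 lattice sites
        pos = grid + rng.normal(scale=0.05, size=grid.shape)     # small jitter
        frc = np.zeros_like(pos)
        execute_pairwise(lj_kernel, pos, frc)
        print("net force (should be ~0):", frc.sum(axis=0).round(6))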

  4. Speed tracking and synchronization of multiple motors using ring coupling control and adaptive sliding mode control.

    PubMed

    Li, Le-Bao; Sun, Ling-Ling; Zhang, Sheng-Zhou; Yang, Qing-Quan

    2015-09-01

    A new control approach for speed tracking and synchronization of multiple motors is developed by incorporating an adaptive sliding mode control (ASMC) technique into a ring coupling synchronization control structure. This control approach can stabilize speed tracking of each motor and synchronize its motion with other motors' motion so that speed tracking errors and synchronization errors converge to zero. Moreover, an adaptive law, derived in the sense of the Lyapunov stability theorem, is exploited to estimate the unknown bound of the uncertainty so as to minimize the control effort and attenuate chattering. Performance comparisons with parallel control, relative coupling control and conventional PI control are investigated on a four-motor synchronization control system. Extensive simulation results show the effectiveness of the proposed control scheme. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
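
    A simplified discrete-time rendering of the scheme is sketched below: each motor's coupled error combines its own speed-tracking error with the difference to the next motor in the ring, and a smooth sliding-mode term with an adapted, capped gain drives both toward zero. The first-order motor models and all gains are toy values, not the controller designed in the paper.

        # Simplified sketch of ring-coupling speed synchronization with an adaptive
        # sliding-mode term.  First-order motor models and all gains are toy values.
        import numpy as np

        N, DT, STEPS = 4, 1e-3, 4000
        REF = 100.0                                   # common speed reference (rad/s)
        a = np.array([5.0, 6.0, 4.5, 5.5])            # motor gain-like parameters (toy)
        load = np.array([2.0, 8.0, 4.0, 6.0])         # unequal load torques (toy disturbance)
        BETA, LAM, GAMMA = 0.5, 2.0, 2.0              # coupling, surface, adaptation gains

        omega = np.zeros(N)                           # motor speeds
        k_hat = np.ones(N)                            # adaptive switching gains
        integ = np.zeros(N)

        for _ in range(STEPS):
            e = omega - REF                           # tracking errors
            e_sync = omega - np.roll(omega, -1)       # ring-coupling error to the next motor
            E = e + BETA * e_sync                     # coupled error
            integ += E * DT
            s = E + LAM * integ                       # sliding surface
            u = -5.0 * E - k_hat * np.tanh(s / 0.05)  # smooth switching term (anti-chattering)
            k_hat = np.minimum(k_hat + GAMMA * np.abs(s) * DT, 20.0)  # adaptive gain (capped)
            omega += DT * (a * u - load)              # toy first-order motor dynamics

        print("final speeds:", omega.round(2), "max sync error:",
              np.abs(omega - np.roll(omega, -1)).max().round(4))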

  5. Adaptive Mesh Refinement for Microelectronic Device Design

    NASA Technical Reports Server (NTRS)

    Cwik, Tom; Lou, John; Norton, Charles

    1999-01-01

    Finite element and finite volume methods are used in a variety of design simulations when it is necessary to compute fields throughout regions that contain varying materials or geometry. Convergence of the simulation can be assessed by uniformly increasing the mesh density until an observable quantity stabilizes. Depending on the electrical size of the problem, uniform refinement of the mesh may be computationally infeasible due to memory limitations. Similarly, depending on the geometric complexity of the object being modeled, uniform refinement can be inefficient since regions that do not need refinement add to the computational expense. In either case, convergence to the correct (measured) solution is not guaranteed. Adaptive mesh refinement methods attempt to selectively refine the region of the mesh that is estimated to contain proportionally higher solution errors. The refinement may be obtained by decreasing the element size (h-refinement), by increasing the order of the element (p-refinement) or by a combination of the two (h-p refinement). A successful adaptive strategy refines the mesh to produce an accurate solution measured against the correct fields without undue computational expense. This is accomplished by the use of a) reliable a posteriori error estimates, b) hierarchal elements, and c) automatic adaptive mesh generation. Adaptive methods are also useful when problems with multi-scale field variations are encountered. These occur in active electronic devices that have thin doped layers and also when mixed physics is used in the calculation. The mesh needs to be fine at and near the thin layer to capture rapid field or charge variations, but can coarsen away from these layers where field variations smoothen and charge densities are uniform. This poster will present an adaptive mesh refinement package that runs on parallel computers and is applied to specific microelectronic device simulations. Passive sensors that operate in the infrared portion of the spectrum as well as active device simulations that model charge transport and Maxwell's equations will be presented.
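
    A common concrete form of the selective-refinement step described above is to mark every element whose a-posteriori error indicator exceeds a fraction of the largest indicator, as in the short sketch below; the indicator values and the marking fraction are placeholders, not the estimator used in the poster.

        # Sketch of a simple h-refinement marking strategy: refine every element whose
        # a-posteriori error indicator exceeds a fraction of the largest indicator.
        import numpy as np

        rng = np.random.default_rng(7)
        eta = rng.lognormal(mean=0.0, sigma=1.0, size=200)   # per-element error indicators

        THETA = 0.5                                          # marking fraction of the max error
        marked = np.nonzero(eta > THETA * eta.max())[0]      # elements selected for h-refinement

        print(f"refining {marked.size} of {eta.size} elements")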

  6. A METHOD FOR IN-SITU CHARACTERIZATION OF RF HEATING IN PARALLEL TRANSMIT MRI

    PubMed Central

    Alon, Leeor; Deniz, Cem Murat; Brown, Ryan; Sodickson, Daniel K.; Zhu, Yudong

    2012-01-01

    In ultra high field magnetic resonance imaging, parallel radio-frequency (RF) transmission presents both opportunities and challenges for specific absorption rate (SAR) management. On one hand, parallel transmission provides flexibility in tailoring electric fields in the body while facilitating magnetization profile control. On the other hand, it increases the complexity of energy deposition as well as possibly exacerbating local SAR by improper design or delivery of RF pulses. This study shows that the information needed to characterize RF heating in parallel transmission is contained within a local power correlation matrix. Building upon a calibration scheme involving a finite number of magnetic resonance thermometry measurements, the present work establishes a way of estimating the local power correlation matrix. Determination of this matrix allows prediction of temperature change for an arbitrary parallel transmit RF pulse. In the case of a three transmit coil MR experiment in a phantom, determination and validation of the power correlation matrix was conducted in less than 200 minutes with induced temperature changes of <4 degrees C. Further optimization and adaptation are possible, and simulations evaluating potential feasibility for in vivo use are presented. The method allows general characteristics indicative of RF coil/pulse safety to be determined in situ. PMID:22714806
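
    One way to read the central relation is that, once a local power correlation matrix has been calibrated, the heating induced by any parallel-transmit pulse with per-channel complex weights w can be predicted as a quadratic form in w. The synthetic Hermitian matrix and the drive vectors below are illustrative assumptions, not measured calibration data.

        # Illustration of predicting local heating from a calibrated local power
        # correlation matrix: dT(r) ~ Re(w^H Lambda(r) w) for per-channel complex
        # RF weights w.  The matrix below is synthetic, not a measured calibration.
        import numpy as np

        rng = np.random.default_rng(8)
        NCH = 3                                               # transmit channels

        # synthetic Hermitian positive semi-definite "power correlation" matrix
        A = rng.normal(size=(NCH, NCH)) + 1j * rng.normal(size=(NCH, NCH))
        Lambda = A @ A.conj().T * 1e-2                        # deg C per unit drive power (toy)

        def predicted_heating(w, Lam):
            return float(np.real(w.conj() @ Lam @ w))

        w1 = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)           # uniform drive
        w2 = np.array([1.0, np.exp(1j*2*np.pi/3), np.exp(-1j*2*np.pi/3)]) / np.sqrt(3)

        print("predicted dT, uniform phases :", round(predicted_heating(w1, Lambda), 3))
        print("predicted dT, shifted phases :", round(predicted_heating(w2, Lambda), 3))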

  7. Parallelization and automatic data distribution for nuclear reactor simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liebrock, L.M.

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  8. Smooth adaptive sliding mode vibration control of a flexible parallel manipulator with multiple smart linkages in modal space

    NASA Astrophysics Data System (ADS)

    Zhang, Quan; Li, Chaodong; Zhang, Jiantao; Zhang, Jianhui

    2017-12-01

    This paper addresses the dynamic model and active vibration control of a rigid-flexible parallel manipulator with three smart links actuated by three linear ultrasonic motors. To suppress the vibration of the three flexible intermediate links under high speed and acceleration, multiple lead zirconate titanate (PZT) sensors and actuators are mounted collocated on each link, forming a smart structure capable of self-sensing and self-actuation. The dynamic characteristics and equations of the flexible link incorporating the PZT sensors and actuators are analyzed and formulated. A smooth adaptive sliding-mode-based active vibration control is proposed to suppress the vibration of the smart links, with the first and second modes of the three links targeted for suppression in modal space to avoid the spillover phenomenon. Simulations and experiments are implemented to validate the effectiveness of the smart structures and the proposed control laws. Experimental results show that the vibrations of the first mode (around 92 Hz) and the second mode (around 240 Hz) of the three smart links are reduced by 64.98%, 59.47%, and 62.28% and by 45.80%, 36.79%, and 33.33%, respectively, which further verifies the multi-mode vibration control ability of the smooth adaptive sliding mode control law.

  9. Distributed Cerebellar Motor Learning: A Spike-Timing-Dependent Plasticity Model

    PubMed Central

    Luque, Niceto R.; Garrido, Jesús A.; Naveros, Francisco; Carrillo, Richard R.; D'Angelo, Egidio; Ros, Eduardo

    2016-01-01

    Deep cerebellar nuclei neurons receive both inhibitory (GABAergic) synaptic currents from Purkinje cells (within the cerebellar cortex) and excitatory (glutamatergic) synaptic currents from mossy fibers. Those two deep cerebellar nucleus inputs are thought to be also adaptive, embedding interesting properties in the framework of accurate movements. We show that distributed spike-timing-dependent plasticity mechanisms (STDP) located at different cerebellar sites (parallel fibers to Purkinje cells, mossy fibers to deep cerebellar nucleus cells, and Purkinje cells to deep cerebellar nucleus cells) in closed-loop simulations provide an explanation for the complex learning properties of the cerebellum in motor learning. Concretely, we propose a new mechanistic cerebellar spiking model. In this new model, deep cerebellar nuclei embed a dual functionality: deep cerebellar nuclei acting as a gain adaptation mechanism and as a facilitator for the slow memory consolidation at mossy fibers to deep cerebellar nucleus synapses. Equipping the cerebellum with excitatory (e-STDP) and inhibitory (i-STDP) mechanisms at deep cerebellar nuclei afferents allows the accommodation of synaptic memories that were formed at parallel fibers to Purkinje cells synapses and then transferred to mossy fibers to deep cerebellar nucleus synapses. These adaptive mechanisms also contribute to modulate the deep-cerebellar-nucleus-output firing rate (output gain modulation toward optimizing its working range). PMID:26973504

  10. Balancing exploration, uncertainty and computational demands in many objective reservoir optimization

    NASA Astrophysics Data System (ADS)

    Zatarain Salazar, Jazmin; Reed, Patrick M.; Quinn, Julianne D.; Giuliani, Matteo; Castelletti, Andrea

    2017-11-01

    Reservoir operations are central to our ability to manage river basin systems serving conflicting multi-sectoral demands under increasingly uncertain futures. These challenges motivate the need for new solution strategies capable of effectively and efficiently discovering the multi-sectoral tradeoffs that are inherent to alternative reservoir operation policies. Evolutionary many-objective direct policy search (EMODPS) is gaining importance in this context due to its capability of addressing multiple objectives and its flexibility in incorporating multiple sources of uncertainties. This simulation-optimization framework has high potential for addressing the complexities of water resources management, and it can benefit from current advances in parallel computing and meta-heuristics. This study contributes a diagnostic assessment of state-of-the-art parallel strategies for the auto-adaptive Borg Multi Objective Evolutionary Algorithm (MOEA) to support EMODPS. Our analysis focuses on the Lower Susquehanna River Basin (LSRB) system where multiple sectoral demands from hydropower production, urban water supply, recreation and environmental flows need to be balanced. Using EMODPS with different parallel configurations of the Borg MOEA, we optimize operating policies over different size ensembles of synthetic streamflows and evaporation rates. As we increase the ensemble size, we increase the statistical fidelity of our objective function evaluations at the cost of higher computational demands. This study demonstrates how to overcome the mathematical and computational barriers associated with capturing uncertainties in stochastic multiobjective reservoir control optimization, where parallel algorithmic search serves to reduce the wall-clock time in discovering high quality representations of key operational tradeoffs. Our results show that emerging self-adaptive parallelization schemes exploiting cooperative search populations are crucial. Such strategies provide a promising new set of tools for effectively balancing exploration, uncertainty, and computational demands when using EMODPS.

  11. A scalable parallel black oil simulator on distributed memory parallel computers

    NASA Astrophysics Data System (ADS)

    Wang, Kun; Liu, Hui; Chen, Zhangxin

    2015-11-01

    This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.

  12. RISC-type microprocessors may revolutionize aerospace simulation

    NASA Astrophysics Data System (ADS)

    Jackson, Albert S.

    The author explores the application of RISC (reduced instruction set computer) processors in massively parallel computer (MPC) designs for aerospace simulation. The MPC approach is shown to be well adapted to the needs of aerospace simulation. It is shown that any of the three common types of interconnection schemes used with MPCs are effective for general-purpose simulation, although the bus- or switch-oriented machines are somewhat easier to use. For partial differential equation models, the hypercube approach at first glance appears more efficient because the nearest-neighbor connections required for three-dimensional models are hardwired in a hypercube machine. However, the data broadcast ability of a bus system, combined with the fact that data can be transmitted over a bus as soon as it has been updated, makes the bus approach very competitive with the hypercube approach even for these types of models.

  13. Comparison of DAC and MONACO DSMC Codes with Flat Plate Simulation

    NASA Technical Reports Server (NTRS)

    Padilla, Jose F.

    2010-01-01

    Various implementations of the direct simulation Monte Carlo (DSMC) method exist in academia, government and industry. By comparing implementations, deficiencies and merits of each can be discovered. This document reports comparisons between DSMC Analysis Code (DAC) and MONACO. DAC is NASA's standard DSMC production code and MONACO is a research DSMC code developed in academia. These codes have various differences; in particular, they employ distinct computational grid definitions. In this study, DAC and MONACO are compared by having each simulate a blunted flat plate wind tunnel test, using an identical volume mesh. Simulation expense and DSMC metrics are compared. In addition, flow results are compared with available laboratory data. Overall, this study revealed that both codes, excluding grid adaptation, performed similarly. For parallel processing, DAC was generally more efficient. As expected, code accuracy was mainly dependent on physical models employed.

  14. The Metropolis Monte Carlo method with CUDA enabled Graphic Processing Units

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hall, Clifford; School of Physics, Astronomy, and Computational Sciences, George Mason University, 4400 University Dr., Fairfax, VA 22030; Ji, Weixiao

    2014-02-01

    We present a CPU–GPU system for runtime acceleration of large molecular simulations using GPU computation and memory swaps. The memory architecture of the GPU can be used both as container for simulation data stored on the graphics card and as floating-point code target, providing an effective means for the manipulation of atomistic or molecular data on the GPU. To fully take advantage of this mechanism, efficient GPU realizations of algorithms used to perform atomistic and molecular simulations are essential. Our system implements a versatile molecular engine, including inter-molecule interactions and orientational variables for performing the Metropolis Monte Carlo (MMC) algorithm, which is one type of Markov chain Monte Carlo. By combining memory objects with floating-point code fragments we have implemented an MMC parallel engine that entirely avoids the communication time of molecular data at runtime. Our runtime acceleration system is a forerunner of a new class of CPU–GPU algorithms exploiting memory concepts combined with threading for avoiding bus bandwidth and communication. The testbed molecular system used here is a condensed phase system of oligopyrrole chains. A benchmark shows a size scaling speedup of 60 for systems with 210,000 pyrrole monomers. Our implementation can easily be combined with MPI to connect in parallel several CPU–GPU duets. -- Highlights: • We parallelize the Metropolis Monte Carlo (MMC) algorithm on one CPU–GPU duet. • The Adaptive Tempering Monte Carlo employs MMC and profits from this CPU–GPU implementation. • Our benchmark shows a size scaling-up speedup of 62 for systems with 225,000 particles. • The testbed involves a polymeric system of oligopyrroles in the condensed phase. • The CPU–GPU parallelization includes dipole–dipole and Mie–Jones classic potentials.
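
    For readers unfamiliar with the core update being accelerated, a serial, minimal Metropolis step looks like the sketch below; the harmonic-well example is a toy stand-in, not the oligopyrrole testbed, and no GPU offload is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_step(state, energy_fn, propose_fn, beta):
    """One Metropolis Monte Carlo update: propose a move, then accept it
    with probability min(1, exp(-beta * dE))."""
    trial = propose_fn(state, rng)
    dE = energy_fn(trial) - energy_fn(state)
    if dE <= 0 or rng.random() < np.exp(-beta * dE):
        return trial, True
    return state, False

# Toy usage: a single coordinate in a harmonic well (hypothetical system).
energy = lambda x: 0.5 * x**2
propose = lambda x, r: x + r.normal(scale=0.5)
x, accepted_count = 0.0, 0
for _ in range(10_000):
    x, accepted = metropolis_step(x, energy, propose, beta=2.0)
    accepted_count += accepted
print("acceptance rate:", accepted_count / 10_000)
```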

  15. A Parallel Ghosting Algorithm for The Flexible Distributed Mesh Database

    DOE PAGES

    Mubarak, Misbah; Seol, Seegyoung; Lu, Qiukai; ...

    2013-01-01

    Critical to the scalability of parallel adaptive simulations are parallel control functions including load balancing, reduced inter-process communication and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes which must be transmitted efficiently to avoid parallel performance degradation when the neighbors are on different processors. This article presents a parallel algorithm of creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation purposes while minimizing inter-process communication. The key characteristics of the algorithm are: (1) It can create ghost copies of any permissible topological order in a 1D, 2D or 3D mesh based on selected adjacencies. (2) It exploits neighborhood communication patterns during the ghost creation process, thus eliminating all-to-all communication. (3) For applications that need neighbors of neighbors, the algorithm can create n number of ghost layers up to a point where the whole partitioned mesh can be ghosted. Strong and weak scaling results are presented for the IBM BG/P and Cray XE6 architectures up to a core count of 32,768 processors. The algorithm also leads to scalable results when used in a parallel super-convergent patch recovery error estimator, an application that frequently accesses neighborhood data to carry out computation.
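
    The essential idea of ghost creation can be shown serially: starting from the elements a part owns, collect off-part neighbors layer by layer. The sketch below is a plain-Python illustration under that reading; the article's algorithm performs these steps with neighborhood-only MPI exchanges on the distributed mesh database, which is not modeled here.

```python
from collections import defaultdict

def build_ghosts(adjacency, owner, layers=1):
    """Return, per part, the ghost element ids: off-part neighbors within
    `layers` adjacency hops of locally owned elements (serial sketch only)."""
    parts = defaultdict(set)
    for elem, p in owner.items():
        parts[p].add(elem)
    ghosts = {p: set() for p in parts}
    for p, owned in parts.items():
        frontier = set(owned)
        for _ in range(layers):
            nxt = set()
            for e in frontier:
                for nbr in adjacency[e]:
                    if owner[nbr] != p and nbr not in ghosts[p]:
                        ghosts[p].add(nbr)
                        nxt.add(nbr)
            frontier = nxt          # next layer grows from the new ghosts
    return ghosts

# Hypothetical 1D mesh of 6 elements split across 2 parts.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
own = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(build_ghosts(adj, own, layers=2))
```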

  16. Sparse-view photoacoustic tomography using virtual parallel-projections and spatially adaptive filtering

    NASA Astrophysics Data System (ADS)

    Wang, Yihan; Lu, Tong; Wan, Wenbo; Liu, Lingling; Zhang, Songhe; Li, Jiao; Zhao, Huijuan; Gao, Feng

    2018-02-01

    To fully realize the potential of photoacoustic tomography (PAT) in preclinical and clinical applications, rapid measurements and robust reconstructions are needed. Sparse-view measurements have been adopted effectively to accelerate the data acquisition. However, since reconstruction from sparse-view sampling data is challenging, both the measurement scheme and the reconstruction algorithm must be considered together. In this study, we present an iterative sparse-view PAT reconstruction scheme in which a virtual parallel-projection concept, matched to the proposed measurement condition, helps realize the "compressive sensing" step of the reconstruction, while spatially adaptive filtering, which exploits the a priori similarity of image blocks in natural images, recovers the partially unknown coefficients in the transformed domain. As a result, sparse-view PAT images can be reconstructed with higher quality than the results obtained by the universal back-projection (UBP) algorithm in the same sparse-view cases. The proposed approach has been validated by simulation experiments and exhibits desirable image fidelity even from a small number of measuring positions.

  17. The effect of selection environment on the probability of parallel evolution.

    PubMed

    Bailey, Susan F; Rodrigue, Nicolas; Kassen, Rees

    2015-06-01

    Across the great diversity of life, there are many compelling examples of parallel and convergent evolution: similar evolutionary changes arising in independently evolving populations. Parallel evolution is often taken to be strong evidence of adaptation occurring in populations that are highly constrained in their genetic variation. Theoretical models suggest a few potential factors driving the probability of parallel evolution, but experimental tests are needed. In this study, we quantify the degree of parallel evolution in 15 replicate populations of Pseudomonas fluorescens evolved in five different environments that varied in resource type and arrangement. We identified repeat changes across multiple levels of biological organization from phenotype, to gene, to nucleotide, and tested the impact of 1) selection environment, 2) the degree of adaptation, and 3) the degree of heterogeneity in the environment on the degree of parallel evolution at the gene level. We saw, as expected, that parallel evolution occurred more often between populations evolved in the same environment; however, the extent of parallel evolution varied widely. The degree of adaptation did not significantly explain variation in the extent of parallelism in our system, but the number of available beneficial mutations correlated negatively with parallel evolution. In addition, the degree of parallel evolution was significantly higher in populations evolved in a spatially structured, multiresource environment, suggesting that environmental heterogeneity may be an important factor constraining adaptation. Overall, our results stress the importance of environment in driving parallel evolutionary changes and point to a number of avenues for future work for understanding when evolution is predictable.

  18. High accuracy mantle convection simulation through modern numerical methods - II: realistic models and problems

    NASA Astrophysics Data System (ADS)

    Heister, Timo; Dannberg, Juliane; Gassmöller, Rene; Bangerth, Wolfgang

    2017-08-01

    Computations have helped elucidate the dynamics of Earth's mantle for several decades already. The numerical methods that underlie these simulations have greatly evolved within this time span, and today include dynamically changing and adaptively refined meshes, sophisticated and efficient solvers, and parallelization to large clusters of computers. At the same time, many of the methods - discussed in detail in a previous paper in this series - were developed and tested primarily using model problems that lack many of the complexities that are common to the realistic models our community wants to solve today. With several years of experience solving complex and realistic models, we here revisit some of the algorithm designs of the earlier paper and discuss the incorporation of more complex physics. In particular, we re-consider time stepping and mesh refinement algorithms, evaluate approaches to incorporate compressibility, and discuss dealing with strongly varying material coefficients, latent heat, and how to track chemical compositions and heterogeneities. Taken together and implemented in a high-performance, massively parallel code, the techniques discussed in this paper then allow for high resolution, 3-D, compressible, global mantle convection simulations with phase transitions, strongly temperature dependent viscosity and realistic material properties based on mineral physics data.

  19. Feasibility of using the Massively Parallel Processor for large eddy simulations and other Computational Fluid Dynamics applications

    NASA Technical Reports Server (NTRS)

    Bruno, John

    1984-01-01

    The results of an investigation into the feasibility of using the MPP for direct and large eddy simulations of the Navier-Stokes equations are presented. A major part of this study was devoted to the implementation of two of the standard numerical algorithms for CFD. These implementations were not run on the Massively Parallel Processor (MPP) since the machine delivered to NASA Goddard does not have sufficient capacity. Instead, a detailed implementation plan was designed, and from this were derived estimates of the time and space requirements of the algorithms on a suitably configured MPP. In addition, other issues related to the practical implementation of these algorithms on an MPP-like architecture were considered; namely, adaptive grid generation, zonal boundary conditions, the table lookup problem, and the software interface. Performance estimates show that the architectural components of the MPP, the Staging Memory and the Array Unit, appear to be well suited to the numerical algorithms of CFD. This, combined with the prospect of building a faster and larger MPP-like machine, holds the promise of achieving the sustained gigaflop rates that are required for the numerical simulations in CFD.

  20. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  1. A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Rao, Hariprasad Nannapaneni

    1989-01-01

    The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
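
    A toy logical process illustrating the Virtual Time idea under the usual Time Warp reading (save state, process events optimistically, roll back on a straggler); the node-charge handler below is hypothetical and far simpler than RSIM's linear transistor model.

```python
import copy
import heapq

class OptimisticLP:
    """Toy logical process using Virtual Time with rollback: events are
    processed optimistically; a straggler (timestamp below local virtual
    time) forces a rollback to the last saved state before it."""
    def __init__(self, state):
        self.state = state
        self.lvt = 0.0                  # local virtual time
        self.queue = []                 # pending events (min-heap by time)
        self.processed = []             # (time, event) already handled
        self.snapshots = [(0.0, copy.deepcopy(state))]

    def schedule(self, t, event):
        if t < self.lvt:                # straggler detected
            self._rollback(t)
        heapq.heappush(self.queue, (t, event))

    def _rollback(self, t):
        # Restore the latest snapshot taken before t, then re-queue every
        # event that had been processed after that point.
        while self.snapshots and self.snapshots[-1][0] > t:
            self.snapshots.pop()
        self.lvt = self.snapshots[-1][0]
        self.state = copy.deepcopy(self.snapshots[-1][1])
        redo = [(pt, ev) for pt, ev in self.processed if pt > self.lvt]
        self.processed = [(pt, ev) for pt, ev in self.processed if pt <= self.lvt]
        for pt, ev in redo:
            heapq.heappush(self.queue, (pt, ev))

    def run(self, handler):
        while self.queue:
            t, event = heapq.heappop(self.queue)
            self.lvt = t
            handler(self.state, event)
            self.processed.append((t, event))
            self.snapshots.append((t, copy.deepcopy(self.state)))

# Hypothetical usage: events add charge to a node; the late event at t=2.0
# triggers a rollback and re-execution in correct timestamp order.
lp = OptimisticLP({"charge": 0})
lp.schedule(1.0, 2); lp.schedule(3.0, 5)
lp.run(lambda s, ev: s.__setitem__("charge", s["charge"] + ev))
lp.schedule(2.0, 1)
lp.run(lambda s, ev: s.__setitem__("charge", s["charge"] + ev))
print(lp.state)                          # {'charge': 8}
```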

  2. Parallel simulation today

    NASA Technical Reports Server (NTRS)

    Nicol, David; Fujimoto, Richard

    1992-01-01

    This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

  3. High Resolution Simulations of Arctic Sea Ice, 1979-1993

    DTIC Science & Technology

    2003-01-01

    William H. Lipscomb. To evaluate improvements in modelling Arctic sea ice, we compare results from two regional models at 1/12° horizontal resolution. The first is a coupled ice-ocean model of the Arctic Ocean, consisting of an ocean model (adapted from the Parallel Ocean Program, Los Alamos National Laboratory [LANL]) and the "old" sea ice model. The second model uses the same grid but consists of an improved "new" sea ice model (LANL).

  4. System Level Applications of Adaptive Computing (SLAAC)

    DTIC Science & Technology

    2003-11-01

    saved clock cycles, as the computation cycle time was directly proportional to the number of bitplanes in the image. The simulation was undertaken in ... [Figure 3: PPI algorithm architecture] ... parallel processing of data. The total throughput in these extended architectures is directly proportional to the amount of resources (CLB slices).

  5. Porting plasma physics simulation codes to modern computing architectures using the libmrc framework

    NASA Astrophysics Data System (ADS)

    Germaschewski, Kai; Abbott, Stephen

    2015-11-01

    Available computing power has continued to grow exponentially even after single-core performance saturated in the last decade. The increase has since been driven by more parallelism, both using more cores and having more parallelism in each core, e.g. in GPUs and Intel Xeon Phi. Adapting existing plasma physics codes is challenging, in particular as there is no single programming model that covers current and future architectures. We will introduce the open-source libmrc framework that has been used to modularize and port three plasma physics codes: the extended MHD code MRCv3 with implicit time integration and curvilinear grids; the OpenGGCM global magnetosphere model; and the particle-in-cell code PSC. libmrc consolidates basic functionality needed for simulations based on structured grids (I/O, load balancing, time integrators), and also introduces a parallel object model that makes it possible to maintain multiple implementations of computational kernels, on e.g. conventional processors and GPUs. It handles data layout conversions and enables us to port performance-critical parts of a code to a new architecture step-by-step, while the rest of the code can remain unchanged. We will show examples of the performance gains and some physics applications.

  6. SIERRA Low Mach Module: Fuego User Manual Version 4.46.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sierra Thermal/Fluid Team

    2017-09-01

    The SIERRA Low Mach Module: Fuego along with the SIERRA Participating Media Radiation Module: Syrinx, henceforth referred to as Fuego and Syrinx, respectively, are the key elements of the ASCI fire environment simulation project. The fire environment simulation project is directed at characterizing both open large-scale pool fires and building enclosure fires. Fuego represents the turbulent, buoyantly-driven incompressible flow, heat transfer, mass transfer, combustion, soot, and absorption coefficient model portion of the simulation software. Syrinx represents the participating-media thermal radiation mechanics. This project is an integral part of the SIERRA multi-mechanics software development project. Fuego depends heavily upon the core architecture developments provided by SIERRA for massively parallel computing, solution adaptivity, and mechanics coupling on unstructured grids.

  7. SIERRA Low Mach Module: Fuego Theory Manual Version 4.44

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sierra Thermal /Fluid Team

    2017-04-01

    The SIERRA Low Mach Module: Fuego along with the SIERRA Participating Media Radiation Module: Syrinx, henceforth referred to as Fuego and Syrinx, respectively, are the key elements of the ASCI fire environment simulation project. The fire environment simulation project is directed at characterizing both open large-scale pool fires and building enclosure fires. Fuego represents the turbulent, buoyantly-driven incompressible flow, heat transfer, mass transfer, combustion, soot, and absorption coefficient model portion of the simulation software. Syrinx represents the participating-media thermal radiation mechanics. This project is an integral part of the SIERRA multi-mechanics software development project. Fuego depends heavily upon the core architecture developments provided by SIERRA for massively parallel computing, solution adaptivity, and mechanics coupling on unstructured grids.

  8. SIERRA Low Mach Module: Fuego Theory Manual Version 4.46.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sierra Thermal/Fluid Team

    The SIERRA Low Mach Module: Fuego along with the SIERRA Participating Media Radiation Module: Syrinx, henceforth referred to as Fuego and Syrinx, respectively, are the key elements of the ASCI fire environment simulation project. The fire environment simulation project is directed at characterizing both open large-scale pool fires and building enclosure fires. Fuego represents the turbulent, buoyantly-driven incompressible flow, heat transfer, mass transfer, combustion, soot, and absorption coefficient model portion of the simulation software. Syrinx represents the participating-media thermal radiation mechanics. This project is an integral part of the SIERRA multi-mechanics software development project. Fuego depends heavily upon the core architecture developments provided by SIERRA for massively parallel computing, solution adaptivity, and mechanics coupling on unstructured grids.

  9. Comparison of wavefront sensor models for simulation of adaptive optics.

    PubMed

    Wu, Zhiwen; Enmark, Anita; Owner-Petersen, Mette; Andersen, Torben

    2009-10-26

    The new generation of extremely large telescopes will have adaptive optics. Due to the complexity and cost of such systems, it is important to simulate their performance before construction. Most systems planned will have Shack-Hartmann wavefront sensors. Different mathematical models are available for simulation of such wavefront sensors. The choice of wavefront sensor model strongly influences computation time and simulation accuracy. We have studied the influence of three wavefront sensor models on performance calculations for a generic adaptive optics (AO) system designed for K-band operation of a 42 m telescope. The performance of this AO system has been investigated both for reduced wavelengths and for reduced r(0) in the K band. The telescope AO system was designed for K-band operation; that is, both the subaperture size and the actuator pitch were matched to a fixed value of r(0) in the K band. We find that under certain conditions, such as investigating limiting guide star magnitude for large Strehl ratios, a full model based on Fraunhofer propagation to the subimages is significantly more accurate. It does, however, require long computation times. The shortcomings of simpler models, based on either direct use of average wavefront tilt over the subapertures for actuator control or use of the average tilt to move a precalculated point spread function in the subimages, are most pronounced for studies of system limitations to operating parameter variations. In the long run, efficient parallelization techniques may be developed to overcome the problem.
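
    The simplest of the sensor models mentioned, direct use of the average wavefront tilt over each subaperture, can be sketched in a few lines; the grid size and phase screen below are hypothetical, and no Fraunhofer propagation to subimages is performed.

```python
import numpy as np

def average_tilts(phase, n_sub):
    """Average wavefront tilt (x and y gradients) over an n_sub x n_sub grid
    of subapertures -- the crudest Shack-Hartmann model: each subaperture
    reports only its mean slope."""
    gy, gx = np.gradient(phase)
    tilts = []
    step = phase.shape[0] // n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            sl = np.s_[i * step:(i + 1) * step, j * step:(j + 1) * step]
            tilts.append((gx[sl].mean(), gy[sl].mean()))
    return np.array(tilts)

# Hypothetical 64x64 phase screen with a pure x-tilt: every subaperture
# should report the same x slope and zero y slope.
yy, xx = np.mgrid[0:64, 0:64]
print(average_tilts(0.1 * xx, n_sub=8)[:3])
```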

  10. On magnetic field amplification and particle acceleration near non-relativistic astrophysical shocks: particles in MHD cells simulations

    NASA Astrophysics Data System (ADS)

    van Marle, Allard Jan; Casse, Fabien; Marcowith, Alexandre

    2018-01-01

    We present simulations of magnetized astrophysical shocks taking into account the interplay between the thermal plasma of the shock and suprathermal particles. Such interaction is depicted by combining a grid-based magnetohydrodynamics description of the thermal fluid with particle-in-cell techniques devoted to the dynamics of suprathermal particles. This approach, which incorporates the use of adaptive mesh refinement features, is potentially key to simulating astrophysical systems on spatial scales that are beyond the reach of pure particle-in-cell simulations. We consider in this study non-relativistic shocks with various Alfvénic Mach numbers and magnetic field obliquity. We recover all the features of both magnetic field amplification and particle acceleration from previous studies when the magnetic field is parallel to the normal to the shock. In contrast with previous particle-in-cell hybrid simulations, we find that particle acceleration and magnetic field amplification also occur when the magnetic field is oblique to the normal to the shock, but on longer time-scales than in the parallel case. We show that in our simulations, the suprathermal particles experience acceleration thanks to a pre-heating process of the particles, similar to shock drift acceleration, which leads to corrugation of the shock front. Such oscillations of the shock front and the magnetic field locally help the particles to enter the upstream region and to initiate a non-resonant streaming instability and finally to induce diffuse particle acceleration.

  11. PHAST: Protein-like heteropolymer analysis by statistical thermodynamics

    NASA Astrophysics Data System (ADS)

    Frigori, Rafael B.

    2017-06-01

    PHAST is a software package written in standard Fortran, with MPI and CUDA extensions, able to efficiently perform parallel multicanonical Monte Carlo simulations of single or multiple heteropolymeric chains, as coarse-grained models for proteins. The outcome data can be straightforwardly analyzed within its microcanonical Statistical Thermodynamics module, which allows for computing the entropy, caloric curve, specific heat and free energies. As a case study, we investigate the aggregation of heteropolymers bioinspired on Aβ25-33 fragments and their cross-seeding with IAPP20-29 isoforms. Excellent parallel scaling is observed, even under numerically difficult first-order-like phase transitions, which are properly described by the built-in fully reconfigurable force fields. The package is free and open source, which should encourage users to adapt it readily to specific purposes.
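
    As an illustration of the kind of post-processing such a statistical-thermodynamics module performs, the sketch below reweights a density-of-states estimate to canonical mean energy and specific heat; the toy ln g(E) is invented and is not PHAST output or its file format.

```python
import numpy as np

def canonical_observables(energies, log_g, temps):
    """Reweight a density-of-states estimate ln g(E) (e.g. from a
    multicanonical run) to canonical averages: mean energy and specific
    heat C(T) = (<E^2> - <E>^2) / T^2, with k_B = 1."""
    results = []
    for T in temps:
        logw = log_g - energies / T
        w = np.exp(logw - logw.max())        # stabilise the exponentials
        w /= w.sum()
        e_mean = np.sum(w * energies)
        e2_mean = np.sum(w * energies**2)
        results.append((T, e_mean, (e2_mean - e_mean**2) / T**2))
    return np.array(results)

# Hypothetical bimodal ln g(E): the canonical specific heat develops a peak
# at the crossover temperature, mimicking a first-order-like transition.
E = np.linspace(-1.0, 1.0, 201)
log_g = -50 * (E**2 - 1.0)**2
print(canonical_observables(E, log_g, temps=[0.1, 0.3, 0.5]))
```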

  12. Direct Machining of Low-Loss THz Waveguide Components With an RF Choke.

    PubMed

    Lewis, Samantha M; Nanni, Emilio A; Temkin, Richard J

    2014-12-01

    We present results for the successful fabrication of low-loss THz metallic waveguide components using direct machining with a CNC end mill. The approach uses a split-block machining process with the addition of an RF choke running parallel to the waveguide. The choke greatly reduces coupling to the parasitic mode of the parallel-plate waveguide produced by the split-block. This method has demonstrated loss as low as 0.2 dB/cm at 280 GHz for a copper WR-3 waveguide. It has also been used in the fabrication of 3 and 10 dB directional couplers in brass, demonstrating excellent agreement with design simulations from 240-260 GHz. The method may be adapted to structures with features on the order of 200 μm.

  13. Reactive Transport Modeling of Induced Calcite Precipitation Reaction Fronts in Porous Media Using A Parallel, Fully Coupled, Fully Implicit Approach

    NASA Astrophysics Data System (ADS)

    Guo, L.; Huang, H.; Gaston, D.; Redden, G. D.; Fox, D. T.; Fujita, Y.

    2010-12-01

    Inducing mineral precipitation in the subsurface is one potential strategy for immobilizing trace metal and radionuclide contaminants. Generating mineral precipitates in situ can be achieved by manipulating chemical conditions, typically through injection or in situ generation of reactants. How these reactants transport, mix and react within the medium controls the spatial distribution and composition of the resulting mineral phases. Multiple processes, including fluid flow, dispersive/diffusive transport of reactants, biogeochemical reactions and changes in porosity-permeability, are tightly coupled over a number of scales. Numerical modeling can be used to investigate the nonlinear coupling effects of these processes which are quite challenging to explore experimentally. Many subsurface reactive transport simulators employ a de-coupled or operator-splitting approach where transport equations and batch chemistry reactions are solved sequentially. However, such an approach has limited applicability for biogeochemical systems with fast kinetics and strong coupling between chemical reactions and medium properties. A massively parallel, fully coupled, fully implicit Reactive Transport simulator (referred to as “RAT”) based on a parallel multi-physics object-oriented simulation framework (MOOSE) has been developed at the Idaho National Laboratory. Within this simulator, systems of transport and reaction equations can be solved simultaneously in a fully coupled, fully implicit manner using the Jacobian Free Newton-Krylov (JFNK) method with additional advanced computing capabilities such as (1) physics-based preconditioning for solution convergence acceleration, (2) massively parallel computing and scalability, and (3) adaptive mesh refinements for 2D and 3D structured and unstructured mesh. The simulator was first tested against analytical solutions, then applied to simulating induced calcium carbonate mineral precipitation in 1D columns and 2D flow cells as analogs to homogeneous and heterogeneous porous media, respectively. In 1D columns, calcium carbonate mineral precipitation was driven by urea hydrolysis catalyzed by urease enzyme, and in 2D flow cells, calcium carbonate mineral forming reactants were injected sequentially, forming migrating reaction fronts that are typically highly nonuniform. The RAT simulation results for the spatial and temporal distributions of precipitates, reaction rates and major species in the system, and also for changes in porosity and permeability, were compared to both laboratory experimental data and computational results obtained using other reactive transport simulators. The comparisons demonstrate the ability of RAT to simulate complex nonlinear systems and the advantages of fully coupled approaches, over de-coupled methods, for accurate simulation of complex, dynamic processes such as engineered mineral precipitation in subsurface environments.
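
    To make the contrast with operator splitting concrete, the sketch below advances a toy two-species reaction-transport system one backward-Euler step by solving the coupled residual with SciPy's Newton-Krylov solver; the grid size, rate constant, and boundary treatment are arbitrary choices for illustration, not the RAT/MOOSE setup.

```python
import numpy as np
from scipy.optimize import newton_krylov

# One backward-Euler step of a toy 1D diffusion-reaction pair (A + B -> mineral),
# solved fully coupled and fully implicit with a Jacobian-free Newton-Krylov method.
n, dx, dt, D, k = 50, 1.0 / 50, 0.01, 1e-3, 5.0
a_old = np.where(np.arange(n) < 10, 1.0, 0.0)   # reactant A injected at the inlet
b_old = np.full(n, 0.5)                          # reactant B initially resident

def laplacian(u):
    d = np.zeros_like(u)
    d[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
    return d

def residual(x):
    a, b = x[:n], x[n:]
    rate = k * a * b                             # precipitation sink couples a and b
    ra = (a - a_old) / dt - D * laplacian(a) + rate
    rb = (b - b_old) / dt - D * laplacian(b) + rate
    return np.concatenate([ra, rb])

x = newton_krylov(residual, np.concatenate([a_old, b_old]), f_tol=1e-8)
print("max precipitation rate this step:", (k * x[:n] * x[n:]).max())
```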

  14. Adapting high-level language programs for parallel processing using data flow

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1988-01-01

    EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.
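
    A rough sketch of the kind of data-flow graph that can be deduced from subprogram calls: declare each call's inputs and outputs, then group calls into levels whose members have no mutual dependencies and may run in parallel. The call names and the level-scheduling heuristic below are illustrative only, not EASY-FLOW syntax.

```python
from collections import defaultdict

def dataflow_levels(calls):
    """Derive a data-flow graph from subprogram calls given as
    (name, inputs, outputs) and group them into levels that can execute
    in parallel (each level depends only on earlier levels)."""
    produced_by = {}
    for name, ins, outs in calls:
        for o in outs:
            produced_by[o] = name
    deps = defaultdict(set)
    for name, ins, outs in calls:
        for i in ins:
            if i in produced_by:
                deps[name].add(produced_by[i])
    levels, placed = [], set()
    remaining = [c[0] for c in calls]
    while remaining:
        ready = [n for n in remaining if deps[n] <= placed]
        if not ready:
            raise ValueError("cyclic dependencies")
        levels.append(ready)
        placed |= set(ready)
        remaining = [n for n in remaining if n not in placed]
    return levels

# Hypothetical program: two independent preprocessing steps feed a solver.
calls = [("read_mesh", [], ["mesh"]),
         ("read_bc", [], ["bc"]),
         ("solve", ["mesh", "bc"], ["field"]),
         ("postprocess", ["field"], ["report"])]
print(dataflow_levels(calls))
```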

  15. Parallel Adaptive Mesh Refinement Library

    NASA Technical Reports Server (NTRS)

    Mac-Neice, Peter; Olson, Kevin

    2005-01-01

    Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.
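
    A minimal sketch of the block/quad-tree bookkeeping described above, in 2D only; the refinement criterion and block size are hypothetical, and none of PARAMESH's guard-cell handling or Fortran 90 interfaces are modeled.

```python
class Block:
    """Node of a quad-tree of mesh blocks: each block carries its own small
    logically Cartesian mesh; refining it replaces it with four children
    covering the same area at half the spacing."""
    def __init__(self, x0, y0, size, level=0, nx=8):
        self.x0, self.y0, self.size, self.level, self.nx = x0, y0, size, level, nx
        self.children = []

    def refine(self):
        h = self.size / 2
        self.children = [Block(self.x0 + i * h, self.y0 + j * h, h,
                               self.level + 1, self.nx)
                         for j in (0, 1) for i in (0, 1)]

    def leaves(self):
        if not self.children:
            yield self
        else:
            for c in self.children:
                yield from c.leaves()

def refine_where(root, needs_refinement, max_level=3):
    """Keep refining leaf blocks flagged by the application's criterion."""
    changed = True
    while changed:
        changed = False
        for leaf in list(root.leaves()):
            if leaf.level < max_level and needs_refinement(leaf):
                leaf.refine()
                changed = True

# Hypothetical criterion: resolve a feature near the domain corner (0, 0).
root = Block(0.0, 0.0, 1.0)
refine_where(root, lambda b: (b.x0**2 + b.y0**2) ** 0.5 < 0.3)
print(sum(1 for _ in root.leaves()), "leaf blocks")
```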

  16. A.I.-based real-time support for high performance aircraft operations

    NASA Technical Reports Server (NTRS)

    Vidal, J. J.

    1985-01-01

    Artificial intelligence (AI) based software and hardware concepts are applied to the handling of system malfunctions during flight tests. A representation of malfunction procedure logic using Boolean normal forms is presented. The representation facilitates the automation of malfunction procedures and provides easy testing for the embedded rules. It also forms a potential basis for a parallel implementation in logic hardware. The extraction of logic control rules from dynamic simulation, and their adaptive revision after partial failure, are examined. The experiment uses a simplified 2-dimensional aircraft model with a controller that adaptively extracts control rules for directional thrust that satisfy a navigational goal without exceeding pre-established position and velocity limits. Failure recovery (rule adjusting) is examined after partial actuator failure. While this experiment was performed with primitive aircraft and mission models, it illustrates an important paradigm and provided complexity extrapolations for the proposed extraction of expertise from simulation, as discussed. The use of relaxation and inexact reasoning in expert systems was also investigated.
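
    One way to read "Boolean normal forms" here is as disjunctive normal form rules over symptom flags, which makes the procedures both executable and easy to test; the sketch below uses hypothetical symptom names and is not taken from the actual flight-test procedures.

```python
def dnf_rule(clauses):
    """Represent a malfunction procedure action as a disjunctive normal form:
    the action fires if all literals of any one clause are satisfied.
    Literal format: (symptom_name, required_truth_value)."""
    def fires(symptoms):
        return any(all(symptoms.get(name, False) == want for name, want in clause)
                   for clause in clauses)
    return fires

# Hypothetical rule: shut down hydraulic pump B if (low pressure AND high
# temperature) OR a pump fire indication is present.
shutdown_pump_b = dnf_rule([
    [("hyd_b_pressure_low", True), ("hyd_b_temp_high", True)],
    [("hyd_b_fire", True)],
])
print(shutdown_pump_b({"hyd_b_pressure_low": True, "hyd_b_temp_high": True}))  # True
print(shutdown_pump_b({"hyd_b_fire": False}))                                  # False
```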

  17. Adaptive Strategies for Controls of Flexible Arms. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Yuan, Bau-San

    1989-01-01

    An adaptive controller for a modern manipulator has been designed based on asymptotic stability via the Lyapunov criterion, with the output error between the system and a reference model used as the actuating control signal. Computer simulations were carried out to test the design. The combination of the adaptive controller and a system vibration and mode shape estimator shows that the flexible arm should move along a pre-defined trajectory with high-speed motion and fast vibration settling time. An existing computer-controlled prototype two-link manipulator, RALF (Robotic Arm, Large Flexible), with a parallel mechanism driven by hydraulic actuators, was used to verify the mathematical analysis. The experimental results illustrate that assumed modes found from finite element techniques can be used to derive the equations of motion with acceptable accuracy. The robust adaptive (modal) control is implemented to compensate for unmodelled modes and nonlinearities and is compared with the joint feedback control in additional experiments. Preliminary results show promise for the experimental control algorithm.

  18. A DAFT DL_POLY distributed memory adaptation of the Smoothed Particle Mesh Ewald method

    NASA Astrophysics Data System (ADS)

    Bush, I. J.; Todorov, I. T.; Smith, W.

    2006-09-01

    The Smoothed Particle Mesh Ewald method [U. Essmann, L. Perera, M.L. Berkowtz, T. Darden, H. Lee, L.G. Pedersen, J. Chem. Phys. 103 (1995) 8577] for calculating long ranged forces in molecular simulation has been adapted for the parallel molecular dynamics code DL_POLY_3 [I.T. Todorov, W. Smith, Philos. Trans. Roy. Soc. London 362 (2004) 1835], making use of a novel 3D Fast Fourier Transform (DAFT) [I.J. Bush, The Daresbury Advanced Fourier transform, Daresbury Laboratory, 1999] that perfectly matches the Domain Decomposition (DD) parallelisation strategy [W. Smith, Comput. Phys. Comm. 62 (1991) 229; M.R.S. Pinches, D. Tildesley, W. Smith, Mol. Sim. 6 (1991) 51; D. Rapaport, Comput. Phys. Comm. 62 (1991) 217] of the DL_POLY_3 code. In this article we describe software adaptations undertaken to import this functionality and provide a review of its performance.

  19. ITG-TEM turbulence simulation with bounce-averaged kinetic electrons in tokamak geometry

    NASA Astrophysics Data System (ADS)

    Kwon, Jae-Min; Qi, Lei; Yi, S.; Hahm, T. S.

    2017-06-01

    We develop a novel numerical scheme to simulate electrostatic turbulence with kinetic electron responses in magnetically confined toroidal plasmas. Focusing on ion gyro-radius scale turbulence with frequencies slower than the time scales of electron parallel motion, we employ and adapt the bounce-averaged kinetic equation to model trapped electrons for nonlinear turbulence simulation with Coulomb collisions. Ions are modeled by employing the gyrokinetic equation. The newly developed scheme is implemented in the global δf particle-in-cell code gKPSP. By performing linear and nonlinear simulations, it is demonstrated that the new scheme can reproduce key physical properties of Ion Temperature Gradient (ITG) and Trapped Electron Mode (TEM) instabilities, and the resulting turbulent transport. The overall computational cost of kinetic electrons using this novel scheme is limited to 200%-300% of the cost for simulations with adiabatic electrons. Therefore, the new scheme allows us to perform kinetic simulations with trapped electrons very efficiently in magnetized plasmas.

  20. A Parallel Implementation of Multilevel Recursive Spectral Bisection for Application to Adaptive Unstructured Meshes. Chapter 1

    NASA Technical Reports Server (NTRS)

    Barnard, Stephen T.; Simon, Horst; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    The design of a parallel implementation of multilevel recursive spectral bisection is described. The goal is to implement a code that is fast enough to enable dynamic repartitioning of adaptive meshes.
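
    A dense, single-level sketch of the spectral bisection step (a Fiedler-vector split of the graph Laplacian); the multilevel coarsening and the parallel implementation that make it fast enough for dynamic repartitioning are not shown, and the example graph is hypothetical.

```python
import numpy as np

def spectral_bisect(adjacency):
    """Split a graph in two using the Fiedler vector (eigenvector of the
    second-smallest eigenvalue of the graph Laplacian); recursing on the
    halves gives recursive spectral bisection."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    cut = np.median(fiedler)
    return np.where(fiedler <= cut)[0], np.where(fiedler > cut)[0]

# Two triangles joined by a single edge (hypothetical mesh dual graph):
# the bisection should separate the triangles.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(spectral_bisect(A))
```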

  1. Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2002-01-01

    Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C-based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them with the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.

  2. Active illuminated space object imaging and tracking simulation

    NASA Astrophysics Data System (ADS)

    Yue, Yufang; Xie, Xiaogang; Luo, Wen; Zhang, Feizhou; An, Jianzhu

    2016-10-01

    Ground-based optical imaging of a space target in orbit, and extraction of the target under laser illumination, are discussed. Based on the orbit and corresponding attitude of a satellite, a 3D rendering of its image is built. A general simulation platform is developed that adapts to different 3D satellite models and to the relative geometry between the satellite and the ground-based detector system. A unified parallel-projection technique is proposed in this paper. Furthermore, we note that the random optical distribution under laser illumination is a challenge for object discrimination, with the strong randomness of the active-illumination laser speckle being the primary factor. The combined effects of a multi-frame accumulation process and several tracking methods, such as Meanshift tracking, contour poid, and filter deconvolution, are simulated. Comparison of the results shows that the combination of multi-frame accumulation and contour poid is recommended for actively laser-illuminated images, offering high tracking precision and stability over multiple object attitudes.

  3. Using the GeoFEST Faulted Region Simulation System

    NASA Technical Reports Server (NTRS)

    Parker, Jay W.; Lyzenga, Gregory A.; Donnellan, Andrea; Judd, Michele A.; Norton, Charles D.; Baker, Teresa; Tisdale, Edwin R.; Li, Peggy

    2004-01-01

    GeoFEST (the Geophysical Finite Element Simulation Tool) simulates stress evolution, fault slip and plastic/elastic processes in realistic materials, and so is suitable for earthquake cycle studies in regions such as Southern California. Many new capabilities and means of access for GeoFEST are now supported. New abilities include MPI-based cluster parallel computing using automatic PYRAMID/Parmetis-based mesh partitioning, automatic mesh generation for layered media with rectangular faults, and results visualization that is integrated with remote sensing data. The parallel GeoFEST application has been successfully run on over a half-dozen computers, including Intel Xeon clusters, Itanium II and Altix machines, and the Apple G5 cluster. It is not separately optimized for different machines, but relies on good domain partitioning for load-balance and low communication, and careful writing of the parallel diagonally preconditioned conjugate gradient solver to keep communication overhead low. Demonstrated thousand-step solutions for over a million finite elements on 64 processors require under three hours, and scaling tests show high efficiency when using more than (order of) 4000 elements per processor. The source code and documentation for GeoFEST is available at no cost from Open Channel Foundation. In addition GeoFEST may be used through a browser-based portal environment available to approved users. That environment includes semi-automated geometry creation and mesh generation tools, GeoFEST, and RIVA-based visualization tools that include the ability to generate a flyover animation showing deformations and topography. Work is in progress to support simulation of a region with several faults using 16 million elements, using a strain energy metric to adapt the mesh to faithfully represent the solution in a region of widely varying strain.

  4. Parallel evolution of a type IV secretion system in radiating lineages of the host-restricted bacterial pathogen Bartonella.

    PubMed

    Engel, Philipp; Salzburger, Walter; Liesch, Marius; Chang, Chao-Chin; Maruyama, Soichi; Lanz, Christa; Calteau, Alexandra; Lajus, Aurélie; Médigue, Claudine; Schuster, Stephan C; Dehio, Christoph

    2011-02-10

    Adaptive radiation is the rapid origination of multiple species from a single ancestor as the result of concurrent adaptation to disparate environments. This fundamental evolutionary process is considered to be responsible for the genesis of a great portion of the diversity of life. Bacteria have evolved enormous biological diversity by exploiting an exceptional range of environments, yet diversification of bacteria via adaptive radiation has been documented in a few cases only and the underlying molecular mechanisms are largely unknown. Here we show a compelling example of adaptive radiation in pathogenic bacteria and reveal their genetic basis. Our evolutionary genomic analyses of the α-proteobacterial genus Bartonella uncover two parallel adaptive radiations within these host-restricted mammalian pathogens. We identify a horizontally-acquired protein secretion system, which has evolved to target specific bacterial effector proteins into host cells as the evolutionary key innovation triggering these parallel adaptive radiations. We show that the functional versatility and adaptive potential of the VirB type IV secretion system (T4SS), and thereby translocated Bartonella effector proteins (Beps), evolved in parallel in the two lineages prior to their radiations. Independent chromosomal fixation of the virB operon and consecutive rounds of lineage-specific bep gene duplications followed by their functional diversification characterize these parallel evolutionary trajectories. Whereas most Beps maintained their ancestral domain constitution, strikingly, a novel type of effector protein emerged convergently in both lineages. This resulted in similar arrays of host cell-targeted effector proteins in the two lineages of Bartonella as the basis of their independent radiation. The parallel molecular evolution of the VirB/Bep system displays a striking example of a key innovation involved in independent adaptive processes and the emergence of bacterial pathogens. Furthermore, our study highlights the remarkable evolvability of T4SSs and their effector proteins, explaining their broad application in bacterial interactions with the environment.

  5. Parallel Evolution of a Type IV Secretion System in Radiating Lineages of the Host-Restricted Bacterial Pathogen Bartonella

    PubMed Central

    Engel, Philipp; Salzburger, Walter; Liesch, Marius; Chang, Chao-Chin; Maruyama, Soichi; Lanz, Christa; Calteau, Alexandra; Lajus, Aurélie; Médigue, Claudine; Schuster, Stephan C.; Dehio, Christoph

    2011-01-01

    Adaptive radiation is the rapid origination of multiple species from a single ancestor as the result of concurrent adaptation to disparate environments. This fundamental evolutionary process is considered to be responsible for the genesis of a great portion of the diversity of life. Bacteria have evolved enormous biological diversity by exploiting an exceptional range of environments, yet diversification of bacteria via adaptive radiation has been documented in a few cases only and the underlying molecular mechanisms are largely unknown. Here we show a compelling example of adaptive radiation in pathogenic bacteria and reveal their genetic basis. Our evolutionary genomic analyses of the α-proteobacterial genus Bartonella uncover two parallel adaptive radiations within these host-restricted mammalian pathogens. We identify a horizontally-acquired protein secretion system, which has evolved to target specific bacterial effector proteins into host cells as the evolutionary key innovation triggering these parallel adaptive radiations. We show that the functional versatility and adaptive potential of the VirB type IV secretion system (T4SS), and thereby translocated Bartonella effector proteins (Beps), evolved in parallel in the two lineages prior to their radiations. Independent chromosomal fixation of the virB operon and consecutive rounds of lineage-specific bep gene duplications followed by their functional diversification characterize these parallel evolutionary trajectories. Whereas most Beps maintained their ancestral domain constitution, strikingly, a novel type of effector protein emerged convergently in both lineages. This resulted in similar arrays of host cell-targeted effector proteins in the two lineages of Bartonella as the basis of their independent radiation. The parallel molecular evolution of the VirB/Bep system displays a striking example of a key innovation involved in independent adaptive processes and the emergence of bacterial pathogens. Furthermore, our study highlights the remarkable evolvability of T4SSs and their effector proteins, explaining their broad application in bacterial interactions with the environment. PMID:21347280

  6. Haptic adaptation to slant: No transfer between exploration modes

    PubMed Central

    van Dam, Loes C. J.; Plaisier, Myrthe A.; Glowania, Catharina; Ernst, Marc O.

    2016-01-01

    Human touch is an inherently active sense: to estimate an object’s shape humans often move their hand across its surface. This way the object is sampled both in a serial (sampling different parts of the object across time) and parallel fashion (sampling using different parts of the hand simultaneously). Both the serial (moving a single finger) and parallel (static contact with the entire hand) exploration modes provide reliable and similar global shape information, suggesting the possibility that this information is shared early in the sensory cortex. In contrast, we here show the opposite. Using an adaptation-and-transfer paradigm, a change in haptic perception was induced by slant-adaptation using either the serial or parallel exploration mode. A unified shape-based coding would predict that this would equally affect perception using other exploration modes. However, we found that adaptation-induced perceptual changes did not transfer between exploration modes. Instead, serial and parallel exploration components adapted simultaneously, but to different kinaesthetic aspects of exploration behaviour rather than object-shape per se. These results indicate that a potential combination of information from different exploration modes can only occur at down-stream cortical processing stages, at which adaptation is no longer effective. PMID:27698392

  7. Balancing Exploration, Uncertainty Representation and Computational Time in Many-Objective Reservoir Policy Optimization

    NASA Astrophysics Data System (ADS)

    Zatarain-Salazar, J.; Reed, P. M.; Quinn, J.; Giuliani, M.; Castelletti, A.

    2016-12-01

    As we confront the challenges of managing river basin systems with a large number of reservoirs and increasingly uncertain tradeoffs impacting their operations (due to, e.g. climate change, changing energy markets, population pressures, ecosystem services, etc.), evolutionary many-objective direct policy search (EMODPS) solution strategies will need to address the computational demands associated with simulating more uncertainties and therefore optimizing over increasingly noisy objective evaluations. Diagnostic assessments of state-of-the-art many-objective evolutionary algorithms (MOEAs) to support EMODPS have highlighted that search time (or number of function evaluations) and auto-adaptive search are key features for successful optimization. Furthermore, auto-adaptive MOEA search operators are themselves sensitive to having a sufficient number of function evaluations to learn successful strategies for exploring complex spaces and for escaping from local optima when stagnation is detected. Fortunately, recent parallel developments allow coordinated runs that enhance auto-adaptive algorithmic learning and can handle scalable and reliable search with limited wall-clock time, but at the expense of the total number of function evaluations. In this study, we analyze this tradeoff between parallel coordination and depth of search using different parallelization schemes of the Multi-Master Borg on a many-objective stochastic control problem. We also consider the tradeoff between better representing uncertainty in the stochastic optimization, and simplifying this representation to shorten the function evaluation time and allow for greater search. Our analysis focuses on the Lower Susquehanna River Basin (LSRB) system where multiple competing objectives for hydropower production, urban water supply, recreation and environmental flows need to be balanced. Our results provide guidance for balancing exploration, uncertainty, and computational demands when using the EMODPS framework to discover key tradeoffs within the LSRB system.

  8. Implementation of Shifted Periodic Boundary Conditions in the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) Software

    DTIC Science & Technology

    2015-08-01

    Report documentation fragment only; authors: N Scott Weingarten (Weapons and Materials Research Directorate, ARL) and James P Larentzos (Engility). The report describes the implementation of shifted periodic boundary conditions in the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) software.

  9. Widespread parallel population adaptation to climate variation across a radiation: implications for adaptation to climate change.

    PubMed

    Thorpe, Roger S; Barlow, Axel; Malhotra, Anita; Surget-Groba, Yann

    2015-03-01

    Global warming will impact species in a number of ways, and it is important to know the extent to which natural populations can adapt to anthropogenic climate change by natural selection. Parallel microevolution within separate species can demonstrate natural selection, but several studies of homoplasy have not yet revealed examples of widespread parallel evolution in a generic radiation. Taking into account primary phylogeographic divisions, we investigate numerous quantitative traits (size, shape, scalation, colour pattern and hue) in anole radiations from the mountainous Lesser Antillean islands. Adaptation to climatic differences can lead to very pronounced differences between spatially close populations with all studied traits showing some evidence of parallel evolution. Traits from shape, scalation, pattern and hue (particularly the latter) show widespread evolutionary parallels within these species in response to altitudinal climate variation greater than extreme anthropogenic climate change predicted for 2080. This gives strong evidence of the ability to adapt to climate variation by natural selection throughout this radiation. As anoles can evolve very rapidly, it suggests anthropogenic climate change is likely to be less of a conservation threat than other factors, such as habitat loss and invasive species, in this, Lesser Antillean, biodiversity hot spot. © 2015 John Wiley & Sons Ltd.

  10. The fusion code XGC: Enabling kinetic study of multi-scale edge turbulent transport in ITER [Book Chapter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

    The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability. Such simulations had not been possible before because the time-to-solution was more than a factor of 10 too long to complete one physics case in less than 5 days of wall-clock time. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning for balancing computational work in pushing particles and in grid-related work, scalable and accurate discretization algorithms for non-linear Coulomb collisions, and communication-avoiding subcycling technology for pushing particles on both CPUs and GPUs are also utilized to dramatically improve the scalability and time-to-solution, hence enabling the difficult kinetic ITER edge simulation on a present-day leadership class computer.

  11. Neuromimetic Circuits with Synaptic Devices Based on Strongly Correlated Electron Systems

    NASA Astrophysics Data System (ADS)

    Ha, Sieu D.; Shi, Jian; Meroz, Yasmine; Mahadevan, L.; Ramanathan, Shriram

    2014-12-01

    Strongly correlated electron systems such as the rare-earth nickelates (RNiO3, where R denotes a rare-earth element) can exhibit synapselike continuous long-term potentiation and depression when gated with ionic liquids, exploiting the extreme sensitivity of coupled charge, spin, orbital, and lattice degrees of freedom to stoichiometry. We present experimental real-time, device-level classical conditioning and unlearning using nickelate-based synaptic devices in an electronic circuit compatible with both excitatory and inhibitory neurons. We establish a physical model for the device behavior based on electric-field-driven coupled ionic-electronic diffusion that can be utilized for design of more complex systems. We use the model to simulate a variety of associative and nonassociative learning mechanisms, as well as a feedforward recurrent network for storing memory. Our circuit intuitively parallels biological neural architectures, and it can be readily generalized to other forms of cellular learning and extinction. The simulation of neural function with electronic device analogs may provide insight into biological processes such as decision making, learning, and adaptation, while facilitating advanced parallel information processing in hardware.

  12. Extensible Adaptable Simulation Systems: Supporting Multiple Fidelity Simulations in a Common Environment

    NASA Technical Reports Server (NTRS)

    McLaughlin, Brian J.; Barrett, Larry K.

    2012-01-01

    Common practice in the development of simulation systems is meeting all user requirements within a single instantiation. The Joint Polar Satellite System (JPSS) presents a unique challenge to establish a simulation environment that meets the needs of a diverse user community while also spanning a multi-mission environment over decades of operation. In response, the JPSS Flight Vehicle Test Suite (FVTS) is architected with an extensible infrastructure that supports the operation of multiple observatory simulations for a single mission and multiple mission within a common system perimeter. For the JPSS-1 satellite, multiple fidelity flight observatory simulations are necessary to support the distinct user communities consisting of the Common Ground System development team, the Common Ground System Integration & Test team, and the Mission Rehearsal Team/Mission Operations Team. These key requirements present several challenges to FVTS development. First, the FVTS must ensure all critical user requirements are satisfied by at least one fidelity instance of the observatory simulation. Second, the FVTS must allow for tailoring of the system instances to function in diverse operational environments from the High-security operations environment at NOAA Satellite Operations Facility (NSOF) to the ground system factory floor. Finally, the FVTS must provide the ability to execute sustaining engineering activities on a subset of the system without impacting system availability to parallel users. The FVTS approach of allowing for multiple fidelity copies of observatory simulations represents a unique concept in simulator capability development and corresponds to the JPSS Ground System goals of establishing a capability that is flexible, extensible, and adaptable.

  13. An analysis of adaptive design variations on the sequential parallel comparison design for clinical trials.

    PubMed

    Mi, Michael Y; Betensky, Rebecca A

    2013-04-01

    Currently, a growing placebo response rate has been observed in clinical trials for antidepressant drugs, a phenomenon that has made it increasingly difficult to demonstrate efficacy. The sequential parallel comparison design (SPCD) is a clinical trial design that was proposed to address this issue. The SPCD theoretically has the potential to reduce the sample-size requirement for a clinical trial and to simultaneously enrich the study population to be less responsive to the placebo. Because the basic SPCD already reduces the placebo response by removing placebo responders between the first and second phases of a trial, the purpose of this study was to examine whether we can further improve the efficiency of the basic SPCD and whether we can do so when the projected underlying drug and placebo response rates differ considerably from the actual ones. Three adaptive designs that used interim analyses to readjust the length of study duration for individual patients were tested to reduce the sample-size requirement or increase the statistical power of the SPCD. Various simulations of clinical trials using the SPCD with interim analyses were conducted to test these designs through calculations of empirical power. From the simulations, we found that the adaptive designs can recover unnecessary resources spent in the traditional SPCD trial format with overestimated initial sample sizes and provide moderate gains in power. Under the first design, results showed up to a 25% reduction in person-days, with most power losses below 5%. In the second design, results showed up to an 8% reduction in person-days with negligible loss of power. In the third design using sample-size re-estimation, up to 25% power was recovered from underestimated sample-size scenarios. Given the numerous possible test parameters that could have been chosen for the simulations, the study's results are limited to situations described by the parameters that were used and may not generalize to all possible scenarios. Furthermore, dropout of patients is not considered in this study. It is possible to make an already complex design such as the SPCD adaptive, and thus more efficient, potentially overcoming the problem of placebo response at lower cost. Ultimately, such a design may expedite the approval of future effective treatments.

  14. An analysis of adaptive design variations on the sequential parallel comparison design for clinical trials

    PubMed Central

    Mi, Michael Y.; Betensky, Rebecca A.

    2013-01-01

    Background Currently, a growing placebo response rate has been observed in clinical trials for antidepressant drugs, a phenomenon that has made it increasingly difficult to demonstrate efficacy. The sequential parallel comparison design (SPCD) is a clinical trial design that was proposed to address this issue. The SPCD theoretically has the potential to reduce the sample size requirement for a clinical trial and to simultaneously enrich the study population to be less responsive to the placebo. Purpose Because the basic SPCD design already reduces the placebo response by removing placebo responders between the first and second phases of a trial, the purpose of this study was to examine whether we can further improve the efficiency of the basic SPCD and if we can do so when the projected underlying drug and placebo response rates differ considerably from the actual ones. Methods Three adaptive designs that used interim analyses to readjust the length of study duration for individual patients were tested to reduce the sample size requirement or increase the statistical power of the SPCD. Various simulations of clinical trials using the SPCD with interim analyses were conducted to test these designs through calculations of empirical power. Results From the simulations, we found that the adaptive designs can recover unnecessary resources spent in the traditional SPCD trial format with overestimated initial sample sizes and provide moderate gains in power. Under the first design, results showed up to a 25% reduction in person-days, with most power losses below 5%. In the second design, results showed up to a 8% reduction in person-days with negligible loss of power. In the third design using sample size re-estimation, up to 25% power was recovered from underestimated sample size scenarios. Limitations Given the numerous possible test parameters that could have been chosen for the simulations, the study’s results are limited to situations described by the parameters that were used, and may not generalize to all possible scenarios. Furthermore, drop-out of patients is not considered in this study. Conclusions It is possible to make an already complex design such as the SPCD adaptive, and thus more efficient, potentially overcoming the problem of placebo response at lower cost. Ultimately, such a design may expedite the approval of future effective treatments. PMID:23283576
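
    The kind of empirical-power simulation described above can be sketched in a few lines. The Python fragment below is a minimal, hypothetical illustration of a two-phase SPCD trial (placebo non-responders from phase 1 are re-randomized in phase 2 and the two phase-wise test statistics are pooled with a fixed weight); the response rates, sample size, pooling weight and critical value are placeholders, not the parameters used in the study.

        # Minimal Monte Carlo sketch of empirical power for a two-phase SPCD trial.
        # All response rates, the sample size and the pooling weight w are illustrative.
        from statistics import NormalDist

        import numpy as np

        rng = np.random.default_rng(0)

        def two_prop_z(x, y):
            """z statistic for the difference of two response proportions."""
            p_pool = (x.sum() + y.sum()) / (len(x) + len(y))
            se = np.sqrt(p_pool * (1 - p_pool) * (1 / len(x) + 1 / len(y)))
            return (x.mean() - y.mean()) / se if se > 0 else 0.0

        def spcd_power(n=200, p_drug=0.45, p_placebo=0.30, w=0.5,
                       alpha=0.05, n_sims=5000):
            crit = NormalDist().inv_cdf(1 - alpha)      # one-sided critical value
            rejections = 0
            for _ in range(n_sims):
                # Phase 1: 1:1 randomisation to drug or placebo.
                drug1 = rng.binomial(1, p_drug, n // 2)
                plac1 = rng.binomial(1, p_placebo, n // 2)
                z1 = two_prop_z(drug1, plac1)
                # Phase 2: placebo non-responders are re-randomised.
                n_nonresp = int((plac1 == 0).sum())
                if n_nonresp < 4:
                    continue
                drug2 = rng.binomial(1, p_drug, n_nonresp // 2)
                plac2 = rng.binomial(1, p_placebo, n_nonresp - n_nonresp // 2)
                z2 = two_prop_z(drug2, plac2)
                # Weighted combination of the two phase-wise z statistics.
                z = (w * z1 + (1 - w) * z2) / np.sqrt(w**2 + (1 - w)**2)
                rejections += z > crit
            return rejections / n_sims

        print(f"empirical power ~ {spcd_power():.2f}")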

  15. High-Resolution Numerical Simulation and Analysis of Mach Reflection Structures in Detonation Waves in Low-Pressure H 2 –O 2 –Ar Mixtures: A Summary of Results Obtained with the Adaptive Mesh Refinement Framework AMROC

    DOE PAGES

    Deiterding, Ralf

    2011-01-01

    Numerical simulation can be key to the understanding of the multidimensional nature of transient detonation waves. However, the accurate approximation of realistic detonations is demanding as a wide range of scales needs to be resolved. This paper describes a successful solution strategy that utilizes logically rectangular dynamically adaptive meshes. The hydrodynamic transport scheme and the treatment of the nonequilibrium reaction terms are sketched. A ghost fluid approach is integrated into the method to allow for embedded geometrically complex boundaries. Large-scale parallel simulations of unstable detonation structures of Chapman-Jouguet detonations in low-pressure hydrogen-oxygen-argon mixtures demonstrate the efficiency of the described techniques in practice. In particular, computations of regular cellular structures in two and three space dimensions and their development under transient conditions, that is, under diffraction and for propagation through bends are presented. Some of the observed patterns are classified by shock polar analysis, and a diagram of the transition boundaries between possible Mach reflection structures is constructed.

  16. Comparing oncology clinical programs by use of innovative designs and expected net present value optimization: Which adaptive approach leads to the best result?

    PubMed

    Parke, Tom; Marchenko, Olga; Anisimov, Vladimir; Ivanova, Anastasia; Jennison, Christopher; Perevozskaya, Inna; Song, Guochen

    2017-01-01

    Designing an oncology clinical program is more challenging than designing a single study. The standard approaches have proven not very successful during the last decade; the failure rate of Phase 2 and Phase 3 trials in oncology remains high. Improving a development strategy by applying innovative statistical methods is one of the major objectives of a drug development process. The oncology sub-team on Adaptive Program under the Drug Information Association Adaptive Design Scientific Working Group (DIA ADSWG) evaluated hypothetical oncology programs with two competing treatments and published the work in the Therapeutic Innovation and Regulatory Science journal in January 2014. Five oncology development programs based on different Phase 2 designs, including adaptive designs and a standard two parallel arm Phase 3 design, were simulated and compared in terms of the probability of clinical program success and expected net present value (eNPV). In this article, we consider eight Phase 2/Phase 3 development programs based on selected combinations of five Phase 2 study designs and three Phase 3 study designs. We again used the probability of program success and eNPV to compare simulated programs. Across the development strategies considered, the eNPV showed robust improvement for each successive strategy, with the highest being for a three-arm response-adaptive randomization design in Phase 2 and a group sequential design with 5 analyses in Phase 3.
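
    As a rough illustration of the two metrics used for the comparison, the sketch below simulates a development program as two chained success/failure stages and reports the probability of program success and the expected net present value. All probabilities, costs, revenues and the discounting scheme are invented placeholders and do not reproduce the programs analysed in the article.

        # Hedged sketch: comparing strategies by probability of program success (PoS)
        # and expected net present value (eNPV), both estimated by simulation.
        import numpy as np

        rng = np.random.default_rng(1)

        def program_enpv(p2_success, p3_success, cost2, cost3, revenue,
                         discount=0.9, n_sims=100_000):
            npv = np.zeros(n_sims)
            ph2_ok = rng.random(n_sims) < p2_success
            ph3_ok = ph2_ok & (rng.random(n_sims) < p3_success)
            npv -= cost2                                   # Phase 2 is always paid
            npv -= np.where(ph2_ok, cost3 * discount, 0)   # Phase 3 only after a Phase 2 success
            npv += np.where(ph3_ok, revenue * discount**2, 0)
            return ph3_ok.mean(), npv.mean()

        # Two hypothetical strategies: a standard two-arm Phase 2 versus an adaptive
        # Phase 2 that costs more but raises the chance of picking a winning dose.
        for name, p2, c2 in [("standard Phase 2", 0.45, 30.0),
                             ("adaptive Phase 2", 0.55, 40.0)]:
            pos, enpv = program_enpv(p2, p3_success=0.6, cost2=c2,
                                     cost3=150.0, revenue=800.0)
            print(f"{name}: PoS = {pos:.2f}, eNPV = {enpv:.1f} (arbitrary units)")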

  17. Adaptive particle-based pore-level modeling of incompressible fluid flow in porous media: a direct and parallel approach

    NASA Astrophysics Data System (ADS)

    Ovaysi, S.; Piri, M.

    2009-12-01

    We present a three-dimensional fully dynamic parallel particle-based model for direct pore-level simulation of incompressible viscous fluid flow in disordered porous media. The model was developed from scratch and is capable of simulating flow directly in three-dimensional high-resolution microtomography images of naturally occurring or man-made porous systems. It reads the images as input where the position of the solid walls are given. The entire medium, i.e., solid and fluid, is then discretized using particles. The model is based on Moving Particle Semi-implicit (MPS) technique. We modify this technique in order to improve its stability. The model handles highly irregular fluid-solid boundaries effectively. It takes into account viscous pressure drop in addition to the gravity forces. It conserves mass and can automatically detect any false connectivity with fluid particles in the neighboring pores and throats. It includes a sophisticated algorithm to automatically split and merge particles to maintain hydraulic connectivity of extremely narrow conduits. Furthermore, it uses novel methods to handle particle inconsistencies and open boundaries. To handle the computational load, we present a fully parallel version of the model that runs on distributed memory computer clusters and exhibits excellent scalability. The model is used to simulate unsteady-state flow problems under different conditions starting from straight noncircular capillary tubes with different cross-sectional shapes, i.e., circular/elliptical, square/rectangular and triangular cross-sections. We compare the predicted dimensionless hydraulic conductances with the data available in the literature and observe an excellent agreement. We then test the scalability of our parallel model with two samples of an artificial sandstone, samples A and B, with different volumes and different distributions (non-uniform and uniform) of solid particles among the processors. An excellent linear scalability is obtained for sample B that has more uniform distribution of solid particles leading to a superior load balancing. The model is then used to simulate fluid flow directly in REV size three-dimensional x-ray images of a naturally occurring sandstone. We analyze the quality and consistency of the predicted flow behavior and calculate absolute permeability, which compares well with the available network modeling and Lattice-Boltzmann permeabilities available in the literature for the same sandstone. We show that the model conserves mass very well and is stable computationally even at very narrow fluid conduits. The transient- and the steady-state fluid flow patterns are presented as well as the steady-state flow rates to compute absolute permeability. Furthermore, we discuss the vital role of our adaptive particle resolution scheme in preserving the original pore connectivity of the samples and their narrow channels through splitting and merging of fluid particles.

  18. An adaptable parallel algorithm for the direct numerical simulation of incompressible turbulent flows using a Fourier spectral/hp element method and MPI virtual topologies

    NASA Astrophysics Data System (ADS)

    Bolis, A.; Cantwell, C. D.; Moxey, D.; Serson, D.; Sherwin, S. J.

    2016-09-01

    A hybrid parallelisation technique for distributed memory systems is investigated for a coupled Fourier-spectral/hp element discretisation of domains characterised by geometric homogeneity in one or more directions. The performance of the approach is mathematically modelled in terms of operation count and communication costs for identifying the most efficient parameter choices. The model is calibrated to target a specific hardware platform after which it is shown to accurately predict the performance in the hybrid regime. The method is applied to modelling turbulent flow using the incompressible Navier-Stokes equations in an axisymmetric pipe and square channel. The hybrid method extends the practical limitations of the discretisation, allowing greater parallelism and reduced wall times. Performance is shown to continue to scale when both parallelisation strategies are used.

  19. [Adaptability of APSIM model in Southwestern China: A case study of winter wheat in Chongqing City].

    PubMed

    Dai, Tong; Wang, Jing; He, Di; Zhang, Jian-ping; Wang, Na

    2015-04-01

    Field experimental data of winter wheat and parallel daily meteorological data at four typical stations in Chongqing City were used to calibrate and validate the APSIM-wheat model and determine the genetic parameters for 12 varieties of winter wheat. The results showed that there was a good agreement between the simulated and observed growth periods from sowing to emergence, flowering and maturity of wheat. Root mean squared errors (RMSEs) between simulated and observed emergence, flowering and maturity were 0-3, 1-8, and 0-8 d, respectively. Normalized root mean squared errors (NRMSEs) between simulated and observed above-ground biomass for the 12 study varieties were less than 30%. NRMSEs between simulated and observed yields were less than 30% for 10 of the 12 study varieties. The APSIM-wheat model performed well in simulating phenology, aboveground biomass and yield of winter wheat in Chongqing City, which could provide a foundational support for assessing the impact of climate change on wheat production in the study area based on the model.
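
    For reference, the two goodness-of-fit statistics quoted above have the standard definitions sketched below (RMSE, and RMSE normalised by the observed mean, expressed in percent); the biomass numbers in the example are made up.

        # Standard RMSE / NRMSE definitions used to evaluate calibrated crop models.
        import numpy as np

        def rmse(simulated, observed):
            simulated = np.asarray(simulated, float)
            observed = np.asarray(observed, float)
            return np.sqrt(np.mean((simulated - observed) ** 2))

        def nrmse(simulated, observed):
            """RMSE expressed as a percentage of the mean observed value."""
            return 100.0 * rmse(simulated, observed) / np.mean(observed)

        # Illustrative (made-up) above-ground biomass values in kg/ha:
        obs = [5200, 6100, 5800, 6400]
        sim = [5000, 6300, 5600, 6700]
        print(f"RMSE = {rmse(sim, obs):.0f} kg/ha, NRMSE = {nrmse(sim, obs):.1f}%")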

  20. A New Approach to Parallel Dynamic Partitioning for Adaptive Unstructured Meshes

    NASA Technical Reports Server (NTRS)

    Heber, Gerd; Biswas, Rupak; Gao, Guang R.

    1999-01-01

    Classical mesh partitioning algorithms were designed for rather static situations, and their straightforward application in a dynamical framework may lead to unsatisfactory results, e.g., excessive data migration among processors. Furthermore, special attention should be paid to their amenability to parallelization. In this paper, a novel parallel method for the dynamic partitioning of adaptive unstructured meshes is described. It is based on a linear representation of the mesh using self-avoiding walks.
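
    The sketch below illustrates only the final step of such an approach under simplifying assumptions: once the adaptive mesh has been linearised into a one-dimensional ordering (the paper uses self-avoiding walks for this; here any given ordering is assumed), the ordered elements are cut into contiguous chunks of roughly equal work, one per processor, so that re-partitioning after adaptation moves few elements. The element weights are hypothetical.

        # Cutting a linearised (1-D ordered) mesh into contiguous, load-balanced chunks.
        import numpy as np

        def partition_linearised(weights, n_parts):
            """Return, for each element in walk order, the index of its partition."""
            weights = np.asarray(weights, float)
            cumulative = np.cumsum(weights)
            targets = cumulative[-1] * (np.arange(1, n_parts + 1) / n_parts)
            part = np.searchsorted(targets, cumulative, side="left")
            return np.minimum(part, n_parts - 1)

        # 20 elements along the walk; refined elements carry more work:
        work = np.array([1, 1, 1, 4, 4, 4, 1, 1, 2, 2, 2, 1, 1, 1, 4, 4, 1, 1, 1, 1])
        print(partition_linearised(work, n_parts=4))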

  1. What is adaptive about adaptive decision making? A parallel constraint satisfaction account.

    PubMed

    Glöckner, Andreas; Hilbig, Benjamin E; Jekel, Marc

    2014-12-01

    There is broad consensus that human cognition is adaptive. However, the vital question of how exactly this adaptivity is achieved has remained largely open. Herein, we contrast two frameworks which account for adaptive decision making, namely broad and general single-mechanism accounts vs. multi-strategy accounts. We propose and fully specify a single-mechanism model for decision making based on parallel constraint satisfaction processes (PCS-DM) and contrast it theoretically and empirically against a multi-strategy account. To achieve sufficiently sensitive tests, we rely on a multiple-measure methodology including choice, reaction time, and confidence data as well as eye-tracking. Results show that manipulating the environmental structure produces clear adaptive shifts in choice patterns, as both frameworks would predict. However, results on the process level (reaction time, confidence), in information acquisition (eye-tracking), and from cross-predicting choice consistently corroborate single-mechanism accounts in general, and the proposed parallel constraint satisfaction model for decision making in particular. Copyright © 2014 Elsevier B.V. All rights reserved.
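
    A generic parallel constraint satisfaction network of the kind PCS-DM builds on can be sketched as an iterative spreading-activation update that runs until the network settles, after which the option node with the higher activation is chosen. The weights, decay and update rule below are illustrative stand-ins, not the published PCS-DM parameterisation.

        # Toy parallel constraint satisfaction network: settle, then read out the choice.
        import numpy as np

        def settle(w, a0, clamp=(0,), decay=0.05, floor=-1.0, ceil=1.0,
                   tol=1e-6, max_iter=10_000):
            a = np.asarray(a0, float).copy()
            fixed = list(clamp)
            for _ in range(max_iter):
                net = w @ a
                # positive net input pushes a node toward the ceiling, negative toward the floor
                step = np.where(net > 0, net * (ceil - a), net * (a - floor))
                a_new = np.clip(a * (1 - decay) + step, floor, ceil)
                a_new[fixed] = a[fixed]          # the source node stays clamped on
                if np.max(np.abs(a_new - a)) < tol:
                    return a_new
                a = a_new
            return a

        # Nodes: 0 = source (always on), 1-2 = cues, 3-4 = options A and B.
        w = np.zeros((5, 5))
        w[0, 1] = w[1, 0] = 0.8      # cue 1 is highly valid
        w[0, 2] = w[2, 0] = 0.3      # cue 2 is less valid
        w[1, 3] = w[3, 1] = 0.1      # cue 1 favours option A
        w[2, 4] = w[4, 2] = 0.1      # cue 2 favours option B
        w[3, 4] = w[4, 3] = -0.2     # the options inhibit each other
        a = settle(w, np.array([1.0, 0, 0, 0, 0]))
        print("choose", "A" if a[3] > a[4] else "B", "with activations", a[3:])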

  2. Simulation Exploration through Immersive Parallel Planes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brunhart-Lupo, Nicholas J; Bush, Brian W; Gruchalla, Kenny M

    We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest; a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used both to explore existing data and to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.

  3. Simulation Exploration through Immersive Parallel Planes: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brunhart-Lupo, Nicholas; Bush, Brian W.; Gruchalla, Kenny

    We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest; a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used both to explore existing data and to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.

  4. A path-level exact parallelization strategy for sequential simulation

    NASA Astrophysics Data System (ADS)

    Peredo, Oscar F.; Baeza, Daniel; Ortiz, Julián M.; Herrero, José R.

    2018-01-01

    Sequential Simulation is a well known method in geostatistical modelling. Following the Bayesian approach for simulation of conditionally dependent random events, Sequential Indicator Simulation (SIS) method draws simulated values for K categories (categorical case) or classes defined by K different thresholds (continuous case). Similarly, Sequential Gaussian Simulation (SGS) method draws simulated values from a multivariate Gaussian field. In this work, a path-level approach to parallelize SIS and SGS methods is presented. A first stage of re-arrangement of the simulation path is performed, followed by a second stage of parallel simulation for non-conflicting nodes. A key advantage of the proposed parallelization method is to generate identical realizations as with the original non-parallelized methods. Case studies are presented using two sequential simulation codes from GSLIB: SISIM and SGSIM. Execution time and speedup results are shown for large-scale domains, with many categories and maximum kriging neighbours in each case, achieving high speedup results in the best scenarios using 16 threads of execution in a single machine.
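
    The core idea can be illustrated with a simple dependency-levelling sketch: a path node may be drawn concurrently with other nodes as long as every earlier node of the path that falls inside its search neighbourhood has already been simulated, which keeps the conditioning sets, and therefore the realization, identical to the serial run. The radius test below is a crude stand-in for the real kriging-neighbourhood search, and the actual path re-arrangement strategy of the paper is more elaborate.

        # Hypothetical sketch: group path nodes into levels that can run in parallel.
        import numpy as np

        def schedule_levels(path_coords, radius):
            """Level-k nodes depend only on nodes of levels < k, so all nodes of a
            level can be simulated concurrently while conditioning on exactly the
            same earlier-path nodes as in the serial run."""
            coords = np.asarray(path_coords, float)
            level = np.zeros(len(coords), dtype=int)
            for i in range(len(coords)):
                for j in range(i):                 # only earlier nodes in the path matter
                    if np.linalg.norm(coords[i] - coords[j]) <= radius:
                        level[i] = max(level[i], level[j] + 1)
            return level

        rng = np.random.default_rng(2)
        path = rng.uniform(0, 100, size=(200, 2))   # a random 2-D simulation path
        levels = schedule_levels(path, radius=5.0)
        for k in range(levels.max() + 1):
            print(f"level {k}: {(levels == k).sum()} nodes simulated in parallel")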

  5. Regional-scale yield simulations using crop and climate models: assessing uncertainties, sensitivity to temperature and adaptation options

    NASA Astrophysics Data System (ADS)

    Challinor, A. J.

    2010-12-01

    Recent progress in assessing the impacts of climate variability and change on crops using multiple regional-scale simulations of crop and climate (i.e. ensembles) is presented. Simulations for India and China used perturbed responses to elevated carbon dioxide constrained using observations from FACE studies and controlled environments. Simulations with crop parameter sets representing existing and potential future adapted varieties were also carried out. The results for India are compared to sensitivity tests on two other crop models. For China, a parallel approach used socio-economic data to account for autonomous farmer adaptation. Results for the USA analysed cardinal temperatures under a range of local warming scenarios for 2711 varieties of spring wheat. The results are as follows: 1. Quantifying and reducing uncertainty. The relative contribution of uncertainty in crop and climate simulation to the total uncertainty in projected yield changes is examined. The observational constraints from FACE and controlled environment studies are shown to be the likely critical factor in maintaining relatively low crop parameter uncertainty. Without these constraints, crop simulation uncertainty in a doubled CO2 environment would likely be greater than uncertainty in simulating climate. However, consensus across crop models in India varied across different biophysical processes. 2. The response of yield to changes in local mean temperature was examined and compared to that found in the literature. No consistent response to temperature change was found across studies. 3. Implications for adaptation. China. The simulations of spring wheat in China show the relative importance of tolerance to water and heat stress in avoiding future crop failures. The greatest potential for reducing the number of harvests less than one standard deviation below the baseline mean yield value comes from alleviating water stress; the greatest potential for reducing harvests less than two standard deviations below the mean comes from alleviation of heat stress. The socio-economic analysis suggests that adaptation is also possible through measures such as greater investment. India. The simulations of groundnut in India identified regions where heat stress will play an increasing role in limiting crop yields, and other regions where crops with greater thermal time requirement will be needed. The simulations were used, together with an observed dataset and a simple analysis of crop cardinal temperatures and thermal time, to estimate the potential for adaptation using existing cultivars. USA. Analysis of spring wheat in the USA showed that at +2oC of local warming, 87% of the 2711 varieties examined, and all of the five most common varieties, could be used to maintain the crop duration of the current climate (i.e. successful adaptation to mean warming). At +4o this fell to 54% of all varieties, and two of the top five. 4. Future research. The results, and the limitations of the study, suggest directions for research to link climate and crop models, socio-economic analyses and crop variety trial data in order to prioritise adaptation options such as capacity building, plant breeding and biotechnology.

  6. Final Technical Report for "Applied Mathematics Research: Simulation Based Optimization and Application to Electromagnetic Inverse Problems"

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Haber, Eldad

    2014-03-17

    The focus of the research was: developing adaptive meshes for the solution of Maxwell's equations; developing a parallel framework for time-dependent inverse Maxwell's equations; developing multilevel methods for optimization problems with inequality constraints; a new inversion code for inverse Maxwell's equations at the 0th frequency (DC resistivity); and a new inversion code for inverse Maxwell's equations in the low-frequency regime. Although the research concentrated on electromagnetic forward and inverse problems, the results of the research were applied to the problem of image registration.

  7. Numerical Analysis of Ginzburg-Landau Models for Superconductivity.

    NASA Astrophysics Data System (ADS)

    Coskun, Erhan

    Thin-film conventional as well as high-T_c superconductors of various geometric shapes placed under both uniform and variable-strength magnetic fields are studied using the universally accepted macroscopic Ginzburg-Landau model. A series of new theoretical results concerning the properties of the solution is presented using the semi-discrete time-dependent Ginzburg-Landau equations, a staggered grid setup and natural boundary conditions. Efficient serial algorithms, including a novel adaptive algorithm, are developed and successfully implemented for solving the governing highly nonlinear parabolic system of equations. The refinement technique used in the adaptive algorithm is based on a modified forward Euler method, which was also developed by us to ease the restriction on time step size for stability considerations. Stability and convergence properties of the forward and modified forward Euler schemes are studied. Numerical simulations of various recent physical experiments of technological importance, such as vortex motion and pinning, are performed. The numerical code for solving the time-dependent Ginzburg-Landau equations is parallelized using BlockComm-Chameleon and PCN. The parallel code was run on the distributed-memory multiprocessors Intel iPSC/860 and IBM SP1 and on a cluster of Sun SPARC workstations, all located at the Mathematics and Computer Science Division, Argonne National Laboratory.

  8. Computation of free energy profiles with parallel adaptive dynamics

    NASA Astrophysics Data System (ADS)

    Lelièvre, Tony; Rousset, Mathias; Stoltz, Gabriel

    2007-04-01

    We propose a formulation of an adaptive computation of free energy differences, in the adaptive biasing force or nonequilibrium metadynamics spirit, using conditional distributions of samples of configurations which evolve in time. This allows us to present a truly unifying framework for these methods, and to prove convergence results for certain classes of algorithms. From a numerical viewpoint, a parallel implementation of these methods is very natural, the replicas interacting through the reconstructed free energy. We demonstrate how to improve this parallel implementation by resorting to some selection mechanism on the replicas. This is illustrated by computations on a model system of conformational changes.
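
    In the spirit of the parallel implementation discussed above, the toy sketch below runs several replicas of a one-dimensional double-well system that all deposit instantaneous force samples into a shared histogram along the reaction coordinate and all feel the bias derived from it; the resulting mean force is then integrated into a free-energy profile. The potential, time step and all other parameters are illustrative only and do not follow the paper's formulation in detail.

        # Toy adaptive-biasing-force run with replicas sharing the mean-force estimate.
        import numpy as np

        rng = np.random.default_rng(3)

        def grad_potential(x):                     # V(x) = (x**2 - 1)**2, a double well
            return 4.0 * x * (x**2 - 1.0)

        bins = np.linspace(-1.8, 1.8, 73)          # 72 bins along the reaction coordinate
        force_sum = np.zeros(len(bins) - 1)        # shared accumulators for all replicas
        count = np.zeros(len(bins) - 1)

        n_replicas, dt, beta, n_steps = 8, 1e-3, 3.0, 50_000
        x = rng.uniform(-1.5, 1.5, n_replicas)

        for _ in range(n_steps):
            f = -grad_potential(x)
            idx = np.clip(np.digitize(x, bins) - 1, 0, len(count) - 1)
            np.add.at(force_sum, idx, f)           # every replica deposits into the shared histogram
            np.add.at(count, idx, 1)
            bias = -force_sum[idx] / np.maximum(count[idx], 1)   # cancels the current mean-force estimate
            x = x + (f + bias) * dt + np.sqrt(2.0 * dt / beta) * rng.normal(size=n_replicas)
            x = np.clip(x, bins[0], bins[-1])      # crude walls at the ends of the range

        mean_force = force_sum / np.maximum(count, 1)
        centers = 0.5 * (bins[:-1] + bins[1:])
        free_energy = -np.cumsum(mean_force) * (bins[1] - bins[0])
        barrier = free_energy[np.argmin(np.abs(centers))] - free_energy.min()
        print(f"estimated central barrier ~ {barrier:.2f} (exact value 1.0 for this potential)")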

  9. Dual-thread parallel control strategy for ophthalmic adaptive optics.

    PubMed

    Yu, Yongxin; Zhang, Yuhua

    To improve ophthalmic adaptive optics speed and compensate for ocular wavefront aberration of high temporal frequency, the adaptive optics wavefront correction has been implemented with a control scheme including 2 parallel threads; one is dedicated to wavefront detection and the other conducts wavefront reconstruction and compensation. With a custom Shack-Hartmann wavefront sensor that measures the ocular wave aberration with 193 subapertures across the pupil, adaptive optics has achieved a closed loop updating frequency up to 110 Hz, and demonstrated robust compensation for ocular wave aberration up to 50 Hz in an adaptive optics scanning laser ophthalmoscope.

  10. Dual-thread parallel control strategy for ophthalmic adaptive optics

    PubMed Central

    Yu, Yongxin; Zhang, Yuhua

    2015-01-01

    To improve ophthalmic adaptive optics speed and compensate for ocular wavefront aberration of high temporal frequency, the adaptive optics wavefront correction has been implemented with a control scheme including 2 parallel threads; one is dedicated to wavefront detection and the other conducts wavefront reconstruction and compensation. With a custom Shack-Hartmann wavefront sensor that measures the ocular wave aberration with 193 subapertures across the pupil, adaptive optics has achieved a closed loop updating frequency up to 110 Hz, and demonstrated robust compensation for ocular wave aberration up to 50 Hz in an adaptive optics scanning laser ophthalmoscope. PMID:25866498
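
    Structurally, the scheme can be pictured as a producer/consumer pair, sketched below with Python threads and a queue: one thread stands in for wavefront detection and keeps producing slope vectors, while the other performs reconstruction and accumulates deformable-mirror commands. The "optics" here are random numbers; the 193 subapertures are taken from the abstract, while the actuator count, gain and frame count are invented, and only the two-thread structure mirrors the control strategy described.

        # Two-thread sketch: detection thread produces slopes, correction thread consumes them.
        import queue
        import threading

        import numpy as np

        rng = np.random.default_rng(4)
        slopes_q = queue.Queue(maxsize=4)
        n_subapertures, n_actuators, n_frames = 193, 97, 200
        reconstructor = rng.normal(size=(n_actuators, 2 * n_subapertures)) * 0.01
        dm_command = np.zeros(n_actuators)
        gain = 0.3

        def detection_thread():
            for _ in range(n_frames):
                slopes = rng.normal(size=2 * n_subapertures)   # fake Shack-Hartmann slopes
                slopes_q.put(slopes)
            slopes_q.put(None)                                 # sentinel: stop correction

        def correction_thread():
            global dm_command
            while True:
                slopes = slopes_q.get()
                if slopes is None:
                    break
                # integrator controller: accumulate the reconstructed correction
                dm_command = dm_command - gain * (reconstructor @ slopes)

        t1 = threading.Thread(target=detection_thread)
        t2 = threading.Thread(target=correction_thread)
        t1.start(); t2.start()
        t1.join(); t2.join()
        print("final RMS DM command:", np.sqrt(np.mean(dm_command**2)))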

  11. Predator-induced phenotypic plasticity of shape and behavior: parallel and unique patterns across sexes and species

    PubMed Central

    Kinnison, Michael T.

    2017-01-01

    Phenotypic plasticity is often an adaptation of organisms to cope with temporally or spatially heterogeneous landscapes. Like other adaptations, one would predict that different species, populations, or sexes might thus show some degree of parallel evolution of plasticity, in the form of parallel reaction norms, when exposed to analogous environmental gradients. Indeed, one might even expect parallelism of plasticity to repeatedly evolve in multiple traits responding to the same gradient, resulting in integrated parallelism of plasticity. In this study, we experimentally tested for parallel patterns of predator-mediated plasticity of size, shape, and behavior of 2 species and sexes of mosquitofish. Examination of behavioral trials indicated that the 2 species showed unique patterns of behavioral plasticity, whereas the 2 sexes in each species showed parallel responses. Fish shape showed parallel patterns of plasticity for both sexes and species, albeit males showed evidence of unique plasticity related to reproductive anatomy. Moreover, patterns of shape plasticity due to predator exposure were broadly parallel to what has been depicted for predator-mediated population divergence in other studies (slender bodies, expanded caudal regions, ventrally located eyes, and reduced male gonopodia). We did not find evidence of phenotypic plasticity in fish size for either species or sex. Hence, our findings support broadly integrated parallelism of plasticity for sexes within species and less integrated parallelism for species. We interpret these findings with respect to their potential broader implications for the interacting roles of adaptation and constraint in the evolutionary origins of parallelism of plasticity in general. PMID:29491997

  12. Trinity Phase 2 Open Science: CTH

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruggirello, Kevin Patrick; Vogler, Tracy

    CTH is an Eulerian hydrocode developed by Sandia National Laboratories (SNL) to solve a wide range of shock wave propagation and material deformation problems. Adaptive mesh refinement is also used to improve efficiency for problems with a wide range of spatial scales. The code has a history of running on a variety of computing platforms ranging from desktops to massively parallel distributed-data systems. For the Trinity Phase 2 Open Science campaign, CTH was used to study mesoscale simulations of the hypervelocity penetration of granular SiC powders. The simulations were compared to experimental data. A scaling study of CTH up to 8192 KNL nodes was also performed, and several improvements were made to the code to improve the scalability.

  13. Anticipation from sensation: using anticipating synchronization to stabilize a system with inherent sensory delay.

    PubMed

    Eberle, Henry; Nasuto, Slawomir J; Hayashi, Yoshikatsu

    2018-03-01

    We present a novel way of using a dynamical model for predictive tracking control that can adapt to a wide range of delays without parameter update. This is achieved by incorporating the paradigm of anticipating synchronization (AS), where a 'slave' system predicts a 'master' via delayed self-feedback. By treating the delayed output of the plant as one half of a 'sensory' AS coupling, the plant and an internal dynamical model can be synchronized such that the plant consistently leads the target's motion. We use two simulated robotic systems with differing arrangements of the plant and internal model ('parallel' and 'serial') to demonstrate that this form of control adapts to a wide range of delays without requiring the parameters of the controller to be changed.
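
    A minimal numerical sketch of the anticipating-synchronization coupling itself (not of the robotic controller in the paper) is given below: a "slave" copy of the dynamics is driven by the master's output plus negative delayed self-feedback, dy/dt = f(y) + k [x(t) - y(t - tau)], and for suitable gain and delay its output leads the master's. The van der Pol dynamics, gain, delay and step size are all illustrative choices, and the reported lead is simply whatever the cross-correlation of this particular run shows.

        # Toy anticipating synchronization: slave = same dynamics + k*(x_master - x_slave(t - tau)).
        import numpy as np

        def vdp(state, mu=1.0):
            """Van der Pol oscillator, used here as a stand-in for the plant dynamics."""
            x, v = state
            return np.array([v, mu * (1 - x**2) * v - x])

        dt, tau, k = 0.001, 0.1, 3.0
        delay_steps = int(tau / dt)
        n_steps = 60_000

        master = np.array([2.0, 0.0])
        slave = np.array([0.5, 0.0])
        slave_history = np.zeros(delay_steps)      # ring buffer of the slave's past output
        m_trace, s_trace = [], []

        for step in range(n_steps):
            delayed = slave_history[step % delay_steps]        # slave output tau ago
            coupling = k * (master[0] - delayed)
            master = master + dt * vdp(master)                 # the master evolves on its own
            slave = slave + dt * (vdp(slave) + np.array([coupling, 0.0]))
            slave_history[step % delay_steps] = slave[0]
            m_trace.append(master[0]); s_trace.append(slave[0])

        # Locate the cross-correlation peak; a positive lag means the slave leads.
        m = np.array(m_trace[-20_000:]); s = np.array(s_trace[-20_000:])
        lags = np.arange(-3 * delay_steps, 3 * delay_steps + 1)
        pad = 3 * delay_steps
        corr = [np.corrcoef(m[pad:-pad], np.roll(s, int(L))[pad:-pad])[0, 1] for L in lags]
        best = lags[int(np.argmax(corr))] * dt
        print(f"estimated slave lead: {best:.3f} s (feedback delay tau = {tau} s)")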

  14. A compositional reservoir simulator on distributed memory parallel computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rame, M.; Delshad, M.

    1995-12-31

    This paper presents the application of distributed-memory parallel computers to field-scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general-purpose, highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed-memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes the porting to new parallel platforms straightforward. Results of the distributed-memory computing performance of the parallel simulator are presented for field-scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
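
    The domain-decomposition pattern at the heart of such a simulator reduces to a ghost-cell (halo) exchange between neighbouring subdomains; the mpi4py sketch below does this for a periodic 1-D field, with a simple explicit diffusion update standing in for the chemical-flooding physics. Everything about the field and the update is hypothetical; only the exchange structure is the point.

        # Minimal halo exchange on a 1-D domain decomposition.
        # Run with e.g.:  mpiexec -n 4 python halo_demo.py
        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        left, right = (rank - 1) % size, (rank + 1) % size

        n_local = 50
        field = np.zeros(n_local + 2)            # interior cells plus 2 ghost cells
        if rank == 0:
            field[1:6] = 1.0                     # an initial tracer pulse on rank 0

        for _ in range(500):
            # shift right: send my last interior cell right, receive my left ghost
            field[0] = comm.sendrecv(field[-2], dest=right, source=left)
            # shift left: send my first interior cell left, receive my right ghost
            field[-1] = comm.sendrecv(field[1], dest=left, source=right)
            # local explicit diffusion step on interior cells only
            field[1:-1] += 0.25 * (field[:-2] - 2.0 * field[1:-1] + field[2:])

        total = comm.reduce(field[1:-1].sum(), op=MPI.SUM, root=0)
        if rank == 0:
            print("total tracer mass (should stay ~5.0):", total)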

  15. Freud: a software suite for high-throughput simulation analysis

    NASA Astrophysics Data System (ADS)

    Harper, Eric; Spellings, Matthew; Anderson, Joshua; Glotzer, Sharon

    Computer simulation is an indispensable tool for the study of a wide variety of systems. As simulations scale to fill petascale and exascale supercomputing clusters, so too does the size of the data produced, as well as the difficulty in analyzing these data. We present Freud, an analysis software suite for efficient analysis of simulation data. Freud makes no assumptions about the system being analyzed, allowing for general analysis methods to be applied to nearly any type of simulation. Freud includes standard analysis methods such as the radial distribution function, as well as new methods including the potential of mean force and torque and local crystal environment analysis. Freud combines a Python interface with fast, parallel C++ analysis routines to run efficiently on laptops, workstations, and supercomputing clusters. Data analysis on clusters reduces data transfer requirements, a prohibitive cost for petascale computing. Used in conjunction with simulation software, Freud allows for smart simulations that adapt to the current state of the system, enabling the study of phenomena such as nucleation and growth, intelligent investigation of phases and phase transitions, and determination of effective pair potentials.
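
    As a point of reference for the first analysis mentioned, the radial distribution function g(r) for particles in a cubic periodic box can be computed with the brute-force numpy sketch below; this is a generic textbook implementation, not the Freud API, and the random configuration is only there to show that g(r) is flat for an ideal gas.

        # Brute-force radial distribution function for a cubic periodic box.
        import numpy as np

        def radial_distribution(points, box_length, r_max, n_bins=100):
            n = len(points)
            # minimum-image pair separations
            deltas = points[:, None, :] - points[None, :, :]
            deltas -= box_length * np.round(deltas / box_length)
            dists = np.sqrt((deltas ** 2).sum(axis=-1))
            dists = dists[np.triu_indices(n, k=1)]            # unique pairs only

            edges = np.linspace(0, r_max, n_bins + 1)
            counts, _ = np.histogram(dists, bins=edges)
            shell_volumes = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
            density = n / box_length ** 3
            # normalise pair counts by the ideal-gas expectation
            g = counts / (0.5 * n * density * shell_volumes)
            return 0.5 * (edges[1:] + edges[:-1]), g

        rng = np.random.default_rng(5)
        pts = rng.uniform(0, 10.0, size=(500, 3))             # ideal-gas configuration
        r, g = radial_distribution(pts, box_length=10.0, r_max=4.0)
        print("g(r) should be ~1 for random points:", g[10:15].round(2))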

  16. Tile-based Level of Detail for the Parallel Age

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Niski, K; Cohen, J D

    Today's PCs incorporate multiple CPUs and GPUs and are easily arranged in clusters for high-performance, interactive graphics. We present an approach based on hierarchical, screen-space tiles to parallelizing rendering with level of detail. Adapt tiles, render tiles, and machine tiles are associated with CPUs, GPUs, and PCs, respectively, to efficiently parallelize the workload with good resource utilization. Adaptive tile sizes provide load balancing while our level of detail system allows total and independent management of the load on CPUs and GPUs. We demonstrate our approach on parallel configurations consisting of both single PCs and a cluster of PCs.

  17. Parallel STEPS: Large Scale Stochastic Spatial Reaction-Diffusion Simulation with High Performance Computers

    PubMed Central

    Chen, Weiliang; De Schutter, Erik

    2017-01-01

    Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation. PMID:28239346

  18. Parallel STEPS: Large Scale Stochastic Spatial Reaction-Diffusion Simulation with High Performance Computers.

    PubMed

    Chen, Weiliang; De Schutter, Erik

    2017-01-01

    Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation.
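
    The operator-splitting idea itself can be illustrated with a deliberately small serial toy: within each time step, stochastic diffusion of molecules between voxels and stochastic reactions inside voxels are handled as two separate sub-steps. The single-species decay model, hop probabilities and voxel count below are invented and stand in for the STEPS reaction-diffusion operators, not for its actual algorithm.

        # Toy operator splitting: diffusion sub-step, then reaction sub-step, each time step.
        import numpy as np

        rng = np.random.default_rng(6)

        n_voxels, dt, n_steps = 50, 0.01, 500
        d_jump = 0.2            # probability a molecule hops to a given neighbour per step
        k_decay = 0.5           # first-order decay rate constant for A -> 0

        counts = np.zeros(n_voxels, dtype=int)
        counts[n_voxels // 2] = 10_000          # initial bolus in the central voxel

        for _ in range(n_steps):
            # --- diffusion sub-step: each molecule hops left/right with prob d_jump ---
            to_left = rng.binomial(counts, d_jump)
            to_right = rng.binomial(counts - to_left, d_jump / (1 - d_jump))
            counts = counts - to_left - to_right
            counts[:-1] += to_left[1:]          # reflective boundaries: edge hops...
            counts[1:] += to_right[:-1]
            counts[0] += to_left[0]             # ...are bounced back into the domain
            counts[-1] += to_right[-1]
            # --- reaction sub-step: each A survives with probability exp(-k dt) ---
            counts = rng.binomial(counts, np.exp(-k_decay * dt))

        print("molecules remaining:", counts.sum(), "expected ~",
              int(10_000 * np.exp(-k_decay * dt * n_steps)))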

  19. AdiosStMan: Parallelizing Casacore Table Data System using Adaptive IO System

    NASA Astrophysics Data System (ADS)

    Wang, R.; Harris, C.; Wicenec, A.

    2016-07-01

    In this paper, we investigate the Casacore Table Data System (CTDS) used in the casacore and CASA libraries, and methods to parallelize it. CTDS provides a storage manager plugin mechanism for third-party developers to design and implement their own CTDS storage managers. Having this in mind, we looked into various storage backend techniques that can possibly enable parallel I/O for CTDS by implementing new storage managers. After carrying out benchmarks showing the excellent parallel I/O throughput of the Adaptive IO System (ADIOS), we implemented an ADIOS-based parallel CTDS storage manager. We then applied the CASA MSTransform frequency split task to verify the ADIOS Storage Manager. We also ran a series of performance tests to examine the I/O throughput in a massively parallel scenario.

  20. ColDICE: A parallel Vlasov–Poisson solver using moving adaptive simplicial tessellation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sousbie, Thierry, E-mail: tsousbie@gmail.com; Department of Physics, The University of Tokyo, Tokyo 113-0033; Research Center for the Early Universe, School of Science, The University of Tokyo, Tokyo 113-0033

    2016-09-15

    Resolving numerically Vlasov–Poisson equations for initially cold systems can be reduced to following the evolution of a three-dimensional sheet evolving in six-dimensional phase-space. We describe a public parallel numerical algorithm consisting in representing the phase-space sheet with a conforming, self-adaptive simplicial tessellation of which the vertices follow the Lagrangian equations of motion. The algorithm is implemented both in six- and four-dimensional phase-space. Refinement of the tessellation mesh is performed using the bisection method and a local representation of the phase-space sheet at second order relying on additional tracers created when needed at runtime. In order to preserve in the best way the Hamiltonian nature of the system, refinement is anisotropic and constrained by measurements of local Poincaré invariants. Resolution of the Poisson equation is performed using the fast Fourier method on a regular rectangular grid, similarly to particle-in-cell codes. To compute the density projected onto this grid, the intersection of the tessellation and the grid is calculated using the method of Franklin and Kankanhalli [65–67] generalised to linear order. As preliminary tests of the code, we study in four-dimensional phase-space the evolution of an initially small patch in a chaotic potential and the cosmological collapse of a fluctuation composed of two sinusoidal waves. We also perform a “warm” dark matter simulation in six-dimensional phase-space that we use to check the parallel scaling of the code.

  1. X-ray luminescence computed tomography imaging based on X-ray distribution model and adaptively split Bregman method

    PubMed Central

    Chen, Dongmei; Zhu, Shouping; Cao, Xu; Zhao, Fengjun; Liang, Jimin

    2015-01-01

    X-ray luminescence computed tomography (XLCT) has become a promising imaging technology for biological applications based on phosphor nanoparticles. There are mainly three kinds of XLCT imaging systems: pencil beam XLCT, narrow beam XLCT and cone beam XLCT. Narrow beam XLCT can be regarded as a balance between the pencil beam mode and the cone-beam mode in terms of imaging efficiency and image quality. The collimated X-ray beams are assumed to be parallel ones in the traditional narrow beam XLCT. However, we observe that the cone beam X-rays are collimated into X-ray beams with fan-shaped broadening instead of parallel ones in our prototype narrow beam XLCT. Hence we incorporate the distribution of the X-ray beams in the physical model and collect the optical data from only two perpendicular directions to further speed up the scanning time. Meanwhile we propose a depth-related adaptive regularized split Bregman (DARSB) method for reconstruction. The simulation experiments show that the proposed physical model and method can achieve better results in the location error, dice coefficient, mean square error and the intensity error than the traditional split Bregman method and validate the feasibility of the method. The phantom experiment obtained a location error of less than 1.1 mm and validates that the incorporation of fan-shaped X-ray beams in our model can achieve better results than the parallel X-rays. PMID:26203388
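
    The plain split Bregman iteration underlying such a reconstruction, written here for a generic L1-regularised least-squares problem min_x ||Ax - y||_2^2 + lam*||x||_1 and without the depth-related adaptive weighting of the proposed DARSB method, looks as follows; the forward matrix and sparse test signal are synthetic.

        # Textbook split Bregman iteration for L1-regularised least squares.
        import numpy as np

        def shrink(v, t):
            """Soft thresholding, the proximal operator of the L1 norm."""
            return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

        def split_bregman_l1(A, y, lam=0.1, mu=1.0, n_iter=200):
            n = A.shape[1]
            x = np.zeros(n)
            d = np.zeros(n)
            b = np.zeros(n)
            lhs = 2.0 * A.T @ A + mu * np.eye(n)       # constant system matrix
            for _ in range(n_iter):
                rhs = 2.0 * A.T @ y + mu * (d - b)
                x = np.linalg.solve(lhs, rhs)          # quadratic sub-problem
                d = shrink(x + b, lam / mu)            # L1 sub-problem in closed form
                b = b + x - d                          # Bregman variable update
            return x

        rng = np.random.default_rng(7)
        A = rng.normal(size=(60, 120))                 # under-determined forward model
        x_true = np.zeros(120); x_true[[5, 40, 90]] = [1.0, -0.7, 0.5]
        y = A @ x_true + 0.01 * rng.normal(size=60)
        x_rec = split_bregman_l1(A, y, lam=0.5, mu=1.0)
        print("largest recovered entries at indices:", np.argsort(np.abs(x_rec))[-3:])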

  2. Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Shuangshuang; Chen, Yousu; Wu, Di

    2015-12-09

    Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor-based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.

  3. Adaptive graph-based multiple testing procedures

    PubMed Central

    Klinglmueller, Florian; Posch, Martin; Koenig, Franz

    2016-01-01

    Multiple testing procedures defined by directed, weighted graphs have recently been proposed as an intuitive visual tool for constructing multiple testing strategies that reflect the often complex contextual relations between hypotheses in clinical trials. Many well-known sequentially rejective tests, such as (parallel) gatekeeping tests or hierarchical testing procedures, are special cases of the graph-based tests. We generalize these graph-based multiple testing procedures to adaptive trial designs with an interim analysis. These designs permit mid-trial design modifications based on unblinded interim data as well as external information, while providing strong familywise error rate control. To maintain the familywise error rate, it is not required to prespecify the adaptation rule in detail. Because the adaptive test does not require knowledge of the multivariate distribution of test statistics, it is applicable in a wide range of scenarios including trials with multiple treatment comparisons, endpoints or subgroups, or combinations thereof. Examples of adaptations are dropping of treatment arms, selection of subpopulations, and sample size reassessment. If, in the interim analysis, it is decided to continue the trial as planned, the adaptive test reduces to the originally planned multiple testing procedure. Only if adaptations are actually implemented does an adjusted test need to be applied. The procedure is illustrated with a case study and its operating characteristics are investigated by simulations. PMID:25319733
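
    The non-adaptive building block, the sequentially rejective graphical procedure, can be sketched compactly: each hypothesis carries a local significance level, and when a hypothesis is rejected its level is redistributed along the transition edges of the graph, after which the graph itself is updated. The weight and transition updates below follow the standard graphical-approach formulas; the four-hypothesis parallel-gatekeeping example graph and the p-values are invented.

        # Sequentially rejective graphical multiple testing procedure (sketch).
        import numpy as np

        def graph_test(p, w, g, alpha=0.025):
            p = np.asarray(p, float)
            w, g = np.array(w, float), np.array(g, float)
            active = set(range(len(p)))
            rejected = []
            while True:
                candidates = [i for i in active if w[i] > 0 and p[i] <= w[i] * alpha]
                if not candidates:
                    return rejected
                j = candidates[0]
                rejected.append(j)
                active.remove(j)
                # propagate the local weight of the rejected hypothesis along the graph
                new_w, new_g = w.copy(), np.zeros_like(g)
                for l in active:
                    new_w[l] = w[l] + w[j] * g[j, l]
                    for k in active:
                        if l != k:
                            denom = 1.0 - g[l, j] * g[j, l]
                            new_g[l, k] = ((g[l, k] + g[l, j] * g[j, k]) / denom
                                           if denom > 0 else 0.0)
                new_w[j] = 0.0
                w, g = new_w, new_g

        # Two primary and two secondary hypotheses in a simple parallel-gatekeeping graph.
        w0 = [0.5, 0.5, 0.0, 0.0]
        g0 = [[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0]]
        print("rejected hypotheses:", graph_test([0.01, 0.04, 0.012, 0.3], w0, g0))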

  4. Parallel tempering for the traveling salesman problem

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Percus, Allon; Wang, Richard; Hyman, Jeffrey

    We explore the potential of parallel tempering as a combinatorial optimization method, applying it to the traveling salesman problem. We compare simulation results of parallel tempering with a benchmark implementation of simulated annealing, and study how different choices of parameters affect the relative performance of the two methods. We find that a straightforward implementation of parallel tempering can outperform simulated annealing in several crucial respects. When parameters are chosen appropriately, both methods yield close approximations to the actual minimum distance for an instance with 200 nodes. However, parallel tempering yields more consistently accurate results when a series of independent simulations is performed. Our results suggest that parallel tempering might offer a simple but powerful alternative to simulated annealing for combinatorial optimization problems.
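
    A compact sketch of parallel tempering applied to the TSP is given below. The tour representation, 2-opt move set, temperature ladder and swap schedule are generic textbook choices, not the parameter settings studied in the record.

        # Sketch of parallel tempering for the TSP: one Metropolis 2-opt pass per
        # replica, then attempted swaps between neighbouring temperatures.
        import math, random

        def tour_length(tour, dist):
            return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

        def metropolis_pass(tour, dist, T):
            n = len(tour)
            for _ in range(n):
                i, j = sorted(random.sample(range(n), 2))
                new = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]   # 2-opt reversal
                dE = tour_length(new, dist) - tour_length(tour, dist)
                if dE <= 0 or random.random() < math.exp(-dE / T):
                    tour = new
            return tour

        def parallel_tempering(dist, temps, sweeps=200):
            n = len(dist)
            replicas = [random.sample(range(n), n) for _ in temps]
            for _ in range(sweeps):
                replicas = [metropolis_pass(t, dist, T) for t, T in zip(replicas, temps)]
                for k in range(len(temps) - 1):                        # replica exchange
                    e1, e2 = tour_length(replicas[k], dist), tour_length(replicas[k + 1], dist)
                    arg = (1 / temps[k] - 1 / temps[k + 1]) * (e1 - e2)
                    if arg >= 0 or random.random() < math.exp(arg):
                        replicas[k], replicas[k + 1] = replicas[k + 1], replicas[k]
            return min(replicas, key=lambda t: tour_length(t, dist))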

  5. GPU accelerated population annealing algorithm

    NASA Astrophysics Data System (ADS)

    Barash, Lev Yu.; Weigel, Martin; Borovský, Michal; Janke, Wolfhard; Shchur, Lev N.

    2017-11-01

    Population annealing is a promising recent approach for Monte Carlo simulations in statistical physics, in particular for the simulation of systems with complex free-energy landscapes. It is a hybrid method, combining importance sampling through Markov chains with elements of sequential Monte Carlo in the form of population control. While it appears to provide algorithmic capabilities for the simulation of such systems that are roughly comparable to those of more established approaches such as parallel tempering, it is intrinsically much more suitable for massively parallel computing. Here, we tap into this structural advantage and present a highly optimized implementation of the population annealing algorithm on GPUs that promises speed-ups of several orders of magnitude as compared to a serial implementation on CPUs. While the sample code is for simulations of the 2D ferromagnetic Ising model, it should be easily adapted for simulations of other spin models, including disordered systems.
    Program Files doi: http://dx.doi.org/10.17632/sgzt4b7b3m.1
    Licensing provisions: Creative Commons Attribution license (CC BY 4.0)
    Programming language: C, CUDA
    External routines/libraries: NVIDIA CUDA Toolkit 6.5 or newer
    Nature of problem: The program calculates the internal energy, specific heat, several magnetization moments, entropy and free energy of the 2D Ising model on square lattices of edge length L with periodic boundary conditions as a function of inverse temperature β.
    Solution method: The code uses population annealing, a hybrid method combining Markov chain updates with population control. The code is implemented for NVIDIA GPUs using the CUDA language and employs advanced techniques such as multi-spin coding, adaptive temperature steps and multi-histogram reweighting.
    Additional comments: Code repository at https://github.com/LevBarash/PAising. The system size and the size of the population of replicas are limited depending on the memory of the GPU device used. For the default parameter values used in the sample programs (L = 64, θ = 100, β0 = 0, βf = 1, Δβ = 0.005, R = 20 000), a typical run time on an NVIDIA Tesla K80 GPU is 151 seconds for the single spin coded (SSC) and 17 seconds for the multi-spin coded (MSC) program (see Section 2 for a description of these parameters).
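
    The population annealing loop itself is short enough to sketch. The version below is a serial Python toy for a 1D Ising chain, not the optimized CUDA code distributed with the paper; multinomial resampling with weights exp(−Δβ·E) is the standard textbook rule.

        # Sketch of population annealing: at each inverse-temperature step the
        # population is resampled with Boltzmann weights, then each replica
        # undergoes theta Metropolis sweeps. Toy 1D Ising chain, illustrative only.
        import math, random

        def energy(s):
            return -sum(s[i] * s[(i + 1) % len(s)] for i in range(len(s)))

        def sweep(s, beta):
            n = len(s)
            for i in range(n):
                dE = 2 * s[i] * (s[(i - 1) % n] + s[(i + 1) % n])
                if dE <= 0 or random.random() < math.exp(-beta * dE):
                    s[i] = -s[i]

        def population_annealing(n=32, R=200, dbeta=0.05, beta_f=1.0, theta=5):
            pop = [[random.choice((-1, 1)) for _ in range(n)] for _ in range(R)]
            beta = 0.0
            while beta < beta_f:
                beta += dbeta
                weights = [math.exp(-dbeta * energy(s)) for s in pop]           # resampling weights
                pop = [list(random.choices(pop, weights=weights, k=1)[0]) for _ in range(R)]
                for s in pop:
                    for _ in range(theta):
                        sweep(s, beta)
            return sum(energy(s) for s in pop) / R   # population-averaged energy at beta_f

        print(population_annealing())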

  6. The Importance of Considering Differences in Study Design in Network Meta-analysis: An Application Using Anti-Tumor Necrosis Factor Drugs for Ulcerative Colitis.

    PubMed

    Cameron, Chris; Ewara, Emmanuel; Wilson, Florence R; Varu, Abhishek; Dyrda, Peter; Hutton, Brian; Ingham, Michael

    2017-11-01

    Adaptive trial designs present a methodological challenge when performing network meta-analysis (NMA), as data from such adaptive trial designs differ from conventional parallel design randomized controlled trials (RCTs). We aim to illustrate the importance of considering study design when conducting an NMA. Three NMAs comparing anti-tumor necrosis factor drugs for ulcerative colitis were compared and the analyses replicated using Bayesian NMA. The NMA comprised 3 RCTs comparing 4 treatments (adalimumab 40 mg, golimumab 50 mg, golimumab 100 mg, infliximab 5 mg/kg) and placebo. We investigated the impact of incorporating differences in study design among the 3 RCTs and presented 3 alternative methods for converting outcome data derived from one form of adaptive design to a format consistent with more conventional parallel RCTs. Combining RCT results without considering variations in study design resulted in effect estimates that were biased against golimumab. In contrast, using the 3 alternative methods to convert outcome data from one form of adaptive design to a format more consistent with conventional parallel RCTs facilitated more transparent consideration of differences in study design. This approach is more likely to yield appropriate estimates of comparative efficacy when conducting an NMA that includes treatments evaluated with an alternative study design. RCTs based on adaptive study designs should not be combined with traditional parallel RCT designs in NMA. We have presented potential approaches to convert data from one form of adaptive design to more conventional parallel RCTs to facilitate transparent and less-biased comparisons.

  7. ''A Parallel Adaptive Simulation Tool for Two Phase Steady State Reacting Flows in Industrial Boilers and Furnaces''

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Michael J. Bockelie

    2002-01-04

    This DOE SBIR Phase II final report summarizes research that has been performed to develop a parallel adaptive tool for modeling steady, two-phase turbulent reacting flow. The target applications for the new tool are full-scale, fossil-fuel fired boilers and furnaces such as those used in the electric utility industry, chemical process industry and mineral/metal process industry. The types of analyses to be performed on these systems are engineering calculations to evaluate the impact on overall furnace performance due to operational, process or equipment changes. To develop a Computational Fluid Dynamics (CFD) model of an industrial scale furnace requires a carefully designed grid that will capture all of the large and small scale features of the flowfield. Industrial systems are quite large, usually measured in tens of feet, but contain numerous burners, air injection ports, flames and localized behavior with dimensions that are measured in inches or fractions of inches. To create an accurate computational model of such systems requires capturing length scales within the flow field that span several orders of magnitude. In addition, to create an industrially useful model, the grid cannot contain too many grid points: the model must be able to execute on an inexpensive desktop PC in a matter of days. An adaptive mesh provides a convenient means to create a grid that can capture fine flow field detail within a very large domain using a "reasonable" number of grid points. However, the use of an adaptive mesh requires the development of a new flow solver. To create the new simulation tool, we combined existing reacting CFD modeling software with new software based on emerging block structured Adaptive Mesh Refinement (AMR) technologies developed at Lawrence Berkeley National Laboratory (LBNL). Specifically, we combined: (1) physical models, modeling expertise, and software from existing combustion simulation codes used by Reaction Engineering International; (2) mesh adaptation, data management, and parallelization software and technology being developed by users of the BoxLib library at LBNL; and (3) solution methods for problems formulated on block structured grids that were being developed in collaboration with technical staff members at the University of Utah Center for High Performance Computing (CHPC) and at LBNL. The combustion modeling software used by Reaction Engineering International represents an investment of over fifty man-years of development, conducted over a period of twenty years. Thus, it was impractical to achieve our objective by starting from scratch. The research program resulted in an adaptive grid, reacting CFD flow solver that can be used only on limited problems. In its current form the code is appropriate for use on academic problems with simplified geometries. The new solver is not sufficiently robust or sufficiently general to be used in a "production mode" for industrial applications. The principal difficulty lies with the multi-level solver technology. The use of multi-level solvers on adaptive grids with embedded boundaries is not yet a mature field, and there are many issues that remain to be resolved. From the lessons learned in this SBIR program, we have started work on a new flow solver with an AMR capability. The new code is based on a conventional cell-by-cell mesh refinement strategy used in unstructured grid solvers that employ hexahedral cells. The new solver employs several of the concepts and solution strategies developed within this research program. The formulation of the composite grid problem for the new solver has been designed to avoid the embedded boundary complications encountered in this SBIR project. This follow-on effort will result in a reacting flow CFD solver with localized mesh capability that can be used to perform engineering calculations on industrial problems in a production mode.

  8. The cerebellum in action: a simulation and robotics study.

    PubMed

    Hofstötter, Constanze; Mintz, Matti; Verschure, Paul F M J

    2002-10-01

    The control or prediction of the precise timing of events is a central aspect of many of the tasks assigned to the cerebellum. Despite much detailed knowledge of its physiology and anatomy, it remains unclear how the cerebellar circuitry can achieve such an adaptive timing function. We present a computational model pursuing this question for one extensively studied type of cerebellar-mediated learning: the classical conditioning of discrete motor responses. This model combines multiple current assumptions on the function of the cerebellar circuitry and was used to investigate whether plasticity in the cerebellar cortex alone can mediate adaptive conditioned response timing. In particular, we studied the effect of changes in the strength of the synapses formed between parallel fibres and Purkinje cells under the control of a negative feedback loop formed between the inferior olive, cerebellar cortex and cerebellar deep nuclei. The learning performance of the model was evaluated at the circuit level in simulated conditioning experiments as well as at the behavioural level using a mobile robot. We demonstrate that the model supports adaptively timed responses under real-world conditions. Thus, in contrast to many other models that have focused on cerebellar-mediated conditioning, we investigated whether and how the suggested underlying mechanisms could give rise to behavioural phenomena.

  9. A new parallelization scheme for adaptive mesh refinement

    DOE PAGES

    Loffler, Frank; Cao, Zhoujian; Brandt, Steven R.; ...

    2016-05-06

    Here, we present a new method for parallelization of adaptive mesh refinement called Concurrent Structured Adaptive Mesh Refinement (CSAMR). This new method offers the lower computational cost (i.e. wall time × processor count) of subcycling in time, but with the runtime performance (i.e. smaller wall time) of evolving all levels at once using the time step of the finest level (which does more work than subcycling but has less parallelism). We demonstrate our algorithm's effectiveness using an adaptive mesh refinement code, AMSS-NCKU, and show performance on Blue Waters and other high performance clusters. For the class of problem considered in this paper, our algorithm achieves a speedup of 1.7-1.9 when the processor count for a given AMR run is doubled, consistent with our theoretical predictions.

  10. A new parallelization scheme for adaptive mesh refinement

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Loffler, Frank; Cao, Zhoujian; Brandt, Steven R.

    Here, we present a new method for parallelization of adaptive mesh refinement called Concurrent Structured Adaptive Mesh Refinement (CSAMR). This new method offers the lower computational cost (i.e. wall time × processor count) of subcycling in time, but with the runtime performance (i.e. smaller wall time) of evolving all levels at once using the time step of the finest level (which does more work than subcycling but has less parallelism). We demonstrate our algorithm's effectiveness using an adaptive mesh refinement code, AMSS-NCKU, and show performance on Blue Waters and other high performance clusters. For the class of problem considered in this paper, our algorithm achieves a speedup of 1.7-1.9 when the processor count for a given AMR run is doubled, consistent with our theoretical predictions.

  11. Advances in Patch-Based Adaptive Mesh Refinement Scalability

    DOE PAGES

    Gunney, Brian T.N.; Anderson, Robert W.

    2015-12-18

    Patch-based structured adaptive mesh refinement (SAMR) is widely used for high-resolution simulations. Combined with modern supercomputers, it could provide simulations of unprecedented size and resolution. A persistent challenge for this combination has been managing dynamically adaptive meshes on more and more MPI tasks. The distributed mesh management scheme in SAMRAI has made some progress on SAMR scalability, but early algorithms still had trouble scaling past the regime of 10^5 MPI tasks. This work provides two critical SAMR regridding algorithms, which are integrated into that scheme to ensure efficiency of the whole. The clustering algorithm is an extension of the tile-clustering approach, making it more flexible and efficient in both clustering and parallelism. The partitioner is a new algorithm designed to prevent the network congestion experienced by its predecessor. We evaluated performance using weak- and strong-scaling benchmarks designed to be difficult for dynamic adaptivity. Results show good scaling on up to 1.5M cores and 2M MPI tasks. Detailed timing diagnostics suggest scaling would continue well past that.

  12. Advances in Patch-Based Adaptive Mesh Refinement Scalability

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gunney, Brian T.N.; Anderson, Robert W.

    Patch-based structured adaptive mesh refinement (SAMR) is widely used for high-resolution simulations. Combined with modern supercomputers, it could provide simulations of unprecedented size and resolution. A persistent challenge for this combination has been managing dynamically adaptive meshes on more and more MPI tasks. The distributed mesh management scheme in SAMRAI has made some progress on SAMR scalability, but early algorithms still had trouble scaling past the regime of 10^5 MPI tasks. This work provides two critical SAMR regridding algorithms, which are integrated into that scheme to ensure efficiency of the whole. The clustering algorithm is an extension of the tile-clustering approach, making it more flexible and efficient in both clustering and parallelism. The partitioner is a new algorithm designed to prevent the network congestion experienced by its predecessor. We evaluated performance using weak- and strong-scaling benchmarks designed to be difficult for dynamic adaptivity. Results show good scaling on up to 1.5M cores and 2M MPI tasks. Detailed timing diagnostics suggest scaling would continue well past that.

  13. MPI-AMRVAC 2.0 for Solar and Astrophysical Applications

    NASA Astrophysics Data System (ADS)

    Xia, C.; Teunissen, J.; El Mellah, I.; Chané, E.; Keppens, R.

    2018-02-01

    We report on the development of MPI-AMRVAC version 2.0, which is an open-source framework for parallel, grid-adaptive simulations of hydrodynamic and magnetohydrodynamic (MHD) astrophysical applications. The framework now supports radial grid stretching in combination with adaptive mesh refinement (AMR). The advantages of this combined approach are demonstrated with one-dimensional, two-dimensional, and three-dimensional examples of spherically symmetric Bondi accretion, steady planar Bondi–Hoyle–Lyttleton flows, and wind accretion in supergiant X-ray binaries. Another improvement is support for the generic splitting of any background magnetic field. We present several tests relevant for solar physics applications to demonstrate the advantages of field splitting on accuracy and robustness in extremely low plasma-β environments: a static magnetic flux rope, a magnetic null-point, and magnetic reconnection in a current sheet with either uniform or anomalous resistivity. Our implementation for treating anisotropic thermal conduction in multi-dimensional MHD applications is also described, which generalizes the original slope-limited symmetric scheme from two to three dimensions. We perform ring diffusion tests that demonstrate its accuracy and robustness, and show that it prevents the unphysical thermal flux present in traditional schemes. The improved parallel scaling of the code is demonstrated with three-dimensional AMR simulations of solar coronal rain, which show satisfactory strong scaling up to 2000 cores. Other framework improvements are also reported: the modernization and reorganization into a library, the handling of automatic regression tests, the use of inline/online Doxygen documentation, and a new future-proof data format for input/output.

  14. Development of gradient descent adaptive algorithms to remove common mode artifact for improvement of cardiovascular signal quality.

    PubMed

    Ciaccio, Edward J; Micheli-Tzanakou, Evangelia

    2007-07-01

    Common-mode noise degrades cardiovascular signal quality and diminishes measurement accuracy. Filtering to remove noise components in the frequency domain often distorts the signal. Two adaptive noise canceling (ANC) algorithms were tested to adjust weighted reference signals for optimal subtraction from a primary signal. The update of the weight w was based upon the gradient term of the steepest-descent equation w(k+1) = w(k) − μ∇(ε²), where the error ε is the difference between the primary and weighted reference signals. ∇ was estimated from Δε² and Δw without using a variable Δw in the denominator, which can cause instability. The Parallel Comparison (PC) algorithm computed Δε² using fixed finite differences ±Δw in parallel at each discrete time k. The ALOPEX algorithm computed Δε² × Δw from time k to k + 1 to estimate ∇, with a random number added to account for Δε² · Δw → 0 near the optimal weighting. Using simulated data, both algorithms stably converged to the optimal weighting within 50-2000 discrete sample points k, even with an SNR of 1:8 and weights initialized far from the optimum. Using a sharply pulsatile cardiac electrogram signal with added noise (SNR = 1:5), both algorithms exhibited stable convergence within 100 ms (100 sample points). Fourier spectral analysis revealed minimal distortion when comparing the signal without added noise to the ANC-restored signal. ANC algorithms based upon difference calculations can rapidly and stably converge to the optimal weighting in simulated and real cardiovascular data. Signal quality is restored with minimal distortion, increasing the accuracy of biophysical measurement.
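
    One plausible reading of the Parallel Comparison update is sketched below; the step sizes, reference model and exact form of the finite-difference update are assumptions made for illustration, since the abstract gives only the general scheme.

        # Illustrative sketch of a Parallel Comparison style ANC update: at each
        # sample k, the squared error is evaluated at w + dw and w - dw in parallel,
        # and the weight moves opposite the finite difference (no division by dw).
        import math, random

        def anc_parallel_comparison(primary, reference, mu=0.1, dw=0.01, w0=0.0):
            w, cleaned = w0, []
            for p, r in zip(primary, reference):
                e_plus = (p - (w + dw) * r) ** 2
                e_minus = (p - (w - dw) * r) ** 2
                w -= mu * (e_plus - e_minus)        # gradient estimate from the two errors
                cleaned.append(p - w * r)
            return w, cleaned

        # Toy data: primary = signal + 0.8 * common-mode reference noise (assumed)
        noise = [random.gauss(0, 1) for _ in range(2000)]
        signal = [math.sin(0.02 * k) for k in range(2000)]
        primary = [s + 0.8 * n for s, n in zip(signal, noise)]
        w_est, _ = anc_parallel_comparison(primary, noise)
        print(round(w_est, 2))   # should approach 0.8, the true mixing weight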

  15. Biologically driven neural platform invoking parallel electrophoretic separation and urinary metabolite screening.

    PubMed

    Page, Tessa; Nguyen, Huong Thi Huynh; Hilts, Lindsey; Ramos, Lorena; Hanrahan, Grady

    2012-06-01

    This work reveals a computational framework for parallel electrophoretic separation of complex biological macromolecules and model urinary metabolites. More specifically, the implementation of a particle swarm optimization (PSO) algorithm on a neural network platform for multiparameter optimization of multiplexed 24-capillary electrophoresis technology with UV detection is highlighted. Two experimental systems were examined: (1) separation of purified rabbit metallothioneins and (2) separation of model toluene urinary metabolites and selected organic acids. Results proved superior to the use of neural networks employing standard back propagation when examining training error, fitting response, and predictive abilities. Simulation runs were obtained as a result of metaheuristic examination of the global search space with experimental responses in good agreement with predicted values. Full separation of selected analytes was realized after employing optimal model conditions. This framework provides guidance for the application of metaheuristic computational tools to aid in future studies involving parallel chemical separation and screening. Adaptable pseudo-code is provided to enable users of varied software packages and modeling framework to implement the PSO algorithm for their desired use.
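
    For readers unfamiliar with the optimizer, a bare-bones particle swarm optimization loop is sketched below; the objective, bounds and hyperparameters are placeholders rather than the neural-network training objective used in the study.

        # Minimal particle swarm optimization sketch: each particle tracks its own
        # best position, and all particles are attracted to the global best.
        import random

        def pso(objective, dim, bounds, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
            lo, hi = bounds
            pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
            vel = [[0.0] * dim for _ in range(n_particles)]
            pbest = [p[:] for p in pos]
            pbest_val = [objective(p) for p in pos]
            gbest = pbest[min(range(n_particles), key=lambda i: pbest_val[i])][:]
            for _ in range(iters):
                for i in range(n_particles):
                    for d in range(dim):
                        r1, r2 = random.random(), random.random()
                        vel[i][d] = (w * vel[i][d]
                                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                                     + c2 * r2 * (gbest[d] - pos[i][d]))
                        pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
                    val = objective(pos[i])
                    if val < pbest_val[i]:
                        pbest[i], pbest_val[i] = pos[i][:], val
                        if val < objective(gbest):
                            gbest = pos[i][:]
            return gbest

        # Example: minimize the sphere function (placeholder objective)
        print(pso(lambda x: sum(v * v for v in x), dim=3, bounds=(-5.0, 5.0)))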

  16. Parallel numerical modeling of hybrid-dimensional compositional non-isothermal Darcy flows in fractured porous media

    NASA Astrophysics Data System (ADS)

    Xing, F.; Masson, R.; Lopez, S.

    2017-09-01

    This paper introduces a new discrete fracture model accounting for non-isothermal compositional multiphase Darcy flows and complex networks of fractures with intersecting, immersed and non-immersed fractures. The so called hybrid-dimensional model using a 2D model in the fractures coupled with a 3D model in the matrix is first derived rigorously starting from the equi-dimensional matrix fracture model. Then, it is discretized using a fully implicit time integration combined with the Vertex Approximate Gradient (VAG) finite volume scheme which is adapted to polyhedral meshes and anisotropic heterogeneous media. The fully coupled systems are assembled and solved in parallel using the Single Program Multiple Data (SPMD) paradigm with one layer of ghost cells. This strategy allows for a local assembly of the discrete systems. An efficient preconditioner is implemented to solve the linear systems at each time step and each Newton type iteration of the simulation. The numerical efficiency of our approach is assessed on different meshes, fracture networks, and physical settings in terms of parallel scalability, nonlinear convergence and linear convergence.

  17. Visual information processing II; Proceedings of the Meeting, Orlando, FL, Apr. 14-16, 1993

    NASA Technical Reports Server (NTRS)

    Huck, Friedrich O. (Editor); Juday, Richard D. (Editor)

    1993-01-01

    Various papers on visual information processing are presented. Individual topics addressed include: aliasing as noise, satellite image processing using a hammering neural network, edge-detetion method using visual perception, adaptive vector median filters, design of a reading test for low-vision image warping, spatial transformation architectures, automatic image-enhancement method, redundancy reduction in image coding, lossless gray-scale image compression by predictive GDF, information efficiency in visual communication, optimizing JPEG quantization matrices for different applications, use of forward error correction to maintain image fidelity, effect of peanoscanning on image compression. Also discussed are: computer vision for autonomous robotics in space, optical processor for zero-crossing edge detection, fractal-based image edge detection, simulation of the neon spreading effect by bandpass filtering, wavelet transform (WT) on parallel SIMD architectures, nonseparable 2D wavelet image representation, adaptive image halftoning based on WT, wavelet analysis of global warming, use of the WT for signal detection, perfect reconstruction two-channel rational filter banks, N-wavelet coding for pattern classification, simulation of image of natural objects, number-theoretic coding for iconic systems.

  18. Improved artificial bee colony algorithm for wavefront sensor-less system in free space optical communication

    NASA Astrophysics Data System (ADS)

    Niu, Chaojun; Han, Xiang'e.

    2015-10-01

    Adaptive optics (AO) technology is an effective way to alleviate the effect of turbulence on free space optical communication (FSO). A new adaptive compensation method can be used without a wavefront sensor. The artificial bee colony (ABC) algorithm is a population-based heuristic evolutionary algorithm inspired by the intelligent foraging behaviour of the honeybee swarm, with the advantages of simplicity, a good convergence rate, robustness and few parameters to set. In this paper, we simulate the application of the improved ABC algorithm to correct the distorted wavefront and demonstrate its effectiveness. We then simulate the application of the ABC algorithm, the differential evolution (DE) algorithm and the stochastic parallel gradient descent (SPGD) algorithm to the FSO system and analyze their wavefront correction capabilities by comparing the coupling efficiency, the error rate and the intensity fluctuation in different turbulence conditions before and after correction. The results show that the ABC algorithm has a much faster correction speed than the DE algorithm and better correction ability for strong turbulence than the SPGD algorithm. Intensity fluctuation can be effectively reduced in strong turbulence, but less so in weak turbulence.
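
    Of the three optimizers compared, stochastic parallel gradient descent is the simplest to sketch; the toy metric and the gain and perturbation parameters below are assumptions, not the simulation settings of the paper.

        # Sketch of stochastic parallel gradient descent (SPGD), one of the
        # wavefront-sensorless optimizers compared in the record. The metric J(u)
        # stands in for a measured coupling efficiency; here it is a toy quadratic.
        import random

        def spgd(metric, n_modes, iters=500, gamma=0.5, delta=0.1):
            u = [0.0] * n_modes                      # control voltages / mode coefficients
            for _ in range(iters):
                du = [delta * random.choice((-1.0, 1.0)) for _ in range(n_modes)]
                j_plus = metric([ui + di for ui, di in zip(u, du)])
                j_minus = metric([ui - di for ui, di in zip(u, du)])
                dj = j_plus - j_minus
                # move along the perturbation, scaled by the measured metric change
                u = [ui + gamma * dj * di for ui, di in zip(u, du)]
            return u

        # Toy metric peaking at u = (1, -2, 0.5); SPGD climbs toward the peak
        target = [1.0, -2.0, 0.5]
        metric = lambda u: -sum((ui - ti) ** 2 for ui, ti in zip(u, target))
        print([round(v, 2) for v in spgd(metric, n_modes=3)])   # close to the target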

  19. Cart3D Simulations for the Second AIAA Sonic Boom Prediction Workshop

    NASA Technical Reports Server (NTRS)

    Anderson, George R.; Aftosmis, Michael J.; Nemec, Marian

    2017-01-01

    Simulation results are presented for all test cases prescribed in the Second AIAA Sonic Boom Prediction Workshop. For each of the four nearfield test cases, we compute pressure signatures at specified distances and off-track angles, using an inviscid, embedded-boundary Cartesian-mesh flow solver with output-based mesh adaptation. The cases range in complexity from an axisymmetric body to a full low-boom aircraft configuration with a powered nacelle. For efficiency, boom carpets are decomposed into sets of independent meshes and computed in parallel. This also facilitates the use of more effective meshing strategies - each off-track angle is computed on a mesh with good azimuthal alignment, higher aspect ratio cells, and more tailored adaptation. The nearfield signatures generally exhibit good convergence with mesh refinement. We introduce a local error estimation procedure to highlight regions of the signatures most sensitive to mesh refinement. Results are also presented for the two propagation test cases, which investigate the effects of atmospheric profiles on ground noise. Propagation is handled with an augmented Burgers' equation method (NASA's sBOOM), and ground noise metrics are computed with LCASB.

  20. Enhanced sampling simulations to construct free-energy landscape of protein-partner substrate interaction.

    PubMed

    Ikebe, Jinzen; Umezawa, Koji; Higo, Junichi

    2016-03-01

    Molecular dynamics (MD) simulations using all-atom and explicit solvent models provide valuable information on the detailed behavior of protein-partner substrate binding at the atomic level. As the power of computational resources increase, MD simulations are being used more widely and easily. However, it is still difficult to investigate the thermodynamic properties of protein-partner substrate binding and protein folding with conventional MD simulations. Enhanced sampling methods have been developed to sample conformations that reflect equilibrium conditions in a more efficient manner than conventional MD simulations, thereby allowing the construction of accurate free-energy landscapes. In this review, we discuss these enhanced sampling methods using a series of case-by-case examples. In particular, we review enhanced sampling methods conforming to trivial trajectory parallelization, virtual-system coupled multicanonical MD, and adaptive lambda square dynamics. These methods have been recently developed based on the existing method of multicanonical MD simulation. Their applications are reviewed with an emphasis on describing their practical implementation. In our concluding remarks we explore extensions of the enhanced sampling methods that may allow for even more efficient sampling.

  1. IOPA: I/O-aware parallelism adaption for parallel programs

    PubMed Central

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236
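
    The abstract does not specify the IOPA interface, so the sketch below only illustrates the general idea of adapting the number of I/O worker threads to measured throughput via simple hill climbing; all names and thresholds are invented for the example.

        # Hypothetical illustration of I/O-aware parallelism adaption: grow the I/O
        # thread pool while measured throughput keeps improving, stop otherwise.
        # This is NOT the IOPA API; names and thresholds are invented.
        import time
        from concurrent.futures import ThreadPoolExecutor

        def measure_throughput(read_chunk, chunks, n_threads):
            start = time.perf_counter()
            with ThreadPoolExecutor(max_workers=n_threads) as pool:
                total = sum(pool.map(read_chunk, chunks))       # read_chunk returns bytes read
            return total / (time.perf_counter() - start)        # bytes per second

        def adapt_threads(read_chunk, chunks, max_threads=16, gain_threshold=1.05):
            best_n, best_rate = 1, measure_throughput(read_chunk, chunks, 1)
            n = 2
            while n <= max_threads:
                rate = measure_throughput(read_chunk, chunks, n)
                if rate < gain_threshold * best_rate:            # no worthwhile gain: stop growing
                    break
                best_n, best_rate = n, rate
                n *= 2
            return best_n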

  2. IOPA: I/O-aware parallelism adaption for parallel programs.

    PubMed

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads.

  3. Repercussion of geometric and dynamic constraints on the 3D rendering quality in structurally adaptive multi-view shooting systems

    NASA Astrophysics Data System (ADS)

    Ali-Bey, Mohamed; Moughamir, Saïd; Manamanni, Noureddine

    2011-12-01

    In this paper a simulator of a multi-view shooting system with parallel optical axes and a structurally variable configuration is proposed. The considered system is dedicated to the production of 3D content for auto-stereoscopic visualization. The global shooting/viewing geometrical process, which is the kernel of this shooting system, is detailed, and the different viewing, transformation and capture parameters are defined. An appropriate perspective projection model is then derived to build the simulator. The simulator is first used to validate the global geometrical process in the case of a static configuration. Next, it is used to show the limitations of a static configuration of this type of shooting system for dynamic scenes, and a dynamic scheme is introduced to allow correct capture of such scenes. After that, the effect of the different geometrical capture parameters on the 3D rendering quality, and whether they need to be adapted, is studied. Finally, some dynamic effects and their repercussions on the 3D rendering quality of dynamic scenes are analyzed using error images and image quantization tools. Simulation and experimental results are presented throughout the paper to illustrate the different points studied, and conclusions and perspectives end the paper.

  4. Efficient Machine Learning Approach for Optimizing Scientific Computing Applications on Emerging HPC Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arumugam, Kamesh

    Efficient parallel implementation of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis is challenging. It requires exploiting the data parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of them employ irregular algorithms which exhibit data-dependent control-flow and irregular memory accesses. Furthermore, these applications are often iterative with dependencies between steps, making it hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged particle beam dynamics is one such application, where the distribution of work and the memory access pattern at each time step are irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between different processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications has focused on optimizing the irregular, data-dependent memory accesses and control-flow during a single step of the application independently of the other steps, with the assumption that these patterns are completely unpredictable. We observed that the structure of computation leading to control-flow divergence and irregular memory accesses in one step is similar to that in the next step. It is possible to predict this structure in the current step by observing the computation structure of previous steps. In this dissertation, we present novel machine learning based optimization techniques to address the parallel implementation challenges of such irregular applications on different HPC architectures. In particular, we use supervised learning to predict the computation structure and use it to address the control-flow and memory access irregularities in the parallel implementation of such applications on GPUs, Xeon Phis, and heterogeneous architectures composed of multi-core CPUs with GPUs or Xeon Phis. We use numerical simulation of charged particle beam dynamics as a motivating example throughout the dissertation to present our new approach, though the techniques should be equally applicable to a wide range of irregular applications. The machine learning approach presented here uses predictive analytics and forecasting techniques to adaptively model and track the irregular memory access pattern at each time step of the simulation in order to anticipate the future memory access pattern. Access pattern forecasts can then be used to formulate optimization decisions during application execution which improve the performance of the application at a future time step based on observations from earlier time steps. In heterogeneous architectures, forecasts can also be used to improve the memory performance and resource utilization of all the processing units to deliver good aggregate performance. We used these optimization techniques and anticipation strategy to design a cache-aware, memory-efficient parallel algorithm to address the irregularities in the parallel implementation of charged particle beam dynamics simulation on different HPC architectures. Experimental results using a diverse mix of HPC architectures show that our anticipation strategy is effective in maximizing data reuse, ensuring workload balance, minimizing branch and memory divergence, and improving resource utilization.

  5. Comparison of Procedures for Dual and Triple Closely Spaced Parallel Runways

    NASA Technical Reports Server (NTRS)

    Verma, Savita; Ballinger, Deborah; Subramanian, Shobana; Kozon, Thomas

    2012-01-01

    A human-in-the-loop high fidelity flight simulation experiment was conducted, which investigated and compared breakout procedures for Very Closely Spaced Parallel Approaches (VCSPA) with two and three runways. To understand the feasibility, usability and human factors of two and three runway VCSPA, data were collected and analyzed on the dependent variables of breakout cross track error and pilot workload. Independent variables included number of runways, cause of breakout and location of breakout. Results indicated larger cross track error and higher workload using three runways as compared to 2-runway operations. Significant interaction effects involving breakout cause and breakout location were also observed. Across all conditions, cross track error values showed high levels of breakout trajectory accuracy and pilot workload remained manageable. Results suggest possible avenues of future adaptation for adopting these procedures (e.g., pilot training), while also showing potential promise of the concept.

  6. On the suitability of the connection machine for direct particle simulation

    NASA Technical Reports Server (NTRS)

    Dagum, Leonard

    1990-01-01

    The algorithmic structure of the vectorizable Stanford particle simulation (SPS) method is examined, and the structure is reformulated in data-parallel form. Some of the SPS algorithms can be directly translated to data-parallel form, but several of the vectorizable algorithms have no direct data-parallel equivalent, which requires the development of new, strictly data-parallel algorithms. In particular, a new sorting algorithm is developed to identify collision candidates in the simulation, and a master/slave algorithm is developed to minimize communication cost in large table look-ups. Validation of the method is undertaken through test calculations for thermal relaxation of a gas, shock wave profiles, and shock reflection from a stationary wall. A qualitative measure is provided of the performance of the Connection Machine for direct particle simulation. The massively parallel architecture of the Connection Machine is found to be quite suitable for this type of calculation. However, there are difficulties in taking full advantage of this architecture because of the lack of a broad-based tradition of data-parallel programming. An important outcome of this work has been new data-parallel algorithms specifically of use for direct particle simulation but which also expand the data-parallel diction.

  7. Parallel multiscale simulations of a brain aneurysm

    PubMed Central

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

    2012-01-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work. PMID:23734066

  8. Parallel multiscale simulations of a brain aneurysm.

    PubMed

    Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.

  9. Parallel multiscale simulations of a brain aneurysm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em, E-mail: george_karniadakis@brown.edu

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.

  10. Large-scale three-dimensional phase-field simulations for phase coarsening at ultrahigh volume fraction on high-performance architectures

    NASA Astrophysics Data System (ADS)

    Yan, Hui; Wang, K. G.; Jones, Jim E.

    2016-06-01

    A parallel algorithm for large-scale three-dimensional phase-field simulations of phase coarsening is developed and implemented on high-performance architectures. From the large-scale simulations, a new kinetics of phase coarsening in the region of ultrahigh volume fraction is found. The parallel implementation is capable of harnessing the greater computer power available from high-performance architectures. The parallelized code enables an increase in the three-dimensional simulation system size up to a 512³ grid cube. Through the parallelized code, practical runtimes can be achieved for three-dimensional large-scale simulations, and the statistical significance of the results from these high-resolution parallel simulations is greatly improved over that obtainable from serial simulations. A detailed performance analysis of speed-up and scalability is presented, showing good scalability which improves with increasing problem size. In addition, a model for prediction of runtime is developed, which shows good agreement with the actual run times from numerical tests.

  11. Candidate genes and adaptive radiation: insights from transcriptional adaptation to the limnetic niche among coregonine fishes (Coregonus spp., Salmonidae).

    PubMed

    Jeukens, Julie; Bittner, David; Knudsen, Rune; Bernatchez, Louis

    2009-01-01

    In the past 40 years, there has been increasing acceptance that variation in levels of gene expression represents a major source of evolutionary novelty. Gene expression divergence is therefore likely to be involved in the emergence of incipient species, namely, in a context of adaptive radiation. In the lake whitefish species complex (Coregonus clupeaformis), previous microarray experiments have led to the identification of candidate genes potentially implicated in the parallel evolution of the limnetic dwarf lake whitefish, which is highly distinct from the benthic normal lake whitefish in life history, morphology, metabolism, and behavior, and yet diverged from it only approximately 15,000 years before present. The aim of the present study was to address transcriptional divergence for six candidate genes among lake whitefish and European whitefish (Coregonus lavaretus) species pairs, as well as lake cisco (Coregonus artedi) and vendace (Coregonus albula). The main goal was to test the hypothesis that parallel phenotypic adaptation toward the use of the limnetic niche in coregonine fishes is accompanied by parallelism in candidate gene transcription as measured by quantitative real-time polymerase chain reaction. Results obtained for three candidate genes, whereby parallelism in expression was observed across all whitefish species pairs, provide strong support for the hypothesis that divergent natural selection plays an important role in the adaptive radiation of whitefish species. However, this parallelism in expression did not extend to cisco and vendace, thereby infirming transcriptional convergence between limnetic whitefish species and their limnetic congeners for these genes. As recently proposed (Lynch 2007a. The evolution of genetic networks by non-adaptive processes. Nat Rev Genet. 8:803-813), these results may suggest that convergent phenotypic evolution can result from nonadaptive shaping of genome architecture in independently evolved coregonine lineages.

  12. Suppressing correlations in massively parallel simulations of lattice models

    NASA Astrophysics Data System (ADS)

    Kelling, Jeffrey; Ódor, Géza; Gemming, Sibylle

    2017-11-01

    For lattice Monte Carlo simulations parallelization is crucial to make studies of large systems and long simulation time feasible, while sequential simulations remain the gold-standard for correlation-free dynamics. Here, various domain decomposition schemes are compared, concluding with one which delivers virtually correlation-free simulations on GPUs. Extensive simulations of the octahedron model for 2 + 1 dimensional Kardar-Parisi-Zhang surface growth, which is very sensitive to correlation in the site-selection dynamics, were performed to show self-consistency of the parallel runs and agreement with the sequential algorithm. We present a GPU implementation providing a speedup of about 30 × over a parallel CPU implementation on a single socket and at least 180 × with respect to the sequential reference.

  13. A Unified Nonlinear Adaptive Approach for Detection and Isolation of Engine Faults

    NASA Technical Reports Server (NTRS)

    Tang, Liang; DeCastro, Jonathan A.; Zhang, Xiaodong; Farfan-Ramos, Luis; Simon, Donald L.

    2010-01-01

    A challenging problem in aircraft engine health management (EHM) system development is to detect and isolate faults in system components (i.e., compressor, turbine), actuators, and sensors. Existing nonlinear EHM methods often deal with component faults, actuator faults, and sensor faults separately, which may potentially lead to incorrect diagnostic decisions and unnecessary maintenance. Therefore, it would be ideal to address sensor faults, actuator faults, and component faults under one unified framework. This paper presents a systematic and unified nonlinear adaptive framework for detecting and isolating sensor faults, actuator faults, and component faults for aircraft engines. The fault detection and isolation (FDI) architecture consists of a parallel bank of nonlinear adaptive estimators. Adaptive thresholds are appropriately designed such that, in the presence of a particular fault, all components of the residual generated by the adaptive estimator corresponding to the actual fault type remain below their thresholds. If the faults are sufficiently different, then at least one component of the residual generated by each remaining adaptive estimator should exceed its threshold. Therefore, based on the specific response of the residuals, sensor faults, actuator faults, and component faults can be isolated. The effectiveness of the approach was evaluated using the NASA C-MAPSS turbofan engine model, and simulation results are presented.
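
    The isolation logic described above (a parallel bank of estimators with per-component adaptive thresholds) can be sketched generically; the residuals, thresholds and fault labels below are placeholders rather than the paper's C-MAPSS-based design.

        # Generic sketch of fault isolation with a bank of estimators: each estimator
        # is matched to one fault hypothesis; the fault whose estimator keeps every
        # residual component below its (adaptive) threshold is declared. Placeholder
        # residuals and thresholds only.
        def isolate_fault(residuals, thresholds):
            """residuals[name] and thresholds[name] are per-estimator component lists."""
            consistent = [name for name in residuals
                          if all(abs(r) <= t for r, t in zip(residuals[name], thresholds[name]))]
            if len(consistent) == 1:
                return consistent[0]          # unique hypothesis not invalidated -> isolated
            return None                       # zero or multiple candidates: no isolation yet

        residuals = {"sensor_T48": [0.2, 0.1], "actuator_VSV": [1.7, 0.4], "compressor": [2.3, 1.9]}
        thresholds = {name: [0.5, 0.5] for name in residuals}
        print(isolate_fault(residuals, thresholds))   # -> "sensor_T48"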

  14. Self-consistent field theory simulations of polymers on arbitrary domains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ouaknin, Gaddiel, E-mail: gaddielouaknin@umail.ucsb.edu; Laachi, Nabil; Delaney, Kris

    2016-12-15

    We introduce a framework for simulating the mesoscale self-assembly of block copolymers in arbitrary confined geometries subject to Neumann boundary conditions. We employ a hybrid finite difference/volume approach to discretize the mean-field equations on an irregular domain represented implicitly by a level-set function. The numerical treatment of the Neumann boundary conditions is sharp, i.e. it avoids an artificial smearing in the irregular domain boundary. This strategy enables the study of self-assembly in confined domains and enables the computation of physically meaningful quantities at the domain interface. In addition, we employ adaptive grids encoded with Quad-/Oc-trees in parallel to automatically refine the grid where the statistical fields vary rapidly as well as at the boundary of the confined domain. This approach results in a significant reduction in the number of degrees of freedom and makes the simulations in arbitrary domains using effective boundary conditions computationally efficient in terms of both speed and memory requirement. Finally, in the case of regular periodic domains, where pseudo-spectral approaches are superior to finite differences in terms of CPU time and accuracy, we use the adaptive strategy to store chain propagators, reducing the memory footprint without loss of accuracy in computed physical observables.

  15. An Immersed Boundary - Adaptive Mesh Refinement solver (IB-AMR) for high fidelity fully resolved wind turbine simulations

    NASA Astrophysics Data System (ADS)

    Angelidis, Dionysios; Sotiropoulos, Fotis

    2015-11-01

    The geometrical details of wind turbines determine the structure of the turbulence in the near and far wake and should be taken into account when performing high fidelity calculations. Multi-resolution simulations coupled with an immersed boundary method constitute a powerful framework for high-fidelity calculations past wind farms located over complex terrains. We develop a 3D Immersed-Boundary Adaptive Mesh Refinement flow solver (IB-AMR) which enables turbine-resolving LES of wind turbines. The idea of using a hybrid staggered/non-staggered grid layout adopted in the Curvilinear Immersed Boundary Method (CURVIB) has been successfully incorporated on unstructured meshes, and the fractional step method has been employed. The overall performance and robustness of the second order accurate, parallel, unstructured solver are evaluated by comparing the numerical simulations against conforming grid calculations and experimental measurements of laminar and turbulent flows over complex geometries. We also present turbine-resolving multi-scale LES considering all the details affecting the induced flow field, including the geometry of the tower, the nacelle and especially the rotor blades of a wind tunnel scale turbine. This material is based upon work supported by the Department of Energy under Award Number DE-EE0005482 and the Sandia National Laboratories.

  16. SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeff S.

    1992-01-01

    Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.

  17. Unstructured Adaptive (UA) NAS Parallel Benchmark. Version 1.0

    NASA Technical Reports Server (NTRS)

    Feng, Huiyu; VanderWijngaart, Rob; Biswas, Rupak; Mavriplis, Catherine

    2004-01-01

    We present a complete specification of a new benchmark for measuring the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. It complements the existing NAS Parallel Benchmark suite. The benchmark involves the solution of a stylized heat transfer problem in a cubic domain, discretized on an adaptively refined, unstructured mesh.

  18. Wavelet Transforms in Parallel Image Processing

    DTIC Science & Technology

    1994-01-27

    Only indexing fragments of this report are available. Its keywords include object segmentation, texture segmentation, image compression, image halftoning, neural networks, and 2D/3D parallel algorithms, and its contents cover vector quantization of wavelet transform coefficients and adaptive image halftoning based on wavelets, an application which makes use of the gray information at a pixel, including its gray value and gradient.

  19. Influence of equilibrium shear flow in the parallel magnetic direction on edge localized mode crash

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Luo, Y.; Xiong, Y. Y.; Chen, S. Y., E-mail: sychen531@163.com

    2016-04-15

    The influence of the parallel shear flow on the evolution of peeling-ballooning (P-B) modes is studied with the BOUT++ four-field code in this paper. The parallel shear flow has different effects in linear simulation and nonlinear simulation. In the linear simulations, the growth rate of edge localized mode (ELM) can be increased by Kelvin-Helmholtz term, which can be caused by the parallel shear flow. In the nonlinear simulations, the results accord with the linear simulations in the linear phase. However, the ELM size is reduced by the parallel shear flow in the beginning of the turbulence phase, which is recognized as the P-B filaments' structure. Then during the turbulence phase, the ELM size is decreased by the shear flow.

  20. The soliton transform and a possible application to nonlinear Alfven waves in space

    NASA Technical Reports Server (NTRS)

    Hada, T.; Hamilton, R. L.; Kennel, C. F.

    1993-01-01

    The inverse scattering transform (IST) based on the derivative nonlinear Schroedinger (DNLS) equation is applied to a complex time series of nonlinear Alfven wave data generated by numerical simulation. The IST describes the long-time evolution of quasi-parallel Alfven waves more efficiently than the Fourier transform, which is adapted to linear rather than nonlinear problems. When dissipation is added, so the conditions for the validity of the DNLS are not strictly satisfied, the IST continues to provide a compact description of the wavefield in terms of a small number of decaying envelope solitons.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Challabotla, Niranjan Reddy; Zhao, Lihao; Andersson, Helge I.

    The rotational motion of inertia-free spheroids has been studied in a numerically simulated turbulent channel flow. Although inertia-free spheroids were translated as tracers with the flow, neither the disk-like nor the rod-like particles adapted to the fluid rotation. The flattest disks preferentially aligned their symmetry axes normal to the wall, whereas the longest rods were parallel with the wall. The shape-dependence of the particle orientations carried over to the particle rotation such that the mean spin was reduced with increasing departure from sphericity. The streamwise spin fluctuations were enhanced due to asphericity, but substantially more for prolate than for oblate spheroids.

  2. Random number generators for large-scale parallel Monte Carlo simulations on FPGA

    NASA Astrophysics Data System (ADS)

    Lin, Y.; Wang, F.; Liu, B.

    2018-05-01

    Through parallelization, field programmable gate arrays (FPGAs) can achieve unprecedented speeds in large-scale parallel Monte Carlo (LPMC) simulations. FPGAs present both new constraints and new opportunities for the implementation of random number generators (RNGs), which are key elements of any Monte Carlo (MC) simulation system. Using empirical and application-based tests, this study evaluates all four RNGs used in previous FPGA-based MC studies, together with newly proposed FPGA implementations of two well-known high-quality RNGs suitable for LPMC studies on FPGA. One of the newly proposed FPGA implementations, a parallel version of the additive lagged Fibonacci generator (Parallel ALFG), is found to be the best among the evaluated RNGs in fulfilling the needs of LPMC simulations on FPGA.
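
    As a concrete illustration, the following Python sketch implements an additive lagged Fibonacci generator with the common lag pair (55, 24) and runs several independently seeded streams in software; the lags, seeding scheme and stream count are illustrative assumptions and do not reproduce the authors' FPGA design.

      # Software sketch of an additive lagged Fibonacci generator (ALFG):
      #   x[n] = (x[n-55] + x[n-24]) mod 2^32
      # Independently seeded instances stand in for parallel hardware streams.
      import random

      class ALFG:
          def __init__(self, seed, long_lag=55, short_lag=24, modulus=2**32):
              rng = random.Random(seed)                       # fill the lag table
              self.buf = [rng.randrange(modulus) for _ in range(long_lag)]
              self.buf[0] |= 1                                # keep at least one odd entry
              self.j, self.k, self.m = short_lag, long_lag, modulus
              self.pos = 0                                    # index of the oldest entry x[n-k]

          def next(self):
              new = (self.buf[self.pos] + self.buf[(self.pos + self.k - self.j) % self.k]) % self.m
              self.buf[self.pos] = new                        # overwrite x[n-k] with x[n]
              self.pos = (self.pos + 1) % self.k
              return new

      if __name__ == "__main__":
          streams = [ALFG(seed=1000 + i) for i in range(4)]   # four independent streams
          print([g.next() / 2**32 for g in streams])          # one uniform draw per stream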

  3. A sweep algorithm for massively parallel simulation of circuit-switched networks

    NASA Technical Reports Server (NTRS)

    Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.

    1992-01-01

    A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.

  4. Identification of Arbitrary Zonation in Groundwater Parameters using the Level Set Method and a Parallel Genetic Algorithm

    NASA Astrophysics Data System (ADS)

    Lei, H.; Lu, Z.; Vesselinov, V. V.; Ye, M.

    2017-12-01

    Simultaneous identification of both the zonation structure of aquifer heterogeneity and the hydrogeological parameters associated with these zones is challenging, especially for complex subsurface heterogeneity fields. In this study, a new approach based on the combination of the level set method and a parallel genetic algorithm is proposed. Starting with an initial guess for the zonation field (including both the zonation structure and the hydraulic properties of each zone), the level set method evolves the material interfaces through the inverse process such that the total residual between the simulated and observed state variables (hydraulic head) always decreases; this means that the inversion result depends on the initial guess field and that the minimization process might fail if it encounters a local minimum. To find the global minimum, the genetic algorithm (GA) is utilized to explore the parameters that define the initial guess fields, and the minimal total residual corresponding to each initial guess field is used as the fitness function value in the GA. Because the fitness function is expensive to evaluate, a parallel GA is adapted in combination with a simulated annealing algorithm. The new approach has been applied to several synthetic cases in both steady-state and transient flow fields, including a case with real flow conditions at the chromium contaminant site at the Los Alamos National Laboratory. The results show that this approach is capable of effectively identifying arbitrary zonation structures of aquifer heterogeneity and the hydrogeological parameters associated with these zones.

  5. Adaptive radial basis function mesh deformation using data reduction

    NASA Astrophysics Data System (ADS)

    Gillebaart, T.; Blom, D. S.; van Zuijlen, A. H.; Bijl, H.

    2016-09-01

    Radial Basis Function (RBF) mesh deformation is one of the most robust mesh deformation methods available. Using the greedy (data reduction) method in combination with an explicit boundary correction results in an efficient method, as shown in the literature. However, to ensure the method remains robust, two issues are addressed: 1) how to ensure that the set of control points remains an accurate representation of the geometry over time, and 2) how to use and automate the explicit boundary correction while ensuring high mesh quality. In this paper, we propose an adaptive RBF mesh deformation method which ensures that the set of control points always represents the geometry/displacement up to a user-specified criterion, by keeping track of the boundary error throughout the simulation and re-selecting control points when needed. Compared with the unit-displacement and prescribed-displacement selection methods, the adaptive method is more robust, user-independent and efficient for the cases considered. Secondly, the analysis of a single high-aspect-ratio cell is used to formulate an equation for the required correction radius, depending on the characteristics of the correction function used, the maximum aspect ratio, the minimum first cell height and the boundary error. Based on this analysis, two new radial basis correction functions are derived and proposed. The proposed automated procedure is verified while varying the correction function, the Reynolds number (and thus the first cell height and aspect ratio) and the boundary error. Finally, the parallel efficiency is studied for the two adaptive methods, unit displacement and prescribed displacement, for both the CPU and the memory formulations, using a 2D oscillating and translating airfoil with an oscillating flap, a 3D flexible locally deforming tube and a deforming wind turbine blade. Generally, the memory formulation requires less work (due to the large amount of work required for evaluating RBFs), but its parallel efficiency is reduced by the limited bandwidth available between CPU and memory. In terms of parallel efficiency/scaling the studied methods perform similarly, with the greedy algorithm being the bottleneck. In terms of absolute computational work the adaptive methods are better for the cases studied, due to their more efficient selection of control points. By automating most of the RBF mesh deformation, a robust, efficient and almost user-independent mesh deformation method is obtained.
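
    As a rough illustration of the greedy, error-driven selection of RBF control points described above, the following Python sketch adds boundary points where the interpolation error is largest until a user-specified tolerance is met; the Gaussian basis, the circular test boundary and the tolerance are illustrative assumptions rather than the authors' configuration.

      # Greedy (data-reduction) selection of RBF control points on a boundary:
      # keep adding the worst-fit boundary point until the interpolation error
      # of the prescribed displacements drops below a tolerance.
      import numpy as np

      def phi(r, c=0.5):
          return np.exp(-(r / c) ** 2)                       # Gaussian radial basis function

      def greedy_rbf(points, disp, tol=1e-3):
          """points: (N,2) boundary coordinates; disp: (N,2) prescribed displacements."""
          control = [0]                                      # arbitrary first control point
          while True:
              P = points[control]
              A = phi(np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1))
              A += 1e-12 * np.eye(len(control))              # tiny ridge for conditioning
              coef = np.linalg.solve(A, disp[control])       # RBF coefficients
              B = phi(np.linalg.norm(points[:, None, :] - P[None, :, :], axis=-1))
              err = np.linalg.norm(B @ coef - disp, axis=1)  # boundary error at every point
              worst = int(np.argmax(err))
              if err[worst] < tol or len(control) == len(points):
                  return control, coef
              control.append(worst)                          # refine where the error is largest

      if __name__ == "__main__":
          theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
          boundary = np.c_[np.cos(theta), np.sin(theta)]
          control, _ = greedy_rbf(boundary, 0.1 * boundary)  # radial expansion of the boundary
          print(len(control), "of", len(boundary), "boundary points kept as control points")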

  6. Parallel Evolution of Cold Tolerance within Drosophila melanogaster

    PubMed Central

    Braun, Dylan T.; Lack, Justin B.

    2017-01-01

    Drosophila melanogaster originated in tropical Africa before expanding into strikingly different temperate climates in Eurasia and beyond. Here, we find elevated cold tolerance in three distinct geographic regions: beyond the well-studied non-African case, we show that populations from the highlands of Ethiopia and South Africa have significantly increased cold tolerance as well. We observe greater cold tolerance in outbred versus inbred flies, but only in populations with higher inversion frequencies. Each cold-adapted population shows lower inversion frequencies than a closely-related warm-adapted population, suggesting that inversion frequencies may decrease with altitude in addition to latitude. Using the FST-based “Population Branch Excess” statistic (PBE), we found only limited evidence for parallel genetic differentiation at the scale of ∼4 kb windows, specifically between Ethiopian and South African cold-adapted populations. And yet, when we looked for single nucleotide polymorphisms (SNPs) with codirectional frequency change in two or three cold-adapted populations, strong genomic enrichments were observed from all comparisons. These findings could reflect an important role for selection on standing genetic variation leading to “soft sweeps”. One SNP showed sufficient codirectional frequency change in all cold-adapted populations to achieve experiment-wide significance: an intronic variant in the synaptic gene Prosap. Another codirectional outlier SNP, at senseless-2, had a strong association with our cold trait measurements, but in the opposite direction as predicted. More generally, proteins involved in neurotransmission were enriched as potential targets of parallel adaptation. The ability to study cold tolerance evolution in a parallel framework will enhance this classic study system for climate adaptation. PMID:27777283

  7. High-Performance High-Order Simulation of Wave and Plasma Phenomena

    NASA Astrophysics Data System (ADS)

    Klockner, Andreas

    This thesis presents results aiming to enhance and broaden the applicability of the discontinuous Galerkin ("DG") method in a variety of ways. DG was chosen as a foundation for this work because it yields high-order finite element discretizations with very favorable numerical properties for the treatment of hyperbolic conservation laws. In a first part, I examine progress that can be made on implementation aspects of DG. In adapting the method to mass-market massively parallel computation hardware in the form of graphics processors ("GPUs"), I obtain an increase in computation performance per unit of cost by more than an order of magnitude over conventional processor architectures. Key to this advance is a recipe that adapts DG to a variety of hardware through automated self-tuning. I discuss new parallel programming tools supporting GPU run-time code generation which are instrumental in the DG self-tuning process and contribute to its reaching application floating point throughput greater than 200 GFlops/s on a single GPU and greater than 3 TFlops/s on a 16-GPU cluster in simulations of electromagnetics problems in three dimensions. I further briefly discuss the solver infrastructure that makes this possible. In the second part of the thesis, I introduce a number of new numerical methods whose motivation is partly rooted in the opportunity created by GPU-DG: First, I construct and examine a novel GPU-capable shock detector, which, when used to control an artificial viscosity, helps stabilize DG computations in gas dynamics and a number of other fields. Second, I describe my pursuit of a method that allows the simulation of rarefied plasmas using a DG discretization of the electromagnetic field. Finally, I introduce new explicit multi-rate time integrators for ordinary differential equations with multiple time scales, with a focus on applicability to DG discretizations of time-dependent problems.

  8. Relation of Parallel Discrete Event Simulation algorithms with physical models

    NASA Astrophysics Data System (ADS)

    Shchur, L. N.; Shchur, L. V.

    2015-09-01

    We extend the concept of local simulation times in parallel discrete event simulation (PDES) in order to take into account the architecture of current hardware and software in high-performance computing. We briefly review previous research on the mapping of PDES onto physical problems, and emphasise how physical results may help to predict the behaviour of parallel algorithms.

  9. Parallel Simulation of Subsonic Fluid Dynamics on a Cluster of Workstations.

    DTIC Science & Technology

    1994-11-01

    Only indexing fragments of this report are available: the simulations model, for example, the flow of air inside wind musical instruments, and typical simulations achieve 80% parallel efficiency (speedup/processors) using 20 HP-Apollo workstations in a distributed-computing cluster.

  10. GPU-accelerated Red Blood Cells Simulations with Transport Dissipative Particle Dynamics.

    PubMed

    Blumers, Ansel L; Tang, Yu-Hang; Li, Zhen; Li, Xuejin; Karniadakis, George E

    2017-08-01

    Mesoscopic numerical simulations provide a unique approach for the quantification of the chemical influences on red blood cell functionalities. The transport Dissipative Particle Dynamics (tDPD) method can lead to such effective multiscale simulations due to its ability to simultaneously capture mesoscopic advection, diffusion, and reaction. In this paper, we present a GPU-accelerated red blood cell simulation package based on a tDPD adaptation of our red blood cell model, which can correctly recover the cell membrane viscosity, elasticity, bending stiffness, and cross-membrane chemical transport. The package essentially processes all computational workloads in parallel on the GPU, and it incorporates multi-stream scheduling and non-blocking MPI communications to improve inter-node scalability. Our code is validated for accuracy and compared against the CPU counterpart for speed. Strong scaling and weak scaling results are also presented to characterize scalability. We observe a speedup of 10.1 on one GPU over all 16 cores within a single node, and a weak scaling efficiency of 91% across 256 nodes. The program enables quick-turnaround and high-throughput numerical simulations for investigating chemical-driven red blood cell phenomena and disorders.

  11. Xyce parallel electronic simulator users guide, version 6.1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase, a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  12. Xyce parallel electronic simulator users' guide, Version 6.0.1.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase, a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  13. Xyce parallel electronic simulator users guide, version 6.0.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase, a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  14. Data parallel sorting for particle simulation

    NASA Technical Reports Server (NTRS)

    Dagum, Leonardo

    1992-01-01

    Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O(N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimum performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
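
    To make the integer-sorting requirement concrete, the following Python sketch shows the O(N) counting sort a sequential particle code can use to group particles by cell index; it only illustrates the baseline bound discussed above and does not reproduce the paper's data parallel merging algorithm.

      # Counting sort of particles by cell index: after sorting, particles that
      # share a cell occupy a contiguous range, which is what a particle
      # simulation needs for interaction processing.
      import numpy as np

      def sort_particles_by_cell(cell_index, n_cells):
          """Return a permutation placing particles in cell order, plus per-cell start offsets."""
          counts = np.bincount(cell_index, minlength=n_cells)   # particles per cell
          starts = np.concatenate(([0], np.cumsum(counts)))     # first slot of each cell
          order = np.empty(cell_index.size, dtype=np.int64)
          cursor = starts[:-1].copy()
          for p, c in enumerate(cell_index):                    # stable O(N) placement
              order[cursor[c]] = p
              cursor[c] += 1
          return order, starts

      if __name__ == "__main__":
          rng = np.random.default_rng(0)
          cells = rng.integers(0, 8, size=20)                   # cell index of each particle
          order, starts = sort_particles_by_cell(cells, 8)
          assert np.all(np.diff(cells[order]) >= 0)             # particles are now grouped by cell
          print(cells[order])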

  15. High performance ultrasonic field simulation on complex geometries

    NASA Astrophysics Data System (ADS)

    Chouh, H.; Rougeron, G.; Chatillon, S.; Iehl, J. C.; Farrugia, J. P.; Ostromoukhov, V.

    2016-02-01

    Ultrasonic field simulation is a key ingredient for the design of new testing methods as well as a crucial step for NDT inspection simulation. As presented in a previous paper [1], CEA-LIST has worked on the acceleration of these simulations focusing on simple geometries (planar interfaces, isotropic materials). In this context, significant accelerations were achieved on multicore processors and GPUs (Graphics Processing Units), bringing the execution time of realistic computations into the 0.1 s range. In this paper, we present recent work that aims at similar performance on a wider range of configurations. We adapted the physical model used by the CIVA platform to design and implement a new algorithm providing a fast ultrasonic field simulation that yields nearly interactive results for complex cases. The improvements over the CIVA pencil-tracing method include adaptive strategies for pencil subdivisions to achieve a good refinement of the sensor geometry while keeping a reasonable number of ray-tracing operations. Also, interpolation of the times of flight was used to avoid time-consuming computations in the impulse response reconstruction stage. To achieve the best performance, our algorithm runs on multi-core superscalar CPUs and uses high performance specialized libraries such as Intel Embree for ray-tracing, Intel MKL for signal processing and Intel TBB for parallelization. We validated the simulation results by comparing them to the ones produced by CIVA on identical test configurations including mono-element and multiple-element transducers, homogeneous, meshed 3D CAD specimens, isotropic and anisotropic materials and wave paths that can involve several interactions with interfaces. We show performance results on complete simulations that achieve computation times in the 1 s range.

  16. A new deadlock resolution protocol and message matching algorithm for the extreme-scale simulator

    DOE PAGES

    Engelmann, Christian; Naughton, III, Thomas J.

    2016-03-22

    Investigating the performance of parallel applications at scale on future high-performance computing (HPC) architectures and the performance impact of different HPC architecture choices is an important component of HPC hardware/software co-design. The Extreme-scale Simulator (xSim) is a simulation toolkit for investigating the performance of parallel applications at scale. xSim scales to millions of simulated Message Passing Interface (MPI) processes. The overhead introduced by a simulation tool is an important performance and productivity aspect. This paper documents two improvements to xSim: (1) a new deadlock resolution protocol to reduce the parallel discrete event simulation overhead and (2) a new simulated MPI message matching algorithm to reduce the oversubscription management overhead. The results clearly show a significant performance improvement. The simulation overhead for running the NAS Parallel Benchmark suite was reduced from 102% to 0% for the embarrassingly parallel (EP) benchmark and from 1,020% to 238% for the conjugate gradient (CG) benchmark. xSim offers a highly accurate simulation mode for better tracking of injected MPI process failures. Furthermore, with highly accurate simulation, the overhead was reduced from 3,332% to 204% for EP and from 37,511% to 13,808% for CG.

  17. A Systems Approach to Scalable Transportation Network Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perumalla, Kalyan S

    2006-01-01

    Emerging needs in transportation network modeling and simulation are raising new challenges with respect to scalability of network size and vehicular traffic intensity, speed of simulation for simulation-based optimization, and fidelity of vehicular behavior for accurate capture of event phenomena. Parallel execution is warranted to sustain the required detail, size and speed. However, few parallel simulators exist for such applications, partly due to the challenges underlying their development. Moreover, many simulators are based on time-stepped models, which can be computationally inefficient for the purposes of modeling evacuation traffic. Here an approach is presented to designing a simulator with memory and speed efficiency as the goals from the outset, and, specifically, scalability via parallel execution. The design makes use of discrete event modeling techniques as well as parallel simulation methods. Our simulator, called SCATTER, is being developed, incorporating such design considerations. Preliminary performance results are presented on benchmark road networks, showing scalability to one million vehicles simulated on one processor.

  18. ANNarchy: a code generation approach to neural simulations on parallel hardware

    PubMed Central

    Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows one to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  19. Xyce parallel electronic simulator : users' guide.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.

    2011-05-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; (2) improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; (3) device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase, a message-passing parallel implementation, which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

  20. A fast sorting algorithm for a hypersonic rarefied flow particle simulation on the connection machine

    NASA Technical Reports Server (NTRS)

    Dagum, Leonardo

    1989-01-01

    The data parallel implementation of a particle simulation for hypersonic rarefied flow described by Dagum associates a single parallel data element with each particle in the simulation. The simulated space is divided into discrete regions called cells containing a variable and constantly changing number of particles. The implementation requires a global sort of the parallel data elements so as to arrange them in an order that allows immediate access to the information associated with cells in the simulation. Described here is a very fast algorithm for performing the necessary ranking of the parallel data elements. The performance of the new algorithm is compared with that of the microcoded instruction for ranking on the Connection Machine.

  1. Symplectic molecular dynamics simulations on specially designed parallel computers.

    PubMed

    Borstnik, Urban; Janezic, Dusanka

    2005-01-01

    We have developed a computer program for molecular dynamics (MD) simulation that implements the Split Integration Symplectic Method (SISM) and is designed to run on specialized parallel computers. The MD integration is performed by the SISM, which analytically treats high-frequency vibrational motion and thus enables the use of longer simulation time steps. The low-frequency motion is treated numerically on specially designed parallel computers, which decreases the computational time of each simulation time step. The combination of these approaches means that less time is required and fewer steps are needed and so enables fast MD simulations. We study the computational performance of MD simulation of molecular systems on specialized computers and provide a comparison to standard personal computers. The combination of the SISM with two specialized parallel computers is an effective way to increase the speed of MD simulations up to 16-fold over a single PC processor.

  2. Parallel discrete-event simulation of FCFS stochastic queueing networks

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1988-01-01

    Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments), which has proven itself to be effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction of a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations; performance data are given that demonstrate the method's effectiveness under moderate to heavy loads; and performance tradeoffs between the quality of lookahead and the cost of computing lookahead are discussed.
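
    The following Python sketch illustrates the lookahead idea for a single FCFS server: because service times can be presampled and are bounded below, the logical process can promise ("make an appointment") that no departure will be sent before a known future time. The shifted-exponential service model and the class names are illustrative assumptions, not the paper's formulation.

      # Lookahead for an FCFS server: the next departure cannot occur before either
      # the known completion time of the job in service or, if the server is idle,
      # the current time plus the minimum possible service time.
      import random

      class FCFSServer:
          def __init__(self, seed, min_service=0.5, mean_service=1.0):
              self.rng = random.Random(seed)
              self.min_service = min_service
              self.mean_service = mean_service
              self.busy_until = 0.0                  # completion time of the job in service

          def presample_service(self):
              # Shifted exponential keeps every service time >= min_service.
              return self.min_service + self.rng.expovariate(1.0 / self.mean_service)

          def appointment(self, now):
              """Earliest simulation time at which this process can emit its next departure."""
              if self.busy_until > now:              # a job is in service: its departure time is known
                  return self.busy_until
              return now + self.min_service          # idle: any new arrival still needs min_service

      if __name__ == "__main__":
          lp = FCFSServer(seed=1)
          now = 10.0
          print("idle appointment:", lp.appointment(now))    # promise sent to downstream neighbors
          lp.busy_until = now + lp.presample_service()       # a job enters service
          print("busy appointment:", lp.appointment(now))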

  3. Multimillion atom simulations of dynamics of oxidation of an aluminum nanoparticle and nanoindentation on ceramics.

    PubMed

    Vashishta, Priya; Kalia, Rajiv K; Nakano, Aiichiro

    2006-03-02

    We have developed a first-principles-based hierarchical simulation framework, which seamlessly integrates (1) a quantum mechanical description based on the density functional theory (DFT), (2) multilevel molecular dynamics (MD) simulations based on a reactive force field (ReaxFF) that describes chemical reactions and polarization, a nonreactive force field that employs dynamic atomic charges, and an effective force field (EFF), and (3) an atomistically informed continuum model to reach macroscopic length scales. For scalable hierarchical simulations, we have developed parallel linear-scaling algorithms for (1) DFT calculation based on a divide-and-conquer algorithm on adaptive multigrids, (2) chemically reactive MD based on a fast ReaxFF (F-ReaxFF) algorithm, and (3) EFF-MD based on a space-time multiresolution MD (MRMD) algorithm. On 1920 Intel Itanium2 processors, we have demonstrated 1.4 million atom (0.12 trillion grid points) DFT, 0.56 billion atom F-ReaxFF, and 18.9 billion atom MRMD calculations, with parallel efficiency as high as 0.953. Through the use of these algorithms, multimillion atom MD simulations have been performed to study the oxidation of an aluminum nanoparticle. Structural and dynamic correlations in the oxide region are calculated as well as the evolution of charges, surface oxide thickness, diffusivities of atoms, and local stresses. In the microcanonical ensemble, the oxidizing reaction becomes explosive in both molecular and atomic oxygen environments, due to the enormous energy release associated with Al-O bonding. In the canonical ensemble, an amorphous oxide layer of a thickness of approximately 40 angstroms is formed after 466 ps, in good agreement with experiments. Simulations have been performed to study nanoindentation on crystalline, amorphous, and nanocrystalline silicon nitride and silicon carbide. Simulation on nanocrystalline silicon carbide reveals unusual deformation mechanisms in brittle nanophase materials, due to coexistence of brittle grains and soft amorphous-like grain boundary phases. Simulations predict a crossover from intergranular continuous deformation to intragrain discrete deformation at a critical indentation depth.

  4. Synthetic consciousness: the distributed adaptive control perspective

    PubMed Central

    2016-01-01

    Understanding the nature of consciousness is one of the grand outstanding scientific challenges. The fundamental methodological problem is how phenomenal first person experience can be accounted for in a third person verifiable form, while the conceptual challenge is to both define its function and physical realization. The distributed adaptive control theory of consciousness (DACtoc) proposes answers to these three challenges. The methodological challenge is answered relative to the hard problem and DACtoc proposes that it can be addressed using a convergent synthetic methodology using the analysis of synthetic biologically grounded agents, or quale parsing. DACtoc hypothesizes that consciousness in both its primary and secondary forms serves the ability to deal with the hidden states of the world and emerged during the Cambrian period, affording stable multi-agent environments to emerge. The process of consciousness is an autonomous virtualization memory, which serializes and unifies the parallel and subconscious simulations of the hidden states of the world that are largely due to other agents and the self with the objective to extract norms. These norms are in turn projected as value onto the parallel simulation and control systems that are driving action. This functional hypothesis is mapped onto the brainstem, midbrain and the thalamo-cortical and cortico-cortical systems and analysed with respect to our understanding of deficits of consciousness. Subsequently, some of the implications and predictions of DACtoc are outlined, in particular, the prediction that normative bootstrapping of conscious agents is predicated on an intentionality prior. In the view advanced here, human consciousness constitutes the ultimate evolutionary transition by allowing agents to become autonomous with respect to their evolutionary priors leading to a post-biological Anthropocene. This article is part of the themed issue ‘The major synthetic evolutionary transitions’. PMID:27431526

  5. Synthetic consciousness: the distributed adaptive control perspective.

    PubMed

    Verschure, Paul F M J

    2016-08-19

    Understanding the nature of consciousness is one of the grand outstanding scientific challenges. The fundamental methodological problem is how phenomenal first person experience can be accounted for in a third person verifiable form, while the conceptual challenge is to both define its function and physical realization. The distributed adaptive control theory of consciousness (DACtoc) proposes answers to these three challenges. The methodological challenge is answered relative to the hard problem and DACtoc proposes that it can be addressed using a convergent synthetic methodology using the analysis of synthetic biologically grounded agents, or quale parsing. DACtoc hypothesizes that consciousness in both its primary and secondary forms serves the ability to deal with the hidden states of the world and emerged during the Cambrian period, affording stable multi-agent environments to emerge. The process of consciousness is an autonomous virtualization memory, which serializes and unifies the parallel and subconscious simulations of the hidden states of the world that are largely due to other agents and the self with the objective to extract norms. These norms are in turn projected as value onto the parallel simulation and control systems that are driving action. This functional hypothesis is mapped onto the brainstem, midbrain and the thalamo-cortical and cortico-cortical systems and analysed with respect to our understanding of deficits of consciousness. Subsequently, some of the implications and predictions of DACtoc are outlined, in particular, the prediction that normative bootstrapping of conscious agents is predicated on an intentionality prior. In the view advanced here, human consciousness constitutes the ultimate evolutionary transition by allowing agents to become autonomous with respect to their evolutionary priors leading to a post-biological Anthropocene. This article is part of the themed issue 'The major synthetic evolutionary transitions'. © 2016 The Author(s).

  6. Progress in Unsteady Turbopump Flow Simulations

    NASA Technical Reports Server (NTRS)

    Kiris, Cetin C.; Chan, William; Kwak, Dochan; Williams, Robert

    2002-01-01

    This viewgraph presentation discusses unsteady flow simulations for a turbopump intended for a reusable launch vehicle (RLV). The simulation process makes use of computational grids and parallel processing. The architecture of the parallel computers used is discussed, as is the scripting of turbopump simulations.

  7. A derivation and scalable implementation of the synchronous parallel kinetic Monte Carlo method for simulating long-time dynamics

    NASA Astrophysics Data System (ADS)

    Byun, Hye Suk; El-Naggar, Mohamed Y.; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

    2017-10-01

    Kinetic Monte Carlo (KMC) simulations are used to study long-time dynamics of a wide variety of systems. Unfortunately, the conventional KMC algorithm is not scalable to larger systems, since its time scale is inversely proportional to the simulated system size. A promising approach to resolving this issue is the synchronous parallel KMC (SPKMC) algorithm, which makes the time scale size-independent. This paper introduces a formal derivation of the SPKMC algorithm based on local transition-state and time-dependent Hartree approximations, as well as its scalable parallel implementation based on a dual linked-list cell method. The resulting algorithm has achieved a weak-scaling parallel efficiency of 0.935 on 1024 Intel Xeon processors for simulating biological electron transfer dynamics in a 4.2 billion-heme system, as well as decent strong-scaling parallel efficiency. The parallel code has been used to simulate a lattice of cytochrome complexes on a bacterial-membrane nanowire, and it is broadly applicable to other problems such as computational synthesis of new materials.
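
    For orientation, the following Python sketch shows the basic sequential (rejection-free) KMC step that methods such as SPKMC parallelize: an event is selected with probability proportional to its rate and the clock advances by an exponentially distributed increment. The toy three-event rate list is an illustrative assumption; the local transition-state and time-dependent Hartree approximations and the dual linked-list cell method are not represented.

      # One rejection-free kinetic Monte Carlo step: pick an event with probability
      # proportional to its rate, then draw the exponential waiting time.
      import math
      import random

      def kmc_step(rates, rng):
          total = sum(rates)
          target = rng.random() * total
          acc, chosen = 0.0, len(rates) - 1
          for i, r in enumerate(rates):
              acc += r
              if target < acc:
                  chosen = i
                  break
          dt = -math.log(1.0 - rng.random()) / total      # exponential time increment
          return chosen, dt

      if __name__ == "__main__":
          rng = random.Random(42)
          rates = [0.2, 1.0, 0.05]                         # rates of three competing events
          t, counts = 0.0, [0, 0, 0]
          for _ in range(10000):
              i, dt = kmc_step(rates, rng)
              counts[i] += 1
              t += dt
          print("simulated time:", round(t, 1), "event counts:", counts)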

  8. A hybrid parallel framework for the cellular Potts model simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiang, Yi; He, Kejing; Dong, Shoubin

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximate, which means they cannot be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation (~10^8 sites) of the complex collective behavior of numerous cells (~10^6).

  9. Parallel discrete event simulation: A shared memory approach

    NASA Technical Reports Server (NTRS)

    Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

    1987-01-01

    With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.

  10. Parallel, adaptive finite element methods for conservation laws

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Devine, Karen D.; Flaherty, Joseph E.

    1994-01-01

    We construct parallel finite element methods for the solution of hyperbolic conservation laws in one and two dimensions. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. A posteriori estimates of spatial errors are obtained by a p-refinement technique using superconvergence at Radau points. The resulting method is of high order and may be parallelized efficiently on MIMD computers. We compare results using different limiting schemes and demonstrate parallel efficiency through computations on an NCUBE/2 hypercube. We also present results using adaptive h- and p-refinement to reduce the computational cost of the method.

  11. Multimodel Kalman filtering for adaptive nonuniformity correction in infrared sensors.

    PubMed

    Pezoa, Jorge E; Hayat, Majeed M; Torres, Sergio N; Rahman, Md Saifur

    2006-06-01

    We present an adaptive technique for the estimation of nonuniformity parameters of infrared focal-plane arrays that is robust with respect to changes and uncertainties in scene and sensor characteristics. The proposed algorithm is based on using a bank of Kalman filters in parallel. Each filter independently estimates state variables comprising the gain and the bias matrices of the sensor, according to its own dynamic-model parameters. The supervising component of the algorithm then generates the final estimates of the state variables by forming a weighted superposition of all the estimates rendered by each Kalman filter. The weights are computed and updated iteratively, according to the a posteriori-likelihood principle. The performance of the estimator and its ability to compensate for fixed-pattern noise is tested using both simulated and real data obtained from two cameras operating in the mid- and long-wave infrared regime.
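
    A minimal numerical sketch of the filter-bank idea is given below: several Kalman filters with different drift (process-noise) assumptions track the same scalar state, and their estimates are fused with weights updated from each filter's measurement likelihood. The scalar random-walk model, noise values and weighting details are illustrative assumptions and do not reproduce the per-pixel gain/bias estimator described above.

      # Bank of scalar Kalman filters fused by measurement likelihood.
      import math
      import numpy as np

      class ScalarKF:
          def __init__(self, q, r, x0=0.0, p0=1.0):
              self.q, self.r = q, r                  # process / measurement noise variances
              self.x, self.p = x0, p0

          def step(self, z):
              p_pred = self.p + self.q               # predict a random-walk state
              innov = z - self.x
              s = p_pred + self.r                    # innovation variance
              k = p_pred / s                         # Kalman gain
              self.x += k * innov
              self.p = (1.0 - k) * p_pred
              # Gaussian likelihood of the innovation, used as this filter's fusion weight.
              return math.exp(-0.5 * innov ** 2 / s) / math.sqrt(2.0 * math.pi * s)

      if __name__ == "__main__":
          rng = np.random.default_rng(0)
          true_bias, r = 2.0, 0.25
          bank = [ScalarKF(q=1e-4, r=r), ScalarKF(q=1e-1, r=r)]   # slow vs fast drift models
          weights = np.ones(len(bank)) / len(bank)
          for _ in range(200):
              z = true_bias + rng.normal(0.0, math.sqrt(r))       # noisy measurement
              weights *= np.array([kf.step(z) for kf in bank])
              weights /= weights.sum()                            # renormalize the weights
          fused = sum(w * kf.x for w, kf in zip(weights, bank))
          print("fused estimate:", round(fused, 3), "weights:", np.round(weights, 3))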

  12. The Center for Multiscale Plasma Dynamics, Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gombosi, Tamas I.

    The University of Michigan participated in the joint UCLA/Maryland fusion science center focused on plasma physics problems for which the traditional separation of the dynamics into microscale and macroscale processes breaks down. These processes involve large scale flows and magnetic fields tightly coupled to the small scale, kinetic dynamics of turbulence, particle acceleration and energy cascade. The interaction between these vastly disparate scales controls the evolution of the system. The enormous range of temporal and spatial scales associated with these problems renders direct simulation intractable even in computations that use the largest existing parallel computers. Our efforts focused on two main problems: the development of Hall MHD solvers on solution adaptive grids and the development of solution adaptive grids using generalized coordinates so that the proper geometry of inertial confinement can be taken into account and efficient refinement strategies can be obtained.

  13. Turbulence Resolving Flow Simulations of a Francis Turbine in Part Load using Highly Parallel CFD Simulations

    NASA Astrophysics Data System (ADS)

    Krappel, Timo; Riedelbauch, Stefan; Jester-Zuerker, Roland; Jung, Alexander; Flurl, Benedikt; Unger, Friedeman; Galpin, Paul

    2016-11-01

    The operation of Francis turbines in part load conditions causes high fluctuations and dynamic loads in the turbine and especially in the draft tube. At the hub of the runner outlet a rotating vortex rope within a low pressure zone arises and propagates into the draft tube cone. The investigated part load operating point is at about 72% of the best-efficiency discharge. To reduce the possible influence of boundary conditions on the solution, a flow simulation of a complete Francis turbine is conducted, consisting of spiral case, stay and guide vanes, runner and draft tube. As the flow has a strong swirling component for the chosen operating point, it is very challenging to accurately predict the flow and in particular the flow losses in the diffusor. The goal of this study is to reach a significantly better numerical prediction of this flow type. This is achieved by an improved resolution of small turbulent structures. Therefore, the Scale Adaptive Simulation SAS-SST turbulence model, a scale resolving turbulence model, is applied and compared to the widely used RANS-SST turbulence model. The largest mesh contains 300 million elements, which achieves LES-like resolution throughout much of the computational domain. The simulations are evaluated in terms of the hydraulic losses in the machine, the velocity field, pressure oscillations in the draft tube and visual comparisons of turbulent flow structures. A pre-release version of ANSYS CFX 17.0 is used in this paper, as this CFD solver offers parallel performance up to several thousand cores for this application, which includes a transient rotor-stator interface to support the relative motion between the runner and the stationary portions of the water turbine.

  14. Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Candel, A.; Kabel, A.; Lee, L.

    In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).

  15. Parallel Tempering of Dark Matter from the Ebola Virus Proteome: Comparison of CHARMM36m and CHARMM22 Force Fields with Implicit Solvent.

    PubMed

    Olson, Mark A

    2018-01-22

    Intrinsically disordered proteins are characterized by their large manifold of thermally accessible conformations and their related statistical weights, making them an interesting target of simulation studies. To assess the development of a computational framework for modeling this distinct class of proteins, this work examines temperature-based replica-exchange simulations to generate a conformational ensemble of a 28-residue peptide from the Ebola virus protein VP35. Starting from a prefolded helix-β-turn-helix topology observed in a crystallographic assembly, the simulation strategy tested is the recently refined CHARMM36m force field combined with a generalized Born solvent model. A comparison of two replica-exchange methods is provided, where one is a traditional approach with a fixed set of temperatures and the other is an adaptive scheme in which the thermal windows are allowed to move in temperature space. The assessment is further extended to include a comparison with equivalent CHARMM22 simulation data sets. The analysis finds CHARMM36m to shift the minimum in the potential of mean force (PMF) to a lower fractional helicity compared with CHARMM22, while the latter showed greater conformational plasticity along the helix-forming reaction coordinate. Among the simulation models, only the adaptive tempering method with CHARMM36m found an ensemble of conformational heterogeneity consisting of transitions between α-helix-β-hairpin folds and unstructured states that produced a PMF of fractional fold propensity in qualitative agreement with circular dichroism experiments reporting a disordered peptide.
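
    The exchange move at the heart of temperature replica exchange can be written down compactly; the Python sketch below runs a fixed temperature ladder on a toy one-dimensional double-well energy and swaps neighbouring replicas with the standard Metropolis criterion. The potential, ladder and move sizes are illustrative assumptions; neither the CHARMM36m/implicit-solvent setup nor the adaptive tempering variant compared in the paper is reproduced.

      # Temperature replica exchange (parallel tempering) on a 1D double well.
      import math
      import random

      def energy(x):
          return (x ** 2 - 1.0) ** 2                 # double well with minima at x = +/-1

      def metropolis_sweep(x, beta, rng, step=0.3, n=50):
          for _ in range(n):
              xt = x + rng.uniform(-step, step)
              if rng.random() < math.exp(min(0.0, -beta * (energy(xt) - energy(x)))):
                  x = xt
          return x

      if __name__ == "__main__":
          rng = random.Random(7)
          betas = [4.0, 2.0, 1.0, 0.5]               # inverse temperatures, coldest to hottest
          xs = [1.0 for _ in betas]
          accepted = 0
          for sweep in range(2000):
              xs = [metropolis_sweep(x, b, rng) for x, b in zip(xs, betas)]
              i = rng.randrange(len(betas) - 1)      # attempt one neighbour swap per sweep
              d = (betas[i] - betas[i + 1]) * (energy(xs[i]) - energy(xs[i + 1]))
              if rng.random() < math.exp(min(0.0, d)):   # accept with min(1, exp(d))
                  xs[i], xs[i + 1] = xs[i + 1], xs[i]
                  accepted += 1
          print("swap acceptance rate:", accepted / 2000)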

  16. Xyce Parallel Electronic Simulator Users' Guide Version 6.8

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase, a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  17. Development of a scalable generic platform for adaptive optics real time control

    NASA Astrophysics Data System (ADS)

    Surendran, Avinash; Burse, Mahesh P.; Ramaprakash, A. N.; Parihar, Padmakar

    2015-06-01

    The main objective of the present project is to explore the viability of an adaptive optics control system based exclusively on Field Programmable Gate Arrays (FPGAs), making strong use of their parallel processing capability. In an Adaptive Optics (AO) system, the generation of the Deformable Mirror (DM) control voltages from the Wavefront Sensor (WFS) measurements is usually through the multiplication of the wavefront slopes with a predetermined reconstructor matrix. The ability to access several hundred hard multipliers and memories concurrently in an FPGA allows performance far beyond that of a modern CPU or GPU for tasks with a well-defined structure such as adaptive optics control. The target of the current project is to generate a signal for real time wavefront correction from the signals coming from a wavefront sensor, with the system flexible enough to accommodate all current wavefront sensing techniques as well as the different methods used for wavefront compensation. The system should also accommodate different data transmission protocols (such as Ethernet, USB, IEEE 1394, etc.) for transmitting data to and from the FPGA device, thus providing a more flexible platform for adaptive optics control. Preliminary simulation results for the formulation of the platform, and a design of a fully scalable slope computer, are presented.
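
    The numerical core of the controller described above is a reconstructor matrix-vector multiply applied to the wavefront-sensor slopes. The Python sketch below shows that step inside a simple leaky-integrator control loop; the matrix sizes, gain and leak factor are illustrative assumptions, and the FPGA pipelining and I/O protocols are not represented.

      # One adaptive-optics control step: DM commands are updated from the
      # reconstructor matrix applied to the measured wavefront slopes.
      import numpy as np

      def control_step(slopes, reconstructor, commands, gain=0.4, leak=0.99):
          """Leaky integrator: commands <- leak * commands - gain * R @ slopes."""
          return leak * commands - gain * (reconstructor @ slopes)

      if __name__ == "__main__":
          n_slopes, n_actuators = 160, 97                           # stand-in WFS/DM dimensions
          rng = np.random.default_rng(1)
          R = 0.01 * rng.standard_normal((n_actuators, n_slopes))   # stand-in reconstructor matrix
          commands = np.zeros(n_actuators)
          for _ in range(10):                                       # a few closed-loop iterations
              slopes = rng.standard_normal(n_slopes)                # stand-in slope measurements
              commands = control_step(slopes, R, commands)
          print("command vector norm:", round(float(np.linalg.norm(commands)), 4))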

  18. A hybrid algorithm for parallel molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Mangiardi, Chris M.; Meyer, R.

    2017-10-01

    This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.

  19. Advances in locally constrained k-space-based parallel MRI.

    PubMed

    Samsonov, Alexey A; Block, Walter F; Arunachalam, Arjun; Field, Aaron S

    2006-02-01

    In this article, several theoretical and methodological developments regarding k-space-based, locally constrained parallel MRI (pMRI) reconstruction are presented. A connection between Parallel MRI with Adaptive Radius in k-Space (PARS) and GRAPPA methods is demonstrated. The analysis provides a basis for unified treatment of both methods. Additionally, a weighted PARS reconstruction is proposed, which may absorb different weighting strategies for improved image reconstruction. Next, a fast and efficient method for pMRI reconstruction of data sampled on non-Cartesian trajectories is described. In the new technique, the computational burden associated with the numerous matrix inversions in the original PARS method is drastically reduced by limiting direct calculation of reconstruction coefficients to only a few reference points. The rest of the coefficients are found by interpolating between the reference sets, which is possible due to the similar configuration of points participating in reconstruction for highly symmetric trajectories, such as radial and spirals. As a result, the time requirements are drastically reduced, which makes it practical to use pMRI with non-Cartesian trajectories in many applications. The new technique was demonstrated with simulated and actual data sampled on radial trajectories. Copyright 2006 Wiley-Liss, Inc.

  20. A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

    NASA Technical Reports Server (NTRS)

    Jones, Mark Howard

    1987-01-01

    A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
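
    A minimal sketch of the Metropolis acceptance rule that simulated-annealing placement of this kind relies on; the move generation and cost evaluation are left as hypothetical callbacks, since the hypercube-specific parallel cost evaluation is the subject of the paper and is not reproduced here.

```python
import math
import random

def accept_move(delta_cost, temperature):
    """Standard simulated-annealing acceptance: always accept improvements,
    accept a cost increase with probability exp(-delta_cost / T)."""
    if delta_cost <= 0:
        return True
    return random.random() < math.exp(-delta_cost / temperature)

def anneal(placement, propose_move, cost_delta, t_start=100.0, t_factor=0.95,
           n_temps=40, moves_per_temp=1000):
    """Serial skeleton: at each temperature level, propose cell exchanges or
    displacements and keep those that pass the acceptance test."""
    t = t_start
    for _ in range(n_temps):
        for _ in range(moves_per_temp):
            move = propose_move(placement)         # hypothetical move generator
            if accept_move(cost_delta(placement, move), t):
                placement = move.apply(placement)  # hypothetical move application
        t *= t_factor
    return placement
```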

  1. Methods of parallel computation applied on granular simulations

    NASA Astrophysics Data System (ADS)

    Martins, Gustavo H. B.; Atman, Allbens P. F.

    2017-06-01

    Parallel computing becomes cheaper and more accessible every year, and as a consequence its applications have spread across all research areas. Granular materials are a promising area for parallel computing. To support this statement, we study the impact of parallel computing on simulations of the Brazil Nut Effect (BNE), the remarkable rise of an intruder confined in a granular medium when it is vertically shaken against gravity. By means of Discrete Element Method (DEM) simulations, we study the code performance, testing different methods to improve the wall-clock time. A comparison between serial and parallel algorithms, using OpenMP®, is also shown. The best improvement was obtained by optimizing the function that finds contacts using Verlet's cells.
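
    A minimal 2-D cell-list sketch of the kind of contact search the abstract credits with the largest speedup (Verlet cells); the particle radius, array shapes, and names are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

def find_contacts(positions, radius):
    """Bin particles into square cells of edge ~ one particle diameter, then
    test only pairs coming from the same or neighboring cells instead of all
    N*(N-1)/2 pairs."""
    cell = 2.0 * radius
    grid = defaultdict(list)
    for idx, p in enumerate(positions):
        grid[(int(p[0] // cell), int(p[1] // cell))].append(idx)

    contacts = []
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), []):
                    for i in members:
                        if i < j and np.linalg.norm(positions[i] - positions[j]) < 2.0 * radius:
                            contacts.append((i, j))
    return sorted(contacts)
```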

  2. Turbomachinery CFD on parallel computers

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.; Milner, Edward J.; Quealy, Angela; Townsend, Scott E.

    1992-01-01

    The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized. Parallel turbomachinery CFD research at the NASA Lewis Research Center is then highlighted. These efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations.

  3. Load Balancing Unstructured Adaptive Grids for CFD Problems

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid

    1996-01-01

    Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. A dynamic load balancing method is presented that balances the workload across all processors with a global view. After each parallel tetrahedral mesh adaption, the method first determines if the new mesh is sufficiently unbalanced to warrant a repartitioning. If so, the adapted mesh is repartitioned, with new partitions assigned to processors so that the redistribution cost is minimized. The new partitions are accepted only if the remapping cost is compensated by the improved load balance. Results indicate that this strategy is effective for large-scale scientific computations on distributed-memory multiprocessors.
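
    A minimal sketch of the accept/reject logic the abstract describes: repartition only when the adapted mesh is sufficiently unbalanced, and accept the new partitioning only when the projected gain outweighs the data-movement cost. The threshold and cost model are illustrative assumptions.

```python
def needs_repartitioning(loads, tolerance=1.10):
    """Trigger repartitioning when the most loaded processor exceeds the
    average load by more than the tolerance factor."""
    average = sum(loads) / len(loads)
    return max(loads) > tolerance * average

def accept_remapping(old_loads, new_loads, remap_cost, steps_until_next_adaption):
    """Accept the remapping only if the per-step time saved by the improved
    balance, accumulated until the next mesh adaption, outweighs the one-time
    cost of redistributing the data."""
    time_saved_per_step = max(old_loads) - max(new_loads)
    return time_saved_per_step * steps_until_next_adaption > remap_cost
```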

  4. Massively parallel multicanonical simulations

    NASA Astrophysics Data System (ADS)

    Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

    2018-03-01

    Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 10^4 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as a starting point and reference for practitioners in the field.
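
    A minimal sketch of how independent parallel walkers can share a multicanonical weight update: the per-walker energy histograms are summed and the simple recursion ln W_new(E) = ln W_old(E) - ln H(E) is applied to the combined histogram. The error-weighted update and GPU-side details used in the paper are not reproduced; names are illustrative.

```python
import numpy as np

def update_multicanonical_weights(log_weights, walker_histograms):
    """Combine energy histograms from all parallel walkers and update the
    shared logarithmic weights; bins never visited keep their old weight.
    The walkers are then restarted with the common new weights and the
    procedure is iterated until the combined histogram is flat."""
    combined = np.sum(walker_histograms, axis=0).astype(float)
    visited = combined > 0
    new_log_w = np.array(log_weights, dtype=float)
    new_log_w[visited] -= np.log(combined[visited])
    new_log_w -= new_log_w.max()   # remove the irrelevant overall normalization
    return new_log_w
```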

  5. Solar wind interaction with Venus and Mars in a parallel hybrid code

    NASA Astrophysics Data System (ADS)

    Jarvinen, Riku; Sandroos, Arto

    2013-04-01

    We discuss the development and applications of a new parallel hybrid simulation, where ions are treated as particles and electrons as a charge-neutralizing fluid, for the interaction between the solar wind and Venus and Mars. The new simulation code under construction is based on the algorithm of the sequential global planetary hybrid model developed at the Finnish Meteorological Institute (FMI) and on the Corsair parallel simulation platform also developed at the FMI. The FMI's sequential hybrid model has been used for studies of plasma interactions of several unmagnetized and weakly magnetized celestial bodies for more than a decade. Especially, the model has been used to interpret in situ particle and magnetic field observations from plasma environments of Mars, Venus and Titan. Further, Corsair is an open source MPI (Message Passing Interface) particle and mesh simulation platform, mainly aimed for simulations of diffusive shock acceleration in solar corona and interplanetary space, but which is now also being extended for global planetary hybrid simulations. In this presentation we discuss challenges and strategies of parallelizing a legacy simulation code as well as possible applications and prospects of a scalable parallel hybrid model for the solar wind interactions of Venus and Mars.

  6. Performance evaluation of spatial compounding in the presence of aberration and adaptive imaging

    NASA Astrophysics Data System (ADS)

    Dahl, Jeremy J.; Guenther, Drake; Trahey, Gregg E.

    2003-05-01

    Spatial compounding has been used for years to reduce speckle in ultrasonic images and to resolve anatomical features hidden behind the grainy appearance of speckle. Adaptive imaging restores image contrast and resolution by compensating for beamforming errors caused by tissue-induced phase errors. Spatial compounding represents a form of incoherent imaging, whereas adaptive imaging attempts to maintain a coherent, diffraction-limited aperture in the presence of aberration. Using a Siemens Antares scanner, we acquired single channel RF data on a commercially available 1-D probe. Individual channel RF data was acquired on a cyst phantom in the presence of a near field electronic phase screen. Simulated data was also acquired for both a 1-D and a custom built 8x96, 1.75-D probe (Tetrad Corp.). The data was compounded using a receive spatial compounding algorithm; a widely used algorithm because it takes advantage of parallel beamforming to avoid reductions in frame rate. Phase correction was also performed by using a least mean squares algorithm to estimate the arrival time errors. We present simulation and experimental data comparing the performance of spatial compounding to phase correction in contrast and resolution tasks. We evaluate spatial compounding and phase correction, and combinations of the two methods, under varying aperture sizes, aperture overlaps, and aberrator strength to examine the optimum configuration and conditions in which spatial compounding will provide a similar or better result than adaptive imaging. We find that, in general, phase correction is hindered at high aberration strengths and spatial frequencies, whereas spatial compounding is helped by these aberrators.

  7. Acoustic simulation in architecture with parallel algorithm

    NASA Astrophysics Data System (ADS)

    Li, Xiaohong; Zhang, Xinrong; Li, Dan

    2004-03-01

    To address the complexity of architectural environments and the need for real-time simulation of architectural acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in the scene is solved with this method. The impulse responses between sources and receivers in each frequency segment, which are calculated with multiple processes, are then combined into the whole frequency response. The numerical experiment shows that the parallel algorithm can improve the efficiency of acoustic simulation for complex scenes.

  8. Surface Modification Engineered Assembly of Novel Quantum Dot Architectures for Advanced Applications

    DTIC Science & Technology

    2008-02-09

    Campbell, S. Ogata, and F. Shimojo, "Multimillion atom simulations of nanosystems on parallel computers," in Proceedings of the International... nanomesas: multimillion-atom molecular dynamics simulations on parallel computers," J. Appl. Phys. 94, 6762 (2003). 21. P. Vashishta, R. K. Kalia... and A. Nakano, "Multimillion atom molecular dynamics simulations of nanoparticles on parallel computers," Journal of Nanoparticle Research 5, 119-135

  9. Massively parallel simulator of optical coherence tomography of inhomogeneous turbid media.

    PubMed

    Malektaji, Siavash; Lima, Ivan T; Escobar I, Mauricio R; Sherif, Sherif S

    2017-10-01

    An accurate and practical simulator for Optical Coherence Tomography (OCT) could be an important tool to study the underlying physical phenomena in OCT such as multiple light scattering. Recently, many researchers have investigated simulation of OCT of turbid media, e.g., tissue, using Monte Carlo methods. The main drawback of these earlier simulators is the long computational time required to produce accurate results. We developed a massively parallel simulator of OCT of inhomogeneous turbid media that obtains both Class I diffusive reflectivity, due to ballistic and quasi-ballistic scattered photons, and Class II diffusive reflectivity due to multiply scattered photons. This Monte Carlo-based simulator is implemented on graphic processing units (GPUs), using the Compute Unified Device Architecture (CUDA) platform and programming model, to exploit the parallel nature of propagation of photons in tissue. It models an arbitrary shaped sample medium as a tetrahedron-based mesh and uses an advanced importance sampling scheme. This new simulator speeds up simulations of OCT of inhomogeneous turbid media by about two orders of magnitude. To demonstrate this result, we have compared the computation times of our new parallel simulator and its serial counterpart using two samples of inhomogeneous turbid media. We have shown that our parallel implementation reduced simulation time of OCT of the first sample medium from 407 min to 92 min by using a single GPU card, to 12 min by using 8 GPU cards and to 7 min by using 16 GPU cards. For the second sample medium, the OCT simulation time was reduced from 209 h to 35.6 h by using a single GPU card, and to 4.65 h by using 8 GPU cards, and to only 2 h by using 16 GPU cards. Therefore our new parallel simulator is considerably more practical to use than its central processing unit (CPU)-based counterpart. Our new parallel OCT simulator could be a practical tool to study the different physical phenomena underlying OCT, or to design OCT systems with improved performance. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Compensating Atmospheric Turbulence Effects at High Zenith Angles with Adaptive Optics Using Advanced Phase Reconstructors

    NASA Astrophysics Data System (ADS)

    Roggemann, M.; Soehnel, G.; Archer, G.

    Atmospheric turbulence degrades the resolution of images of space objects far beyond that predicted by diffraction alone. Adaptive optics telescopes have been widely used for compensating these effects, but as users seek to extend the envelopes of operation of adaptive optics telescopes to more demanding conditions, such as daylight operation, and operation at low elevation angles, the level of compensation provided will degrade. We have been investigating the use of advanced wave front reconstructors and post detection image reconstruction to overcome the effects of turbulence on imaging systems in these more demanding scenarios. In this paper we show results comparing the optical performance of the exponential reconstructor, the least squares reconstructor, and two versions of a reconstructor based on the stochastic parallel gradient descent algorithm in a closed loop adaptive optics system using a conventional continuous facesheet deformable mirror and a Hartmann sensor. The performance of these reconstructors has been evaluated under a range of source visual magnitudes and zenith angles ranging up to 70 degrees. We have also simulated satellite images, and applied speckle imaging, multi-frame blind deconvolution algorithms, and deconvolution algorithms that presume the average point spread function is known to compute object estimates. Our work thus far indicates that the combination of adaptive optics and post detection image processing will extend the useful envelope of the current generation of adaptive optics telescopes.
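
    A minimal sketch of one stochastic parallel gradient descent (SPGD) iteration of the general kind referenced above, applied to a vector of actuator commands with a scalar image-quality metric; the gain, perturbation amplitude, and metric callback are illustrative assumptions, not the reconstructor studied in the paper.

```python
import numpy as np

def spgd_step(commands, measure_metric, gain=0.3, amplitude=0.05, rng=None):
    """Perturb all actuator commands simultaneously with a random +/- pattern,
    measure the metric for both signs of the perturbation, and move along the
    estimated gradient (sign chosen here to maximize the metric)."""
    rng = np.random.default_rng() if rng is None else rng
    delta = amplitude * rng.choice([-1.0, 1.0], size=commands.shape)
    j_plus = measure_metric(commands + delta)    # hypothetical metric callback
    j_minus = measure_metric(commands - delta)
    return commands + gain * (j_plus - j_minus) * delta
```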

  11. Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    NASA Technical Reports Server (NTRS)

    Hsieh, Shang-Hsien

    1993-01-01

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

  12. Fast Realistic MRI Simulations Based on Generalized Multi-Pool Exchange Tissue Model.

    PubMed

    Liu, Fang; Velikina, Julia V; Block, Walter F; Kijowski, Richard; Samsonov, Alexey A

    2017-02-01

    We present MRiLab, a new comprehensive simulator for large-scale realistic MRI simulations on a regular PC equipped with a modern graphical processing unit (GPU). MRiLab combines realistic tissue modeling with numerical virtualization of an MRI system and scanning experiment to enable assessment of a broad range of MRI approaches including advanced quantitative MRI methods inferring microstructure on a sub-voxel level. A flexible representation of tissue microstructure is achieved in MRiLab by employing the generalized tissue model with multiple exchanging water and macromolecular proton pools rather than a system of independent proton isochromats typically used in previous simulators. The computational power needed for simulation of the biologically relevant tissue models in large 3D objects is gained using parallelized execution on GPU. Three simulated and one actual MRI experiments were performed to demonstrate the ability of the new simulator to accommodate a wide variety of voxel composition scenarios and demonstrate detrimental effects of simplified treatment of tissue micro-organization adapted in previous simulators. GPU execution allowed  ∼ 200× improvement in computational speed over standard CPU. As a cross-platform, open-source, extensible environment for customizing virtual MRI experiments, MRiLab streamlines the development of new MRI methods, especially those aiming to infer quantitatively tissue composition and microstructure.

  13. Fast Realistic MRI Simulations Based on Generalized Multi-Pool Exchange Tissue Model

    PubMed Central

    Velikina, Julia V.; Block, Walter F.; Kijowski, Richard; Samsonov, Alexey A.

    2017-01-01

    We present MRiLab, a new comprehensive simulator for large-scale realistic MRI simulations on a regular PC equipped with a modern graphical processing unit (GPU). MRiLab combines realistic tissue modeling with numerical virtualization of an MRI system and scanning experiment to enable assessment of a broad range of MRI approaches including advanced quantitative MRI methods inferring microstructure on a sub-voxel level. A flexible representation of tissue microstructure is achieved in MRiLab by employing the generalized tissue model with multiple exchanging water and macromolecular proton pools rather than a system of independent proton isochromats typically used in previous simulators. The computational power needed for simulation of the biologically relevant tissue models in large 3D objects is gained using parallelized execution on GPU. Three simulated and one actual MRI experiments were performed to demonstrate the ability of the new simulator to accommodate a wide variety of voxel composition scenarios and demonstrate detrimental effects of simplified treatment of tissue micro-organization adapted in previous simulators. GPU execution allowed ∼200× improvement in computational speed over standard CPU. As a cross-platform, open-source, extensible environment for customizing virtual MRI experiments, MRiLab streamlines the development of new MRI methods, especially those aiming to infer quantitatively tissue composition and microstructure. PMID:28113746

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Srivastava, Ashish Kumar, E-mail: ashish.memech@gmail.com; Singh, Akhileshwar; Mokhalingam, A.

    Atomistic simulations were conducted to estimate the effect of the carbon nanotube (CNT) reinforcement on the mechanical behavior of CNT-reinforced aluminum (Al) nanocomposite. The periodic system of the CNT-Al nanocomposite was built and simulated using the molecular dynamics (MD) software LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator). The mechanical properties of the nanocomposite were investigated by applying a uniaxial load on one end of the representative volume element (RVE) and fixing the other end. The interactions between the atoms of Al were modeled using embedded atom method (EAM) potentials, whereas the Adaptive Intermolecular Reactive Empirical Bond Order (AIREBO) potential was used for the interactions among carbon atoms; these pair potentials are coupled with the Lennard-Jones (LJ) potential. The results show that the incorporation of CNT into the Al matrix can increase the Young's modulus of the nanocomposite substantially. In the present case, i.e. for approximately 9 wt.% reinforcement of CNT, the axial Young's modulus of the Al matrix increases by up to 77% as compared to pure Al.
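
    A small post-processing sketch, not part of the cited study, showing how an axial Young's modulus could be estimated from the stress-strain output of such a uniaxial MD test by fitting the assumed-linear initial region; the file name, column layout, and strain limit are hypothetical.

```python
import numpy as np

def youngs_modulus(strain, stress, linear_limit=0.02):
    """Least-squares slope of the initial (assumed linear) part of the
    stress-strain curve; with stress in GPa this returns the modulus in GPa."""
    strain = np.asarray(strain)
    stress = np.asarray(stress)
    mask = strain <= linear_limit
    slope, _intercept = np.polyfit(strain[mask], stress[mask], 1)
    return slope

# hypothetical two-column output file (strain, stress) written during the run
# data = np.loadtxt("stress_strain.dat")
# E = youngs_modulus(data[:, 0], data[:, 1])
```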

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dorier, Matthieu; Sisneros, Roberto; Bautista Gomez, Leonard

    While many parallel visualization tools now provide in situ visualization capabilities, the trend has been to feed such tools with large amounts of unprocessed output data and let them render everything at the highest possible resolution. This leads to an increased run time of simulations that still have to complete within a fixed-length job allocation. In this paper, we tackle the challenge of enabling in situ visualization under performance constraints. Our approach shuffles data across processes according to its content and filters out part of it in order to feed a visualization pipeline with only a reorganized subset of the data produced by the simulation. Our framework leverages fast, generic evaluation procedures to score blocks of data, using information theory, statistics, and linear algebra. It monitors its own performance and adapts dynamically to achieve appropriate visual fidelity within predefined performance constraints. Experiments on the Blue Waters supercomputer with the CM1 simulation show that our approach enables a 5× speedup with respect to the initial visualization pipeline and is able to meet performance constraints.
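
    A minimal sketch of the kind of cheap, content-based scoring and filtering the abstract alludes to: rank blocks of simulation output by the Shannon entropy of their value histogram and keep only the top fraction for the visualization pipeline. The score choice and budget fraction are illustrative assumptions, not the paper's actual procedures.

```python
import numpy as np

def block_entropy(block, bins=64):
    """Shannon entropy (bits) of a block's value histogram; higher entropy is
    taken here as a proxy for more visually interesting content."""
    hist, _ = np.histogram(np.asarray(block), bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

def select_blocks(blocks, budget_fraction=0.25):
    """Keep only the highest-scoring fraction of blocks; the rest are filtered
    out so the in situ pipeline stays within its performance budget."""
    scores = [block_entropy(b) for b in blocks]
    n_keep = max(1, int(budget_fraction * len(blocks)))
    order = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n_keep])
```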

  16. Extent of QTL Reuse During Repeated Phenotypic Divergence of Sympatric Threespine Stickleback.

    PubMed

    Conte, Gina L; Arnegard, Matthew E; Best, Jacob; Chan, Yingguang Frank; Jones, Felicity C; Kingsley, David M; Schluter, Dolph; Peichel, Catherine L

    2015-11-01

    How predictable is the genetic basis of phenotypic adaptation? Answering this question begins by estimating the repeatability of adaptation at the genetic level. Here, we provide a comprehensive estimate of the repeatability of the genetic basis of adaptive phenotypic evolution in a natural system. We used quantitative trait locus (QTL) mapping to discover genomic regions controlling a large number of morphological traits that have diverged in parallel between pairs of threespine stickleback (Gasterosteus aculeatus species complex) in Paxton and Priest lakes, British Columbia. We found that nearly half of QTL affected the same traits in the same direction in both species pairs. Another 40% influenced a parallel phenotypic trait in one lake but not the other. The remaining 10% of QTL had phenotypic effects in opposite directions in the two species pairs. Similarity in the proportional contributions of all QTL to parallel trait differences was about 0.4. Surprisingly, QTL reuse was unrelated to phenotypic effect size. Our results indicate that repeated use of the same genomic regions is a pervasive feature of parallel phenotypic adaptation, at least in sticklebacks. Identifying the causes of this pattern would aid prediction of the genetic basis of phenotypic evolution. Copyright © 2015 by the Genetics Society of America.

  17. Instability of cooperative adaptive cruise control traffic flow: A macroscopic approach

    NASA Astrophysics Data System (ADS)

    Ngoduy, D.

    2013-10-01

    This paper proposes a macroscopic model to describe the operations of cooperative adaptive cruise control (CACC) traffic flow, which is an extension of adaptive cruise control (ACC) traffic flow. In CACC traffic flow a vehicle can exchange information with many preceding vehicles through wireless communication. Due to such communication the CACC vehicle can follow its leader at a closer distance than the ACC vehicle. The stability diagrams are constructed from the developed model based on the linear and nonlinear stability method for a certain model parameter set. It is found analytically that CACC vehicles enhance the stabilization of traffic flow with respect to both small and large perturbations compared to ACC vehicles. Numerical simulation is carried out to support our analytical findings. Based on the nonlinear stability analysis, we show analytically and numerically that the CACC system improves the dynamic equilibrium capacity more than the ACC system does. We argue that, in parallel to microscopic models of CACC traffic flow, the newly developed macroscopic model will provide a complete insight into the dynamics of intelligent traffic flow.

  18. Low-complexity nonlinear adaptive filter based on a pipelined bilinear recurrent neural network.

    PubMed

    Zhao, Haiquan; Zeng, Xiangping; He, Zhengyou

    2011-09-01

    To reduce the computational complexity of the bilinear recurrent neural network (BLRNN), a novel low-complexity nonlinear adaptive filter with a pipelined bilinear recurrent neural network (PBLRNN) is presented in this paper. The PBLRNN, inheriting the modular architecture of the pipelined RNN proposed by Haykin and Li, comprises a number of BLRNN modules that are cascaded in a chained form. Each module is implemented by a small-scale BLRNN with internal dynamics. Since the modules of the PBLRNN can be performed simultaneously in a pipelined parallelism fashion, a significant improvement in computational efficiency results. Moreover, owing to the nesting of modules, the performance of the PBLRNN can be further improved. To suit the modular architecture, a modified adaptive amplitude real-time recurrent learning algorithm is derived using the gradient descent approach. Extensive simulations are carried out to evaluate the performance of the PBLRNN on nonlinear system identification, nonlinear channel equalization, and chaotic time series prediction. Experimental results show that the PBLRNN provides considerably better performance compared to the single BLRNN and RNN models.

  19. Improvement of resolution in full-view linear-array photoacoustic computed tomography using a novel adaptive weighting method

    NASA Astrophysics Data System (ADS)

    Omidi, Parsa; Diop, Mamadou; Carson, Jeffrey; Nasiriavanaki, Mohammadreza

    2017-03-01

    Linear-array-based photoacoustic computed tomography is a popular methodology for deep and high-resolution imaging. However, issues such as phase aberration, side-lobe effects, and propagation limitations deteriorate the resolution. The effect of phase aberration due to acoustic attenuation and the assumption of a constant speed of sound (SoS) can be reduced by applying an adaptive weighting method such as the coherence factor (CF). Utilizing an adaptive beamforming algorithm such as minimum variance (MV) can improve the resolution at the focal point by eliminating the side-lobes. Moreover, the invisibility of directional objects emitting parallel to the detection plane, such as vessels and other absorbing structures stretched in the direction perpendicular to the detection plane, can degrade resolution. In this study, we propose a full-view array-level weighting algorithm in which different weights are assigned to different positions of the linear array based on an orientation algorithm that uses the histogram of oriented gradients (HOG). Simulation results obtained from a synthetic phantom show the superior performance of the proposed method over existing reconstruction methods.
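
    A minimal sketch of the coherence-factor weighting the abstract cites as an existing adaptive method (not of the proposed HOG-based array-level weighting): the conventional delay-and-sum value at an image point is scaled by the ratio of coherent to incoherent energy across the receive channels. Array shapes and names are illustrative.

```python
import numpy as np

def coherence_factor(channel_samples):
    """CF for one image point: |sum_i s_i|^2 / (N * sum_i |s_i|^2), in [0, 1].
    channel_samples is the vector of delayed per-channel samples."""
    n = channel_samples.size
    coherent = np.abs(channel_samples.sum()) ** 2
    incoherent = n * np.sum(np.abs(channel_samples) ** 2)
    return coherent / incoherent if incoherent > 0 else 0.0

def cf_weighted_pixel(channel_samples):
    """Adaptive weighting: multiply the delay-and-sum output by its coherence
    factor to suppress low-coherence (aberrated or side-lobe) contributions."""
    return coherence_factor(channel_samples) * channel_samples.sum()
```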

  20. OpenRBC: Redefining the Frontier of Red Blood Cell Simulations at Protein Resolution

    NASA Astrophysics Data System (ADS)

    Tang, Yu-Hang; Lu, Lu; Li, He; Grinberg, Leopold; Sachdeva, Vipin; Evangelinos, Constantinos; Karniadakis, George

    We present a from-scratch development of OpenRBC, a coarse-grained molecular dynamics code, which is capable of performing an unprecedented in silico experiment - simulating an entire mammalian red blood cell lipid bilayer and cytoskeleton modeled by 4 million mesoscopic particles - on a single shared memory node. To achieve this, we invented an adaptive spatial searching algorithm to accelerate the computation of short-range pairwise interactions in an extremely sparse 3D space. The algorithm is based on a Voronoi partitioning of the point cloud of coarse-grained particles, and is continuously updated over the course of the simulation. The algorithm enables the construction of a lattice-free cell list, i.e. the key spatial searching data structure in our code, in O(N) time and space, with cells whose position and shape adapt automatically to the local density and curvature. The code implements NUMA/NUCA-aware OpenMP parallelization and achieves perfect scaling with up to hundreds of hardware threads. The code outperforms a legacy solver by more than 8 times in time-to-solution and more than 20 times in problem size, thus providing a new venue for probing the cytomechanics of red blood cells. This work was supported by the Department of Energy (DOE) Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4). YHT acknowledges partial financial support from an IBM Ph.D. Scholarship Award.

  1. Crashworthiness simulations with DYNA3D

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schauer, D.A.; Hoover, C.G.; Kay, G.J.

    1996-04-01

    Current progress in parallel algorithm research and applications in vehicle crash simulation is described for the explicit, finite element algorithms in DYNA3D. Problem partitioning methods and parallel algorithms for contact at material interfaces are the two challenging algorithm research problems that are addressed. Two prototype parallel contact algorithms have been developed for treating the cases of local and arbitrary contact. Demonstration problems for local contact are crashworthiness simulations with 222 locally defined contact surfaces and a vehicle/barrier collision modeled with arbitrary contact. A simulation of crash tests conducted for a vehicle impacting a U-channel small sign post embedded in soil has been run on both the serial and parallel versions of DYNA3D. A significant reduction in computational time has been observed when running these problems on the parallel version. However, to achieve maximum efficiency, complex problems must be appropriately partitioned, especially when contact dominates the computation.

  2. pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.

    PubMed

    Halic, Tansel; Ahn, Woojin; De, Suvranu

    2014-01-01

    This work presents pWeb, a new language and compiler for parallelization of client-side compute-intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. The low performance of web browsers, however, remains the bottleneck for computationally intensive applications, including visualization of complex scenes, real-time physical simulations, and image processing, compared to native applications. The newly proposed language is built upon web workers for multithreaded programming in HTML5. The language provides fundamental functionalities of parallel programming languages as well as the fork/join parallel model, which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions.

  3. n-body simulations using message passing parallel computers.

    NASA Astrophysics Data System (ADS)

    Grama, A. Y.; Kumar, V.; Sameh, A.

    The authors present new parallel formulations of the Barnes-Hut method for n-body simulations on message passing computers. These parallel formulations partition the domain efficiently incurring minimal communication overhead. This is in contrast to existing schemes that are based on sorting a large number of keys or on the use of global data structures. The new formulations are augmented by alternate communication strategies which serve to minimize communication overhead. The impact of these communication strategies is experimentally studied. The authors report on experimental results obtained from an astrophysical simulation on an nCUBE2 parallel computer.
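
    A minimal sketch of the multipole acceptance test at the heart of the Barnes-Hut method referenced above (the parallel domain partitioning that is the paper's actual contribution is not reproduced); the opening angle and node representation are illustrative assumptions.

```python
import numpy as np

def node_is_far_enough(node_size, node_center_of_mass, point, theta=0.5):
    """Barnes-Hut opening criterion: a tree node may be approximated by a
    single pseudo-particle at its center of mass if the cell size divided by
    the distance to the evaluation point is below the opening angle theta;
    otherwise the tree walk descends into the node's children."""
    distance = np.linalg.norm(np.asarray(node_center_of_mass) - np.asarray(point))
    return distance > 0.0 and (node_size / distance) < theta
```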

  4. A conservative approach to parallelizing the Sharks World simulation

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Riffe, Scott E.

    1990-01-01

    The parallelization of a benchmark problem for parallel simulation, the Sharks World, is described. The described solution is conservative, in the sense that no state information is saved and no 'rollbacks' occur. The approach used illustrates both the principal advantage and principal disadvantage of conservative parallel simulation. The advantage is that, by exploiting lookahead, an approach was found that dramatically improves on the serial execution time and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.

  5. Progress in fast, accurate multi-scale climate simulations

    DOE PAGES

    Collins, W. D.; Johansen, H.; Evans, K. J.; ...

    2015-06-01

    We present a survey of physical and computational techniques that have the potential to contribute to the next generation of high-fidelity, multi-scale climate simulations. Examples of the climate science problems that can be investigated with more depth with these computational improvements include the capture of remote forcings of localized hydrological extreme events, an accurate representation of cloud features over a range of spatial and temporal scales, and parallel, large ensembles of simulations to more effectively explore model sensitivities and uncertainties. Numerical techniques, such as adaptive mesh refinement, implicit time integration, and separate treatment of fast physical time scales are enabling improved accuracy and fidelity in simulation of dynamics and allowing more complete representations of climate features at the global scale. At the same time, partnerships with computer science teams have focused on taking advantage of evolving computer architectures such as many-core processors and GPUs. As a result, approaches which were previously considered prohibitively costly have become both more efficient and scalable. In combination, progress in these three critical areas is poised to transform climate modeling in the coming decades.

  6. apGA: An adaptive parallel genetic algorithm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liepins, G.E.; Baluja, S.

    1991-01-01

    We develop apGA, a parallel variant of the standard generational GA, that combines aggressive search with perpetual novelty, yet is able to preserve enough genetic structure to optimally solve variably scaled, non-uniform block deceptive and hierarchical deceptive problems. apGA combines elitism, adaptive mutation, adaptive exponential scaling, and temporal memory. We present empirical results for six classes of problems, including the DeJong test suite. Although we have not investigated hybrids, we note that apGA could be incorporated into other recent GA variants such as GENITOR, CHC, and the recombination stage of mGA. 12 refs., 2 figs., 2 tabs.

  7. Cosmological N-body Simulation

    NASA Astrophysics Data System (ADS)

    Lake, George

    1994-05-01

    The "N" in N-body calculations has doubled every year for the last two decades. To continue this trend, the UW N-body group is working on algorithms for the fast evaluation of gravitational forces on parallel computers and establishing rigorous standards for the computations. In these algorithms, the computational cost per time step is ~10^3 pairwise forces per particle. A new adaptive time integrator enables us to perform high quality integrations that are fully temporally and spatially adaptive. SPH (smoothed particle hydrodynamics) will be added to simulate the effects of dissipating gas and magnetic fields. The importance of these calculations is two-fold. First, they determine the nonlinear consequences of theories for the structure of the Universe. Second, they are essential for the interpretation of observations. Every galaxy has six coordinates of velocity and position. Observations determine two sky coordinates and a line-of-sight velocity that bundles universal expansion (distance) together with a random velocity created by the mass distribution. Simulations are needed to determine the underlying structure and masses. The importance of simulations has moved from ex post facto explanation to an integral part of planning large observational programs. I will show why high quality simulations with "large N" are essential to accomplish our scientific goals. This year, our simulations have N >~ 10^7. This is sufficient to tackle some niche problems, but well short of our 5-year goal: simulating the Sloan Digital Sky Survey using a few billion particles (a Teraflop-year simulation). Extrapolating past trends, we would have to "wait" 7 years for this hundred-fold improvement. Like past gains, significant changes in the computational methods are required for these advances. I will describe new algorithms, algorithmic hacks and a dedicated computer to perform billion-particle simulations. Finally, I will describe research that can be enabled by Petaflop computers. This research is supported by the NASA HPCC/ESS program.

  8. AC losses in horizontally parallel HTS tapes for possible wireless power transfer applications

    NASA Astrophysics Data System (ADS)

    Shen, Boyang; Geng, Jianzhao; Zhang, Xiuchang; Fu, Lin; Li, Chao; Zhang, Heng; Dong, Qihuan; Ma, Jun; Gawith, James; Coombs, T. A.

    2017-12-01

    This paper presents the concept of using horizontally parallel HTS tapes, together with an AC loss study, and investigates possible wireless power transfer (WPT) applications. An example of three parallel HTS tapes was proposed, whose AC losses were studied both experimentally, using the electrical method, and by simulation, using the 2D H-formulation on the FEM platform of COMSOL Multiphysics. The electromagnetic induction around the three parallel tapes was monitored using the COMSOL simulation. The electromagnetic induction and AC losses generated by a conventional three-turn coil were simulated as well, and then compared to the case of three parallel tapes carrying the same AC transport current. The analysis demonstrates that parallel HTS tapes could potentially be used in wireless power transfer systems and could have lower total AC losses than conventional HTS coils.

  9. Divide-and-conquer density functional theory on hierarchical real-space grids: Parallel implementation and applications

    NASA Astrophysics Data System (ADS)

    Shimojo, Fuyuki; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

    2008-02-01

    A linear-scaling algorithm based on a divide-and-conquer (DC) scheme has been designed to perform large-scale molecular-dynamics (MD) simulations, in which interatomic forces are computed quantum mechanically in the framework of the density functional theory (DFT). Electronic wave functions are represented on a real-space grid, which is augmented with a coarse multigrid to accelerate the convergence of iterative solutions and with adaptive fine grids around atoms to accurately calculate ionic pseudopotentials. Spatial decomposition is employed to implement the hierarchical-grid DC-DFT algorithm on massively parallel computers. The largest benchmark tests include an 11.8×10^6-atom (1.04×10^12 electronic degrees of freedom) calculation on 131,072 IBM BlueGene/L processors. The DC-DFT algorithm has well-defined parameters to control the data locality, with which the solutions converge rapidly. Also, the total energy is well conserved during the MD simulation. We perform first-principles MD simulations based on the DC-DFT algorithm, in which large system sizes bring in excellent agreement with x-ray scattering measurements for the pair-distribution function of liquid Rb and allow the description of low-frequency vibrational modes of graphene. The band gap of a CdSe nanorod calculated by the DC-DFT algorithm agrees well with the available conventional DFT results. With the DC-DFT algorithm, the band gap is calculated for larger system sizes until the result reaches the asymptotic value.

  10. Genetic adaptations of the plateau zokor in high-elevation burrows.

    PubMed

    Shao, Yong; Li, Jin-Xiu; Ge, Ri-Li; Zhong, Li; Irwin, David M; Murphy, Robert W; Zhang, Ya-Ping

    2015-11-25

    The plateau zokor (Myospalax baileyi) spends its entire life underground in sealed burrows. Confronting limited oxygen, high carbon dioxide concentrations, and complete darkness, these animals epitomize a successful physiological adaptation. Here, we employ transcriptome sequencing to explore the genetic underpinnings of their adaptations to this unique habitat. Compared to Rattus norvegicus, genes belonging to GO categories related to energy metabolism (e.g. mitochondrion and fatty acid beta-oxidation) underwent accelerated evolution in the plateau zokor. Furthermore, positively selected genes were significantly enriched in gene categories involved in ATPase activity, blood vessel development and respiratory gaseous exchange, functional categories that are relevant to adaptation to high altitudes. Among the 787 genes with evidence of parallel evolution, and thus identified as candidate genes, several GO categories (e.g. response to hypoxia, oxygen homeostasis and erythrocyte homeostasis) are significantly enriched, as are two genes, EPAS1 and AJUBA, involved in the response to hypoxia, whose parallel-evolved sites lie at positions that are highly conserved in sequence alignments from multiple species. Thus, accelerated evolution of GO categories, positive selection and parallel evolution at the molecular level provide evidence of the genetic adaptations of the plateau zokor for living in high-elevation burrows.

  11. The dynamics of diverse segmental amplifications in populations of Saccharomyces cerevisiae adapting to strong selection.

    PubMed

    Payen, Celia; Di Rienzi, Sara C; Ong, Giang T; Pogachar, Jamie L; Sanchez, Joseph C; Sunshine, Anna B; Raghuraman, M K; Brewer, Bonita J; Dunham, Maitreya J

    2014-03-20

    Population adaptation to strong selection can occur through the sequential or parallel accumulation of competing beneficial mutations. The dynamics, diversity, and rate of fixation of beneficial mutations within and between populations are still poorly understood. To study how the mutational landscape varies across populations during adaptation, we performed experimental evolution on seven parallel populations of Saccharomyces cerevisiae continuously cultured in limiting sulfate medium. By combining quantitative polymerase chain reaction, array comparative genomic hybridization, restriction digestion and contour-clamped homogeneous electric field gel electrophoresis, and whole-genome sequencing, we followed the trajectory of evolution to determine the identity and fate of beneficial mutations. During a period of 200 generations, the yeast populations displayed parallel evolutionary dynamics that were driven by the coexistence of independent beneficial mutations. Selective amplifications rapidly evolved under this selection pressure, in particular common inverted amplifications containing the sulfate transporter gene SUL1. Compared with single clones, detailed analysis of the populations uncovers a greater complexity whereby multiple subpopulations arise and compete despite a strong selection. The most common evolutionary adaptation to strong selection in these populations grown in sulfate limitation is determined by clonal interference, with adaptive variants both persisting and replacing one another.

  12. The Dynamics of Diverse Segmental Amplifications in Populations of Saccharomyces cerevisiae Adapting to Strong Selection

    PubMed Central

    Payen, Celia; Di Rienzi, Sara C.; Ong, Giang T.; Pogachar, Jamie L.; Sanchez, Joseph C.; Sunshine, Anna B.; Raghuraman, M. K.; Brewer, Bonita J.; Dunham, Maitreya J.

    2014-01-01

    Population adaptation to strong selection can occur through the sequential or parallel accumulation of competing beneficial mutations. The dynamics, diversity, and rate of fixation of beneficial mutations within and between populations are still poorly understood. To study how the mutational landscape varies across populations during adaptation, we performed experimental evolution on seven parallel populations of Saccharomyces cerevisiae continuously cultured in limiting sulfate medium. By combining quantitative polymerase chain reaction, array comparative genomic hybridization, restriction digestion and contour-clamped homogeneous electric field gel electrophoresis, and whole-genome sequencing, we followed the trajectory of evolution to determine the identity and fate of beneficial mutations. During a period of 200 generations, the yeast populations displayed parallel evolutionary dynamics that were driven by the coexistence of independent beneficial mutations. Selective amplifications rapidly evolved under this selection pressure, in particular common inverted amplifications containing the sulfate transporter gene SUL1. Compared with single clones, detailed analysis of the populations uncovers a greater complexity whereby multiple subpopulations arise and compete despite a strong selection. The most common evolutionary adaptation to strong selection in these populations grown in sulfate limitation is determined by clonal interference, with adaptive variants both persisting and replacing one another. PMID:24368781

  13. Parallel trait adaptation across opposing thermal environments in experimental Drosophila melanogaster populations

    PubMed Central

    Tobler, Ray; Hermisson, Joachim; Schlötterer, Christian

    2015-01-01

    Thermal stress is a pervasive selective agent in natural populations that impacts organismal growth, survival, and reproduction. Drosophila melanogaster exhibits a variety of putatively adaptive phenotypic responses to thermal stress in natural and experimental settings; however, accompanying assessments of fitness are typically lacking. Here, we quantify changes in fitness and known thermal tolerance traits in replicated experimental D. melanogaster populations following more than 40 generations of evolution to either cyclic cold or hot temperatures. By evaluating fitness for both evolved populations alongside a reconstituted starting population, we show that the evolved populations were the best adapted within their respective thermal environments. More strikingly, the evolved populations exhibited increased fitness in both environments and improved resistance to both acute heat and cold stress. This unexpected parallel response appeared to be an adaptation to the rapid temperature changes that drove the cycling thermal regimes, as parallel fitness changes were not observed when tested in a constant thermal environment. Our results add to a small, but growing group of studies that demonstrate the importance of fluctuating temperature changes for thermal adaptation and highlight the need for additional work in this area. PMID:26080903

  14. Architecture Adaptive Computing Environment

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    2006-01-01

    Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple-instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.

  15. Parallel architectures for iterative methods on adaptive, block structured grids

    NASA Technical Reports Server (NTRS)

    Gannon, D.; Vanrosendale, J.

    1983-01-01

    A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one to one mapping of grids to systolic style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be a regular global structure to the grids constructed, there will be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.

  16. Xyce™ Parallel Electronic Simulator Users' Guide, Version 6.5.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R.; Aadithya, Karthik V.; Mei, Ting

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message-passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.

  17. An Implicit Solver on A Parallel Block-Structured Adaptive Mesh Grid for FLASH

    NASA Astrophysics Data System (ADS)

    Lee, D.; Gopal, S.; Mohapatra, P.

    2012-07-01

    We introduce a fully implicit solver for FLASH based on a Jacobian-Free Newton-Krylov (JFNK) approach with an appropriate preconditioner. The main goal of developing this JFNK-type implicit solver is to provide efficient high-order numerical algorithms and methodology for simulating stiff systems of differential equations on large-scale parallel computer architectures. A large number of natural problems in nonlinear physics involve a wide range of spatial and time scales of interest. A system that encompasses such a wide magnitude of scales is described as "stiff." A stiff system can arise in many different fields of physics, including fluid dynamics/aerodynamics, laboratory/space plasma physics, low Mach number flows, reactive flows, radiation hydrodynamics, and geophysical flows. One of the big challenges in solving such a stiff system using current-day computational resources lies in resolving time and length scales varying by several orders of magnitude. We introduce FLASH's preliminary implementation of a time-accurate JFNK-based implicit solver in the framework of FLASH's unsplit hydro solver.
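
    A minimal, unpreconditioned sketch of the Jacobian-free Newton-Krylov idea described above: the Jacobian-vector product needed by the Krylov solver is approximated by a finite difference of the nonlinear residual, so the Jacobian is never assembled. The residual callback, perturbation scaling, and tolerances are illustrative assumptions; a production solver such as the one described would add a preconditioner.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def jfnk_newton_step(residual, u, eps=1e-7):
    """One Newton update u <- u + du with J(u) du = -F(u) solved by GMRES,
    where J(u) v is approximated as (F(u + h v) - F(u)) / h."""
    f0 = residual(u)
    n = u.size

    def jv(v):
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            return np.zeros_like(v)
        h = eps * (1.0 + np.linalg.norm(u)) / norm_v
        return (residual(u + h * v) - f0) / h

    jacobian = LinearOperator((n, n), matvec=jv)
    du, info = gmres(jacobian, -f0, atol=1e-10)
    return u + du
```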

  18. Parallel implementation of an adaptive and parameter-free N-body integrator

    NASA Astrophysics Data System (ADS)

    Pruett, C. David; Ingham, William H.; Herman, Ralph D.

    2011-05-01

    Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies. Program summary: Program title: PNB.f90. Catalogue identifier: AEIK_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 3052. No. of bytes in distributed program, including test data, etc.: 68 600. Distribution format: tar.gz. Programming language: Fortran 90 and OpenMPI. Computer: All shared or distributed memory parallel processors. Operating system: Unix/Linux. Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized. RAM: Dependent upon N. Classification: 4.3, 4.12, 6.5. Nature of problem: High accuracy numerical evaluation of trajectories of N point masses each subject to Newtonian gravitation. Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree. Running time: 5.1 s for the demo program supplied with the package.

  19. On efficiency of fire simulation realization: parallelization with greater number of computational meshes

    NASA Astrophysics Data System (ADS)

    Valasek, Lukas; Glasa, Jan

    2017-12-01

    Current fire simulation systems are capable of utilizing the advantages of available high-performance computing (HPC) platforms and of modelling fires efficiently in parallel. In this paper, the efficiency of a corridor fire simulation on an HPC computer cluster is discussed. The parallel MPI version of Fire Dynamics Simulator is used to test the efficiency of selected strategies for allocating the cluster's computational resources when a greater number of computational cores is used. Simulation results indicate that if the number of cores used is not equal to a multiple of the total number of cluster node cores, there are allocation strategies which provide more efficient calculations.

  20. Development of a parallel FE simulator for modeling the whole trans-scale failure process of rock from meso- to engineering-scale

    NASA Astrophysics Data System (ADS)

    Li, Gen; Tang, Chun-An; Liang, Zheng-Zhao

    2017-01-01

    Multi-scale high-resolution modeling of the rock failure process is a powerful means in modern rock mechanics studies to reveal the complex failure mechanism and to evaluate engineering risks. However, multi-scale continuous modeling of rock, from deformation and damage to failure, has raised high requirements on the design, implementation scheme and computation capacity of the numerical software system. This study is aimed at developing the parallel finite element procedure, a parallel rock failure process analysis (RFPA) simulator that is capable of modeling the whole trans-scale failure process of rock. Based on the statistical meso-damage mechanical method, the RFPA simulator is able to construct heterogeneous rock models with multiple mechanical properties and to handle and represent the trans-scale propagation of cracks; the stress and strain fields are solved by the parallel finite element method (FEM) solver for the damage evolution analysis of each representative volume element. This paper describes the theoretical basis of the approach and provides the details of the parallel implementation on a Windows - Linux interactive platform. A numerical model is built to test the parallel performance of the FEM solver. Numerical simulations are then carried out on a laboratory-scale uniaxial compression test, and field-scale net fracture spacing and engineering-scale rock slope examples, respectively. The simulation results indicate that relatively high speedup and computation efficiency can be achieved by the parallel FEM solver with a reasonable boot process. In laboratory-scale simulation, the well-known physical phenomena, such as the macroscopic fracture pattern and stress-strain responses, can be reproduced. In field-scale simulation, the formation process of net fracture spacing from initiation, propagation to saturation can be revealed completely. In engineering-scale simulation, the whole progressive failure process of the rock slope can be well modeled. It is shown that the parallel FE simulator developed in this study is an efficient tool for modeling the whole trans-scale failure process of rock from meso- to engineering-scale.
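
    For illustration, a minimal sketch of the statistical heterogeneity step, assuming the common choice of a Weibull distribution for element properties (the distribution and all names are assumptions, not taken from the paper): each finite element is assigned its own strength and stiffness so that damage can localize during loading.

        # Schematic sketch of building a heterogeneous rock model: every finite
        # element receives its own strength and stiffness drawn from an assumed
        # Weibull distribution, so damage localizes realistically under load.
        import numpy as np

        rng = np.random.default_rng(0)

        def heterogeneous_properties(n_elements, scale_strength, scale_modulus, homogeneity_m):
            # Larger homogeneity index m -> more uniform material; smaller m -> stronger scatter.
            strength = scale_strength * rng.weibull(homogeneity_m, size=n_elements)
            modulus = scale_modulus * rng.weibull(homogeneity_m, size=n_elements)
            return strength, modulus

        strength, modulus = heterogeneous_properties(
            n_elements=100_000, scale_strength=60.0, scale_modulus=30.0e3, homogeneity_m=3.0)
        print(round(strength.mean(), 2), round(strength.std(), 2))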

  1. Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schmalz, Mark S

    2011-07-24

    Statement of Problem - Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation G' for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping G → G', which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in solution of problems related to efficient parallel computation of particle and fluid dynamics simulations. These problems occur throughout DOE, military and commercial sectors: the potential payoff is high. We plan to license or sell the solution to contractors for military and domestic applications such as disaster simulation (aerodynamic and hydrodynamic), Government agencies (hydrological and environmental simulations), and medical applications (e.g., in tomographic image reconstruction). Keywords - High-performance Computing, Graphic Processing Unit, Fluid/Particle Simulation. Summary for Members of Congress - Department of Energy has many simulation codes that must compute faster, to be effective. The Phase I research parallelized particle/fluid simulations for rocket combustion, for high-performance computing systems.

  2. Program For Parallel Discrete-Event Simulation

    NASA Technical Reports Server (NTRS)

    Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.

    1991-01-01

    User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.

  3. Development of the Fully Adaptive Storm Tide (FAST) Model for hurricane induced storm surges and associated inundation

    NASA Astrophysics Data System (ADS)

    Teng, Y. C.; Kelly, D.; Li, Y.; Zhang, K.

    2016-02-01

    A new state-of-the-art model (the Fully Adaptive Storm Tide model, FAST) for the prediction of storm surges over complex landscapes is presented. The FAST model is based on the conservation form of the full non-linear depth-averaged long wave equations. The equations are solved via an explicit finite volume scheme with interfacial fluxes being computed via Osher's approximate Riemann solver. Geometric source terms are treated in a high order manner that is well-balanced. The numerical solution technique has been chosen to enable the accurate simulation of wetting and drying over complex topography. Another important feature of the FAST model is the use of a simple underlying Cartesian mesh with tree-based static and dynamic adaptive mesh refinement (AMR). This permits the simulation of unsteady flows over varying landscapes (including localized features such as canals) by locally increasing (or relaxing) grid resolution in a dynamic fashion. The use of (dynamic) AMR lowers the computational cost of the storm surge model whilst retaining high resolution (and thus accuracy) where and when it is required. In addition, the FAST model has been designed to execute in a parallel computational environment with localized time-stepping. The FAST model has already been carefully verified against a series of benchmark type problems (Kelly et al. 2015). Here we present two simulations of the storm tide due to Hurricane Ike (2008) and Hurricane Sandy (2012). The model incorporates high resolution LIDAR data for the major portion of New York City. Results compare favorably with water elevations measured by NOAA tidal gauges and by deployed mobile sensors, and with high water marks collected by the USGS.

  4. The Clawpack Community of Codes

    NASA Astrophysics Data System (ADS)

    Mandli, K. T.; LeVeque, R. J.; Ketcheson, D.; Ahmadia, A. J.

    2014-12-01

    Clawpack, the Conservation Laws Package, has long been one of the standards for solving hyperbolic conservation laws but over the years has extended well beyond this role. Today a community of open-source codes has been developed that addresses a multitude of different needs including non-conservative balance laws, high-order accurate methods, and parallelism while remaining extensible and easy to use, largely by the judicious use of Python and the original Fortran codes that it wraps. This talk will present some of the recent developments in projects under the Clawpack umbrella, notably the GeoClaw and PyClaw projects. GeoClaw was originally developed as a tool for simulating tsunamis using adaptive mesh refinement but has since encompassed a large number of other geophysically relevant flows including storm surge and debris-flows. PyClaw originated as a Python version of the original Clawpack algorithms but has since served both as a testing ground for new algorithmic advances in the Clawpack framework and as an easily extensible framework for solving hyperbolic balance laws. Some of these extensions include the addition of WENO high-order methods, massively parallel capabilities, and adaptive mesh refinement technologies, made possible largely by the flexibility of the Python language and community libraries such as NumPy and PETSc. Because of the tight integration with Python technologies, both packages have also benefited from the focus on reproducibility in the Python community, notably IPython notebooks.

  5. 3D magnetospheric parallel hybrid multi-grid method applied to planet–plasma interactions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Leclercq, L., E-mail: ludivine.leclercq@latmos.ipsl.fr; Modolo, R., E-mail: ronan.modolo@latmos.ipsl.fr; Leblanc, F.

    2016-03-15

    We present a new method to exploit multiple refinement levels within a 3D parallel hybrid model, developed to study planet–plasma interactions. This model is based on the hybrid formalism: ions are kinetically treated whereas electrons are considered as an inertia-less fluid. Generally, ions are represented by numerical particles whose size equals the volume of the cells. Particles that leave a coarse grid subsequently entering a refined region are split into particles whose volume corresponds to the volume of the refined cells. The number of refined particles created from a coarse particle depends on the grid refinement rate. In order to conserve velocity distribution functions and to avoid calculations of average velocities, particles are not coalesced. Moreover, to ensure the constancy of particles' shape function sizes, the hybrid method is adapted to allow refined particles to move within a coarse region. Another innovation of this approach is the method developed to compute grid moments at interfaces between two refinement levels. Indeed, the hybrid method is adapted to accurately account for the special grid structure at the interfaces, avoiding any overlapping grid considerations. Some fundamental test runs were performed to validate our approach (e.g. quiet plasma flow, Alfven wave propagation). Lastly, we also show a planetary application of the model, simulating the interaction between Jupiter's moon Ganymede and the Jovian plasma.
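
    A schematic sketch of the particle-splitting step described above (hypothetical data layout and names, not the model's code): a coarse macro-particle entering a refined region is replaced by children that keep its velocity exactly, so the velocity distribution is conserved without any averaging, while the statistical weight and shape-function size shrink with the refinement rate.

        # Schematic particle splitting at a coarse->fine grid interface: velocity is
        # kept unchanged (no averaging), the statistical weight is divided evenly,
        # and child positions are offset inside the parent's volume so total
        # mass/charge is conserved.
        import numpy as np
        from dataclasses import dataclass, replace

        @dataclass
        class MacroParticle:
            position: np.ndarray   # particle centre (3,)
            velocity: np.ndarray   # identical for all children
            weight: float          # number of physical ions represented
            size: float            # shape-function (cloud) size ~ local cell size

        def split_particle(p, refinement_rate=2):
            n_children = refinement_rate ** 3            # e.g. 8 children for a rate-2 3D grid
            offsets = np.stack(np.meshgrid(*[np.arange(refinement_rate)] * 3), -1).reshape(-1, 3)
            offsets = (offsets + 0.5) / refinement_rate - 0.5   # child centres within the parent
            return [replace(p,
                            position=p.position + offsets[i] * p.size,
                            weight=p.weight / n_children,
                            size=p.size / refinement_rate)
                    for i in range(n_children)]

        parent = MacroParticle(np.zeros(3), np.array([400e3, 0.0, 0.0]), weight=1e20, size=1.0)
        children = split_particle(parent)
        assert np.isclose(sum(c.weight for c in children), parent.weight)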

  6. Xyce

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thornquist, Heidi K.; Fixel, Deborah A.; Fett, David Brian

    The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models, including several age- and radiation-aware devices. It supports a variety of computing platforms, both serial and parallel. Lastly, it uses a variety of modern solution algorithms, including dynamic parallel load-balancing and iterative solvers.

  7. Tutorial: Parallel Computing of Simulation Models for Risk Analysis.

    PubMed

    Reilly, Allison C; Staid, Andrea; Gao, Michael; Guikema, Seth D

    2016-10-01

    Simulation models are widely used in risk analysis to study the effects of uncertainties on outcomes of interest in complex problems. Often, these models are computationally complex and time consuming to run. This latter point may be at odds with time-sensitive evaluations or may limit the number of parameters that are considered. In this article, we give an introductory tutorial focused on parallelizing simulation code to better leverage modern computing hardware, enabling risk analysts to better utilize simulation-based methods for quantifying uncertainty in practice. This article is aimed primarily at risk analysts who use simulation methods but do not yet utilize parallelization to decrease the computational burden of these models. The discussion is focused on conceptual aspects of embarrassingly parallel computer code and software considerations. Two complementary examples are shown using the languages MATLAB and R. A brief discussion of hardware considerations is located in the Appendix. © 2016 Society for Risk Analysis.
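
    The article's examples are in MATLAB and R; the same embarrassingly parallel pattern can be sketched in Python as follows (a generic illustration with a made-up loss model, not the tutorial's code): each Monte Carlo replication is independent, so replications are simply farmed out to worker processes and aggregated afterwards.

        # Generic embarrassingly parallel Monte Carlo sketch: replications are
        # independent, so they are distributed across worker processes and the
        # results are aggregated at the end.
        import numpy as np
        from multiprocessing import Pool

        def one_replication(seed):
            # Hypothetical risk model: annual loss = sum of a random number of random claims.
            rng = np.random.default_rng(seed)
            n_events = rng.poisson(5)
            return rng.lognormal(mean=10.0, sigma=1.0, size=n_events).sum()

        if __name__ == "__main__":
            n_reps = 100_000
            with Pool() as pool:                     # defaults to one worker per core
                losses = pool.map(one_replication, range(n_reps), chunksize=1_000)
            losses = np.asarray(losses)
            print("mean loss:", losses.mean(), "99% quantile:", np.quantile(losses, 0.99))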

  8. Real-time electron dynamics for massively parallel excited-state simulations

    NASA Astrophysics Data System (ADS)

    Andrade, Xavier

    The simulation of the real-time dynamics of electrons, based on time dependent density functional theory (TDDFT), is a powerful approach to study electronic excited states in molecular and crystalline systems. What makes the method attractive is its flexibility to simulate different kinds of phenomena beyond the linear-response regime, including strongly-perturbed electronic systems and non-adiabatic electron-ion dynamics. Electron-dynamics simulations are also attractive from a computational point of view. They can run efficiently on massively parallel architectures due to the low communication requirements. Our implementations of electron dynamics, based on the codes Octopus (real-space) and Qball (plane-waves), allow us to simulate systems composed of thousands of atoms and to obtain good parallel scaling up to 1.6 million processor cores. Due to the versatility of real-time electron dynamics and its parallel performance, we expect it to become the method of choice to apply the capabilities of exascale supercomputers for the simulation of electronic excited states.

  9. Fitting neuron models to spike trains.

    PubMed

    Rossant, Cyrille; Goodman, Dan F M; Fontaine, Bertrand; Platkiewicz, Jonathan; Magnusson, Anna K; Brette, Romain

    2011-01-01

    Computational modeling is increasingly used to understand the function of neural circuits in systems neuroscience. These studies require models of individual neurons with realistic input-output properties. Recently, it was found that spiking models can accurately predict the precisely timed spike trains produced by cortical neurons in response to somatically injected currents, if properly fitted. This requires fitting techniques that are efficient and flexible enough to easily test different candidate models. We present a generic solution, based on the Brian simulator (a neural network simulator in Python), which allows the user to define and fit arbitrary neuron models to electrophysiological recordings. It relies on vectorization and parallel computing techniques to achieve efficiency. We demonstrate its use on neural recordings in the barrel cortex and in the auditory brainstem, and confirm that simple adaptive spiking models can accurately predict the response of cortical neurons. Finally, we show how a complex multicompartmental model can be reduced to a simple effective spiking model.
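
    As a toy illustration of the vectorization idea (not the Brian fitting toolbox itself; the neuron parameters, input current, and scoring rule are made up): many candidate parameter sets of a leaky integrate-and-fire neuron can be integrated simultaneously as NumPy vectors and scored against a recorded spike train.

        # Toy vectorized model fitting: P candidate (tau, threshold) pairs of a leaky
        # integrate-and-fire neuron are integrated at once as NumPy vectors against
        # the same injected current, then scored by how well their spike counts
        # match the recorded spike train.
        import numpy as np

        dt, T = 0.1e-3, 1.0                      # 0.1 ms steps, 1 s of data
        t = np.arange(0.0, T, dt)
        current = 1.5e-9 * (1.0 + 0.5 * np.sin(2 * np.pi * 3 * t))   # hypothetical input (A)
        target_spike_count = 42                  # stands in for the recorded spike train

        P = 1000                                 # candidate parameter sets, all updated at once
        rng = np.random.default_rng(1)
        tau = rng.uniform(5e-3, 30e-3, P)        # membrane time constant (s)
        threshold = rng.uniform(5e-3, 25e-3, P)  # spike threshold (V)
        R = 20e6                                 # fixed membrane resistance (ohm)

        v = np.zeros(P)
        spikes = np.zeros(P, dtype=int)
        for I in current:                        # one loop over time, vectorized over candidates
            v += dt * (-v + R * I) / tau
            fired = v >= threshold
            spikes[fired] += 1
            v[fired] = 0.0                       # reset after a spike

        best = np.argmin(np.abs(spikes - target_spike_count))
        print("best tau (ms):", tau[best] * 1e3, "threshold (mV):", threshold[best] * 1e3)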

  10. SDAV Viz July Progress Update: LANL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sewell, Christopher Meyer

    2012-07-30

    SDAV Viz July Progress Update: (1) VPIC (Vector Particle in Cell) Kinetic Plasma Simulation Code - (a) Implemented first version of an in-situ adapter based on Paraview CoProcessing Library, (b) Three pipelines: vtkDataSetMapper, vtkContourFilter, vtkPistonContour, (c) Next, resolve issue at boundaries of processor domains; add more advanced viz/analysis pipelines; (2) Halo finding/merger trees - (a) Summer student Wathsala W. from University of Utah is working on data-parallel halo finder algorithm using PISTON, (b) Timo Bremer (LLNL), Valerio Pascucci (Utah), George Zagaris (Kitware), and LANL people are interested in using merger trees for tracking the evolution of halos in cosmo simulations; discussed possible overlap with work by Salman Habib and Katrin Heitmann (Argonne) during their visit to LANL 7/11; (3) PISTON integration in ParaView - Now available from ParaView github.

  11. A class of hybrid finite element methods for electromagnetics: A review

    NASA Technical Reports Server (NTRS)

    Volakis, J. L.; Chatterjee, A.; Gong, J.

    1993-01-01

    Integral equation methods have generally been the workhorse for antenna and scattering computations. In the case of antennas, they continue to be the prominent computational approach, but for scattering applications the requirement for large-scale computations has turned researchers' attention to near neighbor methods such as the finite element method, which has low O(N) storage requirements and is readily adaptable in modeling complex geometrical features and material inhomogeneities. In this paper, we review three hybrid finite element methods for simulating composite scatterers, conformal microstrip antennas, and finite periodic arrays. Specifically, we discuss the finite element method and its application to electromagnetic problems when combined with the boundary integral, absorbing boundary conditions, and artificial absorbers for terminating the mesh. Particular attention is given to large-scale simulations, methods, and solvers for achieving low memory requirements and code performance on parallel computing architectures.

  12. Reversible Parallel Discrete-Event Execution of Large-scale Epidemic Outbreak Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perumalla, Kalyan S; Seal, Sudip K

    2010-01-01

    The spatial scale, runtime speed and behavioral detail of epidemic outbreak simulations together require the use of large-scale parallel processing. In this paper, an optimistic parallel discrete event execution of a reaction-diffusion simulation model of epidemic outbreaks is presented, with an implementation over the µsik simulator. Rollback support is achieved with the development of a novel reversible model that combines reverse computation with a small amount of incremental state saving. Parallel speedup and other runtime performance metrics of the simulation are tested on a small (8,192-core) Blue Gene / P system, while scalability is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes (up to several hundred million individuals in the largest case) are exercised.
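
    A minimal sketch of the reverse-computation idea (a hypothetical toy model, not the paper's µsik implementation): a reaction-diffusion event that moves infected individuals between regions stores only the small increment needed to undo itself exactly when a rollback arrives.

        # Minimal reverse-computation sketch for optimistic parallel simulation: each
        # event mutates the state and remembers just enough (here, one sampled count)
        # to be undone exactly if a rollback arrives.
        import copy, random

        state = {"A": {"S": 1000, "I": 10}, "B": {"S": 1000, "I": 0}}
        initial = copy.deepcopy(state)

        def diffuse_infected(src, dst, frac, rng):
            # Forward event: move a sampled number of infected individuals src -> dst
            # and return the tiny record needed to undo it exactly.
            moved = rng.randint(0, int(frac * state[src]["I"]))
            state[src]["I"] -= moved
            state[dst]["I"] += moved
            return (src, dst, moved)

        def reverse(record):
            # Reverse event: restore the counts using only the saved increment.
            # (A production implementation would also restore the RNG stream.)
            src, dst, moved = record
            state[src]["I"] += moved
            state[dst]["I"] -= moved

        rng = random.Random(7)
        log = [diffuse_infected("A", "B", 0.5, rng) for _ in range(3)]
        for record in reversed(log):      # rollback: undo events in reverse order
            reverse(record)
        assert state == initial           # state is restored exactly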

  13. Parallel evolutionary computation in bioinformatics applications.

    PubMed

    Pinho, Jorge; Sobral, João Luis; Rocha, Miguel

    2013-05-01

    A large number of optimization problems within the field of Bioinformatics require methods able to handle its inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of its efficient execution on a wide range of parallel architectures. The proposed approach focuses on ease of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism-related modules allows the user to easily configure their environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  14. Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meng, Jiayuan; Uram, Thomas; Morozov, Vitali A.

    Most accelerators, such as graphics processing units (GPUs) and vector processors, are particularly suitable for accelerating massively parallel workloads. On the other hand, conventional workloads are developed for multi-core parallelism, which often scale to only a few dozen OpenMP threads. When hardware threads significantly outnumber the degree of parallelism in the outer loop, programmers are challenged with efficient hardware utilization. A common solution is to further exploit the parallelism hidden deep in the code structure. Such parallelism is less structured: parallel and sequential loops may be imperfectly nested within each other, neighboring inner loops may exhibit different concurrency patterns (e.g., Reduction vs. Forall), yet have to be parallelized in the same parallel section. Many input-dependent transformations have to be explored. A programmer often employs a larger group of hardware threads to cooperatively walk through a smaller outer loop partition and adaptively exploit any encountered parallelism. This process is time-consuming and error-prone, yet the risk of gaining little or no performance remains high for such workloads. To reduce risk and guide implementation, we propose a technique to model workloads with limited parallelism that can automatically explore and evaluate transformations involving cooperative threads. Eventually, our framework projects the best achievable performance and the most promising transformations without implementing GPU code or using physical hardware. We envision our technique to be integrated into future compilers or optimization frameworks for autotuning.

  15. The Planetary and Space Simulation Facilities at DLR Cologne

    NASA Astrophysics Data System (ADS)

    Rabbow, Elke; Parpart, André; Reitz, Günther

    2016-06-01

    Astrobiology strives to increase our knowledge on the origin, evolution and distribution of life, on Earth and beyond. In the past centuries, life has been found on Earth in environments with extreme conditions that were expected to be uninhabitable. Scientific investigations of the underlying metabolic mechanisms and strategies that lead to the high adaptability of these extremophile organisms increase our understanding of the evolution and distribution of life on Earth. Life as we know it depends on the availability of liquid water. Exposure of organisms to defined and complex extreme environmental conditions, in particular those that limit the water availability, allows the investigation of survival mechanisms as well as an estimation of whether selected organisms could be transported to, and survive on, other celestial bodies. Space missions in low Earth orbit (LEO) provide access for experiments to complex environmental conditions not available on Earth, but studies on the molecular and cellular mechanisms of adaptation to these hostile conditions and on the limits of life cannot be performed exclusively in space experiments. Experimental space is limited and allows only the investigation of selected endpoints. An additional intensive ground-based program is required, with easy-to-access facilities capable of simulating space and planetary environments, in particular with focus on temperature, pressure, atmospheric composition and short wavelength solar ultraviolet radiation (UV). DLR Cologne operates a number of Planetary and Space Simulation facilities (PSI) where microorganisms from extreme terrestrial environments or known for their high adaptability are exposed for mechanistic studies. Space or planetary parameters are simulated individually or in combination in temperature-controlled vacuum facilities equipped with a variety of defined and calibrated irradiation sources. The PSI support basic research and have been used recurrently for pre-flight test programs for several astrobiological space missions. Parallel experiments on ground provided essential complementary data supporting the scientific interpretation of the data received from the space missions.

  16. Modelling and simulation of parallel triangular triple quantum dots (TTQD) by using SIMON 2.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fathany, Maulana Yusuf, E-mail: myfathany@gmail.com; Fuada, Syifaul, E-mail: fsyifaul@gmail.com; Lawu, Braham Lawas, E-mail: bram-labs@rocketmail.com

    2016-04-19

    This research presents an analysis of the modeling of Parallel Triple Quantum Dots (TQD) using SIMON (SIMulation Of Nano-structures). The Single Electron Transistor (SET) is used as the basic concept of the modeling. We design the parallel TQD structure in metal with a triangular geometry, referred to as Triangular Triple Quantum Dots (TTQD). We simulate it under several scenarios using different parameters, such as different capacitance values, various gate voltages, and different thermal conditions.

  17. High-resolution coupled ice sheet-ocean modeling using the POPSICLES model

    NASA Astrophysics Data System (ADS)

    Ng, E. G.; Martin, D. F.; Asay-Davis, X.; Price, S. F.; Collins, W.

    2014-12-01

    It is expected that a primary driver of future change of the Antarctic ice sheet will be changes in submarine melting driven by incursions of warm ocean water into sub-ice shelf cavities. Correctly modeling this response on a continental scale will require high-resolution modeling of the coupled ice-ocean system. We describe the computational and modeling challenges in our simulations of the full Southern Ocean coupled to a continental-scale Antarctic ice sheet model at unprecedented spatial resolutions (0.1 degree for the ocean model and adaptive mesh refinement down to 500m in the ice sheet model). The POPSICLES model couples the POP2x ocean model, a modified version of the Parallel Ocean Program (Smith and Gent, 2002), with the BISICLES ice-sheet model (Cornford et al., 2012) using a synchronous offline-coupling scheme. Part of the PISCEES SciDAC project and built on the Chombo framework, BISICLES makes use of adaptive mesh refinement to fully resolve dynamically-important regions like grounding lines and employs a momentum balance similar to the vertically-integrated formulation of Schoof and Hindmarsh (2009). Results of BISICLES simulations have compared favorably to comparable simulations with a Stokes momentum balance in both idealized tests like MISMIP3D (Pattyn et al., 2013) and realistic configurations (Favier et al. 2014). POP2x includes sub-ice-shelf circulation using partial top cells (Losch, 2008) and boundary layer physics following Holland and Jenkins (1999), Jenkins (2001), and Jenkins et al. (2010). Standalone POP2x output compares well with standard ice-ocean test cases (e.g., ISOMIP; Losch, 2008) and other continental-scale simulations and melt-rate observations (Kimura et al., 2013; Rignot et al., 2013). For the POPSICLES Antarctic-Southern Ocean simulations, ice sheet and ocean models communicate at one-month coupling intervals.

  18. Distributed parameter system coupled ARMA expansion identification and adaptive parallel IIR filtering - A unified problem statement. [Auto Regressive Moving-Average

    NASA Technical Reports Server (NTRS)

    Johnson, C. R., Jr.; Balas, M. J.

    1980-01-01

    A novel interconnection of distributed parameter system (DPS) identification and adaptive filtering is presented, which culminates in a common statement of coupled autoregressive, moving-average expansion or parallel infinite impulse response configuration adaptive parameterization. The common restricted complexity filter objectives are seen as similar to the reduced-order requirements of the DPS expansion description. The interconnection presents the possibility of an exchange of problem formulations and solution approaches not yet easily addressed in the common finite dimensional lumped-parameter system context. It is concluded that the shared problems raised are nevertheless many and difficult.

  19. Parallelization of Unsteady Adaptive Mesh Refinement for Unstructured Navier-Stokes Solvers

    NASA Technical Reports Server (NTRS)

    Schwing, Alan M.; Nompelis, Ioannis; Candler, Graham V.

    2014-01-01

    This paper explores the implementation of the MPI parallelization in a Navier-Stokes solver using adaptive mesh refinement. Viscous and inviscid test problems are considered for the purpose of benchmarking, as are implicit and explicit time advancement methods. The main test problem for comparison includes effects from boundary layers and other viscous features and requires a large number of grid points for accurate computation. Experimental validation against double cone experiments in hypersonic flow is shown. The adaptive mesh refinement shows promise for a staple test problem in the hypersonic community. Extension to more advanced techniques for more complicated flows is described.

  20. Massively parallel algorithms for real-time wavefront control of a dense adaptive optics system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fijany, A.; Milman, M.; Redding, D.

    1994-12-31

    In this paper massively parallel algorithms and architectures for real-time wavefront control of a dense adaptive optics system (SELENE) are presented. The authors have already shown that the computation of a near optimal control algorithm for SELENE can be reduced to the solution of a discrete Poisson equation on a regular domain. Although this represents an optimal computation, due to the large size of the system and the high sampling rate requirement, the implementation of this control algorithm poses a computationally challenging problem since it demands a sustained computational throughput of the order of 10 GFlops. They develop a novel algorithm, designated as the Fast Invariant Imbedding algorithm, which offers a massive degree of parallelism with simple communication and synchronization requirements. Due to these features, this algorithm is significantly more efficient than other Fast Poisson Solvers for implementation on massively parallel architectures. The authors also discuss two massively parallel, algorithmically specialized, architectures for low-cost and optimal implementation of the Fast Invariant Imbedding algorithm.
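
    For context, a generic serial sketch of the underlying problem (not the Fast Invariant Imbedding algorithm itself; grid size and source are arbitrary): the wavefront-reconstruction step reduces to a discrete Poisson equation on a regular grid, which the paper's algorithm is designed to solve in parallel, with minimal communication, at the required sampling rates.

        # Generic serial solve of the 5-point discrete Poisson equation on an n x n
        # grid (Dirichlet boundaries) -- the problem the massively parallel
        # algorithm in the paper targets at high sampling rates.
        import numpy as np
        import scipy.sparse as sp
        from scipy.sparse.linalg import spsolve

        def solve_poisson(rhs):
            # Assemble the standard 5-point Laplacian and solve directly;
            # the grid spacing is assumed absorbed into rhs.
            n = rhs.shape[0]
            t = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
            A = sp.kron(sp.identity(n), t) + sp.kron(t, sp.identity(n))
            return spsolve(A.tocsc(), rhs.ravel()).reshape(n, n)

        n = 64
        rhs = np.zeros((n, n))
        rhs[n // 2, n // 2] = 1.0      # hypothetical source derived from slope measurements
        phase = solve_poisson(rhs)
        print(phase.max())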

  1. Parallel-Processing Test Bed For Simulation Software

    NASA Technical Reports Server (NTRS)

    Blech, Richard; Cole, Gary; Townsend, Scott

    1996-01-01

    Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-the-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).

  2. Efficient Round-Trip Time Optimization for Replica-Exchange Enveloping Distribution Sampling (RE-EDS).

    PubMed

    Sidler, Dominik; Cristòfol-Clough, Michael; Riniker, Sereina

    2017-06-13

    Replica-exchange enveloping distribution sampling (RE-EDS) allows the efficient estimation of free-energy differences between multiple end-states from a single molecular dynamics (MD) simulation. In EDS, a reference state is sampled, which can be tuned by two types of parameters, i.e., smoothness parameter(s) and energy offsets, such that all end-states are sufficiently sampled. However, the choice of these parameters is not trivial. Replica exchange (RE) or parallel tempering is a widely applied technique to enhance sampling. By combining EDS with the RE technique, the parameter choice problem could be simplified and the challenge shifted toward an optimal distribution of the replicas in the smoothness-parameter space. The choice of a certain replica distribution can alter the sampling efficiency significantly. In this work, global round-trip time optimization (GRTO) algorithms are tested for the use in RE-EDS simulations. In addition, a local round-trip time optimization (LRTO) algorithm is proposed for systems with slowly adapting environments, where a reliable estimate for the round-trip time is challenging to obtain. The optimization algorithms were applied to RE-EDS simulations of a system of nine small-molecule inhibitors of phenylethanolamine N-methyltransferase (PNMT). The energy offsets were determined using our recently proposed parallel energy-offset (PEOE) estimation scheme. While the multistate GRTO algorithm yielded the best replica distribution for the ligands in water, the multistate LRTO algorithm was found to be the method of choice for the ligands in complex with PNMT. With this, the 36 alchemical free-energy differences between the nine ligands were calculated successfully from a single RE-EDS simulation 10 ns in length. Thus, RE-EDS presents an efficient method for the estimation of relative binding free energies.

  3. Collisionless stellar hydrodynamics as an efficient alternative to N-body methods

    NASA Astrophysics Data System (ADS)

    Mitchell, Nigel L.; Vorobyov, Eduard I.; Hensler, Gerhard

    2013-01-01

    The dominant constituents of the Universe's matter are believed to be collisionless in nature and thus their modelling in any self-consistent simulation is extremely important. For simulations that deal only with dark matter or stellar systems, the conventional N-body technique is fast, memory efficient and relatively simple to implement. However when extending simulations to include the effects of gas physics, mesh codes are at a distinct disadvantage compared to Smooth Particle Hydrodynamics (SPH) codes. Whereas implementing the N-body approach into SPH codes is fairly trivial, the particle-mesh technique used in mesh codes to couple collisionless stars and dark matter to the gas on the mesh has a series of significant scientific and technical limitations. These include spurious entropy generation resulting from discreteness effects, poor load balancing and increased communication overhead which spoil the excellent scaling in massively parallel grid codes. In this paper we propose the use of the collisionless Boltzmann moment equations as a means to model the collisionless material as a fluid on the mesh, implementing it into the massively parallel FLASH Adaptive Mesh Refinement (AMR) code. This approach which we term `collisionless stellar hydrodynamics' enables us to do away with the particle-mesh approach and since the parallelization scheme is identical to that used for the hydrodynamics, it preserves the excellent scaling of the FLASH code already demonstrated on peta-flop machines. We find that the classic hydrodynamic equations and the Boltzmann moment equations can be reconciled under specific conditions, allowing us to generate analytic solutions for collisionless systems using conventional test problems. We confirm the validity of our approach using a suite of demanding test problems, including the use of a modified Sod shock test. By deriving the relevant eigenvalues and eigenvectors of the Boltzmann moment equations, we are able to use high order accurate characteristic tracing methods with Riemann solvers to generate numerical solutions which show excellent agreement with our analytic solutions. We conclude by demonstrating the ability of our code to model complex phenomena by simulating the evolution of a two-armed spiral galaxy whose properties agree with those predicted by the swing amplification theory.

  4. Towards Exascale Seismic Imaging and Inversion

    NASA Astrophysics Data System (ADS)

    Tromp, J.; Bozdag, E.; Lefebvre, M. P.; Smith, J. A.; Lei, W.; Ruan, Y.

    2015-12-01

    Post-petascale supercomputers are now available to solve complex scientific problems that were thought unreachable a few decades ago. They also bring a cohort of concerns tied to obtaining optimum performance. Several issues are currently being investigated by the HPC community. These include energy consumption, fault resilience, scalability of the current parallel paradigms, workflow management, I/O performance and feature extraction with large datasets. In this presentation, we focus on the last three issues. In the context of seismic imaging and inversion, in particular for simulations based on adjoint methods, workflows are well defined. They consist of a few collective steps (e.g., mesh generation or model updates) and of a large number of independent steps (e.g., forward and adjoint simulations of each seismic event, pre- and postprocessing of seismic traces). The greater goal is to reduce the time to solution, that is, obtaining a more precise representation of the subsurface as fast as possible. This brings us to consider both the workflow in its entirety and the parts comprising it. The usual approach is to speed up the purely computational parts based on code optimization in order to reach higher FLOPS and better memory management. This still remains an important concern, but larger scale experiments show that the imaging workflow suffers from severe I/O bottlenecks. Such limitations occur both for purely computational data and seismic time series. The latter are dealt with by the introduction of a new Adaptable Seismic Data Format (ASDF). Parallel I/O libraries, namely HDF5 and ADIOS, are used to drastically reduce the cost of disk access. Parallel visualization tools, such as VisIt, are able to take advantage of ADIOS metadata to extract features and display massive datasets. Because large parts of the workflow are embarrassingly parallel, we are investigating the possibility of automating the imaging process with the integration of scientific workflow management tools, specifically Pegasus.

  5. Applications of New Surrogate Global Optimization Algorithms including Efficient Synchronous and Asynchronous Parallelism for Calibration of Expensive Nonlinear Geophysical Simulation Models.

    NASA Astrophysics Data System (ADS)

    Shoemaker, C. A.; Pang, M.; Akhtar, T.; Bindel, D.

    2016-12-01

    New parallel surrogate global optimization algorithms are developed and applied to objective functions that are expensive simulations (possibly with multiple local minima). The algorithms can be applied to most geophysical simulations, including those with nonlinear partial differential equations. The optimization does not require that the simulations be parallelized. Asynchronous (and synchronous) parallel execution is available in the optimization toolbox "pySOT". The parallel algorithms are modified from serial to eliminate fine-grained parallelism. The optimization is computed with the open source software pySOT, a Surrogate Global Optimization Toolbox that allows the user to pick the type of surrogate (or ensembles), the search procedure on the surrogate, and the type of parallelism (synchronous or asynchronous). pySOT also allows the user to develop new algorithms by modifying parts of the code. In the applications here, the objective function takes up to 30 minutes for one simulation, and serial optimization can take over 200 hours. Results from Yellowstone (NSF) and NCSS (Singapore) supercomputers are given for groundwater contaminant hydrology simulations with applications to model parameter estimation and decontamination management. All results are compared with alternatives. The first results are for optimization of pumping at many wells to reduce cost for decontamination of groundwater at a superfund site. The optimization runs with up to 128 processors. Superlinear speed up is obtained for up to 16 processors, and efficiency with 64 processors is over 80%. Each evaluation of the objective function requires the solution of nonlinear partial differential equations to describe the impact of spatially distributed pumping and model parameters on model predictions for the spatial and temporal distribution of groundwater contaminants. The second application uses an asynchronous parallel global optimization for groundwater quality model calibration. The time for a single objective function evaluation varies unpredictably, so efficiency is improved with asynchronous parallel calculations to improve load balancing. The third application (done at NCSS) incorporates new global surrogate multi-objective parallel search algorithms into pySOT and applies it to a large watershed calibration problem.
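
    A bare-bones sketch of the asynchronous pattern described above (not pySOT's actual API; the objective, placeholder "surrogate", and budget are made up): whenever any worker finishes, its result immediately updates the history and a new candidate is launched, so slow and fast evaluations never block one another.

        # Bare-bones asynchronous parallel optimization sketch (not the pySOT API):
        # whenever any worker finishes, its result updates the history and a new
        # candidate is launched immediately -- no per-batch synchronization barrier.
        import concurrent.futures as cf
        import random, time

        def expensive_simulation(x):
            time.sleep(random.uniform(0.05, 0.2))    # stands in for a long model run
            return (x - 0.3) ** 2                    # hypothetical objective

        def propose_candidate(history):
            # Placeholder "surrogate": perturb the best point evaluated so far.
            if not history:
                return random.uniform(0.0, 1.0)
            best_x, _ = min(history, key=lambda h: h[1])
            return min(1.0, max(0.0, best_x + random.gauss(0.0, 0.1)))

        history, budget, n_workers = [], 40, 4
        with cf.ThreadPoolExecutor(max_workers=n_workers) as pool:
            running = {}
            for _ in range(n_workers):               # fill the workers once
                x = propose_candidate(history)
                running[pool.submit(expensive_simulation, x)] = x
            launched = n_workers
            while running:
                done, _ = cf.wait(running, return_when=cf.FIRST_COMPLETED)
                for fut in done:
                    history.append((running.pop(fut), fut.result()))
                    if launched < budget:            # immediately refill the freed worker
                        x = propose_candidate(history)
                        running[pool.submit(expensive_simulation, x)] = x
                        launched += 1
        print("best (x, value):", min(history, key=lambda h: h[1]))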

  6. Exploring the Ability of a Coarse-grained Potential to Describe the Stress-strain Response of Glassy Polystyrene

    DTIC Science & Technology

    2012-10-01

    using the open-source code Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) (http://lammps.sandia.gov) (23). The commercial ... parameters are proprietary and cannot be ported to the LAMMPS simulation code. In our molecular dynamics simulations at the atomistic resolution, we ... Abbreviations: IBI, iterative Boltzmann inversion; LAMMPS, Large-scale Atomic/Molecular Massively Parallel Simulator; MAPS, Materials Processes and Simulations; MS

  7. Look-ahead Dynamic Simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2015-10-20

    The look-ahead dynamic simulation software system incorporates high-performance parallel computing technologies, significantly reduces the solution time for each transient simulation case, and brings dynamic simulation analysis into on-line applications to enable more transparency for better reliability and asset utilization. It takes a snapshot of the current power grid status, computes the system dynamic simulation in parallel, and outputs the transient response of the power system in real time.

  8. Real-world hydrologic assessment of a fully-distributed hydrological model in a parallel computing environment

    NASA Astrophysics Data System (ADS)

    Vivoni, Enrique R.; Mascaro, Giuseppe; Mniszewski, Susan; Fasel, Patricia; Springer, Everett P.; Ivanov, Valeriy Y.; Bras, Rafael L.

    2011-10-01

    A major challenge in the use of fully-distributed hydrologic models has been the lack of computational capabilities for high-resolution, long-term simulations in large river basins. In this study, we present the parallel model implementation and real-world hydrologic assessment of the Triangulated Irregular Network (TIN)-based Real-time Integrated Basin Simulator (tRIBS). Our parallelization approach is based on the decomposition of a complex watershed using the channel network as a directed graph. The resulting sub-basin partitioning divides effort among processors and handles hydrologic exchanges across boundaries. Through numerical experiments in a set of nested basins, we quantify parallel performance relative to serial runs for a range of processors, simulation complexities and lengths, and sub-basin partitioning methods, while accounting for inter-run variability on a parallel computing system. In contrast to serial simulations, the parallel model speed-up depends on the variability of hydrologic processes. Load balancing significantly improves parallel speed-up with proportionally faster runs as simulation complexity (domain resolution and channel network extent) increases. The best strategy for large river basins is to combine a balanced partitioning with an extended channel network, with potential savings through a lower TIN resolution. Based on these advances, a wider range of applications for fully-distributed hydrologic models are now possible. This is illustrated through a set of ensemble forecasts that account for precipitation uncertainty derived from a statistical downscaling model.

  9. Unstructured Adaptive Grid Computations on an Array of SMPs

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Pramanick, Ira; Sohn, Andrew; Simon, Horst D.

    1996-01-01

    Dynamic load balancing is necessary for parallel adaptive methods to solve unsteady CFD problems on unstructured grids. In this paper, we have presented such a dynamic load balancing framework, called JOVE. Results on a four-POWERnode POWER CHALLENGEarray demonstrated that load balancing gives significant performance improvements over no load balancing for such adaptive computations. The parallel speedup of JOVE, implemented using MPI on the POWER CHALLENGEarray, was significant, being as high as 31 for 32 processors. An implementation of JOVE that exploits 'an array of SMPs' architecture was also studied; this hybrid JOVE outperformed flat JOVE by up to 28% on the meshes and adaption models tested. With large, realistic meshes and actual flow-solver and adaption phases incorporated into JOVE, hybrid JOVE can be expected to yield significant advantage over flat JOVE, especially as the number of processors is increased, thus demonstrating the scalability of an array of SMPs architecture.

  10. Parallel implementation of an adaptive scheme for 3D unstructured grids on the SP2

    NASA Technical Reports Server (NTRS)

    Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak

    1996-01-01

    Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.0X speedup on 64 processors when 10 percent of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all the mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.

  11. Parallel Implementation of an Adaptive Scheme for 3D Unstructured Grids on the SP2

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Strawn, Roger C.

    1996-01-01

    Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.0X speedup on 64 processors when 10% of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.

  12. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    NASA Astrophysics Data System (ADS)

    Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

    2013-08-01

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0 ... t_M can be transformed into a root finding problem, F(X) = [x_i - f(x_{i-1})]_{i=1,...,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
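
    A small sketch of the reformulation described above (a toy 1D harmonic oscillator with velocity-Verlet as the forward map f; not the paper's production code): gathering the whole trajectory into one vector X, the residual F(X) = [x_i - f(x_{i-1})]_{i=1,...,M} vanishes exactly on the sequentially integrated trajectory, and each of its M rows can be evaluated by a different processor.

        # Root-finding formulation of time integration: for a 1D harmonic oscillator
        # with velocity-Verlet as the forward map f, the residual
        # F(X) = [x_i - f(x_{i-1})]_{i=1..M} is zero exactly on the true trajectory.
        # In the parallel-in-time method each residual row can go to its own processor.
        import numpy as np

        dt, M = 0.05, 200

        def f(state):                       # one velocity-Verlet step of x'' = -x
            r, v = state
            a = -r
            r_new = r + dt * v + 0.5 * dt * dt * a
            v_new = v + 0.5 * dt * (a - r_new)
            return np.array([r_new, v_new])

        def residual(X, x0):
            # X has shape (M, 2): the unknown trajectory after the initial condition.
            prev = np.vstack([x0, X[:-1]])
            return X - np.array([f(p) for p in prev])   # rows are independent -> parallelizable

        x0 = np.array([1.0, 0.0])
        X_serial = np.empty((M, 2))
        state = x0
        for i in range(M):                  # ordinary sequential integration
            state = f(state)
            X_serial[i] = state

        print(np.abs(residual(X_serial, x0)).max())     # ~0: the serial trajectory is a root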

  13. Parallel trait adaptation across opposing thermal environments in experimental Drosophila melanogaster populations.

    PubMed

    Tobler, Ray; Hermisson, Joachim; Schlötterer, Christian

    2015-07-01

    Thermal stress is a pervasive selective agent in natural populations that impacts organismal growth, survival, and reproduction. Drosophila melanogaster exhibits a variety of putatively adaptive phenotypic responses to thermal stress in natural and experimental settings; however, accompanying assessments of fitness are typically lacking. Here, we quantify changes in fitness and known thermal tolerance traits in replicated experimental D. melanogaster populations following more than 40 generations of evolution to either cyclic cold or hot temperatures. By evaluating fitness for both evolved populations alongside a reconstituted starting population, we show that the evolved populations were the best adapted within their respective thermal environments. More strikingly, the evolved populations exhibited increased fitness in both environments and improved resistance to both acute heat and cold stress. This unexpected parallel response appeared to be an adaptation to the rapid temperature changes that drove the cycling thermal regimes, as parallel fitness changes were not observed when tested in a constant thermal environment. Our results add to a small, but growing group of studies that demonstrate the importance of fluctuating temperature changes for thermal adaptation and highlight the need for additional work in this area. © 2015 The Author(s). Evolution published by Wiley Periodicals, Inc. on behalf of The Society for the Study of Evolution.

  14. Mechanisms mediating parallel action monitoring in fronto-striatal circuits.

    PubMed

    Beste, Christian; Ness, Vanessa; Lukas, Carsten; Hoffmann, Rainer; Stüwe, Sven; Falkenstein, Michael; Saft, Carsten

    2012-08-01

    Flexible response adaptation and the control of conflicting information play a pivotal role in daily life. Yet, little is known about the neuronal mechanisms mediating parallel control of these processes. We examined these mechanisms using a multi-methodological approach that integrated data from event-related potentials (ERPs) with structural MRI data and source localisation using sLORETA. Moreover, we calculated evoked wavelet oscillations. We applied this multi-methodological approach in healthy subjects and patients in a prodromal phase of a major basal ganglia disorder (i.e., Huntington's disease), to directly focus on fronto-striatal networks. Behavioural data indicated that especially the parallel execution of conflict monitoring and flexible response adaptation was modulated across the examined cohorts. When both processes do not coincide, a high integrity of fronto-striatal loops seems to be dispensable. The neurophysiological data suggests that conflict monitoring (reflected by the N2 ERP) and working memory processes (reflected by the P3 ERP) differentially contribute to this pattern of results. Flexible response adaptation under the constraint of high conflict processing affected the N2 and P3 ERP, as well as their delta frequency band oscillations. Yet, modulatory effects were strongest for the N2 ERP and evoked wavelet oscillations in this time range. The N2 ERPs were localized in the anterior cingulate cortex (BA32, BA24). Modulations of the P3 ERP were localized in parietal areas (BA7). In addition, MRI-determined caudate head volume predicted modulations in conflict monitoring, but not working memory processes. The results show how parallel conflict monitoring and flexible adaptation of action is mediated via fronto-striatal networks. While both response monitoring and working memory processes seem to play a role, especially response selection processes and ACC-basal ganglia networks seem to be the driving force in mediating parallel conflict monitoring and flexible adaptation of actions. Copyright © 2012 Elsevier Inc. All rights reserved.

  15. Xyce Parallel Electronic Simulator : users' guide, version 2.0.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont

    2004-06-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase -- a message-passing parallel implementation -- which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce Parallel Electronic Simulator is designed to support a variety of device model inputs. These input formats include standard analytical models, behavioral models, look-up tables, and mesh-level PDE device models. Combined with this flexible interface is an architectural design that greatly simplifies the addition of circuit models. One of the most important features of Xyce is in providing a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia now has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods) research and development can be performed. Ultimately, these capabilities are migrated to end users.

  16. Numerical Simulation of Shock-Dispersed Fuel Charges

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bell, John B.; Day, Marcus; Beckner, Vincent

    Successfully attacking underground storage facilities for chemical and biological (C/B) weapons is an important mission area for the Department of Defense. The fate of a C/B agent during an attack depends critically on the pressure and thermal environment that the agent experiences. The initial environment is determined by the blast wave from an explosive device. The byproducts of the detonation provide a fuel source that burns when mixed with oxidizer (afterburning). Additional energy can be released by the ignition of the C/B agent as it mixes with the explosion products and the air in the chamber. Hot plumes venting material from any openings in the chamber can provide fuel for additional energy release when mixed with additional oxidizer. Assessment of the effectiveness of current explosives as well as the development of new explosive systems requires a detailed understanding of all of these modes of energy release. Using methodologies based on the use of higher-order Godunov schemes combined with Adaptive Mesh Refinement (AMR), implemented in a parallel adaptive framework suited to the massively parallel computer systems provided by the DOD High-Performance Computing Modernization program, we use a suite of programs to develop predictive models for the simulation of the energetics of blast waves, deflagration waves and ejecta plumes. The programs use realistic reaction kinetic and thermodynamic models provided by standard components (such as CHEMKIN) as well as other novel methods to model enhanced explosive devices. The work described here focuses on the validation of these models against a series of bomb calorimetry experiments performed at the Ernst-Mach Institute. In this paper, we present three-dimensional simulations of the experiments, examining the explosion dynamics and the role of subsequent burning of the explosion products on the thermal and pressure environment within the calorimeter. The effects of burning are quantified by comparing two sets of computations, one in which the calorimeter is filled with nitrogen so there is no afterburning and a second in which the calorimeter contains air.

  17. Satisfiability Test with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak

    1996-01-01

    Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
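
    The abstract does not spell out the serial baseline, but the procedure it parallelizes is standard simulated annealing over truth assignments, where each move flips one variable and is accepted with the usual Metropolis test. A minimal serial sketch in Python (clause sets and schedule parameters are placeholders, not the paper's settings) shows the kind of decision sequence that Generalized Speculative Computation reproduces in parallel:

    ```python
    import math
    import random

    def unsatisfied(clauses, assignment):
        """Count clauses with no true literal (literal > 0 means the variable is True)."""
        return sum(
            not any((lit > 0) == assignment[abs(lit)] for lit in clause)
            for clause in clauses
        )

    def anneal_sat(clauses, n_vars, t0=2.0, cooling=0.95, sweeps=200):
        """Plain serial simulated annealing for SAT; a sketch, not the paper's code."""
        assignment = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
        cost, temp = unsatisfied(clauses, assignment), t0
        for _ in range(sweeps):
            for v in random.sample(range(1, n_vars + 1), n_vars):
                assignment[v] = not assignment[v]          # propose a single-bit flip
                new_cost = unsatisfied(clauses, assignment)
                delta = new_cost - cost
                if delta <= 0 or random.random() < math.exp(-delta / temp):
                    cost = new_cost                        # accept the move
                else:
                    assignment[v] = not assignment[v]      # reject: undo the flip
            temp *= cooling                                # geometric cooling schedule
        return assignment, cost

    # Toy instance: (x1 or not x2) and (x2 or x3), clauses as lists of signed variables
    best, remaining = anneal_sat([[1, -2], [2, 3]], n_vars=3)
    ```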

  18. Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations

    NASA Astrophysics Data System (ADS)

    Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.

    2016-07-01

    Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.

  19. Parallel simulation of tsunami inundation on a large-scale supercomputer

    NASA Astrophysics Data System (ADS)

    Oishi, Y.; Imamura, F.; Sugawara, D.

    2013-12-01

    An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation; the computational power of recent massively parallel supercomputers is therefore needed to enable faster-than-real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), so it is expected that very fast parallel computers will become more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on the TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using the CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the finite difference calculation, (2) communication between adjacent layers for the calculations to connect each layer, and (3) global communication to obtain the time step which satisfies the CFL condition in the whole domain. A preliminary test on the K computer showed the parallel efficiency on 1024 cores was 57% relative to 64 cores. We estimate that the parallel efficiency will be considerably improved by applying a 2-D domain decomposition instead of the present 1-D domain decomposition in future work. The present parallel tsunami model was applied to the 2011 Great Tohoku tsunami. The coarsest resolution layer covers a 758 km × 1155 km region with a 405 m grid spacing. A nesting of five layers was used with a resolution ratio of 1/3 between nested layers. The finest resolution region has 5 m resolution and covers most of the coastal region of Sendai city. To complete 2 hours of simulation time, the serial (non-parallel) computation took approximately 4 days on a workstation. Completing the same simulation on 1024 cores of the K computer took 45 minutes, which is more than two times faster than real time. This presentation discusses the updated parallel computational performance and the efficient use of the K computer when considering the characteristics of the tsunami inundation simulation model in relation to the characteristics and capabilities of the K computer.
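
    As a rough illustration of communication patterns (1) and (3) listed above (not the TUNAMI-N2 code itself), the mpi4py sketch below performs a 1-D domain decomposition of a single layer: each rank exchanges one halo row with its neighbours for the finite-difference stencil, and all ranks agree on a CFL-limited time step through a global reduction. Grid sizes and the wave-speed estimate are placeholders; the inter-layer communication of pattern (2) is omitted.

    ```python
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    nx_local, ny = 200, 1000                  # placeholder strip size per rank
    eta = np.zeros((nx_local + 2, ny))        # water level with one halo row per side
    up, down = rank - 1, rank + 1             # neighbour ranks in the 1-D decomposition

    def exchange_halos(field):
        """(1) Nearest-neighbour exchange required by the finite-difference stencil."""
        if up >= 0:
            comm.Sendrecv(field[1, :], dest=up, recvbuf=field[0, :], source=up)
        if down < size:
            comm.Sendrecv(field[-2, :], dest=down, recvbuf=field[-1, :], source=down)

    def global_time_step(depth, dx, g=9.81):
        """(3) Global reduction: the CFL-limited dt over the whole domain."""
        local_dt = dx / np.sqrt(g * depth.max())
        return comm.allreduce(local_dt, op=MPI.MIN)
    ```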

  20. PFC2D simulation of thermally induced cracks in concrete specimens

    NASA Astrophysics Data System (ADS)

    Liu, Xinghong; Chang, Xiaolin; Zhou, Wei; Li, Shuirong

    2013-06-01

    The appearance of cracks in concrete structures exposed to severe environmental conditions can be critical. This research validates the Particle Flow Code (PFC2D) method in the context of concrete thermally-induced cracking simulations. First, the concrete was discretized into meso-level units of aggregate, cement mortar and the interfaces between them. The parallel bonded-particle model in PFC2D was adapted to describe the constitutive relation of the cementing material. Then, the concrete mechanical meso-parameters were obtained through several groups of biaxial tests, so that the numerical results comply with the behaviour observed in the laboratory tests. The concrete thermal meso-parameters were determined by comparison with the parameters of an empirical formula, using simulations that impose a constant heat flow on the left margin of the concrete specimens. Finally, a 1000 mm × 500 mm concrete specimen model was analysed. It simulated the formation and development of thermally-induced cracks under cold waves of different durations and temperature declines. Good agreement in fracture morphology and process was observed between the simulations, previous studies and laboratory data. The temperature decline limits during cold waves were obtained for a tensile strength of 3 MPa. This demonstrates the feasibility of using PFC2D to simulate thermally-induced cracking in concrete.

  1. Methodology of modeling and measuring computer architectures for plasma simulations

    NASA Technical Reports Server (NTRS)

    Wang, L. P. T.

    1977-01-01

    A brief introduction to plasma simulation using computers and the difficulties encountered on currently available computers is given. Through the use of an analyzing and measuring methodology, SARA, the control flow and data flow of a particle simulation model, REM2-1/2D, are exemplified. After recursive refinements the total execution time may be greatly shortened and a fully parallel data flow can be obtained. From this data flow, a matched computer architecture or organization could be configured to achieve the computation bound of an application problem. A sequential type simulation model, an array/pipeline type simulation model, and a fully parallel simulation model of the code REM2-1/2D are proposed and analyzed. This methodology can be applied to other application problems which have an implicitly parallel nature.

  2. Massively parallel quantum computer simulator

    NASA Astrophysics Data System (ADS)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
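
    The memory footprint and near-ideal scaling reported here follow from how a gate acts on the 2^n-amplitude state vector. A minimal serial numpy sketch of a single-qubit gate (assuming the convention that qubit 0 maps to the first axis of the reshaped state tensor) shows why the work splits into 2^(n-1) independent amplitude pairs, which is what a parallel simulator distributes across processors:

    ```python
    import numpy as np

    def apply_single_qubit_gate(state, gate, target, n_qubits):
        """Apply a 2x2 unitary to one qubit of an n-qubit state vector.

        Reshaping exposes the target qubit as its own tensor axis, so the gate acts
        independently on 2**(n_qubits - 1) amplitude pairs.
        """
        psi = state.reshape([2] * n_qubits)
        psi = np.tensordot(gate, psi, axes=([1], [target]))   # contract gate with target axis
        psi = np.moveaxis(psi, 0, target)                     # restore the axis order
        return psi.reshape(-1)

    # Example: Hadamard on qubit 0 of a 3-qubit register initialised to |000>
    H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
    state = np.zeros(2**3, dtype=complex)
    state[0] = 1.0
    state = apply_single_qubit_gate(state, H, target=0, n_qubits=3)
    ```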

  3. PENTACLE: Parallelized particle-particle particle-tree code for planet formation

    NASA Astrophysics Data System (ADS)

    Iwasawa, Masaki; Oshino, Shoichi; Fujii, Michiko S.; Hori, Yasunori

    2017-10-01

    We have newly developed a parallelized particle-particle particle-tree code for planet formation, PENTACLE, which is a parallelized hybrid N-body integrator executed on a CPU-based (super)computer. PENTACLE uses a fourth-order Hermite algorithm to calculate gravitational interactions between particles within a cut-off radius and a Barnes-Hut tree method for gravity from particles beyond. It also implements an open-source library designed for full automatic parallelization of particle simulations, FDPS (Framework for Developing Particle Simulator), to parallelize a Barnes-Hut tree algorithm for a memory-distributed supercomputer. These allow us to handle 1-10 million particles in a high-resolution N-body simulation on CPU clusters for collisional dynamics, including physical collisions in a planetesimal disc. In this paper, we show the performance and the accuracy of PENTACLE in terms of the cut-off radius R̃_cut and a time-step Δt. It turns out that the accuracy of a hybrid N-body simulation is controlled through Δt / R̃_cut, and Δt / R̃_cut ~ 0.1 is necessary to simulate accurately the accretion process of a planet for ≥ 10^6 yr. For all those interested in large-scale particle simulations, PENTACLE, customized for planet formation, will be freely available from https://github.com/PENTACLE-Team/PENTACLE under the MIT licence.

  4. Numerical characteristics of quantum computer simulation

    NASA Astrophysics Data System (ADS)

    Chernyavskiy, A.; Khamitov, K.; Teplov, A.; Voevodin, V.; Voevodin, Vl.

    2016-12-01

    The simulation of quantum circuits is significantly important for the implementation of quantum information technologies. The main difficulty of such modeling is the exponential growth of dimensionality; thus the usage of modern high-performance parallel computations is relevant. As is well known, an arbitrary quantum computation in the circuit model can be done using only single- and two-qubit gates, and we analyze the computational structure and properties of the simulation of such gates. We investigate how the unique properties of quantum nature lead to the computational properties of the considered algorithms: quantum parallelism makes the simulation of quantum gates highly parallel, while, on the other hand, quantum entanglement leads to the problem of computational locality during simulation. We use the methodology of the AlgoWiki project (algowiki-project.org) to analyze the algorithm. This methodology consists of theoretical (sequential and parallel complexity, macro structure, and visual informational graph) and experimental (locality and memory access, scalability and more specific dynamic characteristics) parts. The experimental part was carried out using the petascale Lomonosov supercomputer (Moscow State University, Russia). We show that the simulation of quantum gates is a good basis for research and testing of development methods for data-intensive parallel software, and the considered methodology of analysis can be successfully used for the improvement of algorithms in quantum information science.

  5. Visualization and Tracking of Parallel CFD Simulations

    NASA Technical Reports Server (NTRS)

    Vaziri, Arsi; Kremenetsky, Mark

    1995-01-01

    We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, is handled by CM/AVS. Partitioning of the visualization task, between the CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate → store → visualize' post-processing approach.

  6. Design of object-oriented distributed simulation classes

    NASA Technical Reports Server (NTRS)

    Schoeffler, James D. (Principal Investigator)

    1995-01-01

    Distributed simulation of aircraft engines as part of a computer aided design package is being developed by NASA Lewis Research Center for the aircraft industry. The project is called NPSS, an acronym for 'Numerical Propulsion Simulation System'. NPSS is a flexible object-oriented simulation of aircraft engines requiring high computing speed. It is desirable to run the simulation on a distributed computer system with multiple processors executing portions of the simulation in parallel. The purpose of this research was to investigate object-oriented structures such that individual objects could be distributed. The set of classes used in the simulation must be designed to facilitate parallel computation. Since the portions of the simulation carried out in parallel are not independent of one another, there is the need for communication among the parallel executing processors which in turn implies need for their synchronization. Communication and synchronization can lead to decreased throughput as parallel processors wait for data or synchronization signals from other processors. As a result of this research, the following have been accomplished. The design and implementation of a set of simulation classes which result in a distributed simulation control program have been completed. The design is based upon MIT 'Actor' model of a concurrent object and uses 'connectors' to structure dynamic connections between simulation components. Connectors may be dynamically created according to the distribution of objects among machines at execution time without any programming changes. Measurements of the basic performance have been carried out with the result that communication overhead of the distributed design is swamped by the computation time of modules unless modules have very short execution times per iteration or time step. An analytical performance model based upon queuing network theory has been designed and implemented. Its application to realistic configurations has not been carried out.

  7. Design of Object-Oriented Distributed Simulation Classes

    NASA Technical Reports Server (NTRS)

    Schoeffler, James D.

    1995-01-01

    Distributed simulation of aircraft engines as part of a computer aided design package is being developed by NASA Lewis Research Center for the aircraft industry. The project is called NPSS, an acronym for "Numerical Propulsion Simulation System". NPSS is a flexible object-oriented simulation of aircraft engines requiring high computing speed. It is desirable to run the simulation on a distributed computer system with multiple processors executing portions of the simulation in parallel. The purpose of this research was to investigate object-oriented structures such that individual objects could be distributed. The set of classes used in the simulation must be designed to facilitate parallel computation. Since the portions of the simulation carried out in parallel are not independent of one another, there is the need for communication among the parallel executing processors which in turn implies need for their synchronization. Communication and synchronization can lead to decreased throughput as parallel processors wait for data or synchronization signals from other processors. As a result of this research, the following have been accomplished. The design and implementation of a set of simulation classes which result in a distributed simulation control program have been completed. The design is based upon MIT "Actor" model of a concurrent object and uses "connectors" to structure dynamic connections between simulation components. Connectors may be dynamically created according to the distribution of objects among machines at execution time without any programming changes. Measurements of the basic performance have been carried out with the result that communication overhead of the distributed design is swamped by the computation time of modules unless modules have very short execution times per iteration or time step. An analytical performance model based upon queuing network theory has been designed and implemented. Its application to realistic configurations has not been carried out.

  8. Partitioning and packing mathematical simulation models for calculation on parallel computers

    NASA Technical Reports Server (NTRS)

    Arpasi, D. J.; Milner, E. J.

    1986-01-01

    The development of multiprocessor simulations from a serial set of ordinary differential equations describing a physical system is described. Degrees of parallelism (i.e., coupling between the equations) and their impact on parallel processing are discussed. The problem of identifying computational parallelism within sets of closely coupled equations that require the exchange of current values of variables is described. A technique is presented for identifying this parallelism and for partitioning the equations for parallel solution on a multiprocessor. An algorithm which packs the equations into a minimum number of processors is also described. The results of the packing algorithm when applied to a turbojet engine model are presented in terms of processor utilization.
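
    The packing step described above is essentially a bin-packing of equation execution times onto processors. The paper's exact algorithm is not reproduced in the abstract; the sketch below shows a common greedy longest-processing-time heuristic with hypothetical per-equation costs, only to make the idea concrete:

    ```python
    import heapq

    def pack_equations(costs, n_processors):
        """Greedy LPT packing: give the costliest equation to the least-loaded processor.

        costs: dict mapping equation name -> estimated evaluation time.
        A generic heuristic sketch, not the packing algorithm from the paper.
        """
        heap = [(0.0, p, []) for p in range(n_processors)]   # (load, processor id, equations)
        heapq.heapify(heap)
        for eq, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
            load, pid, eqs = heapq.heappop(heap)
            heapq.heappush(heap, (load + cost, pid, eqs + [eq]))
        return sorted(heap, key=lambda t: t[1])

    # Hypothetical per-equation costs (arbitrary units) for a small engine model
    for load, pid, eqs in pack_equations({"shaft": 12.0, "burner": 9.5, "compressor": 7.0,
                                          "turbine": 6.5, "nozzle": 3.0}, n_processors=2):
        print(f"processor {pid}: load={load:.1f}  equations={eqs}")
    ```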

  9. Vectorization for Molecular Dynamics on Intel Xeon Phi Coprocessors

    NASA Astrophysics Data System (ADS)

    Yi, Hongsuk

    2014-03-01

    Many modern processors are capable of exploiting data-level parallelism through the use of single instruction multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512-bit vector registers for high performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Tersoff potentials for covalent-bond solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multi-level parallel computing: we combine tightly coupled thread-level and task-level parallelism with the 512-bit vector registers. The simulation results show that the parallel performance of the SIMD implementation on the Xeon Phi is clearly superior to that on the x86 CPU architecture.

  10. An information-theoretic approach to motor action decoding with a reconfigurable parallel architecture.

    PubMed

    Craciun, Stefan; Brockmeier, Austin J; George, Alan D; Lam, Herman; Príncipe, José C

    2011-01-01

    Methods for decoding movements from neural spike counts using adaptive filters often rely on minimizing the mean-squared error. However, for non-Gaussian distribution of errors, this approach is not optimal for performance. Therefore, rather than using probabilistic modeling, we propose an alternate non-parametric approach. In order to extract more structure from the input signal (neuronal spike counts) we propose using minimum error entropy (MEE), an information-theoretic approach that minimizes the error entropy as part of an iterative cost function. However, the disadvantage of using MEE as the cost function for adaptive filters is the increase in computational complexity. In this paper we present a comparison between the decoding performance of the analytic Wiener filter and a linear filter trained with MEE, which is then mapped to a parallel architecture in reconfigurable hardware tailored to the computational needs of the MEE filter. We observe considerable speedup from the hardware design. The adaptation of filter weights for the multiple-input, multiple-output linear filters, necessary in motor decoding, is a highly parallelizable algorithm. It can be decomposed into many independent computational blocks with a parallel architecture readily mapped to a field-programmable gate array (FPGA) and scales to large numbers of neurons. By pipelining and parallelizing independent computations in the algorithm, the proposed parallel architecture has sublinear increases in execution time with respect to both window size and filter order.
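
    The MEE criterion mentioned here is commonly implemented by maximising a Parzen-window estimate of the "information potential" of the errors, a double sum over all error pairs; that quadratic cost is precisely the computational burden the hardware design addresses. A numpy sketch of the cost and a batch gradient step for a linear decoder (kernel width and learning rate are illustrative values, not those of the paper):

    ```python
    import numpy as np

    def information_potential(errors, sigma=0.5):
        """Parzen estimate of the quadratic information potential of the errors.

        Maximising this quantity is equivalent to minimising Renyi's quadratic
        error entropy (the MEE criterion); the pairwise sum is the O(N^2) cost.
        """
        diff = errors[:, None] - errors[None, :]
        kernel = np.exp(-diff**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
        return kernel.mean()

    def mee_update(w, X, d, lr=0.05, sigma=0.5):
        """One gradient-ascent step on the information potential for a linear decoder.

        X: (N, n_inputs) spike-count window, d: (N,) desired kinematics, w: weights.
        An illustrative batch update; the independent pair sums are the part that
        parallelises readily in hardware.
        """
        e = d - X @ w
        diff = e[:, None] - e[None, :]                     # e_i - e_j for all pairs
        kernel = np.exp(-diff**2 / (2 * sigma**2))
        grad = (kernel * diff)[:, :, None] * (X[:, None, :] - X[None, :, :])
        return w + lr * grad.mean(axis=(0, 1)) / sigma**2
    ```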

  11. Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zuo, Wangda; McNeil, Andrew; Wetter, Michael

    2011-09-06

    We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
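
    Conceptually, the matrix multiplication being moved to the GPU is the daylight-coefficient step: illuminance at the sensor points is a precomputed coefficient matrix multiplied by a sky vector, repeated for every timestep of the year. A plain numpy sketch (array sizes and values are placeholders; the 146-patch sky corresponds to the smaller case quoted above, the 2306-patch sky to the larger) shows the operation that the paper implements with OpenCL:

    ```python
    import numpy as np

    n_sensors, n_sky_patches, n_hours = 1000, 146, 8760   # placeholder sizes, one year

    # Daylight coefficient matrix: contribution of each sky patch to each sensor point
    dc_matrix = np.random.rand(n_sensors, n_sky_patches)

    # One sky vector per annual timestep (radiance of each sky patch)
    sky_vectors = np.random.rand(n_sky_patches, n_hours)

    # Annual illuminance at every sensor: a single dense matrix-matrix product
    illuminance = dc_matrix @ sky_vectors                  # shape (n_sensors, n_hours)
    ```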

  12. Streaming parallel GPU acceleration of large-scale filter-based spiking neural networks.

    PubMed

    Slażyński, Leszek; Bohte, Sander

    2012-01-01

    The arrival of graphics processing (GPU) cards suitable for massively parallel computing promises affordable large-scale neural network simulation previously only available at supercomputing facilities. While the raw numbers suggest that GPUs may outperform CPUs by at least an order of magnitude, the challenge is to develop fine-grained parallel algorithms to fully exploit the particulars of GPUs. Computation in a neural network is inherently parallel and thus a natural match for GPU architectures: given inputs, the internal state for each neuron can be updated in parallel. We show that for filter-based spiking neurons, like the Spike Response Model, the additive nature of membrane potential dynamics enables additional update parallelism. This also reduces the accumulation of numerical errors when using single precision computation, the native precision of GPUs. We further show that optimizing simulation algorithms and data structures to the GPU's architecture has a large pay-off: for example, matching iterative neural updating to the memory architecture of the GPU speeds up this simulation step by a factor of three to five. With such optimizations, we can simulate in better-than-realtime plausible spiking neural networks of up to 50 000 neurons, processing over 35 million spiking events per second.
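
    The "additive" membrane dynamics referred to above means each neuron's potential is a sum of filtered inputs, so a timestep is an elementwise update over the whole population at once; that is the per-neuron parallelism a GPU exploits. A CPU-side numpy sketch of a simplified leaky-filter update (a stand-in for the Spike Response Model, with illustrative parameters rather than the paper's):

    ```python
    import numpy as np

    def step(v, spikes_in, weights, dt=1e-3, tau=20e-3, threshold=1.0):
        """One timestep for a population of filter-based spiking neurons (simplified).

        v: (N,) membrane potentials, spikes_in: (M,) 0/1 presynaptic spikes,
        weights: (N, M) synaptic weights.  Every neuron updates independently,
        which maps naturally onto one GPU thread per neuron.
        """
        v = v * np.exp(-dt / tau) + weights @ spikes_in   # exponential leak plus summed input
        fired = v >= threshold
        v[fired] = 0.0                                    # reset neurons that spiked
        return v, fired

    # Toy usage: 10 000 neurons driven by 500 random input spike trains
    rng = np.random.default_rng(0)
    v = np.zeros(10_000)
    w = rng.normal(0.0, 0.05, (10_000, 500))
    for _ in range(100):
        v, fired = step(v, (rng.random(500) < 0.05).astype(float), w)
    ```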

  13. Modified Method of Adaptive Artificial Viscosity for Solution of Gas Dynamics Problems on Parallel Computer Systems

    NASA Astrophysics Data System (ADS)

    Popov, Igor; Sukov, Sergey

    2018-02-01

    A modification of the adaptive artificial viscosity (AAV) method is considered. This modification is based on a one-stage time approximation and is adapted to the calculation of gas dynamics problems on unstructured grids with an arbitrary type of grid elements. The proposed numerical method has simplified logic, better performance and parallel efficiency compared to the implementation of the original AAV method. Computer experiments demonstrate the robustness of the method and its convergence to the difference solution.

  14. Incentive Compatible Online Scheduling of Malleable Parallel Jobs with Individual Deadlines

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carroll, Thomas E.; Grosu, Daniel

    2010-09-13

    We consider the online scheduling of malleable jobs on parallel systems, such as clusters, symmetric multiprocessing computers, and multi-core processor computers. Malleable jobs are a model of parallel processing in which jobs adapt to the number of processors assigned to them. This model permits the scheduler and resource manager to make more efficient use of the available resources. Each malleable job is characterized by an arrival time, a deadline, and a value. If the job completes by its deadline, the user earns the payoff indicated by the value; otherwise, she earns a payoff of zero. The scheduling objective is to maximize the sum of the values of the jobs that complete by their associated deadlines. Complicating the matter is that users in the real world are rational and will attempt to manipulate the scheduler by misreporting their jobs' parameters if it benefits them to do so. To mitigate this behavior, we design an incentive compatible online scheduling mechanism. Incentive compatibility assures us that the users will obtain the maximum payoff only if they truthfully report their jobs' parameters to the scheduler. Finally, we simulate and study the mechanism to show the effects of misreports on the cheaters and on the system.

  15. A parallel domain decomposition-based implicit method for the Cahn–Hilliard–Cook phase-field equation in 3D

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zheng, Xiang; Yang, Chao

    2015-03-15

    We present a numerical algorithm for simulating the spinodal decomposition described by the three dimensional Cahn–Hilliard–Cook (CHC) equation, which is a fourth-order stochastic partial differential equation with a noise term. The equation is discretized in space and time based on a fully implicit, cell-centered finite difference scheme, with an adaptive time-stepping strategy designed to accelerate the progress to equilibrium. At each time step, a parallel Newton–Krylov–Schwarz algorithm is used to solve the nonlinear system. We discuss various numerical and computational challenges associated with the method. The numerical scheme is validated by a comparison with an explicit scheme of high accuracy (and unreasonably high cost). We present steady state solutions of the CHC equation in two and three dimensions. The effect of the thermal fluctuation on the spinodal decomposition process is studied. We show that the existence of the thermal fluctuation accelerates the spinodal decomposition process and that the final steady morphology is sensitive to the stochastic noise. We also show the evolution of the energies and statistical moments. In terms of the parallel performance, it is found that the implicit domain decomposition approach scales well on supercomputers with a large number of processors.
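
    The fully implicit formulation means each time step requires solving a large nonlinear system. A much-simplified serial sketch using scipy's Jacobian-free Newton-Krylov solver on a small 2-D periodic Cahn-Hilliard step (no Cook noise term, no Schwarz preconditioning, placeholder parameters) conveys the structure of the system that the parallel Newton-Krylov-Schwarz algorithm solves at every step:

    ```python
    import numpy as np
    from scipy.optimize import newton_krylov

    n, h, dt, kappa, mobility = 64, 1.0, 0.1, 1.0, 1.0    # illustrative parameters

    def lap(u):
        """5-point periodic Laplacian on a square grid with spacing h."""
        return (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u) / h**2

    def implicit_step(c_old):
        """Backward-Euler step of c_t = M * lap(c**3 - c - kappa * lap(c))."""
        def residual(c_new):
            mu = c_new**3 - c_new - kappa * lap(c_new)    # chemical potential
            return c_new - c_old - dt * mobility * lap(mu)
        return newton_krylov(residual, c_old, f_tol=1e-6)

    c = 0.01 * (np.random.rand(n, n) - 0.5)               # small random initial perturbation
    for _ in range(10):
        c = implicit_step(c)
    ```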

  16. A new version of Stochastic-parallel-gradient-descent algorithm (SPGD) for phase correction of a distorted orbital angular momentum (OAM) beam

    NASA Astrophysics Data System (ADS)

    Jiao Ling, Lin; Xiaoli, Yin; Huan, Chang; Xiaozhou, Cui; Yi-Lin, Guo; Huan-Yu, Liao; Chun-Yu, Gao; Guohua, Wu; Guang-Yao, Liu; Jin-Kun, Jiang; Qing-Hua, Tian

    2018-02-01

    Atmospheric turbulence limits the performance of orbital angular momentum-based free-space optical communication (FSO-OAM) systems. In order to compensate for the phase distortion induced by atmospheric turbulence, wavefront sensorless adaptive optics (WSAO) has been proposed and studied in recent years. In this paper a new version of SPGD called MZ-SPGD is proposed, which combines the Z-SPGD based on the deformable mirror influence function and the M-SPGD based on the Zernike polynomials. Numerical simulations show that the hybrid method markedly decreases convergence times while achieving the same compensation effect as Z-SPGD and M-SPGD.
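
    SPGD itself, the common ancestor of the Z-, M-, and MZ- variants discussed here, is a model-free stochastic optimisation of a measured quality metric. The sketch below shows one generic two-sided SPGD iteration; the control vector could hold deformable-mirror voltages or Zernike coefficients. The gain, perturbation size, and the stand-in metric are illustrative, not the OAM mode-purity metric or parameters of the paper:

    ```python
    import numpy as np

    def spgd_step(u, metric, gain=0.3, delta=0.05, rng=None):
        """One two-sided SPGD iteration on the control vector u.

        metric(u) returns the measured performance (higher is better).  The gradient
        is estimated from two measurements with a random bipolar perturbation, so no
        wavefront sensor or system model is required.
        """
        rng = rng or np.random.default_rng()
        du = delta * rng.choice([-1.0, 1.0], size=u.shape)   # random +/- perturbation
        dj = metric(u + du) - metric(u - du)
        return u + gain * dj * du                            # stochastic gradient-ascent update

    # Toy usage: drive a control vector toward a target by maximising a stand-in metric
    target = np.array([0.5, -0.2, 0.1])
    metric = lambda u: -np.sum((u - target) ** 2)
    u = np.zeros(3)
    for _ in range(2000):
        u = spgd_step(u, metric)
    ```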

  17. International Symposium on Numerical Methods in Engineering, 5th, Ecole Polytechnique Federale de Lausanne, Switzerland, Sept. 11-15, 1989, Proceedings. Volumes 1 & 2

    NASA Astrophysics Data System (ADS)

    Gruber, Ralph; Periaux, Jaques; Shaw, Richard Paul

    Recent advances in computational mechanics are discussed in reviews and reports. Topics addressed include spectral superpositions on finite elements for shear banding problems, strain-based finite plasticity, numerical simulation of hypersonic viscous continuum flow, constitutive laws in solid mechanics, dynamics problems, fracture mechanics and damage tolerance, composite plates and shells, contact and friction, metal forming and solidification, coupling problems, and adaptive FEMs. Consideration is given to chemical flows, convection problems, free boundaries and artificial boundary conditions, domain-decomposition and multigrid methods, combustion and thermal analysis, wave propagation, mixed and hybrid FEMs, integral-equation methods, optimization, software engineering, and vector and parallel computing.

  18. GEOS. User Tutorials

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fu, Pengchen; Settgast, Randolph R.; Johnson, Scott M.

    2014-12-17

    GEOS is a massively parallel, multi-physics simulation application utilizing high performance computing (HPC) to address subsurface reservoir stimulation activities with the goal of optimizing current operations and evaluating innovative stimulation methods. GEOS enables coupling of different solvers associated with the various physical processes occurring during reservoir stimulation in unique and sophisticated ways, adapted to various geologic settings, materials and stimulation methods. Developed at the Lawrence Livermore National Laboratory (LLNL) as a part of a Laboratory-Directed Research and Development (LDRD) Strategic Initiative (SI) project, GEOS represents the culmination of a multi-year ongoing code development and improvement effort that has leveraged existing code capabilities and staff expertise to design new computational geosciences software.

  19. A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in (131)I SPECT.

    PubMed

    Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F

    2002-02-01

    This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
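
    The parallelisation strategy described (equal partitioning of photon histories plus independent random-number streams) is largely independent of the transport physics. A schematic mpi4py sketch of that outer structure, with a placeholder track_photon function standing in for the SIMIND physics and per-rank numpy generators standing in for SPRNG:

    ```python
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    n_photons_total = 1_000_000
    n_local = n_photons_total // size            # equal partition of photon histories
    rng = np.random.default_rng(seed=rank)       # stand-in for an SPRNG uncorrelated stream

    def track_photon(rng):
        """Placeholder for the photon transport physics; returns a detector bin or None."""
        return int(rng.integers(0, 64 * 64)) if rng.random() < 0.1 else None

    local_image = np.zeros(64 * 64)
    for _ in range(n_local):
        bin_index = track_photon(rng)
        if bin_index is not None:
            local_image[bin_index] += 1.0

    # Combine the partial projection images on rank 0
    image = np.zeros_like(local_image)
    comm.Reduce(local_image, image, op=MPI.SUM, root=0)
    ```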

  20. A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

    NASA Technical Reports Server (NTRS)

    Carroll, Chester C.; Owen, Jeffrey E.

    1988-01-01

    A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.

  1. Accelerating the Gillespie Exact Stochastic Simulation Algorithm using hybrid parallel execution on graphics processing units.

    PubMed

    Komarov, Ivan; D'Souza, Roshan M

    2012-01-01

    The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.
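
    For reference, the serial direct-method GSSA that the paper accelerates fits in a few lines; in the paper's scheme each GPU warp effectively runs this loop for one trajectory, with the propensity and state updates spread across the warp's threads. A minimal Python sketch for a toy reaction set (rates and stoichiometry are illustrative):

    ```python
    import numpy as np

    def gillespie(x0, rates, stoich, propensities, t_end, rng=None):
        """Direct-method Gillespie SSA for a single trajectory.

        x0: initial copy numbers, stoich: (n_reactions, n_species) state changes,
        propensities(x, rates): per-reaction firing rates for the current state.
        """
        rng = rng or np.random.default_rng()
        t, x, trajectory = 0.0, np.array(x0, dtype=float), [(0.0, list(x0))]
        while t < t_end:
            a = propensities(x, rates)
            a_total = a.sum()
            if a_total <= 0:
                break                                    # no reaction can fire
            t += rng.exponential(1.0 / a_total)          # time to the next reaction
            reaction = rng.choice(len(a), p=a / a_total) # which reaction fires
            x += stoich[reaction]
            trajectory.append((t, x.tolist()))
        return trajectory

    # Toy example: A -> B at rate k1*A, B -> A at rate k2*B
    stoich = np.array([[-1, 1], [1, -1]])
    props = lambda x, k: np.array([k[0] * x[0], k[1] * x[1]])
    run = gillespie([100, 0], rates=[1.0, 0.5], stoich=stoich,
                    propensities=props, t_end=5.0)
    ```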

  2. Traffic Simulations on Parallel Computers Using Domain Decomposition Techniques

    DOT National Transportation Integrated Search

    1995-01-01

    Large scale simulations of Intelligent Transportation Systems (ITS) can only be achieved by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic...

  3. GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations.

    PubMed

    Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

    2015-07-01

    GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310-323. doi: 10.1002/wcms.1220.
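
    Among the REMD variants listed, the operation common to all of them is the Metropolis exchange test between neighbouring replicas. A short sketch of the temperature-REMD acceptance criterion (units, temperatures and energies are arbitrary placeholders, and this is not a GENESIS API call):

    ```python
    import math
    import random

    def attempt_exchange(energy_i, temp_i, energy_j, temp_j, k_b=1.0):
        """Metropolis criterion for swapping configurations between two replicas."""
        beta_i, beta_j = 1.0 / (k_b * temp_i), 1.0 / (k_b * temp_j)
        delta = (beta_i - beta_j) * (energy_i - energy_j)
        return delta >= 0 or random.random() < math.exp(delta)

    # Sweep over neighbouring pairs of a hypothetical temperature ladder
    temps = [300.0, 310.4, 321.1, 332.2]
    energies = [-1052.0, -1047.5, -1041.2, -1036.8]
    for i in range(len(temps) - 1):
        if attempt_exchange(energies[i], temps[i], energies[i + 1], temps[i + 1]):
            energies[i], energies[i + 1] = energies[i + 1], energies[i]
    ```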

  4. On extending parallelism to serial simulators

    NASA Technical Reports Server (NTRS)

    Nicol, David; Heidelberger, Philip

    1994-01-01

    This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops submodels using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permit automatically synchronized communication between submodels; however, the automation requires that any such communication must take a nonzero amount of simulation time. Within this modeling paradigm, a variety of conservative synchronization protocols can transparently support conservative execution of submodels on potentially different processors. A specific implementation of this approach, U.P.S. (Utilitarian Parallel Simulator), is described, along with performance results on the Intel Paragon.

  5. Global Magnetohydrodynamic Simulation Using High Performance FORTRAN on Parallel Computers

    NASA Astrophysics Data System (ADS)

    Ogino, T.

    High Performance Fortran (HPF) is one of the modern and common techniques to achieve high-performance parallel computation. We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer; the MHD code was fully vectorized and fully parallelized in VPP Fortran. The entire performance and capability of the HPF MHD code could be shown to be almost comparable to that of VPP Fortran. A 3-dimensional global MHD simulation of the Earth's magnetosphere was performed at a speed of over 400 Gflops with an efficiency of 76.5% on the VPP5000/56 in vector and parallel computation, which permitted comparison with catalog values. We have concluded that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.

  6. A divide-conquer-recombine algorithmic paradigm for large spatiotemporal quantum molecular dynamics simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shimojo, Fuyuki; Hattori, Shinnosuke

    We introduce an extension of the divide-and-conquer (DC) algorithmic paradigm called divide-conquer-recombine (DCR) to perform large quantum molecular dynamics (QMD) simulations on massively parallel supercomputers, in which interatomic forces are computed quantum mechanically in the framework of density functional theory (DFT). In DCR, the DC phase constructs globally informed, overlapping local-domain solutions, which in the recombine phase are synthesized into a global solution encompassing large spatiotemporal scales. For the DC phase, we design a lean divide-and-conquer (LDC) DFT algorithm, which significantly reduces the prefactor of the O(N) computational cost for N electrons by applying a density-adaptive boundary condition at the peripheries of the DC domains. Our globally scalable and locally efficient solver is based on a hybrid real-reciprocal space approach that combines: (1) a highly scalable real-space multigrid to represent the global charge density; and (2) a numerically efficient plane-wave basis for local electronic wave functions and charge density within each domain. Hybrid space-band decomposition is used to implement the LDC-DFT algorithm on parallel computers. A benchmark test on an IBM Blue Gene/Q computer exhibits an isogranular parallel efficiency of 0.984 on 786 432 cores for a 50.3 × 10^6-atom SiC system. As a test of production runs, LDC-DFT-based QMD simulation involving 16 661 atoms is performed on the Blue Gene/Q to study on-demand production of hydrogen gas from water using LiAl alloy particles. As an example of the recombine phase, LDC-DFT electronic structures are used as a basis set to describe global photoexcitation dynamics with nonadiabatic QMD (NAQMD) and kinetic Monte Carlo (KMC) methods. The NAQMD simulations are based on the linear response time-dependent density functional theory to describe electronic excited states and a surface-hopping approach to describe transitions between the excited states. A series of techniques are employed for efficiently calculating the long-range exact exchange correction and excited-state forces. The NAQMD trajectories are analyzed to extract the rates of various excitonic processes, which are then used in KMC simulation to study the dynamics of the global exciton flow network. This has allowed the study of large-scale photoexcitation dynamics in 6400-atom amorphous molecular solid, reaching the experimental time scales.

  7. The cost of conservative synchronization in parallel discrete event simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1990-01-01

    The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approach the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.

  8. Evolutionary algorithm optimization of biological learning parameters in a biomimetic neuroprosthesis

    PubMed Central

    Dura-Bernal, S.; Neymotin, S. A.; Kerr, C. C.; Sivagnanam, S.; Majumdar, A.; Francis, J. T.; Lytton, W. W.

    2017-01-01

    Biomimetic simulation permits neuroscientists to better understand the complex neuronal dynamics of the brain. Embedding a biomimetic simulation in a closed-loop neuroprosthesis, which can read and write signals from the brain, will permit applications for amelioration of motor, psychiatric, and memory-related brain disorders. Biomimetic neuroprostheses require real-time adaptation to changes in the external environment, thus constituting an example of a dynamic data-driven application system. As model fidelity increases, so does the number of parameters and the complexity of finding appropriate parameter configurations. Instead of adapting synaptic weights via machine learning, we employed major biological learning methods: spike-timing dependent plasticity and reinforcement learning. We optimized the learning metaparameters using evolutionary algorithms, which were implemented in parallel and which used an island model approach to obtain sufficient speed. We employed these methods to train a cortical spiking model to utilize macaque brain activity, indicating a selected target, to drive a virtual musculoskeletal arm with realistic anatomical and biomechanical properties to reach to that target. The optimized system was able to reproduce macaque data from a comparable experimental motor task. These techniques can be used to efficiently tune the parameters of multiscale systems, linking realistic neuronal dynamics to behavior, and thus providing a useful tool for neuroscience and neuroprosthetics. PMID:29200477
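
    The evolutionary optimisation used here follows the island model: sub-populations evolve independently, which is what makes the search embarrassingly parallel, and periodically exchange their best individuals. A serial toy sketch of that structure (the fitness function, migration scheme and rates are placeholders, not the metaparameter fitness of the paper):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def evolve_island(pop, fitness, n_gen=10, sigma=0.1):
        """Evolve one island by Gaussian mutation and truncation selection."""
        for _ in range(n_gen):
            children = pop + rng.normal(0.0, sigma, pop.shape)
            merged = np.vstack([pop, children])
            order = np.argsort([fitness(ind) for ind in merged])[::-1]
            pop = merged[order[: len(pop)]]                     # keep the fittest
        return pop

    def island_model(fitness, n_islands=4, pop_size=20, n_params=5, epochs=10):
        """Islands evolve independently, then migrate their best member around a ring."""
        islands = [rng.normal(0, 1, (pop_size, n_params)) for _ in range(n_islands)]
        for _ in range(epochs):
            islands = [evolve_island(pop, fitness) for pop in islands]  # parallelisable step
            best = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):                           # ring migration
                pop[-1] = best[(i - 1) % n_islands]
        return max((ind for pop in islands for ind in pop), key=fitness)

    # Placeholder fitness: distance to a hypothetical optimal metaparameter vector
    target = np.array([0.5, -1.0, 0.2, 0.0, 1.5])
    best = island_model(lambda w: -np.sum((w - target) ** 2))
    ```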

  9. A hybrid interface tracking - level set technique for multiphase flow with soluble surfactant

    NASA Astrophysics Data System (ADS)

    Shin, Seungwon; Chergui, Jalel; Juric, Damir; Kahouadji, Lyes; Matar, Omar K.; Craster, Richard V.

    2018-04-01

    A formulation for soluble surfactant transport in multiphase flows recently presented by Muradoglu and Tryggvason (JCP 274 (2014) 737-757) [17] is adapted to the context of the Level Contour Reconstruction Method, LCRM, (Shin et al. IJNMF 60 (2009) 753-778, [8]) which is a hybrid method that combines the advantages of the Front-tracking and Level Set methods. Particularly close attention is paid to the formulation and numerical implementation of the surface gradients of surfactant concentration and surface tension. Various benchmark tests are performed to demonstrate the accuracy of different elements of the algorithm. To verify surfactant mass conservation, values for surfactant diffusion along the interface are compared with the exact solution for the problem of uniform expansion of a sphere. The numerical implementation of the discontinuous boundary condition for the source term in the bulk concentration is compared with the approximate solution. Surface tension forces are tested for Marangoni drop translation. Our numerical results for drop deformation in simple shear are compared with experiments and results from previous simulations. All benchmarking tests compare well with existing data thus providing confidence that the adapted LCRM formulation for surfactant advection and diffusion is accurate and effective in three-dimensional multiphase flows with a structured mesh. We also demonstrate that this approach applies easily to massively parallel simulations.

  10. A New Parallel Boundary Condition for Turbulence Simulations in Stellarators

    NASA Astrophysics Data System (ADS)

    Martin, Mike F.; Landreman, Matt; Dorland, William; Xanthopoulos, Pavlos

    2017-10-01

    For gyrokinetic simulations of core turbulence, the 'twist-and-shift' parallel boundary condition (Beer et al., PoP, 1995), which involves a shift in radial wavenumber proportional to the global shear and a quantization of the simulation domain's aspect ratio, is the standard choice. But as this condition was derived under the assumption of axisymmetry, 'twist-and-shift' as it stands is formally incorrect for turbulence simulations in stellarators. Moreover, for low-shear stellarators like W7X and HSX, the use of a global shear in the traditional boundary condition places an inflexible constraint on the aspect ratio of the domain, requiring more grid points to fully resolve its extent. Here, we present a parallel boundary condition for 'stellarator-symmetric' simulations that relies on the local shear along a field line. This boundary condition is similar to 'twist-and-shift', but has the added flexibility of choosing the parallel length of the domain based on local shear considerations in order to optimize parameters such as the aspect ratio of the simulation domain.

  11. Genomics of Parallel Ecological Speciation in Lake Victoria Cichlids.

    PubMed

    Meier, Joana Isabel; Marques, David Alexander; Wagner, Catherine Elise; Excoffier, Laurent; Seehausen, Ole

    2018-06-01

    The genetic basis of parallel evolution of similar species is of great interest in evolutionary biology. In the adaptive radiation of Lake Victoria cichlid fishes, sister species with either blue or red-back male nuptial coloration have evolved repeatedly, often associated with shallower and deeper water, respectively. One such case is blue and red-backed Pundamilia species, for which we recently showed that a young species pair may have evolved through "hybrid parallel speciation". Coalescent simulations suggested that the older species P. pundamilia (blue) and P. nyererei (red-back) admixed in the Mwanza Gulf and that new "nyererei-like" and "pundamilia-like" species evolved from the admixed population. Here, we use genome scans to study the genomic architecture of differentiation, and assess the influence of hybridization on the evolution of the younger species pair. For each of the two species pairs, we find over 300 genomic regions, widespread across the genome, which are highly differentiated. A subset of the most strongly differentiated regions of the older pair are also differentiated in the younger pair. These shared differentiated regions often show parallel allele frequency differences, consistent with the hypothesis that admixture-derived alleles were targeted by divergent selection in the hybrid population. However, two-thirds of the genomic regions that are highly differentiated between the younger species are not highly differentiated between the older species, suggesting independent evolutionary responses to selection pressures. Our analyses reveal how divergent selection on admixture-derived genetic variation can facilitate new speciation events.

  12. Parallelizing Timed Petri Net simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1993-01-01

    The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

  13. 3D streamers simulation in a pin to plane configuration using massively parallel computing

    NASA Astrophysics Data System (ADS)

    Plewa, J.-M.; Eichwald, O.; Ducasse, O.; Dessante, P.; Jacobs, C.; Renon, N.; Yousfi, M.

    2018-03-01

    This paper concerns the 3D simulation of corona discharge using high performance computing (HPC) managed with the message passing interface (MPI) library. In the field of finite volume methods applied on non-adaptive mesh grids and in the case of a specific 3D dynamic benchmark test devoted to streamer studies, the great efficiency of the iterative R&B SOR and BiCGSTAB methods versus the direct MUMPS method was clearly demonstrated in solving the Poisson equation using HPC resources. The optimization of the parallelization and the resulting scalability was undertaken as a function of the HPC architecture for a number of mesh cells ranging from 8 to 512 million and a number of cores ranging from 20 to 1600. The R&B SOR method remains at least about four times faster than the BiCGSTAB method and requires significantly less memory for all tested situations. The R&B SOR method was then implemented in a 3D MPI parallelized code that solves the classical first order model of an atmospheric pressure corona discharge in air. The 3D code capabilities were tested by following the development of one, two and four coplanar streamers generated by initial plasma spots for 6 ns. The preliminary results obtained allowed us to follow in detail the formation of the tree structure of a corona discharge and the effects of the mutual interactions between the streamers in terms of streamer velocity, trajectory and diameter. The computing time for 64 million mesh cells distributed over 1000 cores using the MPI procedures is about 30 min ns⁻¹, regardless of the number of streamers.
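
    The R&B (red-black) SOR solver favoured in this comparison parallelises well because, within each colour, every cell update is independent of the others. A serial 2-D numpy sketch of one red-black SOR sweep for the Poisson equation (the paper's solver is 3-D, MPI-distributed and embedded in the full discharge model; the relaxation factor here is a placeholder):

    ```python
    import numpy as np

    def rb_sor_sweep(phi, rhs, h, omega=1.8):
        """One red-black SOR sweep for lap(phi) = rhs on a 2-D grid (fixed boundaries).

        All cells of one colour depend only on cells of the other colour, so each
        half-sweep is a fully parallel update; this property makes R&B SOR
        attractive for domain-decomposed MPI implementations.
        """
        for colour in (0, 1):                           # 0: "red" cells, 1: "black" cells
            i, j = np.indices(phi.shape)
            mask = (i + j) % 2 == colour
            mask[0, :] = mask[-1, :] = mask[:, 0] = mask[:, -1] = False   # keep boundary values
            gauss_seidel = 0.25 * (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) +
                                   np.roll(phi, 1, 1) + np.roll(phi, -1, 1) - h**2 * rhs)
            phi[mask] = (1 - omega) * phi[mask] + omega * gauss_seidel[mask]
        return phi

    # Toy usage: uniform space charge on a 65x65 grid with phi = 0 on the boundary
    n = 65
    phi, rhs = np.zeros((n, n)), -np.ones((n, n))
    for _ in range(500):
        phi = rb_sor_sweep(phi, rhs, h=1.0 / (n - 1))
    ```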

  14. A software platform for continuum modeling of ion channels based on unstructured mesh

    NASA Astrophysics Data System (ADS)

    Tu, B.; Bai, S. Y.; Chen, M. X.; Xie, Y.; Zhang, L. B.; Lu, B. Z.

    2014-01-01

    Most traditional continuum molecular modeling has adopted finite difference or finite volume methods based on a structured mesh (grid). Unstructured meshes were only occasionally used, but a growing number of applications are emerging in molecular simulations. To facilitate the continuum modeling of biomolecular systems based on unstructured meshes, we are developing a software platform with tools that are particularly beneficial to those approaches. This work describes the software system specifically for the simulation of a typical, complex molecular procedure: ion transport through a three-dimensional channel system that consists of a protein and a membrane. The platform contains three parts: a meshing tool chain for ion channel systems, a parallel finite element solver for the Poisson-Nernst-Planck equations describing the electrodiffusion process of ion transport, and a visualization program for continuum molecular modeling. The meshing tool chain in the platform, which consists of a set of mesh generation tools, is able to generate high-quality surface and volume meshes for ion channel systems. The parallel finite element solver in our platform is based on the parallel adaptive finite element package PHG, which was developed by one of the authors [1]. As a featured component of the platform, a new visualization program, VCMM, has specifically been developed for continuum molecular modeling with an emphasis on providing useful facilities for unstructured mesh-based methods and for their output analysis and visualization. VCMM provides a graphical user interface and consists of three modules: a molecular module, a meshing module and a numerical module. A demonstration of the platform is provided with a study of two real proteins, the connexin 26 and hemolysin ion channels.

  15. Porting LAMMPS to GPUs.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, William Michael; Plimpton, Steven James; Wang, Peng

    2010-03-01

    LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for soft materials (biomolecules, polymers) and solid-state materials (metals, semiconductors) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial-decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.

  16. Research in parallel computing

    NASA Technical Reports Server (NTRS)

    Ortega, James M.; Henderson, Charles

    1994-01-01

    This report summarizes work on parallel computations for NASA Grant NAG-1-1529 for the period 1 Jan. - 30 June 1994. Short summaries on highly parallel preconditioners, target-specific parallel reductions, and simulation of delta-cache protocols are provided.

  17. Parallel computational fluid dynamics '91; Conference Proceedings, Stuttgart, Germany, Jun. 10-12, 1991

    NASA Technical Reports Server (NTRS)

    Reinsch, K. G. (Editor); Schmidt, W. (Editor); Ecer, A. (Editor); Haeuser, Jochem (Editor); Periaux, J. (Editor)

    1992-01-01

    A conference on parallel computational fluid dynamics was held and produced the related papers. Topics discussed in these papers include: parallel implicit and explicit solvers for compressible flow, parallel computational techniques for the Euler and Navier-Stokes equations, grid generation techniques for parallel computers, and aerodynamic simulation on massively parallel systems.

  18. Parallel implementation of the particle simulation method with dynamic load balancing: Toward realistic geodynamical simulation

    NASA Astrophysics Data System (ADS)

    Furuichi, M.; Nishiura, D.

    2015-12-01

    Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and the Discrete Element Method (DEM) have been widely used to solve continuum and particle motions in computational geodynamics. These mesh-free methods are well suited to problems with complex geometries and boundaries. In addition, their Lagrangian nature allows non-diffusive advection, which is useful for tracking history-dependent properties (e.g. rheology) of the material. These potential advantages over mesh-based methods make them effective for geophysical flow and tectonic processes such as tsunamis with free surfaces and floating bodies, magma intrusion with rock fracture, and shear-zone pattern formation in granular deformation. Investigating such geodynamical problems with particle-based methods realistically requires millions to billions of particles, so parallel computing is essential for handling the computational cost. An efficient parallel implementation of SPH and DEM is, however, known to be difficult, especially on distributed-memory architectures: Lagrangian methods inherently suffer from workload imbalance when parallelized over fixed spatial domains, because particles move around and workloads change during the simulation. Dynamic load balancing is therefore a key technique for large-scale SPH and DEM simulations. In this work, we present a parallel implementation technique for SPH and DEM that uses dynamic load balancing algorithms, aimed at high-resolution simulations over large domains on massively parallel supercomputer systems. Our method treats the imbalance in execution time among MPI processes as the nonlinear residual of the parallel domain decomposition and minimizes it with a Newton-like iteration. To perform flexible domain decomposition in space, the slice-grid algorithm is used. Numerical tests show that our approach handles particles with different calculation costs (e.g. boundary particles) as well as heterogeneous computer architectures. We analyze the parallel efficiency and scalability on supercomputer systems (K computer, Earth Simulator 3, etc.).
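
    The abstract describes the load-balancing scheme only at a high level (execution-time imbalance minimized by a Newton-like iteration over slice-grid boundaries). As a much-simplified stand-in, the sketch below recomputes one-dimensional slice boundaries so that each rank receives roughly equal total particle cost; it is illustrative only and not the authors' algorithm.

        import numpy as np

        def rebalance_slices(x, cost, n_ranks):
            """Place slice boundaries along x so each rank gets roughly equal total cost."""
            order = np.argsort(x)
            cum = np.cumsum(cost[order])
            # Boundary k sits where the cumulative cost reaches k/n_ranks of the total.
            targets = cum[-1] * np.arange(1, n_ranks) / n_ranks
            idx = np.searchsorted(cum, targets)
            return x[order][idx]          # interior slice boundaries (length n_ranks - 1)

        # toy usage: boundary particles (x < 0.05) are 3x more expensive than bulk particles
        rng = np.random.default_rng(0)
        x = rng.uniform(0.0, 1.0, 100_000)
        cost = np.where(x < 0.05, 3.0, 1.0)
        print(rebalance_slices(x, cost, n_ranks=4))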

  19. A tool for simulating parallel branch-and-bound methods

    NASA Astrophysics Data System (ADS)

    Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail

    2016-01-01

    The Branch-and-Bound (B&B) method is one of the most powerful but also most resource-consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in parallel B&B is the need for dynamic load redistribution, so the design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating the parallel Branch-and-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, and characteristics of the supercomputer's interconnect, thereby enabling in-depth study of load distribution strategies. The resolution of the optimization problem by the B&B method is replaced by a stochastic branching process, and data exchanges are modeled using the concept of logical time. The user-friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
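
    The key modeling idea, replacing the actual resolution of the optimization problem with a stochastic branching process, can be shown with a toy sketch (parameters and structure are illustrative, not those of the simulator described above):

        import random

        def simulated_bnb_work(p_branch=0.45, max_children=2, node_cap=1_000_000, seed=1):
            """Stand-in for B&B tree resolution: each processed subproblem spawns
            children with probability p_branch (subcritical, so the tree stays finite).
            The number of processed nodes represents the amount of B&B work."""
            rng = random.Random(seed)
            frontier = [0]                      # depths of open subproblems
            processed, max_depth = 0, 0
            while frontier and processed < node_cap:
                depth = frontier.pop()
                processed += 1
                max_depth = max(max_depth, depth)
                for _ in range(max_children):
                    if rng.random() < p_branch:
                        frontier.append(depth + 1)
            return processed, max_depth

        print(simulated_bnb_work())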

  20. Parallel replica dynamics with a heterogeneous distribution of barriers: Application to n-hexadecane pyrolysis

    NASA Astrophysics Data System (ADS)

    Kum, Oyeon; Dickson, Brad M.; Stuart, Steven J.; Uberuaga, Blas P.; Voter, Arthur F.

    2004-11-01

    Parallel replica dynamics simulation methods appropriate for the simulation of chemical reactions in molecular systems with many conformational degrees of freedom have been developed and applied to study the microsecond-scale pyrolysis of n-hexadecane in the temperature range of 2100-2500 K. The algorithm uses a transition detection scheme that is based on molecular topology, rather than energetic basins. This algorithm allows efficient parallelization of small systems even when using more processors than particles (in contrast to more traditional parallelization algorithms), and even when there are frequent conformational transitions (in contrast to previous implementations of the parallel replica algorithm). The parallel efficiency for pyrolysis initiation reactions was over 90% on 61 processors for this 50-atom system. The parallel replica dynamics technique results in reaction probabilities that are statistically indistinguishable from those obtained from direct molecular dynamics, under conditions where both are feasible, but allows simulations at temperatures as much as 1000 K lower than direct molecular dynamics simulations. The rate of initiation displayed Arrhenius behavior over the entire temperature range, with an activation energy and frequency factor of Ea=79.7 kcal/mol and log A/s-1=14.8, respectively, in reasonable agreement with experiment and empirical kinetic models. Several interesting unimolecular reaction mechanisms were observed in simulations of the chain propagation reactions above 2000 K, which are not included in most coarse-grained kinetic models. More studies are needed in order to determine whether these mechanisms are experimentally relevant, or specific to the potential energy surface used.
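
    As a reminder of how the reported activation energy and frequency factor follow from temperature-dependent rates, the sketch below fits an Arrhenius law to synthetic rate constants generated over the same temperature range, taking the reported Ea of roughly 80 kcal/mol and log A of roughly 14.8 as the assumed ground truth; the rate constants themselves are invented for illustration and are not the paper's data.

        import numpy as np

        R = 1.987204e-3                                            # kcal / (mol K)
        T = np.array([2100.0, 2200.0, 2300.0, 2400.0, 2500.0])     # K, range from the abstract

        # Synthetic rate constants from an assumed Arrhenius law (illustrative only),
        # with a little multiplicative noise added.
        Ea_true, logA_true = 80.0, 14.8
        rng = np.random.default_rng(0)
        k = 10.0 ** logA_true * np.exp(-Ea_true / (R * T)) * np.exp(rng.normal(scale=0.05, size=T.size))

        # Arrhenius fit: ln k = ln A - Ea / (R T), i.e. a straight line in 1/T.
        slope, intercept = np.polyfit(1.0 / T, np.log(k), 1)
        print(f"Ea   = {-slope * R:.1f} kcal/mol")
        print(f"logA = {intercept / np.log(10.0):.2f}  (A in 1/s)")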

  1. Parallel, Asynchronous Executive (PAX): System concepts, facilities, and architecture

    NASA Technical Reports Server (NTRS)

    Jones, W. H.

    1983-01-01

    The Parallel, Asynchronous Executive (PAX) is a software operating system simulation that allows many computers to work on a single problem at the same time. PAX is currently implemented on a UNIVAC 1100/42 computer system. Independent UNIVAC runstreams are used to simulate independent computers. Data are shared among independent UNIVAC runstreams through shared mass-storage files. PAX has achieved the following: (1) applied several computing processes simultaneously to a single, logically unified problem; (2) resolved most parallel processor conflicts by careful work assignment; (3) resolved by means of worker requests to PAX all conflicts not resolved by work assignment; (4) provided fault isolation and recovery mechanisms to meet the problems of an actual parallel, asynchronous processing machine. Additionally, one real-life problem has been constructed for the PAX environment. This is CASPER, a collection of aerodynamic and structural dynamic problem simulation routines. CASPER is not discussed in this report except to provide examples of parallel-processing techniques.

  2. A Metascalable Computing Framework for Large Spatiotemporal-Scale Atomistic Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nomura, K; Seymour, R; Wang, W

    2009-02-17

    A metascalable (or 'design once, scale on new architectures') parallel computing framework has been developed for large spatiotemporal-scale atomistic simulations of materials based on spatiotemporal data locality principles, which is expected to scale on emerging multipetaflops architectures. The framework consists of: (1) an embedded divide-and-conquer (EDC) algorithmic framework based on spatial locality to design linear-scaling algorithms for high complexity problems; (2) a space-time-ensemble parallel (STEP) approach based on temporal locality to predict long-time dynamics, while introducing multiple parallelization axes; and (3) a tunable hierarchical cellular decomposition (HCD) parallelization framework to map these O(N) algorithms onto a multicore cluster based on a hybrid implementation combining message passing and critical section-free multithreading. The EDC-STEP-HCD framework exposes maximal concurrency and data locality, thereby achieving: (1) inter-node parallel efficiency well over 0.95 for 218 billion-atom molecular-dynamics and 1.68 trillion electronic-degrees-of-freedom quantum-mechanical simulations on 212,992 IBM BlueGene/L processors (superscalability); (2) high intra-node, multithreading parallel efficiency (nanoscalability); and (3) nearly perfect time/ensemble parallel efficiency (eon-scalability). The spatiotemporal scale covered by MD simulation on a sustained petaflops computer per day (i.e. petaflops·day of computing) is estimated as NT = 2.14 (e.g. N = 2.14 million atoms for T = 1 microsecond).

  3. Finite Element Methods for real-time Haptic Feedback of Soft-Tissue Models in Virtual Reality Simulators

    NASA Technical Reports Server (NTRS)

    Frank, Andreas O.; Twombly, I. Alexander; Barth, Timothy J.; Smith, Jeffrey D.; Dalton, Bonnie P. (Technical Monitor)

    2001-01-01

    We have applied the linear elastic finite element method to compute haptic force feedback and domain deformations of soft tissue models for use in virtual reality simulators. Our results show that, for virtual object models of high-resolution 3D data (>10,000 nodes), haptic real time computations (>500 Hz) are not currently possible using traditional methods. Current research efforts are focused in the following areas: 1) efficient implementation of fully adaptive multi-resolution methods and 2) multi-resolution methods with specialized basis functions to capture the singularity at the haptic interface (point loading). To achieve real time computations, we propose parallel processing of a Jacobi preconditioned conjugate gradient method applied to a reduced system of equations resulting from surface domain decomposition. This can effectively be achieved using reconfigurable computing systems such as field programmable gate arrays (FPGA), thereby providing a flexible solution that allows for new FPGA implementations as improved algorithms become available. The resulting soft tissue simulation system would meet NASA Virtual Glovebox requirements and, at the same time, provide a generalized simulation engine for any immersive environment application, such as biomedical/surgical procedures or interactive scientific applications.
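
    The proposed solver is a Jacobi-preconditioned conjugate gradient method applied to the reduced surface system. As a minimal illustration of that numerical building block (not of the FPGA implementation), the following sketch applies Jacobi-preconditioned CG to a toy symmetric positive-definite system; sizes and values are illustrative.

        import numpy as np

        def jacobi_pcg(A, b, tol=1e-10, max_iters=500):
            """Conjugate gradients with a Jacobi (diagonal) preconditioner for SPD A."""
            x = np.zeros_like(b)
            r = b - A @ x
            Minv = 1.0 / np.diag(A)          # Jacobi preconditioner: M = diag(A)
            z = Minv * r
            p = z.copy()
            rz = r @ z
            for _ in range(max_iters):
                Ap = A @ p
                alpha = rz / (p @ Ap)
                x += alpha * p
                r -= alpha * Ap
                if np.linalg.norm(r) < tol * np.linalg.norm(b):
                    break
                z = Minv * r
                rz_new = r @ z
                p = z + (rz_new / rz) * p
                rz = rz_new
            return x

        # toy SPD system standing in for the reduced stiffness system
        n = 200
        A = (np.diag(4.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1))
        b = np.ones(n)
        x = jacobi_pcg(A, b)
        print("residual norm:", np.linalg.norm(A @ x - b))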

  4. Shock interaction behind a pair of cylindrical obstacles

    NASA Astrophysics Data System (ADS)

    Liu, Heng; Mazumdar, Raoul; Eliasson, Veronica

    2014-11-01

    This work focuses on two-dimensional numerical simulations of shock interaction with a pair of cylindrical obstacles, varying the obstacle separation and incident shock strength. With the shock waves propagating parallel to the center-line between the two cylindrical obstacles, the simulated shock strengths range from Mach 1.4 to Mach 2.4 over a wide range of ratios of obstacle separation distance to obstacle diameter. These cases are simulated with the software package Overture, which solves the inviscid Euler equations of gas dynamics on overlapping grids with adaptive mesh refinement. The goal is to find a so-called ``safe'' region behind the obstacles, for given obstacle spacing and shock Mach number, in which the downstream pressure is reduced. The benefits apply to both building and armor design for shock wave mitigation to keep humans and equipment safe. The simulation results confirm that the length of the ``safe'' region and the degree of shock wave attenuation depend on the ratio of obstacle separation distance to obstacle diameter. The influence of the Mach number is also discussed.

  5. PRATHAM: Parallel Thermal Hydraulics Simulations using Advanced Mesoscopic Methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joshi, Abhijit S; Jain, Prashant K; Mudrich, Jaime A

    2012-01-01

    At the Oak Ridge National Laboratory, efforts are under way to develop a 3D, parallel LBM code called PRATHAM (PaRAllel Thermal Hydraulic simulations using Advanced Mesoscopic Methods) to demonstrate the accuracy and scalability of LBM for turbulent flow simulations in nuclear applications. The code has been developed in FORTRAN-90 and parallelized using the Message Passing Interface (MPI) library. The Silo library is used to compact and write the data files, and the VisIt visualization software is used to post-process the simulation data in parallel. Both the single relaxation time (SRT) and multi relaxation time (MRT) LBM schemes have been implemented in PRATHAM. To capture turbulence without prohibitively increasing the grid resolution requirements, an LES approach [5] is adopted, allowing large-scale eddies to be numerically resolved while modeling the smaller (subgrid) eddies. In this work, a Smagorinsky model has been used, which augments the fluid viscosity by an eddy viscosity that depends on the magnitude of the rate-of-strain tensor. In LBM, this is achieved by locally varying the relaxation time of the fluid.
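
    A minimal sketch of that last step, assuming the standard lattice-unit BGK relation nu = c_s^2 (tau - 1/2) with c_s^2 = 1/3 (constants and names are illustrative and not taken from PRATHAM):

        def local_relaxation_time(nu0, strain_rate_mag, cs_smag=0.1, delta=1.0):
            """Local LBM relaxation time with a Smagorinsky eddy viscosity.
            nu0: molecular viscosity in lattice units;
            strain_rate_mag: |S| at the node (assumed already available here)."""
            nu_t = (cs_smag * delta) ** 2 * strain_rate_mag     # Smagorinsky eddy viscosity
            nu_total = nu0 + nu_t
            # BGK lattice units: nu = c_s^2 (tau - 1/2), c_s^2 = 1/3  =>  tau = 3 nu + 1/2
            return 3.0 * nu_total + 0.5

        print(local_relaxation_time(nu0=0.01, strain_rate_mag=0.05))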

  6. Dust Dynamics in Protoplanetary Disks: Parallel Computing with PVM

    NASA Astrophysics Data System (ADS)

    de La Fuente Marcos, Carlos; Barge, Pierre; de La Fuente Marcos, Raúl

    2002-03-01

    We describe a parallel version of our high-order-accuracy particle-mesh code for the simulation of collisionless protoplanetary disks. We use this code to carry out a massively parallel, two-dimensional, time-dependent, numerical simulation, which includes dust particles, to study the potential role of large-scale, gaseous vortices in protoplanetary disks. This noncollisional problem is easy to parallelize on message-passing multicomputer architectures. We performed the simulations on a cache-coherent nonuniform memory access Origin 2000 machine, using both the parallel virtual machine (PVM) and message-passing interface (MPI) message-passing libraries. Our performance analysis suggests that, for our problem, PVM is about 25% faster than MPI. Using PVM and MPI made it possible to reduce CPU time and increase code performance. This allows for simulations with a large number of particles (N ~ 10^5-10^6) in reasonable CPU times. The performance of our implementation of the parallel code on an Origin 2000 supercomputer is presented and discussed. It exhibits very good speedup behavior and low load imbalance. Our results confirm that giant gaseous vortices can play a dominant role in giant planet formation.

  7. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bylaska, Eric J., E-mail: Eric.Bylaska@pnnl.gov; Weare, Jonathan Q., E-mail: weare@uchicago.edu; Weare, John H., E-mail: jweare@ucsd.edu

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., the Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i - f(x_{i-1})]_{i=1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
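
    The root-finding formulation can be made concrete with a small serial sketch: the propagator f is a velocity-Verlet step for a harmonic oscillator, the unknowns are the whole trajectory, and the Jacobi-style sweep (every entry refreshed from the previous iterate) stands in for the processor-per-time-step parallelization. The paper itself uses quasi-Newton solvers and real MD/AIMD propagators; everything below is an illustrative assumption.

        import numpy as np

        def verlet_step(x, dt=0.05, k=1.0):
            """One velocity-Verlet step f for a unit-mass harmonic oscillator (force = -k q)."""
            q, v = x
            a = -k * q
            q_new = q + dt * v + 0.5 * dt * dt * a
            a_new = -k * q_new
            v_new = v + 0.5 * dt * (a + a_new)
            return np.array([q_new, v_new])

        M = 40                      # number of time steps (one per notional processor)
        x0 = np.array([1.0, 0.0])   # initial condition

        # Unknowns X = (x_1, ..., x_M); residual F(X)_i = x_i - f(x_{i-1}).
        X = np.tile(x0, (M, 1))     # crude initial guess: every entry set to x0
        for sweep in range(M + 1):
            prev = np.vstack([x0, X[:-1]])                      # x_{i-1} for each i
            F = X - np.array([verlet_step(p) for p in prev])
            if np.max(np.abs(F)) < 1e-12:
                break
            # Jacobi-style update: every entry refreshed from the previous iterate,
            # so in a real implementation each i can be handled by its own processor.
            X = np.array([verlet_step(p) for p in prev])

        print(f"converged after {sweep} sweeps, final state {X[-1]}")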

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Collins, William D; Johansen, Hans; Evans, Katherine J

    We present a survey of physical and computational techniques that have the potential to contribute to the next generation of high-fidelity, multi-scale climate simulations. Examples of the climate science problems that can be investigated with more depth include the capture of remote forcings of localized hydrological extreme events, an accurate representation of cloud features over a range of spatial and temporal scales, and parallel, large ensembles of simulations to more effectively explore model sensitivities and uncertainties. Numerical techniques, such as adaptive mesh refinement, implicit time integration, and separate treatment of fast physical time scales are enabling improved accuracy and fidelity in simulation of dynamics and allow more complete representations of climate features at the global scale. At the same time, partnerships with computer science teams have focused on taking advantage of evolving computer architectures, such as many-core processors and GPUs, so that these approaches which were previously considered prohibitively costly have become both more efficient and scalable. In combination, progress in these three critical areas is poised to transform climate modeling in the coming decades.

  9. Molecular dynamics study of mechanical properties of carbon nanotube reinforced aluminum composites

    NASA Astrophysics Data System (ADS)

    Srivastava, Ashish Kumar; Mokhalingam, A.; Singh, Akhileshwar; Kumar, Dinesh

    2016-05-01

    Atomistic simulations were conducted to estimate the effect of carbon nanotube (CNT) reinforcement on the mechanical behavior of a CNT-reinforced aluminum (Al) nanocomposite. The periodic system of the CNT-Al nanocomposite was built and simulated using the molecular dynamics (MD) software LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator). The mechanical properties of the nanocomposite were investigated by applying a uniaxial load on one end of the representative volume element (RVE) and fixing the other end. The interactions between the Al atoms were modeled using embedded atom method (EAM) potentials, whereas the Adaptive Intermolecular Reactive Empirical Bond Order (AIREBO) potential was used for the interactions among carbon atoms; these potentials are coupled through the Lennard-Jones (LJ) potential. The results show that the incorporation of CNT into the Al matrix can increase the Young's modulus of the nanocomposite substantially. In the present case, approximately 9 wt.% CNT reinforcement increases the axial Young's modulus of the Al matrix by up to 77% compared to pure Al.

  10. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.

    1991-01-01

    A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification-all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  11. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g. the Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i - f(x_{i-1})]_{i=1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of optimization techniques, including quasi-Newton and preconditioned quasi-Newton optimization schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow TCP/IP networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. By using these algorithms we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 seconds per time step to 6.9 seconds per time step.

  12. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations.

    PubMed

    Bylaska, Eric J; Weare, Jonathan Q; Weare, John H

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i - f(x_{i-1})]_{i=1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.

  13. Computer-intensive simulation of solid-state NMR experiments using SIMPSON.

    PubMed

    Tošner, Zdeněk; Andersen, Rasmus; Stevensson, Baltzar; Edén, Mattias; Nielsen, Niels Chr; Vosegaard, Thomas

    2014-09-01

    Conducting large-scale solid-state NMR simulations requires fast computer software, potentially in combination with efficient computational resources, to complete within a reasonable time frame. Such simulations may involve large spin systems, multiple-parameter fitting of experimental spectra, or multiple-pulse experiment design using parameter scan, non-linear optimization, or optimal control procedures. To efficiently accommodate such simulations, we here present an improved version of the widely distributed open-source SIMPSON NMR simulation software package adapted to contemporary high performance hardware setups. The software is optimized for fast performance on standard stand-alone computers, multi-core processors, and large clusters of identical nodes. We describe the novel features for fast computation including internal matrix manipulations, propagator setups and acquisition strategies. For efficient calculation of powder averages, we implemented the interpolation method of Alderman, Solum, and Grant, as well as the recently introduced fast Wigner transform interpolation technique. The potential of the optimal control toolbox is greatly enhanced by higher-precision gradients in combination with the efficient optimization algorithm known as limited-memory Broyden-Fletcher-Goldfarb-Shanno. In addition, advanced parallelization can be used in all types of calculations, providing significant time reductions. SIMPSON thus reflects current knowledge in the field of numerical simulations of solid-state NMR experiments. The efficiency and novel features are demonstrated on representative simulations. Copyright © 2014 Elsevier Inc. All rights reserved.

  14. Adaptive Core Simulation Employing Discrete Inverse Theory - Part I: Theory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abdel-Khalik, Hany S.; Turinsky, Paul J.

    2005-07-15

    Use of adaptive simulation is intended to improve the fidelity and robustness of important core attribute predictions such as core power distribution, thermal margins, and core reactivity. Adaptive simulation utilizes a selected set of past and current reactor measurements of reactor observables, i.e., in-core instrumentation readings, to adapt the simulation in a meaningful way. A meaningful adaption will result in high-fidelity and robust adapted core simulator models. To perform adaption, we propose an inverse theory approach in which the multitudes of input data to core simulators, i.e., reactor physics and thermal-hydraulic data, are to be adjusted to improve agreement with measured observables while keeping core simulator models unadapted. At first glance, devising such adaption for typical core simulators with millions of input and observables data would spawn not only several prohibitive challenges but also numerous disparaging concerns. The challenges include the computational burdens of the sensitivity-type calculations required to construct Jacobian operators for the core simulator models. Also, the computational burdens of the uncertainty-type calculations required to estimate the uncertainty information of core simulator input data present a demanding challenge. The concerns however are mainly related to the reliability of the adjusted input data. The methodologies of adaptive simulation are well established in the literature of data adjustment. We adopt the same general framework for data adjustment; however, we refrain from solving the fundamental adjustment equations in a conventional manner. We demonstrate the use of our so-called Efficient Subspace Methods (ESMs) to overcome the computational and storage burdens associated with the core adaption problem. We illustrate the successful use of ESM-based adaptive techniques for a typical boiling water reactor core simulator adaption problem.
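
    The abstract summarizes the adjustment only conceptually. A dense toy version of the underlying generalized least-squares adjustment equation (prior input data pulled toward agreement with measurements through the Jacobian) is sketched below; the paper's Efficient Subspace Methods exist precisely to avoid forming these operators explicitly, so this is illustrative only and all names and values are assumptions.

        import numpy as np

        def adjust_inputs(a0, C_a, S, y0, m, R):
            """Generalized least-squares data adjustment.
            a0 : prior input data, C_a : their covariance,
            S  : Jacobian dy/da of observables w.r.t. inputs,
            y0 : observables computed with a0, m : measurements, R : measurement covariance.
            Returns adjusted inputs that pull y toward m while respecting C_a."""
            K = C_a @ S.T @ np.linalg.inv(S @ C_a @ S.T + R)   # gain matrix
            return a0 + K @ (m - y0)

        # toy problem: 3 inputs, 2 observables, linear model y = S a
        rng = np.random.default_rng(0)
        S = rng.normal(size=(2, 3))
        a_true = np.array([1.0, -0.5, 2.0])
        a0 = a_true + rng.normal(scale=0.3, size=3)
        C_a = 0.3 ** 2 * np.eye(3)
        R = 0.01 ** 2 * np.eye(2)
        m = S @ a_true + rng.normal(scale=0.01, size=2)
        print(adjust_inputs(a0, C_a, S, S @ a0, m, R))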

  15. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2014-10-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is achieved by repeating the steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.

  16. A hybrid parallel architecture for electrostatic interactions in the simulation of dissipative particle dynamics

    NASA Astrophysics Data System (ADS)

    Yang, Sheng-Chun; Lu, Zhong-Yuan; Qian, Hu-Jun; Wang, Yong-Lei; Han, Jie-Ping

    2017-11-01

    In this work, we upgraded the electrostatic interaction method of CU-ENUF (Yang et al., 2016), which first applied CUNFFT (nonequispaced fast Fourier transforms based on CUDA) to the reciprocal-space electrostatic computation so that the electrostatic interaction is computed entirely on the GPU. The upgraded edition of CU-ENUF runs in a hybrid parallel fashion: the computation is first parallelized across multiple computer nodes and then further on the GPU installed in each node. With this strategy, the size of the simulation system is no longer restricted by the throughput of a single CPU or GPU. The most critical technical problem is how to parallelize CUNFFT within this strategy, which is addressed through careful analysis of its basic principles and several algorithmic techniques. Furthermore, the upgraded method is capable of computing electrostatic interactions for both atomistic molecular dynamics (MD) and dissipative particle dynamics (DPD). Finally, the benchmarks conducted for validation and performance indicate that the upgraded method not only achieves good precision with suitable parameters but also provides an efficient way to compute electrostatic interactions for very large simulation systems.
    Program Files doi: http://dx.doi.org/10.17632/zncf24fhpv.1
    Licensing provisions: GNU General Public License 3 (GPL)
    Programming language: C, C++, and CUDA C
    Supplementary material: The program is designed for efficient electrostatic interactions in large-scale simulation systems and runs on computers equipped with NVIDIA GPUs. It has been tested on (a) a single computer node with an Intel(R) Core(TM) i7-3770 @ 3.40 GHz CPU and a GTX 980 Ti GPU, and (b) MPI-parallel computer nodes with the same configuration.
    Nature of problem: For molecular dynamics simulations, the electrostatic interaction is the most time-consuming computation because of its long-range character and slow convergence, and it typically accounts for most of the total simulation time. Although the GPU-based parallel method CU-ENUF (Yang et al., 2016) achieved a qualitative leap over previous methods for computing electrostatic interactions, its capability is limited by the throughput of a single GPU for super-scale simulation systems. An effective method is therefore needed to handle the calculation of electrostatic interactions efficiently for simulation systems of super-scale size.
    Solution method: We constructed a hybrid parallel architecture in which CPUs and GPUs are combined to accelerate the electrostatic computation effectively. First, the simulation system is divided into many subtasks via a domain-decomposition method. MPI (Message Passing Interface) is then used to implement the CPU-level parallelism, with each computer node handling a particular subtask, and each node's subtask is in turn executed efficiently in parallel on its GPU. In this hybrid parallel method, the most critical technical problem is how to parallelize CUNFFT (nonequispaced fast Fourier transform based on CUDA) within the parallel strategy, which is addressed through careful analysis of its basic principles and several algorithmic techniques.
    Restrictions: HP-ENUF is mainly oriented to super-scale system simulations, where its performance advantage is most apparent. However, for a small simulation system containing fewer than 10^6 particles, the multi-node mode offers no clear efficiency advantage over the single-node mode, and can even be slower because of network latency among the computer nodes.
    References: (1) S.-C. Yang, H.-J. Qian, Z.-Y. Lu, Appl. Comput. Harmon. Anal. 2016, http://dx.doi.org/10.1016/j.acha.2016.04.009. (2) S.-C. Yang, Y.-L. Wang, G.-S. Jiao, H.-J. Qian, Z.-Y. Lu, J. Comput. Chem. 37 (2016) 378. (3) S.-C. Yang, Y.-L. Zhu, H.-J. Qian, Z.-Y. Lu, Appl. Chem. Res. Chin. Univ., 2017, http://dx.doi.org/10.1007/s40242-016-6354-5. (4) Y.-L. Zhu, H. Liu, Z.-W. Li, H.-J. Qian, G. Milano, Z.-Y. Lu, J. Comput. Chem. 34 (2013) 2197.

  17. GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations

    PubMed Central

    Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

    2015-01-01

    GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310–323. doi: 10.1002/wcms.1220 PMID:26753008

  18. Mechanisms for Rapid Adaptive Control of Motion Processing in Macaque Visual Cortex.

    PubMed

    McLelland, Douglas; Baker, Pamela M; Ahmed, Bashir; Kohn, Adam; Bair, Wyeth

    2015-07-15

    A key feature of neural networks is their ability to rapidly adjust their function, including signal gain and temporal dynamics, in response to changes in sensory inputs. These adjustments are thought to be important for optimizing the sensitivity of the system, yet their mechanisms remain poorly understood. We studied adaptive changes in temporal integration in direction-selective cells in macaque primary visual cortex, where specific hypotheses have been proposed to account for rapid adaptation. By independently stimulating direction-specific channels, we found that the control of temporal integration of motion at one direction was independent of motion signals driven at the orthogonal direction. We also found that individual neurons can simultaneously support two different profiles of temporal integration for motion in orthogonal directions. These findings rule out a broad range of adaptive mechanisms as being key to the control of temporal integration, including untuned normalization and nonlinearities of spike generation and somatic adaptation in the recorded direction-selective cells. Such mechanisms are too broadly tuned, or occur too far downstream, to explain the channel-specific and multiplexed temporal integration that we observe in single neurons. Instead, we are compelled to conclude that parallel processing pathways are involved, and we demonstrate one such circuit using a computer model. This solution allows processing in different direction/orientation channels to be separately optimized and is sensible given that, under typical motion conditions (e.g., translation or looming), speed on the retina is a function of the orientation of image components. Many neurons in visual cortex are understood in terms of their spatial and temporal receptive fields. It is now known that the spatiotemporal integration underlying visual responses is not fixed but depends on the visual input. For example, neurons that respond selectively to motion direction integrate signals over a shorter time window when visual motion is fast and a longer window when motion is slow. We investigated the mechanisms underlying this useful adaptation by recording from neurons as they responded to stimuli moving in two different directions at different speeds. Computer simulations of our results enabled us to rule out several candidate theories in favor of a model that integrates across multiple parallel channels that operate at different time scales. Copyright © 2015 the authors 0270-6474/15/3510268-13$15.00/0.

  19. Detached eddy simulation for turbulent fluid-structure interaction of moving bodies using the constraint-based immersed boundary method

    NASA Astrophysics Data System (ADS)

    Nangia, Nishant; Bhalla, Amneet P. S.; Griffith, Boyce E.; Patankar, Neelesh A.

    2016-11-01

    Flows over bodies of industrial importance often contain both an attached boundary layer region near the structure and a region of massively separated flow near its trailing edge. When simulating these flows with turbulence modeling, the Reynolds-averaged Navier-Stokes (RANS) approach is more efficient in the former, whereas large-eddy simulation (LES) is more accurate in the latter. Detached-eddy simulation (DES), based on the Spalart-Allmaras model, is a hybrid method that switches from RANS mode of solution in attached boundary layers to LES in detached flow regions. Simulations of turbulent flows over moving structures on a body-fitted mesh incur an enormous remeshing cost every time step. The constraint-based immersed boundary (cIB) method eliminates this operation by placing the structure on a Cartesian mesh and enforcing a rigidity constraint as an additional forcing in the Navier-Stokes momentum equation. We outline the formulation and development of a parallel DES-cIB method using adaptive mesh refinement. We show preliminary validation results for flows past stationary bodies with both attached and separated boundary layers along with results for turbulent flows past moving bodies. This work is supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1324585.

  20. Application of integration algorithms in a parallel processing environment for the simulation of jet engines

    NASA Technical Reports Server (NTRS)

    Krosel, S. M.; Milner, E. J.

    1982-01-01

    The application of predictor-corrector integration algorithms developed for a digital parallel processing environment is investigated. The algorithms are implemented and evaluated using a software simulator that provides an approximate representation of the parallel processing hardware. Test cases that focus on the use of the algorithms are presented, and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real-time performance, interprocessor communication, and algorithm startup are also discussed.
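
    As a serial illustration of the class of integrator studied (the report does not specify the exact scheme, so the second-order Adams-Bashforth predictor with a trapezoidal corrector below is an assumption, and the small linear system stands in for the turbofan model):

        import numpy as np

        def pc_ab2_am2(f, y0, t0, t1, n_steps):
            """Second-order predictor-corrector: Adams-Bashforth-2 predictor,
            trapezoidal (Adams-Moulton-2) corrector, for y' = f(t, y)."""
            h = (t1 - t0) / n_steps
            t = t0
            y = np.asarray(y0, dtype=float)
            f_prev = f(t, y)
            # bootstrap the multistep method with one explicit Euler step
            y = y + h * f_prev
            t += h
            for _ in range(n_steps - 1):
                f_curr = f(t, y)
                y_pred = y + h * (1.5 * f_curr - 0.5 * f_prev)   # predictor (AB2)
                y = y + 0.5 * h * (f_curr + f(t + h, y_pred))    # corrector (trapezoidal)
                f_prev = f_curr
                t += h
            return y

        # toy linear model standing in for the linear turbofan engine model
        A = np.array([[-2.0, 1.0], [0.0, -1.0]])
        y_final = pc_ab2_am2(lambda t, y: A @ y, y0=[1.0, 1.0], t0=0.0, t1=1.0, n_steps=100)
        print(y_final)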

  1. Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

    PubMed

    Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

    2010-10-01

    Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel Core(TM) 2 Quad Q6600 CPU and a GeForce 8800GT GPU, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus one CPU core, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
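
    The abstract does not detail the load-prediction dynamic scheduling algorithm; a generic sketch of the idea, estimating each device's throughput from the previous time step and splitting the next step's work proportionally, might look like the following (all names and numbers are illustrative):

        def split_work(n_units, measured_times, shares):
            """Simplified load-prediction dynamic scheduling.
            measured_times[d] is the wall time device d took for its last chunk,
            shares[d] is the fraction of work it was given. Throughput is estimated
            from the last step and the next step's work is split proportionally."""
            throughput = {d: shares[d] * n_units / measured_times[d] for d in measured_times}
            total = sum(throughput.values())
            return {d: round(n_units * thr / total) for d, thr in throughput.items()}

        # toy usage: the GPU processed its 50% share 5x faster than the CPU last step
        print(split_work(1600, measured_times={"cpu": 10.0, "gpu": 2.0}, shares={"cpu": 0.5, "gpu": 0.5}))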

  2. Three dimensional modelling of earthquake rupture cycles on frictional faults

    NASA Astrophysics Data System (ADS)

    Simpson, Guy; May, Dave

    2017-04-01

    We are developing an efficient MPI-parallel numerical method to simulate earthquake sequences on preexisting faults embedded within a three-dimensional viscoelastic half-space. We solve the velocity form of the elasto(visco)dynamic equations using a continuous Galerkin Finite Element Method on an unstructured pentahedral mesh, which thus permits local spatial refinement in the vicinity of the fault. Frictional sliding is coupled to the viscoelastic solid via rate- and state-dependent friction laws using the split-node technique. Our coupled formulation employs a Picard-type non-linear solver with a fully implicit, first-order accurate time integrator and an adaptive time step that efficiently evolves the system through multiple seismic cycles. The implementation leverages advanced parallel solvers, preconditioners and linear algebra from the Portable Extensible Toolkit for Scientific Computing (PETSc) library. The model can treat heterogeneous frictional properties and stress states on the fault and surrounding solid as well as non-planar fault geometries. Preliminary tests show that the model successfully reproduces dynamic rupture on a vertical strike-slip fault in a half-space governed by rate-state friction with the ageing law.

  3. Response of the Antarctic ice sheet to ocean forcing using the POPSICLES coupled ice sheet-ocean model

    NASA Astrophysics Data System (ADS)

    Martin, D. F.; Asay-Davis, X.; Price, S. F.; Cornford, S. L.; Maltrud, M. E.; Ng, E. G.; Collins, W.

    2014-12-01

    We present the response of the continental Antarctic ice sheet to sub-shelf-melt forcing derived from POPSICLES simulation results covering the full Antarctic Ice Sheet and the Southern Ocean spanning the period 1990 to 2010. Simulations are performed at 0.1 degree (~5 km) ocean resolution and ice sheet resolution as fine as 500 m using adaptive mesh refinement. A comparison of fully-coupled and comparable standalone ice-sheet model results demonstrates the importance of two-way coupling between the ice sheet and the ocean. The POPSICLES model couples the POP2x ocean model, a modified version of the Parallel Ocean Program (Smith and Gent, 2002), and the BISICLES ice-sheet model (Cornford et al., 2012). BISICLES makes use of adaptive mesh refinement to fully resolve dynamically-important regions like grounding lines and employs a momentum balance similar to the vertically-integrated formulation of Schoof and Hindmarsh (2009). Results of BISICLES simulations have compared favorably to comparable simulations with a Stokes momentum balance in both idealized tests like MISMIP3D (Pattyn et al., 2013) and realistic configurations (Favier et al. 2014). POP2x includes sub-ice-shelf circulation using partial top cells (Losch, 2008) and boundary layer physics following Holland and Jenkins (1999), Jenkins (2001), and Jenkins et al. (2010). Standalone POP2x output compares well with standard ice-ocean test cases (e.g., ISOMIP; Losch, 2008) and other continental-scale simulations and melt-rate observations (Kimura et al., 2013; Rignot et al., 2013). A companion presentation, "Present-day circum-Antarctic simulations using the POPSICLES coupled land ice-ocean model" in session C027, describes the ocean-model perspective of this work, while we focus on the response of the ice sheet and on details of the model. The figure shows the BISICLES-computed vertically-integrated ice velocity field about 1 month into a 20-year coupled Antarctic run. Grounding lines are shown in green.

  4. Decadal-Scale Response of the Antarctic Ice sheet to a Warming Ocean using the POPSICLES Coupled Ice Sheet-Ocean model

    NASA Astrophysics Data System (ADS)

    Martin, D. F.; Asay-Davis, X.; Cornford, S. L.; Price, S. F.; Ng, E. G.; Collins, W.

    2015-12-01

    We present POPSICLES simulation results covering the full Antarctic Ice Sheet and the Southern Ocean spanning the period from 1990 to 2010. We use the CORE v. 2 interannual forcing data to force the ocean model. Simulations are performed at 0.1° (~5 km) ocean resolution with adaptive ice sheet resolution as fine as 500 m to adequately resolve the grounding line dynamics. We discuss the effect of improved ocean mixing and subshelf bathymetry (vs. the standard Bedmap2 bathymetry) on the behavior of the coupled system, comparing time-averaged melt rates below a number of major ice shelves with those reported in the literature. We also present seasonal variability and decadal melting trends from several Antarctic regions, along with the response of the ice shelves and the consequent dynamic response of the grounded ice sheet. POPSICLES couples the POP2x ocean model, a modified version of the Parallel Ocean Program, and the BISICLES ice-sheet model. POP2x includes sub-ice-shelf circulation using partial top cells and the commonly used three-equation boundary layer physics. Standalone POP2x output compares well with standard ice-ocean test cases (e.g., ISOMIP) and other continental-scale simulations and melt-rate observations. BISICLES makes use of adaptive mesh refinement and a 1st-order accurate momentum balance similar to the L1L2 model of Schoof and Hindmarsh to accurately model regions of dynamic complexity, such as ice streams, outlet glaciers, and grounding lines. Results of BISICLES simulations have compared favorably to comparable simulations with a Stokes momentum balance in both idealized tests (MISMIP-3d) and realistic configurations. The figure shows the BISICLES-computed vertically-integrated grounded ice velocity field 5 years into a 20-year coupled full-continent Antarctic-Southern-Ocean simulation. Submarine melt rates are painted onto the surface of the floating ice shelves. Grounding lines are shown in green.

  5. Dependability analysis of parallel systems using a simulation-based approach. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Sawyer, Darren Charles

    1994-01-01

    The analysis of dependability in large, complex, parallel systems executing real applications or workloads is examined in this thesis. To effectively demonstrate the wide range of dependability problems that can be analyzed through simulation, the analysis of three case studies is presented. For each case, the organization of the simulation model used is outlined, and the results from simulated fault injection experiments are explained, showing the usefulness of this method in dependability modeling of large parallel systems. The simulation models are constructed using DEPEND and C++. Where possible, methods to increase dependability are derived from the experimental results. Another interesting facet of all three cases is the presence of some kind of workload or application executing in the simulation while faults are injected. This provides a completely new dimension to this type of study, not possible to model accurately with analytical approaches.

  6. Automated three-component synthesis of a library of γ-lactams

    PubMed Central

    Fenster, Erik; Hill, David; Reiser, Oliver

    2012-01-01

    A three-component method for the synthesis of γ-lactams from commercially available maleimides, aldehydes, and amines was adapted to parallel library synthesis. Improvements to the chemistry over previous efforts include the optimization of the method to a one-pot process, the management of by-products and excess reagents, the development of an automated parallel sequence, and the adaptation of the method to permit the preparation of enantiomerically enriched products. These efforts culminated in the preparation of a library of 169 γ-lactams. PMID:23209515

  7. LMC: Logarithmantic Monte Carlo

    NASA Astrophysics Data System (ADS)

    Mantz, Adam B.

    2017-06-01

    LMC is a Markov Chain Monte Carlo engine in Python that implements adaptive Metropolis-Hastings and slice sampling, as well as the affine-invariant method of Goodman & Weare, in a flexible framework. It can be used for simple problems, but the main use case is problems where expensive likelihood evaluations are provided by less flexible third-party software, which benefit from parallelization across many nodes at the sampling level. The parallel/adaptive methods use communication through MPI, or alternatively by writing/reading files, and mostly follow the approaches pioneered by CosmoMC (ascl:1106.025).
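
    As a rough illustration of the adaptive Metropolis idea mentioned above (an illustrative sketch, not LMC's actual API or internals), the proposal covariance can be periodically re-estimated from the accumulating chain history; details such as diminishing adaptation and the MPI-level parallelism are omitted.

```python
import numpy as np

def log_post(x):
    # Stand-in for an expensive external likelihood; here a correlated Gaussian.
    cov = np.array([[1.0, 0.8], [0.8, 1.0]])
    return -0.5 * x @ np.linalg.solve(cov, x)

def adaptive_metropolis(log_post, x0, n_steps=20000, adapt_every=200, seed=None):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    prop_cov = 0.1 * np.eye(d)                  # initial proposal covariance
    chain, lp = [], log_post(x)
    for i in range(n_steps):
        y = rng.multivariate_normal(x, prop_cov)
        lp_y = log_post(y)
        if np.log(rng.random()) < lp_y - lp:    # Metropolis accept/reject
            x, lp = y, lp_y
        chain.append(x.copy())
        # Periodically re-estimate the proposal covariance from the chain so far,
        # scaled by the classic 2.38^2/d factor (diminishing-adaptation details omitted).
        if i >= adapt_every and i % adapt_every == 0:
            prop_cov = np.cov(np.asarray(chain).T) * 2.38**2 / d + 1e-9 * np.eye(d)
    return np.asarray(chain)

samples = adaptive_metropolis(log_post, x0=[0.0, 0.0], seed=0)
print(samples.mean(axis=0))
print(np.cov(samples.T))
```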

  8. Unstructured grids on SIMD torus machines

    NASA Technical Reports Server (NTRS)

    Bjorstad, Petter E.; Schreiber, Robert

    1994-01-01

    Unstructured grids lead to unstructured communication on distributed memory parallel computers, a problem that has been considered difficult. Here, we consider adaptive, offline communication routing for a SIMD processor grid. Our approach is empirical. We use large data sets drawn from supercomputing applications instead of an analytic model of communication load. The chief contribution of this paper is an experimental demonstration of the effectiveness of certain routing heuristics. Our routing algorithm is adaptive, nonminimal, and is generally designed to exploit locality. We have a parallel implementation of the router, and we report on its performance.

  9. Independent Axes of Genetic Variation and Parallel Evolutionary Divergence Of Opercle Bone Shape in Threespine Stickleback

    PubMed Central

    Kimmel, Charles B.; Cresko, William A.; Phillips, Patrick C.; Ullmann, Bonnie; Currey, Mark; von Hippel, Frank; Kristjánsson, Bjarni K.; Gelmond, Ofer; McGuigan, Katrina

    2014-01-01

    Evolution of similar phenotypes in independent populations is often taken as evidence of adaptation to the same fitness optimum. However, the genetic architecture of traits might cause evolution to proceed more often toward particular phenotypes, and less often toward others, independently of the adaptive value of the traits. Freshwater populations of Alaskan threespine stickleback have repeatedly evolved the same distinctive opercle shape after divergence from an oceanic ancestor. Here we demonstrate that this pattern of parallel evolution is widespread, distinguishing oceanic and freshwater populations across the Pacific Coast of North America and Iceland. We test whether this parallel evolution reflects genetic bias by estimating the additive genetic variance–covariance matrix (G) of opercle shape in an Alaskan oceanic (putative ancestral) population. We find significant additive genetic variance for opercle shape and that G has the potential to be biasing, because of the existence of regions of phenotypic space with low additive genetic variation. However, evolution did not occur along major eigenvectors of G; rather, it occurred repeatedly in the same directions of high evolvability. We conclude that the parallel opercle evolution is most likely due to selection during adaptation to freshwater habitats, rather than due to biasing effects of opercle genetic architecture. PMID:22276538

  10. Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) Simulations of the Molecular Crystal alphaRDX

    DTIC Science & Technology

    2013-08-01

    [Fragmentary abstract; only excerpts were recovered.] The work concerns modeling dislocations in the energetic molecular crystal RDX using the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS); constants for the SB potential for HMX/RDX (refs. 3, 9), which covers dispersion and electrostatic interactions, are given in table 1 of the report.

  11. Constitutive Model Calibration via Autonomous Multiaxial Experimentation (Postprint)

    DTIC Science & Technology

    2016-09-17

    [Fragmentary abstract; only excerpts were recovered.] Experimental data are reduced and finite element simulations are conducted in parallel with the test, based on experimental strain conditions. Optimization methods [...] can be used directly in finite element simulations of more complex geometries. Keywords: axial/torsional experimentation; plasticity; constitutive model.

  12. Parallel Implementation of the Discontinuous Galerkin Method

    NASA Technical Reports Server (NTRS)

    Baggag, Abdalkader; Atkins, Harold; Keyes, David

    1999-01-01

    This paper describes a parallel implementation of the discontinuous Galerkin method. Discontinuous Galerkin is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in an object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem due to cache effects.

  13. Design of Unstructured Adaptive (UA) NAS Parallel Benchmark Featuring Irregular, Dynamic Memory Accesses

    NASA Technical Reports Server (NTRS)

    Feng, Hui-Yu; VanderWijngaart, Rob; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2001-01-01

    We describe the design of a new method for the measurement of the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. The method involves the solution of a stylized heat transfer problem on an unstructured, adaptive grid. A Spectral Element Method (SEM) with an adaptive, nonconforming mesh is selected to discretize the transport equation. The relatively high order of the SEM lowers the fraction of wall clock time spent on inter-processor communication, which eases the load balancing task and allows us to concentrate on the memory accesses. The benchmark is designed to be three-dimensional. Parallelization and load balance issues of a reference implementation will be described in detail in future reports.

  14. Carpet: Adaptive Mesh Refinement for the Cactus Framework

    NASA Astrophysics Data System (ADS)

    Schnetter, Erik; Hawley, Scott; Hawke, Ian

    2016-11-01

    Carpet is an adaptive mesh refinement and multi-patch driver for the Cactus Framework (ascl:1102.013). Cactus is a software framework for solving time-dependent partial differential equations on block-structured grids, and Carpet acts as driver layer providing adaptive mesh refinement, multi-patch capability, as well as parallelization and efficient I/O.

  15. Providing a parallel and distributed capability for JMASS using SPEEDES

    NASA Astrophysics Data System (ADS)

    Valinski, Maria; Driscoll, Jonathan; McGraw, Robert M.; Meyer, Bob

    2002-07-01

    The Joint Modeling And Simulation System (JMASS) is a Tri-Service simulation environment that supports engineering and engagement-level simulations. As JMASS is expanded to support other Tri-Service domains, the current set of modeling services must be expanded for High Performance Computing (HPC) applications by adding support for advanced time-management algorithms, parallel and distributed topologies, and high speed communications. By providing support for these services, JMASS can better address modeling domains requiring parallel, computationally intense calculations such as clutter, vulnerability, and lethality calculations, and underwater-based scenarios. A risk reduction effort implementing some HPC services for JMASS using the SPEEDES (Synchronous Parallel Environment for Emulation and Discrete Event Simulation) Simulation Framework has recently concluded. As an artifact of the JMASS-SPEEDES integration, not only can HPC functionality be brought to the JMASS program through SPEEDES, but an additional HLA-based capability can be demonstrated that further addresses interoperability issues. The JMASS-SPEEDES integration provided a means of adding HLA capability to preexisting JMASS scenarios through an implementation of the standard JMASS port communication mechanism that allows players to communicate.

  16. Exploring types of play in an adapted robotics program for children with disabilities.

    PubMed

    Lindsay, Sally; Lam, Ashley

    2018-04-01

    Play is an important occupation in a child's development. Children with disabilities often have fewer opportunities to engage in meaningful play than typically developing children. The purpose of this study was to explore the types of play (i.e., solitary, parallel and co-operative) within an adapted robotics program for children with disabilities aged 6-8 years. This study draws on detailed observations of each of the six robotics workshops and interviews with 53 participants (21 children, 21 parents and 11 programme staff). Our findings showed that four children engaged in solitary play, where all but one showed signs of moving towards parallel play. Six children demonstrated parallel play during all workshops. The remainder of the children had mixed play types (solitary, parallel and/or co-operative) throughout the robotics workshops. We observed more parallel and co-operative play, and less solitary play, as the programme progressed. Ten different children displayed co-operative behaviours throughout the workshops. The interviews highlighted how staff supported children's engagement in the programme. Meanwhile, parents reported on their child's development of play skills. An adapted LEGO® robotics program has potential to develop the play skills of children with disabilities in moving from solitary towards more parallel and co-operative play. Implications for rehabilitation: Educators and clinicians working with children who have disabilities should consider the potential of LEGO® robotics programs for developing their play skills. Clinicians should consider how the extent of their involvement in prompting and facilitating children's engagement and play within a robotics program may influence their ability to interact with their peers. Educators and clinicians should incorporate both structured and unstructured free-play elements within a robotics program to facilitate children's social development.

  17. CUBE: Information-optimized parallel cosmological N-body simulation code

    NASA Astrophysics Data System (ADS)

    Yu, Hao-Ran; Pen, Ue-Li; Wang, Xin

    2018-05-01

    CUBE, written in Coarray Fortran, is a particle-mesh based parallel cosmological N-body simulation code. The memory usage of CUBE can approach as low as 6 bytes per particle. A particle pairwise (PP) force, cosmological neutrinos, and a spherical overdensity (SO) halofinder are included.

  18. Efficient parallel simulation of CO2 geologic sequestration insaline aquifers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Keni; Doughty, Christine; Wu, Yu-Shu

    2007-01-01

    An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling single and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, as well as salt precipitation or dissolution. The new parallel simulator uses MPI for parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations by multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that a linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2.

  19. Adaptive Nulling for the Terrestrial Planet Finder Interferometer

    NASA Technical Reports Server (NTRS)

    Peters, Robert D.; Lay, Oliver P.; Jeganathan, Muthu; Hirai, Akiko

    2006-01-01

    A description of adaptive nulling for the Terrestrial Planet Finder Interferometer (TPF-I) is presented. The topics include: 1) Nulling in TPF-I; 2) Why Do Adaptive Nulling; 3) Parallel High-Order Compensator Design; 4) Phase and Amplitude Control; 5) Development Activities; 6) Requirements; 7) Simplified Experimental Setup; 8) Intensity Correction; and 9) Intensity Dispersion Stability. A short summary is also given on adaptive nulling for TPF-I.

  20. Contrast adaptation in cat visual cortex is not mediated by GABA.

    PubMed

    DeBruyn, E J; Bonds, A B

    1986-09-24

    The possible involvement of gamma-aminobutyric acid (GABA) in contrast adaptation in single cells in area 17 of the cat was investigated. Iontophoretic application of N-methyl bicuculline increased cell responses, but had no effect on the magnitude of adaptation. These results suggest that contrast adaptation is the result of inhibition through a parallel pathway, but that GABA does not mediate this process.

  1. PLUM: Parallel Load Balancing for Unstructured Adaptive Meshes. Degree awarded by Colorado Univ.

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid

    1998-01-01

    Dynamic mesh adaption on unstructured grids is a powerful tool for computing large-scale problems that require grid modifications to efficiently resolve solution features. By locally refining and coarsening the mesh to capture physical phenomena of interest, such procedures make standard computational methods more cost effective. Unfortunately, an efficient parallel implementation of these adaptive methods is rather difficult to achieve, primarily due to the load imbalance created by the dynamically-changing nonuniform grid. This requires significant communication at runtime, leading to idle processors and adversely affecting the total execution time. Nonetheless, it is generally thought that unstructured adaptive-grid techniques will constitute a significant fraction of future high-performance supercomputing. Various dynamic load balancing methods have been reported to date; however, most of them either lack a global view of loads across processors or do not apply their techniques to realistic large-scale applications.

  2. Asynchronous multilevel adaptive methods for solving partial differential equations on multiprocessors - Performance results

    NASA Technical Reports Server (NTRS)

    Mccormick, S.; Quinlan, D.

    1989-01-01

    The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids (global and local) to provide adaptive resolution and fast solution of PDEs. Like all such methods, it offers parallelism by using possibly many disconnected patches per level, but is hindered by the need to handle these levels sequentially. The finest levels must therefore wait for processing to be essentially completed on all the coarser ones. A recently developed asynchronous version of FAC, called AFAC, completely eliminates this bottleneck to parallelism. This paper describes timing results for AFAC, coupled with a simple load balancing scheme, applied to the solution of elliptic PDEs on an Intel iPSC hypercube. These tests include performance of certain processes necessary in adaptive methods, including moving grids and changing refinement. A companion paper reports on numerical and analytical results for estimating convergence factors of AFAC applied to very large scale examples.

  3. Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.

    PubMed

    Bhandarkar, S M; Chirravuri, S; Arnold, J

    1996-01-01

    Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
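
    As a rough sketch of the "multiple independent searches" strategy that performed best on the MIMD machine, the toy example below runs several independently seeded simulated-annealing searches of a small linear-arrangement problem and keeps the best result; the objective, cooling schedule, and use of Python's multiprocessing are illustrative assumptions, not the authors' MasPar/iPSC implementation.

```python
import math
import random
from multiprocessing import Pool

def cost(order, weights):
    # Toy optimal-linear-arrangement cost: overlap weight times distance between clone positions.
    pos = {c: i for i, c in enumerate(order)}
    return sum(w * abs(pos[a] - pos[b]) for (a, b), w in weights.items())

def anneal(args):
    """One independent simulated-annealing search (swap-two-clones moves, geometric cooling)."""
    order, weights, seed = args
    rng = random.Random(seed)
    order = order[:]
    cur = cost(order, weights)
    T = 10.0
    while T > 1e-3:
        for _ in range(200):
            i, j = rng.sample(range(len(order)), 2)
            order[i], order[j] = order[j], order[i]
            new = cost(order, weights)
            if new <= cur or rng.random() < math.exp(-(new - cur) / T):
                cur = new                                   # accept the move
            else:
                order[i], order[j] = order[j], order[i]     # reject: undo the swap
        T *= 0.95
    return cur, order

if __name__ == "__main__":
    clones = list(range(20))
    rng = random.Random(0)
    weights = {(a, b): rng.random() for a in clones for b in clones if a < b}
    # Multiple independent searches, one per worker process; keep the best ordering found.
    with Pool(4) as pool:
        results = pool.map(anneal, [(clones, weights, seed) for seed in range(8)])
    best_cost, best_order = min(results, key=lambda r: r[0])
    print(best_cost)
```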

  4. Acceleration of the matrix multiplication of Radiance three phase daylighting simulations with parallel computing on heterogeneous hardware of personal computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zuo, Wangda; McNeil, Andrew; Wetter, Michael

    2013-05-23

    Building designers are increasingly relying on complex fenestration systems to reduce energy consumed for lighting and HVAC in low energy buildings. Radiance, a lighting simulation program, has been used to conduct daylighting simulations for complex fenestration systems. Depending on the configurations, the simulation can take hours or even days using a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight simulation by conducting parallel computing on heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in parallel using OpenCL. The speed of the new approach was evaluated using various daylighting simulation cases on a multicore central processing unit and a graphics processing unit. Based on the measurements and analysis of the time usage for the Radiance daylighting simulation, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.
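
    For context, the matrix chain in the Radiance three-phase method multiplies a view matrix, a BSDF transmission matrix, a daylight matrix, and a sky vector for every timestep. The NumPy sketch below shows that structure with illustrative (assumed) dimensions; the cited work performs these products on heterogeneous hardware through OpenCL rather than NumPy.

```python
import numpy as np

# Illustrative dimensions (assumed): 2000 sensor points, 145 window patches, 146 sky patches, 8760 hours.
V = np.random.rand(2000, 145)   # view matrix (sensors x window patches)
T = np.random.rand(145, 145)    # BSDF transmission matrix
D = np.random.rand(145, 146)    # daylight matrix (window patches x sky patches)
S = np.random.rand(146, 8760)   # sky matrix, one column per timestep

# Precompute the time-independent product once, then apply it to all sky vectors.
VTD = V @ T @ D                  # (2000 x 146)
E = VTD @ S                      # illuminance at every sensor for every hour

print(E.shape)
```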

  5. Using parallel computing for the display and simulation of the space debris environment

    NASA Astrophysics Data System (ADS)

    Möckel, M.; Wiedemann, C.; Flegel, S.; Gelhaus, J.; Vörsmann, P.; Klinkrad, H.; Krag, H.

    2011-07-01

    Parallelism is becoming the leading paradigm in today's computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating and displaying three-dimensional computer generated imagery in real time requires complex numerical operations to be performed at high speed on a large number of objects. Since most of these objects can be processed independently, parallel computing is applicable in this field. Modern graphics processing units (GPUs) have become capable of performing millions of matrix and vector operations per second on multiple objects simultaneously. As a side project, a software tool is currently being developed at the Institute of Aerospace Systems that provides an animated, three-dimensional visualization of both actual and simulated space debris objects. Due to the nature of these objects it is possible to process them individually and independently from each other. Therefore, an analytical orbit propagation algorithm has been implemented to run on a GPU. By taking advantage of all its processing power a huge performance increase, compared to its CPU-based counterpart, could be achieved. For several years efforts have been made to harness this computing power for applications other than computer graphics. Software tools for the simulation of space debris are among those that could profit from embracing parallelism. With recently emerged software development tools such as OpenCL it is possible to transfer the new algorithms used in the visualization outside the field of computer graphics and implement them, for example, into the space debris simulation environment. This way they can make use of parallel hardware such as GPUs and Multi-Core-CPUs for faster computation. In this paper the visualization software will be introduced, including a comparison between the serial and the parallel method of orbit propagation. Ways of how to use the benefits of the latter method for space debris simulation will be discussed. An introduction to OpenCL will be given as well as an exemplary algorithm from the field of space debris simulation.
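
    Because each debris object can be propagated independently, the computation is naturally data-parallel. The sketch below vectorizes a simple mean-motion (two-body) advance over many objects with NumPy; it only illustrates the per-object parallelism and is not the analytical propagator used in the visualization tool.

```python
import numpy as np

MU = 3.986004418e14  # Earth's gravitational parameter [m^3 s^-2]

def propagate_mean_anomaly(a, M0, t):
    """Advance the mean anomaly of every object by time t (vectorized two-body model)."""
    n = np.sqrt(MU / a**3)            # mean motion per object [rad s^-1]
    return (M0 + n * t) % (2 * np.pi)

# 100,000 debris objects, each an independent work item (the GPU-friendly data parallelism).
rng = np.random.default_rng(0)
a = rng.uniform(6.9e6, 4.2e7, 100_000)       # semi-major axes [m]
M0 = rng.uniform(0, 2 * np.pi, 100_000)      # initial mean anomalies [rad]
M = propagate_mean_anomaly(a, M0, t=3600.0)  # state one hour later
print(M[:5])
```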

  6. Using parallel computing for the display and simulation of the space debris environment

    NASA Astrophysics Data System (ADS)

    Moeckel, Marek; Wiedemann, Carsten; Flegel, Sven Kevin; Gelhaus, Johannes; Klinkrad, Heiner; Krag, Holger; Voersmann, Peter

    Parallelism is becoming the leading paradigm in today's computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating and displaying three-dimensional computer generated imagery in real time requires complex numerical operations to be performed at high speed on a large number of objects. Since most of these objects can be processed independently, parallel computing is applicable in this field. Modern graphics processing units (GPUs) have become capable of performing millions of matrix and vector operations per second on multiple objects simultaneously. As a side project, a software tool is currently being developed at the Institute of Aerospace Systems that provides an animated, three-dimensional visualization of both actual and simulated space debris objects. Due to the nature of these objects it is possible to process them individually and independently from each other. Therefore, an analytical orbit propagation algorithm has been implemented to run on a GPU. By taking advantage of all its processing power a huge performance increase, compared to its CPU-based counterpart, could be achieved. For several years efforts have been made to harness this computing power for applications other than computer graphics. Software tools for the simulation of space debris are among those that could profit from embracing parallelism. With recently emerged software development tools such as OpenCL it is possible to transfer the new algorithms used in the visualization outside the field of computer graphics and implement them, for example, into the space debris simulation environment. This way they can make use of parallel hardware such as GPUs and Multi-Core-CPUs for faster computation. In this paper the visualization software will be introduced, including a comparison between the serial and the parallel method of orbit propagation. Ways of how to use the benefits of the latter method for space debris simulation will be discussed. An introduction to OpenCL will be given as well as an exemplary algorithm from the field of space debris simulation.

  7. Xyce Parallel Electronic Simulator Users' Guide Version 6.7.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms, including serial, shared-memory, and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  8. A Model for Speedup of Parallel Programs

    DTIC Science & Technology

    1997-01-01

    [Fragmentary abstract; only reference-list excerpts were recovered.] S. K. Setia, "The interaction between memory allocation and adaptive partitioning in message-passing multicomputers," IPPS Workshop on Job Scheduling Strategies for Parallel Processing, pages 89-99, 1995; [15] S. K. Setia and S. K. Tripathi, "A comparative analysis of static [...]".

  9. Parallel goal-oriented adaptive finite element modeling for 3D electromagnetic exploration

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; Key, K.; Ovall, J.; Holst, M.

    2014-12-01

    We present a parallel goal-oriented adaptive finite element method for accurate and efficient electromagnetic (EM) modeling of complex 3D structures. An unstructured tetrahedral mesh allows this approach to accommodate arbitrarily complex 3D conductivity variations and a priori known boundaries. The total electric field is approximated by the lowest order linear curl-conforming shape functions and the discretized finite element equations are solved by a sparse LU factorization. Accuracy of the finite element solution is achieved through adaptive mesh refinement that is performed iteratively until the solution converges to the desired accuracy tolerance. Refinement is guided by a goal-oriented error estimator that uses a dual-weighted residual method to optimize the mesh for accurate EM responses at the locations of the EM receivers. As a result, the mesh refinement is highly efficient since it only targets the elements where the inaccuracy of the solution corrupts the response at the possibly distant locations of the EM receivers. We compare the accuracy and efficiency of two approaches for estimating the primary residual error required at the core of this method: one uses local element and inter-element residuals and the other relies on solving a global residual system using a hierarchical basis. For computational efficiency our method follows the Bank-Holst algorithm for parallelization, where solutions are computed in subdomains of the original model. To resolve the load-balancing problem, this approach applies a spectral bisection method to divide the entire model into subdomains that have approximately equal error and the same number of receivers. The finite element solutions are then computed in parallel with each subdomain carrying out goal-oriented adaptive mesh refinement independently. We validate the newly developed algorithm by comparison with controlled-source EM solutions for 1D layered models and with 2D results from our earlier 2D goal oriented adaptive refinement code named MARE2DEM. We demonstrate the performance and parallel scaling of this algorithm on a medium-scale computing cluster with a marine controlled-source EM example that includes a 3D array of receivers located over a 3D model that includes significant seafloor bathymetry variations and a heterogeneous subsurface.
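
    The iterate-until-converged structure described above can be summarized in a few lines. The sketch below is a generic goal-oriented refinement loop written in Python for illustration; the callables (solve, error estimation, refinement) and the refine-worst-fraction marking strategy are assumptions standing in for the finite element machinery of the cited code.

```python
def adaptive_solve(mesh, solve, estimate_goal_error, refine, tol=1e-3, frac=0.1, max_iters=20):
    """Generic goal-oriented adaptive refinement loop (schematic only)."""
    solution = None
    for _ in range(max_iters):
        solution = solve(mesh)                       # forward (and dual) solve on the current mesh
        eta = estimate_goal_error(mesh, solution)    # per-element contribution to the receiver error
        if sum(eta.values()) < tol:
            break
        # Refine the elements contributing most to the error at the receiver locations.
        worst = sorted(eta, key=eta.get, reverse=True)[:max(1, int(frac * len(eta)))]
        mesh = refine(mesh, worst)
    return solution, mesh

# Toy 1-D usage: the "mesh" is a list of element widths and refinement halves an element.
solution, mesh = adaptive_solve(
    mesh=[1.0] * 10,
    solve=lambda m: sum(m),                                        # stand-in forward solve
    estimate_goal_error=lambda m, s: {i: h**3 for i, h in enumerate(m)},
    refine=lambda m, worst: [h / 2 if i in worst else h for i, h in enumerate(m)] +
                            [m[i] / 2 for i in worst],
    tol=1e-3,
)
print(len(mesh), sum(mesh))
```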

  10. Single-shot digital holography by use of the fractional Talbot effect.

    PubMed

    Martínez-León, Lluís; Araiza-E, María; Javidi, Bahram; Andrés, Pedro; Climent, Vicent; Lancis, Jesús; Tajahuerce, Enrique

    2009-07-20

    We present a method for recording in-line single-shot digital holograms based on the fractional Talbot effect. In our system, an image sensor records the interference between the light field scattered by the object and a properly codified parallel reference beam. A simple binary two-dimensional periodic grating is used to codify the reference beam generating a periodic three-step phase distribution over the sensor plane by fractional Talbot effect. This provides a method to perform single-shot phase-shifting interferometry at frame rates only limited by the sensor capabilities. Our technique is well adapted for dynamic wavefront sensing applications. Images of the object are digitally reconstructed from the digital hologram. Both computer simulations and experimental results are presented.

  11. The dynamics of plate tectonics and mantle flow: from local to global scales.

    PubMed

    Stadler, Georg; Gurnis, Michael; Burstedde, Carsten; Wilcox, Lucas C; Alisic, Laura; Ghattas, Omar

    2010-08-27

    Plate tectonics is regulated by driving and resisting forces concentrated at plate boundaries, but observationally constrained high-resolution models of global mantle flow remain a computational challenge. We capitalized on advances in adaptive mesh refinement algorithms on parallel computers to simulate global mantle flow by incorporating plate motions, with individual plate margins resolved down to a scale of 1 kilometer. Back-arc extension and slab rollback are emergent consequences of slab descent in the upper mantle. Cold thermal anomalies within the lower mantle couple into oceanic plates through narrow high-viscosity slabs, altering the velocity of oceanic plates. Viscous dissipation within the bending lithosphere at trenches amounts to approximately 5 to 20% of the total dissipation through the entire lithosphere and mantle.

  12. A revised model of fluid transport optimization in Physarum polycephalum.

    PubMed

    Bonifaci, Vincenzo

    2017-02-01

    Optimization of fluid transport in the slime mold Physarum polycephalum has been the subject of several modeling efforts in recent literature. Existing models assume that the tube adaptation mechanism in P. polycephalum's tubular network is controlled by the sheer amount of fluid flow through the tubes. We put forward the hypothesis that the controlling variable may instead be the flow's pressure gradient along the tube. We carry out the stability analysis of such a revised mathematical model for a parallel-edge network, proving that the revised model supports the global flow-optimizing behavior of the slime mold for a substantially wider class of response functions compared to previous models. Simulations also suggest that the same conclusion may be valid for arbitrary network topologies.
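
    To make the modeling distinction above concrete, the sketch below integrates the adaptation dynamics for a two-tube parallel network, once with the conventional flow-controlled response and once with a gradient-controlled response; the linear response function and parameter values are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def simulate(control="gradient", steps=5000, dt=0.01):
    L = np.array([1.0, 2.0])      # tube lengths (edge 2 is twice as long)
    D = np.array([1.0, 1.0])      # initial tube conductivities
    Q0 = 1.0                      # total flow forced through the parallel pair
    for _ in range(steps):
        g = D / L                          # edge conductances
        dp = Q0 / g.sum()                  # common pressure drop across both tubes
        Q = g * dp                         # flow through each tube
        signal = np.abs(Q) if control == "flow" else np.abs(dp / L)
        D = D + dt * (signal - D)          # adaptation: grow with the signal, decay otherwise
        D = np.maximum(D, 1e-12)
    return D

# Flow control shrinks the longer tube away; gradient control retains both tubes.
print("flow-controlled:    ", simulate("flow"))
print("gradient-controlled:", simulate("gradient"))
```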

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Besse, Nicolas; Latu, Guillaume; Ghizzo, Alain

    In this paper we present a new method for the numerical solution of the relativistic Vlasov-Maxwell system on a phase-space grid using an adaptive semi-Lagrangian method. The adaptivity is performed through a wavelet multiresolution analysis, which gives a powerful and natural refinement criterion based on the local measurement of the approximation error and regularity of the distribution function. Therefore, the multiscale expansion of the distribution function allows us to get a sparse representation of the data and thus save memory space and CPU time. We apply this numerical scheme to reduced Vlasov-Maxwell systems arising in laser-plasma physics. Interaction of relativistically strong laser pulses with overdense plasma slabs is investigated. These Vlasov simulations revealed a rich variety of phenomena associated with the fast particle dynamics induced by electromagnetic waves, such as electron trapping, particle acceleration, and electron plasma wavebreaking. However, the wavelet-based adaptive method that we developed here does not yield significant improvements compared to Vlasov solvers on a uniform mesh, due to the substantial overhead that the method introduces. Nonetheless it might be a first step towards more efficient adaptive solvers based on different ideas for the grid refinement or on a more efficient implementation. Here the Vlasov simulations are performed in a two-dimensional phase-space where the development of thin filaments, strongly amplified by relativistic effects, requires an important increase of the total number of points of the phase-space grid as they get finer as time goes on. The adaptive method could be more useful in cases where these thin filaments that need to be resolved are a very small fraction of the hyper-volume, which arises in higher dimensions because of the surface-to-volume scaling and the essentially one-dimensional structure of the filaments. Moreover, the main way to improve the efficiency of the adaptive method is to increase the local character in phase-space of the numerical scheme, by considering multiscale reconstruction with more compact support and by replacing the semi-Lagrangian method with more local (in space) numerical schemes such as compact finite difference schemes, the discontinuous-Galerkin method, or finite element residual schemes, which are well suited for parallel domain decomposition techniques.

  14. Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment*†

    PubMed Central

    Khan, Md. Ashfaquzzaman; Herbordt, Martin C.

    2011-01-01

    Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations. PMID:21822327

  15. Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment.

    PubMed

    Khan, Md Ashfaquzzaman; Herbordt, Martin C

    2011-07-20

    Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations.
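
    The speculate-then-commit idea described above can be illustrated with a toy, serial emulation: process a window of upcoming events as if they were independent, then commit them strictly in time order and re-execute any speculation that conflicts with an earlier commit. The event representation and conflict rule below are simplifying assumptions; the cited work runs the speculative work on multiple cores of a shared-memory machine.

```python
def run_dmd(events, process, window=4):
    """Toy illustration of speculative event processing with in-order commitment.

    'events' is a list of (time, particles) tuples; 'process' computes an event's
    outcome and returns the set of particles it modified.  Events are speculatively
    processed up to 'window' ahead, but committed strictly in time order; a speculation
    is squashed (re-executed) if an earlier commit touched one of its particles.
    """
    pending = sorted(events)                 # event queue ordered by time
    committed, i = [], 0
    while i < len(pending):
        batch = pending[i:i + window]
        # Speculate: process the whole window as if the events were independent.
        results = [(t, parts, process(t, parts)) for t, parts in batch]
        dirty = set()
        for t, parts, touched in results:    # commit in time order
            if dirty & set(parts):           # conflict with an earlier commit in this window
                touched = process(t, parts)  # squash and re-execute
            committed.append((t, touched))
            dirty |= touched
            i += 1
    return committed

# Tiny example: each "event" is a collision between two particles.
log = run_dmd([(0.1, (1, 2)), (0.2, (3, 4)), (0.3, (2, 3))],
              process=lambda t, parts: set(parts))
print(log)
```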

  16. Parallel processing of real-time dynamic systems simulation on OSCAR (Optimally SCheduled Advanced multiprocessoR)

    NASA Technical Reports Server (NTRS)

    Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke

    1989-01-01

    Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, generally, the same calculations are repeated every time step. However, we cannot apply the Do-all or Do-across techniques for parallel processing of the simulation, since there exist data dependencies from the end of an iteration to the beginning of the next iteration, and furthermore data input and data output are required every sampling time period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near-fine-grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and assigned to processors by using optimal static scheduling at compile time in order to reduce the large run-time overhead caused by the use of near-fine-grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR), which has been developed to extract advantageous features of static scheduling algorithms to the maximum extent.
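
    As a small illustration of compile-time static scheduling of fine-grain tasks, the sketch below places each task of a dependence graph on the processor where it can start earliest. This is a generic list-scheduling heuristic; the toy task graph, unit costs, and the neglect of communication delay are assumptions, and it is not OSCAR's actual scheduler.

```python
def list_schedule(tasks, deps, cost, n_proc=4):
    """Static list scheduling: visit tasks in dependence order and place each on the
    processor where it can start earliest.  'deps[t]' lists the predecessors of t and
    'cost[t]' its execution time; communication delay is ignored for brevity."""
    finish = {}                       # task -> finish time
    proc_free = [0.0] * n_proc        # earliest free time per processor
    placement = {}
    for t in tasks:                   # 'tasks' is assumed topologically sorted
        ready = max((finish[d] for d in deps.get(t, ())), default=0.0)
        p = min(range(n_proc), key=lambda q: max(proc_free[q], ready))
        start = max(proc_free[p], ready)
        finish[t] = start + cost[t]
        proc_free[p] = finish[t]
        placement[t] = (p, start)
    return placement, max(finish.values())

# Toy task graph: a and b feed c; c and d feed e.
placement, makespan = list_schedule(
    tasks=["a", "b", "d", "c", "e"],
    deps={"c": ["a", "b"], "e": ["c", "d"]},
    cost={"a": 1, "b": 2, "d": 1, "c": 3, "e": 1},
)
print(placement, makespan)
```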

  17. A parallel implementation of an off-lattice individual-based model of multicellular populations

    NASA Astrophysics Data System (ADS)

    Harvey, Daniel G.; Fletcher, Alexander G.; Osborne, James M.; Pitt-Francis, Joe

    2015-07-01

    As computational models of multicellular populations include ever more detailed descriptions of biophysical and biochemical processes, the computational cost of simulating such models limits their ability to generate novel scientific hypotheses and testable predictions. While developments in microchip technology continue to increase the power of individual processors, parallel computing offers an immediate increase in available processing power. To make full use of parallel computing technology, it is necessary to develop specialised algorithms. To this end, we present a parallel algorithm for a class of off-lattice individual-based models of multicellular populations. The algorithm divides the spatial domain between computing processes and comprises communication routines that ensure the model is correctly simulated on multiple processors. The parallel algorithm is shown to accurately reproduce the results of a deterministic simulation performed using a pre-existing serial implementation. We test the scaling of computation time, memory use and load balancing as more processes are used to simulate a cell population of fixed size. We find approximate linear scaling of both speed-up and memory consumption on up to 32 processor cores. Dynamic load balancing is shown to provide speed-up for non-regular spatial distributions of cells in the case of a growing population.
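
    The communication routines mentioned above must give each subdomain copies of the neighbouring cells it needs for local force calculations. The sketch below is a serial, one-dimensional stand-in for that halo exchange; the strip decomposition and halo width are illustrative assumptions rather than the library's actual MPI implementation.

```python
def exchange_halos(strips, halo_width=1.0):
    """Toy halo exchange for a 1-D strip decomposition of a cell population.

    'strips' is a list of lists of cell x-positions, one per process strip (ordered
    left to right).  Each strip receives copies of neighbouring cells lying within
    'halo_width' of the shared boundary, so local force calculations can see across
    the cut.  Serial stand-in for the MPI communication routines described in the paper.
    """
    halos = [[] for _ in strips]
    for i in range(len(strips) - 1):
        cut = (max(strips[i]) + min(strips[i + 1])) / 2.0   # boundary between strips i and i+1
        halos[i]     += [x for x in strips[i + 1] if x < cut + halo_width]
        halos[i + 1] += [x for x in strips[i]     if x > cut - halo_width]
    return halos

# Example: three strips of cells along x.
print(exchange_halos([[0.2, 0.9, 1.8], [2.1, 2.9, 3.7], [4.2, 5.0]]))
```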

  18. TOUGH2_MP: A parallel version of TOUGH2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Keni; Wu, Yu-Shu; Ding, Chris

    2003-04-09

    TOUGH2_MP is a massively parallel version of TOUGH2. It was developed for running on distributed-memory parallel computers to simulate large simulation problems that may not be solved by the standard, single-CPU TOUGH2 code. The new code implements an efficient massively parallel scheme, while preserving the full capacity and flexibility of the original TOUGH2 code. The new software uses the METIS software package for grid partitioning and AZTEC software package for linear-equation solving. The standard message-passing interface is adopted for communication among processors. Numerical performance of the current version code has been tested on CRAY-T3E and IBM RS/6000 SP platforms. In addition, the parallel code has been successfully applied to real field problems of multi-million-cell simulations for three-dimensional multiphase and multicomponent fluid and heat flow, as well as solute transport. In this paper, we will review the development of TOUGH2_MP, and discuss the basic features, modules, and their applications.

  19. Parallel VLSI architecture emulation and the organization of APSA/MPP

    NASA Technical Reports Server (NTRS)

    Odonnell, John T.

    1987-01-01

    The Applicative Programming System Architecture (APSA) combines an applicative language interpreter with a novel parallel computer architecture that is well suited for Very Large Scale Integration (VLSI) implementation. The Massively Parallel Processor (MPP) can simulate VLSI circuits by allocating one processing element in its square array to an area on a square VLSI chip. As long as there are not too many long data paths, the MPP can simulate a VLSI clock cycle very rapidly. The APSA circuit contains a binary tree with a few long paths and many short ones. A skewed H-tree layout allows every processing element to simulate a leaf cell and up to four tree nodes, with no loss in parallelism. Emulation of a key APSA algorithm on the MPP resulted in performance 16,000 times faster than a Vax. This speed will make it possible for the APSA language interpreter to run fast enough to support research in parallel list processing algorithms.

  20. Adaptive Core Simulation Employing Discrete Inverse Theory - Part II: Numerical Experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abdel-Khalik, Hany S.; Turinsky, Paul J.

    2005-07-15

    Use of adaptive simulation is intended to improve the fidelity and robustness of important core attribute predictions such as core power distribution, thermal margins, and core reactivity. Adaptive simulation utilizes a selected set of past and current reactor measurements of reactor observables, i.e., in-core instrumentation readings, to adapt the simulation in a meaningful way. The companion paper, ''Adaptive Core Simulation Employing Discrete Inverse Theory - Part I: Theory,'' describes in detail the theoretical background of the proposed adaptive techniques. This paper, Part II, demonstrates several computational experiments conducted to assess the fidelity and robustness of the proposed techniques. The intent is to check the ability of the adapted core simulator model to predict future core observables that are not included in the adaption or core observables that are recorded at core conditions that differ from those at which adaption is completed. Also, this paper demonstrates successful utilization of an efficient sensitivity analysis approach to calculate the sensitivity information required to perform the adaption for millions of input core parameters. Finally, this paper illustrates a useful application for adaptive simulation - reducing the inconsistencies between two different core simulator code systems, where the multitudes of input data to one code are adjusted to enhance the agreement between both codes for important core attributes, i.e., core reactivity and power distribution. Also demonstrated is the robustness of such an application.
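
    As a schematic of the adaption step described above (illustrative only; the paper's actual formulation and regularization are more elaborate), a regularized least-squares update adjusts the simulator's input parameters so that predicted observables better match measurements, using a precomputed sensitivity matrix.

```python
import numpy as np

def adapt_parameters(S, d_meas, d_pred, reg=1e-2):
    """One regularized least-squares adaption step (schematic, not the paper's exact method).

    S      : sensitivity matrix, d(observables)/d(parameters)
    d_meas : measured observables (e.g., in-core detector readings)
    d_pred : observables predicted by the unadapted simulator
    Returns the parameter adjustment that best explains the misfit, with Tikhonov
    regularization keeping the adjustment small."""
    r = d_meas - d_pred
    n = S.shape[1]
    return np.linalg.solve(S.T @ S + reg * np.eye(n), S.T @ r)

# Toy example: 5 observables, 3 adjustable parameters.
rng = np.random.default_rng(1)
S = rng.normal(size=(5, 3))
true_dp = np.array([0.02, -0.01, 0.005])
d_pred = rng.normal(size=5)
d_meas = d_pred + S @ true_dp
print(adapt_parameters(S, d_meas, d_pred, reg=1e-6))   # recovers approximately true_dp
```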
