Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Direct simulation Monte Carlo analysis on parallel processors
NASA Technical Reports Server (NTRS)
Wilmoth, Richard G.
1989-01-01
A method is presented for executing a direct simulation Monte Carlo (DSMC) analysis using parallel processing. The method is based on using domain decomposition to distribute the work load among multiple processors, and the DSMC analysis is performed completely in parallel. Message passing is used to transfer molecules between processors and to provide the synchronization necessary for the correct physical simulation. Benchmark problems are described for testing the method and results are presented which demonstrate the performance on two commercially available multicomputers. The results show that reasonable parallel speedup and efficiency can be obtained if the problem is properly sized to the number of processors. It is projected that with a massively parallel system, performance exceeding that of current supercomputers is possible.
Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows
NASA Technical Reports Server (NTRS)
Moitra, Stuti; Gatski, Thomas B.
1997-01-01
A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
Parallel Performance Optimization of the Direct Simulation Monte Carlo Method
NASA Astrophysics Data System (ADS)
Gao, Da; Zhang, Chonglin; Schwartzentruber, Thomas
2009-11-01
Although the direct simulation Monte Carlo (DSMC) particle method is more computationally intensive compared to continuum methods, it is accurate for conditions ranging from continuum to free-molecular, accurate in highly non-equilibrium flow regions, and holds potential for incorporating advanced molecular-based models for gas-phase and gas-surface interactions. As available computer resources continue their rapid growth, the DSMC method is continually being applied to increasingly complex flow problems. Although processor clock speed continues to increase, a trend of increasing multi-core-per-node parallel architectures is emerging. To effectively utilize such current and future parallel computing systems, a combined shared/distributed memory parallel implementation (using both Open Multi-Processing (OpenMP) and Message Passing Interface (MPI)) of the DSMC method is under development. The parallel implementation of a new state-of-the-art 3D DSMC code employing an embedded 3-level Cartesian mesh will be outlined. The presentation will focus on performance optimization strategies for DSMC, which includes, but is not limited to, modified algorithm designs, practical code-tuning techniques, and parallel performance optimization. Specifically, key issues important to the DSMC shared memory (OpenMP) parallel performance are identified as (1) granularity (2) load balancing (3) locality and (4) synchronization. Challenges and solutions associated with these issues as they pertain to the DSMC method will be discussed.
NASA Technical Reports Server (NTRS)
Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad
1994-01-01
The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent in three-dimensional boundary-layer flows are computed with the PS-DNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.
Parallel direct numerical simulation of three-dimensional spray formation
NASA Astrophysics Data System (ADS)
Chergui, Jalel; Juric, Damir; Shin, Seungwon; Kahouadji, Lyes; Matar, Omar
2015-11-01
We present numerical results for the breakup mechanism of a liquid jet surrounded by a fast coaxial flow of air with density ratio (water/air) ~ 1000 and kinematic viscosity ratio ~ 60. We use code BLUE, a three-dimensional, two-phase, high performance, parallel numerical code based on a hybrid Front-Tracking/Level Set algorithm for Lagrangian tracking of arbitrarily deformable phase interfaces and a precise treatment of surface tension forces. The parallelization of the code is based on the technique of domain decomposition where the velocity field is solved by a parallel GMRes method for the viscous terms and the pressure by a parallel multigrid/GMRes method. Communication is handled by MPI message passing procedures. The interface method is also parallelized and defines the interface both by a discontinuous density field as well as by a triangular Lagrangian mesh and allows the interface to undergo large deformations including the rupture and/or coalescence of interfaces. EPSRC Programme Grant, MEMPHIS, EP/K0039761/1.
Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube
NASA Technical Reports Server (NTRS)
Joslin, Ronald D.; Zubair, Mohammad
1993-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors nearly ideal linear speedups are achieved with nonoptimized routines; slower than linear speedups are achieved with optimized (machine dependent library) routines. This slower than linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and because the routine indicates less than ideal speedups. However with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise wall-normal and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single processor time to complete a comparable simulation; however it is estimated that a subgrid-scale model which reduces the required number of grid points and becomes a large-eddy simulation (PSLES) would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.
Direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)
Carroll, C.C.; Owen, J.E.
1988-05-01
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digitial computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACLS constructs. The execution times for all ACLS constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Owen, Jeffrey E.
1988-01-01
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digitial computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACLS constructs. The execution times for all ACLS constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
NASA Technical Reports Server (NTRS)
Campbell, David; Wysong, Ingrid; Kaplan, Carolyn; Mott, David; Wadsworth, Dean; VanGilder, Douglas
2000-01-01
An AFRL/NRL team has recently been selected to develop a scalable, parallel, reacting, multidimensional (SUPREM) Direct Simulation Monte Carlo (DSMC) code for the DoD user community under the High Performance Computing Modernization Office (HPCMO) Common High Performance Computing Software Support Initiative (CHSSI). This paper will introduce the JANNAF Exhaust Plume community to this three-year development effort and present the overall goals, schedule, and current status of this new code.
NASA Technical Reports Server (NTRS)
Campbell, David; Wysong, Ingrid; Kaplan, Carolyn; Mott, David; Wadsworth, Dean; VanGilder, Douglas
2000-01-01
An AFRL/NRL team has recently been selected to develop a scalable, parallel, reacting, multidimensional (SUPREM) Direct Simulation Monte Carlo (DSMC) code for the DoD user community under the High Performance Computing Modernization Office (HPCMO) Common High Performance Computing Software Support Initiative (CHSSI). This paper will introduce the JANNAF Exhaust Plume community to this three-year development effort and present the overall goals, schedule, and current status of this new code.
NASA Technical Reports Server (NTRS)
Morgan, Philip E.
2004-01-01
This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow Simulations Using Massively Parallel Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
A low-cost parallel implementation of direct numerical simulation of wall turbulence
NASA Astrophysics Data System (ADS)
Luchini, Paolo; Quadrio, Maurizio
2006-01-01
A numerical method for the direct numerical simulation of incompressible wall turbulence in rectangular and cylindrical geometries is presented. The distinctive feature resides in its design being targeted towards an efficient distributed-memory parallel computing on commodity hardware. The adopted discretization is spectral in the two homogeneous directions; fourth-order accurate, compact finite-difference schemes over a variable-spacing mesh in the wall-normal direction are key to our parallel implementation. The parallel algorithm is designed in such a way as to minimize data exchange among the computing machines, and in particular to avoid taking a global transpose of the data during the pseudo-spectral evaluation of the non-linear terms. The computing machines can then be connected to each other through low-cost network devices. The code is optimized for memory requirements, which can moreover be subdivided among the computing nodes. The layout of a simple, dedicated and optimized computing system based on commodity hardware is described. The performance of the numerical method on this computing system is evaluated and compared with that of other codes described in the literature, as well as with that of the same code implementing a commonly employed strategy for the pseudo-spectral calculation.
Direct numerical simulation of instabilities in parallel flow with spherical roughness elements
NASA Technical Reports Server (NTRS)
Deanna, R. G.
1992-01-01
Results from a direct numerical simulation of laminar flow over a flat surface with spherical roughness elements using a spectral-element method are given. The numerical simulation approximates roughness as a cellular pattern of identical spheres protruding from a smooth wall. Periodic boundary conditions on the domain's horizontal faces simulate an infinite array of roughness elements extending in the streamwise and spanwise directions, which implies the parallel-flow assumption, and results in a closed domain. A body force, designed to yield the horizontal Blasius velocity in the absence of roughness, sustains the flow. Instabilities above a critical Reynolds number reveal negligible oscillations in the recirculation regions behind each sphere and in the free stream, high-amplitude oscillations in the layer directly above the spheres, and a mean profile with an inflection point near the sphere's crest. The inflection point yields an unstable layer above the roughness (where U''(y) is less than 0) and a stable region within the roughness (where U''(y) is greater than 0). Evidently, the instability begins when the low-momentum or wake region behind an element, being the region most affected by disturbances (purely numerical in this case), goes unstable and moves. In compressible flow with periodic boundaries, this motion sends disturbances to all regions of the domain. In the unstable layer just above the inflection point, the disturbances grow while being carried downstream with a propagation speed equal to the local mean velocity; they do not grow amid the low energy region near the roughness patch. The most amplified disturbance eventually arrives at the next roughness element downstream, perturbing its wake and inducing a global response at a frequency governed by the streamwise spacing between spheres and the mean velocity of the most amplified layer.
A parallel direct numerical simulation of dust particles in a turbulent flow
NASA Astrophysics Data System (ADS)
Nguyen, H. V.; Yokota, R.; Stenchikov, G.; Kocurek, G.
2012-04-01
Due to their effects on radiation transport, aerosols play an important role in the global climate. Mineral dust aerosol is a predominant natural aerosol in the desert and semi-desert regions of the Middle East and North Africa (MENA). The Arabian Peninsula is one of the three predominant source regions on the planet "exporting" dust to almost the entire world. Mineral dust aerosols make up about 50% of the tropospheric aerosol mass and therefore produces a significant impact on the Earth's climate and the atmospheric environment, especially in the MENA region that is characterized by frequent dust storms and large aerosol generation. Understanding the mechanisms of dust emission, transport and deposition is therefore essential for correctly representing dust in numerical climate prediction. In this study we present results of numerical simulations of dust particles in a turbulent flow to study the interaction between dust and the atmosphere. Homogenous and passive dust particles in the boundary layers are entrained and advected under the influence of a turbulent flow. Currently no interactions between particles are included. Turbulence is resolved through direct numerical simulation using a parallel incompressible Navier-Stokes flow solver. Model output provides information on particle trajectories, turbulent transport of dust and effects of gravity on dust motion, which will be used to compare with the wind tunnel experiments at University of Texas at Austin. Results of testing of parallel efficiency and scalability is provided. Future versions of the model will include air-particle momentum exchanges, varying particle sizes and saltation effect. The results will be used for interpreting wind tunnel and field experiments and for improvement of dust generation parameterizations in meteorological models.
NASA Technical Reports Server (NTRS)
Nicol, David; Fujimoto, Richard
1992-01-01
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.
Parallel Atomistic Simulations
HEFFELFINGER,GRANT S.
2000-01-18
Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.
Weening, J.S.
1988-05-01
CSIM is a simulator for parallel Lisp, based on a continuation passing interpreter. It models a shared-memory multiprocessor executing programs written in Common Lisp, extended with several primitives for creating and controlling processes. This paper describes the structure of the simulator, measures its performance, and gives an example of its use with a parallel Lisp program.
Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2
NASA Technical Reports Server (NTRS)
Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad
1995-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers is documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs to match the actual costs relative to changes in the number of grid points. By increasing the number of processors, slower than linear speedups are achieved with optimized (machine-dependent library) routines. This slower than linear speedup results because the computational cost is dominated by FFT routine, which yields less than ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 M ops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32-nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32 node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: Spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.
NASA Astrophysics Data System (ADS)
Nizenkov, Paul; Noeding, Peter; Konopka, Martin; Fasoulas, Stefanos
2016-07-01
The in-house direct simulation Monte Carlo solver PICLas, which enables parallel, three-dimensional simulations of rarefied gas flows, is verified and validated. Theoretical aspects of the method and the employed schemes are briefly discussed. Considered cases include simple reservoir simulations and complex re-entry geometries, which were selected from literature and simulated with PICLas. First, the chemistry module is verified using simple numerical and analytical solutions. Second, simulation results of the rarefied gas flow around a 70° blunted-cone, the REX Free-Flyer as well as multiple points of the re-entry trajectory of the Orion capsule are presented in terms of drag and heat flux. A comparison to experimental measurements as well as other numerical results shows an excellent agreement across the different simulation cases. An outlook on future code development and applications is given.
NASA Astrophysics Data System (ADS)
Nizenkov, Paul; Noeding, Peter; Konopka, Martin; Fasoulas, Stefanos
2017-03-01
The in-house direct simulation Monte Carlo solver PICLas, which enables parallel, three-dimensional simulations of rarefied gas flows, is verified and validated. Theoretical aspects of the method and the employed schemes are briefly discussed. Considered cases include simple reservoir simulations and complex re-entry geometries, which were selected from literature and simulated with PICLas. First, the chemistry module is verified using simple numerical and analytical solutions. Second, simulation results of the rarefied gas flow around a 70° blunted-cone, the REX Free-Flyer as well as multiple points of the re-entry trajectory of the Orion capsule are presented in terms of drag and heat flux. A comparison to experimental measurements as well as other numerical results shows an excellent agreement across the different simulation cases. An outlook on future code development and applications is given.
Xyce parallel electronic simulator.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.
Parallel Dislocation Simulator
2006-10-30
ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.
Parallelizing Timed Petri Net simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1993-01-01
The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.
Bisetti, Fabrizio; Attili, Antonio; Pitsch, Heinz
2014-01-01
Combustion of fossil fuels is likely to continue for the near future due to the growing trends in energy consumption worldwide. The increase in efficiency and the reduction of pollutant emissions from combustion devices are pivotal to achieving meaningful levels of carbon abatement as part of the ongoing climate change efforts. Computational fluid dynamics featuring adequate combustion models will play an increasingly important role in the design of more efficient and cleaner industrial burners, internal combustion engines, and combustors for stationary power generation and aircraft propulsion. Today, turbulent combustion modelling is hindered severely by the lack of data that are accurate and sufficiently complete to assess and remedy model deficiencies effectively. In particular, the formation of pollutants is a complex, nonlinear and multi-scale process characterized by the interaction of molecular and turbulent mixing with a multitude of chemical reactions with disparate time scales. The use of direct numerical simulation (DNS) featuring a state of the art description of the underlying chemistry and physical processes has contributed greatly to combustion model development in recent years. In this paper, the analysis of the intricate evolution of soot formation in turbulent flames demonstrates how DNS databases are used to illuminate relevant physico-chemical mechanisms and to identify modelling needs. PMID:25024412
Bisetti, Fabrizio; Attili, Antonio; Pitsch, Heinz
2014-08-13
Combustion of fossil fuels is likely to continue for the near future due to the growing trends in energy consumption worldwide. The increase in efficiency and the reduction of pollutant emissions from combustion devices are pivotal to achieving meaningful levels of carbon abatement as part of the ongoing climate change efforts. Computational fluid dynamics featuring adequate combustion models will play an increasingly important role in the design of more efficient and cleaner industrial burners, internal combustion engines, and combustors for stationary power generation and aircraft propulsion. Today, turbulent combustion modelling is hindered severely by the lack of data that are accurate and sufficiently complete to assess and remedy model deficiencies effectively. In particular, the formation of pollutants is a complex, nonlinear and multi-scale process characterized by the interaction of molecular and turbulent mixing with a multitude of chemical reactions with disparate time scales. The use of direct numerical simulation (DNS) featuring a state of the art description of the underlying chemistry and physical processes has contributed greatly to combustion model development in recent years. In this paper, the analysis of the intricate evolution of soot formation in turbulent flames demonstrates how DNS databases are used to illuminate relevant physico-chemical mechanisms and to identify modelling needs.
Jin, Dongliang; Coasne, Benoit
2017-08-09
Different molecular simulation strategies are used to assess the stability of methane hydrate under various temperature and pressure conditions. First, using two water molecular models, free energy calculations consisting of the Einstein molecule approach in combination with semi-grand Monte Carlo simulations are used to determine the pressure--temperature phase diagram of methane hydrate. With these calculations, we also estimate the chemical potentials of water and methane and methane occupancy at coexistence. Second, we also consider two other advanced molecular simulation techniques that allow probing the phase diagram of methane hydrate: the direct coexistence method in the Grand Canonical ensemble and the hyper parallel tempering Monte Carlo method. These two direct techniques are found to provide stability conditions that are consistent with the pressure--temperature phase diagram obtained using rigorous free energy calculations. The phase diagram obtained in this work, which is found to be consistent with previous simulation studies, is close to its experimental counterpart provided the TIP4P/Ice model is used to describe the water molecule.
NASA Astrophysics Data System (ADS)
Zhang, Hong-Na; Li, Feng-Chen; Li, Xiao-Bin; Li, Dong-Yang; Cai, Wei-Hua; Yu, Bo
2016-09-01
Direct numerical simulations (DNSs) of purely elastic turbulence in rectilinear shear flows in a three-dimensional (3D) parallel plate channel were carried out, by which numerical databases were established. Based on the numerical databases, the present paper analyzed the structural and statistical characteristics of the elastic turbulence including flow patterns, the wall effect on the turbulent kinetic energy spectrum, and the local relationship between the flow motion and the microstructures’ behavior. Moreover, to address the underlying physical mechanism of elastic turbulence, its generation was presented in terms of the global energy budget. The results showed that the flow structures in elastic turbulence were 3D with spatial scales on the order of the geometrical characteristic length, and vortex tubes were more likely to be embedded in the regions where the polymers were strongly stretched. In addition, the patterns of microstructures’ elongation behave like a filament. From the results of the turbulent kinetic energy budget, it was found that the continuous energy releasing from the polymers into the main flow was the main source of the generation and maintenance of the elastic turbulent status. Project supported by the National Natural Science Foundation of China (Grant Nos. 51276046 and 51506037), the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 51421063), the China Postdoctoral Science Foundation (Grant No. 2016M591526), the Heilongjiang Postdoctoral Fund, China (Grant No. LBH-Z15063), and the China Postdoctoral International Exchange Program.
Efficiency of parallel direct optimization
NASA Technical Reports Server (NTRS)
Janies, D. A.; Wheeler, W. C.
2001-01-01
Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. c2001 The Willi Hennig Society.
Efficiency of parallel direct optimization
NASA Technical Reports Server (NTRS)
Janies, D. A.; Wheeler, W. C.
2001-01-01
Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. c2001 The Willi Hennig Society.
Efficiency of parallel direct optimization.
Janies, D A; Wheeler, W C
2001-03-01
Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size.
Simulating Billion-Task Parallel Programs
Perumalla, Kalyan S; Park, Alfred J
2014-01-01
In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.
Parallel Power Grid Simulation Toolkit
Smith, Steve; Kelley, Brian; Banks, Lawrence; Top, Philip; Woodward, Carol
2015-09-14
ParGrid is a 'wrapper' that integrates a coupled Power Grid Simulation toolkit consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGid, named FSKIT, is intended to support the coupling multiple continuous and discrete even parallel simulations. The code is designed using modern object oriented C++ methods utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.
Tutorial: Parallel Simulation on Supercomputers
Perumalla, Kalyan S
2012-01-01
This tutorial introduces typical hardware and software characteristics of extant and emerging supercomputing platforms, and presents issues and solutions in executing large-scale parallel discrete event simulation scenarios on such high performance computing systems. Covered topics include synchronization, model organization, example applications, and observed performance from illustrative large-scale runs.
Parallelizing alternating direction implicit solver on GPUs
USDA-ARS?s Scientific Manuscript database
We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource con...
NASA Astrophysics Data System (ADS)
Spurzem, R.; Berczik, P.; Zhong, S.; Nitadori, K.; Hamada, T.; Berentzen, I.; Veles, A.
2012-07-01
Astrophysical Computer Simulations of Dense Star Clusters in Galactic Nuclei with Supermassive Black Holes are presented using new cost-efficient supercomputers in China accelerated by graphical processing cards (GPU). We use large high-accuracy direct N-body simulations with Hermite scheme and block-time steps, parallelised across a large number of nodes on the large scale and across many GPU thread processors on each node on the small scale. A sustained performance of more than 350 Tflop/s for a science run on using simultaneously 1600 Fermi C2050 GPUs is reached; a detailed performance model is presented and studies for the largest GPU clusters in China with up to Petaflop/s performance and 7000 Fermi GPU cards. In our case study we look at two supermassive black holes with equal and unequal masses embedded in a dense stellar cluster in a galactic nucleus. The hardening processes due to interactions between black holes and stars, effects of rotation in the stellar system and relativistic forces between the black holes are simultaneously taken into account. The simulation stops at the complete relativistic merger of the black holes.
Parallel Markov chain Monte Carlo simulations
NASA Astrophysics Data System (ADS)
Ren, Ruichao; Orkoulas, G.
2007-06-01
With strict detailed balance, parallel Monte Carlo simulation through domain decomposition cannot be validated with conventional Markov chain theory, which describes an intrinsically serial stochastic process. In this work, the parallel version of Markov chain theory and its role in accelerating Monte Carlo simulations via cluster computing is explored. It is shown that sequential updating is the key to improving efficiency in parallel simulations through domain decomposition. A parallel scheme is proposed to reduce interprocessor communication or synchronization, which slows down parallel simulation with increasing number of processors. Parallel simulation results for the two-dimensional lattice gas model show substantial reduction of simulation time for systems of moderate and large size.
Parallel Markov chain Monte Carlo simulations.
Ren, Ruichao; Orkoulas, G
2007-06-07
With strict detailed balance, parallel Monte Carlo simulation through domain decomposition cannot be validated with conventional Markov chain theory, which describes an intrinsically serial stochastic process. In this work, the parallel version of Markov chain theory and its role in accelerating Monte Carlo simulations via cluster computing is explored. It is shown that sequential updating is the key to improving efficiency in parallel simulations through domain decomposition. A parallel scheme is proposed to reduce interprocessor communication or synchronization, which slows down parallel simulation with increasing number of processors. Parallel simulation results for the two-dimensional lattice gas model show substantial reduction of simulation time for systems of moderate and large size.
NASA Astrophysics Data System (ADS)
Sloan, Gregory James
The direct numerical simulation (DNS) offers the most accurate approach to modeling the behavior of a physical system, but carries an enormous computation cost. There exists a need for an accurate DNS to model the coupled solid-fluid system seen in targeted drug delivery (TDD), nanofluid thermal energy storage (TES), as well as other fields where experiments are necessary, but experiment design may be costly. A parallel DNS can greatly reduce the large computation times required, while providing the same results and functionality of the serial counterpart. A D2Q9 lattice Boltzmann method approach was implemented to solve the fluid phase. The use of domain decomposition with message passing interface (MPI) parallelism resulted in an algorithm that exhibits super-linear scaling in testing, which may be attributed to the caching effect. Decreased performance on a per-node basis for a fixed number of processes confirms this observation. A multiscale approach was implemented to model the behavior of nanoparticles submerged in a viscous fluid, and used to examine the mechanisms that promote or inhibit clustering. Parallelization of this model using a masterworker algorithm with MPI gives less-than-linear speedup for a fixed number of particles and varying number of processes. This is due to the inherent inefficiency of the master-worker approach. Lastly, these separate simulations are combined, and two-way coupling is implemented between the solid and fluid.
Plasma simulation using the massively parallel processor
NASA Technical Reports Server (NTRS)
Lin, C. S.; Thring, A. L.; Koga, J.; Janetzke, R. W.
1987-01-01
Two dimensional electrostatic simulation codes using the particle-in-cell model are developed on the Massively Parallel Processor (MPP). The conventional plasma simulation procedure that computes electric fields at particle positions by means of a gridded system is found inefficient on the MPP. The MPP simulation code is thus based on the gridless system in which particles are assigned to processing elements and electric fields are computed directly via Discrete Fourier Transform. Currently, the gridless model on the MPP in two dimensions is about nine times slower that the gridded system on the CRAY X-MP without considering I/O time. However, the gridless system on the MPP can be improved by incorporating a faster I/O between the staging memory and Array Unit and a more efficient procedure for taking floating point sums over processing elements. The initial results suggest that the parallel processors have the potential for performing large scale plasma simulations.
Xyce parallel electronic simulator design.
Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.
2010-09-01
This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to insure a high level of code quality and robustness are essential. Version control, issue tracking customer support, C++ style guildlines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathmaticians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.
Parallel network simulations with NEURON.
Migliore, M; Cannia, C; Lytton, W W; Markram, Henry; Hines, M L
2006-10-01
The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2,000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored.
Parallel Network Simulations with NEURON
Migliore, M.; Cannia, C.; Lytton, W.W; Markram, Henry; Hines, M. L.
2009-01-01
The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored. PMID:16732488
NASA Technical Reports Server (NTRS)
Combi, Michael R.
2004-01-01
In order to understand the global structure, dynamics, and physical and chemical processes occurring in the upper atmospheres, exospheres, and ionospheres of the Earth, the other planets, comets and planetary satellites and their interactions with their outer particles and fields environs, it is often necessary to address the fundamentally non-equilibrium aspects of the physical environment. These are regions where complex chemistry, energetics, and electromagnetic field influences are important. Traditional approaches are based largely on hydrodynamic or magnetohydrodynamic (MHD) formulations and are very important and highly useful. However, these methods often have limitations in rarefied physical regimes where the molecular collision rates and ion gyrofrequencies are small and where interactions with ionospheres and upper neutral atmospheres are important. At the University of Michigan we have an established base of experience and expertise in numerical simulations based on particle codes which address these physical regimes. The Principal Investigator, Dr. Michael Combi, has over 20 years of experience in the development of particle-kinetic and hybrid kinetichydrodynamics models and their direct use in data analysis. He has also worked in ground-based and space-based remote observational work and on spacecraft instrument teams. His research has involved studies of cometary atmospheres and ionospheres and their interaction with the solar wind, the neutral gas clouds escaping from Jupiter s moon Io, the interaction of the atmospheres/ionospheres of Io and Europa with Jupiter s corotating magnetosphere, as well as Earth s ionosphere. This report describes our progress during the year. The contained in section 2 of this report will serve as the basis of a paper describing the method and its application to the cometary coma that will be continued under a research and analysis grant that supports various applications of theoretical comet models to understanding the
Reservoir Thermal Recover Simulation on Parallel Computers
NASA Astrophysics Data System (ADS)
Li, Baoyan; Ma, Yuanle
The rapid development of parallel computers has provided a hardware background for massive refine reservoir simulation. However, the lack of parallel reservoir simulation software has blocked the application of parallel computers on reservoir simulation. Although a variety of parallel methods have been studied and applied to black oil, compositional, and chemical model numerical simulations, there has been limited parallel software available for reservoir simulation. Especially, the parallelization study of reservoir thermal recovery simulation has not been fully carried out, because of the complexity of its models and algorithms. The authors make use of the message passing interface (MPI) standard communication library, the domain decomposition method, the block Jacobi iteration algorithm, and the dynamic memory allocation technique to parallelize their serial thermal recovery simulation software NUMSIP, which is being used in petroleum industry in China. The parallel software PNUMSIP was tested on both IBM SP2 and Dawn 1000A distributed-memory parallel computers. The experiment results show that the parallelization of I/O has great effects on the efficiency of parallel software PNUMSIP; the data communication bandwidth is also an important factor, which has an influence on software efficiency. Keywords: domain decomposition method, block Jacobi iteration algorithm, reservoir thermal recovery simulation, distributed-memory parallel computer
Synchronization Of Parallel Discrete Event Simulations
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S.
1992-01-01
Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Icarus: A 2D direct simulation Monte Carlo (DSMC) code for parallel computers. User`s manual - V.3.0
Bartel, T.; Plimpton, S.; Johannes, J.; Payne, J.
1996-10-01
Icarus is a 2D Direct Simulation Monte Carlo (DSMC) code which has been optimized for the parallel computing environment. The code is based on the DSMC method of Bird and models from free-molecular to continuum flowfields in either cartesian (x, y) or axisymmetric (z, r) coordinates. Computational particles, representing a given number of molecules or atoms, are tracked as they have collisions with other particles or surfaces. Multiple species, internal energy modes (rotation and vibration), chemistry, and ion transport are modelled. A new trace species methodology for collisions and chemistry is used to obtain statistics for small species concentrations. Gas phase chemistry is modelled using steric factors derived from Arrhenius reaction rates. Surface chemistry is modelled with surface reaction probabilities. The electron number density is either a fixed external generated field or determined using a local charge neutrality assumption. Ion chemistry is modelled with electron impact chemistry rates and charge exchange reactions. Coulomb collision cross-sections are used instead of Variable Hard Sphere values for ion-ion interactions. The electrostatic fields can either be externally input or internally generated using a Langmuir-Tonks model. The Icarus software package includes the grid generation, parallel processor decomposition, postprocessing, and restart software. The commercial graphics package, Tecplot, is used for graphics display. The majority of the software packages are written in standard Fortran.
Xyce(™) Parallel Electronic Simulator
2013-10-03
The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models including several age and radiation aware devices. It supports a variety of computing platforms (both serial and parallel) computers. Lastly, it uses a variety of modern solution algorithms dynamic parallel load-balancing and iterative solvers.! ! Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits.! ! Kirchoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.
Boulant, Nicolas; Wu, Xiaoping; Adriany, Gregor; Schmitter, Sebastian; Uğurbil, Kamil; Van de Moortele, Pierre-François
2016-01-01
A method using parallel transmission to mitigate B1+ inhomogeneity while explicitly constraining the temperature rise is reported and compared with a more traditional SAR-constrained pulse design. Finite difference time domain simulations are performed on a numerical human head model and for a 16-channel coil at 10.5 Tesla. Based on a set of presimulations, a virtual observation point compression model for the temperature rise is derived. This compact representation is then used in a nonlinear programming algorithm for pulse design under explicit temperature rise constraints. In the example of a time-of-flight sequence, radiofrequency pulse performance in some cases is increased by a factor of two compared with SAR-constrained pulses, while temperature rise is directly and efficiently controlled. Pulse performance can be gained by relaxing the SAR constraints, but at the expense of a loss of direct control on temperature. Given the importance of accurate safety control at ultrahigh field and the lack of direct correspondence between SAR and temperature, this work motivates the need for thorough thermal studies in normal in vivo conditions. The tools presented here will possibly contribute to safer and more efficient MR exams. © 2015 Wiley Periodicals, Inc.
Parallel Computing for Brain Simulation.
Pastur-Romay, L A; Porto-Pazos, A B; Cedron, F; Pazos, A
2017-01-01
The human brain is the most complex system in the known universe, it is therefore one of the greatest mysteries. It provides human beings with extraordinary abilities. However, until now it has not been understood yet how and why most of these abilities are produced. For decades, researchers have been trying to make computers reproduce these abilities, focusing on both understanding the nervous system and, on processing data in a more efficient way than before. Their aim is to make computers process information similarly to the brain. Important technological developments and vast multidisciplinary projects have allowed creating the first simulation with a number of neurons similar to that of a human brain. This paper presents an up-to-date review about the main research projects that are trying to simulate and/or emulate the human brain. They employ different types of computational models using parallel computing: digital models, analog models and hybrid models. This review includes the current applications of these works, as well as future trends. It is focused on various works that look for advanced progress in Neuroscience and still others which seek new discoveries in Computer Science (neuromorphic hardware, machine learning techniques). Their most outstanding characteristics are summarized and the latest advances and future plans are presented. In addition, this review points out the importance of considering not only neurons: Computational models of the brain should also include glial cells, given the proven importance of astrocytes in information processing. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Simulation Exploration through Immersive Parallel Planes: Preprint
Brunhart-Lupo, Nicholas; Bush, Brian W.; Gruchalla, Kenny; Smith, Steve
2016-03-01
We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinate's mapping the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest: a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used to both explore existing data as well as to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
Parallelization and automatic data distribution for nuclear reactor simulations
Liebrock, L.M.
1997-07-01
Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.
Parallel methods for the flight simulation model
Xiong, Wei Zhong; Swietlik, C.
1994-06-01
The Advanced Computer Applications Center (ACAC) has been involved in evaluating advanced parallel architecture computers and the applicability of these machines to computer simulation models. The advanced systems investigated include parallel machines with shared. memory and distributed architectures consisting of an eight processor Alliant FX/8, a twenty four processor sor Sequent Symmetry, Cray XMP, IBM RISC 6000 model 550, and the Intel Touchstone eight processor Gamma and 512 processor Delta machines. Since parallelizing a truly efficient application program for the parallel machine is a difficult task, the implementation for these machines in a realistic setting has been largely overlooked. The ACAC has developed considerable expertise in optimizing and parallelizing application models on a collection of advanced multiprocessor systems. One of aspect of such an application model is the Flight Simulation Model, which used a set of differential equations to describe the flight characteristics of a launched missile by means of a trajectory. The Flight Simulation Model was written in the FORTRAN language with approximately 29,000 lines of source code. Depending on the number of trajectories, the computation can require several hours to full day of CPU time on DEC/VAX 8650 system. There is an impetus to reduce the execution time and utilize the advanced parallel architecture computing environment available. ACAC researchers developed a parallel method that allows the Flight Simulation Model to be able to run in parallel on the multiprocessor system. For the benchmark data tested, the parallel Flight Simulation Model implemented on the Alliant FX/8 has achieved nearly linear speedup. In this paper, we describe a parallel method for the Flight Simulation Model. We believe the method presented in this paper provides a general concept for the design of parallel applications. This concept, in most cases, can be adapted to many other sequential application programs.
Structured building model reduction toward parallel simulation
Dobbs, Justin R.; Hencey, Brondon M.
2013-08-26
Building energy model reduction exchanges accuracy for improved simulation speed by reducing the number of dynamical equations. Parallel computing aims to improve simulation times without loss of accuracy but is poorly utilized by contemporary simulators and is inherently limited by inter-processor communication. This paper bridges these disparate techniques to implement efficient parallel building thermal simulation. We begin with a survey of three structured reduction approaches that compares their performance to a leading unstructured method. We then use structured model reduction to find thermal clusters in the building energy model and allocate processing resources. Experimental results demonstrate faster simulation and low error without any interprocessor communication.
Program For Parallel Discrete-Event Simulation
NASA Technical Reports Server (NTRS)
Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.
1991-01-01
User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.
Data parallel sorting for particle simulation
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1992-01-01
Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O (N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimun performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
Data parallel sorting for particle simulation
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1992-01-01
Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O (N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimun performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
Xyce parallel electronic simulator : users' guide.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique
Parallelization of a Compositional Reservoir Simulator
NASA Astrophysics Data System (ADS)
Reme, Hilde; Åge Øye, Geir; Espedal, Magne S.; Fladmark, Gunnar E.
A finite volume dicretization has been used to solve compositional flow in porous media. Secondary migration in fractured rocks has been the main motivation for the work. Multipoint flux approximation has been implemented and adaptive local grid refinement, based on domain decomposition, is used at fractures and faults. The parallelization method, which is described in this paper, strongly promotes code reuse and gives a very high level of parallelization despite low implementation costs. The programming framework is also portable to other platforms or other applications. We have presented computer experiments to examine the parallel efficiency of the implemented parallel simulator with respect to scalability and speedup. Keywords: porous media, multipoint flux approximation, domain decomposition, parallelization
Parallel processing of a rotating shaft simulation
NASA Technical Reports Server (NTRS)
Arpasi, Dale J.
1989-01-01
A FORTRAN program describing the vibration modes of a rotor-bearing system is analyzed for parellelism in this simulation using a Pascal-like structured language. Potential vector operations are also identified. A critical path through the simulation is identified and used in conjunction with somewhat fictitious processor characteristics to determine the time to calculate the problem on a parallel processing system having those characteristics. A parallel processing overhead time is included as a parameter for proper evaluation of the gain over serial calculation. The serial calculation time is determined for the same fictitious system. An improvement of up to 640 percent is possible depending on the value of the overhead time. Based on the analysis, certain conclusions are drawn pertaining to the development needs of parallel processing technology, and to the specification of parallel processing systems to meet computational needs.
Visualization and Tracking of Parallel CFD Simulations
NASA Technical Reports Server (NTRS)
Vaziri, Arsi; Kremenetsky, Mark
1995-01-01
We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS) runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, are handled by CM/AVS. Partitioning of the visualization task, between CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate (yields) store (yields) visualize' post-processing approach.
Particle simulations on massively parallel machines
Plimpton, S.
1993-06-01
A wide variety of physical phenomena can be modeled with particles. Such simulations pose interesting challenges for parallel machines since the computations are often difficult to load-balance and can require irregular communication. We discuss the size of problems that can be simulated today, obstacles to higher performance, and areas where algorithmic improvements are need. The relevant issues are illustrated with two prototypical simulations: a Monte Carlo model of low-density fluid flow and molecular dynamics.
The Xyce Parallel Electronic Simulator - An Overview
HUTCHINSON,SCOTT A.; KEITER,ERIC R.; HOEKSTRA,ROBERT J.; WATTS,HERMAN A.; WATERS,ARLON J.; SCHELLS,REGINA L.; WIX,STEVEN D.
2000-12-08
The Xyce{trademark} Parallel Electronic Simulator has been written to support the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on providing the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). In addition, they are providing improved performance for numerical kernels using state-of-the-art algorithms, support for modeling circuit phenomena at a variety of abstraction levels and using object-oriented and modern coding-practices that ensure the code will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows.
Xyce parallel electronic simulator release notes.
Keiter, Eric R; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: Hardware and software requirements New features and enhancements Any defects fixed since the last release Current known defects and defect workarounds For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.
Compositional reservoir simulation in parallel supercomputing environments
Briens, F.J.L. ); Wu, C.H. ); Gazdag, J.; Wang, H.H. )
1991-09-01
A large-scale compositional reservoir simulation ({gt}1,000 cells) is not often run on a conventional mainframe computer owing to excessive turnaround times. This paper presents programming and computational techniques that fully exploit the capabilities of parallel supercomputers for a large-scale compositional simulation. A novel algorithm called sequential staging of tasks (SST) that can take full advantage of parallel-vector processing to speed up the solution of a large linear system is introduced. The effectiveness of SST is illustrated with results from computer experiments conducted on an IBM 3090-600E.
Unsteady flow simulation on a parallel computer
NASA Astrophysics Data System (ADS)
Faden, M.; Pokorny, S.; Engel, K.
For the simulation of the flow through compressor stages, an interactive flow simulation system is set up on an MIMD-type parallel computer. An explicit scheme is used in order to resolve the time-dependent interaction between the blades. The 2D Navier-Stokes equations are transformed into their general moving coordinates. The parallelization of the solver is based on the idea of domain decomposition. Results are presented for a problem of fixed size (4096 grid nodes for the Hakkinen case).
On extending parallelism to serial simulators
NASA Technical Reports Server (NTRS)
Nicol, David; Heidelberger, Philip
1994-01-01
This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops submodels using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permit automatically synchronized communication between submodels; however, the automation requires that any such communication must take a nonzero amount off simulation time. Within this modeling paradigm, a variety of conservative synchronization protocols can transparently support conservative execution of submodels on potentially different processors. A specific implementation of this approach, U.P.S. (Utilitarian Parallel Simulator), is described, along with performance results on the Intel Paragon.
Accelerating Conservative Parallel Simulation of VHDL Circuits
1994-12-01
1993:1). Reasearchers at the Air Force Institute of Technology (AFIT) have been investigating conservative parallel simulation of VHDL circuits for...ered with regard to size, time, and memory consumption. 2A behavior is an executable VHDL process representing a logic gate, source signal, or other...parameters significant to simulation runtime. Candidate parameters include: * number of behaviors * number of interconnections * number of dependencies
Parallel Performance of a Combustion Chemistry Simulation
Skinner, Gregg; Eigenmann, Rudolf
1995-01-01
We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.
Parallel Simulation of Unsteady Turbulent Flames
NASA Technical Reports Server (NTRS)
Menon, Suresh
1996-01-01
Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, the high cost and their limited availability, makes practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable.This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. Recently, a new model for turbulent combustion was developed, in which the combustion is modeled, within the subgrid (small-scales) using a methodology that simulates the mixing and the molecular transport and the chemical kinetics
Parallelization of Rocket Engine Simulator Software (PRESS)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet
1997-01-01
Parallelization of Rocket Engine System Software (PRESS) project is part of a collaborative effort with Southern University at Baton Rouge (SUBR), University of West Florida (UWF), and Jackson State University (JSU). The second-year funding, which supports two graduate students enrolled in our new Master's program in Computer Science at Hampton University and the principal investigator, have been obtained for the period from October 19, 1996 through October 18, 1997. The key part of the interim report was new directions for the second year funding. This came about from discussions during Rocket Engine Numeric Simulator (RENS) project meeting in Pensacola on January 17-18, 1997. At that time, a software agreement between Hampton University and NASA Lewis Research Center had already been concluded. That agreement concerns off-NASA-site experimentation with PUMPDES/TURBDES software. Before this agreement, during the first year of the project, another large-scale FORTRAN-based software, Two-Dimensional Kinetics (TDK), was being used for translation to an object-oriented language and parallelization experiments. However, that package proved to be too complex and lacking sufficient documentation for effective translation effort to the object-oriented C + + source code. The focus, this time with better documented and more manageable PUMPDES/TURBDES package, was still on translation to C + + with design improvements. At the RENS Meeting, however, the new impetus for the RENS projects in general, and PRESS in particular, has shifted in two important ways. One was closer alignment with the work on Numerical Propulsion System Simulator (NPSS) through cooperation and collaboration with LERC ACLU organization. The other was to see whether and how NASA's various rocket design software can be run over local and intra nets without any radical efforts for redesign and translation into object-oriented source code. There were also suggestions that the Fortran based code be
Bolis, A; Cantwell, C D; Moxey, D; Serson, D; Sherwin, S J
2016-09-01
A hybrid parallelisation technique for distributed memory systems is investigated for a coupled Fourier-spectral/hp element discretisation of domains characterised by geometric homogeneity in one or more directions. The performance of the approach is mathematically modelled in terms of operation count and communication costs for identifying the most efficient parameter choices. The model is calibrated to target a specific hardware platform after which it is shown to accurately predict the performance in the hybrid regime. The method is applied to modelling turbulent flow using the incompressible Navier-Stokes equations in an axisymmetric pipe and square channel. The hybrid method extends the practical limitations of the discretisation, allowing greater parallelism and reduced wall times. Performance is shown to continue to scale when both parallelisation strategies are used.
NASA Astrophysics Data System (ADS)
Bolis, A.; Cantwell, C. D.; Moxey, D.; Serson, D.; Sherwin, S. J.
2016-09-01
A hybrid parallelisation technique for distributed memory systems is investigated for a coupled Fourier-spectral/hp element discretisation of domains characterised by geometric homogeneity in one or more directions. The performance of the approach is mathematically modelled in terms of operation count and communication costs for identifying the most efficient parameter choices. The model is calibrated to target a specific hardware platform after which it is shown to accurately predict the performance in the hybrid regime. The method is applied to modelling turbulent flow using the incompressible Navier-Stokes equations in an axisymmetric pipe and square channel. The hybrid method extends the practical limitations of the discretisation, allowing greater parallelism and reduced wall times. Performance is shown to continue to scale when both parallelisation strategies are used.
Parallel algorithm strategies for circuit simulation.
Thornquist, Heidi K.; Schiek, Richard Louis; Keiter, Eric Richard
2010-01-01
Circuit simulation tools (e.g., SPICE) have become invaluable in the development and design of electronic circuits. However, they have been pushed to their performance limits in addressing circuit design challenges that come from the technology drivers of smaller feature scales and higher integration. Improving the performance of circuit simulation tools through exploiting new opportunities in widely-available multi-processor architectures is a logical next step. Unfortunately, not all traditional simulation applications are inherently parallel, and quickly adapting mature application codes (even codes designed to parallel applications) to new parallel paradigms can be prohibitively difficult. In general, performance is influenced by many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, the use of mini-applications small self-contained proxies for real applications is an excellent approach for rapidly exploring the parameter space of all these choices. In this report we present a multi-core performance study of Xyce, a transistor-level circuit simulation tool, and describe the future development of a mini-application for circuit simulation.
Inflated speedups in parallel simulations via malloc()
NASA Technical Reports Server (NTRS)
Nicol, David M.
1990-01-01
Discrete-event simulation programs make heavy use of dynamic memory allocation in order to support simulation's very dynamic space requirements. When programming in C one is likely to use the malloc() routine. However, a parallel simulation which uses the standard Unix System V malloc() implementation may achieve an overly optimistic speedup, possibly superlinear. An alternate implementation provided on some (but not all systems) can avoid the speedup anomaly, but at the price of significantly reduced available free space. This is especially severe on most parallel architectures, which tend not to support virtual memory. It is shown how a simply implemented user-constructed interface to malloc() can both avoid artificially inflated speedups, and make efficient use of the dynamic memory space. The interface simply catches blocks on the basis of their size. The problem is demonstrated empirically, and the effectiveness of the solution is shown both empirically and analytically.
Parallel discrete event simulation using shared memory
NASA Technical Reports Server (NTRS)
Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.
1988-01-01
With traditional event-list techniques, evaluating a detailed discrete-event simulation-model can often require hours or even days of computation time. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the numbers of processors. A set of shared-memory experiments, using the Chandy-Misra distributed-simulation algorithm, to simulate networks of queues is presented. Parameters of the study include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential-simulation of most queueing network models.
Xyce parallel electronic simulator : reference guide.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.
Massively-Parallel Dislocation Dynamics Simulations
Cai, W; Bulatov, V V; Pierce, T G; Hiratani, M; Rhee, M; Bartelt, M; Tang, M
2003-06-18
Prediction of the plastic strength of single crystals based on the collective dynamics of dislocations has been a challenge for computational materials science for a number of years. The difficulty lies in the inability of the existing dislocation dynamics (DD) codes to handle a sufficiently large number of dislocation lines, in order to be statistically representative and to reproduce experimentally observed microstructures. A new massively-parallel DD code is developed that is capable of modeling million-dislocation systems by employing thousands of processors. We discuss the general aspects of this code that make such large scale simulations possible, as well as a few initial simulation results.
Aerodynamic simulation on massively parallel systems
NASA Technical Reports Server (NTRS)
Haeuser, Jochem; Simon, Horst D.
1992-01-01
This paper briefly addresses the computational requirements for the analysis of complete configurations of aircraft and spacecraft currently under design to be used for advanced transportation in commercial applications as well as in space flight. The discussion clearly shows that massively parallel systems are the only alternative which is both cost effective and on the other hand can provide the necessary TeraFlops, needed to satisfy the narrow design margins of modern vehicles. It is assumed that the solution of the governing physical equations, i.e., the Navier-Stokes equations which may be complemented by chemistry and turbulence models, is done on multiblock grids. This technique is situated between the fully structured approach of classical boundary fitted grids and the fully unstructured tetrahedra grids. A fully structured grid best represents the flow physics, while the unstructured grid gives best geometrical flexibility. The multiblock grid employed is structured within a block, but completely unstructured on the block level. While a completely unstructured grid is not straightforward to parallelize, the above mentioned multiblock grid is inherently parallel, in particular for multiple instruction multiple datastream (MIMD) machines. In this paper guidelines are provided for setting up or modifying an existing sequential code so that a direct parallelization on a massively parallel system is possible. Results are presented for three parallel systems, namely the Intel hypercube, the Ncube hypercube, and the FPS 500 system. Some preliminary results for an 8K CM2 machine will also be mentioned. The code run is the two dimensional grid generation module of Grid, which is a general two dimensional and three dimensional grid generation code for complex geometries. A system of nonlinear Poisson equations is solved. This code is also a good testcase for complex fluid dynamics codes, since the same datastructures are used. All systems provided good speedups, but
Aerodynamic simulation on massively parallel systems
NASA Technical Reports Server (NTRS)
Haeuser, Jochem; Simon, Horst D.
1992-01-01
This paper briefly addresses the computational requirements for the analysis of complete configurations of aircraft and spacecraft currently under design to be used for advanced transportation in commercial applications as well as in space flight. The discussion clearly shows that massively parallel systems are the only alternative which is both cost effective and on the other hand can provide the necessary TeraFlops, needed to satisfy the narrow design margins of modern vehicles. It is assumed that the solution of the governing physical equations, i.e., the Navier-Stokes equations which may be complemented by chemistry and turbulence models, is done on multiblock grids. This technique is situated between the fully structured approach of classical boundary fitted grids and the fully unstructured tetrahedra grids. A fully structured grid best represents the flow physics, while the unstructured grid gives best geometrical flexibility. The multiblock grid employed is structured within a block, but completely unstructured on the block level. While a completely unstructured grid is not straightforward to parallelize, the above mentioned multiblock grid is inherently parallel, in particular for multiple instruction multiple datastream (MIMD) machines. In this paper guidelines are provided for setting up or modifying an existing sequential code so that a direct parallelization on a massively parallel system is possible. Results are presented for three parallel systems, namely the Intel hypercube, the Ncube hypercube, and the FPS 500 system. Some preliminary results for an 8K CM2 machine will also be mentioned. The code run is the two dimensional grid generation module of Grid, which is a general two dimensional and three dimensional grid generation code for complex geometries. A system of nonlinear Poisson equations is solved. This code is also a good testcase for complex fluid dynamics codes, since the same datastructures are used. All systems provided good speedups, but
Parallel Strategies for Crash and Impact Simulations
Attaway, S.; Brown, K.; Hendrickson, B.; Plimpton, S.
1998-12-07
We describe a general strategy we have found effective for parallelizing solid mechanics simula- tions. Such simulations often have several computationally intensive parts, including finite element integration, detection of material contacts, and particle interaction if smoothed particle hydrody- namics is used to model highly deforming materials. The need to balance all of these computations simultaneously is a difficult challenge that has kept many commercial and government codes from being used effectively on parallel supercomputers with hundreds or thousands of processors. Our strategy is to load-balance each of the significant computations independently with whatever bal- ancing technique is most appropriate. The chief benefit is that each computation can be scalably paraIlelized. The drawback is the data exchange between processors and extra coding that must be written to maintain multiple decompositions in a single code. We discuss these trade-offs and give performance results showing this strategy has led to a parallel implementation of a widely-used solid mechanics code that can now be run efficiently on thousands of processors of the Pentium-based Sandia/Intel TFLOPS machine. We illustrate with several examples the kinds of high-resolution, million-element models that can now be simulated routinely. We also look to the future and dis- cuss what possibilities this new capabUity promises, as well as the new set of challenges it poses in material models, computational techniques, and computing infrastructure.
Parallelization Strategies for Large Particle Simulations in Astrophysics
NASA Astrophysics Data System (ADS)
Pattabiraman, Bharath
The modeling of collisional N-body stellar systems is a topic of great current interest in several branches of astrophysics and cosmology. These systems are dominated by the physics of relaxation, the collective effect of many weak, random gravitational encounters between stars. They connect directly to our understanding of star clusters, and to the formation of exotic objects such as X-ray binaries, pulsars, and massive black holes. As a prototypical multi-physics, multi-scale problem, the numerical simulation of such systems is computationally intensive, and can only be achieved through high-performance computing. The goal of this thesis is to present parallelization and optimization strategies that can be used to develop efficient computational tools for simulating collisional N-body systems. This leads to major advances: 1) From an astrophysics perspective, these tools enable the study of new physical regimes out of reach by previous simulations. They also lead to much more complete parameter space exploration, allowing direct comparison of numerical results to observational data. 2) On the high-performance computing front, efficient parallelization of a multi-component application requires the meticulous redesign of the various components, as well as innovative parallelization techniques. Many of the challenges faced in this process lie at the very heart of high-performance computing research, including achieving optimal load balancing, maximizing utilization of computational resources, and making effective use of different parallel platforms. For modeling collisional N-body systems, a Monte Carlo approach provides ideal balance between speed and accuracy, as opposed to the more accurate but less scalable direct N-body method. We describe the development of a new version of the Cluster Monte Carlo (CMC) code capable of simulating systems with a realistic number of stars, while accounting for all important physical processes. This efficient and scalable
NASA Astrophysics Data System (ADS)
Foster, Justin; Miller, Richard
2011-11-01
Direct Numerical Simulations (DNS) are conducted for temporally developing reacting H2/O2 shear layers at an ambient pressure of 100atm. The compressible form of the governing equations are coupled with the Peng Robinson real gas equation of state and are solved using eighth order central finite differences and fourth order Runge Kutta time integration with resolutions up to ~3/4 billion grid points. The formulation includes a detailed pressure dependent kinetics mechanism having 8 species and 19 steps, detailed property models, and generalized forms of the multicomponent heat and mass diffusion vectors derived from nonequilibrium thermodynamics and fluctuation theory. The DNS is performed over a range of Reynolds numbers up to 4500 based on the free stream velocity difference and initial vorticity thickness. The results are then analyzed in an a priori manner to illustrate the role of the subgrid mass flux vector within the filtered form of the governing equations relevant to Large Eddy Simulations. The subgrid mass flux vector is found to be a significant term; particularly within localized regions of the flame. Research supported by NSF Grant CBET-0965624 and Clemson University's Palmetto Cluster.
Parallel beam dynamics simulation of linear accelerators
Qiang, Ji; Ryne, Robert D.
2002-01-31
In this paper we describe parallel particle-in-cell methods for the large scale simulation of beam dynamics in linear accelerators. These techniques have been implemented in the IMPACT (Integrated Map and Particle Accelerator Tracking) code. IMPACT is being used to study the behavior of intense charged particle beams and as a tool for the design of next-generation linear accelerators. As examples, we present applications of the code to the study of emittance exchange in high intensity beams and to the study of beam transport in a proposed accelerator for the development of accelerator-driven waste transmutation technologies.
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
Empirical study of parallel LRU simulation algorithms
NASA Technical Reports Server (NTRS)
Carr, Eric; Nicol, David M.
1994-01-01
This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithm are more complex, but have costs that are independent on the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithm implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
A note on parallel efficiency of fire simulation on cluster
NASA Astrophysics Data System (ADS)
Valasek, L.; Glasa, J.
2016-08-01
Current HPC clusters are capable to reduce execution time of parallelized tasks significantly. The paper discusses the use of two selected strategies of cluster computational resources allocation and their impact on parallel efficiency of fire simulation. Simulation of a simple corridor fire scenario by Fire Dynamics Simulator parallelized by the MPI programming model is tested on the HPC cluster at the Institute of Informatics of Slovak Academy of Sciences in Bratislava (Slovakia). The tests confirm that parallelization has a great potential to reduce execution times achieving promising values of parallel efficiency of the simulation, however, the results also show that the use of increasing numbers of computational meshes resulting in increasing numbers of used computational cores does not necessarily decrease the execution time nor the parallel efficiency of simulation. The results obtained indicate that the simulation achieves different values of the execution time and the parallel efficiency in regard of the used strategy for cluster computational resources allocation.
Parallel Proximity Detection for Computer Simulations
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)
1998-01-01
The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
Parallel Proximity Detection for Computer Simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)
1997-01-01
The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are includes by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
A polymorphic reconfigurable emulator for parallel simulation
NASA Technical Reports Server (NTRS)
Parrish, E. A., Jr.; Mcvey, E. S.; Cook, G.
1980-01-01
Microprocessor and arithmetic support chip technology was applied to the design of a reconfigurable emulator for real time flight simulation. The system developed consists of master control system to perform all man machine interactions and to configure the hardware to emulate a given aircraft, and numerous slave compute modules (SCM) which comprise the parallel computational units. It is shown that all parts of the state equations can be worked on simultaneously but that the algebraic equations cannot (unless they are slowly varying). Attempts to obtain algorithms that will allow parellel updates are reported. The word length and step size to be used in the SCM's is determined and the architecture of the hardware and software is described.
Adaptive domain decomposition for Monte Carlo simulations on parallel processors
NASA Technical Reports Server (NTRS)
Wilmoth, Richard G.
1991-01-01
A method is described for performing direct simulation Monte Carlo (DSMC) calculations on parallel processors using adaptive domain decomposition to distribute the computational work load. The method has been implemented on a commercially available hypercube and benchmark results are presented which show the performance of the method relative to current supercomputers. The problems studied were simulations of equilibrium conditions in a closed, stationary box, a two-dimensional vortex flow, and the hypersonic, rarefied flow in a two-dimensional channel. For these problems, the parallel DSMC method ran 5 to 13 times faster than on a single processor of a Cray-2. The adaptive decomposition method worked well in uniformly distributing the computational work over an arbitrary number of processors and reduced the average computational time by over a factor of two in certain cases.
Adaptive domain decomposition for Monte Carlo simulations on parallel processors
NASA Technical Reports Server (NTRS)
Wilmoth, Richard G.
1990-01-01
A method is described for performing direct simulation Monte Carlo (DSMC) calculations on parallel processors using adaptive domain decomposition to distribute the computational work load. The method has been implemented on a commercially available hypercube and benchmark results are presented which show the performance of the method relative to current supercomputers. The problems studied were simulations of equilibrium conditions in a closed, stationary box, a two-dimensional vortex flow, and the hypersonic, rarefield flow in a two-dimensional channel. For these problems, the parallel DSMC method ran 5 to 13 times faster than on a single processor of a Cray-2. The adaptive decomposition method worked well in uniformly distributing the computational work over an arbitrary number of processors and reduced the average computational time by over a factor of two in certain cases.
Parallel multiscale simulations of a brain aneurysm.
Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver εκαr . The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers ( εκαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future
Parallel multiscale simulations of a brain aneurysm
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in
Parallel multiscale simulations of a brain aneurysm
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2012-01-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver εκ αr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers ( εκ αr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future
Parallel multiscale simulations of a brain aneurysm
NASA Astrophysics Data System (ADS)
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NɛκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NɛκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future
A scalable parallel black oil simulator on distributed memory parallel computers
NASA Astrophysics Data System (ADS)
Wang, Kun; Liu, Hui; Chen, Zhangxin
2015-11-01
This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
Particle simulation of plasmas on the massively parallel processor
NASA Technical Reports Server (NTRS)
Gledhill, I. M. A.; Storey, L. R. O.
1987-01-01
Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.
Long-range interactions and parallel scalability in molecular simulations
NASA Astrophysics Data System (ADS)
Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko
2007-01-01
Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single processor and parallel performance up to 8 nodes—we have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e. communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.
Conservative parallel discrete-event simulation: Principles and practice
Wagner, D.B.
1989-01-01
Simulation is one of the most important computational technologies in use today. Unfortunately, its importance is matched by its appetite for computational resources. These factors make parallel simulation a topic with far-reaching consequences in all fields of science and engineering. This thesis is concerned with one approach to this problem, conservative loose event-driven parallel simulation, the objective of which is to apply multiple processors to a single simulation run in an effort to reduce its time-to-completion. There are several factors that make parallel simulation difficult. First, the fact that a physical system has a high degree of concurrency does not necessarily mean that a simulation of that system will benefit from parallelism. The author introduces two simple analytic techniques that can be used to bound from above the speedup potential of parallel simulations. Second, a parallel simulation requires synchronization to ensure that the results obtained are equivalent to those of a sequential simulation of the problem. He argues that the availability of inexpensive, medium-scale and shared-memory multiprocessors mandates a re-examination of synchronization algorithms for conservative loose event-driven parallel simulation. His investigations lead to a novel synchronization technique called lazy blocking avoidance. His measurements show that lazy blocking avoidance performs at least as well as, and often substantially better than, two other synchronization methods that have been widely discussed in the literature (deadlock detection and recovery), and eager blocking avoidance.
Partitioning strategies for parallel KIVA-4 engine simulations
Torres, D J; Kong, S C
2008-01-01
Parallel KIVA-4 is described and simulated in four different engine geometries. The Message Passing-Interface (MPl) was used to parallelize KIVA-4. Par itioning strategies ar accesed in light of the fact that cells can become deactivated and activated during the course of an engine simulation which will affect the load balance between processors.
Parallel/distributed direct method for solving linear systems
NASA Technical Reports Server (NTRS)
Lin, Avi
1990-01-01
A new family of parallel schemes for directly solving linear systems is presented and analyzed. It is shown that these schemes exhibit a near optimal performance and enjoy several important features: (1) For large enough linear systems, the design of the appropriate paralleled algorithm is insensitive to the number of processors as its performance grows monotonically with them; (2) It is especially good for large matrices, with dimensions large relative to the number of processors in the system; (3) It can be used in both distributed parallel computing environments and tightly coupled parallel computing systems; and (4) This set of algorithms can be mapped onto any parallel architecture without any major programming difficulties or algorithmical changes.
Massively parallel algorithms for trace-driven cache simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.
1991-01-01
Trace driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t(exp th) instant, reference x sub t is hashed into a set of cache locations, the contents of which are then compared with x sub t. If at the t sup th instant x sub t is not present in the cache, then it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x sub t present for the (t+1) sup st instant. The problem of parallel simulation of a subtrace of N references directed to a C line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which regradless of the set size C runs in time O(log N) using N processors on the exclusive read, exclusive write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference based line replacement policies are considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that on any trace of length N directed to a C line set runs in the O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
Parallel methods for dynamic simulation of multiple manipulator systems
NASA Technical Reports Server (NTRS)
Mcmillan, Scott; Sadayappan, P.; Orin, David E.
1993-01-01
In this paper, efficient dynamic simulation algorithms for a system of m manipulators, cooperating to manipulate a large load, are developed; their performance, using two possible forms of parallelism on a general-purpose parallel computer, is investigated. One form, temporal parallelism, is obtained with the use of parallel numerical integration methods. A speedup of 3.78 on four processors of CRAY Y-MP8 was achieved with a parallel four-point block predictor-corrector method for the simulation of a four manipulator system. These multi-point methods suffer from reduced accuracy, and when comparing these runs with a serial integration method, the speedup can be as low as 1.83 for simulations with the same accuracy. To regain the performance lost due to accuracy problems, a second form of parallelism is employed. Spatial parallelism allows most of the dynamics of each manipulator chain to be computed simultaneously. Used exclusively in the four processor case, this form of parallelism in conjunction with a serial integration method results in a speedup of 3.1 on four processors over the best serial method. In cases where there are either more processors available or fewer chains in the system, the multi-point parallel integration methods are still advantageous despite the reduced accuracy because both forms of parallelism can then combine to generate more parallel tasks and achieve greater effective speedups. This paper also includes results for these cases.
High Performance Parallel Methods for Space Weather Simulations
NASA Technical Reports Server (NTRS)
Hunter, Paul (Technical Monitor); Gombosi, Tamas I.
2003-01-01
This is the final report of our NASA AISRP grant entitled 'High Performance Parallel Methods for Space Weather Simulations'. The main thrust of the proposal was to achieve significant progress towards new high-performance methods which would greatly accelerate global MHD simulations and eventually make it possible to develop first-principles based space weather simulations which run much faster than real time. We are pleased to report that with the help of this award we made major progress in this direction and developed the first parallel implicit global MHD code with adaptive mesh refinement. The main limitation of all earlier global space physics MHD codes was the explicit time stepping algorithm. Explicit time steps are limited by the Courant-Friedrichs-Lewy (CFL) condition, which essentially ensures that no information travels more than a cell size during a time step. This condition represents a non-linear penalty for highly resolved calculations, since finer grid resolution (and consequently smaller computational cells) not only results in more computational cells, but also in smaller time steps.
Parallelization of Rocket Engine Simulator Software (PRESS)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet
1998-01-01
We have outlined our work in the last half of the funding period. We have shown how a demo package for RESSAP using MPI can be done. However, we also mentioned the difficulties with the UNIX platform. We have reiterated some of the suggestions made during the presentation of the progress of the at Fourth Annual HBCU Conference. Although we have discussed, in some detail, how TURBDES/PUMPDES software can be run in parallel using MPI, at present, we are unable to experiment any further with either MPI or PVM. Due to X windows not being implemented, we are also not able to experiment further with XPVM, which it will be recalled, has a nice GUI interface. There are also some concerns, on our part, about MPI being an appropriate tool. The best thing about MPr is that it is public domain. Although and plenty of documentation exists for the intricacies of using MPI, little information is available on its actual implementations. Other than very typical, somewhat contrived examples, such as Jacobi algorithm for solving Laplace's equation, there are few examples which can readily be applied to real situations, such as in our case. In effect, the review of literature on both MPI and PVM, and there is a lot, indicate something similar to the enormous effort which was spent on LISP and LISP-like languages as tools for artificial intelligence research. During the development of a book on programming languages [12], when we searched the literature for very simple examples like taking averages, reading and writing records, multiplying matrices, etc., we could hardly find a any! Yet, so much was said and done on that topic in academic circles. It appears that we faced the same problem with MPI, where despite significant documentation, we could not find even a simple example which supports course-grain parallelism involving only a few processes. From the foregoing, it appears that a new direction may be required for more productive research during the extension period (10/19/98 - 10
Parallel Vehicular Traffic Simulation using Reverse Computation-based Optimistic Execution
Yoginath, Srikanth B; Perumalla, Kalyan S
2008-01-01
Vehicular traffic simulations are useful in applications such as emergency management and homeland security planning tools. High speed of traffic simulations translates directly to speed of response and level of resilience in those applications. Here, a parallel traffic simulation approach is presented that is aimed at reducing the time for simulating emergency vehicular traffic scenarios. Three unique aspects of this effort are: (1) exploration of optimistic simulation applied to vehicular traffic simulation (2) addressing reverse computation challenges specific to optimistic vehicular traffic simulation (3) achieving absolute (as opposed to self-relative) speedup with a sequential speed equal to that of a fast, de facto standard sequential simulator for emergency traffic. The design and development of the parallel simulation system is presented, along with a performance study that demonstrates excellent sequential performance as well as parallel performance.
Ion dynamics at supercritical quasi-parallel shocks: Hybrid simulations
NASA Astrophysics Data System (ADS)
Su, Yanqing; Lu, Quanming; Gao, Xinliang; Huang, Can; Wang, Shui
2012-09-01
By separating the incident ions into directly transmitted, downstream thermalized, and diffuse ions, we perform one-dimensional (1D) hybrid simulations to investigate ion dynamics at a supercritical quasi-parallel shock. In the simulations, the angle between the upstream magnetic field and shock nominal direction is θBn=30°, and the Alfven Mach number is MA˜5.5. The shock exhibits a periodic reformation process. The ion reflection occurs at the beginning of the reformation cycle. Part of the reflected ions is trapped between the old and new shock fronts for an extended time period. These particles eventually form superthermal diffuse ions after they escape to the upstream of the new shock front at the end of the reformation cycle. The other reflected ions may return to the shock immediately or be trapped between the old and new shock fronts for a short time period. When the amplitude of the new shock front exceeds that of the old shock front and the reformation cycle is finished, these ions become thermalized ions in the downstream. No noticeable heating can be found in the directly transmitted ions. The relevance of our simulations to the satellite observations is also discussed in the paper.
The effects of parallel processing architectures on discrete event simulation
NASA Astrophysics Data System (ADS)
Cave, William; Slatt, Edward; Wassmer, Robert E.
2005-05-01
As systems become more complex, particularly those containing embedded decision algorithms, mathematical modeling presents a rigid framework that often impedes representation to a sufficient level of detail. Using discrete event simulation, one can build models that more closely represent physical reality, with actual algorithms incorporated in the simulations. Higher levels of detail increase simulation run time. Hardware designers have succeeded in producing parallel and distributed processor computers with theoretical speeds well into the teraflop range. However, the practical use of these machines on all but some very special problems is extremely limited. The inability to use this power is due to great difficulties encountered when trying to translate real world problems into software that makes effective use of highly parallel machines. This paper addresses the application of parallel processing to simulations of real world systems of varying inherent parallelism. It provides a brief background in modeling and simulation validity and describes a parameter that can be used in discrete event simulation to vary opportunities for parallel processing at the expense of absolute time synchronization and is constrained by validity. It focuses on the effects of model architecture, run-time software architecture, and parallel processor architecture on speed, while providing an environment where modelers can achieve sufficient model accuracy to produce valid simulation results. It describes an approach to simulation development that captures subject area expert knowledge to leverage inherent parallelism in systems in the following ways: * Data structures are separated from instructions to track which instruction sets share what data. This is used to determine independence and thus the potential for concurrent processing at run-time. * Model connectivity (independence) can be inspected visually to determine if the inherent parallelism of a physical system is properly
SKIRT: Hybrid parallelization of radiative transfer simulations
NASA Astrophysics Data System (ADS)
Verstocken, S.; Van De Putte, D.; Camps, P.; Baes, M.
2017-07-01
We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modelling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behaviour of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.
Parallel Monte Carlo simulation of multilattice thin film growth
NASA Astrophysics Data System (ADS)
Shu, J. W.; Lu, Qin; Wong, Wai-on; Huang, Han-chen
2001-07-01
This paper describe a new parallel algorithm for the multi-lattice Monte Carlo atomistic simulator for thin film deposition (ADEPT), implemented on parallel computer using the PVM (Parallel Virtual Machine) message passing library. This parallel algorithm is based on domain decomposition with overlapping and asynchronous communication. Multiple lattices are represented by a single reference lattice through one-to-one mappings, with resulting computational demands being comparable to those in the single-lattice Monte Carlo model. Asynchronous communication and domain overlapping techniques are used to reduce the waiting time and communication time among parallel processors. Results show that the algorithm is highly efficient with large number of processors. The algorithm was implemented on a parallel machine with 50 processors, and it is suitable for parallel Monte Carlo simulation of thin film growth with either a distributed memory parallel computer or a shared memory machine with message passing libraries. In this paper, the significant communication time in parallel MC simulation of thin film growth is effectively reduced by adopting domain decomposition with overlapping between sub-domains and asynchronous communication among processors. The overhead of communication does not increase evidently and speedup shows an ascending tendency when the number of processor increases. A near linear increase in computing speed was achieved with number of processors increases and there is no theoretical limit on the number of processors to be used. The techniques developed in this work are also suitable for the implementation of the Monte Carlo code on other parallel systems.
Series-parallel method of direct solar array regulation
NASA Technical Reports Server (NTRS)
Gooder, S. T.
1976-01-01
A 40 watt experimental solar array was directly regulated by shorting out appropriate combinations of series and parallel segments of a solar array. Regulation switches were employed to control the array at various set-point voltages between 25 and 40 volts. Regulation to within + or - 0.5 volt was obtained over a range of solar array temperatures and illumination levels as an active load was varied from open circuit to maximum available power. A fourfold reduction in regulation switch power dissipation was achieved with series-parallel regulation as compared to the usual series-only switching for direct solar array regulation.
Electromagnetic direct implicit PIC simulation
Langdon, A.B.
1983-03-29
Interesting modelling of intense electron flow has been done with implicit particle-in-cell simulation codes. In this report, the direct implicit PIC simulation approach is applied to simulations that include full electromagnetic fields. The resulting algorithm offers advantages relative to moment implicit electromagnetic algorithms and may help in our quest for robust and simpler implicit codes.
Parallel-Processing Test Bed For Simulation Software
NASA Technical Reports Server (NTRS)
Blech, Richard; Cole, Gary; Townsend, Scott
1996-01-01
Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).
Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors
NASA Technical Reports Server (NTRS)
Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.
1990-01-01
Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.
Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors
NASA Technical Reports Server (NTRS)
Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.
1990-01-01
Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.
Symplectic molecular dynamics simulations on specially designed parallel computers.
Borstnik, Urban; Janezic, Dusanka
2005-01-01
We have developed a computer program for molecular dynamics (MD) simulation that implements the Split Integration Symplectic Method (SISM) and is designed to run on specialized parallel computers. The MD integration is performed by the SISM, which analytically treats high-frequency vibrational motion and thus enables the use of longer simulation time steps. The low-frequency motion is treated numerically on specially designed parallel computers, which decreases the computational time of each simulation time step. The combination of these approaches means that less time is required and fewer steps are needed and so enables fast MD simulations. We study the computational performance of MD simulation of molecular systems on specialized computers and provide a comparison to standard personal computers. The combination of the SISM with two specialized parallel computers is an effective way to increase the speed of MD simulations up to 16-fold over a single PC processor.
Parallel discrete-event simulation of FCFS stochastic queueing networks
NASA Technical Reports Server (NTRS)
Nicol, David M.
1988-01-01
Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments), which has proven itself to be effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction on a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations, give performance data that demonstrates the method's effectiveness under moderate to heavy loads, and discuss performance tradeoffs between the quality of lookahead, and the cost of computing lookahead.
Xyce Parallel Electronic Simulator : users' guide, version 4.1.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-02-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical
Xyce parallel electronic simulator : users' guide. Version 5.1.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-11-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical
Iterative Schemes for Time Parallelization with Application to Reservoir Simulation
Garrido, I; Fladmark, G E; Espedal, M S; Lee, B
2005-04-18
Parallel methods are usually not applied to the time domain because of the inherit sequentialness of time evolution. But for many evolutionary problems, computer simulation can benefit substantially from time parallelization methods. In this paper, they present several such algorithms that actually exploit the sequential nature of time evolution through a predictor-corrector procedure. This sequentialness ensures convergence of a parallel predictor-corrector scheme within a fixed number of iterations. The performance of these novel algorithms, which are derived from the classical alternating Schwarz method, are illustrated through several numerical examples using the reservoir simulator Athena.
A conservative approach to parallelizing the Sharks World simulation
NASA Technical Reports Server (NTRS)
Nicol, David M.; Riffe, Scott E.
1990-01-01
Parallelizing a benchmark problem for parallel simulation, the Sharks World, is described. The described solution is conservative, in the sense that no state information is saved, and no 'rollbacks' occur. The used approach illustrates both the principal advantage and principal disadvantage of conservative parallel simulation. The advantage is that by exploiting lookahead an approach was found that dramatically improves the serial execution time, and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.
Broadband monitoring simulation with massively parallel processors
NASA Astrophysics Data System (ADS)
Trubetskov, Mikhail; Amotchkina, Tatiana; Tikhonravov, Alexander
2011-09-01
Modern efficient optimization techniques, namely needle optimization and gradual evolution, enable one to design optical coatings of any type. Even more, these techniques allow obtaining multiple solutions with close spectral characteristics. It is important, therefore, to develop software tools that can allow one to choose a practically optimal solution from a wide variety of possible theoretical designs. A practically optimal solution provides the highest production yield when optical coating is manufactured. Computational manufacturing is a low-cost tool for choosing a practically optimal solution. The theory of probability predicts that reliable production yield estimations require many hundreds or even thousands of computational manufacturing experiments. As a result reliable estimation of the production yield may require too much computational time. The most time-consuming operation is calculation of the discrepancy function used by a broadband monitoring algorithm. This function is formed by a sum of terms over wavelength grid. These terms can be computed simultaneously in different threads of computations which opens great opportunities for parallelization of computations. Multi-core and multi-processor systems can provide accelerations up to several times. Additional potential for further acceleration of computations is connected with using Graphics Processing Units (GPU). A modern GPU consists of hundreds of massively parallel processors and is capable to perform floating-point operations efficiently.
Running Parallel Discrete Event Simulators on Sierra
Barnes, P. D.; Jefferson, D. R.
2015-12-03
In this proposal we consider porting the ROSS/Charm++ simulator and the discrete event models that run under its control so that they run on the Sierra architecture and make efficient use of the Volta GPUs.
Xyce™ Parallel Electronic Simulator Users' Guide, Version 6.5.
Keiter, Eric R.; Aadithya, Karthik V.; Mei, Ting; Russo, Thomas V.; Schiek, Richard L.; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.
2016-06-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.
Parallel discrete event simulation: A shared memory approach
NASA Technical Reports Server (NTRS)
Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.
1987-01-01
With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to insure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
Traffic simulations on parallel computers using domain decomposition techniques
Hanebutte, U.R.; Tentner, A.M.
1995-12-31
Large scale simulations of Intelligent Transportation Systems (ITS) can only be achieved by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic simulations with the standard simulation package TRAF-NETSIM on a 128 nodes IBM SPx parallel supercomputer as well as on a cluster of SUN workstations. Whilst this particular parallel implementation is based on NETSIM, a microscopic traffic simulation model, the presented strategy is applicable to a broad class of traffic simulations. An outer iteration loop must be introduced in order to converge to a global solution. A performance study that utilizes a scalable test network that consist of square-grids is presented, which addresses the performance penalty introduced by the additional iteration loop.
A CUDA based parallel multi-phase oil reservoir simulator
NASA Astrophysics Data System (ADS)
Zaza, Ayham; Awotunde, Abeeb A.; Fairag, Faisal A.; Al-Mouhamed, Mayez A.
2016-09-01
Forward Reservoir Simulation (FRS) is a challenging process that models fluid flow and mass transfer in porous media to draw conclusions about the behavior of certain flow variables and well responses. Besides the operational cost associated with matrix assembly, FRS repeatedly solves huge and computationally expensive sparse, ill-conditioned and unsymmetrical linear system. Moreover, as the computation for practical reservoir dimensions lasts for long times, speeding up the process by taking advantage of parallel platforms is indispensable. By considering the state of art advances in massively parallel computing and the accompanying parallel architecture, this work aims primarily at developing a CUDA-based parallel simulator for oil reservoir. In addition to the initial reported 33 times speed gain compared to the serial version, running experiments showed that BiCGSTAB is a stable and fast solver which could be incorporated in such simulations instead of the more expensive, storage demanding and usually utilized GMRES.
Parallel Signal Processing and System Simulation using aCe
NASA Technical Reports Server (NTRS)
Dorband, John E.; Aburdene, Maurice F.
2003-01-01
Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C based parallel language (ace C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of ace C and present a signal processing application (FFT).
An approach to real-time simulation using parallel processing
NASA Technical Reports Server (NTRS)
Blech, R. A.; Arpasi, D. J.
1981-01-01
Current applications of real-time simulations to the development of complex aircraft propulsion system controls have demonstrated the need for accurate, portable, and low-cost simulators. This paper presents a preliminary simulator design that uses a parallel computer organization to provide these features. The hardware and software for this prototype simulator are discussed. A detailed discussion of the inter-computer data transfer mechanism is also presented.
Tutorial: Parallel Computing of Simulation Models for Risk Analysis.
Reilly, Allison C; Staid, Andrea; Gao, Michael; Guikema, Seth D
2016-10-01
Simulation models are widely used in risk analysis to study the effects of uncertainties on outcomes of interest in complex problems. Often, these models are computationally complex and time consuming to run. This latter point may be at odds with time-sensitive evaluations or may limit the number of parameters that are considered. In this article, we give an introductory tutorial focused on parallelizing simulation code to better leverage modern computing hardware, enabling risk analysts to better utilize simulation-based methods for quantifying uncertainty in practice. This article is aimed primarily at risk analysts who use simulation methods but do not yet utilize parallelization to decrease the computational burden of these models. The discussion is focused on conceptual aspects of embarrassingly parallel computer code and software considerations. Two complementary examples are shown using the languages MATLAB and R. A brief discussion of hardware considerations is located in the Appendix. © 2016 Society for Risk Analysis.
3-D massively parallel impact simulations using PCTH
Fang, H.E.; Robinson, A.C.
1992-12-31
Simulations of hypervelocity impact problems are performed frequently by government laboratories and contractors for armor/anti-armor applications. These simulations need to deal with shock wave physics phenomena, large material deformation, motion of debris particles and complex geometries. As a result, memory and processing time requirements are large for detailed, three-dimensional calculations. The large massively parallel supercomputing systems of the future will provide the power necessary to greatly reduce simulation times currently required by shared-memory, vector supercomputers. This paper gives an introduction to PCTH, a next-generation shock wave physics code which is being built at Sandia National Laboratories for massively parallel supercomputers, and demonstrates that massively parallel hydrocodes, such as PCTH, can provide highly-detailed, three-dimensional simulations of armor/anti-armor systems.
3-D massively parallel impact simulations using PCTH
Fang, H.E.; Robinson, A.C.
1992-01-01
Simulations of hypervelocity impact problems are performed frequently by government laboratories and contractors for armor/anti-armor applications. These simulations need to deal with shock wave physics phenomena, large material deformation, motion of debris particles and complex geometries. As a result, memory and processing time requirements are large for detailed, three-dimensional calculations. The large massively parallel supercomputing systems of the future will provide the power necessary to greatly reduce simulation times currently required by shared-memory, vector supercomputers. This paper gives an introduction to PCTH, a next-generation shock wave physics code which is being built at Sandia National Laboratories for massively parallel supercomputers, and demonstrates that massively parallel hydrocodes, such as PCTH, can provide highly-detailed, three-dimensional simulations of armor/anti-armor systems.
Parallel finite element simulation of large ram-air parachutes
NASA Astrophysics Data System (ADS)
Kalro, V.; Aliabadi, S.; Garrard, W.; Tezduyar, T.; Mittal, S.; Stein, K.
1997-06-01
In the near future, large ram-air parachutes are expected to provide the capability of delivering 21 ton payloads from altitudes as high as 25,000 ft. In development and test and evaluation of these parachutes the size of the parachute needed and the deployment stages involved make high-performance computing (HPC) simulations a desirable alternative to costly airdrop tests. Although computational simulations based on realistic, 3D, time-dependent models will continue to be a major computational challenge, advanced finite element simulation techniques recently developed for this purpose and the execution of these techniques on HPC platforms are significant steps in the direction to meet this challenge. In this paper, two approaches for analysis of the inflation and gliding of ram-air parachutes are presented. In one of the approaches the point mass flight mechanics equations are solved with the time-varying drag and lift areas obtained from empirical data. This approach is limited to parachutes with similar configurations to those for which data are available. The other approach is 3D finite element computations based on the Navier-Stokes equations governing the airflow around the parachute canopy and Newtons law of motion governing the 3D dynamics of the canopy, with the forces acting on the canopy calculated from the simulated flow field. At the earlier stages of canopy inflation the parachute is modelled as an expanding box, whereas at the later stages, as it expands, the box transforms to a parafoil and glides. These finite element computations are carried out on the massively parallel supercomputers CRAY T3D and Thinking Machines CM-5, typically with millions of coupled, non-linear finite element equations solved simultaneously at every time step or pseudo-time step of the simulation.
Methods of parallel computation applied on granular simulations
NASA Astrophysics Data System (ADS)
Martins, Gustavo H. B.; Atman, Allbens P. F.
2017-06-01
Every year, parallel computing has becoming cheaper and more accessible. As consequence, applications were spreading over all research areas. Granular materials is a promising area for parallel computing. To prove this statement we study the impact of parallel computing in simulations of the BNE (Brazil Nut Effect). This property is due the remarkable arising of an intruder confined to a granular media when vertically shaken against gravity. By means of DEM (Discrete Element Methods) simulations, we study the code performance testing different methods to improve clock time. A comparison between serial and parallel algorithms, using OpenMP® is also shown. The best improvement was obtained by optimizing the function that find contacts using Verlet's cells.
Direct simulation of turbulent combustion
NASA Technical Reports Server (NTRS)
Poinsot, T. J.
1990-01-01
Understanding and modeling of turbulent combustion are key-problems in the computation of numerous practical systems. Because of the lack of analytical theories in this field and of the difficulty of performing precise experiments, direct simulation appears to be one of the most attractive tools to use in addressing this problem. The present work can be split into two parts: (1) Development and validation of a direct simulation method for turbulent combustion; (2) Applications of the method to premixed turbulent combustion problems. The goal of part 1 is to define and to test a numerical method for direct simulation of reacting flows. A high level of confidence should be attached to direct simulation results, and this can only be achieved through extensive validation tests. In part 2, direct simulation is used to address some of the many critical problems related to turbulent combustion. At the present time, I have limited this work to premixed combustion and considered only four basic issues: (1) The effect of pressure waves on flame propagation; (2) The interaction between flame fronts and vortices; (3) The influence of curvature on premixed flame fronts; and (4) The validation of flamelet models for premixed turbulent combustion.
Applying Parallel Processing Techniques to Tether Dynamics Simulation
NASA Technical Reports Server (NTRS)
Wells, B. Earl
1996-01-01
The focus of this research has been to determine the effectiveness of applying parallel processing techniques to a sizable real-world problem, the simulation of the dynamics associated with a tether which connects two objects in low earth orbit, and to explore the degree to which the parallelization process can be automated through the creation of new software tools. The goal has been to utilize this specific application problem as a base to develop more generally applicable techniques.
Efficient parallel simulation of CO2 geologic sequestration insaline aquifers
Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten
2007-01-01
An efficient parallel simulator for large-scale, long-termCO2 geologic sequestration in saline aquifers has been developed. Theparallel simulator is a three-dimensional, fully implicit model thatsolves large, sparse linear systems arising from discretization of thepartial differential equations for mass and energy balance in porous andfractured media. The simulator is based on the ECO2N module of the TOUGH2code and inherits all the process capabilities of the single-CPU TOUGH2code, including a comprehensive description of the thermodynamics andthermophysical properties of H2O-NaCl- CO2 mixtures, modeling singleand/or two-phase isothermal or non-isothermal flow processes, two-phasemixtures, fluid phases appearing or disappearing, as well as saltprecipitation or dissolution. The new parallel simulator uses MPI forparallel implementation, the METIS software package for simulation domainpartitioning, and the iterative parallel linear solver package Aztec forsolving linear equations by multiple processors. In addition, theparallel simulator has been implemented with an efficient communicationscheme. Test examples show that a linear or super-linear speedup can beobtained on Linux clusters as well as on supercomputers. Because of thesignificant improvement in both simulation time and memory requirement,the new simulator provides a powerful tool for tackling larger scale andmore complex problems than can be solved by single-CPU codes. Ahigh-resolution simulation example is presented that models buoyantconvection, induced by a small increase in brine density caused bydissolution of CO2.
Parallelizing N-Body Simulations on a Heterogeneous Cluster
NASA Astrophysics Data System (ADS)
Stenborg, T. N.
2009-10-01
This thesis evaluates quantitatively the effectiveness of a new technique for parallelising direct gravitational N-body simulations on a heterogeneous computing cluster. In addition to being an investigation into how a specific computational physics task can be optimally load balanced across the heterogeneity factors of a distributed computing cluster, it is also, more generally, a case study in effective heterogeneous parallelisation of an all-pairs programming task. If high-performance computing clusters are not designed to be heterogeneous initially, they tend to become so over time as new nodes are added, or existing nodes are replaced or upgraded. As a result, effective techniques for application parallelisation on heterogeneous clusters are needed if maximum cluster utilisation is to be achieved and is an active area of research. A custom C/MPI parallel particle-particle N-body simulator was developed, validated and deployed for this evaluation. Simulation communication proceeds over cluster nodes arranged in a logical ring and employs nonblocking message passing to encourage overlap of communication with computation. Redundant calculations arising from force symmetry given by Newton's third law are removed by combining chordal data transfer of accumulated forces with ring passing data transfer. Heterogeneity in node computation speed is addressed by decomposing system data across nodes in proportion to node computation speed, in conjunction with use of evenly sized communication buffers. This scheme is shown experimentally to have some potential in improving simulation performance in comparison with an even decomposition of data across nodes. Techniques for further heterogeneous cluster load balancing are discussed and remain an opportunity for further work.
Blocksome, Michael A.; Mamidala, Amith R.
2013-09-03
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Blocksome, Michael A; Mamidala, Amith R
2014-02-11
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Blocksome, Michael A.; Mamidala, Amith R.
2013-09-03
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
A tool for simulating parallel branch-and-bound methods
NASA Astrophysics Data System (ADS)
Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail
2016-01-01
The Branch-and-Bound method is known as one of the most powerful but very resource consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in parallel B&B method is the need for dynamic load redistribution. Therefore design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating parallel Branchand-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, the characteristics of the supercomputer's interconnect thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
A hybrid algorithm for parallel molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Mangiardi, Chris M.; Meyer, R.
2017-10-01
This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.
Panel on future directions in parallel computer architecture
VanTilborg, A.M. )
1989-06-01
One of the program highlights of the 15th Annual International Symposium on Computer Architecture, held May 30 - June 2, 1988 in Honolulu, was a panel session on future directions in parallel computer architecture. The panel was organized and chaired by the author, and was comprised of Prof. Jack Dennis (NASA Ames Research Institute for Advanced Computer Science), Prof. H.T. Kung (Carnegie Mellon), and Dr. Burton Smith (Tera Computer Company). The objective of the panel was to identify the likely trajectory of future parallel computer system progress, particularly from the sandpoint of marketplace acceptance. Approximately 250 attendees participated in the session, in which each panelist began with a ten minute viewgraph explanation of his views, followed by an open and sometimes lively exchange with the audience and fellow panelists. The session ran for ninety minutes.
Xyce Parallel Electronic Simulator : users' guide, version 2.0.
Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.
2004-06-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability the current state-of-the-art in the following areas: {sm_bullet} Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. {sm_bullet} Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. {sm_bullet} Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. {sm_bullet} A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). {sm_bullet} Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing of computing platforms. These include serial, shared-memory and distributed-memory parallel implementation - which allows it to run efficiently on the widest possible number parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce
A hybrid parallel framework for the cellular Potts model simulations
Jiang, Yi; He, Kejing; Dong, Shoubin
2009-01-01
The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which can't be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming POE solving, cell division, and cell reaction operation are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the POE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied the avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation ({approx}10{sup 8} sites) of complex collective behavior of numerous cells ({approx}10{sup 6}).
On the hierarchical parallelization of ab initio simulations
NASA Astrophysics Data System (ADS)
Ruiz-Barragan, Sergi; Ishimura, Kazuya; Shiga, Motoyuki
2016-02-01
A hierarchical parallelization has been implemented in a new unified code PIMD-SMASH for ab initio simulation where the replicas and the Born-Oppenheimer forces are parallelized. It is demonstrated that ab initio path integral molecular dynamics simulations can be carried out very efficiently for systems up to a few tens of water molecules. The code was then used to study a Diels-Alder reaction of cyclopentadiene and butenone by ab initio string method. A reduction in the reaction energy barrier is found in the presence of hydrogen-bonded water, in accordance with experiment.
Parallel Monte Carlo Simulation for control system design
NASA Technical Reports Server (NTRS)
Schubert, Wolfgang M.
1995-01-01
The research during the 1993/94 academic year addressed the design of parallel algorithms for stochastic robustness synthesis (SRS). SRS uses Monte Carlo simulation to compute probabilities of system instability and other design-metric violations. The probabilities form a cost function which is used by a genetic algorithm (GA). The GA searches for the stochastic optimal controller. The existing sequential algorithm was analyzed and modified to execute in a distributed environment. For this, parallel approaches to Monte Carlo simulation and genetic algorithms were investigated. Initial empirical results are available for the KSR1.
Direct drive digital servo press with high parallel control
NASA Astrophysics Data System (ADS)
Murata, Chikara; Yabe, Jun; Endou, Junichi; Hasegawa, Kiyoshi
2013-12-01
Direct drive digital servo press has been developed as the university-industry joint research and development since 1998. On the basis of this result, 4-axes direct drive digital servo press has been developed and in the market on April of 2002. This servo press is composed of 1 slide supported by 4 ball screws and each axis has linearscale measuring the position of each axis with high accuracy less than μm order level. Each axis is controlled independently by servo motor and feedback system. This system can keep high level parallelism and high accuracy even with high eccentric load. Furthermore the 'full stroke full power' is obtained by using ball screws. Using these features, new various types of press forming and stamping have been obtained by development and production. The new stamping and forming methods are introduced and 'manufacturing' need strategy of press forming with high added value and also the future direction of press forming are also introduced.
Parallel canonical Monte Carlo simulations through sequential updating of particles
NASA Astrophysics Data System (ADS)
O'Keeffe, C. J.; Orkoulas, G.
2009-04-01
In canonical Monte Carlo simulations, sequential updating of particles is equivalent to random updating due to particle indistinguishability. In contrast, in grand canonical Monte Carlo simulations, sequential implementation of the particle transfer steps in a dense grid of distinct points in space improves both the serial and the parallel efficiency of the simulation. The main advantage of sequential updating in parallel canonical Monte Carlo simulations is the reduction in interprocessor communication, which is usually a slow process. In this work, we propose a parallelization method for canonical Monte Carlo simulations via domain decomposition techniques and sequential updating of particles. Each domain is further divided into a middle and two outer sections. Information exchange is required after the completion of the updating of the outer regions. During the updating of the middle section, communication does not occur unless a particle moves out of this section. Results on two- and three-dimensional Lennard-Jones fluids indicate a nearly perfect improvement in parallel efficiency for large systems.
Parallel canonical Monte Carlo simulations through sequential updating of particles.
O'Keeffe, C J; Orkoulas, G
2009-04-07
In canonical Monte Carlo simulations, sequential updating of particles is equivalent to random updating due to particle indistinguishability. In contrast, in grand canonical Monte Carlo simulations, sequential implementation of the particle transfer steps in a dense grid of distinct points in space improves both the serial and the parallel efficiency of the simulation. The main advantage of sequential updating in parallel canonical Monte Carlo simulations is the reduction in interprocessor communication, which is usually a slow process. In this work, we propose a parallelization method for canonical Monte Carlo simulations via domain decomposition techniques and sequential updating of particles. Each domain is further divided into a middle and two outer sections. Information exchange is required after the completion of the updating of the outer regions. During the updating of the middle section, communication does not occur unless a particle moves out of this section. Results on two- and three-dimensional Lennard-Jones fluids indicate a nearly perfect improvement in parallel efficiency for large systems.
Parallel runway requirement analysis study. Volume 2: Simulation manual
NASA Technical Reports Server (NTRS)
Ebrahimi, Yaghoob S.; Chun, Ken S.
1993-01-01
This document is a user manual for operating the PLAND_BLUNDER (PLB) simulation program. This simulation is based on two aircraft approaching parallel runways independently and using parallel Instrument Landing System (ILS) equipment during Instrument Meteorological Conditions (IMC). If an aircraft should deviate from its assigned localizer course toward the opposite runway, this constitutes a blunder which could endanger the aircraft on the adjacent path. The worst case scenario would be if the blundering aircraft were unable to recover and continue toward the adjacent runway. PLAND_BLUNDER is a Monte Carlo-type simulation which employs the events and aircraft positioning during such a blunder situation. The model simulates two aircraft performing parallel ILS approaches using Instrument Flight Rules (IFR) or visual procedures. PLB uses a simple movement model and control law in three dimensions (X, Y, Z). The parameters of the simulation inputs and outputs are defined in this document along with a sample of the statistical analysis. This document is the second volume of a two volume set. Volume 1 is a description of the application of the PLB to the analysis of close parallel runway operations.
Parallel Mass Transfer Simulation of Nanoparticles Using Nonblocking Communications
NASA Astrophysics Data System (ADS)
Chantrapornchai (Phonpensri), Chantana; Dolwithayakul, Banpot; Gorlatch, Sergei
This paper presents experiences and results obtained in optimizing parallelization of the mass transfer simulation in the High Gradient Magnetic Separation (HGMS) of nanoparticles using nonblocking communication techniques in the point-to-point and collective model. We study the dynamics of mass transfer statistically in terms of particle volume concentration and the continuity equation, which is solved numerically by using the finite-difference method to compute concentration distribution in the simulation domain at a given time. In the parallel simulation, total concentration data in the simulation domain are divided row-wise and distributed equally to a group of processes. We propose two parallel algorithms based on the row-wise partitioning: algorithms with nonblocking send/receive and nonblocking scatter/gather using the NBC library. We compare the performance of both versions by measuring their parallel speedup and efficiency. We also investigate the communication overhead in both versions. Our results show that the nonblocking collective communication can improve the performance of the simulation when the number of processes is large.
Parallelization of a Monte Carlo particle transport simulation code
NASA Astrophysics Data System (ADS)
Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.
2010-05-01
We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.
Efficient parallel CFD-DEM simulations using OpenMP
NASA Astrophysics Data System (ADS)
Amritkar, Amit; Deb, Surya; Tafti, Danesh
2014-01-01
The paper describes parallelization strategies for the Discrete Element Method (DEM) used for simulating dense particulate systems coupled to Computational Fluid Dynamics (CFD). While the field equations of CFD are best parallelized by spatial domain decomposition techniques, the N-body particulate phase is best parallelized over the number of particles. When the two are coupled together, both modes are needed for efficient parallelization. It is shown that under these requirements, OpenMP thread based parallelization has advantages over MPI processes. Two representative examples, fairly typical of dense fluid-particulate systems are investigated, including the validation of the DEM-CFD and thermal-DEM implementation with experiments. Fluidized bed calculations are performed on beds with uniform particle loading, parallelized with MPI and OpenMP. It is shown that as the number of processing cores and the number of particles increase, the communication overhead of building ghost particle lists at processor boundaries dominates time to solution, and OpenMP which does not require this step is about twice as fast as MPI. In rotary kiln heat transfer calculations, which are characterized by spatially non-uniform particle distributions, the low overhead of switching the parallelization mode in OpenMP eliminates the load imbalances, but introduces increased overheads in fetching non-local data. In spite of this, it is shown that OpenMP is between 50-90% faster than MPI.
Parallel PDE-Based Simulations Using the Common Component Architecture
McInnes, Lois C.; Allan, Benjamin A.; Armstrong, Robert; Benson, Steven J.; Bernholdt, David E.; Dahlgren, Tamara L.; Diachin, Lori; Krishnan, Manoj Kumar; Kohl, James A.; Larson, J. Walter; Lefantzi, Sophia; Nieplocha, Jarek; Norris, Boyana; Parker, Steven G.; Ray, Jaideep; Zhou, Shujia
2006-03-05
Summary. The complexity of parallel PDE-based simulations continues to increase as multimodel, multiphysics, and multi-institutional projects become widespread. A goal of componentbased software engineering in such large-scale simulations is to help manage this complexity by enabling better interoperability among various codes that have been independently developed by different groups. The Common Component Architecture (CCA) Forum is defining a component architecture specification to address the challenges of high-performance scientific computing. In addition, several execution frameworks, supporting infrastructure, and generalpurpose components are being developed. Furthermore, this group is collaborating with others in the high-performance computing community to design suites of domain-specific component interface specifications and underlying implementations. This chapter discusses recent work on leveraging these CCA efforts in parallel PDE-based simulations involving accelerator design, climate modeling, combustion, and accidental fires and explosions. We explain how component technology helps to address the different challenges posed by each of these applications, and we highlight how component interfaces built on existing parallel toolkits facilitate the reuse of software for parallel mesh manipulation, discretization, linear algebra, integration, optimization, and parallel data redistribution. We also present performance data to demonstrate the suitability of this approach, and we discuss strategies for applying component technologies to both new and existing applications.
Enabling parallel simulation of large-scale HPC network systems
Mubarak, Misbah; Carothers, Christopher D.; Ross, Robert B.; ...
2016-04-07
Here, with the increasing complexity of today’s high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems—in particular, networks. In order to make effective design decisions, simulations of these systems must possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC network simulation frameworks, however, are constrained in one or more of these areas. In this work, we present a simulation framework for modeling two important classes of networks usedmore » in today’s IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies at a flit-level detail using the Rensselaer Optimistic Simulation System (ROSS) for parallel discrete-event simulation. Our simulation framework meets all the requirements of a practical network simulation and can assist network designers in design space exploration. First, it uses validated and detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second, instead of relying on serial time-stepped or traditional conservative discrete-event simulations that limit simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to achieve efficient and scalable HPC network simulations on today’s high-performance cluster systems. Third, our models give network designers a choice in simulating a broad range of network workloads, including HPC application workloads using detailed network traces, an ability that is rarely offered in parallel with high-fidelity network simulations« less
Enabling parallel simulation of large-scale HPC network systems
Mubarak, Misbah; Carothers, Christopher D.; Ross, Robert B.; Carns, Philip
2016-04-07
Here, with the increasing complexity of today’s high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems—in particular, networks. In order to make effective design decisions, simulations of these systems must possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC network simulation frameworks, however, are constrained in one or more of these areas. In this work, we present a simulation framework for modeling two important classes of networks used in today’s IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies at a flit-level detail using the Rensselaer Optimistic Simulation System (ROSS) for parallel discrete-event simulation. Our simulation framework meets all the requirements of a practical network simulation and can assist network designers in design space exploration. First, it uses validated and detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second, instead of relying on serial time-stepped or traditional conservative discrete-event simulations that limit simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to achieve efficient and scalable HPC network simulations on today’s high-performance cluster systems. Third, our models give network designers a choice in simulating a broad range of network workloads, including HPC application workloads using detailed network traces, an ability that is rarely offered in parallel with high-fidelity network simulations
Xyce parallel electronic simulator reference guide, version 6.0.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2013-08-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1] .
Xyce parallel electronic simulator reference guide, version 6.1
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-03-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1] .
Xyce Parallel Electronic Simulator : reference guide, version 4.1.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-02-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.
Xyce™ Parallel Electronic Simulator: Reference Guide, Version 5.1
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Santarelli, Keith R.; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S.; Pawlowski, Roger P.
2009-11-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users’ Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users’ Guide.
Xyce Parallel Electronic Simulator : reference guide, version 2.0.
Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.
2004-06-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.
Time parallelization of plasma simulations using the parareal algorithm
Samaddar, D.; Houlberg, Wayne A; Berry, Lee A; Elwasif, Wael R; Huysmans, G; Batchelor, Donald B
2011-01-01
Simulation of fusion plasmas involve a broad range of timescales. In magnetically confined plasmas, such as in ITER, the timescale associated with the microturbulence responsible for transport and confinement timescales vary by an order of 10^6 10^9. Simulating this entire range of timescales is currently impossible, even on the most powerful supercomputers available. Space parallelization has so far been the most common approach to solve partial differential equations. Space parallelization alone has led to computational saturation for fluid codes, which means that the walltime for computaion does not linearly decrease with the increasing number of processors used. The application of the parareal algorithm to simulations of fusion plasmas ushers in a new avenue of parallelization, namely temporal parallelization. The algorithm has been successfully applied to plasma turbulence simulations, prior to which it has been applied to other relatively simpler problems. This work explores the extension of the applicability of the parareal algorithm to ITER relevant problems, starting with a diffusion-convection model.
A Framework to Simulate Semiconductor Devices Using Parallel Computer Architecture
NASA Astrophysics Data System (ADS)
Kumar, Gaurav; Singh, Mandeep; Bulusu, Anand; Trivedi, Gaurav
2016-10-01
Device simulations have become an integral part of semiconductor technology to address many issues (short channel effects, narrow width effects, hot-electron effect) as it goes into nano regime, helping us to continue further with the Moore's Law. TCAD provides a simulation environment to design and develop novel devices, thus a leap forward to study their electrical behaviour in advance. In this paper, a parallel 2D simulator for semiconductor devices using Discontinuous Galerkin Finite Element Method (DG-FEM) is presented. Discontinuous Galerkin (DG) method is used to discretize essential device equations and later these equations are analyzed by using a suitable methodology to find the solution. DG method is characterized to provide more accurate solution as it efficiently conserve the flux and easily handles complex geometries. OpenMP is used to parallelize solution of device equations on manycore processors and a speed of 1.4x is achieved during assembly process of discretization. This study is important for more accurate analysis of novel devices (such as FinFET, GAAFET etc.) on a parallel computing platform and will help us to develop a parallel device simulator which will be able to address this issue efficiently. A case study of PN junction diode is presented to show the effectiveness of proposed approach.
Simulation of Parallel Interacting Faults and Earthquake Predictability
NASA Astrophysics Data System (ADS)
Mora, P.; Weatherley, D.; Klein, B.
2003-04-01
'' behaviour. This implies that mean field theoretical analysis such as Klein et al, 2000 requires introduction of a memory kernel in order to properly account for the glassy behaviour of interacting fault system models. The elasto-dynamic parallel interacting fault model helps to provide a crucial link between CA maps of phase space and the behaviour of more realistic elasto-dynamic interacting fault system models, and thus, a means to improve understanding of the dynamics and predictability of real fault systems. REFERENCES W. Klein and M. Anghel and C.D. Ferguson and J.B. Rundle and J.S. Sá Martins (2000) Statistical Analysis of a Model for Earthquake Faults with Long-Range Stress Transfer, in: Geocomplexity and the Physics of Earthquakes (Geophysical Monograph series; no. 120), eds. J.B. Rundle and D. L. Turcotte and W. Klein, pp 43-71 (American Geophysical Union, Washington, DC). Mora, P., Place, D., Abe, S. and Jaumé, S. (2000) Lattice solid simulation of the physics of earthquakes: the model, results and directions, in: GeoComplexity and the Physics of Earthquakes (Geophysical Monograph series; no. 120), eds. Rundle, J.B., Turcotte, D.L. &Klein, W., pp 105-125 (American Geophys. Union, Washington, DC). Mora, P., and Place, D. (2002) Stress correlation function evolution in lattice solid elasto-dynamic models of shear and fracture zones and earthquake prediction, Pure Appl. Geophys, 159, 2413-2427. Weatherley, D. and Mora, P. (2003) Accelerating Precursory Activity within a Class of Earthquake Analog Automata, Pure Appl. Geophysics, submitted.
An adaptive synchronization protocol for parallel discrete event simulation
Bisset, K.R.
1998-12-01
Simulation, especially discrete event simulation (DES), is used in a variety of disciplines where numerical methods are difficult or impossible to apply. One problem with this method is that a sufficiently detailed simulation may take hours or days to execute, and multiple runs may be needed in order to generate the desired results. Parallel discrete event simulation (PDES) has been explored for many years as a method to decrease the time taken to execute a simulation. Many protocols have been developed which work well for particular types of simulations, but perform poorly when used for other types of simulations. Often it is difficult to know a priori whether a particular protocol is appropriate for a given problem. In this work, an adaptive synchronization method (ASM) is developed which works well on an entire spectrum of problems. The ASM determines, using an artificial neural network (ANN), the likelihood that a particular event is safe to process.
Particle/Continuum Hybrid Simulation in a Parallel Computing Environment
NASA Technical Reports Server (NTRS)
Baganoff, Donald
1996-01-01
The objective of this study was to modify an existing parallel particle code based on the direct simulation Monte Carlo (DSMC) method to include a Navier-Stokes (NS) calculation so that a hybrid solution could be developed. In carrying out this work, it was determined that the following five issues had to be addressed before extensive program development of a three dimensional capability was pursued: (1) find a set of one-sided kinetic fluxes that are fully compatible with the DSMC method, (2) develop a finite volume scheme to make use of these one-sided kinetic fluxes, (3) make use of the one-sided kinetic fluxes together with DSMC type boundary conditions at a material surface so that velocity slip and temperature slip arise naturally for near-continuum conditions, (4) find a suitable sampling scheme so that the values of the one-sided fluxes predicted by the NS solution at an interface between the two domains can be converted into the correct distribution of particles to be introduced into the DSMC domain, (5) carry out a suitable number of tests to confirm that the developed concepts are valid, individually and in concert for a hybrid scheme.
The parallel subdomain-levelset deflation method in reservoir simulation
NASA Astrophysics Data System (ADS)
van der Linden, J. H.; Jönsthövel, T. B.; Lukyanov, A. A.; Vuik, C.
2016-01-01
Extreme and isolated eigenvalues are known to be harmful to the convergence of an iterative solver. These eigenvalues can be produced by strong heterogeneity in the underlying physics. We can improve the quality of the spectrum by 'deflating' the harmful eigenvalues. In this work, deflation is applied to linear systems in reservoir simulation. In particular, large, sudden differences in the permeability produce extreme eigenvalues. The number and magnitude of these eigenvalues is linked to the number and magnitude of the permeability jumps. Two deflation methods are discussed. Firstly, we state that harmonic Ritz eigenvector deflation, which computes the deflation vectors from the information produced by the linear solver, is unfeasible in modern reservoir simulation due to high costs and lack of parallelism. Secondly, we test a physics-based subdomain-levelset deflation algorithm that constructs the deflation vectors a priori. Numerical experiments show that both methods can improve the performance of the linear solver. We highlight the fact that subdomain-levelset deflation is particularly suitable for a parallel implementation. For cases with well-defined permeability jumps of a factor 104 or higher, parallel physics-based deflation has potential in commercial applications. In particular, the good scalability of parallel subdomain-levelset deflation combined with the robust parallel preconditioner for deflated system suggests the use of this method as an alternative for AMG.
Molecular simulation of rheological properties using massively parallel supercomputers
Bhupathiraju, R.K.; Cui, S.T.; Gupta, S.A.; Cummings, P.T.; Cochran, H.D.
1996-11-01
Advances in parallel supercomputing now make possible molecular-based engineering and science calculations that will soon revolutionize many technologies, such as those involving polymers and those involving aqueous electrolytes. We have developed a suite of message-passing codes for classical molecular simulation of such complex fluids and amorphous materials and have completed a number of demonstration calculations of problems of scientific and technological importance with each. In this paper, we will focus on the molecular simulation of rheological properties, particularly viscosity, of simple and complex fluids using parallel implementations of non-equilibrium molecular dynamics. Such calculations represent significant challenges computationally because, in order to reduce the thermal noise in the calculated properties within acceptable limits, large systems and/or long simulated times are required.
PRATHAM: Parallel Thermal Hydraulics Simulations using Advanced Mesoscopic Methods
Joshi, Abhijit S; Jain, Prashant K; Mudrich, Jaime A; Popov, Emilian L
2012-01-01
At the Oak Ridge National Laboratory, efforts are under way to develop a 3D, parallel LBM code called PRATHAM (PaRAllel Thermal Hydraulic simulations using Advanced Mesoscopic Methods) to demonstrate the accuracy and scalability of LBM for turbulent flow simulations in nuclear applications. The code has been developed using FORTRAN-90, and parallelized using the message passing interface MPI library. Silo library is used to compact and write the data files, and VisIt visualization software is used to post-process the simulation data in parallel. Both the single relaxation time (SRT) and multi relaxation time (MRT) LBM schemes have been implemented in PRATHAM. To capture turbulence without prohibitively increasing the grid resolution requirements, an LES approach [5] is adopted allowing large scale eddies to be numerically resolved while modeling the smaller (subgrid) eddies. In this work, a Smagorinsky model has been used, which modifies the fluid viscosity by an additional eddy viscosity depending on the magnitude of the rate-of-strain tensor. In LBM, this is achieved by locally varying the relaxation time of the fluid.
Reusable Component Model Development Approach for Parallel and Distributed Simulation
Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng
2014-01-01
Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have diversiform interfaces, couple tightly, and bind with simulation platforms closely. As a result, they are difficult to be reused across different simulation platforms and applications. To address the problem, this paper first proposed a reusable component model framework. Based on this framework, then our reusable model development approach is elaborated, which contains two phases: (1) domain experts create simulation computational modules observing three principles to achieve their independence; (2) model developer encapsulates these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that the model developed using our approach has good reusability and it is easy to be used in different simulation platforms and applications. PMID:24729751
Parallelization of Program to Optimize Simulated Trajectories (POST3D)
NASA Technical Reports Server (NTRS)
Hammond, Dana P.; Korte, John J. (Technical Monitor)
2001-01-01
This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process, dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) on a distributed memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.
Potts-model grain growth simulations: Parallel algorithms and applications
Wright, S.A.; Plimpton, S.J.; Swiler, T.P.
1997-08-01
Microstructural morphology and grain boundary properties often control the service properties of engineered materials. This report uses the Potts-model to simulate the development of microstructures in realistic materials. Three areas of microstructural morphology simulations were studied. They include the development of massively parallel algorithms for Potts-model grain grow simulations, modeling of mass transport via diffusion in these simulated microstructures, and the development of a gradient-dependent Hamiltonian to simulate columnar grain growth. Potts grain growth models for massively parallel supercomputers were developed for the conventional Potts-model in both two and three dimensions. Simulations using these parallel codes showed self similar grain growth and no finite size effects for previously unapproachable large scale problems. In addition, new enhancements to the conventional Metropolis algorithm used in the Potts-model were developed to accelerate the calculations. These techniques enable both the sequential and parallel algorithms to run faster and use essentially an infinite number of grain orientation values to avoid non-physical grain coalescence events. Mass transport phenomena in polycrystalline materials were studied in two dimensions using numerical diffusion techniques on microstructures generated using the Potts-model. The results of the mass transport modeling showed excellent quantitative agreement with one dimensional diffusion problems, however the results also suggest that transient multi-dimension diffusion effects cannot be parameterized as the product of the grain boundary diffusion coefficient and the grain boundary width. Instead, both properties are required. Gradient-dependent grain growth mechanisms were included in the Potts-model by adding an extra term to the Hamiltonian. Under normal grain growth, the primary driving term is the curvature of the grain boundary, which is included in the standard Potts-model Hamiltonian.
Xyce Parallel Electronic Simulator Users Guide Version 6.2.
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-09-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been de- signed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel com- puting platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright c 2002-2014 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright c 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are
Casting pearls ballistically: Efficient massively parallel simulation of particle deposition
Lubachevsky, B.D.; Privman, V.; Roy, S.C.
1996-06-01
We simulate ballistic particle deposition wherein a large number of spherical particles are {open_quotes}cast{close_quotes} vertically over a planar horizontal surface. Upon first contact (with the surface or with a previously deposited particle) each particle stops. This model helps material scientists to study the adsorption and sediment formation. The model is sequential, with particles deposited one by one. We have found an equivalent formulation using a continuous time random process and we simulate the latter in parallel using a method similar to the one previously employed for simulating Ising spins. We augment the parallel algorithm for simulating Ising spins with several techniques aimed at the increase of efficiency of producing the particle configuration and statistics collection. Some of these techniques are similar to earlier ones. We implement the resulting algorithm on a 16K PE MasPar MP-1 and a 4K PE MasPar MP-2. The parallel code runs on MasPar computers nearly two orders of magnitude faster than an optimized sequential code runs on a fast workstation. 17 refs., 9 figs.
The cost of conservative synchronization in parallel discrete event simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1990-01-01
The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approach the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.
Modularized Parallel Neutron Instrument Simulation on the TeraGrid
Chen, Meili; Cobb, John W; Hagen, Mark E; Miller, Stephen D; Lynch, Vickie E
2007-01-01
In order to build a bridge between the TeraGrid (TG), a national scale cyberinfrastructure resource, and neutron science, the Neutron Science TeraGrid Gateway (NSTG) is focused on introducing productive HPC usage to the neutron science community, primarily the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL). Monte Carlo simulations are used as a powerful tool for instrument design and optimization at SNS. One of the successful efforts of a collaboration team composed of NSTG HPC experts and SNS instrument scientists is the development of a software facility named PSoNI, Parallelizing Simulations of Neutron Instruments. Parallelizing the traditional serial instrument simulation on TeraGrid resources, PSoNI quickly computes full instrument simulation at sufficient statistical levels in instrument de-sign. Upon SNS successful commissioning, to the end of 2007, three out of five commissioned instruments in SNS target station will be available for initial users. Advanced instrument study, proposal feasibility evalua-tion, and experiment planning are on the immediate schedule of SNS, which pose further requirements such as flexibility and high runtime efficiency on fast instrument simulation. PSoNI has been redesigned to meet the new challenges and a preliminary version is developed on TeraGrid. This paper explores the motivation and goals of the new design, and the improved software structure. Further, it describes the realized new fea-tures seen from MPI parallelized McStas running high resolution design simulations of the SEQUOIA and BSS instruments at SNS. A discussion regarding future work, which is targeted to do fast simulation for automated experiment adjustment and comparing models to data in analysis, is also presented.
Parallel Stochastic discrete event simulation of calcium dynamics in neuron.
Ishlam Patoary, Mohammad Nazrul; Tropper, Carl; McDougal, Robert A; Zhongwei, Lin; Lytton, William W
2017-09-26
The intra-cellular calcium signaling pathways of a neuron depends on both biochemical reactions and diffusions. Some quasi-isolated compartments (e.g. spines) are so small and calcium concentrations are so low that one extra molecule diffusing in by chance can make a nontrivial difference in its concentration (percentage-wise). These rare events can affect dynamics discretely in such way that they cannot be evaluated by a deterministic simulation. Stochastic models of such a system provide a more detailed understanding of these systems than existing deterministic models because they capture their behavior at a molecular level. Our research focuses on the development of a high performance parallel discrete event simulation environment, Neuron Time Warp (NTW), which is intended for use in the parallel simulation of stochastic reaction-diffusion systems such as intra-calcium signaling. NTW is integrated with NEURON, a simulator which is widely used within the neuroscience community. We simulate two models, a calcium buffer and a calcium wave model. The calcium buffer model is employed in order to verify the correctness and performance of NTW by comparing it to a serial deterministic simulation in NEURON. We also derived a discrete event calcium wave model from a deterministic model using the stochastic IP3R structure.
Massively parallelized replica-exchange simulations of polymers on GPUs
NASA Astrophysics Data System (ADS)
Gross, Jonathan; Janke, Wolfhard; Bachmann, Michael
2011-08-01
We discuss the advantages of parallelization by multithreading on graphics processing units (GPUs) for parallel tempering Monte Carlo computer simulations of an exemplified bead-spring model for homopolymers. Since the sampling of a large ensemble of conformations is a prerequisite for the precise estimation of statistical quantities such as typical indicators for conformational transitions like the peak structure of the specific heat, the advantage of a strong increase in performance of Monte Carlo simulations cannot be overestimated. Employing multithreading and utilizing the massive power of the large number of cores on GPUs, being available in modern but standard graphics cards, we find a rapid increase in efficiency when porting parts of the code from the central processing unit (CPU) to the GPU.
Parallel algorithms for simulating continuous time Markov chains
NASA Technical Reports Server (NTRS)
Nicol, David M.; Heidelberger, Philip
1992-01-01
We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
cellGPU: Massively parallel simulations of dynamic vertex models
NASA Astrophysics Data System (ADS)
Sussman, Daniel M.
2017-10-01
Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cells interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on the connectivity of the cellular network introduces several complications to performing molecular-dynamics-like simulations of vertex models, and in particular makes parallelizing the simulations difficult. cellGPU addresses this difficulty and lays the foundation for massively parallelized, GPU-based simulations of these models. This article discusses its implementation for a pair of two-dimensional models, and compares the typical performance that can be expected between running cellGPU entirely on the CPU versus its performance when running on a range of commercial and server-grade graphics cards. By implementing the calculation of topological changes and forces on cells in a highly parallelizable fashion, cellGPU enables researchers to simulate time- and length-scales previously inaccessible via existing single-threaded CPU implementations. Program Files doi:http://dx.doi.org/10.17632/6j2cj29t3r.1 Licensing provisions: MIT Programming language: CUDA/C++ Nature of problem: Simulations of off-lattice "vertex models" of cells, in which the interaction forces depend on both the geometry and the topology of the cellular aggregate. Solution method: Highly parallelized GPU-accelerated dynamical simulations in which the force calculations and the topological features can be handled on either the CPU or GPU. Additional comments: The code is hosted at https://gitlab.com/dmsussman/cellGPU, with documentation additionally maintained at http://dmsussman.gitlab.io/cellGPUdocumentation
NASA Astrophysics Data System (ADS)
Bhatti, Ghulam M.; Vakili, Pirooz
1997-06-01
There are significant opportunities for the development of parallel/distributed simulation algorithms in the context of parametric study of discrete event systems. In such studies, simulation of multiple (often a large number of) parametric variants is required in order to, for example, identify significant parameters (factor screening), determine directions for response improvement (gradient estimation), find optimal parameter settings (response optimization), or construct a model of the response (meta-modeling). The computational burden in this case is to a large extent due to the large number of alternatives that need to be simulated. An effective strategy in this context is to concurrently simulate a number of parametric variants: the structural similarity of the variants often allows for significant amount of sharing of the simulation work, and the code for concurrent simulation of the variants can often be implemented in a parallel/distributed environment. In this paper, we describe two methods of parallel/distributed/concurrent simulation called the standard clock (SC) and the general shared clock (GSC) simulation. Both approaches rely on an event-reservation approach: by contrast to most discrete-event simulation approaches that are based on an event-scheduling approach, in the SC and GSC simulation, the occurrence instances of all events are reserved on the time axis. These instances may or may not be used. This event-reservation approach frees the clock mechanism of the simulation from needing feedback from the state-update mechanism. Due to this autonomy of the clock mechanism, a single clock can be used to drive a number (possibly large) of variants concurrently and in parallel. The autonomy of the clock mechanism is also the key to the different implementation strategies we adopt. To illustrate, we describe the simulation of parametric versions of wireless communication networks on message passing and shared memory environments.
Time parallelization of advanced operation scenario simulations of ITER plasma
Samaddar, D.; Casper, T. A.; Kim, S. H.; Berry, Lee A; Elwasif, Wael R; Batchelor, Donald B; Houlberg, Wayne A
2013-01-01
This work demonstrates that simulations of advanced burning plasma operation scenarios can be successfully parallelized in time using the parareal algorithm. CORSICA - an advanced operation scenario code for tokamak plasmas is used as a test case. This is a unique application since the parareal algorithm has so far been applied to relatively much simpler systems except for the case of turbulence. In the present application, a computational gain of an order of magnitude has been achieved which is extremely promising. A successful implementation of the Parareal algorithm to codes like CORSICA ushers in the possibility of time efficient simulations of ITER plasmas.
Xyce™ Parallel Electronic Simulator Reference Guide, Version 6.5
Keiter, Eric R.; Aadithya, Karthik V.; Mei, Ting; Russo, Thomas V.; Schiek, Richard L.; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.
2016-06-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users’ Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users’ Guide. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.
Synchronous Parallel System for Emulation and Discrete Event Simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor)
2001-01-01
A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to the state variables of the simulation object attributable to the event object and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.
Synchronous parallel system for emulation and discrete event simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor)
1992-01-01
A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to state variables of the simulation object attributable to the event object, and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring the events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.
A parallel 3D poisson solver for space charge simulation in cylindrical coordinates.
Xu, J.; Ostroumov, P. N.; Nolen, J.; Physics
2008-02-01
This paper presents the development of a parallel three-dimensional Poisson solver in cylindrical coordinate system for the electrostatic potential of a charged particle beam in a circular tube. The Poisson solver uses Fourier expansions in the longitudinal and azimuthal directions, and Spectral Element discretization in the radial direction. A Dirichlet boundary condition is used on the cylinder wall, a natural boundary condition is used on the cylinder axis and a Dirichlet or periodic boundary condition is used in the longitudinal direction. A parallel 2D domain decomposition was implemented in the (r,{theta}) plane. This solver was incorporated into the parallel code PTRACK for beam dynamics simulations. Detailed benchmark results for the parallel solver and a beam dynamics simulation in a high-intensity proton LINAC are presented. When the transverse beam size is small relative to the aperture of the accelerator line, using the Poisson solver in a Cartesian coordinate system and a Cylindrical coordinate system produced similar results. When the transverse beam size is large or beam center located off-axis, the result from Poisson solver in Cartesian coordinate system is not accurate because different boundary condition used. While using the new solver, we can apply circular boundary condition easily and accurately for beam dynamic simulations in accelerator devices.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
NASA Astrophysics Data System (ADS)
Rostrup, Scott; De Sterck, Hans
2010-12-01
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL
An automated parallel simulation execution and analysis approach
NASA Astrophysics Data System (ADS)
Dallaire, Joel D.; Green, David M.; Reaper, Jerome H.
2004-08-01
State-of-the-art simulation computing requirements are continually approaching and then exceeding the performance capabilities of existing computers. This trend remains true even with huge yearly gains in processing power and general computing capabilities; simulation scope and fidelity often increases as well. Accordingly, simulation studies often expend days or weeks executing a single test case. Compounding the problem, stochastic models often require execution of each test case with multiple random number seeds to provide valid results. Many techniques have been developed to improve the performance of simulations without sacrificing model fidelity: optimistic simulation, distributed simulation, parallel multi-processing, and the use of supercomputers such as Beowulf clusters. An approach and prototype toolset has been developed that augments existing optimization techniques to improve multiple-execution timelines. This approach, similar in concept to the SETI @ home experiment, makes maximum use of unused licenses and computers, which can be geographically distributed. Using a publish/subscribe architecture, simulation executions are dispatched to distributed machines for execution. Simulation results are then processed, collated, and transferred to a single site for analysis.
Xyce Parallel Electronic Simulator - Users' Guide Version 2.1.
Hutchinson, Scott A; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric; Pawlowski, Roger P.; Fixel, Deborah A; Schiek, Richard; Bogdan, Carolyn W.; Shirley, David N.; Campbell, Phillip M.; Keiter, Eric R.
2005-06-01
This manual describes the use of theXyceParallel Electronic Simulator.Xycehasbeen designed as a SPICE-compatible, high-performance analog circuit simulator, andhas been written to support the simulation needs of the Sandia National Laboratorieselectrical designers. This development has focused on improving capability over thecurrent state-of-the-art in the following areas:%04Capability to solve extremely large circuit problems by supporting large-scale par-allel computing platforms (up to thousands of processors). Note that this includessupport for most popular parallel and serial computers.%04Improved performance for all numerical kernels (e.g., time integrator, nonlinearand linear solvers) through state-of-the-art algorithms and novel techniques.%04Device models which are specifically tailored to meet Sandia's needs, includingmany radiation-aware devices.3 XyceTMUsers' Guide%04Object-oriented code design and implementation using modern coding practicesthat ensure that theXyceParallel Electronic Simulator will be maintainable andextensible far into the future.Xyceis a parallel code in the most general sense of the phrase - a message passingparallel implementation - which allows it to run efficiently on the widest possible numberof computing platforms. These include serial, shared-memory and distributed-memoryparallel as well as heterogeneous platforms. Careful attention has been paid to thespecific nature of circuit-simulation problems to ensure that optimal parallel efficiencyis achieved as the number of processors grows.The development ofXyceprovides a platform for computational research and de-velopment aimed specifically at the needs of the Laboratory. WithXyce, Sandia hasan %22in-house%22 capability with which both new electrical (e.g., device model develop-ment) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms)research and development can be performed. As a result,Xyceis a unique electricalsimulation capability, designed to
Dynamic Load Balancing Strategies for Parallel Reacting Flow Simulations
NASA Astrophysics Data System (ADS)
Pisciuneri, Patrick; Meneses, Esteban; Givi, Peyman
2014-11-01
Load balancing in parallel computing aims at distributing the work as evenly as possible among the processors. This is a critical issue in the performance of parallel, time accurate, flow simulators. The constraint of time accuracy requires that all processes must be finished with their calculation for a given time step before any process can begin calculation of the next time step. Thus, an irregularly balanced compute load will result in idle time for many processes for each iteration and thus increased walltimes for calculations. Two existing, dynamic load balancing approaches are applied to the simplified case of a partially stirred reactor for methane combustion. The first is Zoltan, a parallel partitioning, load balancing, and data management library developed at the Sandia National Laboratories. The second is Charm++, which is its own machine independent parallel programming system developed at the University of Illinois at Urbana-Champaign. The performance of these two approaches is compared, and the prospects for their application to full 3D, reacting flow solvers is assessed.
Massively Parallel Processing for Fast and Accurate Stamping Simulations
NASA Astrophysics Data System (ADS)
Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu
2005-08-01
The competitive automotive market drives automotive manufacturers to speed up the vehicle development cycles and reduce the lead-time. Fast tooling development is one of the key areas to support fast and short vehicle development programs (VDP). In the past ten years, the stamping simulation has become the most effective validation tool in predicting and resolving all potential formability and quality problems before the dies are physically made. The stamping simulation and formability analysis has become an critical business segment in GM math-based die engineering process. As the simulation becomes as one of the major production tools in engineering factory, the simulation speed and accuracy are the two of the most important measures for stamping simulation technology. The speed and time-in-system of forming analysis becomes an even more critical to support the fast VDP and tooling readiness. Since 1997, General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of simulation software for mass production analysis applications. By 2001, this technology was matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DM0P/MPP technology as well as performance benchmarks are discussed in this publication.
Conservative parallel simulation of priority class queueing networks
NASA Technical Reports Server (NTRS)
Nicol, David M.
1990-01-01
A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. For, a job in service having low priority can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol was analyzed and it was demonstrated that good performance can be expected on the simulation of large queueing networks.
Repartitioning Strategies for Massively Parallel Simulation of Reacting Flow
NASA Astrophysics Data System (ADS)
Pisciuneri, Patrick; Zheng, Angen; Givi, Peyman; Labrinidis, Alexandros; Chrysanthis, Panos
2015-11-01
The majority of parallel CFD simulators partition the domain into equal regions and assign the calculations for a particular region to a unique processor. This type of domain decomposition is vital to the efficiency of the solver. However, as the simulation develops, the workload among the partitions often become uneven (e.g. by adaptive mesh refinement, or chemically reacting regions) and a new partition should be considered. The process of repartitioning adjusts the current partition to evenly distribute the load again. We compare two repartitioning tools: Zoltan, an architecture-agnostic graph repartitioner developed at the Sandia National Laboratories; and Paragon, an architecture-aware graph repartitioner developed at the University of Pittsburgh. The comparative assessment is conducted via simulation of the Taylor-Green vortex flow with chemical reaction.
Direct simulation of groundwater age
Goode, D.J.
1996-01-01
A new method is proposed to simulate groundwater age directly, by use of an advection-dispersion transport equation with a distributed zero-order source of unit (1) strength, corresponding to the rate of aging. The dependent variable in the governing equation is the mean age, a mass- weighted average age. The governing equation is derived from residence- time-distribution concepts for the case of steady flow. For the more general case of transient flow, a transient governing equation for age is derived from mass-conservation principles applied to conceptual 'age mass.' The age mass is the product of the water mass and its age, and age mass is assumed to be conserved during mixing. Boundary conditions include zero age mass flux across all noflow and inflow boundaries trod no age mass dispersive flux across outflow boundaries. For transient-flow conditions, the initial distribution of age must be known. The solution of the governing transport equation yields the spatial distribution of the mean groundwater age and includes diffusion, dispersion, mixing, and exchange processes that typically are considered only through tracer-specific solute transport simulation. Traditional methods have relied on advective transport to predict point values of groundwater travel time and age. The proposed method retains the simplicity and tracer-independence of advection-only models, but incorporates the effects of dispersion and mixing on volume- averaged age. Example simulations of age in two idealized regional aquifer systems, one homogeneous and the other layered, demonstrate the agreement between the proposed method and traditional particle-tracking approaches and illustrate use of the proposed method to determine the effects of diffusion, dispersion, and mixing on groundwater age.
Evaluation of parallel direct sparse linear solvers in electromagnetic geophysical problems
NASA Astrophysics Data System (ADS)
Puzyrev, Vladimir; Koric, Seid; Wilkin, Scott
2016-04-01
High performance computing is absolutely necessary for large-scale geophysical simulations. In order to obtain a realistic image of a geologically complex area, industrial surveys collect vast amounts of data making the computational cost extremely high for the subsequent simulations. A major computational bottleneck of modeling and inversion algorithms is solving the large sparse systems of linear ill-conditioned equations in complex domains with multiple right hand sides. Recently, parallel direct solvers have been successfully applied to multi-source seismic and electromagnetic problems. These methods are robust and exhibit good performance, but often require large amounts of memory and have limited scalability. In this paper, we evaluate modern direct solvers on large-scale modeling examples that previously were considered unachievable with these methods. Performance and scalability tests utilizing up to 65,536 cores on the Blue Waters supercomputer clearly illustrate the robustness, efficiency and competitiveness of direct solvers compared to iterative techniques. Wide use of direct methods utilizing modern parallel architectures will allow modeling tools to accurately support multi-source surveys and 3D data acquisition geometries, thus promoting a more efficient use of the electromagnetic methods in geophysics.
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++
NASA Technical Reports Server (NTRS)
Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis
1994-01-01
Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
MRISIMUL: a GPU-based parallel approach to MRI simulations.
Xanthis, Christos G; Venetis, Ioannis E; Chalkias, A V; Aletras, Anthony H
2014-03-01
A new step-by-step comprehensive MR physics simulator (MRISIMUL) of the Bloch equations is presented. The aim was to develop a magnetic resonance imaging (MRI) simulator that makes no assumptions with respect to the underlying pulse sequence and also allows for complex large-scale analysis on a single computer without requiring simplifications of the MRI model. We hypothesized that such a simulation platform could be developed with parallel acceleration of the executable core within the graphic processing unit (GPU) environment. MRISIMUL integrates realistic aspects of the MRI experiment from signal generation to image formation and solves the entire complex problem for densely spaced isochromats and for a densely spaced time axis. The simulation platform was developed in MATLAB whereas the computationally demanding core services were developed in CUDA-C. The MRISIMUL simulator imaged three different computer models: a user-defined phantom, a human brain model and a human heart model. The high computational power of GPU-based simulations was compared against other computer configurations. A speedup of about 228 times was achieved when compared to serially executed C-code on the CPU whereas a speedup between 31 to 115 times was achieved when compared to the OpenMP parallel executed C-code on the CPU, depending on the number of threads used in multithreading (2-8 threads). The high performance of MRISIMUL allows its application in large-scale analysis and can bring the computational power of a supercomputer or a large computer cluster to a single GPU personal computer.
Development of magnetron sputtering simulator with GPU parallel computing
NASA Astrophysics Data System (ADS)
Sohn, Ilyoup; Kim, Jihun; Bae, Junkyeong; Lee, Jinpil
2014-12-01
Sputtering devices are widely used in the semiconductor and display panel manufacturing process. Currently, a number of surface treatment applications using magnetron sputtering techniques are being used to improve the efficiency of the sputtering process, through the installation of magnets outside the vacuum chamber. Within the internal space of the low pressure chamber, plasma generated from the combination of a rarefied gas and an electric field is influenced interactively. Since the quality of the sputtering and deposition rate on the substrate is strongly dependent on the multi-physical phenomena of the plasma regime, numerical simulations using PIC-MCC (Particle In Cell, Monte Carlo Collision) should be employed to develop an efficient sputtering device. In this paper, the development of a magnetron sputtering simulator based on the PIC-MCC method and the associated numerical techniques are discussed. To solve the electric field equations in the 2-D Cartesian domain, a Poisson equation solver based on the FDM (Finite Differencing Method) is developed and coupled with the Monte Carlo Collision method to simulate the motion of gas particles influenced by an electric field. The magnetic field created from the permanent magnet installed outside the vacuum chamber is also numerically calculated using Biot-Savart's Law. All numerical methods employed in the present PIC code are validated by comparison with analytical and well-known commercial engineering software results, with all of the results showing good agreement. Finally, the developed PIC-MCC code is parallelized to be suitable for general purpose computing on graphics processing unit (GPGPU) acceleration, so as to reduce the large computation time which is generally required for particle simulations. The efficiency and accuracy of the GPGPU parallelized magnetron sputtering simulator are examined by comparison with the calculated results and computation times from the original serial code. It is found that
Xyce Parallel Electronic Simulator Users' Guide Version 6.6.
Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason
2016-11-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been de- signed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel com- puting platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright c 2002-2016 Sandia Corporation. All rights reserved. Acknowledgements The BSIM Group at the University of California, Berkeley developed the BSIM3, BSIM4, BSIM6, BSIM-CMG and BSIM-SOI models. The BSIM3 is Copyright c 1999, Regents of the University of California. The BSIM4 is Copyright c 2006, Regents of the University of California. The BSIM6 is Copyright c 2015, Regents of the University of California. The BSIM-CMG is Copyright c 2012
Xyce Parallel Electronic Simulator Users Guide Version 6.4
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory
2015-12-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been de- signed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel com- puting platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright c 2002-2015 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright c 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are
Parallel Direct Methods for Solving Banded Linear Systems.
1985-08-01
when passing from the 2-D grid architecture to the hypercube architecture. 22 [%22 References [1) J.J. Dongarra and A. Sameh , On some parallel banded...1985. In preparation. [101 A.H. Sameh , Numerical Parallel Algorithms - A Survey, D. Lawrie, A. Sameh ed., High .. Speed Computer and Algorithm
Parallel Direct Methods for Solving Banded Linear Systems.
1985-08-01
Dongarra and A. Sameh , On some parallel banded system solvers, Technical Report MCSD, No. 27, Argonne National Lab., 1984. [2] D. Gannon, J. van...Report , Computer Science Dept., Yale University, 1985. In preparation. [10] A.H. Sameh , Numerical Parallel Algorithms - A Survey., D. Lawrie, A. Sameh
CHOLLA: A New Massively Parallel Hydrodynamics Code for Astrophysical Simulation
NASA Astrophysics Data System (ADS)
Schneider, Evan E.; Robertson, Brant E.
2015-04-01
We present Computational Hydrodynamics On ParaLLel Architectures (Cholla ), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳2563) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.
Numerical Simulation of Flow Field Within Parallel Plate Plastometer
NASA Technical Reports Server (NTRS)
Antar, Basil N.
2002-01-01
Parallel Plate Plastometer (PPP) is a device commonly used for measuring the viscosity of high polymers at low rates of shear in the range 10(exp 4) to 10(exp 9) poises. This device is being validated for use in measuring the viscosity of liquid glasses at high temperatures having similar ranges for the viscosity values. PPP instrument consists of two similar parallel plates, both in the range of 1 inch in diameter with the upper plate being movable while the lower one is kept stationary. Load is applied to the upper plate by means of a beam connected to shaft attached to the upper plate. The viscosity of the fluid is deduced from measuring the variation of the plate separation, h, as a function of time when a specified fixed load is applied on the beam. Operating plate speeds measured with the PPP is usually in the range of 10.3 cm/s or lower. The flow field within the PPP can be simulated using the equations of motion of fluid flow for this configuration. With flow speeds in the range quoted above the flow field between the two plates is certainly incompressible and laminar. Such flows can be easily simulated using numerical modeling with computational fluid dynamics (CFD) codes. We present below the mathematical model used to simulate this flow field and also the solutions obtained for the flow using a commercially available finite element CFD code.
Massively parallel simulations of multiphase flows using Lattice Boltzmann methods
NASA Astrophysics Data System (ADS)
Ahrenholz, Benjamin
2010-03-01
In the last two decades the lattice Boltzmann method (LBM) has matured as an alternative and efficient numerical scheme for the simulation of fluid flows and transport problems. Unlike conventional numerical schemes based on discretizations of macroscopic continuum equations, the LBM is based on microscopic models and mesoscopic kinetic equations. The fundamental idea of the LBM is to construct simplified kinetic models that incorporate the essential physics of microscopic or mesoscopic processes so that the macroscopic averaged properties obey the desired macroscopic equations. Especially applications involving interfacial dynamics, complex and/or changing boundaries and complicated constitutive relationships which can be derived from a microscopic picture are suitable for the LBM. In this talk a modified and optimized version of a Gunstensen color model is presented to describe the dynamics of the fluid/fluid interface where the flow field is based on a multi-relaxation-time model. Based on that modeling approach validation studies of contact line motion are shown. Due to the fact that the LB method generally needs only nearest neighbor information, the algorithm is an ideal candidate for parallelization. Hence, it is possible to perform efficient simulations in complex geometries at a large scale by massively parallel computations. Here, the results of drainage and imbibition (Degree of Freedom > 2E11) in natural porous media gained from microtomography methods are presented. Those fully resolved pore scale simulations are essential for a better understanding of the physical processes in porous media and therefore important for the determination of constitutive relationships.
CHOLLA: A NEW MASSIVELY PARALLEL HYDRODYNAMICS CODE FOR ASTROPHYSICAL SIMULATION
Schneider, Evan E.; Robertson, Brant E.
2015-04-15
We present Computational Hydrodynamics On ParaLLel Architectures (Cholla ), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳256{sup 3}) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.
Schuchardt, Karen L.; Agarwal, Khushbu; Chase, Jared M.; Rockhold, Mark L.; Freedman, Vicky L.; Elsethagen, Todd O.; Scheibe, Timothy D.; Chin, George; Sivaramakrishnan, Chandrika
2010-07-15
The Support Architecture for Large-Scale Subsurface Analysis (SALSSA) provides an extensible framework, sophisticated graphical user interface, and underlying data management system that simplifies the process of running subsurface models, tracking provenance information, and analyzing the model results. Initially, SALSSA supported two styles of job control: user directed execution and monitoring of individual jobs, and load balancing of jobs across multiple machines taking advantage of many available workstations. Recent efforts in subsurface modelling have been directed at advancing simulators to take advantage of leadership class supercomputers. We describe two approaches, current progress, and plans toward enabling efficient application of the subsurface simulator codes via the SALSSA framework: automating sensitivity analysis problems through task parallelism, and task parallel parameter estimation using the PEST framework.
Plimpton, Steve; Thompson, Aidan; Crozier, Paul
LAMMPS (http://lammps.sandia.gov/index.html) stands for Large-scale Atomic/Molecular Massively Parallel Simulator and is a code that can be used to model atoms or, as the LAMMPS website says, as a parallel particle simulator at the atomic, meso, or continuum scale. This Sandia-based website provides a long list of animations from large simulations. These were created using different visualization packages to read LAMMPS output, and each one provides the name of the PI and a brief description of the work done or visualization package used. See also the static images produced from simulations at http://lammps.sandia.gov/pictures.html The foundation paper for LAMMPS is: S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995), but the website also lists other papers describing contributions to LAMMPS over the years.
NASA Astrophysics Data System (ADS)
Böddeker, B.; Teichler, H.
The MD simulation program TABB is motivated by the need of long time simulations for the investigation of slow processes near the glass transition of glass forming alloys. TABB is written in C++ with a high degree of flexibility: TABB allows the use of any short ranged pair potentials or EAM potentials, by generating and using a spline representation of all functions and their derivatives. TABB supports several numerical integration algorithms like the Runge-Kotta or the modified Gear-predictor-corrector algorithm of order five. The boundary conditions can be chosen to resemble the geometry of bulk materials or films. The simulation box length or the pressure can be fixed for each dimension separately. TABB may be used in isokinetic, isoenergeric or canonic (with random forces) mode. TABB contains a simple instruction interpreter to easily control the parameters and options during the simulation. The same source code can be compiled either for workstations or for parallel computers. The main optimization goal of TABB is to allow long time simulations of medium or small sized systems. To make this possible, much attention is spent on the optimized communication between the nodes. TABB uses a domain decomposition procedure. To use many nodes with a small system, the domain size has to be small compared to the range of particle interactions. In the limit of many nodes for only few atoms, the bottle neck of communication is the latency time. TABB minimizes the number of pairs of domains containing atoms that interact between these domains. This procedure minimizes the need of communication calls between pairs of nodes. TABB decides automatically, to how many, and to which directions the decomposition shall be applied. E.g., in the case of one dimensional domain decomposition, the simulation box is only split into "slabs" along a selected direction. The three dimensional domain decomposition is best with respect to the number of interacting domains only for simulations
Parallel Unsteady Turbopump Simulations for Liquid Rocket Engines
NASA Technical Reports Server (NTRS)
Kiris, Cetin C.; Kwak, Dochan; Chan, William
2000-01-01
This paper reports the progress being made towards complete turbo-pump simulation capability for liquid rocket engines. Space Shuttle Main Engine (SSME) turbo-pump impeller is used as a test case for the performance evaluation of the MPI and hybrid MPI/Open-MP versions of the INS3D code. Then, a computational model of a turbo-pump has been developed for the shuttle upgrade program. Relative motion of the grid system for rotor-stator interaction was obtained by employing overset grid techniques. Time-accuracy of the scheme has been evaluated by using simple test cases. Unsteady computations for SSME turbo-pump, which contains 136 zones with 35 Million grid points, are currently underway on Origin 2000 systems at NASA Ames Research Center. Results from time-accurate simulations with moving boundary capability, and the performance of the parallel versions of the code will be presented in the final paper.
Parallel collisionless shocks forming in simulations of the LAPD experiment
NASA Astrophysics Data System (ADS)
Weidl, Martin S.; Jenko, Frank; Niemann, Chris; Winske, Dan
2016-10-01
Research on parallel collisionless shocks, most prominently occurring in the Earth's bow shock region, has so far been limited to satellite measurements and simulations. However, the formation of collisionless shocks depends on a wide range of parameters and scales, which can be accessed more easily in a laboratory experiment. Using a kJ-class laser, an ongoing experimental campaign at the Large Plasma Device (LAPD) at UCLA is expected to produce the first laboratory measurements of the formation of a parallel collisionless shock. We present hybrid kinetic/MHD simulations that show how beam instabilities in the background plasma can be driven by ablating carbon ions from a target, causing non-linear density oscillations which develop into a propagating shock front. The free-streaming carbon ions can excite both the resonant right-hand instability and the non-resonant firehose mode. We analyze their respective roles and discuss optimizing their growth rates to speed up the process of shock formation.
Parallel Simulation of Wave Propagation in Three-Dimensional Poroelastic Media
NASA Astrophysics Data System (ADS)
Sheen, D.; Baag, C.; Tuncay, K.; Ortoleva, P. J.
2003-12-01
Parallelized velocity-stress staggered-grid finite-difference method to simulate wave propagation in 3-D heterogeneous poroelastic media is presented. Biot_s poroelasticity theory is used to study the behavior of wavefield in fluid saturated media. In the poroelasticity theory, the fluid velocities and pressure are included as additional field variables to those for the pure elasticity in order to describe the interaction between pore fluid and solid. Discretization of governing equations for finite-difference approximation is performed for total of 13 components of field variables in 3-D Cartesian coordinates: six components for velocity, six components for solid stress, and a component for fluid pressure. The scheme has fourth-order accuracy in space and second-order accuracy in time. Also, to simulate wave propagation in an unbounded medium, the perfectly matched layer (PML) method is used as an absorbing boundary condition. In contrast with the pure elastic problem, the larger number of components to describe the poroelasticity requires a huge sum of core memory inevitably. In the case of modeling in a realistic scale, the computation is hardly to run on serial platforms. Therefore, the computationally efficient scheme to run on a large parallel environment is required. The parallel implementation is achieved by using a spatial decomposition and the portable message passing interface (MPI) for communication between neighboring processors. Direct comparisons are made for serial and parallel computations. The inevitability and efficiency of parallelization for the poroelastic wave modeling are also demonstrated using model examples.
New Directions in Maintenance Simulation.
ERIC Educational Resources Information Center
Miller, Gary G.
A two-phase effort was conducted to design and evaluate a maintenance simulator which incorporated state-of-the-art information in simulation and instructional technology. The particular equipment selected to be simulated was the 6883 Convert/Flight Controls Test Station. Phase I included a generalized block diagram of the computer-trainer, the…
New Directions in Maintenance Simulation.
ERIC Educational Resources Information Center
Miller, Gary G.
A two-phase effort was conducted to design and evaluate a maintenance simulator which incorporated state-of-the-art information in simulation and instructional technology. The particular equipment selected to be simulated was the 6883 Convert/Flight Controls Test Station. Phase I included a generalized block diagram of the computer-trainer, the…
Concurrent simulation of a parallel jaw end effector
NASA Technical Reports Server (NTRS)
Bynum, Bill
1985-01-01
A system of programs developed to aid in the design and development of the command/response protocol between a parallel jaw end effector and the strategic planner program controlling it are presented. The system executes concurrently with the LISP controlling program to generate a graphical image of the end effector that moves in approximately real time in response to commands sent from the controlling program. Concurrent execution of the simulation program is useful for revealing flaws in the communication command structure arising from the asynchronous nature of the message traffic between the end effector and the strategic planner. Software simulation helps to minimize the number of hardware changes necessary to the microprocessor driving the end effector because of changes in the communication protocol. The simulation of other actuator devices can be easily incorporated into the system of programs by using the underlying support that was developed for the concurrent execution of the simulation process and the communication between it and the controlling program.
Parallel electric fields in a simulation of magnetotail reconnection and plasmoid evolution
NASA Technical Reports Server (NTRS)
Hesse, Michael; Birn, Joachim
1989-01-01
Properties of the electric field component parallel to the magnetic field (E sub parallel) in a three-dimensional MHD simulation of plasmoid formation and evolution in the magnetotail in the presence of a net dawn-dusk magnetic field component were observed. Particularly emphasized was the spatial location of E(sub parallel), the concept of a diffusion zone and the role of E(sub parallel) in accelerating electrons. A localization of the region of enhanced E(sub parallel) in all space directions with a strong concentration in the z direction was found. This region was identified as the diffusion zone, which plays a crucial role in reconnection theory through the local break-down of magnetic flux conservation. The presence of B(sub y) implies a north-south asymmetry of the injection of accelerated particles into the near-earth region, if the net B(sub y) field is strong enough to force particles to follow field lines through the diffusion region. It is estimated that for a typical net B(sub y) field this should affect the injection of electrons into the near-earth dawn region, so that precipitation into the Northern (Southern) Hemisphere should dominate for duskward (dawnward) net B(sub y). In addition, a spatial clottiness of the expected injection of adiabatic particles which could be related to the appearance bright spots in auroras was observed.
Direct parallel image reconstructions for spiral trajectories using GRAPPA.
Heidemann, Robin M; Griswold, Mark A; Seiberlich, Nicole; Krüger, Gunnar; Kannengiesser, Stephan A R; Kiefer, Berthold; Wiggins, Graham; Wald, Lawrence L; Jakob, Peter M
2006-08-01
The use of spiral trajectories is an efficient way to cover a desired k-space partition in magnetic resonance imaging (MRI). Compared to conventional Cartesian k-space sampling, it allows faster acquisitions and results in a slight reduction of the high gradient demand in fast dynamic scans, such as in functional MRI (fMRI). However, spiral images are more susceptible to off-resonance effects that cause blurring artifacts and distortions of the point-spread function (PSF), and thereby degrade the image quality. Since off-resonance effects scale with the readout duration, the respective artifacts can be reduced by shortening the readout trajectory. Multishot experiments represent one approach to reduce these artifacts in spiral imaging, but result in longer scan times and potentially increased flow and motion artifacts. Parallel imaging methods are another promising approach to improve image quality through an increase in the acquisition speed. However, non-Cartesian parallel image reconstructions are known to be computationally time-consuming, which is prohibitive for clinical applications. In this study a new and fast approach for parallel image reconstructions for spiral imaging based on the generalized autocalibrating partially parallel acquisitions (GRAPPA) methodology is presented. With this approach the computational burden is reduced such that it becomes comparable to that needed in accelerated Cartesian procedures. The respective spiral images with two- to eightfold acceleration clearly benefit from the advantages of parallel imaging, such as enabling parallel MRI single-shot spiral imaging with the off-resonance behavior of multishot acquisitions. Copyright 2006 Wiley-Liss, Inc.
Roadmap for efficient parallelization of breast anatomy simulation
NASA Astrophysics Data System (ADS)
Chui, Joseph H.; Pokrajac, David D.; Maidment, Andrew D. A.; Bakic, Predrag R.
2012-03-01
A roadmap has been proposed to optimize the simulation of breast anatomy by parallel implementation, in order to reduce the time needed to generate software breast phantoms. The rapid generation of high resolution phantoms is needed to support virtual clinical trials of breast imaging systems. We have recently developed an octree-based recursive partitioning algorithm for breast anatomy simulation. The algorithm has good asymptotic complexity; however, its current MATLAB implementation cannot provide optimal execution times. The proposed roadmap for efficient parallelization includes the following steps: (i) migrate the current code to a C/C++ platform and optimize it for single-threaded implementation; (ii) modify the code to allow for multi-threaded CPU implementation; (iii) identify and migrate the code to a platform designed for multithreaded GPU implementation. In this paper, we describe our results in optimizing the C/C++ code for single-threaded and multi-threaded CPU implementations. As the first step of the proposed roadmap we have identified a bottleneck component in the MATLAB implementation using MATLAB's profiling tool, and created a single threaded CPU implementation of the algorithm using C/C++'s overloaded operators and standard template library. The C/C++ implementation has been compared to the MATLAB version in terms of accuracy and simulation time. A 520-fold reduction of the execution time was observed in a test of phantoms with 50- 400 μm voxels. In addition, we have identified several places in the code which will be modified to allow for the next roadmap milestone of the multithreaded CPU implementation.
Massively Parallel Simulations of Diffusion in Dense Polymeric Structures
Faulon, Jean-Loup, Wilcox, R.T. , Hobbs, J.D. , Ford, D.M.
1997-11-01
An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases are studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon(1840 processors). Compared to the current state-of-the-art equilibration methods this new technique appears to be faster by some orders of magnitude.The main advantage of the technique is that one can circumvent the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in the center of a regular tetrahedron and that regular tetrahedrons can be used to mesh the three-dimensional space. Thus, the problem of polymer equilibration described by continuous equations in molecular dynamics is reduced to a discrete problem where solutions are approximated by simple algorithms. Practical modeling applications include the constructing of butyl rubber and ethylene-propylene-dimer-monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally on both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. In order to better understand the diffusion phenomena, 10, 000 atoms models were generated and equilibrated for butyl and EPDM. The models were submitted to a massively parallel molecular dynamics simulation to monitor the trajectories of the diffusing species.
Parallel electromagnetic simulator based on the Finite-Difference Time Domain method
NASA Astrophysics Data System (ADS)
Walendziuk, Wojciech
2006-03-01
In the following paper the parallel tool for electromagnetic field distribution analysis is presented. The main simulation programme is based on the parallel algorithm of the Finite-Difference Time-Domain method and use Message Passing Interface as a communication library. In the paper also ways of communications among computation nodes in a parallel environment and efficiency of the parallel algorithm are presented.
Parallel implementation of a particle simulation for modeling rarefied gas dynamic flow
NASA Technical Reports Server (NTRS)
Fallavollita, M. A.; Mcdonald, J. D.; Baganoff, D.
1992-01-01
When the conditions of flow are rarefied and hypersonic, a more suitable alternative to the use of the Navier-Stokes equations for developing a numerical solution is the Direct Simulation Monte Carlo method (DSMC), a method of simulation which employs a large number of particles in modeling a rarefied gas. The performance of a parallel DSMC code developed for the Intel iPSC/860 Touchstone Gamma prototype computer is studied and the scaleup is found to be very nearly over the range of 16-128 processors.
Parallel grid library for rapid and flexible simulation development
NASA Astrophysics Data System (ADS)
Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.
2013-04-01
We present an easy to use and flexible grid library for developing highly scalable parallel simulations. The distributed cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously allowing one to overlap computation and communication. This enables excellent scalability at least up to 32 k cores in magnetohydrodynamic tests depending on the problem and hardware. In the version of dccrg presented here part of the mesh metadata is replicated between MPI processes reducing the scalability of adaptive mesh refinement (AMR) to between 200 and 600 processes. Dccrg is free software that anyone can use, study and modify and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg. Catalogue identifier: AEOM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: GNU Lesser General Public License version 3 No. of lines in distributed program, including test data, etc.: 54975 No. of bytes in distributed program, including test data, etc.: 974015 Distribution format: tar.gz Programming language: C++. Computer: PC, cluster, supercomputer. Operating system: POSIX. The code has been parallelized using MPI and tested with 1-32768 processes RAM: 10 MB-10 GB per process Classification: 4.12, 4.14, 6.5, 19.3, 19.10, 20. External routines: MPI-2 [1], boost [2], Zoltan [3], sfc++ [4] Nature of problem: Grid library supporting arbitrary data in grid cells, parallel adaptive mesh refinement, transparent remote neighbor data updates and
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor
NASA Technical Reports Server (NTRS)
Rao, Hariprasad Nannapaneni
1989-01-01
The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
Massive parallel 3D PIC simulation of negative ion extraction
NASA Astrophysics Data System (ADS)
Revel, Adrien; Mochalskyy, Serhiy; Montellano, Ivar Mauricio; Wünderlich, Dirk; Fantz, Ursel; Minea, Tiberiu
2017-09-01
The 3D PIC-MCC code ONIX is dedicated to modeling Negative hydrogen/deuterium Ion (NI) extraction and co-extraction of electrons from radio-frequency driven, low pressure plasma sources. It provides valuable insight on the complex phenomena involved in the extraction process. In previous calculations, a mesh size larger than the Debye length was used, implying numerical electron heating. Important steps have been achieved in terms of computation performance and parallelization efficiency allowing successful massive parallel calculations (4096 cores), imperative to resolve the Debye length. In addition, the numerical algorithms have been improved in terms of grid treatment, i.e., the electric field near the complex geometry boundaries (plasma grid) is calculated more accurately. The revised model preserves the full 3D treatment, but can take advantage of a highly refined mesh. ONIX was used to investigate the role of the mesh size, the re-injection scheme for lost particles (extracted or wall absorbed), and the electron thermalization process on the calculated extracted current and plasma characteristics. It is demonstrated that all numerical schemes give the same NI current distribution for extracted ions. Concerning the electrons, the pair-injection technique is found well-adapted to simulate the sheath in front of the plasma grid.
Efficient solid state NMR powder simulations using SMP and MPP parallel computation
NASA Astrophysics Data System (ADS)
Kristensen, Jørgen Holm; Farnan, Ian
2003-04-01
Methods for parallel simulation of solid state NMR powder spectra are presented for both shared and distributed memory parallel supercomputers. For shared memory architectures the performance of simulation programs implementing the OpenMP application programming interface is evaluated. It is demonstrated that the design of correct and efficient shared memory parallel programs is difficult as the performance depends on data locality and cache memory effects. The distributed memory parallel programming model is examined for simulation programs using the MPI message passing interface. The results reveal that both shared and distributed memory parallel computation are very efficient with an almost perfect application speedup and may be applied to the most advanced powder simulations.
Large parallel cosmic string simulations: New results on loop production
Blanco-Pillado, Jose J.; Olum, Ken D.; Shlaer, Benjamin
2011-04-15
Using a new parallel computing technique, we have run the largest cosmic string simulations ever performed. Our results confirm the existence of a long transient period where a nonscaling distribution of small loops is produced at lengths depending on the initial correlation scale. As time passes, this initial population gives way to the true scaling regime, where loops of size approximately equal to one-twentieth the horizon distance become a significant component. We observe similar behavior in matter and radiation eras, as well as in flat space. In the matter era, the scaling population of large loops becomes the dominant component; we expect this to eventually happen in the other eras as well.
Parallel multiphase field simulations with OpenPhase
NASA Astrophysics Data System (ADS)
Tegeler, Marvin; Shchyglo, Oleg; Kamachali, Reza Darvishi; Monas, Alexander; Steinbach, Ingo; Sutmann, Godehard
2017-06-01
The open-source software project OpenPhase allows the three-dimensional simulation of microstructural evolution using the multiphase field method. The core modules of OpenPhase and their implementation as well as their parallelization for a distributed-memory setting are presented. Especially communication and load-balancing strategies are discussed. Synchronization points are avoided by an increased halo-size, i.e. additional layers of ghost cells, which allow multiple stencil operations without data exchange. Load-balancing is considered via graph-partitioning and sub-domain decomposition. Results are presented for performance benchmarks as well as for a variety of applications, e.g. grain growth in polycrystalline materials, including a large number of phase fields as well as Mg-Al alloy solidification.
NASA Astrophysics Data System (ADS)
Sharma, Anupam; Long, Lyle N.
2004-10-01
A particle approach using the Direct Simulation Monte Carlo (DSMC) method is used to solve the problem of blast impact with structures. A novel approach to model the solid boundary condition for particle methods is presented. The solver is validated against an analytical solution of the Riemann shocktube problem and against experiments on interaction of a planar shock with a square cavity. Blast impact simulations are performed for two model shapes, a box and an I-shaped beam, assuming that the solid body does not deform. The solver uses domain decomposition technique to run in parallel. The parallel performance of the solver on two Beowulf clusters is also presented.
GROMACS 4.5: A high-throughput and highly parallel open source molecular simulation toolkit
Pronk, Sander; Pall, Szilard; Schulz, Roland; Larsson, Per; Bjelkmar, Par; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik
2013-02-13
In this study, molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. As a result, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations.
Parallel continuous simulated tempering and its applications in large-scale molecular simulations
Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng
2014-07-28
In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopts the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in PCST method, the size of the system does not dramatically affect the number of copy needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested in two-dimensional Ising model, Lennard-Jones liquid and all-atom folding simulation of a small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transition and dynamics of macromolecules in explicit solvent.
Parallel distributed, reciprocal Monte Carlo radiation in coupled, large eddy combustion simulations
NASA Astrophysics Data System (ADS)
Hunsaker, Isaac L.
Radiation is the dominant mode of heat transfer in high temperature combustion environments. Radiative heat transfer affects the gas and particle phases, including all the associated combustion chemistry. The radiative properties are in turn affected by the turbulent flow field. This bi-directional coupling of radiation turbulence interactions poses a major challenge in creating parallel-capable, high-fidelity combustion simulations. In this work, a new model was developed in which reciprocal monte carlo radiation was coupled with a turbulent, large-eddy simulation combustion model. A technique wherein domain patches are stitched together was implemented to allow for scalable parallelism. The combustion model runs in parallel on a decomposed domain. The radiation model runs in parallel on a recomposed domain. The recomposed domain is stored on each processor after information sharing of the decomposed domain is handled via the message passing interface. Verification and validation testing of the new radiation model were favorable. Strong scaling analyses were performed on the Ember cluster and the Titan cluster for the CPU-radiation model and GPU-radiation model, respectively. The model demonstrated strong scaling to over 1,700 and 16,000 processing cores on Ember and Titan, respectively.
Influence of equilibrium shear flow in the parallel magnetic direction on edge localized mode crash
Luo, Y.; Xiong, Y. Y.; Chen, S. Y.; Huang, J.; Tang, C. J.
2016-04-15
The influence of the parallel shear flow on the evolution of peeling-ballooning (P-B) modes is studied with the BOUT++ four-field code in this paper. The parallel shear flow has different effects in linear simulation and nonlinear simulation. In the linear simulations, the growth rate of edge localized mode (ELM) can be increased by Kelvin-Helmholtz term, which can be caused by the parallel shear flow. In the nonlinear simulations, the results accord with the linear simulations in the linear phase. However, the ELM size is reduced by the parallel shear flow in the beginning of the turbulence phase, which is recognized as the P-B filaments' structure. Then during the turbulence phase, the ELM size is decreased by the shear flow.
A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows
NASA Technical Reports Server (NTRS)
Bui, Trong T.
1999-01-01
A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
Parallel cluster labeling for large-scale Monte Carlo simulations
NASA Astrophysics Data System (ADS)
Flanigan, M.; Tamayo, P.
1995-02-01
We present an optimized version of a cluster labeling algorithm previously introduced by the authors. This algorithm is well suited for large-scale Monte Carlo simulations of spin models using cluster dynamics on parallel computers with large numbers of processors. The algorithm divides physical space into rectangular cells which are assigned to processors and combines a serial local labeling procedure with a relaxation process across nearest-neighbor processors. By controlling overhead and reducing inter-processor communication this method attains good computational speed-up and efficiency. Large systems of up to 65536 2 spins have been simulated at updating speeds of 11 nanosecs/site (90.7 × 10 6 spin updates/sec) using state-of-the-art supercomputers. In the second part of the article we use the cluster algorithm to study the relaxation of magnetization and energy on large Ising models using Swendsen-Wang dynamics. We found evidence that exponential and power law factors are present in the relaxation process as has been proposed by Hackl et al. The variation of the power-law exponent λM taken at face value indicates that the value of ZM falls in the interval 0.31-0.49 for the time interval analysed and appears to vanish asymptotically.
A massively parallel solution strategy for efficient thermal radiation simulation
NASA Astrophysics Data System (ADS)
Nguyen, P. D.; Moureau, V.; Vervisch, L.; Perret, N.
2012-06-01
A novel and efficient methodology to solve the Radiative Transfer Equations (RTE) in thermal radiation is discussed. The BiCGStab(2) iterative solution method, as designed for the non-symmetric linear equation systems, is used to solve the discretized RTE. The numerical upwind and central schemes are blended to provide a stable numerical scheme (MUCS) for interpolation of the cell facial radiation intensities in finite volume formulation. The combination of the BiCGStab(2) and MUCS methods proved to be very efficient when coupling with the DOM approach to solve the RTE. A cost-effective tabulation technique for the gaseous radiative property model SNB-FSCK using 7-point Gauss-Labatto quadrature scheme is also introduced. The whole methodology is implemented into a massively parallel unstructured CFD code where the radiative and fluid flow solutions share the same domain decomposition, which is the bottleneck in current radiative solvers. The dual mesh decomposition at the cell groups level and processors level is adopted to optimize the CFD code for massively parallel computing. The whole method is applied to simulate the radiation heat-transfer in a 3D rectangular enclosure containing non-isothermal CO2 and H2O mixtures. Two test cases are studied for homogeneous and inhomogeneous distributions of CO2 and H2O in the enclosure. The result is reported for the heat flux and radiation energy source and the comparison is also made between the present methodology BiCGStab(2)/MUCS/tabulated SNB-FSCK, the benchmark method SNB-CK (implemented at 25cm-1 narrow-band) and some other methods available in the literature. The present method (BiCGStab(2)/MUCS/tabulated SNB-FSCK) yields more accurate predictions particularly for the radiation source term. When comparing with the benchmark solution, the relative error of the radiation source term is remarkably reduced to less than 4% and the CPU time is drastically diminished.
Xyce Parallel Electronic Simulator Reference Guide Version 6.4
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory
2015-12-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce . This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1] . Trademarks The information herein is subject to change without notice. Copyright c 2002-2015 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright c 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce 's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)
NASA Astrophysics Data System (ADS)
Mosaddeghi, Hamid; Alavi, Saman; Kowsari, M. H.; Najafi, Bijan
2012-11-01
We use molecular dynamics simulations to study the structure, dynamics, and transport properties of nano-confined water between parallel graphite plates with separation distances (H) from 7 to 20 Å at different water densities with an emphasis on anisotropies generated by confinement. The behavior of the confined water phase is compared to non-confined bulk water under similar pressure and temperature conditions. Our simulations show anisotropic structure and dynamics of the confined water phase in directions parallel and perpendicular to the graphite plate. The magnitude of these anisotropies depends on the slit width H. Confined water shows "solid-like" structure and slow dynamics for the water layers near the plates. The mean square displacements (MSDs) and velocity autocorrelation functions (VACFs) for directions parallel and perpendicular to the graphite plates are calculated. By increasing the confinement distance from H = 7 Å to H = 20 Å, the MSD increases and the behavior of the VACF indicates that the confined water changes from solid-like to liquid-like dynamics. If the initial density of the water phase is set up using geometric criteria (i.e., distance between the graphite plates), large pressures (in the order of ˜10 katm), and large pressure anisotropies are established within the water. By decreasing the density of the water between the confined plates to about 0.9 g cm-3, bubble formation and restructuring of the water layers are observed.
Li, Qiyue; Qu, Xiaobo; Liu, Yunsong; Guo, Di; Lai, Zongying; Ye, Jing; Chen, Zhong
2015-06-01
Compressed sensing MRI (CS-MRI) is a promising technology to accelerate magnetic resonance imaging. Both improving the image quality and reducing the computation time are important for this technology. Recently, a patch-based directional wavelet (PBDW) has been applied in CS-MRI to improve edge reconstruction. However, this method is time consuming since it involves extensive computations, including geometric direction estimation and numerous iterations of wavelet transform. To accelerate computations of PBDW, we propose a general parallelization of patch-based processing by taking the advantage of multicore processors. Additionally, two pertinent optimizations, excluding smooth patches and pre-arranged insertion sort, that make use of sparsity in MR images are also proposed. Simulation results demonstrate that the acceleration factor with the parallel architecture of PBDW approaches the number of central processing unit cores, and that pertinent optimizations are also effective to make further accelerations. The proposed approaches allow compressed sensing MRI reconstruction to be accomplished within several seconds. Copyright © 2015 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Furuichi, M.; Nishiura, D.
2015-12-01
Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and Discrete Element Method (DEM) have been widely used to solve the continuum and particles motions in the computational geodynamics field. These mesh-free methods are suitable for the problems with the complex geometry and boundary. In addition, their Lagrangian nature allows non-diffusive advection useful for tracking history dependent properties (e.g. rheology) of the material. These potential advantages over the mesh-based methods offer effective numerical applications to the geophysical flow and tectonic processes, which are for example, tsunami with free surface and floating body, magma intrusion with fracture of rock, and shear zone pattern generation of granular deformation. In order to investigate such geodynamical problems with the particle based methods, over millions to billion particles are required for the realistic simulation. Parallel computing is therefore important for handling such huge computational cost. An efficient parallel implementation of SPH and DEM methods is however known to be difficult especially for the distributed-memory architecture. Lagrangian methods inherently show workload imbalance problem for parallelization with the fixed domain in space, because particles move around and workloads change during the simulation. Therefore dynamic load balance is key technique to perform the large scale SPH and DEM simulation. In this work, we present the parallel implementation technique of SPH and DEM method utilizing dynamic load balancing algorithms toward the high resolution simulation over large domain using the massively parallel super computer system. Our method utilizes the imbalances of the executed time of each MPI process as the nonlinear term of parallel domain decomposition and minimizes them with the Newton like iteration method. In order to perform flexible domain decomposition in space, the slice-grid algorithm is used. Numerical tests show that our
Direct stereo radargrammetric processing using massively parallel processing
NASA Astrophysics Data System (ADS)
Balz, Timo; Zhang, Lu; Liao, Mingsheng
2013-05-01
Synthetic Aperture Radar (SAR) offers many ways to reconstruct digital surface models (DSMs). The two most commonly used methods are SAR interferometry (InSAR) and stereo radargrammetry. Stereo radargrammetry is a very stable and reliable process and is far less affected by temporal decorrelation compared with InSAR. It is therefore often used for DSM generation in heavily vegetated areas. However, stereo radargrammetry often produces rather noisy DSMs, sometimes containing large outliers. In this manuscript, we present a new approach for stereo radargrammetric processing, where the homologous points between the images are found by geocoding large amount of points. This offers a very flexible approach, allowing the simultaneous processing of multiple images and of cross-heading image pairs. Our approach relies on a good initial geocoding accuracy of the data and on very fast processing using a massively parallel implementation. The approach is demonstrated using TerraSAR-X images from Mount Song, China, and from Trento, Italy.
NASA Astrophysics Data System (ADS)
Zehner, Björn; Hellwig, Olaf; Linke, Maik; Görz, Ines; Buske, Stefan
2016-01-01
3D geological underground models are often presented by vector data, such as triangulated networks representing boundaries of geological bodies and geological structures. Since models are to be used for numerical simulations based on the finite difference method, they have to be converted into a representation discretizing the full volume of the model into hexahedral cells. Often the simulations require a high grid resolution and are done using parallel computing. The storage of such a high-resolution raster model would require a large amount of storage space and it is difficult to create such a model using the standard geomodelling packages. Since the raster representation is only required for the calculation, but not for the geometry description, we present an algorithm and concept for rasterizing geological models on the fly for the use in finite difference codes that are parallelized by domain decomposition. As a proof of concept we implemented a rasterizer library and integrated it into seismic simulation software that is run as parallel code on a UNIX cluster using the Message Passing Interface. We can thus run the simulation with realistic and complicated surface-based geological models that are created using 3D geomodelling software, instead of using a simplified representation of the geological subsurface using mathematical functions or geometric primitives. We tested this set-up using an example model that we provide along with the implemented library.
NASA Technical Reports Server (NTRS)
Hsieh, Shang-Hsien
1993-01-01
The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
Contact-impact simulations on massively parallel SIMD supercomputers
Plaskacz, E.J. ); Belytscko, T.; Chiang, H.Y. )
1992-01-01
The implementation of explicit finite element methods with contact-impact on massively parallel SIMD computers is described. The basic parallel finite element algorithm employs an exchange process which minimizes interprocessor communication at the expense of redundant computations and storage. The contact-impact algorithm is based on the pinball method in which compatibility is enforced by preventing interpenetration on spheres embedded in elements adjacent to surfaces. The enhancements to the pinball algorithm include a parallel assembled surface normal algorithm and a parallel detection of interpenetrating pairs. Some timings with and without contact-impact are given.
Direct simulation of compressible reacting flows
NASA Technical Reports Server (NTRS)
Poinsot, Thierry J.
1989-01-01
A research program for direct numerical simulations of compressible reacting flows is described. Two main research subjects are proposed: the effect of pressure waves on turbulent combustion and the use of direct simulation methods to validate flamelet models for turbulent combustion. The interest of a compressible code to study turbulent combustion is emphasized through examples of reacting shear layer and combustion instabilities studies. The choice of experimental data to compare with direct simulation results is discussed. A tentative program is given and the computation cases to use are described as well as the code validation runs.
Xyce Parallel Electronic Simulator Reference Guide Version 6.6.
Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason
2016-11-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce . This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1] . The information herein is subject to change without notice. Copyright c 2002-2016 Sandia Corporation. All rights reserved. Acknowledgements The BSIM Group at the University of California, Berkeley developed the BSIM3, BSIM4, BSIM6, BSIM-CMG and BSIM-SOI models. The BSIM3 is Copyright c 1999, Regents of the University of California. The BSIM4 is Copyright c 2006, Regents of the University of California. The BSIM6 is Copyright c 2015, Regents of the University of California. The BSIM-CMG is Copyright c 2012 and 2016, Regents of the University of California. The BSIM-SOI is Copyright c 1990, Regents of the University of California. All rights reserved. The Mextram model has been developed by NXP Semiconductors until 2007, Delft University of Technology from 2007 to 2014, and Auburn University since April 2015. Copyrights c of Mextram are with Delft University of Technology, NXP Semiconductors and Auburn University. The MIT VS Model Research Group developed the MIT Virtual Source (MVS) model. Copyright c 2013 Massachusetts Institute of Technology (MIT). The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. Trademarks Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are
NASA Astrophysics Data System (ADS)
Sahni, Onkar; Jansen, Kenneth; Shephard, Mark; Taylor, Charles
2007-11-01
Flow within the healthy human vascular system is typically laminar but diseased conditions can alter the geometry sufficiently to produce transitional/turbulent flows in regions focal (and immediately downstream) of the diseased section. The mean unsteadiness (pulsatile or respiratory cycle) further complicates the situation making traditional turbulence simulation techniques (e.g., Reynolds-averaged Navier-Stokes simulations (RANSS)) suspect. At the other extreme, direct numerical simulation (DNS) while fully appropriate can lead to large computational expense, particularly when the simulations must be done quickly since they are intended to affect the outcome of a medical treatment (e.g., virtual surgical planning). To produce simulations in a clinically relevant time frame requires; 1) adaptive meshing technique that closely matches the desired local mesh resolution in all three directions to the highly anisotropic physical length scales in the flow, 2) efficient solution algorithms, and 3) excellent scaling on massively parallel computers. In this presentation we will demonstrate results for a subject-specific simulation of an abdominal aortic aneurysm using stabilized finite element method on anisotropically adapted meshes consisting of O(10^8) elements over O(10^4) processors.
NASA Astrophysics Data System (ADS)
Thummala, P.; Zhang, Z.; Andersen, M. A. E.; Rahimullah, S.
2014-03-01
Dielectric electroactive polymer (DEAP) actuators are capacitive devices which provide mechanical motions when charged electrically. The charging characteristics of a DEAP actuator depends on its size, voltage applied to its electrodes, and its operating frequency. The main idea of this paper is to design and implement driving circuits for the DEAP actuators for their use in various applications. This paper presents implementation of parallel input, parallel output, high voltage (~2.5 kV) bi-directional DC-DC converters for driving the DEAP actuators. The topology is a bidirectional flyback DC-DC converter incorporating commercially available high voltage MOSFETs (4 kV) and high voltage diodes (5 kV). Although the average current of the aforementioned devices is limited to 300 mA and 150 mA, respectively, connecting the outputs of multiple converters in parallel can provide a scalable design. This enables operating the DEAP actuators in various static and dynamic applications e.g. positioning, vibration generation or damping, and pumps. The proposed idea is experimentally verified by connecting three high voltage converters in parallel to operate a single DEAP actuator. The experimental results with both film capacitive load and the DEAP actuator are shown for a maximum charging voltage of 2 kV.
On the suitability of the connection machine for direct particle simulation
NASA Technical Reports Server (NTRS)
Dagum, Leonard
1990-01-01
The algorithmic structure was examined of the vectorizable Stanford particle simulation (SPS) method and the structure is reformulated in data parallel form. Some of the SPS algorithms can be directly translated to data parallel, but several of the vectorizable algorithms have no direct data parallel equivalent. This requires the development of new, strictly data parallel algorithms. In particular, a new sorting algorithm is developed to identify collision candidates in the simulation and a master/slave algorithm is developed to minimize communication cost in large table look up. Validation of the method is undertaken through test calculations for thermal relaxation of a gas, shock wave profiles, and shock reflection from a stationary wall. A qualitative measure is provided of the performance of the Connection Machine for direct particle simulation. The massively parallel architecture of the Connection Machine is found quite suitable for this type of calculation. However, there are difficulties in taking full advantage of this architecture because of lack of a broad based tradition of data parallel programming. An important outcome of this work has been new data parallel algorithms specifically of use for direct particle simulation but which also expand the data parallel diction.
Lattice Boltzmann modeling of directional wetting: Comparing simulations to experiments
NASA Astrophysics Data System (ADS)
Jansen, H. Patrick; Sotthewes, Kai; van Swigchem, Jeroen; Zandvliet, Harold J. W.; Kooij, E. Stefan
2013-07-01
Lattice Boltzmann Modeling (LBM) simulations were performed on the dynamic behavior of liquid droplets on chemically striped patterned surfaces, ultimately with the aim to develop a predictive tool enabling reliable design of future experiments. The simulations accurately mimic experimental results, which have shown that water droplets on such surfaces adopt an elongated shape due to anisotropic preferential spreading. Details of the contact line motion such as advancing of the contact line in the direction perpendicular to the stripes exhibit pronounced similarities in experiments and simulations. The opposite of spreading, i.e., evaporation of water droplets, leads to a characteristic receding motion first in the direction parallel to the stripes, while the contact line remains pinned perpendicular to the stripes. Only when the aspect ratio is close to unity, the contact line also starts to recede in the perpendicular direction. Very similar behavior was observed in the LBM simulations. Finally, droplet movement can be induced by a gradient in surface wettability. LBM simulations show good semiquantitative agreement with experimental results of decanol droplets on a well-defined striped gradient, which move from high- to low-contact angle surfaces. Similarities and differences for all systems are described and discussed in terms of the predictive capabilities of LBM simulations to model direction wetting.
Lattice Boltzmann modeling of directional wetting: comparing simulations to experiments.
Jansen, H Patrick; Sotthewes, Kai; van Swigchem, Jeroen; Zandvliet, Harold J W; Kooij, E Stefan
2013-07-01
Lattice Boltzmann Modeling (LBM) simulations were performed on the dynamic behavior of liquid droplets on chemically striped patterned surfaces, ultimately with the aim to develop a predictive tool enabling reliable design of future experiments. The simulations accurately mimic experimental results, which have shown that water droplets on such surfaces adopt an elongated shape due to anisotropic preferential spreading. Details of the contact line motion such as advancing of the contact line in the direction perpendicular to the stripes exhibit pronounced similarities in experiments and simulations. The opposite of spreading, i.e., evaporation of water droplets, leads to a characteristic receding motion first in the direction parallel to the stripes, while the contact line remains pinned perpendicular to the stripes. Only when the aspect ratio is close to unity, the contact line also starts to recede in the perpendicular direction. Very similar behavior was observed in the LBM simulations. Finally, droplet movement can be induced by a gradient in surface wettability. LBM simulations show good semiquantitative agreement with experimental results of decanol droplets on a well-defined striped gradient, which move from high- to low-contact angle surfaces. Similarities and differences for all systems are described and discussed in terms of the predictive capabilities of LBM simulations to model direction wetting.
Fernandez-Delgado, Manuel; Ribeiro, Jorge; Cernadas, Eva; Ameneiro, Senén Barro
2011-11-01
Parallel perceptrons (PPs) are very simple and efficient committee machines (a single layer of perceptrons with threshold activation functions and binary outputs, and a majority voting decision scheme), which nevertheless behave as universal approximators. The parallel delta (P-Delta) rule is an effective training algorithm, which, following the ideas of statistical learning theory used by the support vector machine (SVM), raises its generalization ability by maximizing the difference between the perceptron activations for the training patterns and the activation threshold (which corresponds to the separating hyperplane). In this paper, we propose an analytical closed-form expression to calculate the PPs' weights for classification tasks. Our method, called Direct Parallel Perceptrons (DPPs), directly calculates (without iterations) the weights using the training patterns and their desired outputs, without any search or numeric function optimization. The calculated weights globally minimize an error function which simultaneously takes into account the training error and the classification margin. Given its analytical and noniterative nature, DPPs are computationally much more efficient than other related approaches (P-Delta and SVM), and its computational complexity is linear in the input dimensionality. Therefore, DPPs are very appealing, in terms of time complexity and memory consumption, and are very easy to use for high-dimensional classification tasks. On real benchmark datasets with two and multiple classes, DPPs are competitive with SVM and other approaches but they also allow online learning and, as opposed to most of them, have no tunable parameters.
High-performance parallel sparse-direct triangular solves (Invited)
NASA Astrophysics Data System (ADS)
Poulson, J.; Ying, L.
2013-12-01
Geophysical inverse problems are increasingly posed in the frequency domain in a manner which requires solving many challenging heterogeneous 3D Helmholtz or linear elastic wave equations at each iteration. One effective means of solving such problems, at least when there is no large-scale internal resonance, is to use moving-PML "sweeping preconditioners". Each application of the sweeping preconditioner involves performing many modest-sized sparse-direct triangular solves -- unfortunately, one at a time. While P. et al. have shown that, with a careful implementation of a distributed sparse-direct solver [1,2], challenging 3D problems approaching a billion degrees of freedom can be solved in a few minutes using less than 10,000 cores, this talk discusses how to leverage the existence of many right-hand sides in order to increase the performance of the preconditioner applications by orders of magnitude. [1] http://github.com/poulson/Clique [2] http://github.com/poulson/PSP
Parallel Simulation of Subsonic Fluid Dynamics on a Cluster of Workstations.
1994-11-01
inside wind musical instruments. Typical simulations achieve $80\\%$ parallel efficiency (speedup/processors) using 20 HP-Apollo workstations. Detailed...TERMS AI, MIT, Artificial Intelligence, Distributed Computing, Workstation Cluster, Network, Fluid Dynamics, Musical Instruments 17. SECURITY...for example, the flow of air inside wind musical instruments. Typical simulations achieve 80% parallel efficiency (speedup/processors) using 20 HP
Direct simulation of rotational and vibrational nonequilibrium
NASA Technical Reports Server (NTRS)
Boyd, Iain D.
1989-01-01
The ways in which energy transfer is calculated in the Direct Simulation Monte Carlo method are presented. An energy exchange model that deals with translational and rotational modes is described. A model for simulating the transfer of energy between the translational and vibrational modes is presented as well.
Blocksome, Michael A.; Mamidala, Amith R.
2015-07-14
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Blocksome, Michael A.; Mamidala, Amith R.
2015-07-07
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
A two-level parallel direct search implementation for arbitrarily sized objective functions
Hutchinson, S.A.; Shadid, N.; Moffat, H.K.
1994-12-31
In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p{sub 2}) of processors. If, p, the number of processors available, is greater than or equal to 2p{sub 2} then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency and then further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for a nCUBE 2 implementation.
Short-term gas dispersion in idealised urban canopy in street parallel with flow direction
NASA Astrophysics Data System (ADS)
Chaloupecká, Hana; Jaňour, Zbyněk; Nosek, Štěpán
2016-03-01
Chemical attacks (e.g. Syria 2014-15 chlorine, 2013 sarine or Iraq 2006-7 chlorine) as well as chemical plant disasters (e.g. Spain 2015 nitric oxide, ferric chloride; Texas 2014 methyl mercaptan) threaten mankind. In these crisis situations, gas clouds are released. Dispersion of gas clouds is the issue of interest investigated in this paper. The paper describes wind tunnel experiments of dispersion from ground level point gas source. The source is situated in a model of an idealised urban canopy. The short duration releases of passive contaminant ethane are created by an electromagnetic valve. The gas cloud concentrations are measured in individual places at the height of the human breathing zone within a street parallel with flow direction by Fast-response Ionisation Detector. The simulations of the gas release for each measurement position are repeated many times under the same experimental set up to obtain representative datasets. These datasets are analysed to compute puff characteristics (arrival, leaving time and duration). The results indicate that the mean value of the dimensionless arrival time can be described as a growing linear function of the dimensionless coordinate in the street parallel with flow direction where the gas source is situated. The same might be stated about the dimensionless leaving time as well as the dimensionless duration, however these fits are worse. Utilising a linear function, we might also estimate some other statistical characteristics from datasets than the datasets means (medians, trimeans). The datasets of the dimensionless arrival time, the dimensionless leaving time and the dimensionless duration can be fitted by the generalized extreme value distribution (GEV) in all sampling positions except one.
NASA Astrophysics Data System (ADS)
Zhang, Shanghong; Yuan, Rui; Wu, Yu; Yi, Yujun
2016-01-01
The Open Accelerator (OpenACC) application programming interface is a relatively new parallel computing standard. In this paper, particle-based flow field simulations are examined as a case study of OpenACC parallel computation. The parallel conversion process of the OpenACC standard is explained, and further, the performance of the flow field parallel model is analysed using different directive configurations and grid schemes. With careful implementation and optimisation of the data transportation in the parallel algorithm, a speedup factor of 18.26× is possible. In contrast, a speedup factor of just 11.77× was achieved with the conventional Open Multi-Processing (OpenMP) parallel mode on a 20-kernel computer. These results demonstrate that optimised feature settings greatly influence the degree of speedup, and models involving larger numbers of calculations exhibit greater efficiency and higher speedup factors. In addition, the OpenACC parallel mode is found to have good portability, making it easy to implement parallel computation from the original serial model.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
Jin, Shuangshuang; Chen, Yousu; Wu, Di; Diao, Ruisheng; Huang, Zhenyu
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbance, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computational intensive and challenging to solve using single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The difference of the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
A sweep algorithm for massively parallel simulation of circuit-switched networks
NASA Technical Reports Server (NTRS)
Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.
1992-01-01
A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.
ANNarchy: a code generation approach to neural simulations on parallel hardware.
Vitay, Julien; Dinkelbach, Helge Ü; Hamker, Fred H
2015-01-01
Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into an efficient C++code. We compare the parallel performance of the simulator to existing solutions.
ANNarchy: a code generation approach to neural simulations on parallel hardware
Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.
2015-01-01
Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into an efficient C++code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957
Understanding transition and turbulence through direct simulations
NASA Technical Reports Server (NTRS)
Spalart, P. R.; Kim, J. J.
1989-01-01
Direct simulations consist in solving the full Navier-Stokes equations, without any turbulence model, and describing all the detailed features of the flow. Usually the flows are three-dimensional and time-dependent and contain both coarse and fine structures, which makes the numerical task very challenging in terms of both the algorithm and the computational effort. Most of the work until now has involved spectral methods, which are highly accurate but not very flexible in terms of geometry or complex equations. For that reason, future work will also rely on high-order finite-difference or other methods. Direct simulations complement experimental work, and both contribute to the theory and the empirical knowledge of turbulence. Once such a simulation has been shown to be accurate, the flow field is completely known in three dimensions and time, including the pressure, the vorticity and any other quantity. On the other hand, most simulations to date solved the incompressible equations in rather simple geometries, and direct simulations will always be limited to moderate Reynolds numbers. Extensive simulations have been conducted in homogeneous turbulence, channel flows, boundary layers, and mixing layers. Much effort is devoted to addressing flows with compressibility and chemical reactions, and to new geometries such as a backward-facing step.
Pesce, Lorenzo L.; Lee, Hyong C.; Stevens, Rick L.
2013-01-01
Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers. PMID:24416069
Parallel computing simulation of electrical excitation and conduction in the 3D human heart.
Di Yu; Dongping Du; Hui Yang; Yicheng Tu
2014-01-01
A correctly beating heart is important to ensure adequate circulation of blood throughout the body. Normal heart rhythm is produced by the orchestrated conduction of electrical signals throughout the heart. Cardiac electrical activity is the resulted function of a series of complex biochemical-mechanical reactions, which involves transportation and bio-distribution of ionic flows through a variety of biological ion channels. Cardiac arrhythmias are caused by the direct alteration of ion channel activity that results in changes in the AP waveform. In this work, we developed a whole-heart simulation model with the use of massive parallel computing with GPGPU and OpenGL. The simulation algorithm was implemented under several different versions for the purpose of comparisons, including one conventional CPU version and two GPU versions based on Nvidia CUDA platform. OpenGL was utilized for the visualization / interaction platform because it is open source, light weight and universally supported by various operating systems. The experimental results show that the GPU-based simulation outperforms the conventional CPU-based approach and significantly improves the speed of simulation. By adopting modern computer architecture, this present investigation enables real-time simulation and visualization of electrical excitation and conduction in the large and complicated 3D geometry of a real-world human heart.
Pesce, Lorenzo L.; Lee, Hyong C.; Hereld, Mark; ...
2013-01-01
Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determinedmore » the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.« less
Comparison of serial and parallel simulations of a corridor fire using FDS
NASA Astrophysics Data System (ADS)
Valasek, L.
2015-09-01
Current fire simulators allow to model the course of fire in large areas and its impact on structure and equipment. This paper deals with a comparison of serial and parallel calculations of simulation of a corridor fire by the FDS (Fire Dynamics Simulator) system. In parallel case, the whole computational domain is divided into several computational meshes, the computation on each mesh is considered as a single MPI (Message Passing Interface) process realised on one computational core and communication between MPI processes is provided by MPI. The aim of this paper is to determine the size of error caused by parallelization of computation, which occurs at touches of computational meshes.
NASA Astrophysics Data System (ADS)
Weiss, C. J.; Schultz, A.
2011-12-01
The high computational cost of the forward solution for modeling low-frequency electromagnetic induction phenomena is one of the primary impediments against broad-scale adoption by the geoscience community of exploration techniques, such as magnetotellurics and geomagnetic depth sounding, that rely on fast and cheap forward solutions to make tractable the inverse problem. As geophysical observables, electromagnetic fields are direct indicators of Earth's electrical conductivity - a physical property independent of (but in some cases correlative with) seismic wavespeed. Electrical conductivity is known to be a function of Earth's physiochemical state and temperature, and to be especially sensitive to the presence of fluids, melts and volatiles. Hence, electromagnetic methods offer a critical and independent constraint on our understanding of Earth's interior processes. Existing methods for parallelization of time-harmonic electromagnetic simulators, as applied to geophysics, have relied heavily on a combination of strategies: coarse-grained decompositions of the model domain; and/or, a high-order functional decomposition across spectral components, which in turn can be domain-decomposed themselves. Hence, in terms of scaling, both approaches are ultimately limited by the growing communication cost as the granularity of the forward problem increases. In this presentation we examine alternate parallelization strategies based on OpenMP shared-memory parallelization and CUDA-based GPU parallelization. As a test case, we use two different numerical simulation packages, each based on a staggered Cartesian grid: FDM3D (Weiss, 2006) which solves the curl-curl equation directly in terms of the scattered electric field (available under the LGPL at www.openem.org); and APHID, the A-Phi Decomposition based on mixed vector and scalar potentials, in which the curl-curl operator is replaced operationally by the vector Laplacian. We describe progress made in modifying the code to
Robust large-scale parallel nonlinear solvers for simulations.
Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson
2005-11-01
This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write
Static and dynamic load-balancing strategies for parallel reservoir simulation
Anguille, L.; Killough, J.E.; Li, T.M.C.; Toepfer, J.L.
1995-12-31
Accurate simulation of the complex phenomena that occur in flow in porous media can tax even the most powerful serial computers. Emergence of new parallel computer architectures as a future efficient tool in reservoir simulation may overcome this difficulty. Unfortunately, major problems remain to be solved before using parallel computers commercially: production serial programs must be rewritten to be efficient in parallel environments and load balancing methods must be explored to evenly distribute the workload on each processor during the simulation. This study implements both a static load-balancing algorithm and a receiver-initiated dynamic load-sharing algorithm to achieve high parallel efficiencies on both the IBM SP2 and Intel IPSC/860 parallel computers. Significant speedup improvement was recorded for both methods. Further optimization of these algorithms yielded a technique with efficiencies as high as 90% and 70% on 8 and 32 nodes, respectively. The increased performance was the result of the minimization of message-passing overhead.
Chen, Weiliang; De Schutter, Erik
2017-01-01
Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation. PMID:28239346
Chen, Weiliang; De Schutter, Erik
2017-01-01
Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation.
Directional benefit in simulated classroom environments.
Ricketts, Todd; Galster, Jason; Tharpe, Anne Marie
2007-12-01
To examine speech recognition performance and subjective ratings for directional and omnidirectional microphone modes across a variety of simulated classroom environments. Speech recognition was measured in a group of 26 children age 10-17 years in up to 8 listening environments. Significant directional benefit was found when the sound source(s) of interest was in front, and directional decrement was measured when the sound source of interest was behind the participants. Of considerable interest is that a directional decrement was observed in the absence of directional benefit when sources of interest were both in front of and behind the participants. In addition, limiting directional processing to the low frequencies eliminated both the directional deficit and the directional advantage. Although these data support the use of directional hearing aids in some noisy school environments, they also suggest that use of the directional mode should be limited to situations in which all talkers of interest are located in the front hemisphere. These results highlight the importance of appropriate switching between microphone modes in the school-age population.
NASA Astrophysics Data System (ADS)
Galo, J. R.; Albarreal, I. I.; Calzada, M. C.; Cruz, J. L.; Fernández-Cara, E.; Marín, M.
2008-12-01
For the solution of elliptic problems, fractional step methods and in particular alternating directions (ADI) methods are iterative methods where fractional steps are sequential. Therefore, they only accept parallelization at low level. In [T. Lu, P. Neittaanmäki, X.C. Tai, A parallel splitting-up method for partial differential equations and its applications to Navier-Stokes equations, RAIRO Modél. Math. Anal. Numér. 26 (6) (1992) 673-708], Lu et al. proposed a method where the fractional steps can be performed in parallel. We can thus speak of parallel fractional step (PFS) methods and, in particular, simultaneous directions (SDI) methods. In this paper, we perform a detailed analysis of the convergence and optimization of PFS and SDI methods, complementing what was done in [T. Lu, P. Neittaanmäki, X.C. Tai, A parallel splitting-up method for partial differential equations and its applications to Navier-Stokes equations, RAIRO Modél. Math. Anal. Numér. 26 (6) (1992) 673-708]. We describe the behavior of the method and we specify the good choice of the parameters. We also study the efficiency of the parallelization. Some 2D, 3D and high-dimensional tests confirm our results.
Computer simulation program for parallel SITAN. [Sandia Inertia Terrain-Aided Navigation, in FORTRAN
Andreas, R.D.; Sheives, T.C.
1980-11-01
This computer program simulates the operation of parallel SITAN using digitized terrain data. An actual trajectory is modeled including the effects of inertial navigation errors and radar altimeter measurements.
Modelling and simulation of parallel triangular triple quantum dots (TTQD) by using SIMON 2.0
NASA Astrophysics Data System (ADS)
Fathany, Maulana Yusuf; Fuada, Syifaul; Lawu, Braham Lawas; Sulthoni, Muhammad Amin
2016-04-01
This research presents analysis of modeling on Parallel Triple Quantum Dots (TQD) by using SIMON (SIMulation Of Nano-structures). Single Electron Transistor (SET) is used as the basic concept of modeling. We design the structure of Parallel TQD by metal material with triangular geometry model, it is called by Triangular Triple Quantum Dots (TTQD). We simulate it with several scenarios using different parameters; such as different value of capacitance, various gate voltage, and different thermal condition.
Modelling and simulation of parallel triangular triple quantum dots (TTQD) by using SIMON 2.0
Fathany, Maulana Yusuf Fuada, Syifaul Lawu, Braham Lawas Sulthoni, Muhammad Amin
2016-04-19
This research presents analysis of modeling on Parallel Triple Quantum Dots (TQD) by using SIMON (SIMulation Of Nano-structures). Single Electron Transistor (SET) is used as the basic concept of modeling. We design the structure of Parallel TQD by metal material with triangular geometry model, it is called by Triangular Triple Quantum Dots (TTQD). We simulate it with several scenarios using different parameters; such as different value of capacitance, various gate voltage, and different thermal condition.
The IDES framework: A case study in development of a parallel discrete-event simulation system
Nicol, D.M.; Johnson, M.M.; Yoshimura, A.S.
1997-12-31
This tutorial describes considerations in the design and development of the IDES parallel simulation system. IDES is a Java-based parallel/distributed simulation system designed to support the study of complex large-scale enterprise systems. Using the IDES system as an example, the authors discuss how anticipated model and system constraints molded the design decisions with respect to modeling, synchronization, and communication strategies.
Parallel implementation of the FETI-DPEM algorithm for general 3D EM simulations
NASA Astrophysics Data System (ADS)
Li, Yu-Jia; Jin, Jian-Ming
2009-05-01
A parallel implementation of the electromagnetic dual-primal finite element tearing and interconnecting algorithm (FETI-DPEM) is designed for general three-dimensional (3D) electromagnetic large-scale simulations. As a domain decomposition implementation of the finite element method, the FETI-DPEM algorithm provides fully decoupled subdomain problems and an excellent numerical scalability, and thus is well suited for parallel computation. The parallel implementation of the FETI-DPEM algorithm on a distributed-memory system using the message passing interface (MPI) is discussed in detail along with a few practical guidelines obtained from numerical experiments. Numerical examples are provided to demonstrate the efficiency of the parallel implementation.
Parallel simulation of compressible flow using automatic differentiation and PETSc.
Hovland, P. D.; McInnes, L. C.; Mathematics and Computer Science
2001-03-01
Many aerospace applications require parallel implicit solution strategies and software. The use of two computational tools, the Portable, Extensible Toolkit for Scientific computing (PETSc) and ADIFOR, to implement a Newton-Krylov-Schwarz method with pseudo-transient continuation for a particular application, namely, a steady-state, fully implicit, three-dimensional compressible Euler model of flow over an M6 wing is considered. How automatic differentiation (AD) can be used within the PETSc framework to compute the required derivatives is described. Performance data demonstrating the suitability of AD and PETSc for this problem are presented. A synopsis of results and a description of opportunities for future work concludes this paper.
Parallel Unsteady Turbopump Flow Simulations for Reusable Launch Vehicles
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Kwak, Dochan
2000-01-01
An efficient solution procedure for time-accurate solutions of Incompressible Navier-Stokes equation is obtained. Artificial compressibility method requires a fast convergence scheme. Pressure projection method is efficient when small time-step is required. The number of sub-iteration is reduced significantly when Poisson solver employed with the continuity equation. Both computing time and memory usage are reduced (at least 3 times). Other work includes Multi Level Parallelism (MLP) of INS3D, overset connectivity for the validation case, experimental measurements, and computational model for boost pump.
Hybrid simulations of a parallel collisionless shock in the large plasma device
NASA Astrophysics Data System (ADS)
Weidl, Martin S.; Winske, Dan; Jenko, Frank; Niemann, Chris
2016-12-01
We present two-dimensional hybrid kinetic/magnetohydrodynamic simulations of planned laser-ablation experiments in the Large Plasma Device. Our results, based on parameters that have been validated in previous experiments, show that a parallel collisionless shock can begin forming within the available space. Carbon-debris ions that stream along the magnetic-field direction with a blow-off speed of four times the Alfvén velocity excite strong magnetic fluctuations, eventually transferring part of their kinetic energy to the surrounding hydrogen ions. This acceleration and compression of the background plasma creates a shock front, which satisfies the Rankine-Hugoniot conditions and can therefore propagate on its own. Furthermore, we analyze the upstream turbulence and show that it is dominated by the right-hand resonant instability.
NASA Astrophysics Data System (ADS)
Shim, Yunsic; Amar, Jacques G.
2005-03-01
The standard kinetic Monte Carlo algorithm is an extremely efficient method to carry out serial simulations of dynamical processes such as thin film growth. However, in some cases it is necessary to study systems over extended time and length scales, and therefore a parallel algorithm is desired. Here we describe an efficient, semirigorous synchronous sublattice algorithm for parallel kinetic Monte Carlo simulations. The accuracy and parallel efficiency are studied as a function of diffusion rate, processor size, and number of processors for a variety of simple models of epitaxial growth. The effects of fluctuations on the parallel efficiency are also studied. Since only local communications are required, linear scaling behavior is observed, e.g., the parallel efficiency is independent of the number of processors for fixed processor size.
Kothe, D.B.; Turner, J.A.; Mosso, S.J.; Ferrell, R.C.
1997-03-01
We discuss selected aspects of a new parallel three-dimensional (3-D) computational tool for the unstructured mesh simulation of Los Alamos National Laboratory (LANL) casting processes. This tool, known as {bold Telluride}, draws upon on robust, high resolution finite volume solutions of metal alloy mass, momentum, and enthalpy conservation equations to model the filling, cooling, and solidification of LANL castings. We briefly describe the current {bold Telluride} physical models and solution methods, then detail our parallelization strategy as implemented with Fortran 90 (F90). This strategy has yielded straightforward and efficient parallelization on distributed and shared memory architectures, aided in large part by new parallel libraries {bold JTpack9O} for Krylov-subspace iterative solution methods and {bold PGSLib} for efficient gather/scatter operations. We illustrate our methodology and current capabilities with source code examples and parallel efficiency results for a LANL casting simulation.
Parallel computations using a cluster of workstations to simulate elasticity problems
NASA Astrophysics Data System (ADS)
Darmawan, J. B. B.; Mungkasi, S.
2016-11-01
Computational physics has played important roles in real world problems. This paper is within the applied computational physics area. The aim of this study is to observe the performance of parallel computations using a cluster of workstations (COW) to simulate elasticity problems. Parallel computations with the COW configuration are conducted using the Message Passing Interface (MPI) standard. In parallel computations with COW, we consider five scenarios with twenty simulations. In addition to the execution time, efficiency is used to evaluate programming algorithm scenarios. Sequential and parallel programming performances are evaluated based on their execution time and efficiency. Results show that the one-dimensional elasticity equations are not appropriate to be solved in parallel with MPI_Send and MPI_Recv technique in the MPI standard, because the total amount of time to exchange data is considered more dominant compared with the total amount of time to conduct the basic elasticity computation.
Direct simulation with vibration-dissociation coupling
NASA Technical Reports Server (NTRS)
Hash, David B.; Hassan, H. A.
1992-01-01
The majority of implementations of the Direct Simulation Monte Carlo (DSMC) method of Bird do not account for vibration-dissociation coupling. Haas and Boyd have proposed the vibrationally-favored dissociation model to accomplish this task. This model requires measurements of induction distance to determine model constants. A more general expression has been derived that does not require any experimental input. The model is used to calculate one-dimensional shock waves in nitrogen and the flow past a lunar transfer vehicle (LTV). For the conditions considered in the simulation, the influence of vibration-dissociation coupling on heat transfer in the stagnation region of the LTV can be significant.
Parallelized modelling and solution scheme for hierarchically scaled simulations
NASA Technical Reports Server (NTRS)
Padovan, Joe
1995-01-01
This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The Tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers are capable of solving, are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that by combining several of these techniques that a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.
Xyce parallel electronic simulator users guide, version 6.1
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-03-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas; Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiationaware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase-a message passing parallel implementation-which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users' guide, Version 6.0.1.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2014-01-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users guide, version 6.0.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2013-08-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
NASA Technical Reports Server (NTRS)
Sanger, Eugen
1932-01-01
In the present report the computation is actually carried through for the case of parallel spars of equal resistance in bending without direct loading, including plotting of the influence lines; for other cases the method of calculation is explained. The development of large size airplanes can be speeded up by accurate methods of calculation such as this.
A Parallel Simulated Annealing Approach to Solve for Earthquake Rupture Rates
NASA Astrophysics Data System (ADS)
Milner, K.; Page, M. T.; Field, E. H.
2011-12-01
We present a parallel approach to the classic simulated annealing algorithm (Kirkpatrick 1983) in order to solve for the rates of earthquake ruptures in California's complex fault system, being developed for the 3rd Uniform California Earthquake Rupture Forecast (UCERF3). Through the use of distributed computing, we have achieved substantial speedup when compared to serial simulated annealing. We will describe the parallel simulated annealing algorithm in detail, as well as the parallelization parameters used and their effect on speedup (time to convergence, or alternatively a specified energy level) and communications efficiency. Additionally we will discuss the correlation between performance of the parallel algorithm and the degree of constraints on the solution. We will present scaling results to thousands of processors, and experiences with the MPJ Express Java Message Passing Library (Baker 2006) on the University of Southern California's High Performance Computing and Communications cluster.
Virtual reality visualization of parallel molecular dynamics simulation
Disz, T.; Papka, M.; Stevens, R.; Pellegrino, M.; Taylor, V.
1995-12-31
When performing communications mapping experiments for massively parallel processors, it is important to be able to visualize the mappings and resulting communications. In a molecular dynamics model, visualization of the atom to atom interaction and the processor mappings provides insight into the effectiveness of the communications algorithms. The basic quantities available for visualization in a model of this type are the number of molecules per unit volume, the mass, and velocity of each molecule. The computational information available for visualization is the atom to atom interaction within each time step, the atom to processor mapping, and the energy resealing events. We use the CAVE (CAVE Automatic Virtual Environment) to provide interactive, immersive visualization experiences.
Direct Simulation of Ultrafast Detonations in Mixtures
2005-07-13
gases. This was further supported by the Zeldovich - von Neumann - Döring ( ZND ) theories predicting Chapman-Jouguet velocities for detonations in...for the C-J and ZND theories are no longer valid. Previous work with the direct simulation method established conditions for forcing the reaction...characterized as ultrafast, as they were found to exceed the steady-state velocities predicted by the C-J and ZND theories. Continued investigation into the
Zhang, Hongjun; Zhang, Rui; Li, Yong; Zhang, Xuliang
2014-01-01
Service oriented modeling and simulation are hot issues in the field of modeling and simulation, and there is need to call service resources when simulation task workflow is running. How to optimize the service resource allocation to ensure that the task is complete effectively is an important issue in this area. In military modeling and simulation field, it is important to improve the probability of success and timeliness in simulation task workflow. Therefore, this paper proposes an optimization algorithm for multipath service resource parallel allocation, in which multipath service resource parallel allocation model is built and multiple chains coding scheme quantum optimization algorithm is used for optimization and solution. The multiple chains coding scheme quantum optimization algorithm is to extend parallel search space to improve search efficiency. Through the simulation experiment, this paper investigates the effect for the probability of success in simulation task workflow from different optimization algorithm, service allocation strategy, and path number, and the simulation result shows that the optimization algorithm for multipath service resource parallel allocation is an effective method to improve the probability of success and timeliness in simulation task workflow. PMID:24963506
Wang, Zhiteng; Zhang, Hongjun; Zhang, Rui; Li, Yong; Zhang, Xuliang
2014-01-01
Service oriented modeling and simulation are hot issues in the field of modeling and simulation, and there is need to call service resources when simulation task workflow is running. How to optimize the service resource allocation to ensure that the task is complete effectively is an important issue in this area. In military modeling and simulation field, it is important to improve the probability of success and timeliness in simulation task workflow. Therefore, this paper proposes an optimization algorithm for multipath service resource parallel allocation, in which multipath service resource parallel allocation model is built and multiple chains coding scheme quantum optimization algorithm is used for optimization and solution. The multiple chains coding scheme quantum optimization algorithm is to extend parallel search space to improve search efficiency. Through the simulation experiment, this paper investigates the effect for the probability of success in simulation task workflow from different optimization algorithm, service allocation strategy, and path number, and the simulation result shows that the optimization algorithm for multipath service resource parallel allocation is an effective method to improve the probability of success and timeliness in simulation task workflow.
Direct Observation of Parallel Folding Pathways Revealed Using a Symmetric Repeat Protein System
Aksel, Tural; Barrick, Doug
2014-01-01
Although progress has been made to determine the native fold of a polypeptide from its primary structure, the diversity of pathways that connect the unfolded and folded states has not been adequately explored. Theoretical and computational studies predict that proteins fold through parallel pathways on funneled energy landscapes, although experimental detection of pathway diversity has been challenging. Here, we exploit the high translational symmetry and the direct length variation afforded by linear repeat proteins to directly detect folding through parallel pathways. By comparing folding rates of consensus ankyrin repeat proteins (CARPs), we find a clear increase in folding rates with increasing size and repeat number, although the size of the transition states (estimated from denaturant sensitivity) remains unchanged. The increase in folding rate with chain length, as opposed to a decrease expected from typical models for globular proteins, is a clear demonstration of parallel pathways. This conclusion is not dependent on extensive curve-fitting or structural perturbation of protein structure. By globally fitting a simple parallel-Ising pathway model, we have directly measured nucleation and propagation rates in protein folding, and have quantified the fluxes along each path, providing a detailed energy landscape for folding. This finding of parallel pathways differs from results from kinetic studies of repeat-proteins composed of sequence-variable repeats, where modest repeat-to-repeat energy variation coalesces folding into a single, dominant channel. Thus, for globular proteins, which have much higher variation in local structure and topology, parallel pathways are expected to be the exception rather than the rule. PMID:24988356
NASA Astrophysics Data System (ADS)
Giacalone, Joe
2017-09-01
Using a self-consistent hybrid simulation, with kinetic protons and fluid electrons, we investigate the acceleration of thermal protons and minor ions (alphas, 3He ++, and C5+) by a quasi-parallel collisionless shock. The results are compared to spacecraft observations of a strong interplanetary shock seen by the Advanced Composition Explorer on DOY 94, 2001, which was associated with significant increases in the flux of > 50 keV/nuc ions. Our simulation uses similar plasma and shock parameters to those observed. The densities of minor ions for two of the species (alphas and C5+) were based on observations at thermal energies for this shock, and we used a nominal value for the density of 3He ++, since no observations at thermal energies was available to us. Acceleration of the ions by the shock leads to a high-energy tail in the distribution in the post-shock plasma for all ion species. We find that by extrapolating the simulated tails to the higher energies measured by ACE/EPAM and ACE/ULEIS, the intensity matches well the observations for protons, alphas, and carbon. This suggests that thermal solar wind, accelerated directly at the shock, is a significant source of the observed high-energy protons and these minor ions.
Characterization of Parallelism and Deadlocks in Distributed Digital Logic Simulation
1988-11-03
Using these definitions we get: LP was deadlocked by an unevaluated path of n levels, If For each input j where Ij < E7 m and each LPk. where bki=n...Science, August 1988. [161 Andrew Wilson. Paralelization of an Event Driven Simulator on the Encore Multimaz. Technical Report ETR 86-005, Encore Computer
Massively Parallel Reactive and Quantum Molecular Dynamics Simulations
NASA Astrophysics Data System (ADS)
Vashishta, Priya
2015-03-01
In this talk I will discuss two simulations: Cavitation bubbles readily occur in fluids subjected to rapid changes in pressure. We use billion-atom reactive molecular dynamics simulations on a 163,840-processor BlueGene/P supercomputer to investigate chemical and mechanical damages caused by shock-induced collapse of nanobubbles in water near silica surface. Collapse of an empty nanobubble generates high-speed nanojet, resulting in the formation of a pit on the surface. The gas-filled bubbles undergo partial collapse and consequently the damage on the silica surface is mitigated. Quantum molecular dynamics (QMD) simulations are performed on 786,432-processor Blue Gene/Q to study on-demand production of hydrogen gas from water using Al nanoclusters. QMD simulations reveal rapid hydrogen production from water by an Al nanocluster. We find a low activation-barrier mechanism, in which a pair of Lewis acid and base sites on the Aln surface preferentially catalyzes hydrogen production. I will also discuss on-demand production of hydrogen gas from water using and LiAl alloy particles. Research reported in this lecture was carried in collaboration with Rajiv Kalia, Aiichiro Nakano and Ken-ichi Nomura from the University of Southern California, and Fuyuki Shimojo and Kohei Shimamura from Kumamoto University, Japan.
Directing crowd simulations using navigation fields.
Patil, Sachin; Berg, Jur van den; Curtis, Sean; Lin, Ming C; Manocha, Dinesh
2011-02-01
We present a novel approach to direct and control virtual crowds using navigation fields. Our method guides one or more agents toward desired goals based on guidance fields. The system allows the user to specify these fields by either sketching paths directly in the scene via an intuitive authoring interface or by importing motion flow fields extracted from crowd video footage. We propose a novel formulation to blend input guidance fields to create singularity-free, goal-directed navigation fields. Our method can be easily combined with the most current local collision avoidance methods and we use two such methods as examples to highlight the potential of our approach. We illustrate its performance on several simulation scenarios.
Polar-direct-drive simulations and experiments
Marozas, J.A.; Marshall, F.J.; Craxton, R.S.; Igumenshchev, I.V.; Skupsky, S.; Bonino, M.J.; Collins, T.J.B.; Epstein, R.; Glebov, V.Yu.; Jacobs-Perkins, D.; Knauer, J.P.; McCrory, R.L.; McKenty, P.W.; Meyerhofer, D.D.; Noyes, S.G.; Radha, P.B.; Sangster, T.C.; Seka, W.; Smalyuk, V.A.
2006-05-15
Polar direct drive (PDD) [S. Skupsky et al., Phys. Plasmas 11, 2763 (2004)] will allow direct-drive ignition experiments on the National Ignition Facility (NIF) [J. Paisner et al., Laser Focus World 30, 75 (1994)] as it is configured for x-ray drive. Optimal drive uniformity is obtained via a combination of beam repointing, pulse shapes, spot shapes, and/or target design. This article describes progress in the development of standard and 'Saturn' [R. S. Craxton and D. W. Jacobs-Perkins, Phys. Rev. Lett. 94, 0952002 (2005)] PDD target designs. Initial evaluation of experiments on the OMEGA Laser System [T. R. Boehly et al., Rev. Sci. Instrum. 66, 508 (1995)] and simulations were carried out with the two-dimensional hydrodynamics code SAGE [R. S. Craxton et al., Phys. Plasmas 12, 056304 (2005)]. This article adds to this body of work by including fusion particle production and transport as well as radiation transport within the two-dimensional DRACO [P. B. Radha et al., Phys. Plasmas 12, 032702 (2005)] hydrodynamics simulations used to model experiments. Forty OMEGA beams arranged in six rings to emulate the NIF x-ray-drive configuration are used to perform direct-drive implosions of CH shells filled with D{sub 2} gas. Target performance was diagnosed with framed x-ray backlighting and by the measured fusion yield. Saturn target experiments have resulted in {approx}75% of the yield from energy-equivalent, symmetrically irradiated implosions. The results of the two-dimensional PDD simulations performed with DRACO are in good agreement with experimental x-ray radiographs. DRACO is being used to further optimize standard PDD designs. In addition, DRACO simulations of NIF-scale PDD designs show ignition with a gain of 20 and the development of a 40 {mu}m radius, 10 keV region with a neutron-averaged {rho}r of 1270 mg/cm{sup 2} near stagnation.
NASA Astrophysics Data System (ADS)
Vashishta, Priya; Bachlechner, Martina; Nakano, Aiichiro; Campbell, Timothy J.; Kalia, Rajiv K.; Kodiyalam, Sanjay; Ogata, Shuji; Shimojo, Fuyuki; Walsh, Phillip
2001-10-01
We have developed scalable space-time multiresolution algorithms to enable molecular dynamics simulations involving up to a billion atoms on massively parallel computers. Large-scale molecular dynamics simulations have been used to study stress domains and interfacial fracture in semiconductor/dielectric nanopixels, nanoindentation, and oxidation of metallic nanoparticles.
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL
Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor
2011-09-06
We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
Parallel performance optimizations on unstructured mesh-based simulations
Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; ...
2015-06-01
This paper addresses two key parallelization challenges the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns, that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intranode data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches.more » We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.« less
SimPL: A Simulator for Parallel LISP
1987-06-02
defun pexptsq (p n) (if ( oddp n) (if (= n 1) P ifLa-A 2J1 ILTl jL-’i J ^"_:i^vi lai W-J h 60 (ptimes p (pexptsq (future (ptimes p p)) (floor n 2...char write-string vrite-line vrite-byte format oddp evenp))) Define a macro to call the simulator so that quoting is not necessary. The first
Scan Directed Load Balancing for Highly-Parallel Mesh-Connected Computers
1991-07-01
DTIC ~ ELECTE OCT 2 41991 AD-A242 045 Scan Directed Load Balancing for Highly-Parallel Mesh-Connected Computers’ Edoardo S. Biagioni Jan F. Prins...Department of Computer Science University of North Carolina Chapel Hill, N.C. 27599-3175 USA biagioni @cs.unc.edu prinsOcs.unc.edu Abstract Scan Directed...MasPar Computer Corpora- tion. Bibliography [1] Edoardo S. Biagioni . Scan Directed Load Balancing. PhD thesis., University of North Carolina, Chapel Hill
NASA Astrophysics Data System (ADS)
Krappel, Timo; Riedelbauch, Stefan; Jester-Zuerker, Roland; Jung, Alexander; Flurl, Benedikt; Unger, Friedeman; Galpin, Paul
2016-11-01
The operation of Francis turbines in part load conditions causes high fluctuations and dynamic loads in the turbine and especially in the draft tube. At the hub of the runner outlet a rotating vortex rope within a low pressure zone arises and propagates into the draft tube cone. The investigated part load operating point is at about 72% discharge of best efficiency. To reduce the possible influence of boundary conditions on the solution, a flow simulation of a complete Francis turbine is conducted consisting of spiral case, stay and guide vanes, runner and draft tube. As the flow has a strong swirling component for the chosen operating point, it is very challenging to accurately predict the flow and in particular the flow losses in the diffusor. The goal of this study is to reach significantly better numerical prediction of this flow type. This is achieved by an improved resolution of small turbulent structures. Therefore, the Scale Adaptive Simulation SAS-SST turbulence model - a scale resolving turbulence model - is applied and compared to the widely used RANS-SST turbulence model. The largest mesh contains 300 million elements, which achieves LES-like resolution throughout much of the computational domain. The simulations are evaluated in terms of the hydraulic losses in the machine, evaluation of the velocity field, pressure oscillations in the draft tube and visual comparisons of turbulent flow structures. A pre-release version of ANSYS CFX 17.0 is used in this paper, as this CFD solver has a parallel performance up to several thousands of cores for this application which includes a transient rotor-stator interface to support the relative motion between the runner and the stationary portions of the water turbine.
Implementation of a parallel algorithm for thermo-chemical nonequilibrium flow simulations
NASA Astrophysics Data System (ADS)
Wong, C. C.; Blottner, F. G.; Payne, J. L.; Soetrisno, M.
1995-01-01
Massively parallel (MP) computing is considered to be the future direction of high performance computing. When engineers apply this new MP computing technology to solve large-scale problems, one major interest is what is the maximum problem size that a MP computer can handle. To determine the maximum size, it is important to address the code scalability issue. Scalability implies whether the code can provide an increase in performance proportional to an increase in problem size. If the size of the problem increases, by utilizing more computer nodes, the ideal elapsed time to simulate a problem should not increase much. Hence one important task in the development of the MP computing technology is to ensure scalability. A scalable code is an efficient code. In order to obtain good scaled performance, it is necessary to first have the code optimized for a single node performance before proceeding to a large-scale simulation with a large number of computer nodes. This paper will discuss the implementation of a massively parallel computing strategy and the process of optimization to improve the scaled performance. Specifically, we will look at domain decomposition, resource management in the code, communication overhead, and problem mapping. By incorporating these improvements and adopting an efficient MP computing strategy, an efficiency of about 85% and 96%, respectively, has been achieved using 64 nodes on MP computers for both perfect gas and chemically reactive gas problems. A comparison of the performance between MP computers and a vectorized computer, such as Cray-YMP, will also be presented.
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.
Kundeti, Vamsi K; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal
2010-11-15
Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ(nlog(n/B)Blog(M/B)) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster--both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. The bi-directed de Bruijn graph is a fundamental data structure for
Partitioning and packing mathematical simulation models for calculation on parallel computers
NASA Technical Reports Server (NTRS)
Arpasi, D. J.; Milner, E. J.
1986-01-01
The development of multiprocessor simulations from a serial set of ordinary differential equations describing a physical system is described. Degrees of parallelism (i.e., coupling between the equations) and their impact on parallel processing are discussed. The problem of identifying computational parallelism within sets of closely coupled equations that require the exchange of current values of variables is described. A technique is presented for identifying this parallelism and for partitioning the equations for parallel solution on a multiprocessor. An algorithm which packs the equations into a minimum number of processors is also described. The results of the packing algorithm when applied to a turbojet engine model are presented in terms of processor utilization.
CODE BLUE: Three dimensional massively-parallel simulation of multi-scale configurations
NASA Astrophysics Data System (ADS)
Juric, Damir; Kahouadji, Lyes; Chergui, Jalel; Shin, Seungwon; Craster, Richard; Matar, Omar
2016-11-01
We present recent progress on BLUE, a solver for massively parallel simulations of fully three-dimensional multiphase flows which runs on a variety of computer architectures from laptops to supercomputers and on 131072 threads or more (limited only by the availability to us of more threads). The code is wholly written in Fortran 2003 and uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of a hybrid Front Tracking/Level Set method designed to handle highly deforming interfaces with complex topology changes. We developed parallel GMRES and multigrid iterative solvers suited to the linear systems arising from the implicit solution for the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across fluid phases. Particular attention is drawn to the details and performance of the parallel Multigrid solver. EPSRC UK Programme Grant MEMPHIS (EP/K003976/1).
SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeff S.
1992-01-01
Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeff S.
1992-01-01
Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
Parallel Monte Carlo Electron and Photon Transport Simulation Code (PMCEPT code)
NASA Astrophysics Data System (ADS)
Kum, Oyeon
2004-11-01
Simulations for customized cancer radiation treatment planning for each patient are very useful for both patient and doctor. These simulations can be used to find the most effective treatment with the least possible dose to the patient. This typical system, so called ``Doctor by Information Technology", will be useful to provide high quality medical services everywhere. However, the large amount of computing time required by the well-known general purpose Monte Carlo(MC) codes has prevented their use for routine dose distribution calculations for a customized radiation treatment planning. The optimal solution to provide ``accurate" dose distribution within an ``acceptable" time limit is to develop a parallel simulation algorithm on a beowulf PC cluster because it is the most accurate, efficient, and economic. I developed parallel MC electron and photon transport simulation code based on the standard MPI message passing interface. This algorithm solved the main difficulty of the parallel MC simulation (overlapped random number series in the different processors) using multiple random number seeds. The parallel results agreed well with the serial ones. The parallel efficiency approached 100% as was expected.
Parallel FEM Simulation of Electromechanics in the Heart
NASA Astrophysics Data System (ADS)
Xia, Henian; Wong, Kwai; Zhao, Xiaopeng
2011-11-01
Cardiovascular disease is the leading cause of death in America. Computer simulation of complicated dynamics of the heart could provide valuable quantitative guidance for diagnosis and treatment of heart problems. In this paper, we present an integrated numerical model which encompasses the interaction of cardiac electrophysiology, electromechanics, and mechanoelectrical feedback. The model is solved by finite element method on a Linux cluster and the Cray XT5 supercomputer, kraken. Dynamical influences between the effects of electromechanics coupling and mechanic-electric feedback are shown.
Directional interstitial brachytherapy from simulation to application
NASA Astrophysics Data System (ADS)
Lin, Liyong
Organs at risk (OAR) are sometimes adjacent to or embedded in or overlap with the clinical target volume (CTV) to be treated. The purpose of this PhD study is to develop directionally low energy gamma-emitting interstitial brachytherapy sources. These sources can be applied between OAR to selectively reduce hot spots in the OARs and normal tissues. The reduction of dose over undesired regions can expand patient eligibility or reduce toxicities for the treatment by conventional interstitial brachytherapy. This study covers the development of a directional source from design optimization to construction of the first prototype source. The Monte Carlo code MCNP was used to simulate the radiation transport for the designs of directional sources. We have made a special construction kit to assemble radioactive and gold-shield components precisely into D-shaped titanium containers of the first directional source. Directional sources have a similar dose distribution as conventional sources on the treated side but greatly reduced dose on the shielded side, with a sharp dose gradient between them. A three-dimensional dose deposition kernel for the 125I directional source has been calculated. Treatment plans can use both directional and conventional 125I sources at the same source strength for low-dose-rate (LDR) implants to optimize the dose distributions. For prostate tumors, directional 125I LDR brachytherapy can potentially reduce genitourinary and gastrointestinal toxicities and improve potency preservation for low risk patients. The combination of better dose distribution of directional implants and better therapeutic ratio between tumor response and late reactions enables a novel temporary LDR treatment, as opposed to permanent or high-dose-rate (HDR) brachytherapy for the intermediate risk T2b and high risk T2c tumors. Supplemental external-beam treatments can be shortened with a better brachytherapy boost for T3 tumors. In conclusion, we have successfully finished the
Parallel 3D FDTD Simulator for Photonic Crystals
2007-06-01
an The outermost layer of grid cells is defined as a perfect absorbing boundary condition (ABC) developed by electric conductor ( PEG ) by setting the...limitation can be overcome by using a new impeinn f th o kown as Ns A alternate-direction implicit ( ADI ) approach that has been comparison of the...implicit formulation of the discretized Maxwell’s found to be limited by a number of shared-memory equations, called the ADI -FDTD method, results in a
NASA Astrophysics Data System (ADS)
Liu, Xiaofeng; Siddle-Mitchell, Seth
2015-11-01
This paper presents a novel pressure reconstruction method featuring rotating parallel ray omni-directional integration, as an improvement over the circular virtual boundary integration method introduced by Liu and Katz (2003, 2006, 2008 and 2013) for non-intrusive instantaneous pressure measurement in incompressible flow field. Unlike the virtual boundary omni-directional integration, where the integration path is originated from a virtual circular boundary at a finite distance from the real boundary of the integration domain, the new method utilizes parallel rays, which can be viewed as being originated from a distance of infinity, as guidance for integration paths. By rotating the parallel rays, omni-directional paths with equal weights coming from all directions toward the point of interest at any location within the computation domain will be generated. In this way, the location dependence of the integration weight inherent in the old algorithm will be eliminated. By implementing this new algorithm, the accuracy of the reconstructed pressure for a synthetic rotational flow in terms of r.m.s. error from theoretical values is reduced from 1.03% to 0.30%. Improvement is further demonstrated from the comparison of the reconstructed pressure with that from the Johns Hopkins University isotropic turbulence database (JHTDB). This project is funded by the San Diego State University.
Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm
NASA Technical Reports Server (NTRS)
Povitsky, A.
1998-01-01
In this research an efficient parallel algorithm for 3-D directionally split problems is developed. The proposed algorithm is based on a reformulated version of the pipelined Thomas algorithm that starts the backward step computations immediately after the completion of the forward step computations for the first portion of lines This algorithm has data available for other computational tasks while processors are idle from the Thomas algorithm. The proposed 3-D directionally split solver is based on the static scheduling of processors where local and non-local, data-dependent and data-independent computations are scheduled while processors are idle. A theoretical model of parallelization efficiency is used to define optimal parameters of the algorithm, to show an asymptotic parallelization penalty and to obtain an optimal cover of a global domain with subdomains. It is shown by computational experiments and by the theoretical model that the proposed algorithm reduces the parallelization penalty about two times over the basic algorithm for the range of the number of processors (subdomains) considered and the number of grid nodes per subdomain.
DL_POLY_2.0: a general-purpose parallel molecular dynamics simulation package.
Smith, W; Forester, T R
1996-06-01
DL_POLY_2.0 is a general-purpose parallel molecular dynamics simulation package developed at Daresbury Laboratory under the auspices of the Council for the Central Laboratory of the Research Councils. Written to support academic research, it has a wide range of applications and is designed to run on a wide range of computers: from single processor workstations to parallel supercomputers. Its structure, functionality, performance, and availability are described.
Direct numerical simulation of chemically reacting turbulence
NASA Astrophysics Data System (ADS)
Miyauchi, Toshio; Tanahashi, Mamoru
In this paper, we present two results of direct numerical simulation of chemically reacting flows. One is direct numerical simulation of chemically reacting two-dimensional mixing layer and the other is direct numerical simulation of chemically reacting compressible isotropic turbulence. As for the mixing layer, a low Mach number approximation was used to take into account the variable density effects on the flow fields and to clarify the effects of heat release and density difference of a mean flow. In the case of density difference, expansion and baroclinic torque has a negative contribution to the local vorticity transport in the high density side and a positive contribution in the low density side which results in an asymmetric vortical structure structure. Thes density difference suppresses the growth of mixing layer and causes the overshoot of mean velocity only in the high density side which coincides with an experimental result. Coupling effects of heat release and desnity difference are also investigated. As for the homogeneous turbulence, fully compressible Navier-Stokes equations are solved to clarify the interaction between turbulence and chemical reaction in turbulent diffusion flame. The chemical reaction is suppressed by the increase of heat release because of the decrease of density and local Reynolds number. However, the decay of enstrophy with heat release is slower than that without heat release because of strong baroclinic torque which is generated near the reaction zone. Also, large amount of heat release causes increase in turbulent energy through the pressure dilatation term. The pressure dilatation term shows the periodic fluctuation which has an acoustic time scale. The fluctuation is enhanced by the heat release and travels in the turbulent field as pressure and dilatation waves.
Compositional reservoir simulation on CM-5 and KSR-1 parallel machines
Ghori, S.G.; Wang, C.H.; Lim, M.T.; Pope, G.A.; Sepehrnoori, K.; Wheeler, M.F.
1995-12-31
Recently, use of parallel machines in reservoir simulation has received considerable attention from the petroleum industry. This paper presents parallelization of a 3D compositional, equation-of-state reservoir simulator on the CM-5 and KSR-1. To the best of the authors` knowledge, this is the first time that the parallelization of a compositional reservoir simulator has been performed on both the CM-5 and KSR-1. For new users of the CM-5 machines, the software and hardware of CM-5 architecture is presented, as well as details of the parallelization techniques. For example, domain decomposition, I/O`s, phase equilibrium computations, and well model are described. The parallelism techniques on the KSR-1 are presented with the emphasis on the porting of the phase equilibrium calculation. The performance of each machine is evaluated by showing the speedup on different sets of processing nodes. Two test problems were used to explore the capability of the parallelized version of the code; one is a waterflood problem and the other is a CO{sub 2} multiple contact miscible flood, both in a West Texas oil field. These field problems were run on 1, 2, 4, 8, 16, and 32 processors to get insight into the locations of communication bottlenecks, generally occurring in the programming with distributed memory machines. The problems of latency and bandwidth which are associated with communication efficiency of the CM-5 are also addressed.
Simulation of optically encoded multiplexing for parallel multipoint sensing.
Babu Rao, C; Chelliah, Pandian; Sahoo, Trilochan
2015-06-20
Spectral emission/absorption-based sensors are commonly used to monitor explosives, narcotics, and other restricted materials in high-security zones such as airports. Monitoring a broad range of spectral wavelengths with high spectral resolution would increase the repertoire of chemicals that can be monitored. However, a portable unit will have limitations in meeting these requirements. Optical fibers can be employed for collecting and transmitting spectral signals from portable sensor heads (PSHs) to a sensitive central spectral analyzer. However, simultaneous detection by sensors in multiple PSHs needs to be differentiated for identifying individual PSHs. An optical encoding method is presented in this paper for use of a portable unit for highly sensitive measurement. The methodology is demonstrated through a simulation using MATLAB Simulink.
SPILADY: A parallel CPU and GPU code for spin-lattice magnetic molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Ma, Pui-Wai; Dudarev, S. L.; Woo, C. H.
2016-10-01
Spin-lattice dynamics generalizes molecular dynamics to magnetic materials, where dynamic variables describing an evolving atomic system include not only coordinates and velocities of atoms but also directions and magnitudes of atomic magnetic moments (spins). Spin-lattice dynamics simulates the collective time evolution of spins and atoms, taking into account the effect of non-collinear magnetism on interatomic forces. Applications of the method include atomistic models for defects, dislocations and surfaces in magnetic materials, thermally activated diffusion of defects, magnetic phase transitions, and various magnetic and lattice relaxation phenomena. Spin-lattice dynamics retains all the capabilities of molecular dynamics, adding to them the treatment of non-collinear magnetic degrees of freedom. The spin-lattice dynamics time integration algorithm uses symplectic Suzuki-Trotter decomposition of atomic coordinate, velocity and spin evolution operators, and delivers highly accurate numerical solutions of dynamic evolution equations over extended intervals of time. The code is parallelized in coordinate and spin spaces, and is written in OpenMP C/C++ for CPU and in CUDA C/C++ for Nvidia GPU implementations. Temperatures of atoms and spins are controlled by Langevin thermostats. Conduction electrons are treated by coupling the discrete spin-lattice dynamics equations for atoms and spins to the heat transfer equation for the electrons. Worked examples include simulations of thermalization of ferromagnetic bcc iron, the dynamics of laser pulse demagnetization, and collision cascades.
NASA Astrophysics Data System (ADS)
Hiramatsu, Takuya; Takahashi, Isao; Matsushima, Satoru; Usami, Noritaka
2016-09-01
We performed numerical calculations of temperature distributions in a furnace and clarified that a simple modification of heat insulators allows the realization of a complex temperature distribution for a parallel arrangement of adjacent dendrite crystals at the initial stage of the floating cast method. The temperature distribution included a unidirectional temperature gradient on the Si melt surface, which led to the preferential nucleation on one side of a square crucible. Numerical simulation was utilized to design crystal growth experiments, and we demonstrated the preferential formation of dendrite crystals on the expected side of the crucible.
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak
1996-01-01
Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
NASA Technical Reports Server (NTRS)
Lee, J.; Kim, K.
1991-01-01
A Very Large Scale Integration (VLSI) architecture for robot direct kinematic computation suitable for industrial robot manipulators was investigated. The Denavit-Hartenberg transformations are reviewed to exploit a proper processing element, namely an augmented CORDIC. Specifically, two distinct implementations are elaborated on, such as the bit-serial and parallel. Performance of each scheme is analyzed with respect to the time to compute one location of the end-effector of a 6-links manipulator, and the number of transistors required.
Parallelization issues of a code for physically-based simulation of fabrics
NASA Astrophysics Data System (ADS)
Romero, Sergio; Gutiérrez, Eladio; Romero, Luis F.; Plata, Oscar; Zapata, Emilio L.
2004-10-01
The simulation of fabrics, clothes, and flexible materials is an essential topic in computer animation of realistic virtual humans and dynamic sceneries. New emerging technologies, as interactive digital TV and multimedia products, make necessary the development of powerful tools to perform real-time simulations. Parallelism is one of such tools. When analyzing computationally fabric simulations we found these codes belonging to the complex class of irregular applications. Frequently this kind of codes includes reduction operations in their core, so that an important fraction of the computational time is spent on such operations. In fabric simulators these operations appear when evaluating forces, giving rise to the equation system to be solved. For this reason, this paper discusses only this phase of the simulation. This paper analyzes and evaluates different irregular reduction parallelization techniques on ccNUMA shared memory machines, applied to a real, physically-based, fabric simulator we have developed. Several issues are taken into account in order to achieve high code performance, as exploitation of data access locality and parallelism, as well as careful use of memory resources (memory overhead). In this paper we use the concept of data affinity to develop various efficient algorithms for reduction parallelization exploiting data locality.
NASA Astrophysics Data System (ADS)
Chang, Ouliang
The objective of this dissertation is to study the physics of whistler turbulence evolution and its role in energy transport and dissipation in the solar wind plasmas through computational and theoretical investigations. This dissertation presents the first fully three-dimensional (3D) particle-in-cell (PIC) simulations of whistler turbulence forward cascade in a homogeneous, collisionless plasma with a uniform background magnetic field B o, and the first 3D PIC simulation of whistler turbulence with both forward and inverse cascades. Such computationally demanding research is made possible through the use of massively parallel, high performance electromagnetic PIC simulations on state-of-the-art supercomputers. Simulations are carried out to study characteristic properties of whistler turbulence under variable solar wind fluctuation amplitude (epsilon e) and electron beta (betae), relative contributions to energy dissipation and electron heating in whistler turbulence from the quasilinear scenario and the intermittency scenario, and whistler turbulence preferential cascading direction and wavevector anisotropy. The 3D simulations of whistler turbulence exhibit a forward cascade of fluctuations into broadband, anisotropic, turbulent spectrum at shorter wavelengths with wavevectors preferentially quasi-perpendicular to B o. The overall electron heating yields T ∥ > T⊥ for all epsilone and betae values, indicating the primary linear wave-particle interaction is Landau damping. But linear wave-particle interactions play a minor role in shaping the wavevector spectrum, whereas nonlinear wave-wave interactions are overall stronger and faster processes, and ultimately determine the wavevector anisotropy. Simulated magnetic energy spectra as function of wavenumber show a spectral break to steeper slopes, which scales as k⊥lambda e ≃ 1 independent of betae values, where lambdae is electron inertial length, qualitatively similar to solar wind observations. Specific
Parallel computing method for simulating hydrological processesof large rivers under climate change
NASA Astrophysics Data System (ADS)
Wang, H.; Chen, Y.
2016-12-01
Climate change is one of the proverbial global environmental problems in the world.Climate change has altered the watershed hydrological processes in time and space distribution, especially in worldlarge rivers.Watershed hydrological process simulation based on physically based distributed hydrological model can could have better results compared with the lumped models.However, watershed hydrological process simulation includes large amount of calculations, especially in large rivers, thus needing huge computing resources that may not be steadily available for the researchers or at high expense, this seriously restricted the research and application. To solve this problem, the current parallel method are mostly parallel computing in space and time dimensions.They calculate the natural features orderly thatbased on distributed hydrological model by grid (unit, a basin) from upstream to downstream.This articleproposes ahigh-performancecomputing method of hydrological process simulation with high speedratio and parallel efficiency.It combinedthe runoff characteristics of time and space of distributed hydrological model withthe methods adopting distributed data storage, memory database, distributed computing, parallel computing based on computing power unit.The method has strong adaptability and extensibility,which means it canmake full use of the computing and storage resources under the condition of limited computing resources, and the computing efficiency can be improved linearly with the increase of computing resources .This method can satisfy the parallel computing requirements ofhydrological process simulation in small, medium and large rivers.
Direct Numerical Simulations of Transient Dispersion
NASA Astrophysics Data System (ADS)
Porter, M.; Valdes-Parada, F.; Wood, B.
2008-12-01
Transient dispersion is important in many engineering applications, including transport in porous media. A common theoretical approach involves upscaling the micro-scale mass balance equations for convection- diffusion to macro-scale equations that contain effective medium quantities. However, there are a number of assumptions implicit in the various upscaling methods. For example, results obtained from volume averaging are often dependent on a given set of length and time scale constraints. Additionally, a number of the classical models for dispersion do not fully capture the early-time dispersive behavior of the solute for a general set of initial conditions. In this work, we present direct numerical simulations of micro-scale transient mass balance equations for convection-diffusion in both capillary tubes and porous media. Special attention is paid to analysis of the influence of a new time- decaying coefficient that filters the effects of the initial conditions. The direct numerical simulations were compared to results obtained from solving the closure problem associated with volume averaging. These comparisons provide a quantitative measure of the significance of (1) the assumptions implicit in the volume averaging method and (2) the importance of the early-time dispersive behavior of the solute due to various initial conditions.
Direct numerical simulation of incompressible axisymmetric flows
NASA Technical Reports Server (NTRS)
Loulou, Patrick
1994-01-01
In the present work, we propose to conduct direct numerical simulations (DNS) of incompressible turbulent axisymmetric jets and wakes. The objectives of the study are to understand the fundamental behavior of axisymmetric jets and wakes, which are perhaps the most technologically relevant free shear flows (e.g. combuster injectors, propulsion jet). Among the data to be generated are various statistical quantities of importance in turbulence modeling, like the mean velocity, turbulent stresses, and all the terms in the Reynolds-stress balance equations. In addition, we will be interested in the evolution of large-scale structures that are common in free shear flow. The axisymmetric jet or wake is also a good problem in which to try the newly developed b-spline numerical method. Using b-splines as interpolating functions in the non-periodic direction offers many advantages. B-splines have local support, which leads to sparse matrices that can be efficiently stored and solved. Also, they offer spectral-like accuracy that are C(exp O-1) continuous, where O is the order of the spline used; this means that derivatives of the velocity such as the vorticity are smoothly and accurately represented. For purposes of validation against existing results, the present code will also be able to simulate internal flows (ones that require a no-slip boundary condition). Implementation of no-slip boundary condition is trivial in the context of the b-splines.
Simulation of reflooding on two parallel heated channel by TRACE
Zakir, Md. Ghulam
2016-07-12
In case of Loss-Of-Coolant accident (LOCA) in a Boiling Water Reactor (BWR), heat generated in the nuclear fuel is not adequately removed because of the decrease of the coolant mass flow rate in the reactor core. This fact leads to an increase of the fuel temperature that can cause damage to the core and leakage of the radioactive fission products. In order to reflood the core and to discontinue the increase of temperature, an Emergency Core Cooling System (ECCS) delivers water under this kind of conditions. This study is an investigation of how the power distribution between two channels can affect the process of reflooding when the emergency water is injected from the top of the channels. The peak cladding temperature (PCT) on LOCA transient for different axial level is determined as well. A thermal-hydraulic system code TRACE has been used. A TRACE model of the two heated channels has been developed, and three hypothetical cases with different power distributions have been studied. Later, a comparison between a simulated and experimental data has been shown as well.
Simulation of reflooding on two parallel heated channel by TRACE
NASA Astrophysics Data System (ADS)
Zakir, Md. Ghulam
2016-07-01
In case of Loss-Of-Coolant accident (LOCA) in a Boiling Water Reactor (BWR), heat generated in the nuclear fuel is not adequately removed because of the decrease of the coolant mass flow rate in the reactor core. This fact leads to an increase of the fuel temperature that can cause damage to the core and leakage of the radioactive fission products. In order to reflood the core and to discontinue the increase of temperature, an Emergency Core Cooling System (ECCS) delivers water under this kind of conditions. This study is an investigation of how the power distribution between two channels can affect the process of reflooding when the emergency water is injected from the top of the channels. The peak cladding temperature (PCT) on LOCA transient for different axial level is determined as well. A thermal-hydraulic system code TRACE has been used. A TRACE model of the two heated channels has been developed, and three hypothetical cases with different power distributions have been studied. Later, a comparison between a simulated and experimental data has been shown as well.
Parallel Cartesian grid refinement for 3D complex flow simulations
NASA Astrophysics Data System (ADS)
Angelidis, Dionysios; Sotiropoulos, Fotis
2013-11-01
A second order accurate method for discretizing the Navier-Stokes equations on 3D unstructured Cartesian grids is presented. Although the grid generator is based on the oct-tree hierarchical method, fully unstructured data-structure is adopted enabling robust calculations for incompressible flows, avoiding both the need of synchronization of the solution between different levels of refinement and usage of prolongation/restriction operators. The current solver implements a hybrid staggered/non-staggered grid layout, employing the implicit fractional step method to satisfy the continuity equation. The pressure-Poisson equation is discretized by using a novel second order fully implicit scheme for unstructured Cartesian grids and solved using an efficient Krylov subspace solver. The momentum equation is also discretized with second order accuracy and the high performance Newton-Krylov method is used for integrating them in time. Neumann and Dirichlet conditions are used to validate the Poisson solver against analytical functions and grid refinement results to a significant reduction of the solution error. The effectiveness of the fractional step method results in the stability of the overall algorithm and enables the performance of accurate multi-resolution real life simulations. This material is based upon work supported by the Department of Energy under Award Number DE-EE0005482.
Bi-directional series-parallel elastic actuator and overlap of the actuation layers.
Furnémont, Raphaël; Mathijssen, Glenn; Verstraten, Tom; Lefeber, Dirk; Vanderborght, Bram
2016-01-27
Several robotics applications require high torque-to-weight ratio and energy efficient actuators. Progress in that direction was made by introducing compliant elements into the actuation. A large variety of actuators were developed such as series elastic actuators (SEAs), variable stiffness actuators and parallel elastic actuators (PEAs). SEAs can reduce the peak power while PEAs can reduce the torque requirement on the motor. Nonetheless, these actuators still cannot meet performances close to humans. To combine both advantages, the series parallel elastic actuator (SPEA) was developed. The principle is inspired from biological muscles. Muscles are composed of motor units, placed in parallel, which are variably recruited as the required effort increases. This biological principle is exploited in the SPEA, where springs (layers), placed in parallel, can be recruited one by one. This recruitment is performed by an intermittent mechanism. This paper presents the development of a SPEA using the MACCEPA principle with a self-closing mechanism. This actuator can deliver a bi-directional output torque, variable stiffness and reduced friction. The load on the motor can also be reduced, leading to a lower power consumption. The variable recruitment of the parallel springs can also be tuned in order to further decrease the consumption of the actuator for a given task. First, an explanation of the concept and a brief description of the prior work done will be given. Next, the design and the model of one of the layers will be presented. The working principle of the full actuator will then be given. At the end of this paper, experiments showing the electric consumption of the actuator will display the advantage of the SPEA over an equivalent stiff actuator.
Direct numerical simulation of human phonation
NASA Astrophysics Data System (ADS)
Saurabh, Shakti; Bodony, Daniel
2016-11-01
A direct numerical simulation study of the generation and propagation of the human voice in a full-body domain is conducted. A fully compressible fluid flow model, anatomically representative vocal tract geometry, finite deformation model for vocal fold (VF) motion and a fully coupled fluid-structure interaction model are employed. The dynamics of the multi-layered VF tissue with varying stiffness are solved using a quadratic finite element code. The fluid-solid domains are coupled through a boundary-fitted interface and utilize a Poisson equation-based mesh deformation method. A new inflow boundary condition, based upon a quasi-1D formulation with constant sub-glottal volume velocity, linked to the VF movement, has been adopted. Simulations for both child and adult phonation were performed. Acoustic characteristics obtained from these simulation are consistent with expected values. A sensitivity analysis based on VF stiffness variation is undertaken and sound pressure level/fundamental frequency trends are established. An evaluation of the data against the commonly-used quasi-1D equations suggest that the latter are not sufficient to model phonation. Phonation threshold pressures are measured for several VF stiffness variations and comparisons to clinical data are carried out. Supported by the National Science Foundation (CAREER Award Number 1150439).
Direct Simulation of a Solidification Benchmark Experiment
NASA Astrophysics Data System (ADS)
Carozzani, Tommy; Gandin, Charles-André; Digonnet, Hugues; Bellet, Michel; Zaidat, Kader; Fautrelle, Yves
2013-02-01
A solidification benchmark experiment is simulated using a three-dimensional cellular automaton—finite element solidification model. The experiment consists of a rectangular cavity containing a Sn-3 wt pct Pb alloy. The alloy is first melted and then solidified in the cavity. A dense array of thermocouples permits monitoring of temperatures in the cavity and in the heat exchangers surrounding the cavity. After solidification, the grain structure is revealed by metallography. X-ray radiography and inductively coupled plasma spectrometry are also conducted to access a distribution map of Pb, or macrosegregation map. The solidification model consists of solutions for heat, solute mass, and momentum conservations using the finite element method. It is coupled with a description of the development of grain structure using the cellular automaton method. A careful and direct comparison with experimental results is possible thanks to boundary conditions deduced from the temperature measurements, as well as a careful choice of the values of the material properties for simulation. Results show that the temperature maps and the macrosegregation map can only be approached with a three-dimensional simulation that includes the description of the grain structure.
Dependability analysis of parallel systems using a simulation-based approach. M.S. Thesis
NASA Technical Reports Server (NTRS)
Sawyer, Darren Charles
1994-01-01
The analysis of dependability in large, complex, parallel systems executing real applications or workloads is examined in this thesis. To effectively demonstrate the wide range of dependability problems that can be analyzed through simulation, the analysis of three case studies is presented. For each case, the organization of the simulation model used is outlined, and the results from simulated fault injection experiments are explained, showing the usefulness of this method in dependability modeling of large parallel systems. The simulation models are constructed using DEPEND and C++. Where possible, methods to increase dependability are derived from the experimental results. Another interesting facet of all three cases is the presence of some kind of workload of application executing in the simulation while faults are injected. This provides a completely new dimension to this type of study, not possible to model accurately with analytical approaches.
NASA Astrophysics Data System (ADS)
Peng, Kuang; Cao, Yiping; Wu, Yingchun; Chen, Cheng; Wan, Yingying
2017-01-01
A dual-frequency online phase measurement profilometry (PMP) method with phase-shifting parallel to moving direction of measured object is proposed in this paper. The high-frequency fringe is used for the better modulation patterns in pixel matching and it is not modified by the measured object's surface. Based on the relative positive between the moving measured object and digital light processing (DLP), the high-frequency fringe in each dual-frequency deformed pattern after pixel matching is the same. As a result, the phase can be calculated directly by the improved Stoilov algorithm without filtering out the low-frequency component containing the measured object's height information. As there is no filtering process in phase calculation, the valid information loss can be avoided so that the accuracy of the proposed method can be guaranteed. Simulations and experiments prove the method's feasibility and precision.
Direct numerical simulation of turbulent reacting flows
Chen, J.H.
1993-12-01
The development of turbulent combustion models that reflect some of the most important characteristics of turbulent reacting flows requires knowledge about the behavior of key quantities in well defined combustion regimes. In turbulent flames, the coupling between the turbulence and the chemistry is so strong in certain regimes that is is very difficult to isolate the role played by one individual phenomenon. Direct numerical simulation (DNS) is an extremely useful tool to study in detail the turbulence-chemistry interactions in certain well defined regimes. Globally, non-premixed flames are controlled by two limiting cases: the fast chemistry limit, where the turbulent fluctuations. In between these two limits, finite-rate chemical effects are important and the turbulence interacts strongly with the chemical processes. This regime is important because industrial burners operate in regimes in which, locally the flame undergoes extinction, or is at least in some nonequilibrium condition. Furthermore, these nonequilibrium conditions strongly influence the production of pollutants. To quantify the finite-rate chemistry effect, direct numerical simulations are performed to study the interaction between an initially laminar non-premixed flame and a three-dimensional field of homogeneous isotropic decaying turbulence. Emphasis is placed on the dynamics of extinction and on transient effects on the fine scale mixing process. Differential molecular diffusion among species is also examined with this approach, both for nonreacting and reacting situations. To address the problem of large-scale mixing and to examine the effects of mean shear, efforts are underway to perform large eddy simulations of round three-dimensional jets.
Parallel computation for reservoir thermal simulation: An overlapping domain decomposition approach
NASA Astrophysics Data System (ADS)
Wang, Zhongxiao
2005-11-01
In this dissertation, we are involved in parallel computing for the thermal simulation of multicomponent, multiphase fluid flow in petroleum reservoirs. We report the development and applications of such a simulator. Unlike many efforts made to parallelize locally the solver of a linear equations system which affects the performance the most, this research takes a global parallelization strategy by decomposing the computational domain into smaller subdomains. This dissertation addresses the domain decomposition techniques and, based on the comparison, adopts an overlapping domain decomposition method. This global parallelization method hands over each subdomain to a single processor of the parallel computer to process. Communication is required when handling overlapping regions between subdomains. For this purpose, MPI (message passing interface) is used for data communication and communication control. A physical and mathematical model is introduced for the reservoir thermal simulation. Numerical tests on two sets of industrial data of practical oilfields indicate that this model and the parallel implementation match the history data accurately. Therefore, we expect to use both the model and the parallel code to predict oil production and guide the design, implementation and real-time fine tuning of new well operating schemes. A new adaptive mechanism to synchronize processes on different processors has been introduced, which not only ensures the computational accuracy but also improves the time performance. To accelerate the convergence rate of iterative solution of the large linear equations systems derived from the discretization of governing equations of our physical and mathematical model in space and time, we adopt the ORTHOMIN method in conjunction with an incomplete LU factorization preconditioning technique. Important improvements have been made in both ORTHOMIN method and incomplete LU factorization in order to enhance time performance without affecting
A conflict-free, path-level parallelization approach for sequential simulation algorithms
NASA Astrophysics Data System (ADS)
Rasera, Luiz Gustavo; Machado, Péricles Lopes; Costa, João Felipe C. L.
2015-07-01
Pixel-based simulation algorithms are the most widely used geostatistical technique for characterizing the spatial distribution of natural resources. However, sequential simulation does not scale well for stochastic simulation on very large grids, which are now commonly found in many petroleum, mining, and environmental studies. With the availability of multiple-processor computers, there is an opportunity to develop parallelization schemes for these algorithms to increase their performance and efficiency. Here we present a conflict-free, path-level parallelization strategy for sequential simulation. The method consists of partitioning the simulation grid into a set of groups of nodes and delegating all available processors for simulation of multiple groups of nodes concurrently. An automated classification procedure determines which groups are simulated in parallel according to their spatial arrangement in the simulation grid. The major advantage of this approach is that it does not require conflict resolution operations, and thus allows exact reproduction of results. Besides offering a large performance gain when compared to the traditional serial implementation, the method provides efficient use of computational resources and is generic enough to be adapted to several sequential algorithms.
Krosel, S.M.; Milner, E.J.
1982-01-01
Illustrates the application of predictor-corrector integration algorithms developed for the digital parallel processing environment. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real-time performance, inter-processor communication and algorithm startup are also discussed. 10 references.
NASA Astrophysics Data System (ADS)
Quillen, Alice C.; Moore, A.
2008-09-01
Planetesimal and dust dynamical simulations require collision and nearest neighbor detection. A brute force implementation for sorting interparticle distances requires O(N2) computations for N particles, limiting the numbers of particles that have been simulated. Parallel algorithms recently developed for the GPU (graphics processing unit), such as the radix sort, can run as fast as O(N) and sort distances between a million particles in a few hundred milliseconds. We introduce improvements in collision and nearest neighbor detection algorithms and how we have incorporated them into our efficient parallel 2nd order democratic heliocentric method symplectic integrator written in NVIDIA's CUDA for the GPU.
NASA Technical Reports Server (NTRS)
Krosel, S. M.; Milner, E. J.
1982-01-01
The application of Predictor corrector integration algorithms developed for the digital parallel processing environment are investigated. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real time performance, interprocessor communication, and algorithm startup are also discussed.
Massively parallel simulator of optical coherence tomography of inhomogeneous turbid media.
Malektaji, Siavash; Lima, Ivan T; Escobar I, Mauricio R; Sherif, Sherif S
2017-10-01
An accurate and practical simulator for Optical Coherence Tomography (OCT) could be an important tool to study the underlying physical phenomena in OCT such as multiple light scattering. Recently, many researchers have investigated simulation of OCT of turbid media, e.g., tissue, using Monte Carlo methods. The main drawback of these earlier simulators is the long computational time required to produce accurate results. We developed a massively parallel simulator of OCT of inhomogeneous turbid media that obtains both Class I diffusive reflectivity, due to ballistic and quasi-ballistic scattered photons, and Class II diffusive reflectivity due to multiply scattered photons. This Monte Carlo-based simulator is implemented on graphic processing units (GPUs), using the Compute Unified Device Architecture (CUDA) platform and programming model, to exploit the parallel nature of propagation of photons in tissue. It models an arbitrary shaped sample medium as a tetrahedron-based mesh and uses an advanced importance sampling scheme. This new simulator speeds up simulations of OCT of inhomogeneous turbid media by about two orders of magnitude. To demonstrate this result, we have compared the computation times of our new parallel simulator and its serial counterpart using two samples of inhomogeneous turbid media. We have shown that our parallel implementation reduced simulation time of OCT of the first sample medium from 407 min to 92 min by using a single GPU card, to 12 min by using 8 GPU cards and to 7 min by using 16 GPU cards. For the second sample medium, the OCT simulation time was reduced from 209 h to 35.6 h by using a single GPU card, and to 4.65 h by using 8 GPU cards, and to only 2 h by using 16 GPU cards. Therefore our new parallel simulator is considerably more practical to use than its central processing unit (CPU)-based counterpart. Our new parallel OCT simulator could be a practical tool to study the different physical phenomena underlying OCT
A parallel finite volume algorithm for large-eddy simulation of turbulent flows
NASA Astrophysics Data System (ADS)
Bui, Trong Tri
1998-11-01
A parallel unstructured finite volume algorithm is developed for large-eddy simulation of compressible turbulent flows. Major components of the algorithm include piecewise linear least-square reconstruction of the unknown variables, trilinear finite element interpolation for the spatial coordinates, Roe flux difference splitting, and second-order MacCormack explicit time marching. The computer code is designed from the start to take full advantage of the additional computational capability provided by the current parallel computer systems. Parallel implementation is done using the message passing programming model and message passing libraries such as the Parallel Virtual Machine (PVM) and Message Passing Interface (MPI). The development of the numerical algorithm is presented in detail. The parallel strategy and issues regarding the implementation of a flow simulation code on the current generation of parallel machines are discussed. The results from parallel performance studies show that the algorithm is well suited for parallel computer systems that use the message passing programming model. Nearly perfect parallel speedup is obtained on MPP systems such as the Cray T3D and IBM SP2. Performance comparison with the older supercomputer systems such as the Cray YMP show that the simulations done on the parallel systems are approximately 10 to 30 times faster. The results of the accuracy and performance studies for the current algorithm are reported. To validate the flow simulation code, a number of Euler and Navier-Stokes simulations are done for internal duct flows. Inviscid Euler simulation of a very small amplitude acoustic wave interacting with a shock wave in a quasi-1D convergent-divergent nozzle shows that the algorithm is capable of simultaneously tracking the very small disturbances of the acoustic wave and capturing the shock wave. Navier-Stokes simulations are made for fully developed laminar flow in a square duct, developing laminar flow in a
A new parallel method for molecular dynamics simulation of macromolecular systems
Plimpton, S.; Hendrickson, B.
1994-08-01
Short-range molecular dynamics simulations of molecular systems are commonly parallelized by replicated-data methods, where each processor stores a copy of all atom positions. This enables computation of bonded 2-, 3-, and 4-body forces within the molecular topology to be partitioned among processors straightforwardly. A drawback to such methods is that the inter-processor communication scales as N, the number of atoms, independent of P, the number of processors. Thus, their parallel efficiency falls off rapidly when large numbers of processors are used. In this paper a new parallel method called force-decomposition for simulating macromolecular or small-molecule systems is presented. Its memory and communication costs scale as N/{radical}P, allowing larger problems to be run faster on greater numbers of processors. Like replicated-data techniques, and in contrast to spatial-decomposition approaches, the new method can be simply load-balanced and performs well even for irregular simulation geometries. The implementation of the algorithm in a prototypical macromolecular simulation code ParBond is also discussed. On a 1024-processor Intel Paragon, ParBond runs a standard benchmark simulation of solvated myoglobin with a parallel efficiency of 61% and at 40 times the speed of a vectorized version of CHARMM running on a single Cray Y-MP processor.
Parallel simulation of tsunami inundation on a large-scale supercomputer
NASA Astrophysics Data System (ADS)
Oishi, Y.; Imamura, F.; Sugawara, D.
2013-12-01
An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the
NASA Technical Reports Server (NTRS)
Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)
1993-01-01
This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
Direct Numerical Simulation of Rayleigh-Taylor Instability
NASA Astrophysics Data System (ADS)
Cook, Andrew W.; Dimotakis, Paul E.
1999-11-01
Direct numerical simulations of Rayleigh-Taylor-Instability flows in a rectangular domain were performed on the ASCI Blue-Pacific computer at Lawrence Livermore National Laboratory. The code solves the Navier-Stokes equations, plus species-continuity equations with Fickian diffusion for two incompressible, miscible fluids with a 3:1 density ratio. The fluids are initially separated by a diffuse interface with broad-banded isotropic perturbations applied to the contact region. Several runs were performed with different initial conditions. Periodic boundary conditions are imposed in the horizontal directions with no-slip walls at the top and bottom of the computational domain. Neumann conditions on pressure are applied on the top/bottom walls. Domain decomposition is used to achieve a high degree of parallelism on 96 processors with a computational mesh of 256x256x512 grid points. The flow Reynolds number, based on the vertical-extent and growth-rate of the mixing region, reaches about 1100 by the end of the calculations. The results indicate that the flow remains sensitive to initial conditions, at least for the time spanned in these simulations.
A parallel simulated annealing algorithm for standard cell placement on a hypercube computer
NASA Technical Reports Server (NTRS)
Jones, Mark Howard
1987-01-01
A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
Pacheco, P; Miller, P; Kim, J; Leese, T; Zabiyaka, Y
2003-05-07
Object-oriented NeuroSys (ooNeuroSys) is a collection of programs for simulating very large networks of biologically accurate neurons on distributed memory parallel computers. It includes two principle programs: ooNeuroSys, a parallel program for solving the large systems of ordinary differential equations arising from the interconnected neurons, and Neurondiz, a parallel program for visualizing the results of ooNeuroSys. Both programs are designed to be run on clusters and use the MPI library to obtain parallelism. ooNeuroSys also includes an easy-to-use Python interface. This interface allows neuroscientists to quickly develop and test complex neuron models. Both ooNeuroSys and Neurondiz have a design that allows for both high performance and relative ease of maintenance.
Direct numerical simulation of axisymmetric turbulence
NASA Astrophysics Data System (ADS)
Qu, Bo; Bos, Wouter J. T.; Naso, Aurore
2017-09-01
The dynamics of decaying, strictly axisymmetric, incompressible turbulence is investigated using direct numerical simulations. It is found that the angular momentum is a robust invariant of the system. It is further shown that long-lived coherent structures are generated by the flow. These structures can be associated with stationary solutions of the Euler equations. The structures obey relations in agreement with predictions from selective decay principles, compatible with the decay laws of the system. Two different types of decay scenarios are highlighted. The first case results in a quasi-two-dimensional flow with a dynamical behavior in the poloidal plane similar to freely decaying two-dimensional turbulence. In a second regime, the long-time dynamics is dominated by a single three-dimensional mode.
OOPSE: an object-oriented parallel simulation engine for molecular dynamics.
Meineke, Matthew A; Vardeman, Charles F; Lin, Teng; Fennell, Christopher J; Gezelter, J Daniel
2005-02-01
OOPSE is a new molecular dynamics simulation program that is capable of efficiently integrating equations of motion for atom types with orientational degrees of freedom (e.g. "sticky" atoms and point dipoles). Transition metals can also be simulated using the embedded atom method (EAM) potential included in the code. Parallel simulations are carried out using the force-based decomposition method. Simulations are specified using a very simple C-based meta-data language. A number of advanced integrators are included, and the basic integrator for orientational dynamics provides substantial improvements over older quaternion-based schemes. (c) 2004 Wiley Periodicals, Inc.
Wake Encounter Analysis for a Closely Spaced Parallel Runway Paired Approach Simulation
NASA Technical Reports Server (NTRS)
Mckissick,Burnell T.; Rico-Cusi, Fernando J.; Murdoch, Jennifer; Oseguera-Lohr, Rosa M.; Stough, Harry P, III; O'Connor, Cornelius J.; Syed, Hazari I.
2009-01-01
A Monte Carlo simulation of simultaneous approaches performed by two transport category aircraft from the final approach fix to a pair of closely spaced parallel runways was conducted to explore the aft boundary of the safe zone in which separation assurance and wake avoidance are provided. The simulation included variations in runway centerline separation, initial longitudinal spacing of the aircraft, crosswind speed, and aircraft speed during the approach. The data from the simulation showed that the majority of the wake encounters occurred near or over the runway and the aft boundaries of the safe zones were identified for all simulation conditions.
Zhao, Jinkui
2011-01-01
IB is a Monte Carlo simulation tool for aiding neutron scattering instrument designs. It is written in C++ and implemented under Parallel Virtual Machine. The program has a few basic components, or modules, that can be used to build a virtual neutron scattering instrument. More complex components, such as neutron guides and multichannel beam benders, can be constructed using the grouping technique unique to IB. Users can specify a collection of modules as a group. For example, a neutron guide can be constructed by grouping four neutron mirrors together that make up the four sides of the guide. IB s simulation engine ensures that neutrons entering a group will be properly operated upon by all members of the group. For simulations that require higher computer speed, the program can be run in parallel mode under the PVM architecture. Initially, the program was written for designing instruments on pulsed neutron sources, it has since been used to simulate reactor based instruments as well.
Gedney, S.D.
1990-12-01
The Parallel-Plate Bounded-Wave EMP Simulator is typically used to test the vulnerability of electronic systems to the electromagnetic pulse (EMP) produced by a high altitude nuclear burst by subjecting the systems to a simulated EMP environment. However, when large test objects are placed within the simulator for investigation, the desired EMP environment may be affected by the interaction between the simulator and the test object. This simulator/obstacle interaction can be attributed to the following phenomena: (1) mutual coupling between the test object and the simulator, (2) fringing effects due to the finite width of the conducting plates of the simulator, and (3) multiple reflections between the object and the simulator's tapered end-sections. When the interaction is significant, the measurement of currents coupled into the system may not accurately represent those induced by an actual EMP. To better understand the problem of simulator/obstacle interaction, a dynamic analysis of the fields within the parallel-plate simulator is presented. The fields are computed using a moment method solution based on a wire mesh approximation of the conducting surfaces of the simulator. The fields within an empty simulator are found to be predominately transversse electromagnetic (TEM) for frequencies within the simulator's bandwidth, properly simulating the properties of the EMP propagating in free space. However, when a large test object is placed within the simulator, it is found that the currents induced on the object can be quite different from those on an object situated in free space. A comprehensive study of the mechanisms contributing to this deviation is presented.
Serial and parallel processes in eye movement control: current controversies and future directions.
Murray, Wayne S; Fischer, Martin H; Tatler, Benjamin W
2013-01-01
In this editorial for the special issue on serial and parallel processing in reading we explore the background to the current debate concerning whether the word recognition processes in reading are strictly serial-sequential or take place in an overlapping parallel fashion. We consider the history of the controversy and some of the underlying assumptions, together with an analysis of the types of evidence and arguments that have been adduced to both sides of the debate, concluding that both accounts necessarily presuppose some weakening of, or elasticity in, the eye-mind assumption. We then consider future directions, both for reading research and for scene viewing, and wrap up the editorial with a brief overview of the following articles and their conclusions.
Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik
2015-06-09
Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers-particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus.
High performance Python for direct numerical simulations of turbulent flows
NASA Astrophysics Data System (ADS)
Mortensen, Mikael; Langtangen, Hans Petter
2016-06-01
Direct Numerical Simulations (DNS) of the Navier Stokes equations is an invaluable research tool in fluid dynamics. Still, there are few publicly available research codes and, due to the heavy number crunching implied, available codes are usually written in low-level languages such as C/C++ or Fortran. In this paper we describe a pure scientific Python pseudo-spectral DNS code that nearly matches the performance of C++ for thousands of processors and billions of unknowns. We also describe a version optimized through Cython, that is found to match the speed of C++. The solvers are written from scratch in Python, both the mesh, the MPI domain decomposition, and the temporal integrators. The solvers have been verified and benchmarked on the Shaheen supercomputer at the KAUST supercomputing laboratory, and we are able to show very good scaling up to several thousand cores. A very important part of the implementation is the mesh decomposition (we implement both slab and pencil decompositions) and 3D parallel Fast Fourier Transforms (FFT). The mesh decomposition and FFT routines have been implemented in Python using serial FFT routines (either NumPy, pyFFTW or any other serial FFT module), NumPy array manipulations and with MPI communications handled by MPI for Python (mpi4py). We show how we are able to execute a 3D parallel FFT in Python for a slab mesh decomposition using 4 lines of compact Python code, for which the parallel performance on Shaheen is found to be slightly better than similar routines provided through the FFTW library. For a pencil mesh decomposition 7 lines of code is required to execute a transform.
Besozzi, Daniela; Pescini, Dario; Mauri, Giancarlo
2014-01-01
Tau-leaping is a stochastic simulation algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of simulations, leading to high computational costs. Since each simulation can be executed independently from the others, a massive parallelization of tau-leaping can bring to relevant reductions of the overall running time. The emerging field of General Purpose Graphic Processing Units (GPGPU) provides power-efficient high-performance computing at a relatively low cost. In this work we introduce cuTauLeaping, a stochastic simulator of biological systems that makes use of GPGPU computing to execute multiple parallel tau-leaping simulations, by fully exploiting the Nvidia's Fermi GPU architecture. We show how a considerable computational speedup is achieved on GPU by partitioning the execution of tau-leaping into multiple separated phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms the CPU-based tau-leaping implementation when the number of parallel simulations increases, with a break-even directly depending on the size of the biological system and on the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a bidimensional parameter sweep analysis to study the oscillatory regimes in the Ras/cAMP/PKA pathway in S. cerevisiae. PMID:24663957
Nobile, Marco S; Cazzaniga, Paolo; Besozzi, Daniela; Pescini, Dario; Mauri, Giancarlo
2014-01-01
Tau-leaping is a stochastic simulation algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of simulations, leading to high computational costs. Since each simulation can be executed independently from the others, a massive parallelization of tau-leaping can bring to relevant reductions of the overall running time. The emerging field of General Purpose Graphic Processing Units (GPGPU) provides power-efficient high-performance computing at a relatively low cost. In this work we introduce cuTauLeaping, a stochastic simulator of biological systems that makes use of GPGPU computing to execute multiple parallel tau-leaping simulations, by fully exploiting the Nvidia's Fermi GPU architecture. We show how a considerable computational speedup is achieved on GPU by partitioning the execution of tau-leaping into multiple separated phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms the CPU-based tau-leaping implementation when the number of parallel simulations increases, with a break-even directly depending on the size of the biological system and on the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a bidimensional parameter sweep analysis to study the oscillatory regimes in the Ras/cAMP/PKA pathway in S. cerevisiae.
Thulasidasan, Sunil; Kasiviswanathan, Shiva; Eidenbenz, Stephan; Romero, Philip
2010-01-01
We re-examine the problem of load balancing in conservatively synchronized parallel, discrete-event simulations executed on high-performance computing clusters, focusing on simulations where computational and messaging load tend to be spatially clustered. Such domains are frequently characterized by the presence of geographic 'hot-spots' - regions that generate significantly more simulation events than others. Examples of such domains include simulation of urban regions, transportation networks and networks where interaction between entities is often constrained by physical proximity. Noting that in conservatively synchronized parallel simulations, the speed of execution of the simulation is determined by the slowest (i.e most heavily loaded) simulation process, we study different partitioning strategies in achieving equitable processor-load distribution in domains with spatially clustered load. In particular, we study the effectiveness of partitioning via spatial scattering to achieve optimal load balance. In this partitioning technique, nearby entities are explicitly assigned to different processors, thereby scattering the load across the cluster. This is motivated by two observations, namely, (i) since load is spatially clustered, spatial scattering should, intuitively, spread the load across the compute cluster, and (ii) in parallel simulations, equitable distribution of CPU load is a greater determinant of execution speed than message passing overhead. Through large-scale simulation experiments - both of abstracted and real simulation models - we observe that scatter partitioning, even with its greatly increased messaging overhead, significantly outperforms more conventional spatial partitioning techniques that seek to reduce messaging overhead. Further, even if hot-spots change over the course of the simulation, if the underlying feature of spatial clustering is retained, load continues to be balanced with spatial scattering leading us to the observation that
Xyce parallel electronic simulator reference guide, Version 6.0.1.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2014-01-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1] .
Komarov, Ivan; D'Souza, Roshan M
2012-01-01
The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.
NASA Astrophysics Data System (ADS)
Decker, K. M.; Jayewardena, C.; Rehmann, R.
We describe the library lgtlib, and lgttool, the corresponding development environment for Monte Carlo simulations of lattice gauge theory on multiprocessor vector computers with shared memory. We explain why distributed memory parallel processor (DMPP) architectures are particularly appealing for compute-intensive scientific applications, and introduce the design of a general application and program development environment system for scientific applications on DMPP architectures.
Komarov, Ivan; D'Souza, Roshan M.
2012-01-01
The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×−120× performance gain over various state-of-the-art serial algorithms when simulating different types of models. PMID:23152751
Massively parallel simulation of flow and transport in variably saturated porous and fractured media
Wu, Yu-Shu; Zhang, Keni; Pruess, Karsten
2002-01-15
This paper describes a massively parallel simulation method and its application for modeling multiphase flow and multicomponent transport in porous and fractured reservoirs. The parallel-computing method has been implemented into the TOUGH2 code and its numerical performance is tested on a Cray T3E-900 and IBM SP. The efficiency and robustness of the parallel-computing algorithm are demonstrated by completing two simulations with more than one million gridblocks, using site-specific data obtained from a site-characterization study. The first application involves the development of a three-dimensional numerical model for flow in the unsaturated zone of Yucca Mountain, Nevada. The second application is the study of tracer/radionuclide transport through fracture-matrix rocks for the same site. The parallel-computing technique enhances modeling capabilities by achieving several-orders-of-magnitude speedup for large-scale and high resolution modeling studies. The resulting modeling results provide many new insights into flow and transport processes that could not be obtained from simulations using the single-CPU simulator.
Direct numerical simulation of hot jets
NASA Technical Reports Server (NTRS)
Jacob, Marc C.
1993-01-01
The ultimate motivation of this work is to investigate the stability of two dimensional heated jets and its implications for aerodynamic sound generation from data obtained with direct numerical simulations (DNS). As pointed out in our last report, these flows undergo two types of instabilities, convective or absolute, depending on their temperature. We also described the limits of earlier experimental and theoretical studies and explained why a numerical investigation could give us new insight into the physics of these instabilities. The aeroacoustical interest of these flows was also underlined. In order to reach this goal, we first need to succeed in the DNS of heated jets. Our past efforts have been focused on this issue which encountered several difficulties. Our numerical difficulties are directly related to the physical problem we want to investigate since these absolutely or almost absolutely unstable flows are by definition very sensitive to the smallest disturbances and are very likely to reach nonlinear saturation through a numerical feedback mechanism. As a result, it is very difficult to compute a steady laminar solution using a spatial DNS. A steady state was reached only for strongly co-flowed jets, but these flows are almost equivalent to two independent mixing layers. Thus they are far from absolute instability and have much lower growth rates.
Direct numerical simulation of inertial flows in porous media
NASA Astrophysics Data System (ADS)
Apte, S.; Finn, J.; Wood, B. D.
2010-12-01
At modest flow rates (10 ≤ Re ≤ 300) through porous media and packed beds, fluid inertia can result in complex steady and unsteady recirculation regions, dependent on the local pore geometry. Body fitted CFD is a broadly used design and analysis tool for flows in porous media and packed bed type reactors. Unfortunately, the inherent complexities of porous media make unstructured mesh generation a difficult and time consuming step in the simulation process. To accurately capture the inertial dynamics using high-fidelity direct simulations, body fitted meshes must be high quality and sufficiently refined. We present methods to parameterize and simplify mesh generation for packed beds, with an eye toward obtaining efficient mesh independence for Reynolds numbers in the inertial and unsteady regimes. The crux of mesh generation for packed beds is dealing with sphere-sphere or sphere-wall contact points, where a geometric singularity exists. To handle the sphere-sphere and sphere-wall contact points, we use a fillet bridge model, in which every pair of contacting entities are bridged by a fillet, eliminating a small fluid region near the contact point. This results in a continuous surface mesh which does not require resizing of the spheres and can accommodate prism cells for improved boundary layer resolution. A second order accurate, parallel, incompressible flow solver [Moin and Apte, AIAA J. 2006] is used to simulate flow through three different sphere packings: a periodic simple cubic packing, a wall bounded hexagonal close packing, and a randomly packed tube. Mesh independence is assessed using several measures including Ergun pressure drop coefficients, viscous and pressure components of drag force, kinetic energy, kinetic energy dissipation and interstitial velocity profiles. The results of these test cases are used to determine the feasibility of accurate and very large scale simulations of flow through a randomly packed bed of 103 pores. Preliminary results
A parallel finite element simulator for ion transport through three-dimensional ion channel systems.
Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo
2013-09-15
A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain to get the surface and volume mesh for ion channel systems, which consists of a set of mesh generation tools. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG) developed by one of the authors, which provides the capability of doing large scale parallel computations with high parallel efficiency and the flexibility of choosing high order elements to achieve high order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I - V curve, with which both primitive and transformed PNP equations are studied and their numerical performances are compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage dependent anion channel (VDAC) and α-Hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite size effects can be included in PNP model now, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL. It is shown that the size effects in SMPNP can
Direct Numerical Simulation of Complex Turbulence
NASA Astrophysics Data System (ADS)
Hsieh, Alan
Direct numerical simulations (DNS) of spanwise-rotating turbulent channel flow were conducted. The data base obtained from these DNS simulations were used to investigate the turbulence generation cycle for simple and complex turbulence. For turbulent channel flow, three theoretical models concerning the formation and evolution of sublayer streaks, three-dimensional hairpin vortices and propagating plane waves were validated using visualizations from the present DNS data. The principal orthogonal decomposition (POD) method was used to verify the existence of the propagating plane waves; a new extension of the POD method was derived to demonstrate these plane waves in a spatial channel model. The analyses of coherent structures was extended to complex turbulence and used to determine the proper computational box size for a minimal flow unit (MFU) at Rob < 0.5. Proper realization of Taylor-Gortler vortices in the highly turbulent pressure region was demonstrated to be necessary for acceptably accurate MFU turbulence statistics, which required a minimum spanwise domain length Lz = pi. A dependence of MFU accuracy on Reynolds number was also discovered and MFU models required a larger domain to accurately approximate higher-Reynolds number flows. In addition, the results obtained from the DNS simulations were utilized to evaluate several turbulence closure models for momentum and thermal transport in rotating turbulent channel flow. Four nonlinear eddy viscosity turbulence models were tested and among these, Explicit Algebraic Reynolds Stress Models (EARSM) obtained the Reynolds stress distributions in best agreement with DNS data for rotational flows. The modeled pressure-strain functions of EARSM were shown to have strong influence on the Reynolds stress distributions near the wall. Turbulent heatflux distributions obtained from two explicit algebraic heat flux models consistently displayed increasing disagreement with DNS data with increasing rotation rate. Results
Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.; ...
2016-09-18
This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
Direct and Inverse Kinematics of a Novel Tip-Tilt-Piston Parallel Manipulator
NASA Technical Reports Server (NTRS)
Tahmasebi, Farhad
2004-01-01
Closed-form direct and inverse kinematics of a new three degree-of-freedom (DOF) parallel manipulator with inextensible limbs and base-mounted actuators are presented. The manipulator has higher resolution and precision than the existing three DOF mechanisms with extensible limbs. Since all of the manipulator actuators are base-mounted; higher payload capacity, smaller actuator sizes, and lower power dissipation can be obtained. The manipulator is suitable for alignment applications where only tip, tilt, and piston motions are significant. The direct kinematics of the manipulator is reduced to solving an eighth-degree polynomial in the square of tangent of half-angle between one of the limbs and the base plane. Hence, there are at most 16 assembly configurations for the manipulator. In addition, it is shown that the 16 solutions are eight pairs of reflected configurations with respect to the base plane. Numerical examples for the direct and inverse kinematics of the manipulator are also presented.
Electro-optic directed XOR logic circuits based on parallel-cascaded micro-ring resonators.
Tian, Yonghui; Zhao, Yongpeng; Chen, Wenjie; Guo, Anqi; Li, Dezhao; Zhao, Guolin; Liu, Zilong; Xiao, Huifu; Liu, Guipeng; Yang, Jianhong
2015-10-05
We report an electro-optic photonic integrated circuit which can perform the exclusive (XOR) logic operation based on two silicon parallel-cascaded microring resonators (MRRs) fabricated on the silicon-on-insulator (SOI) platform. PIN diodes embedded around MRRs are employed to achieve the carrier injection modulation. Two electrical pulse sequences regarded as two operands of operations are applied to PIN diodes to modulate two MRRs through the free carrier dispersion effect. The final operation result of two operands is output at the Output port in the form of light. The scattering matrix method is employed to establish numerical model of the device, and numerical simulator SG-framework is used to simulate the electrical characteristics of the PIN diodes. XOR operation with the speed of 100Mbps is demonstrated successfully.
Parallel peridynamics-SPH simulation of explosion induced soil fragmentation by using OpenMP
NASA Astrophysics Data System (ADS)
Fan, Houfu; Li, Shaofan
2017-04-01
In this work, we use the OpenMP-based shared-memory parallel programming to implement the recently developed coupling method of state-based peridynamics and smoothed particle hydrodynamics (PD-SPH), and we then employ the program to simulate dynamic soil fragmentation induced by the explosion of the buried explosives. The paper offers detailed technical description and discussion on the PD-SHP coupling algorithm and how to use the OpenMP shared-memory programming to implement such large-scale computation in a desktop environment, with an example to illustrate the basic computing principle and the parallel algorithm structure. In specific, the paper provides a complete OpenMP parallel algorithm for the PD-SPH scheme with the programming and parallelization details. Numerical examples of soil fragmentation caused by the buried explosives are also presented. Results show that the simulation carried out by the OpenMP parallel code is much faster than that by the corresponding serial computer code.
Parallel peridynamics-SPH simulation of explosion induced soil fragmentation by using OpenMP
NASA Astrophysics Data System (ADS)
Fan, Houfu; Li, Shaofan
2016-06-01
In this work, we use the OpenMP-based shared-memory parallel programming to implement the recently developed coupling method of state-based peridynamics and smoothed particle hydrodynamics (PD-SPH), and we then employ the program to simulate dynamic soil fragmentation induced by the explosion of the buried explosives. The paper offers detailed technical description and discussion on the PD-SHP coupling algorithm and how to use the OpenMP shared-memory programming to implement such large-scale computation in a desktop environment, with an example to illustrate the basic computing principle and the parallel algorithm structure. In specific, the paper provides a complete OpenMP parallel algorithm for the PD-SPH scheme with the programming and parallelization details. Numerical examples of soil fragmentation caused by the buried explosives are also presented. Results show that the simulation carried out by the OpenMP parallel code is much faster than that by the corresponding serial computer code.
Domel, N.D.; Thompson, D.S. )
1991-01-01
The effect of shock impingement on the mixing and combustion of a reacting shear-layer is numerically simulated. Hydrogen fuel is injected at sonic velocity behind a backward facing step in a direction parallel to a supersonic freestream vitiated with H{sub 2}O. The two-dimensional Navier-Stokes equations are solved and explicitly coupled to a chemistry package employing a global, two-step combustion model. The results show that shock impingement enhances the mixing and combustion. 17 refs.
Parallel direct laser writing in three dimensions with spatially dependent aberration correction.
Jesacher, Alexander; Booth, Martin J
2010-09-27
We propose a hologram design process which aims at reducing aberrations in parallel three-dimensional direct laser writing applications. One principle of the approach is to minimise the diffractive power of holograms while retaining the degree of parallelisation. This reduces focal distortion caused by chromatic aberration. We address associated problems such as the zero diffraction order and aberrations induced by a potential refractive index mismatch between the immersion medium of the microscope objective and the fabrication substrate. Results from fabrication in diamond, fused silica and lithium niobate are presented.
NASA Astrophysics Data System (ADS)
Stupl, J.; Faber, N.; Foster, C.; Yang, F.; Nelson, B.; Aziz, J.; Nuttall, A.; Henze, C.; Levit, C.
2014-09-01
This paper provides an updated efficiency analysis of the LightForce space debris collision avoidance scheme. LightForce aims to prevent collisions on warning by utilizing photon pressure from ground based, commercial off the shelf lasers. Past research has proven that a few ground-based systems consisting of 10 kW class lasers directed by 1.5 m telescopes with adaptive optics could lower the expected number of collisions in Low Earth Orbit (LEO) by an order of magnitude. Our simulation approach utilizes the entire Two Line Element (TLE) catalogue in LEO for a given day as initial input. Least-squares fitting of a TLE time series is used for an improved orbit estimate. We then calculate the probability of collision for all LEO objects in the catalogue for a time step of the simulation. The conjunctions that exceed a threshold probability of collision are then engaged by a simulated network of laser ground stations. After those engagements, the perturbed orbits are used to re-assess the probability of collision and evaluate the efficiency. This paper describes new simulations with three updated aspects: 1) By utilizing a highly parallel simulation approach employing hundreds of processors, we have extended our analysis to a much broader dataset. The simulation time is extended to one year. 2) We analyze not only the efficiency of LightForce on conjunctions that naturally occur, but also take into account conjunctions caused by orbit perturbations due to LightForce engagements. 3) We use a new simulation approach that is regularly updating the LightForce engagement strategy, as it would be during actual operations. In this paper we present both our simulation approach to parallelize the efficiency analysis, its computational performance and the resulting expected efficiency of the LightForce collision avoidance system.
Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems
BAER,THOMAS A.; SACKINGER,PHILIP A.; SUBIA,SAMUEL R.
1999-10-14
Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact fines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a ''pseudo-solid'' mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to equally distribute computational work among a SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel coquting environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speed ups for fixed problem size, a class of problems of immediate practical importance.
Shadow-Bitcoin: Scalable Simulation via Direct Execution of Multi-Threaded Applications
2015-08-10
Shadow -Bitcoin: Scalable Simulation via Direct Execution of Multi-threaded Applications Andrew Miller University of Maryland amiller@cs.umd.edu Rob...threaded applications inside of Shadow , an existing parallel discrete-event network sim- ulation framework. Our methodology utilizes function... Shadow plug-in that directly executes the Bitcoin reference client software. To demonstrate the usefulness of this tool, we present novel denial-of
Direct numerical simulation of free falling sphere in creeping flow
NASA Astrophysics Data System (ADS)
Reddy, Rupesh K.; Jin, Shi; Nandakumar, K.; Minev, Peter D.; Joshi, Jyeshtharaj B.
2010-03-01
In the present study, direct numerical simulations (DNS) are performed on single and a swarm of particles settling under the action of gravity. The simulations have been carried out in the creeping flow range of Reynolds number from 0.01 to 1 for understanding the hindrance effect, of the other particles, on the settling velocity and drag coefficient. The DNS code is a non-Lagrange multiplier-based fictitious-domain method, which has been developed and validated by Jin et al. (2008; A parallel algorithm for the direct numerical simulation of 3D inertial particle sedimentation. In: Conference proceedings of the 16th annual conference of the CFD Society of Canada). It has been observed that the time averaged settling velocity of the particle in the presence of other particles, decreases with an increase in the number of particles surrounding it (from 9 particles to 245 particles). The effect of the particle volume fraction on the drag coefficient has also been studied and it has been observed that the computed values of drag coefficients are in good agreement with the correlations proposed by Richardson and Zaki (1954; Sedimentation and fluidization: part I. Transactions of the Institution of Chemical Engineers, 32, 35-53) and Pandit and Joshi (1998; Pressure drop in packed, expanded and fluidised beds, packed columns and static mixers - a unified approach. Reviews in Chemical Engineering, 14, 321-371). The suspension viscosity-based model of Frankel and Acrivos (1967; On the viscosity of a concentrated suspension of solid spheres. Chemical Engineering Science, 22, 847-853) shows good agreement with the DNS results.
The Distributed Diagonal Force Decomposition Method for Parallelizing Molecular Dynamics Simulations
Boršnik, Urban; Miller, Benjamin T.; Brooks, Bernard R.; Janežič, Dušanka
2011-01-01
Parallelization is an effective way to reduce the computational time needed for molecular dynamics simulations. We describe a new parallelization method, the distributed-diagonal force decomposition method, with which we extend and improve the existing force decomposition methods. Our new method requires less data communication during molecular dynamics simulations than replicated data and current force decomposition methods, increasing the parallel efficiency. It also dynamically load-balances the processors' computational load throughout the simulation. The method is readily implemented in existing molecular dynamics codes and it has been incorporated into the CHARMM program, allowing its immediate use in conjunction with the many molecular dynamics simulation techniques that are already present in the program. We also present the design of the Force Decomposition Machine, a cluster of personal computers and networks that is tailored to running molecular dynamics simulations using the distributed diagonal force decomposition method. The design is expandable and provides various degrees of fault resilience. This approach is easily adaptable to computers with Graphics Processing Units because it is independent of the processor type being used. PMID:21793007
Simulated parallel annealing within a neighborhood for optimization of biomechanical systems.
Higginson, J S; Neptune, R R; Anderson, F C
2005-09-01
Optimization problems for biomechanical systems have become extremely complex. Simulated annealing (SA) algorithms have performed well in a variety of test problems and biomechanical applications; however, despite advances in computer speed, convergence to optimal solutions for systems of even moderate complexity has remained prohibitive. The objective of this study was to develop a portable parallel version of a SA algorithm for solving optimization problems in biomechanics. The algorithm for simulated parallel annealing within a neighborhood (SPAN) was designed to minimize interprocessor communication time and closely retain the heuristics of the serial SA algorithm. The computational speed of the SPAN algorithm scaled linearly with the number of processors on different computer platforms for a simple quadratic test problem and for a more complex forward dynamic simulation of human pedaling.
Parallel Grand Canonical Monte Carlo (ParaGrandMC) Simulation Code
NASA Technical Reports Server (NTRS)
Yamakov, Vesselin I.
2016-01-01
This report provides an overview of the Parallel Grand Canonical Monte Carlo (ParaGrandMC) simulation code. This is a highly scalable parallel FORTRAN code for simulating the thermodynamic evolution of metal alloy systems at the atomic level, and predicting the thermodynamic state, phase diagram, chemical composition and mechanical properties. The code is designed to simulate multi-component alloy systems, predict solid-state phase transformations such as austenite-martensite transformations, precipitate formation, recrystallization, capillary effects at interfaces, surface absorption, etc., which can aid the design of novel metallic alloys. While the software is mainly tailored for modeling metal alloys, it can also be used for other types of solid-state systems, and to some degree for liquid or gaseous systems, including multiphase systems forming solid-liquid-gas interfaces.
Application of parallel computing to seismic damage process simulation of an arch dam
NASA Astrophysics Data System (ADS)
Zhong, Hong; Lin, Gao; Li, Jianbo
2010-06-01
The simulation of damage process of high arch dam subjected to strong earthquake shocks is significant to the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation requires rigorous computational capacity. Conventional serial computing falls short of that and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams utilizing the damage model with inheterogeneity of concrete considered. Developed with programming language Fortran, the code uses a master/slave mode for programming, domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication and solvers from AZTEC library for solution of large-scale equations. Speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of a being-built arch dam on a 4-node PC Cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of shaking table test, indicating that the proposed procedure and parallel code PDPAD has a good potential in simulating seismic damage mode of arch dams. With the rapidly growing need for massive computation emerged from engineering problems, parallel computing will find more and more applications in pertinent areas.
Parallel simulations of Grover's algorithm for closest match search in neutron monitor data
NASA Astrophysics Data System (ADS)
Kussainov, Arman; White, Yelena
We are studying the parallel implementations of Grover's closest match search algorithm for neutron monitor data analysis. This includes data formatting, and matching quantum parameters to a conventional structure of a chosen programming language and selected experimental data type. We have employed several workload distribution models based on acquired data and search parameters. As a result of these simulations, we have an understanding of potential problems that may arise during configuration of real quantum computational devices and the way they could run tasks in parallel. The work was supported by the Science Committee of the Ministry of Science and Education of the Republic of Kazakhstan Grant #2532/GF3.
Progress on the Multiphysics Capabilities of the Parallel Electromagnetic ACE3P Simulation Suite
Kononenko, Oleksiy
2015-03-26
ACE3P is a 3D parallel simulation suite that is being developed at SLAC National Accelerator Laboratory. Effectively utilizing supercomputer resources, ACE3P has become a key tool for the coupled electromagnetic, thermal and mechanical research and design of particle accelerators. Based on the existing finite-element infrastructure, a massively parallel eigensolver is developed for modal analysis of mechanical structures. It complements a set of the multiphysics tools in ACE3P and, in particular, can be used for the comprehensive study of microphonics in accelerating cavities ensuring the operational reliability of a particle accelerator.
NASA Astrophysics Data System (ADS)
Abraham, Mark James; Murtola, Teemu; Schulz, Roland; Páll, Szilárd; Smith, Jeremy C.; Hess, Berk; Lindahl, Erik
2015-09-01
GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level; SIMD registers inside cores, multithreading, heterogeneous CPU-GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.
Mesoscale Simulations of Particulate Flows with Parallel Distributed Lagrange Multiplier Technique
Kanarska, Y
2010-03-24
Fluid particulate flows are common phenomena in nature and industry. Modeling of such flows at micro and macro levels as well establishing relationships between these approaches are needed to understand properties of the particulate matter. We propose a computational technique based on the direct numerical simulation of the particulate flows. The numerical method is based on the distributed Lagrange multiplier technique following the ideas of Glowinski et al. (1999). Each particle is explicitly resolved on an Eulerian grid as a separate domain, using solid volume fractions. The fluid equations are solved through the entire computational domain, however, Lagrange multiplier constrains are applied inside the particle domain such that the fluid within any volume associated with a solid particle moves as an incompressible rigid body. Mutual forces for the fluid-particle interactions are internal to the system. Particles interact with the fluid via fluid dynamic equations, resulting in implicit fluid-rigid-body coupling relations that produce realistic fluid flow around the particles (i.e., no-slip boundary conditions). The particle-particle interactions are implemented using explicit force-displacement interactions for frictional inelastic particles similar to the DEM method of Cundall et al. (1979) with some modifications using a volume of an overlapping region as an input to the contact forces. The method is flexible enough to handle arbitrary particle shapes and size distributions. A parallel implementation of the method is based on the SAMRAI (Structured Adaptive Mesh Refinement Application Infrastructure) library, which allows handling of large amounts of rigid particles and enables local grid refinement. Accuracy and convergence of the presented method has been tested against known solutions for a falling sphere as well as by examining fluid flows through stationary particle beds (periodic and cubic packing). To evaluate code performance and validate particle
NASA Technical Reports Server (NTRS)
Lyons, Daniel T.; Desai, Prasun N.
2005-01-01
This paper will describe the Entry, Descent and Landing simulation tradeoffs and techniques that were used to provide the Monte Carlo data required to approve entry during a critical period just before entry of the Genesis Sample Return Capsule. The same techniques will be used again when Stardust returns on January 15, 2006. Only one hour was available for the simulation which propagated 2000 dispersed entry states to the ground. Creative simulation tradeoffs combined with parallel processing were needed to provide the landing footprint statistics that were an essential part of the Go/NoGo decision that authorized release of the Sample Return Capsule a few hours before entry.
A method for data handling numerical results in parallel OpenFOAM simulations
Anton, Alin; Muntean, Sebastian
2015-12-31
Parallel computational fluid dynamics simulations produce vast amount of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM toolkit{sup ®}[1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large scale simulation results than the regular algorithms.
Parallel simulations of vortex-induced vibrations in turbulent flow: Linear and nonlinear models
NASA Astrophysics Data System (ADS)
Evangelinos, Constantinos
1999-11-01
In this work unstructured spectral/hp element based direct numerical simulation (DNS) techniques are used to simulate vortex-induced vibrations (VIV) of flexible cylinders. Linear structural models are employed for tension- dominated structures (cables) and bending stiffness- dominated structures (beams). Flow-structure interactions are studied in transitional (200-300) and turbulent (1000) Reynolds numbers. Structural responses as well as hydrodynamic forces are analyzed and their relationship with the near wake flow structures is examined. The following conclusions were reached: (1)A Reynolds number effect exists for the observed oscillation amplitude. (2)The phase relationship between cross- flow displacement and coefficient of lift is correlated with both the magnitudes of lift forces and displacement. (3)Cables enhance transition to turbulent flow, while beams (and rigidly vibrating cylinders) delay it. In the transition regime beams oscillate with 70% of the amplitude of cables. (4)Oblique and parallel shedding appear to coexist in the turbulent wake of cables and beams with a traveling wave structural response. The corresponding wake structure behind a cylinder with pinned ends vibrating as a standing wave, displays lambda-type vortices similar to those seen at lower (laminar) Reynolds numbers. (5)Cables and beams at a Reynolds number of 1000 give: (a)extremely similar velocity spectra, (b)differing autocorrelation profiles and large flow structures, and (c)differing structural responses. (6)The empirical formula for the coefficient of drag due to Skop et al. (1977) is shown to be in disagreement with the experimental data; a modified formula fits the results much better. A non-linear set of equations for the finite amplitude vibrations of a string are also derived and investigated. It is combined with an Arbitrary Lagrangian-Eulerian (ALE) flow solver and applied to model simulations of low Reynolds number (100) flow past flexible cylinders with pinned ends
Design of a real-time wind turbine simulator using a custom parallel architecture
NASA Technical Reports Server (NTRS)
Hoffman, John A.; Gluck, R.; Sridhar, S.
1995-01-01
The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named: the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is very much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an IO operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/ output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors which usually have a throughput limit because of rigid bus architecture.
Direct Numerical Simulation of the Leidenfrost Effect
NASA Astrophysics Data System (ADS)
Tanguy, Sebastien; Rueda Villegas, Lucia; Fluid Mechanics Institute of Toulouse Team
2015-11-01
The development of numerical methods for the direct numerical simulation of two-phase flows with phase changes, is the main topic of this study. We propose a novel numerical method which allows dealing with both evaporation and boiling at the interface between a liquid and a gas. For instance it can occur for a Leidenfrost droplet; a water drop levitating above a hot plate which temperature is much higher than the boiling temperature. In this case, boiling occurs in the film of saturated vapor which is entrapped between the bottom of the drop and the plate, whereas the top of the water droplet evaporates in contact of ambient air. Thus, boiling and evaporation can occur simultaneously on different regions of the same liquid interface or occur successively at different times of the history of an evaporating droplet. Usual numerical methods are not able to perform computations in these transient regimes, therefore, we propose in this paper a novel numerical method to achieve this challenging task. Finally, we present several accurate validations against experimental results on Leidenfrost Droplets to strengthen the relevance of this new method.
Direct Simulation of Ultrafast Detonations in Mixtures
NASA Astrophysics Data System (ADS)
O'Connor, Patrick D.; Long, Lyle N.; Anderson, James B.
2005-05-01
For nearly a century experimental measurements of the velocities of detonations in gases have been found in general agreement with those of the Chapman-Jouguet (C-J) hypothesis predicting velocities, relative to the burned gases, equal to the speed of sound in the burned gases. This was further supported by the Zeldovich — von Neumann — Döring (ZND) theories predicting Chapman-Jouguet velocities for detonations in which the shock and reaction zones are separated. However, for a very fast reaction, the shock and reaction regions overlap and the assumptions required for the C-J and ZND theories are no longer valid. Previous work with the direct simulation method established conditions for forcing the reaction and shock regions to coalesce in a detonation wave by means of a very fast exothermic reaction. The resulting detonation velocities were characterized as ultrafast, as they were found to exceed the steady-state velocities predicted by the C-J and ZND theories. Continued investigation into the ultrafast regime has allowed for the further development of this inconsistency with theory by including a heavy non-reacting gas in the mixture. The resulting gaseous mixtures closely followed the C-J predicted behavior for slow reactions, and for very fast reactions were found to produce ultrafast detonations with a substantially greater deviation from C-J behavior.
Direct Numerical Simulation of Cell Printing
NASA Astrophysics Data System (ADS)
Qiao, Rui; He, Ping
2010-11-01
Structural cell printing, i.e., printing three dimensional (3D) structures of cells held in a tissue matrix, is gaining significant attention in the biomedical community. The key idea is to use desktop printer or similar devices to print cells into 3D patterns with a resolution comparable to the size of mammalian cells, similar to that in living organs. Achieving such a resolution in vitro can lead to breakthroughs in areas such as organ transplantation and understanding of cell-cell interactions in truly 3D spaces. Although the feasibility of cell printing has been demonstrated in the recent years, the printing resolution and cell viability remain to be improved. In this work, we investigate one of the unit operations in cell printing, namely, the impact of a cell-laden droplet into a pool of highly viscous liquids using direct numerical simulations. The dynamics of droplet impact (e.g., crater formation and droplet spreading and penetration) and the evolution of cell shape and internal stress are quantified in details.
Using parallel computing for the display and simulation of the space debris environment
NASA Astrophysics Data System (ADS)
Moeckel, Marek; Wiedemann, Carsten; Flegel, Sven Kevin; Gelhaus, Johannes; Klinkrad, Heiner; Krag, Holger; Voersmann, Peter
Parallelism is becoming the leading paradigm in today's computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating and displaying three-dimensional computer generated imagery in real time requires complex numerical operations to be performed at high speed on a large number of objects. Since most of these objects can be processed independently, parallel computing is applicable in this field. Modern graphics processing units (GPUs) have become capable of performing millions of matrix and vector operations per second on multiple objects simultaneously. As a side project, a software tool is currently being developed at the Institute of Aerospace Systems that provides an animated, three-dimensional visualization of both actual and simulated space debris objects. Due to the nature of these objects it is possible to process them individually and independently from each other. Therefore, an analytical orbit propagation algorithm has been implemented to run on a GPU. By taking advantage of all its processing power a huge performance increase, compared to its CPU-based counterpart, could be achieved. For several years efforts have been made to harness this computing power for applications other than computer graphics. Software tools for the simulation of space debris are among those that could profit from embracing parallelism. With recently emerged software development tools such as OpenCL it is possible to transfer the new algorithms used in the visualization outside the field of computer graphics and implement them, for example, into the space debris simulation environment. This way they can make use of parallel hardware such as GPUs and Multi-Core-CPUs for faster computation. In this paper the visualization software
Using parallel computing for the display and simulation of the space debris environment
NASA Astrophysics Data System (ADS)
Möckel, M.; Wiedemann, C.; Flegel, S.; Gelhaus, J.; Vörsmann, P.; Klinkrad, H.; Krag, H.
2011-07-01
Parallelism is becoming the leading paradigm in today's computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating and displaying three-dimensional computer generated imagery in real time requires complex numerical operations to be performed at high speed on a large number of objects. Since most of these objects can be processed independently, parallel computing is applicable in this field. Modern graphics processing units (GPUs) have become capable of performing millions of matrix and vector operations per second on multiple objects simultaneously. As a side project, a software tool is currently being developed at the Institute of Aerospace Systems that provides an animated, three-dimensional visualization of both actual and simulated space debris objects. Due to the nature of these objects it is possible to process them individually and independently from each other. Therefore, an analytical orbit propagation algorithm has been implemented to run on a GPU. By taking advantage of all its processing power a huge performance increase, compared to its CPU-based counterpart, could be achieved. For several years efforts have been made to harness this computing power for applications other than computer graphics. Software tools for the simulation of space debris are among those that could profit from embracing parallelism. With recently emerged software development tools such as OpenCL it is possible to transfer the new algorithms used in the visualization outside the field of computer graphics and implement them, for example, into the space debris simulation environment. This way they can make use of parallel hardware such as GPUs and Multi-Core-CPUs for faster computation. In this paper the visualization software
Three-dimensional direct particle simulation on the Connection Machine
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1991-01-01
This paper presents the algorithms necessary for an efficient data parallel implementation of a 3D particle simulation. In particular, a general master/slave algorithm and a fast sorting algorithm are described and the use of these algorithms in a particle simulation is outlined. A particle simulation using these algorithms has been implemented on a 32768 processor Connection Machine that is capable of simulating over 30 million particles at an average rate of 2.4-microsec/particle/step. Results are presented from the simulation of flow over an Aeroassisted Flight Experiment geometry at 100 km altitude.
NASA Astrophysics Data System (ADS)
Kojima, A.; Ikegami, N.; Yoshida, T.; Miyaguchi, H.; Muroyama, M.; Yoshida, S.; Totsu, K.; Koshida, N.; Esashi, M.
2016-03-01
Developments of a Micro Electro-Mechanical System (MEMS) electrostatic Condenser Lens Array (CLA) for a Massively Parallel Electron Beam Direct Write (MPEBDW) lithography system are described. The CLA converges parallel electron beams for fine patterning. The structure of the CLA was designed on a basis of analysis by a finite element method (FEM) simulation. The lens was fabricated with precise machining and assembled with a nanocrystalline silicon (nc-Si) electron emitter array as an electron source of MPEBDW. The nc-Si electron emitter has the advantage that a vertical-emitted surface electron beam can be obtained without any extractor electrodes. FEM simulation of electron optics characteristics showed that the size of the electron beam emitted from the electron emitter was reduced to 15% by a radial direction, and the divergence angle is reduced to 1/18.
NASA Astrophysics Data System (ADS)
Marx, Alain; Lütjens, Hinrich
2017-03-01
A hybrid MPI/OpenMP parallel version of the XTOR-2F code [Lütjens and Luciani, J. Comput. Phys. 229 (2010) 8130] solving the two-fluid MHD equations in full tokamak geometry by means of an iterative Newton-Krylov matrix-free method has been developed. The present work shows that the code has been parallelized significantly despite the numerical profile of the problem solved by XTOR-2F, i.e. a discretization with pseudo-spectral representations in all angular directions, the stiffness of the two-fluid stability problem in tokamaks, and the use of a direct LU decomposition to invert the physical pre-conditioner at every Krylov iteration of the solver. The execution time of the parallelized version is an order of magnitude smaller than the sequential one for low resolution cases, with an increasing speedup when the discretization mesh is refined. Moreover, it allows to perform simulations with higher resolutions, previously forbidden because of memory limitations.
Fabrication of Si-nozzles for parallel mechano-electrospinning direct writing
NASA Astrophysics Data System (ADS)
Pan, Yanqiao; Huang, YongAn; Bu, Ningbin; Yin, Zhouping
2013-06-01
Nozzles with micro-scale orifices drive high-resolution printing techniques for generating micro- to nano-scale droplets/lines. This paper presents the fabrication and application of Si-nozzles in mechano-electrospinning (MES). The fabrication process mainly consists of photolithography, Au deposition, inductively coupled plasma, and polydimethylsiloxane encapsulation. The 6 wt% polyethylene oxide solution is adopted to study the electrospinning behaviour and the relations between fibre diameter and process parameters in MES. A fibre grid with 250 µm spacing is able to be direct written, and the diameters are less than 3 µm. To improve the printing efficiency, positioning accuracy and flexibility, a rotatable multi-nozzle is adopted. The distance between parallel lines reduces sharply from 4.927 to 0.308 mm with the rotating angle increasing from 0° to 87°, and the fibre grids with tunable distance are achieved. This method paves the way for fabrication of addressable Si-nozzle array in parallel MES direct writing.
Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN.
Hammond, G E; Lichtner, P C; Mills, R T
2014-01-01
[1] To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted.
Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN
NASA Astrophysics Data System (ADS)
Hammond, G. E.; Lichtner, P. C.; Mills, R. T.
2014-01-01
To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted.
Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.
2001-08-31
This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented into the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-one-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce better resolution results and reveal some flow patterns that cannot be obtained using coarse-grid modeling models.
Parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada.
Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G S
2003-01-01
This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented into the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-1-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the 1-million-cell models produce better resolution results and reveal some flow patterns that cannot be obtained using coarse-grid modeling models.
Lee, Anthony; Yau, Christopher; Giles, Michael B; Doucet, Arnaud; Holmes, Christopher C
2010-12-01
We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we nd speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design.
Lee, Anthony; Yau, Christopher; Giles, Michael B.; Doucet, Arnaud; Holmes, Christopher C.
2011-01-01
We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we nd speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276
Long-time atomistic simulations with the Parallel Replica Dynamics method
NASA Astrophysics Data System (ADS)
Perez, Danny
Molecular Dynamics (MD) -- the numerical integration of atomistic equations of motion -- is a workhorse of computational materials science. Indeed, MD can in principle be used to obtain any thermodynamic or kinetic quantity, without introducing any approximation or assumptions beyond the adequacy of the interaction potential. It is therefore an extremely powerful and flexible tool to study materials with atomistic spatio-temporal resolution. These enviable qualities however come at a steep computational price, hence limiting the system sizes and simulation times that can be achieved in practice. While the size limitation can be efficiently addressed with massively parallel implementations of MD based on spatial decomposition strategies, allowing for the simulation of trillions of atoms, the same approach usually cannot extend the timescales much beyond microseconds. In this article, we discuss an alternative parallel-in-time approach, the Parallel Replica Dynamics (ParRep) method, that aims at addressing the timescale limitation of MD for systems that evolve through rare state-to-state transitions. We review the formal underpinnings of the method and demonstrate that it can provide arbitrarily accurate results for any definition of the states. When an adequate definition of the states is available, ParRep can simulate trajectories with a parallel speedup approaching the number of replicas used. We demonstrate the usefulness of ParRep by presenting different examples of materials simulations where access to long timescales was essential to access the physical regime of interest and discuss practical considerations that must be addressed to carry out these simulations. Work supported by the United States Department of Energy (U.S. DOE), Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division.
Visual Data-Analytics of Large-Scale Parallel Discrete-Event Simulations
Ross, Caitlin; Carothers, Christopher D.; Mubarak, Misbah; Carns, Philip; Ross, Robert; Li, Jianping Kelvin; Ma, Kwan-Liu
2016-11-13
Parallel discrete-event simulation (PDES) is an important tool in the codesign of extreme-scale systems because PDES provides a cost-effective way to evaluate designs of highperformance computing systems. Optimistic synchronization algorithms for PDES, such as Time Warp, allow events to be processed without global synchronization among the processing elements. A rollback mechanism is provided when events are processed out of timestamp order. Although optimistic synchronization protocols enable the scalability of large-scale PDES, the performance of the simulations must be tuned to reduce the number of rollbacks and provide an improved simulation runtime. To enable efficient large-scale optimistic simulations, one has to gain insight into the factors that affect the rollback behavior and simulation performance. We developed a tool for ROSS model developers that gives them detailed metrics on the performance of their large-scale optimistic simulations at varying levels of simulation granularity. Model developers can use this information for parameter tuning of optimistic simulations in order to achieve better runtime and fewer rollbacks. In this work, we instrument the ROSS optimistic PDES framework to gather detailed statistics about the simulation engine. We have also developed an interactive visualization interface that uses the data collected by the ROSS instrumentation to understand the underlying behavior of the simulation engine. The interface connects real time to virtual time in the simulation and provides the ability to view simulation data at different granularities. We demonstrate the usefulness of our framework by performing a visual analysis of the dragonfly network topology model provided by the CODES simulation framework built on top of ROSS. The instrumentation needs to minimize overhead in order to accurately collect data about the simulation performance. To ensure that the instrumentation does not introduce unnecessary overhead, we perform a
Application of Parallel Discrete Event Simulation to the Space Surveillance Network
NASA Astrophysics Data System (ADS)
Jefferson, D.; Leek, J.
2010-09-01
In this paper we describe how and why we chose parallel discrete event simulation (PDES) as the paradigm for modeling the Space Surveillance Network (SSN) in our modeling framework, TESSA (Testbed Environment for Space Situational Awareness). DES is a simulation paradigm appropriate for systems dominated by discontinuous state changes at times that must be calculated dynamically. It is used primarily for complex man-made systems like telecommunications, vehicular traffic, computer networks, economic models etc., although it is also useful for natural systems that are not described by equations, such as particle systems, population dynamics, epidemics, and combat models. It is much less well known than simple time-stepped simulation methods, but has the great advantage of being time scale independent, so that one can freely mix processes that operate at time scales over many orders of magnitude with no runtime performance penalty. In simulating the SSN we model in some detail: (a) the orbital dynamics of up to 105 objects, (b) their reflective properties, (c) the ground- and space-based sensor systems in the SSN, (d) the recognition of orbiting objects and determination of their orbits, (e) the cueing and scheduling of sensor observations, (f) the 3-d structure of satellites, and (g) the generation of collision debris. TESSA is thus a mixed continuous-discrete model. But because many different types of discrete objects are involved with such a wide variation in time scale (milliseconds for collisions, hours for orbital periods) it is suitably described using discrete events. The PDES paradigm is surprising and unusual. In any instantaneous runtime snapshot some parts my be far ahead in simulation time while others lag behind, yet the required causal relationships are always maintained and synchronized correctly, exactly as if the simulation were executed sequentially. The TESSA simulator is custom-built, conservatively synchronized, and designed to scale to
A Component-Based Extension Framework for Large-Scale Parallel Simulations in NEURON
King, James G.; Hines, Michael; Hill, Sean; Goodman, Philip H.; Markram, Henry; Schürmann, Felix
2008-01-01
As neuronal simulations approach larger scales with increasing levels of detail, the neurosimulator software represents only a part of a chain of tools ranging from setup, simulation, interaction with virtual environments to analysis and visualizations. Previously published approaches to abstracting simulator engines have not received wide-spread acceptance, which in part may be to the fact that they tried to address the challenge of solving the model specification problem. Here, we present an approach that uses a neurosimulator, in this case NEURON, to describe and instantiate the network model in the simulator's native model language but then replaces the main integration loop with its own. Existing parallel network models are easily adopted to run in the presented framework. The presented approach is thus an extension to NEURON but uses a component-based architecture to allow for replaceable spike exchange components and pluggable components for monitoring, analysis, or control that can run in this framework alongside with the simulation. PMID:19430597
Parallel-vector algorithms for particle simulations on shared-memory multiprocessors
Nishiura, Daisuke; Sakaguchi, Hide
2011-03-01
Over the last few decades, the computational demands of massive particle-based simulations for both scientific and industrial purposes have been continuously increasing. Hence, considerable efforts are being made to develop parallel computing techniques on various platforms. In such simulations, particles freely move within a given space, and so on a distributed-memory system, load balancing, i.e., assigning an equal number of particles to each processor, is not guaranteed. However, shared-memory systems achieve better load balancing for particle models, but suffer from the intrinsic drawback of memory access competition, particularly during (1) paring of contact candidates from among neighboring particles and (2) force summation for each particle. Here, novel algorithms are proposed to overcome these two problems. For the first problem, the key is a pre-conditioning process during which particle labels are sorted by a cell label in the domain to which the particles belong. Then, a list of contact candidates is constructed by pairing the sorted particle labels. For the latter problem, a table comprising the list indexes of the contact candidate pairs is created and used to sum the contact forces acting on each particle for all contacts according to Newton's third law. With just these methods, memory access competition is avoided without additional redundant procedures. The parallel efficiency and compatibility of these two algorithms were evaluated in discrete element method (DEM) simulations on four types of shared-memory parallel computers: a multicore multiprocessor computer, scalar supercomputer, vector supercomputer, and graphics processing unit. The computational efficiency of a DEM code was found to be drastically improved with our algorithms on all but the scalar supercomputer. Thus, the developed parallel algorithms are useful on shared-memory parallel computers with sufficient memory bandwidth.
Relevance of the parallel nonlinearity in gyrokinetic simulations of tokamak plasmas
Candy, J.; Waltz, R. E.; Parker, S. E.; Chen, Y.
2006-07-15
The influence of the parallel nonlinearity on transport in gyrokinetic simulations is assessed for values of {rho}{sub *} which are typical of current experiments. Here, {rho}{sub *}={rho}{sub s}/a is the ratio of gyroradius, {rho}{sub s}, to plasma minor radius, a. The conclusion, derived from simulations with both GYRO [J. Candy and R. E. Waltz, J. Comput. Phys., 186, 585 (2003)] and GEM [Y. Chen and S. E. Parker J. Comput. Phys., 189, 463 (2003)] is that no measurable effect of the parallel nonlinearity is apparent for {rho}{sub *}<0.012. This result is consistent with scaling arguments, which suggest that the parallel nonlinearity should be O({rho}{sub *}) smaller than the ExB nonlinearity. Indeed, for the plasma parameters under consideration, the magnitude of the parallel nonlinearity is a factor of 8{rho}{sub *} smaller (for 0.000 75<{rho}{sub *}<0.012) than the other retained terms in the nonlinear gyrokinetic equation.
Modeling of Weakly Collisional Parallel Electron Transport for Edge Plasma Simulations
NASA Astrophysics Data System (ADS)
Umansky, M. V.; Dimits, A. M.; Joseph, I.; Omotani, J. T.; Rognlien, T. D.
2014-10-01
The parallel electron heat transport in a weakly collisional regime can be represented in the framework of the Landau-fluid (LF) model. Practical implementation of LF-based transport models has become possible due to the recent invention of an efficient non- spectral method for the non-local closure operators. Here the implementation of a LF based model for the parallel plasma transport is described, and the model is tested for different collisionality regimes against a Fokker-Plank code. The new method appears to represent weakly collisional parallel electron transport more accurately than the conventional flux-limiter based models; on the other hand it is computationally efficient enough to be used in tokamak edge plasma simulations. Implementation of an LF-based model for the parallel plasma transport in the UEDGE code is described, and applications to realistic divertor simulations are discussed. Work performed for U.S. DoE by LLNL under Contract DE-AC52-07NA27344.
Treveaven, P.
1989-01-01
This book presents an introduction to object-oriented, functional, and logic parallel computing on which the fifth generation of computer systems will be based. Coverage includes concepts for parallel computing languages, a parallel object-oriented system (DOOM) and its language (POOL), an object-oriented multilevel VLSI simulator using POOL, and implementation of lazy functional languages on parallel architectures.
Advanced in turbulence physics and modeling by direct numerical simulations
NASA Technical Reports Server (NTRS)
Reynolds, W. C.
1987-01-01
The advent of direct numerical simulations of turbulence has opened avenues for research on turbulence physics and turbulence modeling. Direct numerical simulation provides values for anything that the scientist or modeler would like to know about the flow. An overview of some recent advances in the physical understanding of turbulence and in turbulence modeling obtained through such simulations is presented.
Parallel solutions for voxel-based simulations of reaction-diffusion systems.
D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan
2014-01-01
There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This necessity has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena taking into account both spatial effects and noise. However, the high computational effort characterizing simulation approaches, coupled with the necessity to simulate the models several times to achieve statistically relevant information on the model behaviours, makes such kind of algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we present how the characteristics of the algorithm can be exploited for an effective parallelization on the present heterogeneous HPC architectures.
Parallel Solutions for Voxel-Based Simulations of Reaction-Diffusion Systems
D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan
2014-01-01
There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This necessity has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena taking into account both spatial effects and noise. However, the high computational effort characterizing simulation approaches, coupled with the necessity to simulate the models several times to achieve statistically relevant information on the model behaviours, makes such kind of algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we present how the characteristics of the algorithm can be exploited for an effective parallelization on the present heterogeneous HPC architectures. PMID:25045716
Wendel, D. E.; Olson, D. K.; Hesse, M.; Kuznetsova, M.; Adrian, M. L.; Aunai, N.; Karimabadi, H.; Daughton, W.
2013-12-15
We investigate the distribution of parallel electric fields and their relationship to the location and rate of magnetic reconnection in a large particle-in-cell simulation of 3D turbulent magnetic reconnection with open boundary conditions. The simulation's guide field geometry inhibits the formation of simple topological features such as null points. Therefore, we derive the location of potential changes in magnetic connectivity by finding the field lines that experience a large relative change between their endpoints, i.e., the quasi-separatrix layer. We find a good correspondence between the locus of changes in magnetic connectivity or the quasi-separatrix layer and the map of large gradients in the integrated parallel electric field (or quasi-potential). Furthermore, we investigate the distribution of the parallel electric field along the reconnecting field lines. We find the reconnection rate is controlled by only the low-amplitude, zeroth and first–order trends in the parallel electric field while the contribution from fluctuations of the parallel electric field, such as electron holes, is negligible. The results impact the determination of reconnection sites and reconnection rates in models and in situ spacecraft observations of 3D turbulent reconnection. It is difficult through direct observation to isolate the loci of the reconnection parallel electric field amidst the large amplitude fluctuations. However, we demonstrate that a positive slope of the running sum of the parallel electric field along the field line as a function of field line length indicates where reconnection is occurring along the field line.
Hepburn, I; Chen, W; De Schutter, E
2016-08-07
Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which has led to the development of parallel methods that can take advantage of the power of modern supercomputers in recent years. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification.
Adaptive finite element simulation of flow and transport applications on parallel computers
NASA Astrophysics Data System (ADS)
Kirk, Benjamin Shelton
The subject of this work is the adaptive finite element simulation of problems arising in flow and transport applications on parallel computers. Of particular interest are new contributions to adaptive mesh refinement (AMR) in this parallel high-performance context, including novel work on data structures, treatment of constraints in a parallel setting, generality and extensibility via object-oriented programming, and the design/implementation of a flexible software framework. This technology and software capability then enables more robust, reliable treatment of multiscale--multiphysics problems and specific studies of fine scale interaction such as those in biological chemotaxis (Chapter 4) and high-speed shock physics for compressible flows (Chapter 5). The work begins by presenting an overview of key concepts and data structures employed in AMR simulations. Of particular interest is how these concepts are applied in the physics-independent software framework which is developed here and is the basis for all the numerical simulations performed in this work. This open-source software framework has been adopted by a number of researchers in the U.S. and abroad for use in a wide range of applications. The dynamic nature of adaptive simulations pose particular issues for efficient implementation on distributed-memory parallel architectures. Communication cost, computational load balance, and memory requirements must all be considered when developing adaptive software for this class of machines. Specific extensions to the adaptive data structures to enable implementation on parallel computers is therefore considered in detail. The libMesh framework for performing adaptive finite element simulations on parallel computers is developed to provide a concrete implementation of the above ideas. This physics-independent framework is applied to two distinct flow and transport applications classes in the subsequent application studies to illustrate the flexibility of the
Object-Oriented Parallel Particle-in-Cell Code for Beam Dynamics Simulation in Linear Accelerators
Qiang, J.; Ryne, R.D.; Habib, S.; Decky, V.
1999-11-13
In this paper, we present an object-oriented three-dimensional parallel particle-in-cell code for beam dynamics simulation in linear accelerators. A two-dimensional parallel domain decomposition approach is employed within a message passing programming paradigm along with a dynamic load balancing. Implementing object-oriented software design provides the code with better maintainability, reusability, and extensibility compared with conventional structure based code. This also helps to encapsulate the details of communications syntax. Performance tests on SGI/Cray T3E-900 and SGI Origin 2000 machines show good scalability of the object-oriented code. Some important features of this code also include employing symplectic integration with linear maps of external focusing elements and using z as the independent variable, typical in accelerators. A successful application was done to simulate beam transport through three superconducting sections in the APT linac design.
NASA Astrophysics Data System (ADS)
Hepburn, I.; Chen, W.; De Schutter, E.
2016-08-01
Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which has led to the development of parallel methods that can take advantage of the power of modern supercomputers in recent years. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification.
Construction of a parallel processor for simulating manipulators and other mechanical systems
NASA Technical Reports Server (NTRS)
Hannauer, George
1991-01-01
This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.
Parallel Processing Techniques for the Simulation of MOS VLSI Circuits Using Waveform Relaxation
1988-07-01
the number of pro- cessors used. The Gauss- Jacobi method with time point pipelining is introduced as a highly parallel algorithm which can outperform...the other algorithms when the number of processors is large. A theorem is presented comparing the Gauss-Seidel and Gauss- Jacobi methods . as applied to...time I depends on the characteristics of the circuit being simulated and on the number of pro- 3 cessors used. The Gauss- Jacobi method with time
PPM A highly efficient parallel particle mesh library for the simulation of continuum systems
NASA Astrophysics Data System (ADS)
Sbalzarini, I. F.; Walther, J. H.; Bergdorf, M.; Hieber, S. E.; Kotsalis, E. M.; Koumoutsakos, P.
2006-07-01
This paper presents a highly efficient parallel particle-mesh (PPM) library, based on a unifying particle formulation for the simulation of continuous systems. In this formulation, the grid-free character of particle methods is relaxed by the introduction of a mesh for the reinitialization of the particles, the computation of the field equations, and the discretization of differential operators. The present utilization of the mesh does not detract from the adaptivity, the efficient handling of complex geometries, the minimal dissipation, and the good stability properties of particle methods. The coexistence of meshes and particles, allows for the development of a consistent and adaptive numerical method, but it presents a set of challenging parallelization issues that have hindered in the past the broader use of particle methods. The present library solves the key parallelization issues involving particle-mesh interpolations and the balancing of processor particle loading, using a novel adaptive tree for mixed domain decompositions along with a coloring scheme for the particle-mesh interpolation. The high parallel efficiency of the library is demonstrated in a series of benchmark tests on distributed memory and on a shared-memory vector architecture. The modularity of the method is shown by a range of simulations, from compressible vortex rings using a novel formulation of smooth particle hydrodynamics, to simulations of diffusion in real biological cell organelles. The present library enables large scale simulations of diverse physical problems using adaptive particle methods and provides a computational tool that is a viable alternative to mesh-based methods.
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor)
1998-01-01
The present invention is embodied in a method of performing object-oriented simulation and a system having inter-connected processor nodes operating in parallel to simulate mutual interactions of a set of discrete simulation objects distributed among the nodes as a sequence of discrete events changing state variables of respective simulation objects so as to generate new event-defining messages addressed to respective ones of the nodes. The object-oriented simulation is performed at each one of the nodes by assigning passive self-contained simulation objects to each one of the nodes, responding to messages received at one node by generating corresponding active event objects having user-defined inherent capabilities and individual time stamps and corresponding to respective events affecting one of the passive self-contained simulation objects of the one node, restricting the respective passive self-contained simulation objects to only providing and receiving information from die respective active event objects, requesting information and changing variables within a passive self-contained simulation object by the active event object, and producing corresponding messages specifying events resulting therefrom by the active event objects.
NASA Astrophysics Data System (ADS)
Lee, Nicholas Jabari Ouma
Parallel molecular dynamics (MD) simulations are performed to investigate pressure-induced solid-to-solid structural phase transformations in cadmium selenide (CdSe) nanorods. The effects of the size and shape of nanorods on different aspects of structural phase transformations are studied. Simulations are based on interatomic potentials validated extensively by experiments. Simulations range from 105 to 106 atoms. These simulations are enabled by highly scalable algorithms executed on massively parallel Beowulf computing architectures. Pressure-induced structural transformations are studied using a hydrostatic pressure medium simulated by atoms interacting via Lennard-Jones potential. Four single-crystal CdSe nanorods, each 44A in diameter but varying in length, in the range between 44A and 600A, are studied independently in two sets of simulations. The first simulation is the downstroke simulation, where each rod is embedded in the pressure medium and subjected to increasing pressure during which it undergoes a forward transformation from a 4-fold coordinated wurtzite (WZ) crystal structure to a 6-fold coordinated rocksalt (RS) crystal structure. In the second so-called upstroke simulation, the pressure on the rods is decreased and a reverse transformation from 6-fold RS to a 4-fold coordinated phase is observed. The transformation pressure in the forward transformation depends on the nanorod size, with longer rods transforming at lower pressures close to the bulk transformation pressure. Spatially-resolved structural analyses, including pair-distributions, atomic-coordinations and bond-angle distributions, indicate nucleation begins at the surface of nanorods and spreads inward. The transformation results in a single RS domain, in agreement with experiments. The microscopic mechanism for transformation is observed to be the same as for bulk CdSe. A nanorod size dependency is also found in reverse structural transformations, with longer nanorods transforming more
Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment*†
Khan, Md. Ashfaquzzaman; Herbordt, Martin C.
2011-01-01
Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations. PMID:21822327
Spontaneous Hot Flow Anomalies at Quasi-Parallel Shocks: 2. Hybrid Simulations
NASA Technical Reports Server (NTRS)
Omidi, N.; Zhang, H.; Sibeck, D.; Turner, D.
2013-01-01
Motivated by recent THEMIS observations, this paper uses 2.5-D electromagnetic hybrid simulations to investigate the formation of Spontaneous Hot Flow Anomalies (SHFA) upstream of quasi-parallel bow shocks during steady solar wind conditions and in the absence of discontinuities. The results show the formation of a large number of structures along and upstream of the quasi-parallel bow shock. Their outer edges exhibit density and magnetic field enhancements, while their cores exhibit drops in density, magnetic field, solar wind velocity and enhancements in ion temperature. Using virtual spacecraft in the simulation, we show that the signatures of these structures in the time series data are very similar to those of SHFAs seen in THEMIS data and conclude that they correspond to SHFAs. Examination of the simulation data shows that SHFAs form as the result of foreshock cavitons interacting with the bow shock. Foreshock cavitons in turn form due to the nonlinear evolution of ULF waves generated by the interaction of the solar wind with the backstreaming ions. Because foreshock cavitons are an inherent part of the shock dissipation process, the formation of SHFAs is also an inherent part of the dissipation process leading to a highly non-uniform plasma in the quasi-parallel magnetosheath including large scale density and magnetic field cavities.
Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment.
Khan, Md Ashfaquzzaman; Herbordt, Martin C
2011-07-20
Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations.
Parallel discrete molecular dynamics simulation with speculation and in-order commitment
NASA Astrophysics Data System (ADS)
Khan, Md. Ashfaquzzaman; Herbordt, Martin C.
2011-07-01
Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations.
Simulation and Gaming: Directions, Issues, Ponderables.
ERIC Educational Resources Information Center
Uretsky, Michael
1995-01-01
Discusses the current use of simulation and gaming in a variety of settings. Describes advances in technology that facilitate the use of simulation and gaming, including computer power, computer networks, software, object-oriented programming, video, multimedia, virtual reality, and artificial intelligence. Considers the future use of simulation…
Simulation and Gaming: Directions, Issues, Ponderables.
ERIC Educational Resources Information Center
Uretsky, Michael
1995-01-01
Discusses the current use of simulation and gaming in a variety of settings. Describes advances in technology that facilitate the use of simulation and gaming, including computer power, computer networks, software, object-oriented programming, video, multimedia, virtual reality, and artificial intelligence. Considers the future use of simulation…
Future directions for simulation of recreation use
David N. Cole
2005-01-01
As the case studies in Chapter 4 illustrate, simulation modeling can be a valuable tool for recreation planning and management. Although simulation modeling is already well developed for business applications, its adaptation to recreation management is less developed. Relatively few resources have been devoted to realizing its potential. Further progress is needed in...
NASA Astrophysics Data System (ADS)
Reuter, K.; Jenko, F.; Forest, C. B.; Bayliss, R. A.
2008-08-01
A parallel implementation of a nonlinear pseudo-spectral MHD code for the simulation of turbulent dynamos in spherical geometry is reported. It employs a dual domain decomposition technique in both real and spectral space. It is shown that this method shows nearly ideal scaling going up to 128 CPUs on Beowulf-type clusters with fast interconnect. Furthermore, the potential of exploiting single precision arithmetic on standard x86 processors is examined. It is pointed out that the MHD code thereby achieves a maximum speedup of 1.7, whereas the validity of the computations is still granted. The combination of both measures will allow for the direct numerical simulation of highly turbulent cases ( 1500
Holkundkar, Amol R.
2013-11-15
The objective of this article is to report the parallel implementation of the 3D molecular dynamic simulation code for laser-cluster interactions. The benchmarking of the code has been done by comparing the simulation results with some of the experiments reported in the literature. Scaling laws for the computational time is established by varying the number of processor cores and number of macroparticles used. The capabilities of the code are highlighted by implementing various diagnostic tools. To study the dynamics of the laser-cluster interactions, the executable version of the code is available from the author.
NASA Astrophysics Data System (ADS)
Kim, Eui Joong
Large scale ground motion simulation requires supercomputing systems in order to obtain reliable and useful results within reasonable elapsed time. In this study, we develop a framework for terascale ground motion simulations in highly heterogeneous basins. As part of the development, we present a parallel octree-based multiresolution finite element methodology for the elastodynamic wave propagation problem. The octree-based multiresolution finite element method reduces memory use significantly and improves overall computational performance. The framework is comprised of three parts; (1) an octree-based mesh generator, Euclid developed by TV and O'Hallaron, (2) a parallel mesh partitioner, ParMETIS developed by Karypis et al.[2], and (3) a parallel octree-based multiresolution finite element solver, QUAKE developed in this study. Realistic earthquakes parameters, soil material properties, and sedimentary basins dimensions will produce extremely large meshes. The out-of-core versional octree-based mesh generator, Euclid overcomes the resulting severe memory limitations. By using a parallel, distributed-memory graph partitioning algorithm, ParMETIS partitions large meshes, overcoming the memory and cost problem. Despite capability of the Octree-Based Multiresolution Mesh Method ( OBM3), large problem sizes necessitate parallelism to handle large memory and work requirements. The parallel OBM 3 elastic wave propagation code, QUAKE has been developed to address these issues. The numerical methodology and the framework have been used to simulate the seismic response of both idealized systems and of the Greater Los Angeles basin to simple pulses and to a mainshock of the 1994 Northridge Earthquake, for frequencies of up to 1 Hz and domain size of 80 km x 80 km x 30 km. In the idealized models, QUAKE shows good agreement with the analytical Green's function solutions. In the realistic models for the Northridge earthquake mainshock, QUAKE qualitatively agrees, with at most
NASA Astrophysics Data System (ADS)
Honkonen, I.
2015-03-01
I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring modification of existing code. This is an advantage for the development and testing of, e.g., geoscientific software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. An implementation of the generic simulation cell method presented here, generic simulation cell class (gensimcell), also includes support for parallel programming by allowing model developers to select which simulation variables of, e.g., a domain-decomposed model to transfer between processes via a Message Passing Interface (MPI) library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class requires a C++ compiler that supports a version of the language standardized in 2011 (C++11). The code is available at https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those who do are kindly requested to acknowledge and cite this work.
He, H.-Q.; Wan, W. E-mail: wanw@mail.iggcas.ac.cn
2012-03-01
The parallel mean free path of solar energetic particles (SEPs), which is determined by physical properties of SEPs as well as those of solar wind, is a very important parameter in space physics to study the transport of charged energetic particles in the heliosphere, especially for space weather forecasting. In space weather practice, it is necessary to find a quick approach to obtain the parallel mean free path of SEPs for a solar event. In addition, the adiabatic focusing effect caused by a spatially varying mean magnetic field in the solar system is important to the transport processes of SEPs. Recently, Shalchi presented an analytical description of the parallel diffusion coefficient with adiabatic focusing. Based on Shalchi's results, in this paper we provide a direct analytical formula as a function of parameters concerning the physical properties of SEPs and solar wind to directly and quickly determine the parallel mean free path of SEPs with adiabatic focusing. Since all of the quantities in the analytical formula can be directly observed by spacecraft, this direct method would be a very useful tool in space weather research. As applications of the direct method, we investigate the inherent relations between the parallel mean free path and various parameters concerning physical properties of SEPs and solar wind. Comparisons of parallel mean free paths with and without adiabatic focusing are also presented.
NASA Astrophysics Data System (ADS)
Vashishta, P.; Bachlechner, M. E.; Campbell, T.; Kalia, R. K.; Kikuchi, H.; Kodiyalam, S.; Nakano, A.; Ogata, S.; Shimojo, F.; Walsh, P.
Multiresolution molecular-dynamics approach for multimillion atom simulations has been used to investigate structural properties, mechanical failure in ceramic materials, and atomic-level stresses in nanoscale semiconductor/ceramic mesas (Si/Si3N4). Crack propagation and fracture in silicon nitride, silicon carbide, gallium arsenide, and nanophase ceramics are investigated. We observe a crossover from slow to rapid fracture and a correlation between the speed of crack propagation and morphology of fracture surface. A 100 million atom simulation is carried out to study crack propagation in GaAs. Mechanical failure in the Si/Si3N4 interface is studied by applying tensile strain parallel to the interface. Ten million atom molecular dynamics simulations are performed to determine atomic-level stress distributions in a 54 nm nanopixel on a 0.1 μm silicon substrate. Multimillion atom simulations of oxidation of aluminum nanoclusters and nanoindentation in silicon nitride are also discussed.
NASA Astrophysics Data System (ADS)
Shim, Yunsic; Amar, Jacques G.; Uberuaga, B. P.; Voter, A. F.
2007-11-01
We present a method for performing parallel temperature-accelerated dynamics (TAD) simulations over extended length scales. In our method, a two-dimensional spatial decomposition is used along with the recently proposed semirigorous synchronous sublattice algorithm of Shim and Amar [Phys. Rev. B 71, 125432 (2005)]. The scaling behavior of the simulation time as a function of system size is studied and compared with serial TAD in simulations of the early stages of Cu/Cu(100) growth as well as for a simple case of surface relaxation. In contrast to the corresponding serial TAD simulations, for which the simulation time tser increases as a power of the system size N (tser˜Nx) with an exponent x that can be as large as three, in our parallel simulations the simulation time increases only logarithmically with system size. As a result, even for relatively small system sizes our parallel TAD simulations are significantly faster than the corresponding serial TAD simulations. The significantly improved scaling behavior of our parallel TAD simulations over the corresponding serial simulations indicates that our parallel TAD method may be useful in performing simulations over significantly larger length scales than serial TAD, while preserving all the atomistic details provided by the TAD method.
Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P
Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; Syratchev, I.; /CERN
2009-06-19
In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).
A parallel program for numerical simulation of discrete fracture network and groundwater flow
NASA Astrophysics Data System (ADS)
Huang, Ting-Wei; Liou, Tai-Sheng; Kalatehjari, Roohollah
2017-04-01
The ability of modeling fluid flow in Discrete Fracture Network (DFN) is critical to various applications such as exploration of reserves in geothermal and petroleum reservoirs, geological sequestration of carbon dioxide and final disposal of spent nuclear fuels. Although several commerical or acdametic DFN flow simulators are already available (e.g., FracMan and DFNWORKS), challenges in terms of computational efficiency and three-dimensional visualization still remain, which therefore motivates this study for developing a new DFN and flow simulator. A new DFN and flow simulator, DFNbox, was written in C++ under a cross-platform software development framework provided by Qt. DFNBox integrates the following capabilities into a user-friendly drop-down menu interface: DFN simulation and clipping, 3D mesh generation, fracture data analysis, connectivity analysis, flow path analysis and steady-state grounwater flow simulation. All three-dimensional visualization graphics were developed using the free OpenGL API. Similar to other DFN simulators, fractures are conceptualized as random point process in space, with stochastic characteristics represented by orientation, size, transmissivity and aperture. Fracture meshing was implemented by Delaunay triangulation for visualization but not flow simulation purposes. Boundary element method was used for flow simulations such that only unknown head or flux along exterior and interection bounaries are needed for solving the flow field in the DFN. Parallel compuation concept was taken into account in developing DFNbox for calculations that such concept is possible. For example, the time-consuming seqential code for fracture clipping calculations has been completely replaced by a highly efficient parallel one. This can greatly enhance compuational efficiency especially on multi-thread platforms. Furthermore, DFNbox have been successfully tested in Windows and Linux systems with equally-well performance.
Large eddy simulations and direct numerical simulations of high speed turbulent reacting flows
NASA Technical Reports Server (NTRS)
Givi, P.; Frankel, S. H.; Adumitroaie, V.; Sabini, G.; Madnia, C. K.
1993-01-01
The primary objective of this research is to extend current capabilities of Large Eddy Simulations (LES) and Direct Numerical Simulations (DNS) for the computational analyses of high speed reacting flows. Our efforts in the first two years of this research have been concentrated on a priori investigations of single-point Probability Density Function (PDF) methods for providing subgrid closures in reacting turbulent flows. In the efforts initiated in the third year, our primary focus has been on performing actual LES by means of PDF methods. The approach is based on assumed PDF methods and we have performed extensive analysis of turbulent reacting flows by means of LES. This includes simulations of both three-dimensional (3D) isotropic compressible flows and two-dimensional reacting planar mixing layers. In addition to these LES analyses, some work is in progress to assess the extent of validity of our assumed PDF methods. This assessment is done by making detailed companions with recent laboratory data in predicting the rate of reactant conversion in parallel reacting shear flows. This report provides a summary of our achievements for the first six months of the third year of this program.
Zhu, Hao; Sun, Yan; Rajagopal, Gunaretnam; Mondry, Adrian; Dhar, Pawan
2004-01-01
Background Many arrhythmias are triggered by abnormal electrical activity at the ionic channel and cell level, and then evolve spatio-temporally within the heart. To understand arrhythmias better and to diagnose them more precisely by their ECG waveforms, a whole-heart model is required to explore the association between the massively parallel activities at the channel/cell level and the integrative electrophysiological phenomena at organ level. Methods We have developed a method to build large-scale electrophysiological models by using extended cellular automata, and to run such models on a cluster of shared memory machines. We describe here the method, including the extension of a language-based cellular automaton to implement quantitative computing, the building of a whole-heart model with Visible Human Project data, the parallelization of the model on a cluster of shared memory computers with OpenMP and MPI hybrid programming, and a simulation algorithm that links cellular activity with the ECG. Results We demonstrate that electrical activities at channel, cell, and organ levels can be traced and captured conveniently in our extended cellular automaton system. Examples of some ECG waveforms simulated with a 2-D slice are given to support the ECG simulation algorithm. A performance evaluation of the 3-D model on a four-node cluster is also given. Conclusions Quantitative multicellular modeling with extended cellular automata is a highly efficient and widely applicable method to weave experimental data at different levels into computational models. This process can be used to investigate complex and collective biological activities that can be described neither by their governing differentiation equations nor by discrete parallel computation. Transparent cluster computing is a convenient and effective method to make time-consuming simulation feasible. Arrhythmias, as a typical case, can be effectively simulated with the methods described. PMID:15339335
NASA Astrophysics Data System (ADS)
Ovaysi, S.; Piri, M.
2009-12-01
We present a three-dimensional fully dynamic parallel particle-based model for direct pore-level simulation of incompressible viscous fluid flow in disordered porous media. The model was developed from scratch and is capable of simulating flow directly in three-dimensional high-resolution microtomography images of naturally occurring or man-made porous systems. It reads the images as input where the position of the solid walls are given. The entire medium, i.e., solid and fluid, is then discretized using particles. The model is based on Moving Particle Semi-implicit (MPS) technique. We modify this technique in order to improve its stability. The model handles highly irregular fluid-solid boundaries effectively. It takes into account viscous pressure drop in addition to the gravity forces. It conserves mass and can automatically detect any false connectivity with fluid particles in the neighboring pores and throats. It includes a sophisticated algorithm to automatically split and merge particles to maintain hydraulic connectivity of extremely narrow conduits. Furthermore, it uses novel methods to handle particle inconsistencies and open boundaries. To handle the computational load, we present a fully parallel version of the model that runs on distributed memory computer clusters and exhibits excellent scalability. The model is used to simulate unsteady-state flow problems under different conditions starting from straight noncircular capillary tubes with different cross-sectional shapes, i.e., circular/elliptical, square/rectangular and triangular cross-sections. We compare the predicted dimensionless hydraulic conductances with the data available in the literature and observe an excellent agreement. We then test the scalability of our parallel model with two samples of an artificial sandstone, samples A and B, with different volumes and different distributions (non-uniform and uniform) of solid particles among the processors. An excellent linear scalability is
Multitasking simulation: Present application and future directions.
Adams, Traci Nicole; Rho, Jason C
2017-02-01
The Accreditation Council for Graduate Medical Education lists multi-tasking as a core competency in several medical specialties due to increasing demands on providers to manage the care of multiple patients simultaneously. Trainees often learn multitasking on the job without any formal curriculum, leading to high error rates. Multitasking simulation training has demonstrated success in reducing error rates among trainees. Studies of multitasking simulation demonstrate that this type of simulation is feasible, does not hinder the acquisition of procedural skill, and leads to better performance during subsequent periods of multitasking. Although some healthcare agencies have discouraged multitasking due to higher error rates among multitasking providers, it cannot be eliminated entirely in settings such as the emergency department in which providers care for more than one patient simultaneously. Simulation can help trainees to identify situations in which multitasking is inappropriate, while preparing them for situations in which multitasking is inevitable.
Wang, Junye; Zhang, Xiaoxian; Bengough, Anthony G; Crawford, John W
2005-07-01
The lattice Boltzmann method has proven to be a promising method to simulate flow in porous media. Its practical application often relies on parallel computation because of the demand for a large domain and fine grid resolution to adequately resolve pore heterogeneity. The existing domain-decomposition methods for parallel computation usually decompose a domain into a number of subdomains first and then recover the interfaces and perform the load balance. Normally, the interface recovery and the load balance have to be performed iteratively until an acceptable load balance is achieved; this costs time. In this paper we propose a cell-based domain-decomposition method for parallel lattice Boltzmann simulation of flow in porous media. Unlike the existing methods, the cell-based method performs the load balance first to divide the total number of fluid cells into a number of groups (or subdomains), in which the difference of fluid cells in each group is either 0 or 1, depending on if the total number of fluid cells is a multiple of the processor numbers; the interfaces between the subdomains are recovered at last. The cell-based method is to recover the interfaces rather than the load balance; it does not need iteration and gives an exact load balance. The performance of the proposed method is analyzed and compared using different computer systems; the results indicate that it reaches the theoretical parallel efficiency. The method is then applied to simulate flow in a three-dimensional porous medium obtained with microfocus x-ray computed tomography to calculate the permeability, and the result shows good agreement with the experimental data.
NASA Technical Reports Server (NTRS)
Zhang, Meng; Maxworthy, Tony
1999-01-01
It has long been recognized that flow in the melt can have a profound influence on the dynamics of a solidifying interface and hence the quality of the solid material. In particular, flow affects the heat and mass transfer, and causes spatial and temporal variations in the flow and melt composition. This results in a crystal with nonuniform physical properties. Flow can be generated by buoyancy, expansion or contraction upon phase change, and thermo-soluto capillary effects. In general, these flows can not be avoided and can have an adverse effect on the stability of the crystal structures. This motivates crystal growth experiments in a microgravity environment, where buoyancy-driven convection is significantly suppressed. However, transient accelerations (g-jitter) caused by the acceleration of the spacecraft can affect the melt, while convection generated from the effects other than buoyancy remain important. Rather than bemoan the presence of convection as a source of interfacial instability, Hurle in the 1960s suggested that flow in the melt, either forced or natural convection, might be used to stabilize the interface. Delves considered the imposition of both a parabolic velocity profile and a Blasius boundary layer flow over the interface. He concluded that fast stirring could stabilize the interface to perturbations whose wave vector is in the direction of the fluid velocity. Forth and Wheeler considered the effect of the asymptotic suction boundary layer profile. They showed that the effect of the shear flow was to generate travelling waves parallel to the flow with a speed proportional to the Reynolds number. There have been few quantitative, experimental works reporting on the coupling effect of fluid flow and morphological instabilities. Huang studied plane Couette flow over cells and dendrites. It was found that this flow could greatly enhance the planar stability and even induce the cell-planar transition. A rotating impeller was buried inside the
Massively parallel Monte Carlo for many-particle simulations on GPUs
Anderson, Joshua A.; Jankowski, Eric; Grubb, Thomas L.; Engel, Michael; Glotzer, Sharon C.
2013-12-01
Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.
NASA Astrophysics Data System (ADS)
Moon, Hongsik
What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited by the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared with the performance using benchmark software and the metric was FLoting-point Operations Per Seconds (FLOPS) which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore system? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPs and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the
Switching to High Gear: Opportunities for Grand-scale Real-time Parallel Simulations
Perumalla, Kalyan S
2009-01-01
The recent emergence of dramatically large computational power, spanning desktops with multi-core processors and multiple graphics cards to supercomputers with 10^5 processor cores, has suddenly resulted in simulation-based solutions trailing behind in the ability to fully tap the new computational capacity. Here, we motivate the need for switching the parallel simulation research to a higher gear to exploit the new, immense levels of computational power. The potential for grand-scale real-time solutions is illustrated using preliminary results from prototypes in four example application areas: (a) state- or regional-scale vehicular mobility modeling, (b) very large-scale epidemic modeling, (c) modeling the propagation of wireless network signals in very large, cluttered terrains, and, (d) country- or world-scale social behavioral modeling. We believe the stage is perfectly poised for the parallel/distributed simulation community to envision and formulate similar grand-scale, real-time simulation-based solutions in many application areas.
Use of Parallel Micro-Platform for the Simulation the Space Exploration
NASA Astrophysics Data System (ADS)
Velasco Herrera, Victor Manuel; Velasco Herrera, Graciela; Rosano, Felipe Lara; Rodriguez Lozano, Salvador; Lucero Roldan Serrato, Karen
The purpose of this work is to create a parallel micro-platform, that simulates the virtual movements of a space exploration in 3D. One of the innovations presented in this design consists of the application of a lever mechanism for the transmission of the movement. The development of such a robot is a challenging task very different of the industrial manipulators due to a totally different target system of requirements. This work presents the study and simulation, aided by computer, of the movement of this parallel manipulator. The development of this model has been developed using the platform of computer aided design Unigraphics, in which it was done the geometric modeled of each one of the components and end assembly (CAD), the generation of files for the computer aided manufacture (CAM) of each one of the pieces and the kinematics simulation of the system evaluating different driving schemes. We used the toolbox (MATLAB) of aerospace and create an adaptive control module to simulate the system.
NASA Astrophysics Data System (ADS)
Kwon, Deuk-Chul; Shin, Sung-Sik; Yu, Dong-Hun
2017-10-01
In order to reduce the computing time in simulation of radio frequency (rf) plasma sources, various numerical schemes were developed. It is well known that the upwind, exponential, and power-law schemes can efficiently overcome the limitation on the grid size for fluid transport simulations of high density plasma discharges. Also, the semi-implicit method is a well-known numerical scheme to overcome on the simulation time step. However, despite remarkable advances in numerical techniques and computing power over the last few decades, efficient multi-dimensional modeling of low temperature plasma discharges has remained a considerable challenge. In particular, there was a difficulty on parallelization in time for the time periodic steady state problems such as capacitively coupled plasma discharges and rf sheath dynamics because values of plasma parameters in previous time step are used to calculate new values each time step. Therefore, we present a parallelization method for the time periodic steady state problems by using period-slices. In order to evaluate the efficiency of the developed method, one-dimensional fluid simulations are conducted for describing rf sheath dynamics. The result shows that speedup can be achieved by using a multithreading method.
A novel parallel-rotation algorithm for atomistic Monte Carlo simulation of dense polymer systems
NASA Astrophysics Data System (ADS)
Santos, S.; Suter, U. W.; Müller, M.; Nievergelt, J.
2001-06-01
We develop and test a new elementary Monte Carlo move for use in the off-lattice simulation of polymer systems. This novel Parallel-Rotation algorithm (ParRot) permits moving very efficiently torsion angles that are deeply inside long chains in melts. The parallel-rotation move is extremely simple and is also demonstrated to be computationally efficient and appropriate for Monte Carlo simulation. The ParRot move does not affect the orientation of those parts of the chain outside the moving unit. The move consists of a concerted rotation around four adjacent skeletal bonds. No assumption is made concerning the backbone geometry other than that bond lengths and bond angles are held constant during the elementary move. Properly weighted sampling techniques are needed for ensuring detailed balance because the new move involves a correlated change in four degrees of freedom along the chain backbone. The ParRot move is supplemented with the classical Metropolis Monte Carlo, the Continuum-Configurational-Bias, and Reptation techniques in an isothermal-isobaric Monte Carlo simulation of melts of short and long chains. Comparisons are made with the capabilities of other Monte Carlo techniques to move the torsion angles in the middle of the chains. We demonstrate that ParRot constitutes a highly promising Monte Carlo move for the treatment of long polymer chains in the off-lattice simulation of realistic models of dense polymer systems.
Data Parallel Execution Challenges and Runtime Performance of Agent Simulations on GPUs
Perumalla, Kalyan S; Aaby, Brandon G
2008-01-01
Programmable graphics processing units (GPUs) have emerged as excellent computational platforms for certain general-purpose applications. The data parallel execution capabilities of GPUs specifically point to the potential for effective use in simulations of agent-based models (ABM). In this paper, the computational efficiency of ABM simulation on GPUs is evaluated on representative ABM benchmarks. The runtime speed of GPU-based models is compared to that of traditional CPU-based implementation, and also to that of equivalent models in traditional ABM toolkits (Repast and NetLogo). As expected, it is observed that, GPU-based ABM execution affords excellent speedup on simple models, with better speedup on models exhibiting good locality and fair amount of computation per memory element. Execution is two to three orders of magnitude faster with a GPU than with leading ABM toolkits, but at the cost of decrease in modularity, ease of programmability and reusability. At a more fundamental level, however, the data parallel paradigm is found to be somewhat at odds with traditional model-specification approaches for ABM. Effective use of data parallel execution, in general, seems to require resolution of modeling and execution challenges. Some of the challenges are identified and related solution approaches are described.
NASA Astrophysics Data System (ADS)
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
NASA Astrophysics Data System (ADS)
Shen, Yanfeng; Cesnik, Carlos E. S.
2016-04-01
This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables the highly parallelized supercomputing on powerful graphic cards. Both the explicit contact formulation and the parallel feature facilitates LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulations based on the penalty method is introduced and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.
Note: Application of a novel 2(3HUS+S) parallel manipulator for simulation of hip joint motion
NASA Astrophysics Data System (ADS)
Shan, X. L.; Cheng, G.; Liu, X. Z.
2016-07-01
In the paper, a novel 2(3HUS+S) parallel manipulator, which has two moving platforms, is proposed. The parallel manipulator is adopted to simulate hip joint motion and can conduct an experiment for two hip joints simultaneously. Motion experiments are conducted in the paper, and the recommended hip joint motion curves from ISO14242 and actual hip joint motions during jogging and walking are selected as the simulated motions. The experimental results indicate that the 2(3HUS+S) parallel manipulator can realize the simulation of many kinds of hip joint motions without changing the structure size.
NASA Astrophysics Data System (ADS)
Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro
2016-08-01
We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 107) to 300 ms (N = 109). These are currently limited by the time for the calculation of the domain decomposition and communication
De Novo Ultrascale Atomistic Simulations On High-End Parallel Supercomputers
Nakano, A; Kalia, R K; Nomura, K; Sharma, A; Vashishta, P; Shimojo, F; van Duin, A; Goddard, III, W A; Biswas, R; Srivastava, D; Yang, L H
2006-09-04
We present a de novo hierarchical simulation framework for first-principles based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters. In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) simulations explore a wide solution space to discover microscopic mechanisms that govern macroscopic material properties, into which highly accurate quantum mechanical (QM) simulations are embedded to validate the discovered mechanisms and quantify the uncertainty of the solution. The framework includes an embedded divide-and-conquer (EDC) algorithmic framework for the design of linear-scaling simulation algorithms with minimal bandwidth complexity and tight error control. The EDC framework also enables adaptive hierarchical simulation with automated model transitioning assisted by graph-based event tracking. A tunable hierarchical cellular decomposition parallelization framework then maps the O(N) EDC algorithms onto Petaflops computers, while achieving performance tunability through a hierarchy of parameterized cell data/computation structures, as well as its implementation using hybrid Grid remote procedure call + message passing + threads programming. High-end computing platforms such as IBM BlueGene/L, SGI Altix 3000 and the NSF TeraGrid provide an excellent test grounds for the framework. On these platforms, we have achieved unprecedented scales of quantum-mechanically accurate and well validated, chemically reactive atomistic simulations--1.06 billion-atom fast reactive force-field MD and 11.8 million-atom (1.04 trillion grid points) quantum-mechanical MD in the framework of the EDC density functional theory on adaptive multigrids--in addition to 134 billion-atom non-reactive space-time multiresolution MD, with the parallel efficiency as high as 0.998 on 65,536 dual-processor BlueGene/L nodes. We have also achieved an automated execution of hierarchical QM
Steepening of parallel propagating hydromagnetic waves into magnetic pulsations - A simulation study
NASA Technical Reports Server (NTRS)
Akimoto, K.; Winske, D.; Onsager, T. G.; Thomsen, M. F.; Gary, S. P.
1991-01-01
The steepening mechanism of parallel propagating low-frequency MHD-like waves observed upstream of the earth's quasi-parallel bow shock has been investigated by means of electromagnetic hybrid simulations. It is shown that an ion beam through the resonant electromagnetic ion/ion instability excites large-amplitude waves, which consequently pitch angle scatter, decelerate, and eventually magnetically trap beam ions in regions where the wave amplitudes are largest. As a result, the beam ions become bunched in both space and gyrophase. As these higher-density, nongyrotropic beam segments are formed, the hydromagnetic waves rapidly steepen, resulting in magnetic pulsations, with properties generally in agreement with observations. This steepening process operates on the scale of the linear growth time of the resonant ion/ion instability. Many of the pulsations generated by this mechanism are left-hand polarized in the spacecraft frame.
A fast parallel Poisson solver on irregular domains applied to beam dynamics simulations
Adelmann, A. Arbenz, P. Ineichen, Y.
2010-06-20
We discuss the scalable parallel solution of the Poisson equation within a Particle-In-Cell (PIC) code for the simulation of electron beams in particle accelerators of irregular shape. The problem is discretized by Finite Differences. Depending on the treatment of the Dirichlet boundary the resulting system of equations is symmetric or 'mildly' nonsymmetric positive definite. In all cases, the system is solved by the preconditioned conjugate gradient algorithm with smoothed aggregation (SA) based algebraic multigrid (AMG) preconditioning. We investigate variants of the implementation of SA-AMG that lead to considerable improvements in the execution times. We demonstrate good scalability of the solver on distributed memory parallel processor with up to 2048 processors. We also compare our iterative solver with an FFT-based solver that is more commonly used for applications in beam dynamics.
Parallel 3D Finite Element Particle-in-Cell Simulations with Pic3P
Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; Ben-Zvi, I.; Kewisch, J.; /Brookhaven
2009-06-19
SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic Particle-In-Cell code Pic3P. Designed for simulations of beam-cavity interactions dominated by space charge effects, Pic3P solves the complete set of Maxwell-Lorentz equations self-consistently and includes space-charge, retardation and boundary effects from first principles. Higher-order Finite Element methods with adaptive refinement on conformal unstructured meshes lead to highly efficient use of computational resources. Massively parallel processing with dynamic load balancing enables large-scale modeling of photoinjectors with unprecedented accuracy, aiding the design and operation of next-generation accelerator facilities. Applications include the LCLS RF gun and the BNL polarized SRF gun.
NASA Astrophysics Data System (ADS)
He, W.; Beyer, C.; Fleckenstein, J. H.; Jang, E.; Kolditz, O.; Naumov, D.; Kalbacher, T.
2015-03-01
This technical paper presents an efficient and performance-oriented method to model reactive mass transport processes in environmental and geotechnical subsurface systems. The open source scientific software packages OpenGeoSys and IPhreeqc have been coupled, to combine their individual strengths and features to simulate thermo-hydro-mechanical-chemical coupled processes in porous and fractured media with simultaneous consideration of aqueous geochemical reactions. Furthermore, a flexible parallelization scheme using MPI (Message Passing Interface) grouping techniques has been implemented, which allows an optimized allocation of computer resources for the node-wise calculation of chemical reactions on the one hand, and the underlying processes such as for groundwater flow or solute transport on the other hand. The coupling interface and parallelization scheme have been tested and verified in terms of precision and performance.
Billion-atom synchronous parallel kinetic Monte Carlo simulations of critical 3D Ising systems
Martinez, E.; Monasterio, P.R.; Marian, J.
2011-02-20
An extension of the synchronous parallel kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the parallel efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.
Petascale direct numerical simulation of blood flow on 200K cores and heterogeneous architectures
Sampath, Rahul S; Veerapaneni, Shravan; Biros, George; Zorin, Denis; Vuduc, Richard; Vetter, Jeffrey S; Moon, Logan; Malhotra, Dhairya; Shringarpure, Aashay; Rahimian, Abtin; Lashuk, Ilya; Chandramowlishwaran, Aparna
2010-01-01
We present a fast, petaflop-scalable algorithm for Stokesian particulate flows. Our goal is the direct simulation of blood, which we model as a mixture of a Stokesian fluid (plasma) and red blood cells (RBCs). Directly simulating blood is a challenging multiscale, multiphysics problem. We report simulations with up to 260 million deformable RBCs. The largest simulation amounts to 90 billion unknowns in space. In terms of the number of cells, we improve the state-of-the art by several orders of magnitude: the previous largest simulation, at the same physical fidelity as ours, resolved the flow of O(1,000-10,000) RBCs. Our approach has three distinct characteristics: (1) we faithfully represent the physics of RBCs by using nonlinear solid mechanics to capture the deformations of each cell; (2) we accurately resolve the long-range, N-body, hydrodynamic interactions between RBCs (which are caused by the surrounding plasma); and (3) we allow for highly non-uniform spatial distributions of RBCs. The new method has been implemented in the software library MOBO (for 'Moving Boundaries'). We designed MOBO to support parallelism at all levels, including inter-node distributed memory parallelism, intra-node shared memory parallelism, data parallelism (vectorization), and fine-grained multithreading for GPUs. We have implemented and optimized the majority of the computation kernels on both Intel/AMD x86 and NVidia's Tesla/Fermi platforms for single and double floating point precision. Overall, the code has scaled on 256 CPU-GPUs on the Teragrid's Lincoln cluster and on 200,000 AMD cores of the Oak Ridge National Laboratory's Jaguar PF system. In our largest simulation, we have achieved 0.7 Petaflops/s of sustained performance on Jaguar.
Embedded Microclusters in Zeolites and Cluster Beam Sputtering -- Simulation on Parallel Computers
Greenwell, Donald L.; Kalia, Rajiv K.; Vashishta, Priya
1996-12-01
This report summarizes the research carried out under DOE supported program (DOE/ER/45477) Computer Science--during the course of this project. Large-scale molecular-dynamics (MD) simulations were performed to investigate: (1) sintering of microporous and nanophase Si{sub 3}N{sub 4}; (2) crack-front propagation in amorphous silica; (3) phonons in highly efficient multiscale algorithms and dynamic load-balancing schemes for mapping process, structural correlations, and mechanical behavior including dynamic fracture in graphitic tubules; and (4) amorphization and fracture in nanowires. The simulations were carried out with irregular atomistic simulations on distributed-memory parallel architectures. These research activities resulted in fifty-three publications and fifty-five invited presentations.
Gabrieli, Andrea; Demontis, Pierfranco; Pazzona, Federico G; Suffritti, Giuseppe B
2011-05-01
Understanding the behaviors of molecules in tight confinement is a challenging task. Standard simulation tools like kinetic Monte Carlo have proven to be very effective in the study of adsorption and diffusion phenomena in microporous materials, but they turn out to be very inefficient when simulation time and length scales are extended. In this paper we have explored the possibility of application of a discrete version of the synchronous parallel kinetic Monte Carlo algorithm introduced by Martínez et al. [J. Comput. Phys. 227, 3804 (2008)] to the study of aromatic hydrocarbons diffusion in zeolites. The efficiency of this algorithm is investigated as a function of the number of processors and domain size. We show that with an accurate choice of domains size it is possible to achieve very good efficiencies thus permitting us to effectively extend space and time scales of the simulated system. © 2011 American Physical Society
Xyce parallel electronic simulator design : mathematical formulation, version 2.0.
Hoekstra, Robert John; Waters, Lon J.; Hutchinson, Scott Alan; Keiter, Eric Richard; Russo, Thomas V.
2004-06-01
This document is intended to contain a detailed description of the mathematical formulation of Xyce, a massively parallel SPICE-style circuit simulator developed at Sandia National Laboratories. The target audience of this document are people in the role of 'service provider'. An example of such a person would be a linear solver expert who is spending a small fraction of his time developing solver algorithms for Xyce. Such a person probably is not an expert in circuit simulation, and would benefit from an description of the equations solved by Xyce. In this document, modified nodal analysis (MNA) is described in detail, with a number of examples. Issues that are unique to circuit simulation, such as voltage limiting, are also described in detail.
NASA Astrophysics Data System (ADS)
Yeh, Mei-Ling
We have performed a parallel decomposition of the fictitious Lagrangian method for molecular dynamics with tight-binding total energy expression into the hypercube computer. This is the first time in literature that the dynamical simulation of semiconducting systems containing more than 512 silicon atoms has become possible with the electrons treated as quantum particles. With the utilization of the Intel Paragon system, our timing analysis predicts that our code is expected to perform realistic simulations on very large systems consisting of thousands of atoms with time requirements of the order of tens of hours. Timing results and performance analysis of our parallel code are presented in terms of calculation time, communication time, and setup time. The accuracy of the fictitious Lagrangian method in molecular dynamics simulation is also investigated, especially the energy conservation of the total energy of ions. We find that the accuracy of the fictitious Lagrangian scheme in small silicon cluster and very large silicon system simulations is good for as long as the simulations proceed, even though we quench the electronic coordinates to the Born-Oppenheimer surface only in the beginning of the run. The kinetic energy of electrons does not increase as time goes on, and the energy conservation of the ionic subsystem remains very good. This means that, as far as the ionic subsystem is concerned, the electrons are on the average in the true quantum ground states. We also tie up some odds and ends regarding a few remaining questions about the fictitious Lagrangian method, such as the difference between the results obtained from the Gram-Schmidt and SHAKE method of orthonormalization, and differences between simulations where the electrons are quenched to the Born -Oppenheimer surface only once compared with periodic quenching.
FLY. A parallel tree N-body code for cosmological simulations
NASA Astrophysics Data System (ADS)
Antonuccio-Delogu, V.; Becciani, U.; Ferro, D.
2003-10-01
FLY is a parallel treecode which makes heavy use of the one-sided communication paradigm to handle the management of the tree structure. In its public version the code implements the equations for cosmological evolution, and can be run for different cosmological models. This reference guide describes the actual implementation of the algorithms of the public version of FLY, and suggests how to modify them to implement other types of equations (for instance, the Newtonian ones). Program summary Title of program: FLY Catalogue identifier: ADSC Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADSC Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Computer for which the program is designed and others on which it has been tested: Cray T3E, Sgi Origin 3000, IBM SP Operating systems or monitors under which the program has been tested: Unicos 2.0.5.40, Irix 6.5.14, Aix 4.3.3 Programming language used: Fortran 90, C Memory required to execute with typical data: about 100 Mwords with 2 million-particles Number of bits in a word: 32 Number of processors used: parallel program. The user can select the number of processors >=1 Has the code been vectorized or parallelized?: parallelized Number of bytes in distributed program, including test data, etc.: 4615604 Distribution format: tar gzip file Keywords: Parallel tree N-body code for cosmological simulations Nature of physical problem: FLY is a parallel collisionless N-body code for the calculation of the gravitational force. Method of solution: It is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986). Restrictions on the complexity of the program: The program uses the leapfrog integrator schema, but could be changed by the user. Typical running time: 50 seconds for each time-step, running a 2-million-particles simulation on an Sgi Origin 3800 system with 8 processors having 512 Mbytes RAM for each processor. Unusual features of the program: FLY
Future Directions in Simulating Solar Geoengineering
Kravitz, Benjamin S.; Robock, Alan; Boucher, Olivier
2014-08-05
Solar geoengineering is a proposed set of technologies to temporarily alleviate some of the consequences of anthropogenic greenhouse gas emissions. The Geoengineering Model Intercomparison Project (GeoMIP) created a framework of geoengineering simulations in climate models that have been performed by modeling centers throughout the world (B. Kravitz et al., The Geoengineering Model Intercomparison Project (GeoMIP), Atmospheric Science Letters, 12(2), 162-167, doi:10.1002/asl.316, 2011). These experiments use state-of-the-art climate models to simulate solar geoengineering via uniform solar reduction, creation of stratospheric sulfate aerosol layers, or injecting sea spray into the marine boundary layer. GeoMIP has been quite successful in its mission of revealing robust features and key uncertainties of the modeled effects of solar geoengineering.
Direct simulation Monte Carlo study of quantum effects on the spherical expansion of 4He
NASA Astrophysics Data System (ADS)
Koura, Katsuhisa
1999-10-01
Quantum effects on the translational nonequilibrium at low temperatures in a spherical expansion of 4He from room temperature are studied using the direct simulation Monte Carlo method to make a comparison with the experimental measurements along the axis of a helium free jet expansion. The quantum-mechanical scattering cross sections are obtained by a quantum phase-shift calculation for the Lennard-Jones and more elaborate Hartree-Fock dispersion potentials. It is shown that the parallel and perpendicular kinetic temperatures are higher and lower, respectively, for the quantum-mechanical scattering than for the classical-mechanical scattering. A comparison with the parallel temperature determined by fitting the ellipsoidal velocity distribution function to the measured spectral profiles indicates that the parallel kinetic temperature for the quantum-mechanical scattering is higher than the measured temperature, with which the parallel kinetic temperature for the classical-mechanical scattering is fortuitously in better agreement. Because both the parallel and perpendicular velocity distribution functions appreciably deviate from Maxwellians and the Maxwellian (half-width) fit temperatures are lower than the kinetic temperatures, the discrepancy between the quantum-mechanical and measured parallel temperatures may partly be resolved by the difference between the kinetic and fitting temperatures.
NASA Astrophysics Data System (ADS)
Honkonen, I.
2014-07-01
I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring any modification of existing code. This is an advantage for the development and testing of computational modeling software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. Support for parallel programming is also provided by allowing users to select which simulation variables to transfer between processes via a Message Passing Interface library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class presented here requires a C++ compiler that supports variadic templates which were standardized in 2011 (C++11). The code is available at: https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those that do are kindly requested to cite this work.
BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations
Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul
2016-01-01
Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and decay in large 3D domains. It has been parallelized with OpenMP, allowing efficient simulations on desktop workstations or single supercomputer nodes. The code is stable even for large time steps, with linear computational cost scalings. Solutions are first-order accurate in time and second-order accurate in space. The code can be run by itself or as part of a larger simulator. Availability and implementation: BioFVM is written in C ++ with parallelization in OpenMP. It is maintained and available for download at http://BioFVM.MathCancer.org and http://BioFVM.sf.net under the Apache License (v2.0). Contact: paul.macklin@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26656933
A general parallelization strategy for random path based geostatistical simulation methods
NASA Astrophysics Data System (ADS)
Mariethoz, Grégoire
2010-07-01
The size of simulation grids used for numerical models has increased by many orders of magnitude in the past years, and this trend is likely to continue. Efficient pixel-based geostatistical simulation algorithms have been developed, but for very large grids and complex spatial models, the computational burden remains heavy. As cluster computers become widely available, using parallel strategies is a natural step for increasing the usable grid size and the complexity of the models. These strategies must profit from of the possibilities offered by machines with a large number of processors. On such machines, the bottleneck is often the communication time between processors. We present a strategy distributing grid nodes among all available processors while minimizing communication and latency times. It consists in centralizing the simulation on a master processor that calls other slave processors as if they were functions simulating one node every time. The key is to decouple the sending and the receiving operations to avoid synchronization. Centralization allows having a conflict management system ensuring that nodes being simulated simultaneously do not interfere in terms of neighborhood. The strategy is computationally efficient and is versatile enough to be applicable to all random path based simulation methods.
Simulation of Unsteady Combustion in a Ramjet Engine Using a Highly Parallel Computer
NASA Technical Reports Server (NTRS)
Menon, Suresh; Weeratunga, Sisira; Cooper, D. M. (Technical Monitor)
1994-01-01
Combustion instability in ramjets is a complex phenomenon that involve nonlinear interaction between acoustic waves, vortex motion and unsteady heat release in the combustor. To numerically simulate this 3-D, transient phenomenon, enormous computer resources (time, memory and disk storage) are required. Although current generation vector supercomputers are capable of providing adequate resources for simulations of this nature, their high cost and limited availability, makes such machines less than satisfactory for routine use. The primary focus of this study is to assess the feasibility of using highly parallel computer systems as a cost-effective alternative for conducting such unsteady flow simulations. Towards this end, a large-eddy simulation model for combustion instability was implemented on the Intel iPSC/860 and a careful study was conducted to determine the benefits and the problems associated with the use of such machines for transient simulations. Details of this study along with the results obtained from the unsteady combustion simulations carried out on the iPSC/860 are discussed in this paper.
Accelerating groundwater flow simulation in MODFLOW using JASMIN-based parallel computing.
Cheng, Tangpei; Mo, Zeyao; Shao, Jingli
2014-01-01
To accelerate the groundwater flow simulation process, this paper reports our work on developing an efficient parallel simulator through rebuilding the well-known software MODFLOW on JASMIN (J Adaptive Structured Meshes applications Infrastructure). The rebuilding process is achieved by designing patch-based data structure and parallel algorithms as well as adding slight modifications to the compute flow and subroutines in MODFLOW. Both the memory requirements and computing efforts are distributed among all processors; and to reduce communication cost, data transfers are batched and conveniently handled by adding ghost nodes to each patch. To further improve performance, constant-head/inactive cells are tagged and neglected during the linear solving process and an efficient load balancing strategy is presented. The accuracy and efficiency are demonstrated through modeling three scenarios: The first application is a field flow problem located at Yanming Lake in China to help design reasonable quantity of groundwater exploitation. Desirable numerical accuracy and significant performance enhancement are obtained. Typically, the tagged program with load balancing strategy running on 40 cores is six times faster than the fastest MICCG-based MODFLOW program. The second test is simulating flow in a highly heterogeneous aquifer. The AMG-based JASMIN program running on 40 cores is nine