Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
NASA Technical Reports Server (NTRS)
Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad
1994-01-01
The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent in three-dimensional boundary-layer flows are computed with the PS-DNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.
Parallel direct numerical simulation of three-dimensional spray formation
NASA Astrophysics Data System (ADS)
Chergui, Jalel; Juric, Damir; Shin, Seungwon; Kahouadji, Lyes; Matar, Omar
2015-11-01
We present numerical results for the breakup mechanism of a liquid jet surrounded by a fast coaxial flow of air with density ratio (water/air) ~ 1000 and kinematic viscosity ratio ~ 60. We use code BLUE, a three-dimensional, two-phase, high performance, parallel numerical code based on a hybrid Front-Tracking/Level Set algorithm for Lagrangian tracking of arbitrarily deformable phase interfaces and a precise treatment of surface tension forces. The parallelization of the code is based on the technique of domain decomposition where the velocity field is solved by a parallel GMRes method for the viscous terms and the pressure by a parallel multigrid/GMRes method. Communication is handled by MPI message passing procedures. The interface method is also parallelized and defines the interface both by a discontinuous density field as well as by a triangular Lagrangian mesh and allows the interface to undergo large deformations including the rupture and/or coalescence of interfaces. EPSRC Programme Grant, MEMPHIS, EP/K0039761/1.
Scalable simulations for directed self-assembly patterning with the use of GPU parallel computing
NASA Astrophysics Data System (ADS)
Yoshimoto, Kenji; Peters, Brandon L.; Khaira, Gurdaman S.; de Pablo, Juan J.
2012-03-01
Directed self-assembly (DSA) patterning has been increasingly investigated as an alternative lithographic process for future technology nodes. One of the critical specs for DSA patterning is defects generated through annealing process or by roughness of pre-patterned structure. Due to their high sensitivity to the process and wafer conditions, however, characterization of those defects still remain challenging. DSA simulations can be a powerful tool to predict the formation of the DSA defects. In this work, we propose a new method to perform parallel computing of DSA Monte Carlo (MC) simulations. A consumer graphics card was used to access its hundreds of processing units for parallel computing. By partitioning the simulation system into non-interacting domains, we were able to run MC trial moves in parallel on multiple graphics-processing units (GPUs). Our results show a significant improvement in computational performance.
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Owen, Jeffrey E.
1988-01-01
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digitial computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACLS constructs. The execution times for all ACLS constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
NASA Technical Reports Server (NTRS)
Campbell, David; Wysong, Ingrid; Kaplan, Carolyn; Mott, David; Wadsworth, Dean; VanGilder, Douglas
2000-01-01
An AFRL/NRL team has recently been selected to develop a scalable, parallel, reacting, multidimensional (SUPREM) Direct Simulation Monte Carlo (DSMC) code for the DoD user community under the High Performance Computing Modernization Office (HPCMO) Common High Performance Computing Software Support Initiative (CHSSI). This paper will introduce the JANNAF Exhaust Plume community to this three-year development effort and present the overall goals, schedule, and current status of this new code.
NASA Technical Reports Server (NTRS)
Morgan, Philip E.
2004-01-01
This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow Simulations Using Massively Parallel Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
Scalar and Parallel Optimized Implementation of the Direct Simulation Monte Carlo Method
NASA Astrophysics Data System (ADS)
Dietrich, Stefan; Boyd, Iain D.
1996-07-01
This paper describes a new concept for the implementation of the direct simulation Monte Carlo (DSMC) method. It uses a localized data structure based on a computational cell to achieve high performance, especially on workstation processors, which can also be used in parallel. Since the data structure makes it possible to freely assign any cell to any processor, a domain decomposition can be found with equal calculation load on each processor while maintaining minimal communication among the nodes. Further, the new implementation strictly separates physical modeling, geometrical issues, and organizational tasks to achieve high maintainability and to simplify future enhancements. Three example flow configurations are calculated with the new implementation to demonstrate its generality and performance. They include a flow through a diverging channel using an adapted unstructured triangulated grid, a flow around a planetary probe, and an internal flow in a contactor used in plasma physics. The results are validated either by comparison with results obtained from other simulations or by comparison with experimental data. High performance on an IBM SP2 system is achieved if problem size and number of parallel processors are adapted accordingly. On 400 nodes, DSMC calculations with more than 100 million particles are possible.
Direct numerical simulation of instabilities in parallel flow with spherical roughness elements
NASA Technical Reports Server (NTRS)
Deanna, R. G.
1992-01-01
Results from a direct numerical simulation of laminar flow over a flat surface with spherical roughness elements using a spectral-element method are given. The numerical simulation approximates roughness as a cellular pattern of identical spheres protruding from a smooth wall. Periodic boundary conditions on the domain's horizontal faces simulate an infinite array of roughness elements extending in the streamwise and spanwise directions, which implies the parallel-flow assumption, and results in a closed domain. A body force, designed to yield the horizontal Blasius velocity in the absence of roughness, sustains the flow. Instabilities above a critical Reynolds number reveal negligible oscillations in the recirculation regions behind each sphere and in the free stream, high-amplitude oscillations in the layer directly above the spheres, and a mean profile with an inflection point near the sphere's crest. The inflection point yields an unstable layer above the roughness (where U''(y) is less than 0) and a stable region within the roughness (where U''(y) is greater than 0). Evidently, the instability begins when the low-momentum or wake region behind an element, being the region most affected by disturbances (purely numerical in this case), goes unstable and moves. In compressible flow with periodic boundaries, this motion sends disturbances to all regions of the domain. In the unstable layer just above the inflection point, the disturbances grow while being carried downstream with a propagation speed equal to the local mean velocity; they do not grow amid the low energy region near the roughness patch. The most amplified disturbance eventually arrives at the next roughness element downstream, perturbing its wake and inducing a global response at a frequency governed by the streamwise spacing between spheres and the mean velocity of the most amplified layer.
NASA Technical Reports Server (NTRS)
Nicol, David; Fujimoto, Richard
1992-01-01
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.
Weening, J.S.
1988-05-01
CSIM is a simulator for parallel Lisp, based on a continuation passing interpreter. It models a shared-memory multiprocessor executing programs written in Common Lisp, extended with several primitives for creating and controlling processes. This paper describes the structure of the simulator, measures its performance, and gives an example of its use with a parallel Lisp program.
Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2
NASA Technical Reports Server (NTRS)
Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad
1995-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers is documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs to match the actual costs relative to changes in the number of grid points. By increasing the number of processors, slower than linear speedups are achieved with optimized (machine-dependent library) routines. This slower than linear speedup results because the computational cost is dominated by FFT routine, which yields less than ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 M ops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32-nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32 node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: Spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.
Xyce parallel electronic simulator.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.
Parallel Dislocation Simulator
2006-10-30
ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.
Bisetti, Fabrizio; Attili, Antonio; Pitsch, Heinz
2014-01-01
Combustion of fossil fuels is likely to continue for the near future due to the growing trends in energy consumption worldwide. The increase in efficiency and the reduction of pollutant emissions from combustion devices are pivotal to achieving meaningful levels of carbon abatement as part of the ongoing climate change efforts. Computational fluid dynamics featuring adequate combustion models will play an increasingly important role in the design of more efficient and cleaner industrial burners, internal combustion engines, and combustors for stationary power generation and aircraft propulsion. Today, turbulent combustion modelling is hindered severely by the lack of data that are accurate and sufficiently complete to assess and remedy model deficiencies effectively. In particular, the formation of pollutants is a complex, nonlinear and multi-scale process characterized by the interaction of molecular and turbulent mixing with a multitude of chemical reactions with disparate time scales. The use of direct numerical simulation (DNS) featuring a state of the art description of the underlying chemistry and physical processes has contributed greatly to combustion model development in recent years. In this paper, the analysis of the intricate evolution of soot formation in turbulent flames demonstrates how DNS databases are used to illuminate relevant physico-chemical mechanisms and to identify modelling needs. PMID:25024412
Bisetti, Fabrizio; Attili, Antonio; Pitsch, Heinz
2014-08-13
Combustion of fossil fuels is likely to continue for the near future due to the growing trends in energy consumption worldwide. The increase in efficiency and the reduction of pollutant emissions from combustion devices are pivotal to achieving meaningful levels of carbon abatement as part of the ongoing climate change efforts. Computational fluid dynamics featuring adequate combustion models will play an increasingly important role in the design of more efficient and cleaner industrial burners, internal combustion engines, and combustors for stationary power generation and aircraft propulsion. Today, turbulent combustion modelling is hindered severely by the lack of data that are accurate and sufficiently complete to assess and remedy model deficiencies effectively. In particular, the formation of pollutants is a complex, nonlinear and multi-scale process characterized by the interaction of molecular and turbulent mixing with a multitude of chemical reactions with disparate time scales. The use of direct numerical simulation (DNS) featuring a state of the art description of the underlying chemistry and physical processes has contributed greatly to combustion model development in recent years. In this paper, the analysis of the intricate evolution of soot formation in turbulent flames demonstrates how DNS databases are used to illuminate relevant physico-chemical mechanisms and to identify modelling needs.
Bisetti, Fabrizio; Attili, Antonio; Pitsch, Heinz
2014-08-13
Combustion of fossil fuels is likely to continue for the near future due to the growing trends in energy consumption worldwide. The increase in efficiency and the reduction of pollutant emissions from combustion devices are pivotal to achieving meaningful levels of carbon abatement as part of the ongoing climate change efforts. Computational fluid dynamics featuring adequate combustion models will play an increasingly important role in the design of more efficient and cleaner industrial burners, internal combustion engines, and combustors for stationary power generation and aircraft propulsion. Today, turbulent combustion modelling is hindered severely by the lack of data that are accurate and sufficiently complete to assess and remedy model deficiencies effectively. In particular, the formation of pollutants is a complex, nonlinear and multi-scale process characterized by the interaction of molecular and turbulent mixing with a multitude of chemical reactions with disparate time scales. The use of direct numerical simulation (DNS) featuring a state of the art description of the underlying chemistry and physical processes has contributed greatly to combustion model development in recent years. In this paper, the analysis of the intricate evolution of soot formation in turbulent flames demonstrates how DNS databases are used to illuminate relevant physico-chemical mechanisms and to identify modelling needs. PMID:25024412
Parallelizing Timed Petri Net simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1993-01-01
The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.
NASA Astrophysics Data System (ADS)
Zhang, Hong-Na; Li, Feng-Chen; Li, Xiao-Bin; Li, Dong-Yang; Cai, Wei-Hua; Yu, Bo
2016-09-01
Direct numerical simulations (DNSs) of purely elastic turbulence in rectilinear shear flows in a three-dimensional (3D) parallel plate channel were carried out, by which numerical databases were established. Based on the numerical databases, the present paper analyzed the structural and statistical characteristics of the elastic turbulence including flow patterns, the wall effect on the turbulent kinetic energy spectrum, and the local relationship between the flow motion and the microstructures’ behavior. Moreover, to address the underlying physical mechanism of elastic turbulence, its generation was presented in terms of the global energy budget. The results showed that the flow structures in elastic turbulence were 3D with spatial scales on the order of the geometrical characteristic length, and vortex tubes were more likely to be embedded in the regions where the polymers were strongly stretched. In addition, the patterns of microstructures’ elongation behave like a filament. From the results of the turbulent kinetic energy budget, it was found that the continuous energy releasing from the polymers into the main flow was the main source of the generation and maintenance of the elastic turbulent status. Project supported by the National Natural Science Foundation of China (Grant Nos. 51276046 and 51506037), the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 51421063), the China Postdoctoral Science Foundation (Grant No. 2016M591526), the Heilongjiang Postdoctoral Fund, China (Grant No. LBH-Z15063), and the China Postdoctoral International Exchange Program.
NASA Astrophysics Data System (ADS)
Zhang, Hong-Na; Li, Feng-Chen; Li, Xiao-Bin; Li, Dong-Yang; Cai, Wei-Hua; Yu, Bo
2016-09-01
Direct numerical simulations (DNSs) of purely elastic turbulence in rectilinear shear flows in a three-dimensional (3D) parallel plate channel were carried out, by which numerical databases were established. Based on the numerical databases, the present paper analyzed the structural and statistical characteristics of the elastic turbulence including flow patterns, the wall effect on the turbulent kinetic energy spectrum, and the local relationship between the flow motion and the microstructures’ behavior. Moreover, to address the underlying physical mechanism of elastic turbulence, its generation was presented in terms of the global energy budget. The results showed that the flow structures in elastic turbulence were 3D with spatial scales on the order of the geometrical characteristic length, and vortex tubes were more likely to be embedded in the regions where the polymers were strongly stretched. In addition, the patterns of microstructures’ elongation behave like a filament. From the results of the turbulent kinetic energy budget, it was found that the continuous energy releasing from the polymers into the main flow was the main source of the generation and maintenance of the elastic turbulent status. Project supported by the National Natural Science Foundation of China (Grant Nos. 51276046 and 51506037), the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 51421063), the China Postdoctoral Science Foundation (Grant No. 2016M591526), the Heilongjiang Postdoctoral Fund, China (Grant No. LBH-Z15063), and the China Postdoctoral International Exchange Program.
Efficiency of parallel direct optimization
NASA Technical Reports Server (NTRS)
Janies, D. A.; Wheeler, W. C.
2001-01-01
Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. c2001 The Willi Hennig Society.
Simulating Billion-Task Parallel Programs
Perumalla, Kalyan S; Park, Alfred J
2014-01-01
In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.
NASA Astrophysics Data System (ADS)
Davis, J. R.; Ozdemir, C. E.; Balachandar, S.; Hsu, T.
2012-12-01
Fine sediment transport and its potential to dampen turbulence under energetic waves and combined wave-current flows are critical to better understanding of the fate of terrestrial sediment particles in the river mouth and eventually, coastal morphodynamics. The unsteady nature of these oscillatory flows necessitates a computationally intense, turbulence resolving approach. Whereas a sophisticated shared memory parallel model has been successfully used to simulate these flows in the intermittently turbulent regime (Remax ~ 1000), scaling issues of shared memory computational hardware limit the applicability of the model to perform very high resolution (> 192x192x193) simulations within reasonable wall-clock times. Thus to meet the need to simulate high resolution, fully turbulent oscillatory flows, a new hybrid shared memory / distributed memory parallel model has been developed. Using OpenMP and MPI constructs, this new model implements a highly-accurate pseudo-spectral scheme in an idealized oscillatory bottom boundary layer (OBBL). Data is stored locally and transferred between computational nodes as appropriate such that FFTs used to calculate derivatives in the x and y-directions and the Chebyshev polynomials used to calculated derivatives in the z-direction are calculated completely in-processor. The model is fully configurable at compile time to support: multiple methods of operation (serial or OpenMP, MPI, OpenMP+MPI parallel), available FFT libraries (DFTI, FFTW3), high temporal resolution timing, persistent or non-persistent MPI, etc. Output is fully distributed to support both independent and shared filesystems. At run time, the model automatically selects the best performing algorithms given the computational resources and domain size. Nearly 40 Integrated test routines (derivatives, FFT transformations, eigenvalues, Poission / Helmholtz solvers, etc.) are used to validate individual components of the model. Test simulations have been performed at the
Parallel Power Grid Simulation Toolkit
Smith, Steve; Kelley, Brian; Banks, Lawrence; Top, Philip; Woodward, Carol
2015-09-14
ParGrid is a 'wrapper' that integrates a coupled Power Grid Simulation toolkit consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGid, named FSKIT, is intended to support the coupling multiple continuous and discrete even parallel simulations. The code is designed using modern object oriented C++ methods utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.
Parallelizing alternating direction implicit solver on GPUs
Technology Transfer Automated Retrieval System (TEKTRAN)
We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource con...
NASA Astrophysics Data System (ADS)
Sloan, Gregory James
The direct numerical simulation (DNS) offers the most accurate approach to modeling the behavior of a physical system, but carries an enormous computation cost. There exists a need for an accurate DNS to model the coupled solid-fluid system seen in targeted drug delivery (TDD), nanofluid thermal energy storage (TES), as well as other fields where experiments are necessary, but experiment design may be costly. A parallel DNS can greatly reduce the large computation times required, while providing the same results and functionality of the serial counterpart. A D2Q9 lattice Boltzmann method approach was implemented to solve the fluid phase. The use of domain decomposition with message passing interface (MPI) parallelism resulted in an algorithm that exhibits super-linear scaling in testing, which may be attributed to the caching effect. Decreased performance on a per-node basis for a fixed number of processes confirms this observation. A multiscale approach was implemented to model the behavior of nanoparticles submerged in a viscous fluid, and used to examine the mechanisms that promote or inhibit clustering. Parallelization of this model using a masterworker algorithm with MPI gives less-than-linear speedup for a fixed number of particles and varying number of processes. This is due to the inherent inefficiency of the master-worker approach. Lastly, these separate simulations are combined, and two-way coupling is implemented between the solid and fluid.
Xyce parallel electronic simulator design.
Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.
2010-09-01
This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to insure a high level of code quality and robustness are essential. Version control, issue tracking customer support, C++ style guildlines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathmaticians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.
Parallel network simulations with NEURON.
Migliore, M; Cannia, C; Lytton, W W; Markram, Henry; Hines, M L
2006-10-01
The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2,000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored.
NASA Technical Reports Server (NTRS)
Combi, Michael R.
2004-01-01
In order to understand the global structure, dynamics, and physical and chemical processes occurring in the upper atmospheres, exospheres, and ionospheres of the Earth, the other planets, comets and planetary satellites and their interactions with their outer particles and fields environs, it is often necessary to address the fundamentally non-equilibrium aspects of the physical environment. These are regions where complex chemistry, energetics, and electromagnetic field influences are important. Traditional approaches are based largely on hydrodynamic or magnetohydrodynamic (MHD) formulations and are very important and highly useful. However, these methods often have limitations in rarefied physical regimes where the molecular collision rates and ion gyrofrequencies are small and where interactions with ionospheres and upper neutral atmospheres are important. At the University of Michigan we have an established base of experience and expertise in numerical simulations based on particle codes which address these physical regimes. The Principal Investigator, Dr. Michael Combi, has over 20 years of experience in the development of particle-kinetic and hybrid kinetichydrodynamics models and their direct use in data analysis. He has also worked in ground-based and space-based remote observational work and on spacecraft instrument teams. His research has involved studies of cometary atmospheres and ionospheres and their interaction with the solar wind, the neutral gas clouds escaping from Jupiter s moon Io, the interaction of the atmospheres/ionospheres of Io and Europa with Jupiter s corotating magnetosphere, as well as Earth s ionosphere. This report describes our progress during the year. The contained in section 2 of this report will serve as the basis of a paper describing the method and its application to the cometary coma that will be continued under a research and analysis grant that supports various applications of theoretical comet models to understanding the
Reservoir Thermal Recover Simulation on Parallel Computers
NASA Astrophysics Data System (ADS)
Li, Baoyan; Ma, Yuanle
The rapid development of parallel computers has provided a hardware background for massive refine reservoir simulation. However, the lack of parallel reservoir simulation software has blocked the application of parallel computers on reservoir simulation. Although a variety of parallel methods have been studied and applied to black oil, compositional, and chemical model numerical simulations, there has been limited parallel software available for reservoir simulation. Especially, the parallelization study of reservoir thermal recovery simulation has not been fully carried out, because of the complexity of its models and algorithms. The authors make use of the message passing interface (MPI) standard communication library, the domain decomposition method, the block Jacobi iteration algorithm, and the dynamic memory allocation technique to parallelize their serial thermal recovery simulation software NUMSIP, which is being used in petroleum industry in China. The parallel software PNUMSIP was tested on both IBM SP2 and Dawn 1000A distributed-memory parallel computers. The experiment results show that the parallelization of I/O has great effects on the efficiency of parallel software PNUMSIP; the data communication bandwidth is also an important factor, which has an influence on software efficiency. Keywords: domain decomposition method, block Jacobi iteration algorithm, reservoir thermal recovery simulation, distributed-memory parallel computer
Icarus: A 2D direct simulation Monte Carlo (DSMC) code for parallel computers. User`s manual - V.3.0
Bartel, T.; Plimpton, S.; Johannes, J.; Payne, J.
1996-10-01
Icarus is a 2D Direct Simulation Monte Carlo (DSMC) code which has been optimized for the parallel computing environment. The code is based on the DSMC method of Bird and models from free-molecular to continuum flowfields in either cartesian (x, y) or axisymmetric (z, r) coordinates. Computational particles, representing a given number of molecules or atoms, are tracked as they have collisions with other particles or surfaces. Multiple species, internal energy modes (rotation and vibration), chemistry, and ion transport are modelled. A new trace species methodology for collisions and chemistry is used to obtain statistics for small species concentrations. Gas phase chemistry is modelled using steric factors derived from Arrhenius reaction rates. Surface chemistry is modelled with surface reaction probabilities. The electron number density is either a fixed external generated field or determined using a local charge neutrality assumption. Ion chemistry is modelled with electron impact chemistry rates and charge exchange reactions. Coulomb collision cross-sections are used instead of Variable Hard Sphere values for ion-ion interactions. The electrostatic fields can either be externally input or internally generated using a Langmuir-Tonks model. The Icarus software package includes the grid generation, parallel processor decomposition, postprocessing, and restart software. The commercial graphics package, Tecplot, is used for graphics display. The majority of the software packages are written in standard Fortran.
Xyce() Parallel Electronic Simulator
2013-10-03
The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models including several age and radiation aware devices. It supports a variety of computing platforms (both serial and parallel) computers. Lastly, it uses a variety of modern solution algorithms dynamic parallel load-balancing and iterative solvers.! ! Xyce is primarily used to simulate the voltage and current behavior of a circuitmore » network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits.! ! Kirchoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.« less
Xyce() Parallel Electronic Simulator
2013-10-03
The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models including several age and radiation aware devices. It supports a variety of computing platforms (both serial and parallel) computers. Lastly, it uses a variety of modern solution algorithms dynamic parallel load-balancing and iterative solvers.! ! Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits.! ! Kirchoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.
Parallelization and automatic data distribution for nuclear reactor simulations
Liebrock, L.M.
1997-07-01
Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.
Structured building model reduction toward parallel simulation
Dobbs, Justin R.; Hencey, Brondon M.
2013-08-26
Building energy model reduction exchanges accuracy for improved simulation speed by reducing the number of dynamical equations. Parallel computing aims to improve simulation times without loss of accuracy but is poorly utilized by contemporary simulators and is inherently limited by inter-processor communication. This paper bridges these disparate techniques to implement efficient parallel building thermal simulation. We begin with a survey of three structured reduction approaches that compares their performance to a leading unstructured method. We then use structured model reduction to find thermal clusters in the building energy model and allocate processing resources. Experimental results demonstrate faster simulation and low error without any interprocessor communication.
Program For Parallel Discrete-Event Simulation
NASA Technical Reports Server (NTRS)
Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.
1991-01-01
User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.
Acoustic simulation in architecture with parallel algorithm
NASA Astrophysics Data System (ADS)
Li, Xiaohong; Zhang, Xinrong; Li, Dan
2004-03-01
In allusion to complexity of architecture environment and Real-time simulation of architecture acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in scene is solved with this method. And then the impulse response between sources and receivers at frequency segment, which are calculated with multi-process, are combined into whole frequency response. The numerical experiment shows that parallel arithmetic can improve the acoustic simulating efficiency of complex scene.
Xyce parallel electronic simulator : users' guide.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique
Stochastic Parallel PARticle Kinetic Simulator
2008-07-01
SPPARKS is a kinetic Monte Carlo simulator which implements kinetic and Metropolis Monte Carlo solvers in a general way so that they can be hooked to applications of various kinds. Specific applications are implemented in SPPARKS as physical models which generate events (e.g. a diffusive hop or chemical reaction) and execute them one-by-one. Applications can run in paralle so long as the simulation domain can be partitoned spatially so that multiple events can be invokedmore » simultaneously. SPPARKS is used to model various kinds of mesoscale materials science scenarios such as grain growth, surface deposition and growth, and reaction kinetics. It can also be used to develop new Monte Carlo models that hook to the existing solver and paralle infrastructure provided by the code.« less
Visualization and Tracking of Parallel CFD Simulations
NASA Technical Reports Server (NTRS)
Vaziri, Arsi; Kremenetsky, Mark
1995-01-01
We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS) runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, are handled by CM/AVS. Partitioning of the visualization task, between CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate (yields) store (yields) visualize' post-processing approach.
Parallel processing of a rotating shaft simulation
NASA Technical Reports Server (NTRS)
Arpasi, Dale J.
1989-01-01
A FORTRAN program describing the vibration modes of a rotor-bearing system is analyzed for parellelism in this simulation using a Pascal-like structured language. Potential vector operations are also identified. A critical path through the simulation is identified and used in conjunction with somewhat fictitious processor characteristics to determine the time to calculate the problem on a parallel processing system having those characteristics. A parallel processing overhead time is included as a parameter for proper evaluation of the gain over serial calculation. The serial calculation time is determined for the same fictitious system. An improvement of up to 640 percent is possible depending on the value of the overhead time. Based on the analysis, certain conclusions are drawn pertaining to the development needs of parallel processing technology, and to the specification of parallel processing systems to meet computational needs.
Xyce parallel electronic simulator release notes.
Keiter, Eric R; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: Hardware and software requirements New features and enhancements Any defects fixed since the last release Current known defects and defect workarounds For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.
On extending parallelism to serial simulators
NASA Technical Reports Server (NTRS)
Nicol, David; Heidelberger, Philip
1994-01-01
This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops submodels using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permit automatically synchronized communication between submodels; however, the automation requires that any such communication must take a nonzero amount off simulation time. Within this modeling paradigm, a variety of conservative synchronization protocols can transparently support conservative execution of submodels on potentially different processors. A specific implementation of this approach, U.P.S. (Utilitarian Parallel Simulator), is described, along with performance results on the Intel Paragon.
Theory and practice of parallel direct optimization.
Janies, Daniel A; Wheeler, Ward C
2002-01-01
Our ability to collect and distribute genomic and other biological data is growing at a staggering rate (Pagel, 1999). However, the synthesis of these data into knowledge of evolution is incomplete. Phylogenetic systematics provides a unifying intellectual approach to understanding evolution but presents formidable computational challenges. A fundamental goal of systematics, the generation of evolutionary trees, is typically approached as two distinct NP-complete problems: multiple sequence alignment and phylogenetic tree search. The number of cells in a multiple alignment matrix are exponentially related to sequence length. In addition, the number of evolutionary trees expands combinatorially with respect to the number of organisms or sequences to be examined. Biologically interesting datasets are currently comprised of hundreds of taxa and thousands of nucleotides and morphological characters. This standard will continue to grow with the advent of highly automated sequencing and development of character databases. Three areas of innovation are changing how evolutionary computation can be addressed: (1) novel concepts for determination of sequence homology, (2) heuristics and shortcuts in tree-search algorithms, and (3) parallel computing. In this paper and the online software documentation we describe the basic usage of parallel direct optimization as implemented in the software POY (ftp://ftp.amnh.org/pub/molecular/poy).
Parallel Performance of a Combustion Chemistry Simulation
Skinner, Gregg; Eigenmann, Rudolf
1995-01-01
We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.
NASA Astrophysics Data System (ADS)
Bolis, A.; Cantwell, C. D.; Moxey, D.; Serson, D.; Sherwin, S. J.
2016-09-01
A hybrid parallelisation technique for distributed memory systems is investigated for a coupled Fourier-spectral/hp element discretisation of domains characterised by geometric homogeneity in one or more directions. The performance of the approach is mathematically modelled in terms of operation count and communication costs for identifying the most efficient parameter choices. The model is calibrated to target a specific hardware platform after which it is shown to accurately predict the performance in the hybrid regime. The method is applied to modelling turbulent flow using the incompressible Navier-Stokes equations in an axisymmetric pipe and square channel. The hybrid method extends the practical limitations of the discretisation, allowing greater parallelism and reduced wall times. Performance is shown to continue to scale when both parallelisation strategies are used.
Parallel Simulation of Unsteady Turbulent Flames
NASA Technical Reports Server (NTRS)
Menon, Suresh
1996-01-01
Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, the high cost and their limited availability, makes practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable.This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. Recently, a new model for turbulent combustion was developed, in which the combustion is modeled, within the subgrid (small-scales) using a methodology that simulates the mixing and the molecular transport and the chemical kinetics
Parallelization of Rocket Engine Simulator Software (PRESS)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet
1997-01-01
Parallelization of Rocket Engine System Software (PRESS) project is part of a collaborative effort with Southern University at Baton Rouge (SUBR), University of West Florida (UWF), and Jackson State University (JSU). The second-year funding, which supports two graduate students enrolled in our new Master's program in Computer Science at Hampton University and the principal investigator, have been obtained for the period from October 19, 1996 through October 18, 1997. The key part of the interim report was new directions for the second year funding. This came about from discussions during Rocket Engine Numeric Simulator (RENS) project meeting in Pensacola on January 17-18, 1997. At that time, a software agreement between Hampton University and NASA Lewis Research Center had already been concluded. That agreement concerns off-NASA-site experimentation with PUMPDES/TURBDES software. Before this agreement, during the first year of the project, another large-scale FORTRAN-based software, Two-Dimensional Kinetics (TDK), was being used for translation to an object-oriented language and parallelization experiments. However, that package proved to be too complex and lacking sufficient documentation for effective translation effort to the object-oriented C + + source code. The focus, this time with better documented and more manageable PUMPDES/TURBDES package, was still on translation to C + + with design improvements. At the RENS Meeting, however, the new impetus for the RENS projects in general, and PRESS in particular, has shifted in two important ways. One was closer alignment with the work on Numerical Propulsion System Simulator (NPSS) through cooperation and collaboration with LERC ACLU organization. The other was to see whether and how NASA's various rocket design software can be run over local and intra nets without any radical efforts for redesign and translation into object-oriented source code. There were also suggestions that the Fortran based code be
Parallel algorithm strategies for circuit simulation.
Thornquist, Heidi K.; Schiek, Richard Louis; Keiter, Eric Richard
2010-01-01
Circuit simulation tools (e.g., SPICE) have become invaluable in the development and design of electronic circuits. However, they have been pushed to their performance limits in addressing circuit design challenges that come from the technology drivers of smaller feature scales and higher integration. Improving the performance of circuit simulation tools through exploiting new opportunities in widely-available multi-processor architectures is a logical next step. Unfortunately, not all traditional simulation applications are inherently parallel, and quickly adapting mature application codes (even codes designed to parallel applications) to new parallel paradigms can be prohibitively difficult. In general, performance is influenced by many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, the use of mini-applications small self-contained proxies for real applications is an excellent approach for rapidly exploring the parameter space of all these choices. In this report we present a multi-core performance study of Xyce, a transistor-level circuit simulation tool, and describe the future development of a mini-application for circuit simulation.
Inflated speedups in parallel simulations via malloc()
NASA Technical Reports Server (NTRS)
Nicol, David M.
1990-01-01
Discrete-event simulation programs make heavy use of dynamic memory allocation in order to support simulation's very dynamic space requirements. When programming in C one is likely to use the malloc() routine. However, a parallel simulation which uses the standard Unix System V malloc() implementation may achieve an overly optimistic speedup, possibly superlinear. An alternate implementation provided on some (but not all systems) can avoid the speedup anomaly, but at the price of significantly reduced available free space. This is especially severe on most parallel architectures, which tend not to support virtual memory. It is shown how a simply implemented user-constructed interface to malloc() can both avoid artificially inflated speedups, and make efficient use of the dynamic memory space. The interface simply catches blocks on the basis of their size. The problem is demonstrated empirically, and the effectiveness of the solution is shown both empirically and analytically.
Parallel Implicit Kinetic Simulation with PARSEK
NASA Astrophysics Data System (ADS)
Stefano, Markidis; Giovanni, Lapenta
2004-11-01
Kinetic plasma simulation is the ultimate tool for plasma analysis. One of the prime tools for kinetic simulation is the particle in cell (PIC) method. The explicit or semi-implicit (i.e. implicit only on the fields) PIC method requires exceedingly small time steps and grid spacing, limited by the necessity to resolve the electron plasma frequency, the Debye length and the speed of light (for fully explicit schemes). A different approach is to consider fully implicit PIC methods where both particles and fields are discretized implicitly. This approach allows radically larger time steps and grid spacing, reducing the cost of a simulation by orders of magnitude while keeping the full kinetic treatment. In our previous work, simulations impossible for the explicit PIC method even on massively parallel computers have been made possible on a single processor machine using the implicit PIC code CELESTE3D [1]. We propose here another quantum leap: PARSEK, a parallel cousin of CELESTE3D, based on the same approach but sporting a radically redesigned software architecture (object oriented C++, where CELESTE3D was structured and written in FORTRAN77/90) and fully parallelized using MPI for both particle and grid communication. [1] G. Lapenta, J.U. Brackbill, W.S. Daughton, Phys. Plasmas, 10, 1577 (2003).
Xyce parallel electronic simulator : reference guide.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.
Parallel node placement method by bubble simulation
NASA Astrophysics Data System (ADS)
Nie, Yufeng; Zhang, Weiwei; Qi, Nan; Li, Yiqiang
2014-03-01
An efficient Parallel Node Placement method by Bubble Simulation (PNPBS), employing METIS-based domain decomposition (DD) for an arbitrary number of processors is introduced. In accordance with the desired nodal density and Newton’s Second Law of Motion, automatic generation of node sets by bubble simulation has been demonstrated in previous work. Since the interaction force between nodes is short-range, for two distant nodes, their positions and velocities can be updated simultaneously and independently during dynamic simulation, which indicates the inherent property of parallelism, it is quite suitable for parallel computing. In this PNPBS method, the METIS-based DD scheme has been investigated for uniform and non-uniform node sets, and dynamic load balancing is obtained by evenly distributing work among the processors. For the nodes near the common interface of two neighboring subdomains, there is no need for special treatment after dynamic simulation. These nodes have good geometrical properties and a smooth density distribution which is desirable in the numerical solution of partial differential equations (PDEs). The results of numerical examples show that quasi linear speedup in the number of processors and high efficiency are achieved.
Fracture simulations via massively parallel molecular dynamics
Holian, B.L.; Abraham, F.F.; Ravelo, R.
1993-09-01
Fracture simulations at the atomistic level have heretofore been carried out for relatively small systems of particles, typically 10,000 or less. In order to study anything approaching a macroscopic system, massively parallel molecular dynamics (MD) must be employed. In two spatial dimensions (2D), it is feasible to simulate a sample that is 0.1 {mu}m on a side. We report on recent MD simulations of mode I crack extension under tensile loading at high strain rates. The method of uniaxial, homogeneously expanding periodic boundary conditions was employed to represent tensile stress conditions near the crack tip. The effects of strain rate, temperature, material properties (equation of state and defect energies), and system size were examined. We found that, in order to mimic a bulk sample, several tricks (in addition to expansion boundary conditions) need to be employed: (1) the sample must be pre-strained to nearly the condition at which the crack will spontaneously open; (2) to relieve the stresses at free surfaces, such as the initial notch, annealing by kinetic-energy quenching must be carried out to prevent unwanted rarefactions; (3) sound waves emitted as the crack tip opens and dislocations emitted from the crack tip during blunting must be absorbed by special reservoir regions. The tricks described briefly in this paper will be especially important to carrying out feasible massively parallel 3D simulations via MD.
Parallel Simulated Annealing by Mixing of States
NASA Astrophysics Data System (ADS)
Chu, King-Wai; Deng, Yuefan; Reinitz, John
1999-01-01
We report the results of testing the performance of a new, efficient, and highly general-purpose parallel optimization method, based upon simulated annealing. This optimization algorithm was applied to analyze the network of interacting genes that control embryonic development and other fundamental biological processes. We found several sets of algorithmic parameters that lead to optimal parallel efficiency for up to 100 processors on distributed-memory MIMD architectures. Our strategy contains two major elements. First, we monitor and pool performance statistics obtained simultaneously on all processors. Second, we mix states at intervals to ensure a Boltzmann distribution of energies. The central scientific issue is the inverse problem, the determination of the parameters of a set of nonlinear ordinary differential equations by minimizing the total error between the model behavior and experimental observations.
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
Parallel Computing Environments and Methods for Power Distribution System Simulation
Lu, Ning; Taylor, Zachary T.; Chassin, David P.; Guttromson, Ross T.; Studham, Scott S.
2005-11-10
The development of cost-effective high-performance parallel computing on multi-processor super computers makes it attractive to port excessively time consuming simulation software from personal computers (PC) to super computes. The power distribution system simulator (PDSS) takes a bottom-up approach and simulates load at appliance level, where detailed thermal models for appliances are used. This approach works well for a small power distribution system consisting of a few thousand appliances. When the number of appliances increases, the simulation uses up the PC memory and its run time increases to a point where the approach is no longer feasible to model a practical large power distribution system. This paper presents an effort made to port a PC-based power distribution system simulator (PDSS) to a 128-processor shared-memory super computer. The paper offers an overview of the parallel computing environment and a description of the modification made to the PDSS model. The performances of the PDSS running on a standalone PC and on the super computer are compared. Future research direction of utilizing parallel computing in the power distribution system simulation is also addressed.
Design and implementation of a massively parallel version of DIRECT
He, J.; Verstak, A.; Watson, L.; Sosonkina, M.
2007-10-24
This paper describes several massively parallel implementations for a global search algorithm DIRECT. Two parallel schemes take different approaches to address DIRECT's design challenges imposed by memory requirements and data dependency. Three design aspects in topology, data structures, and task allocation are compared in detail. The goal is to analytically investigate the strengths and weaknesses of these parallel schemes, identify several key sources of inefficiency, and experimentally evaluate a number of improvements in the latest parallel DIRECT implementation. The performance studies demonstrate improved data structure efficiency and load balancing on a 2200 processor cluster.
Empirical study of parallel LRU simulation algorithms
NASA Technical Reports Server (NTRS)
Carr, Eric; Nicol, David M.
1994-01-01
This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithm are more complex, but have costs that are independent on the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithm implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
A note on parallel efficiency of fire simulation on cluster
NASA Astrophysics Data System (ADS)
Valasek, L.; Glasa, J.
2016-08-01
Current HPC clusters are capable to reduce execution time of parallelized tasks significantly. The paper discusses the use of two selected strategies of cluster computational resources allocation and their impact on parallel efficiency of fire simulation. Simulation of a simple corridor fire scenario by Fire Dynamics Simulator parallelized by the MPI programming model is tested on the HPC cluster at the Institute of Informatics of Slovak Academy of Sciences in Bratislava (Slovakia). The tests confirm that parallelization has a great potential to reduce execution times achieving promising values of parallel efficiency of the simulation, however, the results also show that the use of increasing numbers of computational meshes resulting in increasing numbers of used computational cores does not necessarily decrease the execution time nor the parallel efficiency of simulation. The results obtained indicate that the simulation achieves different values of the execution time and the parallel efficiency in regard of the used strategy for cluster computational resources allocation.
Parallel Proximity Detection for Computer Simulations
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)
1998-01-01
The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
Parallel Proximity Detection for Computer Simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)
1997-01-01
The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are includes by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
Parallel multiscale simulations of a brain aneurysm
NASA Astrophysics Data System (ADS)
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NɛκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NɛκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future
Parallel multiscale simulations of a brain aneurysm
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in
Parallel solver for trajectory optimization search directions
NASA Technical Reports Server (NTRS)
Psiaki, M. L.; Park, K. H.
1992-01-01
A key algorithmic element of a real-time trajectory optimization hardware/software implementation is presented, the search step solver. This is one piece of an algorithm whose overall goal is to make nonlinear trajectory optimization fast enough to provide real-time commands during guidance of a vehicle such as an aeromaneuvering orbiter or the National Aerospace Plane. Many methods of nonlinear programming require the solution of a quadratic program (QP) at each iteration to determine the search step. In the trajectory optimization case, the QP has a special dynamic programming structure. The algorithm exploits this special structure with a divide- and conquer type of parallel implementation. The algorithm solves a (p.N)-stage problem on N processors in O(p + log2 N) operations. The algorithm yields a factor of 8 speed-up over the fastest known serial algorithm when solving a 1024-stage test problem on 32 processors.
Parallel solvers for reservoir simulation on MIMD computers
Piault, E.; Willien, F.; Roux, F.X.
1995-12-01
We have investigated parallel solvers for reservoir simulation. We compare different solvers and preconditioners using T3D and SP1 parallel computers. We use block diagonal domain decomposition preconditioner with non-overlapping sub-domains.
A parallel algorithm for implicit depletant simulations
NASA Astrophysics Data System (ADS)
Glaser, Jens; Karas, Andrew S.; Glotzer, Sharon C.
2015-11-01
We present an algorithm to simulate the many-body depletion interaction between anisotropic colloids in an implicit way, integrating out the degrees of freedom of the depletants, which we treat as an ideal gas. Because the depletant particles are statistically independent and the depletion interaction is short-ranged, depletants are randomly inserted in parallel into the excluded volume surrounding a single translated and/or rotated colloid. A configurational bias scheme is used to enhance the acceptance rate. The method is validated and benchmarked both on multi-core processors and graphics processing units for the case of hard spheres, hemispheres, and discoids. With depletants, we report novel cluster phases in which hemispheres first assemble into spheres, which then form ordered hcp/fcc lattices. The method is significantly faster than any method without cluster moves and that tracks depletants explicitly, for systems of colloid packing fraction ϕc < 0.50, and additionally enables simulation of the fluid-solid transition.
A scalable parallel black oil simulator on distributed memory parallel computers
NASA Astrophysics Data System (ADS)
Wang, Kun; Liu, Hui; Chen, Zhangxin
2015-11-01
This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
Parallel filtering in global gyrokinetic simulations
NASA Astrophysics Data System (ADS)
Jolliet, S.; McMillan, B. F.; Villard, L.; Vernay, T.; Angelino, P.; Tran, T. M.; Brunner, S.; Bottino, A.; Idomura, Y.
2012-02-01
In this work, a Fourier solver [B.F. McMillan, S. Jolliet, A. Bottino, P. Angelino, T.M. Tran, L. Villard, Comp. Phys. Commun. 181 (2010) 715] is implemented in the global Eulerian gyrokinetic code GT5D [Y. Idomura, H. Urano, N. Aiba, S. Tokuda, Nucl. Fusion 49 (2009) 065029] and in the global Particle-In-Cell code ORB5 [S. Jolliet, A. Bottino, P. Angelino, R. Hatzky, T.M. Tran, B.F. McMillan, O. Sauter, K. Appert, Y. Idomura, L. Villard, Comp. Phys. Commun. 177 (2007) 409] in order to reduce the memory of the matrix associated with the field equation. This scheme is verified with linear and nonlinear simulations of turbulence. It is demonstrated that the straight-field-line angle is the coordinate that optimizes the Fourier solver, that both linear and nonlinear turbulent states are unaffected by the parallel filtering, and that the k∥ spectrum is independent of plasma size at fixed normalized poloidal wave number.
Parallel Simulation of Explosion in AN Unlimited Atmosphere
NASA Astrophysics Data System (ADS)
Ma, Tianbao; Wang, Cheng; Fei, Guanglei; Ning, Jianguo
In this paper, a parallel Eulerian hydrocode for the simulation of large scale complicated explosion and impact problem is developed. The data dependency in the parallel algorithm is studied in particular. As a test, the three dimensional numerical simulation of the explosion field in an unlimited atmosphere is performed. The numerical results are in good agreement with the empirical results, indicating that the proposed parallel algorithm in this paper is valid. Finally, the parallel speedup and parallel efficiency under different dividing domain areas are analyzed.
Particle simulation of plasmas on the massively parallel processor
NASA Technical Reports Server (NTRS)
Gledhill, I. M. A.; Storey, L. R. O.
1987-01-01
Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.
Massively parallel algorithms for trace-driven cache simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.
1991-01-01
Trace driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t(exp th) instant, reference x sub t is hashed into a set of cache locations, the contents of which are then compared with x sub t. If at the t sup th instant x sub t is not present in the cache, then it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x sub t present for the (t+1) sup st instant. The problem of parallel simulation of a subtrace of N references directed to a C line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which regradless of the set size C runs in time O(log N) using N processors on the exclusive read, exclusive write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference based line replacement policies are considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that on any trace of length N directed to a C line set runs in the O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
Parallel methods for dynamic simulation of multiple manipulator systems
NASA Technical Reports Server (NTRS)
Mcmillan, Scott; Sadayappan, P.; Orin, David E.
1993-01-01
In this paper, efficient dynamic simulation algorithms for a system of m manipulators, cooperating to manipulate a large load, are developed; their performance, using two possible forms of parallelism on a general-purpose parallel computer, is investigated. One form, temporal parallelism, is obtained with the use of parallel numerical integration methods. A speedup of 3.78 on four processors of CRAY Y-MP8 was achieved with a parallel four-point block predictor-corrector method for the simulation of a four manipulator system. These multi-point methods suffer from reduced accuracy, and when comparing these runs with a serial integration method, the speedup can be as low as 1.83 for simulations with the same accuracy. To regain the performance lost due to accuracy problems, a second form of parallelism is employed. Spatial parallelism allows most of the dynamics of each manipulator chain to be computed simultaneously. Used exclusively in the four processor case, this form of parallelism in conjunction with a serial integration method results in a speedup of 3.1 on four processors over the best serial method. In cases where there are either more processors available or fewer chains in the system, the multi-point parallel integration methods are still advantageous despite the reduced accuracy because both forms of parallelism can then combine to generate more parallel tasks and achieve greater effective speedups. This paper also includes results for these cases.
High Performance Parallel Methods for Space Weather Simulations
NASA Technical Reports Server (NTRS)
Hunter, Paul (Technical Monitor); Gombosi, Tamas I.
2003-01-01
This is the final report of our NASA AISRP grant entitled 'High Performance Parallel Methods for Space Weather Simulations'. The main thrust of the proposal was to achieve significant progress towards new high-performance methods which would greatly accelerate global MHD simulations and eventually make it possible to develop first-principles based space weather simulations which run much faster than real time. We are pleased to report that with the help of this award we made major progress in this direction and developed the first parallel implicit global MHD code with adaptive mesh refinement. The main limitation of all earlier global space physics MHD codes was the explicit time stepping algorithm. Explicit time steps are limited by the Courant-Friedrichs-Lewy (CFL) condition, which essentially ensures that no information travels more than a cell size during a time step. This condition represents a non-linear penalty for highly resolved calculations, since finer grid resolution (and consequently smaller computational cells) not only results in more computational cells, but also in smaller time steps.
Parallelization of Rocket Engine Simulator Software (PRESS)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet
1998-01-01
We have outlined our work in the last half of the funding period. We have shown how a demo package for RESSAP using MPI can be done. However, we also mentioned the difficulties with the UNIX platform. We have reiterated some of the suggestions made during the presentation of the progress of the at Fourth Annual HBCU Conference. Although we have discussed, in some detail, how TURBDES/PUMPDES software can be run in parallel using MPI, at present, we are unable to experiment any further with either MPI or PVM. Due to X windows not being implemented, we are also not able to experiment further with XPVM, which it will be recalled, has a nice GUI interface. There are also some concerns, on our part, about MPI being an appropriate tool. The best thing about MPr is that it is public domain. Although and plenty of documentation exists for the intricacies of using MPI, little information is available on its actual implementations. Other than very typical, somewhat contrived examples, such as Jacobi algorithm for solving Laplace's equation, there are few examples which can readily be applied to real situations, such as in our case. In effect, the review of literature on both MPI and PVM, and there is a lot, indicate something similar to the enormous effort which was spent on LISP and LISP-like languages as tools for artificial intelligence research. During the development of a book on programming languages [12], when we searched the literature for very simple examples like taking averages, reading and writing records, multiplying matrices, etc., we could hardly find a any! Yet, so much was said and done on that topic in academic circles. It appears that we faced the same problem with MPI, where despite significant documentation, we could not find even a simple example which supports course-grain parallelism involving only a few processes. From the foregoing, it appears that a new direction may be required for more productive research during the extension period (10/19/98 - 10
Electromagnetic direct implicit PIC simulation
Langdon, A.B.
1983-03-29
Interesting modelling of intense electron flow has been done with implicit particle-in-cell simulation codes. In this report, the direct implicit PIC simulation approach is applied to simulations that include full electromagnetic fields. The resulting algorithm offers advantages relative to moment implicit electromagnetic algorithms and may help in our quest for robust and simpler implicit codes.
Series-parallel method of direct solar array regulation
NASA Technical Reports Server (NTRS)
Gooder, S. T.
1976-01-01
A 40 watt experimental solar array was directly regulated by shorting out appropriate combinations of series and parallel segments of a solar array. Regulation switches were employed to control the array at various set-point voltages between 25 and 40 volts. Regulation to within + or - 0.5 volt was obtained over a range of solar array temperatures and illumination levels as an active load was varied from open circuit to maximum available power. A fourfold reduction in regulation switch power dissipation was achieved with series-parallel regulation as compared to the usual series-only switching for direct solar array regulation.
Efficient Parallel Transaction Level Simulation by Exploiting Temporal Decoupling
NASA Astrophysics Data System (ADS)
Salimi Khaligh, Rauf; Radetzki, Martin
In recent years, transaction level modeling (TLM) has enabled designers to simulate complex embedded systems and SoCs, orders of magnitude faster than simulation at the RTL. The increasing complexity of the systems on one hand, and availability of low cost parallel processing resources on the other hand have motivated the development of parallel simulation environments for TLMs. The existing simulation environments used for parallel simulation of TLMs are intended for general discrete event models and do not take advantage of the specific properties of TLMs. The fine-grain synchronization and communication between simulators in these environments can become a major impediment to the efficiency of the simulation environment. In this work, we exploit the properties of temporally decoupled TLMs to increase the efficiency of parallel simulation. Our approach does not require a special simulation kernel. We have implemented a parallel TLM simulation framework based on the publicly available OSCI SystemC simulator. The framework is based on the communication interfaces proposed in the recent OSCI TLM 2 standard. Our experimental results show the reduced synchronization overhead and improved simulation performance.
Parallel Monte Carlo simulation of multilattice thin film growth
NASA Astrophysics Data System (ADS)
Shu, J. W.; Lu, Qin; Wong, Wai-on; Huang, Han-chen
2001-07-01
This paper describe a new parallel algorithm for the multi-lattice Monte Carlo atomistic simulator for thin film deposition (ADEPT), implemented on parallel computer using the PVM (Parallel Virtual Machine) message passing library. This parallel algorithm is based on domain decomposition with overlapping and asynchronous communication. Multiple lattices are represented by a single reference lattice through one-to-one mappings, with resulting computational demands being comparable to those in the single-lattice Monte Carlo model. Asynchronous communication and domain overlapping techniques are used to reduce the waiting time and communication time among parallel processors. Results show that the algorithm is highly efficient with large number of processors. The algorithm was implemented on a parallel machine with 50 processors, and it is suitable for parallel Monte Carlo simulation of thin film growth with either a distributed memory parallel computer or a shared memory machine with message passing libraries. In this paper, the significant communication time in parallel MC simulation of thin film growth is effectively reduced by adopting domain decomposition with overlapping between sub-domains and asynchronous communication among processors. The overhead of communication does not increase evidently and speedup shows an ascending tendency when the number of processor increases. A near linear increase in computing speed was achieved with number of processors increases and there is no theoretical limit on the number of processors to be used. The techniques developed in this work are also suitable for the implementation of the Monte Carlo code on other parallel systems.
A compositional reservoir simulator on distributed memory parallel computers
Rame, M.; Delshad, M.
1995-12-31
This paper presents the application of distributed memory parallel computes to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/960 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes the porting to new parallel platforms straight forward. Results of the distributed memory computing performance of Parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for same problems on a vector supercomputer is also presented.
Parallel discrete-event simulation of FCFS stochastic queueing networks
NASA Technical Reports Server (NTRS)
Nicol, David M.
1988-01-01
Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments), which has proven itself to be effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction on a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations, give performance data that demonstrates the method's effectiveness under moderate to heavy loads, and discuss performance tradeoffs between the quality of lookahead, and the cost of computing lookahead.
HPC Infrastructure for Solid Earth Simulation on Parallel Computers
NASA Astrophysics Data System (ADS)
Nakajima, K.; Chen, L.; Okuda, H.
2004-12-01
Recently, various types of parallel computers with various types of architectures and processing elements (PE) have emerged, which include PC clusters and the Earth Simulator. Moreover, users can easily access to these computer resources through network on Grid environment. It is well-known that thorough tuning is required for programmers to achieve excellent performance on each computer. The method for tuning strongly depends on the type of PE and architecture. Optimization by tuning is a very tough work, especially for developers of applications. Moreover, parallel programming using message passing library such as MPI is another big task for application programmers. In GeoFEM project (http://gefeom.tokyo.rist.or.jp), authors have developed a parallel FEM platform for solid earth simulation on the Earth Simulator, which supports parallel I/O, parallel linear solvers and parallel visualization. This platform can efficiently hide complicated procedures for parallel programming and optimization on vector processors from application programmers. This type of infrastructure is very useful. Source codes developed on PC with single processor is easily optimized on massively parallel computer by linking the source code to the parallel platform installed on the target computer. This parallel platform, called HPC Infrastructure will provide dramatic efficiency, portability and reliability in development of scientific simulation codes. For example, line number of the source codes is expected to be less than 10,000 and porting legacy codes to parallel computer takes 2 or 3 weeks. Original GeoFEM platform supports only I/O, linear solvers and visualization. In the present work, further development for adaptive mesh refinement (AMR) and dynamic load-balancing (DLB) have been carried out. In this presentation, examples of large-scale solid earth simulation using the Earth Simulator will be demonstrated. Moreover, recent results of a parallel computational steering tool using an
Xyce Parallel Electronic Simulator : users' guide, version 4.1.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-02-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical
Xyce parallel electronic simulator : users' guide. Version 5.1.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-11-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical
Parallel processor simulator for multiple optic channel architectures
NASA Astrophysics Data System (ADS)
Wailes, Tom S.; Meyer, David G.
1992-12-01
A parallel processing architecture based on multiple channel optical communication is described and compared with existing interconnection strategies for parallel computers. The proposed multiple channel architecture (MCA) uses MQW-DBR lasers to provide a large number of independent, selectable channels (or virtual buses) for data transport. Arbitrary interconnection patterns as well as machine partitions can be emulated via appropriate channel assignments. Hierarchies of parallel architectures and simultaneous execution of parallel tasks are also possible. Described are a basic overview of the proposed architecture, various channel allocation strategies that can be utilized by the MCA, and a summary of advantages of the MCA compared with traditional interconnection techniques. Also describes is a comprehensive multiple processor simulator that has been developed to execute parallel algorithms using the MCA as a data transport mechanism between processors and memory units. Simulation results -- including average channel load, effective channel utilization, and average network latency for different algorithms and different transmission speeds -- are also presented.
A conservative approach to parallelizing the Sharks World simulation
NASA Technical Reports Server (NTRS)
Nicol, David M.; Riffe, Scott E.
1990-01-01
Parallelizing a benchmark problem for parallel simulation, the Sharks World, is described. The described solution is conservative, in the sense that no state information is saved, and no 'rollbacks' occur. The used approach illustrates both the principal advantage and principal disadvantage of conservative parallel simulation. The advantage is that by exploiting lookahead an approach was found that dramatically improves the serial execution time, and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.
Iterative Schemes for Time Parallelization with Application to Reservoir Simulation
Garrido, I; Fladmark, G E; Espedal, M S; Lee, B
2005-04-18
Parallel methods are usually not applied to the time domain because of the inherit sequentialness of time evolution. But for many evolutionary problems, computer simulation can benefit substantially from time parallelization methods. In this paper, they present several such algorithms that actually exploit the sequential nature of time evolution through a predictor-corrector procedure. This sequentialness ensures convergence of a parallel predictor-corrector scheme within a fixed number of iterations. The performance of these novel algorithms, which are derived from the classical alternating Schwarz method, are illustrated through several numerical examples using the reservoir simulator Athena.
Running Parallel Discrete Event Simulators on Sierra
Barnes, P. D.; Jefferson, D. R.
2015-12-03
In this proposal we consider porting the ROSS/Charm++ simulator and the discrete event models that run under its control so that they run on the Sierra architecture and make efficient use of the Volta GPUs.
Parallel Signal Processing and System Simulation using aCe
NASA Technical Reports Server (NTRS)
Dorband, John E.; Aburdene, Maurice F.
2003-01-01
Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C based parallel language (ace C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of ace C and present a signal processing application (FFT).
A CUDA based parallel multi-phase oil reservoir simulator
NASA Astrophysics Data System (ADS)
Zaza, Ayham; Awotunde, Abeeb A.; Fairag, Faisal A.; Al-Mouhamed, Mayez A.
2016-09-01
Forward Reservoir Simulation (FRS) is a challenging process that models fluid flow and mass transfer in porous media to draw conclusions about the behavior of certain flow variables and well responses. Besides the operational cost associated with matrix assembly, FRS repeatedly solves huge and computationally expensive sparse, ill-conditioned and unsymmetrical linear system. Moreover, as the computation for practical reservoir dimensions lasts for long times, speeding up the process by taking advantage of parallel platforms is indispensable. By considering the state of art advances in massively parallel computing and the accompanying parallel architecture, this work aims primarily at developing a CUDA-based parallel simulator for oil reservoir. In addition to the initial reported 33 times speed gain compared to the serial version, running experiments showed that BiCGSTAB is a stable and fast solver which could be incorporated in such simulations instead of the more expensive, storage demanding and usually utilized GMRES.
Parallel-in-time molecular-dynamics simulations.
Baffico, L; Bernard, S; Maday, Y; Turinici, G; Zérah, G
2002-11-01
While there have been many progress in the field of multiscale simulations in the space domain, in particular, due to efficient parallelization techniques, much less is known in the way to perform similar approaches in the time domain. In this paper we show on two examples that, provided we can describe in a rough but still accurate way the system under consideration, it is indeed possible to parallelize molecular dynamics simulations in time by using the recently introduced pararealalgorithm. The technique is most useful for ab initio simulations. PMID:12513644
Parallel-in-time molecular-dynamics simulations
NASA Astrophysics Data System (ADS)
Baffico, L.; Bernard, S.; Maday, Y.; Turinici, G.; Zérah, G.
2002-11-01
While there have been many progress in the field of multiscale simulations in the space domain, in particular, due to efficient parallelization techniques, much less is known in the way to perform similar approaches in the time domain. In this paper we show on two examples that, provided we can describe in a rough but still accurate way the system under consideration, it is indeed possible to parallelize molecular dynamics simulations in time by using the recently introduced pararealalgorithm. The technique is most useful for ab initio simulations.
Parallel finite element simulation of large ram-air parachutes
NASA Astrophysics Data System (ADS)
Kalro, V.; Aliabadi, S.; Garrard, W.; Tezduyar, T.; Mittal, S.; Stein, K.
1997-06-01
In the near future, large ram-air parachutes are expected to provide the capability of delivering 21 ton payloads from altitudes as high as 25,000 ft. In development and test and evaluation of these parachutes the size of the parachute needed and the deployment stages involved make high-performance computing (HPC) simulations a desirable alternative to costly airdrop tests. Although computational simulations based on realistic, 3D, time-dependent models will continue to be a major computational challenge, advanced finite element simulation techniques recently developed for this purpose and the execution of these techniques on HPC platforms are significant steps in the direction to meet this challenge. In this paper, two approaches for analysis of the inflation and gliding of ram-air parachutes are presented. In one of the approaches the point mass flight mechanics equations are solved with the time-varying drag and lift areas obtained from empirical data. This approach is limited to parachutes with similar configurations to those for which data are available. The other approach is 3D finite element computations based on the Navier-Stokes equations governing the airflow around the parachute canopy and Newtons law of motion governing the 3D dynamics of the canopy, with the forces acting on the canopy calculated from the simulated flow field. At the earlier stages of canopy inflation the parachute is modelled as an expanding box, whereas at the later stages, as it expands, the box transforms to a parafoil and glides. These finite element computations are carried out on the massively parallel supercomputers CRAY T3D and Thinking Machines CM-5, typically with millions of coupled, non-linear finite element equations solved simultaneously at every time step or pseudo-time step of the simulation.
Direct Simulation Monte Carlo (DSMC) on the Connection Machine
Wong, B.C.; Long, L.N. )
1992-01-01
The massively parallel computer Connection Machine is utilized to map an improved version of the direct simulation Monte Carlo (DSMC) method for solving flows with the Boltzmann equation. The kinetic theory is required for analyzing hypersonic aerospace applications, and the features and capabilities of the DSMC particle-simulation technique are discussed. The DSMC is shown to be inherently massively parallel and data parallel, and the algorithm is based on molecule movements, cross-referencing their locations, locating collisions within cells, and sampling macroscopic quantities in each cell. The serial DSMC code is compared to the present parallel DSMC code, and timing results show that the speedup of the parallel version is approximately linear. The correct physics can be resolved from the results of the complete DSMC method implemented on the connection machine using the data-parallel approach. 41 refs.
Xyce Parallel Electronic Simulator - User's Guide, Version 1.0
HUTCHINSON, SCOTT A; KEITER, ERIC R.; HOEKSTRA, ROBERT J.; WATERS, LON J.; RUSSO, THOMAS V.; RANKIN, ERIC LAMONT; WIX, STEVEN D.
2002-11-01
This manual describes the use of the Xyce Parallel Electronic Simulator code for simulating electrical circuits at a variety of abstraction levels. The Xyce Parallel Electronic Simulator has been written to support,in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on improving the capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (4) Object-oriented code design and implementation using modern coding-practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows. Another feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce Parallel Electronic Simulator is designed to support a variety of device model inputs. These input formats include standard analytical models, behavioral models
Asynchronous parallel hybrid optimization combining DIRECT and GSS.
Griffin, Joshua D.; Kolda, Tamara Gibson
2008-10-01
In this talk, we explore the benefits of hybrid optimization using parallel versions of DIRECT and asynchronous generating set search (GSS) for optimization. Both of these methods are derivative-free, making them useful for a variety of science and engineering problems. Our goal is to ideally find a global minimum, but practically to find a good local minimum in a small amount of time. DIRECT is a global search method that systematically divides the search space into ever-smaller rectangles, and GSS is a local search method. The combination of these method guarantees a good local minimum but is better than a purely local approach because it finds more global solutions. We compare the performance of hybrid and non-hybrid methods on a suite of standard global optimization test problems. Overall, the hybrid methods are more robust than the non-hybrid methods at the cost of more function evaluations. In terms of wall-clock time on a parallel system, the hybrid methods are actually less expensive due to nearly perfect parallel load balance and scaling.
Applying Parallel Processing Techniques to Tether Dynamics Simulation
NASA Technical Reports Server (NTRS)
Wells, B. Earl
1996-01-01
The focus of this research has been to determine the effectiveness of applying parallel processing techniques to a sizable real-world problem, the simulation of the dynamics associated with a tether which connects two objects in low earth orbit, and to explore the degree to which the parallelization process can be automated through the creation of new software tools. The goal has been to utilize this specific application problem as a base to develop more generally applicable techniques.
Blocksome, Michael A; Mamidala, Amith R
2014-02-11
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Blocksome, Michael A.; Mamidala, Amith R.
2013-09-03
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Efficient parallel simulation of CO2 geologic sequestration insaline aquifers
Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten
2007-01-01
An efficient parallel simulator for large-scale, long-termCO2 geologic sequestration in saline aquifers has been developed. Theparallel simulator is a three-dimensional, fully implicit model thatsolves large, sparse linear systems arising from discretization of thepartial differential equations for mass and energy balance in porous andfractured media. The simulator is based on the ECO2N module of the TOUGH2code and inherits all the process capabilities of the single-CPU TOUGH2code, including a comprehensive description of the thermodynamics andthermophysical properties of H2O-NaCl- CO2 mixtures, modeling singleand/or two-phase isothermal or non-isothermal flow processes, two-phasemixtures, fluid phases appearing or disappearing, as well as saltprecipitation or dissolution. The new parallel simulator uses MPI forparallel implementation, the METIS software package for simulation domainpartitioning, and the iterative parallel linear solver package Aztec forsolving linear equations by multiple processors. In addition, theparallel simulator has been implemented with an efficient communicationscheme. Test examples show that a linear or super-linear speedup can beobtained on Linux clusters as well as on supercomputers. Because of thesignificant improvement in both simulation time and memory requirement,the new simulator provides a powerful tool for tackling larger scale andmore complex problems than can be solved by single-CPU codes. Ahigh-resolution simulation example is presented that models buoyantconvection, induced by a small increase in brine density caused bydissolution of CO2.
Parallelizing N-Body Simulations on a Heterogeneous Cluster
NASA Astrophysics Data System (ADS)
Stenborg, T. N.
2009-10-01
This thesis evaluates quantitatively the effectiveness of a new technique for parallelising direct gravitational N-body simulations on a heterogeneous computing cluster. In addition to being an investigation into how a specific computational physics task can be optimally load balanced across the heterogeneity factors of a distributed computing cluster, it is also, more generally, a case study in effective heterogeneous parallelisation of an all-pairs programming task. If high-performance computing clusters are not designed to be heterogeneous initially, they tend to become so over time as new nodes are added, or existing nodes are replaced or upgraded. As a result, effective techniques for application parallelisation on heterogeneous clusters are needed if maximum cluster utilisation is to be achieved and is an active area of research. A custom C/MPI parallel particle-particle N-body simulator was developed, validated and deployed for this evaluation. Simulation communication proceeds over cluster nodes arranged in a logical ring and employs nonblocking message passing to encourage overlap of communication with computation. Redundant calculations arising from force symmetry given by Newton's third law are removed by combining chordal data transfer of accumulated forces with ring passing data transfer. Heterogeneity in node computation speed is addressed by decomposing system data across nodes in proportion to node computation speed, in conjunction with use of evenly sized communication buffers. This scheme is shown experimentally to have some potential in improving simulation performance in comparison with an even decomposition of data across nodes. Techniques for further heterogeneous cluster load balancing are discussed and remain an opportunity for further work.
A tool for simulating parallel branch-and-bound methods
NASA Astrophysics Data System (ADS)
Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail
2016-01-01
The Branch-and-Bound method is known as one of the most powerful but very resource consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in parallel B&B method is the need for dynamic load redistribution. Therefore design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating parallel Branchand-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, the characteristics of the supercomputer's interconnect thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
Xyce Parallel Electronic Simulator : users' guide, version 2.0.
Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.
2004-06-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability the current state-of-the-art in the following areas: {sm_bullet} Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. {sm_bullet} Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. {sm_bullet} Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. {sm_bullet} A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). {sm_bullet} Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing of computing platforms. These include serial, shared-memory and distributed-memory parallel implementation - which allows it to run efficiently on the widest possible number parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce
A hybrid parallel framework for the cellular Potts model simulations
Jiang, Yi; He, Kejing; Dong, Shoubin
2009-01-01
The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which can't be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming POE solving, cell division, and cell reaction operation are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the POE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied the avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation ({approx}10{sup 8} sites) of complex collective behavior of numerous cells ({approx}10{sup 6}).
Direct drive digital servo press with high parallel control
NASA Astrophysics Data System (ADS)
Murata, Chikara; Yabe, Jun; Endou, Junichi; Hasegawa, Kiyoshi
2013-12-01
Direct drive digital servo press has been developed as the university-industry joint research and development since 1998. On the basis of this result, 4-axes direct drive digital servo press has been developed and in the market on April of 2002. This servo press is composed of 1 slide supported by 4 ball screws and each axis has linearscale measuring the position of each axis with high accuracy less than μm order level. Each axis is controlled independently by servo motor and feedback system. This system can keep high level parallelism and high accuracy even with high eccentric load. Furthermore the 'full stroke full power' is obtained by using ball screws. Using these features, new various types of press forming and stamping have been obtained by development and production. The new stamping and forming methods are introduced and 'manufacturing' need strategy of press forming with high added value and also the future direction of press forming are also introduced.
On the hierarchical parallelization of ab initio simulations
NASA Astrophysics Data System (ADS)
Ruiz-Barragan, Sergi; Ishimura, Kazuya; Shiga, Motoyuki
2016-02-01
A hierarchical parallelization has been implemented in a new unified code PIMD-SMASH for ab initio simulation where the replicas and the Born-Oppenheimer forces are parallelized. It is demonstrated that ab initio path integral molecular dynamics simulations can be carried out very efficiently for systems up to a few tens of water molecules. The code was then used to study a Diels-Alder reaction of cyclopentadiene and butenone by ab initio string method. A reduction in the reaction energy barrier is found in the presence of hydrogen-bonded water, in accordance with experiment.
Parallel runway requirement analysis study. Volume 2: Simulation manual
NASA Technical Reports Server (NTRS)
Ebrahimi, Yaghoob S.; Chun, Ken S.
1993-01-01
This document is a user manual for operating the PLAND_BLUNDER (PLB) simulation program. This simulation is based on two aircraft approaching parallel runways independently and using parallel Instrument Landing System (ILS) equipment during Instrument Meteorological Conditions (IMC). If an aircraft should deviate from its assigned localizer course toward the opposite runway, this constitutes a blunder which could endanger the aircraft on the adjacent path. The worst case scenario would be if the blundering aircraft were unable to recover and continue toward the adjacent runway. PLAND_BLUNDER is a Monte Carlo-type simulation which employs the events and aircraft positioning during such a blunder situation. The model simulates two aircraft performing parallel ILS approaches using Instrument Flight Rules (IFR) or visual procedures. PLB uses a simple movement model and control law in three dimensions (X, Y, Z). The parameters of the simulation inputs and outputs are defined in this document along with a sample of the statistical analysis. This document is the second volume of a two volume set. Volume 1 is a description of the application of the PLB to the analysis of close parallel runway operations.
Efficient parallel CFD-DEM simulations using OpenMP
NASA Astrophysics Data System (ADS)
Amritkar, Amit; Deb, Surya; Tafti, Danesh
2014-01-01
The paper describes parallelization strategies for the Discrete Element Method (DEM) used for simulating dense particulate systems coupled to Computational Fluid Dynamics (CFD). While the field equations of CFD are best parallelized by spatial domain decomposition techniques, the N-body particulate phase is best parallelized over the number of particles. When the two are coupled together, both modes are needed for efficient parallelization. It is shown that under these requirements, OpenMP thread based parallelization has advantages over MPI processes. Two representative examples, fairly typical of dense fluid-particulate systems are investigated, including the validation of the DEM-CFD and thermal-DEM implementation with experiments. Fluidized bed calculations are performed on beds with uniform particle loading, parallelized with MPI and OpenMP. It is shown that as the number of processing cores and the number of particles increase, the communication overhead of building ghost particle lists at processor boundaries dominates time to solution, and OpenMP which does not require this step is about twice as fast as MPI. In rotary kiln heat transfer calculations, which are characterized by spatially non-uniform particle distributions, the low overhead of switching the parallelization mode in OpenMP eliminates the load imbalances, but introduces increased overheads in fetching non-local data. In spite of this, it is shown that OpenMP is between 50-90% faster than MPI.
Parallel PDE-Based Simulations Using the Common Component Architecture
McInnes, Lois C.; Allan, Benjamin A.; Armstrong, Robert; Benson, Steven J.; Bernholdt, David E.; Dahlgren, Tamara L.; Diachin, Lori; Krishnan, Manoj Kumar; Kohl, James A.; Larson, J. Walter; Lefantzi, Sophia; Nieplocha, Jarek; Norris, Boyana; Parker, Steven G.; Ray, Jaideep; Zhou, Shujia
2006-03-05
Summary. The complexity of parallel PDE-based simulations continues to increase as multimodel, multiphysics, and multi-institutional projects become widespread. A goal of componentbased software engineering in such large-scale simulations is to help manage this complexity by enabling better interoperability among various codes that have been independently developed by different groups. The Common Component Architecture (CCA) Forum is defining a component architecture specification to address the challenges of high-performance scientific computing. In addition, several execution frameworks, supporting infrastructure, and generalpurpose components are being developed. Furthermore, this group is collaborating with others in the high-performance computing community to design suites of domain-specific component interface specifications and underlying implementations. This chapter discusses recent work on leveraging these CCA efforts in parallel PDE-based simulations involving accelerator design, climate modeling, combustion, and accidental fires and explosions. We explain how component technology helps to address the different challenges posed by each of these applications, and we highlight how component interfaces built on existing parallel toolkits facilitate the reuse of software for parallel mesh manipulation, discretization, linear algebra, integration, optimization, and parallel data redistribution. We also present performance data to demonstrate the suitability of this approach, and we discuss strategies for applying component technologies to both new and existing applications.
Xyce parallel electronic simulator reference guide, version 6.0.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2013-08-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1] .
Xyce Parallel Electronic Simulator : reference guide, version 2.0.
Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.
2004-06-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.
Xyce parallel electronic simulator reference guide, version 6.1
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-03-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1] .
Xyce Parallel Electronic Simulator : reference guide, version 4.1.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-02-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.
Xyce™ Parallel Electronic Simulator: Reference Guide, Version 5.1
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Santarelli, Keith R.; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S.; Pawlowski, Roger P.
2009-11-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users’ Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users’ Guide.
Smoldyn on graphics processing units: massively parallel Brownian dynamics simulations.
Dematté, Lorenzo
2012-01-01
Space is a very important aspect in the simulation of biochemical systems; recently, the need for simulation algorithms able to cope with space is becoming more and more compelling. Complex and detailed models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transportation phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation could be time consuming, especially if we want to capture the systems behavior in a reliable way using stochastic methods in conjunction with a high spatial resolution. In order to deliver the promise done by systems biology to be able to understand a system as whole, we need to scale up the size of models we are able to simulate, moving from sequential to parallel simulation algorithms. In this paper, we analyze Smoldyn, a widely diffused algorithm for stochastic simulation of chemical reactions with spatial resolution and single molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of Graphics Processing Units (GPUs). The implementation executes the most computational demanding steps (computation of diffusion, unimolecular, and bimolecular reaction, as well as the most common cases of molecule-surface interaction) on the GPU, computing them in parallel on each molecule of the system. The implementation offers good speed-ups and real time, high quality graphics output
The parallel subdomain-levelset deflation method in reservoir simulation
NASA Astrophysics Data System (ADS)
van der Linden, J. H.; Jönsthövel, T. B.; Lukyanov, A. A.; Vuik, C.
2016-01-01
Extreme and isolated eigenvalues are known to be harmful to the convergence of an iterative solver. These eigenvalues can be produced by strong heterogeneity in the underlying physics. We can improve the quality of the spectrum by 'deflating' the harmful eigenvalues. In this work, deflation is applied to linear systems in reservoir simulation. In particular, large, sudden differences in the permeability produce extreme eigenvalues. The number and magnitude of these eigenvalues is linked to the number and magnitude of the permeability jumps. Two deflation methods are discussed. Firstly, we state that harmonic Ritz eigenvector deflation, which computes the deflation vectors from the information produced by the linear solver, is unfeasible in modern reservoir simulation due to high costs and lack of parallelism. Secondly, we test a physics-based subdomain-levelset deflation algorithm that constructs the deflation vectors a priori. Numerical experiments show that both methods can improve the performance of the linear solver. We highlight the fact that subdomain-levelset deflation is particularly suitable for a parallel implementation. For cases with well-defined permeability jumps of a factor 104 or higher, parallel physics-based deflation has potential in commercial applications. In particular, the good scalability of parallel subdomain-levelset deflation combined with the robust parallel preconditioner for deflated system suggests the use of this method as an alternative for AMG.
A space time-ensemble parallel nudged elastic band algorithm for molecular kinetics simulation
NASA Astrophysics Data System (ADS)
Nakano, Aiichiro
2008-02-01
A scalable parallel algorithm has been designed to study long-time dynamics of many-atom systems based on the nudged elastic band method, which performs mutually constrained molecular dynamics simulations for a sequence of atomic configurations (or states) to obtain a minimum energy path between initial and final local minimum-energy states. A directionally heated nudged elastic band method is introduced to search for thermally activated events without the knowledge of final states, which is then applied to an ensemble of bands in a path ensemble method for long-time simulation in the framework of the transition state theory. The resulting molecular kinetics (MK) simulation method is parallelized with a space-time-ensemble parallel nudged elastic band (STEP-NEB) algorithm, which employs spatial decomposition within each state, while temporal parallelism across the states within each band and band-ensemble parallelism are implemented using a hierarchy of communicator constructs in the Message Passing Interface library. The STEP-NEB algorithm exhibits good scalability with respect to spatial, temporal and ensemble decompositions on massively parallel computers. The MK simulation method is used to study low strain-rate deformation of amorphous silica.
Computer Science Techniques Applied to Parallel Atomistic Simulation
NASA Astrophysics Data System (ADS)
Nakano, Aiichiro
1998-03-01
Recent developments in parallel processing technology and multiresolution numerical algorithms have established large-scale molecular dynamics (MD) simulations as a new research mode for studying materials phenomena such as fracture. However, this requires large system sizes and long simulated times. We have developed: i) Space-time multiresolution schemes; ii) fuzzy-clustering approach to hierarchical dynamics; iii) wavelet-based adaptive curvilinear-coordinate load balancing; iv) multilevel preconditioned conjugate gradient method; and v) spacefilling-curve-based data compression for parallel I/O. Using these techniques, million-atom parallel MD simulations are performed for the oxidation dynamics of nanocrystalline Al. The simulations take into account the effect of dynamic charge transfer between Al and O using the electronegativity equalization scheme. The resulting long-range Coulomb interaction is calculated efficiently with the fast multipole method. Results for temperature and charge distributions, residual stresses, bond lengths and bond angles, and diffusivities of Al and O will be presented. The oxidation of nanocrystalline Al is elucidated through immersive visualization in virtual environments. A unique dual-degree education program at Louisiana State University will also be discussed in which students can obtain a Ph.D. in Physics & Astronomy and a M.S. from the Department of Computer Science in five years. This program fosters interdisciplinary research activities for interfacing High Performance Computing and Communications with large-scale atomistic simulations of advanced materials. This work was supported by NSF (CAREER Program), ARO, PRF, and Louisiana LEQSF.
Scalable parallel solution coupling for multi-physics reactor simulation.
Tautges, T. J.; Caceres, A.; Mathematics and Computer Science
2009-01-01
Reactor simulation depends on the coupled solution of various physics types, including neutronics, thermal/hydraulics, and structural mechanics. This paper describes the formulation and implementation of a parallel solution coupling capability being developed for reactor simulation. The coupling process consists of mesh and coupler initialization, point location, field interpolation, and field normalization. We report here our test of this capability on an example problem, namely, a reflector assembly from an advanced burner test reactor. Performance of this coupler in parallel is reasonable for the chosen problem size and range of processor counts. The runtime is dominated by startup costs, which amortize over the entire coupled simulation. Future efforts will include adding more sophisticated interpolation and normalization methods, to accommodate different numerical solvers used in various physics modules and to obtain better conservation properties for certain field types.
Reusable Component Model Development Approach for Parallel and Distributed Simulation
Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng
2014-01-01
Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have diversiform interfaces, couple tightly, and bind with simulation platforms closely. As a result, they are difficult to be reused across different simulation platforms and applications. To address the problem, this paper first proposed a reusable component model framework. Based on this framework, then our reusable model development approach is elaborated, which contains two phases: (1) domain experts create simulation computational modules observing three principles to achieve their independence; (2) model developer encapsulates these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that the model developed using our approach has good reusability and it is easy to be used in different simulation platforms and applications. PMID:24729751
Potts-model grain growth simulations: Parallel algorithms and applications
Wright, S.A.; Plimpton, S.J.; Swiler, T.P.
1997-08-01
Microstructural morphology and grain boundary properties often control the service properties of engineered materials. This report uses the Potts-model to simulate the development of microstructures in realistic materials. Three areas of microstructural morphology simulations were studied. They include the development of massively parallel algorithms for Potts-model grain grow simulations, modeling of mass transport via diffusion in these simulated microstructures, and the development of a gradient-dependent Hamiltonian to simulate columnar grain growth. Potts grain growth models for massively parallel supercomputers were developed for the conventional Potts-model in both two and three dimensions. Simulations using these parallel codes showed self similar grain growth and no finite size effects for previously unapproachable large scale problems. In addition, new enhancements to the conventional Metropolis algorithm used in the Potts-model were developed to accelerate the calculations. These techniques enable both the sequential and parallel algorithms to run faster and use essentially an infinite number of grain orientation values to avoid non-physical grain coalescence events. Mass transport phenomena in polycrystalline materials were studied in two dimensions using numerical diffusion techniques on microstructures generated using the Potts-model. The results of the mass transport modeling showed excellent quantitative agreement with one dimensional diffusion problems, however the results also suggest that transient multi-dimension diffusion effects cannot be parameterized as the product of the grain boundary diffusion coefficient and the grain boundary width. Instead, both properties are required. Gradient-dependent grain growth mechanisms were included in the Potts-model by adding an extra term to the Hamiltonian. Under normal grain growth, the primary driving term is the curvature of the grain boundary, which is included in the standard Potts-model Hamiltonian.
Xyce Parallel Electronic Simulator Users Guide Version 6.2.
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-09-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been de- signed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel com- puting platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright c 2002-2014 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright c 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are
Model for the evolution of the time profile in optimistic parallel discrete event simulations
NASA Astrophysics Data System (ADS)
Ziganurova, L.; Novotny, M. A.; Shchur, L. N.
2016-02-01
We investigate synchronisation aspects of an optimistic algorithm for parallel discrete event simulations (PDES). We present a model for the time evolution in optimistic PDES. This model evaluates the local virtual time profile of the processing elements. We argue that the evolution of the time profile is reminiscent of the surface profile in the directed percolation problem and in unrestricted surface growth. We present results of the simulation of the model and emphasise predictive features of our approach.
Multi-directional search: A direct search algorithm for parallel machines
Torczon, V.J.
1989-01-01
In recent years there has been a great deal in the development of optimization algorithms which exploit the computational power of parallel computer architectures. The author has developed a new direct search algorithm, which he calls multi-directional search, that is ideally suited for parallel computation. His algorithm belongs to the class of direct search methods, a class of optimization algorithms which neither compute nor approximate any derivatives of the objective function. His work, in fact, was inspired by the simplex method of Spendley, Hext, and Himsworth, and the simplex method of Nelder and Mead. The multi-directional search algorithm is inherently parallel. The basic idea of the algorithm is to perform concurrent searches in multiple directions. These searches are free of any interdependencies, so the information required can be computed in parallel. A central result of his work is the convergence analysis for his algorithm. By requiring only that the function be continuously differentiable over a bounded level set, he can prove that a subsequence of the points generated by the multi-directional search algorithm converges to a stationary point of the objective function. This is of great interest since he knows of few convergence results for practical direct search algorithms. He also presents numerical results indicating that the multidirectional search algorithm is robust, even in the presence of noise. His results include comparisons with the Nelder-Mead simplex algorithm, the method of steepest descent, and a quasi-Newton method. One surprising conclusion of his numerical tests is that the Nelder-Mead simplex algorithm is not robust. He closes with some comments about future directions of research.
Numerical simulation of supersonic wake flow with parallel computers
Wong, C.C.; Soetrisno, M.
1995-07-01
Simulating a supersonic wake flow field behind a conical body is a computing intensive task. It requires a large number of computational cells to capture the dominant flow physics and a robust numerical algorithm to obtain a reliable solution. High performance parallel computers with unique distributed processing and data storage capability can provide this need. They have larger computational memory and faster computing time than conventional vector computers. We apply the PINCA Navier-Stokes code to simulate a wind-tunnel supersonic wake experiment on Intel Gamma, Intel Paragon, and IBM SP2 parallel computers. These simulations are performed to study the mean flow in the near wake region of a sharp, 7-degree half-angle, adiabatic cone at Mach number 4.3 and freestream Reynolds number of 40,600. Overall the numerical solutions capture the general features of the hypersonic laminar wake flow and compare favorably with the wind tunnel data. With a refined and clustering grid distribution in the recirculation zone, the calculated location of the rear stagnation point is consistent with the 2D axisymmetric and 3D experiments. In this study, we also demonstrate the importance of having a large local memory capacity within a computer node and the effective utilization of the number of computer nodes to achieve good parallel performance when simulating a complex, large-scale wake flow problem.
The cost of conservative synchronization in parallel discrete event simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1990-01-01
The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approach the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.
Casting pearls ballistically: Efficient massively parallel simulation of particle deposition
Lubachevsky, B.D.; Privman, V.; Roy, S.C.
1996-06-01
We simulate ballistic particle deposition wherein a large number of spherical particles are {open_quotes}cast{close_quotes} vertically over a planar horizontal surface. Upon first contact (with the surface or with a previously deposited particle) each particle stops. This model helps material scientists to study the adsorption and sediment formation. The model is sequential, with particles deposited one by one. We have found an equivalent formulation using a continuous time random process and we simulate the latter in parallel using a method similar to the one previously employed for simulating Ising spins. We augment the parallel algorithm for simulating Ising spins with several techniques aimed at the increase of efficiency of producing the particle configuration and statistics collection. Some of these techniques are similar to earlier ones. We implement the resulting algorithm on a 16K PE MasPar MP-1 and a 4K PE MasPar MP-2. The parallel code runs on MasPar computers nearly two orders of magnitude faster than an optimized sequential code runs on a fast workstation. 17 refs., 9 figs.
Non-intrusive parallelization of multibody system dynamic simulations
NASA Astrophysics Data System (ADS)
González, Francisco; Luaces, Alberto; Lugrís, Urbano; González, Manuel
2009-09-01
This paper evaluates two non-intrusive parallelization techniques for multibody system dynamics: parallel sparse linear equation solvers and OpenMP. Both techniques can be applied to existing simulation software with minimal changes in the code structure; this is a major advantage over Message Passing Interface, the standard parallelization method in multibody dynamics. Both techniques have been applied to parallelize a starting sequential implementation of a global index-3 augmented Lagrangian formulation combined with the trapezoidal rule as numerical integrator, in order to solve the forward dynamics of a variable-loop four-bar mechanism. Numerical experiments have been performed to measure the efficiency as a function of problem size and matrix filling. Results show that the best parallel solver (Pardiso) performs better than the best sequential solver (CHOLMOD) for multibody problems of large and medium sizes leading to matrix fillings above 10. OpenMP also proved to be advantageous even for problems of small sizes. Both techniques delivered speedups above 70% of the maximum theoretical values for a wide range of multibody problems.
Li, Qiyue; Qu, Xiaobo; Liu, Yunsong; Guo, Di; Lai, Zongying; Ye, Jing; Chen, Zhong
2015-06-01
Compressed sensing MRI (CS-MRI) is a promising technology to accelerate magnetic resonance imaging. Both improving the image quality and reducing the computation time are important for this technology. Recently, a patch-based directional wavelet (PBDW) has been applied in CS-MRI to improve edge reconstruction. However, this method is time consuming since it involves extensive computations, including geometric direction estimation and numerous iterations of wavelet transform. To accelerate computations of PBDW, we propose a general parallelization of patch-based processing by taking the advantage of multicore processors. Additionally, two pertinent optimizations, excluding smooth patches and pre-arranged insertion sort, that make use of sparsity in MR images are also proposed. Simulation results demonstrate that the acceleration factor with the parallel architecture of PBDW approaches the number of central processing unit cores, and that pertinent optimizations are also effective to make further accelerations. The proposed approaches allow compressed sensing MRI reconstruction to be accomplished within several seconds.
Parallel algorithms for simulating continuous time Markov chains
NASA Technical Reports Server (NTRS)
Nicol, David M.; Heidelberger, Philip
1992-01-01
We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
Synchronous parallel system for emulation and discrete event simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor)
1992-01-01
A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to state variables of the simulation object attributable to the event object, and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring the events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.
Xyce Parallel Electronic Simulator - Users' Guide Version 2.1.
Hutchinson, Scott A; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric; Pawlowski, Roger P.; Fixel, Deborah A; Schiek, Richard; Bogdan, Carolyn W.; Shirley, David N.; Campbell, Phillip M.; Keiter, Eric R.
2005-06-01
This manual describes the use of theXyceParallel Electronic Simulator.Xycehasbeen designed as a SPICE-compatible, high-performance analog circuit simulator, andhas been written to support the simulation needs of the Sandia National Laboratorieselectrical designers. This development has focused on improving capability over thecurrent state-of-the-art in the following areas:%04Capability to solve extremely large circuit problems by supporting large-scale par-allel computing platforms (up to thousands of processors). Note that this includessupport for most popular parallel and serial computers.%04Improved performance for all numerical kernels (e.g., time integrator, nonlinearand linear solvers) through state-of-the-art algorithms and novel techniques.%04Device models which are specifically tailored to meet Sandia's needs, includingmany radiation-aware devices.3 XyceTMUsers' Guide%04Object-oriented code design and implementation using modern coding practicesthat ensure that theXyceParallel Electronic Simulator will be maintainable andextensible far into the future.Xyceis a parallel code in the most general sense of the phrase - a message passingparallel implementation - which allows it to run efficiently on the widest possible numberof computing platforms. These include serial, shared-memory and distributed-memoryparallel as well as heterogeneous platforms. Careful attention has been paid to thespecific nature of circuit-simulation problems to ensure that optimal parallel efficiencyis achieved as the number of processors grows.The development ofXyceprovides a platform for computational research and de-velopment aimed specifically at the needs of the Laboratory. WithXyce, Sandia hasan %22in-house%22 capability with which both new electrical (e.g., device model develop-ment) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms)research and development can be performed. As a result,Xyceis a unique electricalsimulation capability, designed to
Dynamic Load Balancing Strategies for Parallel Reacting Flow Simulations
NASA Astrophysics Data System (ADS)
Pisciuneri, Patrick; Meneses, Esteban; Givi, Peyman
2014-11-01
Load balancing in parallel computing aims at distributing the work as evenly as possible among the processors. This is a critical issue in the performance of parallel, time accurate, flow simulators. The constraint of time accuracy requires that all processes must be finished with their calculation for a given time step before any process can begin calculation of the next time step. Thus, an irregularly balanced compute load will result in idle time for many processes for each iteration and thus increased walltimes for calculations. Two existing, dynamic load balancing approaches are applied to the simplified case of a partially stirred reactor for methane combustion. The first is Zoltan, a parallel partitioning, load balancing, and data management library developed at the Sandia National Laboratories. The second is Charm++, which is its own machine independent parallel programming system developed at the University of Illinois at Urbana-Champaign. The performance of these two approaches is compared, and the prospects for their application to full 3D, reacting flow solvers is assessed.
Massively Parallel Processing for Fast and Accurate Stamping Simulations
NASA Astrophysics Data System (ADS)
Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu
2005-08-01
The competitive automotive market drives automotive manufacturers to speed up the vehicle development cycles and reduce the lead-time. Fast tooling development is one of the key areas to support fast and short vehicle development programs (VDP). In the past ten years, the stamping simulation has become the most effective validation tool in predicting and resolving all potential formability and quality problems before the dies are physically made. The stamping simulation and formability analysis has become an critical business segment in GM math-based die engineering process. As the simulation becomes as one of the major production tools in engineering factory, the simulation speed and accuracy are the two of the most important measures for stamping simulation technology. The speed and time-in-system of forming analysis becomes an even more critical to support the fast VDP and tooling readiness. Since 1997, General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of simulation software for mass production analysis applications. By 2001, this technology was matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DM0P/MPP technology as well as performance benchmarks are discussed in this publication.
Evaluation of parallel direct sparse linear solvers in electromagnetic geophysical problems
NASA Astrophysics Data System (ADS)
Puzyrev, Vladimir; Koric, Seid; Wilkin, Scott
2016-04-01
High performance computing is absolutely necessary for large-scale geophysical simulations. In order to obtain a realistic image of a geologically complex area, industrial surveys collect vast amounts of data making the computational cost extremely high for the subsequent simulations. A major computational bottleneck of modeling and inversion algorithms is solving the large sparse systems of linear ill-conditioned equations in complex domains with multiple right hand sides. Recently, parallel direct solvers have been successfully applied to multi-source seismic and electromagnetic problems. These methods are robust and exhibit good performance, but often require large amounts of memory and have limited scalability. In this paper, we evaluate modern direct solvers on large-scale modeling examples that previously were considered unachievable with these methods. Performance and scalability tests utilizing up to 65,536 cores on the Blue Waters supercomputer clearly illustrate the robustness, efficiency and competitiveness of direct solvers compared to iterative techniques. Wide use of direct methods utilizing modern parallel architectures will allow modeling tools to accurately support multi-source surveys and 3D data acquisition geometries, thus promoting a more efficient use of the electromagnetic methods in geophysics.
Mapping a battlefield simulation onto message-passing parallel architectures
NASA Technical Reports Server (NTRS)
Nicol, David M.
1987-01-01
Perhaps the most critical problem in distributed simulation is that of mapping: without an effective mapping of workload to processors the speedup potential of parallel processing cannot be realized. Mapping a simulation onto a message-passing architecture is especially difficult when the computational workload dynamically changes as a function of time and space; this is exactly the situation faced by battlefield simulations. This paper studies an approach where the simulated battlefield domain is first partitioned into many regions of equal size; typically there are more regions than processors. The regions are then assigned to processors; a processor is responsible for performing all simulation activity associated with the regions. The assignment algorithm is quite simple and attempts to balance load by exploiting locality of workload intensity. The performance of this technique is studied on a simple battlefield simulation implemented on the Flex/32 multiprocessor. Measurements show that the proposed method achieves reasonable processor efficiencies. Furthermore, the method shows promise for use in dynamic remapping of the simulation.
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++
NASA Technical Reports Server (NTRS)
Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis
1994-01-01
Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
Repartitioning Strategies for Massively Parallel Simulation of Reacting Flow
NASA Astrophysics Data System (ADS)
Pisciuneri, Patrick; Zheng, Angen; Givi, Peyman; Labrinidis, Alexandros; Chrysanthis, Panos
2015-11-01
The majority of parallel CFD simulators partition the domain into equal regions and assign the calculations for a particular region to a unique processor. This type of domain decomposition is vital to the efficiency of the solver. However, as the simulation develops, the workload among the partitions often become uneven (e.g. by adaptive mesh refinement, or chemically reacting regions) and a new partition should be considered. The process of repartitioning adjusts the current partition to evenly distribute the load again. We compare two repartitioning tools: Zoltan, an architecture-agnostic graph repartitioner developed at the Sandia National Laboratories; and Paragon, an architecture-aware graph repartitioner developed at the University of Pittsburgh. The comparative assessment is conducted via simulation of the Taylor-Green vortex flow with chemical reaction.
Conservative parallel simulation of priority class queueing networks
NASA Technical Reports Server (NTRS)
Nicol, David M.
1990-01-01
A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. For, a job in service having low priority can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol was analyzed and it was demonstrated that good performance can be expected on the simulation of large queueing networks.
Xyce Parallel Electronic Simulator Users Guide Version 6.4
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory
2015-12-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been de- signed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel com- puting platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright c 2002-2015 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright c 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are
Development of magnetron sputtering simulator with GPU parallel computing
NASA Astrophysics Data System (ADS)
Sohn, Ilyoup; Kim, Jihun; Bae, Junkyeong; Lee, Jinpil
2014-12-01
Sputtering devices are widely used in the semiconductor and display panel manufacturing process. Currently, a number of surface treatment applications using magnetron sputtering techniques are being used to improve the efficiency of the sputtering process, through the installation of magnets outside the vacuum chamber. Within the internal space of the low pressure chamber, plasma generated from the combination of a rarefied gas and an electric field is influenced interactively. Since the quality of the sputtering and deposition rate on the substrate is strongly dependent on the multi-physical phenomena of the plasma regime, numerical simulations using PIC-MCC (Particle In Cell, Monte Carlo Collision) should be employed to develop an efficient sputtering device. In this paper, the development of a magnetron sputtering simulator based on the PIC-MCC method and the associated numerical techniques are discussed. To solve the electric field equations in the 2-D Cartesian domain, a Poisson equation solver based on the FDM (Finite Differencing Method) is developed and coupled with the Monte Carlo Collision method to simulate the motion of gas particles influenced by an electric field. The magnetic field created from the permanent magnet installed outside the vacuum chamber is also numerically calculated using Biot-Savart's Law. All numerical methods employed in the present PIC code are validated by comparison with analytical and well-known commercial engineering software results, with all of the results showing good agreement. Finally, the developed PIC-MCC code is parallelized to be suitable for general purpose computing on graphics processing unit (GPGPU) acceleration, so as to reduce the large computation time which is generally required for particle simulations. The efficiency and accuracy of the GPGPU parallelized magnetron sputtering simulator are examined by comparison with the calculated results and computation times from the original serial code. It is found that
Numerical Simulation of Flow Field Within Parallel Plate Plastometer
NASA Technical Reports Server (NTRS)
Antar, Basil N.
2002-01-01
Parallel Plate Plastometer (PPP) is a device commonly used for measuring the viscosity of high polymers at low rates of shear in the range 10(exp 4) to 10(exp 9) poises. This device is being validated for use in measuring the viscosity of liquid glasses at high temperatures having similar ranges for the viscosity values. PPP instrument consists of two similar parallel plates, both in the range of 1 inch in diameter with the upper plate being movable while the lower one is kept stationary. Load is applied to the upper plate by means of a beam connected to shaft attached to the upper plate. The viscosity of the fluid is deduced from measuring the variation of the plate separation, h, as a function of time when a specified fixed load is applied on the beam. Operating plate speeds measured with the PPP is usually in the range of 10.3 cm/s or lower. The flow field within the PPP can be simulated using the equations of motion of fluid flow for this configuration. With flow speeds in the range quoted above the flow field between the two plates is certainly incompressible and laminar. Such flows can be easily simulated using numerical modeling with computational fluid dynamics (CFD) codes. We present below the mathematical model used to simulate this flow field and also the solutions obtained for the flow using a commercially available finite element CFD code.
CHOLLA: A NEW MASSIVELY PARALLEL HYDRODYNAMICS CODE FOR ASTROPHYSICAL SIMULATION
Schneider, Evan E.; Robertson, Brant E.
2015-04-15
We present Computational Hydrodynamics On ParaLLel Architectures (Cholla ), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳256{sup 3}) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.
Simulation of hypervelocity impact on massively parallel supercomputer
Fang, H.E.
1994-12-31
Hypervelocity impact studies are important for debris shield and armor/anti-armor research and development. Numerical simulations are frequently performed to complement experimental studies, and to evaluate code accuracy. Parametric computational studies involving material properties, geometry and impact velocity can be used to understand hypervelocity impact processes. These impact simulations normally need to address shock wave physics phenomena, material deformation and failure, and motion of debris particles. Detailed, three-dimensional calculations of such events have large memory and processing time requirements. At Sandia National Laboratories, many impact problems of interest require tens of millions of computational cells. Furthermore, even the inadequately resolved problems often require tens or hundred of Cray CPU hours to complete. Recent numerical studies done by Grady and Kipp at Sandia using the Eulerian shock wave physics code CTH demonstrated very good agreement with many features of a copper sphere-on-steel plate oblique impact experiment, fully utilizing the compute power and memory of Sandia`s Cray supercomputer. To satisfy requirements for more finely resolved simulations in order to obtain a better understanding of the crater formation process and impact ejecta motion, the numerical work has been moved from the shared-memory Cray to a large, distributed-memory, massively parallel supercomputing system using PCTH, a parallel version of CTH. The current work is a continuation of the studies, but done on Sandia`s Intel 1840-processor Paragon X/PS parallel computer. With the great compute power and large memory provided by the Paragon, a highly detailed PCTH calculation has been completed for the copper sphere impacting steel plate experiment. Although the PCTH calculation used a mesh which is 4.5 times bigger than the original Cray setup, it finished in much less CPU time.
New Directions in Maintenance Simulation.
ERIC Educational Resources Information Center
Miller, Gary G.
A two-phase effort was conducted to design and evaluate a maintenance simulator which incorporated state-of-the-art information in simulation and instructional technology. The particular equipment selected to be simulated was the 6883 Convert/Flight Controls Test Station. Phase I included a generalized block diagram of the computer-trainer, the…
Plimpton, Steve; Thompson, Aidan; Crozier, Paul
LAMMPS (http://lammps.sandia.gov/index.html) stands for Large-scale Atomic/Molecular Massively Parallel Simulator and is a code that can be used to model atoms or, as the LAMMPS website says, as a parallel particle simulator at the atomic, meso, or continuum scale. This Sandia-based website provides a long list of animations from large simulations. These were created using different visualization packages to read LAMMPS output, and each one provides the name of the PI and a brief description of the work done or visualization package used. See also the static images produced from simulations at http://lammps.sandia.gov/pictures.html The foundation paper for LAMMPS is: S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995), but the website also lists other papers describing contributions to LAMMPS over the years.
Finding Low-Temperature States with Parallel Tempering, Simulated Annealing and Simple Monte Carlo
NASA Astrophysics Data System (ADS)
Moreno, J. J.; Katzgraber, Helmut G.; Hartmann, Alexander K.
Monte Carlo simulation techniques, like simulated annealing and parallel tempering, are often used to evaluate low-temperature properties and find ground states of disordered systems. Here we compare these methods using direct calculations of ground states for three-dimensional Ising diluted antiferromagnets in a field (DAFF) and three-dimensional Ising spin glasses (ISG). For the DAFF, we find that, with respect to obtaining ground states, parallel tempering is superior to simple Monte Carlo and to simulated annealing. However, equilibration becomes more difficult with increasing magnitude of the externally applied field. For the ISG with bimodal couplings, which exhibits a high degeneracy, we conclude that finding true ground states is easy for small systems, as is already known. But finding each of the degenerate ground states with the same probability (or frequency), as required by Boltzmann statistics, is considerably harder and becomes almost impossible for larger systems.
A Generic Scheduling Simulator for High Performance Parallel Computers
Yoo, B S; Choi, G S; Jette, M A
2001-08-01
It is well known that efficient job scheduling plays a crucial role in achieving high system utilization in large-scale high performance computing environments. A good scheduling algorithm should schedule jobs to achieve high system utilization while satisfying various user demands in an equitable fashion. Designing such a scheduling algorithm is a non-trivial task even in a static environment. In practice, the computing environment and workload are constantly changing. There are several reasons for this. First, the computing platforms constantly evolve as the technology advances. For example, the availability of relatively powerful commodity off-the-shelf (COTS) components at steadily diminishing prices have made it feasible to construct ever larger massively parallel computers in recent years [1, 4]. Second, the workload imposed on the system also changes constantly. The rapidly increasing compute resources have provided many applications developers with the opportunity to radically alter program characteristics and take advantage of these additional resources. New developments in software technology may also trigger changes in user applications. Finally, political climate change may alter user priorities or the mission of the organization. System designers in such dynamic environments must be able to accurately forecast the effect of changes in the hardware, software, and/or policies under consideration. If the environmental changes are significant, one must also reassess scheduling algorithms. Simulation has frequently been relied upon for this analysis, because other methods such as analytical modeling or actual measurements are usually too difficult or costly. A drawback of the simulation approach, however, is that developing a simulator is a time-consuming process. Furthermore, an existing simulator cannot be easily adapted to a new environment. In this research, we attempt to develop a generic job-scheduling simulator, which facilitates the evaluation of
Parallel 3D Multi-Stage Simulation of a Turbofan Engine
NASA Technical Reports Server (NTRS)
Turner, Mark G.; Topp, David A.
1998-01-01
A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. 20 processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scaleable with the number of blade rows. Enough flips are run (between 50 and 200) so the solution in the entire machine is not changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee the parallelization was done correctly. The domain decomposition is done only in the axial direction since the number of points axially is much larger than the other two directions. This code uses MPI for message passing. The parallel speed up of the solver portion (no 1/0 or body force
Massively Parallel Simulations of Diffusion in Dense Polymeric Structures
Faulon, Jean-Loup, Wilcox, R.T. , Hobbs, J.D. , Ford, D.M.
1997-11-01
An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases are studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon(1840 processors). Compared to the current state-of-the-art equilibration methods this new technique appears to be faster by some orders of magnitude.The main advantage of the technique is that one can circumvent the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in the center of a regular tetrahedron and that regular tetrahedrons can be used to mesh the three-dimensional space. Thus, the problem of polymer equilibration described by continuous equations in molecular dynamics is reduced to a discrete problem where solutions are approximated by simple algorithms. Practical modeling applications include the constructing of butyl rubber and ethylene-propylene-dimer-monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally on both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. In order to better understand the diffusion phenomena, 10, 000 atoms models were generated and equilibrated for butyl and EPDM. The models were submitted to a massively parallel molecular dynamics simulation to monitor the trajectories of the diffusing species.
Roadmap for efficient parallelization of breast anatomy simulation
NASA Astrophysics Data System (ADS)
Chui, Joseph H.; Pokrajac, David D.; Maidment, Andrew D. A.; Bakic, Predrag R.
2012-03-01
A roadmap has been proposed to optimize the simulation of breast anatomy by parallel implementation, in order to reduce the time needed to generate software breast phantoms. The rapid generation of high resolution phantoms is needed to support virtual clinical trials of breast imaging systems. We have recently developed an octree-based recursive partitioning algorithm for breast anatomy simulation. The algorithm has good asymptotic complexity; however, its current MATLAB implementation cannot provide optimal execution times. The proposed roadmap for efficient parallelization includes the following steps: (i) migrate the current code to a C/C++ platform and optimize it for single-threaded implementation; (ii) modify the code to allow for multi-threaded CPU implementation; (iii) identify and migrate the code to a platform designed for multithreaded GPU implementation. In this paper, we describe our results in optimizing the C/C++ code for single-threaded and multi-threaded CPU implementations. As the first step of the proposed roadmap we have identified a bottleneck component in the MATLAB implementation using MATLAB's profiling tool, and created a single threaded CPU implementation of the algorithm using C/C++'s overloaded operators and standard template library. The C/C++ implementation has been compared to the MATLAB version in terms of accuracy and simulation time. A 520-fold reduction of the execution time was observed in a test of phantoms with 50- 400 μm voxels. In addition, we have identified several places in the code which will be modified to allow for the next roadmap milestone of the multithreaded CPU implementation.
NASA Astrophysics Data System (ADS)
Shimizu, Futoshi; Kimizuka, Hajime; Kaburaki, Hideo
2002-08-01
A new parallel computing environment, called as ``Parallel Molecular Dynamics Stencil'', has been developed to carry out a large-scale short-range molecular dynamics simulation of solids. The stencil is written in C language using MPI for parallelization and designed successfully to separate and conceal parts of the programs describing cutoff schemes and parallel algorithms for data communication. This has been made possible by introducing the concept of image atoms. Therefore, only a sequential programming of the force calculation routine is required for executing the stencil in parallel environment. Typical molecular dynamics routines, such as various ensembles, time integration methods, and empirical potentials, have been implemented in the stencil. In the presentation, the performance of the stencil on parallel computers of Hitachi, IBM, SGI, and PC-cluster using the models of Lennard-Jones and the EAM type potentials for fracture problem will be reported.
NASA Astrophysics Data System (ADS)
Sharma, Anupam; Long, Lyle N.
2004-10-01
A particle approach using the Direct Simulation Monte Carlo (DSMC) method is used to solve the problem of blast impact with structures. A novel approach to model the solid boundary condition for particle methods is presented. The solver is validated against an analytical solution of the Riemann shocktube problem and against experiments on interaction of a planar shock with a square cavity. Blast impact simulations are performed for two model shapes, a box and an I-shaped beam, assuming that the solid body does not deform. The solver uses domain decomposition technique to run in parallel. The parallel performance of the solver on two Beowulf clusters is also presented.
Parallel grid library for rapid and flexible simulation development
NASA Astrophysics Data System (ADS)
Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.
2013-04-01
We present an easy to use and flexible grid library for developing highly scalable parallel simulations. The distributed cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously allowing one to overlap computation and communication. This enables excellent scalability at least up to 32 k cores in magnetohydrodynamic tests depending on the problem and hardware. In the version of dccrg presented here part of the mesh metadata is replicated between MPI processes reducing the scalability of adaptive mesh refinement (AMR) to between 200 and 600 processes. Dccrg is free software that anyone can use, study and modify and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg. Catalogue identifier: AEOM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: GNU Lesser General Public License version 3 No. of lines in distributed program, including test data, etc.: 54975 No. of bytes in distributed program, including test data, etc.: 974015 Distribution format: tar.gz Programming language: C++. Computer: PC, cluster, supercomputer. Operating system: POSIX. The code has been parallelized using MPI and tested with 1-32768 processes RAM: 10 MB-10 GB per process Classification: 4.12, 4.14, 6.5, 19.3, 19.10, 20. External routines: MPI-2 [1], boost [2], Zoltan [3], sfc++ [4] Nature of problem: Grid library supporting arbitrary data in grid cells, parallel adaptive mesh refinement, transparent remote neighbor data updates and
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor
NASA Technical Reports Server (NTRS)
Rao, Hariprasad Nannapaneni
1989-01-01
The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
A new parallel P3M code for very large-scale cosmological simulations
NASA Astrophysics Data System (ADS)
MacFarland, Tom; Couchman, H. M. P.; Pearce, F. R.; Pichlmeier, Jakob
1998-12-01
We have developed a parallel Particle-Particle, Particle-Mesh (P3M) simulation code for the Cray T3E parallel supercomputer that is well suited to studying the time evolution of systems of particles interacting via gravity and gas forces in cosmological contexts. The parallel code is based upon the public-domain serial Adaptive P3M-SPH (http://coho.astro.uwo.ca/pub/hydra/hydra.html) code of Couchman et al. (1995)[ApJ, 452, 797]. The algorithm resolves gravitational forces into a long-range component computed by discretizing the mass distribution and solving Poisson's equation on a grid using an FFT convolution method, and a short-range component computed by direct force summation for sufficiently close particle pairs. The code consists primarily of a particle-particle computation parallelized by domain decomposition over blocks of neighbour-cells, a more regular mesh calculation distributed in planes along one dimension, and several transformations between the two distributions. The load balancing of the P3M code is static, since this greatly aids the ongoing implementation of parallel adaptive refinements of the particle and mesh systems. Great care was taken throughout to make optimal use of the available memory, so that a version of the current implementation has been used to simulate systems of up to 109 particles with a 10243 mesh for the long-range force computation. These are the largest Cosmological N-body simulations of which we are aware. We discuss these memory optimizations as well as those motivated by computational performance. Performance results are very encouraging, and, even without refinements, the code has been used effectively for simulations in which the particle distribution becomes highly clustered as well as for other non-uniform systems of astrophysical interest.
Particle-in-cell simulation using parallel techniques
NASA Astrophysics Data System (ADS)
Hanzlikova, N.; Leggate, H.; Turner, M. M.
2011-10-01
Particle-in-cell simulation is an accurate but computationally expensive approach to modelling low-temperature plasma. Consequently, implementations of this method should preferably make efficient use of computer resources. In modern hardware, such resources typically include a high degree of parallelism, using facilities such as vectorisation and multi-threading. Capabilities of this kind appear in both general purpose processors and in more specialised hardware such as graphical processing units. In principle, very large improvements in performance can be achieved by exploiting such hardware. This paper discusses particle-in-cell implementation using features of this kind. We will show that accelerations in excess of an order of magnitude are quite easily achieved, and that considerably greater performance is likely to be achieved with specialized hardware.
Influence of equilibrium shear flow in the parallel magnetic direction on edge localized mode crash
NASA Astrophysics Data System (ADS)
Luo, Y.; Chen, S. Y.; Huang, J.; Xiong, Y. Y.; Tang, C. J.
2016-04-01
The influence of the parallel shear flow on the evolution of peeling-ballooning (P-B) modes is studied with the BOUT++ four-field code in this paper. The parallel shear flow has different effects in linear simulation and nonlinear simulation. In the linear simulations, the growth rate of edge localized mode (ELM) can be increased by Kelvin-Helmholtz term, which can be caused by the parallel shear flow. In the nonlinear simulations, the results accord with the linear simulations in the linear phase. However, the ELM size is reduced by the parallel shear flow in the beginning of the turbulence phase, which is recognized as the P-B filaments' structure. Then during the turbulence phase, the ELM size is decreased by the shear flow.
Parallel continuous simulated tempering and its applications in large-scale molecular simulations
Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng
2014-07-28
In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopts the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in PCST method, the size of the system does not dramatically affect the number of copy needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested in two-dimensional Ising model, Lennard-Jones liquid and all-atom folding simulation of a small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transition and dynamics of macromolecules in explicit solvent.
Parallel continuous simulated tempering and its applications in large-scale molecular simulations
Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng
2014-01-01
In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys.141, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys.141, 244101 (2010)], while adopts the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in PCST method, the size of the system does not dramatically affect the number of copy needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested in two-dimensional Ising model, Lennard-Jones liquid and all-atom folding simulation of a small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transition and dynamics of macromolecules in explicit solvent. PMID:25084887
Parallel continuous simulated tempering and its applications in large-scale molecular simulations
NASA Astrophysics Data System (ADS)
Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng
2014-07-01
In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopts the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2-3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in PCST method, the size of the system does not dramatically affect the number of copy needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested in two-dimensional Ising model, Lennard-Jones liquid and all-atom folding simulation of a small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transition and dynamics of macromolecules in explicit solvent.
Direct electron detection for TEM with column parallel CCD
NASA Astrophysics Data System (ADS)
Moldovan, Grigore; Jeffery, Ben; Nomerotski, Andrei; Kirkland, Angus
2009-06-01
Step improvements in electron detectors are needed for transmission electron microscopes (TEM) to take full advantage of latest developments in electron optics and electron sources. This work presents beam tests performed with column parallel charge-coupled devices (CPCCD) and discusses measured cluster properties. Excellent signal-to-noise ratio is found, demonstrating that CPCCD are well suited for TEM. Large variations in cluster-integrated intensity and size are observed, attributed to the strong scattering of 120-200 keV electrons in silicon. Spatial momentum analysis of clusters reveals strong asymmetry indicating that position resolution could be limited for certain cluster shapes.
A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows
NASA Technical Reports Server (NTRS)
Bui, Trong T.
1999-01-01
A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
Li, Qiyue; Qu, Xiaobo; Liu, Yunsong; Guo, Di; Lai, Zongying; Ye, Jing; Chen, Zhong
2015-06-01
Compressed sensing MRI (CS-MRI) is a promising technology to accelerate magnetic resonance imaging. Both improving the image quality and reducing the computation time are important for this technology. Recently, a patch-based directional wavelet (PBDW) has been applied in CS-MRI to improve edge reconstruction. However, this method is time consuming since it involves extensive computations, including geometric direction estimation and numerous iterations of wavelet transform. To accelerate computations of PBDW, we propose a general parallelization of patch-based processing by taking the advantage of multicore processors. Additionally, two pertinent optimizations, excluding smooth patches and pre-arranged insertion sort, that make use of sparsity in MR images are also proposed. Simulation results demonstrate that the acceleration factor with the parallel architecture of PBDW approaches the number of central processing unit cores, and that pertinent optimizations are also effective to make further accelerations. The proposed approaches allow compressed sensing MRI reconstruction to be accomplished within several seconds. PMID:25620521
Xyce Parallel Electronic Simulator Reference Guide Version 6.4
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory
2015-12-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce . This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1] . Trademarks The information herein is subject to change without notice. Copyright c 2002-2015 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright c 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce 's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)
Mosaddeghi, Hamid; Alavi, Saman; Kowsari, M H; Najafi, Bijan
2012-11-14
We use molecular dynamics simulations to study the structure, dynamics, and transport properties of nano-confined water between parallel graphite plates with separation distances (H) from 7 to 20 Å at different water densities with an emphasis on anisotropies generated by confinement. The behavior of the confined water phase is compared to non-confined bulk water under similar pressure and temperature conditions. Our simulations show anisotropic structure and dynamics of the confined water phase in directions parallel and perpendicular to the graphite plate. The magnitude of these anisotropies depends on the slit width H. Confined water shows "solid-like" structure and slow dynamics for the water layers near the plates. The mean square displacements (MSDs) and velocity autocorrelation functions (VACFs) for directions parallel and perpendicular to the graphite plates are calculated. By increasing the confinement distance from H = 7 Å to H = 20 Å, the MSD increases and the behavior of the VACF indicates that the confined water changes from solid-like to liquid-like dynamics. If the initial density of the water phase is set up using geometric criteria (i.e., distance between the graphite plates), large pressures (in the order of ~10 katm), and large pressure anisotropies are established within the water. By decreasing the density of the water between the confined plates to about 0.9 g cm(-3), bubble formation and restructuring of the water layers are observed.
NASA Astrophysics Data System (ADS)
Mosaddeghi, Hamid; Alavi, Saman; Kowsari, M. H.; Najafi, Bijan
2012-11-01
We use molecular dynamics simulations to study the structure, dynamics, and transport properties of nano-confined water between parallel graphite plates with separation distances (H) from 7 to 20 Å at different water densities with an emphasis on anisotropies generated by confinement. The behavior of the confined water phase is compared to non-confined bulk water under similar pressure and temperature conditions. Our simulations show anisotropic structure and dynamics of the confined water phase in directions parallel and perpendicular to the graphite plate. The magnitude of these anisotropies depends on the slit width H. Confined water shows "solid-like" structure and slow dynamics for the water layers near the plates. The mean square displacements (MSDs) and velocity autocorrelation functions (VACFs) for directions parallel and perpendicular to the graphite plates are calculated. By increasing the confinement distance from H = 7 Å to H = 20 Å, the MSD increases and the behavior of the VACF indicates that the confined water changes from solid-like to liquid-like dynamics. If the initial density of the water phase is set up using geometric criteria (i.e., distance between the graphite plates), large pressures (in the order of ˜10 katm), and large pressure anisotropies are established within the water. By decreasing the density of the water between the confined plates to about 0.9 g cm-3, bubble formation and restructuring of the water layers are observed.
NASA Astrophysics Data System (ADS)
Furuichi, M.; Nishiura, D.
2015-12-01
Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and Discrete Element Method (DEM) have been widely used to solve the continuum and particles motions in the computational geodynamics field. These mesh-free methods are suitable for the problems with the complex geometry and boundary. In addition, their Lagrangian nature allows non-diffusive advection useful for tracking history dependent properties (e.g. rheology) of the material. These potential advantages over the mesh-based methods offer effective numerical applications to the geophysical flow and tectonic processes, which are for example, tsunami with free surface and floating body, magma intrusion with fracture of rock, and shear zone pattern generation of granular deformation. In order to investigate such geodynamical problems with the particle based methods, over millions to billion particles are required for the realistic simulation. Parallel computing is therefore important for handling such huge computational cost. An efficient parallel implementation of SPH and DEM methods is however known to be difficult especially for the distributed-memory architecture. Lagrangian methods inherently show workload imbalance problem for parallelization with the fixed domain in space, because particles move around and workloads change during the simulation. Therefore dynamic load balance is key technique to perform the large scale SPH and DEM simulation. In this work, we present the parallel implementation technique of SPH and DEM method utilizing dynamic load balancing algorithms toward the high resolution simulation over large domain using the massively parallel super computer system. Our method utilizes the imbalances of the executed time of each MPI process as the nonlinear term of parallel domain decomposition and minimizes them with the Newton like iteration method. In order to perform flexible domain decomposition in space, the slice-grid algorithm is used. Numerical tests show that our
Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms
Yoginath, Srikanth B; Perumalla, Kalyan S
2013-01-01
With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware as is traditionally done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the bad scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature in the sense that any fairness-based VM scheduler implementation would exhibit this mismatch with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in the run time of the PDES-optimized scheduler relative to the regular VM scheduler, with over 20 reduction in run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.
NASA Astrophysics Data System (ADS)
Zehner, Björn; Hellwig, Olaf; Linke, Maik; Görz, Ines; Buske, Stefan
2016-01-01
3D geological underground models are often presented by vector data, such as triangulated networks representing boundaries of geological bodies and geological structures. Since models are to be used for numerical simulations based on the finite difference method, they have to be converted into a representation discretizing the full volume of the model into hexahedral cells. Often the simulations require a high grid resolution and are done using parallel computing. The storage of such a high-resolution raster model would require a large amount of storage space and it is difficult to create such a model using the standard geomodelling packages. Since the raster representation is only required for the calculation, but not for the geometry description, we present an algorithm and concept for rasterizing geological models on the fly for the use in finite difference codes that are parallelized by domain decomposition. As a proof of concept we implemented a rasterizer library and integrated it into seismic simulation software that is run as parallel code on a UNIX cluster using the Message Passing Interface. We can thus run the simulation with realistic and complicated surface-based geological models that are created using 3D geomodelling software, instead of using a simplified representation of the geological subsurface using mathematical functions or geometric primitives. We tested this set-up using an example model that we provide along with the implemented library.
Ames testing of Direct Black 38 parallels carcinogenicity testing.
Gregory, A R; Elliott, J; Kluge, P
1981-12-01
Studies have established that Direct Black 38 and two other benzidine-based dyes are carcinogenic. The carcinogenic effect has been generally considered attributable to the metabolic release of benzidine from Direct Black 38 and similar dyes. However, Ames tests indicated that when Direct Black 38 is reduced with sodium dithionate it is more mutagenic than can be accounted for by complete release of all the benzidine present in the dye molecule. While most dyes are not mutagenic when tested with S-9, a series of benzidine congener dyes were all found to be mutagenic with either TA 98 or TA 100 strains, if the dyes were first reduced with sodium dithionate. Unreduced dyes were not mutagenic. Neither anaerobic conditions nor addition of riboflavin induced mutagenicity of these dyes under the condition of our experiments.
On the suitability of the connection machine for direct particle simulation
NASA Technical Reports Server (NTRS)
Dagum, Leonard
1990-01-01
The algorithmic structure was examined of the vectorizable Stanford particle simulation (SPS) method and the structure is reformulated in data parallel form. Some of the SPS algorithms can be directly translated to data parallel, but several of the vectorizable algorithms have no direct data parallel equivalent. This requires the development of new, strictly data parallel algorithms. In particular, a new sorting algorithm is developed to identify collision candidates in the simulation and a master/slave algorithm is developed to minimize communication cost in large table look up. Validation of the method is undertaken through test calculations for thermal relaxation of a gas, shock wave profiles, and shock reflection from a stationary wall. A qualitative measure is provided of the performance of the Connection Machine for direct particle simulation. The massively parallel architecture of the Connection Machine is found quite suitable for this type of calculation. However, there are difficulties in taking full advantage of this architecture because of lack of a broad based tradition of data parallel programming. An important outcome of this work has been new data parallel algorithms specifically of use for direct particle simulation but which also expand the data parallel diction.
Lattice Boltzmann modeling of directional wetting: comparing simulations to experiments.
Jansen, H Patrick; Sotthewes, Kai; van Swigchem, Jeroen; Zandvliet, Harold J W; Kooij, E Stefan
2013-07-01
Lattice Boltzmann Modeling (LBM) simulations were performed on the dynamic behavior of liquid droplets on chemically striped patterned surfaces, ultimately with the aim to develop a predictive tool enabling reliable design of future experiments. The simulations accurately mimic experimental results, which have shown that water droplets on such surfaces adopt an elongated shape due to anisotropic preferential spreading. Details of the contact line motion such as advancing of the contact line in the direction perpendicular to the stripes exhibit pronounced similarities in experiments and simulations. The opposite of spreading, i.e., evaporation of water droplets, leads to a characteristic receding motion first in the direction parallel to the stripes, while the contact line remains pinned perpendicular to the stripes. Only when the aspect ratio is close to unity, the contact line also starts to recede in the perpendicular direction. Very similar behavior was observed in the LBM simulations. Finally, droplet movement can be induced by a gradient in surface wettability. LBM simulations show good semiquantitative agreement with experimental results of decanol droplets on a well-defined striped gradient, which move from high- to low-contact angle surfaces. Similarities and differences for all systems are described and discussed in terms of the predictive capabilities of LBM simulations to model direction wetting. PMID:23944550
NASA Technical Reports Server (NTRS)
Hsieh, Shang-Hsien
1993-01-01
The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
Contact-impact simulations on massively parallel SIMD supercomputers
Plaskacz, E.J. ); Belytscko, T.; Chiang, H.Y. )
1992-01-01
The implementation of explicit finite element methods with contact-impact on massively parallel SIMD computers is described. The basic parallel finite element algorithm employs an exchange process which minimizes interprocessor communication at the expense of redundant computations and storage. The contact-impact algorithm is based on the pinball method in which compatibility is enforced by preventing interpenetration on spheres embedded in elements adjacent to surfaces. The enhancements to the pinball algorithm include a parallel assembled surface normal algorithm and a parallel detection of interpenetrating pairs. Some timings with and without contact-impact are given.
NASA Astrophysics Data System (ADS)
Thummala, P.; Zhang, Z.; Andersen, M. A. E.; Rahimullah, S.
2014-03-01
Dielectric electroactive polymer (DEAP) actuators are capacitive devices which provide mechanical motions when charged electrically. The charging characteristics of a DEAP actuator depends on its size, voltage applied to its electrodes, and its operating frequency. The main idea of this paper is to design and implement driving circuits for the DEAP actuators for their use in various applications. This paper presents implementation of parallel input, parallel output, high voltage (~2.5 kV) bi-directional DC-DC converters for driving the DEAP actuators. The topology is a bidirectional flyback DC-DC converter incorporating commercially available high voltage MOSFETs (4 kV) and high voltage diodes (5 kV). Although the average current of the aforementioned devices is limited to 300 mA and 150 mA, respectively, connecting the outputs of multiple converters in parallel can provide a scalable design. This enables operating the DEAP actuators in various static and dynamic applications e.g. positioning, vibration generation or damping, and pumps. The proposed idea is experimentally verified by connecting three high voltage converters in parallel to operate a single DEAP actuator. The experimental results with both film capacitive load and the DEAP actuator are shown for a maximum charging voltage of 2 kV.
NASA Astrophysics Data System (ADS)
Sahni, Onkar; Jansen, Kenneth; Shephard, Mark; Taylor, Charles
2007-11-01
Flow within the healthy human vascular system is typically laminar but diseased conditions can alter the geometry sufficiently to produce transitional/turbulent flows in regions focal (and immediately downstream) of the diseased section. The mean unsteadiness (pulsatile or respiratory cycle) further complicates the situation making traditional turbulence simulation techniques (e.g., Reynolds-averaged Navier-Stokes simulations (RANSS)) suspect. At the other extreme, direct numerical simulation (DNS) while fully appropriate can lead to large computational expense, particularly when the simulations must be done quickly since they are intended to affect the outcome of a medical treatment (e.g., virtual surgical planning). To produce simulations in a clinically relevant time frame requires; 1) adaptive meshing technique that closely matches the desired local mesh resolution in all three directions to the highly anisotropic physical length scales in the flow, 2) efficient solution algorithms, and 3) excellent scaling on massively parallel computers. In this presentation we will demonstrate results for a subject-specific simulation of an abdominal aortic aneurysm using stabilized finite element method on anisotropically adapted meshes consisting of O(10^8) elements over O(10^4) processors.
Rideaux, Reuben; Apthorp, Deborah; Edwards, Mark
2015-01-01
Recent findings have indicated the capacity to consolidate multiple items into visual short-term memory in parallel varies as a function of the type of information. That is, while color can be consolidated in parallel, evidence suggests that orientation cannot. Here we investigated the capacity to consolidate multiple motion directions in parallel and reexamined this capacity using orientation. This was achieved by determining the shortest exposure duration necessary to consolidate a single item, then examining whether two items, presented simultaneously, could be consolidated in that time. The results show that parallel consolidation of direction and orientation information is possible, and that parallel consolidation of direction appears to be limited to two. Additionally, we demonstrate the importance of adequate separation between feature intervals used to define items when attempting to consolidate in parallel, suggesting that when multiple items are consolidated in parallel, as opposed to serially, the resolution of representations suffer. Finally, we used facilitation of spatial attention to show that the deterioration of item resolution occurs during parallel consolidation, as opposed to storage. PMID:25761335
Blocksome, Michael A.; Mamidala, Amith R.
2015-07-14
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Blocksome, Michael A.; Mamidala, Amith R.
2015-07-07
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Characterization of parallel-hole collimator using Monte Carlo Simulation
Pandey, Anil Kumar; Sharma, Sanjay Kumar; Karunanithi, Sellam; Kumar, Praveen; Bal, Chandrasekhar; Kumar, Rakesh
2015-01-01
Objective: Accuracy of in vivo activity quantification improves after the correction of penetrated and scattered photons. However, accurate assessment is not possible with physical experiment. We have used Monte Carlo Simulation to accurately assess the contribution of penetrated and scattered photons in the photopeak window. Materials and Methods: Simulations were performed with Simulation of Imaging Nuclear Detectors Monte Carlo Code. The simulations were set up in such a way that it provides geometric, penetration, and scatter components after each simulation and writes binary images to a data file. These components were analyzed graphically using Microsoft Excel (Microsoft Corporation, USA). Each binary image was imported in software (ImageJ) and logarithmic transformation was applied for visual assessment of image quality, plotting profile across the center of the images and calculating full width at half maximum (FWHM) in horizontal and vertical directions. Results: The geometric, penetration, and scatter at 140 keV for low-energy general-purpose were 93.20%, 4.13%, 2.67% respectively. Similarly, geometric, penetration, and scatter at 140 keV for low-energy high-resolution (LEHR), medium-energy general-purpose (MEGP), and high-energy general-purpose (HEGP) collimator were (94.06%, 3.39%, 2.55%), (96.42%, 1.52%, 2.06%), and (96.70%, 1.45%, 1.85%), respectively. For MEGP collimator at 245 keV photon and for HEGP collimator at 364 keV were 89.10%, 7.08%, 3.82% and 67.78%, 18.63%, 13.59%, respectively. Conclusion: Low-energy general-purpose and LEHR collimator is best to image 140 keV photon. HEGP can be used for 245 keV and 364 keV; however, correction for penetration and scatter must be applied if one is interested to quantify the in vivo activity of energy 364 keV. Due to heavy penetration and scattering, 511 keV photons should not be imaged with HEGP collimator. PMID:25829730
A two-level parallel direct search implementation for arbitrarily sized objective functions
Hutchinson, S.A.; Shadid, N.; Moffat, H.K.
1994-12-31
In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p{sub 2}) of processors. If, p, the number of processors available, is greater than or equal to 2p{sub 2} then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency and then further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for a nCUBE 2 implementation.
Short-term gas dispersion in idealised urban canopy in street parallel with flow direction
NASA Astrophysics Data System (ADS)
Chaloupecká, Hana; Jaňour, Zbyněk; Nosek, Štěpán
2016-03-01
Chemical attacks (e.g. Syria 2014-15 chlorine, 2013 sarine or Iraq 2006-7 chlorine) as well as chemical plant disasters (e.g. Spain 2015 nitric oxide, ferric chloride; Texas 2014 methyl mercaptan) threaten mankind. In these crisis situations, gas clouds are released. Dispersion of gas clouds is the issue of interest investigated in this paper. The paper describes wind tunnel experiments of dispersion from ground level point gas source. The source is situated in a model of an idealised urban canopy. The short duration releases of passive contaminant ethane are created by an electromagnetic valve. The gas cloud concentrations are measured in individual places at the height of the human breathing zone within a street parallel with flow direction by Fast-response Ionisation Detector. The simulations of the gas release for each measurement position are repeated many times under the same experimental set up to obtain representative datasets. These datasets are analysed to compute puff characteristics (arrival, leaving time and duration). The results indicate that the mean value of the dimensionless arrival time can be described as a growing linear function of the dimensionless coordinate in the street parallel with flow direction where the gas source is situated. The same might be stated about the dimensionless leaving time as well as the dimensionless duration, however these fits are worse. Utilising a linear function, we might also estimate some other statistical characteristics from datasets than the datasets means (medians, trimeans). The datasets of the dimensionless arrival time, the dimensionless leaving time and the dimensionless duration can be fitted by the generalized extreme value distribution (GEV) in all sampling positions except one.
Lombardini, Manuel; Deiterding, Ralf
2010-01-01
This paper presents the use of a dynamically adaptive mesh refinement strategy for the simulations of shock-driven turbulent mixing. Large-eddy simulations are necessary due the high Reynolds number turbulent regime. In this approach, the large scales are simulated directly and small scales at which the viscous dissipation occurs are modeled. A low-numerical centered finite-difference scheme is used in turbulent flow regions while a shock-capturing method is employed to capture shocks. Three-dimensional parallel simulations of the Richtmyer-Meshkov instability performed in plane and converging geometries are described.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
Jin, Shuangshuang; Chen, Yousu; Wu, Di; Diao, Ruisheng; Huang, Zhenyu
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbance, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computational intensive and challenging to solve using single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The difference of the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
Understanding transition and turbulence through direct simulations
NASA Technical Reports Server (NTRS)
Spalart, P. R.; Kim, J. J.
1989-01-01
Direct simulations consist in solving the full Navier-Stokes equations, without any turbulence model, and describing all the detailed features of the flow. Usually the flows are three-dimensional and time-dependent and contain both coarse and fine structures, which makes the numerical task very challenging in terms of both the algorithm and the computational effort. Most of the work until now has involved spectral methods, which are highly accurate but not very flexible in terms of geometry or complex equations. For that reason, future work will also rely on high-order finite-difference or other methods. Direct simulations complement experimental work, and both contribute to the theory and the empirical knowledge of turbulence. Once such a simulation has been shown to be accurate, the flow field is completely known in three dimensions and time, including the pressure, the vorticity and any other quantity. On the other hand, most simulations to date solved the incompressible equations in rather simple geometries, and direct simulations will always be limited to moderate Reynolds numbers. Extensive simulations have been conducted in homogeneous turbulence, channel flows, boundary layers, and mixing layers. Much effort is devoted to addressing flows with compressibility and chemical reactions, and to new geometries such as a backward-facing step.
A sweep algorithm for massively parallel simulation of circuit-switched networks
NASA Technical Reports Server (NTRS)
Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.
1992-01-01
A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.
Parallel 3D Simulation of Seismic Wave Propagation in the Structure of Nobi Plain, Central Japan
NASA Astrophysics Data System (ADS)
Kotani, A.; Furumura, T.; Hirahara, K.
2003-12-01
We performed large-scale parallel simulations of the seismic wave propagation to understand the complex wave behavior in the 3D basin structure of the Nobi Plain, which is one of the high population cities in central Japan. In this area, many large earthquakes occurred in the past, such as the 1891 Nobi earthquake (M8.0), the 1944 Tonankai earthquake (M7.9) and the 1945 Mikawa earthquake (M6.8). In order to mitigate the potential disasters for future earthquakes, 3D subsurface structure of Nobi Plain has recently been investigated by local governments. We referred to this model together with bouguer anomaly data to construct a detail 3D basin structure model for Nobi plain, and conducted computer simulations of ground motions. We first evaluated the ground motions for two small earthquakes (M4~5); one occurred just beneath the basin edge at west, and the other occurred at south. The ground motions from these earthquakes were well recorded by the strong motion networks; K-net, Kik-net, and seismic intensity instruments operated by local governments. We compare the observed seismograms with simulations to validate the 3D model. For the 3D simulation we sliced the 3D model into a number of layers to assign to many processors for concurrent computing. The equation of motions are solved using a high order (32nd) staggered-grid FDM in horizontal directions, and a conventional (4th-order) FDM in vertical direction with the MPI inter-processor communications between neighbor region. The simulation model is 128km by 128km by 43km, which is discritized at variable grid size of 62.5-125m in horizontal directions and of 31.25-62.5m in vertical direction. We assigned a minimum shear wave velocity is Vs=0.4km/s, at the top of the sedimentary basin. The seismic sources for the small events are approximated by double-couple point source and we simulate the seismic wave propagation at maximum frequency of 2Hz. We used the Earth Simulator (JAMSTEC, Yokohama Inst) to conduct such
ANNarchy: a code generation approach to neural simulations on parallel hardware.
Vitay, Julien; Dinkelbach, Helge Ü; Hamker, Fred H
2015-01-01
Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into an efficient C++code. We compare the parallel performance of the simulator to existing solutions.
ANNarchy: a code generation approach to neural simulations on parallel hardware
Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.
2015-01-01
Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into an efficient C++code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957
ANNarchy: a code generation approach to neural simulations on parallel hardware.
Vitay, Julien; Dinkelbach, Helge Ü; Hamker, Fred H
2015-01-01
Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into an efficient C++code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957
Xu, Rongda; Wang, Tao; Isbell, John; Cai, Zhe; Sykes, Christopher; Brailsford, Andrew; Kassel, Daniel B
2002-07-01
We report on the development of a parallel HPLC/MS purification system incorporating an indexed (i.e., multiplexed) ion source. In the method described, each of the flow streams from a parallel array of HPLC columns is directed toward the multiplexed (MUX) ion source and sampled in a time-dependent, parallel manner. A visual basic application has been developed and monitors in real-time the extracted ion current from each sprayer channel. Mass-directed fraction collection is initiated into a parallel array of fraction collectors specific for each of the spray channels. In the first embodiment of this technique, we report on a four-column semipreparative parallel LC/MS system incorporating MUX detection. In this parallel LC/MS application (in which sample loads between 1 and 10 mg on-column are typically made), no cross talk was observed. Ion signals from each of the channels were found reproducible over 192 injections, with interchannel signal variations between 11 and 17%. The visual basic fraction collection application permits preset individual start collection and end collection thresholds for each channel, thereby compensating for the slight variation in signal between sprayers. By incorporating postfraction collector UV detection, we have been able to optimize the valve-triggering delay time with precut transfer tubing between the mass spectrometer and fraction collectors and achieve recoveries greater than 80%. Examples of the MUX-guided, mass-directed fraction purification of both standards and real library reaction mixtures are presented within.
Xu, Rongda; Wang, Tao; Isbell, John; Cai, Zhe; Sykes, Christopher; Brailsford, Andrew; Kassel, Daniel B
2002-07-01
We report on the development of a parallel HPLC/MS purification system incorporating an indexed (i.e., multiplexed) ion source. In the method described, each of the flow streams from a parallel array of HPLC columns is directed toward the multiplexed (MUX) ion source and sampled in a time-dependent, parallel manner. A visual basic application has been developed and monitors in real-time the extracted ion current from each sprayer channel. Mass-directed fraction collection is initiated into a parallel array of fraction collectors specific for each of the spray channels. In the first embodiment of this technique, we report on a four-column semipreparative parallel LC/MS system incorporating MUX detection. In this parallel LC/MS application (in which sample loads between 1 and 10 mg on-column are typically made), no cross talk was observed. Ion signals from each of the channels were found reproducible over 192 injections, with interchannel signal variations between 11 and 17%. The visual basic fraction collection application permits preset individual start collection and end collection thresholds for each channel, thereby compensating for the slight variation in signal between sprayers. By incorporating postfraction collector UV detection, we have been able to optimize the valve-triggering delay time with precut transfer tubing between the mass spectrometer and fraction collectors and achieve recoveries greater than 80%. Examples of the MUX-guided, mass-directed fraction purification of both standards and real library reaction mixtures are presented within. PMID:12141664
Pesce, Lorenzo L; Lee, Hyong C; Hereld, Mark; Visser, Sid; Stevens, Rick L; Wildeman, Albert; van Drongelen, Wim
2013-01-01
Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers. PMID:24416069
Pesce, Lorenzo L; Lee, Hyong C; Hereld, Mark; Visser, Sid; Stevens, Rick L; Wildeman, Albert; van Drongelen, Wim
2013-01-01
Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.
Parallel computing simulation of electrical excitation and conduction in the 3D human heart.
Di Yu; Dongping Du; Hui Yang; Yicheng Tu
2014-01-01
A correctly beating heart is important to ensure adequate circulation of blood throughout the body. Normal heart rhythm is produced by the orchestrated conduction of electrical signals throughout the heart. Cardiac electrical activity is the resulted function of a series of complex biochemical-mechanical reactions, which involves transportation and bio-distribution of ionic flows through a variety of biological ion channels. Cardiac arrhythmias are caused by the direct alteration of ion channel activity that results in changes in the AP waveform. In this work, we developed a whole-heart simulation model with the use of massive parallel computing with GPGPU and OpenGL. The simulation algorithm was implemented under several different versions for the purpose of comparisons, including one conventional CPU version and two GPU versions based on Nvidia CUDA platform. OpenGL was utilized for the visualization / interaction platform because it is open source, light weight and universally supported by various operating systems. The experimental results show that the GPU-based simulation outperforms the conventional CPU-based approach and significantly improves the speed of simulation. By adopting modern computer architecture, this present investigation enables real-time simulation and visualization of electrical excitation and conduction in the large and complicated 3D geometry of a real-world human heart.
Pesce, Lorenzo L.; Lee, Hyong C.; Hereld, Mark; Visser, Sid; Stevens, Rick L.; Wildeman, Albert; van Drongelen, Wim
2013-01-01
Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determinedmore » the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.« less
NASA Astrophysics Data System (ADS)
Weiss, C. J.; Schultz, A.
2011-12-01
The high computational cost of the forward solution for modeling low-frequency electromagnetic induction phenomena is one of the primary impediments against broad-scale adoption by the geoscience community of exploration techniques, such as magnetotellurics and geomagnetic depth sounding, that rely on fast and cheap forward solutions to make tractable the inverse problem. As geophysical observables, electromagnetic fields are direct indicators of Earth's electrical conductivity - a physical property independent of (but in some cases correlative with) seismic wavespeed. Electrical conductivity is known to be a function of Earth's physiochemical state and temperature, and to be especially sensitive to the presence of fluids, melts and volatiles. Hence, electromagnetic methods offer a critical and independent constraint on our understanding of Earth's interior processes. Existing methods for parallelization of time-harmonic electromagnetic simulators, as applied to geophysics, have relied heavily on a combination of strategies: coarse-grained decompositions of the model domain; and/or, a high-order functional decomposition across spectral components, which in turn can be domain-decomposed themselves. Hence, in terms of scaling, both approaches are ultimately limited by the growing communication cost as the granularity of the forward problem increases. In this presentation we examine alternate parallelization strategies based on OpenMP shared-memory parallelization and CUDA-based GPU parallelization. As a test case, we use two different numerical simulation packages, each based on a staggered Cartesian grid: FDM3D (Weiss, 2006) which solves the curl-curl equation directly in terms of the scattered electric field (available under the LGPL at www.openem.org); and APHID, the A-Phi Decomposition based on mixed vector and scalar potentials, in which the curl-curl operator is replaced operationally by the vector Laplacian. We describe progress made in modifying the code to
A direct Vlasov simulation of nonlinear plasma waves
NASA Astrophysics Data System (ADS)
Hara, Kentaro; Boyd, Iain; Kaganovich, Igor
2013-10-01
A direct Vlasov simulation, which solves the collisionless Vlasov equation directly on a discretized phase space, achieves good resolution of velocity distribution functions in comparison to particle methods. In this presentation, nonlinear electron plasma waves (EPWs) and ion acoustic waves (IAWs) are investigated with a fully-kinetic one-dimensional Vlasov simulation. A parallelized Vlasov simulation is employed since grid resolution of the discretized phase space is required to be fine enough in order to capture the nonlinear waves with higher harmonic modes. The primary goal is benchmarking our simulation with results obtained from another Vlasov code and verification with the nonlinear theories [R. L. Berger et al., Phys. Plasmas 20, 032107 (2013)]. The frequency shift of nonlinear plasma waves is investigated by applying an initial density perturbation or an external driver potential. It has been observed that the plasma frequency decreases for EPWs and increases for IAWs for Te /Ti = 10 , which agrees with Berger's simulation and theories. A further investigation varying the generation of the nonlinear wave such as driver amplitude and duration time will be performed and discussed. Supported by the U.S. Department of Energy Office of Science, Fusion Energy Sciences Program, Grant # DE-SC0001939, and the Air Force Research Laboratory Grant # F9550-09-1-0695.
Robust large-scale parallel nonlinear solvers for simulations.
Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson
2005-11-01
This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write
Modelling and simulation of parallel triangular triple quantum dots (TTQD) by using SIMON 2.0
NASA Astrophysics Data System (ADS)
Fathany, Maulana Yusuf; Fuada, Syifaul; Lawu, Braham Lawas; Sulthoni, Muhammad Amin
2016-04-01
This research presents analysis of modeling on Parallel Triple Quantum Dots (TQD) by using SIMON (SIMulation Of Nano-structures). Single Electron Transistor (SET) is used as the basic concept of modeling. We design the structure of Parallel TQD by metal material with triangular geometry model, it is called by Triangular Triple Quantum Dots (TTQD). We simulate it with several scenarios using different parameters; such as different value of capacitance, various gate voltage, and different thermal condition.
The IDES framework: A case study in development of a parallel discrete-event simulation system
Nicol, D.M.; Johnson, M.M.; Yoshimura, A.S.
1997-12-31
This tutorial describes considerations in the design and development of the IDES parallel simulation system. IDES is a Java-based parallel/distributed simulation system designed to support the study of complex large-scale enterprise systems. Using the IDES system as an example, the authors discuss how anticipated model and system constraints molded the design decisions with respect to modeling, synchronization, and communication strategies.
Parallel Adaptive Multi-Mechanics Simulations using Diablo
Parsons, D; Solberg, J
2004-12-03
Coupled multi-mechanics simulations (such as thermal-stress and fluidstructure interaction problems) are of substantial interest to engineering analysts. In addition, adaptive mesh refinement techniques present an attractive alternative to current mesh generation procedures and provide quantitative error bounds that can be used for model verification. This paper discusses spatially adaptive multi-mechanics implicit simulations using the Diablo computer code. (U)
A bidirectional data driven lisp engine for the direct execution of lisp in parallel
Yen, C.K.; Wong, W.F. )
1989-06-01
This article gives a brief introduction to BIDDLE. The authors' aim here is to argue that the basic principle of BIDDLE are quite straightforward and that they can, indeed, design an architecture to directly execute Lisp in parallel. As mentioned in the introduction, important and interesting issues like side effect handling, object storage, environment maintenance etc. are not dealt with in great enough details.
Cimlib: A Fully Parallel Application For Numerical Simulations Based On Components Assembly
NASA Astrophysics Data System (ADS)
Digonnet, Hugues; Silva, Luisa; Coupez, Thierry
2007-05-01
This paper presents CIMLIB with its two main characteristics: an Object Oriented Program and a fully parallel code. CIMLIB aims at providing a set of components that can be organized to build numerical simulation of a certain process. We describe two components: one treats the complex task of parallel remeshing, the other puts the focus on the Finite Element modeling. In a second part, we present some parallel performances and an example of a very large simulation (over a mesh of 25 millions nodes) that begins with the mesh generation and ends up writing results files, all done using 88 processors.
O( N) parallel tight binding molecular dynamics simulation of carbon nanotubes
NASA Astrophysics Data System (ADS)
Özdoğan, Cem; Dereli, Gülay; Çağın, Tahir
2002-10-01
We report an O( N) parallel tight binding molecular dynamics simulation study of (10×10) structured carbon nanotubes (CNT) at 300 K. We converted a sequential O( N3) TBMD simulation program into an O( N) parallel code, utilizing the concept of parallel virtual machines (PVM). The code is tested in a distributed memory system consisting of a cluster with 8 PC's that run under Linux (Slackware 2.2.13 kernel). Our results on the speed up, efficiency and system size are given.
Parallelized modelling and solution scheme for hierarchically scaled simulations
NASA Technical Reports Server (NTRS)
Padovan, Joe
1995-01-01
This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The Tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers are capable of solving, are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that by combining several of these techniques that a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.
A parallel neural network simulator on the connection machine CM-5.
Reczko, M; Hatzigeorgiou, A; Mache, N; Zell, A; Suhai, S
1995-06-01
We here present a parallel implementation of artificial neural networks on the connection machine CM-5 and compare it with other parallel implementations on SIMD and MIMD architectures. This parallel implementation was developed with the goal of efficiently training large neural networks with huge training pattern sets for applications in molecular biology, in particular the prediction of coding regions in DNA sequences. The implementation uses training pattern parallelism and makes use of the parallel I/O facilities of the CM-5 and its efficient reduction operations available within the control network to achieve a high scalability. The parallel simulator obtains a maximum speed of 149.25 MCUPS for training feedforward networks with backpropagation on a 512 processor CM-5 system without using the CM-5 vector facility. The implementation poses no restriction on the type of network topology and works with different batch training algorithms like BP. Quickprop and Rprop.
NASA Astrophysics Data System (ADS)
Wu, Di M.; Zhao, S. S.; Lu, Jun Q.; Hu, Xin-Hua
2000-06-01
In Monte Carlo simulations of light propagating in biological tissues, photons propagating in the media are described as classic particles being scattered and absorbed randomly in the media, and their path are tracked individually. To obtain any statistically significant results, however, a large number of photons is needed in the simulations and the calculations are time consuming and sometime impossible with existing computing resource, especially when considering the inhomogeneous boundary conditions. To overcome this difficulty, we have implemented a parallel computing technique into our Monte Carlo simulations. And this moment is well justified due to the nature of the Monte Carlo simulation. Utilizing the PVM (Parallel Virtual Machine, a parallel computing software package), parallel codes in both C and Fortran have been developed on the massive parallel computer of Cray T3E and a local PC-network running Unix/Sun Solaris. Our results show that parallel computing can significantly reduce the running time and make efficient usage of low cost personal computers. In this report, we present a numerical study of light propagation in a slab phantom of skin tissue using the parallel computing technique.
Xyce parallel electronic simulator users' guide, Version 6.0.1.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2014-01-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users guide, version 6.0.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2013-08-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users guide, version 6.1
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-03-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas; Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiationaware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase-a message passing parallel implementation-which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Pelegant : a parallel accelerator simulation code for electron generation and tracking.
Wang, Y.; Borland, M. D.; Accelerator Systems Division
2006-01-01
elegant is a general-purpose code for electron accelerator simulation that has a worldwide user base. Recently, many of the time-intensive elements were parallelized using MPI. Development has used modest Linux clusters and the BlueGene/L supercomputer at Argonne National Laboratory. This has provided very good performance for some practical simulations, such as multiparticle tracking with synchrotron radiation and emittance blow-up in the vertical rf kick scheme. The effort began with development of a concept that allowed for gradual parallelization of the code, using the existing beamline-element classification table in elegant. This was crucial as it allowed parallelization without major changes in code structure and without major conflicts with the ongoing evolution of elegant. Because of rounding error and finite machine precision, validating a parallel program against a uniprocessor program with the requirement of bitwise identical results is notoriously difficult. We will report validating simulation results of parallel elegant against those of serial elegant by applying Kahan's algorithm to improve accuracy dramatically for both versions. The quality of random numbers in a parallel implementation is very important for some simulations. Some practical experience with generating parallel random numbers by offsetting the seed of each random sequence according to the processor ID will be reported.
Molecular Dynamic Simulations of Nanostructured Ceramic Materials on Parallel Computers
Vashishta, Priya; Kalia, Rajiv
2005-02-24
Large-scale molecular-dynamics (MD) simulations have been performed to gain insight into: (1) sintering, structure, and mechanical behavior of nanophase SiC and SiO2; (2) effects of dynamic charge transfers on the sintering of nanophase TiO2; (3) high-pressure structural transformation in bulk SiC and GaAs nanocrystals; (4) nanoindentation in Si3N4; and (5) lattice mismatched InAs/GaAs nanomesas. In addition, we have designed a multiscale simulation approach that seamlessly embeds MD and quantum-mechanical (QM) simulations in a continuum simulation. The above research activities have involved strong interactions with researchers at various universities, government laboratories, and industries. 33 papers have been published and 22 talks have been given based on the work described in this report.
A parallel implementation of the Cellular Potts Model for simulation of cell-based morphogenesis
Chen, Nan; Glazier, James A.; Izaguirre, Jesús A.; Alber, Mark S.
2007-01-01
The Cellular Potts Model (CPM) has been used in a wide variety of biological simulations. However, most current CPM implementations use a sequential modified Metropolis algorithm which restricts the size of simulations. In this paper we present a parallel CPM algorithm for simulations of morphogenesis, which includes cell–cell adhesion, a cell volume constraint, and cell haptotaxis. The algorithm uses appropriate data structures and checkerboard subgrids for parallelization. Communication and updating algorithms synchronize properties of cells simulated on different processor nodes. Tests show that the parallel algorithm has good scalability, permitting large-scale simulations of cell morphogenesis (107 or more cells) and broadening the scope of CPM applications. The new algorithm satisfies the balance condition, which is sufficient for convergence of the underlying Markov chain. PMID:18084624
Parallelization of a Molecular Dynamics Simulation of AN Ion-Surface Collision System:
NASA Astrophysics Data System (ADS)
Atiş, Murat; Özdoğan, Cem; Güvenç, Ziya B.
Parallel molecular dynamics simulation study of the ion-surface collision system is reported. A sequential molecular dynamics simulation program is converted into a parallel code utilizing the concept of parallel virtual machine (PVM). An effective and favorable algorithm is developed. Our parallelization of the algorithm shows that it is more efficient because of the optimal pair listing, linear scaling, and constant behavior of the internode communications. The code is tested in a distributed memory system consisting of a cluster of eight PCs that run under Linux (Debian 2.4.20 kernel). Our results on the collision system are discussed based on the speed up, efficiency and the system size. Furthermore, the code is used for a full simulation of the Ar-Ni(100) collision system and calculated physical quantities are presented.
Polar-direct-drive simulations and experiments
Marozas, J.A.; Marshall, F.J.; Craxton, R.S.; Igumenshchev, I.V.; Skupsky, S.; Bonino, M.J.; Collins, T.J.B.; Epstein, R.; Glebov, V.Yu.; Jacobs-Perkins, D.; Knauer, J.P.; McCrory, R.L.; McKenty, P.W.; Meyerhofer, D.D.; Noyes, S.G.; Radha, P.B.; Sangster, T.C.; Seka, W.; Smalyuk, V.A.
2006-05-15
Polar direct drive (PDD) [S. Skupsky et al., Phys. Plasmas 11, 2763 (2004)] will allow direct-drive ignition experiments on the National Ignition Facility (NIF) [J. Paisner et al., Laser Focus World 30, 75 (1994)] as it is configured for x-ray drive. Optimal drive uniformity is obtained via a combination of beam repointing, pulse shapes, spot shapes, and/or target design. This article describes progress in the development of standard and 'Saturn' [R. S. Craxton and D. W. Jacobs-Perkins, Phys. Rev. Lett. 94, 0952002 (2005)] PDD target designs. Initial evaluation of experiments on the OMEGA Laser System [T. R. Boehly et al., Rev. Sci. Instrum. 66, 508 (1995)] and simulations were carried out with the two-dimensional hydrodynamics code SAGE [R. S. Craxton et al., Phys. Plasmas 12, 056304 (2005)]. This article adds to this body of work by including fusion particle production and transport as well as radiation transport within the two-dimensional DRACO [P. B. Radha et al., Phys. Plasmas 12, 032702 (2005)] hydrodynamics simulations used to model experiments. Forty OMEGA beams arranged in six rings to emulate the NIF x-ray-drive configuration are used to perform direct-drive implosions of CH shells filled with D{sub 2} gas. Target performance was diagnosed with framed x-ray backlighting and by the measured fusion yield. Saturn target experiments have resulted in {approx}75% of the yield from energy-equivalent, symmetrically irradiated implosions. The results of the two-dimensional PDD simulations performed with DRACO are in good agreement with experimental x-ray radiographs. DRACO is being used to further optimize standard PDD designs. In addition, DRACO simulations of NIF-scale PDD designs show ignition with a gain of 20 and the development of a 40 {mu}m radius, 10 keV region with a neutron-averaged {rho}r of 1270 mg/cm{sup 2} near stagnation.
Direct simulation of rarefied hypersonic flows
NASA Technical Reports Server (NTRS)
Moss, James N.
1989-01-01
As the capability of the space transportation vehicles (STV's) expand to meet the requirements for future space exploration and utilization, the effects of rarefied hypersonic flows will play a more significant role in defining the aerodynamic and aerothermodynamic performance of STV's. This is particularly true of the low lift/drag aeroassisted STV's where aerobraking occurs at relatively high altitudes and high velocity. Because of the limitations of the continuum description as expressed by the Navier-Stokes equations and the difficulties of solving the Boltzmann equation, the particle of molecular approach has been developed over the last three decades for modeling rarefied gas effects. The direct simulation Monte Carlo (DSMC) method of Bird is the most used method today for simulating rarefied flows. The DSMC method provides a direct physical simulation as opposed to a numerical solution of a set of model equations. This is accomplished by developing phenomenological models of the relevant physical events. The DSMC method accounts for translational, thermal, chemical, and radiative nonequilibrium effects. The general features of the DSMC method, the numerical requirements for obtaining meaningful results, the modeling used to simulate high temperature gas effects, and applications of the method to calculate the flow about an aeroassist flight experiment vehicle (AFE) are reviewed. The AFE simulates a geosynchronous return while entering the Earth's upper atmosphere at approximately 10 km/s. Results obtained using a general 3-D code are presented for the more rarefied portion of the atmospheric encounter (altitudes of 200 to 100 km) emphasizing surface, flowfield, and aerodynamic characteristics of the AFE. Finally, results obtained using axisymmetric and 1-D versions of the code are presented for lower altitude conditions.
Vector and parallel Monte Carlo radiative heat transfer simulation
Burns, P.J. . Dept. of Mechanical Engineering); Pryor, D.V. )
1989-01-01
A fully vectorized version of a Monte Carlo algorithm of radiative heat transfer in two-dimensional geometries is presented. This algorithm differs from previous applications in that its capabilities are more extensive, with arbitrary numbers of surfaces, arbitrary numbers of material properties, and surface characteristics that include transmission, specular reflection, and diffuse reflection (all of which may be functions of the angle of incidence). The algorithm is applied to an irregular, experimental geometry and implemented on a Cyber 205. A speedup factor of approximately 16, for this combination of geometry and material properties, is achieved for the vector version over the scalar code. Issues related to the details of vectorization, including heavy use of bit addressability, the maintaining of long vector lengths, and gather/scatter use, are discussed. The parallel application of this algorithm is straightforward and is discussed in light of architectural differences among several current supercomputers.
Parallel performance optimizations on unstructured mesh-based simulations
Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid
2015-06-01
This paper addresses two key parallelization challenges the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns, that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intranode data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
Virtual reality visualization of parallel molecular dynamics simulation
Disz, T.; Papka, M.; Stevens, R.; Pellegrino, M.; Taylor, V.
1995-12-31
When performing communications mapping experiments for massively parallel processors, it is important to be able to visualize the mappings and resulting communications. In a molecular dynamics model, visualization of the atom to atom interaction and the processor mappings provides insight into the effectiveness of the communications algorithms. The basic quantities available for visualization in a model of this type are the number of molecules per unit volume, the mass, and velocity of each molecule. The computational information available for visualization is the atom to atom interaction within each time step, the atom to processor mapping, and the energy resealing events. We use the CAVE (CAVE Automatic Virtual Environment) to provide interactive, immersive visualization experiences.
Direct numerical simulation of reacting flows
NASA Technical Reports Server (NTRS)
Riley, J. J.; Metcalfe, R. W.
1984-01-01
The objectives of this work are: (1) to extend the technique of direct numerical simulations to turbulent, chemically reacting flows, (2) to test the validity of the method by comparing computational results with laboratory data, and (3) to use the simulations to gain a better understanding of the effects of turbulence on chemical reactions. The effects of both the large scale structure and the smaller scale turbulence on the overall reaction rates are addressed. The relationship between infinite reaction rate and finite reaction rate chemistry is compared with some of the results of calculations with existing theories and laboratory data. The direct numerical simulation method involves the numerical solution of the detailed evolution of the complex turbulent velocity and concentration fields. Using very efficient numerical methods (e.g., pseudospectral methods), the fully nonlinear (possibly low pass filtered) equations of motion are solved and no closure assumptions or turbulence models are used. Statistical data are obtained by performing spatial, temporal, and/or ensemble averages over the computed flow fields.
Object-oriented particle simulation on parallel computers
Reynders, J.V.W.; Forslund, D.W.; Hinker, P.J.; Tholburn, M.; Kilman, D.G.; Humphrey, W.F.
1994-04-01
A general purpose, object-oriented particle simulation (OOPS) library has been developed for use on a variety of system architectures with a uniform high-level interface. This includes the development of library implementations for the CM5, Intel Paragon, and CRI T3D. Codes written on any of these platforms can be ported to other platforms without modifications by utilizing the high-level library. The general character of the library allows application to such diverse areas as plasma physics, suspension flows, vortex simulations, porous media, and materials science.
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL
Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor
2011-09-06
We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
Parallel performance optimizations on unstructured mesh-based simulations
Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid
2015-06-01
This paper addresses two key parallelization challenges the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns, that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intranode data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches.more » We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.« less
DC simulator of large-scale nonlinear systems for parallel processors
NASA Astrophysics Data System (ADS)
Cortés Udave, Diego Ernesto; Ogrodzki, Jan; Gutiérrez de Anda, Miguel Angel
In this paper it is shown how the idea of the BBD decomposition of large-scale nonlinear systems can be implemented in a parallel DC circuit simulation algorithm. Usually, the BBD nonlinear circuits decomposition was used together with the multi-level Newton-Raphson iterative process. We propose the simulation consisting in the circuit decomposition and the process parallelization on the single level only. This block-parallel approach may give a considerable profit in simulation time though it is strongly dependent on the system topology and, of course, on the processor type. The paper presents the architecture of the decomposition-based algorithm, explains details of its implementation, including two steps of the one level bypassing techniques and discusses a construction of the dedicated benchmarks for this simulation software.
Parallel finite element simulation of mooring forces on floating objects
NASA Astrophysics Data System (ADS)
Aliabadi, S.; Abedi, J.; Zellars, B.
2003-03-01
The coupling between the equations governing the free-surface flows, the six degrees of freedom non-linear rigid body dynamics, the linear elasticity equations for mesh-moving and the cables has resulted in a fluid-structure interaction technology capable of simulating mooring forces on floating objects. The finite element solution strategy is based on a combination approach derived from fixed-mesh and moving-mesh techniques. Here, the free-surface flow simulations are based on the Navier-Stokes equations written for two incompressible fluids where the impact of one fluid on the other one is extremely small. An interface function with two distinct values is used to locate the position of the free-surface. The stabilized finite element formulations are written and integrated in an arbitrary Lagrangian-Eulerian domain. This allows us to handle the motion of the time dependent geometries. Forces and momentums exerted on the floating object by both water and hawsers are calculated and used to update the position of the floating object in time. In the mesh moving scheme, we assume that the computational domain is made of elastic materials. The linear elasticity equations are solved to obtain the displacements for each computational node. The non-linear rigid body dynamics equations are coupled with the governing equations of fluid flow and are solved simultaneously to update the position of the floating object. The numerical examples includes a 3D simulation of water waves impacting on a moored floating box and a model boat and simulation of floating object under water constrained with a cable.
NASA Astrophysics Data System (ADS)
Liu, Xiaofeng; Siddle-Mitchell, Seth
2015-11-01
This paper presents a novel pressure reconstruction method featuring rotating parallel ray omni-directional integration, as an improvement over the circular virtual boundary integration method introduced by Liu and Katz (2003, 2006, 2008 and 2013) for non-intrusive instantaneous pressure measurement in incompressible flow field. Unlike the virtual boundary omni-directional integration, where the integration path is originated from a virtual circular boundary at a finite distance from the real boundary of the integration domain, the new method utilizes parallel rays, which can be viewed as being originated from a distance of infinity, as guidance for integration paths. By rotating the parallel rays, omni-directional paths with equal weights coming from all directions toward the point of interest at any location within the computation domain will be generated. In this way, the location dependence of the integration weight inherent in the old algorithm will be eliminated. By implementing this new algorithm, the accuracy of the reconstructed pressure for a synthetic rotational flow in terms of r.m.s. error from theoretical values is reduced from 1.03% to 0.30%. Improvement is further demonstrated from the comparison of the reconstructed pressure with that from the Johns Hopkins University isotropic turbulence database (JHTDB). This project is funded by the San Diego State University.
Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm
NASA Technical Reports Server (NTRS)
Povitsky, A.
1998-01-01
In this research an efficient parallel algorithm for 3-D directionally split problems is developed. The proposed algorithm is based on a reformulated version of the pipelined Thomas algorithm that starts the backward step computations immediately after the completion of the forward step computations for the first portion of lines This algorithm has data available for other computational tasks while processors are idle from the Thomas algorithm. The proposed 3-D directionally split solver is based on the static scheduling of processors where local and non-local, data-dependent and data-independent computations are scheduled while processors are idle. A theoretical model of parallelization efficiency is used to define optimal parameters of the algorithm, to show an asymptotic parallelization penalty and to obtain an optimal cover of a global domain with subdomains. It is shown by computational experiments and by the theoretical model that the proposed algorithm reduces the parallelization penalty about two times over the basic algorithm for the range of the number of processors (subdomains) considered and the number of grid nodes per subdomain.
Implementation of a parallel algorithm for thermo-chemical nonequilibrium flow simulations
Wong, C.C.; Blottner, F.G.; Payne, J.L.; Soetrisno, M.
1995-01-01
Massively parallel (MP) computing is considered to be the future direction of high performance computing. When engineers apply this new MP computing technology to solve large-scale problems, one major interest is what is the maximum problem size that a MP computer can handle. To determine the maximum size, it is important to address the code scalability issue. Scalability implies whether the code can provide an increase in performance proportional to an increase in problem size. If the size of the problem increases, by utilizing more computer nodes, the ideal elapsed time to simulate a problem should not increase much. Hence one important task in the development of the MP computing technology is to ensure scalability. A scalable code is an efficient code. In order to obtain good scaled performance, it is necessary to first have the code optimized for a single node performance before proceeding to a large-scale simulation with a large number of computer nodes. This paper will discuss the implementation of a massively parallel computing strategy and the process of optimization to improve the scaled performance. Specifically, we will look at domain decomposition, resource management in the code, communication overhead, and problem mapping. By incorporating these improvements and adopting an efficient MP computing strategy, an efficiency of about 85% and 96%, respectively, has been achieved using 64 nodes on MP computers for both perfect gas and chemically reactive gas problems. A comparison of the performance between MP computers and a vectorized computer, such as Cray-YMP, will also be presented.
SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeff S.
1992-01-01
Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
Parallel Monte Carlo Electron and Photon Transport Simulation Code (PMCEPT code)
NASA Astrophysics Data System (ADS)
Kum, Oyeon
2004-11-01
Simulations for customized cancer radiation treatment planning for each patient are very useful for both patient and doctor. These simulations can be used to find the most effective treatment with the least possible dose to the patient. This typical system, so called ``Doctor by Information Technology", will be useful to provide high quality medical services everywhere. However, the large amount of computing time required by the well-known general purpose Monte Carlo(MC) codes has prevented their use for routine dose distribution calculations for a customized radiation treatment planning. The optimal solution to provide ``accurate" dose distribution within an ``acceptable" time limit is to develop a parallel simulation algorithm on a beowulf PC cluster because it is the most accurate, efficient, and economic. I developed parallel MC electron and photon transport simulation code based on the standard MPI message passing interface. This algorithm solved the main difficulty of the parallel MC simulation (overlapped random number series in the different processors) using multiple random number seeds. The parallel results agreed well with the serial ones. The parallel efficiency approached 100% as was expected.
Parallel FEM Simulation of Electromechanics in the Heart
NASA Astrophysics Data System (ADS)
Xia, Henian; Wong, Kwai; Zhao, Xiaopeng
2011-11-01
Cardiovascular disease is the leading cause of death in America. Computer simulation of complicated dynamics of the heart could provide valuable quantitative guidance for diagnosis and treatment of heart problems. In this paper, we present an integrated numerical model which encompasses the interaction of cardiac electrophysiology, electromechanics, and mechanoelectrical feedback. The model is solved by finite element method on a Linux cluster and the Cray XT5 supercomputer, kraken. Dynamical influences between the effects of electromechanics coupling and mechanic-electric feedback are shown.
Time-partitioning simulation models for calculation on parallel computers
NASA Technical Reports Server (NTRS)
Milner, Edward J.; Blech, Richard A.; Chima, Rodrick V.
1987-01-01
A technique allowing time-staggered solution of partial differential equations is presented in this report. Using this technique, called time-partitioning, simulation execution speedup is proportional to the number of processors used because all processors operate simultaneously, with each updating of the solution grid at a different time point. The technique is limited by neither the number of processors available nor by the dimension of the solution grid. Time-partitioning was used to obtain the flow pattern through a cascade of airfoils, modeled by the Euler partial differential equations. An execution speedup factor of 1.77 was achieved using a two processor Cray X-MP/24 computer.
Direct Statistical Simulation of Geophysical Flows
NASA Astrophysics Data System (ADS)
Marston, Brad; Chini, Greg; Tobias, Steve
2015-11-01
Statistics of models of geophysical and astrophysical fluids may be directly accessed by solving the equations of motion for the statistics themselves as proposed by Lorenz nearly 50 years ago. Motivated by the desire to capture seamlessly multiscale physics we introduce a new approach to such Direct Statistical Simulation (DSS) based upon separating eddies by length scale. Discarding triads that involve only small-scale waves, the equations of motion generalize the quasi-linear approximation (GQL) and are able to accurately reproduce the low-order statistics of a stochastically-driven barotropic jet. Furthermore the two-point statistics of high wavenumber modes close and thus generalize second-order cumulant expansions (CE2) that employ zonal averaging. This GCE2 approach is tested on two-layer primitive equations. Comparison to statistics accumulated from numerical simulation finds GCE2 to be quantitatively accurate. DSS thus leads to new insight into important processes in geophysical and astrophysical flows. Supported in part by NSF DMR-1306806 and NSF CCF-1048701.
LARGE-SCALE SIMULATION OF BEAM DYNAMICS IN HIGH INTENSITY ION LINACS USING PARALLEL SUPERCOMPUTERS
R. RYNE; J. QIANG
2000-08-01
In this paper we present results of using parallel supercomputers to simulate beam dynamics in next-generation high intensity ion linacs. Our approach uses a three-dimensional space charge calculation with six types of boundary conditions. The simulations use a hybrid approach involving transfer maps to treat externally applied fields (including rf cavities) and parallel particle-in-cell techniques to treat the space-charge fields. The large-scale simulation results presented here represent a three order of magnitude improvement in simulation capability, in terms of problem size and speed of execution, compared with typical two-dimensional serial simulations. Specific examples will be presented, including simulation of the spallation neutron source (SNS) linac and the Low Energy Demonstrator Accelerator (LEDA) beam halo experiment.
Direct numerical simulation of turbulent reacting flows
Chen, J.H.
1993-12-01
The development of turbulent combustion models that reflect some of the most important characteristics of turbulent reacting flows requires knowledge about the behavior of key quantities in well defined combustion regimes. In turbulent flames, the coupling between the turbulence and the chemistry is so strong in certain regimes that is is very difficult to isolate the role played by one individual phenomenon. Direct numerical simulation (DNS) is an extremely useful tool to study in detail the turbulence-chemistry interactions in certain well defined regimes. Globally, non-premixed flames are controlled by two limiting cases: the fast chemistry limit, where the turbulent fluctuations. In between these two limits, finite-rate chemical effects are important and the turbulence interacts strongly with the chemical processes. This regime is important because industrial burners operate in regimes in which, locally the flame undergoes extinction, or is at least in some nonequilibrium condition. Furthermore, these nonequilibrium conditions strongly influence the production of pollutants. To quantify the finite-rate chemistry effect, direct numerical simulations are performed to study the interaction between an initially laminar non-premixed flame and a three-dimensional field of homogeneous isotropic decaying turbulence. Emphasis is placed on the dynamics of extinction and on transient effects on the fine scale mixing process. Differential molecular diffusion among species is also examined with this approach, both for nonreacting and reacting situations. To address the problem of large-scale mixing and to examine the effects of mean shear, efforts are underway to perform large eddy simulations of round three-dimensional jets.
Parallel-plate transmission line type of EMP simulators: Systematic review and recommendations
NASA Astrophysics Data System (ADS)
Giri, D. V.; Liu, T. K.; Tesche, F. M.; King, R. W. P.
1980-05-01
This report presents various aspects of the two-parallel-plate transmission line type of EMP simulator. Much of the work is the result of research efforts conducted during the last two decades at the Air Force Weapons Laboratory, and in industries/universities as well. The principal features of individual simulator components are discussed. The report also emphasizes that it is imperative to hybridize our understanding of individual components so that we can draw meaningful conclusions of simulator performance as a whole.
NASA Astrophysics Data System (ADS)
Hiramatsu, Takuya; Takahashi, Isao; Matsushima, Satoru; Usami, Noritaka
2016-09-01
We performed numerical calculations of temperature distributions in a furnace and clarified that a simple modification of heat insulators allows the realization of a complex temperature distribution for a parallel arrangement of adjacent dendrite crystals at the initial stage of the floating cast method. The temperature distribution included a unidirectional temperature gradient on the Si melt surface, which led to the preferential nucleation on one side of a square crucible. Numerical simulation was utilized to design crystal growth experiments, and we demonstrated the preferential formation of dendrite crystals on the expected side of the crucible.
NASA Astrophysics Data System (ADS)
Lee, J.; Kim, K.
A Very Large Scale Integration (VLSI) architecture for robot direct kinematic computation suitable for industrial robot manipulators was investigated. The Denavit-Hartenberg transformations are reviewed to exploit a proper processing element, namely an augmented CORDIC. Specifically, two distinct implementations are elaborated on, such as the bit-serial and parallel. Performance of each scheme is analyzed with respect to the time to compute one location of the end-effector of a 6-links manipulator, and the number of transistors required.
NASA Technical Reports Server (NTRS)
Lee, J.; Kim, K.
1991-01-01
A Very Large Scale Integration (VLSI) architecture for robot direct kinematic computation suitable for industrial robot manipulators was investigated. The Denavit-Hartenberg transformations are reviewed to exploit a proper processing element, namely an augmented CORDIC. Specifically, two distinct implementations are elaborated on, such as the bit-serial and parallel. Performance of each scheme is analyzed with respect to the time to compute one location of the end-effector of a 6-links manipulator, and the number of transistors required.
A parallel algorithm for transient solid dynamics simulations with contact detection
Attaway, S.; Hendrickson, B.; Plimpton, S.; Gardner, D.; Vaughan, C.; Heinstein, M.; Peery, J.
1996-06-01
Solid dynamics simulations with Lagrangian finite elements are used to model a wide variety of problems, such as the calculation of impact damage to shipping containers for nuclear waste and the analysis of vehicular crashes. Using parallel computers for these simulations has been hindered by the difficulty of searching efficiently for material surface contacts in parallel. A new parallel algorithm for calculation of arbitrary material contacts in finite element simulations has been developed and implemented in the PRONTO3D transient solid dynamics code. This paper will explore some of the issues involved in developing efficient, portable, parallel finite element models for nonlinear transient solid dynamics simulations. The contact-detection problem poses interesting challenges for efficient implementation of a solid dynamics simulation on a parallel computer. The finite element mesh is typically partitioned so that each processor owns a localized region of the finite element mesh. This mesh partitioning is optimal for the finite element portion of the calculation since each processor must communicate only with the few connected neighboring processors that share boundaries with the decomposed mesh. However, contacts can occur between surfaces that may be owned by any two arbitrary processors. Hence, a global search across all processors is required at every time step to search for these contacts. Load-imbalance can become a problem since the finite element decomposition divides the volumetric mesh evenly across processors but typically leaves the surface elements unevenly distributed. In practice, these complications have been limiting factors in the performance and scalability of transient solid dynamics on massively parallel computers. In this paper the authors present a new parallel algorithm for contact detection that overcomes many of these limitations.
Monte Carlo simulations of converging laser beam propagating in turbid media with parallel computing
NASA Astrophysics Data System (ADS)
Wu, Di; Lu, Jun Q.; Hu, Xin H.; Zhao, S. S.
1999-11-01
Due to its flexibility and simplicity, Monte Carlo method is often used to study light propagation in turbid medium where the photons are treated like classic particles being scattered and absorbed randomly based on a radiative transfer theory. However, due to the need of large number of photons to produce statistically significance results, this type of calculations requires large computing resources. To overcome such difficulty, we implemented parallel computing technique into our Monte Carlo simulations. The algorithm is based on the fact that the classic particles are uncorrelated, and the trajectories of multiple photons can be tracked simultaneously. When a beam of focused light incident to the medium, the incident photons are divided into groups according to the available processes on a parallel machine and the calculations are carried out in parallel. Utilizing PVM (Parallel Virtual Machine, a parallel computing software), the parallel programs in both C and FORTRAN are developed on the massive parallel computer Cray T3E at the North Carolina Supercomputer Center and a local PC-cluster network running UNIX/Sun Solaris. The parallel performances of our codes have been excellent on both Cray T3E and the PC clusters. In this paper, we present results on a focusing laser beam propagating through a highly scattering and diluted solution of intralipid. The dependence of the spatial distribution of light near the focal point on the concentration of intralipid solution is studied and its significance is discussed.
Simulations of parallel and transverse remanences in textured nano-patterned thin film media
NASA Astrophysics Data System (ADS)
El-Hilo, M.
2010-05-01
In this work the effects of dipolar coupling on the distribution of effective orientations in textured nano-patterned media is simulated. The modelled films consist of 50×50 cobalt grains of uniform diameters ( D=20 nm) arranged in hexagonal (triangular) arrays. The grains easy axes are distributed according to a Gaussian texture function with a standard deviation of 30° about the texture direction. For different array separations ( d), the distribution of anisotropy orientations is extracted from the simulated parallel Mrp( β) and transverse Mrt( β) remanence curves where β is the angle by which the film is rotated. For the non-interacting case, predictions show that, Mrt( β)=d Mrp( β)/d β which is consistent with Shrikman and Treves theory whereas for the interacting case, Mrt( β) is deviated from d Mrp( β)/d β. The extracted distribution of effective orientations is found to become narrower as the array separation is decreased which is due to dipolar-induced texturing effects.
NASA Astrophysics Data System (ADS)
Cohen, Randi L.
There is both theoretical and observational evidence that giant planets collided with objects ≥ Mearth during their evolution. These impacts may play a key role in giant planet formation. This paper describes impacts of a ˜ Earth-mass object onto a suite of proto-giant-planets, as simulated using an SPH parallel tree code. We run 6 simulations, varying the impact angle and evolutionary stage of the proto-Jupiter. We find that it is possible for an impactor to free some mass from the core of the proto-planet it impacts through direct collision, as well as to make physical contact with the core yet escape partially, or even completely, intact. None of the 6 cases we consider produced a solid disk or resulted in a net decrease in the core mass of the pinto-planet (since the mass decrease due to disruption was outweighed by the increase due to the addition of the impactor's mass to the core). However, we suggest parameters which may have these effects, and thus decrease core mass and formation time in protoplanetary models and/or create satellite systems. We find that giant impacts can remove significant envelope mass from forming giant planets, leaving only 2 MEarth of gas, similar to Uranus and Neptune. They can also create compositional inhomogeneities in planetary cores, which creates differences in planetary thermal emission characteristics.
Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations
NASA Astrophysics Data System (ADS)
Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.
2016-07-01
Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak
1996-01-01
Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
Parallel Molecular Dynamics simulation: implementation of PVM for a lipid membrane
NASA Astrophysics Data System (ADS)
Fang, Zhiwu; Haymet, A. D. J.; Shinoda, Wataru; Okazaki, Susumu
1999-02-01
This paper describes a parallel algorithm for Molecular Dynamics simulation of a lipid membrane using the isothermal—isobaric ensemble. A message-passing paradigm is adopted for interprocessor communications using PVM3 (Parallel Virtual Machine). A data decomposition technique is employed for the parallelization of the calculation of intermolecular forces. The algorithm has been tested both on distributed memory architecture (DEC Alpha 500 workstation clusters) and shared memory architecture (SGI Powerchallenge with 20 R10000 processors) for a dipalmitoylphosphatidylcholine (DPPC) lipid bilayer consisting of 32 DPPC molecules and 928 water molecules. For each architecture, we measure the execution time with average work load, and the optimal number of processors for the current simulation. Some dynamical quantities are presented for a 2 ns simulation obtained with 5 processors on DEC Alpha 500 workstations. Our results show that the code is extremely efficient on 5-8 processors, and a useful addition to other major computational resources.
Simulation of reflooding on two parallel heated channel by TRACE
NASA Astrophysics Data System (ADS)
Zakir, Md. Ghulam
2016-07-01
In case of Loss-Of-Coolant accident (LOCA) in a Boiling Water Reactor (BWR), heat generated in the nuclear fuel is not adequately removed because of the decrease of the coolant mass flow rate in the reactor core. This fact leads to an increase of the fuel temperature that can cause damage to the core and leakage of the radioactive fission products. In order to reflood the core and to discontinue the increase of temperature, an Emergency Core Cooling System (ECCS) delivers water under this kind of conditions. This study is an investigation of how the power distribution between two channels can affect the process of reflooding when the emergency water is injected from the top of the channels. The peak cladding temperature (PCT) on LOCA transient for different axial level is determined as well. A thermal-hydraulic system code TRACE has been used. A TRACE model of the two heated channels has been developed, and three hypothetical cases with different power distributions have been studied. Later, a comparison between a simulated and experimental data has been shown as well.
Dependability analysis of parallel systems using a simulation-based approach. M.S. Thesis
NASA Technical Reports Server (NTRS)
Sawyer, Darren Charles
1994-01-01
The analysis of dependability in large, complex, parallel systems executing real applications or workloads is examined in this thesis. To effectively demonstrate the wide range of dependability problems that can be analyzed through simulation, the analysis of three case studies is presented. For each case, the organization of the simulation model used is outlined, and the results from simulated fault injection experiments are explained, showing the usefulness of this method in dependability modeling of large parallel systems. The simulation models are constructed using DEPEND and C++. Where possible, methods to increase dependability are derived from the experimental results. Another interesting facet of all three cases is the presence of some kind of workload of application executing in the simulation while faults are injected. This provides a completely new dimension to this type of study, not possible to model accurately with analytical approaches.
A conflict-free, path-level parallelization approach for sequential simulation algorithms
NASA Astrophysics Data System (ADS)
Rasera, Luiz Gustavo; Machado, Péricles Lopes; Costa, João Felipe C. L.
2015-07-01
Pixel-based simulation algorithms are the most widely used geostatistical technique for characterizing the spatial distribution of natural resources. However, sequential simulation does not scale well for stochastic simulation on very large grids, which are now commonly found in many petroleum, mining, and environmental studies. With the availability of multiple-processor computers, there is an opportunity to develop parallelization schemes for these algorithms to increase their performance and efficiency. Here we present a conflict-free, path-level parallelization strategy for sequential simulation. The method consists of partitioning the simulation grid into a set of groups of nodes and delegating all available processors for simulation of multiple groups of nodes concurrently. An automated classification procedure determines which groups are simulated in parallel according to their spatial arrangement in the simulation grid. The major advantage of this approach is that it does not require conflict resolution operations, and thus allows exact reproduction of results. Besides offering a large performance gain when compared to the traditional serial implementation, the method provides efficient use of computational resources and is generic enough to be adapted to several sequential algorithms.
A parallel finite volume algorithm for large-eddy simulation of turbulent flows
NASA Astrophysics Data System (ADS)
Bui, Trong Tri
1998-11-01
A parallel unstructured finite volume algorithm is developed for large-eddy simulation of compressible turbulent flows. Major components of the algorithm include piecewise linear least-square reconstruction of the unknown variables, trilinear finite element interpolation for the spatial coordinates, Roe flux difference splitting, and second-order MacCormack explicit time marching. The computer code is designed from the start to take full advantage of the additional computational capability provided by the current parallel computer systems. Parallel implementation is done using the message passing programming model and message passing libraries such as the Parallel Virtual Machine (PVM) and Message Passing Interface (MPI). The development of the numerical algorithm is presented in detail. The parallel strategy and issues regarding the implementation of a flow simulation code on the current generation of parallel machines are discussed. The results from parallel performance studies show that the algorithm is well suited for parallel computer systems that use the message passing programming model. Nearly perfect parallel speedup is obtained on MPP systems such as the Cray T3D and IBM SP2. Performance comparison with the older supercomputer systems such as the Cray YMP show that the simulations done on the parallel systems are approximately 10 to 30 times faster. The results of the accuracy and performance studies for the current algorithm are reported. To validate the flow simulation code, a number of Euler and Navier-Stokes simulations are done for internal duct flows. Inviscid Euler simulation of a very small amplitude acoustic wave interacting with a shock wave in a quasi-1D convergent-divergent nozzle shows that the algorithm is capable of simultaneously tracking the very small disturbances of the acoustic wave and capturing the shock wave. Navier-Stokes simulations are made for fully developed laminar flow in a square duct, developing laminar flow in a
High performance Python for direct numerical simulations of turbulent flows
NASA Astrophysics Data System (ADS)
Mortensen, Mikael; Langtangen, Hans Petter
2016-06-01
Direct Numerical Simulations (DNS) of the Navier Stokes equations is an invaluable research tool in fluid dynamics. Still, there are few publicly available research codes and, due to the heavy number crunching implied, available codes are usually written in low-level languages such as C/C++ or Fortran. In this paper we describe a pure scientific Python pseudo-spectral DNS code that nearly matches the performance of C++ for thousands of processors and billions of unknowns. We also describe a version optimized through Cython, that is found to match the speed of C++. The solvers are written from scratch in Python, both the mesh, the MPI domain decomposition, and the temporal integrators. The solvers have been verified and benchmarked on the Shaheen supercomputer at the KAUST supercomputing laboratory, and we are able to show very good scaling up to several thousand cores. A very important part of the implementation is the mesh decomposition (we implement both slab and pencil decompositions) and 3D parallel Fast Fourier Transforms (FFT). The mesh decomposition and FFT routines have been implemented in Python using serial FFT routines (either NumPy, pyFFTW or any other serial FFT module), NumPy array manipulations and with MPI communications handled by MPI for Python (mpi4py). We show how we are able to execute a 3D parallel FFT in Python for a slab mesh decomposition using 4 lines of compact Python code, for which the parallel performance on Shaheen is found to be slightly better than similar routines provided through the FFTW library. For a pencil mesh decomposition 7 lines of code is required to execute a transform.
NASA Technical Reports Server (NTRS)
Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)
1993-01-01
This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
Parallel simulation of tsunami inundation on a large-scale supercomputer
NASA Astrophysics Data System (ADS)
Oishi, Y.; Imamura, F.; Sugawara, D.
2013-12-01
An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the
NASA Astrophysics Data System (ADS)
Abe, S.; Place, D.; Mora, P.
2001-12-01
The particle based lattice solid model has been used successfully as a virtual laboratory to simulate the dynamics of faults, earthquakes and gouge processes. The phenomena investigated with the lattice solid model range from the stick-slip behavior of faults, localization phenomena in gouge and the evolution of stress correlation in multi-fault systems, to the influence of rate and state-dependent friction laws on the macroscopic behavior of faults. However, the results from those simulations also show that in order to make a next step towards more realistic simulations it will be necessary to use three-dimensional models containing a large number of particles with a range of sizes, thus requiring a significantly increased amount of computing resources. Whereas the computing power provided by a single processor can be expected to double every 18 to 24 months, parallel computers which provide hundreds of times the computing power are available today and there are several efforts underway to construct dedicated parallel computers and associated simulation software systems for large-scale earth science simulation (e.g. The Australian Computational Earth Systems Simulator[1] and Japanese Earth Simulator[2])". In order to use the computing power made available by those large parallel computers, a parallel version of the lattice solid model has been implemented. In order to guarantee portability over a wide range of computer architectures, a message passing approach based on MPI has been used in the implementation. Particular care has been taken to eliminate serial bottlenecks in the program, thus ensuring high scalability on systems with a large number of CPUs. Measures taken to achieve this objective include the use of asynchronous communication between the parallel processes and the minimization of communication with and work done by a central ``master'' process. Benchmarks using models with up to 6 million particles on a parallel computer with 128 CPUs show that the
Hendrickson, B.; Plimpton, S.; Attaway, S.; Swegle, J.
1996-09-01
Transient dynamics simulations are commonly used to model phenomena such as car crashes, underwater explosions, and the response of shipping containers to high-speed impacts. Physical objects in such a simulation are typically represented by Lagrangian meshes because the meshes can move and deform with the objects as they undergo stress. Fluids (gasoline, water) or fluid-like materials (earth) in the simulation can be modeled using the techniques of smoothed particle hydrodynamics. Implementing a hybrid mesh/particle model on a massively parallel computer poses several difficult challenges. One challenge is to simultaneously parallelize and load-balance both the mesh and particle portions of the computation. A second challenge is to efficiently detect the contacts that occur within the deforming mesh and between mesh elements and particles as the simulation proceeds. These contacts impart forces to the mesh elements and particles which must be computed at each timestep to accurately capture the physics of interest. In this paper we describe new parallel algorithms for smoothed particle hydrodynamics and contact detection which turn out to have several key features in common. Additionally, we describe how to join the new algorithms with traditional parallel finite element techniques to create an integrated particle/mesh transient dynamics simulation. Our approach to this problem differs from previous work in that we use three different parallel decompositions, a static one for the finite element analysis and dynamic ones for particles and for contact detection. We have implemented our ideas in a parallel version of the transient dynamics code PRONTO-3D and present results for the code running on a large Intel Paragon.
A parallel simulated annealing algorithm for standard cell placement on a hypercube computer
NASA Technical Reports Server (NTRS)
Jones, Mark Howard
1987-01-01
A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
Parallelization of ALICE simulation - a jump through the looking-glass
NASA Astrophysics Data System (ADS)
Tadel, Matevž; Carminati, Federico
2010-04-01
HEP computing is approaching the end of an era when simulation parallelization could be performed simply by running one instance of full simulation per core. The increasing number of cores and appearance of hardware-thread support both pose a severe limitation on memory and memory-bandwidth available to each execution unit. Typical simulation and reconstruction jobs of AliROOT (offline framework of the ALICE experiment at LHC) do not differ significantly in memory usage - but the input/output rate of reconstruction is approximately three times higher. This makes simulation a more natural candidate for parallelization, especially since the simulation code is relatively stable while the reconstruction code is not expected to settle until the detector is fully calibrated with real data and understood under stable running conditions. We have chosen to use multi-threading solution with one primary particle and all its secondaries being tracked by a given thread. This model corresponds well to Pb-Pb ion collision simulation where 60,000 primary particles need to be transported. After the MC processing of a primary particle is completed, the same thread also performs output serialization. Modifications of ROOT, AliROOT and GEANT3 that were required to perform this task are discussed. Performance of the parallelized version of simulation under varying running conditions is presented.
Co-ordination of directional overcurrent protection with load current for parallel feeders
Wright, J.W.; Lloyd, G.; Hindle, P.J.
1999-11-01
Directional phase overcurrent relays are commonly applied at the receiving ends of parallel feeders or transformer feeders. Their purpose is to ensure full discrimination of main or back-up power system overcurrent protection for a fault near the receiving end of one feeder. This paper reviews this type of relay application and highlights load current setting constraints for directional protection. Such constraints have not previously been publicized in well-known text books. A directional relay current setting constraint that is suggested in some text books is based purely on thermal rating considerations for older technology relays. This constraint may not exist with modern numerical relays. In the absence of any apparent constraint, there is a temptation to adopt lower current settings with modern directional relays in relation to reverse load current at the receiving ends of parallel feeders. This paper identifies the danger of adopting very low current settings without any special relay feature to ensure protection security with load current during power system faults. A system incident recorded by numerical relays is also offered to highlight this danger. In cases where there is a need to infringe the identified constraints an implemented and testing relaying technique is proposed.
Efficient parallel algorithm for statistical ion track simulations in crystalline materials
NASA Astrophysics Data System (ADS)
Jeon, Byoungseon; Grønbech-Jensen, Niels
2009-02-01
We present an efficient parallel algorithm for statistical Molecular Dynamics simulations of ion tracks in solids. The method is based on the Rare Event Enhanced Domain following Molecular Dynamics (REED-MD) algorithm, which has been successfully applied to studies of, e.g., ion implantation into crystalline semiconductor wafers. We discuss the strategies for parallelizing the method, and we settle on a host-client type polling scheme in which a multiple of asynchronous processors are continuously fed to the host, which, in turn, distributes the resulting feed-back information to the clients. This real-time feed-back consists of, e.g., cumulative damage information or statistics updates necessary for the cloning in the rare event algorithm. We finally demonstrate the algorithm for radiation effects in a nuclear oxide fuel, and we show the balanced parallel approach with high parallel efficiency in multiple processor configurations.
Pacheco, P; Miller, P; Kim, J; Leese, T; Zabiyaka, Y
2003-05-07
Object-oriented NeuroSys (ooNeuroSys) is a collection of programs for simulating very large networks of biologically accurate neurons on distributed memory parallel computers. It includes two principle programs: ooNeuroSys, a parallel program for solving the large systems of ordinary differential equations arising from the interconnected neurons, and Neurondiz, a parallel program for visualizing the results of ooNeuroSys. Both programs are designed to be run on clusters and use the MPI library to obtain parallelism. ooNeuroSys also includes an easy-to-use Python interface. This interface allows neuroscientists to quickly develop and test complex neuron models. Both ooNeuroSys and Neurondiz have a design that allows for both high performance and relative ease of maintenance.
NASA Astrophysics Data System (ADS)
Mizrah, E. A.; Tkachev, S. B.; Shtabel, N. V.
2015-10-01
Solar array simulators are nonlinear control systems designed to reproduce static and dynamic characteristics of solar array. Solar array characteristics depend on illumination, temperature, space environment and other causes. During on-earth testing of spacecraft power systems there is a problem reaching stable work of simulator with different impedance loads in wide range load regulation. In the article authors propose a research method for absolute process stability in solar array simulators and present results of absolute stability research for solar array simulator with continuous parallel type power amplifier.
Wake Encounter Analysis for a Closely Spaced Parallel Runway Paired Approach Simulation
NASA Technical Reports Server (NTRS)
Mckissick,Burnell T.; Rico-Cusi, Fernando J.; Murdoch, Jennifer; Oseguera-Lohr, Rosa M.; Stough, Harry P, III; O'Connor, Cornelius J.; Syed, Hazari I.
2009-01-01
A Monte Carlo simulation of simultaneous approaches performed by two transport category aircraft from the final approach fix to a pair of closely spaced parallel runways was conducted to explore the aft boundary of the safe zone in which separation assurance and wake avoidance are provided. The simulation included variations in runway centerline separation, initial longitudinal spacing of the aircraft, crosswind speed, and aircraft speed during the approach. The data from the simulation showed that the majority of the wake encounters occurred near or over the runway and the aft boundaries of the safe zones were identified for all simulation conditions.
Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik
2015-06-01
Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers-particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus. PMID:26575558
Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik
2015-06-01
Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers-particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus.
Gedney, S.D.
1990-12-01
The Parallel-Plate Bounded-Wave EMP Simulator is typically used to test the vulnerability of electronic systems to the electromagnetic pulse (EMP) produced by a high altitude nuclear burst by subjecting the systems to a simulated EMP environment. However, when large test objects are placed within the simulator for investigation, the desired EMP environment may be affected by the interaction between the simulator and the test object. This simulator/obstacle interaction can be attributed to the following phenomena: (1) mutual coupling between the test object and the simulator, (2) fringing effects due to the finite width of the conducting plates of the simulator, and (3) multiple reflections between the object and the simulator's tapered end-sections. When the interaction is significant, the measurement of currents coupled into the system may not accurately represent those induced by an actual EMP. To better understand the problem of simulator/obstacle interaction, a dynamic analysis of the fields within the parallel-plate simulator is presented. The fields are computed using a moment method solution based on a wire mesh approximation of the conducting surfaces of the simulator. The fields within an empty simulator are found to be predominately transversse electromagnetic (TEM) for frequencies within the simulator's bandwidth, properly simulating the properties of the EMP propagating in free space. However, when a large test object is placed within the simulator, it is found that the currents induced on the object can be quite different from those on an object situated in free space. A comprehensive study of the mechanisms contributing to this deviation is presented.
Besozzi, Daniela; Pescini, Dario; Mauri, Giancarlo
2014-01-01
Tau-leaping is a stochastic simulation algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of simulations, leading to high computational costs. Since each simulation can be executed independently from the others, a massive parallelization of tau-leaping can bring to relevant reductions of the overall running time. The emerging field of General Purpose Graphic Processing Units (GPGPU) provides power-efficient high-performance computing at a relatively low cost. In this work we introduce cuTauLeaping, a stochastic simulator of biological systems that makes use of GPGPU computing to execute multiple parallel tau-leaping simulations, by fully exploiting the Nvidia's Fermi GPU architecture. We show how a considerable computational speedup is achieved on GPU by partitioning the execution of tau-leaping into multiple separated phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms the CPU-based tau-leaping implementation when the number of parallel simulations increases, with a break-even directly depending on the size of the biological system and on the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a bidimensional parameter sweep analysis to study the oscillatory regimes in the Ras/cAMP/PKA pathway in S. cerevisiae. PMID:24663957
Thulasidasan, Sunil; Kasiviswanathan, Shiva; Eidenbenz, Stephan; Romero, Philip
2010-01-01
We re-examine the problem of load balancing in conservatively synchronized parallel, discrete-event simulations executed on high-performance computing clusters, focusing on simulations where computational and messaging load tend to be spatially clustered. Such domains are frequently characterized by the presence of geographic 'hot-spots' - regions that generate significantly more simulation events than others. Examples of such domains include simulation of urban regions, transportation networks and networks where interaction between entities is often constrained by physical proximity. Noting that in conservatively synchronized parallel simulations, the speed of execution of the simulation is determined by the slowest (i.e most heavily loaded) simulation process, we study different partitioning strategies in achieving equitable processor-load distribution in domains with spatially clustered load. In particular, we study the effectiveness of partitioning via spatial scattering to achieve optimal load balance. In this partitioning technique, nearby entities are explicitly assigned to different processors, thereby scattering the load across the cluster. This is motivated by two observations, namely, (i) since load is spatially clustered, spatial scattering should, intuitively, spread the load across the compute cluster, and (ii) in parallel simulations, equitable distribution of CPU load is a greater determinant of execution speed than message passing overhead. Through large-scale simulation experiments - both of abstracted and real simulation models - we observe that scatter partitioning, even with its greatly increased messaging overhead, significantly outperforms more conventional spatial partitioning techniques that seek to reduce messaging overhead. Further, even if hot-spots change over the course of the simulation, if the underlying feature of spatial clustering is retained, load continues to be balanced with spatial scattering leading us to the observation that
Xyce parallel electronic simulator reference guide, Version 6.0.1.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2014-01-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1] .
Direct and Inverse Kinematics of a Novel Tip-Tilt-Piston Parallel Manipulator
NASA Technical Reports Server (NTRS)
Tahmasebi, Farhad
2004-01-01
Closed-form direct and inverse kinematics of a new three degree-of-freedom (DOF) parallel manipulator with inextensible limbs and base-mounted actuators are presented. The manipulator has higher resolution and precision than the existing three DOF mechanisms with extensible limbs. Since all of the manipulator actuators are base-mounted; higher payload capacity, smaller actuator sizes, and lower power dissipation can be obtained. The manipulator is suitable for alignment applications where only tip, tilt, and piston motions are significant. The direct kinematics of the manipulator is reduced to solving an eighth-degree polynomial in the square of tangent of half-angle between one of the limbs and the base plane. Hence, there are at most 16 assembly configurations for the manipulator. In addition, it is shown that the 16 solutions are eight pairs of reflected configurations with respect to the base plane. Numerical examples for the direct and inverse kinematics of the manipulator are also presented.
Imaging modes for direct electron detection in TEM with column parallel CCD
NASA Astrophysics Data System (ADS)
Moldovan, Grigore; Jeffery, Ben; Nomerotski, Andrei; Kirkland, Angus
2009-08-01
Electron imaging detectors have become the main limiting factor in transmission electron microscopy (TEM). A transition is now being made from indirect scintillator-coupled cameras to directly exposed detectors, which propose imaging modes that are novel in TEM. This work uses a dataset recoded with a directly exposed column parallel charge-coupled-device (CCD) to characterize modulation transfer and detective quantum efficiency of integrating, binary and counting imaging modes. Results presented here demonstrate that counting mode produces final images with largest contrast and highest efficiency because it takes into account the large lateral displacement of beam electrons in the detector. Counting imaging mode is recommended in TEM to take advantage from the higher sensitivity of directly exposed detectors.
Massively parallel simulation of flow and transport in variably saturated porous and fractured media
Wu, Yu-Shu; Zhang, Keni; Pruess, Karsten
2002-01-15
This paper describes a massively parallel simulation method and its application for modeling multiphase flow and multicomponent transport in porous and fractured reservoirs. The parallel-computing method has been implemented into the TOUGH2 code and its numerical performance is tested on a Cray T3E-900 and IBM SP. The efficiency and robustness of the parallel-computing algorithm are demonstrated by completing two simulations with more than one million gridblocks, using site-specific data obtained from a site-characterization study. The first application involves the development of a three-dimensional numerical model for flow in the unsaturated zone of Yucca Mountain, Nevada. The second application is the study of tracer/radionuclide transport through fracture-matrix rocks for the same site. The parallel-computing technique enhances modeling capabilities by achieving several-orders-of-magnitude speedup for large-scale and high resolution modeling studies. The resulting modeling results provide many new insights into flow and transport processes that could not be obtained from simulations using the single-CPU simulator.
On parallel random number generation for accelerating simulations of communication systems
NASA Astrophysics Data System (ADS)
Brugger, C.; Weithoffer, S.; de Schryver, C.; Wasenmüller, U.; Wehn, N.
2014-11-01
Powerful compute clusters and multi-core systems have become widely available in research and industry nowadays. This boost in utilizable computational power tempts people to run compute-intensive tasks on those clusters, either for speed or accuracy reasons. Especially Monte Carlo simulations with their inherent parallelism promise very high speedups. Nevertheless, the quality of Monte Carlo simulations strongly depends on the quality of the employed random numbers. In this work we present a comprehensive analysis of state-of-the-art pseudo random number generators like the MT19937 or the WELL generator used for parallel stream generation in different settings. These random number generators can be realized in hardware as well as in software and help to accelerate the analysis (or simulation) of communications systems. We show that it is possible to generate high-quality parallel random number streams with both generators, as long as some configuration constraints are met. We furthermore depict that distributed simulations with those generator types are viable even to very high degrees of parallelism.
Accelerating Markov chain Monte Carlo simulation through sequential updating and parallel computing
NASA Astrophysics Data System (ADS)
Ren, Ruichao
Monte Carlo simulation is a statistical sampling method used in studies of physical systems with properties that cannot be easily obtained analytically. The phase behavior of the Restricted Primitive Model of electrolyte solutions on the simple cubic lattice is studied using grand canonical Monte Carlo simulations and finite-size scaling techniques. The transition between disordered and ordered, NaCl-like structures is continuous, second-order at high temperatures and discrete, first-order at low temperatures. The line of continuous transitions meets the line of first-order transitions at a tricritical point. A new algorithm-Random Skipping Sequential (RSS) Monte Carl---is proposed, justified and shown analytically to have better mobility over the phase space than the conventional Metropolis algorithm satisfying strict detailed balance. The new algorithm employs sequential updating, and yields greatly enhanced sampling statistics than the Metropolis algorithm with random updating. A parallel version of Markov chain theory is introduced and applied in accelerating Monte Carlo simulation via cluster computing. It is shown that sequential updating is the key to reduce the inter-processor communication or synchronization which slows down parallel simulation with increasing number of processors. Parallel simulation results for the two-dimensional lattice gas model show substantial reduction of simulation time by the new method for systems of large and moderate sizes.
A parallel finite element simulator for ion transport through three-dimensional ion channel systems.
Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo
2013-09-15
A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain to get the surface and volume mesh for ion channel systems, which consists of a set of mesh generation tools. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG) developed by one of the authors, which provides the capability of doing large scale parallel computations with high parallel efficiency and the flexibility of choosing high order elements to achieve high order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I - V curve, with which both primitive and transformed PNP equations are studied and their numerical performances are compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage dependent anion channel (VDAC) and α-Hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite size effects can be included in PNP model now, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL. It is shown that the size effects in SMPNP can
Electro-optic directed XOR logic circuits based on parallel-cascaded micro-ring resonators.
Tian, Yonghui; Zhao, Yongpeng; Chen, Wenjie; Guo, Anqi; Li, Dezhao; Zhao, Guolin; Liu, Zilong; Xiao, Huifu; Liu, Guipeng; Yang, Jianhong
2015-10-01
We report an electro-optic photonic integrated circuit which can perform the exclusive (XOR) logic operation based on two silicon parallel-cascaded microring resonators (MRRs) fabricated on the silicon-on-insulator (SOI) platform. PIN diodes embedded around MRRs are employed to achieve the carrier injection modulation. Two electrical pulse sequences regarded as two operands of operations are applied to PIN diodes to modulate two MRRs through the free carrier dispersion effect. The final operation result of two operands is output at the Output port in the form of light. The scattering matrix method is employed to establish numerical model of the device, and numerical simulator SG-framework is used to simulate the electrical characteristics of the PIN diodes. XOR operation with the speed of 100Mbps is demonstrated successfully.
Tan, Jiubin; Shan, Mingguang; Zhao, Chenguang; Liu, Jian
2008-04-01
Diffractive microlens arrays with continuous relief are designed, fabricated, and characterized by using Fermat's principle to create an array of spots on the photoresist-coated surface of a substrate for parallel laser direct writing. Experimental results indicate that a diffraction efficiency of 71.4% and a spot size of 1.97 microm (FWHM) can be achieved at normal incidence and a writing laser wavelength of 441.6 nm with an array of F/4 fabricated on fused silica, and the developed array can be used to improve the utilization ratio of writing laser energy. PMID:18382568
Berlicka, Anna; Białek, Michał J; Latos-Grażyński, Lechosław
2016-09-01
In the search of porphyrin arrays with a unique geometry, the efficient synthesis of a directly linked 21-carba-23-thiaporphyrin dimer with the distinctive dihydrofulvalene bridging motif has been developed. This compound acquires an uncommon parallel-displaced arrangement of two carbaporphyrin planes. The dimer undergoes an acid-triggered cleavage to create of the asymmetric carbathiaporphyrin-carbathiachlorin dyad or 2,3-dihalo-21-carba-23-thiachlorin depending on choice of acid. A formation of a reactive carbocation intermediate is postulated to account for mechanism of cleavage. PMID:27530897
Direct numerical simulation of active fiber composite
NASA Astrophysics Data System (ADS)
Kim, Seung J.; Hwang, Joon S.; Paik, Seung H.
2003-08-01
Active Fiber Composites (AFC) possess desirable characteristics for smart structure applications. One major advantage of AFC is the ability to create anisotropic laminate layers useful in applications requiring off-axis or twisting motions. AFC is naturally composed of two different constituents: piezoelectric fiber and matrix. Therefore, homogenization method, which is utilized in the analysis of laminated composite material, has been used to characterize the material properties. Using this approach, the global behaviors of the structures are predicted in an averaged sense. However, this approach has intrinsic limitations in describing the local behaviors in the level of the constituents. Actually, the failure analysis of AFC requires the knowledge of the local behaviors. Therefore, microscopic approach is necessary to predict the behaviors of AFC. In this work, a microscopic approach for the analysis of AFC was performed. Piezoelectric fiber and matrix were modeled separately and finite element method using three-dimensional solid elements was utilized. Because fine mesh is essential, high performance computing technology was applied to the solution of the immense degree-of-freedom problem. This approach is called Direct Numerical Simulation (DNS) of structure. Through the DNS of AFC, local stress distribution around the interface of fiber and matrix was analyzed.
Direct Numerical Simulation of Cell Printing
NASA Astrophysics Data System (ADS)
Qiao, Rui; He, Ping
2010-11-01
Structural cell printing, i.e., printing three dimensional (3D) structures of cells held in a tissue matrix, is gaining significant attention in the biomedical community. The key idea is to use desktop printer or similar devices to print cells into 3D patterns with a resolution comparable to the size of mammalian cells, similar to that in living organs. Achieving such a resolution in vitro can lead to breakthroughs in areas such as organ transplantation and understanding of cell-cell interactions in truly 3D spaces. Although the feasibility of cell printing has been demonstrated in the recent years, the printing resolution and cell viability remain to be improved. In this work, we investigate one of the unit operations in cell printing, namely, the impact of a cell-laden droplet into a pool of highly viscous liquids using direct numerical simulations. The dynamics of droplet impact (e.g., crater formation and droplet spreading and penetration) and the evolution of cell shape and internal stress are quantified in details.
Domel, N.D.; Thompson, D.S. )
1991-01-01
The effect of shock impingement on the mixing and combustion of a reacting shear-layer is numerically simulated. Hydrogen fuel is injected at sonic velocity behind a backward facing step in a direction parallel to a supersonic freestream vitiated with H{sub 2}O. The two-dimensional Navier-Stokes equations are solved and explicitly coupled to a chemistry package employing a global, two-step combustion model. The results show that shock impingement enhances the mixing and combustion. 17 refs.
Parallel peridynamics-SPH simulation of explosion induced soil fragmentation by using OpenMP
NASA Astrophysics Data System (ADS)
Fan, Houfu; Li, Shaofan
2016-06-01
In this work, we use the OpenMP-based shared-memory parallel programming to implement the recently developed coupling method of state-based peridynamics and smoothed particle hydrodynamics (PD-SPH), and we then employ the program to simulate dynamic soil fragmentation induced by the explosion of the buried explosives. The paper offers detailed technical description and discussion on the PD-SHP coupling algorithm and how to use the OpenMP shared-memory programming to implement such large-scale computation in a desktop environment, with an example to illustrate the basic computing principle and the parallel algorithm structure. In specific, the paper provides a complete OpenMP parallel algorithm for the PD-SPH scheme with the programming and parallelization details. Numerical examples of soil fragmentation caused by the buried explosives are also presented. Results show that the simulation carried out by the OpenMP parallel code is much faster than that by the corresponding serial computer code.
Three-dimensional direct particle simulation on the Connection Machine
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1991-01-01
This paper presents the algorithms necessary for an efficient data parallel implementation of a 3D particle simulation. In particular, a general master/slave algorithm and a fast sorting algorithm are described and the use of these algorithms in a particle simulation is outlined. A particle simulation using these algorithms has been implemented on a 32768 processor Connection Machine that is capable of simulating over 30 million particles at an average rate of 2.4-microsec/particle/step. Results are presented from the simulation of flow over an Aeroassisted Flight Experiment geometry at 100 km altitude.
NASA Astrophysics Data System (ADS)
Stupl, J.; Faber, N.; Foster, C.; Yang, F.; Nelson, B.; Aziz, J.; Nuttall, A.; Henze, C.; Levit, C.
2014-09-01
This paper provides an updated efficiency analysis of the LightForce space debris collision avoidance scheme. LightForce aims to prevent collisions on warning by utilizing photon pressure from ground based, commercial off the shelf lasers. Past research has proven that a few ground-based systems consisting of 10 kW class lasers directed by 1.5 m telescopes with adaptive optics could lower the expected number of collisions in Low Earth Orbit (LEO) by an order of magnitude. Our simulation approach utilizes the entire Two Line Element (TLE) catalogue in LEO for a given day as initial input. Least-squares fitting of a TLE time series is used for an improved orbit estimate. We then calculate the probability of collision for all LEO objects in the catalogue for a time step of the simulation. The conjunctions that exceed a threshold probability of collision are then engaged by a simulated network of laser ground stations. After those engagements, the perturbed orbits are used to re-assess the probability of collision and evaluate the efficiency. This paper describes new simulations with three updated aspects: 1) By utilizing a highly parallel simulation approach employing hundreds of processors, we have extended our analysis to a much broader dataset. The simulation time is extended to one year. 2) We analyze not only the efficiency of LightForce on conjunctions that naturally occur, but also take into account conjunctions caused by orbit perturbations due to LightForce engagements. 3) We use a new simulation approach that is regularly updating the LightForce engagement strategy, as it would be during actual operations. In this paper we present both our simulation approach to parallelize the efficiency analysis, its computational performance and the resulting expected efficiency of the LightForce collision avoidance system.
Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems
BAER,THOMAS A.; SACKINGER,PHILIP A.; SUBIA,SAMUEL R.
1999-10-14
Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact fines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a ''pseudo-solid'' mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to equally distribute computational work among a SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel coquting environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speed ups for fixed problem size, a class of problems of immediate practical importance.
Parallel Grand Canonical Monte Carlo (ParaGrandMC) Simulation Code
NASA Technical Reports Server (NTRS)
Yamakov, Vesselin I.
2016-01-01
This report provides an overview of the Parallel Grand Canonical Monte Carlo (ParaGrandMC) simulation code. This is a highly scalable parallel FORTRAN code for simulating the thermodynamic evolution of metal alloy systems at the atomic level, and predicting the thermodynamic state, phase diagram, chemical composition and mechanical properties. The code is designed to simulate multi-component alloy systems, predict solid-state phase transformations such as austenite-martensite transformations, precipitate formation, recrystallization, capillary effects at interfaces, surface absorption, etc., which can aid the design of novel metallic alloys. While the software is mainly tailored for modeling metal alloys, it can also be used for other types of solid-state systems, and to some degree for liquid or gaseous systems, including multiphase systems forming solid-liquid-gas interfaces.
NASA Astrophysics Data System (ADS)
Wang, Shyh-Wei; Guo, Shuang-Fa
1998-07-01
A stepwise Boltzmann transport equation (BTE) simulation using non-uniform energy grid momentum matrix and exact nuclear scattering cross-section is successfully parallelized to simulate the ion implantation of multi-component targets. Assuming that the interactions of ion with different target atoms are independent, the scattering of ions with different components can be calculated concurrently by different processors. It is developed on CONVEX SPP-1000 and the software environment of parallel virtual machine (PVM) with a master-slave paradigm. A speedup of 3.3 has been obtained for the simulation of As ions implanted into AZ1350 (C6.2H6O1N0.15S0.06) which is composed of five components. In addition, our new scheme gives better agreement with the experimental results for heavy ion implantation than the conventional method using a uniform energy grid and approximated scattering function.
Parallel Simulation Algorithms for the Three Dimensional Strong-Strong Beam-Beam Interaction
Kabel, A.C.; /SLAC
2008-03-17
The strong-strong beam-beam effect is one of the most important effects limiting the luminosity of ring colliders. Little is known about it analytically, so most studies utilize numeric simulations. The two-dimensional realm is readily accessible to workstation-class computers (cf.,e.g.,[1, 2]), while three dimensions, which add effects such as phase averaging and the hourglass effect, require vastly higher amounts of CPU time. Thus, parallelization of three-dimensional simulation techniques is imperative; in the following we discuss parallelization strategies and describe the algorithms used in our simulation code, which will reach almost linear scaling of performance vs. number of CPUs for typical setups.
Progress on the Multiphysics Capabilities of the Parallel Electromagnetic ACE3P Simulation Suite
Kononenko, Oleksiy
2015-03-26
ACE3P is a 3D parallel simulation suite that is being developed at SLAC National Accelerator Laboratory. Effectively utilizing supercomputer resources, ACE3P has become a key tool for the coupled electromagnetic, thermal and mechanical research and design of particle accelerators. Based on the existing finite-element infrastructure, a massively parallel eigensolver is developed for modal analysis of mechanical structures. It complements a set of the multiphysics tools in ACE3P and, in particular, can be used for the comprehensive study of microphonics in accelerating cavities ensuring the operational reliability of a particle accelerator.
Parallel simulations of Grover's algorithm for closest match search in neutron monitor data
NASA Astrophysics Data System (ADS)
Kussainov, Arman; White, Yelena
We are studying the parallel implementations of Grover's closest match search algorithm for neutron monitor data analysis. This includes data formatting, and matching quantum parameters to a conventional structure of a chosen programming language and selected experimental data type. We have employed several workload distribution models based on acquired data and search parameters. As a result of these simulations, we have an understanding of potential problems that may arise during configuration of real quantum computational devices and the way they could run tasks in parallel. The work was supported by the Science Committee of the Ministry of Science and Education of the Republic of Kazakhstan Grant #2532/GF3.
Mesoscale Simulations of Particulate Flows with Parallel Distributed Lagrange Multiplier Technique
Kanarska, Y
2010-03-24
Fluid particulate flows are common phenomena in nature and industry. Modeling of such flows at micro and macro levels as well establishing relationships between these approaches are needed to understand properties of the particulate matter. We propose a computational technique based on the direct numerical simulation of the particulate flows. The numerical method is based on the distributed Lagrange multiplier technique following the ideas of Glowinski et al. (1999). Each particle is explicitly resolved on an Eulerian grid as a separate domain, using solid volume fractions. The fluid equations are solved through the entire computational domain, however, Lagrange multiplier constrains are applied inside the particle domain such that the fluid within any volume associated with a solid particle moves as an incompressible rigid body. Mutual forces for the fluid-particle interactions are internal to the system. Particles interact with the fluid via fluid dynamic equations, resulting in implicit fluid-rigid-body coupling relations that produce realistic fluid flow around the particles (i.e., no-slip boundary conditions). The particle-particle interactions are implemented using explicit force-displacement interactions for frictional inelastic particles similar to the DEM method of Cundall et al. (1979) with some modifications using a volume of an overlapping region as an input to the contact forces. The method is flexible enough to handle arbitrary particle shapes and size distributions. A parallel implementation of the method is based on the SAMRAI (Structured Adaptive Mesh Refinement Application Infrastructure) library, which allows handling of large amounts of rigid particles and enables local grid refinement. Accuracy and convergence of the presented method has been tested against known solutions for a falling sphere as well as by examining fluid flows through stationary particle beds (periodic and cubic packing). To evaluate code performance and validate particle
A method for data handling numerical results in parallel OpenFOAM simulations
Anton, Alin; Muntean, Sebastian
2015-12-31
Parallel computational fluid dynamics simulations produce vast amount of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM toolkit{sup ®}[1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large scale simulation results than the regular algorithms.
NASA Technical Reports Server (NTRS)
Lyons, Daniel T.; Desai, Prasun N.
2005-01-01
This paper will describe the Entry, Descent and Landing simulation tradeoffs and techniques that were used to provide the Monte Carlo data required to approve entry during a critical period just before entry of the Genesis Sample Return Capsule. The same techniques will be used again when Stardust returns on January 15, 2006. Only one hour was available for the simulation which propagated 2000 dispersed entry states to the ground. Creative simulation tradeoffs combined with parallel processing were needed to provide the landing footprint statistics that were an essential part of the Go/NoGo decision that authorized release of the Sample Return Capsule a few hours before entry.
Design of a real-time wind turbine simulator using a custom parallel architecture
NASA Technical Reports Server (NTRS)
Hoffman, John A.; Gluck, R.; Sridhar, S.
1995-01-01
The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named: the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is very much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an IO operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/ output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors which usually have a throughput limit because of rigid bus architecture.
Application of parallel computing techniques to a large-scale reservoir simulation
Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten
2001-02-01
Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from intensive computational requirement for detailed modeling investigations of real-world reservoirs. This paper presents the application of a massive parallel-computing version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of the future flow conditions at the site, aiding in the assessment of proposed repository performance.
SPILADY: A parallel CPU and GPU code for spin-lattice magnetic molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Ma, Pui-Wai; Dudarev, S. L.; Woo, C. H.
2016-10-01
Spin-lattice dynamics generalizes molecular dynamics to magnetic materials, where dynamic variables describing an evolving atomic system include not only coordinates and velocities of atoms but also directions and magnitudes of atomic magnetic moments (spins). Spin-lattice dynamics simulates the collective time evolution of spins and atoms, taking into account the effect of non-collinear magnetism on interatomic forces. Applications of the method include atomistic models for defects, dislocations and surfaces in magnetic materials, thermally activated diffusion of defects, magnetic phase transitions, and various magnetic and lattice relaxation phenomena. Spin-lattice dynamics retains all the capabilities of molecular dynamics, adding to them the treatment of non-collinear magnetic degrees of freedom. The spin-lattice dynamics time integration algorithm uses symplectic Suzuki-Trotter decomposition of atomic coordinate, velocity and spin evolution operators, and delivers highly accurate numerical solutions of dynamic evolution equations over extended intervals of time. The code is parallelized in coordinate and spin spaces, and is written in OpenMP C/C++ for CPU and in CUDA C/C++ for Nvidia GPU implementations. Temperatures of atoms and spins are controlled by Langevin thermostats. Conduction electrons are treated by coupling the discrete spin-lattice dynamics equations for atoms and spins to the heat transfer equation for the electrons. Worked examples include simulations of thermalization of ferromagnetic bcc iron, the dynamics of laser pulse demagnetization, and collision cascades. Catalogue identifier: AFAN_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AFAN_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Apache License, Version 2.0 No. of lines in distributed program, including test data, etc.: 1611165 No. of bytes in distributed program, including test data, etc.: 367246683
Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN
NASA Astrophysics Data System (ADS)
Hammond, G. E.; Lichtner, P. C.; Mills, R. T.
2014-01-01
To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted.
Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.
2001-08-31
This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented into the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-one-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce better resolution results and reveal some flow patterns that cannot be obtained using coarse-grid modeling models.
NASA Astrophysics Data System (ADS)
Jaure, S.; Duchaine, F.; Staffelbach, G.; Gicquel, L. Y. M.
2013-01-01
Optimizing gas turbines is a complex multi-physical and multi-component problem that has long been based on expensive experiments. Today, computer simulation can reduce design process costs and is acknowledged as a promising path for optimization. However, performing such computations using high-fidelity methods such as a large eddy simulation (LES) on gas turbines is challenging. Nevertheless, such simulations become accessible for specific components of gas turbines. These stand-alone simulations face a new challenge: to improve the quality of the results, new physics must be introduced. Therefore, an efficient massively parallel coupling methodology is investigated. The flow solver modeling relies on the LES code AVBP which has already been ported on massively parallel architectures. The conduction solver is based on the same data structure and thus shares its scalability. Accurately coupling these solvers while maintaining their scalability is challenging and is the actual objective of this work. To obtain such goals, a methodology is proposed and different key issues to code the coupling are addressed: convergence, stability, parallel geometry mapping, transfers and interpolation. This methodology is then applied to a real burner configuration, hence demonstrating the possibilities and limitations of the solution.
Lee, Anthony; Yau, Christopher; Giles, Michael B; Doucet, Arnaud; Holmes, Christopher C
2010-12-01
We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we nd speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276
Long-time atomistic simulations with the Parallel Replica Dynamics method
NASA Astrophysics Data System (ADS)
Perez, Danny
Molecular Dynamics (MD) -- the numerical integration of atomistic equations of motion -- is a workhorse of computational materials science. Indeed, MD can in principle be used to obtain any thermodynamic or kinetic quantity, without introducing any approximation or assumptions beyond the adequacy of the interaction potential. It is therefore an extremely powerful and flexible tool to study materials with atomistic spatio-temporal resolution. These enviable qualities however come at a steep computational price, hence limiting the system sizes and simulation times that can be achieved in practice. While the size limitation can be efficiently addressed with massively parallel implementations of MD based on spatial decomposition strategies, allowing for the simulation of trillions of atoms, the same approach usually cannot extend the timescales much beyond microseconds. In this article, we discuss an alternative parallel-in-time approach, the Parallel Replica Dynamics (ParRep) method, that aims at addressing the timescale limitation of MD for systems that evolve through rare state-to-state transitions. We review the formal underpinnings of the method and demonstrate that it can provide arbitrarily accurate results for any definition of the states. When an adequate definition of the states is available, ParRep can simulate trajectories with a parallel speedup approaching the number of replicas used. We demonstrate the usefulness of ParRep by presenting different examples of materials simulations where access to long timescales was essential to access the physical regime of interest and discuss practical considerations that must be addressed to carry out these simulations. Work supported by the United States Department of Energy (U.S. DOE), Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division.
Application of Parallel Discrete Event Simulation to the Space Surveillance Network
NASA Astrophysics Data System (ADS)
Jefferson, D.; Leek, J.
2010-09-01
In this paper we describe how and why we chose parallel discrete event simulation (PDES) as the paradigm for modeling the Space Surveillance Network (SSN) in our modeling framework, TESSA (Testbed Environment for Space Situational Awareness). DES is a simulation paradigm appropriate for systems dominated by discontinuous state changes at times that must be calculated dynamically. It is used primarily for complex man-made systems like telecommunications, vehicular traffic, computer networks, economic models etc., although it is also useful for natural systems that are not described by equations, such as particle systems, population dynamics, epidemics, and combat models. It is much less well known than simple time-stepped simulation methods, but has the great advantage of being time scale independent, so that one can freely mix processes that operate at time scales over many orders of magnitude with no runtime performance penalty. In simulating the SSN we model in some detail: (a) the orbital dynamics of up to 105 objects, (b) their reflective properties, (c) the ground- and space-based sensor systems in the SSN, (d) the recognition of orbiting objects and determination of their orbits, (e) the cueing and scheduling of sensor observations, (f) the 3-d structure of satellites, and (g) the generation of collision debris. TESSA is thus a mixed continuous-discrete model. But because many different types of discrete objects are involved with such a wide variation in time scale (milliseconds for collisions, hours for orbital periods) it is suitably described using discrete events. The PDES paradigm is surprising and unusual. In any instantaneous runtime snapshot some parts my be far ahead in simulation time while others lag behind, yet the required causal relationships are always maintained and synchronized correctly, exactly as if the simulation were executed sequentially. The TESSA simulator is custom-built, conservatively synchronized, and designed to scale to
Simulation and Gaming: Directions, Issues, Ponderables.
ERIC Educational Resources Information Center
Uretsky, Michael
1995-01-01
Discusses the current use of simulation and gaming in a variety of settings. Describes advances in technology that facilitate the use of simulation and gaming, including computer power, computer networks, software, object-oriented programming, video, multimedia, virtual reality, and artificial intelligence. Considers the future use of simulation…
Numerical simulations of blast-impact problems using the direct simulation Monte Carlo method
NASA Astrophysics Data System (ADS)
Sharma, Anupam
There is an increasing need to design protective structures that can withstand or mitigate the impulsive loading due to the impact of a blast or a shock wave. A preliminary step in designing such structures is the prediction of the pressure loading on the structure. This is called the "load definition." This thesis is focused on a numerical approach to predict the load definition on arbitrary geometries for a given strength of the incident blast/shock wave. A particle approach, namely the Direct Simulation Monte Carlo (DSMC) method, is used as the numerical model. A three-dimensional, time-accurate DSMC flow solver is developed as a part of this study. Embedded surfaces, modeled as triangulations, are used to represent arbitrary-shaped structures. Several techniques to improve the computational efficiency of the algorithm of particle-structure interaction are presented. The code is designed using the Object Oriented Programming (OOP) paradigm. Domain decomposition with message passing is used to solve large problems in parallel. The solver is extensively validated against analytical results and against experiments. Two kinds of geometries, a box and an I-shaped beam are investigated for blast impact. These simulations are performed in both two- and three-dimensions. A major portion of the thesis is dedicated to studying the uncoupled fluid dynamics problem where the structure is assumed to remain stationary and intact during the simulation. A coupled, fluid-structure dynamics problem is solved in one spatial dimension using a simple, spring-mass-damper system to model the dynamics of the structure. A parametric study, by varying the mass, spring constant, and the damping coefficient, to study their effect on the loading and the displacement of the structure is also performed. Finally, the parallel performance of the solver is reported for three sample-size problems on two Beowulf clusters.
Wendel, D. E.; Olson, D. K.; Hesse, M.; Kuznetsova, M.; Adrian, M. L.; Aunai, N.; Karimabadi, H.; Daughton, W.
2013-12-15
We investigate the distribution of parallel electric fields and their relationship to the location and rate of magnetic reconnection in a large particle-in-cell simulation of 3D turbulent magnetic reconnection with open boundary conditions. The simulation's guide field geometry inhibits the formation of simple topological features such as null points. Therefore, we derive the location of potential changes in magnetic connectivity by finding the field lines that experience a large relative change between their endpoints, i.e., the quasi-separatrix layer. We find a good correspondence between the locus of changes in magnetic connectivity or the quasi-separatrix layer and the map of large gradients in the integrated parallel electric field (or quasi-potential). Furthermore, we investigate the distribution of the parallel electric field along the reconnecting field lines. We find the reconnection rate is controlled by only the low-amplitude, zeroth and first–order trends in the parallel electric field while the contribution from fluctuations of the parallel electric field, such as electron holes, is negligible. The results impact the determination of reconnection sites and reconnection rates in models and in situ spacecraft observations of 3D turbulent reconnection. It is difficult through direct observation to isolate the loci of the reconnection parallel electric field amidst the large amplitude fluctuations. However, we demonstrate that a positive slope of the running sum of the parallel electric field along the field line as a function of field line length indicates where reconnection is occurring along the field line.
Mahinthakumar, G.; Saied, F.; Valocchi, A.J.
1997-03-01
Some popular iterative solvers for non-symmetric systems arising from the finite-element discretization of three-dimensional groundwater contaminant transport problem are implemented and compared on distributed memory parallel platforms. This paper attempts to determine which solvers are most suitable for the contaminant transport problem under varied conditions for large scale simulations on distributed parallel platforms. The original parallel implementation was targeted for the 1024 node Intel paragon platform using explicit message passing with the NX library. This code was then ported to SGI Power Challenge Array, Convex Exemplar, and Origin 2000 machines using an MPI implementation. The performance of these solvers is studied for increasing problem size, roughness of the coefficients, and selected problem scenarios. These conditions affect the properties of the matrix and hence the difficulty level of the solution process. Performance is analyzed in terms of convergence behavior, overall time, parallel efficiency, and scalability. The solvers that are presented are BiCGSTAB, GMRES, ORTHOMIN, and CGS. A simple diagonal preconditioner is used in this parallel implementation for all the methods. The results indicate that all methods are comparable in performance with BiCGSTAB slightly outperforming the other methods for most problems. The authors achieved very good scalability in all the methods up to 1024 processors of the Intel Paragon XPS/150. They demonstrate scalability by solving 100 time steps of a 40 million element problem in about 5 minutes using either BiCGSTAB or GMRES.
Relevance of the parallel nonlinearity in gyrokinetic simulations of tokamak plasmas
Candy, J.; Waltz, R. E.; Parker, S. E.; Chen, Y.
2006-07-15
The influence of the parallel nonlinearity on transport in gyrokinetic simulations is assessed for values of {rho}{sub *} which are typical of current experiments. Here, {rho}{sub *}={rho}{sub s}/a is the ratio of gyroradius, {rho}{sub s}, to plasma minor radius, a. The conclusion, derived from simulations with both GYRO [J. Candy and R. E. Waltz, J. Comput. Phys., 186, 585 (2003)] and GEM [Y. Chen and S. E. Parker J. Comput. Phys., 189, 463 (2003)] is that no measurable effect of the parallel nonlinearity is apparent for {rho}{sub *}<0.012. This result is consistent with scaling arguments, which suggest that the parallel nonlinearity should be O({rho}{sub *}) smaller than the ExB nonlinearity. Indeed, for the plasma parameters under consideration, the magnitude of the parallel nonlinearity is a factor of 8{rho}{sub *} smaller (for 0.000 75<{rho}{sub *}<0.012) than the other retained terms in the nonlinear gyrokinetic equation.
A scalable parallel algorithm for large-scale reactive force-field molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Nomura, Ken-ichi; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya
2008-01-01
A scalable parallel algorithm has been designed to perform multimillion-atom molecular dynamics (MD) simulations, in which first principles-based reactive force fields (ReaxFF) describe chemical reactions. Environment-dependent bond orders associated with atomic pairs and their derivatives are reused extensively with the aid of linked-list cells to minimize the computation associated with atomic n-tuple interactions ( n⩽4 explicitly and ⩽6 due to chain-rule differentiation). These n-tuple computations are made modular, so that they can be reconfigured effectively with a multiple time-step integrator to further reduce the computation time. Atomic charges are updated dynamically with an electronegativity equalization method, by iteratively minimizing the electrostatic energy with the charge-neutrality constraint. The ReaxFF-MD simulation algorithm has been implemented on parallel computers based on a spatial decomposition scheme combined with distributed n-tuple data structures. The measured parallel efficiency of the parallel ReaxFF-MD algorithm is 0.998 on 131,072 IBM BlueGene/L processors for a 1.01 billion-atom RDX system.
Direct simulation Monte Carlo simulations of aerodynamic effects on sounding rockets
NASA Astrophysics Data System (ADS)
Allen, Jeffrey B.
Over the past several decades, atomic oxygen (AO) measurements taken from sounding rocket sensor payloads in the Mesosphere and lower Thermosphere (MALT) have shown marked variability. AO data retrieved from the second Coupling of Dynamics and Aurora (CODA II) experiment has shown that the data is highly dependent upon rocket orientation. Many sounding rocket payloads, including CODA II, contain AO sensors that are located in close proximity to the payload surface and are thus significantly influenced by compressible, aerodynamic effects. In addition, other external effects such as Doppler shift and the contamination of sensor optics from desorption may play a significant role. These effects serve to inhibit the AO sensors' ability to accurately determine undisturbed atmospheric conditions. The present research numerically models the influence caused by these effects (primarily aerodynamic), using the direct simulation Monte Carlo (DSMC) method. In particular, a new parallel, steady/unsteady, three-dimensional, DSMC solver, foamDSMC, is developed. The method of development and validation of this new solver is presented with comparisons made with available commercial solvers. The foamDSMC solver is then used to simulate the steady and unsteady flow-field of CODA II, with steady-state simulations conducted along 2 km intervals and unsteady simulations conducted near apogee. The results based on the compressible flow aerodynamics as well as Doppler shift and contamination effects are all examined, and are used to create correction functions based on the ratio of undisturbed to disturbed flowfield concentrations. The numerical simulations verify the experimental results showing the strong influence of rocket orientation on concentration, and show conclusive evidence pointing to the success of the correction functions to significantly minimize the external effects previously mentioned. In addition to the correction function approach, the optimal placement of the AO
Parallel Solutions for Voxel-Based Simulations of Reaction-Diffusion Systems
D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan
2014-01-01
There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This necessity has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena taking into account both spatial effects and noise. However, the high computational effort characterizing simulation approaches, coupled with the necessity to simulate the models several times to achieve statistically relevant information on the model behaviours, makes such kind of algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we present how the characteristics of the algorithm can be exploited for an effective parallelization on the present heterogeneous HPC architectures. PMID:25045716
Treveaven, P.
1989-01-01
This book presents an introduction to object-oriented, functional, and logic parallel computing on which the fifth generation of computer systems will be based. Coverage includes concepts for parallel computing languages, a parallel object-oriented system (DOOM) and its language (POOL), an object-oriented multilevel VLSI simulator using POOL, and implementation of lazy functional languages on parallel architectures.
NASA Astrophysics Data System (ADS)
Paćko, P.; Bielak, T.; Spencer, A. B.; Staszewski, W. J.; Uhl, T.; Worden, K.
2012-07-01
This paper demonstrates new parallel computation technology and an implementation for Lamb wave propagation modelling in complex structures. A graphical processing unit (GPU) and computer unified device architecture (CUDA), available in low-cost graphical cards in standard PCs, are used for Lamb wave propagation numerical simulations. The local interaction simulation approach (LISA) wave propagation algorithm has been implemented as an example. Other algorithms suitable for parallel discretization can also be used in practice. The method is illustrated using examples related to damage detection. The results demonstrate good accuracy and effective computational performance of very large models. The wave propagation modelling presented in the paper can be used in many practical applications of science and engineering.
NASA Astrophysics Data System (ADS)
Hepburn, I.; Chen, W.; De Schutter, E.
2016-08-01
Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which has led to the development of parallel methods that can take advantage of the power of modern supercomputers in recent years. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification.
Hepburn, I; Chen, W; De Schutter, E
2016-08-01
Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which has led to the development of parallel methods that can take advantage of the power of modern supercomputers in recent years. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification. PMID:27497550
Adaptive finite element simulation of flow and transport applications on parallel computers
NASA Astrophysics Data System (ADS)
Kirk, Benjamin Shelton
The subject of this work is the adaptive finite element simulation of problems arising in flow and transport applications on parallel computers. Of particular interest are new contributions to adaptive mesh refinement (AMR) in this parallel high-performance context, including novel work on data structures, treatment of constraints in a parallel setting, generality and extensibility via object-oriented programming, and the design/implementation of a flexible software framework. This technology and software capability then enables more robust, reliable treatment of multiscale--multiphysics problems and specific studies of fine scale interaction such as those in biological chemotaxis (Chapter 4) and high-speed shock physics for compressible flows (Chapter 5). The work begins by presenting an overview of key concepts and data structures employed in AMR simulations. Of particular interest is how these concepts are applied in the physics-independent software framework which is developed here and is the basis for all the numerical simulations performed in this work. This open-source software framework has been adopted by a number of researchers in the U.S. and abroad for use in a wide range of applications. The dynamic nature of adaptive simulations pose particular issues for efficient implementation on distributed-memory parallel architectures. Communication cost, computational load balance, and memory requirements must all be considered when developing adaptive software for this class of machines. Specific extensions to the adaptive data structures to enable implementation on parallel computers is therefore considered in detail. The libMesh framework for performing adaptive finite element simulations on parallel computers is developed to provide a concrete implementation of the above ideas. This physics-independent framework is applied to two distinct flow and transport applications classes in the subsequent application studies to illustrate the flexibility of the
Hepburn, I; Chen, W; De Schutter, E
2016-08-01
Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which has led to the development of parallel methods that can take advantage of the power of modern supercomputers in recent years. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification.
NASA Astrophysics Data System (ADS)
Maynes, D.; Jeffs, K.; Woolford, B.; Webb, B. W.
2007-09-01
This paper reports results of an analytical and experimental investigation of the laminar flow in a parallel-plate microchannel with ultrahydrophobic top and bottom walls. The walls are fabricated with microribs and cavities that are oriented parallel to the flow direction. The channel walls are modeled in an idealized fashion, with the shape of the liquid-vapor meniscus approximated as flat. An analytical model of the vapor cavity flow is employed and coupled with a numerical model of the liquid flow by matching the local liquid and vapor phase velocity and shear stress at the interface. The numerical predictions show that the effective slip length and the reduction in the classical friction factor-Reynolds number product increase with increasing relative cavity width, increasing relative cavity depth, and decreasing relative microrib/cavity module length. Comparisons were also made between the zero shear interface model and the liquid-vapor cavity coupled model. The results illustrate that the zero shear interface model underpredicts the overall flow resistance. Further, the deviation between the two models was found to be significantly larger for increasing values of both the relative rib/cavity module width and the cavity fraction. The trends in the frictional pressure drop predictions are in good agreement with experimental measurements made at similar conditions, with greater deviation observed at increasing size of the cavity fraction. Based on the numerical predictions, an expression is proposed in which the friction factor-Reynolds number product may be estimated in terms of the important variables.
Construction of a parallel processor for simulating manipulators and other mechanical systems
NASA Technical Reports Server (NTRS)
Hannauer, George
1991-01-01
This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.
Large eddy simulations and direct numerical simulations of high speed turbulent reacting flows
NASA Technical Reports Server (NTRS)
Givi, P.; Frankel, S. H.; Adumitroaie, V.; Sabini, G.; Madnia, C. K.
1993-01-01
The primary objective of this research is to extend current capabilities of Large Eddy Simulations (LES) and Direct Numerical Simulations (DNS) for the computational analyses of high speed reacting flows. Our efforts in the first two years of this research have been concentrated on a priori investigations of single-point Probability Density Function (PDF) methods for providing subgrid closures in reacting turbulent flows. In the efforts initiated in the third year, our primary focus has been on performing actual LES by means of PDF methods. The approach is based on assumed PDF methods and we have performed extensive analysis of turbulent reacting flows by means of LES. This includes simulations of both three-dimensional (3D) isotropic compressible flows and two-dimensional reacting planar mixing layers. In addition to these LES analyses, some work is in progress to assess the extent of validity of our assumed PDF methods. This assessment is done by making detailed companions with recent laboratory data in predicting the rate of reactant conversion in parallel reacting shear flows. This report provides a summary of our achievements for the first six months of the third year of this program.
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor)
1998-01-01
The present invention is embodied in a method of performing object-oriented simulation and a system having inter-connected processor nodes operating in parallel to simulate mutual interactions of a set of discrete simulation objects distributed among the nodes as a sequence of discrete events changing state variables of respective simulation objects so as to generate new event-defining messages addressed to respective ones of the nodes. The object-oriented simulation is performed at each one of the nodes by assigning passive self-contained simulation objects to each one of the nodes, responding to messages received at one node by generating corresponding active event objects having user-defined inherent capabilities and individual time stamps and corresponding to respective events affecting one of the passive self-contained simulation objects of the one node, restricting the respective passive self-contained simulation objects to only providing and receiving information from die respective active event objects, requesting information and changing variables within a passive self-contained simulation object by the active event object, and producing corresponding messages specifying events resulting therefrom by the active event objects.
NASA Astrophysics Data System (ADS)
Reuter, K.; Jenko, F.; Forest, C. B.; Bayliss, R. A.
2008-08-01
A parallel implementation of a nonlinear pseudo-spectral MHD code for the simulation of turbulent dynamos in spherical geometry is reported. It employs a dual domain decomposition technique in both real and spectral space. It is shown that this method shows nearly ideal scaling going up to 128 CPUs on Beowulf-type clusters with fast interconnect. Furthermore, the potential of exploiting single precision arithmetic on standard x86 processors is examined. It is pointed out that the MHD code thereby achieves a maximum speedup of 1.7, whereas the validity of the computations is still granted. The combination of both measures will allow for the direct numerical simulation of highly turbulent cases ( 1500
Spontaneous Hot Flow Anomalies at Quasi-Parallel Shocks: 2. Hybrid Simulations
NASA Technical Reports Server (NTRS)
Omidi, N.; Zhang, H.; Sibeck, D.; Turner, D.
2013-01-01
Motivated by recent THEMIS observations, this paper uses 2.5-D electromagnetic hybrid simulations to investigate the formation of Spontaneous Hot Flow Anomalies (SHFA) upstream of quasi-parallel bow shocks during steady solar wind conditions and in the absence of discontinuities. The results show the formation of a large number of structures along and upstream of the quasi-parallel bow shock. Their outer edges exhibit density and magnetic field enhancements, while their cores exhibit drops in density, magnetic field, solar wind velocity and enhancements in ion temperature. Using virtual spacecraft in the simulation, we show that the signatures of these structures in the time series data are very similar to those of SHFAs seen in THEMIS data and conclude that they correspond to SHFAs. Examination of the simulation data shows that SHFAs form as the result of foreshock cavitons interacting with the bow shock. Foreshock cavitons in turn form due to the nonlinear evolution of ULF waves generated by the interaction of the solar wind with the backstreaming ions. Because foreshock cavitons are an inherent part of the shock dissipation process, the formation of SHFAs is also an inherent part of the dissipation process leading to a highly non-uniform plasma in the quasi-parallel magnetosheath including large scale density and magnetic field cavities.
NASA Astrophysics Data System (ADS)
Lee, Nicholas Jabari Ouma
Parallel molecular dynamics (MD) simulations are performed to investigate pressure-induced solid-to-solid structural phase transformations in cadmium selenide (CdSe) nanorods. The effects of the size and shape of nanorods on different aspects of structural phase transformations are studied. Simulations are based on interatomic potentials validated extensively by experiments. Simulations range from 105 to 106 atoms. These simulations are enabled by highly scalable algorithms executed on massively parallel Beowulf computing architectures. Pressure-induced structural transformations are studied using a hydrostatic pressure medium simulated by atoms interacting via Lennard-Jones potential. Four single-crystal CdSe nanorods, each 44A in diameter but varying in length, in the range between 44A and 600A, are studied independently in two sets of simulations. The first simulation is the downstroke simulation, where each rod is embedded in the pressure medium and subjected to increasing pressure during which it undergoes a forward transformation from a 4-fold coordinated wurtzite (WZ) crystal structure to a 6-fold coordinated rocksalt (RS) crystal structure. In the second so-called upstroke simulation, the pressure on the rods is decreased and a reverse transformation from 6-fold RS to a 4-fold coordinated phase is observed. The transformation pressure in the forward transformation depends on the nanorod size, with longer rods transforming at lower pressures close to the bulk transformation pressure. Spatially-resolved structural analyses, including pair-distributions, atomic-coordinations and bond-angle distributions, indicate nucleation begins at the surface of nanorods and spreads inward. The transformation results in a single RS domain, in agreement with experiments. The microscopic mechanism for transformation is observed to be the same as for bulk CdSe. A nanorod size dependency is also found in reverse structural transformations, with longer nanorods transforming more
Holkundkar, Amol R.
2013-11-15
The objective of this article is to report the parallel implementation of the 3D molecular dynamic simulation code for laser-cluster interactions. The benchmarking of the code has been done by comparing the simulation results with some of the experiments reported in the literature. Scaling laws for the computational time is established by varying the number of processor cores and number of macroparticles used. The capabilities of the code are highlighted by implementing various diagnostic tools. To study the dynamics of the laser-cluster interactions, the executable version of the code is available from the author.
NASA Astrophysics Data System (ADS)
Honkonen, I.
2015-03-01
I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring modification of existing code. This is an advantage for the development and testing of, e.g., geoscientific software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. An implementation of the generic simulation cell method presented here, generic simulation cell class (gensimcell), also includes support for parallel programming by allowing model developers to select which simulation variables of, e.g., a domain-decomposed model to transfer between processes via a Message Passing Interface (MPI) library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class requires a C++ compiler that supports a version of the language standardized in 2011 (C++11). The code is available at https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those who do are kindly requested to acknowledge and cite this work.
NASA Astrophysics Data System (ADS)
Ovaysi, S.; Piri, M.
2009-12-01
We present a three-dimensional fully dynamic parallel particle-based model for direct pore-level simulation of incompressible viscous fluid flow in disordered porous media. The model was developed from scratch and is capable of simulating flow directly in three-dimensional high-resolution microtomography images of naturally occurring or man-made porous systems. It reads the images as input where the position of the solid walls are given. The entire medium, i.e., solid and fluid, is then discretized using particles. The model is based on Moving Particle Semi-implicit (MPS) technique. We modify this technique in order to improve its stability. The model handles highly irregular fluid-solid boundaries effectively. It takes into account viscous pressure drop in addition to the gravity forces. It conserves mass and can automatically detect any false connectivity with fluid particles in the neighboring pores and throats. It includes a sophisticated algorithm to automatically split and merge particles to maintain hydraulic connectivity of extremely narrow conduits. Furthermore, it uses novel methods to handle particle inconsistencies and open boundaries. To handle the computational load, we present a fully parallel version of the model that runs on distributed memory computer clusters and exhibits excellent scalability. The model is used to simulate unsteady-state flow problems under different conditions starting from straight noncircular capillary tubes with different cross-sectional shapes, i.e., circular/elliptical, square/rectangular and triangular cross-sections. We compare the predicted dimensionless hydraulic conductances with the data available in the literature and observe an excellent agreement. We then test the scalability of our parallel model with two samples of an artificial sandstone, samples A and B, with different volumes and different distributions (non-uniform and uniform) of solid particles among the processors. An excellent linear scalability is
NASA Technical Reports Server (NTRS)
Zhang, Meng; Maxworthy, Tony
1999-01-01
It has long been recognized that flow in the melt can have a profound influence on the dynamics of a solidifying interface and hence the quality of the solid material. In particular, flow affects the heat and mass transfer, and causes spatial and temporal variations in the flow and melt composition. This results in a crystal with nonuniform physical properties. Flow can be generated by buoyancy, expansion or contraction upon phase change, and thermo-soluto capillary effects. In general, these flows can not be avoided and can have an adverse effect on the stability of the crystal structures. This motivates crystal growth experiments in a microgravity environment, where buoyancy-driven convection is significantly suppressed. However, transient accelerations (g-jitter) caused by the acceleration of the spacecraft can affect the melt, while convection generated from the effects other than buoyancy remain important. Rather than bemoan the presence of convection as a source of interfacial instability, Hurle in the 1960s suggested that flow in the melt, either forced or natural convection, might be used to stabilize the interface. Delves considered the imposition of both a parabolic velocity profile and a Blasius boundary layer flow over the interface. He concluded that fast stirring could stabilize the interface to perturbations whose wave vector is in the direction of the fluid velocity. Forth and Wheeler considered the effect of the asymptotic suction boundary layer profile. They showed that the effect of the shear flow was to generate travelling waves parallel to the flow with a speed proportional to the Reynolds number. There have been few quantitative, experimental works reporting on the coupling effect of fluid flow and morphological instabilities. Huang studied plane Couette flow over cells and dendrites. It was found that this flow could greatly enhance the planar stability and even induce the cell-planar transition. A rotating impeller was buried inside the
Massively parallel Monte Carlo for many-particle simulations on GPUs
Anderson, Joshua A.; Jankowski, Eric; Grubb, Thomas L.; Engel, Michael; Glotzer, Sharon C.
2013-12-01
Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.
Petascale direct numerical simulation of blood flow on 200K cores and heterogeneous architectures
Sampath, Rahul S; Veerapaneni, Shravan; Biros, George; Zorin, Denis; Vuduc, Richard; Vetter, Jeffrey S; Moon, Logan; Malhotra, Dhairya; Shringarpure, Aashay; Rahimian, Abtin; Lashuk, Ilya; Chandramowlishwaran, Aparna
2010-01-01
We present a fast, petaflop-scalable algorithm for Stokesian particulate flows. Our goal is the direct simulation of blood, which we model as a mixture of a Stokesian fluid (plasma) and red blood cells (RBCs). Directly simulating blood is a challenging multiscale, multiphysics problem. We report simulations with up to 260 million deformable RBCs. The largest simulation amounts to 90 billion unknowns in space. In terms of the number of cells, we improve the state-of-the art by several orders of magnitude: the previous largest simulation, at the same physical fidelity as ours, resolved the flow of O(1,000-10,000) RBCs. Our approach has three distinct characteristics: (1) we faithfully represent the physics of RBCs by using nonlinear solid mechanics to capture the deformations of each cell; (2) we accurately resolve the long-range, N-body, hydrodynamic interactions between RBCs (which are caused by the surrounding plasma); and (3) we allow for highly non-uniform spatial distributions of RBCs. The new method has been implemented in the software library MOBO (for 'Moving Boundaries'). We designed MOBO to support parallelism at all levels, including inter-node distributed memory parallelism, intra-node shared memory parallelism, data parallelism (vectorization), and fine-grained multithreading for GPUs. We have implemented and optimized the majority of the computation kernels on both Intel/AMD x86 and NVidia's Tesla/Fermi platforms for single and double floating point precision. Overall, the code has scaled on 256 CPU-GPUs on the Teragrid's Lincoln cluster and on 200,000 AMD cores of the Oak Ridge National Laboratory's Jaguar PF system. In our largest simulation, we have achieved 0.7 Petaflops/s of sustained performance on Jaguar.
Switching to High Gear: Opportunities for Grand-scale Real-time Parallel Simulations
Perumalla, Kalyan S
2009-01-01
The recent emergence of dramatically large computational power, spanning desktops with multi-core processors and multiple graphics cards to supercomputers with 10^5 processor cores, has suddenly resulted in simulation-based solutions trailing behind in the ability to fully tap the new computational capacity. Here, we motivate the need for switching the parallel simulation research to a higher gear to exploit the new, immense levels of computational power. The potential for grand-scale real-time solutions is illustrated using preliminary results from prototypes in four example application areas: (a) state- or regional-scale vehicular mobility modeling, (b) very large-scale epidemic modeling, (c) modeling the propagation of wireless network signals in very large, cluttered terrains, and, (d) country- or world-scale social behavioral modeling. We believe the stage is perfectly poised for the parallel/distributed simulation community to envision and formulate similar grand-scale, real-time simulation-based solutions in many application areas.
A novel parallel-rotation algorithm for atomistic Monte Carlo simulation of dense polymer systems
NASA Astrophysics Data System (ADS)
Santos, S.; Suter, U. W.; Müller, M.; Nievergelt, J.
2001-06-01
We develop and test a new elementary Monte Carlo move for use in the off-lattice simulation of polymer systems. This novel Parallel-Rotation algorithm (ParRot) permits moving very efficiently torsion angles that are deeply inside long chains in melts. The parallel-rotation move is extremely simple and is also demonstrated to be computationally efficient and appropriate for Monte Carlo simulation. The ParRot move does not affect the orientation of those parts of the chain outside the moving unit. The move consists of a concerted rotation around four adjacent skeletal bonds. No assumption is made concerning the backbone geometry other than that bond lengths and bond angles are held constant during the elementary move. Properly weighted sampling techniques are needed for ensuring detailed balance because the new move involves a correlated change in four degrees of freedom along the chain backbone. The ParRot move is supplemented with the classical Metropolis Monte Carlo, the Continuum-Configurational-Bias, and Reptation techniques in an isothermal-isobaric Monte Carlo simulation of melts of short and long chains. Comparisons are made with the capabilities of other Monte Carlo techniques to move the torsion angles in the middle of the chains. We demonstrate that ParRot constitutes a highly promising Monte Carlo move for the treatment of long polymer chains in the off-lattice simulation of realistic models of dense polymer systems.
NASA Astrophysics Data System (ADS)
Shen, Yanfeng; Cesnik, Carlos E. S.
2016-04-01
This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables the highly parallelized supercomputing on powerful graphic cards. Both the explicit contact formulation and the parallel feature facilitates LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulations based on the penalty method is introduced and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.
Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors
Aaby, Brandon G; Perumalla, Kalyan S; Seal, Sudip K
2010-01-01
An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering as much as over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular simulator in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.
NASA Astrophysics Data System (ADS)
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
Large-scale parallel arrays of silicon nanowires via block copolymer directed self-assembly.
Farrell, Richard A; Kinahan, Niall T; Hansel, Stefan; Stuen, Karl O; Petkov, Nikolay; Shaw, Matthew T; West, Laetitia E; Djara, Vladimir; Dunne, Robert J; Varona, Olga G; Gleeson, Peter G; Jung, Soon-Jung; Kim, Hye-Young; Koleśnik, Maria M; Lutz, Tarek; Murray, Christopher P; Holmes, Justin D; Nealey, Paul F; Duesberg, Georg S; Krstić, Vojislav; Morris, Michael A
2012-05-21
Extending the resolution and spatial proximity of lithographic patterning below critical dimensions of 20 nm remains a key challenge with very-large-scale integration, especially if the persistent scaling of silicon electronic devices is sustained. One approach, which relies upon the directed self-assembly of block copolymers by chemical-epitaxy, is capable of achieving high density 1 : 1 patterning with critical dimensions approaching 5 nm. Herein, we outline an integration-favourable strategy for fabricating high areal density arrays of aligned silicon nanowires by directed self-assembly of a PS-b-PMMA block copolymer nanopatterns with a L(0) (pitch) of 42 nm, on chemically pre-patterned surfaces. Parallel arrays (5 × 10(6) wires per cm) of uni-directional and isolated silicon nanowires on insulator substrates with critical dimension ranging from 15 to 19 nm were fabricated by using precision plasma etch processes; with each stage monitored by electron microscopy. This step-by-step approach provides detailed information on interfacial oxide formation at the device silicon layer, the polystyrene profile during plasma etching, final critical dimension uniformity and line edge roughness variation nanowire during processing. The resulting silicon-nanowire array devices exhibit Schottky-type behaviour and a clear field-effect. The measured values for resistivity and specific contact resistance were ((2.6 ± 1.2) × 10(5)Ωcm) and ((240 ± 80) Ωcm(2)) respectively. These values are typical for intrinsic (un-doped) silicon when contacted by high work function metal albeit counterintuitive as the resistivity of the starting wafer (∼10 Ωcm) is 4 orders of magnitude lower. In essence, the nanowires are so small and consist of so few atoms, that statistically, at the original doping level each nanowire contains less than a single dopant atom and consequently exhibits the electrical behaviour of the un-doped host material. Moreover this indicates that the processing
Future Directions in Simulating Solar Geoengineering
Kravitz, Benjamin S.; Robock, Alan; Boucher, Olivier
2014-08-05
Solar geoengineering is a proposed set of technologies to temporarily alleviate some of the consequences of anthropogenic greenhouse gas emissions. The Geoengineering Model Intercomparison Project (GeoMIP) created a framework of geoengineering simulations in climate models that have been performed by modeling centers throughout the world (B. Kravitz et al., The Geoengineering Model Intercomparison Project (GeoMIP), Atmospheric Science Letters, 12(2), 162-167, doi:10.1002/asl.316, 2011). These experiments use state-of-the-art climate models to simulate solar geoengineering via uniform solar reduction, creation of stratospheric sulfate aerosol layers, or injecting sea spray into the marine boundary layer. GeoMIP has been quite successful in its mission of revealing robust features and key uncertainties of the modeled effects of solar geoengineering.
Note: Application of a novel 2(3HUS+S) parallel manipulator for simulation of hip joint motion
NASA Astrophysics Data System (ADS)
Shan, X. L.; Cheng, G.; Liu, X. Z.
2016-07-01
In the paper, a novel 2(3HUS+S) parallel manipulator, which has two moving platforms, is proposed. The parallel manipulator is adopted to simulate hip joint motion and can conduct an experiment for two hip joints simultaneously. Motion experiments are conducted in the paper, and the recommended hip joint motion curves from ISO14242 and actual hip joint motions during jogging and walking are selected as the simulated motions. The experimental results indicate that the 2(3HUS+S) parallel manipulator can realize the simulation of many kinds of hip joint motions without changing the structure size.
Note: Application of a novel 2(3HUS+S) parallel manipulator for simulation of hip joint motion.
Shan, X L; Cheng, G; Liu, X Z
2016-07-01
In the paper, a novel 2(3HUS+S) parallel manipulator, which has two moving platforms, is proposed. The parallel manipulator is adopted to simulate hip joint motion and can conduct an experiment for two hip joints simultaneously. Motion experiments are conducted in the paper, and the recommended hip joint motion curves from ISO14242 and actual hip joint motions during jogging and walking are selected as the simulated motions. The experimental results indicate that the 2(3HUS+S) parallel manipulator can realize the simulation of many kinds of hip joint motions without changing the structure size. PMID:27475608
NASA Astrophysics Data System (ADS)
Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro
2016-08-01
We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 107) to 300 ms (N = 109). These are currently limited by the time for the calculation of the domain decomposition and communication
De Novo Ultrascale Atomistic Simulations On High-End Parallel Supercomputers
Nakano, A; Kalia, R K; Nomura, K; Sharma, A; Vashishta, P; Shimojo, F; van Duin, A; Goddard, III, W A; Biswas, R; Srivastava, D; Yang, L H
2006-09-04
We present a de novo hierarchical simulation framework for first-principles based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters. In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) simulations explore a wide solution space to discover microscopic mechanisms that govern macroscopic material properties, into which highly accurate quantum mechanical (QM) simulations are embedded to validate the discovered mechanisms and quantify the uncertainty of the solution. The framework includes an embedded divide-and-conquer (EDC) algorithmic framework for the design of linear-scaling simulation algorithms with minimal bandwidth complexity and tight error control. The EDC framework also enables adaptive hierarchical simulation with automated model transitioning assisted by graph-based event tracking. A tunable hierarchical cellular decomposition parallelization framework then maps the O(N) EDC algorithms onto Petaflops computers, while achieving performance tunability through a hierarchy of parameterized cell data/computation structures, as well as its implementation using hybrid Grid remote procedure call + message passing + threads programming. High-end computing platforms such as IBM BlueGene/L, SGI Altix 3000 and the NSF TeraGrid provide an excellent test grounds for the framework. On these platforms, we have achieved unprecedented scales of quantum-mechanically accurate and well validated, chemically reactive atomistic simulations--1.06 billion-atom fast reactive force-field MD and 11.8 million-atom (1.04 trillion grid points) quantum-mechanical MD in the framework of the EDC density functional theory on adaptive multigrids--in addition to 134 billion-atom non-reactive space-time multiresolution MD, with the parallel efficiency as high as 0.998 on 65,536 dual-processor BlueGene/L nodes. We have also achieved an automated execution of hierarchical QM
Simulating Capacitances to Silicon Quantum Dots: Breakdown of the Parallel Plate Capacitor Model
NASA Astrophysics Data System (ADS)
Thorbeck, Ted; Fujiwara, Akira; Zimmerman, Neil M.
2012-09-01
Many electrical applications of quantum dots rely on capacitively coupled gates; therefore, to make reliable devices we need those gate capacitances to be predictable and reproducible. We demonstrate in silicon nanowire quantum dots that gate capacitances are reproducible to within 10% for nominally identical devices. We demonstrate the experimentally that gate capacitances scale with device dimensions. We also demonstrate that a capacitance simulator can be used to predict measured gate capacitances to within 20%. A simple parallel plate capacitor model can be used to predict how the capacitances change with device dimensions; however, the parallel plate capacitor model fails for the smallest devices because the capacitances are dominated by fringing fields. We show how the capacitances due to fringing fields can be quickly estimated.
Steepening of parallel propagating hydromagnetic waves into magnetic pulsations - A simulation study
NASA Technical Reports Server (NTRS)
Akimoto, K.; Winske, D.; Onsager, T. G.; Thomsen, M. F.; Gary, S. P.
1991-01-01
The steepening mechanism of parallel propagating low-frequency MHD-like waves observed upstream of the earth's quasi-parallel bow shock has been investigated by means of electromagnetic hybrid simulations. It is shown that an ion beam through the resonant electromagnetic ion/ion instability excites large-amplitude waves, which consequently pitch angle scatter, decelerate, and eventually magnetically trap beam ions in regions where the wave amplitudes are largest. As a result, the beam ions become bunched in both space and gyrophase. As these higher-density, nongyrotropic beam segments are formed, the hydromagnetic waves rapidly steepen, resulting in magnetic pulsations, with properties generally in agreement with observations. This steepening process operates on the scale of the linear growth time of the resonant ion/ion instability. Many of the pulsations generated by this mechanism are left-hand polarized in the spacecraft frame.
Parallel traffic flow simulation of freeway networks: Phase 2. Final report 1994--1995
Chronopoulos, A.
1997-07-01
Explicit and implicit numerical methods for solving simple macroscopic traffic flow continuum models have been studied and efficiently implemented in traffic simulation codes in the past. The authors have already studied and implemented explicit methods for solving the high-order flow conservation traffic model. Implicit methods allow much larger time step size than explicit methods, for the same accuracy. However, at each time step a nonlinear system must be solved. They use the Newton method coupled with a linear iterative (Orthomin). They accelerate the convergence of Orthomin with parallel incomplete LU factorization preconditionings. The authors implemented this implicit method on a 16 processor nCUBE2 parallel computer and obtained significant execution time speedup.
Billion-atom synchronous parallel kinetic Monte Carlo simulations of critical 3D Ising systems
Martinez, E.; Monasterio, P.R.; Marian, J.
2011-02-20
An extension of the synchronous parallel kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the parallel efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.
A study of the parallel algorithm for large-scale DC simulation of nonlinear systems
NASA Astrophysics Data System (ADS)
Cortés Udave, Diego Ernesto; Ogrodzki, Jan; Gutiérrez de Anda, Miguel Angel
Newton-Raphson DC analysis of large-scale nonlinear circuits may be an extremely time consuming process even if sparse matrix techniques and bypassing of nonlinear models calculation are used. A slight decrease in the time required for this task may be enabled on multi-core, multithread computers if the calculation of the mathematical models for the nonlinear elements as well as the stamp management of the sparse matrix entries are managed through concurrent processes. This numerical complexity can be further reduced via the circuit decomposition and parallel solution of blocks taking as a departure point the BBD matrix structure. This block-parallel approach may give a considerable profit though it is strongly dependent on the system topology and, of course, on the processor type. This contribution presents the easy-parallelizable decomposition-based algorithm for DC simulation and provides a detailed study of its effectiveness.
Parallel Beam Dynamics Simulation Tools for Future Light SourceLinac Modeling
Qiang, Ji; Pogorelov, Ilya v.; Ryne, Robert D.
2007-06-25
Large-scale modeling on parallel computers is playing an increasingly important role in the design of future light sources. Such modeling provides a means to accurately and efficiently explore issues such as limits to beam brightness, emittance preservation, the growth of instabilities, etc. Recently the IMPACT codes suite was enhanced to be applicable to future light source design. Simulations with IMPACT-Z were performed using up to one billion simulation particles for the main linac of a future light source to study the microbunching instability. Combined with the time domain code IMPACT-T, it is now possible to perform large-scale start-to-end linac simulations for future light sources, including the injector, main linac, chicanes, and transfer lines. In this paper we provide an overview of the IMPACT code suite, its key capabilities, and recent enhancements pertinent to accelerator modeling for future linac-based light sources.
Xyce parallel electronic simulator design : mathematical formulation, version 2.0.
Hoekstra, Robert John; Waters, Lon J.; Hutchinson, Scott Alan; Keiter, Eric Richard; Russo, Thomas V.
2004-06-01
This document is intended to contain a detailed description of the mathematical formulation of Xyce, a massively parallel SPICE-style circuit simulator developed at Sandia National Laboratories. The target audience of this document are people in the role of 'service provider'. An example of such a person would be a linear solver expert who is spending a small fraction of his time developing solver algorithms for Xyce. Such a person probably is not an expert in circuit simulation, and would benefit from an description of the equations solved by Xyce. In this document, modified nodal analysis (MNA) is described in detail, with a number of examples. Issues that are unique to circuit simulation, such as voltage limiting, are also described in detail.
Direct NOE simulation from long MD trajectories.
Chalmers, G; Glushka, J N; Foley, B L; Woods, R J; Prestegard, J H
2016-04-01
A software package, MD2NOE, is presented which calculates Nuclear Overhauser Effect (NOE) build-up curves directly from molecular dynamics (MD) trajectories. It differs from traditional approaches in that it calculates correlation functions directly from the trajectory instead of extracting inverse sixth power distance terms as an intermediate step in calculating NOEs. This is particularly important for molecules that sample conformational states on a timescale similar to molecular reorientation. The package is tested on sucrose and results are shown to differ in small but significant ways from those calculated using an inverse sixth power assumption. Results are also compared to experiment and found to be in reasonable agreement despite an expected underestimation of water viscosity by the water model selected. PMID:26826977
Direct NOE simulation from long MD trajectories
NASA Astrophysics Data System (ADS)
Chalmers, G.; Glushka, J. N.; Foley, B. L.; Woods, R. J.; Prestegard, J. H.
2016-04-01
A software package, MD2NOE, is presented which calculates Nuclear Overhauser Effect (NOE) build-up curves directly from molecular dynamics (MD) trajectories. It differs from traditional approaches in that it calculates correlation functions directly from the trajectory instead of extracting inverse sixth power distance terms as an intermediate step in calculating NOEs. This is particularly important for molecules that sample conformational states on a timescale similar to molecular reorientation. The package is tested on sucrose and results are shown to differ in small but significant ways from those calculated using an inverse sixth power assumption. Results are also compared to experiment and found to be in reasonable agreement despite an expected underestimation of water viscosity by the water model selected.
FLY. A parallel tree N-body code for cosmological simulations
NASA Astrophysics Data System (ADS)
Antonuccio-Delogu, V.; Becciani, U.; Ferro, D.
2003-10-01
FLY is a parallel treecode which makes heavy use of the one-sided communication paradigm to handle the management of the tree structure. In its public version the code implements the equations for cosmological evolution, and can be run for different cosmological models. This reference guide describes the actual implementation of the algorithms of the public version of FLY, and suggests how to modify them to implement other types of equations (for instance, the Newtonian ones). Program summary Title of program: FLY Catalogue identifier: ADSC Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADSC Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Computer for which the program is designed and others on which it has been tested: Cray T3E, Sgi Origin 3000, IBM SP Operating systems or monitors under which the program has been tested: Unicos 2.0.5.40, Irix 6.5.14, Aix 4.3.3 Programming language used: Fortran 90, C Memory required to execute with typical data: about 100 Mwords with 2 million-particles Number of bits in a word: 32 Number of processors used: parallel program. The user can select the number of processors >=1 Has the code been vectorized or parallelized?: parallelized Number of bytes in distributed program, including test data, etc.: 4615604 Distribution format: tar gzip file Keywords: Parallel tree N-body code for cosmological simulations Nature of physical problem: FLY is a parallel collisionless N-body code for the calculation of the gravitational force. Method of solution: It is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986). Restrictions on the complexity of the program: The program uses the leapfrog integrator schema, but could be changed by the user. Typical running time: 50 seconds for each time-step, running a 2-million-particles simulation on an Sgi Origin 3800 system with 8 processors having 512 Mbytes RAM for each processor. Unusual features of the program: FLY
Direct Simulation of Shock Layer Plasmas
Farbar, E. D.; Boyd, I. D.
2011-05-20
Approximate models of the electric field used with the DSMC method all impose quasi-neutrality everywhere in the shock layer plasma. The shortcomings of these models are examined in this study by simulating a weak shock layer plasma with a coupled DSMC-Particle-In-Cell (PIC) method. The stagnation streamline of an axisymmetric shock layer is simulated for entry velocities in air that correspond to both lunar and Mars return trajectories. The atmospheric densities, particle diameters and chemical reaction rates are varied from the actual values to make the computations tractable while retaining the mean free path of air at 85 km altitude. In contrast to DSMC flow field predictions, regions of non-neutrality are predicted by the DSMC-PIC method, and the electrons are predicted to be isothermal. Perhaps the most important result of this study is that the DSMC-PIC results at both reentry energies yield a 14% increase in heat flux to the vehicle surface relative to the DSMC results. Rather unintuitively, this is mostly due to an increase in ion flux to the surface, rather than the potential energy gained by each ion as it traverses the plasma sheath. In this study, an approximate electric field model is presented, with the goal of accounting for this heat flux augmentation without the need for a computationally expensive DSMC-PIC calculation of the entire flow-field. Convective heat flux results obtained with new electric field model are compared to results from the rigorous DSMC-PIC calculations.
Direct Simulation of Shock Layer Plasmas
NASA Astrophysics Data System (ADS)
Farbar, E. D.; Boyd, I. D.
2011-05-01
Approximate models of the electric field used with the DSMC method all impose quasi-neutrality everywhere in the shock layer plasma. The shortcomings of these models are examined in this study by simulating a weak shock layer plasma with a coupled DSMC-Particle-In-Cell (PIC) method. The stagnation streamline of an axisymmetric shock layer is simulated for entry velocities in air that correspond to both lunar and Mars return trajectories. The atmospheric densities, particle diameters and chemical reaction rates are varied from the actual values to make the computations tractable while retaining the mean free path of air at 85 km altitude. In contrast to DSMC flow field predictions, regions of non-neutrality are predicted by the DSMC-PIC method, and the electrons are predicted to be isothermal. Perhaps the most important result of this study is that the DSMC-PIC results at both reentry energies yield a 14% increase in heat flux to the vehicle surface relative to the DSMC results. Rather unintuitively, this is mostly due to an increase in ion flux to the surface, rather than the potential energy gained by each ion as it traverses the plasma sheath. In this study, an approximate electric field model is presented, with the goal of accounting for this heat flux augmentation without the need for a computationally expensive DSMC-PIC calculation of the entire flow-field. Convective heat flux results obtained with new electric field model are compared to results from the rigorous DSMC-PIC calculations.
NASA Astrophysics Data System (ADS)
Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.
2016-01-01
Following the creation described in Part I of a deformable edge finite-element simulator for 3-D magnetotelluric (MT) responses using direct solvers, in Part II we develop an algorithm named HexMT for 3-D regularized inversion of MT data including topography. Direct solvers parallelized on large-RAM, symmetric multiprocessor (SMP) workstations are used also for the Gauss-Newton model update. By exploiting the data-space approach, the computational cost of the model update becomes much less in both time and computer memory than the cost of the forward simulation. In order to regularize using the second norm of the gradient, we factor the matrix related to the regularization term and apply its inverse to the Jacobian, which is done using the MKL PARDISO library. For dense matrix multiplication and factorization related to the model update, we use the PLASMA library which shows very good scalability across processor cores. A synthetic test inversion using a simple hill model shows that including topography can be important; in this case depression of the electric field by the hill can cause false conductors at depth or mask the presence of resistive structure. With a simple model of two buried bricks, a uniform spatial weighting for the norm of model smoothing recovered more accurate locations for the tomographic images compared to weightings which were a function of parameter Jacobians. We implement joint inversion for static distortion matrices tested using the Dublin secret model 2, for which we are able to reduce nRMS to ˜1.1 while avoiding oscillatory convergence. Finally we test the code on field data by inverting full impedance and tipper MT responses collected around Mount St Helens in the Cascade volcanic chain. Among several prominent structures, the north-south trending, eruption-controlling shear zone is clearly imaged in the inversion.
NASA Astrophysics Data System (ADS)
Honkonen, I.
2014-07-01
I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring any modification of existing code. This is an advantage for the development and testing of computational modeling software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. Support for parallel programming is also provided by allowing users to select which simulation variables to transfer between processes via a Message Passing Interface library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class presented here requires a C++ compiler that supports variadic templates which were standardized in 2011 (C++11). The code is available at: https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those that do are kindly requested to cite this work.
Accelerating groundwater flow simulation in MODFLOW using JASMIN-based parallel computing.
Cheng, Tangpei; Mo, Zeyao; Shao, Jingli
2014-01-01
To accelerate the groundwater flow simulation process, this paper reports our work on developing an efficient parallel simulator through rebuilding the well-known software MODFLOW on JASMIN (J Adaptive Structured Meshes applications Infrastructure). The rebuilding process is achieved by designing patch-based data structure and parallel algorithms as well as adding slight modifications to the compute flow and subroutines in MODFLOW. Both the memory requirements and computing efforts are distributed among all processors; and to reduce communication cost, data transfers are batched and conveniently handled by adding ghost nodes to each patch. To further improve performance, constant-head/inactive cells are tagged and neglected during the linear solving process and an efficient load balancing strategy is presented. The accuracy and efficiency are demonstrated through modeling three scenarios: The first application is a field flow problem located at Yanming Lake in China to help design reasonable quantity of groundwater exploitation. Desirable numerical accuracy and significant performance enhancement are obtained. Typically, the tagged program with load balancing strategy running on 40 cores is six times faster than the fastest MICCG-based MODFLOW program. The second test is simulating flow in a highly heterogeneous aquifer. The AMG-based JASMIN program running on 40 cores is nine times faster than the GMG-based MODFLOW program. The third test is a simplified transient flow problem with the order of tens of millions of cells to examine the scalability. Compared to 32 cores, parallel efficiency of 77 and 68% are obtained on 512 and 1024 cores, respectively, which indicates impressive scalability.
Simulation of Unsteady Combustion in a Ramjet Engine Using a Highly Parallel Computer
NASA Technical Reports Server (NTRS)
Menon, Suresh; Weeratunga, Sisira; Cooper, D. M. (Technical Monitor)
1994-01-01
Combustion instability in ramjets is a complex phenomenon that involve nonlinear interaction between acoustic waves, vortex motion and unsteady heat release in the combustor. To numerically simulate this 3-D, transient phenomenon, enormous computer resources (time, memory and disk storage) are required. Although current generation vector supercomputers are capable of providing adequate resources for simulations of this nature, their high cost and limited availability, makes such machines less than satisfactory for routine use. The primary focus of this study is to assess the feasibility of using highly parallel computer systems as a cost-effective alternative for conducting such unsteady flow simulations. Towards this end, a large-eddy simulation model for combustion instability was implemented on the Intel iPSC/860 and a careful study was conducted to determine the benefits and the problems associated with the use of such machines for transient simulations. Details of this study along with the results obtained from the unsteady combustion simulations carried out on the iPSC/860 are discussed in this paper.
A Computer Simulation of the System-Wide Effects of Parallel-Offset Route Maneuvers
NASA Technical Reports Server (NTRS)
Lauderdale, Todd A.; Santiago, Confesor; Pankok, Carl
2010-01-01
Most aircraft managed by air-traffic controllers in the National Airspace System are capable of flying parallel-offset routes. This paper presents the results of two related studies on the effects of increased use of offset routes as a conflict resolution maneuver. The first study analyzes offset routes in the context of all standard resolution types which air-traffic controllers currently use. This study shows that by utilizing parallel-offset route maneuvers, significant system-wide savings in delay due to conflict resolution of up to 30% are possible. It also shows that most offset resolutions replace horizontal-vectoring resolutions. The second study builds on the results of the first and directly compares offset resolutions and standard horizontal-vectoring maneuvers to determine that in-trail conflicts are often more efficiently resolved by offset maneuvers.
NASA Technical Reports Server (NTRS)
Stupl, Jan; Faber, Nicolas; Foster, Cyrus; Yang, Fan Yang; Nelson, Bron; Aziz, Jonathan; Nuttall, Andrew; Henze, Chris; Levit, Creon
2014-01-01
This paper provides an updated efficiency analysis of the LightForce space debris collision avoidance scheme. LightForce aims to prevent collisions on warning by utilizing photon pressure from ground based, commercial off the shelf lasers. Past research has shown that a few ground-based systems consisting of 10 kilowatt class lasers directed by 1.5 meter telescopes with adaptive optics could lower the expected number of collisions in Low Earth Orbit (LEO) by an order of magnitude. Our simulation approach utilizes the entire Two Line Element (TLE) catalogue in LEO for a given day as initial input. Least-squares fitting of a TLE time series is used for an improved orbit estimate. We then calculate the probability of collision for all LEO objects in the catalogue for a time step of the simulation. The conjunctions that exceed a threshold probability of collision are then engaged by a simulated network of laser ground stations. After those engagements, the perturbed orbits are used to re-assess the probability of collision and evaluate the efficiency of the system. This paper describes new simulations with three updated aspects: 1) By utilizing a highly parallel simulation approach employing hundreds of processors, we have extended our analysis to a much broader dataset. The simulation time is extended to one year. 2) We analyze not only the efficiency of LightForce on conjunctions that naturally occur, but also take into account conjunctions caused by orbit perturbations due to LightForce engagements. 3) We use a new simulation approach that is regularly updating the LightForce engagement strategy, as it would be during actual operations. In this paper we present our simulation approach to parallelize the efficiency analysis, its computational performance and the resulting expected efficiency of the LightForce collision avoidance system. Results indicate that utilizing a network of four LightForce stations with 20 kilowatt lasers, 85% of all conjunctions with a
Visualization of parallel molecular dynamics simulation on a remote visualization platform
Lee, T.Y.; Raghavendra, C.S.; Nicholas, J.B.
1994-09-01
Visualization requires high performance computers. In order to use these shared high performance computers located at national centers, the authors need an environment for remote visualization. Remote visualization is a special process that uses computing resources and data that are physically distributed over long distances. In their experimental environment, a parallel raytracer is designed for the rendering task. It allows one to efficiently visualize molecular dynamics simulations represented by three dimensional ball-and-stick models. Different issues encountered in creating their platform are discussed, such as I/O, load balancing, and data distribution.
Understanding Performance of Parallel Scientific Simulation Codes using Open|SpeedShop
Ghosh, K K
2011-11-07
Conclusions of this presentation are: (1) Open SpeedShop's (OSS) is convenient to use for large, parallel, scientific simulation codes; (2) Large codes benefit from uninstrumented execution; (3) Many experiments can be run in a short time - might need multiple shots e.g. usertime for caller-callee, hwcsamp for HW counters; (4) Decent idea of code's performance is easily obtained; (5) Statistical sampling calls for decent number of samples; and (6) HWC data is very useful for micro-analysis but can be tricky to analyze.
Parallel maturation of goal-directed behavior and dopaminergic systems during adolescence.
Naneix, Fabien; Marchand, Alain R; Di Scala, Georges; Pape, Jean-Rémi; Coutureau, Etienne
2012-11-14
Adolescence is a crucial developmental period characterized by specific behaviors reflecting the immaturity of decision-making abilities. However, the maturation of precise cognitive processes and their neurobiological correlates at this period remain poorly understood. Here, we investigate whether a differential developmental time course of dopamine (DA) pathways during late adolescence could explain the emergence of particular executive and motivational components of goal-directed behavior. First, using a contingency degradation protocol, we demonstrate that adolescent rats display a specific deficit when the causal relationship between their actions and their consequences is changed. When the rats become adults, this deficit disappears. In contrast, actions of adolescents remain sensitive to outcome devaluation or to the influence of a pavlovian-conditioned stimulus. This aspect of cognitive maturation parallels a delayed development of the DA system, especially the mesocortical pathway involved in action adaptation to rule changes. Unlike in striatal and nucleus accumbens regions, DA fibers and DA tissue content continue to increase in the medial prefrontal cortex from juvenile to adult age. Moreover, a sustained overexpression of DA receptors is observed in the prefrontal region until the end of adolescence. These findings highlight the relationship between the emergence of specific cognitive processes, in particular the adaptation to changes in action consequences, and the delayed maturation of the mesocortical DA pathway. Similar developmental processes in humans could contribute to the adolescent vulnerability to the emergence of several psychiatric disorders characterized by decision-making deficits. PMID:23152606
Direct Numerical Simulation of Automobile Cavity Tones
NASA Technical Reports Server (NTRS)
Kurbatskii, Konstantin; Tam, Christopher K. W.
2000-01-01
The Navier Stokes equation is solved computationally by the Dispersion-Relation-Preserving (DRP) scheme for the flow and acoustic fields associated with a laminar boundary layer flow over an automobile door cavity. In this work, the flow Reynolds number is restricted to R(sub delta*) < 3400; the range of Reynolds number for which laminar flow may be maintained. This investigation focuses on two aspects of the problem, namely, the effect of boundary layer thickness on the cavity tone frequency and intensity and the effect of the size of the computation domain on the accuracy of the numerical simulation. It is found that the tone frequency decreases with an increase in boundary layer thickness. When the boundary layer is thicker than a certain critical value, depending on the flow speed, no tone is emitted by the cavity. Computationally, solutions of aeroacoustics problems are known to be sensitive to the size of the computation domain. Numerical experiments indicate that the use of a small domain could result in normal mode type acoustic oscillations in the entire computation domain leading to an increase in tone frequency and intensity. When the computation domain is expanded so that the boundaries are at least one wavelength away from the noise source, the computed tone frequency and intensity are found to be computation domain size independent.
Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN
Hammond, G E; Lichtner, P C; Mills, R T
2014-01-01
[1] To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted. PMID:25506097
Parallel lattice Boltzmann simulation of bubble rising and coalescence in viscous flows
NASA Astrophysics Data System (ADS)
Shi, Dongyan; Wang, Zhikai
2015-07-01
A parallel three-dimensional lattice Boltzmann scheme for multicomponent immiscible fluids is proposed to simulate bubble rising and coalescence process in viscous flows. The lattice Boltzmann scheme is based on the free-energy model and is parallelized in the share-memory model by using the OpenMP. Bubble interface is described by a diffusion interface method solving the Cahn-Hilliard equation and both the surface tension force and the buoyancy are introduced in a form of discrete body force. To avoid the numerical instability caused by the interface deformation, the 18 point finite difference scheme is utilized to calculate the first- and second-order space derivative. The correction of the parallel scheme handling three-dimensional interfaces is verified by the Laplace law and the dynamic characteristics of an isolated bubble in stationary flows. Subsequently, effects of the initially relative position, accompanied by the size ratio on bubble-bubble interaction are studied. The results show that the present scheme can effectively describe the bubble interface dynamics, even if rupture and restructure occurs. In addition to the repulsion and coalescence phenomenon due to the relative position, the size ratio also plays an insignificant role in bubble deformation and trajectory.
Simulation of optical devices using parallel finite-difference time-domain method
NASA Astrophysics Data System (ADS)
Li, Kang; Kong, Fanmin; Mei, Liangmo; Liu, Xin
2005-11-01
This paper presents a new parallel finite-difference time-domain (FDTD) numerical method in a low-cost network environment to stimulate optical waveguide characteristics. The PC motherboard based cluster is used, as it is relatively low-cost, reliable and has high computing performance. Four clusters are networked by fast Ethernet technology. Due to the simplicity nature of FDTD algorithm, a native Ethernet packet communication mechanism is used to reduce the overhead of the communication between the adjacent clusters. To validate the method, a microcavity ring resonator based on semiconductor waveguides is chosen as an instance of FDTD parallel computation. Speed-up rate under different division density is calculated. From the result we can conclude that when the decomposing size reaches a certain point, a good parallel computing speed up will be maintained. This simulation shows that through the overlapping of computation and communication method and controlling the decomposing size, the overhead of the communication of the shared data will be conquered. The result indicates that the implementation can achieve significant speed up for the FDTD algorithm. This will enable us to tackle the larger real electromagnetic problem by the low-cost PC clusters.
Direct Numerical Simulations of Phytoplankton Blooms
NASA Astrophysics Data System (ADS)
Luna, Christopher; Tang, Wenbo
2013-04-01
Motivated by observations of phytoplankton blooms in the North Atlantic obtained through satellite imaging, and by the recent developments with objective extractions of flow topologies using Lagrangian Coherent Structures, we studied the Fisher-Kolmogorov equations inside a double-gyre system. We quantified the variabilities in biochemical reaction processes based on a natural coordinate system extracted from the Lagrangian topologies and examined how the initial placement of a biomass in this coordinate system correlated to its growth rate. The Lagrangian topologies are extracted as the extrema of the Finite-Time Lyapunov Exponent (FTLE) field for the flow, and the natural coordinate system used is based on the extracted invariant barriers. We found the dependence of reaction rates on the hyperbolic finite time invariant manifolds highlighting the largest stretching of scalars as well as the reaction rates in the transversal direction from eddy centers to their edges. It was observed that the biological reaction processes are heavily modulated by Coherent Structures in the flow. With initial placement in repelling structures, the biological species is helped to spread out much faster, hence allowing biochemical reactions to take place more quickly. With initial placement in attracting structures, the biological species is brought to be highly concentrated, hence suppressing the overall growth of the biomass.
NASA Astrophysics Data System (ADS)
Pinto, Thiago M.; Schilstra, Maria J.; Steuber, Volker; Roque, Antonio C.
2015-12-01
Long-term plasticity at parallel fibre (PF)-Purkinje cell (PC) synapses is thought to mediate cerebellar motor learning. It is known that calcium-calmodulin dependent protein kinase II (CaMKII) is essential for plasticity in the cerebellum. Recently, Van Woerden et al. demonstrated that the β isoform of CaMKII regulates the bidirectional inversion of PF-PC plasticity. Because the cellular events that underlie these experimental findings are still poorly understood, our work aims at unravelling how β CaMKII controls the direction of plasticity at PF-PC synapses. We developed a bidirectional plasticity model that replicates the experimental observations by Van Woerden et al. Simulation results obtained from this model indicate the mechanisms that underlie the bidirectional inversion of cerebellar plasticity. As suggested by Van Woerden et al., the filamentous actin binding enables β CaMKII to regulate the bidirectional plasticity at PF-PC synapses. Our model suggests that the reversal of long-term plasticity in PCs is based on a combination of mechanisms that occur at different calcium concentrations.
PCSIM: A Parallel Simulation Environment for Neural Circuits Fully Integrated with Python
Pecevski, Dejan; Natschläger, Thomas; Schuch, Klaus
2008-01-01
The Parallel Circuit SIMulator (PCSIM) is a software package for simulation of neural circuits. It is primarily designed for distributed simulation of large scale networks of spiking point neurons. Although its computational core is written in C++, PCSIM's primary interface is implemented in the Python programming language, which is a powerful programming environment and allows the user to easily integrate the neural circuit simulator with data analysis and visualization tools to manage the full neural modeling life cycle. The main focus of this paper is to describe PCSIM's full integration into Python and the benefits thereof. In particular we will investigate how the automatically generated bidirectional interface and PCSIM's object-oriented modular framework enable the user to adopt a hybrid modeling approach: using and extending PCSIM's functionality either employing pure Python or C++ and thus combining the advantages of both worlds. Furthermore, we describe several supplementary PCSIM packages written in pure Python and tailored towards setting up and analyzing neural simulations. PMID:19543450
Gait simulation via a 6-DOF parallel robot with iterative learning control.
Aubin, Patrick M; Cowley, Matthew S; Ledoux, William R
2008-03-01
We have developed a robotic gait simulator (RGS) by leveraging a 6-degree of freedom parallel robot, with the goal of overcoming three significant challenges of gait simulation, including: 1) operating at near physiologically correct velocities; 2) inputting full scale ground reaction forces; and 3) simulating motion in all three planes (sagittal, coronal and transverse). The robot will eventually be employed with cadaveric specimens, but as a means of exploring the capability of the system, we have first used it with a prosthetic foot. Gait data were recorded from one transtibial amputee using a motion analysis system and force plate. Using the same prosthetic foot as the subject, the RGS accurately reproduced the recorded kinematics and kinetics and the appropriate vertical ground reaction force was realized with a proportional iterative learning controller. After six gait iterations the controller reduced the root mean square (RMS) error between the simulated and in situ; vertical ground reaction force to 35 N during a 1.5 s simulation of the stance phase of gait with a prosthetic foot. This paper addresses the design, methodology and validation of the novel RGS. PMID:18334421
L-PICOLA: A parallel code for fast dark matter simulation
NASA Astrophysics Data System (ADS)
Howlett, C.; Manera, M.; Percival, W. J.
2015-09-01
Robust measurements based on current large-scale structure surveys require precise knowledge of statistical and systematic errors. This can be obtained from large numbers of realistic mock galaxy catalogues that mimic the observed distribution of galaxies within the survey volume. To this end we present a fast, distributed-memory, planar-parallel code, L-PICOLA, which can be used to generate and evolve a set of initial conditions into a dark matter field much faster than a full non-linear N-Body simulation. Additionally, L-PICOLA has the ability to include primordial non-Gaussianity in the simulation and simulate the past lightcone at run-time, with optional replication of the simulation volume. Through comparisons to fully non-linear N-Body simulations we find that our code can reproduce the z = 0 power spectrum and reduced bispectrum of dark matter to within 2% and 5% respectively on all scales of interest to measurements of Baryon Acoustic Oscillations and Redshift Space Distortions, but 3 orders of magnitude faster. The accuracy, speed and scalability of this code, alongside the additional features we have implemented, make it extremely useful for both current and next generation large-scale structure surveys. L-PICOLA is publicly available at https://cullanhowlett.github.io/l-picola.
Gait simulation via a 6-DOF parallel robot with iterative learning control.
Aubin, Patrick M; Cowley, Matthew S; Ledoux, William R
2008-03-01
We have developed a robotic gait simulator (RGS) by leveraging a 6-degree of freedom parallel robot, with the goal of overcoming three significant challenges of gait simulation, including: 1) operating at near physiologically correct velocities; 2) inputting full scale ground reaction forces; and 3) simulating motion in all three planes (sagittal, coronal and transverse). The robot will eventually be employed with cadaveric specimens, but as a means of exploring the capability of the system, we have first used it with a prosthetic foot. Gait data were recorded from one transtibial amputee using a motion analysis system and force plate. Using the same prosthetic foot as the subject, the RGS accurately reproduced the recorded kinematics and kinetics and the appropriate vertical ground reaction force was realized with a proportional iterative learning controller. After six gait iterations the controller reduced the root mean square (RMS) error between the simulated and in situ; vertical ground reaction force to 35 N during a 1.5 s simulation of the stance phase of gait with a prosthetic foot. This paper addresses the design, methodology and validation of the novel RGS.
Deiterding, Ralf; Wood, Stephen L
2013-01-01
We pursue a level set approach to couple an Eulerian shock-capturing fluid solver with space-time refinement to an explicit solid dynamics solver for large deformations and fracture. The coupling algorithms considering recursively finer fluid time steps as well as overlapping solver updates are discussed in detail. Our ideas are implemented in the AMROC adaptive fluid solver framework and are used for effective fluid-structure coupling to the general purpose solid dynamics code DYNA3D. Beside simulations verifying the coupled fluid-structure solver and assessing its parallel scalability, the detailed structural analysis of a reinforced concrete column under blast loading and the simulation of a prototypical blast explosion in a realistic multistory building are presented.
NASA Astrophysics Data System (ADS)
van der Kaap, N. J.; Koster, L. J. A.
2016-02-01
A parallel, lattice based Kinetic Monte Carlo simulation is developed that runs on a GPGPU board and includes Coulomb like particle-particle interactions. The performance of this computationally expensive problem is improved by modifying the interaction potential due to nearby particle moves, instead of fully recalculating it. This modification is achieved by adding dipole correction terms that represent the particle move. Exact evaluation of these terms is guaranteed by representing all interactions as 32-bit floating numbers, where only the integers between -222 and 222 are used. We validate our method by modelling the charge transport in disordered organic semiconductors, including Coulomb interactions between charges. Performance is mainly governed by the particle density in the simulation volume, and improves for increasing densities. Our method allows calculations on large volumes including particle-particle interactions, which is important in the field of organic semiconductors.
Simulation/Emulation Techniques: Compressing Schedules With Parallel (HW/SW) Development
NASA Technical Reports Server (NTRS)
Mangieri, Mark L.; Hoang, June
2014-01-01
NASA has always been in the business of balancing new technologies and techniques to achieve human space travel objectives. NASA's Kedalion engineering analysis lab has been validating and using many contemporary avionics HW/SW development and integration techniques, which represent new paradigms to NASA's heritage culture. Kedalion has validated many of the Orion HW/SW engineering techniques borrowed from the adjacent commercial aircraft avionics solution space, inserting new techniques and skills into the Multi - Purpose Crew Vehicle (MPCV) Orion program. Using contemporary agile techniques, Commercial-off-the-shelf (COTS) products, early rapid prototyping, in-house expertise and tools, and extensive use of simulators and emulators, NASA has achieved cost effective paradigms that are currently serving the Orion program effectively. Elements of long lead custom hardware on the Orion program have necessitated early use of simulators and emulators in advance of deliverable hardware to achieve parallel design and development on a compressed schedule.
Turbulent transport of a passive scalar in a channel by direct simulation
Rutledge, J.M.; Sleicher, C.A. . Dept. of Chemical Engineering)
1988-01-01
The authors discuss their study of scalar transport in a fluid with a Reynolds number of 360 based on friction velocity or 4600 based on average velocity and width between the plates. The Prandtl number is 0.72. The simulation method uses Fourier transforms to integrate parallel to boundaries and Chebyshev collocation to integrate perpendicular to boundaries. A Runge-Kutta-Wray scheme is used to integrate in time so that an adjustable time step size can be chosen to keep the scheme numerically stable. The flow and temperature fields are solved with 96 Fourier modes in the two parallel directions and 64 Chebyshev collocation points in the perpendicular direction for a total of about 600,000 grid points in the entire domain.
A Many-Task Parallel Approach for Multiscale Simulations of Subsurface Flow and Reactive Transport
Scheibe, Timothy D.; Yang, Xiaofan; Schuchardt, Karen L.; Agarwal, Khushbu; Chase, Jared M.; Palmer, Bruce J.; Tartakovsky, Alexandre M.
2014-12-16
Continuum-scale models have long been used to study subsurface flow, transport, and reactions but lack the ability to resolve processes that are governed by pore-scale mixing. Recently, pore-scale models, which explicitly resolve individual pores and soil grains, have been developed to more accurately model pore-scale phenomena, particularly reaction processes that are controlled by local mixing. However, pore-scale models are prohibitively expensive for modeling application-scale domains. This motivates the use of a hybrid multiscale approach in which continuum- and pore-scale codes are coupled either hierarchically or concurrently within an overall simulation domain (time and space). This approach is naturally suited to an adaptive, loosely-coupled many-task methodology with three potential levels of concurrency. Each individual code (pore- and continuum-scale) can be implemented in parallel; multiple semi-independent instances of the pore-scale code are required at each time step providing a second level of concurrency; and Monte Carlo simulations of the overall system to represent uncertainty in material property distributions provide a third level of concurrency. We have developed a hybrid multiscale model of a mixing-controlled reaction in a porous medium wherein the reaction occurs only over a limited portion of the domain. Loose, minimally-invasive coupling of pre-existing parallel continuum- and pore-scale codes has been accomplished by an adaptive script-based workflow implemented in the Swift workflow system. We describe here the methods used to create the model system, adaptively control multiple coupled instances of pore- and continuum-scale simulations, and maximize the scalability of the overall system. We present results of numerical experiments conducted on NERSC supercomputing systems; our results demonstrate that loose many-task coupling provides a scalable solution for multiscale subsurface simulations with minimal overhead.
SDA 7: A modular and parallel implementation of the simulation of diffusional association software.
Martinez, Michael; Bruce, Neil J; Romanowska, Julia; Kokh, Daria B; Ozboyaci, Musa; Yu, Xiaofeng; Öztürk, Mehmet Ali; Richter, Stefan; Wade, Rebecca C
2015-08-01
The simulation of diffusional association (SDA) Brownian dynamics software package has been widely used in the study of biomacromolecular association. Initially developed to calculate bimolecular protein-protein association rate constants, it has since been extended to study electron transfer rates, to predict the structures of biomacromolecular complexes, to investigate the adsorption of proteins to inorganic surfaces, and to simulate the dynamics of large systems containing many biomacromolecular solutes, allowing the study of concentration-dependent effects. These extensions have led to a number of divergent versions of the software. In this article, we report the development of the latest version of the software (SDA 7). This release was developed to consolidate the existing codes into a single framework, while improving the parallelization of the code to better exploit modern multicore shared memory computer architectures. It is built using a modular object-oriented programming scheme, to allow for easy maintenance and extension of the software, and includes new features, such as adding flexible solute representations. We discuss a number of application examples, which describe some of the methods available in the release, and provide benchmarking data to demonstrate the parallel performance.
A scalable parallel Stokesian Dynamics method for the simulation of colloidal suspensions
NASA Astrophysics Data System (ADS)
Bülow, F.; Hamberger, P.; Nirschl, H.; Dörfler, W.
2016-07-01
We have developed a new method for the efficient numerical simulation of colloidal suspensions. This method is designed and especially well-suited for parallel code execution, but it can also be applied to single-core programs. It combines the Stokesian Dynamics method with a variant of the widely used Barnes-Hut algorithm in order to reduce computational costs. This combination and the inherent parallelization of the method make simulations of large numbers of particles within days possible. The level of accuracy can be determined by the user and is limited by the truncation of the used multipole expansion. Compared to the original Stokesian Dynamics method the complexity can be reduced from O(N2) to linear complexity for dilute suspensions of strongly clustered particles, N being the number of particles. In case of non-clustered particles in a dense suspension, the complexity depends on the particle configuration and is between O(N) and O(Pnp,max2) , where P is the number of used processes and np,max = ⌈ N / P ⌉ , respectively.
SDA 7: A modular and parallel implementation of the simulation of diffusional association software.
Martinez, Michael; Bruce, Neil J; Romanowska, Julia; Kokh, Daria B; Ozboyaci, Musa; Yu, Xiaofeng; Öztürk, Mehmet Ali; Richter, Stefan; Wade, Rebecca C
2015-08-01
The simulation of diffusional association (SDA) Brownian dynamics software package has been widely used in the study of biomacromolecular association. Initially developed to calculate bimolecular protein-protein association rate constants, it has since been extended to study electron transfer rates, to predict the structures of biomacromolecular complexes, to investigate the adsorption of proteins to inorganic surfaces, and to simulate the dynamics of large systems containing many biomacromolecular solutes, allowing the study of concentration-dependent effects. These extensions have led to a number of divergent versions of the software. In this article, we report the development of the latest version of the software (SDA 7). This release was developed to consolidate the existing codes into a single framework, while improving the parallelization of the code to better exploit modern multicore shared memory computer architectures. It is built using a modular object-oriented programming scheme, to allow for easy maintenance and extension of the software, and includes new features, such as adding flexible solute representations. We discuss a number of application examples, which describe some of the methods available in the release, and provide benchmarking data to demonstrate the parallel performance. PMID:26123630
Optimized simulations of Olami-Feder-Christensen systems using parallel algorithms
NASA Astrophysics Data System (ADS)
Dominguez, Rachele; Necaise, Rance; Montag, Eric
The sequential nature of the Olami-Feder-Christensen (OFC) model for earthquake simulations limits the benefits of parallel computing approaches because of the frequent communication required between processors. We developed a parallel version of the OFC algorithm for multi-core processors. Our data, even for relatively small system sizes and low numbers of processors, indicates that increasing the number of processors provides significantly faster simulations; producing more efficient results than previous attempts that used network-based Beowulf clusters. Our algorithm optimizes performance by exploiting the multi-core processor architecture, minimizing communication time in contrast to the networked Beowulf-cluster approaches. Our multi-core algorithm is the basis for a new algorithm using GPUs that will drastically increase the number of processors available. Previous studies incorporating realistic structural features of faults into OFC models have revealed spatial and temporal patterns observed in real earthquake systems. The computational advances presented here will allow for studying interacting networks of faults, rather than individual faults, further enhancing our understanding of the relationship between the earth's structure and the triggering process. Support for this project comes from the Chenery Research Fund, the Rashkind Family Endowment, the Walter Williams Craigie Teaching Endowment, and the Schapiro Undergraduate Research Fellowship.
SDA 7: A modular and parallel implementation of the simulation of diffusional association software
Martinez, Michael; Romanowska, Julia; Kokh, Daria B.; Ozboyaci, Musa; Yu, Xiaofeng; Öztürk, Mehmet Ali; Richter, Stefan
2015-01-01
The simulation of diffusional association (SDA) Brownian dynamics software package has been widely used in the study of biomacromolecular association. Initially developed to calculate bimolecular protein–protein association rate constants, it has since been extended to study electron transfer rates, to predict the structures of biomacromolecular complexes, to investigate the adsorption of proteins to inorganic surfaces, and to simulate the dynamics of large systems containing many biomacromolecular solutes, allowing the study of concentration‐dependent effects. These extensions have led to a number of divergent versions of the software. In this article, we report the development of the latest version of the software (SDA 7). This release was developed to consolidate the existing codes into a single framework, while improving the parallelization of the code to better exploit modern multicore shared memory computer architectures. It is built using a modular object‐oriented programming scheme, to allow for easy maintenance and extension of the software, and includes new features, such as adding flexible solute representations. We discuss a number of application examples, which describe some of the methods available in the release, and provide benchmarking data to demonstrate the parallel performance. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc. PMID:26123630
A New Parallel Processing Scheme Enabling Full Monte Carlo EAS Simulation in the GZK Energy Region
NASA Astrophysics Data System (ADS)
Kasahara, K.; Cohen, F.
We developed a new parallel processing method enabling full M.C EAS simulation (say, with minimum energy of 500 keV) without using thin sampling even at 1019 eV. Normally, distributed-parallel processing needs a specific software and programs must be organized to match with such system. During the computation such a scheme also requires complex communications among many computer hosts. Our scheme first creates a skeleton of a shower, and smashes it into n-peaces and distributes the peaces to n- cpu to flesh them. After each peace is completely fleshed, they are assembled to make a complete picture of the shower. Thus, during the computation need no communication. With n=50, a 1019 eV shower can be simulated in ~10 days. For a 1020 eV shower, we may randomly sample a fraction of n-peases (say, 100 for n=1000), and safely econstruct whole picture of the shower. The scheme dose not use any weight on each particle and very much stable. The scheme has been implemented in Cosmos code. To produce a number of showers with full fluctuations, we have also developed a new method which utilizes the present result. The latter is used for the TA experiment and is described in an accompanying paper.
pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.
Halic, Tansel; Ahn, Woojin; De, Suvranu
2014-01-01
This work presents a pWeb - a new language and compiler for parallelization of client-side compute intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. Low performance of the web browser, however, remains the bottleneck of computationally intensive applications including visualization of complex scenes, real time physical simulations and image processing compared to native ones. The new proposed language is built upon web workers for multithreaded programming in HTML5. The language provides fundamental functionalities of parallel programming languages as well as the fork/join parallel model which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions.
pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.
Halic, Tansel; Ahn, Woojin; De, Suvranu
2014-01-01
This work presents a pWeb - a new language and compiler for parallelization of client-side compute intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. Low performance of the web browser, however, remains the bottleneck of computationally intensive applications including visualization of complex scenes, real time physical simulations and image processing compared to native ones. The new proposed language is built upon web workers for multithreaded programming in HTML5. The language provides fundamental functionalities of parallel programming languages as well as the fork/join parallel model which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions. PMID:24732497
Channel simulation for direct-detection optical communication systems
NASA Technical Reports Server (NTRS)
Tycz, M.; Fitzmaurice, M. W.
1974-01-01
A technique is described for simulating the random modulation imposed by atmospheric scintillation and transmitter pointing jitter on a direct-detection optical communication system. The system is capable of providing signal fading statistics which obey log-normal, beta, Rayleigh, Ricean, or chi-square density functions. Experimental tests of the performance of the channel simulator are presented.
Channel simulation for direct detection optical communication systems
NASA Technical Reports Server (NTRS)
Tycz, M.; Fitzmaurice, M. W.
1974-01-01
A technique is described for simulating the random modulation imposed by atmospheric scintillation and transmitter pointing jitter on a direct detection optical communication system. The system is capable of providing signal fading statistics which obey log normal, beta, Rayleigh, Ricean or chi-squared density functions. Experimental tests of the performance of the Channel Simulator are presented.
NASA Technical Reports Server (NTRS)
Greenberg, Albert G.; Lubachevsky, Boris D.; Nicol, David M.; Wright, Paul E.
1994-01-01
Fast, efficient parallel algorithms are presented for discrete event simulations of dynamic channel assignment schemes for wireless cellular communication networks. The driving events are call arrivals and departures, in continuous time, to cells geographically distributed across the service area. A dynamic channel assignment scheme decides which call arrivals to accept, and which channels to allocate to the accepted calls, attempting to minimize call blocking while ensuring co-channel interference is tolerably low. Specifically, the scheme ensures that the same channel is used concurrently at different cells only if the pairwise distances between those cells are sufficiently large. Much of the complexity of the system comes from ensuring this separation. The network is modeled as a system of interacting continuous time automata, each corresponding to a cell. To simulate the model, conservative methods are used; i.e., methods in which no errors occur in the course of the simulation and so no rollback or relaxation is needed. Implemented on a 16K processor MasPar MP-1, an elegant and simple technique provides speedups of about 15 times over an optimized serial simulation running on a high speed workstation. A drawback of this technique, typical of conservative methods, is that processor utilization is rather low. To overcome this, new methods were developed that exploit slackness in event dependencies over short intervals of time, thereby raising the utilization to above 50 percent and the speedup over the optimized serial code to about 120 times.
Accelerating Dust Storm Simulation by Balancing Task Allocation in Parallel Computing Environment
NASA Astrophysics Data System (ADS)
Gui, Z.; Yang, C.; XIA, J.; Huang, Q.; YU, M.
2013-12-01
Dust storm has serious negative impacts on environment, human health, and assets. The continuing global climate change has increased the frequency and intensity of dust storm in the past decades. To better understand and predict the distribution, intensity and structure of dust storm, a series of dust storm models have been developed, such as Dust Regional Atmospheric Model (DREAM), the NMM meteorological module (NMM-dust) and Chinese Unified Atmospheric Chemistry Environment for Dust (CUACE/Dust). The developments and applications of these models have contributed significantly to both scientific research and our daily life. However, dust storm simulation is a data and computing intensive process. Normally, a simulation for a single dust storm event may take several days or hours to run. It seriously impacts the timeliness of prediction and potential applications. To speed up the process, high performance computing is widely adopted. By partitioning a large study area into small subdomains according to their geographic location and executing them on different computing nodes in a parallel fashion, the computing performance can be significantly improved. Since spatiotemporal correlations exist in the geophysical process of dust storm simulation, each subdomain allocated to a node need to communicate with other geographically adjacent subdomains to exchange data. Inappropriate allocations may introduce imbalance task loads and unnecessary communications among computing nodes. Therefore, task allocation method is the key factor, which may impact the feasibility of the paralleling. The allocation algorithm needs to carefully leverage the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire system. This presentation introduces two algorithms for such allocation and compares them with evenly distributed allocation method. Specifically, 1) In order to get optimized solutions, a
Predictions of hydrodynamic simulations for direct dark matter detection
NASA Astrophysics Data System (ADS)
Bozorgnia, Nassim; Calore, Francesca; Schaller, Matthieu; Lovell, Mark; Bertone, Gianfranco; Frenk, Carlos S.; Crain, Robert A.; Navarro, Julio F.; Schaye, Joop; Theuns, Tom
2016-05-01
We study the effects of galaxy formation on dark matter direct detection using hydrodynamic simulations obtained from the “Evolution and Assembly of GaLaxies and their Environments” (EAGLE) and APOSTLE projects. We extract the local dark matter density and velocity distribution of the simulated Milky Way analogues, and use them directly to perform an analysis of current direct detection data. The local dark matter density of the Milky Way-like galaxies is 0.41–0.73 GeV/cm3, and a Maxwellian distribution (with best fit peak speed of 223–289 km/s) describes well the local dark matter speed distribution. We find that the consistency between the result of different direct detection experiments cannot be improved by using the dark matter distribution of the simulated haloes.
Weaver, R. P.; Gittings, M. L.
2004-01-01
The Los Alamos Crestone Project is part of the Department of Energy's (DOE) Accelerated Strategic Computing Initiative, or ASCI Program. The main goal of this software development project is to investigate the use of continuous adaptive mesh refinement (CAMR) techniques for application to problems of interest to the Laboratory. There are many code development efforts in the Crestone Project, both unclassified and classified codes. In this overview I will discuss the unclassified SAGE and the RAGE codes. The SAGE (SAIC adaptive grid Eulerian) code is a one-, two-, and three-dimensional multimaterial Eulerian massively parallel hydrodynamics code for use in solving a variety of high-deformation flow problems. The RAGE CAMR code is built from the SAGE code by adding various radiation packages, improved setup utilities and graphics packages and is used for problems in which radiation transport of energy is important. The goal of these massively-parallel versions of the codes is to run extremely large problems in a reasonable amount of calendar time. Our target is scalable performance to {approx}10,000 processors on a 1 billion CAMR computational cell problem that requires hundreds of variables per cell, multiple physics packages (e.g. radiation and hydrodynamics), and implicit matrix solves for each cycle. A general description of the RAGE code has been published in [l],[ 2], [3] and [4]. Currently, the largest simulations we do are three-dimensional, using around 500 million computation cells and running for literally months of calendar time using {approx}2000 processors. Current ASCI platforms range from several 3-teraOPS supercomputers to one 12-teraOPS machine at Lawrence Livermore National Laboratory, the White machine, and one 20-teraOPS machine installed at Los Alamos, the Q machine. Each machine is a system comprised of many component parts that must perform in unity for the successful run of these simulations. Key features of any massively parallel system
NASA Astrophysics Data System (ADS)
Zhou, Jun
The 1994 Northridge earthquake in Los Angeles, California, killed 57 people, injured over 8,700 and caused an estimated $20 billion in damage. Petascale simulations are needed in California and elsewhere to provide society with a better understanding of the rupture and wave dynamics of the largest earthquakes at shaking frequencies required to engineer safe structures. As the heterogeneous supercomputing infrastructures are becoming more common, numerical developments in earthquake system research are particularly challenged by the dependence on the accelerator elements to enable "the Big One" simulations with higher frequency and finer resolution. Reducing time to solution and power consumption are two primary focus area today for the enabling technology of fault rupture dynamics and seismic wave propagation in realistic 3D models of the crust's heterogeneous structure. This dissertation presents scalable parallel programming techniques for high performance seismic simulation running on petascale heterogeneous supercomputers. A real world earthquake simulation code, AWP-ODC, one of the most advanced earthquake codes to date, was chosen as the base code in this research, and the testbed is based on Titan at Oak Ridge National Laboraratory, the world's largest hetergeneous supercomputer. The research work is primarily related to architecture study, computation performance tuning and software system scalability. An earthquake simulation workflow has also been developed to support the efficient production sets of simulations. The highlights of the technical development are an aggressive performance optimization focusing on data locality and a notable data communication model that hides the data communication latency. This development results in the optimal computation efficiency and throughput for the 13-point stencil code on heterogeneous systems, which can be extended to general high-order stencil codes. Started from scratch, the hybrid CPU/GPU version of AWP
NASA Astrophysics Data System (ADS)
Romano, Paul Kollath
Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O( N ) whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes---in doing so it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed simulations. The analysis demonstrated that load imbalances in domain decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency. The model predictions were verified with
A package of Linux scripts for the parallelization of Monte Carlo simulations
NASA Astrophysics Data System (ADS)
Badal, Andreu; Sempau, Josep
2006-09-01
Despite the fact that fast computers are nowadays available at low cost, there are many situations where obtaining a reasonably low statistical uncertainty in a Monte Carlo (MC) simulation involves a prohibitively large amount of time. This limitation can be overcome by having recourse to parallel computing. Most tools designed to facilitate this approach require modification of the source code and the installation of additional software, which may be inconvenient for some users. We present a set of tools, named clonEasy, that implement a parallelization scheme of a MC simulation that is free from these drawbacks. In clonEasy, which is designed to run under Linux, a set of "clone" CPUs is governed by a "master" computer by taking advantage of the capabilities of the Secure Shell (ssh) protocol. Any Linux computer on the Internet that can be ssh-accessed by the user can be used as a clone. A key ingredient for the parallel calculation to be reliable is the availability of an independent string of random numbers for each CPU. Many generators—such as RANLUX, RANECU or the Mersenne Twister—can readily produce these strings by initializing them appropriately and, hence, they are suitable to be used with clonEasy. This work was primarily motivated by the need to find a straightforward way to parallelize PENELOPE, a code for MC simulation of radiation transport that (in its current 2005 version) employs the generator RANECU, which uses a combination of two multiplicative linear congruential generators (MLCGs). Thus, this paper is focused on this class of generators and, in particular, we briefly present an extension of RANECU that increases its period up to ˜5×10 and we introduce seedsMLCG, a tool that provides the information necessary to initialize disjoint sequences of an MLCG to feed different CPUs. This program, in combination with clonEasy, allows to run PENELOPE in parallel easily, without requiring specific libraries or significant alterations of the
Ion Dynamics at a Rippled Quasi-parallel Shock: 2D Hybrid Simulations
NASA Astrophysics Data System (ADS)
Hao, Yufei; Lu, Quanming; Gao, Xinliang; Wang, Shui
2016-05-01
In this paper, two-dimensional hybrid simulations are performed to investigate ion dynamics at a rippled quasi-parallel shock. The results show that the ripples around the shock front are inherent structures of a quasi-parallel shock, and the re-formation of the shock is not synchronous along the surface of the shock front. By following the trajectories of the upstream ions, we find that these ions behave differently when they interact with the shock front at different positions along the shock surface. The upstream particles are transmitted more easily through the upper part of a ripple, and the corresponding bulk velocity downstream is larger, where a high-speed jet is formed. In the lower part of the ripple, the upstream particles tend to be reflected by the shock. Ions reflected by the shock may suffer multiple-stage acceleration when moving along the shock surface or trapped between the upstream waves and the shock front. Finally, these ions may escape further upstream or move downstream; therefore, superthermal ions can be found both upstream and downstream.
Experiences with serial and parallel algorithms for channel routing using simulated annealing
NASA Technical Reports Server (NTRS)
Brouwer, Randall Jay
1988-01-01
Two algorithms for channel routing using simulated annealing are presented. Simulated annealing is an optimization methodology which allows the solution process to back up out of local minima that may be encountered by inappropriate selections. By properly controlling the annealing process, it is very likely that the optimal solution to an NP-complete problem such as channel routing may be found. The algorithm presented proposes very relaxed restrictions on the types of allowable transformations, including overlapping nets. By freeing that restriction and controlling overlap situations with an appropriate cost function, the algorithm becomes very flexible and can be applied to many extensions of channel routing. The selection of the transformation utilizes a number of heuristics, still retaining the pseudorandom nature of simulated annealing. The algorithm was implemented as a serial program for a workstation, and a parallel program designed for a hypercube computer. The details of the serial implementation are presented, including many of the heuristics used and some of the resulting solutions.
A 3D MPI-Parallel GPU-accelerated framework for simulating ocean wave energy converters
NASA Astrophysics Data System (ADS)
Pathak, Ashish; Raessi, Mehdi
2015-11-01
We present an MPI-parallel GPU-accelerated computational framework for studying the interaction between ocean waves and wave energy converters (WECs). The computational framework captures the viscous effects, nonlinear fluid-structure interaction (FSI), and breaking of waves around the structure, which cannot be captured in many potential flow solvers commonly used for WEC simulations. The full Navier-Stokes equations are solved using the two-step projection method, which is accelerated by porting the pressure Poisson equation to GPUs. The FSI is captured using the numerically stable fictitious domain method. A novel three-phase interface reconstruction algorithm is used to resolve three phases in a VOF-PLIC context. A consistent mass and momentum transport approach enables simulations at high density ratios. The accuracy of the overall framework is demonstrated via an array of test cases. Numerical simulations of the interaction between ocean waves and WECs are presented. Funding from the National Science Foundation CBET-1236462 grant is gratefully acknowledged.
Three-dimensional parallel UNIPIC-3D code for simulations of high-power microwave devices
Wang Jianguo; Chen Zaigao; Wang Yue; Zhang Dianhui; Qiao Hailiang; Fu Meiyan; Yuan Yuan; Liu Chunliang; Li Yongdong; Wang Hongguang
2010-07-15
This paper introduces a self-developed, three-dimensional parallel fully electromagnetic particle simulation code UNIPIC-3D. In this code, the electromagnetic fields are updated using the second-order, finite-difference time-domain method, and the particles are moved using the relativistic Newton-Lorentz force equation. The electromagnetic field and particles are coupled through the current term in Maxwell's equations. Two numerical examples are used to verify the algorithms adopted in this code, numerical results agree well with theoretical ones. This code can be used to simulate the high-power microwave (HPM) devices, such as the relativistic backward wave oscillator, coaxial vircator, and magnetically insulated line oscillator, etc. UNIPIC-3D is written in the object-oriented C++ language and can be run on a variety of platforms including WINDOWS, LINUX, and UNIX. Users can use the graphical user's interface to create the complex geometric structures of the simulated HPM devices, which can be automatically meshed by UNIPIC-3D code. This code has a powerful postprocessor which can display the electric field, magnetic field, current, voltage, power, spectrum, momentum of particles, etc. For the sake of comparison, the results computed by using the two-and-a-half-dimensional UNIPIC code are also provided for the same parameters of HPM devices, the numerical results computed from these two codes agree well with each other.
Parallel Adjective High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics
NASA Technical Reports Server (NTRS)
Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak
2016-01-01
This paper presents large-scale MPI-parallel computational uid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady ow eld inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta, and spatially fth-order accurate WENO- 5Z scheme was used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh re nement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion compu- tational cells shows excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregular numerical cost associated with blocks con- taining boundaries. Limits to scaling beyond 32k cores are identi ed, and targeted code optimizations are discussed.
Matsuda, K.; Terada, N.; Katoh, Y.; Misawa, H.
2011-08-15
There has been a great concern about the origin of the parallel electric field in the frame of fluid equations in the auroral acceleration region. This paper proposes a new method to simulate magnetohydrodynamic (MHD) equations that include the electron convection term and shows its efficiency with simulation results in one dimension. We apply a third-order semi-discrete central scheme to investigate the characteristics of the electron convection term including its nonlinearity. At a steady state discontinuity, the sum of the ion and electron convection terms balances with the ion pressure gradient. We find that the electron convection term works like the gradient of the negative pressure and reduces the ion sound speed or amplifies the sound mode when parallel current flows. The electron convection term enables us to describe a situation in which a parallel electric field and parallel electron acceleration coexist, which is impossible for ideal or resistive MHD.
NASA Astrophysics Data System (ADS)
Zhao, Tao; Hwang, Feng-Nan; Cai, Xiao-Chuan
2016-07-01
We consider a quintic polynomial eigenvalue problem arising from the finite volume discretization of a quantum dot simulation problem. The problem is solved by the Jacobi-Davidson (JD) algorithm. Our focus is on how to achieve the quadratic convergence of JD in a way that is not only efficient but also scalable when the number of processor cores is large. For this purpose, we develop a projected two-level Schwarz preconditioned JD algorithm that exploits multilevel domain decomposition techniques. The pyramidal quantum dot calculation is carefully studied to illustrate the efficiency of the proposed method. Numerical experiments confirm that the proposed method has a good scalability for problems with hundreds of millions of unknowns on a parallel computer with more than 10,000 processor cores.
Shin, Hyun-Ho; Yoon, Woong-Sup
2008-07-01
An Adaptive-Spatial Decomposition parallel algorithm was developed to increase computation efficiency for molecular dynamics simulations of nano-fluids. Injection of a liquid argon jet with a scale of 17.6 molecular diameters was investigated. A solid annular platinum injector was also solved simultaneously with the liquid injectant by adopting a solid modeling technique which incorporates phantom atoms. The viscous heat was naturally discharged through the solids so the liquid boiling problem was avoided with no separate use of temperature controlling methods. Parametric investigations of injection speed, wall temperature, and injector length were made. A sudden pressure drop at the orifice exit causes flash boiling of the liquid departing the nozzle exit with strong evaporation on the surface of the liquids, while rendering a slender jet. The elevation of the injection speed and the wall temperature causes an activation of the surface evaporation concurrent with reduction in the jet breakup length and the drop size.
NASA Astrophysics Data System (ADS)
Adamyan, H. H.; Adamyan, N. H.; Gevorgyan, N. T.; Gevorgyan, T. V.; Kryuchkyan, G. Yu.
2008-05-01
We provide a software package for numerical simulations and modeling of complex quantum systems in the presence of dissipation and decoherence for a wider class of problems in the field of quantum technologies. This software is based on the method of quantum trajectories usually used for calculations of the density matrix. An important part of this Toolkit is the universal user interface, which is based on Tool Command Language (TCL) scripting language. It is elaborated in such a manner that the system description and system parameters should not be included in the source code. The core is implemented as a generic set of C++ classes, which can be efficiently reused for modeling of a wide range of photonic systems. The code has been written so as to facilitate optimization of the performance without breaking the object-orientedness of the design. We demonstrate that this software package is very useful for high performance parallel calculations on the Cluster.
NASA Astrophysics Data System (ADS)
Goldstein, Daniel; Thomas, Rollin; Kasen, Daniel
2015-01-01
Collaboration between the type Ia supernova (SN Ia) modeling and observation communities hinges on our ability to directly connect simulations to data. Here we introduce supernova emulation, a method for facilitating such a connection. Emulation allows us to instantaneously predict the observables (light curves, spectra, spectral time series) generated by arbitrary SN Ia radiative transfer simulations, with estimates of prediction error. Emulators learn the mapping between physically meaningful simulation inputs and the resulting synthetic observables from a training set of simulation input-output pairs. In our emulation framework, we model PCA-decomposed representations of simulated observables as an ensemble of Gaussian Processes. As a proof of concept, we train a bolometric light curve (BLC) emulator on a grid of 400 simulation inputs and BLCs synthesized with the publicly available, gray, time-dependent Monte Carlo expanding atmospheres code, SMOKE. We emulate SMOKE simulations evaluated at a set of 100 out-of-sample input parameters, and achieve excellent agreement between the emulator predictions and the simulated BLCs. In addition to predicting simulation outputs, emulators allow us to infer the regions of simulation input parameter space that correspond to observed SN Ia light curves and spectra. We present a Bayesian framework for solving this inverse problem using Markov Chain Monte Carlo sampling. We fit published bolometric light curves with our emulator and obtain reconstructed masses (nickel mass, total ejecta mass) in agreement with reconstructions from semi-analytic models. We discuss applications of emulation to supernova cosmology and physics, including how emulators can be used to identify and quantify astrophysical sources of systematic error affecting SNe Ia as distance indicators for cosmology.
Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor S.
2013-05-23
Building designers are increasingly relying on complex fenestration systems to reduce energy consumed for lighting and HVAC in low energy buildings. Radiance, a lighting simulation program, has been used to conduct daylighting simulations for complex fenestration systems. Depending on the configurations, the simulation can take hours or even days using a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight simulation by conducting parallel computing on heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in parallel using OpenCL. The speed of new approach was evaluated using various daylighting simulation cases on a multicore central processing unit and a graphics processing unit. Based on the measurements and analysis of the time usage for the Radiance daylighting simulation, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.
NASA Astrophysics Data System (ADS)
Nagy, Carl
Field mapping and, structural, microstructural, and chronological analyses confirm the existence of a segment of the Gurla-Mandhata-Humla fault, an orogen-parallel strike-slip dominated shear zone in the upper Karnali valley of northwestern Nepal. This shear zone forms the upper contact of, and cuts obliquely across the Greater Himalayan Sequence (GHS). Data from this study reveal two phases of GHS deformation. Phase 1 is characterized by U-Th-Pb monazite crystallization ages (˜26--12 Ma, peak ˜18--15 Ma), consistent with typical Neohimalayan metamorphic ages, and the final stages of south-directed extrusion of the GHS. Phase 2 is characterized by south-dipping high-strain foliations and intensely developed ESE-WNW trending, shallowly plunging mineral elongation lineations, indicating orogen-parallel extension. Thermochronology of muscovite defining these fabrics implies that the area was cooling and experiencing orogen-parallel extension by ˜15--9 Ma. Mineral deformation mechanisms and quartz c-axis patterns of these fabrics record a rapid increase in temperature from ˜350°C along the shear zone, to ˜650°C at ˜2.5 structural km below the shear zone. Such temperature gradients may be remnants of telescoped and/or flattened isotherms generated during south-directed extrusion of the GHS. Overprinting ESE-WNW fabrics record progressive deformation of the GHS at lower temperatures. Progressive deformation included a significant component of pure shear, as indicated by symmetric high-temperature quartz c-axis fabrics and a lower-temperature vorticity estimate (˜59% pure shear). A transition in c-axis fabrics from type I to type II cross-girdles at ˜ 1.2 km below the fault could indicate a transition from plane strain towards constriction. Together, these data suggest orogen-parallel extension was occurring as a result of transtension. This study reveals a transition from south-directed extrusion of the GHS to orogen-parallel extension between ˜15--13 Ma
Hwang, F-N Wei, Z-H Huang, T-M Wang Weichung
2010-04-20
We develop a parallel Jacobi-Davidson approach for finding a partial set of eigenpairs of large sparse polynomial eigenvalue problems with application in quantum dot simulation. A Jacobi-Davidson eigenvalue solver is implemented based on the Portable, Extensible Toolkit for Scientific Computation (PETSc). The eigensolver thus inherits PETSc's efficient and various parallel operations, linear solvers, preconditioning schemes, and easy usages. The parallel eigenvalue solver is then used to solve higher degree polynomial eigenvalue problems arising in numerical simulations of three dimensional quantum dots governed by Schroedinger's equations. We find that the parallel restricted additive Schwarz preconditioner in conjunction with a parallel Krylov subspace method (e.g. GMRES) can solve the correction equations, the most costly step in the Jacobi-Davidson algorithm, very efficiently in parallel. Besides, the overall performance is quite satisfactory. We have observed near perfect superlinear speedup by using up to 320 processors. The parallel eigensolver can find all target interior eigenpairs of a quintic polynomial eigenvalue problem with more than 32 million variables within 12 minutes by using 272 Intel 3.0 GHz processors.
NASA Technical Reports Server (NTRS)
Bruno, John
1984-01-01
The results of an investigation into the feasibility of using the MPP for direct and large eddy simulations of the Navier-Stokes equations is presented. A major part of this study was devoted to the implementation of two of the standard numerical algorithms for CFD. These implementations were not run on the Massively Parallel Processor (MPP) since the machine delivered to NASA Goddard does not have sufficient capacity. Instead, a detailed implementation plan was designed and from these were derived estimates of the time and space requirements of the algorithms on a suitably configured MPP. In addition, other issues related to the practical implementation of these algorithms on an MPP-like architecture were considered; namely, adaptive grid generation, zonal boundary conditions, the table lookup problem, and the software interface. Performance estimates show that the architectural components of the MPP, the Staging Memory and the Array Unit, appear to be well suited to the numerical algorithms of CFD. This combined with the prospect of building a faster and larger MMP-like machine holds the promise of achieving sustained gigaflop rates that are required for the numerical simulations in CFD.
NASA Astrophysics Data System (ADS)
Nielsen, Jens; d'Avezac, Mayeul; Hetherington, James; Stamatakis, Michail
2013-12-01
Ab initio kinetic Monte Carlo (KMC) simulations have been successfully applied for over two decades to elucidate the underlying physico-chemical phenomena on the surfaces of heterogeneous catalysts. These simulations necessitate detailed knowledge of the kinetics of elementary reactions constituting the reaction mechanism, and the energetics of the species participating in the chemistry. The information about the energetics is encoded in the formation energies of gas and surface-bound species, and the lateral interactions between adsorbates on the catalytic surface, which can be modeled at different levels of detail. The majority of previous works accounted for only pairwise-additive first nearest-neighbor interactions. More recently, cluster-expansion Hamiltonians incorporating long-range interactions and many-body terms have been used for detailed estimations of catalytic rate [C. Wu, D. J. Schmidt, C. Wolverton, and W. F. Schneider, J. Catal. 286, 88 (2012)]. In view of the increasing interest in accurate predictions of catalytic performance, there is a need for general-purpose KMC approaches incorporating detailed cluster expansion models for the adlayer energetics. We have addressed this need by building on the previously introduced graph-theoretical KMC framework, and we have developed Zacros, a FORTRAN2003 KMC package for simulating catalytic chemistries. To tackle the high computational cost in the presence of long-range interactions we introduce parallelization with OpenMP. We further benchmark our framework by simulating a KMC analogue of the NO oxidation system established by Schneider and co-workers [J. Catal. 286, 88 (2012)]. We show that taking into account only first nearest-neighbor interactions may lead to large errors in the prediction of the catalytic rate, whereas for accurate estimates thereof, one needs to include long-range terms in the cluster expansion.
NASA Astrophysics Data System (ADS)
Jung, Jaewoon; Kobayashi, Chigusa; Imamura, Toshiyuki; Sugita, Yuji
2016-03-01
Three-dimensional Fast Fourier Transform (3D FFT) plays an important role in a wide variety of computer simulations and data analyses, including molecular dynamics (MD) simulations. In this study, we develop hybrid (MPI+OpenMP) parallelization schemes of 3D FFT based on two new volumetric decompositions, mainly for the particle mesh Ewald (PME) calculation in MD simulations. In one scheme, (1d_Alltoall), five all-to-all communications in one dimension are carried out, and in the other, (2d_Alltoall), one two-dimensional all-to-all communication is combined with two all-to-all communications in one dimension. 2d_Alltoall is similar to the conventional volumetric decomposition scheme. We performed benchmark tests of 3D FFT for the systems with different grid sizes using a large number of processors on the K computer in RIKEN AICS. The two schemes show comparable performances, and are better than existing 3D FFTs. The performances of 1d_Alltoall and 2d_Alltoall depend on the supercomputer network system and number of processors in each dimension. There is enough leeway for users to optimize performance for their conditions. In the PME method, short-range real-space interactions as well as long-range reciprocal-space interactions are calculated. Our volumetric decomposition schemes are particularly useful when used in conjunction with the recently developed midpoint cell method for short-range interactions, due to the same decompositions of real and reciprocal spaces. The 1d_Alltoall scheme of 3D FFT takes 4.7 ms to simulate one MD cycle for a virus system containing more than 1 million atoms using 32,768 cores on the K computer.
Wang, Wenlong; Machta, Jonathan; Katzgraber, Helmut G
2015-07-01
Population annealing is a Monte Carlo algorithm that marries features from simulated-annealing and parallel-tempering Monte Carlo. As such, it is ideal to overcome large energy barriers in the free-energy landscape while minimizing a Hamiltonian. Thus, population-annealing Monte Carlo can be used as a heuristic to solve combinatorial optimization problems. We illustrate the capabilities of population-annealing Monte Carlo by computing ground states of the three-dimensional Ising spin glass with Gaussian disorder, while comparing to simulated-annealing and parallel-tempering Monte Carlo. Our results suggest that population annealing Monte Carlo is significantly more efficient than simulated annealing but comparable to parallel-tempering Monte Carlo for finding spin-glass ground states.
Direct simulation of a turbulent oscillating boundary layer
NASA Technical Reports Server (NTRS)
Spalart, Philippe R.; Baldwin, Barrett S.
1987-01-01
The turbulent boundary layer driven by a freestream velocity that varies sinusoidally in time around a zero mean is considered. The flow has a rich behavior including strong pressure gradients, inflection points, and reversal. A theory for the velocity and stress profiles at high Reynolds number is formulated. Well-resolved direct Navier-Stokes simulations are conducted over a narrow range of Reynolds numbers, and the results are compared with the theoretical predictions. The flow is also computed over a wide range of Reynolds numbers using a new algebraic turbulence model; the results are compared with the direct simulations and the theory.
Simulating a Direction-Finder Search for an ELT
NASA Technical Reports Server (NTRS)
Bream, Bruce
2005-01-01
A computer program simulates the operation of direction-finding equipment engaged in a search for an emergency locator transmitter (ELT) aboard an aircraft that has crashed. The simulated equipment is patterned after the equipment used by the Civil Air Patrol to search for missing aircraft. The program is designed to be used for training in radio direction-finding and/or searching for missing aircraft without incurring the expense and risk of using real aircraft and ground search resources. The program places a hidden ELT on a map and enables the user to search for the location of the ELT by moving a 14 NASA Tech Briefs, March 2005 small aircraft image around the map while observing signal-strength and direction readings on a simulated direction- finding locator instrument. As the simulated aircraft is turned and moved on the map, the program updates the readings on the direction-finding instrument to reflect the current position and heading of the aircraft relative to the location of the ELT. The software is distributed in a zip file that contains an installation program. The software runs on the Microsoft Windows 9x, NT, and XP operating systems.
Tian, Yonghui; Zhang, Lei; Ji, Ruiqiang; Yang, Lin; Zhou, Ping; Chen, Hongtao; Ding, Jianfeng; Zhu, Weiwei; Lu, Yangyang; Jia, Lianxi; Fang, Qing; Yu, Mingbin
2011-05-01
We propose and demonstrate a directed OR/NOR and AND/NAND logic circuit consisting of two parallel microring resonators (MRRs). We use two electrical signals representing the two operands of the logical operation to modulate the two MRRs through the thermo-optic effect, respectively. The final operation results are represented by the output optical signals. Both OR/NOR and AND/NAND operations at 10 kbps are demonstrated.
Monte Carlo Simulation of an Ar RF Parallel Plate Discharge Plasma Employing LPWS
NASA Astrophysics Data System (ADS)
Horie, Ikuya; Suzuki, Takuma; Ohmori, Yoshiyuki; Kitamori, Kazutaka; Maruyama, Koichi
The geometry, pressure and power coupling conditions of most plasma sources for semiconductor manufacturing lend themselves to particle simulations such as Monte Carlo simulations(here after MCS). Usually the kinetics solvers are coupled to solvers for Poisson’s Equation. MCS usually employ averaging over discrete regions of parameter space. When the number of discrete regions is increased, two problems result: one is the instability of the solution because of a statistical change and the other is the increase of the calculation time. Ventzek and Kitamori (J. Appl. Phys., vol. 75, pp. 3785-3788, 1994) proposed Legendre Polynomial Weighted Sampling (here after LPWS) which aimed to optimize sampling statistics with an economy of particles. In this paper, we characterize an Ar RF parallel plate discharge using a MCS employing LPWS based on the Date’s model (T. IEE Japan, 111-A, 11, pp. 962-972, 1991). The method is shown to replicate the behavior of RF discharges with high fidelity.
A study of Gd-based parallel plate avalanche counter for thermal neutrons by MC simulation
NASA Astrophysics Data System (ADS)
Rhee, J. T.; Kim, H. G.; Ahmad, Farzana; Jeon, Y. J.; Jamil, M.
2013-12-01
In this work, we demonstrate the feasibility and characteristics of a single-gap parallel plate avalanche counter (PPAC) as a low energy neutron detector, based on Gd-converter coating. Upon falling on the Gd-converter surface, the incident low energy neutrons produce internal conversion electrons which are evaluated and detected. For estimating the performance of the Gd-based PPAC, a simulation study has been performed using GEANT4 Monte Carlo (MC) code. The detector response as a function of incident neutron energies in the range of 25-100 meV has been evaluated with two different physics lists. Using the QGSP_BIC_HP physics list and assuming 5 μm converter thickness, 11.8%, 18.48%, and 30.28% detection efficiencies have been achieved for the forward-, the backward-, and the total response of the converter-based PPAC. On the other hand, considering the same converter thickness and detector configuration, with the QGSP_BERT_HP physics list efficiencies of 12.19%, 18.62%, and 30.81%, respectively, were obtained. These simulation results are briefly discussed.
NASA Astrophysics Data System (ADS)
Buntemeyer, Lars; Banerjee, Robi; Peters, Thomas; Klassen, Mikhail; Pudritz, Ralph E.
2016-02-01
We present an algorithm for solving the radiative transfer problem on massively parallel computers using adaptive mesh refinement and domain decomposition. The solver is based on the method of characteristics which requires an adaptive raytracer that integrates the equation of radiative transfer. The radiation field is split into local and global components which are handled separately to overcome the non-locality problem. The solver is implemented in the framework of the magneto-hydrodynamics code FLASH and is coupled by an operator splitting step. The goal is the study of radiation in the context of star formation simulations with a focus on early disc formation and evolution. This requires a proper treatment of radiation physics that covers both the optically thin as well as the optically thick regimes and the transition region in particular. We successfully show the accuracy and feasibility of our method in a series of standard radiative transfer problems and two 3D collapse simulations resembling the early stages of protostar and disc formation.
Borazjani, Iman; Ge, Liang; Le, Trung; Sotiropoulos, Fotis
2013-04-01
We develop an overset-curvilinear immersed boundary (overset-CURVIB) method in a general non-inertial frame of reference to simulate a wide range of challenging biological flow problems. The method incorporates overset-curvilinear grids to efficiently handle multi-connected geometries and increase the resolution locally near immersed boundaries. Complex bodies undergoing arbitrarily large deformations may be embedded within the overset-curvilinear background grid and treated as sharp interfaces using the curvilinear immersed boundary (CURVIB) method (Ge and Sotiropoulos, Journal of Computational Physics, 2007). The incompressible flow equations are formulated in a general non-inertial frame of reference to enhance the overall versatility and efficiency of the numerical approach. Efficient search algorithms to identify areas requiring blanking, donor cells, and interpolation coefficients for constructing the boundary conditions at grid interfaces of the overset grid are developed and implemented using efficient parallel computing communication strategies to transfer information among sub-domains. The governing equations are discretized using a second-order accurate finite-volume approach and integrated in time via an efficient fractional-step method. Various strategies for ensuring globally conservative interpolation at grid interfaces suitable for incompressible flow fractional step methods are implemented and evaluated. The method is verified and validated against experimental data, and its capabilities are demonstrated by simulating the flow past multiple aquatic swimmers and the systolic flow in an anatomic left ventricle with a mechanical heart valve implanted in the aortic position.
Borazjani, Iman; Ge, Liang; Le, Trung; Sotiropoulos, Fotis
2013-01-01
We develop an overset-curvilinear immersed boundary (overset-CURVIB) method in a general non-inertial frame of reference to simulate a wide range of challenging biological flow problems. The method incorporates overset-curvilinear grids to efficiently handle multi-connected geometries and increase the resolution locally near immersed boundaries. Complex bodies undergoing arbitrarily large deformations may be embedded within the overset-curvilinear background grid and treated as sharp interfaces using the curvilinear immersed boundary (CURVIB) method (Ge and Sotiropoulos, Journal of Computational Physics, 2007). The incompressible flow equations are formulated in a general non-inertial frame of reference to enhance the overall versatility and efficiency of the numerical approach. Efficient search algorithms to identify areas requiring blanking, donor cells, and interpolation coefficients for constructing the boundary conditions at grid interfaces of the overset grid are developed and implemented using efficient parallel computing communication strategies to transfer information among sub-domains. The governing equations are discretized using a second-order accurate finite-volume approach and integrated in time via an efficient fractional-step method. Various strategies for ensuring globally conservative interpolation at grid interfaces suitable for incompressible flow fractional step methods are implemented and evaluated. The method is verified and validated against experimental data, and its capabilities are demonstrated by simulating the flow past multiple aquatic swimmers and the systolic flow in an anatomic left ventricle with a mechanical heart valve implanted in the aortic position. PMID:23833331
NASA Astrophysics Data System (ADS)
Cohen, R.; Bodenheimer, P.; Asphaug, E.
2000-12-01
There is both theoretical and observational evidence that giant planets collided with objects with mass >= Mearth during their evolution. These impacts may help shorten planetary formation timescales by changing the opacity of the planetary atmosphere to allow quicker cooling. They may also redistribute heavy metals within giant planets, affect the core/envelope mass ratio, and help determine the ratio of emitted to absorbed energy within giant planets. Thus, the researchers propose to simulate the impact of a ~ Earth-mass object onto a proto-giant-planet with SPH. Results of the SPH collision models will be input into a steady-state planetary evolution code and the effect of impacts on formation timescales, core/envelope mass ratios, density profiles, and thermal emissions of giant planets will be quantified. The collision will be modelled using a modified version of an SPH routine which simulates the collision of two polytropes. The Saumon-Chabrier and Tillotson equations of state will replace the polytropic equation of state. The parallel tree algorithm of Olson & Packer will be used for the domain decomposition and neighbor search necessary to calculate pressure and self-gravity efficiently. This work is funded by the NASA Graduate Student Researchers Program.
Simulated Wake Characteristics Data for Closely Spaced Parallel Runway Operations Analysis
NASA Technical Reports Server (NTRS)
Guerreiro, Nelson M.; Neitzke, Kurt W.
2012-01-01
A simulation experiment was performed to generate and compile wake characteristics data relevant to the evaluation and feasibility analysis of closely spaced parallel runway (CSPR) operational concepts. While the experiment in this work is not tailored to any particular operational concept, the generated data applies to the broader class of CSPR concepts, where a trailing aircraft on a CSPR approach is required to stay ahead of the wake vortices generated by a lead aircraft on an adjacent CSPR. Data for wake age, circulation strength, and wake altitude change, at various lateral offset distances from the wake-generating lead aircraft approach path were compiled for a set of nine aircraft spanning the full range of FAA and ICAO wake classifications. A total of 54 scenarios were simulated to generate data related to key parameters that determine wake behavior. Of particular interest are wake age characteristics that can be used to evaluate both time- and distance- based in-trail separation concepts for all aircraft wake-class combinations. A simple first-order difference model was developed to enable the computation of wake parameter estimates for aircraft models having weight, wingspan and speed characteristics similar to those of the nine aircraft modeled in this work.
NASA Technical Reports Server (NTRS)
Nishikawa, K.-I.; Ganguli, G.; Lee, Y. C.; Palmadesso, P. J.
1989-01-01
A spatially two-dimensional electrostatic PIC simulation code was used to study the stability of a plasma equilibrium characterized by a localized transverse dc electric field and a field-aligned drift for L is much less than Lx, where Lx is the simulation length in the x direction and L is the scale length associated with the dc electric field. It is found that the dc electric field and the field-aligned current can together play a synergistic role to enable the excitation of electrostatic waves even when the threshold values of the field aligned drift and the E x B drift are individually subcritical. The simulation results show that the growing ion waves are associated with small vortices in the linear stage, which evolve to the nonlinear stage dominated by larger vortices with lower frequencies.
Direct spectral/hp element simulation of piloted jet non-premixed flames
NASA Astrophysics Data System (ADS)
Nastase, Cristian R.
2004-11-01
The spectral/hp element method is used for direct numerical simulation (DNS) of piloted non premixed methane jet flames. This method combines the accuracy of spectral methods with versatility of finite element methods, and allows accurate simulations of complex flows on structured and unstructured grids. Here, the methodology is extended for simulation of multi-species, reactive flows using the discontinuous Galerkin formulation. Parallel computation is performed via MPI standards coupled with a domain decomposition methodology. The overall computational scheme allows for an efficient partitioning of the flow configuration. Tests performed with up to 64 processors show quasi-linear parallel performance and scalability. The flame configurations are similar to the piloted jet non-premixed flame considered at the Combustion Research Facility at the Sandia National Laboratories. For a momentum dominated flame, the simulated results portray many of the features observed experimentally. This pertains to both the spatial and the compositional structures of the flow. For a buoyancy controlled flame (at elevated gravity levels), the results indicate an increase in both the turbulence levels and flow acceleration. Departure from equilibrium, including localized extinction is observed on a significant portion of this flame.
Parallel contact detection algorithm for transient solid dynamics simulations using PRONTO3D
Attaway, S.W.; Hendrickson, B.A.; Plimpton, S.J.
1996-09-01
An efficient, scalable, parallel algorithm for treating material surface contacts in solid mechanics finite element programs has been implemented in a modular way for MIMD parallel computers. The serial contact detection algorithm that was developed previously for the transient dynamics finite element code PRONTO3D has been extended for use in parallel computation by devising a dynamic (adaptive) processor load balancing scheme.
Automated integration of genomic physical mapping data via parallel simulated annealing
Slezak, T.
1994-06-01
The Human Genome Center at the Lawrence Livermore National Laboratory (LLNL) is nearing closure on a high-resolution physical map of human chromosome 19. We have build automated tools to assemble 15,000 fingerprinted cosmid clones into 800 contigs with minimal spanning paths identified. These islands are being ordered, oriented, and spanned by a variety of other techniques including: Fluorescence Insitu Hybridization (FISH) at 3 levels of resolution, ECO restriction fragment mapping across all contigs, and a multitude of different hybridization and PCR techniques to link cosmid, YAC, AC, PAC, and Pl clones. The FISH data provide us with partial order and distance data as well as orientation. We made the observation that map builders need a much rougher presentation of data than do map readers; the former wish to see raw data since these can expose errors or interesting biology. We further noted that by ignoring our length and distance data we could simplify our problem into one that could be readily attacked with optimization techniques. The data integration problem could then be seen as an M x N ordering of our N cosmid clones which ``intersect`` M larger objects by defining ``intersection`` to mean either contig/map membership or hybridization results. Clearly, the goal of making an integrated map is now to rearrange the N cosmid clone ``columns`` such that the number of gaps on the object ``rows`` are minimized. Our FISH partially-ordered cosmid clones provide us with a set of constraints that cannot be violated by the rearrangement process. We solved the optimization problem via simulated annealing performed on a network of 40+ Unix machines in parallel, using a server/client model built on explicit socket calls. For current maps we can create a map in about 4 hours on the parallel net versus 4+ days on a single workstation. Our biologists are now using this software on a daily basis to guide their efforts toward final closure.
Serial and parallel Si, Ge, and SiGe direct-write with scanning probes and conducting stamps
Vasko, Stephanie E.; Kapetanovic, Adnan; Talla, Vamsi; Brasino, Michael D.; Zhu, Zihua; Scholl, Andreas; Torrey, Jessica D.; Rolandi, Marco
2011-05-16
Precise materials integration in nanostructures is fundamental for future electronic and photonic devices. We demonstrate Si, Ge, and SiGe nanostructure direct-write with deterministic size, geometry, and placement control. The biased probe of an atomic force microscope (AFM) reacts diphenylsilane or diphenylgermane to direct-write carbon-free Si, Ge, and SiGe nano and heterostructures. Parallel directwrite is available on large areas by substituting the AFM probe with conducting microstructured stamps. This facile strategy can be easily expanded to a broad variety of semiconductor materials through precursor selection.
A Comparative Analysis of Simulated and Direct Oral Proficiency Interviews.
ERIC Educational Resources Information Center
Stansfield, Charles W.
The simulated oral proficiency interview (SOPI) is a semi-direct speaking test that models the format of the oral proficiency interview (OPI). The OPI is a method of assessing general speaking proficiency in a second language. The SOPI is a tape-recorded test consisting of six parts: simple personal background questions posed in a simulated…
Hayward, Steven; Milner-White, E James
2011-11-01
α-sheet has been proposed to be the main constituent of the toxic amyloid intermediate. Molecular dynamics simulations on proteins known to be involved in amyloid diseases have demonstrated that β-sheet can, under certain conditions, spontaneously convert to α-sheet via ββ→α(R)α(L) peptide-plane flipping. Using torsion-angle driving to simulate this flip the transition has been investigated for parallel and antiparallel sheets. Concerted and sequential flipping processes were simulated, the former allowing direct calculation of helical parameters. For antiparallel sheet, the strands tend to splay apart during the transition. This can be understood by consideration of the geometry of repeating dipeptide conformations. At the end of the transition antiparallel α-sheet is slightly twisted, comprising gently curving strands. In parallel sheet, the strands maintain identical conformations and stay hydrogen bonded during the transition as they curl up to suggest a hitherto unseen structure, the multi-helix α-nanotube. Intriguingly, the α-nanotube has some of the characteristics of the parallel β-helix, a single-helix structure also implicated in amyloid. Unlike the β-helix, α-nanotube formation could involve identical strands aligning with each other in register as in most amyloids.
Guo, Hao; Tian, Yimei; Shen, Hailiang; Wang, Yi; Kang, Mengxin
2016-01-01
A design approach for determining the optimal flow pattern in a landscape lake is proposed based on FLUENT simulation, multiple objective optimization, and parallel computing. This paper formulates the design into a multi-objective optimization problem, with lake circulation effects and operation cost as two objectives, and solves the optimization problem with non-dominated sorting genetic algorithm II. The lake flow pattern is modelled in FLUENT. The parallelization aims at multiple FLUENT instance runs, which is different from the FLUENT internal parallel solver. This approach: (1) proposes lake flow pattern metrics, i.e. weighted average water flow velocity, water volume percentage of low flow velocity, and variance of flow velocity, (2) defines user defined functions for boundary setting, objective and constraints calculation, and (3) parallels the execution of multiple FLUENT instances runs to significantly reduce the optimization wall-clock time. The proposed approach is demonstrated through a case study for Meijiang Lake in Tianjin, China.
Guo, Hao; Tian, Yimei; Shen, Hailiang; Wang, Yi; Kang, Mengxin
2016-01-01
A design approach for determining the optimal flow pattern in a landscape lake is proposed based on FLUENT simulation, multiple objective optimization, and parallel computing. This paper formulates the design into a multi-objective optimization problem, with lake circulation effects and operation cost as two objectives, and solves the optimization problem with non-dominated sorting genetic algorithm II. The lake flow pattern is modelled in FLUENT. The parallelization aims at multiple FLUENT instance runs, which is different from the FLUENT internal parallel solver. This approach: (1) proposes lake flow pattern metrics, i.e. weighted average water flow velocity, water volume percentage of low flow velocity, and variance of flow velocity, (2) defines user defined functions for boundary setting, objective and constraints calculation, and (3) parallels the execution of multiple FLUENT instances runs to significantly reduce the optimization wall-clock time. The proposed approach is demonstrated through a case study for Meijiang Lake in Tianjin, China. PMID:27642835
Kadoya, Y.; Abe, H.
1988-04-01
A two- and one-half-dimensional electromagnetic particle code (PS2M) (H. Abe and S. Nakajima, J. Phys. Soc. Jpn. 53, xxx (1987)) is used to study how an electric field applied parallel to the magnetic field affects the radio frequency stabilization of flute modes in a tandem mirror plasma. The parallel electric field E/sub parallel/ perturbs the electron velocity v/sub parallel/ parallel to the magnetic field and also induces a perpendicular magnetic field perturbation B/sub perpendicular/. The unstable growth of the flute mode in the absence of such a radio frequency electric field is first studied as a basis for comparison. The ponderomotive force originating from the time-averaged product
Simulated Milky Way analogues: implications for dark matter direct searches
NASA Astrophysics Data System (ADS)
Bozorgnia, Nassim; Calore, Francesca; Schaller, Matthieu; Lovell, Mark; Bertone, Gianfranco; Frenk, Carlos S.; Crain, Robert A.; Navarro, Julio F.; Schaye, Joop; Theuns, Tom
2016-05-01
We study the implications of galaxy formation on dark matter direct detection using high resolution hydrodynamic simulations of Milky Way-like galaxies simulated within the EAGLE and APOSTLE projects. We identify Milky Way analogues that satisfy observational constraints on the Milky Way rotation curve and total stellar mass. We then extract the dark matter density and velocity distribution in the Solar neighbourhood for this set of Milky Way analogues, and use them to analyse the results of current direct detection experiments. For most Milky Way analogues, the event rates in direct detection experiments obtained from the best fit Maxwellian distribution (with peak speed of 223–289 km/s) are similar to those obtained directly from the simulations. As a consequence, the allowed regions and exclusion limits set by direct detection experiments in the dark matter mass and spin-independent cross section plane shift by a few GeV compared to the Standard Halo Model, at low dark matter masses. For each dark matter mass, the halo-to-halo variation of the local dark matter density results in an overall shift of the allowed regions and exclusion limits for the cross section. However, the compatibility of the possible hints for a dark matter signal from DAMA and CDMS-Si and null results from LUX and SuperCDMS is not improved.
Computer simulation in template-directed oligonucleotide synthesis
NASA Technical Reports Server (NTRS)
Kanavarioti, Anastassia; Benasconi, Claude F.
1990-01-01
It is commonly assumed that template-directed polymerizations have played a key role in prebiotic evolution. A computer simulation that models up to 33 competing reactions was used to investigate the product distribution in a template-directed oligonucleotide synthesis as a function of time and concentration of the reactants. The study focuses on the poly(C)-directed elongation reaction of oligoguanylates, and how it is affected by the competing processes of hydrolysis and dimerization of the activated monomer, which have the potential of severely curtailing the elongation and reducing the size and yield of the synthesized polymers. The simulations show that realistic and probably prebiotically plausible conditions can be found where hydrolysis and dimerization are either negligible or where a high degree of polymerization can be attained even in the face of substantial hydrolysis and/or dimerization.
Assessing Astrophysical Uncertainties in Direct Detection with Galaxy Simulations
NASA Astrophysics Data System (ADS)
Sloane, Jonathan D.; Buckley, Matthew R.; Brooks, Alyson M.; Governato, Fabio
2016-11-01
We study the local dark matter velocity distribution in simulated Milky Way-mass galaxies, generated at high resolution with both dark matter and baryons. We find that the dark matter in the solar neighborhood is influenced appreciably by the inclusion of baryons, increasing the speed of dark matter particles compared to dark matter-only simulations. The gravitational potential due to the presence of a baryonic disk increases the amount of high velocity dark matter, resulting in velocity distributions that are more similar to the Maxwellian Standard Halo Model than predicted from dark matter-only simulations. Furthermore, the velocity structures present in baryonic simulations possess a greater diversity than expected from dark matter-only simulations. We show that the impact on the direct detection experiments LUX, DAMA/Libra, and CoGeNT using our simulated velocity distributions, and explore how resolution and halo mass within the Milky Way’s estimated mass range impact the results. A Maxwellian fit to the velocity distribution tends to overpredict the amount of dark matter in the high velocity tail, even with baryons, and thus leads to overly optimistic direct detection bounds on models that are dependent on this region of phase space for an experimental signal. Our work further demonstrates that it is critical to transform simulated velocity distributions to the lab frame of reference, due to the fact that velocity structure in the solar neighborhood appears when baryons are included. There is more velocity structure present when baryons are included than in dark matter-only simulations. Even when baryons are included, the importance of the velocity structure is not as apparent in the Galactic frame of reference as in the Earth frame.
Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong
2010-10-01
Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel CPU of Core (TM) 2 Quad Q6600 and a GPU of Geforce 8800GT, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies.
Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong
2010-10-01
Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel CPU of Core (TM) 2 Quad Q6600 and a GPU of Geforce 8800GT, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. PMID:20674066
A Three-Dimensional Parallel Time-Accurate Turbopump Simulation Procedure Using Overset Grid System
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Chan, William; Kwak, Dochan
2002-01-01
The objective of the current effort is to provide a computational framework for design and analysis of the entire fuel supply system of a liquid rocket engine, including high-fidelity unsteady turbopump flow analysis. This capability is needed to support the design of pump sub-systems for advanced space transportation vehicles that are likely to involve liquid propulsion systems. To date, computational tools for design/analysis of turbopump flows are based on relatively lower fidelity methods. An unsteady, three-dimensional viscous flow analysis tool involving stationary and rotational components for the entire turbopump assembly has not been available for real-world engineering applications. The present effort provides developers with information such as transient flow phenomena at start up, and nonuniform inflows, and will eventually impact on system vibration and structures. In the proposed paper, the progress toward the capability of complete simulation of the turbo-pump for a liquid rocket engine is reported. The Space Shuttle Main Engine (SSME) turbo-pump is used as a test case for evaluation of the hybrid MPI/Open-MP and MLP versions of the INS3D code. CAD to solution auto-scripting capability is being developed for turbopump applications. The relative motion of the grid systems for the rotor-stator interaction was obtained using overset grid techniques. Unsteady computations for the SSME turbo-pump, which contains 114 zones with 34.5 million grid points, are carried out on Origin 3000 systems at NASA Ames Research Center. Results from these time-accurate simulations with moving boundary capability are presented along with the performance of parallel versions of the code.
A Three Dimensional Parallel Time Accurate Turbopump Simulation Procedure Using Overset Grid Systems
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Chan, William; Kwak, Dochan
2001-01-01
The objective of the current effort is to provide a computational framework for design and analysis of the entire fuel supply system of a liquid rocket engine, including high-fidelity unsteady turbopump flow analysis. This capability is needed to support the design of pump sub-systems for advanced space transportation vehicles that are likely to involve liquid propulsion systems. To date, computational tools for design/analysis of turbopump flows are based on relatively lower fidelity methods. An unsteady, three-dimensional viscous flow analysis tool involving stationary and rotational components for the entire turbopump assembly has not been available for real-world engineering applications. The present effort provides developers with information such as transient flow phenomena at start up, and non-uniform inflows, and will eventually impact on system vibration and structures. In the proposed paper, the progress toward the capability of complete simulation of the turbo-pump for a liquid rocket engine is reported. The Space Shuttle Main Engine (SSME) turbo-pump is used as a test case for evaluation of the hybrid MPI/Open-MP and MLP versions of the INS3D code. CAD to solution auto-scripting capability is being developed for turbopump applications. The relative motion of the grid systems for the rotor-stator interaction was obtained using overset grid techniques. Unsteady computations for the SSME turbo-pump, which contains 114 zones with 34.5 million grid points, are carried out on Origin 3000 systems at NASA Ames Research Center. Results from these time-accurate simulations with moving boundary capability will be presented along with the performance of parallel versions of the code.
Carter, Jonathan; Oliker, Leonid
2006-01-09
The last decade has witnessed a rapid proliferation of superscalarcache-based microprocessors to build high-end computing (HEC) platforms, primarily because of their generality, scalability, and cost effectiveness. However, the growing gap between sustained and peak performance for full-scale scientific applications on such platforms has become major concern in high performance computing. The latest generation of custom-built parallel vector systems have the potential to address this concern for numerical algorithms with sufficient regularity in their computational structure. In this work, we explore two and three dimensional implementations of a lattice-Boltzmann magnetohydrodynamics (MHD) physics application, on some of today's most powerful supercomputing platforms. Results compare performance between the vector-based Cray X1, Earth Simulator, and newly-released NEC SX-8, with the commodity-based superscalar platforms of the IBM Power3, IntelItanium2, and AMD Opteron. Overall results show that the SX-8 attains unprecedented aggregate performance across our evaluated applications.
Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Limborg, C.; Ng, C.; Prudencio, E.; Schussman, G.; Uplenchwar, R.; Ko, K.; /SLAC
2009-06-19
Over the past years, SLAC's Advanced Computations Department (ACD), under SciDAC sponsorship, has developed a suite of 3D (2D) parallel higher-order finite element (FE) codes, T3P (T2P) and Pic3P (Pic2P), aimed at accurate, large-scale simulation of wakefields and particle-field interactions in radio-frequency (RF) cavities of complex shape. The codes are built on the FE infrastructure that supports SLAC's frequency domain codes, Omega3P and S3P, to utilize conformal tetrahedral (triangular)meshes, higher-order basis functions and quadratic geometry approximation. For time integration, they adopt an unconditionally stable implicit scheme. Pic3P (Pic2P) extends T3P (T2P) to treat charged-particle dynamics self-consistently using the PIC (particle-in-cell) approach, the first such implementation on a conformal, unstructured grid using Whitney basis functions. Examples from applications to the International Linear Collider (ILC), Positron Electron Project-II (PEP-II), Linac Coherent Light Source (LCLS) and other accelerators will be presented to compare the accuracy and computational efficiency of these codes versus their counterparts using structured grids.
Singhal, R.P.; Bhardwaj, A. )
1991-09-01
A Monte Carlo simulation of photoelectron energization and energy degradation in H{sub 2} gas in the presence of parallel electric fields has been carried out. Numerical yield spectra which contain information about the electron energy degradation process and can be used to calculate the yield for any inelastic event are obtained. The variation of yield spectra with incident electron energy, electric field, pitch angle, and cutoff limit has been studied. The yield function is employed to determine the photoelectron fluxes. H{sub 2} Lyman and Werner band excitation rates and integrated column intensity are computed for three different electric field profiles taking various low-energy cutoff limits. It is found that an electric field profile with peak value of 4 mV/m at neutral number density of 3{times}10{sup 10} cm{sup {minus}3} produces enhanced volume emission rates of H{sub 2} bands ({lambda} < 1100 {angstrom}) explaining about 20% of the observed electroglow emission on Uranus. The effect of solar zenith angle and solar cycle variation on peak excitation rate is discussed.
[Simulation of TDLAS direct absorption based on HITRAN database].
Qi, Ru-birn; He, Shu-kai; Li, Xin-tian; Wang, Xian-zhong
2015-01-01
Simulating of the direct absorption TDLAS spectrum can help to comprehend the process of the absorbing and understand the influence on the absorption signal with each physical parameter. Firstly, the basic theory and algorithm of direct absorption TDLAS is studied and analyzed thoroughly, through giving the expressions and calculating steps of parameters based on Lambert-Beer's law, such as line intensity, absorption cross sections, concentration, line shape and gas total partition functions. The process of direct absorption TDLAS is simulated using MATLAB programs based on HITRAN spectra database, with which the absorptions under a certain temperature, pressure, concentration and other conditions were calculated, Water vapor is selected as the target gas, the absorptions of which under every line shapes were simulated. The results were compared with that of the commercial simulation software, Hitran-PC, which showed that, the deviation under Lorentz line shape is less than 0. 5%, and that under Gauss line shape is less than 2. 5%, while under Voigt line shape it is less than 1%. It verified that the algorithm and results of this work are correct and accurate. The absorption of H2O in v2 + v3 band under different pressure and temperature is also simulated. In low pressure range, the Doppler broadening dominant, so the line width changes little with varied.pressure, while the line peak increases with rising pressure. In high pressure range, the collision broadening dominant, so the line width changes wider with increasing pressure, while the line peak approaches to a constant value with rising pressure. And finally, the temperature correction curve in atmosphere detection is also given. The results of this work offer the reference and instruction for the application of TDLAS direct absorption. PMID:25993843
Evaluating the Accuracy of Hessian Approximations for Direct Dynamics Simulations.
Zhuang, Yu; Siebert, Matthew R; Hase, William L; Kay, Kenneth G; Ceotto, Michele
2013-01-01
Direct dynamics simulations are a very useful and general approach for studying the atomistic properties of complex chemical systems, since an electronic structure theory representation of a system's potential energy surface is possible without the need for fitting an analytic potential energy function. In this paper, recently introduced compact finite difference (CFD) schemes for approximating the Hessian [J. Chem. Phys.2010, 133, 074101] are tested by employing the monodromy matrix equations of motion. Several systems, including carbon dioxide and benzene, are simulated, using both analytic potential energy surfaces and on-the-fly direct dynamics. The results show, depending on the molecular system, that electronic structure theory Hessian direct dynamics can be accelerated up to 2 orders of magnitude. The CFD approximation is found to be robust enough to deal with chaotic motion, concomitant with floppy and stiff mode dynamics, Fermi resonances, and other kinds of molecular couplings. Finally, the CFD approximations allow parametrical tuning of different CFD parameters to attain the best possible accuracy for different molecular systems. Thus, a direct dynamics simulation requiring the Hessian at every integration step may be replaced with an approximate Hessian updating by tuning the appropriate accuracy. PMID:26589009
Direct numerical simulation of wall turbulent flows with microbubbles
NASA Astrophysics Data System (ADS)
Kanai, Akihiro; Miyata, Hideaki
2001-03-01
The marker-density-function (MDF) method has been developed to conduct direct numerical simulation (DNS) for bubbly flows. The method is applied to turbulent bubbly channel flows to elucidate the interaction between bubbles and wall turbulence. The simulation is designed to clarify the structure of the turbulent boundary layer containing microbubbles and the mechanism of frictional drag reduction. It is deduced from the numerical tests that the interaction between bubbles and wall turbulence depends on the Weber and Froude numbers. The reduction of the frictional resistance on the wall is attained and its mechanism is explained from the modulation of the three-dimensional structure of the turbulent flow. Copyright
Direct simulation of the turbulent boundary layer on a plate
NASA Astrophysics Data System (ADS)
Krupa, V. G.
2016-08-01
A numerical method for the integration of three-dimensional Navier-Stokes equations for compressible fluid as applied to direct numerical simulation is proposed. By way of example, the boundary layer on a plate is simulated. The computations were carried out for Reθ = 1500. The computational grid consisted of a half billion nodes. The flow region includes the laminar, transitional, and turbulent zones. The numerically obtained distributions of average velocity, friction, and pulsations are compared with experimental data and available numerical solutions.
NASA Astrophysics Data System (ADS)
Ma, H.; Fan, C.; Zhang, P.; Zhang, J.; Qiao, C.; Wang, H.
2012-03-01
An adaptive optics system utilizing a Shack-Hartmann wavefront sensor and a deformable mirror can successfully correct a distorted wavefront by the conjugation principle. However, if a wave propagates over such a path that scintillation is not negligible, the appearance of branch points makes least-squares reconstruction fail to estimate the wavefront effectively. An adaptive optics technique based on the stochastic parallel gradient descent (SPGD) control algorithm is an alternative approach which does not need wavefront information but optimizes the performance metric directly. Performance was evaluated by simulating a SPGD control system and conventional adaptive correction with least-squares reconstruction in the context of a laser beam projection system. We also examined the relative performance of coping with branch points by the SPGD technique through an example. All studies were carried out under the conditions of assuming the systems have noise-free measurements and infinite time control bandwidth. Results indicate that the SPGD adaptive system always performs better than the system based on the least-squares wavefront reconstruction technique in the presence of relatively serious intensity scintillations. The reason is that the SPGD adaptive system has the ability of compensating a discontinuous phase, although the phase is not detected and reconstructed.
NASA Astrophysics Data System (ADS)
Liakos, Anastasios; Malamataris, Nikolaos A.
2014-05-01
The topology and evolution of flow around a surface mounted cubical object in three dimensional channel flow is examined for low to moderate Reynolds numbers. Direct numerical simulations were performed via a home made parallel finite element code. The computational domain has been designed according to actual laboratory experiment conditions. Analysis of the results is performed using the three dimensional theory of separation. Our findings indicate that a tornado-like vortex by the side of the cube is present for all Reynolds numbers for which flow was simulated. A horseshoe vortex upstream from the cube was formed at Reynolds number approximately 1266. Pressure distributions are shown along with three dimensional images of the tornado-like vortex and the horseshoe vortex at selected Reynolds numbers. Finally, and in accordance to previous work, our results indicate that the upper limit for the Reynolds number for which steady state results are physically realizable is roughly 2000.
Direct numerical simulation of turbulent flow in a rotating square duct
Dai, Yi-Jun; Huang, Wei-Xi Xu, Chun-Xiao; Cui, Gui-Xiang
2015-06-15
A fully developed turbulent flow in a rotating straight square duct is simulated by direct numerical simulations at Re{sub τ} = 300 and 0 ≤ Ro{sub τ} ≤ 40. The rotating axis is parallel to two opposite walls of the duct and normal to the main flow. Variations of the turbulence statistics with the rotation rate are presented, and a comparison with the rotating turbulent channel flow is discussed. Rich secondary flow patterns in the cross section are observed by varying the rotation rate. The appearance of a pair of additional vortices above the pressure wall is carefully examined, and the underlying mechanism is explained according to the budget analysis of the mean momentum equations.
Massively Parallel Simulation of Uranium Migration at the Hanford 300 Area
NASA Astrophysics Data System (ADS)
Hammond, G. E.; Lichtner, P. C.
2009-12-01
Effectively utilized, high-performance computing can have a significant impact on subsurface science by enabling researchers to employ models with ever increasing sophistication and complexity that provide a more accurate and mechanistic representation of subsurface processes. As part of the U.S. Department of Energy’s SciDAC-2 program, the petascale subsurface reactive multiphase flow and transport code PFLOTRAN has been developed and is currently being employed to simulate uranium migration at the Hanford 300 Area. PFLOTRAN has been run on subsurface problems composed of up to two billion degrees of freedom and utilizing up to 131,072 processor cores on the world’s largest open science supercomputer Jaguar. This presentation focuses on the application of PFLOTRAN to simulate geochemical transport of uranium at Hanford using the Jaguar supercomputer. The Hanford 300 Area presents many challenges with regard to simulating radionuclide transport. Aside from the many conceptual uncertainties in the problem such as the choice of initial conditions, rapid fluctuations in the Columbia River stage, which occur on an hourly basis with several meter variations, can have a dramatic impact on the size of the uranium plume, its migration direction, and the rate at which it migrates to the river. Due to the immense size of the physical domain needed to include the transient river boundary condition, the grid resolution required to preserve accuracy, and the number of chemical components simulated, 3D simulation of the Hanford 300 Area would be unsustainable on a single workstation, and thus high-performance computing is essential.
Direct simulations of chemically reacting turbulent mixing layers, part 2
NASA Technical Reports Server (NTRS)
Metcalfe, Ralph W.; Mcmurtry, Patrick A.; Jou, Wen-Huei; Riley, James J.; Givi, Peyman
1988-01-01
The results of direct numerical simulations of chemically reacting turbulent mixing layers are presented. This is an extension of earlier work to a more detailed study of previous three dimensional simulations of cold reacting flows plus the development, validation, and use of codes to simulate chemically reacting shear layers with heat release. Additional analysis of earlier simulations showed good agreement with self similarity theory and laboratory data. Simulations with a two dimensional code including the effects of heat release showed that the rate of chemical product formation, the thickness of the mixing layer, and the amount of mass entrained into the layer all decrease with increasing rates of heat release. Subsequent three dimensional simulations showed similar behavior, in agreement with laboratory observations. Baroclinic torques and thermal expansion in the mixing layer were found to produce changes in the flame vortex structure that act to diffuse the pairing vortices, resulting in a net reduction in vorticity. Previously unexplained anomalies observed in the mean velocity profiles of reacting jets and mixing layers were shown to result from vorticity generation by baroclinic torques.
Direct simulations of chemically reacting turbulent mixing layers
NASA Technical Reports Server (NTRS)
Riley, J. J.; Metcalfe, R. W.
1984-01-01
The report presents the results of direct numerical simulations of chemically reacting turbulent mixing layers. The work consists of two parts: (1) the development and testing of a spectral numerical computer code that treats the diffusion reaction equations; and (2) the simulation of a series of cases of chemical reactions occurring on mixing layers. The reaction considered is a binary, irreversible reaction with no heat release. The reacting species are nonpremixed. The results of the numerical tests indicate that the high accuracy of the spectral methods observed for rigid body rotation are also obtained when diffusion, reaction, and more complex flows are considered. In the simulations, the effects of vortex rollup and smaller scale turbulence on the overall reaction rates are investigated. The simulation results are found to be in approximate agreement with similarity theory. Comparisons of simulation results with certain modeling hypotheses indicate limitations in these hypotheses. The nondimensional product thickness computed from the simulations is compared with laboratory values and is found to be in reasonable agreement, especially since there are no adjustable constants in the method.
Chaining direct memory access data transfer operations for compute nodes in a parallel computer
Archer, Charles J.; Blocksome, Michael A.
2010-09-28
Methods, systems, and products are disclosed for chaining DMA data transfer operations for compute nodes in a parallel computer that include: receiving, by an origin DMA engine on an origin node in an origin injection FIFO buffer for the origin DMA engine, a RGET data descriptor specifying a DMA transfer operation data descriptor on the origin node and a second RGET data descriptor on the origin node, the second RGET data descriptor specifying a target RGET data descriptor on the target node, the target RGET data descriptor specifying an additional DMA transfer operation data descriptor on the origin node; creating, by the origin DMA engine, an RGET packet in dependence upon the RGET data descriptor, the RGET packet containing the DMA transfer operation data descriptor and the second RGET data descriptor; and transferring, by the origin DMA engine to a target DMA engine on the target node, the RGET packet.
Self-pacing direct memory access data transfer operations for compute nodes in a parallel computer
Blocksome, Michael A
2015-02-17
Methods, apparatus, and products are disclosed for self-pacing DMA data transfer operations for nodes in a parallel computer that include: transferring, by an origin DMA on an origin node, a RTS message to a target node, the RTS message specifying an message on the origin node for transfer to the target node; receiving, in an origin injection FIFO for the origin DMA from a target DMA on the target node in response to transferring the RTS message, a target RGET descriptor followed by a DMA transfer operation descriptor, the DMA descriptor for transmitting a message portion to the target node, the target RGET descriptor specifying an origin RGET descriptor on the origin node that specifies an additional DMA descriptor for transmitting an additional message portion to the target node; processing, by the origin DMA, the target RGET descriptor; and processing, by the origin DMA, the DMA transfer operation descriptor.
A heterogeneous and parallel computing framework for high-resolution hydrodynamic simulations
NASA Astrophysics Data System (ADS)
Smith, Luke; Liang, Qiuhua
2015-04-01
Shock-capturing hydrodynamic models are now widely applied in the context of flood risk assessment and forecasting, accurately capturing the behaviour of surface water over ground and within rivers. Such models are generally explicit in their numerical basis, and can be computationally expensive; this has prohibited full use of high-resolution topographic data for complex urban environments, now easily obtainable through airborne altimetric surveys (LiDAR). As processor clock speed advances have stagnated in recent years, further computational performance gains are largely dependent on the use of parallel processing. Heterogeneous computing architectures (e.g. graphics processing units or compute accelerator cards) provide a cost-effective means of achieving high throughput in cases where the same calculation is performed with a large input dataset. In recent years this technique has been applied successfully for flood risk mapping, such as within the national surface water flood risk assessment for the United Kingdom. We present a flexible software framework for hydrodynamic simulations across multiple processors of different architectures, within multiple computer systems, enabled using OpenCL and Message Passing Interface (MPI) libraries. A finite-volume Godunov-type scheme is implemented using the HLLC approach to solving the Riemann problem, with optional extension to second-order accuracy in space and time using the MUSCL-Hancock approach. The framework is successfully applied on personal computers and a small cluster to provide considerable improvements in performance. The most significant performance gains were achieved across two servers, each containing four NVIDIA GPUs, with a mix of K20, M2075 and C2050 devices. Advantages are found with respect to decreased parametric sensitivity, and thus in reducing uncertainty, for a major fluvial flood within a large catchment during 2005 in Carlisle, England. Simulations for the three-day event could be performed
Computer simulation of macrosegregation in directionally solidified circular ingots
NASA Technical Reports Server (NTRS)
Yeum, K. S.; Poirier, D. R.
1988-01-01
The formulation and employment of a computer code designed to simulate the directional solidification of lead-rich Pb-Sn alloys in the form of an ingot with a uniform and circular cross-section are described. The formulation is for steady-state solidification in which convection in the all-liquid zone is ignored. Particular attention was given to designing a code to simulate the effect of a subtle variation of temperature in the radial direction. This is important because a very small temperature difference between the center and the surface of the ingot (e.g., less than 0.5 C ) is enough to cause substantial convection within the mushy-zone when the solidification rate is approximately 0.001 to 0.0001 cm/s.
NASA Astrophysics Data System (ADS)
Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.
2013-08-01
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time ti (trajectory positions and velocities xi = (ri, vi)) to time ti + 1 (xi + 1) by xi + 1 = fi(xi), the dynamics problem spanning an interval from t0…tM can be transformed into a root finding problem, F(X) = [xi - f(x(i - 1)]i = 1, M = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing
Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t{sub i} (trajectory positions and velocities x{sub i} = (r{sub i}, v{sub i})) to time t{sub i+1} (x{sub i+1}) by x{sub i+1} = f{sub i}(x{sub i}), the dynamics problem spanning an interval from t{sub 0}…t{sub M} can be transformed into a root finding problem, F(X) = [x{sub i} − f(x{sub (i−1})]{sub i} {sub =1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H{sub 2}O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time) ) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up
NASA Technical Reports Server (NTRS)
Hill, Gary; Duval, Ronald W.; Green, John A.; Huynh, Loc C.
1991-01-01
A piloted comparison of rigid and aeroelastic blade-element rotor models was conducted at the Crew Station Research and Development Facility (CSRDF) at Ames Research Center. A simulation development and analysis tool, FLIGHTLAB, was used to implement these models in real time using parallel processing technology. Pilot comments and quantitative analysis performed both on-line and off-line confirmed that elastic degrees of freedom significantly affect perceived handling qualities. Trim comparisons show improved correlation with flight test data when elastic modes are modeled. The results demonstrate the efficiency with which the mathematical modeling sophistication of existing simulation facilities can be upgraded using parallel processing, and the importance of these upgrades to simulation fidelity.
Procedure for Adapting Direct Simulation Monte Carlo Meshes
NASA Technical Reports Server (NTRS)
Woronowicz, Michael S.; Wilmoth, Richard G.; Carlson, Ann B.; Rault, Didier F. G.
1992-01-01
A technique is presented for adapting computational meshes used in the G2 version of the direct simulation Monte Carlo method. The physical ideas underlying the technique are discussed, and adaptation formulas are developed for use on solutions generated from an initial mesh. The effect of statistical scatter on adaptation is addressed, and results demonstrate the ability of this technique to achieve more accurate results without increasing necessary computational resources.
Direct teaching of thinking skills using clinical simulation.
Su, Whei Ming; Juestel, Marne J
2010-01-01
Because of the nursing faculty shortage, many academic leaders hire clinicians who are not formally prepared for an academic role. Novice faculty face an immediate need to develop teaching skills. One of the most crucial areas is the ability to help students develop critical thinking. To address this need, the authors describe how they used clinical simulation to incorporate direct teaching of thinking skills into course content and how this resulted in a faculty mentoring experience.
NASA Astrophysics Data System (ADS)
Vidal, David Jean-Emmanuel
Two different parallel lattice Boltzmann (LBM) algorithms have been devised for the simulation of flow through complex porous media. They are based on memory efficient LBM algorithms, namely the one-lattice and shift algorithms, combined with vector data structure, even fluid node vector partitioning domain decomposition and efficient data transfer layouts. The shift implementation also includes a single unit relaxation scheme that allows additional memory savings, but limits its validity to Newtonian fluids. They both provide high parallel performance by balancing the workload among the processors and reducing the amount of data that need to be transferred, and reduce significantly the memory usage as compared to previous parallel LBM codes presented in the literature. Theoretical parallel performance and memory usage models developed show that they also offer a good evolutivity and efficiencies as high as 79% for simulations made of several billions of fluid nodes on 128 processors are reported. The application of one of these algorithms for the simulation of flow through compressed packings made of highly polydisperse spheres has demonstrated the remarkable precision and efficiency of the algorithm proposed. As a result, a modified Carman-Kozeny correlation taking into account the compression level and the particle polydispersity has been formulated.
NASA Technical Reports Server (NTRS)
Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke
1989-01-01
Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, generally, the same calculation are repeated every time step. However, we cannot apply to Do-all or the Do-across techniques for parallel processing of the simulation since there exist data dependencies from the end of an iteration to the beginning of the next iteration and furthermore data-input and data-output are required every sampling time period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near fine grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and assigned to processors by using optimal static scheduling at compile time in order to reduce large run time overhead caused by the use of near fine grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR) which has been developed to extract advantageous features of static scheduling algorithms to the maximum extent.
Large-Scale High-Resolution Simulations of High Gain Direct-Drive ICF targets
NASA Astrophysics Data System (ADS)
Schmitt, Andrew J.
2003-10-01
High gain directly-driven targets have been designed using new concepts that mitigate the Richtmyer-Meshkov (RM) and Rayleigh-Taylor (RT) instabilities. Two-dimensional simulations of pellets using these techniques (e.g., "picket" laser pulses) show that high (>100) gain can survive in the face of the hydro instabilities seeded by laser and pellet imperfections. These new designs appear to be substantially more robust than earlier designs. We are using the highly-parallelized sliding-zone Eulerian FAST radiation hydrocode to study yield degradation in these designs. The special challenge in performing these simulations for direct-drive laser ICF is that both high resolution and large dynamic range are needed. High resolution of the whole target is needed to represent all of the scales important during the implosion. A large dynamic range is required to resolve the initially tiny surface and imprint perturbations that grow exponentially during acceleration. We find that the rapid growth of the shell perturbations during the acceleration phase is in good agreement with simple RT modeling before significant nonlinearity occurs. However, the Richtmyer-Meshkov growth during the early pellet compression phase poses a challenge particularly for multimode simulations because of the extremely small initial amplitude for each mode. We will present the results from large-scale pellet implosion simulations, and discuss the challenges and progress achieved in the numerical modeling of these high gain designs.
Final Report for 'ParSEC-Parallel Simulation of Electron Cooling"
David L Bruhwiler
2005-09-16
The Department of Energy has plans, during the next two or three years, to design an electron cooling section for the collider ring at RHIC (Relativistic Heavy Ion Collider) [1]. Located at Brookhaven National Laboratory (BNL), RHIC is the premier nuclear physics facility. The new cooling section would be part of a proposed luminosity upgrade [2] for RHIC. This electron cooling section will be different from previous electron cooling facilities in three fundamental ways. First, the electron energy will be 50 MeV, as opposed to 100's of keV (or 4 MeV for the electron cooling system now operating at Fermilab [3]). Second, both the electron beam and the ion beam will be bunched, rather than being essentially continuous. Third, the cooling will take place in a collider rather than in a storage ring. Analytical work, in combination with the use and further development of the semi-analytical codes BETACOOL [4,5] and SimCool [6,7] are being pursued at BNL [8] and at other laboratories around the world. However, there is a growing consensus in the field that high-fidelity 3-D particle simulations are required to fully understand the critical cooling physics issues in this new regime. Simulations of the friction coefficient, using the VORPAL code [9], for single gold ions passing once through the interaction region, have been compared with theoretical calculations [10,11], and the results have been presented in conference proceedings papers [8,12,13,14] and presentations [15,16,17]. Charged particles are advanced using a fourth-order Hermite predictor corrector algorithm [18]. The fields in the beam frame are obtained from direct calculation of Coulomb's law, which is more efficient than multipole-type algorithms for less than {approx} 10{sup 6} particles. Because the interaction time is so short, it is necessary to suppress the diffusive aspect of the ion dynamics through the careful use of positrons in the simulations, and to run 100's of simulations with the same
Tokamak profile prediction using direct gyrokinetic and neoclassical simulation
Candy, Jeff; Holland, Chris; Waltz, R. E.; Fahey, Mark R; Belli, E
2009-01-01
okamak transport modeling scenarios, including ITER ITER Physics Basis Editors, Nucl. Fusion 39, 2137 1999 performance predictions, are based exclusively on reduced models for core thermal and particle transport. The reason for this is simple: computational cost. A typical modeling scenario may require the evaluation of thousands of individual transport fluxes local transport models calculate the energy and particle fluxes across a specified flux surface given fixed profiles . Despite continuous advances in direct gyrokinetic simulation, the cost of an individual simulation remains so high that direct gyrokinetic transport calculations have been avoided. By developing a steady-state iteration scheme suitable for direct gyrokinetic and neoclassical simulations, we can now compute steady-state temperature profiles for DIII-D J. L. Luxon, Nucl. Fusion 42, 614 2002 plasmas given known plasma sources. The new code, TGYRO, encapsulates the GYRO J. Candy and R. E. Waltz, J. Comput. Phys. 186, 545 2003 code, for turbulent transport, and the NEO E. A. Belli and J. Candy, Plasma Phys. Controlled Fusion 50, 095010 2008 code, for kinetic neoclassical transport. Results for DIII-D L-mode discharge 128913 are given, with computational and experimental results consistent in the region 0 <= r/a <= 0.8.
Tokamak profile prediction using direct gyrokinetic and neoclassical simulation
Candy, J.; Waltz, R. E.; Belli, E.; Holland, C.; Fahey, M. R.
2009-06-15
Tokamak transport modeling scenarios, including ITER [ITER Physics Basis Editors, Nucl. Fusion 39, 2137 (1999)] performance predictions, are based exclusively on reduced models for core thermal and particle transport. The reason for this is simple: computational cost. A typical modeling scenario may require the evaluation of thousands of individual transport fluxes (local transport models calculate the energy and particle fluxes across a specified flux surface given fixed profiles). Despite continuous advances in direct gyrokinetic simulation, the cost of an individual simulation remains so high that direct gyrokinetic transport calculations have been avoided. By developing a steady-state iteration scheme suitable for direct gyrokinetic and neoclassical simulations, we can now compute steady-state temperature profiles for DIII-D [J. L. Luxon, Nucl. Fusion 42, 614 (2002)] plasmas given known plasma sources. The new code, TGYRO, encapsulates the GYRO[J. Candy and R. E. Waltz, J. Comput. Phys. 186, 545 (2003)] code, for turbulent transport, and the NEO[E. A. Belli and J. Candy, Plasma Phys. Controlled Fusion 50, 095010 (2008)] code, for kinetic neoclassical transport. Results for DIII-D L-mode discharge 128913 are given, with computational and experimental results consistent in the region 0{<=}r/a{<=}0.8.
NASA Astrophysics Data System (ADS)
Hara, Tatsuhiko
2004-08-01
We implement the Direct Solution Method (DSM) on a vector-parallel supercomputer and show that it is possible to significantly improve its computational efficiency through parallel computing. We apply the parallel DSM calculation to waveform inversion of long period (250-500 s) surface wave data for three-dimensional (3-D) S-wave velocity structure in the upper and uppermost lower mantle. We use a spherical harmonic expansion to represent lateral variation with the maximum angular degree 16. We find significant low velocities under south Pacific hot spots in the transition zone. This is consistent with other seismological studies conducted in the Superplume project, which suggests deep roots of these hot spots. We also perform simultaneous waveform inversion for 3-D S-wave velocity and Q structure. Since resolution for Q is not good, we develop a new technique in which power spectra are used as data for inversion. We find good correlation between long wavelength patterns of Vs and Q in the transition zone such as high Vs and high Q under the western Pacific.
3-D Direct Simulation Monte Carlo modeling of comet 67P/Churyumov-Gerasimenko
NASA Astrophysics Data System (ADS)
Liao, Y.; Su, C.; Finklenburg, S.; Rubin, M.; Ip, W.; Keller, H.; Knollenberg, J.; Kührt, E.; Lai, I.; Skorov, Y.; Thomas, N.; Wu, J.; Chen, Y.
2014-07-01
After deep-space hibernation, ESA's Rosetta spacecraft has been successfully woken up and obtained the first images of comet 67P /Churyumov-Gerasimenko (C-G) in March 2014. It is expected that Rosetta will rendezvous with comet 67P and start to observe the nucleus and coma of the comet in the middle of 2014. As the comet approaches the Sun, a significant increase in activity is expected. Our aim is to understand the physical processes in the coma with the help of modeling in order to interpret the resulting measurements and establish observational and data analysis strategies. DSMC (Direct Simulation Monte Carlo) [1] is a very powerful numerical method to study rarefied gas flows such as cometary comae and has been used by several authors over the past decade to study cometary outflow [2,3]. Comparisons between DSMC and fluid techniques have also been performed to establish the limits of these techniques [2,4]. The drawback with 3D DSMC is that it is computationally highly intensive and thus time consuming. However, the performance can be dramatically increased with parallel computing on Graphic Processor Units (GPUs) [5]. We have already studied a case with comet 9P/Tempel 1 where the Deep Impact observations were used to define the shape of the nucleus and the outflow was simulated with the DSMC approach [6,7]. For comet 67P, we intend to determine the gas flow field in the innermost coma and the surface outgassing properties from analyses of the flow field, to investigate dust acceleration by gas drag, and to compare with observations (including time variability). The boundary conditions are implemented with a nucleus shape model [8] and thermal models which are based on the surface heat-balance equation. Several different parameter sets have been investigated. The calculations have been performed using the PDSC^{++} (Parallel Direct Simulation Monte Carlo) code [9] developed by Wu and his coworkers [10-12]. Simulation tasks can be accomplished within 24
Direct simulation of hypersonic flows over blunt slender bodies
NASA Technical Reports Server (NTRS)
Moss, J. N.; Cuda, V., Jr.
1986-01-01
Results of a numerical study of low-density hypersonic flow about cylindrically blunted wedges and spherically blunted cones with body half angles of 0, 5, and 10 deg are presented. Most of the transitional flow regime encountered during entry between the free molecule and continuum regimes is simulated for a reentry velocity of 7.5 km/s by including freestream conditions of 70 to 100 km. The bodies are at zero angle of incidence and have diffuse and finite catalytic surfaces. Translational, thermodynamic, and chemical nonequilibrium effects are considered in the numerical simulation by utilizing the direct simulation Monte Carlo (DSMC) method. The numerical simulations show that noncontinuum effects such as surface temperature jump, and velocity slip are evident for all cases considered. The onset of chemical dissociation occurs at a simulated altitude of 96 km for the two-dimensional configurations. Comparisons between the DSMC and continuum viscous shock-layer calculations highlight the significant difference in flowfield structure predicted by the two methods.
Bylaska, Eric J; Weare, Jonathan Q; Weare, John H
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time ti (trajectory positions and velocities xi = (ri, vi)) to time ti + 1 (xi + 1) by xi + 1 = fi(xi), the dynamics problem spanning an interval from t0[ellipsis (horizontal)]tM can be transformed into a root finding problem, F(X) = [xi - f(x(i - 1)]i = 1, M = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution/timeparallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a
Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f , (e.g. Verlet algorithm) is available to propagate the system from time ti (trajectory positions and velocities xi = (ri; vi)) to time ti+1 (xi+1) by xi+1 = fi(xi), the dynamics problem spanning an interval from t0 : : : tM can be transformed into a root finding problem, F(X) = [xi - f (x(i-1)]i=1;M = 0, for the trajectory variables. The root finding problem is solved using a variety of optimization techniques, including quasi-Newton and preconditioned quasi-Newton optimization schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed and the effectiveness of various approaches to solving the root finding problem are tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl+4H2O AIMD simulation at the MP2 level. The maximum speedup obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow TCP/IP networks. Scripts
Oxygen transport properties estimation by classical trajectory–direct simulation Monte Carlo
Bruno, Domenico; Frezzotti, Aldo Ghiroldi, Gian Pietro
2015-05-15
Coupling direct simulation Monte Carlo (DSMC) simulations with classical trajectory calculations is a powerful tool to improve predictive capabilities of computational dilute gas dynamics. The considerable increase in computational effort outlined in early applications of the method can be compensated by running simulations on massively parallel computers. In particular, Graphics Processing Unit acceleration has been found quite effective in reducing computing time of classical trajectory (CT)-DSMC simulations. The aim of the present work is to study dilute molecular oxygen flows by modeling binary collisions, in the rigid rotor approximation, through an accurate Potential Energy Surface (PES), obtained by molecular beams scattering. The PES accuracy is assessed by calculating molecular oxygen transport properties by different equilibrium and non-equilibrium CT-DSMC based simulations that provide close values of the transport properties. Comparisons with available experimental data are presented and discussed in the temperature range 300–900 K, where vibrational degrees of freedom are expected to play a limited (but not always negligible) role.
Oxygen transport properties estimation by classical trajectory-direct simulation Monte Carlo
NASA Astrophysics Data System (ADS)
Bruno, Domenico; Frezzotti, Aldo; Ghiroldi, Gian Pietro
2015-05-01
Coupling direct simulation Monte Carlo (DSMC) simulations with classical trajectory calculations is a powerful tool to improve predictive capabilities of computational dilute gas dynamics. The considerable increase in computational effort outlined in early applications of the method can be compensated by running simulations on massively parallel computers. In particular, Graphics Processing Unit acceleration has been found quite effective in reducing computing time of classical trajectory (CT)-DSMC simulations. The aim of the present work is to study dilute molecular oxygen flows by modeling binary collisions, in the rigid rotor approximation, through an accurate Potential Energy Surface (PES), obtained by molecular beams scattering. The PES accuracy is assessed by calculating molecular oxygen transport properties by different equilibrium and non-equilibrium CT-DSMC based simulations that provide close values of the transport properties. Comparisons with available experimental data are presented and discussed in the temperature range 300-900 K, where vibrational degrees of freedom are expected to play a limited (but not always negligible) role.
Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji
2015-01-01
GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310–323. doi: 10.1002/wcms.1220 PMID:26753008
NASA Astrophysics Data System (ADS)
Schroeder, Matthias; Jankowski, Cedric; Hammitzsch, Martin; Wächter, Joachim
2014-05-01
Thousands of numerical tsunami simulations allow the computation of inundation and run-up along the coast for vulnerable areas over the time. A so-called Matching Scenario Database (MSDB) [1] contains this large number of simulations in text file format. In order to visualize these wave propagations the scenarios have to be reprocessed automatically. In the TRIDEC project funded by the seventh Framework Programme of the European Union a Virtual Scenario Database (VSDB) and a Matching Scenario Database (MSDB) were established amongst others by the working group of the University of Bologna (UniBo) [1]. One part of TRIDEC was the developing of a new generation of a Decision Support System (DSS) for tsunami Early Warning Systems (TEWS) [2]. A working group of the GFZ German Research Centre for Geosciences was responsible for developing the Command and Control User Interface (CCUI) as central software application which support operator activities, incident management and message disseminations. For the integration and visualization in the CCUI, the numerical tsunami simulations from MSDB must be converted into the shapefiles format. The usage of shapefiles enables a much easier integration into standard Geographic Information Systems (GIS). Since also the CCUI is based on two widely used open source products (GeoTools library and uDig), whereby the integration of shapefiles is provided by these libraries a priori. In this case, for an example area around the Western Iberian margin several thousand tsunami variations were processed. Due to the mass of data only a program-controlled process was conceivable. In order to optimize the computing efforts and operating time the use of an existing GFZ High Performance Computing Cluster (HPC) had been chosen. Thus, a geospatial software was sought after that is capable for parallel processing. The FOSS tool Geospatial Data Abstraction Library (GDAL/OGR) was used to match the coordinates with the wave heights and generates the
NASA Astrophysics Data System (ADS)
Tian, Shuling; Wu, Yizhao; Xia, Jian
A parallel Navier-Stokes solver based on dynamic overset unstructured grids method is presented to simulate the unsteady turbulent flow field around helicopter in forward flight. The grid method has the advantages of unstructured grid and Chimera grid and is suitable to deal with multiple bodies in relatively moving. Unsteady Navier-Stokes equations are solved on overset unstructured grids by an explicit dual time-stepping, finite volume method. Preconditioning method applied to inner iteration of the dual-time stepping is used to speed up the convergence of numerical simulation. The Spalart-Allmaras one-equation turbulence model is used to evaluate the turbulent viscosity. Parallel computation is based on the dynamic domain decomposition method in overset unstructured grids system at each physical time step. A generic helicopter Robin with a four-blade rotor in forward flight is considered to validate the method presented in this paper. Numerical simulation results show that the parallel dynamic overset unstructured grids method is very efficient for the simulation of helicopter flow field and the results are reliable.
NASA Astrophysics Data System (ADS)
Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.
2016-01-01
We have developed an algorithm, which we call HexMT, for 3-D simulation and inversion of magnetotelluric (MT) responses using deformable hexahedral finite elements that permit incorporation of topography. Direct solvers parallelized on symmetric multiprocessor (SMP), single-chassis workstations with large RAM are used throughout, including the forward solution, parameter Jacobians and model parameter update. In Part I, the forward simulator and Jacobian calculations are presented. We use first-order edge elements to represent the secondary electric field (E), yielding accuracy O(h) for E and its curl (magnetic field). For very low frequencies or small material admittivities, the E-field requires divergence correction. With the help of Hodge decomposition, the correction may be applied in one step after the forward solution is calculated. This allows accurate E-field solutions in dielectric air. The system matrix factorization and source vector solutions are computed using the MKL PARDISO library, which shows good scalability through 24 processor cores. The factorized matrix is used to calculate the forward response as well as the Jacobians of electromagnetic (EM) field and MT responses using the reciprocity theorem. Comparison with other codes demonstrates accuracy of our forward calculations. We consider a popular conductive/resistive double brick structure, several synthetic topographic models and the natural topography of Mount Erebus in Antarctica. In particular, the ability of finite elements to represent smooth topographic slopes permits accurate simulation of refraction of EM waves normal to the slopes at high frequencies. Run-time tests of the parallelized algorithm indicate that for meshes as large as 176 × 176 × 70 elements, MT forward responses and Jacobians can be calculated in ˜1.5 hr per frequency. Together with an efficient inversion parameter step described in Part II, MT inversion problems of 200-300 stations are computable with total run times
Terascale direct numerical simulations of turbulent combustion using S3D
NASA Astrophysics Data System (ADS)
Chen, J. H.; Choudhary, A.; de Supinski, B.; DeVries, M.; Hawkes, E. R.; Klasky, S.; Liao, W. K.; Ma, K. L.; Mellor-Crummey, J.; Podhorszki, N.; Sankaran, R.; Shende, S.; Yoo, C. S.
2009-01-01
Computational science is paramount to the understanding of underlying processes in internal combustion engines of the future that will utilize non-petroleum-based alternative fuels, including carbon-neutral biofuels, and burn in new combustion regimes that will attain high efficiency while minimizing emissions of particulates and nitrogen oxides. Next-generation engines will likely operate at higher pressures, with greater amounts of dilution and utilize alternative fuels that exhibit a wide range of chemical and physical properties. Therefore, there is a significant role for high-fidelity simulations, direct numerical simulations (DNS), specifically designed to capture key turbulence-chemistry interactions in these relatively uncharted combustion regimes, and in particular, that can discriminate the effects of differences in fuel properties. In DNS, all of the relevant turbulence and flame scales are resolved numerically using high-order accurate numerical algorithms. As a consequence terascale DNS are computationally intensive, require massive amounts of computing power and generate tens of terabytes of data. Recent results from terascale DNS of turbulent flames are presented here, illustrating its role in elucidating flame stabilization mechanisms in a lifted turbulent hydrogen/air jet flame in a hot air coflow, and the flame structure of a fuel-lean turbulent premixed jet flame. Computing at this scale requires close collaborations between computer and combustion scientists to provide optimized scaleable algorithms and software for terascale simulations, efficient collective parallel I/O, tools for volume visualization of multiscale, multivariate data and automating the combustion workflow. The enabling computer science, applied to combustion science, is also required in many other terascale physics and engineering simulations. In particular, performance monitoring is used to identify the performance of key kernels in the DNS code, S3D and especially memory
Terascale direct numerical simulations of turbulent combustion using S3D.
Sankaran, Ramanan; Mellor-Crummy, J.; DeVries, M.; Yoo, Chun Sang; Ma, K. L.; Podhorski, N.; Liao, W. K.; Klasky, S.; de Supinski, B.; Choudhary, A.; Hawkes, Evatt R.; Chen, Jacqueline H.; Shende, Sameer
2008-08-01
Computational science is paramount to the understanding of underlying processes in internal combustion engines of the future that will utilize non-petroleum-based alternative fuels, including carbon-neutral biofuels, and burn in new combustion regimes that will attain high efficiency while minimizing emissions of particulates and nitrogen oxides. Next-generation engines will likely operate at higher pressures, with greater amounts of dilution and utilize alternative fuels that exhibit a wide range of chemical and physical properties. Therefore, there is a significant role for high-fidelity simulations, direct numerical simulations (DNS), specifically designed to capture key turbulence-chemistry interactions in these relatively uncharted combustion regimes, and in particular, that can discriminate the effects of differences in fuel properties. In DNS, all of the relevant turbulence and flame scales are resolved numerically using high-order accurate numerical algorithms. As a consequence terascale DNS are computationally intensive, require massive amounts of computing power and generate tens of terabytes of data. Recent results from terascale DNS of turbulent flames are presented here, illustrating its role in elucidating flame stabilization mechanisms in a lifted turbulent hydrogen/air jet flame in a hot air co-flow, and the flame structure of a fuel-lean turbulent premixed jet flame. Computing at this scale requires close collaborations between computer and combustion scientists to provide optimized scaleable algorithms and software for terascale simulations, efficient collective parallel I/O, tools for volume visualization of multiscale, multivariate data and automating the combustion workflow. The enabling computer science, applied to combustion science, is also required in many other terascale physics and engineering simulations. In particular, performance monitoring is used to identify the performance of key kernels in the DNS code, S3D and especially memory
Sreepathi, Sarat; Sripathi, Vamsi; Mills, Richard T; Hammond, Glenn; Mahinthakumar, Kumar
2013-01-01
Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library, SCORPIO (SCalable block-ORiented Parallel I/O) that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5) and implements optimized I/O access patterns that can scale on larger number of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.
The shape of the invisible halo: N-body simulations on parallel supercomputers
Warren, M.S.; Zurek, W.H. ); Quinn, P.J. . Mount Stromlo and Siding Spring Observatories); Salmon, J.K. )
1990-01-01
We study the shapes of halos and the relationship to their angular momentum content by means of N-body (N {approximately} 10{sup 6}) simulations. Results indicate that in relaxed halos with no apparent substructure: (i) the shape and orientation of the isodensity contours tends to persist throughout the virialised portion of the halo; (ii) most ({approx}70%) of the halos are prolate; (iii) the approximate direction of the angular momentum vector tends to persist throughout the halo; (iv) for spherical shells centered on the core of the halo the magnitude of the specific angular momentum is approximately proportional to their radius; (v) the shortest axis of the ellipsoid which approximates the shape of the halo tends to align with the rotation axis of the halo. This tendency is strongest in the fastest rotating halos. 13 refs., 4 figs.
NASA Technical Reports Server (NTRS)
Gutmann, R. J.; Borrego, J. M.
1978-01-01
Rectenna conversion efficiencies (RF to dc) approximating 85 percent were demonstrated on a small scale, clearly indicating the feasibility and potential of efficiency of microwave power to dc. The overall cost estimates of the solar power satellite indicate that the baseline rectenna subsystem will be between 25 to 40 percent of the system cost. The directional receiving elements and element extensions were studied, along with power combining evaluation and evaluation extensions.
A Framework for Parallel Unstructured Grid Generation for Complex Aerodynamic Simulations
NASA Technical Reports Server (NTRS)
Zagaris, George; Pirzadeh, Shahyar Z.; Chrisochoides, Nikos
2009-01-01
A framework for parallel unstructured grid generation targeting both shared memory multi-processors and distributed memory architectures is presented. The two fundamental building-blocks of the framework consist of: (1) the Advancing-Partition (AP) method used for domain decomposition and (2) the Advancing Front (AF) method used for mesh generation. Starting from the surface mesh of the computational domain, the AP method is applied recursively to generate a set of sub-domains. Next, the sub-domains are meshed in parallel using the AF method. The recursive nature of domain decomposition naturally maps to a divide-and-conquer algorithm which exhibits inherent parallelism. For the parallel implementation, the Master/Worker pattern is employed to dynamically balance the varying workloads of each task on the set of available CPUs. Performance results by this approach are presented and discussed in detail as well as future work and improvements.
Abreu Velez, Ana Maria; Upegui Zapata, Yulieth Alexandra; Howard, Michael S
2016-01-01
Background: In many countries and laboratories, techniques such as direct immunofluorescence (DIF) are not available for the diagnosis of skin diseases. Thus, these laboratories are limited in the full diagnoses of autoimmune skin diseases, vasculitis, and rheumatologic diseases. In our experience with these diseases and the patient's skin biopsies, we have noted a positive correlation between periodic acid-Schiff (PAS) staining and immunofluorescence patterns; however, these were just empiric observations. In the current study, we aim to confirm these observations, given the concept that the majority of autoantibodies are glycoproteins and should thus be recognized by PAS staining. Aims: To compare direct immunofluorescent and PAS staining, in multiple autoimmune diseases that are known to exhibit specific direct immunofluorescent patterns. Materials and Methods: We studied multiple autoimmune skin diseases: Five cases of bullous pemphigoid, five cases of pemphigus vulgaris, ten cases of cutaneous lupus, ten cases of autoimmune vasculitis, ten cases of lichen planus (LP), and five cases of cutaneous drug reactions (including one case of erythema multiforme). In addition, we utilized 45 normal skin control specimens from plastic surgery reductions. Results: We found a 98% positive correlation between DIF and PAS staining patterns over all the disease samples. Conclusion: We recommend that laboratories without access to DIF always perform PAS staining in addition to hematoxylin and eosin (H&E) staining, for a review of the reactivity pattern. PMID:27114972
Direct numerical simulation of sharkskin denticles in turbulent channel flow
NASA Astrophysics Data System (ADS)
Boomsma, A.; Sotiropoulos, F.
2016-03-01
The hydrodynamic function of sharkskin has been under investigation for the past 30 years. Current literature conflicts on whether sharkskin is able to reduce skin friction similar to riblets. To contribute insights toward reconciling these conflicting views, direct numerical simulations are carried out to obtain detailed flow fields around realistic denticles. A sharp interface immersed boundary method is employed to simulate two arrangements of actual sharkskin denticles (from Isurus oxyrinchus) in a turbulent boundary layer at Reτ ≈ 180. For comparison, turbulent flow over drag-reducing scalloped riblets is also simulated with similar flow conditions and with the same numerical method. Although the denticles resemble riblets, both sharkskin arrangements increase total drag by 44%-50%, while the riblets reduce drag by 5%. Analysis of the simulated flow fields shows that the turbulent flow around denticles is highly three-dimensional and separated, with 25% of the total drag being form drag. The complex three-dimensional shape of the denticles gives rise to a mean flow dominated by strong secondary flows in sharp contrast with the mean flow generated by riblets, which is largely two-dimensional. The so resulting three-dimensionality of sharkskin flows leads to an increase in the magnitude of the turbulent statistics near the denticles, which further contributes to increasing the total drag. The simulations also show that, at least for the simulated arrangements, sharkskin, in sharp contrast with drag-reducing riblets, is unable to isolate high shear stress near denticle ridges causing a significant portion of the denticle surface to be exposed to high mean shear.
Direct numerical simulation of a turbulent premixed flame
Hasegawa, Tatsuya; Morifuji, Tetsuya; Borghi, R.
1999-07-01
Direct numerical simulation of a stationary turbulent premixed flame with a single-step irreversible Arrhenius-type reaction is performed in order to understand detail physics of turbulent premixed flames and to evaluate modeling of turbulent premixed flame. The 6th-order central finite difference method is used in the streamwise direction with non-periodic boundaries, giving enough grid points in the domain to assure reasonable accuracy. The pseudo spectral method is used for transversal directions with periodic boundaries. The results obtained by their preliminary simulation is presented here with the initial turbulent intensity of u{prime}{sub 0}/u{sub L} = 4.8, the initial integral scale of l{sub t0}/{delta} = 8, and the density ratio of {rho}{sub u}/{rho}{sub b} = 7.53. The obtained flame is a developing wrinkled flame. Mean temperature, mean mass fraction, mean reaction rate and mean velocity components in space show a thickened flame region where the reaction rate appears at all points. Turbulent kinetic energy decays along the stream, but it increases somewhat in the flame region due to the increase of streamwise component of velocity fluctuation. The energy spectra in front of the flame region and behind it show that small scale decays by combustion and that the microscale and the Kolmogorov scale increase several times behind the flame region. Local structure of the turbulent flame is then analyzed.
Direct numerical simulation of turbulent channel flow with permeable walls
NASA Astrophysics Data System (ADS)
Hahn, Seonghyeon; Je, Jongdoo; Choi, Haecheon
2002-01-01
The main objectives of this study are to suggest a proper boundary condition at the interface between a permeable block and turbulent channel flow and to investigate the characteristics of turbulent channel flow with permeable walls. The boundary condition suggested is an extended version of that applied to laminar channel flow by Beavers & Joseph (1967) and describes the behaviour of slip velocities in the streamwise and spanwise directions at the interface between the permeable block and turbulent channel flow. With the proposed boundary condition, direct numerical simulations of turbulent channel flow that is bounded by the permeable wall are performed and significant skin-friction reductions at the permeable wall are obtained with modification of overall flow structures. The viscous sublayer thickness is decreased and the near-wall vortical structures are significantly weakened by the permeable wall. The permeable wall also reduces the turbulence intensities, Reynolds shear stress, and pressure and vorticity fluctuations throughout the channel except very near the wall. The increase of some turbulence quantities there is due to the slip-velocity fluctuations at the wall. The boundary condition proposed for the permeable wall is validated by comparing solutions with those obtained from a separate direct numerical simulation using both the Brinkman equation for the interior of a permeable block and the Navier Stokes equation for the main channel bounded by a permeable block.
Direct numerical simulation of nonpremixed flame-wall interactions
Wang, Yi; Trouve, Arnaud
2006-02-01
The objective of the present study is to use detailed numerical modeling to obtain basic information on the interaction of nonpremixed flames with cold wall surfaces. The questions of turbulent fuel-air-temperature mixing, flame extinction, and wall-surface heat transfer are studied using direct numerical simulation (DNS). The DNS configuration corresponds to an ethylene-air diffusion flame stabilized in the near-wall region of a chemically inert solid surface. Simulations are performed with adiabatic or isothermal wall boundary conditions and with different turbulence intensities. The simulations feature flame extinction events resulting from excessive wall cooling and convective heat transfer rates up to 90 kW/m{sup 2}. The structure of the simulated wall flames is studied in terms of a classical mass-mixing variable, the fuel-air based mixture fraction, and a less familiar heat loss variable, the excess enthalpy variable, introduced to provide a measure of nonadiabatic behavior due to wall cooling. In addition to the flame structure, extinction events are also studied in detail and a modified flame extinction criterion that combines the concepts of mixture fraction and excess enthalpy is proposed and then tested against the DNS data. (author)
Autoignition of hydrogen and air using direct numerical simulation
NASA Astrophysics Data System (ADS)
Doom, Jeffrey; Mahesh, Krishnan
2008-11-01
Direct numerical simulation (DNS) is used to study to auto--ignition in laminar vortex rings and turbulent diffusion flames. A novel, all--Mach number algorithm developed by Doom et al (J. Comput. Phys. 2007) is used. The chemical mechanism is a nine species, nineteen reaction mechanism for H2 and Air from Mueller at el (Int. J. Chem. Kinet. 1999). The vortex ring simulations inject diluted H2 at ambient temperature into hot air, and study the effects of stroke ratio, air to fuel ratio and Lewis number. At smaller stroke ratios, ignition occurs in the wake of the vortex ring and propagates into the vortex core. At larger stroke ratios, ignition occurs along the edges of the trailing column before propagating towards the vortex core. The turbulent diffusion flame simulations are three--dimensional and consider the interaction of initially isotropic turbulence with an unstrained diffusion flame. The simulations examine the nature of distinct ignition kernels, the relative roles of chemical reactions, and the relation between the observed behavior and laminar flames and the perfectly stirred reactor problem. These results will be discussed.
NASA Astrophysics Data System (ADS)
Pfund, R. E. W.; Lichters, R.; Meyer-ter-Vehn, J.
1998-02-01
We report on a recently developed electromagnetic relativistic 1D3V (one spatial, three velocity dimensions) Particle-In-Cell code for simulating laser-plasma interaction at normal and oblique incidence. The code is written in C++ and easy to extend. The data structure is characterized by the use of chained lists for the grid cells as well as particles belonging to one cell. The parallel version of the code is based on PVM. It splits the grid into several spatial domains each belonging to one processor. Since particles can cross boundaries of cells as well as domains, the processor loads will generally change in time. This is counteracted by adjusting the domain sizes dynamically, for which the use of chained lists has proven to be very convenient. Moreover, an option for restarting the simulation from intermediate stages of the time evolution has been implemented even in the parallel version. The code will be published and distributed freely.
Direct numerical simulation of double-diffusive gravity currents
NASA Astrophysics Data System (ADS)
Penney, Jared; Stastna, Marek
2016-08-01
This paper presents three-dimensional direct numerical simulations of laboratory-scale double-diffusive gravity currents. Flow is governed by the incompressible Navier-Stokes equations under the Boussinesq approximation, with salinity and temperature coupled to the equations of motion using a nonlinear approximation to the UNESCO equation of state. The effects of vertical boundary conditions and current volume are examined, with focus on flow pattern development, current propagation speed, three-dimensionalization, dissipation, and stirring and mixing. It was observed that no-slip boundaries cause the gravity current head to take the standard lobe-and-cleft shape and encourage both a greater degree and an earlier onset of three-dimensionalization when compared to what occurs in the case of a free-slip boundary. Additionally, numerical simulations with no-slip boundary conditions experience greater viscous dissipation, stirring, and mixing when compared to similar configurations using free-slip conditions.
Direct simulation Monte Carlo method with a focal mechanism algorithm
NASA Astrophysics Data System (ADS)
Rachman, Asep Nur; Chung, Tae Woong; Yoshimoto, Kazuo; Yun, Sukyoung
2015-01-01
To simulate the observation of the radiation pattern of an earthquake, the direct simulation Monte Carlo (DSMC) method is modified by implanting a focal mechanism algorithm. We compare the results of the modified DSMC method (DSMC-2) with those of the original DSMC method (DSMC-1). DSMC-2 shows more or similarly reliable results compared to those of DSMC-1, for events with 12 or more recorded stations, by weighting twice for hypocentral distance of less than 80 km. Not only the number of stations, but also other factors such as rough topography, magnitude of event, and the analysis method influence the reliability of DSMC-2. The most reliable result by DSMC-2 is obtained by the best azimuthal coverage by the largest number of stations. The DSMC-2 method requires shorter time steps and a larger number of particles than those of DSMC-1 to capture a sufficient number of arrived particles in the small-sized receiver.
Large eddy simulations and direct numerical simulations of high speed turbulent reacting flows
NASA Technical Reports Server (NTRS)
Givi, Peyman; Madnia, Cyrus K.; Steinberger, Craig J.
1990-01-01
This research is involved with the implementation of advanced computational schemes based on large eddy simulations (LES) and direct numerical simulations (DNS) to study the phenomenon of mixing and its coupling with chemical reactions in compressible turbulent flows. In the efforts related to LES, a research program to extend the present capabilities of this method was initiated for the treatment of chemically reacting flows. In the DNS efforts, the focus is on detailed investigations of the effects of compressibility, heat release, and non-equilibrium kinetics modelings in high speed reacting flows. Emphasis was on the simulations of simple flows, namely homogeneous compressible flows, and temporally developing high speed mixing layers.
Ganesan, Narayan; Li, Jie; Sharma, Vishakha; Jiang, Hanyu; Compagnoni, Adriana
2016-01-01
Biological systems encompass complexity that far surpasses many artificial systems. Modeling and simulation of large and complex biochemical pathways is a computationally intensive challenge. Traditional tools, such as ordinary differential equations, partial differential equations, stochastic master equations, and Gillespie type methods, are all limited either by their modeling fidelity or computational efficiency or both. In this work, we present a scalable computational framework based on modeling biochemical reactions in explicit 3D space, that is suitable for studying the behavior of large and complex biological pathways. The framework is designed to exploit parallelism and scalability offered by commodity massively parallel processors such as the graphics processing units (GPUs) and other parallel computing platforms. The reaction modeling in 3D space is aimed at enhancing the realism of the model compared to traditional modeling tools and framework. We introduce the Parallel Select algorithm that is key to breaking the sequential bottleneck limiting the performance of most other tools designed to study biochemical interactions. The algorithm is designed to be computationally tractable, handle hundreds of interacting chemical species and millions of independent agents by considering all-particle interactions within the system. We also present an implementation of the framework on the popular graphics processing units and apply it to the simulation study of JAK-STAT Signal Transduction Pathway. The computational framework will offer a deeper insight into various biological processes within the cell and help us observe key events as they unfold in space and time. This will advance the current state-of-the-art in simulation study of large scale biological systems and also enable the realistic simulation study of macro-biological cultures, where inter-cellular interactions are prevalent.
NASA Astrophysics Data System (ADS)
Blake, Douglas Clifton
A new methodology is presented for conducting numerical simulations of electromagnetic scattering and wave-propagation phenomena on massively parallel computing platforms. A process is constructed which is rooted in the Finite-Volume Time-Domain (FVTD) technique to create a simulation capability that is both versatile and practical. In terms of versatility, the method is platform independent, is easily modifiable, and is capable of solving a large number of problems with no alterations. In terms of practicality, the method is sophisticated enough to solve problems of engineering significance and is not limited to mere academic exercises. In order to achieve this capability, techniques are integrated from several scientific disciplines including computational fluid dynamics, computational electromagnetics, and parallel computing. The end result is the first FVTD solver capable of utilizing the highly flexible overset-gridding process in a distributed-memory computing environment. In the process of creating this capability, work is accomplished to conduct the first study designed to quantify the effects of domain-decomposition dimensionality on the parallel performance of hyperbolic partial differential equations solvers; to develop a new method of partitioning a computational domain comprised of overset grids; and to provide the first detailed assessment of the applicability of overset grids to the field of computational electromagnetics. Using these new methods and capabilities, results from a large number of wave propagation and scattering simulations are presented. The overset-grid FVTD algorithm is demonstrated to produce results of comparable accuracy to single-grid simulations while simultaneously shortening the grid-generation process and increasing the flexibility and utility of the FVTD technique. Furthermore, the new domain-decomposition approaches developed for overset grids are shown to be capable of producing partitions that are better load balanced and
Direct magnetocaloric characterization and simulation of thermomagnetic cycles.
Porcari, G; Buzzi, M; Cugini, F; Pellicelli, R; Pernechele, C; Caron, L; Brück, E; Solzi, M
2013-07-01
An experimental setup for the direct measurement of the magnetocaloric effect capable of simulating high frequency magnetothermal cycles on laboratory-scale samples is described. The study of the magnetocaloric properties of working materials under operative conditions is fundamental for the development of innovative devices. Frequency and time dependent characterization can provide essential information on intrinsic features such as magnetic field induced fatigue in materials undergoing first order magnetic phase transitions. A full characterization of the adiabatic temperature change performed for a sample of Gadolinium across its Curie transition shows the good agreement between our results and literature data and in-field differential scanning calorimetry. PMID:23902084
Application of Direct Simulation Monte Carlo to Satellite Contamination Studies
NASA Technical Reports Server (NTRS)
Rault, Didier F. G.; Woronwicz, Michael S.
1995-01-01
A novel method is presented to estimate contaminant levels around spacecraft and satellites of arbitrarily complex geometry. The method uses a three-dimensional direct simulation Monte Carlo algorithm to characterize the contaminant cloud surrounding the space platform, and a computer-assisted design preprocessor to define the space-platform geometry. The method is applied to the Upper Atmosphere Research Satellite to estimate the contaminant flux incident on the optics of the halogen occultation experiment (HALOE) telescope. Results are presented in terms of contaminant cloud structure, molecular velocity distribution at HALOE aperture, and code performance.
Direct simulations of turbulent flow using finite-difference schemes
NASA Technical Reports Server (NTRS)
Rai, Man Mohan; Moin, Parviz
1989-01-01
A high-order accurate finite-difference approach is presented for calculating incompressible turbulent flow. The methods used include a kinetic energy conserving central difference scheme and an upwind difference scheme. The methods are evaluated in test cases for the evolution of small-amplitude disturbances and fully developed turbulent channel flow. It is suggested that the finite-difference approach can be applied to complex geometries more easilty than highly accurate spectral methods. It is concluded that the upwind scheme is a good candidate for direct simulations of turbulent flows over complex geometries.
Direct magnetocaloric characterization and simulation of thermomagnetic cycles
NASA Astrophysics Data System (ADS)
Porcari, G.; Buzzi, M.; Cugini, F.; Pellicelli, R.; Pernechele, C.; Caron, L.; Brück, E.; Solzi, M.
2013-07-01
An experimental setup for the direct measurement of the magnetocaloric effect capable of simulating high frequency magnetothermal cycles on laboratory-scale samples is described. The study of the magnetocaloric properties of working materials under operative conditions is fundamental for the development of innovative devices. Frequency and time dependent characterization can provide essential information on intrinsic features such as magnetic field induced fatigue in materials undergoing first order magnetic phase transitions. A full characterization of the adiabatic temperature change performed for a sample of Gadolinium across its Curie transition shows the good agreement between our results and literature data and in-field differential scanning calorimetry.
Lu, Yujie; Chatziioannou, Arion F.
2009-01-01
Whole-body optical molecular imaging of mouse models in preclinical research is rapidly developing in recent years. In this context, it is essential and necessary to develop novel simulation methods of light propagation for optical imaging, especially when a priori knowledge, large-volume domain and a wide-range of optical properties need to be considered in the reconstruction algorithm. In this paper, we propose a three dimensional parallel adaptive finite element method with simplified spherical harmonics (SPN) approximation to simulate optical photon propagation in large-volumes of heterogenous tissues. The simulation speed is significantly improved by a posteriori parallel adaptive mesh refinement and dynamic mesh repartitioning. Compared with the diffusion equation and the Monte Carlo methods, the SPN method shows improved performance and the necessity of high-order approximation in heterogeneous domains. Optimal solver selection and time-costing analysis in real mouse geometry further improve the performance of the proposed algorithm and show the superiority of the proposed parallel adaptive framework for whole-body optical molecular imaging in murine models. PMID:20052300