NASA Technical Reports Server (NTRS)
Nicol, David; Fujimoto, Richard
1992-01-01
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.
Parallel Atomistic Simulations
HEFFELFINGER,GRANT S.
2000-01-18
Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.
Weening, J.S.
1988-05-01
CSIM is a simulator for parallel Lisp, based on a continuation passing interpreter. It models a shared-memory multiprocessor executing programs written in Common Lisp, extended with several primitives for creating and controlling processes. This paper describes the structure of the simulator, measures its performance, and gives an example of its use with a parallel Lisp program.
Xyce parallel electronic simulator.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.
Parallel Dislocation Simulator
2006-10-30
ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.
Guo, Li; Xu, Yan; Xu, Zhengfu; Jiang, Jingfeng
2015-01-01
Obtaining accurate ultrasonically-estimated displacements along both axial (parallel to the acoustic beam) and lateral (perpendicular to the beam) directions is an important task for various clinical elastography applications (e.g. modulus reconstruction and temperature imaging). In this study, a partial differential equation (PDE)-based regularization algorithm was proposed to enhance motion tracking accuracy. More specifically, the proposed PDE-based algorithm, utilizing two-dimensional displacement estimates from a conventional elastography system, attempted to iteratively reduce noise contained in the original displacement estimates by mathematical regularization. In this study, the physical constraint used by the above-mentioned mathematical regularization was tissue incompressibility. This proposed algorithm was tested using computer-simulated data, a tissue-mimicking phantom and in vivo breast lesion data. Computer simulation results showed that the method significantly improved the accuracy of lateral tracking (e.g. 17X at 0.5% compression). From in vivo breast lesion data investigated, we have found that, as compared to the conventional method, higher quality axial and lateral strain images (e.g. at least 78% improvements among the estimated contrast-to-noise ratios of lateral strain images) were obtained. Our initial results demonstrated that this conceptually and computationally simple method could be useful to improve the image quality for ultrasound elastography with current clinical equipment as a post-processing tool. PMID:25452434
Parallel Power Grid Simulation Toolkit
Smith, Steve; Kelley, Brian; Banks, Lawrence; Top, Philip; Woodward, Carol
2015-09-14
ParGrid is a 'wrapper' that integrates a coupled Power Grid Simulation toolkit consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGid, named FSKIT, is intended to support the coupling multiple continuous and discrete even parallel simulations. The code is designed using modern object oriented C++ methods utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.
Parallelizing Timed Petri Net simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1993-01-01
The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.
Tutorial: Parallel Simulation on Supercomputers
Perumalla, Kalyan S
2012-01-01
This tutorial introduces typical hardware and software characteristics of extant and emerging supercomputing platforms, and presents issues and solutions in executing large-scale parallel discrete event simulation scenarios on such high performance computing systems. Covered topics include synchronization, model organization, example applications, and observed performance from illustrative large-scale runs.
Parallel Markov chain Monte Carlo simulations.
Ren, Ruichao; Orkoulas, G
2007-06-07
With strict detailed balance, parallel Monte Carlo simulation through domain decomposition cannot be validated with conventional Markov chain theory, which describes an intrinsically serial stochastic process. In this work, the parallel version of Markov chain theory and its role in accelerating Monte Carlo simulations via cluster computing is explored. It is shown that sequential updating is the key to improving efficiency in parallel simulations through domain decomposition. A parallel scheme is proposed to reduce interprocessor communication or synchronization, which slows down parallel simulation with increasing number of processors. Parallel simulation results for the two-dimensional lattice gas model show substantial reduction of simulation time for systems of moderate and large size.
Parallel Markov chain Monte Carlo simulations
NASA Astrophysics Data System (ADS)
Ren, Ruichao; Orkoulas, G.
2007-06-01
With strict detailed balance, parallel Monte Carlo simulation through domain decomposition cannot be validated with conventional Markov chain theory, which describes an intrinsically serial stochastic process. In this work, the parallel version of Markov chain theory and its role in accelerating Monte Carlo simulations via cluster computing is explored. It is shown that sequential updating is the key to improving efficiency in parallel simulations through domain decomposition. A parallel scheme is proposed to reduce interprocessor communication or synchronization, which slows down parallel simulation with increasing number of processors. Parallel simulation results for the two-dimensional lattice gas model show substantial reduction of simulation time for systems of moderate and large size.
Xyce parallel electronic simulator design.
Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.
2010-09-01
This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to insure a high level of code quality and robustness are essential. Version control, issue tracking customer support, C++ style guildlines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathmaticians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.
Parallel network simulations with NEURON.
Migliore, M; Cannia, C; Lytton, W W; Markram, Henry; Hines, M L
2006-10-01
The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2,000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored.
Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Synchronization Of Parallel Discrete Event Simulations
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S.
1992-01-01
Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Parallel Computing for Brain Simulation.
Pastur-Romay, L A; Porto-Pazos, A B; Cedrón, F; Pazos, A
2016-11-04
The human brain is the most complex system in the known universe, but it is the most unknown system. It allows the human beings to possess extraordinary capacities. However, we don´t understand yet how and why most of these capacities are produced. For decades, it have been tried that the computers reproduces these capacities. On one hand, to help understanding the nervous system. On the other hand, to process the data in a more efficient way than before. It is intended to make the computers process the information like the brain does it. The important technological developments and the big multidisciplinary projects have allowed create the first simulation with a number of neurons similar to the human brain neurons number. This paper presents an update review about the main research projects that are trying of simulate and/or emulate the human brain. They employ different types of computational models using parallel computing: digital models, analog models and hybrid models. This review includes the actual applications of these works and also the future trends. We have reviewed some works that look for a step forward in Neuroscience and other ones that look for a breakthrough in Computer Science (neuromorphic hardware, machine learning techniques). We summarize the most outstanding characteristics of them and present the latest advances and future plans. In addition, this review remarks the importance of considering not only neurons: the computational models of the brain should include glial cells, given the proven importance of the astrocytes in the information processing.
NASA Astrophysics Data System (ADS)
Srivastava, Rajeev; Gupta, JRP; Parthasarthy, Harish
2010-05-01
In this paper, the partial differential equation (PDE) based homomorphic filtering technique is proposed for speckle reduction from digitally reconstructed holographic images based on the concepts of complex diffusion processes. For digital implementations, the proposed scheme was discretized using finite differences scheme. Further, the performance of the proposed PDE-based technique is compared with other speckle reduction techniques such as homomorphic anisotropic diffusion filter based on extended concept of Perona and Malik (1990) [2], homomorphic Weiner filter, Lee filter, Frost filter, Kuan filter, speckle reducing anisotropic diffusion (SRAD) filter and hybrid filter in the context of digital holography. For the comparison of various speckle reduction techniques, the performance is evaluated quantitatively in terms of all possible parameters that justify the applicability of a scheme for a specific application. The chosen parameters are mean-square-error (MSE), normalized mean-square-error (NMSE), peak signal-to-noise ratio (PSNR), speckle index, average signal-to-noise ratio (SNR), effective number of looks (ENL), correlation parameter (CP), mean structure similarity index map (MSSIM) and execution time in seconds. For experimentation and computer simulation MATLAB 7.0 has been used and the performance is evaluated and tested for various sample holographic images for varying amount of speckle variance. The results obtained justify the applicability of proposed schemes.
Simulation Exploration through Immersive Parallel Planes: Preprint
Brunhart-Lupo, Nicholas; Bush, Brian W.; Gruchalla, Kenny; Smith, Steve
2016-03-01
We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinate's mapping the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest: a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used to both explore existing data as well as to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
Parallel methods for the flight simulation model
Xiong, Wei Zhong; Swietlik, C.
1994-06-01
The Advanced Computer Applications Center (ACAC) has been involved in evaluating advanced parallel architecture computers and the applicability of these machines to computer simulation models. The advanced systems investigated include parallel machines with shared. memory and distributed architectures consisting of an eight processor Alliant FX/8, a twenty four processor sor Sequent Symmetry, Cray XMP, IBM RISC 6000 model 550, and the Intel Touchstone eight processor Gamma and 512 processor Delta machines. Since parallelizing a truly efficient application program for the parallel machine is a difficult task, the implementation for these machines in a realistic setting has been largely overlooked. The ACAC has developed considerable expertise in optimizing and parallelizing application models on a collection of advanced multiprocessor systems. One of aspect of such an application model is the Flight Simulation Model, which used a set of differential equations to describe the flight characteristics of a launched missile by means of a trajectory. The Flight Simulation Model was written in the FORTRAN language with approximately 29,000 lines of source code. Depending on the number of trajectories, the computation can require several hours to full day of CPU time on DEC/VAX 8650 system. There is an impetus to reduce the execution time and utilize the advanced parallel architecture computing environment available. ACAC researchers developed a parallel method that allows the Flight Simulation Model to be able to run in parallel on the multiprocessor system. For the benchmark data tested, the parallel Flight Simulation Model implemented on the Alliant FX/8 has achieved nearly linear speedup. In this paper, we describe a parallel method for the Flight Simulation Model. We believe the method presented in this paper provides a general concept for the design of parallel applications. This concept, in most cases, can be adapted to many other sequential application programs.
Structured building model reduction toward parallel simulation
Dobbs, Justin R.; Hencey, Brondon M.
2013-08-26
Building energy model reduction exchanges accuracy for improved simulation speed by reducing the number of dynamical equations. Parallel computing aims to improve simulation times without loss of accuracy but is poorly utilized by contemporary simulators and is inherently limited by inter-processor communication. This paper bridges these disparate techniques to implement efficient parallel building thermal simulation. We begin with a survey of three structured reduction approaches that compares their performance to a leading unstructured method. We then use structured model reduction to find thermal clusters in the building energy model and allocate processing resources. Experimental results demonstrate faster simulation and low error without any interprocessor communication.
Data parallel sorting for particle simulation
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1992-01-01
Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O (N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimun performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
Simulating Billion-Task Parallel Programs
Perumalla, Kalyan S; Park, Alfred J
2014-01-01
In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.
Xyce parallel electronic simulator : users' guide.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique
Parallelization of a Compositional Reservoir Simulator
NASA Astrophysics Data System (ADS)
Reme, Hilde; Åge Øye, Geir; Espedal, Magne S.; Fladmark, Gunnar E.
A finite volume dicretization has been used to solve compositional flow in porous media. Secondary migration in fractured rocks has been the main motivation for the work. Multipoint flux approximation has been implemented and adaptive local grid refinement, based on domain decomposition, is used at fractures and faults. The parallelization method, which is described in this paper, strongly promotes code reuse and gives a very high level of parallelization despite low implementation costs. The programming framework is also portable to other platforms or other applications. We have presented computer experiments to examine the parallel efficiency of the implemented parallel simulator with respect to scalability and speedup. Keywords: porous media, multipoint flux approximation, domain decomposition, parallelization
Visualization and Tracking of Parallel CFD Simulations
NASA Technical Reports Server (NTRS)
Vaziri, Arsi; Kremenetsky, Mark
1995-01-01
We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS) runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, are handled by CM/AVS. Partitioning of the visualization task, between CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate (yields) store (yields) visualize' post-processing approach.
Parallel processing of a rotating shaft simulation
NASA Technical Reports Server (NTRS)
Arpasi, Dale J.
1989-01-01
A FORTRAN program describing the vibration modes of a rotor-bearing system is analyzed for parellelism in this simulation using a Pascal-like structured language. Potential vector operations are also identified. A critical path through the simulation is identified and used in conjunction with somewhat fictitious processor characteristics to determine the time to calculate the problem on a parallel processing system having those characteristics. A parallel processing overhead time is included as a parameter for proper evaluation of the gain over serial calculation. The serial calculation time is determined for the same fictitious system. An improvement of up to 640 percent is possible depending on the value of the overhead time. Based on the analysis, certain conclusions are drawn pertaining to the development needs of parallel processing technology, and to the specification of parallel processing systems to meet computational needs.
The Xyce Parallel Electronic Simulator - An Overview
HUTCHINSON,SCOTT A.; KEITER,ERIC R.; HOEKSTRA,ROBERT J.; WATTS,HERMAN A.; WATERS,ARLON J.; SCHELLS,REGINA L.; WIX,STEVEN D.
2000-12-08
The Xyce{trademark} Parallel Electronic Simulator has been written to support the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on providing the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). In addition, they are providing improved performance for numerical kernels using state-of-the-art algorithms, support for modeling circuit phenomena at a variety of abstraction levels and using object-oriented and modern coding-practices that ensure the code will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows.
Unsteady flow simulation on a parallel computer
NASA Astrophysics Data System (ADS)
Faden, M.; Pokorny, S.; Engel, K.
For the simulation of the flow through compressor stages, an interactive flow simulation system is set up on an MIMD-type parallel computer. An explicit scheme is used in order to resolve the time-dependent interaction between the blades. The 2D Navier-Stokes equations are transformed into their general moving coordinates. The parallelization of the solver is based on the idea of domain decomposition. Results are presented for a problem of fixed size (4096 grid nodes for the Hakkinen case).
Compositional reservoir simulation in parallel supercomputing environments
Briens, F.J.L. ); Wu, C.H. ); Gazdag, J.; Wang, H.H. )
1991-09-01
A large-scale compositional reservoir simulation ({gt}1,000 cells) is not often run on a conventional mainframe computer owing to excessive turnaround times. This paper presents programming and computational techniques that fully exploit the capabilities of parallel supercomputers for a large-scale compositional simulation. A novel algorithm called sequential staging of tasks (SST) that can take full advantage of parallel-vector processing to speed up the solution of a large linear system is introduced. The effectiveness of SST is illustrated with results from computer experiments conducted on an IBM 3090-600E.
Xyce parallel electronic simulator release notes.
Keiter, Eric R; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: Hardware and software requirements New features and enhancements Any defects fixed since the last release Current known defects and defect workarounds For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.
Plasma simulation using the massively parallel processor
NASA Technical Reports Server (NTRS)
Lin, C. S.; Thring, A. L.; Koga, J.; Janetzke, R. W.
1987-01-01
Two dimensional electrostatic simulation codes using the particle-in-cell model are developed on the Massively Parallel Processor (MPP). The conventional plasma simulation procedure that computes electric fields at particle positions by means of a gridded system is found inefficient on the MPP. The MPP simulation code is thus based on the gridless system in which particles are assigned to processing elements and electric fields are computed directly via Discrete Fourier Transform. Currently, the gridless model on the MPP in two dimensions is about nine times slower that the gridded system on the CRAY X-MP without considering I/O time. However, the gridless system on the MPP can be improved by incorporating a faster I/O between the staging memory and Array Unit and a more efficient procedure for taking floating point sums over processing elements. The initial results suggest that the parallel processors have the potential for performing large scale plasma simulations.
Parallel Performance of a Combustion Chemistry Simulation
Skinner, Gregg; Eigenmann, Rudolf
1995-01-01
We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.
Parallel algorithm strategies for circuit simulation.
Thornquist, Heidi K.; Schiek, Richard Louis; Keiter, Eric Richard
2010-01-01
Circuit simulation tools (e.g., SPICE) have become invaluable in the development and design of electronic circuits. However, they have been pushed to their performance limits in addressing circuit design challenges that come from the technology drivers of smaller feature scales and higher integration. Improving the performance of circuit simulation tools through exploiting new opportunities in widely-available multi-processor architectures is a logical next step. Unfortunately, not all traditional simulation applications are inherently parallel, and quickly adapting mature application codes (even codes designed to parallel applications) to new parallel paradigms can be prohibitively difficult. In general, performance is influenced by many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, the use of mini-applications small self-contained proxies for real applications is an excellent approach for rapidly exploring the parameter space of all these choices. In this report we present a multi-core performance study of Xyce, a transistor-level circuit simulation tool, and describe the future development of a mini-application for circuit simulation.
Inflated speedups in parallel simulations via malloc()
NASA Technical Reports Server (NTRS)
Nicol, David M.
1990-01-01
Discrete-event simulation programs make heavy use of dynamic memory allocation in order to support simulation's very dynamic space requirements. When programming in C one is likely to use the malloc() routine. However, a parallel simulation which uses the standard Unix System V malloc() implementation may achieve an overly optimistic speedup, possibly superlinear. An alternate implementation provided on some (but not all systems) can avoid the speedup anomaly, but at the price of significantly reduced available free space. This is especially severe on most parallel architectures, which tend not to support virtual memory. It is shown how a simply implemented user-constructed interface to malloc() can both avoid artificially inflated speedups, and make efficient use of the dynamic memory space. The interface simply catches blocks on the basis of their size. The problem is demonstrated empirically, and the effectiveness of the solution is shown both empirically and analytically.
Xyce parallel electronic simulator : reference guide.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.
Xyce() Parallel Electronic Simulator
2013-10-03
The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models including several age and radiation aware devices. It supports a variety of computing platforms (both serial and parallel) computers. Lastly, it uses a variety of modern solution algorithms dynamic parallel load-balancing and iterative solvers.! ! Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits.! ! Kirchoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.
Massively-Parallel Dislocation Dynamics Simulations
Cai, W; Bulatov, V V; Pierce, T G; Hiratani, M; Rhee, M; Bartelt, M; Tang, M
2003-06-18
Prediction of the plastic strength of single crystals based on the collective dynamics of dislocations has been a challenge for computational materials science for a number of years. The difficulty lies in the inability of the existing dislocation dynamics (DD) codes to handle a sufficiently large number of dislocation lines, in order to be statistically representative and to reproduce experimentally observed microstructures. A new massively-parallel DD code is developed that is capable of modeling million-dislocation systems by employing thousands of processors. We discuss the general aspects of this code that make such large scale simulations possible, as well as a few initial simulation results.
Parallel Strategies for Crash and Impact Simulations
Attaway, S.; Brown, K.; Hendrickson, B.; Plimpton, S.
1998-12-07
We describe a general strategy we have found effective for parallelizing solid mechanics simula- tions. Such simulations often have several computationally intensive parts, including finite element integration, detection of material contacts, and particle interaction if smoothed particle hydrody- namics is used to model highly deforming materials. The need to balance all of these computations simultaneously is a difficult challenge that has kept many commercial and government codes from being used effectively on parallel supercomputers with hundreds or thousands of processors. Our strategy is to load-balance each of the significant computations independently with whatever bal- ancing technique is most appropriate. The chief benefit is that each computation can be scalably paraIlelized. The drawback is the data exchange between processors and extra coding that must be written to maintain multiple decompositions in a single code. We discuss these trade-offs and give performance results showing this strategy has led to a parallel implementation of a widely-used solid mechanics code that can now be run efficiently on thousands of processors of the Pentium-based Sandia/Intel TFLOPS machine. We illustrate with several examples the kinds of high-resolution, million-element models that can now be simulated routinely. We also look to the future and dis- cuss what possibilities this new capabUity promises, as well as the new set of challenges it poses in material models, computational techniques, and computing infrastructure.
Parallel beam dynamics simulation of linear accelerators
Qiang, Ji; Ryne, Robert D.
2002-01-31
In this paper we describe parallel particle-in-cell methods for the large scale simulation of beam dynamics in linear accelerators. These techniques have been implemented in the IMPACT (Integrated Map and Particle Accelerator Tracking) code. IMPACT is being used to study the behavior of intense charged particle beams and as a tool for the design of next-generation linear accelerators. As examples, we present applications of the code to the study of emittance exchange in high intensity beams and to the study of beam transport in a proposed accelerator for the development of accelerator-driven waste transmutation technologies.
A note on parallel efficiency of fire simulation on cluster
NASA Astrophysics Data System (ADS)
Valasek, L.; Glasa, J.
2016-08-01
Current HPC clusters are capable to reduce execution time of parallelized tasks significantly. The paper discusses the use of two selected strategies of cluster computational resources allocation and their impact on parallel efficiency of fire simulation. Simulation of a simple corridor fire scenario by Fire Dynamics Simulator parallelized by the MPI programming model is tested on the HPC cluster at the Institute of Informatics of Slovak Academy of Sciences in Bratislava (Slovakia). The tests confirm that parallelization has a great potential to reduce execution times achieving promising values of parallel efficiency of the simulation, however, the results also show that the use of increasing numbers of computational meshes resulting in increasing numbers of used computational cores does not necessarily decrease the execution time nor the parallel efficiency of simulation. The results obtained indicate that the simulation achieves different values of the execution time and the parallel efficiency in regard of the used strategy for cluster computational resources allocation.
Parallel Proximity Detection for Computer Simulations
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)
1998-01-01
The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
Parallel Proximity Detection for Computer Simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)
1997-01-01
The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are includes by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
Parallel multiscale simulations of a brain aneurysm
NASA Astrophysics Data System (ADS)
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NɛκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NɛκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future
Parallel multiscale simulations of a brain aneurysm
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in
Parallel multiscale simulations of a brain aneurysm.
Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver εκαr . The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers ( εκαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future
A scalable parallel black oil simulator on distributed memory parallel computers
NASA Astrophysics Data System (ADS)
Wang, Kun; Liu, Hui; Chen, Zhangxin
2015-11-01
This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
Parallelization of Rocket Engine Simulator Software (PRESS)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet
1997-01-01
Parallelization of Rocket Engine System Software (PRESS) project is part of a collaborative effort with Southern University at Baton Rouge (SUBR), University of West Florida (UWF), and Jackson State University (JSU). The second-year funding, which supports two graduate students enrolled in our new Master's program in Computer Science at Hampton University and the principal investigator, have been obtained for the period from October 19, 1996 through October 18, 1997. The key part of the interim report was new directions for the second year funding. This came about from discussions during Rocket Engine Numeric Simulator (RENS) project meeting in Pensacola on January 17-18, 1997. At that time, a software agreement between Hampton University and NASA Lewis Research Center had already been concluded. That agreement concerns off-NASA-site experimentation with PUMPDES/TURBDES software. Before this agreement, during the first year of the project, another large-scale FORTRAN-based software, Two-Dimensional Kinetics (TDK), was being used for translation to an object-oriented language and parallelization experiments. However, that package proved to be too complex and lacking sufficient documentation for effective translation effort to the object-oriented C + + source code. The focus, this time with better documented and more manageable PUMPDES/TURBDES package, was still on translation to C + + with design improvements. At the RENS Meeting, however, the new impetus for the RENS projects in general, and PRESS in particular, has shifted in two important ways. One was closer alignment with the work on Numerical Propulsion System Simulator (NPSS) through cooperation and collaboration with LERC ACLU organization. The other was to see whether and how NASA's various rocket design software can be run over local and intra nets without any radical efforts for redesign and translation into object-oriented source code. There were also suggestions that the Fortran based code be
Parallelization and automatic data distribution for nuclear reactor simulations
Liebrock, L.M.
1997-07-01
Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.
Parallel methods for dynamic simulation of multiple manipulator systems
NASA Technical Reports Server (NTRS)
Mcmillan, Scott; Sadayappan, P.; Orin, David E.
1993-01-01
In this paper, efficient dynamic simulation algorithms for a system of m manipulators, cooperating to manipulate a large load, are developed; their performance, using two possible forms of parallelism on a general-purpose parallel computer, is investigated. One form, temporal parallelism, is obtained with the use of parallel numerical integration methods. A speedup of 3.78 on four processors of CRAY Y-MP8 was achieved with a parallel four-point block predictor-corrector method for the simulation of a four manipulator system. These multi-point methods suffer from reduced accuracy, and when comparing these runs with a serial integration method, the speedup can be as low as 1.83 for simulations with the same accuracy. To regain the performance lost due to accuracy problems, a second form of parallelism is employed. Spatial parallelism allows most of the dynamics of each manipulator chain to be computed simultaneously. Used exclusively in the four processor case, this form of parallelism in conjunction with a serial integration method results in a speedup of 3.1 on four processors over the best serial method. In cases where there are either more processors available or fewer chains in the system, the multi-point parallel integration methods are still advantageous despite the reduced accuracy because both forms of parallelism can then combine to generate more parallel tasks and achieve greater effective speedups. This paper also includes results for these cases.
NASA Astrophysics Data System (ADS)
Schaa, R.; Gross, L.; du Plessis, J.
2016-04-01
We present a general finite-element solver, escript, tailored to solve geophysical forward and inverse modeling problems in terms of partial differential equations (PDEs) with suitable boundary conditions. Escript’s abstract interface allows geoscientists to focus on solving the actual problem without being experts in numerical modeling. General-purpose finite element solvers have found wide use especially in engineering fields and find increasing application in the geophysical disciplines as these offer a single interface to tackle different geophysical problems. These solvers are useful for data interpretation and for research, but can also be a useful tool in educational settings. This paper serves as an introduction into PDE-based modeling with escript where we demonstrate in detail how escript is used to solve two different forward modeling problems from applied geophysics (3D DC resistivity and 2D magnetotellurics). Based on these two different cases, other geophysical modeling work can easily be realized. The escript package is implemented as a Python library and allows the solution of coupled, linear or non-linear, time-dependent PDEs. Parallel execution for both shared and distributed memory architectures is supported and can be used without modifications to the scripts.
The effects of parallel processing architectures on discrete event simulation
NASA Astrophysics Data System (ADS)
Cave, William; Slatt, Edward; Wassmer, Robert E.
2005-05-01
As systems become more complex, particularly those containing embedded decision algorithms, mathematical modeling presents a rigid framework that often impedes representation to a sufficient level of detail. Using discrete event simulation, one can build models that more closely represent physical reality, with actual algorithms incorporated in the simulations. Higher levels of detail increase simulation run time. Hardware designers have succeeded in producing parallel and distributed processor computers with theoretical speeds well into the teraflop range. However, the practical use of these machines on all but some very special problems is extremely limited. The inability to use this power is due to great difficulties encountered when trying to translate real world problems into software that makes effective use of highly parallel machines. This paper addresses the application of parallel processing to simulations of real world systems of varying inherent parallelism. It provides a brief background in modeling and simulation validity and describes a parameter that can be used in discrete event simulation to vary opportunities for parallel processing at the expense of absolute time synchronization and is constrained by validity. It focuses on the effects of model architecture, run-time software architecture, and parallel processor architecture on speed, while providing an environment where modelers can achieve sufficient model accuracy to produce valid simulation results. It describes an approach to simulation development that captures subject area expert knowledge to leverage inherent parallelism in systems in the following ways: * Data structures are separated from instructions to track which instruction sets share what data. This is used to determine independence and thus the potential for concurrent processing at run-time. * Model connectivity (independence) can be inspected visually to determine if the inherent parallelism of a physical system is properly
Parallel Monte Carlo simulation of multilattice thin film growth
NASA Astrophysics Data System (ADS)
Shu, J. W.; Lu, Qin; Wong, Wai-on; Huang, Han-chen
2001-07-01
This paper describe a new parallel algorithm for the multi-lattice Monte Carlo atomistic simulator for thin film deposition (ADEPT), implemented on parallel computer using the PVM (Parallel Virtual Machine) message passing library. This parallel algorithm is based on domain decomposition with overlapping and asynchronous communication. Multiple lattices are represented by a single reference lattice through one-to-one mappings, with resulting computational demands being comparable to those in the single-lattice Monte Carlo model. Asynchronous communication and domain overlapping techniques are used to reduce the waiting time and communication time among parallel processors. Results show that the algorithm is highly efficient with large number of processors. The algorithm was implemented on a parallel machine with 50 processors, and it is suitable for parallel Monte Carlo simulation of thin film growth with either a distributed memory parallel computer or a shared memory machine with message passing libraries. In this paper, the significant communication time in parallel MC simulation of thin film growth is effectively reduced by adopting domain decomposition with overlapping between sub-domains and asynchronous communication among processors. The overhead of communication does not increase evidently and speedup shows an ascending tendency when the number of processor increases. A near linear increase in computing speed was achieved with number of processors increases and there is no theoretical limit on the number of processors to be used. The techniques developed in this work are also suitable for the implementation of the Monte Carlo code on other parallel systems.
Aerodynamic simulation on massively parallel systems
NASA Technical Reports Server (NTRS)
Haeuser, Jochem; Simon, Horst D.
1992-01-01
This paper briefly addresses the computational requirements for the analysis of complete configurations of aircraft and spacecraft currently under design to be used for advanced transportation in commercial applications as well as in space flight. The discussion clearly shows that massively parallel systems are the only alternative which is both cost effective and on the other hand can provide the necessary TeraFlops, needed to satisfy the narrow design margins of modern vehicles. It is assumed that the solution of the governing physical equations, i.e., the Navier-Stokes equations which may be complemented by chemistry and turbulence models, is done on multiblock grids. This technique is situated between the fully structured approach of classical boundary fitted grids and the fully unstructured tetrahedra grids. A fully structured grid best represents the flow physics, while the unstructured grid gives best geometrical flexibility. The multiblock grid employed is structured within a block, but completely unstructured on the block level. While a completely unstructured grid is not straightforward to parallelize, the above mentioned multiblock grid is inherently parallel, in particular for multiple instruction multiple datastream (MIMD) machines. In this paper guidelines are provided for setting up or modifying an existing sequential code so that a direct parallelization on a massively parallel system is possible. Results are presented for three parallel systems, namely the Intel hypercube, the Ncube hypercube, and the FPS 500 system. Some preliminary results for an 8K CM2 machine will also be mentioned. The code run is the two dimensional grid generation module of Grid, which is a general two dimensional and three dimensional grid generation code for complex geometries. A system of nonlinear Poisson equations is solved. This code is also a good testcase for complex fluid dynamics codes, since the same datastructures are used. All systems provided good speedups, but
Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors
NASA Technical Reports Server (NTRS)
Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.
1990-01-01
Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.
Parallel discrete-event simulation of FCFS stochastic queueing networks
NASA Technical Reports Server (NTRS)
Nicol, David M.
1988-01-01
Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments), which has proven itself to be effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction on a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations, give performance data that demonstrates the method's effectiveness under moderate to heavy loads, and discuss performance tradeoffs between the quality of lookahead, and the cost of computing lookahead.
Xyce Parallel Electronic Simulator : users' guide, version 4.1.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-02-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical
Xyce parallel electronic simulator : users' guide. Version 5.1.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-11-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical
Iterative Schemes for Time Parallelization with Application to Reservoir Simulation
Garrido, I; Fladmark, G E; Espedal, M S; Lee, B
2005-04-18
Parallel methods are usually not applied to the time domain because of the inherit sequentialness of time evolution. But for many evolutionary problems, computer simulation can benefit substantially from time parallelization methods. In this paper, they present several such algorithms that actually exploit the sequential nature of time evolution through a predictor-corrector procedure. This sequentialness ensures convergence of a parallel predictor-corrector scheme within a fixed number of iterations. The performance of these novel algorithms, which are derived from the classical alternating Schwarz method, are illustrated through several numerical examples using the reservoir simulator Athena.
A conservative approach to parallelizing the Sharks World simulation
NASA Technical Reports Server (NTRS)
Nicol, David M.; Riffe, Scott E.
1990-01-01
Parallelizing a benchmark problem for parallel simulation, the Sharks World, is described. The described solution is conservative, in the sense that no state information is saved, and no 'rollbacks' occur. The used approach illustrates both the principal advantage and principal disadvantage of conservative parallel simulation. The advantage is that by exploiting lookahead an approach was found that dramatically improves the serial execution time, and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.
Xyce™ Parallel Electronic Simulator Users' Guide, Version 6.5.
Keiter, Eric R.; Aadithya, Karthik V.; Mei, Ting; Russo, Thomas V.; Schiek, Richard L.; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.
2016-06-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.
Broadband monitoring simulation with massively parallel processors
NASA Astrophysics Data System (ADS)
Trubetskov, Mikhail; Amotchkina, Tatiana; Tikhonravov, Alexander
2011-09-01
Modern efficient optimization techniques, namely needle optimization and gradual evolution, enable one to design optical coatings of any type. Even more, these techniques allow obtaining multiple solutions with close spectral characteristics. It is important, therefore, to develop software tools that can allow one to choose a practically optimal solution from a wide variety of possible theoretical designs. A practically optimal solution provides the highest production yield when optical coating is manufactured. Computational manufacturing is a low-cost tool for choosing a practically optimal solution. The theory of probability predicts that reliable production yield estimations require many hundreds or even thousands of computational manufacturing experiments. As a result reliable estimation of the production yield may require too much computational time. The most time-consuming operation is calculation of the discrepancy function used by a broadband monitoring algorithm. This function is formed by a sum of terms over wavelength grid. These terms can be computed simultaneously in different threads of computations which opens great opportunities for parallelization of computations. Multi-core and multi-processor systems can provide accelerations up to several times. Additional potential for further acceleration of computations is connected with using Graphics Processing Units (GPU). A modern GPU consists of hundreds of massively parallel processors and is capable to perform floating-point operations efficiently.
Running Parallel Discrete Event Simulators on Sierra
Barnes, P. D.; Jefferson, D. R.
2015-12-03
In this proposal we consider porting the ROSS/Charm++ simulator and the discrete event models that run under its control so that they run on the Sierra architecture and make efficient use of the Volta GPUs.
Parallel discrete event simulation: A shared memory approach
NASA Technical Reports Server (NTRS)
Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.
1987-01-01
With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to insure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
Traffic simulations on parallel computers using domain decomposition techniques
Hanebutte, U.R.; Tentner, A.M.
1995-12-31
Large scale simulations of Intelligent Transportation Systems (ITS) can only be achieved by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic simulations with the standard simulation package TRAF-NETSIM on a 128 nodes IBM SPx parallel supercomputer as well as on a cluster of SUN workstations. Whilst this particular parallel implementation is based on NETSIM, a microscopic traffic simulation model, the presented strategy is applicable to a broad class of traffic simulations. An outer iteration loop must be introduced in order to converge to a global solution. A performance study that utilizes a scalable test network that consist of square-grids is presented, which addresses the performance penalty introduced by the additional iteration loop.
Parallel Signal Processing and System Simulation using aCe
NASA Technical Reports Server (NTRS)
Dorband, John E.; Aburdene, Maurice F.
2003-01-01
Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C based parallel language (ace C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of ace C and present a signal processing application (FFT).
A CUDA based parallel multi-phase oil reservoir simulator
NASA Astrophysics Data System (ADS)
Zaza, Ayham; Awotunde, Abeeb A.; Fairag, Faisal A.; Al-Mouhamed, Mayez A.
2016-09-01
Forward Reservoir Simulation (FRS) is a challenging process that models fluid flow and mass transfer in porous media to draw conclusions about the behavior of certain flow variables and well responses. Besides the operational cost associated with matrix assembly, FRS repeatedly solves huge and computationally expensive sparse, ill-conditioned and unsymmetrical linear system. Moreover, as the computation for practical reservoir dimensions lasts for long times, speeding up the process by taking advantage of parallel platforms is indispensable. By considering the state of art advances in massively parallel computing and the accompanying parallel architecture, this work aims primarily at developing a CUDA-based parallel simulator for oil reservoir. In addition to the initial reported 33 times speed gain compared to the serial version, running experiments showed that BiCGSTAB is a stable and fast solver which could be incorporated in such simulations instead of the more expensive, storage demanding and usually utilized GMRES.
Tutorial: Parallel Computing of Simulation Models for Risk Analysis.
Reilly, Allison C; Staid, Andrea; Gao, Michael; Guikema, Seth D
2016-10-01
Simulation models are widely used in risk analysis to study the effects of uncertainties on outcomes of interest in complex problems. Often, these models are computationally complex and time consuming to run. This latter point may be at odds with time-sensitive evaluations or may limit the number of parameters that are considered. In this article, we give an introductory tutorial focused on parallelizing simulation code to better leverage modern computing hardware, enabling risk analysts to better utilize simulation-based methods for quantifying uncertainty in practice. This article is aimed primarily at risk analysts who use simulation methods but do not yet utilize parallelization to decrease the computational burden of these models. The discussion is focused on conceptual aspects of embarrassingly parallel computer code and software considerations. Two complementary examples are shown using the languages MATLAB and R. A brief discussion of hardware considerations is located in the Appendix.
Direct simulation Monte Carlo analysis on parallel processors
NASA Technical Reports Server (NTRS)
Wilmoth, Richard G.
1989-01-01
A method is presented for executing a direct simulation Monte Carlo (DSMC) analysis using parallel processing. The method is based on using domain decomposition to distribute the work load among multiple processors, and the DSMC analysis is performed completely in parallel. Message passing is used to transfer molecules between processors and to provide the synchronization necessary for the correct physical simulation. Benchmark problems are described for testing the method and results are presented which demonstrate the performance on two commercially available multicomputers. The results show that reasonable parallel speedup and efficiency can be obtained if the problem is properly sized to the number of processors. It is projected that with a massively parallel system, performance exceeding that of current supercomputers is possible.
Parallelization of Rocket Engine Simulator Software (PRESS)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet
1998-01-01
We have outlined our work in the last half of the funding period. We have shown how a demo package for RESSAP using MPI can be done. However, we also mentioned the difficulties with the UNIX platform. We have reiterated some of the suggestions made during the presentation of the progress of the at Fourth Annual HBCU Conference. Although we have discussed, in some detail, how TURBDES/PUMPDES software can be run in parallel using MPI, at present, we are unable to experiment any further with either MPI or PVM. Due to X windows not being implemented, we are also not able to experiment further with XPVM, which it will be recalled, has a nice GUI interface. There are also some concerns, on our part, about MPI being an appropriate tool. The best thing about MPr is that it is public domain. Although and plenty of documentation exists for the intricacies of using MPI, little information is available on its actual implementations. Other than very typical, somewhat contrived examples, such as Jacobi algorithm for solving Laplace's equation, there are few examples which can readily be applied to real situations, such as in our case. In effect, the review of literature on both MPI and PVM, and there is a lot, indicate something similar to the enormous effort which was spent on LISP and LISP-like languages as tools for artificial intelligence research. During the development of a book on programming languages [12], when we searched the literature for very simple examples like taking averages, reading and writing records, multiplying matrices, etc., we could hardly find a any! Yet, so much was said and done on that topic in academic circles. It appears that we faced the same problem with MPI, where despite significant documentation, we could not find even a simple example which supports course-grain parallelism involving only a few processes. From the foregoing, it appears that a new direction may be required for more productive research during the extension period (10/19/98 - 10
A curvature filter and PDE based non-uniformity correction algorithm
NASA Astrophysics Data System (ADS)
Cheng, Kuanhong; Zhou, Huixin; Qin, Hanlin; Zhao, Dong; Qian, Kun; Rong, Shenghui; Yin, Shimin
2016-10-01
In this paper, a curvature filter and PDE based non-uniformity correction algorithm is proposed, the key point of this algorithm is the way to estimate FPN. We use anisotropic diffusion to smooth noise and Gaussian curvature filter to extract the details of original image. Then combine these two parts together by guided image filter and subtract the result from original image to get the crude approximation of FPN. After that, a Temporal Low Pass Filter (TLPF) is utilized to filter out random noise and get the accurate FPN. Finally, subtract the FPN from original image to achieve non-uniformity correction. The performance of this algorithm is tested with two infrared image sequences, and the experimental results show that the proposed method achieves a better non-uniformity correction performance.
Efficient parallel simulation of CO2 geologic sequestration insaline aquifers
Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten
2007-01-01
An efficient parallel simulator for large-scale, long-termCO2 geologic sequestration in saline aquifers has been developed. Theparallel simulator is a three-dimensional, fully implicit model thatsolves large, sparse linear systems arising from discretization of thepartial differential equations for mass and energy balance in porous andfractured media. The simulator is based on the ECO2N module of the TOUGH2code and inherits all the process capabilities of the single-CPU TOUGH2code, including a comprehensive description of the thermodynamics andthermophysical properties of H2O-NaCl- CO2 mixtures, modeling singleand/or two-phase isothermal or non-isothermal flow processes, two-phasemixtures, fluid phases appearing or disappearing, as well as saltprecipitation or dissolution. The new parallel simulator uses MPI forparallel implementation, the METIS software package for simulation domainpartitioning, and the iterative parallel linear solver package Aztec forsolving linear equations by multiple processors. In addition, theparallel simulator has been implemented with an efficient communicationscheme. Test examples show that a linear or super-linear speedup can beobtained on Linux clusters as well as on supercomputers. Because of thesignificant improvement in both simulation time and memory requirement,the new simulator provides a powerful tool for tackling larger scale andmore complex problems than can be solved by single-CPU codes. Ahigh-resolution simulation example is presented that models buoyantconvection, induced by a small increase in brine density caused bydissolution of CO2.
A tool for simulating parallel branch-and-bound methods
NASA Astrophysics Data System (ADS)
Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail
2016-01-01
The Branch-and-Bound method is known as one of the most powerful but very resource consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in parallel B&B method is the need for dynamic load redistribution. Therefore design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating parallel Branchand-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, the characteristics of the supercomputer's interconnect thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
Xyce Parallel Electronic Simulator : users' guide, version 2.0.
Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.
2004-06-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability the current state-of-the-art in the following areas: {sm_bullet} Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. {sm_bullet} Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. {sm_bullet} Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. {sm_bullet} A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). {sm_bullet} Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing of computing platforms. These include serial, shared-memory and distributed-memory parallel implementation - which allows it to run efficiently on the widest possible number parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce
A hybrid parallel framework for the cellular Potts model simulations
Jiang, Yi; He, Kejing; Dong, Shoubin
2009-01-01
The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which can't be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming POE solving, cell division, and cell reaction operation are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the POE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied the avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation ({approx}10{sup 8} sites) of complex collective behavior of numerous cells ({approx}10{sup 6}).
Parallel Monte Carlo Simulation for control system design
NASA Technical Reports Server (NTRS)
Schubert, Wolfgang M.
1995-01-01
The research during the 1993/94 academic year addressed the design of parallel algorithms for stochastic robustness synthesis (SRS). SRS uses Monte Carlo simulation to compute probabilities of system instability and other design-metric violations. The probabilities form a cost function which is used by a genetic algorithm (GA). The GA searches for the stochastic optimal controller. The existing sequential algorithm was analyzed and modified to execute in a distributed environment. For this, parallel approaches to Monte Carlo simulation and genetic algorithms were investigated. Initial empirical results are available for the KSR1.
Parallel runway requirement analysis study. Volume 2: Simulation manual
NASA Technical Reports Server (NTRS)
Ebrahimi, Yaghoob S.; Chun, Ken S.
1993-01-01
This document is a user manual for operating the PLAND_BLUNDER (PLB) simulation program. This simulation is based on two aircraft approaching parallel runways independently and using parallel Instrument Landing System (ILS) equipment during Instrument Meteorological Conditions (IMC). If an aircraft should deviate from its assigned localizer course toward the opposite runway, this constitutes a blunder which could endanger the aircraft on the adjacent path. The worst case scenario would be if the blundering aircraft were unable to recover and continue toward the adjacent runway. PLAND_BLUNDER is a Monte Carlo-type simulation which employs the events and aircraft positioning during such a blunder situation. The model simulates two aircraft performing parallel ILS approaches using Instrument Flight Rules (IFR) or visual procedures. PLB uses a simple movement model and control law in three dimensions (X, Y, Z). The parameters of the simulation inputs and outputs are defined in this document along with a sample of the statistical analysis. This document is the second volume of a two volume set. Volume 1 is a description of the application of the PLB to the analysis of close parallel runway operations.
Parallel Mass Transfer Simulation of Nanoparticles Using Nonblocking Communications
NASA Astrophysics Data System (ADS)
Chantrapornchai (Phonpensri), Chantana; Dolwithayakul, Banpot; Gorlatch, Sergei
This paper presents experiences and results obtained in optimizing parallelization of the mass transfer simulation in the High Gradient Magnetic Separation (HGMS) of nanoparticles using nonblocking communication techniques in the point-to-point and collective model. We study the dynamics of mass transfer statistically in terms of particle volume concentration and the continuity equation, which is solved numerically by using the finite-difference method to compute concentration distribution in the simulation domain at a given time. In the parallel simulation, total concentration data in the simulation domain are divided row-wise and distributed equally to a group of processes. We propose two parallel algorithms based on the row-wise partitioning: algorithms with nonblocking send/receive and nonblocking scatter/gather using the NBC library. We compare the performance of both versions by measuring their parallel speedup and efficiency. We also investigate the communication overhead in both versions. Our results show that the nonblocking collective communication can improve the performance of the simulation when the number of processes is large.
Parallel canonical Monte Carlo simulations through sequential updating of particles
NASA Astrophysics Data System (ADS)
O'Keeffe, C. J.; Orkoulas, G.
2009-04-01
In canonical Monte Carlo simulations, sequential updating of particles is equivalent to random updating due to particle indistinguishability. In contrast, in grand canonical Monte Carlo simulations, sequential implementation of the particle transfer steps in a dense grid of distinct points in space improves both the serial and the parallel efficiency of the simulation. The main advantage of sequential updating in parallel canonical Monte Carlo simulations is the reduction in interprocessor communication, which is usually a slow process. In this work, we propose a parallelization method for canonical Monte Carlo simulations via domain decomposition techniques and sequential updating of particles. Each domain is further divided into a middle and two outer sections. Information exchange is required after the completion of the updating of the outer regions. During the updating of the middle section, communication does not occur unless a particle moves out of this section. Results on two- and three-dimensional Lennard-Jones fluids indicate a nearly perfect improvement in parallel efficiency for large systems.
Parallel canonical Monte Carlo simulations through sequential updating of particles.
O'Keeffe, C J; Orkoulas, G
2009-04-07
In canonical Monte Carlo simulations, sequential updating of particles is equivalent to random updating due to particle indistinguishability. In contrast, in grand canonical Monte Carlo simulations, sequential implementation of the particle transfer steps in a dense grid of distinct points in space improves both the serial and the parallel efficiency of the simulation. The main advantage of sequential updating in parallel canonical Monte Carlo simulations is the reduction in interprocessor communication, which is usually a slow process. In this work, we propose a parallelization method for canonical Monte Carlo simulations via domain decomposition techniques and sequential updating of particles. Each domain is further divided into a middle and two outer sections. Information exchange is required after the completion of the updating of the outer regions. During the updating of the middle section, communication does not occur unless a particle moves out of this section. Results on two- and three-dimensional Lennard-Jones fluids indicate a nearly perfect improvement in parallel efficiency for large systems.
Parallelization of a Monte Carlo particle transport simulation code
NASA Astrophysics Data System (ADS)
Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.
2010-05-01
We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.
Efficient parallel CFD-DEM simulations using OpenMP
NASA Astrophysics Data System (ADS)
Amritkar, Amit; Deb, Surya; Tafti, Danesh
2014-01-01
The paper describes parallelization strategies for the Discrete Element Method (DEM) used for simulating dense particulate systems coupled to Computational Fluid Dynamics (CFD). While the field equations of CFD are best parallelized by spatial domain decomposition techniques, the N-body particulate phase is best parallelized over the number of particles. When the two are coupled together, both modes are needed for efficient parallelization. It is shown that under these requirements, OpenMP thread based parallelization has advantages over MPI processes. Two representative examples, fairly typical of dense fluid-particulate systems are investigated, including the validation of the DEM-CFD and thermal-DEM implementation with experiments. Fluidized bed calculations are performed on beds with uniform particle loading, parallelized with MPI and OpenMP. It is shown that as the number of processing cores and the number of particles increase, the communication overhead of building ghost particle lists at processor boundaries dominates time to solution, and OpenMP which does not require this step is about twice as fast as MPI. In rotary kiln heat transfer calculations, which are characterized by spatially non-uniform particle distributions, the low overhead of switching the parallelization mode in OpenMP eliminates the load imbalances, but introduces increased overheads in fetching non-local data. In spite of this, it is shown that OpenMP is between 50-90% faster than MPI.
Xyce Parallel Electronic Simulator : reference guide, version 2.0.
Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.
2004-06-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.
Xyce parallel electronic simulator reference guide, version 6.0.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2013-08-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1] .
Xyce™ Parallel Electronic Simulator: Reference Guide, Version 5.1
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Santarelli, Keith R.; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S.; Pawlowski, Roger P.
2009-11-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users’ Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users’ Guide.
Xyce parallel electronic simulator reference guide, version 6.1
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-03-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1] .
Xyce Parallel Electronic Simulator : reference guide, version 4.1.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-02-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.
A Framework to Simulate Semiconductor Devices Using Parallel Computer Architecture
NASA Astrophysics Data System (ADS)
Kumar, Gaurav; Singh, Mandeep; Bulusu, Anand; Trivedi, Gaurav
2016-10-01
Device simulations have become an integral part of semiconductor technology to address many issues (short channel effects, narrow width effects, hot-electron effect) as it goes into nano regime, helping us to continue further with the Moore's Law. TCAD provides a simulation environment to design and develop novel devices, thus a leap forward to study their electrical behaviour in advance. In this paper, a parallel 2D simulator for semiconductor devices using Discontinuous Galerkin Finite Element Method (DG-FEM) is presented. Discontinuous Galerkin (DG) method is used to discretize essential device equations and later these equations are analyzed by using a suitable methodology to find the solution. DG method is characterized to provide more accurate solution as it efficiently conserve the flux and easily handles complex geometries. OpenMP is used to parallelize solution of device equations on manycore processors and a speed of 1.4x is achieved during assembly process of discretization. This study is important for more accurate analysis of novel devices (such as FinFET, GAAFET etc.) on a parallel computing platform and will help us to develop a parallel device simulator which will be able to address this issue efficiently. A case study of PN junction diode is presented to show the effectiveness of proposed approach.
Time parallelization of plasma simulations using the parareal algorithm
Samaddar, D.; Houlberg, Wayne A; Berry, Lee A; Elwasif, Wael R; Huysmans, G; Batchelor, Donald B
2011-01-01
Simulation of fusion plasmas involve a broad range of timescales. In magnetically confined plasmas, such as in ITER, the timescale associated with the microturbulence responsible for transport and confinement timescales vary by an order of 10^6 10^9. Simulating this entire range of timescales is currently impossible, even on the most powerful supercomputers available. Space parallelization has so far been the most common approach to solve partial differential equations. Space parallelization alone has led to computational saturation for fluid codes, which means that the walltime for computaion does not linearly decrease with the increasing number of processors used. The application of the parareal algorithm to simulations of fusion plasmas ushers in a new avenue of parallelization, namely temporal parallelization. The algorithm has been successfully applied to plasma turbulence simulations, prior to which it has been applied to other relatively simpler problems. This work explores the extension of the applicability of the parareal algorithm to ITER relevant problems, starting with a diffusion-convection model.
Parallelization Strategies for Large Particle Simulations in Astrophysics
NASA Astrophysics Data System (ADS)
Pattabiraman, Bharath
The modeling of collisional N-body stellar systems is a topic of great current interest in several branches of astrophysics and cosmology. These systems are dominated by the physics of relaxation, the collective effect of many weak, random gravitational encounters between stars. They connect directly to our understanding of star clusters, and to the formation of exotic objects such as X-ray binaries, pulsars, and massive black holes. As a prototypical multi-physics, multi-scale problem, the numerical simulation of such systems is computationally intensive, and can only be achieved through high-performance computing. The goal of this thesis is to present parallelization and optimization strategies that can be used to develop efficient computational tools for simulating collisional N-body systems. This leads to major advances: 1) From an astrophysics perspective, these tools enable the study of new physical regimes out of reach by previous simulations. They also lead to much more complete parameter space exploration, allowing direct comparison of numerical results to observational data. 2) On the high-performance computing front, efficient parallelization of a multi-component application requires the meticulous redesign of the various components, as well as innovative parallelization techniques. Many of the challenges faced in this process lie at the very heart of high-performance computing research, including achieving optimal load balancing, maximizing utilization of computational resources, and making effective use of different parallel platforms. For modeling collisional N-body systems, a Monte Carlo approach provides ideal balance between speed and accuracy, as opposed to the more accurate but less scalable direct N-body method. We describe the development of a new version of the Cluster Monte Carlo (CMC) code capable of simulating systems with a realistic number of stars, while accounting for all important physical processes. This efficient and scalable
The parallel subdomain-levelset deflation method in reservoir simulation
NASA Astrophysics Data System (ADS)
van der Linden, J. H.; Jönsthövel, T. B.; Lukyanov, A. A.; Vuik, C.
2016-01-01
Extreme and isolated eigenvalues are known to be harmful to the convergence of an iterative solver. These eigenvalues can be produced by strong heterogeneity in the underlying physics. We can improve the quality of the spectrum by 'deflating' the harmful eigenvalues. In this work, deflation is applied to linear systems in reservoir simulation. In particular, large, sudden differences in the permeability produce extreme eigenvalues. The number and magnitude of these eigenvalues is linked to the number and magnitude of the permeability jumps. Two deflation methods are discussed. Firstly, we state that harmonic Ritz eigenvector deflation, which computes the deflation vectors from the information produced by the linear solver, is unfeasible in modern reservoir simulation due to high costs and lack of parallelism. Secondly, we test a physics-based subdomain-levelset deflation algorithm that constructs the deflation vectors a priori. Numerical experiments show that both methods can improve the performance of the linear solver. We highlight the fact that subdomain-levelset deflation is particularly suitable for a parallel implementation. For cases with well-defined permeability jumps of a factor 104 or higher, parallel physics-based deflation has potential in commercial applications. In particular, the good scalability of parallel subdomain-levelset deflation combined with the robust parallel preconditioner for deflated system suggests the use of this method as an alternative for AMG.
Parallel Performance Optimization of the Direct Simulation Monte Carlo Method
NASA Astrophysics Data System (ADS)
Gao, Da; Zhang, Chonglin; Schwartzentruber, Thomas
2009-11-01
Although the direct simulation Monte Carlo (DSMC) particle method is more computationally intensive compared to continuum methods, it is accurate for conditions ranging from continuum to free-molecular, accurate in highly non-equilibrium flow regions, and holds potential for incorporating advanced molecular-based models for gas-phase and gas-surface interactions. As available computer resources continue their rapid growth, the DSMC method is continually being applied to increasingly complex flow problems. Although processor clock speed continues to increase, a trend of increasing multi-core-per-node parallel architectures is emerging. To effectively utilize such current and future parallel computing systems, a combined shared/distributed memory parallel implementation (using both Open Multi-Processing (OpenMP) and Message Passing Interface (MPI)) of the DSMC method is under development. The parallel implementation of a new state-of-the-art 3D DSMC code employing an embedded 3-level Cartesian mesh will be outlined. The presentation will focus on performance optimization strategies for DSMC, which includes, but is not limited to, modified algorithm designs, practical code-tuning techniques, and parallel performance optimization. Specifically, key issues important to the DSMC shared memory (OpenMP) parallel performance are identified as (1) granularity (2) load balancing (3) locality and (4) synchronization. Challenges and solutions associated with these issues as they pertain to the DSMC method will be discussed.
Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows
NASA Technical Reports Server (NTRS)
Moitra, Stuti; Gatski, Thomas B.
1997-01-01
A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
Reusable Component Model Development Approach for Parallel and Distributed Simulation
Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng
2014-01-01
Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have diversiform interfaces, couple tightly, and bind with simulation platforms closely. As a result, they are difficult to be reused across different simulation platforms and applications. To address the problem, this paper first proposed a reusable component model framework. Based on this framework, then our reusable model development approach is elaborated, which contains two phases: (1) domain experts create simulation computational modules observing three principles to achieve their independence; (2) model developer encapsulates these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that the model developed using our approach has good reusability and it is easy to be used in different simulation platforms and applications. PMID:24729751
Potts-model grain growth simulations: Parallel algorithms and applications
Wright, S.A.; Plimpton, S.J.; Swiler, T.P.
1997-08-01
Microstructural morphology and grain boundary properties often control the service properties of engineered materials. This report uses the Potts-model to simulate the development of microstructures in realistic materials. Three areas of microstructural morphology simulations were studied. They include the development of massively parallel algorithms for Potts-model grain grow simulations, modeling of mass transport via diffusion in these simulated microstructures, and the development of a gradient-dependent Hamiltonian to simulate columnar grain growth. Potts grain growth models for massively parallel supercomputers were developed for the conventional Potts-model in both two and three dimensions. Simulations using these parallel codes showed self similar grain growth and no finite size effects for previously unapproachable large scale problems. In addition, new enhancements to the conventional Metropolis algorithm used in the Potts-model were developed to accelerate the calculations. These techniques enable both the sequential and parallel algorithms to run faster and use essentially an infinite number of grain orientation values to avoid non-physical grain coalescence events. Mass transport phenomena in polycrystalline materials were studied in two dimensions using numerical diffusion techniques on microstructures generated using the Potts-model. The results of the mass transport modeling showed excellent quantitative agreement with one dimensional diffusion problems, however the results also suggest that transient multi-dimension diffusion effects cannot be parameterized as the product of the grain boundary diffusion coefficient and the grain boundary width. Instead, both properties are required. Gradient-dependent grain growth mechanisms were included in the Potts-model by adding an extra term to the Hamiltonian. Under normal grain growth, the primary driving term is the curvature of the grain boundary, which is included in the standard Potts-model Hamiltonian.
Xyce Parallel Electronic Simulator Users Guide Version 6.2.
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-09-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been de- signed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel com- puting platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright c 2002-2014 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright c 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are
The cost of conservative synchronization in parallel discrete event simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1990-01-01
The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approach the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.
A Comparison of PDE-based Non-Linear Anisotropic Diffusion Techniques for Image Denoising
Weeratunga, S K; Kamath, C
2003-01-06
PDE-based, non-linear diffusion techniques are an effective way to denoise images. In a previous study, we investigated the effects of different parameters in the implementation of isotropic, non-linear diffusion. Using synthetic and real images, we showed that for images corrupted with additive Gaussian noise, such methods are quite effective, leading to lower mean-squared-error values in comparison with spatial filters and wavelet-based approaches. In this paper, we extend this work to include anisotropic diffusion, where the diffusivity is a tensor valued function which can be adapted to local edge orientation. This allows smoothing along the edges, but not perpendicular to it. We consider several anisotropic diffusivity functions as well as approaches for discretizing the diffusion operator that minimize the mesh orientation effects. We investigate how these tensor-valued diffusivity functions compare in image quality, ease of use, and computational costs relative to simple spatial filters, the more complex bilateral filters, wavelet-based methods, and isotropic non-linear diffusion based techniques.
Comparison of PDE-based non-linear anistropic diffusion techniques for image denoising
NASA Astrophysics Data System (ADS)
Weeratunga, Sisira K.; Kamath, Chandrika
2003-05-01
PDE-based, non-linear diffusion techniques are an effective way to denoise images.In a previous study, we investigated the effects of different parameters in the implementation of isotropic, non-linear diffusion. Using synthetic and real images, we showed that for images corrupted with additive Gaussian noise, such methods are quite effective, leading to lower mean-squared-error values in comparison with spatial filters and wavelet-based approaches. In this paper, we extend this work to include anisotropic diffusion, where the diffusivity is a tensor valued function which can be adapted to local edge orientation. This allows smoothing along the edges, but not perpendicular to it. We consider several anisotropic diffusivity functions as well as approaches for discretizing the diffusion operator that minimize the mesh orientation effects. We investigate how these tensor-valued diffusivity functions compare in image quality, ease of use, and computational costs relative to simple spatial filters, the more complex bilateral filters, wavelet-based methods, and isotropic non-linear diffusion based techniques.
Low-complexity PDE-based approach for automatic microarray image processing.
Belean, Bogdan; Terebes, Romulus; Bot, Adrian
2015-02-01
Microarray image processing is known as a valuable tool for gene expression estimation, a crucial step in understanding biological processes within living organisms. Automation and reliability are open subjects in microarray image processing, where grid alignment and spot segmentation are essential processes that can influence the quality of gene expression information. The paper proposes a novel partial differential equation (PDE)-based approach for fully automatic grid alignment in case of microarray images. Our approach can handle image distortions and performs grid alignment using the vertical and horizontal luminance function profiles. These profiles are evolved using a hyperbolic shock filter PDE and then refined using the autocorrelation function. The results are compared with the ones delivered by state-of-the-art approaches for grid alignment in terms of accuracy and computational complexity. Using the same PDE formalism and curve fitting, automatic spot segmentation is achieved and visual results are presented. Considering microarray images with different spots layouts, reliable results in terms of accuracy and reduced computational complexity are achieved, compared with existing software platforms and state-of-the-art methods for microarray image processing.
Modularized Parallel Neutron Instrument Simulation on the TeraGrid
Chen, Meili; Cobb, John W; Hagen, Mark E; Miller, Stephen D; Lynch, Vickie E
2007-01-01
In order to build a bridge between the TeraGrid (TG), a national scale cyberinfrastructure resource, and neutron science, the Neutron Science TeraGrid Gateway (NSTG) is focused on introducing productive HPC usage to the neutron science community, primarily the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL). Monte Carlo simulations are used as a powerful tool for instrument design and optimization at SNS. One of the successful efforts of a collaboration team composed of NSTG HPC experts and SNS instrument scientists is the development of a software facility named PSoNI, Parallelizing Simulations of Neutron Instruments. Parallelizing the traditional serial instrument simulation on TeraGrid resources, PSoNI quickly computes full instrument simulation at sufficient statistical levels in instrument de-sign. Upon SNS successful commissioning, to the end of 2007, three out of five commissioned instruments in SNS target station will be available for initial users. Advanced instrument study, proposal feasibility evalua-tion, and experiment planning are on the immediate schedule of SNS, which pose further requirements such as flexibility and high runtime efficiency on fast instrument simulation. PSoNI has been redesigned to meet the new challenges and a preliminary version is developed on TeraGrid. This paper explores the motivation and goals of the new design, and the improved software structure. Further, it describes the realized new fea-tures seen from MPI parallelized McStas running high resolution design simulations of the SEQUOIA and BSS instruments at SNS. A discussion regarding future work, which is targeted to do fast simulation for automated experiment adjustment and comparing models to data in analysis, is also presented.
NASA Technical Reports Server (NTRS)
Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad
1994-01-01
The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent in three-dimensional boundary-layer flows are computed with the PS-DNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.
Parallel algorithms for simulating continuous time Markov chains
NASA Technical Reports Server (NTRS)
Nicol, David M.; Heidelberger, Philip
1992-01-01
We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
Xyce™ Parallel Electronic Simulator Reference Guide, Version 6.5
Keiter, Eric R.; Aadithya, Karthik V.; Mei, Ting; Russo, Thomas V.; Schiek, Richard L.; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.
2016-06-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users’ Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users’ Guide. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.
Time parallelization of advanced operation scenario simulations of ITER plasma
Samaddar, D.; Casper, T. A.; Kim, S. H.; Berry, Lee A; Elwasif, Wael R; Batchelor, Donald B; Houlberg, Wayne A
2013-01-01
This work demonstrates that simulations of advanced burning plasma operation scenarios can be successfully parallelized in time using the parareal algorithm. CORSICA - an advanced operation scenario code for tokamak plasmas is used as a test case. This is a unique application since the parareal algorithm has so far been applied to relatively much simpler systems except for the case of turbulence. In the present application, a computational gain of an order of magnitude has been achieved which is extremely promising. A successful implementation of the Parareal algorithm to codes like CORSICA ushers in the possibility of time efficient simulations of ITER plasmas.
Synchronous Parallel System for Emulation and Discrete Event Simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor)
2001-01-01
A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to the state variables of the simulation object attributable to the event object and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.
Synchronous parallel system for emulation and discrete event simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor)
1992-01-01
A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to state variables of the simulation object attributable to the event object, and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring the events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
NASA Astrophysics Data System (ADS)
Rostrup, Scott; De Sterck, Hans
2010-12-01
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL
Xyce Parallel Electronic Simulator - Users' Guide Version 2.1.
Hutchinson, Scott A; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric; Pawlowski, Roger P.; Fixel, Deborah A; Schiek, Richard; Bogdan, Carolyn W.; Shirley, David N.; Campbell, Phillip M.; Keiter, Eric R.
2005-06-01
This manual describes the use of theXyceParallel Electronic Simulator.Xycehasbeen designed as a SPICE-compatible, high-performance analog circuit simulator, andhas been written to support the simulation needs of the Sandia National Laboratorieselectrical designers. This development has focused on improving capability over thecurrent state-of-the-art in the following areas:%04Capability to solve extremely large circuit problems by supporting large-scale par-allel computing platforms (up to thousands of processors). Note that this includessupport for most popular parallel and serial computers.%04Improved performance for all numerical kernels (e.g., time integrator, nonlinearand linear solvers) through state-of-the-art algorithms and novel techniques.%04Device models which are specifically tailored to meet Sandia's needs, includingmany radiation-aware devices.3 XyceTMUsers' Guide%04Object-oriented code design and implementation using modern coding practicesthat ensure that theXyceParallel Electronic Simulator will be maintainable andextensible far into the future.Xyceis a parallel code in the most general sense of the phrase - a message passingparallel implementation - which allows it to run efficiently on the widest possible numberof computing platforms. These include serial, shared-memory and distributed-memoryparallel as well as heterogeneous platforms. Careful attention has been paid to thespecific nature of circuit-simulation problems to ensure that optimal parallel efficiencyis achieved as the number of processors grows.The development ofXyceprovides a platform for computational research and de-velopment aimed specifically at the needs of the Laboratory. WithXyce, Sandia hasan %22in-house%22 capability with which both new electrical (e.g., device model develop-ment) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms)research and development can be performed. As a result,Xyceis a unique electricalsimulation capability, designed to
An automated parallel simulation execution and analysis approach
NASA Astrophysics Data System (ADS)
Dallaire, Joel D.; Green, David M.; Reaper, Jerome H.
2004-08-01
State-of-the-art simulation computing requirements are continually approaching and then exceeding the performance capabilities of existing computers. This trend remains true even with huge yearly gains in processing power and general computing capabilities; simulation scope and fidelity often increases as well. Accordingly, simulation studies often expend days or weeks executing a single test case. Compounding the problem, stochastic models often require execution of each test case with multiple random number seeds to provide valid results. Many techniques have been developed to improve the performance of simulations without sacrificing model fidelity: optimistic simulation, distributed simulation, parallel multi-processing, and the use of supercomputers such as Beowulf clusters. An approach and prototype toolset has been developed that augments existing optimization techniques to improve multiple-execution timelines. This approach, similar in concept to the SETI @ home experiment, makes maximum use of unused licenses and computers, which can be geographically distributed. Using a publish/subscribe architecture, simulation executions are dispatched to distributed machines for execution. Simulation results are then processed, collated, and transferred to a single site for analysis.
Dynamic Load Balancing Strategies for Parallel Reacting Flow Simulations
NASA Astrophysics Data System (ADS)
Pisciuneri, Patrick; Meneses, Esteban; Givi, Peyman
2014-11-01
Load balancing in parallel computing aims at distributing the work as evenly as possible among the processors. This is a critical issue in the performance of parallel, time accurate, flow simulators. The constraint of time accuracy requires that all processes must be finished with their calculation for a given time step before any process can begin calculation of the next time step. Thus, an irregularly balanced compute load will result in idle time for many processes for each iteration and thus increased walltimes for calculations. Two existing, dynamic load balancing approaches are applied to the simplified case of a partially stirred reactor for methane combustion. The first is Zoltan, a parallel partitioning, load balancing, and data management library developed at the Sandia National Laboratories. The second is Charm++, which is its own machine independent parallel programming system developed at the University of Illinois at Urbana-Champaign. The performance of these two approaches is compared, and the prospects for their application to full 3D, reacting flow solvers is assessed.
Particle simulation of plasmas on the massively parallel processor
NASA Technical Reports Server (NTRS)
Gledhill, I. M. A.; Storey, L. R. O.
1987-01-01
Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.
Long-range interactions and parallel scalability in molecular simulations
NASA Astrophysics Data System (ADS)
Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko
2007-01-01
Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single processor and parallel performance up to 8 nodes—we have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e. communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.
Massively Parallel Processing for Fast and Accurate Stamping Simulations
NASA Astrophysics Data System (ADS)
Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu
2005-08-01
The competitive automotive market drives automotive manufacturers to speed up the vehicle development cycles and reduce the lead-time. Fast tooling development is one of the key areas to support fast and short vehicle development programs (VDP). In the past ten years, the stamping simulation has become the most effective validation tool in predicting and resolving all potential formability and quality problems before the dies are physically made. The stamping simulation and formability analysis has become an critical business segment in GM math-based die engineering process. As the simulation becomes as one of the major production tools in engineering factory, the simulation speed and accuracy are the two of the most important measures for stamping simulation technology. The speed and time-in-system of forming analysis becomes an even more critical to support the fast VDP and tooling readiness. Since 1997, General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of simulation software for mass production analysis applications. By 2001, this technology was matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DM0P/MPP technology as well as performance benchmarks are discussed in this publication.
Repartitioning Strategies for Massively Parallel Simulation of Reacting Flow
NASA Astrophysics Data System (ADS)
Pisciuneri, Patrick; Zheng, Angen; Givi, Peyman; Labrinidis, Alexandros; Chrysanthis, Panos
2015-11-01
The majority of parallel CFD simulators partition the domain into equal regions and assign the calculations for a particular region to a unique processor. This type of domain decomposition is vital to the efficiency of the solver. However, as the simulation develops, the workload among the partitions often become uneven (e.g. by adaptive mesh refinement, or chemically reacting regions) and a new partition should be considered. The process of repartitioning adjusts the current partition to evenly distribute the load again. We compare two repartitioning tools: Zoltan, an architecture-agnostic graph repartitioner developed at the Sandia National Laboratories; and Paragon, an architecture-aware graph repartitioner developed at the University of Pittsburgh. The comparative assessment is conducted via simulation of the Taylor-Green vortex flow with chemical reaction.
Conservative parallel simulation of priority class queueing networks
NASA Technical Reports Server (NTRS)
Nicol, David M.
1990-01-01
A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. For, a job in service having low priority can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol was analyzed and it was demonstrated that good performance can be expected on the simulation of large queueing networks.
Xyce Parallel Electronic Simulator Users' Guide Version 6.6.
Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason
2016-11-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been de- signed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel com- puting platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright c 2002-2016 Sandia Corporation. All rights reserved. Acknowledgements The BSIM Group at the University of California, Berkeley developed the BSIM3, BSIM4, BSIM6, BSIM-CMG and BSIM-SOI models. The BSIM3 is Copyright c 1999, Regents of the University of California. The BSIM4 is Copyright c 2006, Regents of the University of California. The BSIM6 is Copyright c 2015, Regents of the University of California. The BSIM-CMG is Copyright c 2012
Xyce Parallel Electronic Simulator Users Guide Version 6.4
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory
2015-12-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been de- signed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel com- puting platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright c 2002-2015 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright c 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are
MRISIMUL: a GPU-based parallel approach to MRI simulations.
Xanthis, Christos G; Venetis, Ioannis E; Chalkias, A V; Aletras, Anthony H
2014-03-01
A new step-by-step comprehensive MR physics simulator (MRISIMUL) of the Bloch equations is presented. The aim was to develop a magnetic resonance imaging (MRI) simulator that makes no assumptions with respect to the underlying pulse sequence and also allows for complex large-scale analysis on a single computer without requiring simplifications of the MRI model. We hypothesized that such a simulation platform could be developed with parallel acceleration of the executable core within the graphic processing unit (GPU) environment. MRISIMUL integrates realistic aspects of the MRI experiment from signal generation to image formation and solves the entire complex problem for densely spaced isochromats and for a densely spaced time axis. The simulation platform was developed in MATLAB whereas the computationally demanding core services were developed in CUDA-C. The MRISIMUL simulator imaged three different computer models: a user-defined phantom, a human brain model and a human heart model. The high computational power of GPU-based simulations was compared against other computer configurations. A speedup of about 228 times was achieved when compared to serially executed C-code on the CPU whereas a speedup between 31 to 115 times was achieved when compared to the OpenMP parallel executed C-code on the CPU, depending on the number of threads used in multithreading (2-8 threads). The high performance of MRISIMUL allows its application in large-scale analysis and can bring the computational power of a supercomputer or a large computer cluster to a single GPU personal computer.
Development of magnetron sputtering simulator with GPU parallel computing
NASA Astrophysics Data System (ADS)
Sohn, Ilyoup; Kim, Jihun; Bae, Junkyeong; Lee, Jinpil
2014-12-01
Sputtering devices are widely used in the semiconductor and display panel manufacturing process. Currently, a number of surface treatment applications using magnetron sputtering techniques are being used to improve the efficiency of the sputtering process, through the installation of magnets outside the vacuum chamber. Within the internal space of the low pressure chamber, plasma generated from the combination of a rarefied gas and an electric field is influenced interactively. Since the quality of the sputtering and deposition rate on the substrate is strongly dependent on the multi-physical phenomena of the plasma regime, numerical simulations using PIC-MCC (Particle In Cell, Monte Carlo Collision) should be employed to develop an efficient sputtering device. In this paper, the development of a magnetron sputtering simulator based on the PIC-MCC method and the associated numerical techniques are discussed. To solve the electric field equations in the 2-D Cartesian domain, a Poisson equation solver based on the FDM (Finite Differencing Method) is developed and coupled with the Monte Carlo Collision method to simulate the motion of gas particles influenced by an electric field. The magnetic field created from the permanent magnet installed outside the vacuum chamber is also numerically calculated using Biot-Savart's Law. All numerical methods employed in the present PIC code are validated by comparison with analytical and well-known commercial engineering software results, with all of the results showing good agreement. Finally, the developed PIC-MCC code is parallelized to be suitable for general purpose computing on graphics processing unit (GPGPU) acceleration, so as to reduce the large computation time which is generally required for particle simulations. The efficiency and accuracy of the GPGPU parallelized magnetron sputtering simulator are examined by comparison with the calculated results and computation times from the original serial code. It is found that
Numerical Simulation of Flow Field Within Parallel Plate Plastometer
NASA Technical Reports Server (NTRS)
Antar, Basil N.
2002-01-01
Parallel Plate Plastometer (PPP) is a device commonly used for measuring the viscosity of high polymers at low rates of shear in the range 10(exp 4) to 10(exp 9) poises. This device is being validated for use in measuring the viscosity of liquid glasses at high temperatures having similar ranges for the viscosity values. PPP instrument consists of two similar parallel plates, both in the range of 1 inch in diameter with the upper plate being movable while the lower one is kept stationary. Load is applied to the upper plate by means of a beam connected to shaft attached to the upper plate. The viscosity of the fluid is deduced from measuring the variation of the plate separation, h, as a function of time when a specified fixed load is applied on the beam. Operating plate speeds measured with the PPP is usually in the range of 10.3 cm/s or lower. The flow field within the PPP can be simulated using the equations of motion of fluid flow for this configuration. With flow speeds in the range quoted above the flow field between the two plates is certainly incompressible and laminar. Such flows can be easily simulated using numerical modeling with computational fluid dynamics (CFD) codes. We present below the mathematical model used to simulate this flow field and also the solutions obtained for the flow using a commercially available finite element CFD code.
CHOLLA: A New Massively Parallel Hydrodynamics Code for Astrophysical Simulation
NASA Astrophysics Data System (ADS)
Schneider, Evan E.; Robertson, Brant E.
2015-04-01
We present Computational Hydrodynamics On ParaLLel Architectures (Cholla ), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳2563) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.
CHOLLA: A NEW MASSIVELY PARALLEL HYDRODYNAMICS CODE FOR ASTROPHYSICAL SIMULATION
Schneider, Evan E.; Robertson, Brant E.
2015-04-15
We present Computational Hydrodynamics On ParaLLel Architectures (Cholla ), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳256{sup 3}) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.
High Performance Parallel Methods for Space Weather Simulations
NASA Technical Reports Server (NTRS)
Hunter, Paul (Technical Monitor); Gombosi, Tamas I.
2003-01-01
This is the final report of our NASA AISRP grant entitled 'High Performance Parallel Methods for Space Weather Simulations'. The main thrust of the proposal was to achieve significant progress towards new high-performance methods which would greatly accelerate global MHD simulations and eventually make it possible to develop first-principles based space weather simulations which run much faster than real time. We are pleased to report that with the help of this award we made major progress in this direction and developed the first parallel implicit global MHD code with adaptive mesh refinement. The main limitation of all earlier global space physics MHD codes was the explicit time stepping algorithm. Explicit time steps are limited by the Courant-Friedrichs-Lewy (CFL) condition, which essentially ensures that no information travels more than a cell size during a time step. This condition represents a non-linear penalty for highly resolved calculations, since finer grid resolution (and consequently smaller computational cells) not only results in more computational cells, but also in smaller time steps.
Massively parallel algorithms for trace-driven cache simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.
1991-01-01
Trace driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t(exp th) instant, reference x sub t is hashed into a set of cache locations, the contents of which are then compared with x sub t. If at the t sup th instant x sub t is not present in the cache, then it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x sub t present for the (t+1) sup st instant. The problem of parallel simulation of a subtrace of N references directed to a C line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which regradless of the set size C runs in time O(log N) using N processors on the exclusive read, exclusive write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference based line replacement policies are considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that on any trace of length N directed to a C line set runs in the O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
Plimpton, Steve; Thompson, Aidan; Crozier, Paul
LAMMPS (http://lammps.sandia.gov/index.html) stands for Large-scale Atomic/Molecular Massively Parallel Simulator and is a code that can be used to model atoms or, as the LAMMPS website says, as a parallel particle simulator at the atomic, meso, or continuum scale. This Sandia-based website provides a long list of animations from large simulations. These were created using different visualization packages to read LAMMPS output, and each one provides the name of the PI and a brief description of the work done or visualization package used. See also the static images produced from simulations at http://lammps.sandia.gov/pictures.html The foundation paper for LAMMPS is: S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995), but the website also lists other papers describing contributions to LAMMPS over the years.
Parallel Unsteady Turbopump Simulations for Liquid Rocket Engines
NASA Technical Reports Server (NTRS)
Kiris, Cetin C.; Kwak, Dochan; Chan, William
2000-01-01
This paper reports the progress being made towards complete turbo-pump simulation capability for liquid rocket engines. Space Shuttle Main Engine (SSME) turbo-pump impeller is used as a test case for the performance evaluation of the MPI and hybrid MPI/Open-MP versions of the INS3D code. Then, a computational model of a turbo-pump has been developed for the shuttle upgrade program. Relative motion of the grid system for rotor-stator interaction was obtained by employing overset grid techniques. Time-accuracy of the scheme has been evaluated by using simple test cases. Unsteady computations for SSME turbo-pump, which contains 136 zones with 35 Million grid points, are currently underway on Origin 2000 systems at NASA Ames Research Center. Results from time-accurate simulations with moving boundary capability, and the performance of the parallel versions of the code will be presented in the final paper.
Parallel collisionless shocks forming in simulations of the LAPD experiment
NASA Astrophysics Data System (ADS)
Weidl, Martin S.; Jenko, Frank; Niemann, Chris; Winske, Dan
2016-10-01
Research on parallel collisionless shocks, most prominently occurring in the Earth's bow shock region, has so far been limited to satellite measurements and simulations. However, the formation of collisionless shocks depends on a wide range of parameters and scales, which can be accessed more easily in a laboratory experiment. Using a kJ-class laser, an ongoing experimental campaign at the Large Plasma Device (LAPD) at UCLA is expected to produce the first laboratory measurements of the formation of a parallel collisionless shock. We present hybrid kinetic/MHD simulations that show how beam instabilities in the background plasma can be driven by ablating carbon ions from a target, causing non-linear density oscillations which develop into a propagating shock front. The free-streaming carbon ions can excite both the resonant right-hand instability and the non-resonant firehose mode. We analyze their respective roles and discuss optimizing their growth rates to speed up the process of shock formation.
Parallel electromagnetic simulator based on the Finite-Difference Time Domain method
NASA Astrophysics Data System (ADS)
Walendziuk, Wojciech
2006-03-01
In the following paper the parallel tool for electromagnetic field distribution analysis is presented. The main simulation programme is based on the parallel algorithm of the Finite-Difference Time-Domain method and use Message Passing Interface as a communication library. In the paper also ways of communications among computation nodes in a parallel environment and efficiency of the parallel algorithm are presented.
Massively Parallel Simulations of Diffusion in Dense Polymeric Structures
Faulon, Jean-Loup, Wilcox, R.T. , Hobbs, J.D. , Ford, D.M.
1997-11-01
An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases are studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon(1840 processors). Compared to the current state-of-the-art equilibration methods this new technique appears to be faster by some orders of magnitude.The main advantage of the technique is that one can circumvent the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in the center of a regular tetrahedron and that regular tetrahedrons can be used to mesh the three-dimensional space. Thus, the problem of polymer equilibration described by continuous equations in molecular dynamics is reduced to a discrete problem where solutions are approximated by simple algorithms. Practical modeling applications include the constructing of butyl rubber and ethylene-propylene-dimer-monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally on both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. In order to better understand the diffusion phenomena, 10, 000 atoms models were generated and equilibrated for butyl and EPDM. The models were submitted to a massively parallel molecular dynamics simulation to monitor the trajectories of the diffusing species.
Roadmap for efficient parallelization of breast anatomy simulation
NASA Astrophysics Data System (ADS)
Chui, Joseph H.; Pokrajac, David D.; Maidment, Andrew D. A.; Bakic, Predrag R.
2012-03-01
A roadmap has been proposed to optimize the simulation of breast anatomy by parallel implementation, in order to reduce the time needed to generate software breast phantoms. The rapid generation of high resolution phantoms is needed to support virtual clinical trials of breast imaging systems. We have recently developed an octree-based recursive partitioning algorithm for breast anatomy simulation. The algorithm has good asymptotic complexity; however, its current MATLAB implementation cannot provide optimal execution times. The proposed roadmap for efficient parallelization includes the following steps: (i) migrate the current code to a C/C++ platform and optimize it for single-threaded implementation; (ii) modify the code to allow for multi-threaded CPU implementation; (iii) identify and migrate the code to a platform designed for multithreaded GPU implementation. In this paper, we describe our results in optimizing the C/C++ code for single-threaded and multi-threaded CPU implementations. As the first step of the proposed roadmap we have identified a bottleneck component in the MATLAB implementation using MATLAB's profiling tool, and created a single threaded CPU implementation of the algorithm using C/C++'s overloaded operators and standard template library. The C/C++ implementation has been compared to the MATLAB version in terms of accuracy and simulation time. A 520-fold reduction of the execution time was observed in a test of phantoms with 50- 400 μm voxels. In addition, we have identified several places in the code which will be modified to allow for the next roadmap milestone of the multithreaded CPU implementation.
Parallel grid library for rapid and flexible simulation development
NASA Astrophysics Data System (ADS)
Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.
2013-04-01
We present an easy to use and flexible grid library for developing highly scalable parallel simulations. The distributed cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously allowing one to overlap computation and communication. This enables excellent scalability at least up to 32 k cores in magnetohydrodynamic tests depending on the problem and hardware. In the version of dccrg presented here part of the mesh metadata is replicated between MPI processes reducing the scalability of adaptive mesh refinement (AMR) to between 200 and 600 processes. Dccrg is free software that anyone can use, study and modify and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg. Catalogue identifier: AEOM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: GNU Lesser General Public License version 3 No. of lines in distributed program, including test data, etc.: 54975 No. of bytes in distributed program, including test data, etc.: 974015 Distribution format: tar.gz Programming language: C++. Computer: PC, cluster, supercomputer. Operating system: POSIX. The code has been parallelized using MPI and tested with 1-32768 processes RAM: 10 MB-10 GB per process Classification: 4.12, 4.14, 6.5, 19.3, 19.10, 20. External routines: MPI-2 [1], boost [2], Zoltan [3], sfc++ [4] Nature of problem: Grid library supporting arbitrary data in grid cells, parallel adaptive mesh refinement, transparent remote neighbor data updates and
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor
NASA Technical Reports Server (NTRS)
Rao, Hariprasad Nannapaneni
1989-01-01
The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
Efficient solid state NMR powder simulations using SMP and MPP parallel computation
NASA Astrophysics Data System (ADS)
Kristensen, Jørgen Holm; Farnan, Ian
2003-04-01
Methods for parallel simulation of solid state NMR powder spectra are presented for both shared and distributed memory parallel supercomputers. For shared memory architectures the performance of simulation programs implementing the OpenMP application programming interface is evaluated. It is demonstrated that the design of correct and efficient shared memory parallel programs is difficult as the performance depends on data locality and cache memory effects. The distributed memory parallel programming model is examined for simulation programs using the MPI message passing interface. The results reveal that both shared and distributed memory parallel computation are very efficient with an almost perfect application speedup and may be applied to the most advanced powder simulations.
Ion dynamics at supercritical quasi-parallel shocks: Hybrid simulations
NASA Astrophysics Data System (ADS)
Su, Yanqing; Lu, Quanming; Gao, Xinliang; Huang, Can; Wang, Shui
2012-09-01
By separating the incident ions into directly transmitted, downstream thermalized, and diffuse ions, we perform one-dimensional (1D) hybrid simulations to investigate ion dynamics at a supercritical quasi-parallel shock. In the simulations, the angle between the upstream magnetic field and shock nominal direction is θBn=30°, and the Alfven Mach number is MA˜5.5. The shock exhibits a periodic reformation process. The ion reflection occurs at the beginning of the reformation cycle. Part of the reflected ions is trapped between the old and new shock fronts for an extended time period. These particles eventually form superthermal diffuse ions after they escape to the upstream of the new shock front at the end of the reformation cycle. The other reflected ions may return to the shock immediately or be trapped between the old and new shock fronts for a short time period. When the amplitude of the new shock front exceeds that of the old shock front and the reformation cycle is finished, these ions become thermalized ions in the downstream. No noticeable heating can be found in the directly transmitted ions. The relevance of our simulations to the satellite observations is also discussed in the paper.
Parallel finite element simulation of large ram-air parachutes
NASA Astrophysics Data System (ADS)
Kalro, V.; Aliabadi, S.; Garrard, W.; Tezduyar, T.; Mittal, S.; Stein, K.
1997-06-01
In the near future, large ram-air parachutes are expected to provide the capability of delivering 21 ton payloads from altitudes as high as 25,000 ft. In development and test and evaluation of these parachutes the size of the parachute needed and the deployment stages involved make high-performance computing (HPC) simulations a desirable alternative to costly airdrop tests. Although computational simulations based on realistic, 3D, time-dependent models will continue to be a major computational challenge, advanced finite element simulation techniques recently developed for this purpose and the execution of these techniques on HPC platforms are significant steps in the direction to meet this challenge. In this paper, two approaches for analysis of the inflation and gliding of ram-air parachutes are presented. In one of the approaches the point mass flight mechanics equations are solved with the time-varying drag and lift areas obtained from empirical data. This approach is limited to parachutes with similar configurations to those for which data are available. The other approach is 3D finite element computations based on the Navier-Stokes equations governing the airflow around the parachute canopy and Newtons law of motion governing the 3D dynamics of the canopy, with the forces acting on the canopy calculated from the simulated flow field. At the earlier stages of canopy inflation the parachute is modelled as an expanding box, whereas at the later stages, as it expands, the box transforms to a parafoil and glides. These finite element computations are carried out on the massively parallel supercomputers CRAY T3D and Thinking Machines CM-5, typically with millions of coupled, non-linear finite element equations solved simultaneously at every time step or pseudo-time step of the simulation.
Parallelizing N-Body Simulations on a Heterogeneous Cluster
NASA Astrophysics Data System (ADS)
Stenborg, T. N.
2009-10-01
This thesis evaluates quantitatively the effectiveness of a new technique for parallelising direct gravitational N-body simulations on a heterogeneous computing cluster. In addition to being an investigation into how a specific computational physics task can be optimally load balanced across the heterogeneity factors of a distributed computing cluster, it is also, more generally, a case study in effective heterogeneous parallelisation of an all-pairs programming task. If high-performance computing clusters are not designed to be heterogeneous initially, they tend to become so over time as new nodes are added, or existing nodes are replaced or upgraded. As a result, effective techniques for application parallelisation on heterogeneous clusters are needed if maximum cluster utilisation is to be achieved and is an active area of research. A custom C/MPI parallel particle-particle N-body simulator was developed, validated and deployed for this evaluation. Simulation communication proceeds over cluster nodes arranged in a logical ring and employs nonblocking message passing to encourage overlap of communication with computation. Redundant calculations arising from force symmetry given by Newton's third law are removed by combining chordal data transfer of accumulated forces with ring passing data transfer. Heterogeneity in node computation speed is addressed by decomposing system data across nodes in proportion to node computation speed, in conjunction with use of evenly sized communication buffers. This scheme is shown experimentally to have some potential in improving simulation performance in comparison with an even decomposition of data across nodes. Techniques for further heterogeneous cluster load balancing are discussed and remain an opportunity for further work.
Parallel continuous simulated tempering and its applications in large-scale molecular simulations
Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng
2014-07-28
In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopts the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in PCST method, the size of the system does not dramatically affect the number of copy needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested in two-dimensional Ising model, Lennard-Jones liquid and all-atom folding simulation of a small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transition and dynamics of macromolecules in explicit solvent.
Simulation of Parallel Interacting Faults and Earthquake Predictability
NASA Astrophysics Data System (ADS)
Mora, P.; Weatherley, D.; Klein, B.
2003-04-01
Numerical shear experiments of a granular region using the lattice solid model often exhibit accelerating energy release in the lead-up to large events (Mora et al, 2000) and a growth in correlation lengths in the stress field (Mora and Place, 2002). While these results provide evidence for a Critical Point-like mechanism in elasto-dynamic systems and the possibility of earthquake forecasting but they do not prove such a mechanism occurs in the crust. Cellular automaton models simulations exhibit accelerating energy release prior to large events or unpredictable behaviour in which large events may occur at any time depending on tuning parameters such as dissipation ratio and stress transfer ratio (Weatherley and Mora, 2003). The mean stress plots from the particle simulations are most similar to the CA mean stress plots near the boundary of the predictable and unpredictable regimes suggesting that elasto-dynamic systems may be close to the borderline of predictable and unpredictable. To progress in resolving the question of whether more realistic fault system models exhibit predictable behaviour and to determine whether they also have an unpredictable and predictable regime depending on tuning parameters like that seen in CA simulations, we developed a 2D elasto-dynamic model of parallel interacting faults. The friction is slip weakening until a critical slip distance. Henceforth, the friction is at the dynamic value until the slip rate drops below the value it attained when the critical slip distance was exceeded. As the slip rate continues to drop, the friction increases back to the static value as a function of slip rate. Numerical shear experiments are conducted in a model with 41 parallel interacting faults. Calculations of the inverse metric defined in Klein et al (2000) indicate that the system is non-ergodic. Furthermore, by calculating the correllation between the stress fields at different times we determine that the system exhibits so called ``glassy
A massively parallel solution strategy for efficient thermal radiation simulation
NASA Astrophysics Data System (ADS)
Nguyen, P. D.; Moureau, V.; Vervisch, L.; Perret, N.
2012-06-01
A novel and efficient methodology to solve the Radiative Transfer Equations (RTE) in thermal radiation is discussed. The BiCGStab(2) iterative solution method, as designed for the non-symmetric linear equation systems, is used to solve the discretized RTE. The numerical upwind and central schemes are blended to provide a stable numerical scheme (MUCS) for interpolation of the cell facial radiation intensities in finite volume formulation. The combination of the BiCGStab(2) and MUCS methods proved to be very efficient when coupling with the DOM approach to solve the RTE. A cost-effective tabulation technique for the gaseous radiative property model SNB-FSCK using 7-point Gauss-Labatto quadrature scheme is also introduced. The whole methodology is implemented into a massively parallel unstructured CFD code where the radiative and fluid flow solutions share the same domain decomposition, which is the bottleneck in current radiative solvers. The dual mesh decomposition at the cell groups level and processors level is adopted to optimize the CFD code for massively parallel computing. The whole method is applied to simulate the radiation heat-transfer in a 3D rectangular enclosure containing non-isothermal CO2 and H2O mixtures. Two test cases are studied for homogeneous and inhomogeneous distributions of CO2 and H2O in the enclosure. The result is reported for the heat flux and radiation energy source and the comparison is also made between the present methodology BiCGStab(2)/MUCS/tabulated SNB-FSCK, the benchmark method SNB-CK (implemented at 25cm-1 narrow-band) and some other methods available in the literature. The present method (BiCGStab(2)/MUCS/tabulated SNB-FSCK) yields more accurate predictions particularly for the radiation source term. When comparing with the benchmark solution, the relative error of the radiation source term is remarkably reduced to less than 4% and the CPU time is drastically diminished.
Xyce Parallel Electronic Simulator Reference Guide Version 6.4
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory
2015-12-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce . This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1] . Trademarks The information herein is subject to change without notice. Copyright c 2002-2015 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright c 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce 's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)
NASA Astrophysics Data System (ADS)
Furuichi, M.; Nishiura, D.
2015-12-01
Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and Discrete Element Method (DEM) have been widely used to solve the continuum and particles motions in the computational geodynamics field. These mesh-free methods are suitable for the problems with the complex geometry and boundary. In addition, their Lagrangian nature allows non-diffusive advection useful for tracking history dependent properties (e.g. rheology) of the material. These potential advantages over the mesh-based methods offer effective numerical applications to the geophysical flow and tectonic processes, which are for example, tsunami with free surface and floating body, magma intrusion with fracture of rock, and shear zone pattern generation of granular deformation. In order to investigate such geodynamical problems with the particle based methods, over millions to billion particles are required for the realistic simulation. Parallel computing is therefore important for handling such huge computational cost. An efficient parallel implementation of SPH and DEM methods is however known to be difficult especially for the distributed-memory architecture. Lagrangian methods inherently show workload imbalance problem for parallelization with the fixed domain in space, because particles move around and workloads change during the simulation. Therefore dynamic load balance is key technique to perform the large scale SPH and DEM simulation. In this work, we present the parallel implementation technique of SPH and DEM method utilizing dynamic load balancing algorithms toward the high resolution simulation over large domain using the massively parallel super computer system. Our method utilizes the imbalances of the executed time of each MPI process as the nonlinear term of parallel domain decomposition and minimizes them with the Newton like iteration method. In order to perform flexible domain decomposition in space, the slice-grid algorithm is used. Numerical tests show that our
NASA Astrophysics Data System (ADS)
Zehner, Björn; Hellwig, Olaf; Linke, Maik; Görz, Ines; Buske, Stefan
2016-01-01
3D geological underground models are often presented by vector data, such as triangulated networks representing boundaries of geological bodies and geological structures. Since models are to be used for numerical simulations based on the finite difference method, they have to be converted into a representation discretizing the full volume of the model into hexahedral cells. Often the simulations require a high grid resolution and are done using parallel computing. The storage of such a high-resolution raster model would require a large amount of storage space and it is difficult to create such a model using the standard geomodelling packages. Since the raster representation is only required for the calculation, but not for the geometry description, we present an algorithm and concept for rasterizing geological models on the fly for the use in finite difference codes that are parallelized by domain decomposition. As a proof of concept we implemented a rasterizer library and integrated it into seismic simulation software that is run as parallel code on a UNIX cluster using the Message Passing Interface. We can thus run the simulation with realistic and complicated surface-based geological models that are created using 3D geomodelling software, instead of using a simplified representation of the geological subsurface using mathematical functions or geometric primitives. We tested this set-up using an example model that we provide along with the implemented library.
Particle/Continuum Hybrid Simulation in a Parallel Computing Environment
NASA Technical Reports Server (NTRS)
Baganoff, Donald
1996-01-01
The objective of this study was to modify an existing parallel particle code based on the direct simulation Monte Carlo (DSMC) method to include a Navier-Stokes (NS) calculation so that a hybrid solution could be developed. In carrying out this work, it was determined that the following five issues had to be addressed before extensive program development of a three dimensional capability was pursued: (1) find a set of one-sided kinetic fluxes that are fully compatible with the DSMC method, (2) develop a finite volume scheme to make use of these one-sided kinetic fluxes, (3) make use of the one-sided kinetic fluxes together with DSMC type boundary conditions at a material surface so that velocity slip and temperature slip arise naturally for near-continuum conditions, (4) find a suitable sampling scheme so that the values of the one-sided fluxes predicted by the NS solution at an interface between the two domains can be converted into the correct distribution of particles to be introduced into the DSMC domain, (5) carry out a suitable number of tests to confirm that the developed concepts are valid, individually and in concert for a hybrid scheme.
NASA Technical Reports Server (NTRS)
Hsieh, Shang-Hsien
1993-01-01
The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
Contact-impact simulations on massively parallel SIMD supercomputers
Plaskacz, E.J. ); Belytscko, T.; Chiang, H.Y. )
1992-01-01
The implementation of explicit finite element methods with contact-impact on massively parallel SIMD computers is described. The basic parallel finite element algorithm employs an exchange process which minimizes interprocessor communication at the expense of redundant computations and storage. The contact-impact algorithm is based on the pinball method in which compatibility is enforced by preventing interpenetration on spheres embedded in elements adjacent to surfaces. The enhancements to the pinball algorithm include a parallel assembled surface normal algorithm and a parallel detection of interpenetrating pairs. Some timings with and without contact-impact are given.
Xyce Parallel Electronic Simulator Reference Guide Version 6.6.
Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason
2016-11-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce . This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1] . The information herein is subject to change without notice. Copyright c 2002-2016 Sandia Corporation. All rights reserved. Acknowledgements The BSIM Group at the University of California, Berkeley developed the BSIM3, BSIM4, BSIM6, BSIM-CMG and BSIM-SOI models. The BSIM3 is Copyright c 1999, Regents of the University of California. The BSIM4 is Copyright c 2006, Regents of the University of California. The BSIM6 is Copyright c 2015, Regents of the University of California. The BSIM-CMG is Copyright c 2012 and 2016, Regents of the University of California. The BSIM-SOI is Copyright c 1990, Regents of the University of California. All rights reserved. The Mextram model has been developed by NXP Semiconductors until 2007, Delft University of Technology from 2007 to 2014, and Auburn University since April 2015. Copyrights c of Mextram are with Delft University of Technology, NXP Semiconductors and Auburn University. The MIT VS Model Research Group developed the MIT Virtual Source (MVS) model. Copyright c 2013 Massachusetts Institute of Technology (MIT). The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. Trademarks Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are
NASA Technical Reports Server (NTRS)
Banks, H. T.; Brown, D. E.; Metcalf, Vern L.; Silcox, R. J.; Smith, Ralph C.; Wang, Yun
1994-01-01
A problem of continued interest concerns the control of vibrations in a flexible structure and the related problem of reducing structure-borne noise in structural acoustic systems. In both cases, piezoceramic patches bonded to the structures have been successfully used as control actuators. Through the application of a controlling voltage, the patches can be used to reduce structural vibrations which in turn lead to methods for reducing structure-borne noise. A PDE-based methodology for modeling, estimating physical parameters, and implementing a feedback control scheme for problems of this type is discussed. While the illustrating example is a circular plate, the methodology is sufficiently general so as to be applicable in a variety of structural and structural acoustic systems.
Parallel Simulation of Subsonic Fluid Dynamics on a Cluster of Workstations.
1994-11-01
inside wind musical instruments. Typical simulations achieve $80\\%$ parallel efficiency (speedup/processors) using 20 HP-Apollo workstations. Detailed...TERMS AI, MIT, Artificial Intelligence, Distributed Computing, Workstation Cluster, Network, Fluid Dynamics, Musical Instruments 17. SECURITY...for example, the flow of air inside wind musical instruments. Typical simulations achieve 80% parallel efficiency (speedup/processors) using 20 HP
Parallel Vehicular Traffic Simulation using Reverse Computation-based Optimistic Execution
Yoginath, Srikanth B; Perumalla, Kalyan S
2008-01-01
Vehicular traffic simulations are useful in applications such as emergency management and homeland security planning tools. High speed of traffic simulations translates directly to speed of response and level of resilience in those applications. Here, a parallel traffic simulation approach is presented that is aimed at reducing the time for simulating emergency vehicular traffic scenarios. Three unique aspects of this effort are: (1) exploration of optimistic simulation applied to vehicular traffic simulation (2) addressing reverse computation challenges specific to optimistic vehicular traffic simulation (3) achieving absolute (as opposed to self-relative) speedup with a sequential speed equal to that of a fast, de facto standard sequential simulator for emergency traffic. The design and development of the parallel simulation system is presented, along with a performance study that demonstrates excellent sequential performance as well as parallel performance.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
Jin, Shuangshuang; Chen, Yousu; Wu, Di; Diao, Ruisheng; Huang, Zhenyu
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbance, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computational intensive and challenging to solve using single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The difference of the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
A sweep algorithm for massively parallel simulation of circuit-switched networks
NASA Technical Reports Server (NTRS)
Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.
1992-01-01
A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.
Parallel direct numerical simulation of three-dimensional spray formation
NASA Astrophysics Data System (ADS)
Chergui, Jalel; Juric, Damir; Shin, Seungwon; Kahouadji, Lyes; Matar, Omar
2015-11-01
We present numerical results for the breakup mechanism of a liquid jet surrounded by a fast coaxial flow of air with density ratio (water/air) ~ 1000 and kinematic viscosity ratio ~ 60. We use code BLUE, a three-dimensional, two-phase, high performance, parallel numerical code based on a hybrid Front-Tracking/Level Set algorithm for Lagrangian tracking of arbitrarily deformable phase interfaces and a precise treatment of surface tension forces. The parallelization of the code is based on the technique of domain decomposition where the velocity field is solved by a parallel GMRes method for the viscous terms and the pressure by a parallel multigrid/GMRes method. Communication is handled by MPI message passing procedures. The interface method is also parallelized and defines the interface both by a discontinuous density field as well as by a triangular Lagrangian mesh and allows the interface to undergo large deformations including the rupture and/or coalescence of interfaces. EPSRC Programme Grant, MEMPHIS, EP/K0039761/1.
ANNarchy: a code generation approach to neural simulations on parallel hardware
Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.
2015-01-01
Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into an efficient C++code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957
Comparison of serial and parallel simulations of a corridor fire using FDS
NASA Astrophysics Data System (ADS)
Valasek, L.
2015-09-01
Current fire simulators allow to model the course of fire in large areas and its impact on structure and equipment. This paper deals with a comparison of serial and parallel calculations of simulation of a corridor fire by the FDS (Fire Dynamics Simulator) system. In parallel case, the whole computational domain is divided into several computational meshes, the computation on each mesh is considered as a single MPI (Message Passing Interface) process realised on one computational core and communication between MPI processes is provided by MPI. The aim of this paper is to determine the size of error caused by parallelization of computation, which occurs at touches of computational meshes.
Static and dynamic load-balancing strategies for parallel reservoir simulation
Anguille, L.; Killough, J.E.; Li, T.M.C.; Toepfer, J.L.
1995-12-31
Accurate simulation of the complex phenomena that occur in flow in porous media can tax even the most powerful serial computers. Emergence of new parallel computer architectures as a future efficient tool in reservoir simulation may overcome this difficulty. Unfortunately, major problems remain to be solved before using parallel computers commercially: production serial programs must be rewritten to be efficient in parallel environments and load balancing methods must be explored to evenly distribute the workload on each processor during the simulation. This study implements both a static load-balancing algorithm and a receiver-initiated dynamic load-sharing algorithm to achieve high parallel efficiencies on both the IBM SP2 and Intel IPSC/860 parallel computers. Significant speedup improvement was recorded for both methods. Further optimization of these algorithms yielded a technique with efficiencies as high as 90% and 70% on 8 and 32 nodes, respectively. The increased performance was the result of the minimization of message-passing overhead.
Chen, Weiliang; De Schutter, Erik
2017-01-01
Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation.
Chen, Weiliang; De Schutter, Erik
2017-01-01
Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation. PMID:28239346
Lee, Byung Il; Lee, Suk-Ho; Kim, Tae-Seong; Kwon, Ohin; Woo, Eung Je; Seo, Jin Keun
2005-11-01
Recent progress in magnetic resonance electrical impedance tomography (MREIT) research via simulation and biological tissue phantom studies have shown that conductivity images with higher spatial resolution and accuracy are achievable. In order to apply MREIT to human subjects, one of the important remaining problems to be solved is to reduce the amount of the injection current such that it meets the electrical safety regulations. However, by limiting the amount of the injection current according to the safety regulations, the measured MR data such as the z-component of magnetic flux density Bz in MREIT tend to have low SNR and get usually degraded in their accuracy due to the nonideal data acquisition system of an MR scanner. Furthermore, numerical differentiations of the measured Bz required by the conductivity image reconstruction algorithms tend to further deteriorate the quality and accuracy of the reconstructed conductivity images. In this paper, we propose a denoising technique that incorporates a harmonic decomposition. The harmonic decomposition is especially suitable for MREIT due to the physical characteristics of Bz. It effectively removes systematic and random noises, while preserving important key features in the MR measurements, so that improved conductivity images can be obtained. The simulation and experimental results demonstrate that the proposed denoising technique is effective for MREIT, producing significantly improved quality of conductivity images. The denoising technique will be a valuable tool in MREIT to reduce the amount of the injection current when it is combined with an improved MREIT pulse sequence.
Computer simulation program for parallel SITAN. [Sandia Inertia Terrain-Aided Navigation, in FORTRAN
Andreas, R.D.; Sheives, T.C.
1980-11-01
This computer program simulates the operation of parallel SITAN using digitized terrain data. An actual trajectory is modeled including the effects of inertial navigation errors and radar altimeter measurements.
Modelling and simulation of parallel triangular triple quantum dots (TTQD) by using SIMON 2.0
NASA Astrophysics Data System (ADS)
Fathany, Maulana Yusuf; Fuada, Syifaul; Lawu, Braham Lawas; Sulthoni, Muhammad Amin
2016-04-01
This research presents analysis of modeling on Parallel Triple Quantum Dots (TQD) by using SIMON (SIMulation Of Nano-structures). Single Electron Transistor (SET) is used as the basic concept of modeling. We design the structure of Parallel TQD by metal material with triangular geometry model, it is called by Triangular Triple Quantum Dots (TTQD). We simulate it with several scenarios using different parameters; such as different value of capacitance, various gate voltage, and different thermal condition.
The IDES framework: A case study in development of a parallel discrete-event simulation system
Nicol, D.M.; Johnson, M.M.; Yoshimura, A.S.
1997-12-31
This tutorial describes considerations in the design and development of the IDES parallel simulation system. IDES is a Java-based parallel/distributed simulation system designed to support the study of complex large-scale enterprise systems. Using the IDES system as an example, the authors discuss how anticipated model and system constraints molded the design decisions with respect to modeling, synchronization, and communication strategies.
Parallel implementation of the FETI-DPEM algorithm for general 3D EM simulations
NASA Astrophysics Data System (ADS)
Li, Yu-Jia; Jin, Jian-Ming
2009-05-01
A parallel implementation of the electromagnetic dual-primal finite element tearing and interconnecting algorithm (FETI-DPEM) is designed for general three-dimensional (3D) electromagnetic large-scale simulations. As a domain decomposition implementation of the finite element method, the FETI-DPEM algorithm provides fully decoupled subdomain problems and an excellent numerical scalability, and thus is well suited for parallel computation. The parallel implementation of the FETI-DPEM algorithm on a distributed-memory system using the message passing interface (MPI) is discussed in detail along with a few practical guidelines obtained from numerical experiments. Numerical examples are provided to demonstrate the efficiency of the parallel implementation.
Parallel Unsteady Turbopump Flow Simulations for Reusable Launch Vehicles
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Kwak, Dochan
2000-01-01
An efficient solution procedure for time-accurate solutions of Incompressible Navier-Stokes equation is obtained. Artificial compressibility method requires a fast convergence scheme. Pressure projection method is efficient when small time-step is required. The number of sub-iteration is reduced significantly when Poisson solver employed with the continuity equation. Both computing time and memory usage are reduced (at least 3 times). Other work includes Multi Level Parallelism (MLP) of INS3D, overset connectivity for the validation case, experimental measurements, and computational model for boost pump.
Parallel simulation of compressible flow using automatic differentiation and PETSc.
Hovland, P. D.; McInnes, L. C.; Mathematics and Computer Science
2001-03-01
Many aerospace applications require parallel implicit solution strategies and software. The use of two computational tools, the Portable, Extensible Toolkit for Scientific computing (PETSc) and ADIFOR, to implement a Newton-Krylov-Schwarz method with pseudo-transient continuation for a particular application, namely, a steady-state, fully implicit, three-dimensional compressible Euler model of flow over an M6 wing is considered. How automatic differentiation (AD) can be used within the PETSc framework to compute the required derivatives is described. Performance data demonstrating the suitability of AD and PETSc for this problem are presented. A synopsis of results and a description of opportunities for future work concludes this paper.
Parallel computations using a cluster of workstations to simulate elasticity problems
NASA Astrophysics Data System (ADS)
Darmawan, J. B. B.; Mungkasi, S.
2016-11-01
Computational physics has played important roles in real world problems. This paper is within the applied computational physics area. The aim of this study is to observe the performance of parallel computations using a cluster of workstations (COW) to simulate elasticity problems. Parallel computations with the COW configuration are conducted using the Message Passing Interface (MPI) standard. In parallel computations with COW, we consider five scenarios with twenty simulations. In addition to the execution time, efficiency is used to evaluate programming algorithm scenarios. Sequential and parallel programming performances are evaluated based on their execution time and efficiency. Results show that the one-dimensional elasticity equations are not appropriate to be solved in parallel with MPI_Send and MPI_Recv technique in the MPI standard, because the total amount of time to exchange data is considered more dominant compared with the total amount of time to conduct the basic elasticity computation.
Kothe, D.B.; Turner, J.A.; Mosso, S.J.; Ferrell, R.C.
1997-03-01
We discuss selected aspects of a new parallel three-dimensional (3-D) computational tool for the unstructured mesh simulation of Los Alamos National Laboratory (LANL) casting processes. This tool, known as {bold Telluride}, draws upon on robust, high resolution finite volume solutions of metal alloy mass, momentum, and enthalpy conservation equations to model the filling, cooling, and solidification of LANL castings. We briefly describe the current {bold Telluride} physical models and solution methods, then detail our parallelization strategy as implemented with Fortran 90 (F90). This strategy has yielded straightforward and efficient parallelization on distributed and shared memory architectures, aided in large part by new parallel libraries {bold JTpack9O} for Krylov-subspace iterative solution methods and {bold PGSLib} for efficient gather/scatter operations. We illustrate our methodology and current capabilities with source code examples and parallel efficiency results for a LANL casting simulation.
NASA Astrophysics Data System (ADS)
Shim, Yunsic; Amar, Jacques G.
2005-03-01
The standard kinetic Monte Carlo algorithm is an extremely efficient method to carry out serial simulations of dynamical processes such as thin film growth. However, in some cases it is necessary to study systems over extended time and length scales, and therefore a parallel algorithm is desired. Here we describe an efficient, semirigorous synchronous sublattice algorithm for parallel kinetic Monte Carlo simulations. The accuracy and parallel efficiency are studied as a function of diffusion rate, processor size, and number of processors for a variety of simple models of epitaxial growth. The effects of fluctuations on the parallel efficiency are also studied. Since only local communications are required, linear scaling behavior is observed, e.g., the parallel efficiency is independent of the number of processors for fixed processor size.
Xyce parallel electronic simulator users' guide, Version 6.0.1.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2014-01-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users guide, version 6.1
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-03-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas; Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiationaware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase-a message passing parallel implementation-which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users guide, version 6.0.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2013-08-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Parallelized modelling and solution scheme for hierarchically scaled simulations
NASA Technical Reports Server (NTRS)
Padovan, Joe
1995-01-01
This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The Tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers are capable of solving, are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that by combining several of these techniques that a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.
A Parallel Simulated Annealing Approach to Solve for Earthquake Rupture Rates
NASA Astrophysics Data System (ADS)
Milner, K.; Page, M. T.; Field, E. H.
2011-12-01
We present a parallel approach to the classic simulated annealing algorithm (Kirkpatrick 1983) in order to solve for the rates of earthquake ruptures in California's complex fault system, being developed for the 3rd Uniform California Earthquake Rupture Forecast (UCERF3). Through the use of distributed computing, we have achieved substantial speedup when compared to serial simulated annealing. We will describe the parallel simulated annealing algorithm in detail, as well as the parallelization parameters used and their effect on speedup (time to convergence, or alternatively a specified energy level) and communications efficiency. Additionally we will discuss the correlation between performance of the parallel algorithm and the degree of constraints on the solution. We will present scaling results to thousands of processors, and experiences with the MPJ Express Java Message Passing Library (Baker 2006) on the University of Southern California's High Performance Computing and Communications cluster.
Wang, Zhiteng; Zhang, Hongjun; Zhang, Rui; Li, Yong; Zhang, Xuliang
2014-01-01
Service oriented modeling and simulation are hot issues in the field of modeling and simulation, and there is need to call service resources when simulation task workflow is running. How to optimize the service resource allocation to ensure that the task is complete effectively is an important issue in this area. In military modeling and simulation field, it is important to improve the probability of success and timeliness in simulation task workflow. Therefore, this paper proposes an optimization algorithm for multipath service resource parallel allocation, in which multipath service resource parallel allocation model is built and multiple chains coding scheme quantum optimization algorithm is used for optimization and solution. The multiple chains coding scheme quantum optimization algorithm is to extend parallel search space to improve search efficiency. Through the simulation experiment, this paper investigates the effect for the probability of success in simulation task workflow from different optimization algorithm, service allocation strategy, and path number, and the simulation result shows that the optimization algorithm for multipath service resource parallel allocation is an effective method to improve the probability of success and timeliness in simulation task workflow.
NASA Astrophysics Data System (ADS)
Vashishta, Priya; Bachlechner, Martina; Nakano, Aiichiro; Campbell, Timothy J.; Kalia, Rajiv K.; Kodiyalam, Sanjay; Ogata, Shuji; Shimojo, Fuyuki; Walsh, Phillip
2001-10-01
We have developed scalable space-time multiresolution algorithms to enable molecular dynamics simulations involving up to a billion atoms on massively parallel computers. Large-scale molecular dynamics simulations have been used to study stress domains and interfacial fracture in semiconductor/dielectric nanopixels, nanoindentation, and oxidation of metallic nanoparticles.
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL
Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor
2011-09-06
We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
Parallel performance optimizations on unstructured mesh-based simulations
Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; ...
2015-06-01
This paper addresses two key parallelization challenges the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns, that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intranode data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches.more » We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.« less
Partitioning and packing mathematical simulation models for calculation on parallel computers
NASA Technical Reports Server (NTRS)
Arpasi, D. J.; Milner, E. J.
1986-01-01
The development of multiprocessor simulations from a serial set of ordinary differential equations describing a physical system is described. Degrees of parallelism (i.e., coupling between the equations) and their impact on parallel processing are discussed. The problem of identifying computational parallelism within sets of closely coupled equations that require the exchange of current values of variables is described. A technique is presented for identifying this parallelism and for partitioning the equations for parallel solution on a multiprocessor. An algorithm which packs the equations into a minimum number of processors is also described. The results of the packing algorithm when applied to a turbojet engine model are presented in terms of processor utilization.
CODE BLUE: Three dimensional massively-parallel simulation of multi-scale configurations
NASA Astrophysics Data System (ADS)
Juric, Damir; Kahouadji, Lyes; Chergui, Jalel; Shin, Seungwon; Craster, Richard; Matar, Omar
2016-11-01
We present recent progress on BLUE, a solver for massively parallel simulations of fully three-dimensional multiphase flows which runs on a variety of computer architectures from laptops to supercomputers and on 131072 threads or more (limited only by the availability to us of more threads). The code is wholly written in Fortran 2003 and uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of a hybrid Front Tracking/Level Set method designed to handle highly deforming interfaces with complex topology changes. We developed parallel GMRES and multigrid iterative solvers suited to the linear systems arising from the implicit solution for the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across fluid phases. Particular attention is drawn to the details and performance of the parallel Multigrid solver. EPSRC UK Programme Grant MEMPHIS (EP/K003976/1).
NASA Astrophysics Data System (ADS)
Krappel, Timo; Riedelbauch, Stefan; Jester-Zuerker, Roland; Jung, Alexander; Flurl, Benedikt; Unger, Friedeman; Galpin, Paul
2016-11-01
The operation of Francis turbines in part load conditions causes high fluctuations and dynamic loads in the turbine and especially in the draft tube. At the hub of the runner outlet a rotating vortex rope within a low pressure zone arises and propagates into the draft tube cone. The investigated part load operating point is at about 72% discharge of best efficiency. To reduce the possible influence of boundary conditions on the solution, a flow simulation of a complete Francis turbine is conducted consisting of spiral case, stay and guide vanes, runner and draft tube. As the flow has a strong swirling component for the chosen operating point, it is very challenging to accurately predict the flow and in particular the flow losses in the diffusor. The goal of this study is to reach significantly better numerical prediction of this flow type. This is achieved by an improved resolution of small turbulent structures. Therefore, the Scale Adaptive Simulation SAS-SST turbulence model - a scale resolving turbulence model - is applied and compared to the widely used RANS-SST turbulence model. The largest mesh contains 300 million elements, which achieves LES-like resolution throughout much of the computational domain. The simulations are evaluated in terms of the hydraulic losses in the machine, evaluation of the velocity field, pressure oscillations in the draft tube and visual comparisons of turbulent flow structures. A pre-release version of ANSYS CFX 17.0 is used in this paper, as this CFD solver has a parallel performance up to several thousands of cores for this application which includes a transient rotor-stator interface to support the relative motion between the runner and the stationary portions of the water turbine.
SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeff S.
1992-01-01
Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
Parallel Monte Carlo Electron and Photon Transport Simulation Code (PMCEPT code)
NASA Astrophysics Data System (ADS)
Kum, Oyeon
2004-11-01
Simulations for customized cancer radiation treatment planning for each patient are very useful for both patient and doctor. These simulations can be used to find the most effective treatment with the least possible dose to the patient. This typical system, so called ``Doctor by Information Technology", will be useful to provide high quality medical services everywhere. However, the large amount of computing time required by the well-known general purpose Monte Carlo(MC) codes has prevented their use for routine dose distribution calculations for a customized radiation treatment planning. The optimal solution to provide ``accurate" dose distribution within an ``acceptable" time limit is to develop a parallel simulation algorithm on a beowulf PC cluster because it is the most accurate, efficient, and economic. I developed parallel MC electron and photon transport simulation code based on the standard MPI message passing interface. This algorithm solved the main difficulty of the parallel MC simulation (overlapped random number series in the different processors) using multiple random number seeds. The parallel results agreed well with the serial ones. The parallel efficiency approached 100% as was expected.
Parallel FEM Simulation of Electromechanics in the Heart
NASA Astrophysics Data System (ADS)
Xia, Henian; Wong, Kwai; Zhao, Xiaopeng
2011-11-01
Cardiovascular disease is the leading cause of death in America. Computer simulation of complicated dynamics of the heart could provide valuable quantitative guidance for diagnosis and treatment of heart problems. In this paper, we present an integrated numerical model which encompasses the interaction of cardiac electrophysiology, electromechanics, and mechanoelectrical feedback. The model is solved by finite element method on a Linux cluster and the Cray XT5 supercomputer, kraken. Dynamical influences between the effects of electromechanics coupling and mechanic-electric feedback are shown.
DL_POLY_2.0: a general-purpose parallel molecular dynamics simulation package.
Smith, W; Forester, T R
1996-06-01
DL_POLY_2.0 is a general-purpose parallel molecular dynamics simulation package developed at Daresbury Laboratory under the auspices of the Council for the Central Laboratory of the Research Councils. Written to support academic research, it has a wide range of applications and is designed to run on a wide range of computers: from single processor workstations to parallel supercomputers. Its structure, functionality, performance, and availability are described.
Compositional reservoir simulation on CM-5 and KSR-1 parallel machines
Ghori, S.G.; Wang, C.H.; Lim, M.T.; Pope, G.A.; Sepehrnoori, K.; Wheeler, M.F.
1995-12-31
Recently, use of parallel machines in reservoir simulation has received considerable attention from the petroleum industry. This paper presents parallelization of a 3D compositional, equation-of-state reservoir simulator on the CM-5 and KSR-1. To the best of the authors` knowledge, this is the first time that the parallelization of a compositional reservoir simulator has been performed on both the CM-5 and KSR-1. For new users of the CM-5 machines, the software and hardware of CM-5 architecture is presented, as well as details of the parallelization techniques. For example, domain decomposition, I/O`s, phase equilibrium computations, and well model are described. The parallelism techniques on the KSR-1 are presented with the emphasis on the porting of the phase equilibrium calculation. The performance of each machine is evaluated by showing the speedup on different sets of processing nodes. Two test problems were used to explore the capability of the parallelized version of the code; one is a waterflood problem and the other is a CO{sub 2} multiple contact miscible flood, both in a West Texas oil field. These field problems were run on 1, 2, 4, 8, 16, and 32 processors to get insight into the locations of communication bottlenecks, generally occurring in the programming with distributed memory machines. The problems of latency and bandwidth which are associated with communication efficiency of the CM-5 are also addressed.
Simulation of optically encoded multiplexing for parallel multipoint sensing.
Babu Rao, C; Chelliah, Pandian; Sahoo, Trilochan
2015-06-20
Spectral emission/absorption-based sensors are commonly used to monitor explosives, narcotics, and other restricted materials in high-security zones such as airports. Monitoring a broad range of spectral wavelengths with high spectral resolution would increase the repertoire of chemicals that can be monitored. However, a portable unit will have limitations in meeting these requirements. Optical fibers can be employed for collecting and transmitting spectral signals from portable sensor heads (PSHs) to a sensitive central spectral analyzer. However, simultaneous detection by sensors in multiple PSHs needs to be differentiated for identifying individual PSHs. An optical encoding method is presented in this paper for use of a portable unit for highly sensitive measurement. The methodology is demonstrated through a simulation using MATLAB Simulink.
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak
1996-01-01
Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
Parallelization issues of a code for physically-based simulation of fabrics
NASA Astrophysics Data System (ADS)
Romero, Sergio; Gutiérrez, Eladio; Romero, Luis F.; Plata, Oscar; Zapata, Emilio L.
2004-10-01
The simulation of fabrics, clothes, and flexible materials is an essential topic in computer animation of realistic virtual humans and dynamic sceneries. New emerging technologies, as interactive digital TV and multimedia products, make necessary the development of powerful tools to perform real-time simulations. Parallelism is one of such tools. When analyzing computationally fabric simulations we found these codes belonging to the complex class of irregular applications. Frequently this kind of codes includes reduction operations in their core, so that an important fraction of the computational time is spent on such operations. In fabric simulators these operations appear when evaluating forces, giving rise to the equation system to be solved. For this reason, this paper discusses only this phase of the simulation. This paper analyzes and evaluates different irregular reduction parallelization techniques on ccNUMA shared memory machines, applied to a real, physically-based, fabric simulator we have developed. Several issues are taken into account in order to achieve high code performance, as exploitation of data access locality and parallelism, as well as careful use of memory resources (memory overhead). In this paper we use the concept of data affinity to develop various efficient algorithms for reduction parallelization exploiting data locality.
Robust large-scale parallel nonlinear solvers for simulations.
Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson
2005-11-01
This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write
Simulation of reflooding on two parallel heated channel by TRACE
NASA Astrophysics Data System (ADS)
Zakir, Md. Ghulam
2016-07-01
In case of Loss-Of-Coolant accident (LOCA) in a Boiling Water Reactor (BWR), heat generated in the nuclear fuel is not adequately removed because of the decrease of the coolant mass flow rate in the reactor core. This fact leads to an increase of the fuel temperature that can cause damage to the core and leakage of the radioactive fission products. In order to reflood the core and to discontinue the increase of temperature, an Emergency Core Cooling System (ECCS) delivers water under this kind of conditions. This study is an investigation of how the power distribution between two channels can affect the process of reflooding when the emergency water is injected from the top of the channels. The peak cladding temperature (PCT) on LOCA transient for different axial level is determined as well. A thermal-hydraulic system code TRACE has been used. A TRACE model of the two heated channels has been developed, and three hypothetical cases with different power distributions have been studied. Later, a comparison between a simulated and experimental data has been shown as well.
Parallel Cartesian grid refinement for 3D complex flow simulations
NASA Astrophysics Data System (ADS)
Angelidis, Dionysios; Sotiropoulos, Fotis
2013-11-01
A second order accurate method for discretizing the Navier-Stokes equations on 3D unstructured Cartesian grids is presented. Although the grid generator is based on the oct-tree hierarchical method, fully unstructured data-structure is adopted enabling robust calculations for incompressible flows, avoiding both the need of synchronization of the solution between different levels of refinement and usage of prolongation/restriction operators. The current solver implements a hybrid staggered/non-staggered grid layout, employing the implicit fractional step method to satisfy the continuity equation. The pressure-Poisson equation is discretized by using a novel second order fully implicit scheme for unstructured Cartesian grids and solved using an efficient Krylov subspace solver. The momentum equation is also discretized with second order accuracy and the high performance Newton-Krylov method is used for integrating them in time. Neumann and Dirichlet conditions are used to validate the Poisson solver against analytical functions and grid refinement results to a significant reduction of the solution error. The effectiveness of the fractional step method results in the stability of the overall algorithm and enables the performance of accurate multi-resolution real life simulations. This material is based upon work supported by the Department of Energy under Award Number DE-EE0005482.
Dependability analysis of parallel systems using a simulation-based approach. M.S. Thesis
NASA Technical Reports Server (NTRS)
Sawyer, Darren Charles
1994-01-01
The analysis of dependability in large, complex, parallel systems executing real applications or workloads is examined in this thesis. To effectively demonstrate the wide range of dependability problems that can be analyzed through simulation, the analysis of three case studies is presented. For each case, the organization of the simulation model used is outlined, and the results from simulated fault injection experiments are explained, showing the usefulness of this method in dependability modeling of large parallel systems. The simulation models are constructed using DEPEND and C++. Where possible, methods to increase dependability are derived from the experimental results. Another interesting facet of all three cases is the presence of some kind of workload of application executing in the simulation while faults are injected. This provides a completely new dimension to this type of study, not possible to model accurately with analytical approaches.
NASA Technical Reports Server (NTRS)
Krosel, S. M.; Milner, E. J.
1982-01-01
The application of Predictor corrector integration algorithms developed for the digital parallel processing environment are investigated. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real time performance, interprocessor communication, and algorithm startup are also discussed.
NASA Astrophysics Data System (ADS)
Quillen, Alice C.; Moore, A.
2008-09-01
Planetesimal and dust dynamical simulations require collision and nearest neighbor detection. A brute force implementation for sorting interparticle distances requires O(N2) computations for N particles, limiting the numbers of particles that have been simulated. Parallel algorithms recently developed for the GPU (graphics processing unit), such as the radix sort, can run as fast as O(N) and sort distances between a million particles in a few hundred milliseconds. We introduce improvements in collision and nearest neighbor detection algorithms and how we have incorporated them into our efficient parallel 2nd order democratic heliocentric method symplectic integrator written in NVIDIA's CUDA for the GPU.
NASA Technical Reports Server (NTRS)
Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)
1993-01-01
This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
Parallel simulation of tsunami inundation on a large-scale supercomputer
NASA Astrophysics Data System (ADS)
Oishi, Y.; Imamura, F.; Sugawara, D.
2013-12-01
An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the
Pacheco, P; Miller, P; Kim, J; Leese, T; Zabiyaka, Y
2003-05-07
Object-oriented NeuroSys (ooNeuroSys) is a collection of programs for simulating very large networks of biologically accurate neurons on distributed memory parallel computers. It includes two principle programs: ooNeuroSys, a parallel program for solving the large systems of ordinary differential equations arising from the interconnected neurons, and Neurondiz, a parallel program for visualizing the results of ooNeuroSys. Both programs are designed to be run on clusters and use the MPI library to obtain parallelism. ooNeuroSys also includes an easy-to-use Python interface. This interface allows neuroscientists to quickly develop and test complex neuron models. Both ooNeuroSys and Neurondiz have a design that allows for both high performance and relative ease of maintenance.
A parallel simulated annealing algorithm for standard cell placement on a hypercube computer
NASA Technical Reports Server (NTRS)
Jones, Mark Howard
1987-01-01
A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
OOPSE: an object-oriented parallel simulation engine for molecular dynamics.
Meineke, Matthew A; Vardeman, Charles F; Lin, Teng; Fennell, Christopher J; Gezelter, J Daniel
2005-02-01
OOPSE is a new molecular dynamics simulation program that is capable of efficiently integrating equations of motion for atom types with orientational degrees of freedom (e.g. "sticky" atoms and point dipoles). Transition metals can also be simulated using the embedded atom method (EAM) potential included in the code. Parallel simulations are carried out using the force-based decomposition method. Simulations are specified using a very simple C-based meta-data language. A number of advanced integrators are included, and the basic integrator for orientational dynamics provides substantial improvements over older quaternion-based schemes.
Wake Encounter Analysis for a Closely Spaced Parallel Runway Paired Approach Simulation
NASA Technical Reports Server (NTRS)
Mckissick,Burnell T.; Rico-Cusi, Fernando J.; Murdoch, Jennifer; Oseguera-Lohr, Rosa M.; Stough, Harry P, III; O'Connor, Cornelius J.; Syed, Hazari I.
2009-01-01
A Monte Carlo simulation of simultaneous approaches performed by two transport category aircraft from the final approach fix to a pair of closely spaced parallel runways was conducted to explore the aft boundary of the safe zone in which separation assurance and wake avoidance are provided. The simulation included variations in runway centerline separation, initial longitudinal spacing of the aircraft, crosswind speed, and aircraft speed during the approach. The data from the simulation showed that the majority of the wake encounters occurred near or over the runway and the aft boundaries of the safe zones were identified for all simulation conditions.
Zhao, Jinkui
2011-01-01
IB is a Monte Carlo simulation tool for aiding neutron scattering instrument designs. It is written in C++ and implemented under Parallel Virtual Machine. The program has a few basic components, or modules, that can be used to build a virtual neutron scattering instrument. More complex components, such as neutron guides and multichannel beam benders, can be constructed using the grouping technique unique to IB. Users can specify a collection of modules as a group. For example, a neutron guide can be constructed by grouping four neutron mirrors together that make up the four sides of the guide. IB s simulation engine ensures that neutrons entering a group will be properly operated upon by all members of the group. For simulations that require higher computer speed, the program can be run in parallel mode under the PVM architecture. Initially, the program was written for designing instruments on pulsed neutron sources, it has since been used to simulate reactor based instruments as well.
Xyce parallel electronic simulator reference guide, Version 6.0.1.
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.
2014-01-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1] .
NASA Astrophysics Data System (ADS)
Decker, K. M.; Jayewardena, C.; Rehmann, R.
We describe the library lgtlib, and lgttool, the corresponding development environment for Monte Carlo simulations of lattice gauge theory on multiprocessor vector computers with shared memory. We explain why distributed memory parallel processor (DMPP) architectures are particularly appealing for compute-intensive scientific applications, and introduce the design of a general application and program development environment system for scientific applications on DMPP architectures.
Direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)
Carroll, C.C.; Owen, J.E.
1988-05-01
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digitial computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACLS constructs. The execution times for all ACLS constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
Komarov, Ivan; D'Souza, Roshan M
2012-01-01
The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.
Massively parallel simulation of flow and transport in variably saturated porous and fractured media
Wu, Yu-Shu; Zhang, Keni; Pruess, Karsten
2002-01-15
This paper describes a massively parallel simulation method and its application for modeling multiphase flow and multicomponent transport in porous and fractured reservoirs. The parallel-computing method has been implemented into the TOUGH2 code and its numerical performance is tested on a Cray T3E-900 and IBM SP. The efficiency and robustness of the parallel-computing algorithm are demonstrated by completing two simulations with more than one million gridblocks, using site-specific data obtained from a site-characterization study. The first application involves the development of a three-dimensional numerical model for flow in the unsaturated zone of Yucca Mountain, Nevada. The second application is the study of tracer/radionuclide transport through fracture-matrix rocks for the same site. The parallel-computing technique enhances modeling capabilities by achieving several-orders-of-magnitude speedup for large-scale and high resolution modeling studies. The resulting modeling results provide many new insights into flow and transport processes that could not be obtained from simulations using the single-CPU simulator.
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Owen, Jeffrey E.
1988-01-01
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digitial computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACLS constructs. The execution times for all ACLS constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube
NASA Technical Reports Server (NTRS)
Joslin, Ronald D.; Zubair, Mohammad
1993-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors nearly ideal linear speedups are achieved with nonoptimized routines; slower than linear speedups are achieved with optimized (machine dependent library) routines. This slower than linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and because the routine indicates less than ideal speedups. However with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise wall-normal and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single processor time to complete a comparable simulation; however it is estimated that a subgrid-scale model which reduces the required number of grid points and becomes a large-eddy simulation (PSLES) would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.
A parallel finite element simulator for ion transport through three-dimensional ion channel systems.
Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo
2013-09-15
A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain to get the surface and volume mesh for ion channel systems, which consists of a set of mesh generation tools. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG) developed by one of the authors, which provides the capability of doing large scale parallel computations with high parallel efficiency and the flexibility of choosing high order elements to achieve high order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I - V curve, with which both primitive and transformed PNP equations are studied and their numerical performances are compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage dependent anion channel (VDAC) and α-Hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite size effects can be included in PNP model now, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL. It is shown that the size effects in SMPNP can
Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.; ...
2016-09-18
This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
Parallel peridynamics-SPH simulation of explosion induced soil fragmentation by using OpenMP
NASA Astrophysics Data System (ADS)
Fan, Houfu; Li, Shaofan
2017-04-01
In this work, we use the OpenMP-based shared-memory parallel programming to implement the recently developed coupling method of state-based peridynamics and smoothed particle hydrodynamics (PD-SPH), and we then employ the program to simulate dynamic soil fragmentation induced by the explosion of the buried explosives. The paper offers detailed technical description and discussion on the PD-SHP coupling algorithm and how to use the OpenMP shared-memory programming to implement such large-scale computation in a desktop environment, with an example to illustrate the basic computing principle and the parallel algorithm structure. In specific, the paper provides a complete OpenMP parallel algorithm for the PD-SPH scheme with the programming and parallelization details. Numerical examples of soil fragmentation caused by the buried explosives are also presented. Results show that the simulation carried out by the OpenMP parallel code is much faster than that by the corresponding serial computer code.
Parallel peridynamics-SPH simulation of explosion induced soil fragmentation by using OpenMP
NASA Astrophysics Data System (ADS)
Fan, Houfu; Li, Shaofan
2016-06-01
In this work, we use the OpenMP-based shared-memory parallel programming to implement the recently developed coupling method of state-based peridynamics and smoothed particle hydrodynamics (PD-SPH), and we then employ the program to simulate dynamic soil fragmentation induced by the explosion of the buried explosives. The paper offers detailed technical description and discussion on the PD-SHP coupling algorithm and how to use the OpenMP shared-memory programming to implement such large-scale computation in a desktop environment, with an example to illustrate the basic computing principle and the parallel algorithm structure. In specific, the paper provides a complete OpenMP parallel algorithm for the PD-SPH scheme with the programming and parallelization details. Numerical examples of soil fragmentation caused by the buried explosives are also presented. Results show that the simulation carried out by the OpenMP parallel code is much faster than that by the corresponding serial computer code.
Parallel Simulation of Wave Propagation in Three-Dimensional Poroelastic Media
NASA Astrophysics Data System (ADS)
Sheen, D.; Baag, C.; Tuncay, K.; Ortoleva, P. J.
2003-12-01
Parallelized velocity-stress staggered-grid finite-difference method to simulate wave propagation in 3-D heterogeneous poroelastic media is presented. Biot_s poroelasticity theory is used to study the behavior of wavefield in fluid saturated media. In the poroelasticity theory, the fluid velocities and pressure are included as additional field variables to those for the pure elasticity in order to describe the interaction between pore fluid and solid. Discretization of governing equations for finite-difference approximation is performed for total of 13 components of field variables in 3-D Cartesian coordinates: six components for velocity, six components for solid stress, and a component for fluid pressure. The scheme has fourth-order accuracy in space and second-order accuracy in time. Also, to simulate wave propagation in an unbounded medium, the perfectly matched layer (PML) method is used as an absorbing boundary condition. In contrast with the pure elastic problem, the larger number of components to describe the poroelasticity requires a huge sum of core memory inevitably. In the case of modeling in a realistic scale, the computation is hardly to run on serial platforms. Therefore, the computationally efficient scheme to run on a large parallel environment is required. The parallel implementation is achieved by using a spatial decomposition and the portable message passing interface (MPI) for communication between neighboring processors. Direct comparisons are made for serial and parallel computations. The inevitability and efficiency of parallelization for the poroelastic wave modeling are also demonstrated using model examples.
Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems
BAER,THOMAS A.; SACKINGER,PHILIP A.; SUBIA,SAMUEL R.
1999-10-14
Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact fines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a ''pseudo-solid'' mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to equally distribute computational work among a SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel coquting environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speed ups for fixed problem size, a class of problems of immediate practical importance.
Parallel Grand Canonical Monte Carlo (ParaGrandMC) Simulation Code
NASA Technical Reports Server (NTRS)
Yamakov, Vesselin I.
2016-01-01
This report provides an overview of the Parallel Grand Canonical Monte Carlo (ParaGrandMC) simulation code. This is a highly scalable parallel FORTRAN code for simulating the thermodynamic evolution of metal alloy systems at the atomic level, and predicting the thermodynamic state, phase diagram, chemical composition and mechanical properties. The code is designed to simulate multi-component alloy systems, predict solid-state phase transformations such as austenite-martensite transformations, precipitate formation, recrystallization, capillary effects at interfaces, surface absorption, etc., which can aid the design of novel metallic alloys. While the software is mainly tailored for modeling metal alloys, it can also be used for other types of solid-state systems, and to some degree for liquid or gaseous systems, including multiphase systems forming solid-liquid-gas interfaces.
Simulated parallel annealing within a neighborhood for optimization of biomechanical systems.
Higginson, J S; Neptune, R R; Anderson, F C
2005-09-01
Optimization problems for biomechanical systems have become extremely complex. Simulated annealing (SA) algorithms have performed well in a variety of test problems and biomechanical applications; however, despite advances in computer speed, convergence to optimal solutions for systems of even moderate complexity has remained prohibitive. The objective of this study was to develop a portable parallel version of a SA algorithm for solving optimization problems in biomechanics. The algorithm for simulated parallel annealing within a neighborhood (SPAN) was designed to minimize interprocessor communication time and closely retain the heuristics of the serial SA algorithm. The computational speed of the SPAN algorithm scaled linearly with the number of processors on different computer platforms for a simple quadratic test problem and for a more complex forward dynamic simulation of human pedaling.
A parallel 3D poisson solver for space charge simulation in cylindrical coordinates.
Xu, J.; Ostroumov, P. N.; Nolen, J.; Physics
2008-02-01
This paper presents the development of a parallel three-dimensional Poisson solver in cylindrical coordinate system for the electrostatic potential of a charged particle beam in a circular tube. The Poisson solver uses Fourier expansions in the longitudinal and azimuthal directions, and Spectral Element discretization in the radial direction. A Dirichlet boundary condition is used on the cylinder wall, a natural boundary condition is used on the cylinder axis and a Dirichlet or periodic boundary condition is used in the longitudinal direction. A parallel 2D domain decomposition was implemented in the (r,{theta}) plane. This solver was incorporated into the parallel code PTRACK for beam dynamics simulations. Detailed benchmark results for the parallel solver and a beam dynamics simulation in a high-intensity proton LINAC are presented. When the transverse beam size is small relative to the aperture of the accelerator line, using the Poisson solver in a Cartesian coordinate system and a Cylindrical coordinate system produced similar results. When the transverse beam size is large or beam center located off-axis, the result from Poisson solver in Cartesian coordinate system is not accurate because different boundary condition used. While using the new solver, we can apply circular boundary condition easily and accurately for beam dynamic simulations in accelerator devices.
Application of parallel computing to seismic damage process simulation of an arch dam
NASA Astrophysics Data System (ADS)
Zhong, Hong; Lin, Gao; Li, Jianbo
2010-06-01
The simulation of damage process of high arch dam subjected to strong earthquake shocks is significant to the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation requires rigorous computational capacity. Conventional serial computing falls short of that and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams utilizing the damage model with inheterogeneity of concrete considered. Developed with programming language Fortran, the code uses a master/slave mode for programming, domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication and solvers from AZTEC library for solution of large-scale equations. Speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of a being-built arch dam on a 4-node PC Cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of shaking table test, indicating that the proposed procedure and parallel code PDPAD has a good potential in simulating seismic damage mode of arch dams. With the rapidly growing need for massive computation emerged from engineering problems, parallel computing will find more and more applications in pertinent areas.
Progress on the Multiphysics Capabilities of the Parallel Electromagnetic ACE3P Simulation Suite
Kononenko, Oleksiy
2015-03-26
ACE3P is a 3D parallel simulation suite that is being developed at SLAC National Accelerator Laboratory. Effectively utilizing supercomputer resources, ACE3P has become a key tool for the coupled electromagnetic, thermal and mechanical research and design of particle accelerators. Based on the existing finite-element infrastructure, a massively parallel eigensolver is developed for modal analysis of mechanical structures. It complements a set of the multiphysics tools in ACE3P and, in particular, can be used for the comprehensive study of microphonics in accelerating cavities ensuring the operational reliability of a particle accelerator.
Design of a real-time wind turbine simulator using a custom parallel architecture
NASA Technical Reports Server (NTRS)
Hoffman, John A.; Gluck, R.; Sridhar, S.
1995-01-01
The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named: the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is very much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an IO operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/ output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors which usually have a throughput limit because of rigid bus architecture.
NASA Technical Reports Server (NTRS)
Lyons, Daniel T.; Desai, Prasun N.
2005-01-01
This paper will describe the Entry, Descent and Landing simulation tradeoffs and techniques that were used to provide the Monte Carlo data required to approve entry during a critical period just before entry of the Genesis Sample Return Capsule. The same techniques will be used again when Stardust returns on January 15, 2006. Only one hour was available for the simulation which propagated 2000 dispersed entry states to the ground. Creative simulation tradeoffs combined with parallel processing were needed to provide the landing footprint statistics that were an essential part of the Go/NoGo decision that authorized release of the Sample Return Capsule a few hours before entry.
A method for data handling numerical results in parallel OpenFOAM simulations
Anton, Alin; Muntean, Sebastian
2015-12-31
Parallel computational fluid dynamics simulations produce vast amount of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM toolkit{sup ®}[1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large scale simulation results than the regular algorithms.
2015-08-01
Atomic /Molecular Massively Parallel Simulator (LAMMPS) Software by N Scott Weingarten and James P Larentzos Approved for...0687 ● AUG 2015 US Army Research Laboratory Implementation of Shifted Periodic Boundary Conditions in the Large-Scale Atomic /Molecular...Shifted Periodic Boundary Conditions in the Large-Scale Atomic /Molecular Massively Parallel Simulator (LAMMPS) Software 5a. CONTRACT NUMBER 5b
Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.
2001-08-31
This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented into the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-one-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce better resolution results and reveal some flow patterns that cannot be obtained using coarse-grid modeling models.
Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN.
Hammond, G E; Lichtner, P C; Mills, R T
2014-01-01
[1] To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted.
Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN
NASA Astrophysics Data System (ADS)
Hammond, G. E.; Lichtner, P. C.; Mills, R. T.
2014-01-01
To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted.
Lee, Anthony; Yau, Christopher; Giles, Michael B.; Doucet, Arnaud; Holmes, Christopher C.
2011-01-01
We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we nd speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276
Lee, Anthony; Yau, Christopher; Giles, Michael B; Doucet, Arnaud; Holmes, Christopher C
2010-12-01
We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we nd speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design.
Long-time atomistic simulations with the Parallel Replica Dynamics method
NASA Astrophysics Data System (ADS)
Perez, Danny
Molecular Dynamics (MD) -- the numerical integration of atomistic equations of motion -- is a workhorse of computational materials science. Indeed, MD can in principle be used to obtain any thermodynamic or kinetic quantity, without introducing any approximation or assumptions beyond the adequacy of the interaction potential. It is therefore an extremely powerful and flexible tool to study materials with atomistic spatio-temporal resolution. These enviable qualities however come at a steep computational price, hence limiting the system sizes and simulation times that can be achieved in practice. While the size limitation can be efficiently addressed with massively parallel implementations of MD based on spatial decomposition strategies, allowing for the simulation of trillions of atoms, the same approach usually cannot extend the timescales much beyond microseconds. In this article, we discuss an alternative parallel-in-time approach, the Parallel Replica Dynamics (ParRep) method, that aims at addressing the timescale limitation of MD for systems that evolve through rare state-to-state transitions. We review the formal underpinnings of the method and demonstrate that it can provide arbitrarily accurate results for any definition of the states. When an adequate definition of the states is available, ParRep can simulate trajectories with a parallel speedup approaching the number of replicas used. We demonstrate the usefulness of ParRep by presenting different examples of materials simulations where access to long timescales was essential to access the physical regime of interest and discuss practical considerations that must be addressed to carry out these simulations. Work supported by the United States Department of Energy (U.S. DOE), Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division.
Application of Parallel Discrete Event Simulation to the Space Surveillance Network
NASA Astrophysics Data System (ADS)
Jefferson, D.; Leek, J.
2010-09-01
In this paper we describe how and why we chose parallel discrete event simulation (PDES) as the paradigm for modeling the Space Surveillance Network (SSN) in our modeling framework, TESSA (Testbed Environment for Space Situational Awareness). DES is a simulation paradigm appropriate for systems dominated by discontinuous state changes at times that must be calculated dynamically. It is used primarily for complex man-made systems like telecommunications, vehicular traffic, computer networks, economic models etc., although it is also useful for natural systems that are not described by equations, such as particle systems, population dynamics, epidemics, and combat models. It is much less well known than simple time-stepped simulation methods, but has the great advantage of being time scale independent, so that one can freely mix processes that operate at time scales over many orders of magnitude with no runtime performance penalty. In simulating the SSN we model in some detail: (a) the orbital dynamics of up to 105 objects, (b) their reflective properties, (c) the ground- and space-based sensor systems in the SSN, (d) the recognition of orbiting objects and determination of their orbits, (e) the cueing and scheduling of sensor observations, (f) the 3-d structure of satellites, and (g) the generation of collision debris. TESSA is thus a mixed continuous-discrete model. But because many different types of discrete objects are involved with such a wide variation in time scale (milliseconds for collisions, hours for orbital periods) it is suitably described using discrete events. The PDES paradigm is surprising and unusual. In any instantaneous runtime snapshot some parts my be far ahead in simulation time while others lag behind, yet the required causal relationships are always maintained and synchronized correctly, exactly as if the simulation were executed sequentially. The TESSA simulator is custom-built, conservatively synchronized, and designed to scale to
Parallel-vector algorithms for particle simulations on shared-memory multiprocessors
Nishiura, Daisuke; Sakaguchi, Hide
2011-03-01
Over the last few decades, the computational demands of massive particle-based simulations for both scientific and industrial purposes have been continuously increasing. Hence, considerable efforts are being made to develop parallel computing techniques on various platforms. In such simulations, particles freely move within a given space, and so on a distributed-memory system, load balancing, i.e., assigning an equal number of particles to each processor, is not guaranteed. However, shared-memory systems achieve better load balancing for particle models, but suffer from the intrinsic drawback of memory access competition, particularly during (1) paring of contact candidates from among neighboring particles and (2) force summation for each particle. Here, novel algorithms are proposed to overcome these two problems. For the first problem, the key is a pre-conditioning process during which particle labels are sorted by a cell label in the domain to which the particles belong. Then, a list of contact candidates is constructed by pairing the sorted particle labels. For the latter problem, a table comprising the list indexes of the contact candidate pairs is created and used to sum the contact forces acting on each particle for all contacts according to Newton's third law. With just these methods, memory access competition is avoided without additional redundant procedures. The parallel efficiency and compatibility of these two algorithms were evaluated in discrete element method (DEM) simulations on four types of shared-memory parallel computers: a multicore multiprocessor computer, scalar supercomputer, vector supercomputer, and graphics processing unit. The computational efficiency of a DEM code was found to be drastically improved with our algorithms on all but the scalar supercomputer. Thus, the developed parallel algorithms are useful on shared-memory parallel computers with sufficient memory bandwidth.
Parallel electric fields in a simulation of magnetotail reconnection and plasmoid evolution
NASA Technical Reports Server (NTRS)
Hesse, Michael; Birn, Joachim
1989-01-01
Properties of the electric field component parallel to the magnetic field (E sub parallel) in a three-dimensional MHD simulation of plasmoid formation and evolution in the magnetotail in the presence of a net dawn-dusk magnetic field component were observed. Particularly emphasized was the spatial location of E(sub parallel), the concept of a diffusion zone and the role of E(sub parallel) in accelerating electrons. A localization of the region of enhanced E(sub parallel) in all space directions with a strong concentration in the z direction was found. This region was identified as the diffusion zone, which plays a crucial role in reconnection theory through the local break-down of magnetic flux conservation. The presence of B(sub y) implies a north-south asymmetry of the injection of accelerated particles into the near-earth region, if the net B(sub y) field is strong enough to force particles to follow field lines through the diffusion region. It is estimated that for a typical net B(sub y) field this should affect the injection of electrons into the near-earth dawn region, so that precipitation into the Northern (Southern) Hemisphere should dominate for duskward (dawnward) net B(sub y). In addition, a spatial clottiness of the expected injection of adiabatic particles which could be related to the appearance bright spots in auroras was observed.
Modeling of Weakly Collisional Parallel Electron Transport for Edge Plasma Simulations
NASA Astrophysics Data System (ADS)
Umansky, M. V.; Dimits, A. M.; Joseph, I.; Omotani, J. T.; Rognlien, T. D.
2014-10-01
The parallel electron heat transport in a weakly collisional regime can be represented in the framework of the Landau-fluid (LF) model. Practical implementation of LF-based transport models has become possible due to the recent invention of an efficient non- spectral method for the non-local closure operators. Here the implementation of a LF based model for the parallel plasma transport is described, and the model is tested for different collisionality regimes against a Fokker-Plank code. The new method appears to represent weakly collisional parallel electron transport more accurately than the conventional flux-limiter based models; on the other hand it is computationally efficient enough to be used in tokamak edge plasma simulations. Implementation of an LF-based model for the parallel plasma transport in the UEDGE code is described, and applications to realistic divertor simulations are discussed. Work performed for U.S. DoE by LLNL under Contract DE-AC52-07NA27344.
Parallel distributed, reciprocal Monte Carlo radiation in coupled, large eddy combustion simulations
NASA Astrophysics Data System (ADS)
Hunsaker, Isaac L.
Radiation is the dominant mode of heat transfer in high temperature combustion environments. Radiative heat transfer affects the gas and particle phases, including all the associated combustion chemistry. The radiative properties are in turn affected by the turbulent flow field. This bi-directional coupling of radiation turbulence interactions poses a major challenge in creating parallel-capable, high-fidelity combustion simulations. In this work, a new model was developed in which reciprocal monte carlo radiation was coupled with a turbulent, large-eddy simulation combustion model. A technique wherein domain patches are stitched together was implemented to allow for scalable parallelism. The combustion model runs in parallel on a decomposed domain. The radiation model runs in parallel on a recomposed domain. The recomposed domain is stored on each processor after information sharing of the decomposed domain is handled via the message passing interface. Verification and validation testing of the new radiation model were favorable. Strong scaling analyses were performed on the Ember cluster and the Titan cluster for the CPU-radiation model and GPU-radiation model, respectively. The model demonstrated strong scaling to over 1,700 and 16,000 processing cores on Ember and Titan, respectively.
Parallel solutions for voxel-based simulations of reaction-diffusion systems.
D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan
2014-01-01
There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This necessity has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena taking into account both spatial effects and noise. However, the high computational effort characterizing simulation approaches, coupled with the necessity to simulate the models several times to achieve statistically relevant information on the model behaviours, makes such kind of algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we present how the characteristics of the algorithm can be exploited for an effective parallelization on the present heterogeneous HPC architectures.
Adaptive finite element simulation of flow and transport applications on parallel computers
NASA Astrophysics Data System (ADS)
Kirk, Benjamin Shelton
The subject of this work is the adaptive finite element simulation of problems arising in flow and transport applications on parallel computers. Of particular interest are new contributions to adaptive mesh refinement (AMR) in this parallel high-performance context, including novel work on data structures, treatment of constraints in a parallel setting, generality and extensibility via object-oriented programming, and the design/implementation of a flexible software framework. This technology and software capability then enables more robust, reliable treatment of multiscale--multiphysics problems and specific studies of fine scale interaction such as those in biological chemotaxis (Chapter 4) and high-speed shock physics for compressible flows (Chapter 5). The work begins by presenting an overview of key concepts and data structures employed in AMR simulations. Of particular interest is how these concepts are applied in the physics-independent software framework which is developed here and is the basis for all the numerical simulations performed in this work. This open-source software framework has been adopted by a number of researchers in the U.S. and abroad for use in a wide range of applications. The dynamic nature of adaptive simulations pose particular issues for efficient implementation on distributed-memory parallel architectures. Communication cost, computational load balance, and memory requirements must all be considered when developing adaptive software for this class of machines. Specific extensions to the adaptive data structures to enable implementation on parallel computers is therefore considered in detail. The libMesh framework for performing adaptive finite element simulations on parallel computers is developed to provide a concrete implementation of the above ideas. This physics-independent framework is applied to two distinct flow and transport applications classes in the subsequent application studies to illustrate the flexibility of the
PPM A highly efficient parallel particle mesh library for the simulation of continuum systems
NASA Astrophysics Data System (ADS)
Sbalzarini, I. F.; Walther, J. H.; Bergdorf, M.; Hieber, S. E.; Kotsalis, E. M.; Koumoutsakos, P.
2006-07-01
This paper presents a highly efficient parallel particle-mesh (PPM) library, based on a unifying particle formulation for the simulation of continuous systems. In this formulation, the grid-free character of particle methods is relaxed by the introduction of a mesh for the reinitialization of the particles, the computation of the field equations, and the discretization of differential operators. The present utilization of the mesh does not detract from the adaptivity, the efficient handling of complex geometries, the minimal dissipation, and the good stability properties of particle methods. The coexistence of meshes and particles, allows for the development of a consistent and adaptive numerical method, but it presents a set of challenging parallelization issues that have hindered in the past the broader use of particle methods. The present library solves the key parallelization issues involving particle-mesh interpolations and the balancing of processor particle loading, using a novel adaptive tree for mixed domain decompositions along with a coloring scheme for the particle-mesh interpolation. The high parallel efficiency of the library is demonstrated in a series of benchmark tests on distributed memory and on a shared-memory vector architecture. The modularity of the method is shown by a range of simulations, from compressible vortex rings using a novel formulation of smooth particle hydrodynamics, to simulations of diffusion in real biological cell organelles. The present library enables large scale simulations of diverse physical problems using adaptive particle methods and provides a computational tool that is a viable alternative to mesh-based methods.
Construction of a parallel processor for simulating manipulators and other mechanical systems
NASA Technical Reports Server (NTRS)
Hannauer, George
1991-01-01
This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.
Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment.
Khan, Md Ashfaquzzaman; Herbordt, Martin C
2011-07-20
Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations.
Parallel discrete molecular dynamics simulation with speculation and in-order commitment
NASA Astrophysics Data System (ADS)
Khan, Md. Ashfaquzzaman; Herbordt, Martin C.
2011-07-01
Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations.
Spontaneous Hot Flow Anomalies at Quasi-Parallel Shocks: 2. Hybrid Simulations
NASA Technical Reports Server (NTRS)
Omidi, N.; Zhang, H.; Sibeck, D.; Turner, D.
2013-01-01
Motivated by recent THEMIS observations, this paper uses 2.5-D electromagnetic hybrid simulations to investigate the formation of Spontaneous Hot Flow Anomalies (SHFA) upstream of quasi-parallel bow shocks during steady solar wind conditions and in the absence of discontinuities. The results show the formation of a large number of structures along and upstream of the quasi-parallel bow shock. Their outer edges exhibit density and magnetic field enhancements, while their cores exhibit drops in density, magnetic field, solar wind velocity and enhancements in ion temperature. Using virtual spacecraft in the simulation, we show that the signatures of these structures in the time series data are very similar to those of SHFAs seen in THEMIS data and conclude that they correspond to SHFAs. Examination of the simulation data shows that SHFAs form as the result of foreshock cavitons interacting with the bow shock. Foreshock cavitons in turn form due to the nonlinear evolution of ULF waves generated by the interaction of the solar wind with the backstreaming ions. Because foreshock cavitons are an inherent part of the shock dissipation process, the formation of SHFAs is also an inherent part of the dissipation process leading to a highly non-uniform plasma in the quasi-parallel magnetosheath including large scale density and magnetic field cavities.
NASA Technical Reports Server (NTRS)
Morgan, Philip E.
2004-01-01
This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow Simulations Using Massively Parallel Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
NASA Astrophysics Data System (ADS)
Lee, Nicholas Jabari Ouma
Parallel molecular dynamics (MD) simulations are performed to investigate pressure-induced solid-to-solid structural phase transformations in cadmium selenide (CdSe) nanorods. The effects of the size and shape of nanorods on different aspects of structural phase transformations are studied. Simulations are based on interatomic potentials validated extensively by experiments. Simulations range from 105 to 106 atoms. These simulations are enabled by highly scalable algorithms executed on massively parallel Beowulf computing architectures. Pressure-induced structural transformations are studied using a hydrostatic pressure medium simulated by atoms interacting via Lennard-Jones potential. Four single-crystal CdSe nanorods, each 44A in diameter but varying in length, in the range between 44A and 600A, are studied independently in two sets of simulations. The first simulation is the downstroke simulation, where each rod is embedded in the pressure medium and subjected to increasing pressure during which it undergoes a forward transformation from a 4-fold coordinated wurtzite (WZ) crystal structure to a 6-fold coordinated rocksalt (RS) crystal structure. In the second so-called upstroke simulation, the pressure on the rods is decreased and a reverse transformation from 6-fold RS to a 4-fold coordinated phase is observed. The transformation pressure in the forward transformation depends on the nanorod size, with longer rods transforming at lower pressures close to the bulk transformation pressure. Spatially-resolved structural analyses, including pair-distributions, atomic-coordinations and bond-angle distributions, indicate nucleation begins at the surface of nanorods and spreads inward. The transformation results in a single RS domain, in agreement with experiments. The microscopic mechanism for transformation is observed to be the same as for bulk CdSe. A nanorod size dependency is also found in reverse structural transformations, with longer nanorods transforming more
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor)
1998-01-01
The present invention is embodied in a method of performing object-oriented simulation and a system having inter-connected processor nodes operating in parallel to simulate mutual interactions of a set of discrete simulation objects distributed among the nodes as a sequence of discrete events changing state variables of respective simulation objects so as to generate new event-defining messages addressed to respective ones of the nodes. The object-oriented simulation is performed at each one of the nodes by assigning passive self-contained simulation objects to each one of the nodes, responding to messages received at one node by generating corresponding active event objects having user-defined inherent capabilities and individual time stamps and corresponding to respective events affecting one of the passive self-contained simulation objects of the one node, restricting the respective passive self-contained simulation objects to only providing and receiving information from die respective active event objects, requesting information and changing variables within a passive self-contained simulation object by the active event object, and producing corresponding messages specifying events resulting therefrom by the active event objects.
Holkundkar, Amol R.
2013-11-15
The objective of this article is to report the parallel implementation of the 3D molecular dynamic simulation code for laser-cluster interactions. The benchmarking of the code has been done by comparing the simulation results with some of the experiments reported in the literature. Scaling laws for the computational time is established by varying the number of processor cores and number of macroparticles used. The capabilities of the code are highlighted by implementing various diagnostic tools. To study the dynamics of the laser-cluster interactions, the executable version of the code is available from the author.
NASA Astrophysics Data System (ADS)
Kim, Eui Joong
Large scale ground motion simulation requires supercomputing systems in order to obtain reliable and useful results within reasonable elapsed time. In this study, we develop a framework for terascale ground motion simulations in highly heterogeneous basins. As part of the development, we present a parallel octree-based multiresolution finite element methodology for the elastodynamic wave propagation problem. The octree-based multiresolution finite element method reduces memory use significantly and improves overall computational performance. The framework is comprised of three parts; (1) an octree-based mesh generator, Euclid developed by TV and O'Hallaron, (2) a parallel mesh partitioner, ParMETIS developed by Karypis et al.[2], and (3) a parallel octree-based multiresolution finite element solver, QUAKE developed in this study. Realistic earthquakes parameters, soil material properties, and sedimentary basins dimensions will produce extremely large meshes. The out-of-core versional octree-based mesh generator, Euclid overcomes the resulting severe memory limitations. By using a parallel, distributed-memory graph partitioning algorithm, ParMETIS partitions large meshes, overcoming the memory and cost problem. Despite capability of the Octree-Based Multiresolution Mesh Method ( OBM3), large problem sizes necessitate parallelism to handle large memory and work requirements. The parallel OBM 3 elastic wave propagation code, QUAKE has been developed to address these issues. The numerical methodology and the framework have been used to simulate the seismic response of both idealized systems and of the Greater Los Angeles basin to simple pulses and to a mainshock of the 1994 Northridge Earthquake, for frequencies of up to 1 Hz and domain size of 80 km x 80 km x 30 km. In the idealized models, QUAKE shows good agreement with the analytical Green's function solutions. In the realistic models for the Northridge earthquake mainshock, QUAKE qualitatively agrees, with at most
NASA Astrophysics Data System (ADS)
Honkonen, I.
2015-03-01
I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring modification of existing code. This is an advantage for the development and testing of, e.g., geoscientific software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. An implementation of the generic simulation cell method presented here, generic simulation cell class (gensimcell), also includes support for parallel programming by allowing model developers to select which simulation variables of, e.g., a domain-decomposed model to transfer between processes via a Message Passing Interface (MPI) library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class requires a C++ compiler that supports a version of the language standardized in 2011 (C++11). The code is available at https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those who do are kindly requested to acknowledge and cite this work.
Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P
Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; Syratchev, I.; /CERN
2009-06-19
In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).
NASA Astrophysics Data System (ADS)
Shim, Yunsic; Amar, Jacques G.; Uberuaga, B. P.; Voter, A. F.
2007-11-01
We present a method for performing parallel temperature-accelerated dynamics (TAD) simulations over extended length scales. In our method, a two-dimensional spatial decomposition is used along with the recently proposed semirigorous synchronous sublattice algorithm of Shim and Amar [Phys. Rev. B 71, 125432 (2005)]. The scaling behavior of the simulation time as a function of system size is studied and compared with serial TAD in simulations of the early stages of Cu/Cu(100) growth as well as for a simple case of surface relaxation. In contrast to the corresponding serial TAD simulations, for which the simulation time tser increases as a power of the system size N (tser˜Nx) with an exponent x that can be as large as three, in our parallel simulations the simulation time increases only logarithmically with system size. As a result, even for relatively small system sizes our parallel TAD simulations are significantly faster than the corresponding serial TAD simulations. The significantly improved scaling behavior of our parallel TAD simulations over the corresponding serial simulations indicates that our parallel TAD method may be useful in performing simulations over significantly larger length scales than serial TAD, while preserving all the atomistic details provided by the TAD method.
NASA Astrophysics Data System (ADS)
Vashishta, P.; Bachlechner, M. E.; Campbell, T.; Kalia, R. K.; Kikuchi, H.; Kodiyalam, S.; Nakano, A.; Ogata, S.; Shimojo, F.; Walsh, P.
Multiresolution molecular-dynamics approach for multimillion atom simulations has been used to investigate structural properties, mechanical failure in ceramic materials, and atomic-level stresses in nanoscale semiconductor/ceramic mesas (Si/Si3N4). Crack propagation and fracture in silicon nitride, silicon carbide, gallium arsenide, and nanophase ceramics are investigated. We observe a crossover from slow to rapid fracture and a correlation between the speed of crack propagation and morphology of fracture surface. A 100 million atom simulation is carried out to study crack propagation in GaAs. Mechanical failure in the Si/Si3N4 interface is studied by applying tensile strain parallel to the interface. Ten million atom molecular dynamics simulations are performed to determine atomic-level stress distributions in a 54 nm nanopixel on a 0.1 μm silicon substrate. Multimillion atom simulations of oxidation of aluminum nanoclusters and nanoindentation in silicon nitride are also discussed.
Massively parallel Monte Carlo for many-particle simulations on GPUs
Anderson, Joshua A.; Jankowski, Eric; Grubb, Thomas L.; Engel, Michael; Glotzer, Sharon C.
2013-12-01
Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.
Use of Parallel Micro-Platform for the Simulation the Space Exploration
NASA Astrophysics Data System (ADS)
Velasco Herrera, Victor Manuel; Velasco Herrera, Graciela; Rosano, Felipe Lara; Rodriguez Lozano, Salvador; Lucero Roldan Serrato, Karen
The purpose of this work is to create a parallel micro-platform, that simulates the virtual movements of a space exploration in 3D. One of the innovations presented in this design consists of the application of a lever mechanism for the transmission of the movement. The development of such a robot is a challenging task very different of the industrial manipulators due to a totally different target system of requirements. This work presents the study and simulation, aided by computer, of the movement of this parallel manipulator. The development of this model has been developed using the platform of computer aided design Unigraphics, in which it was done the geometric modeled of each one of the components and end assembly (CAD), the generation of files for the computer aided manufacture (CAM) of each one of the pieces and the kinematics simulation of the system evaluating different driving schemes. We used the toolbox (MATLAB) of aerospace and create an adaptive control module to simulate the system.
A novel parallel-rotation algorithm for atomistic Monte Carlo simulation of dense polymer systems
NASA Astrophysics Data System (ADS)
Santos, S.; Suter, U. W.; Müller, M.; Nievergelt, J.
2001-06-01
We develop and test a new elementary Monte Carlo move for use in the off-lattice simulation of polymer systems. This novel Parallel-Rotation algorithm (ParRot) permits moving very efficiently torsion angles that are deeply inside long chains in melts. The parallel-rotation move is extremely simple and is also demonstrated to be computationally efficient and appropriate for Monte Carlo simulation. The ParRot move does not affect the orientation of those parts of the chain outside the moving unit. The move consists of a concerted rotation around four adjacent skeletal bonds. No assumption is made concerning the backbone geometry other than that bond lengths and bond angles are held constant during the elementary move. Properly weighted sampling techniques are needed for ensuring detailed balance because the new move involves a correlated change in four degrees of freedom along the chain backbone. The ParRot move is supplemented with the classical Metropolis Monte Carlo, the Continuum-Configurational-Bias, and Reptation techniques in an isothermal-isobaric Monte Carlo simulation of melts of short and long chains. Comparisons are made with the capabilities of other Monte Carlo techniques to move the torsion angles in the middle of the chains. We demonstrate that ParRot constitutes a highly promising Monte Carlo move for the treatment of long polymer chains in the off-lattice simulation of realistic models of dense polymer systems.
Switching to High Gear: Opportunities for Grand-scale Real-time Parallel Simulations
Perumalla, Kalyan S
2009-01-01
The recent emergence of dramatically large computational power, spanning desktops with multi-core processors and multiple graphics cards to supercomputers with 10^5 processor cores, has suddenly resulted in simulation-based solutions trailing behind in the ability to fully tap the new computational capacity. Here, we motivate the need for switching the parallel simulation research to a higher gear to exploit the new, immense levels of computational power. The potential for grand-scale real-time solutions is illustrated using preliminary results from prototypes in four example application areas: (a) state- or regional-scale vehicular mobility modeling, (b) very large-scale epidemic modeling, (c) modeling the propagation of wireless network signals in very large, cluttered terrains, and, (d) country- or world-scale social behavioral modeling. We believe the stage is perfectly poised for the parallel/distributed simulation community to envision and formulate similar grand-scale, real-time simulation-based solutions in many application areas.
Note: Application of a novel 2(3HUS+S) parallel manipulator for simulation of hip joint motion
NASA Astrophysics Data System (ADS)
Shan, X. L.; Cheng, G.; Liu, X. Z.
2016-07-01
In the paper, a novel 2(3HUS+S) parallel manipulator, which has two moving platforms, is proposed. The parallel manipulator is adopted to simulate hip joint motion and can conduct an experiment for two hip joints simultaneously. Motion experiments are conducted in the paper, and the recommended hip joint motion curves from ISO14242 and actual hip joint motions during jogging and walking are selected as the simulated motions. The experimental results indicate that the 2(3HUS+S) parallel manipulator can realize the simulation of many kinds of hip joint motions without changing the structure size.
Data Parallel Execution Challenges and Runtime Performance of Agent Simulations on GPUs
Perumalla, Kalyan S; Aaby, Brandon G
2008-01-01
Programmable graphics processing units (GPUs) have emerged as excellent computational platforms for certain general-purpose applications. The data parallel execution capabilities of GPUs specifically point to the potential for effective use in simulations of agent-based models (ABM). In this paper, the computational efficiency of ABM simulation on GPUs is evaluated on representative ABM benchmarks. The runtime speed of GPU-based models is compared to that of traditional CPU-based implementation, and also to that of equivalent models in traditional ABM toolkits (Repast and NetLogo). As expected, it is observed that, GPU-based ABM execution affords excellent speedup on simple models, with better speedup on models exhibiting good locality and fair amount of computation per memory element. Execution is two to three orders of magnitude faster with a GPU than with leading ABM toolkits, but at the cost of decrease in modularity, ease of programmability and reusability. At a more fundamental level, however, the data parallel paradigm is found to be somewhat at odds with traditional model-specification approaches for ABM. Effective use of data parallel execution, in general, seems to require resolution of modeling and execution challenges. Some of the challenges are identified and related solution approaches are described.
NASA Astrophysics Data System (ADS)
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
NASA Astrophysics Data System (ADS)
Shen, Yanfeng; Cesnik, Carlos E. S.
2016-04-01
This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables the highly parallelized supercomputing on powerful graphic cards. Both the explicit contact formulation and the parallel feature facilitates LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulations based on the penalty method is introduced and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.
NASA Astrophysics Data System (ADS)
Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro
2016-08-01
We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 107) to 300 ms (N = 109). These are currently limited by the time for the calculation of the domain decomposition and communication
De Novo Ultrascale Atomistic Simulations On High-End Parallel Supercomputers
Nakano, A; Kalia, R K; Nomura, K; Sharma, A; Vashishta, P; Shimojo, F; van Duin, A; Goddard, III, W A; Biswas, R; Srivastava, D; Yang, L H
2006-09-04
We present a de novo hierarchical simulation framework for first-principles based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters. In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) simulations explore a wide solution space to discover microscopic mechanisms that govern macroscopic material properties, into which highly accurate quantum mechanical (QM) simulations are embedded to validate the discovered mechanisms and quantify the uncertainty of the solution. The framework includes an embedded divide-and-conquer (EDC) algorithmic framework for the design of linear-scaling simulation algorithms with minimal bandwidth complexity and tight error control. The EDC framework also enables adaptive hierarchical simulation with automated model transitioning assisted by graph-based event tracking. A tunable hierarchical cellular decomposition parallelization framework then maps the O(N) EDC algorithms onto Petaflops computers, while achieving performance tunability through a hierarchy of parameterized cell data/computation structures, as well as its implementation using hybrid Grid remote procedure call + message passing + threads programming. High-end computing platforms such as IBM BlueGene/L, SGI Altix 3000 and the NSF TeraGrid provide an excellent test grounds for the framework. On these platforms, we have achieved unprecedented scales of quantum-mechanically accurate and well validated, chemically reactive atomistic simulations--1.06 billion-atom fast reactive force-field MD and 11.8 million-atom (1.04 trillion grid points) quantum-mechanical MD in the framework of the EDC density functional theory on adaptive multigrids--in addition to 134 billion-atom non-reactive space-time multiresolution MD, with the parallel efficiency as high as 0.998 on 65,536 dual-processor BlueGene/L nodes. We have also achieved an automated execution of hierarchical QM
Parallel 3D Finite Element Particle-in-Cell Simulations with Pic3P
Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; Ben-Zvi, I.; Kewisch, J.; /Brookhaven
2009-06-19
SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic Particle-In-Cell code Pic3P. Designed for simulations of beam-cavity interactions dominated by space charge effects, Pic3P solves the complete set of Maxwell-Lorentz equations self-consistently and includes space-charge, retardation and boundary effects from first principles. Higher-order Finite Element methods with adaptive refinement on conformal unstructured meshes lead to highly efficient use of computational resources. Massively parallel processing with dynamic load balancing enables large-scale modeling of photoinjectors with unprecedented accuracy, aiding the design and operation of next-generation accelerator facilities. Applications include the LCLS RF gun and the BNL polarized SRF gun.
A fast parallel Poisson solver on irregular domains applied to beam dynamics simulations
Adelmann, A. Arbenz, P. Ineichen, Y.
2010-06-20
We discuss the scalable parallel solution of the Poisson equation within a Particle-In-Cell (PIC) code for the simulation of electron beams in particle accelerators of irregular shape. The problem is discretized by Finite Differences. Depending on the treatment of the Dirichlet boundary the resulting system of equations is symmetric or 'mildly' nonsymmetric positive definite. In all cases, the system is solved by the preconditioned conjugate gradient algorithm with smoothed aggregation (SA) based algebraic multigrid (AMG) preconditioning. We investigate variants of the implementation of SA-AMG that lead to considerable improvements in the execution times. We demonstrate good scalability of the solver on distributed memory parallel processor with up to 2048 processors. We also compare our iterative solver with an FFT-based solver that is more commonly used for applications in beam dynamics.
Billion-atom synchronous parallel kinetic Monte Carlo simulations of critical 3D Ising systems
Martinez, E.; Monasterio, P.R.; Marian, J.
2011-02-20
An extension of the synchronous parallel kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the parallel efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.
NASA Astrophysics Data System (ADS)
He, W.; Beyer, C.; Fleckenstein, J. H.; Jang, E.; Kolditz, O.; Naumov, D.; Kalbacher, T.
2015-03-01
This technical paper presents an efficient and performance-oriented method to model reactive mass transport processes in environmental and geotechnical subsurface systems. The open source scientific software packages OpenGeoSys and IPhreeqc have been coupled, to combine their individual strengths and features to simulate thermo-hydro-mechanical-chemical coupled processes in porous and fractured media with simultaneous consideration of aqueous geochemical reactions. Furthermore, a flexible parallelization scheme using MPI (Message Passing Interface) grouping techniques has been implemented, which allows an optimized allocation of computer resources for the node-wise calculation of chemical reactions on the one hand, and the underlying processes such as for groundwater flow or solute transport on the other hand. The coupling interface and parallelization scheme have been tested and verified in terms of precision and performance.
Xyce parallel electronic simulator design : mathematical formulation, version 2.0.
Hoekstra, Robert John; Waters, Lon J.; Hutchinson, Scott Alan; Keiter, Eric Richard; Russo, Thomas V.
2004-06-01
This document is intended to contain a detailed description of the mathematical formulation of Xyce, a massively parallel SPICE-style circuit simulator developed at Sandia National Laboratories. The target audience of this document are people in the role of 'service provider'. An example of such a person would be a linear solver expert who is spending a small fraction of his time developing solver algorithms for Xyce. Such a person probably is not an expert in circuit simulation, and would benefit from an description of the equations solved by Xyce. In this document, modified nodal analysis (MNA) is described in detail, with a number of examples. Issues that are unique to circuit simulation, such as voltage limiting, are also described in detail.
NASA Astrophysics Data System (ADS)
Yeh, Mei-Ling
We have performed a parallel decomposition of the fictitious Lagrangian method for molecular dynamics with tight-binding total energy expression into the hypercube computer. This is the first time in literature that the dynamical simulation of semiconducting systems containing more than 512 silicon atoms has become possible with the electrons treated as quantum particles. With the utilization of the Intel Paragon system, our timing analysis predicts that our code is expected to perform realistic simulations on very large systems consisting of thousands of atoms with time requirements of the order of tens of hours. Timing results and performance analysis of our parallel code are presented in terms of calculation time, communication time, and setup time. The accuracy of the fictitious Lagrangian method in molecular dynamics simulation is also investigated, especially the energy conservation of the total energy of ions. We find that the accuracy of the fictitious Lagrangian scheme in small silicon cluster and very large silicon system simulations is good for as long as the simulations proceed, even though we quench the electronic coordinates to the Born-Oppenheimer surface only in the beginning of the run. The kinetic energy of electrons does not increase as time goes on, and the energy conservation of the ionic subsystem remains very good. This means that, as far as the ionic subsystem is concerned, the electrons are on the average in the true quantum ground states. We also tie up some odds and ends regarding a few remaining questions about the fictitious Lagrangian method, such as the difference between the results obtained from the Gram-Schmidt and SHAKE method of orthonormalization, and differences between simulations where the electrons are quenched to the Born -Oppenheimer surface only once compared with periodic quenching.
NASA Astrophysics Data System (ADS)
Honkonen, I.
2014-07-01
I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring any modification of existing code. This is an advantage for the development and testing of computational modeling software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. Support for parallel programming is also provided by allowing users to select which simulation variables to transfer between processes via a Message Passing Interface library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class presented here requires a C++ compiler that supports variadic templates which were standardized in 2011 (C++11). The code is available at: https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those that do are kindly requested to cite this work.
BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations
Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul
2016-01-01
Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and decay in large 3D domains. It has been parallelized with OpenMP, allowing efficient simulations on desktop workstations or single supercomputer nodes. The code is stable even for large time steps, with linear computational cost scalings. Solutions are first-order accurate in time and second-order accurate in space. The code can be run by itself or as part of a larger simulator. Availability and implementation: BioFVM is written in C ++ with parallelization in OpenMP. It is maintained and available for download at http://BioFVM.MathCancer.org and http://BioFVM.sf.net under the Apache License (v2.0). Contact: paul.macklin@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26656933
Accelerating groundwater flow simulation in MODFLOW using JASMIN-based parallel computing.
Cheng, Tangpei; Mo, Zeyao; Shao, Jingli
2014-01-01
To accelerate the groundwater flow simulation process, this paper reports our work on developing an efficient parallel simulator through rebuilding the well-known software MODFLOW on JASMIN (J Adaptive Structured Meshes applications Infrastructure). The rebuilding process is achieved by designing patch-based data structure and parallel algorithms as well as adding slight modifications to the compute flow and subroutines in MODFLOW. Both the memory requirements and computing efforts are distributed among all processors; and to reduce communication cost, data transfers are batched and conveniently handled by adding ghost nodes to each patch. To further improve performance, constant-head/inactive cells are tagged and neglected during the linear solving process and an efficient load balancing strategy is presented. The accuracy and efficiency are demonstrated through modeling three scenarios: The first application is a field flow problem located at Yanming Lake in China to help design reasonable quantity of groundwater exploitation. Desirable numerical accuracy and significant performance enhancement are obtained. Typically, the tagged program with load balancing strategy running on 40 cores is six times faster than the fastest MICCG-based MODFLOW program. The second test is simulating flow in a highly heterogeneous aquifer. The AMG-based JASMIN program running on 40 cores is nine times faster than the GMG-based MODFLOW program. The third test is a simplified transient flow problem with the order of tens of millions of cells to examine the scalability. Compared to 32 cores, parallel efficiency of 77 and 68% are obtained on 512 and 1024 cores, respectively, which indicates impressive scalability.
Simulation of Unsteady Combustion in a Ramjet Engine Using a Highly Parallel Computer
NASA Technical Reports Server (NTRS)
Menon, Suresh; Weeratunga, Sisira; Cooper, D. M. (Technical Monitor)
1994-01-01
Combustion instability in ramjets is a complex phenomenon that involve nonlinear interaction between acoustic waves, vortex motion and unsteady heat release in the combustor. To numerically simulate this 3-D, transient phenomenon, enormous computer resources (time, memory and disk storage) are required. Although current generation vector supercomputers are capable of providing adequate resources for simulations of this nature, their high cost and limited availability, makes such machines less than satisfactory for routine use. The primary focus of this study is to assess the feasibility of using highly parallel computer systems as a cost-effective alternative for conducting such unsteady flow simulations. Towards this end, a large-eddy simulation model for combustion instability was implemented on the Intel iPSC/860 and a careful study was conducted to determine the benefits and the problems associated with the use of such machines for transient simulations. Details of this study along with the results obtained from the unsteady combustion simulations carried out on the iPSC/860 are discussed in this paper.
NASA Astrophysics Data System (ADS)
Sahni, Onkar; Jansen, Kenneth; Shephard, Mark; Taylor, Charles
2007-11-01
Flow within the healthy human vascular system is typically laminar but diseased conditions can alter the geometry sufficiently to produce transitional/turbulent flows in regions focal (and immediately downstream) of the diseased section. The mean unsteadiness (pulsatile or respiratory cycle) further complicates the situation making traditional turbulence simulation techniques (e.g., Reynolds-averaged Navier-Stokes simulations (RANSS)) suspect. At the other extreme, direct numerical simulation (DNS) while fully appropriate can lead to large computational expense, particularly when the simulations must be done quickly since they are intended to affect the outcome of a medical treatment (e.g., virtual surgical planning). To produce simulations in a clinically relevant time frame requires; 1) adaptive meshing technique that closely matches the desired local mesh resolution in all three directions to the highly anisotropic physical length scales in the flow, 2) efficient solution algorithms, and 3) excellent scaling on massively parallel computers. In this presentation we will demonstrate results for a subject-specific simulation of an abdominal aortic aneurysm using stabilized finite element method on anisotropically adapted meshes consisting of O(10^8) elements over O(10^4) processors.
A general parallelization strategy for random path based geostatistical simulation methods
NASA Astrophysics Data System (ADS)
Mariethoz, Grégoire
2010-07-01
The size of simulation grids used for numerical models has increased by many orders of magnitude in the past years, and this trend is likely to continue. Efficient pixel-based geostatistical simulation algorithms have been developed, but for very large grids and complex spatial models, the computational burden remains heavy. As cluster computers become widely available, using parallel strategies is a natural step for increasing the usable grid size and the complexity of the models. These strategies must profit from of the possibilities offered by machines with a large number of processors. On such machines, the bottleneck is often the communication time between processors. We present a strategy distributing grid nodes among all available processors while minimizing communication and latency times. It consists in centralizing the simulation on a master processor that calls other slave processors as if they were functions simulating one node every time. The key is to decouple the sending and the receiving operations to avoid synchronization. Centralization allows having a conflict management system ensuring that nodes being simulated simultaneously do not interfere in terms of neighborhood. The strategy is computationally efficient and is versatile enough to be applicable to all random path based simulation methods.
Stochastic simulation of charged particle transport on the massively parallel processor
NASA Technical Reports Server (NTRS)
Earl, James A.
1988-01-01
Computations of cosmic-ray transport based upon finite-difference methods are afflicted by instabilities, inaccuracies, and artifacts. To avoid these problems, researchers developed a Monte Carlo formulation which is closely related not only to the finite-difference formulation, but also to the underlying physics of transport phenomena. Implementations of this approach are currently running on the Massively Parallel Processor at Goddard Space Flight Center, whose enormous computing power overcomes the poor statistical accuracy that usually limits the use of stochastic methods. These simulations have progressed to a stage where they provide a useful and realistic picture of solar energetic particle propagation in interplanetary space.
3-D Hybrid Simulation of Quasi-Parallel Bow Shock and Its Effects on the Magnetosphere
Lin, Y.; Wang, X.Y.
2005-08-01
A three-dimensional (3-D) global-scale hybrid simulation is carried out for the structure of the quasi-parallel bow shock, in particular the foreshock waves and pressure pulses. The wave evolution and interaction with the dayside magnetosphere are discussed. It is shown that diamagnetic cavities are generated in the turbulent foreshock due to the ion beam plasma interaction, and these compressional pulses lead to strong surface perturbations at the magnetopause and Alfven waves/field line resonance in the magnetosphere.
Alpha-Helix Formation in C-Peptide Rnase-A Investigated by Parallel Tempering Simulations
NASA Astrophysics Data System (ADS)
Gökoğlu, Gökhan; Çelik, Tarik
We have performed parallel tempering simulations of a 13-residue peptide fragment of ribonuclease-A, c-peptide, in implicit solvent with constant dielectric permittivity. This peptide has a strong tendency to form α-helical conformations in solvent as suggested by circular dichroism (CD) and nuclear magnetic resonance (NMR) experiments. Our results demonstrate that 5th and 8-12 residues are in the α-helical region of the Ramachandran map for global minimum energy state in solvent environment. Effects of salt bridge formation on stability of α-helix structure are discussed.
Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.; ...
2016-09-18
This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
Massive parallel simulation of phenomena in condensed matter at high energy density
NASA Astrophysics Data System (ADS)
Fortov, Vladimir
2005-03-01
This talk deals with computational hydrodynamics, advanced material properties and phenomena at high energy density. New results of massive parallel 3D simulation done with method of individual particles in cells have been obtained. The gas dynamic code includes advanced physical models of matter such as multi-phase equations of state, elastic-plastic, spallation, optic properties and ion-beams stopping. Investigated are the influence on hypervelocity impact processes effects of equation of state, elastic-plastic and spallation. We also report results of numerical modeling of the action of intense heavy ion beams on metallic targets in comparison with new experimental data.
Understanding Performance of Parallel Scientific Simulation Codes using Open|SpeedShop
Ghosh, K K
2011-11-07
Conclusions of this presentation are: (1) Open SpeedShop's (OSS) is convenient to use for large, parallel, scientific simulation codes; (2) Large codes benefit from uninstrumented execution; (3) Many experiments can be run in a short time - might need multiple shots e.g. usertime for caller-callee, hwcsamp for HW counters; (4) Decent idea of code's performance is easily obtained; (5) Statistical sampling calls for decent number of samples; and (6) HWC data is very useful for micro-analysis but can be tricky to analyze.
Parallel lattice Boltzmann simulation of bubble rising and coalescence in viscous flows
NASA Astrophysics Data System (ADS)
Shi, Dongyan; Wang, Zhikai
2015-07-01
A parallel three-dimensional lattice Boltzmann scheme for multicomponent immiscible fluids is proposed to simulate bubble rising and coalescence process in viscous flows. The lattice Boltzmann scheme is based on the free-energy model and is parallelized in the share-memory model by using the OpenMP. Bubble interface is described by a diffusion interface method solving the Cahn-Hilliard equation and both the surface tension force and the buoyancy are introduced in a form of discrete body force. To avoid the numerical instability caused by the interface deformation, the 18 point finite difference scheme is utilized to calculate the first- and second-order space derivative. The correction of the parallel scheme handling three-dimensional interfaces is verified by the Laplace law and the dynamic characteristics of an isolated bubble in stationary flows. Subsequently, effects of the initially relative position, accompanied by the size ratio on bubble-bubble interaction are studied. The results show that the present scheme can effectively describe the bubble interface dynamics, even if rupture and restructure occurs. In addition to the repulsion and coalescence phenomenon due to the relative position, the size ratio also plays an insignificant role in bubble deformation and trajectory.
Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN
Hammond, G E; Lichtner, P C; Mills, R T
2014-01-01
[1] To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted. PMID:25506097
Parallel Simulation of Three-Dimensional Free-Surface Fluid Flow Problems
BAER,THOMAS A.; SUBIA,SAMUEL R.; SACKINGER,PHILIP A.
2000-01-18
We describe parallel simulations of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact lines. The Galerlin finite element method was used to discretize the fully-coupled governing conservation equations and a ''pseudo-solid'' mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of problem unknowns. Issues concerning the proper constraints along the solid-fluid dynamic contact line in three dimensions are discussed. Parallel computations are carried out for an example taken from the coating flow industry, flow in the vicinity of a slot coater edge. This is a three-dimensional free-surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another part of the flow domain. Discussion focuses on parallel speedups for fixed problem size, a class of problems of immediate practical importance.
Simulation of optical devices using parallel finite-difference time-domain method
NASA Astrophysics Data System (ADS)
Li, Kang; Kong, Fanmin; Mei, Liangmo; Liu, Xin
2005-11-01
This paper presents a new parallel finite-difference time-domain (FDTD) numerical method in a low-cost network environment to stimulate optical waveguide characteristics. The PC motherboard based cluster is used, as it is relatively low-cost, reliable and has high computing performance. Four clusters are networked by fast Ethernet technology. Due to the simplicity nature of FDTD algorithm, a native Ethernet packet communication mechanism is used to reduce the overhead of the communication between the adjacent clusters. To validate the method, a microcavity ring resonator based on semiconductor waveguides is chosen as an instance of FDTD parallel computation. Speed-up rate under different division density is calculated. From the result we can conclude that when the decomposing size reaches a certain point, a good parallel computing speed up will be maintained. This simulation shows that through the overlapping of computation and communication method and controlling the decomposing size, the overhead of the communication of the shared data will be conquered. The result indicates that the implementation can achieve significant speed up for the FDTD algorithm. This will enable us to tackle the larger real electromagnetic problem by the low-cost PC clusters.
Pei, Yidong; Pei, Baoqing; Li, Hui; Fan, Yubo
2013-01-01
In view of the shortage of medical equipment road transportation simulation platform, we put forward a road transportation simulation method based on 6-DOF parallel robots. A 3D road spectrum model was built by the improvement of the harmonic superposition method. The simulation model was then compared with the standard model to verify its performance. Taking the road spectrum as the excitation, we could get the robot motion data to control the parallel robot through the S-shaped linear interpolation of the absolute position. It can simulate the movement of vehicles with different speed under various road conditions efficiently and accurately.
A PARALLEL MONTE CARLO CODE FOR SIMULATING COLLISIONAL N-BODY SYSTEMS
Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A.
2013-02-15
We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N {approx} 10{sup 7} particles. Our code is based on the Henon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10{sup 5} to 10{sup 7}. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within {approx}< 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10{sup 5}, 128 for N = 10{sup 6} and 256 for N = 10{sup 7}. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60 Multiplication-Sign , 100 Multiplication-Sign , and 220 Multiplication-Sign , respectively.
A Parallel Monte Carlo Code for Simulating Collisional N-body Systems
NASA Astrophysics Data System (ADS)
Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A.
2013-02-01
We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ~ 107 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 105 to 107. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within <~ 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 105, 128 for N = 106 and 256 for N = 107. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60×, 100×, and 220×, respectively.
Parallel computing simulation of electrical excitation and conduction in the 3D human heart.
Di Yu; Dongping Du; Hui Yang; Yicheng Tu
2014-01-01
A correctly beating heart is important to ensure adequate circulation of blood throughout the body. Normal heart rhythm is produced by the orchestrated conduction of electrical signals throughout the heart. Cardiac electrical activity is the resulted function of a series of complex biochemical-mechanical reactions, which involves transportation and bio-distribution of ionic flows through a variety of biological ion channels. Cardiac arrhythmias are caused by the direct alteration of ion channel activity that results in changes in the AP waveform. In this work, we developed a whole-heart simulation model with the use of massive parallel computing with GPGPU and OpenGL. The simulation algorithm was implemented under several different versions for the purpose of comparisons, including one conventional CPU version and two GPU versions based on Nvidia CUDA platform. OpenGL was utilized for the visualization / interaction platform because it is open source, light weight and universally supported by various operating systems. The experimental results show that the GPU-based simulation outperforms the conventional CPU-based approach and significantly improves the speed of simulation. By adopting modern computer architecture, this present investigation enables real-time simulation and visualization of electrical excitation and conduction in the large and complicated 3D geometry of a real-world human heart.
Validating the simulation of large-scale parallel applications using statistical characteristics
Zhang, Deli; Wilke, Jeremiah; Hendry, Gilbert; ...
2016-03-01
Simulation is a widely adopted method to analyze and predict the performance of large-scale parallel applications. Validating the hardware model is highly important for complex simulations with a large number of parameters. Common practice involves calculating the percent error between the projected and the real execution time of a benchmark program. However, in a high-dimensional parameter space, this coarse-grained approach often suffers from parameter insensitivity, which may not be known a priori. Moreover, the traditional approach cannot be applied to the validation of software models, such as application skeletons used in online simulations. In this work, we present a methodologymore » and a toolset for validating both hardware and software models by quantitatively comparing fine-grained statistical characteristics obtained from execution traces. Although statistical information has been used in tasks like performance optimization, this is the first attempt to apply it to simulation validation. Lastly, our experimental results show that the proposed evaluation approach offers significant improvement in fidelity when compared to evaluation using total execution time, and the proposed metrics serve as reliable criteria that progress toward automating the simulation tuning process.« less
Validating the simulation of large-scale parallel applications using statistical characteristics
Zhang, Deli; Wilke, Jeremiah; Hendry, Gilbert; Dechev, Damian
2016-03-01
Simulation is a widely adopted method to analyze and predict the performance of large-scale parallel applications. Validating the hardware model is highly important for complex simulations with a large number of parameters. Common practice involves calculating the percent error between the projected and the real execution time of a benchmark program. However, in a high-dimensional parameter space, this coarse-grained approach often suffers from parameter insensitivity, which may not be known a priori. Moreover, the traditional approach cannot be applied to the validation of software models, such as application skeletons used in online simulations. In this work, we present a methodology and a toolset for validating both hardware and software models by quantitatively comparing fine-grained statistical characteristics obtained from execution traces. Although statistical information has been used in tasks like performance optimization, this is the first attempt to apply it to simulation validation. Lastly, our experimental results show that the proposed evaluation approach offers significant improvement in fidelity when compared to evaluation using total execution time, and the proposed metrics serve as reliable criteria that progress toward automating the simulation tuning process.
Pesce, Lorenzo L.; Lee, Hyong C.; Hereld, Mark; ...
2013-01-01
Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determinedmore » the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.« less
Simulating massively parallel electron beam inspection for sub-20 nm defects
NASA Astrophysics Data System (ADS)
Bunday, Benjamin D.; Mukhtar, Maseeh; Quoi, Kathy; Thiel, Brad; Malloy, Matt
2015-03-01
SEMATECH has initiated a program to develop massively-parallel electron beam defect inspection (MPEBI). Here we use JMONSEL simulations to generate expected imaging responses of chosen test cases of patterns and defects with ability to vary parameters for beam energy, spot size, pixel size, and/or defect material and form factor. The patterns are representative of the design rules for an aggressively-scaled FinFET-type design. With these simulated images and resulting shot noise, a signal-to-noise framework is developed, which relates to defect detection probabilities. Additionally, with this infrastructure the effect of detection chain noise and frequency dependent system response can be made, allowing for targeting of best recipe parameters for MPEBI validation experiments, ultimately leading to insights into how such parameters will impact MPEBI tool design, including necessary doses for defect detection and estimations of scanning speeds for achieving high throughput for HVM.
Deiterding, Ralf; Wood, Stephen L
2013-01-01
We pursue a level set approach to couple an Eulerian shock-capturing fluid solver with space-time refinement to an explicit solid dynamics solver for large deformations and fracture. The coupling algorithms considering recursively finer fluid time steps as well as overlapping solver updates are discussed in detail. Our ideas are implemented in the AMROC adaptive fluid solver framework and are used for effective fluid-structure coupling to the general purpose solid dynamics code DYNA3D. Beside simulations verifying the coupled fluid-structure solver and assessing its parallel scalability, the detailed structural analysis of a reinforced concrete column under blast loading and the simulation of a prototypical blast explosion in a realistic multistory building are presented.
Simulation/Emulation Techniques: Compressing Schedules With Parallel (HW/SW) Development
NASA Technical Reports Server (NTRS)
Mangieri, Mark L.; Hoang, June
2014-01-01
NASA has always been in the business of balancing new technologies and techniques to achieve human space travel objectives. NASA's Kedalion engineering analysis lab has been validating and using many contemporary avionics HW/SW development and integration techniques, which represent new paradigms to NASA's heritage culture. Kedalion has validated many of the Orion HW/SW engineering techniques borrowed from the adjacent commercial aircraft avionics solution space, inserting new techniques and skills into the Multi - Purpose Crew Vehicle (MPCV) Orion program. Using contemporary agile techniques, Commercial-off-the-shelf (COTS) products, early rapid prototyping, in-house expertise and tools, and extensive use of simulators and emulators, NASA has achieved cost effective paradigms that are currently serving the Orion program effectively. Elements of long lead custom hardware on the Orion program have necessitated early use of simulators and emulators in advance of deliverable hardware to achieve parallel design and development on a compressed schedule.
NASA Astrophysics Data System (ADS)
van der Kaap, N. J.; Koster, L. J. A.
2016-02-01
A parallel, lattice based Kinetic Monte Carlo simulation is developed that runs on a GPGPU board and includes Coulomb like particle-particle interactions. The performance of this computationally expensive problem is improved by modifying the interaction potential due to nearby particle moves, instead of fully recalculating it. This modification is achieved by adding dipole correction terms that represent the particle move. Exact evaluation of these terms is guaranteed by representing all interactions as 32-bit floating numbers, where only the integers between -222 and 222 are used. We validate our method by modelling the charge transport in disordered organic semiconductors, including Coulomb interactions between charges. Performance is mainly governed by the particle density in the simulation volume, and improves for increasing densities. Our method allows calculations on large volumes including particle-particle interactions, which is important in the field of organic semiconductors.
MPI parallelization of full PIC simulation code with Adaptive Mesh Refinement
NASA Astrophysics Data System (ADS)
Matsui, Tatsuki; Nunami, Masanori; Usui, Hideyuki; Moritaka, Toseo
2010-11-01
A new parallelization technique developed for PIC method with adaptive mesh refinement (AMR) is introduced. In AMR technique, the complicated cell arrangements are organized and managed as interconnected pointers with multiple resolution levels, forming a fully threaded tree structure as a whole. In order to retain this tree structure distributed over multiple processes, remote memory access, an extended feature of MPI2 standards, is employed. Another important feature of the present simulation technique is the domain decomposition according to the modified Morton ordering. This algorithm can group up the equal number of particle calculation loops, which allows for the better load balance. Using this advanced simulation code, preliminary results for basic physical problems are exhibited for the validity check, together with the benchmarks to test the performance and the scalability.
A Many-Task Parallel Approach for Multiscale Simulations of Subsurface Flow and Reactive Transport
Scheibe, Timothy D.; Yang, Xiaofan; Schuchardt, Karen L.; Agarwal, Khushbu; Chase, Jared M.; Palmer, Bruce J.; Tartakovsky, Alexandre M.
2014-12-16
Continuum-scale models have long been used to study subsurface flow, transport, and reactions but lack the ability to resolve processes that are governed by pore-scale mixing. Recently, pore-scale models, which explicitly resolve individual pores and soil grains, have been developed to more accurately model pore-scale phenomena, particularly reaction processes that are controlled by local mixing. However, pore-scale models are prohibitively expensive for modeling application-scale domains. This motivates the use of a hybrid multiscale approach in which continuum- and pore-scale codes are coupled either hierarchically or concurrently within an overall simulation domain (time and space). This approach is naturally suited to an adaptive, loosely-coupled many-task methodology with three potential levels of concurrency. Each individual code (pore- and continuum-scale) can be implemented in parallel; multiple semi-independent instances of the pore-scale code are required at each time step providing a second level of concurrency; and Monte Carlo simulations of the overall system to represent uncertainty in material property distributions provide a third level of concurrency. We have developed a hybrid multiscale model of a mixing-controlled reaction in a porous medium wherein the reaction occurs only over a limited portion of the domain. Loose, minimally-invasive coupling of pre-existing parallel continuum- and pore-scale codes has been accomplished by an adaptive script-based workflow implemented in the Swift workflow system. We describe here the methods used to create the model system, adaptively control multiple coupled instances of pore- and continuum-scale simulations, and maximize the scalability of the overall system. We present results of numerical experiments conducted on NERSC supercomputing systems; our results demonstrate that loose many-task coupling provides a scalable solution for multiscale subsurface simulations with minimal overhead.
Zonal methods for the parallel execution of range-limited N-body simulations
Bowers, Kevin J.; Dror, Ron O.; Shaw, David E. . E-mail: david@deshaw.com
2007-01-20
Particle simulations in fields ranging from biochemistry to astrophysics require the evaluation of interactions between all pairs of particles separated by less than some fixed interaction radius. The applicability of such simulations is often limited by the time required for calculation, but the use of massive parallelism to accelerate these computations is typically limited by inter-processor communication requirements. Recently, Snir [M. Snir, A note on N-body computations with cutoffs, Theor. Comput. Syst. 37 (2004) 295-318] and Shaw [D.E. Shaw, A fast, scalable method for the parallel evaluation of distance-limited pairwise particle interactions, J. Comput. Chem. 26 (2005) 1318-1328] independently introduced two distinct methods that offer asymptotic reductions in the amount of data transferred between processors. In the present paper, we show that these schemes represent special cases of a more general class of methods, and introduce several new algorithms in this class that offer practical advantages over all previously described methods for a wide range of problem parameters. We also show that several of these algorithms approach an approximate lower bound on inter-processor data transfer.
SDA 7: A modular and parallel implementation of the simulation of diffusional association software
Martinez, Michael; Romanowska, Julia; Kokh, Daria B.; Ozboyaci, Musa; Yu, Xiaofeng; Öztürk, Mehmet Ali; Richter, Stefan
2015-01-01
The simulation of diffusional association (SDA) Brownian dynamics software package has been widely used in the study of biomacromolecular association. Initially developed to calculate bimolecular protein–protein association rate constants, it has since been extended to study electron transfer rates, to predict the structures of biomacromolecular complexes, to investigate the adsorption of proteins to inorganic surfaces, and to simulate the dynamics of large systems containing many biomacromolecular solutes, allowing the study of concentration‐dependent effects. These extensions have led to a number of divergent versions of the software. In this article, we report the development of the latest version of the software (SDA 7). This release was developed to consolidate the existing codes into a single framework, while improving the parallelization of the code to better exploit modern multicore shared memory computer architectures. It is built using a modular object‐oriented programming scheme, to allow for easy maintenance and extension of the software, and includes new features, such as adding flexible solute representations. We discuss a number of application examples, which describe some of the methods available in the release, and provide benchmarking data to demonstrate the parallel performance. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc. PMID:26123630
Parallel simulation of HGMS of weakly magnetic nanoparticles in irrotational flow of inviscid fluid.
Hournkumnuard, Kanok; Dolwithayakul, Banpot; Chantrapornchai, Chantana
2014-01-01
The process of high gradient magnetic separation (HGMS) using a microferromagnetic wire for capturing weakly magnetic nanoparticles in the irrotational flow of inviscid fluid is simulated by using parallel algorithm developed based on openMP. The two-dimensional problem of particle transport under the influences of magnetic force and fluid flow is considered in an annular domain surrounding the wire with inner radius equal to that of the wire and outer radius equal to various multiples of wire radius. The differential equations governing particle transport are solved numerically as an initial and boundary values problem by using the finite-difference method. Concentration distribution of the particles around the wire is investigated and compared with some previously reported results and shows the good agreement between them. The results show the feasibility of accumulating weakly magnetic nanoparticles in specific regions on the wire surface which is useful for applications in biomedical and environmental works. The speedup of parallel simulation ranges from 1.8 to 21 depending on the number of threads and the domain problem size as well as the number of iterations. With the nature of computing in the application and current multicore technology, it is observed that 4-8 threads are sufficient to obtain the optimized speedup.
A scalable parallel Stokesian Dynamics method for the simulation of colloidal suspensions
NASA Astrophysics Data System (ADS)
Bülow, F.; Hamberger, P.; Nirschl, H.; Dörfler, W.
2016-07-01
We have developed a new method for the efficient numerical simulation of colloidal suspensions. This method is designed and especially well-suited for parallel code execution, but it can also be applied to single-core programs. It combines the Stokesian Dynamics method with a variant of the widely used Barnes-Hut algorithm in order to reduce computational costs. This combination and the inherent parallelization of the method make simulations of large numbers of particles within days possible. The level of accuracy can be determined by the user and is limited by the truncation of the used multipole expansion. Compared to the original Stokesian Dynamics method the complexity can be reduced from O(N2) to linear complexity for dilute suspensions of strongly clustered particles, N being the number of particles. In case of non-clustered particles in a dense suspension, the complexity depends on the particle configuration and is between O(N) and O(Pnp,max2) , where P is the number of used processes and np,max = ⌈ N / P ⌉ , respectively.
Ion equation of state in quasi-parallel shocks - A simulation result
NASA Technical Reports Server (NTRS)
Mandt, M. E.; Kan, J. R.
1988-01-01
Ion equation of state in the quasi-parallel collisionless shock is deduced from simulation results. The simulations were performed for theta(bn) = 10 deg, beta = 0.5 and M sub A in the range from 1.2 to 8, where M sub A is the Alfven Mach number, beta is the upstream ratio of plasma pressure to magnetic pressure, and theta(bn) is the angle between the shock normal and the upstream magnetic field. The equation of state can be approximated by a power law with different exponents in the upstream and downstream sides of the shock transition region. The exponent in the upstream side of the transition region is much greater than the adiabatic value of 5/3 and increases with M sub A. The exponent in the downstream side of the transition region is slightly less than 5/3. The results show that ion heating in the quasi-parallel shock is highly nonadiabatic with a large increase in entropy and in temperature ratio in the upstream side of the transition region, while the heating is highly isentropic with a large increase in temperature difference across the principal density jump in the downstream side of the transition region.
Parallel Simulation of HGMS of Weakly Magnetic Nanoparticles in Irrotational Flow of Inviscid Fluid
Hournkumnuard, Kanok
2014-01-01
The process of high gradient magnetic separation (HGMS) using a microferromagnetic wire for capturing weakly magnetic nanoparticles in the irrotational flow of inviscid fluid is simulated by using parallel algorithm developed based on openMP. The two-dimensional problem of particle transport under the influences of magnetic force and fluid flow is considered in an annular domain surrounding the wire with inner radius equal to that of the wire and outer radius equal to various multiples of wire radius. The differential equations governing particle transport are solved numerically as an initial and boundary values problem by using the finite-difference method. Concentration distribution of the particles around the wire is investigated and compared with some previously reported results and shows the good agreement between them. The results show the feasibility of accumulating weakly magnetic nanoparticles in specific regions on the wire surface which is useful for applications in biomedical and environmental works. The speedup of parallel simulation ranges from 1.8 to 21 depending on the number of threads and the domain problem size as well as the number of iterations. With the nature of computing in the application and current multicore technology, it is observed that 4–8 threads are sufficient to obtain the optimized speedup. PMID:24955411
Optimized simulations of Olami-Feder-Christensen systems using parallel algorithms
NASA Astrophysics Data System (ADS)
Dominguez, Rachele; Necaise, Rance; Montag, Eric
The sequential nature of the Olami-Feder-Christensen (OFC) model for earthquake simulations limits the benefits of parallel computing approaches because of the frequent communication required between processors. We developed a parallel version of the OFC algorithm for multi-core processors. Our data, even for relatively small system sizes and low numbers of processors, indicates that increasing the number of processors provides significantly faster simulations; producing more efficient results than previous attempts that used network-based Beowulf clusters. Our algorithm optimizes performance by exploiting the multi-core processor architecture, minimizing communication time in contrast to the networked Beowulf-cluster approaches. Our multi-core algorithm is the basis for a new algorithm using GPUs that will drastically increase the number of processors available. Previous studies incorporating realistic structural features of faults into OFC models have revealed spatial and temporal patterns observed in real earthquake systems. The computational advances presented here will allow for studying interacting networks of faults, rather than individual faults, further enhancing our understanding of the relationship between the earth's structure and the triggering process. Support for this project comes from the Chenery Research Fund, the Rashkind Family Endowment, the Walter Williams Craigie Teaching Endowment, and the Schapiro Undergraduate Research Fellowship.
pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.
Halic, Tansel; Ahn, Woojin; De, Suvranu
2014-01-01
This work presents a pWeb - a new language and compiler for parallelization of client-side compute intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. Low performance of the web browser, however, remains the bottleneck of computationally intensive applications including visualization of complex scenes, real time physical simulations and image processing compared to native ones. The new proposed language is built upon web workers for multithreaded programming in HTML5. The language provides fundamental functionalities of parallel programming languages as well as the fork/join parallel model which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions.
Parallel grid library with adaptive mesh refinement for development of highly scalable simulations
NASA Astrophysics Data System (ADS)
Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.
2012-04-01
As the single CPU core performance is saturating while the number of cores in the fastest supercomputers increases exponentially, the parallel performance of simulations on distributed memory machines is crucial. At the same time, utilizing efficiently the large number of available cores presents a challenge, especially in simulations with run-time adaptive mesh refinement. We have developed a generic grid library (dccrg) aimed at finite volume simulations that is easy to use and scales well up to tens of thousands of cores. The grid has several attractive features: It 1) allows an arbitrary C++ class or structure to be used as cell data; 2) provides a simple interface for adaptive mesh refinement during a simulation; 3) encapsulates the details of MPI communication when updating the data of neighboring cells between processes; and 4) provides a simple interface to run-time load balancing, e.g. domain decomposition, through the Zoltan library. Dccrg is freely available for anyone to use, study and modify under the GNU Lesser General Public License v3. We will present the implementation of dccrg, simple and advanced usage examples and scalability results on various supercomputers and problems.
Simulation verification of SNR and parallel imaging improvements by ICE-decoupled loop array in MRI.
Yan, Xinqiang; Cao, Zhipeng; Zhang, Xiaoliang
2016-04-01
Transmit/receive L/C loop arrays with the induced current elimination (ICE) or magnetic wall decoupling method has shown high signal-to-noise (SNR) and excellent parallel imaging ability for MR imaging at ultrahigh fields, e.g., 7 T. In this study, we aim to numerically analyze the performance of an eight-channel ICE-decoupled loop array at 7 T. Three dimensional (3-D) electromagnetic (EM) and radiofrequency (RF) circuit co-simulation approach was employed. The values of all capacitors were obtained by optimizing the S-parameters of all coil elements. The EM simulation accurately modeled the coil structure, the phantom and the excitation. All coil elements were well matched to 50 ohm and the isolation between any two coil elements was better -15 dB. The simulated S parameters were exactly similar with the experimental results, indicating the simulation results were reliable. Compared with the conventional capacitively decoupled array, the ICE-decoupled array had higher sensitivity at the peripheral areas of the image subjects due to the shielding effect of the decoupling loops. The increased receive sensitivity resulted in an improvement of signal intensity and SNR for the ICE-decoupled array.
NASA Technical Reports Server (NTRS)
Greenberg, Albert G.; Lubachevsky, Boris D.; Nicol, David M.; Wright, Paul E.
1994-01-01
Fast, efficient parallel algorithms are presented for discrete event simulations of dynamic channel assignment schemes for wireless cellular communication networks. The driving events are call arrivals and departures, in continuous time, to cells geographically distributed across the service area. A dynamic channel assignment scheme decides which call arrivals to accept, and which channels to allocate to the accepted calls, attempting to minimize call blocking while ensuring co-channel interference is tolerably low. Specifically, the scheme ensures that the same channel is used concurrently at different cells only if the pairwise distances between those cells are sufficiently large. Much of the complexity of the system comes from ensuring this separation. The network is modeled as a system of interacting continuous time automata, each corresponding to a cell. To simulate the model, conservative methods are used; i.e., methods in which no errors occur in the course of the simulation and so no rollback or relaxation is needed. Implemented on a 16K processor MasPar MP-1, an elegant and simple technique provides speedups of about 15 times over an optimized serial simulation running on a high speed workstation. A drawback of this technique, typical of conservative methods, is that processor utilization is rather low. To overcome this, new methods were developed that exploit slackness in event dependencies over short intervals of time, thereby raising the utilization to above 50 percent and the speedup over the optimized serial code to about 120 times.
Levin, A.E. ); Montgomery, B.H. )
1990-01-01
The Thermal-Hydraulic Out of Reactor Safety (THORS) Program at Oak Ridge National Laboratory (ORNL) had as its objective the testing of simulated, electrically heated liquid metal reactor (LMR) fuel assemblies in an engineering-scale, sodium loop. Between 1971 and 1985, the THORS Program operated 11 simulated fuel bundles in conditions covering a wide range of normal and off-normal conditions. The last test series in the Program, THORS-SHRS Assembly 1, employed two parallel, 19-pin, full-length, simulated fuel assemblies of a design consistent with the large LMR (Large Scale Prototype Breeder -- LSPB) under development at that time. These bundles were installed in the THORS Facility, allowing single- and parallel-bundle testing in thermal-hydraulic conditions up to and including sodium boiling and dryout. As the name SHRS (Shutdown Heat Removal System) implies, a major objective of the program was testing under conditions expected during low-power reactor operation, including low-flow forced convection, natural convection, and forced-to-natural convection transition at various powers. The THORS-SHRS Assembly 1 experimental program was divided up into four phases. Phase 1 included preliminary and shakedown tests, including the collection of baseline steady-state thermal-hydraulic data. Phase 2 comprised natural convection testing. Forced convection testing was conducted in Phase 3. The final phase of testing included forced-to-natural convection transition tests. Phases 1, 2, and 3 have been discussed in previous papers. The fourth phase is described in this paper. 3 refs., 2 figs.
Jung, Jaewoon; Naurse, Akira; Kobayashi, Chigusa; Sugita, Yuji
2016-10-11
The graphics processing unit (GPU) has become a popular computational platform for molecular dynamics (MD) simulations of biomolecules. A significant speedup in the simulations of small- or medium-size systems using only a few computer nodes with a single or multiple GPUs has been reported. Because of GPU memory limitation and slow communication between GPUs on different computer nodes, it is not straightforward to accelerate MD simulations of large biological systems that contain a few million or more atoms on massively parallel supercomputers with GPUs. In this study, we develop a new scheme in our MD software, GENESIS, to reduce the total computational time on such computers. Computationally intensive real-space nonbonded interactions are computed mainly on GPUs in the scheme, while less intensive bonded interactions and communication-intensive reciprocal-space interactions are performed on CPUs. On the basis of the midpoint cell method as a domain decomposition scheme, we invent the single particle interaction list for reducing the GPU memory usage. Since total computational time is limited by the reciprocal-space computation, we utilize the RESPA multiple time-step integration and reduce the CPU resting time by assigning a subset of nonbonded interactions on CPUs as well as on GPUs when the reciprocal-space computation is skipped. We validated our GPU implementations in GENESIS on BPTI and a membrane protein, porin, by MD simulations and an alanine-tripeptide by REMD simulations. Benchmark calculations on TSUBAME supercomputer showed that an MD simulation of a million atoms system was scalable up to 256 computer nodes with GPUs.
Trinitis, C; Bader, M; Schulz, M
2009-06-09
In today's world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Significant progress in CPU architecture (multi- and many-core CPUs, SMT, transactional memory, virtualization support, shared caches etc.) system scalability, and interconnect technology, continues to provide new opportunities, as well as new challenges for both system architects and software developers. These trends are paralleled by progress in algorithms, simulation techniques, and software integration from multiple disciplines. In its 8th year, ParSim continues to build a bridge between application disciplines and computer science and to help fostering closer cooperations between these fields. Since its successful introduction in 2002, ParSim has established itself as an integral part of the EuroPVM/MPI conference series. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a short turn-around time. We believe that this offers a unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in parallel computation, serves as an ideal surrounding for ParSim. This combination enables participants to present and discuss their work within the scope of both the session and the host conference. This year, five papers from authors in five countries were submitted to Par-Sim, and we selected three of them. They cover a range of different application fields including mechanical engineering, material science, and structural engineering simulations. We are confident that this resulted in an attractive special session and that this will be an informal setting for lively discussions as well as for fostering new collaborations. Several people contributed to this event. Thanks go to Jack Dongarra, the EuroPVM/MPI general chair, and to Jan Westerholm, Juha
Accelerating Dust Storm Simulation by Balancing Task Allocation in Parallel Computing Environment
NASA Astrophysics Data System (ADS)
Gui, Z.; Yang, C.; XIA, J.; Huang, Q.; YU, M.
2013-12-01
Dust storm has serious negative impacts on environment, human health, and assets. The continuing global climate change has increased the frequency and intensity of dust storm in the past decades. To better understand and predict the distribution, intensity and structure of dust storm, a series of dust storm models have been developed, such as Dust Regional Atmospheric Model (DREAM), the NMM meteorological module (NMM-dust) and Chinese Unified Atmospheric Chemistry Environment for Dust (CUACE/Dust). The developments and applications of these models have contributed significantly to both scientific research and our daily life. However, dust storm simulation is a data and computing intensive process. Normally, a simulation for a single dust storm event may take several days or hours to run. It seriously impacts the timeliness of prediction and potential applications. To speed up the process, high performance computing is widely adopted. By partitioning a large study area into small subdomains according to their geographic location and executing them on different computing nodes in a parallel fashion, the computing performance can be significantly improved. Since spatiotemporal correlations exist in the geophysical process of dust storm simulation, each subdomain allocated to a node need to communicate with other geographically adjacent subdomains to exchange data. Inappropriate allocations may introduce imbalance task loads and unnecessary communications among computing nodes. Therefore, task allocation method is the key factor, which may impact the feasibility of the paralleling. The allocation algorithm needs to carefully leverage the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire system. This presentation introduces two algorithms for such allocation and compares them with evenly distributed allocation method. Specifically, 1) In order to get optimized solutions, a
Weaver, R. P.; Gittings, M. L.
2004-01-01
The Los Alamos Crestone Project is part of the Department of Energy's (DOE) Accelerated Strategic Computing Initiative, or ASCI Program. The main goal of this software development project is to investigate the use of continuous adaptive mesh refinement (CAMR) techniques for application to problems of interest to the Laboratory. There are many code development efforts in the Crestone Project, both unclassified and classified codes. In this overview I will discuss the unclassified SAGE and the RAGE codes. The SAGE (SAIC adaptive grid Eulerian) code is a one-, two-, and three-dimensional multimaterial Eulerian massively parallel hydrodynamics code for use in solving a variety of high-deformation flow problems. The RAGE CAMR code is built from the SAGE code by adding various radiation packages, improved setup utilities and graphics packages and is used for problems in which radiation transport of energy is important. The goal of these massively-parallel versions of the codes is to run extremely large problems in a reasonable amount of calendar time. Our target is scalable performance to {approx}10,000 processors on a 1 billion CAMR computational cell problem that requires hundreds of variables per cell, multiple physics packages (e.g. radiation and hydrodynamics), and implicit matrix solves for each cycle. A general description of the RAGE code has been published in [l],[ 2], [3] and [4]. Currently, the largest simulations we do are three-dimensional, using around 500 million computation cells and running for literally months of calendar time using {approx}2000 processors. Current ASCI platforms range from several 3-teraOPS supercomputers to one 12-teraOPS machine at Lawrence Livermore National Laboratory, the White machine, and one 20-teraOPS machine installed at Los Alamos, the Q machine. Each machine is a system comprised of many component parts that must perform in unity for the successful run of these simulations. Key features of any massively parallel system
NASA Astrophysics Data System (ADS)
Zhou, Jun
The 1994 Northridge earthquake in Los Angeles, California, killed 57 people, injured over 8,700 and caused an estimated $20 billion in damage. Petascale simulations are needed in California and elsewhere to provide society with a better understanding of the rupture and wave dynamics of the largest earthquakes at shaking frequencies required to engineer safe structures. As the heterogeneous supercomputing infrastructures are becoming more common, numerical developments in earthquake system research are particularly challenged by the dependence on the accelerator elements to enable "the Big One" simulations with higher frequency and finer resolution. Reducing time to solution and power consumption are two primary focus area today for the enabling technology of fault rupture dynamics and seismic wave propagation in realistic 3D models of the crust's heterogeneous structure. This dissertation presents scalable parallel programming techniques for high performance seismic simulation running on petascale heterogeneous supercomputers. A real world earthquake simulation code, AWP-ODC, one of the most advanced earthquake codes to date, was chosen as the base code in this research, and the testbed is based on Titan at Oak Ridge National Laboraratory, the world's largest hetergeneous supercomputer. The research work is primarily related to architecture study, computation performance tuning and software system scalability. An earthquake simulation workflow has also been developed to support the efficient production sets of simulations. The highlights of the technical development are an aggressive performance optimization focusing on data locality and a notable data communication model that hides the data communication latency. This development results in the optimal computation efficiency and throughput for the 13-point stencil code on heterogeneous systems, which can be extended to general high-order stencil codes. Started from scratch, the hybrid CPU/GPU version of AWP
NASA Astrophysics Data System (ADS)
Hobson, T.; Clarkson, V.
2012-09-01
As a result of continual space activity since the 1950s, there are now a large number of man-made Resident Space Objects (RSOs) orbiting the Earth. Because of the large number of items and their relative speeds, the possibility of destructive collisions involving important space assets is now of significant concern to users and operators of space-borne technologies. As a result, a growing number of international agencies are researching methods for improving techniques to maintain Space Situational Awareness (SSA). Computer simulation is a method commonly used by many countries to validate competing methodologies prior to full scale adoption. The use of supercomputing and/or reduced scale testing is often necessary to effectively simulate such a complex problem on todays computers. Recently the authors presented a simulation aimed at reducing the computational burden by selecting the minimum level of fidelity necessary for contrasting methodologies and by utilising multi-core CPU parallelism for increased computational efficiency. The resulting simulation runs on a single PC while maintaining the ability to effectively evaluate competing methodologies. Nonetheless, the ability to control the scale and expand upon the computational demands of the sensor management system is limited. In this paper, we examine the advantages of increasing the parallelism of the simulation by means of General Purpose computing on Graphics Processing Units (GPGPU). As many sub-processes pertaining to SSA management are independent, we demonstrate how parallelisation via GPGPU has the potential to significantly enhance not only research into techniques for maintaining SSA, but also to enhance the level of sophistication of existing space surveillance sensors and sensor management systems. Nonetheless, the use of GPGPU imposes certain limitations and adds to the implementation complexity, both of which require consideration to achieve an effective system. We discuss these challenges and
NASA Astrophysics Data System (ADS)
Romano, Paul Kollath
Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O( N ) whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes---in doing so it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed simulations. The analysis demonstrated that load imbalances in domain decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency. The model predictions were verified with
A package of Linux scripts for the parallelization of Monte Carlo simulations
NASA Astrophysics Data System (ADS)
Badal, Andreu; Sempau, Josep
2006-09-01
Despite the fact that fast computers are nowadays available at low cost, there are many situations where obtaining a reasonably low statistical uncertainty in a Monte Carlo (MC) simulation involves a prohibitively large amount of time. This limitation can be overcome by having recourse to parallel computing. Most tools designed to facilitate this approach require modification of the source code and the installation of additional software, which may be inconvenient for some users. We present a set of tools, named clonEasy, that implement a parallelization scheme of a MC simulation that is free from these drawbacks. In clonEasy, which is designed to run under Linux, a set of "clone" CPUs is governed by a "master" computer by taking advantage of the capabilities of the Secure Shell (ssh) protocol. Any Linux computer on the Internet that can be ssh-accessed by the user can be used as a clone. A key ingredient for the parallel calculation to be reliable is the availability of an independent string of random numbers for each CPU. Many generators—such as RANLUX, RANECU or the Mersenne Twister—can readily produce these strings by initializing them appropriately and, hence, they are suitable to be used with clonEasy. This work was primarily motivated by the need to find a straightforward way to parallelize PENELOPE, a code for MC simulation of radiation transport that (in its current 2005 version) employs the generator RANECU, which uses a combination of two multiplicative linear congruential generators (MLCGs). Thus, this paper is focused on this class of generators and, in particular, we briefly present an extension of RANECU that increases its period up to ˜5×10 and we introduce seedsMLCG, a tool that provides the information necessary to initialize disjoint sequences of an MLCG to feed different CPUs. This program, in combination with clonEasy, allows to run PENELOPE in parallel easily, without requiring specific libraries or significant alterations of the
MPI parallelization of Vlasov codes for the simulation of nonlinear laser-plasma interactions
NASA Astrophysics Data System (ADS)
Savchenko, V.; Won, K.; Afeyan, B.; Decyk, V.; Albrecht-Marc, M.; Ghizzo, A.; Bertrand, P.
2003-10-01
The simulation of optical mixing driven KEEN waves [1] and electron plasma waves [1] in laser-produced plasmas require nonlinear kinetic models and massive parallelization. We use Massage Passing Interface (MPI) libraries and Appleseed [2] to solve the Vlasov Poisson system of equations on an 8 node dual processor MAC G4 cluster. We use the semi-Lagrangian time splitting method [3]. It requires only row-column exchanges in the global data redistribution, minimizing the total number of communications between processors. Recurrent communication patterns for 2D FFTs involves global transposition. In the Vlasov-Maxwell case, we use splitting into two 1D spatial advections and a 2D momentum advection [4]. Discretized momentum advection equations have a double loop structure with the outer index being assigned to different processors. We adhere to a code structure with separate routines for calculations and data management for parallel computations. [1] B. Afeyan et al., IFSA 2003 Conference Proceedings, Monterey, CA [2] V. K. Decyk, Computers in Physics, 7, 418 (1993) [3] Sonnendrucker et al., JCP 149, 201 (1998) [4] Begue et al., JCP 151, 458 (1999)
GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations
NASA Astrophysics Data System (ADS)
Nguyen, Trung Dac
2017-03-01
The Tersoff potential is one of the empirical many-body potentials that has been widely used in simulation studies at atomic scales. Unlike pair-wise potentials, the Tersoff potential involves three-body terms, which require much more arithmetic operations and data dependency. In this contribution, we have implemented the GPU-accelerated version of several variants of the Tersoff potential for LAMMPS, an open-source massively parallel Molecular Dynamics code. Compared to the existing MPI implementation in LAMMPS, the GPU implementation exhibits a better scalability and offers a speedup of 2.2X when run on 1000 compute nodes on the Titan supercomputer. On a single node, the speedup ranges from 2.0 to 8.0 times, depending on the number of atoms per GPU and hardware configurations. The most notable features of our GPU-accelerated version include its design for MPI/accelerator heterogeneous parallelism, its compatibility with other functionalities in LAMMPS, its ability to give deterministic results and to support both NVIDIA CUDA- and OpenCL-enabled accelerators. Our implementation is now part of the GPU package in LAMMPS and accessible for public use.
Matsuda, K.; Terada, N.; Katoh, Y.; Misawa, H.
2011-08-15
There has been a great concern about the origin of the parallel electric field in the frame of fluid equations in the auroral acceleration region. This paper proposes a new method to simulate magnetohydrodynamic (MHD) equations that include the electron convection term and shows its efficiency with simulation results in one dimension. We apply a third-order semi-discrete central scheme to investigate the characteristics of the electron convection term including its nonlinearity. At a steady state discontinuity, the sum of the ion and electron convection terms balances with the ion pressure gradient. We find that the electron convection term works like the gradient of the negative pressure and reduces the ion sound speed or amplifies the sound mode when parallel current flows. The electron convection term enables us to describe a situation in which a parallel electric field and parallel electron acceleration coexist, which is impossible for ideal or resistive MHD.
A 3D MPI-Parallel GPU-accelerated framework for simulating ocean wave energy converters
NASA Astrophysics Data System (ADS)
Pathak, Ashish; Raessi, Mehdi
2015-11-01
We present an MPI-parallel GPU-accelerated computational framework for studying the interaction between ocean waves and wave energy converters (WECs). The computational framework captures the viscous effects, nonlinear fluid-structure interaction (FSI), and breaking of waves around the structure, which cannot be captured in many potential flow solvers commonly used for WEC simulations. The full Navier-Stokes equations are solved using the two-step projection method, which is accelerated by porting the pressure Poisson equation to GPUs. The FSI is captured using the numerically stable fictitious domain method. A novel three-phase interface reconstruction algorithm is used to resolve three phases in a VOF-PLIC context. A consistent mass and momentum transport approach enables simulations at high density ratios. The accuracy of the overall framework is demonstrated via an array of test cases. Numerical simulations of the interaction between ocean waves and WECs are presented. Funding from the National Science Foundation CBET-1236462 grant is gratefully acknowledged.
Kolakowska, A; Novotny, M A; Korniss, G
2003-04-01
We consider parallel simulations for asynchronous systems employing L processing elements that are arranged on a ring. Processors communicate only among the nearest neighbors and advance their local simulated time only if it is guaranteed that this does not violate causality. In simulations with no constraints, in the infinite L limit the utilization scales [Korniss et al., Phys. Rev. Lett. 84, 1351 (2000)]; but, the width of the virtual time horizon diverges (i.e., the measurement phase of the algorithm does not scale). In this work, we introduce a moving Delta-window global constraint, which modifies the algorithm so that the measurement phase scales as well. We present results of systematic studies in which the system size (i.e., L and the volume load per processor) as well as the constraint are varied. The Delta constraint eliminates the extreme fluctuations in the virtual time horizon, provides a bound on its width, and controls the average progress rate. The width of the Delta window can serve as a tuning parameter that, for a given volume load per processor, could be adjusted to optimize the utilization, so as to maximize the efficiency. This result may find numerous applications in modeling the evolution of general spatially extended short-range interacting systems with asynchronous dynamics, including dynamic Monte Carlo studies.
Parallel tempering Monte Carlo simulations of lysozyme orientation on charged surfaces
NASA Astrophysics Data System (ADS)
Xie, Yun; Zhou, Jian; Jiang, Shaoyi
2010-02-01
In this work, the parallel tempering Monte Carlo (PTMC) algorithm is applied to accurately and efficiently identify the global-minimum-energy orientation of a protein adsorbed on a surface in a single simulation. When applying the PTMC method to simulate lysozyme orientation on charged surfaces, it is found that lysozyme could easily be adsorbed on negatively charged surfaces with "side-on" and "back-on" orientations. When driven by dominant electrostatic interactions, lysozyme tends to be adsorbed on negatively charged surfaces with the side-on orientation for which the active site of lysozyme faces sideways. The side-on orientation agrees well with the experimental results where the adsorbed orientation of lysozyme is determined by electrostatic interactions. As the contribution from van der Waals interactions gradually dominates, the back-on orientation becomes the preferred one. For this orientation, the active site of lysozyme faces outward, which conforms to the experimental results where the orientation of adsorbed lysozyme is co-determined by electrostatic interactions and van der Waals interactions. It is also found that despite of its net positive charge, lysozyme could be adsorbed on positively charged surfaces with both "end-on" and back-on orientations owing to the nonuniform charge distribution over lysozyme surface and the screening effect from ions in solution. The PTMC simulation method provides a way to determine the preferred orientation of proteins on surfaces for biosensor and biomaterial applications.
Experiences with serial and parallel algorithms for channel routing using simulated annealing
NASA Technical Reports Server (NTRS)
Brouwer, Randall Jay
1988-01-01
Two algorithms for channel routing using simulated annealing are presented. Simulated annealing is an optimization methodology which allows the solution process to back up out of local minima that may be encountered by inappropriate selections. By properly controlling the annealing process, it is very likely that the optimal solution to an NP-complete problem such as channel routing may be found. The algorithm presented proposes very relaxed restrictions on the types of allowable transformations, including overlapping nets. By freeing that restriction and controlling overlap situations with an appropriate cost function, the algorithm becomes very flexible and can be applied to many extensions of channel routing. The selection of the transformation utilizes a number of heuristics, still retaining the pseudorandom nature of simulated annealing. The algorithm was implemented as a serial program for a workstation, and a parallel program designed for a hypercube computer. The details of the serial implementation are presented, including many of the heuristics used and some of the resulting solutions.
Mechanisms for the convergence of time-parallelized, parareal turbulent plasma simulations
Reynolds-Barredo, J.; Newman, David E; Sanchez, R.; Samaddar, D.; Berry, Lee A; Elwasif, Wael R
2012-01-01
Parareal is a recent algorithm able to parallelize the time dimension in spite of its sequential nature. It has been applied to several linear and nonlinear problems and, very recently, to a simulation of fully-developed, two-dimensional drift wave turbulence. The mere fact that parareal works in such a turbulent regime is in itself somewhat unexpected, due to the characteristic sensitivity of turbulence to any change in initial conditions. This fundamental property of any turbulent system should render the iterative correction procedure characteristic of the parareal method inoperative, but this seems not to be the case. In addition, the choices that must be made to implement parareal (division of the temporal domain, election of the coarse solver and so on) are currently made using trial-and-error approaches. Here, we identify the mechanisms responsible for the convergence of parareal of these simulations of drift wave turbulence. We also investigate which conditions these mechanisms impose on any successful parareal implementation. The results reported here should be useful to guide future implementations of parareal within the much wider context of fully-developed fluid and plasma turbulent simulations.
Parallel Adjective High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics
NASA Technical Reports Server (NTRS)
Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak
2016-01-01
This paper presents large-scale MPI-parallel computational uid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady ow eld inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta, and spatially fth-order accurate WENO- 5Z scheme was used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh re nement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion compu- tational cells shows excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregular numerical cost associated with blocks con- taining boundaries. Limits to scaling beyond 32k cores are identi ed, and targeted code optimizations are discussed.
Parametric decay of a parallel propagating monochromatic whistler wave: Particle-in-cell simulations
NASA Astrophysics Data System (ADS)
Ke, Yangguang; Gao, Xinliang; Lu, Quanming; Wang, Shui
2017-01-01
In this paper, by using one-dimensional (1-D) particle-in-cell simulations, we investigate the parametric decay of a parallel propagating monochromatic whistler wave with various wave frequencies and amplitudes. The pump whistler wave can decay into a backscattered daughter whistler wave and an ion acoustic wave, and the decay instability grows more rapidly with the increase of the frequency or amplitude. When the frequency or amplitude is sufficiently large, a multiple decay process may occur, where the daughter whistler wave undergoes a secondary decay into an ion acoustic wave and a forward propagating whistler wave. We also find that during the parametric decay a considerable part of protons can be accelerated along the background magnetic field by the enhanced ion acoustic wave through the Landau resonance. The implication of the parametric decay to the evolution of whistler waves in Earth's magnetosphere is also discussed in the paper.
Hybrid simulations of a parallel collisionless shock in the large plasma device
NASA Astrophysics Data System (ADS)
Weidl, Martin S.; Winske, Dan; Jenko, Frank; Niemann, Chris
2016-12-01
We present two-dimensional hybrid kinetic/magnetohydrodynamic simulations of planned laser-ablation experiments in the Large Plasma Device. Our results, based on parameters that have been validated in previous experiments, show that a parallel collisionless shock can begin forming within the available space. Carbon-debris ions that stream along the magnetic-field direction with a blow-off speed of four times the Alfvén velocity excite strong magnetic fluctuations, eventually transferring part of their kinetic energy to the surrounding hydrogen ions. This acceleration and compression of the background plasma creates a shock front, which satisfies the Rankine-Hugoniot conditions and can therefore propagate on its own. Furthermore, we analyze the upstream turbulence and show that it is dominated by the right-hand resonant instability.
Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor S.
2013-05-23
Building designers are increasingly relying on complex fenestration systems to reduce energy consumed for lighting and HVAC in low energy buildings. Radiance, a lighting simulation program, has been used to conduct daylighting simulations for complex fenestration systems. Depending on the configurations, the simulation can take hours or even days using a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight simulation by conducting parallel computing on heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in parallel using OpenCL. The speed of new approach was evaluated using various daylighting simulation cases on a multicore central processing unit and a graphics processing unit. Based on the measurements and analysis of the time usage for the Radiance daylighting simulation, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.
Implementation of a parallel algorithm for thermo-chemical nonequilibrium flow simulations
NASA Astrophysics Data System (ADS)
Wong, C. C.; Blottner, F. G.; Payne, J. L.; Soetrisno, M.
1995-01-01
Massively parallel (MP) computing is considered to be the future direction of high performance computing. When engineers apply this new MP computing technology to solve large-scale problems, one major interest is what is the maximum problem size that a MP computer can handle. To determine the maximum size, it is important to address the code scalability issue. Scalability implies whether the code can provide an increase in performance proportional to an increase in problem size. If the size of the problem increases, by utilizing more computer nodes, the ideal elapsed time to simulate a problem should not increase much. Hence one important task in the development of the MP computing technology is to ensure scalability. A scalable code is an efficient code. In order to obtain good scaled performance, it is necessary to first have the code optimized for a single node performance before proceeding to a large-scale simulation with a large number of computer nodes. This paper will discuss the implementation of a massively parallel computing strategy and the process of optimization to improve the scaled performance. Specifically, we will look at domain decomposition, resource management in the code, communication overhead, and problem mapping. By incorporating these improvements and adopting an efficient MP computing strategy, an efficiency of about 85% and 96%, respectively, has been achieved using 64 nodes on MP computers for both perfect gas and chemically reactive gas problems. A comparison of the performance between MP computers and a vectorized computer, such as Cray-YMP, will also be presented.
A parallel direct numerical simulation of dust particles in a turbulent flow
NASA Astrophysics Data System (ADS)
Nguyen, H. V.; Yokota, R.; Stenchikov, G.; Kocurek, G.
2012-04-01
Due to their effects on radiation transport, aerosols play an important role in the global climate. Mineral dust aerosol is a predominant natural aerosol in the desert and semi-desert regions of the Middle East and North Africa (MENA). The Arabian Peninsula is one of the three predominant source regions on the planet "exporting" dust to almost the entire world. Mineral dust aerosols make up about 50% of the tropospheric aerosol mass and therefore produces a significant impact on the Earth's climate and the atmospheric environment, especially in the MENA region that is characterized by frequent dust storms and large aerosol generation. Understanding the mechanisms of dust emission, transport and deposition is therefore essential for correctly representing dust in numerical climate prediction. In this study we present results of numerical simulations of dust particles in a turbulent flow to study the interaction between dust and the atmosphere. Homogenous and passive dust particles in the boundary layers are entrained and advected under the influence of a turbulent flow. Currently no interactions between particles are included. Turbulence is resolved through direct numerical simulation using a parallel incompressible Navier-Stokes flow solver. Model output provides information on particle trajectories, turbulent transport of dust and effects of gravity on dust motion, which will be used to compare with the wind tunnel experiments at University of Texas at Austin. Results of testing of parallel efficiency and scalability is provided. Future versions of the model will include air-particle momentum exchanges, varying particle sizes and saltation effect. The results will be used for interpreting wind tunnel and field experiments and for improvement of dust generation parameterizations in meteorological models.
NASA Astrophysics Data System (ADS)
Zhang, Shanghong; Yuan, Rui; Wu, Yu; Yi, Yujun
2016-01-01
The Open Accelerator (OpenACC) application programming interface is a relatively new parallel computing standard. In this paper, particle-based flow field simulations are examined as a case study of OpenACC parallel computation. The parallel conversion process of the OpenACC standard is explained, and further, the performance of the flow field parallel model is analysed using different directive configurations and grid schemes. With careful implementation and optimisation of the data transportation in the parallel algorithm, a speedup factor of 18.26× is possible. In contrast, a speedup factor of just 11.77× was achieved with the conventional Open Multi-Processing (OpenMP) parallel mode on a 20-kernel computer. These results demonstrate that optimised feature settings greatly influence the degree of speedup, and models involving larger numbers of calculations exhibit greater efficiency and higher speedup factors. In addition, the OpenACC parallel mode is found to have good portability, making it easy to implement parallel computation from the original serial model.
NASA Astrophysics Data System (ADS)
Wang, Wenlong; Machta, Jonathan; Katzgraber, Helmut G.
2015-07-01
Population annealing is a Monte Carlo algorithm that marries features from simulated-annealing and parallel-tempering Monte Carlo. As such, it is ideal to overcome large energy barriers in the free-energy landscape while minimizing a Hamiltonian. Thus, population-annealing Monte Carlo can be used as a heuristic to solve combinatorial optimization problems. We illustrate the capabilities of population-annealing Monte Carlo by computing ground states of the three-dimensional Ising spin glass with Gaussian disorder, while comparing to simulated-annealing and parallel-tempering Monte Carlo. Our results suggest that population annealing Monte Carlo is significantly more efficient than simulated annealing but comparable to parallel-tempering Monte Carlo for finding spin-glass ground states.
Wang, Wenlong; Machta, Jonathan; Katzgraber, Helmut G
2015-07-01
Population annealing is a Monte Carlo algorithm that marries features from simulated-annealing and parallel-tempering Monte Carlo. As such, it is ideal to overcome large energy barriers in the free-energy landscape while minimizing a Hamiltonian. Thus, population-annealing Monte Carlo can be used as a heuristic to solve combinatorial optimization problems. We illustrate the capabilities of population-annealing Monte Carlo by computing ground states of the three-dimensional Ising spin glass with Gaussian disorder, while comparing to simulated-annealing and parallel-tempering Monte Carlo. Our results suggest that population annealing Monte Carlo is significantly more efficient than simulated annealing but comparable to parallel-tempering Monte Carlo for finding spin-glass ground states.
NASA Astrophysics Data System (ADS)
Jung, Jaewoon; Kobayashi, Chigusa; Imamura, Toshiyuki; Sugita, Yuji
2016-03-01
Three-dimensional Fast Fourier Transform (3D FFT) plays an important role in a wide variety of computer simulations and data analyses, including molecular dynamics (MD) simulations. In this study, we develop hybrid (MPI+OpenMP) parallelization schemes of 3D FFT based on two new volumetric decompositions, mainly for the particle mesh Ewald (PME) calculation in MD simulations. In one scheme, (1d_Alltoall), five all-to-all communications in one dimension are carried out, and in the other, (2d_Alltoall), one two-dimensional all-to-all communication is combined with two all-to-all communications in one dimension. 2d_Alltoall is similar to the conventional volumetric decomposition scheme. We performed benchmark tests of 3D FFT for the systems with different grid sizes using a large number of processors on the K computer in RIKEN AICS. The two schemes show comparable performances, and are better than existing 3D FFTs. The performances of 1d_Alltoall and 2d_Alltoall depend on the supercomputer network system and number of processors in each dimension. There is enough leeway for users to optimize performance for their conditions. In the PME method, short-range real-space interactions as well as long-range reciprocal-space interactions are calculated. Our volumetric decomposition schemes are particularly useful when used in conjunction with the recently developed midpoint cell method for short-range interactions, due to the same decompositions of real and reciprocal spaces. The 1d_Alltoall scheme of 3D FFT takes 4.7 ms to simulate one MD cycle for a virus system containing more than 1 million atoms using 32,768 cores on the K computer.
NASA Technical Reports Server (NTRS)
Moxon, Bruce C.; Green, John A.
1990-01-01
A high-performance platform for development of real-time helicopter flight simulations based on a simulation development and analysis platform combining a parallel simulation development and analysis environment with a scalable multiprocessor computer system is described. Simulation functional decomposition is covered, including the sequencing and data dependency of simulation modules and simulation functional mapping to multiple processors. The multiprocessor-based implementation of a blade-element simulation of the UH-60 helicopter is presented, and a prototype developed for a TC2000 computer is generalized in order to arrive at a portable multiprocessor software architecture. It is pointed out that the proposed approach coupled with a pilot's station creates a setting in which simulation engineers, computer scientists, and pilots can work together in the design and evaluation of advanced real-time helicopter simulations.
NASA Astrophysics Data System (ADS)
Buntemeyer, Lars; Banerjee, Robi; Peters, Thomas; Klassen, Mikhail; Pudritz, Ralph E.
2016-02-01
We present an algorithm for solving the radiative transfer problem on massively parallel computers using adaptive mesh refinement and domain decomposition. The solver is based on the method of characteristics which requires an adaptive raytracer that integrates the equation of radiative transfer. The radiation field is split into local and global components which are handled separately to overcome the non-locality problem. The solver is implemented in the framework of the magneto-hydrodynamics code FLASH and is coupled by an operator splitting step. The goal is the study of radiation in the context of star formation simulations with a focus on early disc formation and evolution. This requires a proper treatment of radiation physics that covers both the optically thin as well as the optically thick regimes and the transition region in particular. We successfully show the accuracy and feasibility of our method in a series of standard radiative transfer problems and two 3D collapse simulations resembling the early stages of protostar and disc formation.
AMRSim: an object-oriented performance simulator for parallel adaptive mesh refinement
Miller, B; Philip, B; Quinlan, D; Wissink, A
2001-01-08
Adaptive mesh refinement is complicated by both the algorithms and the dynamic nature of the computations. In parallel the complexity of getting good performance is dependent upon the architecture and the application. Most attempts to address the complexity of AMR have lead to the development of library solutions, most have developed object-oriented libraries or frameworks. All attempts to date have made numerous and sometimes conflicting assumptions which make the evaluation of performance of AMR across different applications and architectures difficult or impracticable. The evaluation of different approaches can alternatively be accomplished through simulation of the different AMR processes. In this paper we outline our research work to simulate the processing of adaptive mesh refinement grids using a distributed array class library (P++). This paper presents a combined analytic and empirical approach, since details of the algorithms can be readily predicted (separated into specific phases), while the performance associated with the dynamic behavior must be studied empirically. The result, AMRSim, provides a simple way to develop bounds on the expected performance of AMR calculations subject to constraints given by the algorithms, frameworks, and architecture.
Simulated Wake Characteristics Data for Closely Spaced Parallel Runway Operations Analysis
NASA Technical Reports Server (NTRS)
Guerreiro, Nelson M.; Neitzke, Kurt W.
2012-01-01
A simulation experiment was performed to generate and compile wake characteristics data relevant to the evaluation and feasibility analysis of closely spaced parallel runway (CSPR) operational concepts. While the experiment in this work is not tailored to any particular operational concept, the generated data applies to the broader class of CSPR concepts, where a trailing aircraft on a CSPR approach is required to stay ahead of the wake vortices generated by a lead aircraft on an adjacent CSPR. Data for wake age, circulation strength, and wake altitude change, at various lateral offset distances from the wake-generating lead aircraft approach path were compiled for a set of nine aircraft spanning the full range of FAA and ICAO wake classifications. A total of 54 scenarios were simulated to generate data related to key parameters that determine wake behavior. Of particular interest are wake age characteristics that can be used to evaluate both time- and distance- based in-trail separation concepts for all aircraft wake-class combinations. A simple first-order difference model was developed to enable the computation of wake parameter estimates for aircraft models having weight, wingspan and speed characteristics similar to those of the nine aircraft modeled in this work.
SPILADY: A parallel CPU and GPU code for spin-lattice magnetic molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Ma, Pui-Wai; Dudarev, S. L.; Woo, C. H.
2016-10-01
Spin-lattice dynamics generalizes molecular dynamics to magnetic materials, where dynamic variables describing an evolving atomic system include not only coordinates and velocities of atoms but also directions and magnitudes of atomic magnetic moments (spins). Spin-lattice dynamics simulates the collective time evolution of spins and atoms, taking into account the effect of non-collinear magnetism on interatomic forces. Applications of the method include atomistic models for defects, dislocations and surfaces in magnetic materials, thermally activated diffusion of defects, magnetic phase transitions, and various magnetic and lattice relaxation phenomena. Spin-lattice dynamics retains all the capabilities of molecular dynamics, adding to them the treatment of non-collinear magnetic degrees of freedom. The spin-lattice dynamics time integration algorithm uses symplectic Suzuki-Trotter decomposition of atomic coordinate, velocity and spin evolution operators, and delivers highly accurate numerical solutions of dynamic evolution equations over extended intervals of time. The code is parallelized in coordinate and spin spaces, and is written in OpenMP C/C++ for CPU and in CUDA C/C++ for Nvidia GPU implementations. Temperatures of atoms and spins are controlled by Langevin thermostats. Conduction electrons are treated by coupling the discrete spin-lattice dynamics equations for atoms and spins to the heat transfer equation for the electrons. Worked examples include simulations of thermalization of ferromagnetic bcc iron, the dynamics of laser pulse demagnetization, and collision cascades.
Borazjani, Iman; Ge, Liang; Le, Trung; Sotiropoulos, Fotis
2013-01-01
We develop an overset-curvilinear immersed boundary (overset-CURVIB) method in a general non-inertial frame of reference to simulate a wide range of challenging biological flow problems. The method incorporates overset-curvilinear grids to efficiently handle multi-connected geometries and increase the resolution locally near immersed boundaries. Complex bodies undergoing arbitrarily large deformations may be embedded within the overset-curvilinear background grid and treated as sharp interfaces using the curvilinear immersed boundary (CURVIB) method (Ge and Sotiropoulos, Journal of Computational Physics, 2007). The incompressible flow equations are formulated in a general non-inertial frame of reference to enhance the overall versatility and efficiency of the numerical approach. Efficient search algorithms to identify areas requiring blanking, donor cells, and interpolation coefficients for constructing the boundary conditions at grid interfaces of the overset grid are developed and implemented using efficient parallel computing communication strategies to transfer information among sub-domains. The governing equations are discretized using a second-order accurate finite-volume approach and integrated in time via an efficient fractional-step method. Various strategies for ensuring globally conservative interpolation at grid interfaces suitable for incompressible flow fractional step methods are implemented and evaluated. The method is verified and validated against experimental data, and its capabilities are demonstrated by simulating the flow past multiple aquatic swimmers and the systolic flow in an anatomic left ventricle with a mechanical heart valve implanted in the aortic position. PMID:23833331
Automated integration of genomic physical mapping data via parallel simulated annealing
Slezak, T.
1994-06-01
The Human Genome Center at the Lawrence Livermore National Laboratory (LLNL) is nearing closure on a high-resolution physical map of human chromosome 19. We have build automated tools to assemble 15,000 fingerprinted cosmid clones into 800 contigs with minimal spanning paths identified. These islands are being ordered, oriented, and spanned by a variety of other techniques including: Fluorescence Insitu Hybridization (FISH) at 3 levels of resolution, ECO restriction fragment mapping across all contigs, and a multitude of different hybridization and PCR techniques to link cosmid, YAC, AC, PAC, and Pl clones. The FISH data provide us with partial order and distance data as well as orientation. We made the observation that map builders need a much rougher presentation of data than do map readers; the former wish to see raw data since these can expose errors or interesting biology. We further noted that by ignoring our length and distance data we could simplify our problem into one that could be readily attacked with optimization techniques. The data integration problem could then be seen as an M x N ordering of our N cosmid clones which ``intersect`` M larger objects by defining ``intersection`` to mean either contig/map membership or hybridization results. Clearly, the goal of making an integrated map is now to rearrange the N cosmid clone ``columns`` such that the number of gaps on the object ``rows`` are minimized. Our FISH partially-ordered cosmid clones provide us with a set of constraints that cannot be violated by the rearrangement process. We solved the optimization problem via simulated annealing performed on a network of 40+ Unix machines in parallel, using a server/client model built on explicit socket calls. For current maps we can create a map in about 4 hours on the parallel net versus 4+ days on a single workstation. Our biologists are now using this software on a daily basis to guide their efforts toward final closure.
Simulation of light scattering by a pressure deformed red blood cell with a parallel FDTD method
NASA Astrophysics Data System (ADS)
Brock, Robert S.; Hu, Xin-Hua; Yang, Ping; Lu, Jun Q.
2005-03-01
Mature human red blood cells (RBCs) are light scatterers with homogeneous bodies enclosed by membranes and have attracted significant attention for optical diagnosis of disorders related to blood. RBCs possess viscoelastic structures and tend to deform from biconcave shapes isovolumetrically in blood flow in response to pressure variations. Elastic scattering of light by a deformed RBC provides a means to determine their shapes because of the presence of strong light scattering signals, and development of efficient modeling tools is important for developing bed-side instrumentation. The size parameters α, defined as α=2πα/λ with 2α as the characteristic size of the scatterer and λ as the light wavelength in the host medium, of the scatterer of RBCs are in the range of 10 to 50 for wavelengths of light in visible and near-infrared regions, and no analytical solutions have been reported for light scattering from deformed RBCs. We developed a parallel Finite-Difference-Time-Domain (FDTD) method to numerically simulate light scattering by a deformed RBC in a carrier fluid under different flow pressures. The use of parallel computing techniques significantly reduced the computation time of the FDTD method on a low-cost PC cluster. The deformed RBC is modeled in the 3D space as a homogeneous body characterized by a complex dielectric constant at the given wavelength of the incident light. The angular distribution of the light scattering signal was obtained in the form of the Mueller scattering matrix elements and their dependence on shape change due to pressure variation and orientation was studied. Also calculated were the scattering and absorption efficiencies and the potential for using these results to probe the shape change of RBCs will be discussed.
Direct numerical simulation of instabilities in parallel flow with spherical roughness elements
NASA Technical Reports Server (NTRS)
Deanna, R. G.
1992-01-01
Results from a direct numerical simulation of laminar flow over a flat surface with spherical roughness elements using a spectral-element method are given. The numerical simulation approximates roughness as a cellular pattern of identical spheres protruding from a smooth wall. Periodic boundary conditions on the domain's horizontal faces simulate an infinite array of roughness elements extending in the streamwise and spanwise directions, which implies the parallel-flow assumption, and results in a closed domain. A body force, designed to yield the horizontal Blasius velocity in the absence of roughness, sustains the flow. Instabilities above a critical Reynolds number reveal negligible oscillations in the recirculation regions behind each sphere and in the free stream, high-amplitude oscillations in the layer directly above the spheres, and a mean profile with an inflection point near the sphere's crest. The inflection point yields an unstable layer above the roughness (where U''(y) is less than 0) and a stable region within the roughness (where U''(y) is greater than 0). Evidently, the instability begins when the low-momentum or wake region behind an element, being the region most affected by disturbances (purely numerical in this case), goes unstable and moves. In compressible flow with periodic boundaries, this motion sends disturbances to all regions of the domain. In the unstable layer just above the inflection point, the disturbances grow while being carried downstream with a propagation speed equal to the local mean velocity; they do not grow amid the low energy region near the roughness patch. The most amplified disturbance eventually arrives at the next roughness element downstream, perturbing its wake and inducing a global response at a frequency governed by the streamwise spacing between spheres and the mean velocity of the most amplified layer.
Guo, Hao; Tian, Yimei; Shen, Hailiang; Wang, Yi; Kang, Mengxin
A design approach for determining the optimal flow pattern in a landscape lake is proposed based on FLUENT simulation, multiple objective optimization, and parallel computing. This paper formulates the design into a multi-objective optimization problem, with lake circulation effects and operation cost as two objectives, and solves the optimization problem with non-dominated sorting genetic algorithm II. The lake flow pattern is modelled in FLUENT. The parallelization aims at multiple FLUENT instance runs, which is different from the FLUENT internal parallel solver. This approach: (1) proposes lake flow pattern metrics, i.e. weighted average water flow velocity, water volume percentage of low flow velocity, and variance of flow velocity, (2) defines user defined functions for boundary setting, objective and constraints calculation, and (3) parallels the execution of multiple FLUENT instances runs to significantly reduce the optimization wall-clock time. The proposed approach is demonstrated through a case study for Meijiang Lake in Tianjin, China.
Kadoya, Y.; Abe, H.
1988-04-01
A two- and one-half-dimensional electromagnetic particle code (PS2M) (H. Abe and S. Nakajima, J. Phys. Soc. Jpn. 53, xxx (1987)) is used to study how an electric field applied parallel to the magnetic field affects the radio frequency stabilization of flute modes in a tandem mirror plasma. The parallel electric field E/sub parallel/ perturbs the electron velocity v/sub parallel/ parallel to the magnetic field and also induces a perpendicular magnetic field perturbation B/sub perpendicular/. The unstable growth of the flute mode in the absence of such a radio frequency electric field is first studied as a basis for comparison. The ponderomotive force originating from the time-averaged product
Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2
NASA Technical Reports Server (NTRS)
Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad
1995-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers is documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs to match the actual costs relative to changes in the number of grid points. By increasing the number of processors, slower than linear speedups are achieved with optimized (machine-dependent library) routines. This slower than linear speedup results because the computational cost is dominated by FFT routine, which yields less than ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 M ops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32-nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32 node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: Spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.
Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong
2010-10-01
Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel CPU of Core (TM) 2 Quad Q6600 and a GPU of Geforce 8800GT, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies.
Mesoscale Simulations of Particulate Flows with Parallel Distributed Lagrange Multiplier Technique
Kanarska, Y
2010-03-24
Fluid particulate flows are common phenomena in nature and industry. Modeling of such flows at micro and macro levels as well establishing relationships between these approaches are needed to understand properties of the particulate matter. We propose a computational technique based on the direct numerical simulation of the particulate flows. The numerical method is based on the distributed Lagrange multiplier technique following the ideas of Glowinski et al. (1999). Each particle is explicitly resolved on an Eulerian grid as a separate domain, using solid volume fractions. The fluid equations are solved through the entire computational domain, however, Lagrange multiplier constrains are applied inside the particle domain such that the fluid within any volume associated with a solid particle moves as an incompressible rigid body. Mutual forces for the fluid-particle interactions are internal to the system. Particles interact with the fluid via fluid dynamic equations, resulting in implicit fluid-rigid-body coupling relations that produce realistic fluid flow around the particles (i.e., no-slip boundary conditions). The particle-particle interactions are implemented using explicit force-displacement interactions for frictional inelastic particles similar to the DEM method of Cundall et al. (1979) with some modifications using a volume of an overlapping region as an input to the contact forces. The method is flexible enough to handle arbitrary particle shapes and size distributions. A parallel implementation of the method is based on the SAMRAI (Structured Adaptive Mesh Refinement Application Infrastructure) library, which allows handling of large amounts of rigid particles and enables local grid refinement. Accuracy and convergence of the presented method has been tested against known solutions for a falling sphere as well as by examining fluid flows through stationary particle beds (periodic and cubic packing). To evaluate code performance and validate particle
Nanostructure modeling in oxide ceramics using large scale parallel molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Campbell, Timothy J.
1998-12-01
The purpose of this dissertation is to investigate the properties and processes in nanostructured oxide ceramics using molecular-dynamics (MD) simulations. These simulations are based on realistic interatomic potentials and require scalable and portable multiresolution algorithms implemented on parallel computers. The dynamics of oxidation of aluminum nanoclusters is studied with a MD scheme that can simultaneously treat metallic and oxide systems. Dynamic charge transfer between anions and cations which gives rise to a compute-intensive Coulomb interaction, is treated by the O(N) Fast Multipole Method. Structural and dynamical correlations and local stresses reveal significant charge transfer and stress variations which cause rapid diffusion of Al and O on the nanocluster surface. At a constant temperature, the formation of an amorphous surface-oxide layer is observed during the first 100 picoseconds. Subsequent sharp decrease in O diffusion normal to the cluster surface arrests the growth of the oxide layer with a saturation thickness of 4 nanometers; this is in excellent agreement with experiments. Analyses of the oxide scale reveal significant charge transfer and variations in local structure. When the heat is not extracted from the cluster, the oxidizing reaction becomes explosive. Sintering, structural correlations, vibrational properties, and mechanical behavior of nanophase silica glasses are also studied using the MD approach based on an empirical interatomic potential that consists of both two and three-body interactions. Nanophase silica glasses with densities ranging from 76 to 93% of the bulk glass density are obtained using an isothermal-isobaric MD approach. During the sintering process, the pore sizes and distribution change without any discernable change in the pore morphology. The height and position of the first sharp diffraction peak (the signature of intermediate-range order) in the neutron static structure factor shows significant differences
A Three Dimensional Parallel Time Accurate Turbopump Simulation Procedure Using Overset Grid Systems
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Chan, William; Kwak, Dochan
2001-01-01
The objective of the current effort is to provide a computational framework for design and analysis of the entire fuel supply system of a liquid rocket engine, including high-fidelity unsteady turbopump flow analysis. This capability is needed to support the design of pump sub-systems for advanced space transportation vehicles that are likely to involve liquid propulsion systems. To date, computational tools for design/analysis of turbopump flows are based on relatively lower fidelity methods. An unsteady, three-dimensional viscous flow analysis tool involving stationary and rotational components for the entire turbopump assembly has not been available for real-world engineering applications. The present effort provides developers with information such as transient flow phenomena at start up, and non-uniform inflows, and will eventually impact on system vibration and structures. In the proposed paper, the progress toward the capability of complete simulation of the turbo-pump for a liquid rocket engine is reported. The Space Shuttle Main Engine (SSME) turbo-pump is used as a test case for evaluation of the hybrid MPI/Open-MP and MLP versions of the INS3D code. CAD to solution auto-scripting capability is being developed for turbopump applications. The relative motion of the grid systems for the rotor-stator interaction was obtained using overset grid techniques. Unsteady computations for the SSME turbo-pump, which contains 114 zones with 34.5 million grid points, are carried out on Origin 3000 systems at NASA Ames Research Center. Results from these time-accurate simulations with moving boundary capability will be presented along with the performance of parallel versions of the code.
A Three-Dimensional Parallel Time-Accurate Turbopump Simulation Procedure Using Overset Grid System
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Chan, William; Kwak, Dochan
2002-01-01
The objective of the current effort is to provide a computational framework for design and analysis of the entire fuel supply system of a liquid rocket engine, including high-fidelity unsteady turbopump flow analysis. This capability is needed to support the design of pump sub-systems for advanced space transportation vehicles that are likely to involve liquid propulsion systems. To date, computational tools for design/analysis of turbopump flows are based on relatively lower fidelity methods. An unsteady, three-dimensional viscous flow analysis tool involving stationary and rotational components for the entire turbopump assembly has not been available for real-world engineering applications. The present effort provides developers with information such as transient flow phenomena at start up, and nonuniform inflows, and will eventually impact on system vibration and structures. In the proposed paper, the progress toward the capability of complete simulation of the turbo-pump for a liquid rocket engine is reported. The Space Shuttle Main Engine (SSME) turbo-pump is used as a test case for evaluation of the hybrid MPI/Open-MP and MLP versions of the INS3D code. CAD to solution auto-scripting capability is being developed for turbopump applications. The relative motion of the grid systems for the rotor-stator interaction was obtained using overset grid techniques. Unsteady computations for the SSME turbo-pump, which contains 114 zones with 34.5 million grid points, are carried out on Origin 3000 systems at NASA Ames Research Center. Results from these time-accurate simulations with moving boundary capability are presented along with the performance of parallel versions of the code.
Parallel mesh support for particle-in-cell methods in magnetic fusion simulations
NASA Astrophysics Data System (ADS)
Yoon, Eisung; Shephard, Mark S.; Seol, E. Seegyoung; Kalyanaraman, Kaushik; Ibanez, Daniel
2016-10-01
As supercomputing power continues to increase Particle-In-Cell (PIC) methods are being widely adopted for transport simulations of magnetic fusion devices. Current implementations place a copy of the entire continuum mesh and its fields used in the PIC calculations on every node. This is in general not a scalable solution as computational power continues to grow faster than node level memory. To address this scalability issue, while still maintaining sufficient mesh per node to control costly inter-node communication, a new unstructured mesh distribution methods and associated mesh based PIC calculation procedure is being developed building on the parallel unstructured mesh infrastructure (PUMI). Key components to be outlined in the presentation include (i) the mesh distribution strategy, (ii) how the particles are tracked during a push cycle taking advantage of the unstructured mesh adjacency structures and searches based on that structure, and (iii) how the field solve steps and particle migration are controlled. Performance comparisons to the current approach will also be presented.
A Parallel 2D Numerical Simulation of Tumor Cells Necrosis by Local Hyperthermia
NASA Astrophysics Data System (ADS)
Reis, R. F.; Loureiro, F. S.; Lobosco, M.
2014-03-01
Hyperthermia has been widely used in cancer treatment to destroy tumors. The main idea of the hyperthermia is to heat a specific region like a tumor so that above a threshold temperature the tumor cells are destroyed. This can be accomplished by many heat supply techniques and the use of magnetic nanoparticles that generate heat when an alternating magnetic field is applied has emerged as a promise technique. In the present paper, the Pennes bioheat transfer equation is adopted to model the thermal tumor ablation in the context of magnetic nanoparticles. Numerical simulations are carried out considering different injection sites for the nanoparticles in an attempt to achieve better hyperthermia conditions. Explicit finite difference method is employed to solve the equations. However, a large amount of computation is required for this purpose. Therefore, this work also presents an initial attempt to improve performance using OpenMP, a parallel programming API. Experimental results were quite encouraging: speedups around 35 were obtained on a 64-core machine.
Singhal, R.P.; Bhardwaj, A. )
1991-09-01
A Monte Carlo simulation of photoelectron energization and energy degradation in H{sub 2} gas in the presence of parallel electric fields has been carried out. Numerical yield spectra which contain information about the electron energy degradation process and can be used to calculate the yield for any inelastic event are obtained. The variation of yield spectra with incident electron energy, electric field, pitch angle, and cutoff limit has been studied. The yield function is employed to determine the photoelectron fluxes. H{sub 2} Lyman and Werner band excitation rates and integrated column intensity are computed for three different electric field profiles taking various low-energy cutoff limits. It is found that an electric field profile with peak value of 4 mV/m at neutral number density of 3{times}10{sup 10} cm{sup {minus}3} produces enhanced volume emission rates of H{sub 2} bands ({lambda} < 1100 {angstrom}) explaining about 20% of the observed electroglow emission on Uranus. The effect of solar zenith angle and solar cycle variation on peak excitation rate is discussed.
Kerr, I D; Sankararamakrishnan, R; Smart, O S; Sansom, M S
1994-01-01
A parallel bundle of transmembrane (TM) alpha-helices surrounding a central pore is present in several classes of ion channel, including the nicotinic acetylcholine receptor (nAChR). We have modeled bundles of hydrophobic and of amphipathic helices using simulated annealing via restrained molecular dynamics. Bundles of Ala20 helices, with N = 4, 5, or 6 helices/bundle were generated. For all three N values the helices formed left-handed coiled coils, with pitches ranging from 160 A (N = 4) to 240 A (N = 6). Pore radius profiles revealed constrictions at residues 3, 6, 10, 13, and 17. A left-handed coiled coil and a similar pattern of pore constrictions were observed for N = 5 bundles of Leu20. In contrast, N = 5 bundles of Ile20 formed right-handed coiled coils, reflecting loosened packing of helices containing beta-branched side chains. Bundles formed by each of two classes of amphipathic helices were examined: (a) M2a, M2b, and M2c derived from sequences of M2 helices of nAChR; and (b) (LSSLLSL)3, a synthetic channel-forming peptide. Both classes of amphipathic helix formed left-handed coiled coils. For (LSSLLSL)3 the pitch of the coil increased as N increased from 4 to 6. The M2c N = 5 helix bundle is discussed in the context of possible models of the pore domain of nAChR. Images FIGURE 1 FIGURE 3 PMID:7529585
Simulation and instability investigation of the flow around a cylinder between two parallel walls
NASA Astrophysics Data System (ADS)
Dou, Hua-Shu; Ben, An-Qing
2015-04-01
The two-dimensional flows around a cylinder between two parallel walls at Re=40 and Re=100 are simulated with computational fluid dynamics (CFD). The governing equations are Navier-Stokes equations. They are discretized with finite volume method (FVM) and the solution is iterated with PISO Algorithm. Then, the calculating results are compared with the numerical results in literature, and good agreements are obtained. After that, the mechanism of the formation of Karman vortex street is investigated and the instability of the entire flow field is analyzed with the energy gradient theory. It is found that the two eddies attached at the rear of the cylinder have no effect on the flow instability for steady flow, i.e., they don't contribute to the formation of Karman vortex street. The formation of Karman vortex street originates from the combinations of the interaction of two shear layers at two lateral sides of the cylinder and the absolute instability in the cylinder wake. For the flow with Karman vortex street, the initial instability occurs at the region in a vortex downstream of the wake and the center of a vortex firstly loses its stability in a vortex. For pressure driven flow, it is confirmed that the inflection point on the time-averaged velocity profile leads to the instability. It is concluded that the energy gradient theory is potentially applicable to study the flow stability and to reveal the mechanism of turbulent transition.
A heterogeneous and parallel computing framework for high-resolution hydrodynamic simulations
NASA Astrophysics Data System (ADS)
Smith, Luke; Liang, Qiuhua
2015-04-01
Shock-capturing hydrodynamic models are now widely applied in the context of flood risk assessment and forecasting, accurately capturing the behaviour of surface water over ground and within rivers. Such models are generally explicit in their numerical basis, and can be computationally expensive; this has prohibited full use of high-resolution topographic data for complex urban environments, now easily obtainable through airborne altimetric surveys (LiDAR). As processor clock speed advances have stagnated in recent years, further computational performance gains are largely dependent on the use of parallel processing. Heterogeneous computing architectures (e.g. graphics processing units or compute accelerator cards) provide a cost-effective means of achieving high throughput in cases where the same calculation is performed with a large input dataset. In recent years this technique has been applied successfully for flood risk mapping, such as within the national surface water flood risk assessment for the United Kingdom. We present a flexible software framework for hydrodynamic simulations across multiple processors of different architectures, within multiple computer systems, enabled using OpenCL and Message Passing Interface (MPI) libraries. A finite-volume Godunov-type scheme is implemented using the HLLC approach to solving the Riemann problem, with optional extension to second-order accuracy in space and time using the MUSCL-Hancock approach. The framework is successfully applied on personal computers and a small cluster to provide considerable improvements in performance. The most significant performance gains were achieved across two servers, each containing four NVIDIA GPUs, with a mix of K20, M2075 and C2050 devices. Advantages are found with respect to decreased parametric sensitivity, and thus in reducing uncertainty, for a major fluvial flood within a large catchment during 2005 in Carlisle, England. Simulations for the three-day event could be performed
Parallel simulations of vortex-induced vibrations in turbulent flow: Linear and nonlinear models
NASA Astrophysics Data System (ADS)
Evangelinos, Constantinos
1999-11-01
In this work unstructured spectral/hp element based direct numerical simulation (DNS) techniques are used to simulate vortex-induced vibrations (VIV) of flexible cylinders. Linear structural models are employed for tension- dominated structures (cables) and bending stiffness- dominated structures (beams). Flow-structure interactions are studied in transitional (200-300) and turbulent (1000) Reynolds numbers. Structural responses as well as hydrodynamic forces are analyzed and their relationship with the near wake flow structures is examined. The following conclusions were reached: (1)A Reynolds number effect exists for the observed oscillation amplitude. (2)The phase relationship between cross- flow displacement and coefficient of lift is correlated with both the magnitudes of lift forces and displacement. (3)Cables enhance transition to turbulent flow, while beams (and rigidly vibrating cylinders) delay it. In the transition regime beams oscillate with 70% of the amplitude of cables. (4)Oblique and parallel shedding appear to coexist in the turbulent wake of cables and beams with a traveling wave structural response. The corresponding wake structure behind a cylinder with pinned ends vibrating as a standing wave, displays lambda-type vortices similar to those seen at lower (laminar) Reynolds numbers. (5)Cables and beams at a Reynolds number of 1000 give: (a)extremely similar velocity spectra, (b)differing autocorrelation profiles and large flow structures, and (c)differing structural responses. (6)The empirical formula for the coefficient of drag due to Skop et al. (1977) is shown to be in disagreement with the experimental data; a modified formula fits the results much better. A non-linear set of equations for the finite amplitude vibrations of a string are also derived and investigated. It is combined with an Arbitrary Lagrangian-Eulerian (ALE) flow solver and applied to model simulations of low Reynolds number (100) flow past flexible cylinders with pinned ends
NASA Astrophysics Data System (ADS)
Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.
2013-08-01
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time ti (trajectory positions and velocities xi = (ri, vi)) to time ti + 1 (xi + 1) by xi + 1 = fi(xi), the dynamics problem spanning an interval from t0…tM can be transformed into a root finding problem, F(X) = [xi - f(x(i - 1)]i = 1, M = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing
Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t{sub i} (trajectory positions and velocities x{sub i} = (r{sub i}, v{sub i})) to time t{sub i+1} (x{sub i+1}) by x{sub i+1} = f{sub i}(x{sub i}), the dynamics problem spanning an interval from t{sub 0}…t{sub M} can be transformed into a root finding problem, F(X) = [x{sub i} − f(x{sub (i−1})]{sub i} {sub =1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H{sub 2}O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time) ) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up
NASA Technical Reports Server (NTRS)
Hill, Gary; Duval, Ronald W.; Green, John A.; Huynh, Loc C.
1991-01-01
A piloted comparison of rigid and aeroelastic blade-element rotor models was conducted at the Crew Station Research and Development Facility (CSRDF) at Ames Research Center. A simulation development and analysis tool, FLIGHTLAB, was used to implement these models in real time using parallel processing technology. Pilot comments and quantitative analysis performed both on-line and off-line confirmed that elastic degrees of freedom significantly affect perceived handling qualities. Trim comparisons show improved correlation with flight test data when elastic modes are modeled. The results demonstrate the efficiency with which the mathematical modeling sophistication of existing simulation facilities can be upgraded using parallel processing, and the importance of these upgrades to simulation fidelity.
NASA Technical Reports Server (NTRS)
Hill, Gary; Du Val, Ronald W.; Green, John A.; Huynh, Loc C.
1991-01-01
A piloted comparison of rigid and aeroelastic blade-element rotor models was conducted at the Crew Station Research and Development Facility (CSRDF) at Ames Research Center. A simulation development and analysis tool, FLIGHTLAB, was used to implement these models in real time using parallel processing technology. Pilot comments and qualitative analysis performed both on-line and off-line confirmed that elastic degrees of freedom significantly affect perceived handling qualities. Trim comparisons show improved correlation with flight test data when elastic modes are modeled. The results demonstrate the efficiency with which the mathematical modeling sophistication of existing simulation facilities can be upgraded using parallel processing, and the importance of these upgrades to simulation fidelity.
NASA Technical Reports Server (NTRS)
Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke
1989-01-01
Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, generally, the same calculation are repeated every time step. However, we cannot apply to Do-all or the Do-across techniques for parallel processing of the simulation since there exist data dependencies from the end of an iteration to the beginning of the next iteration and furthermore data-input and data-output are required every sampling time period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near fine grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and assigned to processors by using optimal static scheduling at compile time in order to reduce large run time overhead caused by the use of near fine grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR) which has been developed to extract advantageous features of static scheduling algorithms to the maximum extent.
MA,H.; MCCAFFREY,J.; PIACSEK,S.
1997-11-01
This paper is about the parallel implementation of a high-resolution, spectral element, primitive equation model of a homogeneous equatorial ocean. The present work shows that the high-order domain decomposition methods can be efficiently implemented in a massively parallel computing environment to solve large-scale CFD problems, such as the general circulation of the ocean.
NASA Astrophysics Data System (ADS)
Nizenkov, Paul; Noeding, Peter; Konopka, Martin; Fasoulas, Stefanos
2017-03-01
The in-house direct simulation Monte Carlo solver PICLas, which enables parallel, three-dimensional simulations of rarefied gas flows, is verified and validated. Theoretical aspects of the method and the employed schemes are briefly discussed. Considered cases include simple reservoir simulations and complex re-entry geometries, which were selected from literature and simulated with PICLas. First, the chemistry module is verified using simple numerical and analytical solutions. Second, simulation results of the rarefied gas flow around a 70° blunted-cone, the REX Free-Flyer as well as multiple points of the re-entry trajectory of the Orion capsule are presented in terms of drag and heat flux. A comparison to experimental measurements as well as other numerical results shows an excellent agreement across the different simulation cases. An outlook on future code development and applications is given.
NASA Astrophysics Data System (ADS)
Nizenkov, Paul; Noeding, Peter; Konopka, Martin; Fasoulas, Stefanos
2016-07-01
The in-house direct simulation Monte Carlo solver PICLas, which enables parallel, three-dimensional simulations of rarefied gas flows, is verified and validated. Theoretical aspects of the method and the employed schemes are briefly discussed. Considered cases include simple reservoir simulations and complex re-entry geometries, which were selected from literature and simulated with PICLas. First, the chemistry module is verified using simple numerical and analytical solutions. Second, simulation results of the rarefied gas flow around a 70° blunted-cone, the REX Free-Flyer as well as multiple points of the re-entry trajectory of the Orion capsule are presented in terms of drag and heat flux. A comparison to experimental measurements as well as other numerical results shows an excellent agreement across the different simulation cases. An outlook on future code development and applications is given.
Kusuda, T
1980-10-31
The TC 4.7 simplified energy calculation method is a bin method used by the REAP procedure of the Carrier Corporation, except for the load estimating calculations. The simplified procedure was compared with hourly simulation procedures for an office building in Washington, DC. The comparison studied the extent as well as the reasons for agreement and discrepancies due to these two different types of annual energy analysis (bin method and hourly simulation methods). Results of the parallel calculations are discussed and the major reasons of discrepancies between the hourly simulation technique and the simplified TC 4.7 method are identified. Data resulting from the calculation methods are tabulated. (MCW)
Bylaska, Eric J; Weare, Jonathan Q; Weare, John H
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time ti (trajectory positions and velocities xi = (ri, vi)) to time ti + 1 (xi + 1) by xi + 1 = fi(xi), the dynamics problem spanning an interval from t0[ellipsis (horizontal)]tM can be transformed into a root finding problem, F(X) = [xi - f(x(i - 1)]i = 1, M = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution/timeparallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a
Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f , (e.g. Verlet algorithm) is available to propagate the system from time ti (trajectory positions and velocities xi = (ri; vi)) to time ti+1 (xi+1) by xi+1 = fi(xi), the dynamics problem spanning an interval from t0 : : : tM can be transformed into a root finding problem, F(X) = [xi - f (x(i-1)]i=1;M = 0, for the trajectory variables. The root finding problem is solved using a variety of optimization techniques, including quasi-Newton and preconditioned quasi-Newton optimization schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed and the effectiveness of various approaches to solving the root finding problem are tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl+4H2O AIMD simulation at the MP2 level. The maximum speedup obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow TCP/IP networks. Scripts
NASA Astrophysics Data System (ADS)
Tian, Shuling; Wu, Yizhao; Xia, Jian
A parallel Navier-Stokes solver based on dynamic overset unstructured grids method is presented to simulate the unsteady turbulent flow field around helicopter in forward flight. The grid method has the advantages of unstructured grid and Chimera grid and is suitable to deal with multiple bodies in relatively moving. Unsteady Navier-Stokes equations are solved on overset unstructured grids by an explicit dual time-stepping, finite volume method. Preconditioning method applied to inner iteration of the dual-time stepping is used to speed up the convergence of numerical simulation. The Spalart-Allmaras one-equation turbulence model is used to evaluate the turbulent viscosity. Parallel computation is based on the dynamic domain decomposition method in overset unstructured grids system at each physical time step. A generic helicopter Robin with a four-blade rotor in forward flight is considered to validate the method presented in this paper. Numerical simulation results show that the parallel dynamic overset unstructured grids method is very efficient for the simulation of helicopter flow field and the results are reliable.
NASA Astrophysics Data System (ADS)
Schroeder, Matthias; Jankowski, Cedric; Hammitzsch, Martin; Wächter, Joachim
2014-05-01
Thousands of numerical tsunami simulations allow the computation of inundation and run-up along the coast for vulnerable areas over the time. A so-called Matching Scenario Database (MSDB) [1] contains this large number of simulations in text file format. In order to visualize these wave propagations the scenarios have to be reprocessed automatically. In the TRIDEC project funded by the seventh Framework Programme of the European Union a Virtual Scenario Database (VSDB) and a Matching Scenario Database (MSDB) were established amongst others by the working group of the University of Bologna (UniBo) [1]. One part of TRIDEC was the developing of a new generation of a Decision Support System (DSS) for tsunami Early Warning Systems (TEWS) [2]. A working group of the GFZ German Research Centre for Geosciences was responsible for developing the Command and Control User Interface (CCUI) as central software application which support operator activities, incident management and message disseminations. For the integration and visualization in the CCUI, the numerical tsunami simulations from MSDB must be converted into the shapefiles format. The usage of shapefiles enables a much easier integration into standard Geographic Information Systems (GIS). Since also the CCUI is based on two widely used open source products (GeoTools library and uDig), whereby the integration of shapefiles is provided by these libraries a priori. In this case, for an example area around the Western Iberian margin several thousand tsunami variations were processed. Due to the mass of data only a program-controlled process was conceivable. In order to optimize the computing efforts and operating time the use of an existing GFZ High Performance Computing Cluster (HPC) had been chosen. Thus, a geospatial software was sought after that is capable for parallel processing. The FOSS tool Geospatial Data Abstraction Library (GDAL/OGR) was used to match the coordinates with the wave heights and generates the
Falcao, D.M.; Kaszkurewicz, E. . COPPE-EE Almeida, H.L.S. . Centro de Pesquisas de Energia Electrica)
1993-02-01
This paper proposes the use of parallel techniques for the computation of power system electromagnetic transients in a multiprocessor environment. System partitioning and parallel solution methods are described. Questions regarding computational load balancing and communication overheads are discussed and techniques are presented to improve the proposed method with respect to those matters. In order to demonstrate the feasibility and to assess the performance of the proposed techniques, a parallel electromagnetic transients program for a multiprocessor environment has been developed. Tests using real power networks of different sizes, executed in an 8-processor hypercube machine, have shown promising performance indices.
Sreepathi, Sarat; Sripathi, Vamsi; Mills, Richard T; Hammond, Glenn; Mahinthakumar, Kumar
2013-01-01
Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library, SCORPIO (SCalable block-ORiented Parallel I/O) that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5) and implements optimized I/O access patterns that can scale on larger number of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.
A Framework for Parallel Unstructured Grid Generation for Complex Aerodynamic Simulations
NASA Technical Reports Server (NTRS)
Zagaris, George; Pirzadeh, Shahyar Z.; Chrisochoides, Nikos
2009-01-01
A framework for parallel unstructured grid generation targeting both shared memory multi-processors and distributed memory architectures is presented. The two fundamental building-blocks of the framework consist of: (1) the Advancing-Partition (AP) method used for domain decomposition and (2) the Advancing Front (AF) method used for mesh generation. Starting from the surface mesh of the computational domain, the AP method is applied recursively to generate a set of sub-domains. Next, the sub-domains are meshed in parallel using the AF method. The recursive nature of domain decomposition naturally maps to a divide-and-conquer algorithm which exhibits inherent parallelism. For the parallel implementation, the Master/Worker pattern is employed to dynamically balance the varying workloads of each task on the set of available CPUs. Performance results by this approach are presented and discussed in detail as well as future work and improvements.
NASA Astrophysics Data System (ADS)
Blake, Douglas Clifton
A new methodology is presented for conducting numerical simulations of electromagnetic scattering and wave-propagation phenomena on massively parallel computing platforms. A process is constructed which is rooted in the Finite-Volume Time-Domain (FVTD) technique to create a simulation capability that is both versatile and practical. In terms of versatility, the method is platform independent, is easily modifiable, and is capable of solving a large number of problems with no alterations. In terms of practicality, the method is sophisticated enough to solve problems of engineering significance and is not limited to mere academic exercises. In order to achieve this capability, techniques are integrated from several scientific disciplines including computational fluid dynamics, computational electromagnetics, and parallel computing. The end result is the first FVTD solver capable of utilizing the highly flexible overset-gridding process in a distributed-memory computing environment. In the process of creating this capability, work is accomplished to conduct the first study designed to quantify the effects of domain-decomposition dimensionality on the parallel performance of hyperbolic partial differential equations solvers; to develop a new method of partitioning a computational domain comprised of overset grids; and to provide the first detailed assessment of the applicability of overset grids to the field of computational electromagnetics. Using these new methods and capabilities, results from a large number of wave propagation and scattering simulations are presented. The overset-grid FVTD algorithm is demonstrated to produce results of comparable accuracy to single-grid simulations while simultaneously shortening the grid-generation process and increasing the flexibility and utility of the FVTD technique. Furthermore, the new domain-decomposition approaches developed for overset grids are shown to be capable of producing partitions that are better load balanced and
Hybrid Parallelization of Adaptive MHD-Kinetic Module in Multi-Scale Fluid-Kinetic Simulation Suite
Borovikov, Sergey; Heerikhuisen, Jacob; Pogorelov, Nikolai
2013-04-01
The Multi-Scale Fluid-Kinetic Simulation Suite has a computational tool set for solving partially ionized flows. In this paper we focus on recent developments of the kinetic module which solves the Boltzmann equation using the Monte-Carlo method. The module has been recently redesigned to utilize intra-node hybrid parallelization. We describe in detail the redesign process, implementation issues, and modifications made to the code. Finally, we conduct a performance analysis.
A Discussion of Metrics for Parallelized Army Mobile ad hoc Network Simulations
2011-09-01
years including Yet Another Network Simulator (YANS), ns-2 and ns-3, OPNET , and ROSS (4–8). All of these approaches employ a discrete event simulator (DES...adding NetDMF awareness to the COMPOSER toolkit. Commercial tools, such as OPNET , also maintain a suite of tools for analysis of network simulation...Network Simulator-3 (ns-3). [6] Chang, X. Network Simulations with OPNET . In Winter Simulation Conference, 1999. [7] Khosa, I.; Haider, U.; Mosood, H
NASA Astrophysics Data System (ADS)
Lin, K.-M.; Hu, M.-H.; Hung, C.-T.; Wu, J.-S.; Hwang, F.-N.; Chen, Y.-S.; Cheng, G.
2012-12-01
Development of a hybrid numerical algorithm which couples weakly with the gas flow model (GFM) and the plasma fluid model (PFM) for simulating an atmospheric-pressure plasma jet (APPJ) and its acceleration by two approaches is presented. The weak coupling between gas flow and discharge is introduced by transferring between the results obtained from the steady-state solution of the GFM and cycle-averaged solution of the PFM respectively. Approaches of reducing the overall runtime include parallel computing of the GFM and the PFM solvers, and employing a temporal multi-scale method (TMSM) for PFM. Parallel computing of both solvers is realized using the domain decomposition method with the message passing interface (MPI) on distributed-memory machines. The TMSM considers only chemical reactions by ignoring the transport terms when integrating temporally the continuity equations of heavy species at each time step, and then the transport terms are restored only at an interval of time marching steps. The total reduction of runtime is 47% by applying the TMSM to the APPJ example presented in this study. Application of the proposed hybrid algorithm is demonstrated by simulating a parallel-plate helium APPJ impinging onto a substrate, which the cycle-averaged properties of the 200th cycle are presented. The distribution patterns of species densities are strongly correlated by the background gas flow pattern, which shows that consideration of gas flow in APPJ simulations is critical.
NASA Astrophysics Data System (ADS)
Renouf, Mathieu; Dubois, Frederic; Alart, Pierre
2004-07-01
The NSCD method has shown its efficiency in the simulation of granular media. Since the number of particles and contact increases, the shape of the discrete elements becomes more complicated and the simulated problems becomes more complex, the numerical tools need to be improved in order to preserve reasonable elapsed CPU time. In this paper we present a parallelization approach of the NSCD algorithm and we investigate its influence on the numerical behaviour of the method. We illustrate the efficiency on an example made of hard disks: a free surface compaction.
Large-eddy simulation of the Rayleigh-Taylor instability on a massively parallel computer
Amala, P.A.K.
1995-03-01
A computational model for the solution of the three-dimensional Navier-Stokes equations is developed. This model includes a turbulence model: a modified Smagorinsky eddy-viscosity with a stochastic backscatter extension. The resultant equations are solved using finite difference techniques: the second-order explicit Lax-Wendroff schemes. This computational model is implemented on a massively parallel computer. Programming models on massively parallel computers are next studied. It is desired to determine the best programming model for the developed computational model. To this end, three different codes are tested on a current massively parallel computer: the CM-5 at Los Alamos. Each code uses a different programming model: one is a data parallel code; the other two are message passing codes. Timing studies are done to determine which method is the fastest. The data parallel approach turns out to be the fastest method on the CM-5 by at least an order of magnitude. The resultant code is then used to study a current problem of interest to the computational fluid dynamics community. This is the Rayleigh-Taylor instability. The Lax-Wendroff methods handle shocks and sharp interfaces poorly. To this end, the Rayleigh-Taylor linear analysis is modified to include a smoothed interface. The linear growth rate problem is then investigated. Finally, the problem of the randomly perturbed interface is examined. Stochastic backscatter breaks the symmetry of the stationary unstable interface and generates a mixing layer growing at the experimentally observed rate. 115 refs., 51 figs., 19 tabs.
NASA Technical Reports Server (NTRS)
Fijany, A.; Roberts, J. A.; Jain, A.; Man, G. K.
1993-01-01
Part 1 of this paper presented the requirements for the real-time simulation of Cassini spacecraft along with some discussion of the DARTS algorithm. Here, in Part 2 we discuss the development and implementation of parallel/vectorized DARTS algorithm and architecture for real-time simulation. Development of the fast algorithms and architecture for real-time hardware-in-the-loop simulation of spacecraft dynamics is motivated by the fact that it represents a hard real-time problem, in the sense that the correctness of the simulation depends on both the numerical accuracy and the exact timing of the computation. For a given model fidelity, the computation should be computed within a predefined time period. Further reduction in computation time allows increasing the fidelity of the model (i.e., inclusion of more flexible modes) and the integration routine.
Coupled Models and Parallel Simulations for Three-Dimensional Full-Stokes Ice Sheet Modeling
Zhang, Huai; Ju, Lili
2011-01-01
A three-dimensional full-Stokes computational model is considered for determining the dynamics, temperature, and thickness of ice sheets. The governing thermomechanical equations consist of the three-dimensional full-Stokes system with nonlinear rheology for the momentum, an advective-diffusion energy equation for temperature evolution, and a mass conservation equation for icethickness changes. Here, we discuss the variable resolution meshes, the finite element discretizations, and the parallel algorithms employed by the model components. The solvers are integrated through a well-designed coupler for the exchange of parametric data between components. The discretization utilizes high-quality, variable-resolution centroidal Voronoi Delaunay triangulation meshing and existing parallel solvers. We demonstrate the gridding technology, discretization schemes, and the efficiency and scalability of the parallel solvers through computational experiments using both simplified geometries arising from benchmark test problems and a realistic Greenland ice sheet geometry.
NASA Astrophysics Data System (ADS)
Stupl, J.; Faber, N.; Foster, C.; Yang, F.; Nelson, B.; Aziz, J.; Nuttall, A.; Henze, C.; Levit, C.
2014-09-01
This paper provides an updated efficiency analysis of the LightForce space debris collision avoidance scheme. LightForce aims to prevent collisions on warning by utilizing photon pressure from ground based, commercial off the shelf lasers. Past research has proven that a few ground-based systems consisting of 10 kW class lasers directed by 1.5 m telescopes with adaptive optics could lower the expected number of collisions in Low Earth Orbit (LEO) by an order of magnitude. Our simulation approach utilizes the entire Two Line Element (TLE) catalogue in LEO for a given day as initial input. Least-squares fitting of a TLE time series is used for an improved orbit estimate. We then calculate the probability of collision for all LEO objects in the catalogue for a time step of the simulation. The conjunctions that exceed a threshold probability of collision are then engaged by a simulated network of laser ground stations. After those engagements, the perturbed orbits are used to re-assess the probability of collision and evaluate the efficiency. This paper describes new simulations with three updated aspects: 1) By utilizing a highly parallel simulation approach employing hundreds of processors, we have extended our analysis to a much broader dataset. The simulation time is extended to one year. 2) We analyze not only the efficiency of LightForce on conjunctions that naturally occur, but also take into account conjunctions caused by orbit perturbations due to LightForce engagements. 3) We use a new simulation approach that is regularly updating the LightForce engagement strategy, as it would be during actual operations. In this paper we present both our simulation approach to parallelize the efficiency analysis, its computational performance and the resulting expected efficiency of the LightForce collision avoidance system.
García-Grajales, Julián A.; Rucabado, Gabriel; García-Dopico, Antonio; Peña, José-María; Jérusalem, Antoine
2015-01-01
With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computational expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite—explicit and implicit—were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon, a segmented
García-Grajales, Julián A; Rucabado, Gabriel; García-Dopico, Antonio; Peña, José-María; Jérusalem, Antoine
2015-01-01
With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computational expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite--explicit and implicit--were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon, a segmented
2008-05-01
Reynolds’ Distributed Behavior Model [18]. Corner [5,6,20] ported the model from a single-processor Windows platform to a parallel Linux-based Beowulf ... Beowulf clusters consist of 1 to 16 processors and 1 to 1024 problems in intervals of powers of two. Each test is run 30 times for statistical analysis
NASA Astrophysics Data System (ADS)
Fang, Ye; Feng, Sheng; Tam, Ka-Ming; Yun, Zhifeng; Moreno, Juana; Ramanujam, J.; Jarrell, Mark
2014-10-01
Monte Carlo simulations of the Ising model play an important role in the field of computational statistical physics, and they have revealed many properties of the model over the past few decades. However, the effect of frustration due to random disorder, in particular the possible spin glass phase, remains a crucial but poorly understood problem. One of the obstacles in the Monte Carlo simulation of random frustrated systems is their long relaxation time making an efficient parallel implementation on state-of-the-art computation platforms highly desirable. The Graphics Processing Unit (GPU) is such a platform that provides an opportunity to significantly enhance the computational performance and thus gain new insight into this problem. In this paper, we present optimization and tuning approaches for the CUDA implementation of the spin glass simulation on GPUs. We discuss the integration of various design alternatives, such as GPU kernel construction with minimal communication, memory tiling, and look-up tables. We present a binary data format, Compact Asynchronous Multispin Coding (CAMSC), which provides an additional 28.4% speedup compared with the traditionally used Asynchronous Multispin Coding (AMSC). Our overall design sustains a performance of 33.5 ps per spin flip attempt for simulating the three-dimensional Edwards-Anderson model with parallel tempering, which significantly improves the performance over existing GPU implementations.
NASA Astrophysics Data System (ADS)
Jolliet, S.; McMillan, B. F.; Vernay, T.; Villard, L.; Hatzky, R.; Bottino, A.; Angelino, P.
2009-07-01
In this paper, the influence of the parallel nonlinearity on zonal flows and heat transport in global particle-in-cell ion-temperature-gradient simulations is studied. Although this term is in theory orders of magnitude smaller than the others, several authors [L. Villard, P. Angelino, A. Bottino et al., Plasma Phys. Contr. Fusion 46, B51 (2004); L. Villard, S. J. Allfrey, A. Bottino et al., Nucl. Fusion 44, 172 (2004); J. C. Kniep, J. N. G. Leboeuf, and V. C. Decyck, Comput. Phys. Commun. 164, 98 (2004); J. Candy, R. E. Waltz, S. E. Parker et al., Phys. Plasmas 13, 074501 (2006)] found different results on its role. The study is performed using the global gyrokinetic particle-in-cell codes TORB (theta-pinch) [R. Hatzky, T. M. Tran, A. Könies et al., Phys. Plasmas 9, 898 (2002)] and ORB5 (tokamak geometry) [S. Jolliet, A. Bottino, P. Angelino et al., Comput. Phys. Commun. 177, 409 (2007)]. In particular, it is demonstrated that the parallel nonlinearity, while important for energy conservation, affects the zonal electric field only if the simulation is noise dominated. When a proper convergence is reached, the influence of parallel nonlinearity on the zonal electric field, if any, is shown to be small for both the cases of decaying and driven turbulence.
Reddy, A.V.; Kothe, D.B.; Lam, K.L.
1997-06-01
The Los Alamos National Laboratory (LANL) is currently developing a new casting simulation tool (known as Telluride) that employs robust, high-resolution finite volume algorithms for incompressible fluid flow, volume tracking of interfaces, and solidification physics on three-dimensional (3-D) unstructured meshes. Their finite volume algorithms are based on colocated cell-centered schemes that are formally second order in time and space. The flow algorithm is a 3-D extension of recent work on projection method solutions of the Navier-Stokes (NS) equations. Their volume tracking algorithm can accurately track topologically complex interfaces by approximating the interface geometry as piecewise planar. Coupled to their fluid flow algorithm is a comprehensive binary alloy solidification model that incorporates macroscopic descriptions of heat transfer, solute redistribution, and melt convection as well as a microscopic description of segregation. The finite volume algorithms, which are efficient, parallel, and robust, can yield high-fidelity solutions on a variety of meshes, ranging from those that are structured orthogonal to fully unstructured (finite element). The authors discuss key computer science issues that have enabled them to efficiently parallelize their unstructured mesh algorithms on both distributed and shared memory computing platforms. These include their functionally object-oriented use of Fortran 90 and new parallel libraries for gather/scatter functions (PGSLib) and solutions of linear systems of equations (JTpack90). Examples of their current capabilities are illustrated with simulations of mold filling and solidification of complex 3-D components currently being poured in LANL foundries.
NASA Astrophysics Data System (ADS)
Wang, Cheng; Dong, XinZhuang; Shu, Chi-Wang
2015-10-01
For numerical simulation of detonation, computational cost using uniform meshes is large due to the vast separation in both time and space scales. Adaptive mesh refinement (AMR) is advantageous for problems with vastly different scales. This paper aims to propose an AMR method with high order accuracy for numerical investigation of multi-dimensional detonation. A well-designed AMR method based on finite difference weighted essentially non-oscillatory (WENO) scheme, named as AMR&WENO is proposed. A new cell-based data structure is used to organize the adaptive meshes. The new data structure makes it possible for cells to communicate with each other quickly and easily. In order to develop an AMR method with high order accuracy, high order prolongations in both space and time are utilized in the data prolongation procedure. Based on the message passing interface (MPI) platform, we have developed a workload balancing parallel AMR&WENO code using the Hilbert space-filling curve algorithm. Our numerical experiments with detonation simulations indicate that the AMR&WENO is accurate and has a high resolution. Moreover, we evaluate and compare the performance of the uniform mesh WENO scheme and the parallel AMR&WENO method. The comparison results provide us further insight into the high performance of the parallel AMR&WENO method.
Huixin, Wu; Duo, Mo; He, Li
2014-01-01
Spectrum allocation is one of the key issues to improve spectrum efficiency and has become the hot topic in the research of cognitive wireless network. This paper discusses the real-time feature and efficiency of dynamic spectrum allocation and presents a new spectrum allocation algorithm based on the master-slave parallel immune optimization model. The algorithm designs a new encoding scheme for the antibody based on the demand for convergence rate and population diversity. For improving the calculating efficiency, the antibody affinity in the population is calculated in multiple computing nodes at the same time. Simulation results show that the algorithm reduces the total spectrum allocation time and can achieve higher network profits. Compared with traditional serial algorithms, the algorithm proposed in this paper has better speedup ratio and parallel efficiency.
Alerstam, Erik; Svensson, Tomas; Andersson-Engels, Stefan
2008-01-01
General-purpose computing on graphics processing units (GPGPU) is shown to dramatically increase the speed of Monte Carlo simulations of photon migration. In a standard simulation of time-resolved photon migration in a semi-infinite geometry, the proposed methodology executed on a low-cost graphics processing unit (GPU) is a factor 1000 faster than simulation performed on a single standard processor. In addition, we address important technical aspects of GPU-based simulations of photon migration. The technique is expected to become a standard method in Monte Carlo simulations of photon migration.
Duan, Nan; Dimitrovski, Aleksandar D; Simunovic, Srdjan; Sun, Kai
2016-01-01
The development of high-performance computing techniques and platforms has provided many opportunities for real-time or even faster-than-real-time implementation of power system simulations. One approach uses the Parareal in time framework. The Parareal algorithm has shown promising theoretical simulation speedups by temporal decomposing a simulation run into a coarse simulation on the entire simulation interval and fine simulations on sequential sub-intervals linked through the coarse simulation. However, it has been found that the time cost of the coarse solver needs to be reduced to fully exploit the potentials of the Parareal algorithm. This paper studies a Parareal implementation using reduced generator models for the coarse solver and reports the testing results on the IEEE 39-bus system and a 327-generator 2383-bus Polish system model.
A Computer Simulation of the System-Wide Effects of Parallel-Offset Route Maneuvers
NASA Technical Reports Server (NTRS)
Lauderdale, Todd A.; Santiago, Confesor; Pankok, Carl
2010-01-01
Most aircraft managed by air-traffic controllers in the National Airspace System are capable of flying parallel-offset routes. This paper presents the results of two related studies on the effects of increased use of offset routes as a conflict resolution maneuver. The first study analyzes offset routes in the context of all standard resolution types which air-traffic controllers currently use. This study shows that by utilizing parallel-offset route maneuvers, significant system-wide savings in delay due to conflict resolution of up to 30% are possible. It also shows that most offset resolutions replace horizontal-vectoring resolutions. The second study builds on the results of the first and directly compares offset resolutions and standard horizontal-vectoring maneuvers to determine that in-trail conflicts are often more efficiently resolved by offset maneuvers.
2012-05-22
redistributing the chemistry workload, namely (a) PLP, purely local processing; (b) URAN , the uniform random distribution of chemistry com- putations among...all cores following an early stage of PLP; and (c) P- URAN , a Partitioned URAN strategy that redistributes the workload within parti- tions or subsets...parallel strate- gies for redistributing the chemistry workload, namely (a) PLP, purely local processing; (b) URAN , the uniform random distribution
Visualizing Network Traffic to Understand the Performance of Massively Parallel Simulations.
Landge, A G; Levine, J A; Bhatele, A; Isaacs, K E; Gamblin, T; Schulz, M; Langer, S H; Bremer, Peer-Timo; Pascucci, V
2012-12-01
The performance of massively parallel applications is often heavily impacted by the cost of communication among compute nodes. However, determining how to best use the network is a formidable task, made challenging by the ever increasing size and complexity of modern supercomputers. This paper applies visualization techniques to aid parallel application developers in understanding the network activity by enabling a detailed exploration of the flow of packets through the hardware interconnect. In order to visualize this large and complex data, we employ two linked views of the hardware network. The first is a 2D view, that represents the network structure as one of several simplified planar projections. This view is designed to allow a user to easily identify trends and patterns in the network traffic. The second is a 3D view that augments the 2D view by preserving the physical network topology and providing a context that is familiar to the application developers. Using the massively parallel multi-physics code pF3D as a case study, we demonstrate that our tool provides valuable insight that we use to explain and optimize pF3D's performance on an IBM Blue Gene/P system.
A Parallel Adaptive Wavelet Method for the Simulation of Compressible Reacting Flows
NASA Astrophysics Data System (ADS)
Zikoski, Zachary; Paolucci, Samuel
2011-11-01
The Wavelet Adaptive Multiresolution Representation (WAMR) method provides a robust method for controlling spatial grid adaption--fine grid spacing in regions of a solution requiring high resolution (i.e. near steep gradients, singularities, or near- singularities) and using much coarser grid spacing where the solution is slowly varying. The sparse grids produced using the WAMR method exhibit very high compression ratios compared to uniform grids of equivalent resolution. Subsequently, a wide range of spatial scales often occurring in continuum physics models can be captured efficiently. Furthermore, the wavelet transform provides a direct measure of local error at each grid point, effectively producing automatically verified solutions. The algorithm is parallelized using an MPI-based domain decomposition approach suitable for a wide range of distributed-memory parallel architectures. The method is applied to the solution of the compressible, reactive Navier-Stokes equations and includes multi-component diffusive transport and chemical kinetics models. Results for the method's parallel performance are reported, and its effectiveness on several challenging compressible reacting flow problems is highlighted.
NASA Astrophysics Data System (ADS)
Bachlechner, Martina E.; Omeltchenko, Andrey; Nakano, Aiichiro; Kalia, Rajiv K.; Vashishta, Priya; Ebbsjö, Ingvar; Madhukar, Anupam
2000-01-01
Mechanical behavior of the Si\\(111\\)/Si3N4\\(0001\\) interface is studied using million atom molecular dynamics simulations. At a critical value of applied strain parallel to the interface, a crack forms on the silicon nitride surface and moves toward the interface. The crack does not propagate into the silicon substrate; instead, dislocations are emitted when the crack reaches the interface. The dislocation loop propagates in the \\(1¯ 1¯1\\) plane of the silicon substrate with a speed of 500 \\(+/-100\\) m/s. Time evolution of the dislocation emission and nature of defects is studied.
Bachlechner; Omeltchenko; Nakano; Kalia; Vashishta; Ebbsjo; Madhukar
2000-01-10
Mechanical behavior of the Si(111)/Si(3)N4(0001) interface is studied using million atom molecular dynamics simulations. At a critical value of applied strain parallel to the interface, a crack forms on the silicon nitride surface and moves toward the interface. The crack does not propagate into the silicon substrate; instead, dislocations are emitted when the crack reaches the interface. The dislocation loop propagates in the (1; 1;1) plane of the silicon substrate with a speed of 500 (+/-100) m/s. Time evolution of the dislocation emission and nature of defects is studied.
Domel, N.D.; Thompson, D.S. )
1991-01-01
The effect of shock impingement on the mixing and combustion of a reacting shear-layer is numerically simulated. Hydrogen fuel is injected at sonic velocity behind a backward facing step in a direction parallel to a supersonic freestream vitiated with H{sub 2}O. The two-dimensional Navier-Stokes equations are solved and explicitly coupled to a chemistry package employing a global, two-step combustion model. The results show that shock impingement enhances the mixing and combustion. 17 refs.
Simulation Neurotechnologies for Advancing Brain Research: Parallelizing Large Networks in NEURON.
Lytton, William W; Seidenstein, Alexandra H; Dura-Bernal, Salvador; McDougal, Robert A; Schürmann, Felix; Hines, Michael L
2016-10-01
Large multiscale neuronal network simulations are of increasing value as more big data are gathered about brain wiring and organization under the auspices of a current major research initiative, such as Brain Research through Advancing Innovative Neurotechnologies. The development of these models requires new simulation technologies. We describe here the current use of the NEURON simulator with message passing interface (MPI) for simulation in the domain of moderately large networks on commonly available high-performance computers (HPCs). We discuss the basic layout of such simulations, including the methods of simulation setup, the run-time spike-passing paradigm, and postsimulation data storage and data management approaches. Using the Neuroscience Gateway, a portal for computational neuroscience that provides access to large HPCs, we benchmark simulations of neuronal networks of different sizes (500-100,000 cells), and using different numbers of nodes (1-256). We compare three types of networks, composed of either Izhikevich integrate-and-fire neurons (I&F), single-compartment Hodgkin-Huxley (HH) cells, or a hybrid network with half of each. Results show simulation run time increased approximately linearly with network size and decreased almost linearly with the number of nodes. Networks with I&F neurons were faster than HH networks, although differences were small since all tested cells were point neurons with a single compartment.
Milind Deo; Chung-Kan Huang; Huabing Wang
2008-08-31
volume of injection at lower rates. However, if oil production can be continued at high water cuts, the discounted cumulative production usually favors higher production rates. The workflow developed during the project was also used to perform multiphase simulations in heterogeneous, fracture-matrix systems. Compositional and thermal-compositional simulators were developed for fractured reservoirs using the generalized framework. The thermal-compositional simulator was based on a novel 'equation-alignment' approach that helped choose the correct variables to solve depending on the number of phases present and the prescribed component partitioning. The simulators were used in steamflooding and in insitu combustion applications. The framework was constructed to be inherently parallel. The partitioning routines employed in the framework allowed generalized partitioning on highly complex fractured reservoirs and in instances when wells (incorporated in these models as line sources) were divided between two or more processors.
NASA Technical Reports Server (NTRS)
Aftosmis, M. J.; Berger, M. J.; Murman, S. M.; Kwak, Dochan (Technical Monitor)
2002-01-01
The proposed paper will present recent extensions in the development of an efficient Euler solver for adaptively-refined Cartesian meshes with embedded boundaries. The paper will focus on extensions of the basic method to include solution adaptation, time-dependent flow simulation, and arbitrary rigid domain motion. The parallel multilevel method makes use of on-the-fly parallel domain decomposition to achieve extremely good scalability on large numbers of processors, and is coupled with an automatic coarse mesh generation algorithm for efficient processing by a multigrid smoother. Numerical results are presented demonstrating parallel speed-ups of up to 435 on 512 processors. Solution-based adaptation may be keyed off truncation error estimates using tau-extrapolation or a variety of feature detection based refinement parameters. The multigrid method is extended to for time-dependent flows through the use of a dual-time approach. The extension to rigid domain motion uses an Arbitrary Lagrangian-Eulerlarian (ALE) formulation, and results will be presented for a variety of two- and three-dimensional example problems with both simple and complex geometry.
NASA Astrophysics Data System (ADS)
Marx, Alain; Lütjens, Hinrich
2017-03-01
A hybrid MPI/OpenMP parallel version of the XTOR-2F code [Lütjens and Luciani, J. Comput. Phys. 229 (2010) 8130] solving the two-fluid MHD equations in full tokamak geometry by means of an iterative Newton-Krylov matrix-free method has been developed. The present work shows that the code has been parallelized significantly despite the numerical profile of the problem solved by XTOR-2F, i.e. a discretization with pseudo-spectral representations in all angular directions, the stiffness of the two-fluid stability problem in tokamaks, and the use of a direct LU decomposition to invert the physical pre-conditioner at every Krylov iteration of the solver. The execution time of the parallelized version is an order of magnitude smaller than the sequential one for low resolution cases, with an increasing speedup when the discretization mesh is refined. Moreover, it allows to perform simulations with higher resolutions, previously forbidden because of memory limitations.
NASA Astrophysics Data System (ADS)
Ren, Rong; Wang, Jin; Jiang, Xiyan; Lu, Yunqing; Xu, Ji
2014-10-01
The finite-difference time-domain (FDTD) method, which solves time-dependent Maxwell's curl equations numerically, has been proved to be a highly efficient technique for numerous applications in electromagnetic. Despite the simplicity of the FDTD method, this technique suffers from serious limitations in case that substantial computer resource is required to solve electromagnetic problems with medium or large computational dimensions, for example in high-index optical devices. In our work, an efficient wavelet-based FDTD model has been implemented and extended in a parallel computation environment, to analyze high-index optical devices. This model is based on Daubechies compactly supported orthogonal wavelets and Deslauriers-Dubuc interpolating functions as biorthogonal wavelet bases, and thus is a very efficient algorithm to solve differential equations numerically. This wavelet-based FDTD model is a high-spatial-order FDTD indeed. Because of the highly linear numerical dispersion properties of this high-spatial-order FDTD, the required discretization can be coarser than that required in the standard FDTD method. In our work, this wavelet-based FDTD model achieved significant reduction in the number of cells, i.e. used memory. Also, as different segments of the optical device can be computed simultaneously, there was a significant gain in computation time. Substantially, we achieved speed-up factors higher than 30 in comparisons to using a single processor. Furthermore, the efficiency of the parallelized computation such as the influence of the discretization and the load sharing between different processors were analyzed. As a conclusion, this parallel-computing model is promising to analyze more complicated optical devices with large dimensions.
Parallel, out-of-core methods for N-body simulation
Salmon, J.; Warren, M.S.
1997-03-01
Hierarchical treecodes have, to a large extent, converted the compute-bound N-body problem into a memory-bound problem. The large ratio of DRAM to disk pricing suggests use of out-of-core techniques to overcome memory capacity limitations. The authors describe a parallel, out-of-core treecode library, targeted at machines with independent secondary storage associated with each processor. Borrowing the space-filling curve techniques from the in-core library, and manually paging, resulting in excellent spatial and temporal locality and very good performance.
Scalable Iterative Solvers Applied to 3D Parallel Simulation of Advanced Semiconductor Devices
NASA Astrophysics Data System (ADS)
García-Loureiro, A. J.; Aldegunde, M.; Seoane, N.
2009-08-01
We have studied the performance of a preconditioned iterative solver to speed up a 3D semiconductor device simulator. Since 3D simulations necessitate large computing resources, the choice of algorithms and their parameters become of utmost importance. This code uses a density gradient drift-diffusion semiconductor transport model based on the finite element method which is one of the most general and complex discretisation techniques. It has been implemented for a distributed memory multiprocessor environment using the Message Passing Interface (MPI) library. We have applied this simulator to a 67 nm effective gate length Si MOSFET.
Applying Parallel Adaptive Methods with GeoFEST/PYRAMID to Simulate Earth Surface Crustal Dynamics
NASA Technical Reports Server (NTRS)
Norton, Charles D.; Lyzenga, Greg; Parker, Jay; Glasscoe, Margaret; Donnellan, Andrea; Li, Peggy
2006-01-01
This viewgraph presentation reviews the use Adaptive Mesh Refinement (AMR) in simulating the Crustal Dynamics of Earth's Surface. AMR simultaneously improves solution quality, time to solution, and computer memory requirements when compared to generating/running on a globally fine mesh. The use of AMR in simulating the dynamics of the Earth's Surface is spurred by future proposed NASA missions, such as InSAR for Earth surface deformation and other measurements. These missions will require support for large-scale adaptive numerical methods using AMR to model observations. AMR was chosen because it has been successful in computation fluid dynamics for predictive simulation of complex flows around complex structures.
NASA Technical Reports Server (NTRS)
Stupl, Jan; Faber, Nicolas; Foster, Cyrus; Yang, Fan Yang; Nelson, Bron; Aziz, Jonathan; Nuttall, Andrew; Henze, Chris; Levit, Creon
2014-01-01
This paper provides an updated efficiency analysis of the LightForce space debris collision avoidance scheme. LightForce aims to prevent collisions on warning by utilizing photon pressure from ground based, commercial off the shelf lasers. Past research has shown that a few ground-based systems consisting of 10 kilowatt class lasers directed by 1.5 meter telescopes with adaptive optics could lower the expected number of collisions in Low Earth Orbit (LEO) by an order of magnitude. Our simulation approach utilizes the entire Two Line Element (TLE) catalogue in LEO for a given day as initial input. Least-squares fitting of a TLE time series is used for an improved orbit estimate. We then calculate the probability of collision for all LEO objects in the catalogue for a time step of the simulation. The conjunctions that exceed a threshold probability of collision are then engaged by a simulated network of laser ground stations. After those engagements, the perturbed orbits are used to re-assess the probability of collision and evaluate the efficiency of the system. This paper describes new simulations with three updated aspects: 1) By utilizing a highly parallel simulation approach employing hundreds of processors, we have extended our analysis to a much broader dataset. The simulation time is extended to one year. 2) We analyze not only the efficiency of LightForce on conjunctions that naturally occur, but also take into account conjunctions caused by orbit perturbations due to LightForce engagements. 3) We use a new simulation approach that is regularly updating the LightForce engagement strategy, as it would be during actual operations. In this paper we present our simulation approach to parallelize the efficiency analysis, its computational performance and the resulting expected efficiency of the LightForce collision avoidance system. Results indicate that utilizing a network of four LightForce stations with 20 kilowatt lasers, 85% of all conjunctions with a
Stage-parallel fully implicit Runge-Kutta solvers for discontinuous Galerkin fluid simulations
NASA Astrophysics Data System (ADS)
Pazner, Will; Persson, Per-Olof
2017-04-01
In this paper, we develop new techniques for solving the large, coupled linear systems that arise from fully implicit Runge-Kutta methods. This method makes use of the iterative preconditioned GMRES algorithm for solving the linear systems, which has seen success for fluid flow problems and discontinuous Galerkin discretizations. By transforming the resulting linear system of equations, one can obtain a method which is much less computationally expensive than the untransformed formulation, and which compares competitively with other time-integration schemes, such as diagonally implicit Runge-Kutta (DIRK) methods. We develop and test several ILU-based preconditioners effective for these large systems. We additionally employ a parallel-in-time strategy to compute the Runge-Kutta stages simultaneously. Numerical experiments are performed on the Navier-Stokes equations using Euler vortex and 2D and 3D NACA airfoil test cases in serial and in parallel settings. The fully implicit Radau IIA Runge-Kutta methods compare favorably with equal-order DIRK methods in terms of accuracy, number of GMRES iterations, number of matrix-vector multiplications, and wall-clock time, for a wide range of time steps.
Introducing ONETEP: linear-scaling density functional simulations on parallel computers.
Skylaris, Chris-Kriton; Haynes, Peter D; Mostofi, Arash A; Payne, Mike C
2005-02-22
We present ONETEP (order-N electronic total energy package), a density functional program for parallel computers whose computational cost scales linearly with the number of atoms and the number of processors. ONETEP is based on our reformulation of the plane wave pseudopotential method which exploits the electronic localization that is inherent in systems with a nonvanishing band gap. We summarize the theoretical developments that enable the direct optimization of strictly localized quantities expressed in terms of a delocalized plane wave basis. These same localized quantities lead us to a physical way of dividing the computational effort among many processors to allow calculations to be performed efficiently on parallel supercomputers. We show with examples that ONETEP achieves excellent speedups with increasing numbers of processors and confirm that the time taken by ONETEP as a function of increasing number of atoms for a given number of processors is indeed linear. What distinguishes our approach is that the localization is achieved in a controlled and mathematically consistent manner so that ONETEP obtains the same accuracy as conventional cubic-scaling plane wave approaches and offers fast and stable convergence. We expect that calculations with ONETEP have the potential to provide quantitative theoretical predictions for problems involving thousands of atoms such as those often encountered in nanoscience and biophysics.
CMAD: A Self-consistent Parallel Code to Simulate the Electron Cloud Build-up and Instabilities
Pivi, M.T.F.; /SLAC
2007-11-07
We present the features of CMAD, a newly developed self-consistent code which simulates both the electron cloud build-up and related beam instabilities. By means of parallel (Message Passing Interface - MPI) computation, the code tracks the beam in an existing (MAD-type) lattice and continuously resolves the interaction between the beam and the cloud at each element location, with different cloud distributions at each magnet location. The goal of CMAD is to simulate single- and coupled-bunch instability, allowing tune shift, dynamic aperture and frequency map analysis and the determination of the secondary electron yield instability threshold. The code is in its phase of development and benchmarking with existing codes. Preliminary results on benchmarking are presented in this paper.
NASA Astrophysics Data System (ADS)
Knapczyk, J.; Tora, G.
2014-08-01
A novel parallel manipulator with 3 legs (2 actuated by linear actuators and one supporting pillar),which is applied in a wheel loader driving simulator, is proposed in this paper. The roll angle and the pitch angle of the platform are derived in closed-form of functions of the variable lengths of two actuators. The linear velocity and acceleration of the selected point and angular velocity of the moving platform are determined and compared with measurement results obtained in the respective point and in the body of the wheel loader. The differences between the desired and actual actuator displacements are used as feedback to compute how much force to send to the actuators as some function of the servo error. A numerical example with a proposed mechanism as a driving simulator is presented
NASA Astrophysics Data System (ADS)
Reuter, K.; Jenko, F.; Forest, C. B.; Bayliss, R. A.
2008-08-01
A parallel implementation of a nonlinear pseudo-spectral MHD code for the simulation of turbulent dynamos in spherical geometry is reported. It employs a dual domain decomposition technique in both real and spectral space. It is shown that this method shows nearly ideal scaling going up to 128 CPUs on Beowulf-type clusters with fast interconnect. Furthermore, the potential of exploiting single precision arithmetic on standard x86 processors is examined. It is pointed out that the MHD code thereby achieves a maximum speedup of 1.7, whereas the validity of the computations is still granted. The combination of both measures will allow for the direct numerical simulation of highly turbulent cases ( 1500
NASA Astrophysics Data System (ADS)
Johnson, Timothy C.; Hammond, Glenn E.; Chen, Xingyuan
2017-02-01
Time-lapse electrical resistivity tomography (ERT) is finding increased application for remotely monitoring processes occurring in the near subsurface in three-dimensions (i.e. 4D monitoring). However, there are few codes capable of simulating the evolution of subsurface resistivity and corresponding tomographic measurements arising from a particular process, particularly in parallel and with an open source license. Herein we describe and demonstrate an electrical resistivity tomography module for the PFLOTRAN subsurface flow and reactive transport simulation code, named PFLOTRAN-E4D. The PFLOTRAN-E4D module operates in parallel using a dedicated set of compute cores in a master-slave configuration. At each time step, the master processes receives subsurface states from PFLOTRAN, converts those states to bulk electrical conductivity, and instructs the slave processes to simulate a tomographic data set. The resulting multi-physics simulation capability enables accurate feasibility studies for ERT imaging, the identification of the ERT signatures that are unique to a given process, and facilitates the joint inversion of ERT data with hydrogeological data for subsurface characterization. PFLOTRAN-E4D is demonstrated herein using a field study of stage-driven groundwater/river water interaction ERT monitoring along the Columbia River, Washington, USA. Results demonstrate the complex nature of subsurface electrical conductivity changes, in both the saturated and unsaturated zones, arising from river stage fluctuations and associated river water intrusion into the aquifer. The results also demonstrate the sensitivity of surface based ERT measurements to those changes over time. PFLOTRAN-E4D is available with the PFLOTRAN development version with an open-source license at https://bitbucket.org/pflotran/pflotran-dev.
The Integrated Simulation Analysis on 6-DOF Electrohydraulic Servo-Control Parallel Platform
NASA Astrophysics Data System (ADS)
Lin, Cheng; Ren, Liangcheng
The unmatched characteristics between dissymmetry hydraulic cylinder and symmetry servo valve while building the model were analyzed. And the simulation analysis adopting dynamic pressure feedback controlling method were proceeded. The unmatched problem has been solved well. The integration simulation study of test bed upon step-input function and sine movement associating with the tri-system of mechanics, hydraulics and control, as well as the static and dynamic control system performance were described in this article.
NASA Astrophysics Data System (ADS)
Debolt, Stephen Edward
Solvent effects were studied and described via molecular dynamics (MD) and free energy perturbation (FEP) simulations using the molecular mechanics program AMBER. The following specific topics were explored:. Polar solvents cause a blue shift of the rm nto pi^* transition band of simple alkyl carbonyl compounds. The ground- versus excited-state solvation effects responsible for the observed solvatochromism are described in terms of the molecular level details of solute-solvent interactions in several modeled solvents spanning the range from polar to nonpolar, including water, methanol, and carbon tetrachloride. The structure and dynamics of octanol media were studied to explore the question: "why is octanol/water media such a good biophase analog?". The formation of linear and cyclic polymers of hydrogen-bonded solvent molecules, micelle-like clusters, and the effects of saturating waters are described. Two small drug-sized molecules, benzene and phenol, were solvated in water-saturated octanol. The solute-solvent structure and dynamics were analysed. The difference in their partitioning free energies was calculated. MD and FEP calculations were adapted for parallel computation, increasing their "speed" or the time span accessible by a simulation. The non-cyclic polyether ionophore salinomycin was studied in methanol solvent via parallel FEP. The path of binding and release for a potassium ion was investigated by calculating the potential of mean force along the "exit vector".
NASA Astrophysics Data System (ADS)
Felder, Gary
2008-10-01
We describe an MPI C++ program that we have written and made available for calculating the evolution of interacting scalar fields in an expanding universe on parallel clusters. The program is a parallel programming extension of the simulation program LATTICEEASY. The ability to run these simulations on parallel clusters, however, greatly extends the range of scales and times that can be simulated. The program is particularly useful for the study of reheating and thermalization after inflation. The program and its full documentation are available on the Web at http://www.science.smith.edu/departments/Physics/fstaff/gfelder/latticeeasy/. In this paper we provide a brief overview of what the program does and what it is useful for. Catalogue identifier: AEBJ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEBJ_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 7469 No. of bytes in distributed program, including test data, etc.: 613 334 Distribution format: tar.gz Programming language: C++/MPI Computer: Cluster. Must have the library FFTW installed Operating system: Any RAM: Typically 4 MB to 1 GB per processor Classification: 1.9 External routines: A single-precision version of the FFTW library (http://www.fftw.org/) must be available on the target machine. Nature of problem: After inflation the universe consisted of interacting fields in a high energy, nonthermal state [1]. The evolution of these fields cannot be described with standard approximation techniques such as linearization, kinetic theory, or Hartree expansion, and must thus be simulated numerically. Fortunately, the fields rapidly acquire large occupation numbers over a range of frequencies, so their evolution can be accurately modeled with classical field theory [2]. The specific fields and
Phase space simulation of collisionless stellar systems on the massively parallel processor
NASA Technical Reports Server (NTRS)
White, Richard L.
1987-01-01
A numerical technique for solving the collisionless Boltzmann equation describing the time evolution of a self gravitating fluid in phase space was implemented on the Massively Parallel Processor (MPP). The code performs calculations for a two dimensional phase space grid (with one space and one velocity dimension). Some results from calculations are presented. The execution speed of the code is comparable to the speed of a single processor of a Cray-XMP. Advantages and disadvantages of the MPP architecture for this type of problem are discussed. The nearest neighbor connectivity of the MPP array does not pose a significant obstacle. Future MPP-like machines should have much more local memory and easier access to staging memory and disks in order to be effective for this type of problem.
NASA Technical Reports Server (NTRS)
Nishikawa, K.-I.; Ganguli, G.; Lee, Y. C.; Palmadesso, P. J.
1989-01-01
A spatially two-dimensional electrostatic PIC simulation code was used to study the stability of a plasma equilibrium characterized by a localized transverse dc electric field and a field-aligned drift for L is much less than Lx, where Lx is the simulation length in the x direction and L is the scale length associated with the dc electric field. It is found that the dc electric field and the field-aligned current can together play a synergistic role to enable the excitation of electrostatic waves even when the threshold values of the field aligned drift and the E x B drift are individually subcritical. The simulation results show that the growing ion waves are associated with small vortices in the linear stage, which evolve to the nonlinear stage dominated by larger vortices with lower frequencies.
NASA Technical Reports Server (NTRS)
Lake, George; Quinn, Thomas; Richardson, Derek C.; Stadel, Joachim
1999-01-01
"The orbit of any one planet depends on the combined motion of all the planets, not to mention the actions of all these on each other. To consider simultaneously all these causes of motion and to define these motions by exact laws allowing of convenient calculation exceeds, unless I am mistaken, the forces of the entire human intellect" -Isaac Newton 1687. Epochal surveys are throwing down the gauntlet for cosmological simulation. We describe three keys to meeting the challenge of N-body simulation: adaptive potential solvers, adaptive integrators and volume renormalization. With these techniques and a dedicated Teraflop facility, simulation can stay even with observation of the Universe. We also describe some problems in the formation and stability of planetary systems. Here, the challenge is to perform accurate integrations that retain Hamiltonian properties for 10(exp 13) timesteps.
2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation
Warren, Michael S.
2014-01-01
We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k (2 18 ) processors. We present error analysis and scientific application results from a series of more than ten 69 billion (4096 3 ) particle cosmological simulations, accounting for 4×10 20 floating point operations. These results include the first simulations using the new constraints on the standard model of cosmology from the Planck satellite. Our simulations set a new standard for accuracymore » and scientific throughput, while meeting or exceeding the computational efficiency of the latest generation of hybrid TreePM N-body methods.« less
Design of a high-speed digital processing element for parallel simulation
NASA Technical Reports Server (NTRS)
Milner, E. J.; Cwynar, D. S.
1983-01-01
A prototype of a custom designed computer to be used as a processing element in a multiprocessor based jet engine simulator is described. The purpose of the custom design was to give the computer the speed and versatility required to simulate a jet engine in real time. Real time simulations are needed for closed loop testing of digital electronic engine controls. The prototype computer has a microcycle time of 133 nanoseconds. This speed was achieved by: prefetching the next instruction while the current one is executing, transporting data using high speed data busses, and using state of the art components such as a very large scale integration (VLSI) multiplier. Included are discussions of processing element requirements, design philosophy, the architecture of the custom designed processing element, the comprehensive instruction set, the diagnostic support software, and the development status of the custom design.
NASA Astrophysics Data System (ADS)
Küchlin, Stephan; Jenny, Patrick
2017-01-01
A major challenge for the conventional Direct Simulation Monte Carlo (DSMC) technique lies in the fact that its computational cost becomes prohibitive in the near continuum regime, where the Knudsen number (Kn)-characterizing the degree of rarefaction-becomes small. In contrast, the Fokker-Planck (FP) based particle Monte Carlo scheme allows for computationally efficient simulations of rarefied gas flows in the low and intermediate Kn regime. The Fokker-Planck collision operator-instead of performing binary collisions employed by the DSMC method-integrates continuous stochastic processes for the phase space evolution in time. This allows for time step and grid cell sizes larger than the respective collisional scales required by DSMC. Dynamically switching between the FP and the DSMC collision operators in each computational cell is the basis of the combined FP-DSMC method, which has been proven successful in simulating flows covering the whole Kn range. Until recently, this algorithm had only been applied to two-dimensional test cases. In this contribution, we present the first general purpose implementation of the combined FP-DSMC method. Utilizing both shared- and distributed-memory parallelization, this implementation provides the capability for simulations involving many particles and complex geometries by exploiting state of the art computer cluster technologies.
Prinz, Jan-Hendrik; Chondera, John D; Pande, Vijay S; Swope, William C; Smith, Jeremy C; Noe, F
2011-01-01
Parallel tempering (PT) molecular dynamics simulations have been extensively investigated as a means of efficient sampling of the configurations of biomolecular systems. Recent work has demonstrated how the short physical trajectories generated in PT simulations of biomolecules can be used to construct the Markov models describing biomolecular dynamics at each simulated temperature. While this approach describes the temperature-dependent kinetics, it does not make optimal use of all available PT data, instead estimating the rates at a given temperature using only data from that temperature. This can be problematic, as some relevant transitions or states may not be sufficiently sampled at the temperature of interest, but might be readily sampled at nearby temperatures. Further, the comparison of temperature-dependent properties can suffer from the false assumption that data collected from different temperatures are uncorrelated. We propose here a strategy in which, by a simple modification of the PT protocol, the harvested trajectories can be reweighted, permitting data from all temperatures to contribute to the estimated kinetic model. The method reduces the statistical uncertainty in the kinetic model relative to the single temperature approach and provides estimates of transition probabilities even for transitions not observed at the temperature of interest. Further, the method allows the kinetics to be estimated at temperatures other than those at which simulations were run. We illustrate this method by applying it to the generation of a Markov model of the conformational dynamics of the solvated terminally blocked alanine peptide.
Parallel code NSBC: Simulations of relativistic nuclei scattering by a bent crystal
NASA Astrophysics Data System (ADS)
Babaev, A. A.
2014-01-01
The presented program was designed to simulate the passage of relativistic nuclei through a bent crystal. Namely, the input data is related to a nuclei beam. The nuclei move into the crystal under planar channeling and quasichanneling conditions. The program realizes the numerical algorithm to evaluate the trajectory of nucleus in the bent crystal. The program output is formed by the projectile motion data including the angular distribution of nuclei behind the crystal. The program could be useful to simulate the particle tracking at the accelerator facilities used the crystal collimation systems. The code has been written on C++ and designed for the multiprocessor systems (clusters).
Parallel adaptive Cartesian upwind methods for shock-driven multiphysics simulation
Deiterding, Ralf
2011-01-01
The multiphysics fluid-structure interaction simulation of shock-loaded thin-walled structures requires the dynamic coupling of a shock-capturing flow solver to a solid mechanics solver for large deformations. By combining a Cartesian embedded boundary approach with dynamic mesh adaptation a generic software framework for such flow solvers has been constructed that allows easy exchange of the specific hydrodynamic finite volume upwind scheme and coupling to various explicit finite element solid dynamics solvers. The paper gives an overview of the computational approach and presents first simulations that couple the software to the general purpose solid dynamics code DYNA3D.
2013-08-01
as elastic constants, minimized crystal structures from various loading conditions and generalized stacking fault energy surfaces of the minimized... structures . Detailed descriptions of Matlab scripts are also provided for post-processing the simulation data. These procedures are well suited to...minimization of the experimental crystal structure . Each step in the flowchart references the section containing the description of the files used
1985-11-18
A R. Newton reviewed this thesis, and through discussion offered many suggestions on circuit simulation techniques. Along with Prof. Desoer and Prof...Berkeley, Electronics Research Laboratory, 1985 ’ [35] W. Rudin, Functional Analysis, McGraw-HiD, 1969. [36] C. A. Desoer and E. S. Kuh, Basic
Massively Parallel Simulation of Uranium Migration at the Hanford 300 Area
NASA Astrophysics Data System (ADS)
Hammond, G. E.; Lichtner, P. C.
2009-12-01
Effectively utilized, high-performance computing can have a significant impact on subsurface science by enabling researchers to employ models with ever increasing sophistication and complexity that provide a more accurate and mechanistic representation of subsurface processes. As part of the U.S. Department of Energy’s SciDAC-2 program, the petascale subsurface reactive multiphase flow and transport code PFLOTRAN has been developed and is currently being employed to simulate uranium migration at the Hanford 300 Area. PFLOTRAN has been run on subsurface problems composed of up to two billion degrees of freedom and utilizing up to 131,072 processor cores on the world’s largest open science supercomputer Jaguar. This presentation focuses on the application of PFLOTRAN to simulate geochemical transport of uranium at Hanford using the Jaguar supercomputer. The Hanford 300 Area presents many challenges with regard to simulating radionuclide transport. Aside from the many conceptual uncertainties in the problem such as the choice of initial conditions, rapid fluctuations in the Columbia River stage, which occur on an hourly basis with several meter variations, can have a dramatic impact on the size of the uranium plume, its migration direction, and the rate at which it migrates to the river. Due to the immense size of the physical domain needed to include the transient river boundary condition, the grid resolution required to preserve accuracy, and the number of chemical components simulated, 3D simulation of the Hanford 300 Area would be unsustainable on a single workstation, and thus high-performance computing is essential.
Byers, J.A.; Williams, T.J.; Cohen, B.I.; Dimits, A.M.
1994-04-27
One of the programs of the Magnetic fusion Energy (MFE) Theory and computations Program is studying the anomalous transport of thermal energy across the field lines in the core of a tokamak. We use the method of gyrokinetic particle-in-cell simulation in this study. For this LDRD project we employed massively parallel processing, new algorithms, and new algorithms, and new formal techniques to improve this research. Specifically, we sought to take steps toward: researching experimentally-relevant parameters in our simulations, learning parallel computing to have as a resource for our group, and achieving a 100 {times} speedup over our starting-point Cray2 simulation code`s performance.
``SAFFMAN-TAYLOR'' Finger in 2d Parallel Viscous: BGK Lattice Gas Simulations
NASA Astrophysics Data System (ADS)
Salin, Dominique; Rakotomalala, Nicole; Watzky, Philippe
1996-11-01
We study the displacement of miscible fluids between two parallel plates for different values of the Peclet number Pe and of the viscosity ratio M. The full Navier-Stokes problem is addressed. We use the BGK lattice gas method, which is well suited for miscible fluids and allows to introduce molecular diffusion at the microscopic scale of the lattice. This numerical experiment leads to a symmetric concentration profile about the middle of the gap between the plates. At Pe numbers of the order of 1, mixing involves diffusion and advection in the flow direction. At large Pe, the fluids do not mix and an interface between them can be defined. Moreover, above M ~ 10, the interface becomes a well defined finger, the reduced width of which tends to λ_∞=0.56 at large values of M. Assuming that miscible fluids at high Pe numbers are similar to immiscible fluids at high capillary numbers, we find the analytical shape of the finger, using an extrapolation of the Reinelt-Saffman calculations for a Stokes immiscible flow. Surprisingly, the result is that our finger can be deduced from the celebrated Saffman-Taylor' s one, obtained in a potential flow, by a streching in the flow direction by a numerical factor of 2.125.
Massively parallel first-principles simulation of electron dynamics in materials
Draeger, Erik W.; Andrade, Xavier; Gunnels, John A.; ...
2017-03-04
Here we present a highly scalable, parallel implementation of first-principles electron dynamics coupled with molecular dynamics (MD). By using optimized kernels, network topology aware communication, and by fully distributing all terms in the time-dependent Kohn–Sham equation, we demonstrate unprecedented time to solution for disordered aluminum systems of 2000 atoms (22,000 electrons) and 5400 atoms (59,400 electrons), with wall clock time as low as 7.5 s per MD time step. Despite a significant amount of non-local communication required in every iteration, we achieved excellent strong scaling and sustained performance on the Sequoia Blue Gene/Q supercomputer at LLNL. We obtained up tomore » 59% of the theoretical sustained peak performance on 16,384 nodes and performance of 8.75 Petaflop/s (43% of theoretical peak) on the full 98,304 node machine (1,572,864 cores). Lastly, scalable explicit electron dynamics allows for the study of phenomena beyond the reach of standard first-principles MD, in particular, materials subject to strong or rapid perturbations, such as pulsed electromagnetic radiation, particle irradiation, or strong electric currents.« less
Parallelization of Rocket Engine Simulator Software (P.R.E.S.S.)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet
1999-01-01
Parallelization of Rocket Engine System Software (PRESS) project is part of a collaborative effort with Southern University at Baton Rouge (SUBR), University of West Florida (UWF), and Jackson State University (JSU). The project has started on October 19, 1995, and after a three-year period corresponding to project phases and fiscal-year funding by NASA Lewis Research Center (now Glenn Research Center), has ended on October 18, 1998. The one-year no-cost extension period was granted on June 7, 1998, until October 19, 1999. The aim of this one year no-cost extension period was to carry out further research to complete the work and lay the groundwork for subsequent research in the area of aerospace engine design optimization software tools. The previous progress for the research has been reported in great detail in respective interim and final research progress reports, seven of them, in all. While the purpose of this report is to be a final summary and an valuative view of the entire work since the first year funding, the following is a quick recap of the most important sections of the interim report dated April 30, 1999.
Modeling and simulation of a Stewart platform type parallel structure robot
NASA Technical Reports Server (NTRS)
Lim, Gee Kwang; Freeman, Robert A.; Tesar, Delbert
1989-01-01
The kinematics and dynamics of a Stewart Platform type parallel structure robot (NASA's Dynamic Docking Test System) were modeled using the method of kinematic influence coefficients (KIC) and isomorphic transformations of system dependence from one set of generalized coordinates to another. By specifying the end-effector (platform) time trajectory, the required generalized input forces which would theoretically yield the desired motion were determined. It was found that the relationship between the platform motion and the actuators motion was nonlinear. In addition, the contribution to the total generalized forces, required at the actuators, from the acceleration related terms were found to be more significant than the velocity related terms. Hence, the curve representing the total required actuator force generally resembled the curve for the acceleration related force. Another observation revealed that the acceleration related effective inertia matrix I sub dd had the tendency to decouple, with the elements on the main diagonal of I sub dd being larger than the off-diagonal elements, while the velocity related inertia power array P sub ddd did not show such tendency. This tendency results in the acceleration related force curve of a given actuator resembling the acceleration profile of that particular actuator. Furthermore, it was indicated that the effective inertia matrix for the legs is more decoupled than that for the platform. These observations provide essential information for further research to develop an effective control strategy for real-time control of the Dynamic Docking Test System.
A Study of Parallel Software Development with HPF and MPI for Composite Process Modeling Simulations
2011-01-01
molding ( RTM ) and its variants such as the vacuum-assisted resin transfer molding ( VARTM ). Both of these techniques use ber pre- forms made from...leading to warped ber tows. Failure to achieve appropriate vacuum in VARTM may lead to a less than desired ber fraction. Two prominent and potentially...and its physical accuracy and computational eciency used in our simulations is brie y discussed. 3.1 Resin Impregnation and Process Modeling in RTM
An Empirical Development of Parallelization Guidelines for Time-Driven Simulation
1989-12-01
ground-based lasers and space-based relay mirrors " BMD models only boost- phase defense engagements against strategic ballistic missiles A booster...engagement occurs in BMDSIM when the following conditions are met: 1. A booster is in boost phase and above a minimum engagement altitude 2. A...trajectories. The boost phase trajectory is based on curve fits of detailed ICBM and SLBM trajec- tory simulation data. Overall trajectory and orbital
NASA Astrophysics Data System (ADS)
Leggett, C.; Binet, S.; Jackson, K.; Levinthal, D.; Tatarkhanov, M.; Yao, Y.
2011-12-01
Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance, to producing chip designs with multi- and many-core architectures. Further the cores themselves can run multiple threads as a zero overhead context switch allowing low level resource sharing (Intel Hyperthreading). To maximize bandwidth and minimize memory latency, memory access has become non uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the Atlas event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores, by means of event based parallelism, and final stage I/O synchronization. However, initial studies on 8 andl6 core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level, and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which due to it's size, places huge burdens on the memory infrastructure of today's processors.
Design and Sensitivity Analysis Simulation of a Novel 3D Force Sensor Based on a Parallel Mechanism
Yang, Eileen Chih-Ying
2016-01-01
Automated force measurement is one of the most important technologies in realizing intelligent automation systems. However, while many methods are available for micro-force sensing, measuring large three-dimensional (3D) forces and loads remains a significant challenge. Accordingly, the present study proposes a novel 3D force sensor based on a parallel mechanism. The transformation function and sensitivity index of the proposed sensor are analytically derived. The simulation results show that the sensor has a larger effective measuring capability than traditional force sensors. Moreover, the sensor has a greater measurement sensitivity for horizontal forces than for vertical forces over most of the measurable force region. In other words, compared to traditional force sensors, the proposed sensor is more sensitive to shear forces than normal forces. PMID:27999246
NASA Astrophysics Data System (ADS)
Theodorakis, Panagiotis E.; Dellago, Christoph; Kahl, Gerhard
2013-01-01
We discuss a coarse-grained model recently proposed by Starr and Sciortino [J. Phys.: Condens. Matter 18, L347 (2006), 10.1088/0953-8984/18/26/L02] for spherical particles functionalized with short single DNA strands. The model incorporates two key aspects of DNA hybridization, i.e., the specificity of binding between DNA bases and the strong directionality of hydrogen bonds. Here, we calculate the effective potential between two DNA-functionalized particles of equal size using a parallel replica protocol. We find that the transition from bonded to unbonded configurations takes place at considerably lower temperatures compared to those that were originally predicted using standard simulations in the canonical ensemble. We put particular focus on DNA-decorations of tetrahedral and octahedral symmetry, as they are promising candidates for the self-assembly into a single-component diamond structure. Increasing colloid size hinders hybridization of the DNA strands, in agreement with experimental findings.
NASA Astrophysics Data System (ADS)
Ghosal, Ashitava; Shyam, R. B. Ashith
2016-05-01
There is an increased thrust to harvest solar energy in India to meet increasing energy requirements and to minimize imported fossil fuels. In a solar power tower system, an array of tracking mirrors or heliostats are used to concentrate the incident solar energy on an elevated stationary receiver and then the thermal energy converted to electricity using a heat engine. The conventional method of tracking are the Azimuth-Elevation (Az-El) or Target-Aligned (T-A) mount. In both the cases, the mirror is rotated about two mutually perpendicular axes and is supported at the center using a pedestal which is fixed to the ground. In this paper, a three degree-of-freedom parallel manipulator, namely the 3-RPS, is proposed for tracking the sun in a solar power tower system. We present modeling, simulation and design of the 3-RPS parallel manipulator and show its advantages over conventional Az-El and T-A mounts. The 3-RPS manipulator consists of three rotary (R), three prismatic (P) and three spherical (S) joints and the mirror assembly is mounted at three points in contrast to the Az-El and T-A mounts. The kinematic equations for sun tracking are derived for the 3-RPS manipulator and from the simulations, we obtain the range of motion of the rotary, prismatic and spherical joints. Since the mirror assembly is mounted at three points, the wind load and self-weight are distributed and as a consequence, the deflections due to loading are smaller than in conventional mounts. It is shown that the weight of the supporting structure is between 15% and 65% less than that of conventional systems. Hence, even though one additional actuator is used, the larger area mirrors can be used and costs can be reduced.
Wendel, D. E.; Olson, D. K.; Hesse, M.; Kuznetsova, M.; Adrian, M. L.; Aunai, N.; Karimabadi, H.; Daughton, W.
2013-12-15
We investigate the distribution of parallel electric fields and their relationship to the location and rate of magnetic reconnection in a large particle-in-cell simulation of 3D turbulent magnetic reconnection with open boundary conditions. The simulation's guide field geometry inhibits the formation of simple topological features such as null points. Therefore, we derive the location of potential changes in magnetic connectivity by finding the field lines that experience a large relative change between their endpoints, i.e., the quasi-separatrix layer. We find a good correspondence between the locus of changes in magnetic connectivity or the quasi-separatrix layer and the map of large gradients in the integrated parallel electric field (or quasi-potential). Furthermore, we investigate the distribution of the parallel electric field along the reconnecting field lines. We find the reconnection rate is controlled by only the low-amplitude, zeroth and first–order trends in the parallel electric field while the contribution from fluctuations of the parallel electric field, such as electron holes, is negligible. The results impact the determination of reconnection sites and reconnection rates in models and in situ spacecraft observations of 3D turbulent reconnection. It is difficult through direct observation to isolate the loci of the reconnection parallel electric field amidst the large amplitude fluctuations. However, we demonstrate that a positive slope of the running sum of the parallel electric field along the field line as a function of field line length indicates where reconnection is occurring along the field line.
A Modified Parallel Tree Code for N-Body Simulation of the Large-Scale Structure of the Universe
NASA Astrophysics Data System (ADS)
Becciani, U.; Antonuccio-Delogu, V.; Gambera, M.
2000-09-01
N-body codes for performing simulations of the origin and evolution of the large-scale structure of the universe have improved significantly over the past decade in terms of both the resolution achieved and the reduction of the CPU time. However, state-of-the-art N-body codes hardly allow one to deal with particle numbers larger than a few 107, even on the largest parallel systems. In order to allow simulations with larger resolution, we have first reconsidered the grouping strategy as described in J. Barnes (1990, J. Comput. Phys. 87, 161) (hereafter B90) and applied it with some modifications to our WDSH-PT (Work and Data SHaring-Parallel Tree) code (U. Becciani et al., 1996, Comput. Phys. Comm. 99, 1). In the first part of this paper we will give a short description of the code adopting the algorithm of J. E. Barnes and P. Hut (1986, Nature 324, 446) and in particular the memory and work distribution strategy applied to describe the data distribution on a CC-NUMA machine like the CRAY-T3E system. In very large simulations (typically N>=107), due to network contention and the formation of clusters of galaxies, an uneven load easily verifies. To remedy this, we have devised an automatic work redistribution mechanism which provided a good dynamic load balance without adding significant overhead. In the second part of the paper we describe the modification to the Barnes grouping strategy we have devised to improve the performance of the WDSH-PT code. We will use the property that nearby particles have similar interaction lists. This idea has been checked in B90, where an interaction list is built which applies everywhere within a cell Cgroup containing a small number of particles Ncrit. B90 reuses this interaction list for each particle p∈Cgroup in the cell in turn. We will assume each particle p to have the same interaction list. We consider that the agent force Fp on a particle p can be decomposed into two terms Fp=Ffar+Fnear. The first term Ffar is the same for
A comparison between parallelization approaches in molecular dynamics simulations on GPUs.
Rovigatti, Lorenzo; Sulc, Petr; Reguly, István Z; Romano, Flavio
2015-01-05
We test the relative performances of two different approaches to the computation of forces for molecular dynamics simulations on graphics processing units. A "vertex-based" approach, where a computing thread is started per particle, is compared to an "edge-based" approach, where a thread is started per each potentially non-zero interaction. We find that the former is more efficient for systems with many simple interactions per particle while the latter is more efficient if the system has more complicated interactions or fewer of them. By comparing computation times on more and less recent graphics processing unit technology, we predict that, if the current trend of increasing the number of processing cores--as opposed to their computing power--remains, the "edge-based" approach will gradually become the most efficient choice in an increasing number of cases.
A Moving Window Technique in Parallel Finite Element Time Domain Electromagnetic Simulation
Lee, Lie-Quan; Candel, Arno; Ng, Cho; Ko, Kwok; /SLAC
2010-06-07
A moving window technique for the finite element time domain (FETD) method is developed to simulate the propagation of electromagnetic waves induced by the transit of a charged particle beam inside large and long structures. The window moving along with the beam in the computational domain adopts high-order finite-element basis functions through p refinement and/or a high-resolution mesh through h refinement so that a sufficient accuracy is attained with substantially reduced computational costs. Algorithms to transfer discretized fields from one mesh to another, which are the key to implementing a moving window in a finite-element unstructured mesh, are presented. Numerical experiments are carried out using the moving window technique to compute short-range wakefields in long accelerator structures. The results are compared with those obtained from the normal FETD method and the advantages of using the moving window technique are discussed.
Simulation of Ionospheric E-Region Plasma Turbulence with a Massively Parallel Hybrid PIC/Fluid Code
NASA Astrophysics Data System (ADS)
Young, M.; Oppenheim, M. M.; Dimant, Y. S.
2015-12-01
The Farley-Buneman (FB) and gradient drift (GD) instabilities are plasma instabilities that occur at roughly 100 km in the equatorial E-region ionosphere. They develop when ion-neutral collisions dominate ion motion while electron motion is affected by both electron-neutral collisions and the background magnetic field. GD drift waves grow when the background density gradient and electric field are aligned; FB waves grow when the background electric field causes electrons to E × B drift with a speed slightly larger than the ion acoustic speed. Theory predicts that FB and GD turbulence should develop in the same plasma volume when GD waves create a perturbation electric field that exceeds the threshold value for FB turbulence. However, ionospheric radars, which regularly observe meter-scale irregularities associated with FB turbulence, must infer kilometer-scale GD dynamics rather than observe them directly. Numerical simulations have been unable to simultaneously resolve GD and FB structure. We present results from a parallelized hybrid simulation that uses a particle-in-cell (PIC) method for ions while modeling electrons as an inertialess, quasi-neutral fluid. This approach allows us to reach length scales of hundreds of meters to kilometers with sub-meter resolution, but requires solving a large linear system derived from an elliptic PDE that depends on plasma density, ion flux, and electron parameters. We solve the resultant linear system at each time step via the Portable Extensible Toolkit for Scientific Computing (PETSc). We compare results of simulated FB turbulence from this model to results from a thoroughly tested PIC code and describe progress toward the first simultaneous simulations of FB and GD instabilities. This model has immediate applications to radar observations of the E-region ionosphere, as well as potential applications to the F-region ionosphere and the chromosphere of the Sun.
Research in parallel computing
NASA Technical Reports Server (NTRS)
Ortega, James M.; Henderson, Charles
1994-01-01
This report summarizes work on parallel computations for NASA Grant NAG-1-1529 for the period 1 Jan. - 30 June 1994. Short summaries on highly parallel preconditioners, target-specific parallel reductions, and simulation of delta-cache protocols are provided.
Bisetti, Fabrizio; Attili, Antonio; Pitsch, Heinz
2014-01-01
Combustion of fossil fuels is likely to continue for the near future due to the growing trends in energy consumption worldwide. The increase in efficiency and the reduction of pollutant emissions from combustion devices are pivotal to achieving meaningful levels of carbon abatement as part of the ongoing climate change efforts. Computational fluid dynamics featuring adequate combustion models will play an increasingly important role in the design of more efficient and cleaner industrial burners, internal combustion engines, and combustors for stationary power generation and aircraft propulsion. Today, turbulent combustion modelling is hindered severely by the lack of data that are accurate and sufficiently complete to assess and remedy model deficiencies effectively. In particular, the formation of pollutants is a complex, nonlinear and multi-scale process characterized by the interaction of molecular and turbulent mixing with a multitude of chemical reactions with disparate time scales. The use of direct numerical simulation (DNS) featuring a state of the art description of the underlying chemistry and physical processes has contributed greatly to combustion model development in recent years. In this paper, the analysis of the intricate evolution of soot formation in turbulent flames demonstrates how DNS databases are used to illuminate relevant physico-chemical mechanisms and to identify modelling needs. PMID:25024412
Bisetti, Fabrizio; Attili, Antonio; Pitsch, Heinz
2014-08-13
Combustion of fossil fuels is likely to continue for the near future due to the growing trends in energy consumption worldwide. The increase in efficiency and the reduction of pollutant emissions from combustion devices are pivotal to achieving meaningful levels of carbon abatement as part of the ongoing climate change efforts. Computational fluid dynamics featuring adequate combustion models will play an increasingly important role in the design of more efficient and cleaner industrial burners, internal combustion engines, and combustors for stationary power generation and aircraft propulsion. Today, turbulent combustion modelling is hindered severely by the lack of data that are accurate and sufficiently complete to assess and remedy model deficiencies effectively. In particular, the formation of pollutants is a complex, nonlinear and multi-scale process characterized by the interaction of molecular and turbulent mixing with a multitude of chemical reactions with disparate time scales. The use of direct numerical simulation (DNS) featuring a state of the art description of the underlying chemistry and physical processes has contributed greatly to combustion model development in recent years. In this paper, the analysis of the intricate evolution of soot formation in turbulent flames demonstrates how DNS databases are used to illuminate relevant physico-chemical mechanisms and to identify modelling needs.
NASA Astrophysics Data System (ADS)
Shay, M. A.; Drake, J. F.
2009-12-01
In a recent substorm case study using THEMIS data [1], it was inferred that auroral intensification occurred 96 seconds after reconnection onset initiated a substorm in the magnetotail. These conclusions have been the subject of some controversy [2,3]. The time delay between reconnection and auroral intensification requires a propagation speed significantly faster than can be explained by Alfvén waves. Kinetic Alfvén waves, however, can be much faster and could possibly explain the time lag. To test this possiblity, we simulate large scale reconnection events with the kinetic PIC code P3D and examine the disturbances on a magnetic field line as it propagates through a reconnection region. In the regions near the separatrices but relatively far from the x-line, the propagation physics is expected to be governed by the physics of kinetic Alfvén waves. Indeed, we find that the propagation speed of the magnetic disturbance roughly scales with kinetic Alfvén speeds. We also examine energization of electrons due to this disturbance. Consequences for our understanding of substorms will be discussed. [1] Angelopoulos, V. et al., Science, 321, 931, 2008. [2] Lui, A. T. Y., Science, 324, 1391-b, 2009. [3] Angelopoulos, V. et al., Science, 324, 1391-c, 2009.
NASA Astrophysics Data System (ADS)
Poulet, Thomas; Paesold, Martin; Veveakis, Manolis
2017-03-01
Faults play a major role in many economically and environmentally important geological systems, ranging from impermeable seals in petroleum reservoirs to fluid pathways in ore-forming hydrothermal systems. Their behavior is therefore widely studied and fault mechanics is particularly focused on the mechanisms explaining their transient evolution. Single faults can change in time from seals to open channels as they become seismically active and various models have recently been presented to explain the driving forces responsible for such transitions. A model of particular interest is the multi-physics oscillator of Alevizos et al. (J Geophys Res Solid Earth 119(6), 4558-4582, 2014) which extends the traditional rate and state friction approach to rate and temperature-dependent ductile rocks, and has been successfully applied to explain spatial features of exposed thrusts as well as temporal evolutions of current subduction zones. In this contribution we implement that model in REDBACK, a parallel open-source multi-physics simulator developed to solve such geological instabilities in three dimensions. The resolution of the underlying system of equations in a tightly coupled manner allows REDBACK to capture appropriately the various theoretical regimes of the system, including the periodic and non-periodic instabilities. REDBACK can then be used to simulate the drastic permeability evolution in time of such systems, where nominally impermeable faults can sporadically become fluid pathways, with permeability increases of several orders of magnitude.
NASA Technical Reports Server (NTRS)
Bruno, John
1984-01-01
The results of an investigation into the feasibility of using the MPP for direct and large eddy simulations of the Navier-Stokes equations is presented. A major part of this study was devoted to the implementation of two of the standard numerical algorithms for CFD. These implementations were not run on the Massively Parallel Processor (MPP) since the machine delivered to NASA Goddard does not have sufficient capacity. Instead, a detailed implementation plan was designed and from these were derived estimates of the time and space requirements of the algorithms on a suitably configured MPP. In addition, other issues related to the practical implementation of these algorithms on an MPP-like architecture were considered; namely, adaptive grid generation, zonal boundary conditions, the table lookup problem, and the software interface. Performance estimates show that the architectural components of the MPP, the Staging Memory and the Array Unit, appear to be well suited to the numerical algorithms of CFD. This combined with the prospect of building a faster and larger MMP-like machine holds the promise of achieving sustained gigaflop rates that are required for the numerical simulations in CFD.
Rodgers, A; Matzel, E; Pasyanos, M; Petersson, A; Sjogreen, B; Bono, C; Vorobiev, O; Antoun, T; Walter, W; Myers, S; Lomov, I
2008-07-07
The development of accurate numerical methods to simulate wave propagation in three-dimensional (3D) earth models and advances in computational power offer exciting possibilities for modeling the motions excited by underground nuclear explosions. This presentation will describe recent work to use new numerical techniques and parallel computing to model earthquakes and underground explosions to improve understanding of the wave excitation at the source and path-propagation effects. Firstly, we are using the spectral element method (SEM, SPECFEM3D code of Komatitsch and Tromp, 2002) to model earthquakes and explosions at regional distances using available 3D models. SPECFEM3D simulates anelastic wave propagation in fully 3D earth models in spherical geometry with the ability to account for free surface topography, anisotropy, ellipticity, rotation and gravity. Results show in many cases that 3D models are able to reproduce features of the observed seismograms that arise from path-propagation effects (e.g. enhanced surface wave dispersion, refraction, amplitude variations from focusing and defocusing, tangential component energy from isotropic sources). We are currently investigating the ability of different 3D models to predict path-specific seismograms as a function of frequency. A number of models developed using a variety of methodologies are available for testing. These include the WENA/Unified model of Eurasia (e.g. Pasyanos et al 2004), the global CUB 2.0 model (Shapiro and Ritzwoller, 2002), the partitioned waveform model for the Mediterranean (van der Lee et al., 2007) and stochastic models of the Yellow Sea Korean Peninsula region (Pasyanos et al., 2006). Secondly, we are extending our Cartesian anelastic finite difference code (WPP of Nilsson et al., 2007) to model the effects of free-surface topography. WPP models anelastic wave propagation in fully 3D earth models using mesh refinement to increase computational speed and improve memory efficiency. Thirdly
NASA Astrophysics Data System (ADS)
Lockard, David Patrick
This thesis makes contributions towards the use of computational aeroacoustics (CAA) as a tool for noise analysis. CAA uses numerical methods to simulate acoustic phenomena. CAA algorithms have been shown to reproduce wave propagation much better than traditional computational fluid dynamics (CFD) methods. In the current approach, a finite-difference, time-domain algorithm is used to simulate unsteady, compressible flows. Dispersion-relation-preserving methodology is used to extend the range of frequencies that can be represented properly by the scheme. Since CAA algorithms are relatively inefficient at obtaining a steady-state solution, multigrid methods are applied to accelerate the convergence. All of the calculations are performed on parallel computers. Excellent speedup ratios are obtained for the explicit, time-stepping algorithm used in this research. A common problem in the area of broadband noise is the prediction of the acoustic field generated by a vortical gust impinging on a solid body. The problem is modeled initially in two-dimensions by a flat plate experiencing a uniform mean flow with a sinusoidal, vertical velocity perturbation. Good agreement is obtained with results from semi-analytic methods for several gust frequencies. Then, a cascade of plates is used to simulate a turbomachinery blade row. A new approach is used to impose the vortical disturbance inside the computational domain rather than imposing it at the computational boundary. The influence of the mean flow on the radiated noise is examined by considering NACA0012 and RAE2822 airfoils. After a steady-state is obtained from the multigrid method, the un-steady simulation is used to model the vortical gust's interaction with the airfoil. The mean loading on the airfoil is shown to have a significant effect on the directivity of the sound with the strongest influence observed for high frequencies. Camber is shown to have a similar effect as the angle of attack. A three-dimensional problem
NASA Astrophysics Data System (ADS)
Dindar, Saleh; Ford, Eric B.; Juric, Mario; Yeo, Young In; Gao, Jianwei; Boley, Aaron C.; Nelson, Benjamin; Peters, Jörg
2013-10-01
We present Swarm-NG, a C++ library for the efficient direct integration of many n-body systems using a Graphics Processing Unit (GPU), such as NVIDIA's Tesla T10 and M2070 GPUs. While previous studies have demonstrated the benefit of GPUs for n-body simulations with thousands to millions of bodies, Swarm-NG focuses on many few-body systems, e.g., thousands of systems with 3…15 bodies each, as is typical for the study of planetary systems. Swarm-NG parallelizes the simulation, including both the numerical integration of the equations of motion and the evaluation of forces using NVIDIA's "Compute Unified Device Architecture" (CUDA) on the GPU. Swarm-NG includes optimized implementations of 4th order time-symmetrized Hermite integration and mixed variable symplectic integration, as well as several sample codes for other algorithms to illustrate how non-CUDA-savvy users may themselves introduce customized integrators into the Swarm-NG framework. To optimize performance, we analyze the effect of GPU-specific parameters on performance under double precision. For an ensemble of 131072 planetary systems, each containing three bodies, the NVIDIA Tesla M2070 GPU outperforms a 6-core Intel Xeon X5675 CPU by a factor of ˜2.75. Thus, we conclude that modern GPUs offer an attractive alternative to a cluster of CPUs for the integration of an ensemble of many few-body systems. Applications of Swarm-NG include studying the late stages of planet formation, testing the stability of planetary systems and evaluating the goodness-of-fit between many planetary system models and observations of extrasolar planet host stars (e.g., radial velocity, astrometry, transit timing). While Swarm-NG focuses on the parallel integration of many planetary systems, the underlying integrators could be applied to a wide variety of problems that require repeatedly integrating a set of ordinary differential equations many times using different initial conditions and/or parameter values.
Parallel Wolff Cluster Algorithms
NASA Astrophysics Data System (ADS)
Bae, S.; Ko, S. H.; Coddington, P. D.
The Wolff single-cluster algorithm is the most efficient method known for Monte Carlo simulation of many spin models. Due to the irregular size, shape and position of the Wolff clusters, this method does not easily lend itself to efficient parallel implementation, so that simulations using this method have thus far been confined to workstations and vector machines. Here we present two parallel implementations of this algorithm, and show that one gives fairly good performance on a MIMD parallel computer.
NASA Astrophysics Data System (ADS)
Li, Gen; Tang, Chun-An; Liang, Zheng-Zhao
2017-01-01
Multi-scale high-resolution modeling of rock failure process is a powerful means in modern rock mechanics studies to reveal the complex failure mechanism and to evaluate engineering risks. However, multi-scale continuous modeling of rock, from deformation, damage to failure, has raised high requirements on the design, implementation scheme and computation capacity of the numerical software system. This study is aimed at developing the parallel finite element procedure, a parallel rock failure process analysis (RFPA) simulator that is capable of modeling the whole trans-scale failure process of rock. Based on the statistical meso-damage mechanical method, the RFPA simulator is able to construct heterogeneous rock models with multiple mechanical properties, deal with and represent the trans-scale propagation of cracks, in which the stress and strain fields are solved for the damage evolution analysis of representative volume element by the parallel finite element method (FEM) solver. This paper describes the theoretical basis of the approach and provides the details of the parallel implementation on a Windows - Linux interactive platform. A numerical model is built to test the parallel performance of FEM solver. Numerical simulations are then carried out on a laboratory-scale uniaxial compression test, and field-scale net fracture spacing and engineering-scale rock slope examples, respectively. The simulation results indicate that relatively high speedup and computation efficiency can be achieved by the parallel FEM solver with a reasonable boot process. In laboratory-scale simulation, the well-known physical phenomena, such as the macroscopic fracture pattern and stress-strain responses, can be reproduced. In field-scale simulation, the formation process of net fracture spacing from initiation, propagation to saturation can be revealed completely. In engineering-scale simulation, the whole progressive failure process of the rock slope can be well modeled. It is
NASA Astrophysics Data System (ADS)
Kato, Tsunehiko N.
2015-04-01
We herein investigate shock formation and particle acceleration processes for both protons and electrons in a quasi-parallel high-Mach-number collisionless shock through a long-term, large-scale, particle-in-cell simulation. We show that both protons and electrons are accelerated in the shock and that these accelerated particles generate large-amplitude Alfvénic waves in the upstream region of the shock. After the upstream waves have grown sufficiently, the local structure of the collisionless shock becomes substantially similar to that of a quasi-perpendicular shock due to the large transverse magnetic field of the waves. A fraction of protons are accelerated in the shock with a power-law-like energy distribution. The rate of proton injection to the acceleration process is approximately constant, and in the injection process, the phase-trapping mechanism for the protons by the upstream waves can play an important role. The dominant acceleration process is a Fermi-like process through repeated shock crossings of the protons. This process is a “fast” process in the sense that the time required for most of the accelerated protons to complete one cycle of the acceleration process is much shorter than the diffusion time. A fraction of the electrons are also accelerated by the same mechanism, and have a power-law-like energy distribution. However, the injection does not enter a steady state during the simulation, which may be related to the intermittent activity of the upstream waves. Upstream of the shock, a fraction of the electrons are pre-accelerated before reaching the shock, which may contribute to steady electron injection at a later time.
Kato, Tsunehiko N.
2015-04-01
We herein investigate shock formation and particle acceleration processes for both protons and electrons in a quasi-parallel high-Mach-number collisionless shock through a long-term, large-scale, particle-in-cell simulation. We show that both protons and electrons are accelerated in the shock and that these accelerated particles generate large-amplitude Alfvénic waves in the upstream region of the shock. After the upstream waves have grown sufficiently, the local structure of the collisionless shock becomes substantially similar to that of a quasi-perpendicular shock due to the large transverse magnetic field of the waves. A fraction of protons are accelerated in the shock with a power-law-like energy distribution. The rate of proton injection to the acceleration process is approximately constant, and in the injection process, the phase-trapping mechanism for the protons by the upstream waves can play an important role. The dominant acceleration process is a Fermi-like process through repeated shock crossings of the protons. This process is a “fast” process in the sense that the time required for most of the accelerated protons to complete one cycle of the acceleration process is much shorter than the diffusion time. A fraction of the electrons are also accelerated by the same mechanism, and have a power-law-like energy distribution. However, the injection does not enter a steady state during the simulation, which may be related to the intermittent activity of the upstream waves. Upstream of the shock, a fraction of the electrons are pre-accelerated before reaching the shock, which may contribute to steady electron injection at a later time.
NASA Astrophysics Data System (ADS)
Yuan, G.; Wang, D. H.
2017-03-01
Multi-directional and multi-degree-of-freedom (multi-DOF) vibration energy harvesting are attracting more and more research interest in recent years. In this paper, the principle of a piezoelectric six-DOF vibration energy harvester based on parallel mechanism is proposed to convert the energy of the six-DOF vibration to single-DOF vibrations of the limbs on the energy harvester and output voltages. The dynamic model of the piezoelectric six-DOF vibration energy harvester is established to estimate the vibrations of the limbs. On this basis, a Stewart-type piezoelectric six-DOF vibration energy harvester is developed and explored. In order to validate the established dynamic model and the analysis results, the simulation model of the Stewart-type piezoelectric six-DOF vibration energy harvester is built and tested with different vibration excitations by SimMechanics, and some preliminary experiments are carried out. The results show that the vibration of the limbs on the piezoelectric six-DOF vibration energy harvester can be estimated by the established dynamic model. The developed Stewart-type piezoelectric six-DOF vibration energy harvester can harvest the energy of multi-directional linear vibration and multi-axis rotating vibration with resonance frequencies of 17 Hz, 25 Hz, and 47 Hz. Moreover, the resonance frequencies of the developed piezoelectric six-DOF vibration energy harvester are not affected by the direction changing of the vibration excitation.
A Scalable O(N) Algorithm for Large-Scale Parallel First-Principles Molecular Dynamics Simulations
Osei-Kuffuor, Daniel; Fattebert, Jean-Luc
2014-01-01
Traditional algorithms for first-principles molecular dynamics (FPMD) simulations only gain a modest capability increase from current petascale computers, due to their O(N^{3}) complexity and their heavy use of global communications. To address this issue, we are developing a truly scalable O(N) complexity FPMD algorithm, based on density functional theory (DFT), which avoids global communications. The computational model uses a general nonorthogonal orbital formulation for the DFT energy functional, which requires knowledge of selected elements of the inverse of the associated overlap matrix. We present a scalable algorithm for approximately computing selected entries of the inverse of the overlap matrix, based on an approximate inverse technique, by inverting local blocks corresponding to principal submatrices of the global overlap matrix. The new FPMD algorithm exploits sparsity and uses nearest neighbor communication to provide a computational scheme capable of extreme scalability. Accuracy is controlled by the mesh spacing of the finite difference discretization, the size of the localization regions in which the electronic orbitals are confined, and a cutoff beyond which the entries of the overlap matrix can be omitted when computing selected entries of its inverse. We demonstrate the algorithm's excellent parallel scaling for up to O(100K) atoms on O(100K) processors, with a wall-clock time of O(1) minute per molecular dynamics time step.
NASA Astrophysics Data System (ADS)
Kordilla, J.; Shigorina, E.; Tartakovsky, A. M.; Pan, W.; Geyer, T.
2015-12-01
Under idealized conditions (smooth surfaces, linear relationship between Bond number and Capillary number of droplets) steady-state flow modes on fracture surfaces have been shown to develop from sliding droplets to rivulets and finally (wavy) film flow, depending on the specified flux. In a recent study we demonstrated the effect of surface roughness on droplet flow in unsaturated wide aperture fractures, however, its effect on other prevailing flow modes is still an open question. The objective of this work is to investigate the formation of complex flow modes on fracture surfaces employing an efficient three-dimensional parallelized SPH model. The model is able to simulate highly intermittent, gravity-driven free-surface flows under dynamic wetting conditions. The effect of surface tension is included via efficient pairwise interaction forces. We validate the model using various analytical and semi-analytical relationships for droplet and complex flow dynamics. To investigate the effect of surface roughness on flow dynamics we construct surfaces with a self-affine fractal geometry and roughness characterized by the Hurst exponent. We demonstrate the effect of surface roughness (on macroscopic scales this can be understood as a tortuosity) on the steady-state distribution of flow modes. Furthermore we show the influence of a wide range of natural wetting conditions (defined by static contact angles) on the final distribution of surface coverage, which is of high importance for matrix-fracture interaction processes.
NASA Astrophysics Data System (ADS)
Karrech, Ali; Schrank, Christoph; Regenauer-Lieb, Klaus
2015-10-01
Massive fluid injections into the earth's upper crust are commonly used to stimulate permeability in geothermal reservoirs, enhance recovery in oil reservoirs, store carbon dioxide and so forth. Currently used models for reservoir simulation are limited to small perturbations and/or hydraulic aspects that are insufficient to describe the complex thermal-hydraulic-mechanical behaviour of natural geomaterials. Comprehensive approaches, which take into account the non-linear mechanical deformations of rock masses, fluid flow in percolating pore spaces, and changes of temperature due to heat transfer, are necessary to predict the behaviour of deep geo-materials subjected to high pressure and temperature changes. In this paper, we introduce a thermodynamically consistent poromechanics formulation which includes coupled thermal, hydraulic and mechanical processes. Moreover, we propose a numerical integration strategy based on massively parallel computing. The proposed formulations and numerical integration are validated using analytical solutions of simple multi-physics problems. As a representative application, we investigate the massive injection of fluids within deep formation to mimic the conditions of reservoir stimulation. The model showed, for instance, the effects of initial pre-existing stress fields on the orientations of stimulation-induced failures.
NASA Astrophysics Data System (ADS)
Zhang, Hong-Na; Li, Feng-Chen; Li, Xiao-Bin; Li, Dong-Yang; Cai, Wei-Hua; Yu, Bo
2016-09-01
Direct numerical simulations (DNSs) of purely elastic turbulence in rectilinear shear flows in a three-dimensional (3D) parallel plate channel were carried out, by which numerical databases were established. Based on the numerical databases, the present paper analyzed the structural and statistical characteristics of the elastic turbulence including flow patterns, the wall effect on the turbulent kinetic energy spectrum, and the local relationship between the flow motion and the microstructures’ behavior. Moreover, to address the underlying physical mechanism of elastic turbulence, its generation was presented in terms of the global energy budget. The results showed that the flow structures in elastic turbulence were 3D with spatial scales on the order of the geometrical characteristic length, and vortex tubes were more likely to be embedded in the regions where the polymers were strongly stretched. In addition, the patterns of microstructures’ elongation behave like a filament. From the results of the turbulent kinetic energy budget, it was found that the continuous energy releasing from the polymers into the main flow was the main source of the generation and maintenance of the elastic turbulent status. Project supported by the National Natural Science Foundation of China (Grant Nos. 51276046 and 51506037), the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 51421063), the China Postdoctoral Science Foundation (Grant No. 2016M591526), the Heilongjiang Postdoctoral Fund, China (Grant No. LBH-Z15063), and the China Postdoctoral International Exchange Program.
English, Niall J
2014-12-21
Ice crystallisation and melting was studied via massively parallel molecular dynamics under periodic boundary conditions, using approximately spherical ice nano-particles (both "isolated" and as a series of heterogeneous "seeds") of varying size, surrounded by liquid water and at a variety of temperatures. These studies were performed for a series of systems ranging in size from ∼1 × 10(6) to 8.6 × 10(6) molecules, in order to establish system-size effects upon the nano-clusters" crystallisation and dissociation kinetics. Both "traditional" four-site and "single-site" and water models were used, with and without formal point charges, dipoles, and electrostatics, respectively. Simulations were carried out in the microcanonical and isothermal-isobaric ensembles, to assess the influence of "artificial" thermo- and baro-statting, and important disparities were observed, which declined upon using larger systems. It was found that there was a dependence upon system size for both ice growth and dissociation, in that larger systems favoured slower growth and more rapid melting, given the lower extent of "communication" of ice nano-crystallites with their periodic replicae in neighbouring boxes. Although the single-site model exhibited less variation with system size vis-à-vis the multiple-site representation with explicit electrostatics, its crystallisation-dissociation kinetics was artificially fast.
NASA Astrophysics Data System (ADS)
English, Niall J.
2014-12-01
Ice crystallisation and melting was studied via massively parallel molecular dynamics under periodic boundary conditions, using approximately spherical ice nano-particles (both "isolated" and as a series of heterogeneous "seeds") of varying size, surrounded by liquid water and at a variety of temperatures. These studies were performed for a series of systems ranging in size from ˜1 × 106 to 8.6 × 106 molecules, in order to establish system-size effects upon the nano-clusters" crystallisation and dissociation kinetics. Both "traditional" four-site and "single-site" and water models were used, with and without formal point charges, dipoles, and electrostatics, respectively. Simulations were carried out in the microcanonical and isothermal-isobaric ensembles, to assess the influence of "artificial" thermo- and baro-statting, and important disparities were observed, which declined upon using larger systems. It was found that there was a dependence upon system size for both ice growth and dissociation, in that larger systems favoured slower growth and more rapid melting, given the lower extent of "communication" of ice nano-crystallites with their periodic replicae in neighbouring boxes. Although the single-site model exhibited less variation with system size vis-à-vis the multiple-site representation with explicit electrostatics, its crystallisation-dissociation kinetics was artificially fast.
NASA Astrophysics Data System (ADS)
Wendel, D. E.; Olson, D. K.; Hesse, M.; Karimabadi, H.; Daughton, W. S.
2013-12-01
We investigate the distribution of parallel electric fields and their relationship to the location and rate of magnetic reconnection of a large particle-in-cell simulation of 3D turbulent magnetic reconnection with open boundary conditions. The simulation's guide field geometry inhibits the formation of topological features such as separators and null points. Therefore, we derive the location of potential changes in magnetic connectivity by finding the field lines that experience a large relative change between their endpoints, i.e., the quasi-separatrix layer. We find a correspondence between the locus of changes in magnetic connectivity, or the quasi-separatrix layer, and the map of large gradients in the integrated parallel electric field (or quasi-potential). Furthermore, we compare the distribution of parallel electric fields along field lines with the reconnection rate. We find the reconnection rate is controlled by only the low-amplitude, zeroth and first-order trends in the parallel electric field, while the contribution from high amplitude parallel fluctuations, such as electron holes, is negligible. The results impact the determination of reconnection sites within models of 3D turbulent reconnection as well as the inference of reconnection rates from in situ spacecraft measurements. It is difficult through direct observation to isolate the locus of the reconnection parallel electric field amidst the large amplitude fluctuations. However, we demonstrate that a positive slope of the partial sum of the parallel electric field along the field line as a function of field line length indicates where reconnection is occurring along the field line.
Parallel rendering techniques for massively parallel visualization
Hansen, C.; Krogh, M.; Painter, J.
1995-07-01
As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory. and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP`s abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper will describe recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, spheres, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume render use a MIMD approach. Implementations for these algorithms are presented for the Thinking Ma.chines Corporation CM-5 MPP.
Xiong, Yi; Fakcharoenphol, Perapon; Wang, Shihao; Winterfeld, Philip H.; Zhang, Keni; Wu, Yu-Shu
2013-12-01
TOUGH2-EGS-MP is a parallel numerical simulation program coupling geomechanics with fluid and heat flow in fractured and porous media, and is applicable for simulation of enhanced geothermal systems (EGS). TOUGH2-EGS-MP is based on the TOUGH2-MP code, the massively parallel version of TOUGH2. In TOUGH2-EGS-MP, the fully-coupled flow-geomechanics model is developed from linear elastic theory for thermo-poro-elastic systems and is formulated in terms of mean normal stress as well as pore pressure and temperature. Reservoir rock properties such as porosity and permeability depend on rock deformation, and the relationships between these two, obtained from poro-elasticity theories and empirical correlations, are incorporated into the simulation. This report provides the user with detailed information on the TOUGH2-EGS-MP mathematical model and instructions for using it for Thermal-Hydrological-Mechanical (THM) simulations. The mathematical model includes the fluid and heat flow equations, geomechanical equation, and discretization of those equations. In addition, the parallel aspects of the code, such as domain partitioning and communication between processors, are also included. Although TOUGH2-EGS-MP has the capability for simulating fluid and heat flows coupled with geomechanical effects, it is up to the user to select the specific coupling process, such as THM or only TH, in a simulation. There are several example problems illustrating applications of this program. These example problems are described in detail and their input data are presented. Their results demonstrate that this program can be used for field-scale geothermal reservoir simulation in porous and fractured media with fluid and heat flow coupled with geomechanical effects.
NASA Astrophysics Data System (ADS)
Misawa, Takeharu; Yoshida, Hiroyuki; Akimoto, Hajime
In Japan Atomic Energy Agency (JAEA), the Innovative Water Reactor for Flexible Fuel Cycle (FLWR) has been developed. For thermal design of FLWR, it is necessary to develop analytical method to predict boiling transition of FLWR. Japan Atomic Energy Agency (JAEA) has been developing three-dimensional two-fluid model analysis code ACE-3D, which adopts boundary fitted coordinate system to simulate complex shape channel flow. In this paper, as a part of development of ACE-3D to apply to rod bundle analysis, introduction of parallelization to ACE-3D and assessments of ACE-3D are shown. In analysis of large-scale domain such as a rod bundle, even two-fluid model requires large number of computational cost, which exceeds upper limit of memory amount of 1 CPU. Therefore, parallelization was introduced to ACE-3D to divide data amount for analysis of large-scale domain among large number of CPUs, and it is confirmed that analysis of large-scale domain such as a rod bundle can be performed by parallel computation with keeping parallel computation performance even using large number of CPUs. ACE-3D adopts two-phase flow models, some of which are dependent upon channel geometry. Therefore, analyses in the domains, which simulate individual subchannel and 37 rod bundle, are performed, and compared with experiments. It is confirmed that the results obtained by both analyses using ACE-3D show agreement with past experimental result qualitatively.
Wong, William W L; Feng, Zeny Z; Thein, Hla-Hla
2016-11-01
Agent-based models (ABMs) are computer simulation models that define interactions among agents and simulate emergent behaviors that arise from the ensemble of local decisions. ABMs have been increasingly used to examine trends in infectious disease epidemiology. However, the main limitation of ABMs is the high computational cost for a large-scale simulation. To improve the computational efficiency for large-scale ABM simulations, we built a parallelizable sliding region algorithm (SRA) for ABM and compared it to a nonparallelizable ABM. We developed a complex agent network and performed two simulations to model hepatitis C epidemics based on the real demographic data from Saskatchewan, Canada. The first simulation used the SRA that processed on each postal code subregion subsequently. The second simulation processed the entire population simultaneously. It was concluded that the parallelizable SRA showed computational time saving with comparable results in a province-wide simulation. Using the same method, SRA can be generalized for performing a country-wide simulation. Thus, this parallel algorithm enables the possibility of using ABM for large-scale simulation with limited computational resources.
Yokohama, Noriya
2013-07-01
This report was aimed at structuring the design of architectures and studying performance measurement of a parallel computing environment using a Monte Carlo simulation for particle therapy using a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed an approximately 28 times faster speed than seen with single-thread architecture, combined with improved stability. A study of methods of optimizing the system operations also indicated lower cost.
Qiang, J.; Leitner, D.; Todd, D.S.; Ryne, R.D.
2005-03-15
The superconducting ECR ion source VENUS serves as the prototype injector ion source for the Rare Isotope Accelerator (RIA) driver linac. The RIA driver linac requires a great variety of high charge state ion beams with up to an order of magnitude higher intensity than currently achievable with conventional ECR ion sources. In order to design the beam line optics of the low energy beam line for the RIA front end for the wide parameter range required for the RIA driver accelerator, reliable simulations of the ion beam extraction from the ECR ion source through the ion mass analyzing system are essential. The RIA low energy beam transport line must be able to transport intense beams (up to 10 mA) of light and heavy ions at 30 keV.For this purpose, LBNL is developing the parallel 3D particle-in-cell code IMPACT to simulate the ion beam transport from the ECR extraction aperture through the analyzing section of the low energy transport system. IMPACT, a parallel, particle-in-cell code, is currently used to model the superconducting RF linac section of RIA and is being modified in order to simulate DC beams from the ECR ion source extraction. By using the high performance of parallel supercomputing we will be able to account consistently for the changing space charge in the extraction region and the analyzing section. A progress report and early results in the modeling of the VENUS source will be presented.
NASA Astrophysics Data System (ADS)
Qiang, J.; Leitner, D.; Todd, D. S.; Ryne, R. D.
2005-03-01
The superconducting ECR ion source VENUS serves as the prototype injector ion source for the Rare Isotope Accelerator (RIA) driver linac. The RIA driver linac requires a great variety of high charge state ion beams with up to an order of magnitude higher intensity than currently achievable with conventional ECR ion sources. In order to design the beam line optics of the low energy beam line for the RIA front end for the wide parameter range required for the RIA driver accelerator, reliable simulations of the ion beam extraction from the ECR ion source through the ion mass analyzing system are essential. The RIA low energy beam transport line must be able to transport intense beams (up to 10 mA) of light and heavy ions at 30 keV. For this purpose, LBNL is developing the parallel 3D particle-in-cell code IMPACT to simulate the ion beam transport from the ECR extraction aperture through the analyzing section of the low energy transport system. IMPACT, a parallel, particle-in-cell code, is currently used to model the superconducting RF linac section of RIA and is being modified in order to simulate DC beams from the ECR ion source extraction. By using the high performance of parallel supercomputing we will be able to account consistently for the changing space charge in the extraction region and the analyzing section. A progress report and early results in the modeling of the VENUS source will be presented.
Xu, Zuwei; Zhao, Haibo Zheng, Chuguang
2015-01-15
This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining Markov jump model, weighted majorant kernel and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved. The Markov jump model constructs a coagulation-rule matrix of differentially-weighted simulation particles, so as to capture the time evolution of particle size distribution with low statistical noise over the full size range and as far as possible to reduce the number of time loopings. Here three coagulation rules are highlighted and it is found that constructing appropriate coagulation rule provides a route to attain the compromise between accuracy and cost of PBMC methods. Further, in order to avoid double looping over all simulation particles when considering the two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates being used for acceptance–rejection processes by single-looping over all particles, and meanwhile the mean time-step of coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially-weighted PBMC simulations (based on the Markov jump model) is reduced greatly to be proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are parallel processed by multi-cores on a GPU that can implement the massively threaded data-parallel tasks to obtain remarkable speedup ratio (comparing with CPU computation, the speedup ratio of GPU parallel computing is as high as 200 in a case of 100 cells with 10 000 simulation particles per cell). These accelerating approaches of PBMC are
NASA Astrophysics Data System (ADS)
Xu, Zuwei; Zhao, Haibo; Zheng, Chuguang
2015-01-01
This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining Markov jump model, weighted majorant kernel and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved. The Markov jump model constructs a coagulation-rule matrix of differentially-weighted simulation particles, so as to capture the time evolution of particle size distribution with low statistical noise over the full size range and as far as possible to reduce the number of time loopings. Here three coagulation rules are highlighted and it is found that constructing appropriate coagulation rule provides a route to attain the compromise between accuracy and cost of PBMC methods. Further, in order to avoid double looping over all simulation particles when considering the two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates being used for acceptance-rejection processes by single-looping over all particles, and meanwhile the mean time-step of coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially-weighted PBMC simulations (based on the Markov jump model) is reduced greatly to be proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are parallel processed by multi-cores on a GPU that can implement the massively threaded data-parallel tasks to obtain remarkable speedup ratio (comparing with CPU computation, the speedup ratio of GPU parallel computing is as high as 200 in a case of 100 cells with 10 000 simulation particles per cell). These accelerating approaches of PBMC are
NASA Astrophysics Data System (ADS)
Lee, Young-Jin; Kim, Dae-Hong; Kim, Hee-Joung
2014-04-01
In nuclear medicine, the use of a pixelated semiconductor detector with cadmium telluride (CdTe) or cadmium zinc telluride (CdZnTe) is of growing interest for new devices. Especially, the spatial resolution can be improved by using a pixelated parallel-hole collimator with equal holes and pixel sizes based on the above-mentioned detector. High-absorption and high-stopping-power pixelated parallel-hole collimator materials are often chosen because of their good spatial resolution. Capturing more gamma rays, however, may result in decreased sensitivity with the same collimator geometric designs. Therefore, a trade-off between spatial resolution and sensitivity is very important in nuclear medicine imaging. The purpose of this study was to compare spatial resolutions using a pixelated semiconductor single photon emission computed tomography (SPECT) system with lead, tungsten, gold, and depleted uranium pixelated parallel-hole collimators at equal sensitivity. We performed a simulation study of the PID 350 (Ajat Oy Ltd., Finland) CdTe pixelated semiconductor detector (pixel size: 0.35 × 0.35 mm2) by using a Geant4 Application for Tomographic Emission (GATE) simulation. Spatial resolutions were measured with different collimator materials at equivalent sensitivities. Additionally, hot-rod phantom images were acquired for each source-to-collimator distance by using a GATE simulation. At equivalent sensitivities, measured averages of the full width at half maximum (FWHM) using lead, tungsten, and gold were 4.32, 2.93, and 2.23% higher than that of depleted uranium, respectively. Furthermore, for the full width at tenth maximum (FWTM), measured averages when using lead, tungsten, and gold were 6.29, 4.10, and 2.65% higher than that of depleted uranium, respectively. Although, the spatial resolution showed little differences among the different pixelated parallel-hole collimator materials, lower absorption and stopping power materials such as lead and tungsten had
NASA Astrophysics Data System (ADS)
Fang, Bin; Martyna, Glenn; Deng, Yuefan
2007-08-01
In order to model complex heterogeneous biophysical macrostructures with non-trivial charge distributions such as globular proteins in water, it is important to evaluate the long range forces present in these systems accurately and efficiently. The Smooth Particle Mesh Ewald summation technique (SPME) is commonly used to determine the long range part of electrostatic energy in large scale molecular simulations. While the SPME technique does not give rise to a performance bottleneck on a single processor, current implementations of SPME on massively parallel, supercomputers become problematic at large processor numbers, limiting the time and length scales that can be reached. Here, a synergistic investigation involving method improvement, parallel programming and novel architectures is employed to address this difficulty. A relatively simple modification of the SPME technique is described which gives rise to both improved accuracy and efficiency on both massively parallel and scalar computing platforms. Our fine grained parallel implementation of the modified SPME method for the novel QCDOC supercomputer with its 6D-torus architecture is then given. Numerical tests of algorithm performance on up to 1024 processors of the QCDOC machine at BNL are presented for two systems of interest, a β-hairpin solvated in explicit water, a system which consists of 1142 water molecules and a 20 residue protein for a total of 3579 atoms, and the HIV-1 protease solvated in explicit water, a system which consists of 9331 water molecules and a 198 residue protein for a total of 29508 atoms.
NASA Astrophysics Data System (ADS)
Sloan, Gregory James
The direct numerical simulation (DNS) offers the most accurate approach to modeling the behavior of a physical system, but carries an enormous computation cost. There exists a need for an accurate DNS to model the coupled solid-fluid system seen in targeted drug delivery (TDD), nanofluid thermal energy storage (TES), as well as other fields where experiments are necessary, but experiment design may be costly. A parallel DNS can greatly reduce the large computation times required, while providing the same results and functionality of the serial counterpart. A D2Q9 lattice Boltzmann method approach was implemented to solve the fluid phase. The use of domain decomposition with message passing interface (MPI) parallelism resulted in an algorithm that exhibits super-linear scaling in testing, which may be attributed to the caching effect. Decreased performance on a per-node basis for a fixed number of processes confirms this observation. A multiscale approach was implemented to model the behavior of nanoparticles submerged in a viscous fluid, and used to examine the mechanisms that promote or inhibit clustering. Parallelization of this model using a masterworker algorithm with MPI gives less-than-linear speedup for a fixed number of particles and varying number of processes. This is due to the inherent inefficiency of the master-worker approach. Lastly, these separate simulations are combined, and two-way coupling is implemented between the solid and fluid.
NASA Astrophysics Data System (ADS)
Guo, L.; Huang, H.; Gaston, D.; Redden, G. D.
2009-12-01
One approach for immobilizing subsurface metal contaminants involves stimulating the in situ production of mineral phases that sequester or isolate contaminants. One example is using calcium carbonate to immobilize strontium. The success of such approaches depends on understanding how various processes of flow, transport, reaction and resulting porosity-permeability change couple in subsurface systems. Reactive transport models are often used for such purpose. Current subsurface reactive transport simulators typically involve a de-coupled solution approach, such as operator-splitting, that solves the transport equations for components and batch chemistry sequentially, which has limited applicability for many biogeochemical processes with fast kinetics and strong medium property-reaction interactions. A massively parallel, fully coupled, fully implicit reactive transport simulator has been developed based on a parallel multi-physics object oriented software environment computing framework (MOOSE) developed at the Idaho National Laboratory. Within this simulator, the system of transport and reaction equations is solved simultaneously in a fully coupled manner using the Jacobian Free Newton-Krylov (JFNK) method with preconditioning. The simulator was applied to model reactive transport in a one-dimensional column where conditions that favor calcium carbonate precipitation are generated by urea hydrolysis that is catalyzed by urease enzyme. Simulation results are compared to both laboratory column experiments and those obtained using the reactive transport simulator STOMP in terms of: the spatial and temporal distributions of precipitates and reaction rates and other major species in the reaction system; the changes in porosity and permeability; and the computing efficiency based on wall clock simulation time.
NASA Astrophysics Data System (ADS)
Lee, Y.-J.; Park, S.-J.; Lee, S.-W.; Kim, D.-H.; Kim, Y.-S.; Jo, B.-D.; Kim, H.-J.
2013-03-01
It is recommended that a pixelated parallel-hole collimator in which the hole and pixel sizes are equal be used to improve the sensitivity and spatial resolution when using a small pixel size and a single-photon emission computed tomography (SPECT) system with pixelated semiconductor detector materials (e.g., CdTe and CZT). However, some significant problems arise in the manufacturing of a pixelated parallel-hole collimator. Therefore, we sought to simulate a pixelated semiconductor SPECT system with various collimator geometric designs. The purpose of this study was to compare the quality of images generated with a pixelated semiconductor SPECT system simulated with pixelated parallel-hole collimators of various geometric designs. The sensitivity and spatial resolution of the various collimator geometric designs with varying septal heights and hole sizes were measured. Moreover, to evaluate the overall performance of the imaging system, a hot-rod phantom was designed using a Monte Carlo simulation. According to the results, the average sensitivity using a 15 mm septal height was 1.80, 2.87, and 4.16 times higher than that obtained with septal heights of 20, 25, and 30 mm, respectively. Also, the average spatial resolution using the 30 mm septal height was 44.33, 22.08, and 9.26% better than that attained with 15, 20, and 25 mm septal heights, respectively. When the results acquired with 0.3 and 0.6 mm hole sizes were compared, the average sensitivity with the 0.6 mm hole size was 3.97 times higher than that obtained with the 0.3 mm hole size, and the average spatial resolution with the 0.3 mm hole size was 45.76% better than that with the 0.6 mm hole size. We have presented the pixelated parallel-hole collimators of various collimator geometric designs and evaluations. Our results showed that the effect of various collimator geometric designs can be investigated by Monte Carlo simulation so as to evaluate the feasibility of a high resolution parallel
NASA Astrophysics Data System (ADS)
He, W.; Beyer, C.; Fleckenstein, J. H.; Jang, E.; Kolditz, O.; Naumov, D.; Kalbacher, T.
2015-10-01
The open-source scientific software packages OpenGeoSys and IPhreeqc have been coupled to set up and simulate thermo-hydro-mechanical-chemical coupled processes with simultaneous consideration of aqueous geochemical reactions faster and easier on high-performance computers. In combination with the elaborated and extendable chemical database of IPhreeqc, it will be possible to set up a wide range of multiphysics problems with numerous chemical reactions that are known to influence water quality in porous and fractured media. A flexible parallelization scheme using MPI (Message Passing Interface) grouping techniques has been implemented, which allows an optimized allocation of computer resources for the node-wise calculation of chemical reactions on the one hand and the underlying processes such as for groundwater flow or solute transport on the other. This technical paper presents the implementation, verification, and parallelization scheme of the coupling interface, and discusses its performance and precision.
Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi
2016-08-05
The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to the density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. The functions to perform large scale geometry optimization and molecular dynamics with DC-DFTB potential energy surface are implemented to the program called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program, a single-point energy gradient calculation of a one-million-atom system is completed within 60 s using 7290 nodes of the K computer. © 2016 Wiley Periodicals, Inc.
NASA Technical Reports Server (NTRS)
Karpoukhin, Mikhii G.; Kogan, Boris Y.; Karplus, Walter J.
1995-01-01
The simulation of heart arrhythmia and fibrillation are very important and challenging tasks. The solution of these problems using sophisticated mathematical models is beyond the capabilities of modern super computers. To overcome these difficulties it is proposed to break the whole simulation problem into two tightly coupled stages: generation of the action potential using sophisticated models. and propagation of the action potential using simplified models. The well known simplified models are compared and modified to bring the rate of depolarization and action potential duration restitution closer to reality. The modified method of lines is used to parallelize the computational process. The conditions for the appearance of 2D spiral waves after the application of a premature beat and the subsequent traveling of the spiral wave inside the simulated tissue are studied.
Parallel processor engine model program
NASA Technical Reports Server (NTRS)
Mclaughlin, P.
1984-01-01
The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.
NASA Astrophysics Data System (ADS)
Lee, Y.; Kang, W.
2015-07-01
The pixelated semiconductor based on cadmium zinc telluride (CZT) is a promising imaging device that provides many benefits compared with conventional scintillation detectors. By using a high-resolution square parallel-hole collimator with a pixelated semiconductor detector, we were able to improve both sensitivity and spatial resolution. Here, we present a simulation of a CZT pixleated semiconductor single-photon emission computed tomography (SPECT) system with a high-resolution square parallel-hole collimator using various geometric designs of 0.5, 1.0, 1.5, and 2.0 mm X-axis hole size. We performed a simulation study of the eValuator-2500 (eV Microelectronics Inc., Saxonburg, PA, U.S.A.) CZT pixelated semiconductor detector using a Geant4 Application for Tomographic Emission (GATE). To evaluate the performances of these systems, the sensitivity and spatial resolution was evaluated. Moreover, to evaluate the overall performance of the imaging system, a hot-rod phantom was designed. Our results showed that the average sensitivity of the 2.0 mm collimator X-axis hole size was 1.34, 1.95, and 3.92 times higher than that of the 1.5, 1.0, and 0.5 mm collimator X-axis hole size, respectively. Also, the average spatial resolution of the 0.5 mm collimator X-axis hole size was 28.69, 44.65, and 55.73% better than that of the 1.0, 1.5, and 2.0 mm collimator X-axis hole size, respectively. We discuss the high-resolution square parallel-hole collimator of various collimator geometric designs and our evaluations. In conclusion, we have successfully designed a high-resolution square parallel-hole collimator with a CZT pixelated semiconductor SPECT system.
NASA Astrophysics Data System (ADS)
Hwang, Feng-Nan; Cai, Shang-Rong; Shao, Yun-Long; Wu, Jong-Shinn
2010-09-01
We investigate fully parallel Newton-Krylov-Schwarz (NKS) algorithms for solving the large sparse nonlinear systems of equations arising from the finite element discretization of the three-dimensional Poisson-Boltzmann equation (PBE), which is often used to describe the colloidal phenomena of an electric double layer around charged objects in colloidal and interfacial science. The NKS algorithm employs an inexact Newton method with backtracking (INB) as the nonlinear solver in conjunction with a Krylov subspace method as the linear solver for the corresponding Jacobian system. An overlapping Schwarz method as a preconditioner to accelerate the convergence of the linear solver. Two test cases including two isolated charged particles and two colloidal particles in a cylindrical pore are used as benchmark problems to validate the correctness of our parallel NKS-based PBE solver. In addition, a truly three-dimensional case, which models the interaction between two charged spherical particles within a rough charged micro-capillary, is simulated to demonstrate the applicability of our PBE solver to handle a problem with complex geometry. Finally, based on the result obtained from a PC cluster of parallel machines, we show numerically that NKS is quite suitable for the numerical simulation of interaction between colloidal particles, since NKS is robust in the sense that INB is able to converge within a small number of iterations regardless of the geometry, the mesh size, the number of processors. With help of an additive preconditioned Krylov subspace method NKS achieves parallel efficiency of 71% or better on up to a hundred processors for a 3D problem with 5 million unknowns.
1997-03-01
to construct their computer codes (written largely in FORTRAN) to exploit this type of architecture. Towards the end of the 1980s, however, vector...for exploiting parallel architectures using both single and overset grids in conjunction with typical grid-based PDE solvers in general and FVTD...8217. Furthermore, it naturally exploits the means by which the electr