Science.gov

Sample records for parallel direct simulation

  1. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine, such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
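
    The direct-execution idea the abstract describes can be illustrated with a small sketch (this is not LAPSE itself; the SimProcess class and the latency/bandwidth constants are invented for illustration). Application computation is really executed and timed, while communication only advances a virtual clock through a simple model of the target machine's network:

    ```python
    # Sketch of direct-execution simulation: compute phases run natively and
    # are timed; sends are not performed but modeled, advancing virtual time.
    import time

    LATENCY = 50e-6      # assumed per-message latency of the modeled network (s)
    BANDWIDTH = 100e6    # assumed bandwidth of the modeled network (bytes/s)

    class SimProcess:
        def __init__(self, rank):
            self.rank = rank
            self.vtime = 0.0              # virtual time on the simulated machine

        def compute(self, work):
            t0 = time.perf_counter()      # direct execution: run the real code
            result = sum(i * i for i in range(work))
            self.vtime += time.perf_counter() - t0
            return result

        def send(self, dest, nbytes):
            # discrete-event model of the network instead of a real transfer
            arrival = self.vtime + LATENCY + nbytes / BANDWIDTH
            dest.vtime = max(dest.vtime, arrival)  # receiver blocks until arrival

    p0, p1 = SimProcess(0), SimProcess(1)
    p0.compute(100_000)                   # directly executed and timed
    p0.send(p1, 8_192)                    # modeled, not executed
    p1.compute(100_000)
    print(f"predicted virtual times: {p0.vtime:.6f} s, {p1.vtime:.6f} s")
    ```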

  2. Massively Parallel Direct Simulation of Multiphase Flow

    SciTech Connect

    Cook, Benjamin K.; Preece, Dale S.; Williams, J. R.

    2000-08-10

    The authors' understanding of multiphase physics and the associated predictive capability for multi-phase systems are severely limited by current continuum modeling methods and experimental approaches. This research will deliver an unprecedented modeling capability to directly simulate three-dimensional multi-phase systems at the particle scale. The model solves the fully coupled equations of motion governing the fluid phase and the individual particles comprising the solid phase using a newly discovered, highly efficient coupled numerical method based on the discrete-element method and the Lattice-Boltzmann method. A massively parallel implementation will enable the solution of large, physically realistic systems.

  3. Parallel Performance Optimization of the Direct Simulation Monte Carlo Method

    NASA Astrophysics Data System (ADS)

    Gao, Da; Zhang, Chonglin; Schwartzentruber, Thomas

    2009-11-01

    Although the direct simulation Monte Carlo (DSMC) particle method is more computationally intensive than continuum methods, it is accurate for conditions ranging from continuum to free-molecular, accurate in highly non-equilibrium flow regions, and holds potential for incorporating advanced molecular-based models for gas-phase and gas-surface interactions. As available computer resources continue their rapid growth, the DSMC method is continually being applied to increasingly complex flow problems. Although processor clock speed continues to increase, a trend toward increasing multi-core-per-node parallel architectures is emerging. To effectively utilize such current and future parallel computing systems, a combined shared/distributed memory parallel implementation of the DSMC method, using both Open Multi-Processing (OpenMP) and the Message Passing Interface (MPI), is under development. The parallel implementation of a new state-of-the-art 3D DSMC code employing an embedded 3-level Cartesian mesh will be outlined. The presentation will focus on performance optimization strategies for DSMC, including modified algorithm designs, practical code-tuning techniques, and parallel performance optimization. Specifically, the key issues for DSMC shared-memory (OpenMP) parallel performance are identified as (1) granularity, (2) load balancing, (3) locality, and (4) synchronization. Challenges and solutions associated with these issues as they pertain to the DSMC method will be discussed.
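
    Of the four issues named, load balancing is the easiest to make concrete: DSMC collision work in a cell scales with its particle count, so cells must be distributed so that every thread gets roughly equal work. A sketch of one standard approach, greedy longest-processing-time assignment (illustrative only, with random stand-in cell counts; this is not the authors' code):

    ```python
    # Greedy load balancing of a shared-memory DSMC collision step: heaviest
    # cells are assigned first, each to the currently least-loaded thread.
    import heapq
    import random

    random.seed(1)
    n_cells, n_threads = 64, 4
    particles = [random.randint(10, 500) for _ in range(n_cells)]  # per-cell counts

    heap = [(0, t) for t in range(n_threads)]        # (assigned work, thread id)
    assignment = [[] for _ in range(n_threads)]
    for cell in sorted(range(n_cells), key=lambda c: -particles[c]):
        work, t = heapq.heappop(heap)
        assignment[t].append(cell)
        heapq.heappush(heap, (work + particles[cell], t))

    loads = [sum(particles[c] for c in cells) for cells in assignment]
    print("per-thread particle counts:", loads)      # nearly equal is the goal
    ```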

  4. Scalability study of parallel spatial direct numerical simulation code on IBM SP1 parallel supercomputer

    NASA Technical Reports Server (NTRS)

    Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad

    1994-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) code on the IBM SP1 supercomputer are reported. The spatially evolving disturbances associated with laminar-to-turbulent transition in three-dimensional boundary-layer flows are computed with the PSDNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provided can be used to estimate computational costs as the number of grid points changes, and these estimates match the actual costs.
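
    The remapping idea can be sketched in a few lines: with the array distributed by rows, an optimized serial FFT routine can only be applied along the locally contiguous axis, so the data are transposed between stages. In the real code that transpose is the communication-heavy remap; in this sketch the "nodes" are just list entries, and numpy stands in for the optimized serial library:

    ```python
    # Remap-based distributed 2-D FFT: local serial FFTs along one axis,
    # transpose (an all-to-all exchange in a real code), then local FFTs
    # along the other axis.
    import numpy as np

    def distributed_fft2(field, n_nodes):
        rows = np.array_split(field, n_nodes, axis=0)     # row-distributed data
        rows = [np.fft.fft(r, axis=1) for r in rows]      # local serial FFTs
        field = np.concatenate(rows, axis=0).T            # remap: transpose
        cols = np.array_split(field, n_nodes, axis=0)
        cols = [np.fft.fft(c, axis=1) for c in cols]      # FFT other direction
        return np.concatenate(cols, axis=0).T

    x = np.random.rand(64, 64)
    assert np.allclose(distributed_fft2(x, n_nodes=8), np.fft.fft2(x))
    ```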

  5. Parallel direct numerical simulation of three-dimensional spray formation

    NASA Astrophysics Data System (ADS)

    Chergui, Jalel; Juric, Damir; Shin, Seungwon; Kahouadji, Lyes; Matar, Omar

    2015-11-01

    We present numerical results for the breakup mechanism of a liquid jet surrounded by a fast coaxial flow of air with a density ratio (water/air) of ~1000 and a kinematic viscosity ratio of ~60. We use the code BLUE, a three-dimensional, two-phase, high-performance, parallel numerical code based on a hybrid Front-Tracking/Level Set algorithm for Lagrangian tracking of arbitrarily deformable phase interfaces and a precise treatment of surface tension forces. The parallelization of the code is based on the technique of domain decomposition, where the velocity field is solved by a parallel GMRes method for the viscous terms and the pressure by a parallel multigrid/GMRes method. Communication is handled by MPI message-passing procedures. The interface method is also parallelized; it defines the interface both by a discontinuous density field and by a triangular Lagrangian mesh, and allows the interface to undergo large deformations, including the rupture and/or coalescence of interfaces. EPSRC Programme Grant, MEMPHIS, EP/K0039761/1.

  6. Scalable simulations for directed self-assembly patterning with the use of GPU parallel computing

    NASA Astrophysics Data System (ADS)

    Yoshimoto, Kenji; Peters, Brandon L.; Khaira, Gurdaman S.; de Pablo, Juan J.

    2012-03-01

    Directed self-assembly (DSA) patterning has been increasingly investigated as an alternative lithographic process for future technology nodes. One of the critical specifications for DSA patterning concerns the defects generated through the annealing process or by the roughness of the pre-patterned structure. Due to their high sensitivity to process and wafer conditions, however, characterization of those defects remains challenging. DSA simulations can be a powerful tool to predict the formation of DSA defects. In this work, we propose a new method to perform parallel computing of DSA Monte Carlo (MC) simulations. A consumer graphics card was used to access its hundreds of processing units for parallel computing. By partitioning the simulation system into non-interacting domains, we were able to run MC trial moves in parallel on multiple graphics-processing units (GPUs). Our results show a significant improvement in computational performance.
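
    The enabling observation, partitioning into non-interacting domains, can be sketched with a toy lattice model: with nearest-neighbour interactions, sites of one "checkerboard" colour never interact with each other, so their MC trial moves are independent. Here a checkerboard Ising sweep stands in for the block-copolymer model (my own illustration, not the authors' simulator); on a GPU each same-colour site would map to one thread:

    ```python
    # Checkerboard-decomposed Metropolis sweeps: all sites of one colour can
    # be updated concurrently because their neighbours have the other colour.
    import numpy as np

    rng = np.random.default_rng(0)
    L, beta = 32, 0.5
    spins = rng.choice([-1, 1], size=(L, L))   # toy stand-in for the DSA model

    def sweep_color(color):
        for i in range(L):
            for j in range(L):
                if (i + j) % 2 != color:
                    continue
                # neighbours all have the opposite colour: no conflicts
                h = spins[(i - 1) % L, j] + spins[(i + 1) % L, j] \
                  + spins[i, (j - 1) % L] + spins[i, (j + 1) % L]
                dE = 2 * spins[i, j] * h
                if dE <= 0 or rng.random() < np.exp(-beta * dE):
                    spins[i, j] *= -1

    for _ in range(10):
        sweep_color(0)   # every same-colour site could be one GPU thread
        sweep_color(1)
    print("magnetization per site:", spins.mean())
    ```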

  7. Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube

    NASA Technical Reports Server (NTRS)

    Joslin, Ronald D.; Zubair, Mohammad

    1993-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube are documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors, nearly ideal linear speedups are achieved with nonoptimized routines; slower-than-linear speedups are achieved with optimized (machine-dependent library) routines. This slower-than-linear speedup results because the fast Fourier transform (FFT) routine dominates the computational cost and exhibits less than ideal speedups. However, with the machine-dependent routines, the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise, wall-normal, and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single-processor time to complete a comparable simulation; however, it is estimated that a subgrid-scale model, which reduces the required number of grid points and turns the computation into a parallel spatial large-eddy simulation (PSLES), would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.

  8. A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

    NASA Technical Reports Server (NTRS)

    Carroll, Chester C.; Owen, Jeffrey E.

    1988-01-01

    A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations onto a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code, since the need for a high-level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer, without the sacrifice in execution speed normally expected of digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing-element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.

  9. Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers

    NASA Technical Reports Server (NTRS)

    Morgan, Philip E.

    2004-01-01

    This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers" and "Develop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications." The discussion of Scalable High Performance Computing reports on three objectives: validate, assess the scalability of, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulation (DNS) and Large Eddy Simulation (LES) problems; and investigate and develop a high-order Reynolds-averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhance an electromagnetics code (CHARGE) to effectively model antenna problems; apply lessons learned in the high-order/spectral solution of swirling 3D jets to the electromagnetics project; transition a high-order fluids code, FDL3DI, to solve Maxwell's equations using compact differencing; develop and demonstrate improved radiation-absorbing boundary conditions for high-order CEM; and extend the high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.

  10. Implementation of unsteady sampling procedures for the parallel direct simulation Monte Carlo method

    NASA Astrophysics Data System (ADS)

    Cave, H. M.; Tseng, K.-C.; Wu, J.-S.; Jermy, M. C.; Huang, J.-C.; Krumdieck, S. P.

    2008-06-01

    An unsteady sampling routine for a general parallel direct simulation Monte Carlo method called PDSC is introduced, allowing the simulation of time-dependent flow problems in the near-continuum range. A post-processing procedure called the DSMC rapid ensemble averaging method (DREAM) is developed to reduce the statistical scatter in the results while minimising both memory and simulation time. This method builds an ensemble average of repeated runs over a small number of sampling intervals prior to the sampling point of interest by restarting the flow using either a Maxwellian distribution based on macroscopic properties, for near-equilibrium flows (DREAM-I), or the instantaneous particle data output by the original unsteady sampling of PDSC, for strongly non-equilibrium flows (DREAM-II). The method is validated by simulating shock-tube flow and the development of simple Couette flow. Unsteady PDSC is found to predict the flow field accurately in both cases, with significantly reduced run-times over single-processor code, and DREAM greatly reduces the statistical scatter in the results while maintaining accurate particle velocity distributions. Simulations are then conducted of two applications involving the interaction of shocks with wedges. The results of these simulations are compared to experimental data and simulations from the literature where these are available. In general, it was found that 10 ensembled runs of DREAM processing could reduce the statistical uncertainty in the raw PDSC data by 2.5-3.3 times, based on the limited number of cases in the present study.
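
    The scatter reduction DREAM-I exploits is easy to see in miniature: restarting a short window repeatedly from a Maxwellian built on the macroscopic state and averaging the repeated samples shrinks the scatter by roughly the square root of the number of runs, consistent with the 2.5-3.3x figure quoted for 10 runs. A toy single-cell sketch (values and names are illustrative, not PDSC's interface):

    ```python
    # Ensemble averaging of Maxwellian restarts for one cell: the mean of 10
    # repeated samples has ~sqrt(10) less scatter than any single sample.
    import numpy as np

    rng = np.random.default_rng(0)
    k_B, T, mass, u_bulk = 1.380649e-23, 300.0, 4.65e-26, 100.0  # argon-like cell

    def sampled_mean_velocity(n_particles):
        # DREAM-I style restart: fresh Maxwellian from the macroscopic state
        sigma = np.sqrt(k_B * T / mass)          # thermal velocity spread
        return rng.normal(u_bulk, sigma, n_particles).mean()

    runs = [sampled_mean_velocity(200) for _ in range(10)]
    print(f"single-run estimate: {runs[0]:.1f} m/s")
    print(f"10-run ensemble:     {np.mean(runs):.1f} m/s (scatter cut ~sqrt(10))")
    ```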

  11. Direct numerical simulation of instabilities in parallel flow with spherical roughness elements

    NASA Technical Reports Server (NTRS)

    Deanna, R. G.

    1992-01-01

    Results from a direct numerical simulation of laminar flow over a flat surface with spherical roughness elements using a spectral-element method are given. The numerical simulation approximates roughness as a cellular pattern of identical spheres protruding from a smooth wall. Periodic boundary conditions on the domain's horizontal faces simulate an infinite array of roughness elements extending in the streamwise and spanwise directions, which implies the parallel-flow assumption, and results in a closed domain. A body force, designed to yield the horizontal Blasius velocity in the absence of roughness, sustains the flow. Instabilities above a critical Reynolds number reveal negligible oscillations in the recirculation regions behind each sphere and in the free stream, high-amplitude oscillations in the layer directly above the spheres, and a mean profile with an inflection point near the sphere's crest. The inflection point yields an unstable layer above the roughness (where U''(y) is less than 0) and a stable region within the roughness (where U''(y) is greater than 0). Evidently, the instability begins when the low-momentum or wake region behind an element, being the region most affected by disturbances (purely numerical in this case), goes unstable and moves. In compressible flow with periodic boundaries, this motion sends disturbances to all regions of the domain. In the unstable layer just above the inflection point, the disturbances grow while being carried downstream with a propagation speed equal to the local mean velocity; they do not grow amid the low energy region near the roughness patch. The most amplified disturbance eventually arrives at the next roughness element downstream, perturbing its wake and inducing a global response at a frequency governed by the streamwise spacing between spheres and the mean velocity of the most amplified layer.

  12. Parallel simulation today

    NASA Technical Reports Server (NTRS)

    Nicol, David; Fujimoto, Richard

    1992-01-01

    This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

  13. Parallel Atomistic Simulations

    SciTech Connect

    Heffelfinger, Grant S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulation have been developed: replicated-data decomposition, spatial decomposition, and force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories: those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods, such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination, are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.
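
    Of the three molecular dynamics strategies listed, spatial decomposition dominates modern codes. Its essence, owned regions plus "ghost" copies of atoms within the interaction cutoff of the boundaries, can be sketched in one dimension (illustrative only; periodic images and the force computation are omitted):

    ```python
    # Spatial decomposition: each process owns a slab of the box and needs
    # ghost atoms lying within one cutoff of its edges (the halo exchange).
    import numpy as np

    rng = np.random.default_rng(0)
    box, cutoff, n_domains = 10.0, 1.0, 4
    x = rng.uniform(0.0, box, size=500)            # 1-D atom coordinates
    edges = np.linspace(0.0, box, n_domains + 1)

    for d in range(n_domains):
        lo, hi = edges[d], edges[d + 1]
        owned = x[(x >= lo) & (x < hi)]
        # atoms owned by neighbours but needed for our force evaluation
        ghosts = x[((x >= lo - cutoff) & (x < lo)) | ((x >= hi) & (x < hi + cutoff))]
        print(f"domain {d}: {owned.size} owned, {ghosts.size} ghost atoms")
    ```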

  14. Current Trends in Numerical Simulation for Parallel Engineering Environments New Directions and Work-in-Progress

    SciTech Connect

    Trinitis, C.; Schulz, M.

    2006-06-29

    In today's world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Remarkable progress in CPU architecture, system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges for both system architects and software developers. These trends are paralleled by progress in parallel algorithms, simulation techniques, and software integration from multiple disciplines. ParSim brings together researchers from both application disciplines and computer science and aims at fostering closer cooperation between these fields. Since its successful introduction in 2002, ParSim has established itself as an integral part of the EuroPVM/MPI conference series. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a short turn-around time. This offers a unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in parallel computation, serves as an ideal surrounding for ParSim. This combination enables the participants to present and discuss their work within the scope of both the session and the host conference. This year, eleven papers from authors in nine countries were submitted to ParSim, and we selected five of them. They cover a wide range of different application fields including gas flow simulations, thermo-mechanical processes in nuclear waste storage, and cosmological simulations. At the same time, the selected contributions also address the computer science side of their codes and discuss different parallelization strategies, programming models and languages, as well as the use of nonblocking collective operations in MPI. We are confident that this provides an attractive program and that ParSim will be an informal setting for lively discussions and for fostering new

  15. Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2

    NASA Technical Reports Server (NTRS)

    Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad

    1995-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and the IBM SP1 and SP2 parallel computers are documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that the PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs as the number of grid points changes, and these estimates match the actual costs. By increasing the number of processors, slower-than-linear speedups are achieved with optimized (machine-dependent library) routines. This slower-than-linear speedup results because the computational cost is dominated by the FFT routine, which yields less than ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 Mflops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32 nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32-node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.

  16. Parallel direct numerical simulation of wake vortex detection using monostatic and bistatic radio acoustic sounding systems

    NASA Astrophysics Data System (ADS)

    Boluriaan Esfahaani, Said

    A parallel two-dimensional code is developed in this thesis to numerically simulate wake vortex detection using a Radio Acoustic Sounding System (RASS). The Maxwell equations for media with non-uniform permittivity and the linearized Euler equations for media with non-uniform mean flow are the main framework for the simulations. The code is written in Fortran 90 with the Message Passing Interface (MPI) for parallel implementation. The main difficulty encountered with a time-accurate simulation of a RASS is the number of samples required to resolve the Doppler shift in the scattered electromagnetic signal. Even for a 1D simulation with a typical scatterer size, the CPU time required to run the code is far beyond currently available computer resources. Two solutions that overcome this problem are described. In the first, the actual electromagnetic wave propagation speed is replaced with a much lower value. This allows an explicit, time-accurate numerical scheme to be used. In the second, the governing differential equations are recast in order to remove the carrier frequency, and only the frequency shift is solved for, using an implicit scheme with large time steps. The numerical stability characteristics of the resulting discretized equation with complex coefficients are examined. A number of cases for both the monostatic and bistatic configurations are considered. First, a uniform mean flow is considered and the RASS simulation is performed for two different types of incident acoustic field, namely a short single-frequency acoustic pulse and a continuous broadband acoustic source. Both the explicit and implicit schemes are examined, and the mean flow velocity is determined from the spectrum of the backscattered electromagnetic signal with very good accuracy. Second, the Taylor and Oseen vortex models are considered and their velocity field along the incident electromagnetic beam is retrieved. The Abel transform is then applied to the velocity profiles determined by both

  17. Effects of rotation on turbulent convection: Direct numerical simulation using parallel processors

    NASA Astrophysics Data System (ADS)

    Chan, Daniel Chiu-Leung

    A new parallel implicit adaptive mesh refinement (AMR) algorithm is developed for the prediction of unsteady behaviour of laminar flames. The scheme is applied to the solution of the system of partial-differential equations governing time-dependent, two- and three-dimensional, compressible laminar flows for reactive thermally perfect gaseous mixtures. A high-resolution finite-volume spatial discretization procedure is used to solve the conservation form of these equations on body-fitted multi-block hexahedral meshes. A local preconditioning technique is used to remove numerical stiffness and maintain solution accuracy for low-Mach-number, nearly incompressible flows. A flexible block-based octree data structure has been developed and is used to facilitate automatic solution-directed mesh adaptation according to physics-based refinement criteria. The data structure also enables an efficient and scalable parallel implementation via domain decomposition. The parallel implicit formulation makes use of a dual-time-stepping-like approach with an implicit second-order backward discretization of the physical time, in which a Jacobian-free inexact Newton method with a preconditioned generalized minimal residual (GMRES) algorithm is used to solve the system of nonlinear algebraic equations arising from the temporal and spatial discretization procedures. An additive Schwarz global preconditioner is used in conjunction with block incomplete LU type local preconditioners for each sub-domain. The Schwarz preconditioning and block-based data structure readily allow efficient and scalable parallel implementations of the implicit AMR approach on distributed-memory multi-processor architectures. The scheme was applied to solutions of steady and unsteady laminar diffusion and premixed methane-air combustion and was found to accurately predict key flame characteristics. For a premixed flame under terrestrial gravity, the scheme accurately predicted the frequency of the natural

  18. Parallel system simulation

    SciTech Connect

    Tai, H.M.; Saeks, R.

    1984-03-01

    A relaxation algorithm for solving large-scale system simulation problems in parallel is proposed. The algorithm, which is composed of both a time-step parallel algorithm and a component-wise parallel algorithm, is described. The interconnected nature of the system, which is characterized by the component connection model, is fully exploited by this approach. A technique for finding an optimal number of time steps is also described. Finally, the algorithm is illustrated via several examples in which the possible trade-offs between the speed-up ratio, efficiency, and waiting time are analyzed.

  19. Xyce parallel electronic simulator.

    SciTech Connect

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

  20. Parallel Dislocation Simulator

    Energy Science and Technology Software Center (ESTSC)

    2006-10-30

    ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.

  1. Advancing predictive models for particulate formation in turbulent flames via massively parallel direct numerical simulations

    PubMed Central

    Bisetti, Fabrizio; Attili, Antonio; Pitsch, Heinz

    2014-01-01

    Combustion of fossil fuels is likely to continue for the near future due to the growing trends in energy consumption worldwide. The increase in efficiency and the reduction of pollutant emissions from combustion devices are pivotal to achieving meaningful levels of carbon abatement as part of the ongoing climate change efforts. Computational fluid dynamics featuring adequate combustion models will play an increasingly important role in the design of more efficient and cleaner industrial burners, internal combustion engines, and combustors for stationary power generation and aircraft propulsion. Today, turbulent combustion modelling is hindered severely by the lack of data that are accurate and sufficiently complete to assess and remedy model deficiencies effectively. In particular, the formation of pollutants is a complex, nonlinear and multi-scale process characterized by the interaction of molecular and turbulent mixing with a multitude of chemical reactions with disparate time scales. The use of direct numerical simulation (DNS) featuring a state of the art description of the underlying chemistry and physical processes has contributed greatly to combustion model development in recent years. In this paper, the analysis of the intricate evolution of soot formation in turbulent flames demonstrates how DNS databases are used to illuminate relevant physico-chemical mechanisms and to identify modelling needs. PMID:25024412

  2. Advancing predictive models for particulate formation in turbulent flames via massively parallel direct numerical simulations.

    PubMed

    Bisetti, Fabrizio; Attili, Antonio; Pitsch, Heinz

    2014-08-13

    Combustion of fossil fuels is likely to continue for the near future due to the growing trends in energy consumption worldwide. The increase in efficiency and the reduction of pollutant emissions from combustion devices are pivotal to achieving meaningful levels of carbon abatement as part of the ongoing climate change efforts. Computational fluid dynamics featuring adequate combustion models will play an increasingly important role in the design of more efficient and cleaner industrial burners, internal combustion engines, and combustors for stationary power generation and aircraft propulsion. Today, turbulent combustion modelling is hindered severely by the lack of data that are accurate and sufficiently complete to assess and remedy model deficiencies effectively. In particular, the formation of pollutants is a complex, nonlinear and multi-scale process characterized by the interaction of molecular and turbulent mixing with a multitude of chemical reactions with disparate time scales. The use of direct numerical simulation (DNS) featuring a state of the art description of the underlying chemistry and physical processes has contributed greatly to combustion model development in recent years. In this paper, the analysis of the intricate evolution of soot formation in turbulent flames demonstrates how DNS databases are used to illuminate relevant physico-chemical mechanisms and to identify modelling needs. PMID:25024412

  3. Parallelizing Timed Petri Net simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1993-01-01

    The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

  4. Efficiency of parallel direct optimization

    NASA Technical Reports Server (NTRS)

    Janies, D. A.; Wheeler, W. C.

    2001-01-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. ©2001 The Willi Hennig Society.

  5. Efficiency of parallel direct optimization.

    PubMed

    Janies, D A; Wheeler, W C

    2001-03-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. PMID:12240679

  6. Simulating Billion-Task Parallel Programs

    SciTech Connect

    Perumalla, Kalyan S; Park, Alfred J

    2014-01-01

    In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.

  7. Parallel Power Grid Simulation Toolkit

    SciTech Connect

    Smith, Steve; Kelley, Brian; Banks, Lawrence; Top, Philip; Woodward, Carol

    2015-09-14

    ParGrid is a 'wrapper' that integrates a coupled power grid simulation toolkit, consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGrid, named FSKIT, is intended to support the coupling of multiple continuous and discrete-event parallel simulations. The code is designed using modern object-oriented C++ methods, utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.

  8. Massively parallel quantum computer simulator

    NASA Astrophysics Data System (ADS)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700, and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
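
    The memory scaling behind the 36-qubit/1 TB figure follows from the state-vector representation such simulators use: n qubits require 2^n complex amplitudes, and a single-qubit gate touches every amplitude in index pairs differing in one bit. A serial sketch of that core operation (my own illustration, not the paper's software):

    ```python
    # Minimal state-vector simulator: apply a one-qubit gate by making the
    # target qubit its own tensor axis and contracting the 2x2 gate into it.
    import numpy as np

    def apply_1q_gate(state, gate, target, n_qubits):
        psi = state.reshape([2] * n_qubits)
        psi = np.tensordot(gate, psi, axes=([1], [target]))
        return np.moveaxis(psi, 0, target).reshape(-1)

    n = 20                                     # 2**20 amplitudes, ~16 MB
    state = np.zeros(2**n, dtype=complex)
    state[0] = 1.0                             # |00...0>
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    for q in range(n):                         # uniform superposition
        state = apply_1q_gate(state, H, q, n)
    print("norm:", np.vdot(state, state).real,
          " |amplitude|^2:", abs(state[0]) ** 2)   # 2**-20 per basis state
    ```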

  9. A new parallel simulation technique

    NASA Astrophysics Data System (ADS)

    Blanco-Pillado, Jose J.; Olum, Ken D.; Shlaer, Benjamin

    2012-01-01

    We develop a "semi-parallel" simulation technique suggested by Pretorius and Lehner, in which the simulation spacetime volume is divided into a large number of small 4-volumes that have only initial and final surfaces. Thus there is no two-way communication between processors, and the 4-volumes can be simulated independently and potentially at different times. This technique allows us to simulate much larger volumes than we otherwise could, because we are not limited by total memory size. No processor time is lost waiting for other processors. We compare a cosmic string simulation we developed using the semi-parallel technique with our previous MPI-based code for several test cases and find a factor of 2.6 improvement in the total amount of processor time required to accomplish the same job for strings evolving in the matter-dominated era.

  10. Parallelizing alternating direction implicit solver on GPUs

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation significantly improves existing implementations in two aspects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource con...
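
    For context, the ADI scheme named in the title is standard: each time step of, for example, a 2-D heat equation splits into an implicit sweep in x followed by an implicit sweep in y, and each sweep is a batch of independent tridiagonal solves, which is exactly the axis of parallelism a GPU implementation (PCR or otherwise) exploits. A minimal serial sketch, assuming a Peaceman-Rachford step with arbitrary grid size and diffusion number (this is not the TEKTRAN code):

    ```python
    # One ADI (Peaceman-Rachford) step: implicit x-sweep, then implicit
    # y-sweep; every row/column solve is independent, hence parallelizable.
    import numpy as np
    from scipy.linalg import solve_banded

    n, r = 64, 0.25                            # grid points per side, D*dt/dx^2
    u = np.zeros((n, n))
    u[n // 2, n // 2] = 1.0                    # initial heat spike

    # banded storage of the tridiagonal operator (I - r/2 * second difference)
    ab = np.zeros((3, n))
    ab[0, 1:], ab[1, :], ab[2, :-1] = -r / 2, 1 + r, -r / 2

    def lap(v):                                # second difference along axis 0,
        out = -2.0 * v.copy()                  # zero (Dirichlet) boundaries
        out[1:] += v[:-1]
        out[:-1] += v[1:]
        return out

    for _ in range(100):
        rhs = u + (r / 2) * lap(u.T).T         # explicit in y ...
        u = solve_banded((1, 1), ab, rhs)      # ... implicit in x (n solves)
        rhs = u + (r / 2) * lap(u)             # explicit in x ...
        u = solve_banded((1, 1), ab, rhs.T).T  # ... implicit in y (n solves)
    print("total heat:", u.sum())              # ~1.0 until heat reaches walls
    ```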

  11. Xyce parallel electronic simulator design.

    SciTech Connect

    Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

    2010-09-01

    This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed-memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines, and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus, and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians, and computer scientists. In addition to diversity of background, it is to be expected on long-term projects for there to be a certain amount of staff turnover as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document in one place a number of the software quality practices followed by the Xyce team. Also, it is hoped that this document will be a good source of information for new developers.

  12. Parallel Network Simulations with NEURON

    PubMed Central

    Migliore, M.; Cannia, C.; Lytton, W.W; Markram, Henry; Hines, M. L.

    2009-01-01

    The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored. PMID:16732488

  13. A Massively Parallel Hybrid Dusty-Gasdynamics and Kinetic Direct Simulation Monte Carlo Model for Planetary Applications

    NASA Technical Reports Server (NTRS)

    Combi, Michael R.

    2004-01-01

    In order to understand the global structure, dynamics, and physical and chemical processes occurring in the upper atmospheres, exospheres, and ionospheres of the Earth, the other planets, comets, and planetary satellites, and their interactions with their outer particles and fields environs, it is often necessary to address the fundamentally non-equilibrium aspects of the physical environment. These are regions where complex chemistry, energetics, and electromagnetic field influences are important. Traditional approaches are based largely on hydrodynamic or magnetohydrodynamic (MHD) formulations and are very important and highly useful. However, these methods often have limitations in rarefied physical regimes where the molecular collision rates and ion gyrofrequencies are small and where interactions with ionospheres and upper neutral atmospheres are important. At the University of Michigan we have an established base of experience and expertise in numerical simulations based on particle codes which address these physical regimes. The Principal Investigator, Dr. Michael Combi, has over 20 years of experience in the development of particle-kinetic and hybrid kinetic-hydrodynamic models and their direct use in data analysis. He has also worked in ground-based and space-based remote observational work and on spacecraft instrument teams. His research has involved studies of cometary atmospheres and ionospheres and their interaction with the solar wind, the neutral gas clouds escaping from Jupiter's moon Io, the interaction of the atmospheres/ionospheres of Io and Europa with Jupiter's corotating magnetosphere, as well as Earth's ionosphere. This report describes our progress during the year. The material contained in section 2 of this report will serve as the basis of a paper describing the method and its application to the cometary coma that will be continued under a research and analysis grant that supports various applications of theoretical comet models to understanding the

  14. Icarus: A 2D direct simulation Monte Carlo (DSMC) code for parallel computers. User's manual - V.3.0

    SciTech Connect

    Bartel, T.; Plimpton, S.; Johannes, J.; Payne, J.

    1996-10-01

    Icarus is a 2D Direct Simulation Monte Carlo (DSMC) code which has been optimized for the parallel computing environment. The code is based on the DSMC method of Bird and models from free-molecular to continuum flowfields in either cartesian (x, y) or axisymmetric (z, r) coordinates. Computational particles, representing a given number of molecules or atoms, are tracked as they have collisions with other particles or surfaces. Multiple species, internal energy modes (rotation and vibration), chemistry, and ion transport are modelled. A new trace species methodology for collisions and chemistry is used to obtain statistics for small species concentrations. Gas phase chemistry is modelled using steric factors derived from Arrhenius reaction rates. Surface chemistry is modelled with surface reaction probabilities. The electron number density is either a fixed external generated field or determined using a local charge neutrality assumption. Ion chemistry is modelled with electron impact chemistry rates and charge exchange reactions. Coulomb collision cross-sections are used instead of Variable Hard Sphere values for ion-ion interactions. The electrostatic fields can either be externally input or internally generated using a Langmuir-Tonks model. The Icarus software package includes the grid generation, parallel processor decomposition, postprocessing, and restart software. The commercial graphics package, Tecplot, is used for graphics display. The majority of the software packages are written in standard Fortran.

  15. Xyce(™) Parallel Electronic Simulator

    Energy Science and Technology Software Center (ESTSC)

    2013-10-03

    The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE, and transient modes using standard analog (DAE) and/or device (PDE) device models, including several age- and radiation-aware devices. It supports a variety of computing platforms (both serial and parallel computers). Lastly, it uses a variety of modern solution algorithms, dynamic parallel load-balancing, and iterative solvers. Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits. Kirchhoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.

  16. Xyce(™) Parallel Electronic Simulator

    SciTech Connect

    2013-10-03

    The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE, and transient modes using standard analog (DAE) and/or device (PDE) device models, including several age- and radiation-aware devices. It supports a variety of computing platforms (both serial and parallel computers). Lastly, it uses a variety of modern solution algorithms, dynamic parallel load-balancing, and iterative solvers. Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits. Kirchhoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.
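
    Both Xyce records describe the same formulation, so a single sketch suffices: modified nodal analysis (MNA) writes one Kirchhoff current law equation per node plus one extra branch-current unknown per voltage source. For a linear resistive divider, the DAE system collapses to one linear solve (the circuit, its values, and the variable names here are invented for illustration; Xyce's Newton and parallel solver layers are far beyond this sketch):

    ```python
    # MNA for: V1 = 5 V from node 1 to ground, R1 = 1 kOhm between nodes 1-2,
    # R2 = 2 kOhm from node 2 to ground. Unknowns: [v1, v2, i_V1].
    import numpy as np

    G1, G2, V1 = 1.0 / 1e3, 1.0 / 2e3, 5.0
    A = np.array([
        [ G1,     -G1, 1.0],   # KCL at node 1 (includes source branch current)
        [-G1, G1 + G2, 0.0],   # KCL at node 2
        [1.0,     0.0, 0.0],   # branch equation of the voltage source: v1 = V1
    ])
    b = np.array([0.0, 0.0, V1])
    v1, v2, i_v1 = np.linalg.solve(A, b)
    print(f"v2 = {v2:.3f} V (divider gives 5*2/3 = 3.333 V), "
          f"source branch current = {i_v1 * 1e3:.3f} mA")
    ```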

  17. Parallel execution and scriptability in micromagnetic simulations

    NASA Astrophysics Data System (ADS)

    Fischbacher, Thomas; Franchin, Matteo; Bordignon, Giuliano; Knittel, Andreas; Fangohr, Hans

    2009-04-01

    We demonstrate the feasibility of an "encapsulated parallelism" approach toward micromagnetic simulations that combines offering a high degree of flexibility to the user with the efficient utilization of parallel computing resources. While parallelization is obviously desirable to address the high numerical effort required for realistic micromagnetic simulations through utilizing now widely available multiprocessor systems (including desktop multicore CPUs and computing clusters), conventional approaches toward parallelization impose strong restrictions on the structure of programs: numerical operations have to be executed across all processors in a synchronized fashion. This means that from the user's perspective, either the structure of the entire simulation is rigidly defined from the beginning and cannot be adjusted easily, or making modifications to the computation sequence requires advanced knowledge in parallel programming. We explain how this dilemma is resolved in the NMAG simulation package in such a way that the user can utilize, without any additional effort on his side, both the computational power of multiple CPUs and the flexibility to tailor execution sequences for specific problems: simulation scripts written for single-processor machines can just as well be executed on parallel machines and behave in precisely the same way, up to increased speed. We provide a simple instructive magnetic resonance simulation example that demonstrates utilizing both custom execution sequences and parallelism at the same time. Furthermore, we show that this strategy of encapsulating parallelism even allows the user to benefit from speed gains through parallel execution in simulations controlled by interactive commands given at a command-line interface.

  18. Parallelization and automatic data distribution for nuclear reactor simulations

    SciTech Connect

    Liebrock, L.M.

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  19. Structured building model reduction toward parallel simulation

    SciTech Connect

    Dobbs, Justin R.; Hencey, Brondon M.

    2013-08-26

    Building energy model reduction exchanges accuracy for improved simulation speed by reducing the number of dynamical equations. Parallel computing aims to improve simulation times without loss of accuracy but is poorly utilized by contemporary simulators and is inherently limited by inter-processor communication. This paper bridges these disparate techniques to implement efficient parallel building thermal simulation. We begin with a survey of three structured reduction approaches that compares their performance to a leading unstructured method. We then use structured model reduction to find thermal clusters in the building energy model and allocate processing resources. Experimental results demonstrate faster simulation and low error without any interprocessor communication.

  20. Data parallel sorting for particle simulation

    NASA Technical Reports Server (NTRS)

    Dagum, Leonardo

    1992-01-01

    Sorting on a parallel architecture is a communications-intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O(N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimum performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
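
    The reduction to merging rests on the observation that after one timestep most particles are still in cell-sorted order: only the few that moved need sorting, after which a linear merge restores global order. A sequential sketch of that idea (the Connection Machine fieldwise implementation is not reproduced here; sizes are illustrative):

    ```python
    # Split a nearly sorted key list into its sorted run plus the few
    # displaced keys, sort the short list, then merge in linear time.
    import random

    random.seed(0)
    keys = sorted(random.randrange(1000) for _ in range(10_000))  # cell indices
    for i in random.sample(range(len(keys)), 50):                 # few movers
        keys[i] = random.randrange(1000)

    in_order, displaced = [], []
    for k in keys:                           # O(N) split
        if in_order and k < in_order[-1]:
            displaced.append(k)
        else:
            in_order.append(k)
    displaced.sort()                         # short sort: O(d log d), d << N

    merged, i, j = [], 0, 0                  # O(N) two-way merge
    while i < len(in_order) and j < len(displaced):
        if in_order[i] <= displaced[j]:
            merged.append(in_order[i]); i += 1
        else:
            merged.append(displaced[j]); j += 1
    merged += in_order[i:] + displaced[j:]
    assert merged == sorted(keys)            # same result as a full sort
    ```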

  1. Acoustic simulation in architecture with parallel algorithm

    NASA Astrophysics Data System (ADS)

    Li, Xiaohong; Zhang, Xinrong; Li, Dan

    2004-03-01

    Addressing the complexity of architectural environments and the need for real-time simulation of architectural acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in the scene is solved with this method. The impulse responses between sources and receivers, calculated per frequency band across multiple processes, are then combined into the whole frequency response. Numerical experiments show that the parallel algorithm can improve the efficiency of acoustic simulation for complex scenes.

  2. Xyce parallel electronic simulator : users' guide.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; (2) improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; (3) device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message-passing parallel implementation - which allows it to run efficiently on the widest possible range of computing platforms. These include serial, shared-memory, and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique

  3. A Parallel VLSI Direction Finding Algorithm

    NASA Astrophysics Data System (ADS)

    van der Veen, Alle-Jan; Deprettere, Ed F.

    1988-02-01

    In this paper, we present a parallel VLSI architecture that is matched to a class of direction (frequency, pole) finding algorithms of the ESPRIT type. The problem is modeled in such a way that it allows an easy-to-partition, fully parallel VLSI implementation using unitary transformations only. The hard problem, the generalized Schur decomposition of a matrix pencil, is tackled using a modified Stewart Jacobi approach that improves convergence and simplifies parameter computations. The proposed architecture is a fixed-size, 2-layer Jacobi iteration array that is matched to all sub-problems of the main problem: two QR factorizations, two SVDs, and a single GSD problem. The arithmetic used is (pipelined) CORDIC.

  4. Stochastic Parallel PARticle Kinetic Simulator

    Energy Science and Technology Software Center (ESTSC)

    2008-07-01

    SPPARKS is a kinetic Monte Carlo simulator which implements kinetic and Metropolis Monte Carlo solvers in a general way so that they can be hooked to applications of various kinds. Specific applications are implemented in SPPARKS as physical models which generate events (e.g., a diffusive hop or chemical reaction) and execute them one-by-one. Applications can run in parallel so long as the simulation domain can be partitioned spatially so that multiple events can be invoked simultaneously. SPPARKS is used to model various kinds of mesoscale materials science scenarios such as grain growth, surface deposition and growth, and reaction kinetics. It can also be used to develop new Monte Carlo models that hook to the existing solver and parallel infrastructure provided by the code.
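
    The core of a kinetic Monte Carlo solver of this kind is rate-proportional event selection with an exponentially distributed time advance. A minimal serial sketch of that loop (not SPPARKS code; the rate array and event indexing are hypothetical):

    ```c
    /* One kinetic Monte Carlo step: choose an event with probability
     * proportional to its rate, and return the elapsed time dt.
     * Assumes at least one event has a positive rate. */
    #include <math.h>
    #include <stdlib.h>

    double kmc_step(const double *rate, int n_events, int *chosen)
    {
        double total = 0.0;
        for (int i = 0; i < n_events; i++)
            total += rate[i];

        double r = drand48() * total;            /* uniform in [0, total) */
        double acc = 0.0;
        *chosen = n_events - 1;
        for (int i = 0; i < n_events; i++) {
            acc += rate[i];
            if (r < acc) { *chosen = i; break; } /* event i fires */
        }
        return -log(drand48()) / total;          /* exponential dt */
    }
    ```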

  5. Visualization and Tracking of Parallel CFD Simulations

    NASA Technical Reports Server (NTRS)

    Vaziri, Arsi; Kremenetsky, Mark

    1995-01-01

    We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, is handled by CM/AVS. Partitioning of the visualization task between the CM-5 and the workstation can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate → store → visualize' post-processing approach.

  6. Parallel processing of a rotating shaft simulation

    NASA Technical Reports Server (NTRS)

    Arpasi, Dale J.

    1989-01-01

    A FORTRAN program describing the vibration modes of a rotor-bearing system is analyzed for parallelism using a Pascal-like structured language. Potential vector operations are also identified. A critical path through the simulation is identified and used in conjunction with somewhat fictitious processor characteristics to determine the time to calculate the problem on a parallel processing system having those characteristics. A parallel processing overhead time is included as a parameter for proper evaluation of the gain over serial calculation. The serial calculation time is determined for the same fictitious system. An improvement of up to 640 percent is possible depending on the value of the overhead time. Based on the analysis, certain conclusions are drawn pertaining to the development needs of parallel processing technology, and to the specification of parallel processing systems to meet computational needs.
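
    The evaluation described amounts to comparing the serial time against the critical-path time plus a tunable overhead parameter. A toy restatement of that comparison (names invented; not the report's code):

    ```c
    /* Estimated gain of parallel over serial execution when parallel
     * time is bounded below by the critical path plus overhead.  The
     * report's "up to 640 percent" figure is the favorable end of this
     * ratio, reached only when t_overhead is small. */
    double estimated_speedup(double t_serial, double t_critical_path,
                             double t_overhead)
    {
        return t_serial / (t_critical_path + t_overhead);
    }
    ```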

  7. The Xyce Parallel Electronic Simulator - An Overview

    SciTech Connect

    HUTCHINSON,SCOTT A.; KEITER,ERIC R.; HOEKSTRA,ROBERT J.; WATTS,HERMAN A.; WATERS,ARLON J.; SCHELLS,REGINA L.; WIX,STEVEN D.

    2000-12-08

    The Xyce{trademark} Parallel Electronic Simulator has been written to support the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on providing the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). In addition, the developers are providing improved performance for numerical kernels using state-of-the-art algorithms, support for modeling circuit phenomena at a variety of abstraction levels, and object-oriented, modern coding practices that ensure the code will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase - a message-passing parallel implementation - which allows it to run efficiently on the widest possible range of computing platforms. These include serial, shared-memory, and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows.

  8. Parallel and Distributed System Simulation

    NASA Technical Reports Server (NTRS)

    Dongarra, Jack

    1998-01-01

    This exploratory study initiated our research into the software infrastructure necessary to support the modeling and simulation techniques that are most appropriate for the Information Power Grid. Such computational power grids will use high-performance networking to connect hardware, software, instruments, databases, and people into a seamless web that supports a new generation of computation-rich problem solving environments for scientists and engineers. In this context we looked at evaluating the NetSolve software environment for network computing that leverages the potential of such systems while addressing their complexities. NetSolve's main purpose is to enable the creation of complex applications that harness the immense power of the grid, yet are simple to use and easy to deploy. NetSolve uses a modular, client-agent-server architecture to create a system that is very easy to use. Moreover, it is designed to be highly composable in that it readily permits new resources to be added by anyone willing to do so. In these respects NetSolve is to the Grid what the World Wide Web is to the Internet. But like the Web, the design that makes these wonderful features possible can also impose significant limitations on the performance and robustness of a NetSolve system. This project explored the design innovations that push the performance and robustness of the NetSolve paradigm as far as possible without sacrificing the Web-like ease of use and composability that make it so powerful.

  9. Xyce parallel electronic simulator release notes.

    SciTech Connect

    Keiter, Eric Richard; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

    2010-05-01

    The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance, and object-oriented code design and implementation. The Xyce release notes describe: hardware and software requirements; new features and enhancements; any defects fixed since the last release; and current known defects and defect workarounds. For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.

  10. Theory and practice of parallel direct optimization.

    PubMed

    Janies, Daniel A; Wheeler, Ward C

    2002-01-01

    Our ability to collect and distribute genomic and other biological data is growing at a staggering rate (Pagel, 1999). However, the synthesis of these data into knowledge of evolution is incomplete. Phylogenetic systematics provides a unifying intellectual approach to understanding evolution but presents formidable computational challenges. A fundamental goal of systematics, the generation of evolutionary trees, is typically approached as two distinct NP-complete problems: multiple sequence alignment and phylogenetic tree search. The number of cells in a multiple alignment matrix is exponentially related to sequence length. In addition, the number of evolutionary trees expands combinatorially with respect to the number of organisms or sequences to be examined. Biologically interesting datasets currently comprise hundreds of taxa and thousands of nucleotides and morphological characters. This standard will continue to grow with the advent of highly automated sequencing and the development of character databases. Three areas of innovation are changing how evolutionary computation can be addressed: (1) novel concepts for determination of sequence homology, (2) heuristics and shortcuts in tree-search algorithms, and (3) parallel computing. In this paper and the online software documentation we describe the basic usage of parallel direct optimization as implemented in the software POY (ftp://ftp.amnh.org/pub/molecular/poy). PMID:11924490

  11. On extending parallelism to serial simulators

    NASA Technical Reports Server (NTRS)

    Nicol, David; Heidelberger, Philip

    1994-01-01

    This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops submodels using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permits automatically synchronized communication between submodels; however, the automation requires that any such communication take a nonzero amount of simulation time. Within this modeling paradigm, a variety of conservative synchronization protocols can transparently support conservative execution of submodels on potentially different processors. A specific implementation of this approach, U.P.S. (Utilitarian Parallel Simulator), is described, along with performance results on the Intel Paragon.

  12. Parallel Performance of a Combustion Chemistry Simulation

    DOE PAGESBeta

    Skinner, Gregg; Eigenmann, Rudolf

    1995-01-01

    We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.

  13. An adaptable parallel algorithm for the direct numerical simulation of incompressible turbulent flows using a Fourier spectral/hp element method and MPI virtual topologies

    NASA Astrophysics Data System (ADS)

    Bolis, A.; Cantwell, C. D.; Moxey, D.; Serson, D.; Sherwin, S. J.

    2016-09-01

    A hybrid parallelisation technique for distributed memory systems is investigated for a coupled Fourier-spectral/hp element discretisation of domains characterised by geometric homogeneity in one or more directions. The performance of the approach is mathematically modelled in terms of operation count and communication costs for identifying the most efficient parameter choices. The model is calibrated to target a specific hardware platform after which it is shown to accurately predict the performance in the hybrid regime. The method is applied to modelling turbulent flow using the incompressible Navier-Stokes equations in an axisymmetric pipe and square channel. The hybrid method extends the practical limitations of the discretisation, allowing greater parallelism and reduced wall times. Performance is shown to continue to scale when both parallelisation strategies are used.

  14. Parallel Simulation of Unsteady Turbulent Flames

    NASA Technical Reports Server (NTRS)

    Menon, Suresh

    1996-01-01

    Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, the high cost and their limited availability make practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable. This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. Recently, a new model for turbulent combustion was developed, in which the combustion is modeled within the subgrid (small scales) using a methodology that simulates the mixing, the molecular transport, and the chemical kinetics.

  15. Parallelization of Rocket Engine Simulator Software (PRESS)

    NASA Technical Reports Server (NTRS)

    Cezzar, Ruknet

    1997-01-01

    The Parallelization of Rocket Engine System Software (PRESS) project is part of a collaborative effort with Southern University at Baton Rouge (SUBR), University of West Florida (UWF), and Jackson State University (JSU). The second-year funding, which supports two graduate students enrolled in our new Master's program in Computer Science at Hampton University and the principal investigator, has been obtained for the period from October 19, 1996 through October 18, 1997. The key part of the interim report was new directions for the second-year funding. This came about from discussions during the Rocket Engine Numeric Simulator (RENS) project meeting in Pensacola on January 17-18, 1997. At that time, a software agreement between Hampton University and NASA Lewis Research Center had already been concluded. That agreement concerns off-NASA-site experimentation with the PUMPDES/TURBDES software. Before this agreement, during the first year of the project, another large-scale FORTRAN-based software package, Two-Dimensional Kinetics (TDK), was being used for translation to an object-oriented language and parallelization experiments. However, that package proved to be too complex and lacking sufficient documentation for an effective translation effort to object-oriented C++ source code. The focus, this time with the better documented and more manageable PUMPDES/TURBDES package, was still on translation to C++ with design improvements. At the RENS meeting, however, the new impetus for the RENS projects in general, and PRESS in particular, shifted in two important ways. One was closer alignment with the work on the Numerical Propulsion System Simulator (NPSS) through cooperation and collaboration with the LERC ACLU organization. The other was to see whether and how NASA's various rocket design software can be run over local networks and intranets without any radical efforts for redesign and translation into object-oriented source code. There were also suggestions that the Fortran-based code be

  16. Parallel algorithm strategies for circuit simulation.

    SciTech Connect

    Thornquist, Heidi K.; Schiek, Richard Louis; Keiter, Eric Richard

    2010-01-01

    Circuit simulation tools (e.g., SPICE) have become invaluable in the development and design of electronic circuits. However, they have been pushed to their performance limits in addressing circuit design challenges that come from the technology drivers of smaller feature scales and higher integration. Improving the performance of circuit simulation tools by exploiting new opportunities in widely available multi-processor architectures is a logical next step. Unfortunately, not all traditional simulation applications are inherently parallel, and quickly adapting mature application codes (even codes designed as parallel applications) to new parallel paradigms can be prohibitively difficult. In general, performance is influenced by many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, the use of mini-applications, small self-contained proxies for real applications, is an excellent approach for rapidly exploring the parameter space of all these choices. In this report we present a multi-core performance study of Xyce, a transistor-level circuit simulation tool, and describe the future development of a mini-application for circuit simulation.

  17. Inflated speedups in parallel simulations via malloc()

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1990-01-01

    Discrete-event simulation programs make heavy use of dynamic memory allocation in order to support simulation's very dynamic space requirements. When programming in C one is likely to use the malloc() routine. However, a parallel simulation which uses the standard Unix System V malloc() implementation may achieve an overly optimistic speedup, possibly superlinear. An alternate implementation provided on some (but not all) systems can avoid the speedup anomaly, but at the price of significantly reduced available free space. This is especially severe on most parallel architectures, which tend not to support virtual memory. It is shown how a simply implemented user-constructed interface to malloc() can both avoid artificially inflated speedups and make efficient use of the dynamic memory space. The interface simply caches blocks on the basis of their size. The problem is demonstrated empirically, and the effectiveness of the solution is shown both empirically and analytically.
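
    A user-level interface of the kind described, caching freed blocks on per-size free lists so that repeated allocations reuse memory rather than returning to malloc(), might look like the following sketch (an illustration under stated assumptions, not the paper's actual interface; names and the size-class scheme are hypothetical):

    ```c
    /* Size-binned caching wrapper around malloc(): freed blocks are kept
     * on a free list for their size class and reused on later requests. */
    #include <stdlib.h>

    #define NBINS 32                    /* power-of-two size classes */

    typedef struct node { struct node *next; } node_t;
    static node_t *freelist[NBINS];

    static int bin_of(size_t size)      /* smallest class is 16 bytes */
    {
        int b = 0;
        size_t s = 16;
        while (s < size && b < NBINS - 1) { s <<= 1; b++; }
        return b;
    }

    void *sim_alloc(size_t size)
    {
        int b = bin_of(size);
        if (freelist[b]) {              /* reuse a cached block */
            node_t *blk = freelist[b];
            freelist[b] = blk->next;
            return blk;
        }
        return malloc((size_t)16 << b); /* round up to the class size */
    }

    void sim_free(void *p, size_t size)
    {
        int b = bin_of(size);           /* cache rather than free() */
        node_t *blk = p;
        blk->next = freelist[b];
        freelist[b] = blk;
    }
    ```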

  18. Xyce parallel electronic simulator : reference guide.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.

  19. Parallel Implicit Kinetic Simulation with PARSEK

    NASA Astrophysics Data System (ADS)

    Markidis, Stefano; Lapenta, Giovanni

    2004-11-01

    Kinetic plasma simulation is the ultimate tool for plasma analysis. One of the prime tools for kinetic simulation is the particle in cell (PIC) method. The explicit or semi-implicit (i.e. implicit only on the fields) PIC method requires exceedingly small time steps and grid spacing, limited by the necessity to resolve the electron plasma frequency, the Debye length and the speed of light (for fully explicit schemes). A different approach is to consider fully implicit PIC methods where both particles and fields are discretized implicitly. This approach allows radically larger time steps and grid spacing, reducing the cost of a simulation by orders of magnitude while keeping the full kinetic treatment. In our previous work, simulations impossible for the explicit PIC method even on massively parallel computers have been made possible on a single processor machine using the implicit PIC code CELESTE3D [1]. We propose here another quantum leap: PARSEK, a parallel cousin of CELESTE3D, based on the same approach but sporting a radically redesigned software architecture (object oriented C++, where CELESTE3D was structured and written in FORTRAN77/90) and fully parallelized using MPI for both particle and grid communication. [1] G. Lapenta, J.U. Brackbill, W.S. Daughton, Phys. Plasmas, 10, 1577 (2003).

  20. Parallel node placement method by bubble simulation

    NASA Astrophysics Data System (ADS)

    Nie, Yufeng; Zhang, Weiwei; Qi, Nan; Li, Yiqiang

    2014-03-01

    An efficient Parallel Node Placement method by Bubble Simulation (PNPBS), employing METIS-based domain decomposition (DD) for an arbitrary number of processors, is introduced. In accordance with the desired nodal density and Newton’s second law of motion, automatic generation of node sets by bubble simulation has been demonstrated in previous work. Since the interaction force between nodes is short-range, the positions and velocities of two distant nodes can be updated simultaneously and independently during dynamic simulation; this inherent parallelism makes the method quite suitable for parallel computing. In the PNPBS method, the METIS-based DD scheme has been investigated for uniform and non-uniform node sets, and dynamic load balancing is obtained by evenly distributing work among the processors. For the nodes near the common interface of two neighboring subdomains, there is no need for special treatment after dynamic simulation. These nodes have good geometrical properties and a smooth density distribution, which is desirable in the numerical solution of partial differential equations (PDEs). The results of numerical examples show that quasi-linear speedup in the number of processors and high efficiency are achieved.
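
    The per-node update that makes distant nodes independent is a plain Newtonian integration step driven by short-range inter-node forces. A minimal sketch of one such update (hypothetical structure and parameter names, not the PNPBS code):

    ```c
    /* Damped Newton's-second-law update for bubble-simulation nodes;
     * forces fx, fy are assumed already accumulated from short-range
     * neighbors, so each node's update is independent of distant nodes. */
    typedef struct { double x, y, vx, vy, fx, fy; } bubble_t;

    void update_bubbles(bubble_t *b, int n, double mass, double damping,
                        double dt)
    {
        for (int i = 0; i < n; i++) {   /* independent per-node update */
            double ax = (b[i].fx - damping * b[i].vx) / mass;
            double ay = (b[i].fy - damping * b[i].vy) / mass;
            b[i].vx += ax * dt;
            b[i].vy += ay * dt;
            b[i].x  += b[i].vx * dt;
            b[i].y  += b[i].vy * dt;
        }
    }
    ```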

  1. Parallel-distributed mobile robot simulator

    NASA Astrophysics Data System (ADS)

    Okada, Hiroyuki; Sekiguchi, Minoru; Watanabe, Nobuo

    1996-06-01

    The aim of this project is to achieve an autonomous learning and growth function based on active interaction with the real world. The system should also be able to autonomously acquire knowledge about the context in which jobs take place and how the jobs are executed. This article describes a parallel distributed mobile robot simulator with an autonomous learning and growth function. The autonomous learning and growth function which we are proposing is characterized by its ability to learn and grow through interaction with the real world. When the mobile robot interacts with the real world, the system compares the virtual environment simulation with the interaction result in the real world. The system then improves the virtual environment to match the real-world result more closely. In this way the system learns and grows. It is very important that such a simulation be time-realistic. The parallel distributed mobile robot simulator was developed to simulate the space of a mobile robot system with an autonomous learning and growth function. The simulator constructs a virtual space faithful to the real world and also integrates the interfaces between the user, the actual mobile robot, and the virtual mobile robot. Using an ultrafast CG (computer graphics) system (FUJITSU AG series), time-realistic 3D CG is displayed.

  2. Parallelism extraction and program restructuring for parallel simulation of digital systems

    SciTech Connect

    Vellandi, B.L.

    1990-01-01

    Two topics currently of interest to the computer-aided design (CAD) community for very-large-scale integrated circuits (VLSI) are using the VHSIC Hardware Description Language (VHDL) effectively and decreasing simulation times of VLSI designs through parallel execution of the simulator. The goal of this research is to increase the degree of parallelism obtainable in VHDL simulation, and consequently to decrease simulation times. The research targets simulation on massively parallel architectures. Experimentation and instrumentation were done on the SIMD Connection Machine. The author discusses her method used to extract parallelism and restructure a VHDL program, experimental results using this method, and requirements for a parallel architecture for fast simulation.

  3. Parallel Strategies for Crash and Impact Simulations

    SciTech Connect

    Attaway, S.; Brown, K.; Hendrickson, B.; Plimpton, S.

    1998-12-07

    We describe a general strategy we have found effective for parallelizing solid mechanics simulations. Such simulations often have several computationally intensive parts, including finite element integration, detection of material contacts, and particle interaction if smoothed particle hydrodynamics is used to model highly deforming materials. The need to balance all of these computations simultaneously is a difficult challenge that has kept many commercial and government codes from being used effectively on parallel supercomputers with hundreds or thousands of processors. Our strategy is to load-balance each of the significant computations independently with whatever balancing technique is most appropriate. The chief benefit is that each computation can be scalably parallelized. The drawback is the data exchange between processors and extra coding that must be written to maintain multiple decompositions in a single code. We discuss these trade-offs and give performance results showing this strategy has led to a parallel implementation of a widely-used solid mechanics code that can now be run efficiently on thousands of processors of the Pentium-based Sandia/Intel TFLOPS machine. We illustrate with several examples the kinds of high-resolution, million-element models that can now be simulated routinely. We also look to the future and discuss what possibilities this new capability promises, as well as the new set of challenges it poses in material models, computational techniques, and computing infrastructure.

  4. Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress has been made in hardware and software technologies, the performance of parallel programs using compiler directives has improved substantially. The introduction of OpenMP directives, the industry standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
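
    For reference, the kind of directive-based parallelization such a tool emits is a loop annotated with an OpenMP pragma; the compiler and runtime handle thread creation and iteration scheduling. A minimal illustrative example (not CAPTools output; function and variable names are invented):

    ```c
    /* Directive-based shared-memory parallelism: the pragma splits the
     * loop iterations across threads with no explicit thread code. */
    #include <omp.h>

    void saxpy(double *y, const double *x, double a, int n)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }
    ```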

  5. Parallel Computing Environments and Methods for Power Distribution System Simulation

    SciTech Connect

    Lu, Ning; Taylor, Zachary T.; Chassin, David P.; Guttromson, Ross T.; Studham, Scott S.

    2005-11-10

    The development of cost-effective, high-performance parallel computing on multi-processor supercomputers makes it attractive to port excessively time-consuming simulation software from personal computers (PCs) to supercomputers. The power distribution system simulator (PDSS) takes a bottom-up approach and simulates load at the appliance level, where detailed thermal models for appliances are used. This approach works well for a small power distribution system consisting of a few thousand appliances. When the number of appliances increases, the simulation uses up the PC memory and its run time increases to a point where the approach is no longer feasible for modeling a practical large power distribution system. This paper presents an effort made to port a PC-based power distribution system simulator (PDSS) to a 128-processor shared-memory supercomputer. The paper offers an overview of the parallel computing environment and a description of the modifications made to the PDSS model. The performance of the PDSS running on a standalone PC and on the supercomputer is compared. Future research directions for utilizing parallel computing in power distribution system simulation are also addressed.

  6. Design and implementation of a massively parallel version of DIRECT

    SciTech Connect

    He, J.; Verstak, A.; Watson, L.; Sosonkina, M.

    2007-10-24

    This paper describes several massively parallel implementations of the global search algorithm DIRECT. Two parallel schemes take different approaches to address DIRECT's design challenges imposed by memory requirements and data dependency. Three design aspects, in topology, data structures, and task allocation, are compared in detail. The goal is to analytically investigate the strengths and weaknesses of these parallel schemes, identify several key sources of inefficiency, and experimentally evaluate a number of improvements in the latest parallel DIRECT implementation. The performance studies demonstrate improved data structure efficiency and load balancing on a 2200-processor cluster.

  7. Empirical study of parallel LRU simulation algorithms

    NASA Technical Reports Server (NTRS)

    Carr, Eric; Nicol, David M.

    1994-01-01

    This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithms are more complex, but have costs that are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from the execution of three SPEC benchmark programs.

  8. Parallel Proximity Detection for Computer Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1998-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
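
    As an illustration of the grid check-in scheme described above (a simplified sketch, not the patented system; all structures and limits are hypothetical), a mover can learn which sensors currently cover it by consulting its grid cell:

    ```c
    /* Movers check in to the grid cell they occupy; sensors publish
     * their coverage per cell, so detection pairing needs no exact
     * crossing computation. */
    #include <math.h>

    #define GRID_DIM   64
    #define MAX_SENSOR 16

    typedef struct {
        int n_sensors;
        int sensor_id[MAX_SENSOR];   /* sensors covering this cell */
    } cell_t;

    static cell_t grid[GRID_DIM][GRID_DIM];

    /* A mover checks in and receives the sensors that can detect it. */
    int mover_check_in(double x, double y, double cell_size,
                       int *detecting, int max_out)
    {
        int ci = (int)floor(x / cell_size) & (GRID_DIM - 1);
        int cj = (int)floor(y / cell_size) & (GRID_DIM - 1);
        cell_t *c = &grid[ci][cj];
        int n = c->n_sensors < max_out ? c->n_sensors : max_out;
        for (int k = 0; k < n; k++)
            detecting[k] = c->sensor_id[k];
        return n;
    }
    ```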

  9. Parallel Proximity Detection for Computer Simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1997-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.

  10. A polymorphic reconfigurable emulator for parallel simulation

    NASA Technical Reports Server (NTRS)

    Parrish, E. A., Jr.; Mcvey, E. S.; Cook, G.

    1980-01-01

    Microprocessor and arithmetic support chip technology was applied to the design of a reconfigurable emulator for real-time flight simulation. The system developed consists of a master control system, to perform all man-machine interactions and to configure the hardware to emulate a given aircraft, and numerous slave compute modules (SCMs), which comprise the parallel computational units. It is shown that all parts of the state equations can be worked on simultaneously but that the algebraic equations cannot (unless they are slowly varying). Attempts to obtain algorithms that will allow parallel updates are reported. The word length and step size to be used in the SCMs are determined, and the architecture of the hardware and software is described.

  11. Parallel multiscale simulations of a brain aneurysm

    SciTech Connect

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in

  12. Parallel multiscale simulations of a brain aneurysm

    NASA Astrophysics Data System (ADS)

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NɛκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NɛκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future

  13. Parallel multiscale simulations of a brain aneurysm.

    PubMed

    Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NɛκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NɛκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future

  14. A parallel algorithm for implicit depletant simulations

    NASA Astrophysics Data System (ADS)

    Glaser, Jens; Karas, Andrew S.; Glotzer, Sharon C.

    2015-11-01

    We present an algorithm to simulate the many-body depletion interaction between anisotropic colloids in an implicit way, integrating out the degrees of freedom of the depletants, which we treat as an ideal gas. Because the depletant particles are statistically independent and the depletion interaction is short-ranged, depletants are randomly inserted in parallel into the excluded volume surrounding a single translated and/or rotated colloid. A configurational bias scheme is used to enhance the acceptance rate. The method is validated and benchmarked both on multi-core processors and graphics processing units for the case of hard spheres, hemispheres, and discoids. With depletants, we report novel cluster phases in which hemispheres first assemble into spheres, which then form ordered hcp/fcc lattices. The method is significantly faster than any method without cluster moves and that tracks depletants explicitly, for systems of colloid packing fraction ϕc < 0.50, and additionally enables simulation of the fluid-solid transition.
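
    The essence of the method can be illustrated with hard spheres (a simplified serial sketch under stated assumptions, not the authors' configurational-bias algorithm; all names are invented): trial depletants are inserted at random around the moved colloid, and the move is rejected if any inserted depletant fits the old configuration but overlaps the new one, since such a depletant would have been displaced. Because the insertions are statistically independent, they can also be performed in parallel, as the abstract describes.

    ```c
    /* Implicit-depletant acceptance test for one trial move of a hard
     * sphere among ideal-gas depletants (simplified illustration). */
    #include <stdlib.h>

    typedef struct { double x, y, z, r; } sphere_t;

    static int overlaps(sphere_t a, sphere_t b)
    {
        double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        double rr = a.r + b.r;
        return dx * dx + dy * dy + dz * dz < rr * rr;
    }

    /* rho: depletant number density; r_d: depletant radius; depletants
     * are sampled in a box of half-width h around the new position,
     * which must cover the excluded volume (the interaction is short-
     * ranged, so h can stay small). */
    int move_accepted(sphere_t oldc, sphere_t newc, double r_d,
                      double rho, double h)
    {
        int n_try = (int)(rho * 8.0 * h * h * h); /* mean ideal-gas count */
        for (int k = 0; k < n_try; k++) {
            sphere_t d = { newc.x + (2.0 * drand48() - 1.0) * h,
                           newc.y + (2.0 * drand48() - 1.0) * h,
                           newc.z + (2.0 * drand48() - 1.0) * h, r_d };
            /* a depletant that fit the old configuration but overlaps
             * the new one would be displaced by the move: reject */
            if (overlaps(d, newc) && !overlaps(d, oldc))
                return 0;
        }
        return 1;
    }
    ```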

  15. A scalable parallel black oil simulator on distributed memory parallel computers

    NASA Astrophysics Data System (ADS)

    Wang, Kun; Liu, Hui; Chen, Zhangxin

    2015-11-01

    This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
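
    Of the techniques listed, the inexact Newton method is the outer nonlinear driver: each iteration solves the Jacobian system only to a loose forcing tolerance, leaving the hard work to the preconditioned linear solver (e.g., multigrid-accelerated Krylov). A generic sketch (hypothetical callback signatures; not the authors' platform):

    ```c
    /* Inexact Newton iteration: solve J(x) dx = -F(x) only to relative
     * tolerance eta each step, tightening as the residual shrinks. */
    #include <math.h>
    #include <stdlib.h>

    typedef void (*residual_fn)(const double *x, double *f, int n);
    typedef void (*linsolve_fn)(const double *x, const double *f,
                                double eta, double *dx, int n);

    static double norm2(const double *v, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += v[i] * v[i];
        return sqrt(s);
    }

    int inexact_newton(double *x, int n, residual_fn residual,
                       linsolve_fn linsolve, double tol, int max_iter)
    {
        double *f = malloc(n * sizeof *f), *dx = malloc(n * sizeof *dx);
        int converged = 0;
        for (int it = 0; it < max_iter; it++) {
            residual(x, f, n);
            double r = norm2(f, n);
            if (r < tol) { converged = 1; break; }
            double eta = fmin(0.1, sqrt(r)); /* looser far from solution */
            linsolve(x, f, eta, dx, n);      /* ||J dx + F|| <= eta ||F|| */
            for (int i = 0; i < n; i++)
                x[i] += dx[i];
        }
        free(f); free(dx);
        return converged;
    }
    ```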

  16. Parallel magnetic field perturbations in gyrokinetic simulations

    SciTech Connect

    Joiner, N.; Hirose, A.; Dorland, W.

    2010-07-15

    At low beta it is common to neglect parallel magnetic field perturbations on the basis that they are of order β². This is only true if effects of order β are canceled by a term in the ∇B drift also of order β [H. L. Berk and R. R. Dominguez, J. Plasma Phys. 18, 31 (1977)]. To our knowledge this has not been rigorously tested with modern gyrokinetic codes. In this work we use the gyrokinetic code GS2 [Kotschenreuther et al., Comput. Phys. Commun. 88, 128 (1995)] to investigate whether the compressional magnetic field perturbation B∥ is required for accurate gyrokinetic simulations at low beta for microinstabilities commonly found in tokamaks. The kinetic ballooning mode (KBM) demonstrates the principle described by Berk and Dominguez strongly, as does the trapped electron mode, in a less dramatic way. The ion and electron temperature gradient (ETG) driven modes do not typically exhibit this behavior; the effects of B∥ are found to depend on the pressure gradients. The terms which are seen to cancel at long wavelength in KBM calculations can be cumulative in the ion temperature gradient case and increase with η_e. The effect of B∥ on the ETG instability is shown to depend on the normalized pressure gradient β′ at constant beta.

  17. Parallel Simulation of Explosion in an Unlimited Atmosphere

    NASA Astrophysics Data System (ADS)

    Ma, Tianbao; Wang, Cheng; Fei, Guanglei; Ning, Jianguo

    In this paper, a parallel Eulerian hydrocode for the simulation of large-scale complicated explosion and impact problems is developed. The data dependency in the parallel algorithm is studied in particular. As a test, a three-dimensional numerical simulation of the explosion field in an unlimited atmosphere is performed. The numerical results are in good agreement with empirical results, indicating that the proposed parallel algorithm is valid. Finally, the parallel speedup and parallel efficiency under different domain decompositions are analyzed.

  18. Particle simulation of plasmas on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Gledhill, I. M. A.; Storey, L. R. O.

    1987-01-01

    Particle simulations, in which collective phenomena in plasmas are studied by following the self-consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.

  19. Parallel/distributed direct method for solving linear systems

    NASA Technical Reports Server (NTRS)

    Lin, Avi

    1990-01-01

    A new family of parallel schemes for directly solving linear systems is presented and analyzed. It is shown that these schemes exhibit near-optimal performance and enjoy several important features: (1) for large enough linear systems, the design of the appropriate parallel algorithm is insensitive to the number of processors, as its performance grows monotonically with them; (2) it is especially good for large matrices, with dimensions large relative to the number of processors in the system; (3) it can be used in both distributed parallel computing environments and tightly coupled parallel computing systems; and (4) this set of algorithms can be mapped onto any parallel architecture without any major programming difficulties or algorithmic changes.

  20. Improving the performance of molecular dynamics simulations on parallel clusters.

    PubMed

    Borstnik, Urban; Hodoscek, Milan; Janezic, Dusanka

    2004-01-01

    In this article a procedure is derived to obtain a performance gain for molecular dynamics (MD) simulations on existing parallel clusters. Parallel clusters connect multiple processors through a wide array of interconnection technologies, often operating at different speeds, such as shared memory within multiple-processor computers and networking between them. It is demonstrated how to configure existing programs for MD simulations to efficiently handle collective communication on parallel clusters with processor interconnections of different speeds. PMID:15032512
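
    One standard way to make collective communication respect interconnects of different speeds, consistent with the configuration problem described above (a generic MPI sketch, not the authors' procedure; it uses the MPI-3 MPI_Comm_split_type call, which postdates the article), is to stage a reduction: fast within each shared-memory node first, then across the slower network between node leaders.

    ```c
    /* Two-stage force reduction: intra-node first, inter-node second. */
    #include <mpi.h>

    void hierarchical_sum(double *forces, double *total, int n)
    {
        MPI_Comm node, leaders;
        int wrank, nrank;

        MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
        /* group ranks that share a node (and its fast interconnect) */
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED,
                            wrank, MPI_INFO_NULL, &node);
        MPI_Comm_rank(node, &nrank);

        /* stage 1: fast reduction inside each node */
        MPI_Reduce(forces, total, n, MPI_DOUBLE, MPI_SUM, 0, node);

        /* stage 2: node leaders combine over the slower network */
        MPI_Comm_split(MPI_COMM_WORLD, nrank == 0 ? 0 : MPI_UNDEFINED,
                       wrank, &leaders);
        if (nrank == 0) {
            MPI_Allreduce(MPI_IN_PLACE, total, n, MPI_DOUBLE, MPI_SUM,
                          leaders);
            MPI_Comm_free(&leaders);
        }
        /* share the global sum back within each node */
        MPI_Bcast(total, n, MPI_DOUBLE, 0, node);
        MPI_Comm_free(&node);
    }
    ```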

  1. Massively parallel algorithms for trace-driven cache simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.

    1991-01-01

    Trace-driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t-th instant, reference x_t is hashed into a set of cache locations, the contents of which are then compared with x_t. If at the t-th instant x_t is not present in the cache, then it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x_t present for the (t+1)-st instant. The problem of parallel simulation of a subtrace of N references directed to a C-line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which regardless of the set size C runs in time O(log N) using N processors on the exclusive-read, exclusive-write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference-based line replacement policies is considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that on any trace of length N directed to a C-line set runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
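
    The serial baseline these parallel algorithms are measured against is LRU stack simulation: the stack distance of a reference is its depth in the LRU stack, and the reference misses in every cache with fewer lines than that depth. A minimal serial sketch (illustrative only; names are invented):

    ```c
    /* One LRU stack update: returns the stack distance of `line`
     * (0 = most recently used) or -1 if this is its first reference. */
    #include <string.h>

    int lru_reference(long line, long *stack, int *depth, int max_depth)
    {
        int d;
        for (d = 0; d < *depth; d++)
            if (stack[d] == line)
                break;
        int found = (d < *depth) ? d : -1;
        int top = (found >= 0) ? found
                : (*depth < max_depth ? (*depth)++ : max_depth - 1);
        memmove(&stack[1], &stack[0], top * sizeof stack[0]); /* push down */
        stack[0] = line;                                      /* new MRU */
        return found;
    }
    ```

    A reference with stack distance d hits in any C-line LRU cache with C > d, so a single pass over the trace yields miss counts for all cache sizes at once.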

  2. Parallel methods for dynamic simulation of multiple manipulator systems

    NASA Technical Reports Server (NTRS)

    Mcmillan, Scott; Sadayappan, P.; Orin, David E.

    1993-01-01

    In this paper, efficient dynamic simulation algorithms for a system of m manipulators, cooperating to manipulate a large load, are developed; their performance, using two possible forms of parallelism on a general-purpose parallel computer, is investigated. One form, temporal parallelism, is obtained with the use of parallel numerical integration methods. A speedup of 3.78 on four processors of CRAY Y-MP8 was achieved with a parallel four-point block predictor-corrector method for the simulation of a four manipulator system. These multi-point methods suffer from reduced accuracy, and when comparing these runs with a serial integration method, the speedup can be as low as 1.83 for simulations with the same accuracy. To regain the performance lost due to accuracy problems, a second form of parallelism is employed. Spatial parallelism allows most of the dynamics of each manipulator chain to be computed simultaneously. Used exclusively in the four processor case, this form of parallelism in conjunction with a serial integration method results in a speedup of 3.1 on four processors over the best serial method. In cases where there are either more processors available or fewer chains in the system, the multi-point parallel integration methods are still advantageous despite the reduced accuracy because both forms of parallelism can then combine to generate more parallel tasks and achieve greater effective speedups. This paper also includes results for these cases.

  3. High Performance Parallel Methods for Space Weather Simulations

    NASA Technical Reports Server (NTRS)

    Hunter, Paul (Technical Monitor); Gombosi, Tamas I.

    2003-01-01

    This is the final report of our NASA AISRP grant entitled 'High Performance Parallel Methods for Space Weather Simulations'. The main thrust of the proposal was to achieve significant progress towards new high-performance methods which would greatly accelerate global MHD simulations and eventually make it possible to develop first-principles based space weather simulations which run much faster than real time. We are pleased to report that with the help of this award we made major progress in this direction and developed the first parallel implicit global MHD code with adaptive mesh refinement. The main limitation of all earlier global space physics MHD codes was the explicit time stepping algorithm. Explicit time steps are limited by the Courant-Friedrichs-Lewy (CFL) condition, which essentially ensures that no information travels more than a cell size during a time step. This condition represents a non-linear penalty for highly resolved calculations, since finer grid resolution (and consequently smaller computational cells) not only results in more computational cells, but also in smaller time steps.

  4. Parallelization of Rocket Engine Simulator Software (PRESS)

    NASA Technical Reports Server (NTRS)

    Cezzar, Ruknet

    1998-01-01

    We have outlined our work in the last half of the funding period. We have shown how a demo package for RESSAP using MPI can be done. However, we also mentioned the difficulties with the UNIX platform. We have reiterated some of the suggestions made during the presentation of our progress at the Fourth Annual HBCU Conference. Although we have discussed, in some detail, how the TURBDES/PUMPDES software can be run in parallel using MPI, at present we are unable to experiment any further with either MPI or PVM. Because X windows is not implemented, we are also not able to experiment further with XPVM, which, it will be recalled, has a nice GUI interface. There are also some concerns, on our part, about MPI being an appropriate tool. The best thing about MPI is that it is in the public domain. Although plenty of documentation exists for the intricacies of using MPI, little information is available on its actual implementations. Other than very typical, somewhat contrived examples, such as the Jacobi algorithm for solving Laplace's equation, there are few examples which can readily be applied to real situations, such as ours. In effect, the review of the literature on both MPI and PVM, and there is a lot of it, indicates something similar to the enormous effort which was spent on LISP and LISP-like languages as tools for artificial intelligence research. During the development of a book on programming languages [12], when we searched the literature for very simple examples like taking averages, reading and writing records, multiplying matrices, etc., we could hardly find any! Yet, so much was said and done on that topic in academic circles. It appears that we faced the same problem with MPI, where despite significant documentation, we could not find even a simple example which supports coarse-grain parallelism involving only a few processes. From the foregoing, it appears that a new direction may be required for more productive research during the extension period (10/19/98 - 10
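
    For what it is worth, the kind of simple coarse-grain example the report could not find is short in a modern MPI binding; a hypothetical sketch using mpi4py (which postdates this report), with a few processes each running one independent task:

        # run with: mpiexec -n 4 python coarse_grain.py
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()

        # each process runs one coarse-grained task, e.g. one component model
        result = sum(i * i for i in range(rank * 1000, (rank + 1) * 1000))

        results = comm.gather(result, root=0)  # collect partial results
        if rank == 0:
            print("combined:", sum(results))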

  5. Series-parallel method of direct solar array regulation

    NASA Technical Reports Server (NTRS)

    Gooder, S. T.

    1976-01-01

    A 40 watt experimental solar array was directly regulated by shorting out appropriate combinations of series and parallel segments of a solar array. Regulation switches were employed to control the array at various set-point voltages between 25 and 40 volts. Regulation to within ±0.5 volt was obtained over a range of solar array temperatures and illumination levels as an active load was varied from open circuit to maximum available power. A fourfold reduction in regulation switch power dissipation was achieved with series-parallel regulation as compared to the usual series-only switching for direct solar array regulation.

  6. Parallel Vehicular Traffic Simulation using Reverse Computation-based Optimistic Execution

    SciTech Connect

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2008-01-01

    Vehicular traffic simulations are useful in applications such as emergency management and homeland security planning tools. High simulation speed translates directly to speed of response and level of resilience in those applications. Here, a parallel traffic simulation approach is presented that is aimed at reducing the time for simulating emergency vehicular traffic scenarios. Three unique aspects of this effort are: (1) exploration of optimistic simulation applied to vehicular traffic simulation; (2) addressing reverse computation challenges specific to optimistic vehicular traffic simulation; and (3) achieving absolute (as opposed to self-relative) speedup with a sequential speed equal to that of a fast, de facto standard sequential simulator for emergency traffic. The design and development of the parallel simulation system are presented, along with a performance study that demonstrates excellent sequential performance as well as parallel performance.
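
    A hedged illustration of the reverse-computation idea in aspect (2): each event applies a small state change and defines an exact inverse, so a rollback replays inverses instead of restoring saved state. The vehicle-moving event is a hypothetical stand-in for the simulator's real event types:

        class MoveEvent:
            """Advance a vehicle; reversible without state saving."""
            def __init__(self, vehicle, step):
                self.vehicle, self.step = vehicle, step

            def execute(self, state):
                state[self.vehicle] += self.step  # forward computation

            def reverse(self, state):
                state[self.vehicle] -= self.step  # exact inverse

        state = {"car1": 0}
        processed = []
        for ev in (MoveEvent("car1", 1), MoveEvent("car1", 2)):
            ev.execute(state)
            processed.append(ev)

        # a straggler event arrives in the past: roll back optimistically
        for ev in reversed(processed):
            ev.reverse(state)
        print(state)  # {'car1': 0} -- restored by reverse computation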

  7. Electromagnetic direct implicit PIC simulation

    SciTech Connect

    Langdon, A.B.

    1983-03-29

    Interesting modelling of intense electron flow has been done with implicit particle-in-cell simulation codes. In this report, the direct implicit PIC simulation approach is applied to simulations that include full electromagnetic fields. The resulting algorithm offers advantages relative to moment implicit electromagnetic algorithms and may help in our quest for robust and simpler implicit codes.

  8. Theory and simulation of collisionless parallel shocks

    NASA Technical Reports Server (NTRS)

    Quest, K. B.

    1988-01-01

    This paper presents a self-consistent theoretical model for collisionless parallel shock structure, based on the hypothesis that shock dissipation and heating can be provided by electromagnetic ion beam-driven instabilities. It is shown that shock formation and plasma heating can result from parallel propagating electromagnetic ion beam-driven instabilities for a wide range of Mach numbers and upstream plasma conditions. The theoretical predictions are compared with recently published observations of quasi-parallel interplanetary shocks. It was found that low Mach number interplanetary shock observations were consistent with the explanation that group-standing waves are providing the dissipation; two high Mach number observations confirmed the theoretically predicted rapid thermalization across the shock.

  9. Parallel-Processing Test Bed For Simulation Software

    NASA Technical Reports Server (NTRS)

    Blech, Richard; Cole, Gary; Townsend, Scott

    1996-01-01

    Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-the-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).

  10. Parallel discrete-event simulation of FCFS stochastic queueing networks

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1988-01-01

    Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments), which has proven itself to be effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction of a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations; performance data are given that demonstrate the method's effectiveness under moderate to heavy loads, and performance tradeoffs between the quality of lookahead and the cost of computing it are discussed.
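
    A hedged sketch of an appointment for a FCFS server: because service times have a known lower bound, a processor can promise that no departure will be sent before its clock plus that bound, and its neighbor can safely simulate up to that time. The bound and clock value are illustrative assumptions:

        MIN_SERVICE = 0.5  # known lower bound on service time at this server

        def appointment(local_clock):
            """Earliest time this FCFS server could emit its next departure:
            even a job arriving right now needs at least MIN_SERVICE."""
            return local_clock + MIN_SERVICE

        # the neighbor may process all of its own events up to this bound
        print(appointment(local_clock=10.0))  # 10.5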

  11. Xyce parallel electronic simulator : users' guide. Version 5.1.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-11-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical

  12. Xyce Parallel Electronic Simulator : users' guide, version 4.1.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-02-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical

  13. A conservative approach to parallelizing the Sharks World simulation

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Riffe, Scott E.

    1990-01-01

    Parallelizing a benchmark problem for parallel simulation, the Sharks World, is described. The described solution is conservative, in the sense that no state information is saved, and no 'rollbacks' occur. The approach used illustrates both the principal advantage and principal disadvantage of conservative parallel simulation. The advantage is that by exploiting lookahead an approach was found that dramatically improves the serial execution time, and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.

  14. Iterative Schemes for Time Parallelization with Application to Reservoir Simulation

    SciTech Connect

    Garrido, I; Fladmark, G E; Espedal, M S; Lee, B

    2005-04-18

    Parallel methods are usually not applied to the time domain because of the inherent sequentialness of time evolution. But for many evolutionary problems, computer simulation can benefit substantially from time parallelization methods. In this paper, they present several such algorithms that actually exploit the sequential nature of time evolution through a predictor-corrector procedure. This sequentialness ensures convergence of a parallel predictor-corrector scheme within a fixed number of iterations. The performance of these novel algorithms, which are derived from the classical alternating Schwarz method, is illustrated through several numerical examples using the reservoir simulator Athena.
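
    One well-known member of this family is parareal, which pairs a cheap coarse propagator with an accurate fine one in a predictor-corrector iteration; the fine solves within an iteration are mutually independent and can run in parallel across time slices. A minimal serial sketch on a toy ODE (the Athena-based algorithms in the record are more elaborate; everything below is an illustrative assumption):

        import numpy as np

        def f(u):  # toy ODE: du/dt = -u
            return -u

        def coarse(u, dt):  # one forward-Euler step
            return u + dt * f(u)

        def fine(u, dt, m=20):  # m small Euler steps (the expensive solver)
            for _ in range(m):
                u = u + (dt / m) * f(u)
            return u

        N, dt, K = 8, 0.125, 4  # time slices, slice width, iterations
        U = np.empty(N + 1); U[0] = 1.0
        for n in range(N):  # serial coarse prediction
            U[n + 1] = coarse(U[n], dt)

        for _ in range(K):  # predictor-corrector iterations
            F = [fine(U[n], dt) for n in range(N)]  # independent: parallel
            G_old = [coarse(U[n], dt) for n in range(N)]
            for n in range(N):  # cheap serial correction sweep
                U[n + 1] = coarse(U[n], dt) + F[n] - G_old[n]

        print(U[-1], np.exp(-1.0))  # converges toward the fine solution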

  15. n-body simulations using message passing parallel computers.

    NASA Astrophysics Data System (ADS)

    Grama, A. Y.; Kumar, V.; Sameh, A.

    The authors present new parallel formulations of the Barnes-Hut method for n-body simulations on message passing computers. These parallel formulations partition the domain efficiently incurring minimal communication overhead. This is in contrast to existing schemes that are based on sorting a large number of keys or on the use of global data structures. The new formulations are augmented by alternate communication strategies which serve to minimize communication overhead. The impact of these communication strategies is experimentally studied. The authors report on experimental results obtained from an astrophysical simulation on an nCUBE2 parallel computer.

  16. Running Parallel Discrete Event Simulators on Sierra

    SciTech Connect

    Barnes, P. D.; Jefferson, D. R.

    2015-12-03

    In this proposal we consider porting the ROSS/Charm++ simulator and the discrete event models that run under its control so that they run on the Sierra architecture and make efficient use of the Volta GPUs.

  17. Parallel discrete event simulation: A shared memory approach

    NASA Technical Reports Server (NTRS)

    Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

    1987-01-01

    With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.

  18. Traffic simulations on parallel computers using domain decomposition techniques

    SciTech Connect

    Hanebutte, U.R.; Tentner, A.M.

    1995-12-31

    Large scale simulations of Intelligent Transportation Systems (ITS) can only be achieved by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic simulations with the standard simulation package TRAF-NETSIM on a 128-node IBM SPx parallel supercomputer as well as on a cluster of SUN workstations. Whilst this particular parallel implementation is based on NETSIM, a microscopic traffic simulation model, the presented strategy is applicable to a broad class of traffic simulations. An outer iteration loop must be introduced in order to converge to a global solution. A performance study that utilizes a scalable test network consisting of square grids is presented, which addresses the performance penalty introduced by the additional iteration loop.

  19. Parallel Signal Processing and System Simulation using aCe

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2003-01-01

    Recently, networked and cluster computation have become very popular for both signal processing and system simulation. The aCe language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, this new C-based parallel language (aCe C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of aCe C and present a signal processing application (FFT).

  20. A CUDA based parallel multi-phase oil reservoir simulator

    NASA Astrophysics Data System (ADS)

    Zaza, Ayham; Awotunde, Abeeb A.; Fairag, Faisal A.; Al-Mouhamed, Mayez A.

    2016-09-01

    Forward Reservoir Simulation (FRS) is a challenging process that models fluid flow and mass transfer in porous media to draw conclusions about the behavior of certain flow variables and well responses. Besides the operational cost associated with matrix assembly, FRS repeatedly solves huge and computationally expensive sparse, ill-conditioned and unsymmetric linear systems. Moreover, as the computation for practical reservoir dimensions lasts for long times, speeding up the process by taking advantage of parallel platforms is indispensable. By considering state-of-the-art advances in massively parallel computing and the accompanying parallel architecture, this work aims primarily at developing a CUDA-based parallel simulator for oil reservoirs. In addition to the initially reported 33-fold speed gain compared to the serial version, running experiments showed that BiCGSTAB is a stable and fast solver which could be incorporated in such simulations instead of the more expensive, storage-demanding and usually utilized GMRES.
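
    For readers unfamiliar with the solver trade-off mentioned, a hedged sketch of BiCGSTAB on a small unsymmetric sparse system using SciPy; the matrix is an illustrative assumption, not the reservoir Jacobian:

        import numpy as np
        from scipy.sparse import diags
        from scipy.sparse.linalg import bicgstab

        n = 1000
        # unsymmetric tridiagonal system, loosely mimicking an upwind stencil
        A = diags([-1.0, 2.5, -0.8], offsets=[-1, 0, 1], shape=(n, n),
                  format="csr")
        b = np.ones(n)

        x, info = bicgstab(A, b)  # info == 0 signals convergence
        print(info, np.linalg.norm(A @ x - b))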

  1. Parallel finite element simulation of large ram-air parachutes

    NASA Astrophysics Data System (ADS)

    Kalro, V.; Aliabadi, S.; Garrard, W.; Tezduyar, T.; Mittal, S.; Stein, K.

    1997-06-01

    In the near future, large ram-air parachutes are expected to provide the capability of delivering 21 ton payloads from altitudes as high as 25,000 ft. In the development, test, and evaluation of these parachutes, the size of the parachute needed and the deployment stages involved make high-performance computing (HPC) simulations a desirable alternative to costly airdrop tests. Although computational simulations based on realistic, 3D, time-dependent models will continue to be a major computational challenge, advanced finite element simulation techniques recently developed for this purpose and the execution of these techniques on HPC platforms are significant steps toward meeting this challenge. In this paper, two approaches for analysis of the inflation and gliding of ram-air parachutes are presented. In one of the approaches the point mass flight mechanics equations are solved with the time-varying drag and lift areas obtained from empirical data. This approach is limited to parachutes with configurations similar to those for which data are available. The other approach is 3D finite element computations based on the Navier-Stokes equations governing the airflow around the parachute canopy and Newton's law of motion governing the 3D dynamics of the canopy, with the forces acting on the canopy calculated from the simulated flow field. At the earlier stages of canopy inflation the parachute is modelled as an expanding box, whereas at the later stages, as it expands, the box transforms to a parafoil and glides. These finite element computations are carried out on the massively parallel supercomputers CRAY T3D and Thinking Machines CM-5, typically with millions of coupled, non-linear finite element equations solved simultaneously at every time step or pseudo-time step of the simulation.

  2. Direct Simulation Monte Carlo (DSMC) on the Connection Machine

    SciTech Connect

    Wong, B.C.; Long, L.N.

    1992-01-01

    An improved version of the direct simulation Monte Carlo (DSMC) method for solving flows with the Boltzmann equation is mapped onto the massively parallel Connection Machine. Kinetic theory is required for analyzing hypersonic aerospace applications, and the features and capabilities of the DSMC particle-simulation technique are discussed. The DSMC method is shown to be inherently massively parallel and data parallel, and the algorithm is based on molecule movements, cross-referencing their locations, locating collisions within cells, and sampling macroscopic quantities in each cell. The serial DSMC code is compared to the present parallel DSMC code, and timing results show that the speedup of the parallel version is approximately linear. The correct physics can be resolved from the results of the complete DSMC method implemented on the Connection Machine using the data-parallel approach. 41 refs.
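
    A heavily simplified sketch of the data-parallel structure described above (move particles, bin them into cells, sample per-cell quantities); real DSMC also selects collision pairs within each cell, which is omitted here, and every parameter below is an illustrative assumption:

        import numpy as np

        rng = np.random.default_rng(0)
        L, ncells, npart, dt = 1.0, 50, 10_000, 1e-3

        x = rng.uniform(0.0, L, npart)     # particle positions
        v = rng.normal(0.0, 300.0, npart)  # thermal velocities (m/s)

        for _ in range(100):
            x = (x + v * dt) % L           # move (periodic boundaries)
            cell = np.minimum((x / L * ncells).astype(int), ncells - 1)
            # per-cell sampling -- all data-parallel operations
            density = np.bincount(cell, minlength=ncells)
            mom = np.bincount(cell, weights=v, minlength=ncells)
            mean_v = mom / np.maximum(density, 1)

        print(density[:5], mean_v[:5])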

  3. Improved task scheduling for parallel simulations. Master's thesis

    SciTech Connect

    McNear, A.E.

    1991-12-01

    The objective of this investigation is to design, analyze, and validate the generation of optimal schedules for simulation systems. Improved performance in simulation execution times can greatly improve the return rate of information provided by such simulations, resulting in reduced development costs of future computer/electronic systems. Optimal schedule generation of precedence-constrained task systems, including iterative feedback systems such as VHDL or war gaming simulations, for execution on a parallel computer is known to be NP-hard. Efficiently parallelizing such problems takes full advantage of present computer technology to achieve a significant reduction in the search times required. Unfortunately, the extreme combinatoric 'explosion' of possible task assignments to processors creates an exponential search space prohibitive on any computer for search algorithms which maintain more than one branch of the search graph at any one time. This work develops various parallel modified backtracking (MBT) search algorithms for execution on an iPSC/2 hypercube that bound the space requirements and produce an optimal schedule with linear speed-up. The parallel MBT search algorithm is validated using various feedback task simulation systems which are scheduled for execution on an iPSC/2 hypercube. The search time, size of the enumerated search space, and communications overhead required to ensure efficient utilization during the parallel search process are analyzed. The various applications indicated appreciable improvement in performance using this method.

  4. Xyce Parallel Electronic Simulator - User's Guide, Version 1.0

    SciTech Connect

    HUTCHINSON, SCOTT A; KEITER, ERIC R.; HOEKSTRA, ROBERT J.; WATERS, LON J.; RUSSO, THOMAS V.; RANKIN, ERIC LAMONT; WIX, STEVEN D.

    2002-11-01

    This manual describes the use of the Xyce Parallel Electronic Simulator code for simulating electrical circuits at a variety of abstraction levels. The Xyce Parallel Electronic Simulator has been written to support,in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on improving the capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (4) Object-oriented code design and implementation using modern coding-practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows. Another feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce Parallel Electronic Simulator is designed to support a variety of device model inputs. These input formats include standard analytical models, behavioral models

  5. Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

    DOEpatents

    Blocksome, Michael A.; Mamidala, Amith R.

    2013-09-03

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.

  6. Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

    DOEpatents

    Blocksome, Michael A; Mamidala, Amith R

    2014-02-11

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.

  7. Panel on future directions in parallel computer architecture

    SciTech Connect

    VanTilborg, A.M.

    1989-06-01

    One of the program highlights of the 15th Annual International Symposium on Computer Architecture, held May 30 - June 2, 1988 in Honolulu, was a panel session on future directions in parallel computer architecture. The panel was organized and chaired by the author, and was comprised of Prof. Jack Dennis (NASA Ames Research Institute for Advanced Computer Science), Prof. H.T. Kung (Carnegie Mellon), and Dr. Burton Smith (Tera Computer Company). The objective of the panel was to identify the likely trajectory of future parallel computer system progress, particularly from the standpoint of marketplace acceptance. Approximately 250 attendees participated in the session, in which each panelist began with a ten minute viewgraph explanation of his views, followed by an open and sometimes lively exchange with the audience and fellow panelists. The session ran for ninety minutes.

  8. Parallel Optimization with Large Eddy Simulations

    NASA Astrophysics Data System (ADS)

    Talnikar, Chaitanya; Blonigan, Patrick; Bodart, Julien; Wang, Qiqi; Alex Gorodetsky Collaboration; Jasper Snoek Collaboration

    2014-11-01

    For design optimization results to be useful, the model used must be trustworthy. For turbulent flows, Large Eddy Simulations (LES) can capture separation and other phenomena that traditional models such as RANS struggle with. However, optimization with LES can be challenging because of noisy objective function evaluations. This noise is a consequence of the sampling error of turbulent statistics, or long-time-averaged quantities of interest, such as the drag of an airfoil or the heat transfer to a turbine blade. The sampling error causes the objective function to vary noisily with respect to design parameters for finite time simulations. Furthermore, the noise decays very slowly as computational time increases. Therefore, robustness to noisy objective functions is a crucial prerequisite for any optimization algorithm that is a candidate for use with LES. One way of dealing with noisy objective functions is to filter the noise using a surrogate model. Bayesian optimization, which uses Gaussian processes as surrogates, has shown promise in optimizing expensive objective functions. This talk presents a new approach for optimization with LES incorporating these ideas. Applications to flow control of a turbulent channel and the design of a turbine blade trailing edge are also discussed.
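
    A hedged sketch of the surrogate idea: fit a Gaussian process whose kernel includes an explicit noise term to a handful of noisy evaluations, then pick the next design point from the posterior. The objective, kernel choice, and acquisition rule below are illustrative assumptions, not the talk's method:

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        rng = np.random.default_rng(1)

        def noisy_objective(x):  # stand-in for a finite-time LES average
            return np.sin(3 * x) + 0.1 * rng.normal()

        X = rng.uniform(0, 2, 8).reshape(-1, 1)  # initial designs
        y = np.array([noisy_objective(x) for x in X.ravel()])

        # WhiteKernel lets the surrogate model the sampling noise explicitly
        gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                      normalize_y=True)
        gp.fit(X, y)

        grid = np.linspace(0, 2, 200).reshape(-1, 1)
        mu, sigma = gp.predict(grid, return_std=True)
        print(grid[np.argmax(mu + 1.96 * sigma)])  # optimistic (UCB) choice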

  9. Applying Parallel Processing Techniques to Tether Dynamics Simulation

    NASA Technical Reports Server (NTRS)

    Wells, B. Earl

    1996-01-01

    The focus of this research has been to determine the effectiveness of applying parallel processing techniques to a sizable real-world problem, the simulation of the dynamics associated with a tether which connects two objects in low earth orbit, and to explore the degree to which the parallelization process can be automated through the creation of new software tools. The goal has been to utilize this specific application problem as a base to develop more generally applicable techniques.

  10. Parallel Simulation of Underdense Plasma Photocathode Experiments

    NASA Astrophysics Data System (ADS)

    Bruhwiler, David; Hidding, Bernhard; Xi, Yunfeng; Andonian, Gerard; Rosenzweig, James; Cormier-Michel, Estelle

    2013-10-01

    The underdense plasma photocathode concept (aka Trojan horse) is a promising approach to achieving fs-scale electron bunches with pC-scale charge and transverse normalized emittance below 0.01 mm-mrad, yielding peak currents of order 100 A and beam brightness as high as 10^19 A/m^2/rad^2, for a wide range of achievable beam energies up to 10 GeV. A proof-of-principle experiment will be conducted at the FACET user facility in early 2014. We present 2D and 3D simulations with physical parameters relevant to the planned experiment. Work supported by DOE under Contract Nos. DE-SC0009533, DE-FG02-07ER46272 and DE-FG03-92ER40693, and by ONR under Contract No. N00014-06-1-0925. NERSC computing resources are supported by DOE.

  11. Efficient parallel simulation of CO2 geologic sequestration in saline aquifers

    SciTech Connect

    Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten

    2007-01-01

    An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling single and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, as well as salt precipitation or dissolution. The new parallel simulator uses MPI for parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations by multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that a linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2.

  12. Parallelizing N-Body Simulations on a Heterogeneous Cluster

    NASA Astrophysics Data System (ADS)

    Stenborg, T. N.

    2009-10-01

    This thesis evaluates quantitatively the effectiveness of a new technique for parallelising direct gravitational N-body simulations on a heterogeneous computing cluster. In addition to being an investigation into how a specific computational physics task can be optimally load balanced across the heterogeneity factors of a distributed computing cluster, it is also, more generally, a case study in effective heterogeneous parallelisation of an all-pairs programming task. If high-performance computing clusters are not designed to be heterogeneous initially, they tend to become so over time as new nodes are added, or existing nodes are replaced or upgraded. As a result, effective techniques for application parallelisation on heterogeneous clusters are needed if maximum cluster utilisation is to be achieved; this remains an active area of research. A custom C/MPI parallel particle-particle N-body simulator was developed, validated and deployed for this evaluation. Simulation communication proceeds over cluster nodes arranged in a logical ring and employs nonblocking message passing to encourage overlap of communication with computation. Redundant calculations arising from force symmetry given by Newton's third law are removed by combining chordal data transfer of accumulated forces with ring-passing data transfer. Heterogeneity in node computation speed is addressed by decomposing system data across nodes in proportion to node computation speed, in conjunction with the use of evenly sized communication buffers. This scheme is shown experimentally to have some potential in improving simulation performance in comparison with an even decomposition of data across nodes. Techniques for further heterogeneous cluster load balancing are discussed and remain an opportunity for further work.
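
    The speed-proportional decomposition described above reduces to a small allocation computation; a minimal sketch (the benchmark speeds are illustrative assumptions):

        def proportional_split(n_particles, speeds):
            """Assign particle counts proportional to measured node speeds."""
            total = sum(speeds)
            counts = [int(n_particles * s / total) for s in speeds]
            counts[-1] += n_particles - sum(counts)  # absorb rounding error
            return counts

        # four nodes whose benchmarks show relative speeds 1, 1, 1.5, 2.5
        print(proportional_split(100000, [1.0, 1.0, 1.5, 2.5]))
        # -> [16666, 16666, 25000, 41668]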

  13. Massively parallel switch-level simulation: A feasibility study

    SciTech Connect

    Kravitz, S.A.

    1989-01-01

    This thesis addresses the feasibility of mapping the COSMOS switch-level simulator onto computers with thousands of simple processors. COSMOS preprocesses transistor networks into equivalent Boolean behavioral models, capturing the switch-level behavior of a circuit in a set of Boolean formulas. The author shows that thousand-fold parallelism exists in the formulas derived by COSMOS for some actual circuits. He exposes this parallelism by eliminating the event list from the simulator, and he demonstrates that this represents an attractive tradeoff given sufficient parallelism in the circuit model. To investigate the feasibility of this approach, he has developed a prototype implementation of the COSMOS simulator on a 32k processor Connection Machine.

  14. Direct drive digital servo press with high parallel control

    NASA Astrophysics Data System (ADS)

    Murata, Chikara; Yabe, Jun; Endou, Junichi; Hasegawa, Kiyoshi

    2013-12-01

    The direct drive digital servo press has been developed through university-industry joint research and development since 1998. On the basis of this result, a 4-axis direct drive digital servo press was developed and brought to market in April 2002. This servo press is composed of one slide supported by 4 ball screws, and each axis has a linear scale measuring its position with high accuracy, on the order of a micrometer or better. Each axis is controlled independently by a servo motor and feedback system. This system can maintain a high degree of parallelism and high accuracy even under a highly eccentric load. Furthermore, 'full stroke full power' is obtained by using ball screws. Using these features, various new types of press forming and stamping have been developed and put into production. The new stamping and forming methods are introduced, along with a strategy for high-added-value press forming to meet manufacturing needs, and the future direction of press forming is also discussed.

  15. Xyce Parallel Electronic Simulator : users' guide, version 2.0.

    SciTech Connect

    Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.

    2004-06-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. (4) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (5) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce

  16. A hybrid parallel framework for the cellular Potts model simulations

    SciTech Connect

    Jiang, Yi; He, Kejing; Dong, Shoubin

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which cannot be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation (~10^8 sites) of the complex collective behavior of numerous cells (~10^6).

  17. Parallel runway requirement analysis study. Volume 2: Simulation manual

    NASA Technical Reports Server (NTRS)

    Ebrahimi, Yaghoob S.; Chun, Ken S.

    1993-01-01

    This document is a user manual for operating the PLAND_BLUNDER (PLB) simulation program. This simulation is based on two aircraft approaching parallel runways independently and using parallel Instrument Landing System (ILS) equipment during Instrument Meteorological Conditions (IMC). If an aircraft should deviate from its assigned localizer course toward the opposite runway, this constitutes a blunder which could endanger the aircraft on the adjacent path. The worst case scenario would be if the blundering aircraft were unable to recover and continued toward the adjacent runway. PLAND_BLUNDER is a Monte Carlo-type simulation which models the events and aircraft positioning during such a blunder situation. The model simulates two aircraft performing parallel ILS approaches using Instrument Flight Rules (IFR) or visual procedures. PLB uses a simple movement model and control law in three dimensions (X, Y, Z). The parameters of the simulation inputs and outputs are defined in this document along with a sample of the statistical analysis. This document is the second volume of a two volume set. Volume 1 is a description of the application of the PLB to the analysis of close parallel runway operations.

  18. Parallelization of a Monte Carlo particle transport simulation code

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures, including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors, and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for the study of higher particle energies with the use of more accurate physical models, and improve statistics as more particle tracks can be simulated in a low response time.
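
    The need for independent per-process random streams (met here by SPRNG and DCMT) can be illustrated with NumPy's seed-spawning facility; this is a modern stand-in chosen for brevity, not the mechanism the record used:

        from numpy.random import SeedSequence, default_rng

        ss = SeedSequence(20100501)  # one master seed for the whole run
        streams = [default_rng(s) for s in ss.spawn(8)]  # 8 independent RNGs

        # each worker draws from its own stream; no correlated sequences
        samples = [rng.random(3) for rng in streams]
        print(samples[0], samples[7])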

  19. Efficient parallel CFD-DEM simulations using OpenMP

    NASA Astrophysics Data System (ADS)

    Amritkar, Amit; Deb, Surya; Tafti, Danesh

    2014-01-01

    The paper describes parallelization strategies for the Discrete Element Method (DEM) used for simulating dense particulate systems coupled to Computational Fluid Dynamics (CFD). While the field equations of CFD are best parallelized by spatial domain decomposition techniques, the N-body particulate phase is best parallelized over the number of particles. When the two are coupled together, both modes are needed for efficient parallelization. It is shown that under these requirements, OpenMP thread-based parallelization has advantages over MPI processes. Two representative examples, fairly typical of dense fluid-particulate systems, are investigated, including the validation of the DEM-CFD and thermal-DEM implementation with experiments. Fluidized bed calculations are performed on beds with uniform particle loading, parallelized with MPI and OpenMP. It is shown that as the number of processing cores and the number of particles increase, the communication overhead of building ghost particle lists at processor boundaries dominates time to solution, and OpenMP, which does not require this step, is about twice as fast as MPI. In rotary kiln heat transfer calculations, which are characterized by spatially non-uniform particle distributions, the low overhead of switching the parallelization mode in OpenMP eliminates the load imbalances, but introduces increased overheads in fetching non-local data. In spite of this, it is shown that OpenMP is between 50% and 90% faster than MPI.

  20. Parallel PDE-Based Simulations Using the Common Component Architecture

    SciTech Connect

    McInnes, Lois C.; Allan, Benjamin A.; Armstrong, Robert; Benson, Steven J.; Bernholdt, David E.; Dahlgren, Tamara L.; Diachin, Lori; Krishnan, Manoj Kumar; Kohl, James A.; Larson, J. Walter; Lefantzi, Sophia; Nieplocha, Jarek; Norris, Boyana; Parker, Steven G.; Ray, Jaideep; Zhou, Shujia

    2006-03-05

    The complexity of parallel PDE-based simulations continues to increase as multimodel, multiphysics, and multi-institutional projects become widespread. A goal of component-based software engineering in such large-scale simulations is to help manage this complexity by enabling better interoperability among various codes that have been independently developed by different groups. The Common Component Architecture (CCA) Forum is defining a component architecture specification to address the challenges of high-performance scientific computing. In addition, several execution frameworks, supporting infrastructure, and general-purpose components are being developed. Furthermore, this group is collaborating with others in the high-performance computing community to design suites of domain-specific component interface specifications and underlying implementations. This chapter discusses recent work on leveraging these CCA efforts in parallel PDE-based simulations involving accelerator design, climate modeling, combustion, and accidental fires and explosions. We explain how component technology helps to address the different challenges posed by each of these applications, and we highlight how component interfaces built on existing parallel toolkits facilitate the reuse of software for parallel mesh manipulation, discretization, linear algebra, integration, optimization, and parallel data redistribution. We also present performance data to demonstrate the suitability of this approach, and we discuss strategies for applying component technologies to both new and existing applications.

  1. Xyce Parallel Electronic Simulator : reference guide, version 4.1.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-02-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.

  2. Xyce parallel electronic simulator reference guide, version 6.1

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-03-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].

  3. Xyce Parallel Electronic Simulator : reference guide, version 2.0.

    SciTech Connect

    Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.

    2004-06-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

  4. Xyce parallel electronic simulator reference guide, version 6.0.

    SciTech Connect

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David G.

    2013-08-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1].

  5. An adaptive synchronization protocol for parallel discrete event simulation

    SciTech Connect

    Bisset, K.R.

    1998-12-01

    Simulation, especially discrete event simulation (DES), is used in a variety of disciplines where numerical methods are difficult or impossible to apply. One problem with this method is that a sufficiently detailed simulation may take hours or days to execute, and multiple runs may be needed in order to generate the desired results. Parallel discrete event simulation (PDES) has been explored for many years as a method to decrease the time taken to execute a simulation. Many protocols have been developed which work well for particular types of simulations, but perform poorly when used for other types of simulations. Often it is difficult to know a priori whether a particular protocol is appropriate for a given problem. In this work, an adaptive synchronization method (ASM) is developed which works well on an entire spectrum of problems. The ASM determines, using an artificial neural network (ANN), the likelihood that a particular event is safe to process.

  6. Particle/Continuum Hybrid Simulation in a Parallel Computing Environment

    NASA Technical Reports Server (NTRS)

    Baganoff, Donald

    1996-01-01

    The objective of this study was to modify an existing parallel particle code based on the direct simulation Monte Carlo (DSMC) method to include a Navier-Stokes (NS) calculation so that a hybrid solution could be developed. In carrying out this work, it was determined that the following five issues had to be addressed before extensive program development of a three dimensional capability was pursued: (1) find a set of one-sided kinetic fluxes that are fully compatible with the DSMC method, (2) develop a finite volume scheme to make use of these one-sided kinetic fluxes, (3) make use of the one-sided kinetic fluxes together with DSMC type boundary conditions at a material surface so that velocity slip and temperature slip arise naturally for near-continuum conditions, (4) find a suitable sampling scheme so that the values of the one-sided fluxes predicted by the NS solution at an interface between the two domains can be converted into the correct distribution of particles to be introduced into the DSMC domain, (5) carry out a suitable number of tests to confirm that the developed concepts are valid, individually and in concert for a hybrid scheme.

  7. Molecular simulation of rheological properties using massively parallel supercomputers

    SciTech Connect

    Bhupathiraju, R.K.; Cui, S.T.; Gupta, S.A.; Cummings, P.T.; Cochran, H.D.

    1996-11-01

    Advances in parallel supercomputing now make possible molecular-based engineering and science calculations that will soon revolutionize many technologies, such as those involving polymers and those involving aqueous electrolytes. We have developed a suite of message-passing codes for classical molecular simulation of such complex fluids and amorphous materials and have completed a number of demonstration calculations of problems of scientific and technological importance with each. In this paper, we will focus on the molecular simulation of rheological properties, particularly viscosity, of simple and complex fluids using parallel implementations of non-equilibrium molecular dynamics. Such calculations represent significant challenges computationally because, in order to reduce the thermal noise in the calculated properties within acceptable limits, large systems and/or long simulated times are required.

  8. PRATHAM: Parallel Thermal Hydraulics Simulations using Advanced Mesoscopic Methods

    SciTech Connect

    Joshi, Abhijit S; Jain, Prashant K; Mudrich, Jaime A; Popov, Emilian L

    2012-01-01

    At the Oak Ridge National Laboratory, efforts are under way to develop a 3D, parallel LBM code called PRATHAM (PaRAllel Thermal Hydraulic simulations using Advanced Mesoscopic Methods) to demonstrate the accuracy and scalability of LBM for turbulent flow simulations in nuclear applications. The code has been developed using FORTRAN-90 and parallelized using the Message Passing Interface (MPI) library. The Silo library is used to compact and write the data files, and the VisIt visualization software is used to post-process the simulation data in parallel. Both the single relaxation time (SRT) and multi relaxation time (MRT) LBM schemes have been implemented in PRATHAM. To capture turbulence without prohibitively increasing the grid resolution requirements, an LES approach [5] is adopted, allowing large-scale eddies to be numerically resolved while modeling the smaller (subgrid) eddies. In this work, a Smagorinsky model has been used, which modifies the fluid viscosity by an additional eddy viscosity depending on the magnitude of the rate-of-strain tensor. In LBM, this is achieved by locally varying the relaxation time of the fluid.
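
    In lattice units the closure just described is a one-liner: the eddy viscosity nu_t = (Cs * Delta)^2 * |S| is added to the molecular viscosity and converted to a local relaxation time via tau = 3 * nu + 1/2. A hedged sketch (the constants, and drawing |S| at random rather than computing it from non-equilibrium moments, are illustrative assumptions):

        import numpy as np

        CS, DELTA = 0.17, 1.0  # Smagorinsky constant, filter width (lattice)
        NU0 = 1e-4             # molecular viscosity in lattice units

        def local_tau(strain_rate_mag):
            """Relaxation time with Smagorinsky eddy viscosity (lattice units)."""
            nu_t = (CS * DELTA) ** 2 * strain_rate_mag  # eddy viscosity
            return 3.0 * (NU0 + nu_t) + 0.5             # nu = (tau - 1/2)/3

        S = np.abs(np.random.default_rng(0).normal(0.0, 0.01, 5))  # |S| per node
        print(local_tau(S))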

  9. Parallelization of Program to Optimize Simulated Trajectories (POST3D)

    NASA Technical Reports Server (NTRS)

    Hammond, Dana P.; Korte, John J. (Technical Monitor)

    2001-01-01

    This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) approach on a distributed memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.
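
    A minimal mpi4py sketch of the SPMD gradient distribution is given below: each rank evaluates a subset of the finite-difference components and the pieces are summed across ranks. The objective function is a stand-in, not POST3D's trajectory evaluation, and the round-robin assignment is one plausible choice rather than the paper's.

```python
# Run with e.g.: mpiexec -n 4 python grad_spmd.py
import numpy as np
from mpi4py import MPI

def objective(x):
    # Stand-in for one expensive trajectory evaluation.
    return np.sum((x - 1.0) ** 2)

def parallel_gradient(x, h=1e-6):
    """One-sided finite-difference gradient, one component per evaluation.
    Components are dealt round-robin to MPI ranks (SPMD), then combined."""
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    f0 = objective(x)                      # every rank evaluates the base point
    grad_local = np.zeros_like(x)
    for i in range(rank, x.size, size):    # this rank's share of components
        xp = x.copy()
        xp[i] += h
        grad_local[i] = (objective(xp) - f0) / h
    grad = np.empty_like(x)
    comm.Allreduce(grad_local, grad, op=MPI.SUM)
    return grad

x = np.zeros(8)
g = parallel_gradient(x)
if MPI.COMM_WORLD.Get_rank() == 0:
    print(g)   # ~ -2 for each component
```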

  10. Numerical simulation of polymer flows: A parallel computing approach

    SciTech Connect

    Aggarwal, R.; Keunings, R.; Roux, F.X.

    1993-12-31

    We present a parallel algorithm for the numerical simulation of viscoelastic fluids on distributed memory computers. The algorithm has been implemented within a general-purpose commercial finite element package used in polymer processing applications. Results obtained on the Intel iPSC/860 computer demonstrate high parallel efficiency in complex flow problems. However, since the computational load is unknown a priori, load balancing is a challenging issue. We have developed an adaptive allocation strategy which dynamically reallocates the work load to the processors based upon the history of the computational procedure. We compare the results obtained with the adaptive and static scheduling schemes.
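
    One common way to realize such a history-based strategy is a greedy rebalance that assigns the heaviest items to the currently lightest processor. The sketch below works on assumed data (running averages of per-item cost); the paper's actual allocation scheme may differ.

```python
import numpy as np

def reallocate(costs, n_procs):
    """Greedy rebalancing: assign work items to processors so that the
    predicted per-processor load (sum of measured historical costs) is
    as even as possible.  `costs` are running averages of the time each
    item took on previous steps -- the 'history of the computational
    procedure' that an adaptive strategy can exploit."""
    order = np.argsort(costs)[::-1]          # heaviest items first
    load = np.zeros(n_procs)
    owner = np.empty(len(costs), dtype=int)
    for item in order:
        p = int(np.argmin(load))             # currently lightest processor
        owner[item] = p
        load[p] += costs[item]
    return owner, load

rng = np.random.default_rng(3)
costs = rng.lognormal(mean=0.0, sigma=1.0, size=20)  # skewed, unknown a priori
owner, load = reallocate(costs, n_procs=4)
print(load / load.mean())    # close to 1.0 => well balanced
```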

  11. Reusable Component Model Development Approach for Parallel and Distributed Simulation

    PubMed Central

    Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng

    2014-01-01

    Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have diversiform interfaces, couple tightly, and bind closely with simulation platforms. As a result, they are difficult to reuse across different simulation platforms and applications. To address the problem, this paper first proposes a reusable component model framework. Based on this framework, our reusable model development approach is then elaborated, which contains two phases: (1) domain experts create simulation computational modules following three principles to achieve their independence; (2) a model developer encapsulates these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that a model developed using our approach has good reusability and is easy to use in different simulation platforms and applications. PMID:24729751

  12. Xyce Parallel Electronic Simulator Users Guide Version 6.2.

    SciTech Connect

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-09-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks: The information herein is subject to change without notice. Copyright (c) 2002-2014 Sandia Corporation. All rights reserved. Xyce(TM) Electronic Simulator and Xyce(TM) are trademarks of Sandia Corporation. Portions of the Xyce(TM) code are: Copyright (c) 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59. All rights reserved. Orcad, Orcad Capture, PSpice and Probe are

  13. Multi-directional search: A direct search algorithm for parallel machines

    SciTech Connect

    Torczon, V.J.

    1989-01-01

    In recent years there has been a great deal of interest in the development of optimization algorithms which exploit the computational power of parallel computer architectures. The author has developed a new direct search algorithm, which he calls multi-directional search, that is ideally suited to parallel computation. His algorithm belongs to the class of direct search methods, a class of optimization algorithms which neither compute nor approximate any derivatives of the objective function. His work, in fact, was inspired by the simplex method of Spendley, Hext, and Himsworth, and the simplex method of Nelder and Mead. The multi-directional search algorithm is inherently parallel. The basic idea of the algorithm is to perform concurrent searches in multiple directions. These searches are free of any interdependencies, so the information required can be computed in parallel. A central result of his work is the convergence analysis for his algorithm. By requiring only that the function be continuously differentiable over a bounded level set, he can prove that a subsequence of the points generated by the multi-directional search algorithm converges to a stationary point of the objective function. This is of great interest since he knows of few convergence results for practical direct search algorithms. He also presents numerical results indicating that the multi-directional search algorithm is robust, even in the presence of noise. His results include comparisons with the Nelder-Mead simplex algorithm, the method of steepest descent, and a quasi-Newton method. One surprising conclusion of his numerical tests is that the Nelder-Mead simplex algorithm is not robust. He closes with some comments about future directions of research.
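
    The core iteration can be sketched compactly. The code below follows the description above (reflect every vertex through the best one, then expand or contract), with the standard expansion and contraction factors 2 and 1/2 assumed; the per-vertex evaluations marked below are the independent work that could be farmed out to processors.

```python
import numpy as np

def multi_directional_search(f, simplex, max_iter=2000, tol=1e-12):
    # v holds n+1 simplex vertices, one per row; fv their function values.
    v = np.array(simplex, dtype=float)
    fv = np.array([f(x) for x in v])
    for _ in range(max_iter):
        order = np.argsort(fv)            # bring the best vertex to row 0
        v, fv = v[order], fv[order]
        if np.max(np.linalg.norm(v[1:] - v[0], axis=1)) < tol:
            break
        refl = 2.0 * v[0] - v[1:]         # reflect all vertices through best
        f_refl = np.array([f(x) for x in refl])      # n independent evals
        if f_refl.min() < fv[0]:
            expd = 3.0 * v[0] - 2.0 * v[1:]          # try expanding further
            f_expd = np.array([f(x) for x in expd])  # n independent evals
            if f_expd.min() < f_refl.min():
                v[1:], fv[1:] = expd, f_expd
            else:
                v[1:], fv[1:] = refl, f_refl
        else:
            v[1:] = 0.5 * (v[0] + v[1:])             # contract toward best
            fv[1:] = np.array([f(x) for x in v[1:]])
    return v[0], fv[0]

rosen = lambda x: (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2
x_best, f_best = multi_directional_search(
    rosen, [[-1.2, 1.0], [0.0, 1.0], [-1.2, 2.0]])
print(x_best, f_best)
```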

  14. Parallelizing a DNA simulation code for the Cray MTA-2.

    PubMed

    Bokhari, Shahid H; Glaser, Matthew A; Jordan, Harry F; Lansac, Yves; Sauer, Jon R; Van Zeghbroeck, Bart

    2002-01-01

    The Cray MTA-2 (Multithreaded Architecture) is an unusual parallel supercomputer that promises ease of use and high performance. We describe our experience on the MTA-2 with a molecular dynamics code, SIMU-MD, that we are using to simulate the translocation of DNA through a nanopore in a silicon based ultrafast sequencer. Our sequencer is constructed using standard VLSI technology and consists of a nanopore surrounded by Field Effect Transistors (FETs). We propose to use the FETs to sense variations in charge as a DNA molecule translocates through the pore and thus differentiate between the four building block nucleotides of DNA. We were able to port SIMU-MD, a serial C code, to the MTA with only a modest effort and with good performance. Our porting process needed neither a parallelism support platform nor attention to the intimate details of parallel programming and interprocessor communication, as would have been the case with more conventional supercomputers. PMID:15838145

  15. Model for the evolution of the time profile in optimistic parallel discrete event simulations

    NASA Astrophysics Data System (ADS)

    Ziganurova, L.; Novotny, M. A.; Shchur, L. N.

    2016-02-01

    We investigate synchronisation aspects of an optimistic algorithm for parallel discrete event simulations (PDES). We present a model for the time evolution in optimistic PDES. This model evaluates the local virtual time profile of the processing elements. We argue that the evolution of the time profile is reminiscent of the surface profile in the directed percolation problem and in unrestricted surface growth. We present results of the simulation of the model and emphasise predictive features of our approach.

  16. Casting Pearls Ballistically: Efficient Massively Parallel Simulation of Particle Deposition

    NASA Astrophysics Data System (ADS)

    Lubachevsky, Boris D.; Privman, Vladimir; Roy, Subhas C.

    1996-06-01

    We simulate ballistic particle deposition wherein a large number of spherical particles are "cast" vertically over a planar horizontal surface. Upon first contact (with the surface or with a previously deposited particle) each particle stops. This model helps material scientists to study the adsorption and sediment formation. The model is sequential, with particles deposited one by one. We have found an equivalent formulation using a continuous time random process and we simulate the latter in parallel using a method similar to the one previously employed for simulating Ising spins. We augment the parallel algorithm for simulating Ising spins with several techniques aimed at the increase of efficiency of producing the particle configuration and statistics collection. Some of these techniques are similar to earlier ones. We implement the resulting algorithm on a 16K PE MasPar MP-1 and a 4K PE MasPar MP-2. The parallel code runs on MasPar computers nearly two orders of magnitude faster than an optimized sequential code runs on a fast workstation.
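
    A simplified 1+1-dimensional lattice analogue of the sequential model (unit cubes falling into a line of columns rather than spheres cast over a plane; the nearest-neighbor sticking rule and the sizes are illustrative, not the authors' model) can be written in a few lines:

```python
import numpy as np

def ballistic_deposition(width, n_particles, rng):
    """Particles fall vertically in random columns and stick at first
    contact with the surface or a previously deposited particle."""
    height = np.zeros(width, dtype=int)
    for _ in range(n_particles):
        c = rng.integers(width)
        left = height[c - 1] if c > 0 else 0
        right = height[c + 1] if c < width - 1 else 0
        # Stick either on top of this column or at the height of a
        # taller neighbor the particle touches on the way down.
        height[c] = max(height[c] + 1, left, right)
    return height

rng = np.random.default_rng(4)
h = ballistic_deposition(width=200, n_particles=20_000, rng=rng)
print(h.mean(), h.std())   # mean height and interface roughness
```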

  17. Casting pearls ballistically: Efficient massively parallel simulation of particle deposition

    SciTech Connect

    Lubachevsky, B.D.; Privman, V.; Roy, S.C.

    1996-06-01

    We simulate ballistic particle deposition wherein a large number of spherical particles are "cast" vertically over a planar horizontal surface. Upon first contact (with the surface or with a previously deposited particle) each particle stops. This model helps material scientists to study the adsorption and sediment formation. The model is sequential, with particles deposited one by one. We have found an equivalent formulation using a continuous time random process and we simulate the latter in parallel using a method similar to the one previously employed for simulating Ising spins. We augment the parallel algorithm for simulating Ising spins with several techniques aimed at the increase of efficiency of producing the particle configuration and statistics collection. Some of these techniques are similar to earlier ones. We implement the resulting algorithm on a 16K PE MasPar MP-1 and a 4K PE MasPar MP-2. The parallel code runs on MasPar computers nearly two orders of magnitude faster than an optimized sequential code runs on a fast workstation. 17 refs., 9 figs.

  18. The cost of conservative synchronization in parallel discrete event simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1990-01-01

    The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approaches the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.

  19. Numerical simulation of supersonic wake flow with parallel computers

    SciTech Connect

    Wong, C.C.; Soetrisno, M.

    1995-07-01

    Simulating a supersonic wake flow field behind a conical body is a computing-intensive task. It requires a large number of computational cells to capture the dominant flow physics and a robust numerical algorithm to obtain a reliable solution. High performance parallel computers with unique distributed processing and data storage capabilities can meet this need. They have larger computational memory and faster computing times than conventional vector computers. We apply the PINCA Navier-Stokes code to simulate a wind-tunnel supersonic wake experiment on Intel Gamma, Intel Paragon, and IBM SP2 parallel computers. These simulations are performed to study the mean flow in the near wake region of a sharp, 7-degree half-angle, adiabatic cone at Mach number 4.3 and freestream Reynolds number of 40,600. Overall, the numerical solutions capture the general features of the hypersonic laminar wake flow and compare favorably with the wind tunnel data. With a refined and clustered grid distribution in the recirculation zone, the calculated location of the rear stagnation point is consistent with the 2D axisymmetric and 3D experiments. In this study, we also demonstrate the importance of having a large local memory capacity within a computer node and of effectively utilizing the number of computer nodes to achieve good parallel performance when simulating a complex, large-scale wake flow problem.

  20. Modularized Parallel Neutron Instrument Simulation on the TeraGrid

    SciTech Connect

    Chen, Meili; Cobb, John W; Hagen, Mark E; Miller, Stephen D; Lynch, Vickie E

    2007-01-01

    In order to build a bridge between the TeraGrid (TG), a national-scale cyberinfrastructure resource, and neutron science, the Neutron Science TeraGrid Gateway (NSTG) is focused on introducing productive HPC usage to the neutron science community, primarily the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL). Monte Carlo simulations are used as a powerful tool for instrument design and optimization at SNS. One of the successful efforts of a collaboration team composed of NSTG HPC experts and SNS instrument scientists is the development of a software facility named PSoNI, Parallelizing Simulations of Neutron Instruments. Parallelizing the traditional serial instrument simulation on TeraGrid resources, PSoNI quickly computes full instrument simulations at statistical levels sufficient for instrument design. Upon successful commissioning of the SNS at the end of 2007, three of the five commissioned instruments in the SNS target station will be available to initial users. Advanced instrument study, proposal feasibility evaluation, and experiment planning are on the immediate schedule of SNS, which places further requirements, such as flexibility and high runtime efficiency, on fast instrument simulation. PSoNI has been redesigned to meet the new challenges, and a preliminary version has been developed on TeraGrid. This paper explores the motivation and goals of the new design and the improved software structure. Further, it describes the new features realized with MPI-parallelized McStas running high-resolution design simulations of the SEQUOIA and BSS instruments at SNS. A discussion of future work, which targets fast simulation for automated experiment adjustment and for comparing models to data during analysis, is also presented.

  1. Parallel algorithms for simulating continuous time Markov chains

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Heidelberger, Philip

    1992-01-01

    We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
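
    A serial sketch of the underlying uniformization construction (not any of the five parallel variants compared in the paper) may help: pick a rate Lambda bounding every state's exit rate, drive the chain with a single Poisson(Lambda) event stream, and at each event either jump or take a "pseudo-event" that leaves the state unchanged. The toy generator matrix below is an assumption for illustration.

```python
import numpy as np

def simulate_ctmc_uniformized(Q, x0, t_end, rng):
    """Simulate a continuous-time Markov chain by uniformization.  The
    shared Poisson(Lambda) clock is what parallel processors can
    presample and synchronize on."""
    Q = np.asarray(Q, dtype=float)
    lam = np.max(-np.diag(Q))          # uniformization rate Lambda
    t, x = 0.0, x0
    path = [(t, x)]
    while True:
        t += rng.exponential(1.0 / lam)
        if t >= t_end:
            break
        p = Q[x] / lam                 # off-diagonal jump probabilities
        p[x] = 1.0 + Q[x, x] / lam     # probability of a pseudo-event
        x = rng.choice(len(p), p=p)
        path.append((t, x))
    return path

Q = np.array([[-1.0, 1.0], [2.0, -2.0]])   # toy two-state generator
print(simulate_ctmc_uniformized(Q, 0, 5.0, np.random.default_rng(5))[:5])
```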

  2. Synchronous parallel system for emulation and discrete event simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor)

    1992-01-01

    A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to state variables of the simulation object attributable to the event object, and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring the events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.
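
    A toy sketch of the horizon computation described above, with a hypothetical data layout (a dict mapping node id to the time stamps of that node's newly generated event objects); the commit semantics around the horizon are simplified relative to the patent's full description:

```python
def global_event_horizon(new_event_stamps):
    """Each node's local horizon is the earliest time stamp among its
    newly generated event objects; the global horizon is the minimum
    over all nodes, bounding how far this cycle can safely advance."""
    local = {node: min(stamps)
             for node, stamps in new_event_stamps.items() if stamps}
    return min(local.values()), local

gvt, local = global_event_horizon({0: [12.5, 9.0], 1: [7.25, 30.0], 2: [8.0]})
print(gvt, local)   # 7.25 {0: 9.0, 1: 7.25, 2: 8.0}
```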

  3. Direct simulation of compressible turbulence

    NASA Technical Reports Server (NTRS)

    Zang, T. A.; Erlebacher, Gordon; Hussaini, M. Y.

    1989-01-01

    Several direct simulations of 3-D homogeneous, compressible turbulence are presented, with emphasis on the differences from incompressible turbulence simulations. A fully spectral collocation algorithm, periodic in all directions and coupled with a 3rd-order Runge-Kutta time discretization scheme, is sufficient to produce well-resolved flows at Taylor Reynolds numbers below 40 on grids of 128x128x128. A Helmholtz decomposition of the velocity is useful to differentiate between purely compressible effects and those effects solely due to vorticity production. In the context of homogeneous flows, this decomposition is unique. Time-dependent energy and dissipation spectra of the compressible and solenoidal velocity components indicate the presence of localized small-scale structures. These structures are strongly a function of the initial conditions. The researchers concentrate on a regime characterized by very small fluctuating Mach numbers Ma (on the order of 0.03) and density and temperature fluctuations much greater than Ma squared. This leads to a state in which more than 70 percent of the kinetic energy is contained in the so-called compressible component of the velocity. Furthermore, these conditions lead to the formation of curved weak shocks (or shocklets) which travel at approximately the sound speed across the physical domain. Various terms in the vorticity and divergence-of-velocity production equations are plotted versus time to gain some understanding of how small scales are actually formed. Possible links with Burgers turbulence are examined. To better visualize the dynamics of the flow, new graphic visualization techniques have been developed. The 3-D structure of the shocks is visualized with the help of volume rendering algorithms developed in-house. A combination of stereographic projection and animation greatly increases the number of visual cues necessary to properly interpret the complex flow.
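
    The Helmholtz split is easy to reproduce for a periodic field: project the Fourier coefficients onto the wavevector to get the compressible (dilatational) part and keep the remainder as the solenoidal part. The sketch below is a generic implementation, not the authors' spectral code; the grid size and random field are illustrative.

```python
import numpy as np

def helmholtz_split(u, v, w):
    # Integer wavevectors for a periodic box, one array per dimension.
    ks = np.meshgrid(*[np.fft.fftfreq(n) * n for n in u.shape], indexing="ij")
    k2 = sum(k * k for k in ks)
    k2[0, 0, 0] = 1.0                      # avoid 0/0 at the mean mode
    uh = [np.fft.fftn(c) for c in (u, v, w)]
    div = sum(k * c for k, c in zip(ks, uh))       # k . u_hat
    comp_h = [k * div / k2 for k in ks]            # projection onto k
    comp = [np.real(np.fft.ifftn(c)) for c in comp_h]
    sol = [a - b for a, b in zip((u, v, w), comp)]
    return sol, comp

rng = np.random.default_rng(6)
u, v, w = (rng.standard_normal((16, 16, 16)) for _ in range(3))
sol, comp = helmholtz_split(u, v, w)

# Verify: the solenoidal part is (discretely) divergence-free.
ks = np.meshgrid(*[np.fft.fftfreq(16) * 16 for _ in range(3)], indexing="ij")
div_sol = sum(k * np.fft.fftn(c) for k, c in zip(ks, sol))
print(np.max(np.abs(div_sol)))   # ~ 1e-12
```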

  4. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    NASA Astrophysics Data System (ADS)

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summary: Program title: SWsolver; Catalogue identifier: AEGY_v1_0; Program summary URL

  5. Xyce Parallel Electronic Simulator - Users' Guide Version 2.1.

    SciTech Connect

    Hutchinson, Scott A; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric; Pawlowski, Roger P.; Fixel, Deborah A; Schiek, Richard; Bogdan, Carolyn W.; Shirley, David N.; Campbell, Phillip M.; Keiter, Eric R.

    2005-06-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an "in-house" capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to

  6. Parallel conjugate gradient algorithms for manipulator dynamic simulation

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Scheld, Robert E.

    1989-01-01

    Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithm is guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n sq) on a serial processor. A conjugate gradient algorithm is presented that provides greater efficiency by using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log sub 2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log sub 2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
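
    For reference, a plain serial preconditioned conjugate gradient iteration with the diagonal (Jacobi) preconditioner, the first of the two preconditioners proposed above, looks like this; the random SPD matrix is a stand-in for a manipulator mass matrix, and the parallel O(log n) variants are not shown.

```python
import numpy as np

def pcg(A, b, m_inv_diag, tol=1e-10, max_iter=None):
    """Preconditioned CG for SPD A; m_inv_diag holds 1/diag(A)."""
    n = len(b)
    max_iter = max_iter or n
    x = np.zeros(n)
    r = b.copy()
    z = m_inv_diag * r            # preconditioner solve: z = M^{-1} r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = m_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

rng = np.random.default_rng(7)
B = rng.standard_normal((7, 7))
A = B @ B.T + 7 * np.eye(7)       # stand-in for a 7-DOF mass matrix
b = rng.standard_normal(7)
x = pcg(A, b, 1.0 / np.diag(A))
print(np.linalg.norm(A @ x - b))  # small residual
```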

  7. Evaluation of parallel direct sparse linear solvers in electromagnetic geophysical problems

    NASA Astrophysics Data System (ADS)

    Puzyrev, Vladimir; Koric, Seid; Wilkin, Scott

    2016-04-01

    High performance computing is absolutely necessary for large-scale geophysical simulations. In order to obtain a realistic image of a geologically complex area, industrial surveys collect vast amounts of data, making the computational cost of the subsequent simulations extremely high. A major computational bottleneck of modeling and inversion algorithms is solving large, sparse, ill-conditioned systems of linear equations in complex domains with multiple right-hand sides. Recently, parallel direct solvers have been successfully applied to multi-source seismic and electromagnetic problems. These methods are robust and exhibit good performance, but often require large amounts of memory and have limited scalability. In this paper, we evaluate modern direct solvers on large-scale modeling examples that were previously considered unachievable with these methods. Performance and scalability tests utilizing up to 65,536 cores on the Blue Waters supercomputer clearly illustrate the robustness, efficiency and competitiveness of direct solvers compared to iterative techniques. Wide use of direct methods utilizing modern parallel architectures will allow modeling tools to accurately support multi-source surveys and 3D data acquisition geometries, thus promoting more efficient use of electromagnetic methods in geophysics.
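
    The factor-once, solve-many property that favors direct methods for multi-source problems can be demonstrated with any sparse direct solver. Below, SciPy's SuperLU interface stands in (hedged: the paper evaluates large-scale parallel solvers, not SuperLU) for a 1-D Laplacian with 32 right-hand sides playing the role of sources.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import splu

# A direct solver amortizes one expensive factorization over many right
# hand sides.
n = 2000
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
lu = splu(A)                       # factor once

rhs = np.random.default_rng(8).standard_normal((n, 32))  # 32 "sources"
x = lu.solve(rhs)                  # reuse the factorization for all columns
print(np.max(np.abs(A @ x - rhs)))                       # residual check
```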

  8. Massively Parallel Processing for Fast and Accurate Stamping Simulations

    NASA Astrophysics Data System (ADS)

    Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu

    2005-08-01

    The competitive automotive market drives automotive manufacturers to speed up vehicle development cycles and reduce lead time. Fast tooling development is one of the key areas supporting fast and short vehicle development programs (VDP). In the past ten years, stamping simulation has become the most effective validation tool for predicting and resolving all potential formability and quality problems before the dies are physically made. Stamping simulation and formability analysis have become a critical business segment in GM's math-based die engineering process. As simulation becomes one of the major production tools in the engineering factory, simulation speed and accuracy are two of the most important measures of stamping simulation technology. The speed and time-in-system of forming analysis become even more critical to support fast VDP and tooling readiness. Since 1997, the General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of simulation software for mass production analysis applications. By 2001, this technology had matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DMP/MPP technology, as well as performance benchmarks, are discussed in this publication.

  9. Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

    NASA Technical Reports Server (NTRS)

    Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

    1994-01-01

    Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.

  10. Relationship between parallel faults and stress field in rock mass based on numerical simulation

    NASA Astrophysics Data System (ADS)

    Imai, Y.; Mikada, H.; Goto, T.; Takekawa, J.

    2012-12-01

    Parallel cracks and faults, caused by earthquakes and crustal deformations, are often observed at various scales, from regional to laboratory. However, the mechanism of formation of these parallel faults has not yet been quantitatively clarified. Since the stress field plays a key role in the nucleation of parallel faults, it is fundamental to investigate the failure and the extension of cracks in a large-scale rock mass (not in a laboratory-scale specimen) due to a mechanically loaded stress field. In this study, we developed a numerical simulation code for rock mass failure under different loading conditions, and conducted rock failure experiments using this code. We assumed a numerical rock mass consisting of basalt with a rectangular shape for the model. We also assumed that the rock mass fails in accordance with the Mohr-Coulomb criterion, and that the initial tensile and compressive strengths of the rock elements follow a Weibull distribution. We use the Hamiltonian Particle Method (HPM), one of the particle methods, to represent large deformation and the destruction of materials. Our simulation results suggest that the confining pressure has a dominant influence on the initiation of parallel faults and their conjugates under compressive conditions. We conclude that the shearing force promotes the propagation of parallel fractures along the shearing direction, but prevents that of fractures in the conjugate direction.
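
    For concreteness, a minimal check of the Mohr-Coulomb criterion in principal-stress form is sketched below; the strength parameters are assumed, basalt-like illustration values, not those of the paper, and the per-element Weibull strength sampling is omitted.

```python
import numpy as np

def mohr_coulomb_fails(sigma1, sigma3, cohesion, phi_deg):
    """Mohr-Coulomb failure check (compression positive).

    Failure when the Mohr circle touches the envelope
    tau = c + sigma_n * tan(phi), equivalently
    sigma1 - sigma3 >= 2*c*cos(phi) + (sigma1 + sigma3)*sin(phi).
    """
    phi = np.radians(phi_deg)
    return (sigma1 - sigma3) >= (2.0 * cohesion * np.cos(phi)
                                 + (sigma1 + sigma3) * np.sin(phi))

# Illustrative basalt-like parameters (assumed, not from the paper):
print(mohr_coulomb_fails(sigma1=200e6, sigma3=10e6,
                         cohesion=30e6, phi_deg=35))   # True => element fails
```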

  11. Mapping a battlefield simulation onto message-passing parallel architectures

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1987-01-01

    Perhaps the most critical problem in distributed simulation is that of mapping: without an effective mapping of workload to processors the speedup potential of parallel processing cannot be realized. Mapping a simulation onto a message-passing architecture is especially difficult when the computational workload dynamically changes as a function of time and space; this is exactly the situation faced by battlefield simulations. This paper studies an approach where the simulated battlefield domain is first partitioned into many regions of equal size; typically there are more regions than processors. The regions are then assigned to processors; a processor is responsible for performing all simulation activity associated with its assigned regions. The assignment algorithm is quite simple and attempts to balance load by exploiting locality of workload intensity. The performance of this technique is studied on a simple battlefield simulation implemented on the Flex/32 multiprocessor. Measurements show that the proposed method achieves reasonable processor efficiencies. Furthermore, the method shows promise for use in dynamic remapping of the simulation.

  12. Conservative parallel simulation of priority class queueing networks

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1990-01-01

    A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. This is because a job in service having low priority can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than those for lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol is analyzed and it is demonstrated that good performance can be expected on the simulation of large queueing networks.

  13. Conservative parallel simulation of priority class queueing networks

    NASA Technical Reports Server (NTRS)

    Nicol, David

    1992-01-01

    A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. This is because a job in service having low priority can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than those for lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol is analyzed and it is demonstrated that good performance can be expected on the simulation of large queueing networks.

  14. Xyce Parallel Electronic Simulator Users Guide Version 6.4

    SciTech Connect

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory

    2015-12-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks: The information herein is subject to change without notice. Copyright (c) 2002-2015 Sandia Corporation. All rights reserved. Xyce(TM) Electronic Simulator and Xyce(TM) are trademarks of Sandia Corporation. Portions of the Xyce(TM) code are: Copyright (c) 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59. All rights reserved. Orcad, Orcad Capture, PSpice and Probe are

  15. Acceleration of hybrid MPI parallel NBODY6++ for large N-body globular cluster simulations

    NASA Astrophysics Data System (ADS)

    Wang, Long; Spurzem, Rainer; Aarseth, Sverre; Nitadori, Keigo; Berczik, Peter; Kouwenhoven, M. B. N.; Naab, Thorsten

    2016-02-01

    Previous research on globular cluster (GC) dynamics is mostly based on semi-analytic, Fokker-Planck, and Monte-Carlo methods and on direct N-body (NB) simulations. These approaches have great advantages but also limitations, since GCs are massive and compact, and close encounters and binaries play very important roles in their dynamics. The former three methods make approximations and assumptions, while the latter is limited by expensive computing time and the number of stars. The current largest direct NB simulation has ~500k stars (Heggie 2014). Here, we accelerate the direct NB code NBODY6++ (which extends NBODY6 to supercomputers by using MPI) with new parallel computing technologies (GPU, OpenMP + SSE/AVX). Our aim is to handle large-N (up to 10^6) direct NB simulations to obtain a better understanding of the dynamical evolution of GCs.
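
    The kernel being accelerated is the O(N^2) direct force summation; a minimal vectorized version is sketched below for orientation. The softening length eps is a simplifying assumption for this sketch: NBODY6/NBODY6++ treat close encounters with dedicated regularization schemes rather than plain softening.

```python
import numpy as np

def direct_nbody_acc(pos, mass, eps=1e-3):
    """O(N^2) direct-summation gravitational accelerations (G = 1 units),
    the cost driver behind GPU/OpenMP/SSE-AVX acceleration."""
    d = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]     # pairwise offsets
    r2 = np.sum(d * d, axis=-1) + eps**2                  # softened distances
    np.fill_diagonal(r2, np.inf)                          # no self-force
    inv_r3 = r2 ** -1.5
    # a_i = sum_j m_j * (r_j - r_i) / |r_j - r_i|^3
    return np.einsum("ijk,ij,j->ik", d, inv_r3, mass)

rng = np.random.default_rng(9)
pos = rng.standard_normal((1024, 3))
mass = np.full(1024, 1.0 / 1024)
acc = direct_nbody_acc(pos, mass)
print(acc.shape, np.abs(acc).max())
```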

  16. Numerical Simulation of Flow Field Within Parallel Plate Plastometer

    NASA Technical Reports Server (NTRS)

    Antar, Basil N.

    2002-01-01

    The Parallel Plate Plastometer (PPP) is a device commonly used for measuring the viscosity of high polymers at low rates of shear in the range 10(exp 4) to 10(exp 9) poises. This device is being validated for use in measuring the viscosity of liquid glasses at high temperatures, which have similar ranges of viscosity values. The PPP instrument consists of two similar parallel plates, each about 1 inch in diameter, with the upper plate movable while the lower one is kept stationary. Load is applied to the upper plate by means of a beam connected to a shaft attached to the upper plate. The viscosity of the fluid is deduced by measuring the variation of the plate separation, h, as a function of time when a specified fixed load is applied to the beam. Operating plate speeds measured with the PPP are usually in the range of 10(exp -3) cm/s or lower. The flow field within the PPP can be simulated using the equations of motion of fluid flow for this configuration. With flow speeds in the range quoted above, the flow field between the two plates is certainly incompressible and laminar. Such flows can easily be simulated using numerical modeling with computational fluid dynamics (CFD) codes. We present below the mathematical model used to simulate this flow field and also the solutions obtained for the flow using a commercially available finite element CFD code.

  17. Massively parallel simulations of multiphase flows using Lattice Boltzmann methods

    NASA Astrophysics Data System (ADS)

    Ahrenholz, Benjamin

    2010-03-01

    In the last two decades the lattice Boltzmann method (LBM) has matured as an alternative and efficient numerical scheme for the simulation of fluid flows and transport problems. Unlike conventional numerical schemes based on discretizations of macroscopic continuum equations, the LBM is based on microscopic models and mesoscopic kinetic equations. The fundamental idea of the LBM is to construct simplified kinetic models that incorporate the essential physics of microscopic or mesoscopic processes so that the macroscopic averaged properties obey the desired macroscopic equations. The LBM is especially suitable for applications involving interfacial dynamics, complex and/or changing boundaries, and complicated constitutive relationships that can be derived from a microscopic picture. In this talk a modified and optimized version of the Gunstensen color model is presented to describe the dynamics of the fluid/fluid interface, where the flow field is based on a multi-relaxation-time model. Based on that modeling approach, validation studies of contact line motion are shown. Because the LB method generally needs only nearest-neighbor information, the algorithm is an ideal candidate for parallelization. Hence, it is possible to perform efficient simulations in complex geometries at large scale by massively parallel computation. Here, results of drainage and imbibition (degrees of freedom > 2E11) in natural porous media, with geometries obtained from microtomography methods, are presented. These fully resolved pore-scale simulations are essential for a better understanding of the physical processes in porous media and therefore important for the determination of constitutive relationships.

  18. Task parallel sensitivity analysis and parameter estimation of groundwater simulations through the SALSSA framework

    SciTech Connect

    Schuchardt, Karen L.; Agarwal, Khushbu; Chase, Jared M.; Rockhold, Mark L.; Freedman, Vicky L.; Elsethagen, Todd O.; Scheibe, Timothy D.; Chin, George; Sivaramakrishnan, Chandrika

    2010-07-15

    The Support Architecture for Large-Scale Subsurface Analysis (SALSSA) provides an extensible framework, sophisticated graphical user interface, and underlying data management system that simplifies the process of running subsurface models, tracking provenance information, and analyzing the model results. Initially, SALSSA supported two styles of job control: user directed execution and monitoring of individual jobs, and load balancing of jobs across multiple machines taking advantage of many available workstations. Recent efforts in subsurface modelling have been directed at advancing simulators to take advantage of leadership class supercomputers. We describe two approaches, current progress, and plans toward enabling efficient application of the subsurface simulator codes via the SALSSA framework: automating sensitivity analysis problems through task parallelism, and task parallel parameter estimation using the PEST framework.

  19. New Directions in Maintenance Simulation.

    ERIC Educational Resources Information Center

    Miller, Gary G.

    A two-phase effort was conducted to design and evaluate a maintenance simulator which incorporated state-of-the-art information in simulation and instructional technology. The particular equipment selected to be simulated was the 6883 Convert/Flight Controls Test Station. Phase I included a generalized block diagram of the computer-trainer, the…

  20. Molecular Dynamics Simulations from SNL's Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS)

    DOE Data Explorer

    Plimpton, Steve; Thompson, Aidan; Crozier, Paul

    LAMMPS (http://lammps.sandia.gov/index.html) stands for Large-scale Atomic/Molecular Massively Parallel Simulator and is a code that can be used to model atoms or, as the LAMMPS website says, as a parallel particle simulator at the atomic, meso, or continuum scale. This Sandia-based website provides a long list of animations from large simulations. These were created using different visualization packages to read LAMMPS output, and each one provides the name of the PI and a brief description of the work done or visualization package used. See also the static images produced from simulations at http://lammps.sandia.gov/pictures.html The foundation paper for LAMMPS is: S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995), but the website also lists other papers describing contributions to LAMMPS over the years.

  1. Parallel Unsteady Turbopump Simulations for Liquid Rocket Engines

    NASA Technical Reports Server (NTRS)

    Kiris, Cetin C.; Kwak, Dochan; Chan, William

    2000-01-01

    This paper reports the progress being made towards complete turbo-pump simulation capability for liquid rocket engines. Space Shuttle Main Engine (SSME) turbo-pump impeller is used as a test case for the performance evaluation of the MPI and hybrid MPI/Open-MP versions of the INS3D code. Then, a computational model of a turbo-pump has been developed for the shuttle upgrade program. Relative motion of the grid system for rotor-stator interaction was obtained by employing overset grid techniques. Time-accuracy of the scheme has been evaluated by using simple test cases. Unsteady computations for SSME turbo-pump, which contains 136 zones with 35 Million grid points, are currently underway on Origin 2000 systems at NASA Ames Research Center. Results from time-accurate simulations with moving boundary capability, and the performance of the parallel versions of the code will be presented in the final paper.

  2. A Generic Scheduling Simulator for High Performance Parallel Computers

    SciTech Connect

    Yoo, B S; Choi, G S; Jette, M A

    2001-08-01

    It is well known that efficient job scheduling plays a crucial role in achieving high system utilization in large-scale high performance computing environments. A good scheduling algorithm should schedule jobs to achieve high system utilization while satisfying various user demands in an equitable fashion. Designing such a scheduling algorithm is a non-trivial task even in a static environment. In practice, the computing environment and workload are constantly changing. There are several reasons for this. First, the computing platforms constantly evolve as the technology advances. For example, the availability of relatively powerful commodity off-the-shelf (COTS) components at steadily diminishing prices have made it feasible to construct ever larger massively parallel computers in recent years [1, 4]. Second, the workload imposed on the system also changes constantly. The rapidly increasing compute resources have provided many applications developers with the opportunity to radically alter program characteristics and take advantage of these additional resources. New developments in software technology may also trigger changes in user applications. Finally, political climate change may alter user priorities or the mission of the organization. System designers in such dynamic environments must be able to accurately forecast the effect of changes in the hardware, software, and/or policies under consideration. If the environmental changes are significant, one must also reassess scheduling algorithms. Simulation has frequently been relied upon for this analysis, because other methods such as analytical modeling or actual measurements are usually too difficult or costly. A drawback of the simulation approach, however, is that developing a simulator is a time-consuming process. Furthermore, an existing simulator cannot be easily adapted to a new environment. In this research, we attempt to develop a generic job-scheduling simulator, which facilitates the evaluation of

  3. Parallel 3D Multi-Stage Simulation of a Turbofan Engine

    NASA Technical Reports Server (NTRS)

    Turner, Mark G.; Topp, David A.

    1998-01-01

    A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. 20 processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scaleable with the number of blade rows. Enough flips are run (between 50 and 200) so the solution in the entire machine is not changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee the parallelization was done correctly. The domain decomposition is done only in the axial direction since the number of points axially is much larger than the other two directions. This code uses MPI for message passing. The parallel speed up of the solver portion (no I/O or body force

  4. Large-scale numerical simulation of laser propulsion by parallel computing

    NASA Astrophysics Data System (ADS)

    Zeng, Yaoyuan; Zhao, Wentao; Wang, Zhenghua

    2013-05-01

    As one of the most significant methods for studying laser-propelled rockets, the numerical simulation of laser propulsion has drawn ever-increasing attention. Nevertheless, the traditional serial simulation model cannot satisfy practical needs because of prohibitive memory overhead and considerable computation time. In order to solve this problem, we study a general algorithm for laser propulsion design and parallelize it using a two-level hybrid parallel programming model. The total computing domain is decomposed into distributed data spaces, and each partition is assigned to an MPI process. A single step of computation operates at the inner-loop level, where a compiler directive is used to split each MPI process into several OpenMP threads. Finally, the parallel efficiency of the hybrid program for two typical configurations on a China-made supercomputer with 4 to 256 cores is compared with that of a pure MPI program. The hybrid program exhibits better performance than the pure MPI program on the whole, roughly as expected. The result indicates that our hybrid parallel approach is effective and practical for large-scale numerical simulation of laser propulsion.

  5. Massively Parallel Simulations of Diffusion in Dense Polymeric Structures

    SciTech Connect

    Faulon, Jean-Loup, Wilcox, R.T. , Hobbs, J.D. , Ford, D.M.

    1997-11-01

    An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases is studied in the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon (1840 processors). Compared to the current state-of-the-art equilibration methods, this new technique appears to be faster by some orders of magnitude. The main advantage of the technique is that one can circumvent the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in the center of a regular tetrahedron and that regular tetrahedra can be used to mesh three-dimensional space. Thus, the problem of polymer equilibration described by continuous equations in molecular dynamics is reduced to a discrete problem whose solutions are approximated by simple algorithms. Practical modeling applications include the construction of butyl rubber and ethylene-propylene-diene-monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally on both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. In order to better understand the diffusion phenomena, 10,000-atom models were generated and equilibrated for butyl and EPDM. The models were submitted to a massively parallel molecular dynamics simulation to monitor the trajectories of the diffusing species.

  6. Roadmap for efficient parallelization of breast anatomy simulation

    NASA Astrophysics Data System (ADS)

    Chui, Joseph H.; Pokrajac, David D.; Maidment, Andrew D. A.; Bakic, Predrag R.

    2012-03-01

    A roadmap has been proposed to optimize the simulation of breast anatomy by parallel implementation, in order to reduce the time needed to generate software breast phantoms. The rapid generation of high resolution phantoms is needed to support virtual clinical trials of breast imaging systems. We have recently developed an octree-based recursive partitioning algorithm for breast anatomy simulation. The algorithm has good asymptotic complexity; however, its current MATLAB implementation cannot provide optimal execution times. The proposed roadmap for efficient parallelization includes the following steps: (i) migrate the current code to a C/C++ platform and optimize it for single-threaded implementation; (ii) modify the code to allow for multi-threaded CPU implementation; (iii) identify and migrate the code to a platform designed for multi-threaded GPU implementation. In this paper, we describe our results in optimizing the C/C++ code for single-threaded and multi-threaded CPU implementations. As the first step of the proposed roadmap, we identified a bottleneck component in the MATLAB implementation using MATLAB's profiling tool, and created a single-threaded CPU implementation of the algorithm using C/C++'s overloaded operators and the Standard Template Library. The C/C++ implementation has been compared to the MATLAB version in terms of accuracy and simulation time. A 520-fold reduction of the execution time was observed in a test of phantoms with 50-400 μm voxels. In addition, we have identified several places in the code which will be modified to allow for the next roadmap milestone of the multi-threaded CPU implementation.

  7. Parallel Molecular Dynamics Stencil : a new parallel computing environment for a large-scale molecular dynamics simulation of solids

    NASA Astrophysics Data System (ADS)

    Shimizu, Futoshi; Kimizuka, Hajime; Kaburaki, Hideo

    2002-08-01

    A new parallel computing environment, called ``Parallel Molecular Dynamics Stencil'', has been developed to carry out large-scale short-range molecular dynamics simulations of solids. The stencil is written in C using MPI for parallelization and is designed to separate and conceal the parts of the programs describing cutoff schemes and parallel algorithms for data communication. This has been made possible by introducing the concept of image atoms. Therefore, only sequential programming of the force calculation routine is required to execute the stencil in a parallel environment. Typical molecular dynamics routines, such as various ensembles, time integration methods, and empirical potentials, have been implemented in the stencil. In the presentation, the performance of the stencil on parallel computers from Hitachi, IBM, and SGI and on a PC cluster, using Lennard-Jones and EAM-type potentials for a fracture problem, will be reported.

  8. Parallel grid library for rapid and flexible simulation development

    NASA Astrophysics Data System (ADS)

    Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.

    2013-04-01

    We present an easy-to-use and flexible grid library for developing highly scalable parallel simulations. The distributed Cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time, allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously, allowing one to overlap computation and communication. This enables excellent scalability, at least up to 32 k cores in magnetohydrodynamic tests, depending on the problem and hardware. In the version of dccrg presented here, part of the mesh metadata is replicated between MPI processes, reducing the scalability of adaptive mesh refinement (AMR) to between 200 and 600 processes. Dccrg is free software that anyone can use, study and modify, and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg. Catalogue identifier: AEOM_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: GNU Lesser General Public License version 3 No. of lines in distributed program, including test data, etc.: 54975 No. of bytes in distributed program, including test data, etc.: 974015 Distribution format: tar.gz Programming language: C++. Computer: PC, cluster, supercomputer. Operating system: POSIX. The code has been parallelized using MPI and tested with 1-32768 processes RAM: 10 MB-10 GB per process Classification: 4.12, 4.14, 6.5, 19.3, 19.10, 20. External routines: MPI-2 [1], boost [2], Zoltan [3], sfc++ [4] Nature of problem: Grid library supporting arbitrary data in grid cells, parallel adaptive mesh refinement, transparent remote neighbor data updates and
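
    The transparent, asynchronous neighbour-data transfers described above rest on the standard nonblocking-MPI overlap pattern: start the remote updates, compute on interior cells while messages are in flight, then finish the communication and update the cells that need remote data. A generic sketch of that pattern follows; it is deliberately not the dccrg API.

        #include <vector>
        #include <mpi.h>

        void update_cell(std::vector<double>& u, std::size_t i) { u[i] *= 0.5; }

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            std::vector<double> cells(1000, 1.0);   // local cells; [0] is boundary
            double ghost = 0.0;                     // remote neighbour's value
            int prev = (rank - 1 + size) % size, next = (rank + 1) % size;

            // 1) Start the neighbour update without waiting for it.
            MPI_Request reqs[2];
            MPI_Irecv(&ghost, 1, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, &reqs[0]);
            MPI_Isend(&cells.back(), 1, MPI_DOUBLE, next, 0, MPI_COMM_WORLD, &reqs[1]);

            // 2) Compute on interior cells while messages are in flight.
            for (std::size_t i = 1; i < cells.size(); ++i) update_cell(cells, i);

            // 3) Finish communication, then update cells needing remote data.
            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
            cells[0] = 0.5 * (cells[0] + ghost);

            MPI_Finalize();
        }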

  9. A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Rao, Hariprasad Nannapaneni

    1989-01-01

    The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event-driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated in parallel on separate processors, for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
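
    Virtual Time with rollback (the Time Warp mechanism) lets each processor simulate its circuit partition optimistically; when a message arrives with a timestamp in the local past, the process restores the most recent saved state at or before that timestamp and re-executes. A compact sketch of the rollback bookkeeping (simplified; anti-messages and the RSIM internals are omitted, and all names are illustrative):

        #include <cstdio>
        #include <iterator>
        #include <map>

        struct LogicState { int node_values; };  // stand-in for switch-level state

        class TimeWarpProcess {
            std::map<double, LogicState> saved;  // checkpoints keyed by virtual time
            double lvt = 0.0;                    // local virtual time
            LogicState state{0};
        public:
            void execute(double t, int new_values) {
                saved[lvt] = state;              // checkpoint before advancing
                lvt = t;
                state.node_values = new_values;
            }
            void on_message(double t, int new_values) {
                if (t < lvt) {                   // straggler: roll back
                    auto it = saved.upper_bound(t);   // first checkpoint after t
                    if (it != saved.begin()) --it;    // latest checkpoint <= t
                    state = it->second;
                    lvt = it->first;
                    saved.erase(std::next(it), saved.end());
                    std::printf("rolled back to t = %.1f\n", lvt);
                }
                execute(t, new_values);          // (re)process the event
            }
        };

        int main() {
            TimeWarpProcess p;
            p.execute(10.0, 1);
            p.execute(20.0, 2);
            p.on_message(15.0, 3);               // arrives in the past -> rollback
        }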

  10. A new parallel P3M code for very large-scale cosmological simulations

    NASA Astrophysics Data System (ADS)

    MacFarland, Tom; Couchman, H. M. P.; Pearce, F. R.; Pichlmeier, Jakob

    1998-12-01

    We have developed a parallel Particle-Particle, Particle-Mesh (P3M) simulation code for the Cray T3E parallel supercomputer that is well suited to studying the time evolution of systems of particles interacting via gravity and gas forces in cosmological contexts. The parallel code is based upon the public-domain serial Adaptive P3M-SPH (http://coho.astro.uwo.ca/pub/hydra/hydra.html) code of Couchman et al. (1995) [ApJ, 452, 797]. The algorithm resolves gravitational forces into a long-range component computed by discretizing the mass distribution and solving Poisson's equation on a grid using an FFT convolution method, and a short-range component computed by direct force summation for sufficiently close particle pairs. The code consists primarily of a particle-particle computation parallelized by domain decomposition over blocks of neighbour-cells, a more regular mesh calculation distributed in planes along one dimension, and several transformations between the two distributions. The load balancing of the P3M code is static, since this greatly aids the ongoing implementation of parallel adaptive refinements of the particle and mesh systems. Great care was taken throughout to make optimal use of the available memory, so that a version of the current implementation has been used to simulate systems of up to 10⁹ particles with a 1024³ mesh for the long-range force computation. These are the largest cosmological N-body simulations of which we are aware. We discuss these memory optimizations as well as those motivated by computational performance. Performance results are very encouraging, and, even without refinements, the code has been used effectively for simulations in which the particle distribution becomes highly clustered, as well as for other non-uniform systems of astrophysical interest.
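
    In a common notation (ours, for illustration; the paper's own symbols may differ), the P3M force split described above reads:

        % Long-range part: Poisson's equation solved on the mesh by FFT.
        \nabla^2 \phi_{\mathrm{mesh}} = 4\pi G \rho, \qquad
        \mathbf{F}_i^{\mathrm{long}} = -m_i \,\nabla \phi_{\mathrm{mesh}}(\mathbf{x}_i),
        % Short-range part: direct summation over close pairs (r_{ij} < r_c).
        \mathbf{F}_i^{\mathrm{short}} = \sum_{j \,:\, r_{ij} < r_c}
          \left( \mathbf{F}_{ij}^{\mathrm{exact}} - \mathbf{F}_{ij}^{\mathrm{mesh}} \right),
        \qquad
        \mathbf{F}_i = \mathbf{F}_i^{\mathrm{long}} + \mathbf{F}_i^{\mathrm{short}}.

    The short-range sum subtracts the mesh contribution already counted for close pairs, so the total force is insensitive to the choice of the matching radius r_c.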

  11. MPSim: A Massively Parallel General Simulation Program for Materials

    NASA Astrophysics Data System (ADS)

    Iotov, Mihail; Gao, Guanghua; Vaidehi, Nagarajan; Cagin, Tahir; Goddard, William A., III

    1997-08-01

    In this talk, we describe a general-purpose Massively Parallel Simulation (MPSim) program used for computational materials science and the life sciences. We also present scaling aspects of the program along with several case studies. The program incorporates the highly efficient cell multipole method (CMM) to accurately calculate the interactions. For studying bulk materials, the program uses the reduced CMM to account for infinite-range sums. The software embodies various advanced molecular dynamics algorithms and energy and structure optimization techniques, with a set of analysis tools suitable for large-scale structures. Applications of the program range from amorphous polymers, liquid-polymer interfaces, large viruses, million-atom clusters, and surfaces to gas diffusion in polymers. The program was originally developed on the KSR in an object-oriented fashion and has been ported to the SGI-PC and HP-Exemplar. The message-passing version was originally implemented on the Intel Paragon using NX, then MPI, and was later tested on Cray T3D and IBM SP2 platforms.

  12. A 3D parallel simulator for crystal growth and solidification in complex alloy systems

    NASA Astrophysics Data System (ADS)

    Nestler, Britta

    2005-02-01

    A 3D parallel simulator is developed to numerically solve the evolution equations of a new non-isothermal phase-field model for crystal growth and solidification in complex alloy systems. The new model and the simulator are capable of simultaneously describing the diffusion processes of multiple components, the phase transitions between multiple phases, and the development of the temperature field. Weak and faceted formulations of both surface-energy and kinetic anisotropies are incorporated in the phase-field model. Multicomponent bulk diffusion effects, including interdiffusion coefficients as well as diffusion in the interfacial region of phase or grain boundaries, are considered. We introduce our parallel simulator, which is based on a finite difference discretization and includes effective adaptive strategies and multigrid methods to reduce computation time and memory usage. The parallelization is realized for distributed- as well as shared-memory computer architectures using MPI libraries and OpenMP concepts. Applying the new computer model, we present a variety of simulated crystal structures such as dendrites, grains, and binary and ternary eutectics in 2D and 3D. The influence of anisotropy on the microstructure evolution shows the formation of facets in preferred crystallographic directions. Phase transformations and solidification processes in a real multi-component alloy can be described by incorporating the physical data (e.g. surface tensions, kinetic coefficients, specific heat, heat and mass diffusion coefficients) and the specific phase diagram (in particular latent heats and melting temperatures) into the diffuse interface model via the free energies.

  13. Influence of equilibrium shear flow in the parallel magnetic direction on edge localized mode crash

    NASA Astrophysics Data System (ADS)

    Luo, Y.; Chen, S. Y.; Huang, J.; Xiong, Y. Y.; Tang, C. J.

    2016-04-01

    The influence of the parallel shear flow on the evolution of peeling-ballooning (P-B) modes is studied with the BOUT++ four-field code in this paper. The parallel shear flow has different effects in linear and nonlinear simulations. In the linear simulations, the growth rate of the edge localized mode (ELM) can be increased by the Kelvin-Helmholtz term, which can be driven by the parallel shear flow. In the nonlinear simulations, the results accord with the linear simulations during the linear phase. However, the ELM size is reduced by the parallel shear flow at the beginning of the turbulence phase, which corresponds to the P-B filament structure. The ELM size then continues to decrease under the shear flow during the turbulence phase.

  14. Parallel continuous simulated tempering and its applications in large-scale molecular simulations

    SciTech Connect

    Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng

    2014-07-28

    In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method of our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or the replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of the total temperature range, the PCST method requires very few copies of simulations, typically 2-3 copies, yet is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed, because the exchange rate is independent of the total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on the two-dimensional Ising model, a Lennard-Jones liquid, and an all-atom folding simulation of a small globular protein, trp-cage, in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and is particularly effective in simulating systems with long relaxation or correlation times. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems, such as phase transitions and the dynamics of macromolecules in explicit solvent.

  15. Parallel continuous simulated tempering and its applications in large-scale molecular simulations

    PubMed Central

    Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng

    2014-01-01

    In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method of our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or the replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of the total temperature range, the PCST method requires very few copies of simulations, typically 2-3 copies, yet is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed, because the exchange rate is independent of the total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on the two-dimensional Ising model, a Lennard-Jones liquid, and an all-atom folding simulation of a small globular protein, trp-cage, in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and is particularly effective in simulating systems with long relaxation or correlation times. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems, such as phase transitions and the dynamics of macromolecules in explicit solvent. PMID:25084887

  16. Parallel continuous simulated tempering and its applications in large-scale molecular simulations

    NASA Astrophysics Data System (ADS)

    Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng

    2014-07-01

    In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method of our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or the replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of the total temperature range, the PCST method requires very few copies of simulations, typically 2-3 copies, yet is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed, because the exchange rate is independent of the total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on the two-dimensional Ising model, a Lennard-Jones liquid, and an all-atom folding simulation of a small globular protein, trp-cage, in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and is particularly effective in simulating systems with long relaxation or correlation times. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems, such as phase transitions and the dynamics of macromolecules in explicit solvent.
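
    For reference, conventional PT accepts a swap between neighbouring copies with the Metropolis criterion p = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]); the PCST advantage cited above is precisely that its exchange rate does not degrade with total potential energy the way this test does for large systems. A generic sketch of the conventional acceptance step (illustrative, not the authors' implementation):

        #include <cmath>
        #include <cstdio>
        #include <random>

        // Conventional parallel-tempering swap test between copies i and j
        // with inverse temperatures beta_i > beta_j and energies E_i, E_j.
        bool accept_swap(double beta_i, double beta_j, double E_i, double E_j,
                         std::mt19937& rng) {
            double delta = (beta_i - beta_j) * (E_i - E_j);
            if (delta >= 0.0) return true;               // always accept downhill
            std::uniform_real_distribution<double> u(0.0, 1.0);
            return u(rng) < std::exp(delta);
        }

        int main() {
            std::mt19937 rng(42);
            int accepted = 0;
            const int trials = 100000;
            for (int k = 0; k < trials; ++k)
                accepted += accept_swap(1.0, 0.8, -105.0 + k % 10, -100.0, rng);
            std::printf("acceptance rate: %.3f\n", double(accepted) / trials);
        }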

  17. A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows

    NASA Technical Reports Server (NTRS)

    Bui, Trong T.

    1999-01-01

    A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
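
    The 3-5 percent finding can be stated compactly by scaling the dissipation term of the Roe flux; in a common notation (ours, not the paper's):

        \mathbf{F}_{i+1/2} =
          \tfrac{1}{2}\left(\mathbf{F}_L + \mathbf{F}_R\right)
          - \tfrac{\epsilon}{2}\,\bigl|\tilde{A}\bigr|\,\left(\mathbf{U}_R - \mathbf{U}_L\right),
        \qquad \epsilon \approx 0.03\text{--}0.05,

    where |Ã| is the Roe-averaged flux Jacobian; epsilon = 1 recovers the standard scheme, while the study's result corresponds to retaining only a few percent of that dissipation.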

  18. Investigation of reflective notching with massively parallel simulation

    NASA Astrophysics Data System (ADS)

    Tadros, Karim H.; Neureuther, Andrew R.; Gamelin, John K.; Guerrieri, Roberto

    1990-06-01

    A massively parallel simulation program, TEMPEST, is used to investigate the role of topography in generating reflective notching and to study the possibility of reducing its effects through the introduction of special properties of resists and antireflection coating materials. The emphasis is on examining physical scattering mechanisms such as focused specular reflections, resist thickness interference effects, reflections from substrate grains, and focusing of incident light by the resist curvature. Specular reflection from topography can focus incident radiation, causing a 10-fold increase in effective exposure. Further complications, such as dimples in the surface of positive resist features, can result from a second reflection of focused energy by the resist/air interface. Variations in line-edge exposure due to substrate grain structure are primarily specular in nature and can become significant for grains larger than the wavelength in the resist. Local exposure variations due to vertical standing waves, and changes in energy coupling due to changes in resist thickness, are displaced laterally and are significant effects, even though they are slightly less severe than vertical wave propagation theory suggests. Focusing effects due to refraction by the curved surface of the resist produce only minor changes in exposure. Increased resist contrast and resist absorption offer some improvement in reducing notching effects, though minimizing substrate reflectivity is more effective. CPU time using 32 virtual nodes to simulate a 4 μm by 2 μm isolated domain with 13 bleaching steps was 30 minutes.

  19. Information processing in parallel through directionally resolved molecular polarization components in coherent multidimensional spectroscopy

    NASA Astrophysics Data System (ADS)

    Yan, Tian-Min; Fresch, Barbara; Levine, R. D.; Remacle, F.

    2015-08-01

    We propose that information processing can be implemented by measuring the directional components of the macroscopic polarization of an ensemble of molecules subject to a sequence of laser pulses. We describe the logic operation theoretically and demonstrate it by simulations. The measurement of integrated stimulated emission in different phase-matching spatial directions provides a logic decomposition of a function that is the discrete analog of an integral transform. The logic operation is reversible and all the possible outputs are computed in parallel for all sets of possible multivalued inputs. The number of logic variables of the function is the number of laser pulses used in sequence. The logic function that is computed depends on the chosen chromophoric molecular complex, on its interactions with the solvent, on the two time intervals between the three pulses, and on the pulse strengths and polarizations. The outputs are the homodyne-detected values of the polarization components that are measured in the allowed phase-matching macroscopic directions, k_l = Σ_i l_i k_i, where k_i is the propagation direction of the ith pulse and {l_i} is a set of integers that encodes the multivalued inputs. Parallelism is inherently implemented because all the partial polarizations that define the outputs are processed simultaneously. The outputs, which are read directly on the macroscopic level, can be multivalued because the high dynamic range of partial polarization measurements by nonlinear coherent spectroscopy allows for fine binning of the signals. The outputs are uniquely related to the inputs so that the logic is reversible.

  20. Accelerating patch-based directional wavelets with multicore parallel computing in compressed sensing MRI.

    PubMed

    Li, Qiyue; Qu, Xiaobo; Liu, Yunsong; Guo, Di; Lai, Zongying; Ye, Jing; Chen, Zhong

    2015-06-01

    Compressed sensing MRI (CS-MRI) is a promising technology to accelerate magnetic resonance imaging. Both improving image quality and reducing computation time are important for this technology. Recently, a patch-based directional wavelet (PBDW) has been applied in CS-MRI to improve edge reconstruction. However, this method is time-consuming since it involves extensive computations, including geometric direction estimation and numerous iterations of the wavelet transform. To accelerate the computations of PBDW, we propose a general parallelization of patch-based processing that takes advantage of multicore processors. Additionally, two pertinent optimizations, excluding smooth patches and pre-arranged insertion sort, that make use of sparsity in MR images are also proposed. Simulation results demonstrate that the acceleration factor with the parallel architecture of PBDW approaches the number of central processing unit cores, and that the pertinent optimizations are also effective in providing further acceleration. The proposed approaches allow compressed sensing MRI reconstruction to be accomplished within several seconds. PMID:25620521
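
    The two optimizations compose naturally with the patch-level parallelization: each thread applies a cheap smoothness test before doing the expensive directional-wavelet work, and dynamic scheduling absorbs the resulting uneven per-patch cost. A schematic C++/OpenMP sketch (thresholds and function names are placeholders, not from the paper):

        #include <numeric>
        #include <vector>

        struct Patch { std::vector<double> pix; };

        // Cheap smoothness test: low intensity variance means the patch
        // carries no edges, so the directional-wavelet step can be skipped.
        bool is_smooth(const Patch& p, double tol) {
            double mean = std::accumulate(p.pix.begin(), p.pix.end(), 0.0)
                          / p.pix.size();
            double var = 0.0;
            for (double v : p.pix) var += (v - mean) * (v - mean);
            return var / p.pix.size() < tol;
        }

        void directional_wavelet(Patch&) { /* expensive per-patch transform */ }

        void process_patches(std::vector<Patch>& patches, double tol) {
            // Patches are independent, so the loop parallelizes directly;
            // dynamic scheduling absorbs the uneven cost of skipped patches.
            #pragma omp parallel for schedule(dynamic)
            for (long long i = 0; i < static_cast<long long>(patches.size()); ++i)
                if (!is_smooth(patches[i], tol))
                    directional_wavelet(patches[i]);
        }

        int main() {
            std::vector<Patch> patches(1024, Patch{std::vector<double>(64, 0.0)});
            process_patches(patches, 1e-6);
        }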

  1. Numerical simulations of acoustics problems using the direct simulation Monte Carlo method

    NASA Astrophysics Data System (ADS)

    Hanford, Amanda Danforth

    In the current study, real-gas effects in the propagation of sound waves are simulated using the direct simulation Monte Carlo method for a wide range of systems. This particle method allows for the treatment of acoustic phenomena over a wide range of Knudsen numbers, defined as the ratio of molecular mean free path to wavelength. Continuum models such as the Euler and Navier-Stokes equations break down for flows with Knudsen numbers greater than approximately 0.05. Continuum models also suffer from the inability to simultaneously model nonequilibrium conditions, diatomic or polyatomic molecules, nonlinearity, and relaxation effects, and are limited in their range of validity. Therefore, direct simulation Monte Carlo is capable of directly simulating acoustic waves with a level of detail not possible with continuum approaches. The basis of direct simulation Monte Carlo lies within kinetic theory, where representative particles are followed as they move and collide with other particles. A parallel, object-oriented DSMC solver was developed for this problem. Despite excellent parallel efficiency, computation time is considerable. Monatomic gases, gases with internal energy, planetary environments, and amplitude effects spanning a large range of Knudsen numbers have all been modeled with the same method and compared to existing theory. With the direct simulation method, significant deviations from continuum predictions are observed for high-Knudsen-number flows.

  2. Simulations of structural and dynamic anisotropy in nano-confined water between parallel graphite plates.

    PubMed

    Mosaddeghi, Hamid; Alavi, Saman; Kowsari, M H; Najafi, Bijan

    2012-11-14

    We use molecular dynamics simulations to study the structure, dynamics, and transport properties of nano-confined water between parallel graphite plates with separation distances (H) from 7 to 20 Å at different water densities, with an emphasis on anisotropies generated by confinement. The behavior of the confined water phase is compared to non-confined bulk water under similar pressure and temperature conditions. Our simulations show anisotropic structure and dynamics of the confined water phase in directions parallel and perpendicular to the graphite plates. The magnitude of these anisotropies depends on the slit width H. Confined water shows "solid-like" structure and slow dynamics for the water layers near the plates. The mean square displacements (MSDs) and velocity autocorrelation functions (VACFs) for directions parallel and perpendicular to the graphite plates are calculated. On increasing the confinement distance from H = 7 Å to H = 20 Å, the MSD increases and the behavior of the VACF indicates that the confined water changes from solid-like to liquid-like dynamics. If the initial density of the water phase is set up using geometric criteria (i.e., the distance between the graphite plates), large pressures (on the order of ~10 katm) and large pressure anisotropies are established within the water. On decreasing the density of the water between the confined plates to about 0.9 g cm⁻³, bubble formation and restructuring of the water layers are observed. PMID:23163385

  3. Xyce Parallel Electronic Simulator Reference Guide Version 6.4

    SciTech Connect

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory

    2015-12-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list exhaustively, to the extent possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1]. Trademarks The information herein is subject to change without notice. Copyright © 2002-2015 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5, developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)

  4. Parallel implementation of the particle simulation method with dynamic load balancing: Toward realistic geodynamical simulation

    NASA Astrophysics Data System (ADS)

    Furuichi, M.; Nishiura, D.

    2015-12-01

    Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and the Discrete Element Method (DEM) have been widely used to solve continuum and particle motions in the computational geodynamics field. These mesh-free methods are suitable for problems with complex geometries and boundaries. In addition, their Lagrangian nature allows non-diffusive advection, useful for tracking history-dependent properties (e.g. rheology) of the material. These potential advantages over mesh-based methods offer effective numerical applications to geophysical flow and tectonic processes, for example, tsunamis with free surfaces and floating bodies, magma intrusion with rock fracture, and shear-zone pattern formation in granular deformation. To investigate such geodynamical problems with particle-based methods, millions to billions of particles are required for realistic simulations. Parallel computing is therefore important for handling such a huge computational cost. An efficient parallel implementation of the SPH and DEM methods is, however, known to be difficult, especially for distributed-memory architectures. Lagrangian methods inherently exhibit a workload-imbalance problem when parallelized with domains fixed in space, because particles move around and workloads change during the simulation. Dynamic load balancing is therefore a key technique for performing large-scale SPH and DEM simulations. In this work, we present a parallel implementation technique for the SPH and DEM methods utilizing dynamic load-balancing algorithms, aimed at high-resolution simulations over large domains on massively parallel supercomputer systems. Our method treats the imbalance in execution time among MPI processes as the nonlinear term of the parallel domain decomposition and minimizes it with a Newton-like iteration method. In order to perform flexible domain decomposition in space, the slice-grid algorithm is used. Numerical tests show that our
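
    The slice-grid idea with time-based feedback can be illustrated in one dimension: each interior slice boundary moves so as to shrink the slice that took longer, with the measured times driving the iteration. The toy sketch below uses a simple proportional update rather than the authors' Newton-like iteration; all constants are illustrative.

        #include <cstdio>
        #include <vector>

        // One balancing step for a 1D slice grid: each interior boundary
        // moves to shrink the slice that took longer, in proportion to the
        // relative time imbalance between its two neighbouring slices.
        void rebalance(std::vector<double>& bounds, const std::vector<double>& t,
                       double gain) {
            for (std::size_t b = 1; b + 1 < bounds.size(); ++b) {
                double width_l = bounds[b] - bounds[b - 1];
                double imbalance = (t[b - 1] - t[b]) / (t[b - 1] + t[b]);
                bounds[b] -= gain * imbalance * width_l;  // shrink the slower side
            }
        }

        int main() {
            std::vector<double> bounds = {0.0, 0.5, 1.0};  // two slices, unit domain
            const double density[2] = {3.0, 1.0};          // work per unit length
            for (int it = 0; it < 50; ++it) {
                // "Measured" times: proportional to work contained in each slice.
                std::vector<double> t = {density[0] * (bounds[1] - bounds[0]),
                                         density[1] * (bounds[2] - bounds[1])};
                rebalance(bounds, t, 0.5);
            }
            std::printf("boundary converged to x = %.3f (expect 0.25)\n", bounds[1]);
        }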

  5. Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms

    SciTech Connect

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2013-01-01

    With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware, as has traditionally been done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the poor scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature, in the sense that any fairness-based VM scheduler implementation would exhibit this mismatch with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in the run time of the PDES-optimized scheduler relative to the regular VM scheduler, with more than a 20-fold reduction in the run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high-performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.

  6. Direct simulation of compressible reacting flows

    NASA Technical Reports Server (NTRS)

    Poinsot, Thierry J.

    1989-01-01

    A research program for direct numerical simulation of compressible reacting flows is described. Two main research subjects are proposed: the effect of pressure waves on turbulent combustion, and the use of direct simulation methods to validate flamelet models for turbulent combustion. The value of a compressible code for studying turbulent combustion is emphasized through examples of reacting shear layer and combustion instability studies. The choice of experimental data against which to compare direct simulation results is discussed. A tentative program is given, and the computation cases to be used are described, as well as the code validation runs.

  7. Rasterizing geological models for parallel finite difference simulation using seismic simulation as an example

    NASA Astrophysics Data System (ADS)

    Zehner, Björn; Hellwig, Olaf; Linke, Maik; Görz, Ines; Buske, Stefan

    2016-01-01

    3D geological underground models are often represented by vector data, such as triangulated networks representing the boundaries of geological bodies and geological structures. Since the models are to be used for numerical simulations based on the finite difference method, they have to be converted into a representation discretizing the full volume of the model into hexahedral cells. Often the simulations require a high grid resolution and are done using parallel computing. The storage of such a high-resolution raster model would require a large amount of storage space, and it is difficult to create such a model using standard geomodelling packages. Since the raster representation is only required for the calculation, but not for the geometry description, we present an algorithm and concept for rasterizing geological models on the fly for use in finite difference codes that are parallelized by domain decomposition. As a proof of concept, we implemented a rasterizer library and integrated it into seismic simulation software that is run as parallel code on a UNIX cluster using the Message Passing Interface. We can thus run the simulation with realistic and complicated surface-based geological models that are created using 3D geomodelling software, instead of using a simplified representation of the geological subsurface based on mathematical functions or geometric primitives. We tested this set-up using an example model that we provide along with the implemented library.

  8. On the suitability of the connection machine for direct particle simulation

    NASA Technical Reports Server (NTRS)

    Dagum, Leonard

    1990-01-01

    The algorithmic structure of the vectorizable Stanford particle simulation (SPS) method is examined, and the structure is reformulated in data-parallel form. Some of the SPS algorithms can be directly translated to data-parallel form, but several of the vectorizable algorithms have no direct data-parallel equivalent. This requires the development of new, strictly data-parallel algorithms. In particular, a new sorting algorithm is developed to identify collision candidates in the simulation, and a master/slave algorithm is developed to minimize communication cost in large table look-ups. Validation of the method is undertaken through test calculations for the thermal relaxation of a gas, shock wave profiles, and shock reflection from a stationary wall. A qualitative measure of the performance of the Connection Machine for direct particle simulation is provided. The massively parallel architecture of the Connection Machine is found quite suitable for this type of calculation. However, there are difficulties in taking full advantage of this architecture because of the lack of a broad-based tradition of data-parallel programming. An important outcome of this work has been new data-parallel algorithms specifically of use for direct particle simulation but which also expand the data-parallel diction.
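
    The role of the sorting algorithm, grouping particles by cell so that collision candidates become contiguous, can be illustrated with a generic sort-by-key step (a serial stand-in for the data-parallel version used on the Connection Machine):

        #include <algorithm>
        #include <cstdio>
        #include <vector>

        struct Particle { int cell; double vx; };

        int main() {
            std::vector<Particle> p = {{2, 0.1}, {0, -0.3}, {2, 0.7},
                                       {1, 0.2}, {0, 0.5}};

            // Sort by cell index: afterwards, all collision candidates
            // (particles sharing a cell) occupy a contiguous run.
            std::sort(p.begin(), p.end(),
                      [](const Particle& a, const Particle& b) {
                          return a.cell < b.cell;
                      });

            // Walk the runs; each run is one cell's candidate set.
            for (std::size_t i = 0; i < p.size();) {
                std::size_t j = i;
                while (j < p.size() && p[j].cell == p[i].cell) ++j;
                std::printf("cell %d: %zu candidate(s)\n", p[i].cell, j - i);
                i = j;
            }
        }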

  9. Lattice Boltzmann modeling of directional wetting: comparing simulations to experiments.

    PubMed

    Jansen, H Patrick; Sotthewes, Kai; van Swigchem, Jeroen; Zandvliet, Harold J W; Kooij, E Stefan

    2013-07-01

    Lattice Boltzmann modeling (LBM) simulations were performed on the dynamic behavior of liquid droplets on chemically striped patterned surfaces, ultimately with the aim of developing a predictive tool enabling reliable design of future experiments. The simulations accurately mimic experimental results, which have shown that water droplets on such surfaces adopt an elongated shape due to anisotropic preferential spreading. Details of the contact line motion, such as advancing of the contact line in the direction perpendicular to the stripes, exhibit pronounced similarities in experiments and simulations. The opposite of spreading, i.e., evaporation of water droplets, leads to a characteristic receding motion, first in the direction parallel to the stripes, while the contact line remains pinned perpendicular to the stripes. Only when the aspect ratio is close to unity does the contact line also start to recede in the perpendicular direction. Very similar behavior was observed in the LBM simulations. Finally, droplet movement can be induced by a gradient in surface wettability. LBM simulations show good semiquantitative agreement with experimental results for decanol droplets on a well-defined striped gradient, which move from high- to low-contact-angle surfaces. Similarities and differences for all systems are described and discussed in terms of the predictive capabilities of LBM simulations to model directional wetting. PMID:23944550

  10. Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    NASA Technical Reports Server (NTRS)

    Hsieh, Shang-Hsien

    1993-01-01

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

  11. Fencing network direct memory access data transfers in a parallel active messaging interface of a parallel computer

    DOEpatents

    Blocksome, Michael A.; Mamidala, Amith R.

    2015-07-07

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.

  12. Fencing network direct memory access data transfers in a parallel active messaging interface of a parallel computer

    DOEpatents

    Blocksome, Michael A.; Mamidala, Amith R.

    2015-07-14

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.

  13. A natural partitioning scheme for parallel simulation of multibody systems

    NASA Technical Reports Server (NTRS)

    Chiou, J. C.; Park, K. C.; Farhat, C.

    1993-01-01

    A parallel partitioning scheme based on physical-coordinate variables is presented to systematically eliminate system constraint forces and yield the equations of motion of multibody dynamics systems in terms of their independent coordinates. Key features of the present scheme include an explicit determination of the independent coordinates, a parallel construction of the null space matrix of the constraint Jacobian matrix, an easy incorporation of the previously developed two-stage staggered solution procedure, and a Schur complement-based parallel preconditioned conjugate gradient numerical algorithm.

  14. A natural partitioning scheme for parallel simulation of multibody systems

    NASA Technical Reports Server (NTRS)

    Chiou, J. C.; Park, K. C.; Farhat, C.

    1991-01-01

    A parallel partitioning scheme based on physical-coordinate variables is presented to systematically eliminate system constraint forces and yield the equations of motion of multibody dynamics systems in terms of their independent coordinates. Key features of the present scheme include an explicit determination of the independent coordinates, a parallel construction of the null space matrix of the constraint Jacobian matrix, an easy incorporation of the previously developed two-stage staggered solution procedure, and a Schur complement-based parallel preconditioned conjugate gradient numerical algorithm.

  15. A two-level parallel direct search implementation for arbitrarily sized objective functions

    SciTech Connect

    Hutchinson, S.A.; Shadid, N.; Moffat, H.K.

    1994-12-31

    In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p_2) of processors. If p, the number of processors available, is greater than or equal to 2p_2, then the optimization may be parallelized as well. This allows for efficient use of computational resources, since the objective function calculations can be performed on the number of processors that allows for peak parallel efficiency, and further speedup may then be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for an nCUBE 2 implementation.

  16. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit

    PubMed Central

    Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik

    2013-01-01

    Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358

  17. Characterization of parallel-hole collimator using Monte Carlo Simulation

    PubMed Central

    Pandey, Anil Kumar; Sharma, Sanjay Kumar; Karunanithi, Sellam; Kumar, Praveen; Bal, Chandrasekhar; Kumar, Rakesh

    2015-01-01

    Objective: The accuracy of in vivo activity quantification improves after the correction of penetrated and scattered photons. However, accurate assessment is not possible with physical experiments. We have used Monte Carlo simulation to accurately assess the contribution of penetrated and scattered photons in the photopeak window. Materials and Methods: Simulations were performed with the Simulation of Imaging Nuclear Detectors Monte Carlo code. The simulations were set up in such a way that they provide the geometric, penetration, and scatter components after each simulation and write binary images to a data file. These components were analyzed graphically using Microsoft Excel (Microsoft Corporation, USA). Each binary image was imported into software (ImageJ) and a logarithmic transformation was applied for visual assessment of image quality, plotting a profile across the center of the images, and calculating the full width at half maximum (FWHM) in the horizontal and vertical directions. Results: The geometric, penetration, and scatter components at 140 keV for the low-energy general-purpose (LEGP) collimator were 93.20%, 4.13%, and 2.67%, respectively. Similarly, the geometric, penetration, and scatter components at 140 keV for the low-energy high-resolution (LEHR), medium-energy general-purpose (MEGP), and high-energy general-purpose (HEGP) collimators were (94.06%, 3.39%, 2.55%), (96.42%, 1.52%, 2.06%), and (96.70%, 1.45%, 1.85%), respectively. For the MEGP collimator at 245 keV and the HEGP collimator at 364 keV, the components were 89.10%, 7.08%, 3.82% and 67.78%, 18.63%, 13.59%, respectively. Conclusion: The LEGP and LEHR collimators are best suited to image 140 keV photons. The HEGP collimator can be used for 245 keV and 364 keV; however, corrections for penetration and scatter must be applied if one wishes to quantify in vivo activity at 364 keV. Due to heavy penetration and scattering, 511 keV photons should not be imaged with the HEGP collimator. PMID:25829730

  18. Short-term gas dispersion in idealised urban canopy in street parallel with flow direction

    NASA Astrophysics Data System (ADS)

    Chaloupecká, Hana; Jaňour, Zbyněk; Nosek, Štěpán

    2016-03-01

    Chemical attacks (e.g. Syria 2014-15, chlorine; 2013, sarin; Iraq 2006-7, chlorine) as well as chemical plant disasters (e.g. Spain 2015, nitric oxide and ferric chloride; Texas 2014, methyl mercaptan) threaten mankind. In these crisis situations, gas clouds are released. The dispersion of such gas clouds is the issue of interest investigated in this paper. The paper describes wind tunnel experiments on dispersion from a ground-level point gas source. The source is situated in a model of an idealised urban canopy. Short-duration releases of the passive contaminant ethane are created by an electromagnetic valve. The gas cloud concentrations are measured at individual places at the height of the human breathing zone, within a street parallel with the flow direction, by a fast-response ionisation detector. The simulations of the gas release for each measurement position are repeated many times under the same experimental set-up to obtain representative datasets. These datasets are analysed to compute puff characteristics (arrival time, leaving time and duration). The results indicate that the mean value of the dimensionless arrival time can be described as an increasing linear function of the dimensionless coordinate along the street parallel with the flow direction in which the gas source is situated. The same might be stated about the dimensionless leaving time as well as the dimensionless duration; however, these fits are worse. Utilising a linear function, we might also estimate statistical characteristics of the datasets other than the means (medians, trimeans). The datasets of the dimensionless arrival time, the dimensionless leaving time and the dimensionless duration can be fitted by the generalized extreme value distribution (GEV) in all sampling positions except one.

  19. Scalable Parallel Formulations of the Barnes-Hut Method for n-Body Simulations

    NASA Astrophysics Data System (ADS)

    Grama, Ananth Y.; Kumar, Vipin; Sameh, Ahmed

    In this paper, we present two new parallel formulations of the Barnes-Hut method. These parallel formulations are especially suited for simulations with irregular particle densities. We first present a parallel formulation that uses a static partitioning of the domain and assignment of subdomains to processors. We demonstrate that this scheme delivers acceptable load balance, and, coupled with two collective communication operations, it yields good performance. We present a second parallel formulation which combines static decomposition of the domain with an assignment of subdomains to processors based on Morton ordering. This alleviates the load imbalance inherent in the first scheme. The second parallel formulation is inspired by the two currently best-known parallel algorithms for the Barnes-Hut method. We present an experimental evaluation of these schemes on a 256-processor nCUBE2 parallel computer for an astrophysical simulation.
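
    Morton ordering assigns each subdomain a key by interleaving the bits of its integer coordinates; sorting by key yields a space-filling curve, so contiguous key ranges form spatially compact processor assignments. A standard 2D bit-interleaving sketch (illustrative):

        #include <cstdint>
        #include <cstdio>

        // Interleave the low 16 bits of x and y into a 32-bit Morton key:
        // bit i of x goes to bit 2i, bit i of y to bit 2i+1.
        std::uint32_t morton2d(std::uint16_t x, std::uint16_t y) {
            std::uint32_t key = 0;
            for (int i = 0; i < 16; ++i) {
                key |= (std::uint32_t(x >> i) & 1u) << (2 * i);
                key |= (std::uint32_t(y >> i) & 1u) << (2 * i + 1);
            }
            return key;
        }

        int main() {
            // Cells that are close in space get nearby keys, so contiguous
            // key ranges make spatially compact work assignments.
            for (std::uint16_t y = 0; y < 4; ++y)
                for (std::uint16_t x = 0; x < 4; ++x)
                    std::printf("(%u,%u) -> %u\n", x, y, morton2d(x, y));
        }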

  20. Parallel climate model (PCM) control and transient simulations

    NASA Astrophysics Data System (ADS)

    Washington, W. M.; Weatherly, J. W.; Meehl, G. A.; Semtner, A. J., Jr.; Bettge, T. W.; Craig, A. P.; Strand, W. G., Jr.; Arblaster, J.; Wayland, V. B.; James, R.; Zhang, Y.

    The Department of Energy (DOE)-supported Parallel Climate Model (PCM) makes use of the NCAR Community Climate Model (CCM3) and Land Surface Model (LSM) for the atmospheric and land surface components, respectively, the DOE Los Alamos National Laboratory Parallel Ocean Program (POP) for the ocean component, and the Naval Postgraduate School sea-ice model. The PCM executes on several distributed- and shared-memory computer systems. The coupling method is similar to that used in the NCAR Climate System Model (CSM) in that a flux coupler ties the components together, with interpolations between the different grids of the component models. Flux adjustments are not used in the PCM. The ocean component has 2/3° average horizontal grid spacing with 32 vertical levels and a free surface that allows calculation of sea level changes. Near the equator, the grid spacing is approximately 1/2° in latitude to better capture the ocean equatorial dynamics. The North Pole is rotated over northern North America, thus producing resolution smaller than 2/3° in the North Atlantic, where the sinking part of the world conveyor circulation largely takes place. Because this ocean model component does not have a computational point at the North Pole, the Arctic Ocean circulation systems are more realistic and similar to those observed. The elastic-viscous-plastic sea ice model has a grid spacing of 27 km to represent small-scale features such as ice transport through the Canadian Archipelago and the East Greenland Current region. Results from a 300-year present-day coupled climate control simulation are presented, as well as for a transient 1% per year compound CO2 increase experiment, which shows a global warming of 1.27°C for a 10-year average at the doubling point of CO2 and 2.89°C at the quadrupling point. There is a gradual warming beyond the doubling and quadrupling points with CO2 held constant. Globally averaged sea level rise at the time of CO2 doubling is approximately 7 cm and at the

  1. Implementation and efficiency analysis of parallel computation using OpenACC: a case study using flow field simulations

    NASA Astrophysics Data System (ADS)

    Zhang, Shanghong; Yuan, Rui; Wu, Yu; Yi, Yujun

    2016-01-01

    The Open Accelerator (OpenACC) application programming interface is a relatively new parallel computing standard. In this paper, particle-based flow field simulations are examined as a case study of OpenACC parallel computation. The parallel conversion process of the OpenACC standard is explained, and further, the performance of the flow field parallel model is analysed using different directive configurations and grid schemes. With careful implementation and optimisation of the data transportation in the parallel algorithm, a speedup factor of 18.26× is possible. In contrast, a speedup factor of just 11.77× was achieved with the conventional Open Multi-Processing (OpenMP) parallel mode on a 20-kernel computer. These results demonstrate that optimised feature settings greatly influence the degree of speedup, and models involving larger numbers of calculations exhibit greater efficiency and higher speedup factors. In addition, the OpenACC parallel mode is found to have good portability, making it easy to implement parallel computation from the original serial model.
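
    OpenACC expresses this kind of parallelism by annotating ordinary loops; the data clauses that keep arrays resident on the accelerator are exactly the kind of data-transport optimisation credited with the speedups above. A minimal sketch of the idiom (the loop body is illustrative, not the flow-field model itself):

        #include <cstdio>
        #include <vector>

        int main() {
            const int n = 1 << 20;
            std::vector<float> x(n, 1.0f), v(n, 1.0f);
            float* px = x.data();
            float* pv = v.data();
            const float dt = 0.01f;

            // The data region keeps both arrays resident on the accelerator
            // across all time steps, avoiding per-iteration host transfers.
            #pragma acc data copy(px[0:n]) copyin(pv[0:n])
            for (int step = 0; step < 100; ++step) {
                #pragma acc parallel loop
                for (int i = 0; i < n; ++i)
                    px[i] += pv[i] * dt;       // particle position update
            }

            std::printf("x[0] = %f\n", px[0]);
        }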

  2. Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

    SciTech Connect

    Jin, Shuangshuang; Chen, Yousu; Wu, Di; Diao, Ruisheng; Huang, Zhenyu

    2015-12-09

    Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch-switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor-based dynamic simulation solution. High-performance computing (HPC)-based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation, using Open Multi-Processing (OpenMP) on a shared-memory platform and the Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performance in running parallel dynamic simulation is compared and demonstrated.

  3. Direct Simulation Monte Carlo: Recent Advances and Applications

    NASA Astrophysics Data System (ADS)

    Oran, E. S.; Oh, C. K.; Cybyk, B. Z.

    The principles of and procedures for implementing direct simulation Monte Carlo (DSMC) are described. Guidelines to inherent and external errors common in DSMC applications are provided. Three applications of DSMC to transitional and nonequilibrium flows are considered: rarefied atmospheric flows, growth of thin films, and microsystems. Selected new, potentially important advances in DSMC capabilities are described: Lagrangian DSMC, optimization on parallel computers, and hybrid algorithms for computations in mixed flow regimes. Finally, the limitations of current computer technology for using DSMC to compute low-speed, high-Knudsen-number flows are outlined as future challenges.

  4. Efficient solid state NMR powder simulations using SMP and MPP parallel computation.

    PubMed

    Kristensen, Jørgen Holm; Farnan, Ian

    2003-04-01

    Methods for parallel simulation of solid state NMR powder spectra are presented for both shared and distributed memory parallel supercomputers. For shared memory architectures the performance of simulation programs implementing the OpenMP application programming interface is evaluated. It is demonstrated that the design of correct and efficient shared memory parallel programs is difficult as the performance depends on data locality and cache memory effects. The distributed memory parallel programming model is examined for simulation programs using the MPI message passing interface. The results reveal that both shared and distributed memory parallel computation are very efficient with an almost perfect application speedup and may be applied to the most advanced powder simulations. PMID:12713968

  5. Empirical development of parallelization guidelines for time-driven simulation. Master's thesis

    SciTech Connect

    Huson, M.L.

    1989-12-01

    Distributed simulation is an area of research which offers great promise for speeding up simulations. Program parallelization is usually an iterative process requiring several attempts to produce an efficient parallel implementation of a sequential program. This is due to the lack of any standards or guidelines for program parallelization. In this research effort a Ballistic Missile Defense (BMD) time-driven simulation program, developed by DESE Research and Engineering, was used as a test vehicle for investigating parallelization options for distributed and shared memory architectures. Implementations were developed to address issues of functional versus data program decomposition, computation versus communications overhead, and shared versus distributed memory architectures. Performance data collected from each implementation was used to develop guidelines for implementing parallel versions of sequential time-driven simulations. These guidelines were based on the relative performance of the various implementations and on general observations made during the course of the research.

  6. A sweep algorithm for massively parallel simulation of circuit-switched networks

    NASA Technical Reports Server (NTRS)

    Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.

    1992-01-01

    A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384-processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel iPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.

  7. High-throughput mass-directed parallel purification incorporating a multiplexed single quadrupole mass spectrometer.

    PubMed

    Xu, Rongda; Wang, Tao; Isbell, John; Cai, Zhe; Sykes, Christopher; Brailsford, Andrew; Kassel, Daniel B

    2002-07-01

    We report on the development of a parallel HPLC/MS purification system incorporating an indexed (i.e., multiplexed) ion source. In the method described, each of the flow streams from a parallel array of HPLC columns is directed toward the multiplexed (MUX) ion source and sampled in a time-dependent, parallel manner. A Visual Basic application has been developed that monitors, in real time, the extracted ion current from each sprayer channel. Mass-directed fraction collection is initiated into a parallel array of fraction collectors specific to each of the spray channels. In the first embodiment of this technique, we report on a four-column semipreparative parallel LC/MS system incorporating MUX detection. In this parallel LC/MS application (in which sample loads between 1 and 10 mg on-column are typically made), no cross talk was observed. Ion signals from each of the channels were found to be reproducible over 192 injections, with interchannel signal variations between 11 and 17%. The Visual Basic fraction collection application permits presetting individual start- and end-collection thresholds for each channel, thereby compensating for the slight variation in signal between sprayers. By incorporating post-fraction-collector UV detection, we have been able to optimize the valve-triggering delay time with precut transfer tubing between the mass spectrometer and fraction collectors and achieve recoveries greater than 80%. Examples of the MUX-guided, mass-directed purification of both standards and real library reaction mixtures are presented within. PMID:12141664

  8. ANNarchy: a code generation approach to neural simulations on parallel hardware

    PubMed Central

    Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows one to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  9. Parallel implementation of VHDL simulations on the Intel iPSC/2 hypercube. Master's thesis

    SciTech Connect

    Comeau, R.C.

    1991-12-01

    VHDL models are executed sequentially in current commercial simulators. As chip designs grow larger and more complex, simulations must run faster. One approach to increasing simulation speed is through parallel processors. This research transforms the behavioral and structural models created by Intermetrics' sequential VHDL simulator into models for parallel execution. The models are simulated on an Intel iPSC/2 hypercube, with synchronization of the nodes achieved by utilizing the Chandy-Misra paradigm for discrete-event simulations. Three eight-bit adders, the ripple carry, the carry save, and the carry-lookahead, are each run through the parallel simulator. Simulation time is cut at least in half for all three test cases relative to the sequential Intermetrics model. Results with regard to speedup are given to show the effects of different mappings, varying workloads per node, and overhead due to output messages.

  10. ANNarchy: a code generation approach to neural simulations on parallel hardware.

    PubMed

    Vitay, Julien; Dinkelbach, Helge Ü; Hamker, Fred H

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows one to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  11. Towards parallel I/O in finite element simulations

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel; Pramono, Eddy; Felippa, Carlos

    1989-01-01

    I/O issues in finite element analysis on parallel processors are addressed. Viable solutions for both local and shared memory multiprocessors are presented. The approach is simple but limited by currently available hardware and software systems. Implementation is carried out on a CRAY-2 system. Performance results are reported.

  12. Large-scale modeling of epileptic seizures: scaling properties of two parallel neuronal network simulation algorithms.

    PubMed

    Pesce, Lorenzo L; Lee, Hyong C; Hereld, Mark; Visser, Sid; Stevens, Rick L; Wildeman, Albert; van Drongelen, Wim

    2013-01-01

    Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers. PMID:24416069

  13. Large-Scale Modeling of Epileptic Seizures: Scaling Properties of Two Parallel Neuronal Network Simulation Algorithms

    DOE PAGESBeta

    Pesce, Lorenzo L.; Lee, Hyong C.; Hereld, Mark; Visser, Sid; Stevens, Rick L.; Wildeman, Albert; van Drongelen, Wim

    2013-01-01

    Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.

  14. An evaluation of parallelization strategies for low-frequency electromagnetic induction simulators using staggered grid discretizations

    NASA Astrophysics Data System (ADS)

    Weiss, C. J.; Schultz, A.

    2011-12-01

    The high computational cost of the forward solution for modeling low-frequency electromagnetic induction phenomena is one of the primary impediments against broad-scale adoption by the geoscience community of exploration techniques, such as magnetotellurics and geomagnetic depth sounding, that rely on fast and cheap forward solutions to make tractable the inverse problem. As geophysical observables, electromagnetic fields are direct indicators of Earth's electrical conductivity - a physical property independent of (but in some cases correlative with) seismic wavespeed. Electrical conductivity is known to be a function of Earth's physiochemical state and temperature, and to be especially sensitive to the presence of fluids, melts and volatiles. Hence, electromagnetic methods offer a critical and independent constraint on our understanding of Earth's interior processes. Existing methods for parallelization of time-harmonic electromagnetic simulators, as applied to geophysics, have relied heavily on a combination of strategies: coarse-grained decompositions of the model domain, and/or a high-order functional decomposition across spectral components, which in turn can be domain-decomposed themselves. Hence, in terms of scaling, both approaches are ultimately limited by the growing communication cost as the granularity of the forward problem increases. In this presentation we examine alternate parallelization strategies based on OpenMP shared-memory parallelization and CUDA-based GPU parallelization. As a test case, we use two different numerical simulation packages, each based on a staggered Cartesian grid: FDM3D (Weiss, 2006), which solves the curl-curl equation directly in terms of the scattered electric field (available under the LGPL at www.openem.org); and APHID, the A-Phi Decomposition based on mixed vector and scalar potentials, in which the curl-curl operator is replaced operationally by the vector Laplacian. We describe progress made in modifying the code to

  15. Comparison of serial and parallel simulations of a corridor fire using FDS

    NASA Astrophysics Data System (ADS)

    Valasek, L.

    2015-09-01

    Current fire simulators allow modeling of the course of a fire in large areas and its impact on structures and equipment. This paper deals with a comparison of serial and parallel calculations of a simulation of a corridor fire by the FDS (Fire Dynamics Simulator) system. In the parallel case, the whole computational domain is divided into several computational meshes; the computation on each mesh is realised as a single MPI (Message Passing Interface) process on one computational core, with communication between these processes handled by MPI. The aim of this paper is to determine the size of the error caused by parallelization of the computation, which arises where the computational meshes touch.

  16. Robust large-scale parallel nonlinear solvers for simulations.

    SciTech Connect

    Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson

    2005-11-01

    This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using models other than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, the Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write
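
    For reference, the rank-one update at the heart of Broyden's method is compact enough to show in full. The sketch below applies it to a hypothetical 2x2 test system (not one of the report's applications), seeding the iteration with the analytic Jacobian at the starting point; compile with -lm.

        #include <math.h>
        #include <stdio.h>

        /* Hypothetical test system: a circle intersected with a line. */
        static void F(const double x[2], double f[2])
        {
            f[0] = x[0]*x[0] + x[1]*x[1] - 4.0;
            f[1] = x[0] - x[1];
        }

        static void solve2x2(double J[2][2], const double b[2], double s[2])
        {
            double det = J[0][0]*J[1][1] - J[0][1]*J[1][0];
            s[0] = ( J[1][1]*b[0] - J[0][1]*b[1]) / det;
            s[1] = (-J[1][0]*b[0] + J[0][0]*b[1]) / det;
        }

        int main(void)
        {
            double x[2] = {1.0, 0.5}, f[2];
            double J[2][2] = {{2.0, 1.0}, {1.0, -1.0}}; /* analytic Jacobian at x0 */
            F(x, f);
            for (int it = 0; it < 50 && hypot(f[0], f[1]) > 1e-12; ++it) {
                double mb[2] = {-f[0], -f[1]}, s[2], fn[2], y[2];
                solve2x2(J, mb, s);                     /* solve J s = -f */
                x[0] += s[0]; x[1] += s[1];
                F(x, fn);
                y[0] = fn[0] - f[0]; y[1] = fn[1] - f[1];
                /* Broyden rank-one update: J += ((y - J s) s^T) / (s^T s) */
                double Js0 = J[0][0]*s[0] + J[0][1]*s[1];
                double Js1 = J[1][0]*s[0] + J[1][1]*s[1];
                double ss  = s[0]*s[0] + s[1]*s[1];
                J[0][0] += (y[0] - Js0) * s[0] / ss;
                J[0][1] += (y[0] - Js0) * s[1] / ss;
                J[1][0] += (y[1] - Js1) * s[0] / ss;
                J[1][1] += (y[1] - Js1) * s[1] / ss;
                f[0] = fn[0]; f[1] = fn[1];
            }
            printf("root near (%.6f, %.6f)\n", x[0], x[1]); /* expect sqrt(2), sqrt(2) */
            return 0;
        }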

  17. An efficient parallel algorithm for O(N²) direct summation method and its variations on distributed-memory parallel machines

    NASA Astrophysics Data System (ADS)

    Makino, Junichiro

    2002-10-01

    We present a novel, highly efficient algorithm to parallelize the O(N²) direct summation method for N-body problems with individual timesteps on distributed-memory parallel machines such as Beowulf clusters. Previously known algorithms, in which all processors have complete copies of the N-body system, have the serious problem that the communication-computation ratio increases as we increase the number of processors, since the communication cost is independent of the number of processors. In the new algorithm, p processors are organized as a √p × √p two-dimensional array. Each processor has N/√p particles, but the data are distributed in such a way that the complete system is presented if we look at any row or column consisting of √p processors. In this algorithm, the communication cost scales as N/√p, while the calculation cost scales as N²/p. Thus, we can use a much larger number of processors without losing efficiency compared to what was practical with previously known algorithms.
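
    The serial kernel being distributed here is the standard O(N²) pairwise force sum; a hedged sketch follows (Plummer softening and the array layout are assumptions, and the √p × √p distribution logic itself is omitted). Compile with -lm.

        #include <math.h>

        /* Serial core of the O(N^2) force sum, in G = 1 units. */
        void accel_direct(int n, const double (*pos)[3], const double *mass,
                          double (*acc)[3], double eps2)
        {
            for (int i = 0; i < n; ++i) {
                acc[i][0] = acc[i][1] = acc[i][2] = 0.0;
                for (int j = 0; j < n; ++j) {
                    if (j == i) continue;
                    double dx = pos[j][0] - pos[i][0];
                    double dy = pos[j][1] - pos[i][1];
                    double dz = pos[j][2] - pos[i][2];
                    double r2 = dx*dx + dy*dy + dz*dz + eps2; /* softened */
                    double inv_r3 = 1.0 / (r2 * sqrt(r2));
                    acc[i][0] += mass[j] * dx * inv_r3;
                    acc[i][1] += mass[j] * dy * inv_r3;
                    acc[i][2] += mass[j] * dz * inv_r3;
                }
            }
        }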

  18. Direct Particle Simulations of Saturn's Rings

    NASA Astrophysics Data System (ADS)

    Griv, E.; Gedalin, M.; Yuan, C.

    2001-11-01

    An asymptotic theory of spiral density waves and their instabilities in the Saturnian ring system of mutually gravitating and colliding particles has been developed (Griv et al. 2000, Pl&SS, 48, 679-698). The theory is successfully applied to interpreting the observations of the fine-scale structure of Saturn's rings. Our aim is to confirm the validity of the theory by computer many-body (N-body) calculations. As is known, N-body models of Saturn's rings suffer from graininess due to the finite number of particles in direct simulations; the graininess manifests itself as numerical noise in the calculation. For the first time in planetary ring dynamics, we modify the existing direct simulation code. This change involves multiprocessing: with the multiple processors in supercomputers it is possible to reduce the running time by processing more than one group of particles at a time. Use of the 112-processor SGI Origin 2000 supercomputer has enabled us to make long runs using realistic numbers of particles in the direct simulation code and thus simulate phenomena not previously studied numerically. The study is being undertaken to predict the "hyperfine" structure of Saturn's ring system, of the order of 100 m or even less, prior to its exploration by the Cassini spacecraft beginning in 2004. Partial support for this work was provided by the Israel Science Foundation, the Israeli Ministry of Immigrant Absorption, and the Academia Sinica in Taiwan.

  19. Modelling and simulation of parallel triangular triple quantum dots (TTQD) by using SIMON 2.0

    NASA Astrophysics Data System (ADS)

    Fathany, Maulana Yusuf; Fuada, Syifaul; Lawu, Braham Lawas; Sulthoni, Muhammad Amin

    2016-04-01

    This research presents a modeling analysis of parallel Triple Quantum Dots (TQD) using SIMON (SIMulation Of Nano-structures). The Single Electron Transistor (SET) is used as the basic concept of the modeling. We design the parallel TQD structure in metal with a triangular geometry, referred to as Triangular Triple Quantum Dots (TTQD). We simulate it under several scenarios with different parameters, such as capacitance values, gate voltages, and thermal conditions.

  20. The IDES framework: A case study in development of a parallel discrete-event simulation system

    SciTech Connect

    Nicol, D.M.; Johnson, M.M.; Yoshimura, A.S.

    1997-12-31

    This tutorial describes considerations in the design and development of the IDES parallel simulation system. IDES is a Java-based parallel/distributed simulation system designed to support the study of complex large-scale enterprise systems. Using the IDES system as an example, the authors discuss how anticipated model and system constraints molded the design decisions with respect to modeling, synchronization, and communication strategies.

  1. Parallel Adaptive Multi-Mechanics Simulations using Diablo

    SciTech Connect

    Parsons, D; Solberg, J

    2004-12-03

    Coupled multi-mechanics simulations (such as thermal-stress and fluid-structure interaction problems) are of substantial interest to engineering analysts. In addition, adaptive mesh refinement techniques present an attractive alternative to current mesh generation procedures and provide quantitative error bounds that can be used for model verification. This paper discusses spatially adaptive multi-mechanics implicit simulations using the Diablo computer code. (U)

  2. Direct numerical simulation of a recorder.

    PubMed

    Giordano, N

    2013-02-01

    The aeroacoustics of a recorder are studied using a direct numerical simulation based on the Navier-Stokes equations in two dimensions. Spatial maps for the air pressure and velocity give a detailed picture of vortex shedding near the labium. Changes in the spectrum as a result of variations in the blowing speed are also investigated. The results are in good semi-quantitative agreement with general results for these phenomena from experiments. PMID:23363126

  3. A high resolution finite volume method for efficient parallel simulation of casting processes on unstructured meshes

    SciTech Connect

    Kothe, D.B.; Turner, J.A.; Mosso, S.J.; Ferrell, R.C.

    1997-03-01

    We discuss selected aspects of a new parallel three-dimensional (3-D) computational tool for the unstructured mesh simulation of Los Alamos National Laboratory (LANL) casting processes. This tool, known as Telluride, draws upon robust, high resolution finite volume solutions of metal alloy mass, momentum, and enthalpy conservation equations to model the filling, cooling, and solidification of LANL castings. We briefly describe the current Telluride physical models and solution methods, then detail our parallelization strategy as implemented with Fortran 90 (F90). This strategy has yielded straightforward and efficient parallelization on distributed and shared memory architectures, aided in large part by the new parallel libraries JTpack90 for Krylov-subspace iterative solution methods and PGSLib for efficient gather/scatter operations. We illustrate our methodology and current capabilities with source code examples and parallel efficiency results for a LANL casting simulation.

  4. Large Eddy simulation of parallel blade-vortex interaction

    NASA Astrophysics Data System (ADS)

    Felten, Frederic; Lund, Thomas

    2002-11-01

    Helicopter Blade-Vortex Interaction (BVI) generally occurs under certain conditions of powered descent or during extreme maneuvering. The vibration and acoustic problems associated with the interaction of rotor tip vortices and the following blades is a major aerodynamic concern for the helicopter community. Numerous experimental and computational studies have been done over the last two decades in order to gain a better understanding of the physical mechanisms involved in BVI. The most severe interaction, in terms of generated noise, happens when the vortex filament is parallel to the blade, thus affecting a great portion of it. The majority of the previous numerical studies of parallel BVI fall within a potential flow framework. Some Navier-Stokes approaches using dissipative numerical methods and RANS-type turbulence models have also been attempted, but with limited success. The current investigation makes use of an incompressible, non-dissipative, kinetic energy conserving collocated mesh scheme in conjunction with a dynamic subgrid-scale model. The concentrated tip vortex is not attenuated as it is convected downstream and over a NACA-0012 airfoil. The lift, drag, moment and pressure coefficients induced by the passage of the vortex are monitored in time and compared with experimental data.

  5. Parallelized modelling and solution scheme for hierarchically scaled simulations

    NASA Technical Reports Server (NTRS)

    Padovan, Joe

    1995-01-01

    This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers is capable of solving are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that by combining several of these techniques, a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.

  6. Xyce parallel electronic simulator users' guide, Version 6.0.1.

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.

    2014-01-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  7. Xyce parallel electronic simulator users guide, version 6.1

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-03-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  8. Pelegant : a parallel accelerator simulation code for electron generation and tracking.

    SciTech Connect

    Wang, Y.; Borland, M. D.; Accelerator Systems Division

    2006-01-01

    elegant is a general-purpose code for electron accelerator simulation that has a worldwide user base. Recently, many of the time-intensive elements were parallelized using MPI. Development has used modest Linux clusters and the BlueGene/L supercomputer at Argonne National Laboratory. This has provided very good performance for some practical simulations, such as multiparticle tracking with synchrotron radiation and emittance blow-up in the vertical rf kick scheme. The effort began with development of a concept that allowed for gradual parallelization of the code, using the existing beamline-element classification table in elegant. This was crucial as it allowed parallelization without major changes in code structure and without major conflicts with the ongoing evolution of elegant. Because of rounding error and finite machine precision, validating a parallel program against a uniprocessor program with the requirement of bitwise identical results is notoriously difficult. We will report validating simulation results of parallel elegant against those of serial elegant by applying Kahan's algorithm to improve accuracy dramatically for both versions. The quality of random numbers in a parallel implementation is very important for some simulations. Some practical experience with generating parallel random numbers by offsetting the seed of each random sequence according to the processor ID will be reported.
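
    The accuracy fix the authors mention is, in its standard form, Kahan compensated summation; a minimal sketch follows (how exactly it is wired into elegant is not shown in the abstract).

        /* Kahan compensated summation; compile without -ffast-math so the
         * compensation step is not optimized away. */
        double kahan_sum(const double *a, int n)
        {
            double sum = 0.0, c = 0.0;   /* c carries the lost low-order bits */
            for (int i = 0; i < n; ++i) {
                double y = a[i] - c;     /* re-inject the previous error    */
                double t = sum + y;      /* low bits of y may be lost here  */
                c = (t - sum) - y;       /* recover exactly what was lost   */
                sum = t;
            }
            return sum;
        }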

  9. Molecular Dynamic Simulations of Nanostructured Ceramic Materials on Parallel Computers

    SciTech Connect

    Vashishta, Priya; Kalia, Rajiv

    2005-02-24

    Large-scale molecular-dynamics (MD) simulations have been performed to gain insight into: (1) sintering, structure, and mechanical behavior of nanophase SiC and SiO2; (2) effects of dynamic charge transfers on the sintering of nanophase TiO2; (3) high-pressure structural transformation in bulk SiC and GaAs nanocrystals; (4) nanoindentation in Si3N4; and (5) lattice mismatched InAs/GaAs nanomesas. In addition, we have designed a multiscale simulation approach that seamlessly embeds MD and quantum-mechanical (QM) simulations in a continuum simulation. The above research activities have involved strong interactions with researchers at various universities, government laboratories, and industries. 33 papers have been published and 22 talks have been given based on the work described in this report.

  10. A parallel implementation of the Cellular Potts Model for simulation of cell-based morphogenesis

    PubMed Central

    Chen, Nan; Glazier, James A.; Izaguirre, Jesús A.; Alber, Mark S.

    2007-01-01

    The Cellular Potts Model (CPM) has been used in a wide variety of biological simulations. However, most current CPM implementations use a sequential modified Metropolis algorithm which restricts the size of simulations. In this paper we present a parallel CPM algorithm for simulations of morphogenesis, which includes cell–cell adhesion, a cell volume constraint, and cell haptotaxis. The algorithm uses appropriate data structures and checkerboard subgrids for parallelization. Communication and updating algorithms synchronize properties of cells simulated on different processor nodes. Tests show that the parallel algorithm has good scalability, permitting large-scale simulations of cell morphogenesis (10^7 or more cells) and broadening the scope of CPM applications. The new algorithm satisfies the balance condition, which is sufficient for convergence of the underlying Markov chain. PMID:18084624
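
    A hedged sketch of the checkerboard-subgrid idea follows: the lattice is tiled with subgrids, and all subgrids of one "color" are updated concurrently. With a 2x2 coloring, same-color subgrids are a full subgrid apart, so toy copy attempts confined to a subgrid cannot conflict. The lattice size, subgrid size, and the trivial update rule are hypothetical; a real CPM accepts copies with a Boltzmann probability of the energy change.

        #include <omp.h>

        #define L  256                 /* lattice edge */
        #define SG 16                  /* subgrid edge */

        static unsigned lcg(unsigned *s) { return *s = *s * 1664525u + 1013904223u; }

        static void update_subgrid(int *lat, int ox, int oy, unsigned seed)
        {
            for (int t = 0; t < SG * SG; ++t) {
                /* Pick an interior site and copy a random neighbour's cell
                 * id onto it; interior picks keep all writes in the block. */
                int x = ox + 1 + lcg(&seed) % (SG - 2);
                int y = oy + 1 + lcg(&seed) % (SG - 2);
                static const int d[4][2] = {{1,0},{-1,0},{0,1},{0,-1}};
                int k = lcg(&seed) % 4;
                lat[y * L + x] = lat[(y + d[k][1]) * L + (x + d[k][0])];
            }
        }

        void checkerboard_sweep(int *lat, unsigned sweep)
        {
            int nsg = L / SG;
            for (int color = 0; color < 4; ++color) {
                /* Same-color subgrids are independent, so the loop over
                 * them is safe to run in parallel. */
                #pragma omp parallel for collapse(2)
                for (int bx = 0; bx < nsg; ++bx)
                    for (int by = 0; by < nsg; ++by)
                        if ((bx % 2) + 2 * (by % 2) == color)
                            update_subgrid(lat, bx * SG, by * SG,
                                           sweep * 2654435761u + bx * 97u + by);
            }
        }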

  11. Polar-direct-drive simulations and experiments

    SciTech Connect

    Marozas, J.A.; Marshall, F.J.; Craxton, R.S.; Igumenshchev, I.V.; Skupsky, S.; Bonino, M.J.; Collins, T.J.B.; Epstein, R.; Glebov, V.Yu.; Jacobs-Perkins, D.; Knauer, J.P.; McCrory, R.L.; McKenty, P.W.; Meyerhofer, D.D.; Noyes, S.G.; Radha, P.B.; Sangster, T.C.; Seka, W.; Smalyuk, V.A.

    2006-05-15

    Polar direct drive (PDD) [S. Skupsky et al., Phys. Plasmas 11, 2763 (2004)] will allow direct-drive ignition experiments on the National Ignition Facility (NIF) [J. Paisner et al., Laser Focus World 30, 75 (1994)] as it is configured for x-ray drive. Optimal drive uniformity is obtained via a combination of beam repointing, pulse shapes, spot shapes, and/or target design. This article describes progress in the development of standard and 'Saturn' [R. S. Craxton and D. W. Jacobs-Perkins, Phys. Rev. Lett. 94, 095002 (2005)] PDD target designs. Initial evaluation of experiments on the OMEGA Laser System [T. R. Boehly et al., Rev. Sci. Instrum. 66, 508 (1995)] and simulations were carried out with the two-dimensional hydrodynamics code SAGE [R. S. Craxton et al., Phys. Plasmas 12, 056304 (2005)]. This article adds to this body of work by including fusion particle production and transport as well as radiation transport within the two-dimensional DRACO [P. B. Radha et al., Phys. Plasmas 12, 032702 (2005)] hydrodynamics simulations used to model experiments. Forty OMEGA beams arranged in six rings to emulate the NIF x-ray-drive configuration are used to perform direct-drive implosions of CH shells filled with D2 gas. Target performance was diagnosed with framed x-ray backlighting and by the measured fusion yield. Saturn target experiments have resulted in ~75% of the yield from energy-equivalent, symmetrically irradiated implosions. The results of the two-dimensional PDD simulations performed with DRACO are in good agreement with experimental x-ray radiographs. DRACO is being used to further optimize standard PDD designs. In addition, DRACO simulations of NIF-scale PDD designs show ignition with a gain of 20 and the development of a 40 μm radius, 10 keV region with a neutron-averaged ρr of 1270 mg/cm² near stagnation.

  12. Virtual reality visualization of parallel molecular dynamics simulation

    SciTech Connect

    Disz, T.; Papka, M.; Stevens, R.; Pellegrino, M.; Taylor, V.

    1995-12-31

    When performing communications mapping experiments for massively parallel processors, it is important to be able to visualize the mappings and resulting communications. In a molecular dynamics model, visualization of the atom-to-atom interaction and the processor mappings provides insight into the effectiveness of the communications algorithms. The basic quantities available for visualization in a model of this type are the number of molecules per unit volume, and the mass and velocity of each molecule. The computational information available for visualization is the atom-to-atom interaction within each time step, the atom-to-processor mapping, and the energy rescaling events. We use the CAVE (CAVE Automatic Virtual Environment) to provide interactive, immersive visualization experiences.

  13. Toward parallel, adaptive mesh refinement for chemically reacting flow simulations

    SciTech Connect

    Devine, K.D.; Shadid, J.N.; Salinger, A.G. Hutchinson, S.A.; Hennigan, G.L.

    1997-12-01

    Adaptive numerical methods offer greater efficiency than traditional numerical methods by concentrating computational effort in regions of the problem domain where the solution is difficult to obtain. In this paper, the authors describe progress toward adding mesh refinement to MPSalsa, a computer program developed at Sandia National Laboratories to solve coupled three-dimensional fluid flow and detailed reaction chemistry systems for modeling chemically reacting flow on large-scale parallel computers. Data structures that support refinement and dynamic load-balancing are discussed. Results using uniform refinement with mesh sequencing to improve convergence to steady-state solutions are also presented. Three examples are presented: a lid-driven cavity, a thermal convection flow, and a tilted chemical vapor deposition reactor.

  14. Parallel performance optimizations on unstructured mesh-based simulations

    SciTech Connect

    Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid

    2015-06-01

    This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitionings with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.

  15. Direct Numerical Simulation of Cosmological Reionization

    NASA Astrophysics Data System (ADS)

    So, Geoffrey C.

    We examine the epoch of hydrogen reionization using a new numerical method that allows us to self-consistently couple all the relevant physical processes (gas dynamics, dark matter dynamics, self-gravity, star formation/feedback, radiative transfer, ionization, recombination, heating and cooling) and evolve the system of coupled equations on the same high resolution mesh. We refer to this approach as direct numerical simulation, in contrast to existing approaches which decouple and coarse-grain the radiative transfer and ionization balance calculations relative to the underlying dynamical calculation. Our method is scalable with respect to the number of radiation sources, size of the mesh, and the number of computer processors employed, and is described in Chapter 2 of this thesis. This scalability permits us to simulate cosmological reionization in large cosmological volumes (~100 Mpc) while directly modeling the sources and sinks of ionizing radiation, including radiative feedback effects such as photoevaporation of gas from halos, Jeans smoothing of the IGM, and enhanced recombination due to small scale clumping. With our fiducial simulation, we find that roughly 2 ionizing photons per baryon are needed to highly ionize the intergalactic medium. The complicated events during reionization that lead to this number can be generally described as inside-out, but in reality the narrative depends on the level of ionization of the gas one defines as ionized. We have updated the formula observers often use for estimating the ionized volume filling fraction, with a δb and a t_rec,eff, to get from O(10%) to O(1%) consistency with our simulation results. This improvement comes from not using the traditional clumping factor, but instead considering the history and local effects which were neglected in formulating the original expression. And finally, we have a new upper limit for the escape fraction of ~0.6 from our simulation, which takes into account the photons in

  16. An Optimization Algorithm for Multipath Parallel Allocation for Service Resource in the Simulation Task Workflow

    PubMed Central

    Zhang, Hongjun; Zhang, Rui; Li, Yong; Zhang, Xuliang

    2014-01-01

    Service-oriented modeling and simulation are hot issues in the field of modeling and simulation, and there is a need to call service resources when a simulation task workflow is running. How to optimize the service resource allocation to ensure that the task is completed effectively is an important issue in this area. In the military modeling and simulation field, it is important to improve the probability of success and timeliness of the simulation task workflow. Therefore, this paper proposes an optimization algorithm for multipath service resource parallel allocation, in which a multipath service resource parallel allocation model is built and a multiple-chains coding scheme quantum optimization algorithm is used for optimization and solution. The multiple-chains coding scheme extends the parallel search space to improve search efficiency. Through simulation experiments, this paper investigates the effect of different optimization algorithms, service allocation strategies, and path numbers on the probability of success in the simulation task workflow, and the simulation results show that the optimization algorithm for multipath service resource parallel allocation is an effective method to improve the probability of success and timeliness of the simulation task workflow. PMID:24963506

  17. Development of a Massively Parallel Particle-Mesh Algorithm for Simulations of Galaxy Dynamics and Plasmas

    NASA Astrophysics Data System (ADS)

    Wallin, John

    1996-01-01

    Particle-mesh calculations treat forces and potentials as field quantities which are represented approximately on a mesh. A system of particles is mapped onto this mesh as a density distribution of mass or charge. The Fourier transform is used to convolve this distribution with the Green's function of the potential, and a finite difference scheme is used to calculate the forces acting on the particles. The computation time scales as Ng log Ng, where Ng is the size of the computational grid. In contrast, the particle-particle method relies on direct summation, so its computing time is given by Np², where Np is the number of particles. The particle-mesh method is best suited for simulations with a fixed minimum resolution and for collisionless systems, while hierarchical tree codes have proven to be superior for collisional systems where two-body interactions are important. Particle-mesh methods still dominate in plasma physics, where collisionless systems are modeled. The CM-200 Connection Machine produced by Thinking Machines Corp. is a data parallel system. On this system, the front-end computer controls the timing and execution of the parallel processing units. The programming paradigm is Single-Instruction, Multiple-Data (SIMD). The processors on the CM-200 are connected in an N-dimensional hypercube; the largest number of links a message will ever have to make is N. As in all parallel computing, the efficiency of an algorithm is primarily determined by the fraction of the time spent communicating compared to that spent computing. Because of the topology of the processors, nearest-neighbor communication is more efficient than general communication.
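
    The particle-to-mesh mapping step described above is, in its simplest form, a cloud-in-cell (CIC) deposition; a hedged 1D sketch follows (the grid size and unit box are assumptions, and the FFT convolution with the Green's function is omitted).

        #define NG 128   /* grid cells; unit box, positions x in [0,1) */

        /* 1D cloud-in-cell deposition: each particle's mass is shared
         * between its two nearest cells with linear weights. */
        void deposit_cic(int np, const double *x, const double *m, double *rho)
        {
            for (int g = 0; g < NG; ++g) rho[g] = 0.0;
            for (int p = 0; p < np; ++p) {
                double s = x[p] * NG;       /* position in grid units */
                int    i = (int)s;          /* left-hand cell         */
                double f = s - i;           /* fractional offset      */
                rho[i % NG]       += m[p] * (1.0 - f);  /* periodic wrap */
                rho[(i + 1) % NG] += m[p] * f;
            }
        }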

  18. Massively Parallel Reactive and Quantum Molecular Dynamics Simulations

    NASA Astrophysics Data System (ADS)

    Vashishta, Priya

    2015-03-01

    In this talk I will discuss two simulations: Cavitation bubbles readily occur in fluids subjected to rapid changes in pressure. We use billion-atom reactive molecular dynamics simulations on a 163,840-processor BlueGene/P supercomputer to investigate chemical and mechanical damage caused by shock-induced collapse of nanobubbles in water near a silica surface. Collapse of an empty nanobubble generates a high-speed nanojet, resulting in the formation of a pit on the surface. Gas-filled bubbles undergo partial collapse, and consequently the damage to the silica surface is mitigated. Quantum molecular dynamics (QMD) simulations are performed on a 786,432-processor Blue Gene/Q to study on-demand production of hydrogen gas from water using Al nanoclusters. QMD simulations reveal rapid hydrogen production from water by an Al nanocluster. We find a low activation-barrier mechanism, in which a pair of Lewis acid and base sites on the Aln surface preferentially catalyzes hydrogen production. I will also discuss on-demand production of hydrogen gas from water using LiAl alloy particles. Research reported in this lecture was carried out in collaboration with Rajiv Kalia, Aiichiro Nakano and Ken-ichi Nomura from the University of Southern California, and Fuyuki Shimojo and Kohei Shimamura from Kumamoto University, Japan.

  19. Directional interstitial brachytherapy from simulation to application

    NASA Astrophysics Data System (ADS)

    Lin, Liyong

    Organs at risk (OARs) are sometimes adjacent to, embedded in, or overlapping with the clinical target volume (CTV) to be treated. The purpose of this PhD study is to develop directional, low-energy gamma-emitting interstitial brachytherapy sources. These sources can be applied between OARs to selectively reduce hot spots in the OARs and normal tissues. The reduction of dose over undesired regions can expand patient eligibility or reduce toxicities for the treatment by conventional interstitial brachytherapy. This study covers the development of a directional source from design optimization to construction of the first prototype source. The Monte Carlo code MCNP was used to simulate the radiation transport for the designs of directional sources. We have made a special construction kit to assemble radioactive and gold-shield components precisely into D-shaped titanium containers of the first directional source. Directional sources have a similar dose distribution to conventional sources on the treated side but greatly reduced dose on the shielded side, with a sharp dose gradient between them. A three-dimensional dose deposition kernel for the 125I directional source has been calculated. Treatment plans can use both directional and conventional 125I sources at the same source strength for low-dose-rate (LDR) implants to optimize the dose distributions. For prostate tumors, directional 125I LDR brachytherapy can potentially reduce genitourinary and gastrointestinal toxicities and improve potency preservation for low-risk patients. The combination of the better dose distribution of directional implants and the better therapeutic ratio between tumor response and late reactions enables a novel temporary LDR treatment, as opposed to permanent or high-dose-rate (HDR) brachytherapy, for the intermediate-risk T2b and high-risk T2c tumors. Supplemental external-beam treatments can be shortened with a better brachytherapy boost for T3 tumors. In conclusion, we have successfully finished the

  20. Parallel performance optimizations on unstructured mesh-based simulations

    DOE PAGESBeta

    Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid

    2015-06-01

    This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitionings with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.

  1. Exploiting quantum parallelism to simulate quantum random many-body systems.

    PubMed

    Paredes, B; Verstraete, F; Cirac, J I

    2005-09-30

    We present an algorithm that exploits quantum parallelism to simulate randomness in a quantum system. In our scheme, all possible realizations of the random parameters are encoded quantum mechanically in a superposition state of an auxiliary system. We show how our algorithm allows for the efficient simulation of dynamics of quantum random spin chains with known numerical methods. We propose an experimental realization based on atoms in optical lattices in which disorder could be simulated in parallel and in a controlled way through the interaction with another atomic species. PMID:16241634

  2. Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

    SciTech Connect

    Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor

    2011-09-06

    We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.

  3. Rotating parallel ray omni-directional integration for instantaneous pressure reconstruction from measured pressure gradient

    NASA Astrophysics Data System (ADS)

    Liu, Xiaofeng; Siddle-Mitchell, Seth

    2015-11-01

    This paper presents a novel pressure reconstruction method featuring rotating parallel ray omni-directional integration, as an improvement over the circular virtual boundary integration method introduced by Liu and Katz (2003, 2006, 2008 and 2013) for non-intrusive instantaneous pressure measurement in incompressible flow fields. Unlike the virtual boundary omni-directional integration, where the integration paths originate from a virtual circular boundary at a finite distance from the real boundary of the integration domain, the new method uses parallel rays, which can be viewed as originating from a distance of infinity, as guides for the integration paths. By rotating the parallel rays, omni-directional paths with equal weights coming from all directions toward the point of interest at any location within the computational domain are generated. In this way, the location dependence of the integration weight inherent in the old algorithm is eliminated. With this new algorithm, the r.m.s. error of the reconstructed pressure for a synthetic rotational flow, relative to theoretical values, is reduced from 1.03% to 0.30%. Improvement is further demonstrated by comparison of the reconstructed pressure with that from the Johns Hopkins University isotropic turbulence database (JHTDB). This project is funded by San Diego State University.

  4. Parallel simulation of subsonic fluid dynamics on a cluster of workstations

    NASA Astrophysics Data System (ADS)

    Skordos, Panayotis A.

    1994-11-01

    An effective approach of simulating fluid dynamics on a cluster of non-dedicated workstations is presented. The approach uses local interaction algorithms, small communication capacity, and automatic migration of parallel processes from busy hosts to free hosts. The approach is well-suited for simulating subsonic flow problems which involve both hydrodynamics and acoustic waves, for example, the flow of air inside wind musical instruments. Typical simulations achieve 80% parallel efficiency (speedup/processors) using 20 HP-Apollo workstations. Detailed measurements of the parallel efficiency of 2D and 3D simulations are presented, and a theoretical model of efficiency is developed which fits closely the measurements. Two numerical methods of fluid dynamics are tested: explicit finite differences, and the lattice Boltzmann method.

  5. Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm

    NASA Technical Reports Server (NTRS)

    Povitsky, A.

    1998-01-01

    In this research an efficient parallel algorithm for 3-D directionally split problems is developed. The proposed algorithm is based on a reformulated version of the pipelined Thomas algorithm that starts the backward step computations immediately after the completion of the forward step computations for the first portion of lines. This algorithm has data available for other computational tasks while processors are idle from the Thomas algorithm. The proposed 3-D directionally split solver is based on the static scheduling of processors, where local and non-local, data-dependent and data-independent computations are scheduled while processors are idle. A theoretical model of parallelization efficiency is used to define optimal parameters of the algorithm, to show an asymptotic parallelization penalty, and to obtain an optimal cover of a global domain with subdomains. It is shown by computational experiments and by the theoretical model that the proposed algorithm reduces the parallelization penalty by about a factor of two relative to the basic algorithm for the range of the number of processors (subdomains) considered and the number of grid nodes per subdomain.
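
    For context, the recurrence being pipelined is the classic Thomas algorithm for one tridiagonal line; a serial sketch follows (array naming is an assumption, and the paper's pipelined multi-processor scheduling of these two sweeps is not shown).

        /* Serial Thomas algorithm: forward elimination then back
         * substitution. a, b, c are the sub-, main and super-diagonals,
         * d the right-hand side (overwritten), cp a scratch array, and
         * x the solution. */
        void thomas(int n, const double *a, const double *b, const double *c,
                    double *d, double *cp, double *x)
        {
            cp[0] = c[0] / b[0];
            d[0]  = d[0] / b[0];
            for (int i = 1; i < n; ++i) {           /* forward step  */
                double m = 1.0 / (b[i] - a[i] * cp[i - 1]);
                cp[i] = c[i] * m;
                d[i]  = (d[i] - a[i] * d[i - 1]) * m;
            }
            x[n - 1] = d[n - 1];
            for (int i = n - 2; i >= 0; --i)        /* backward step */
                x[i] = d[i] - cp[i] * x[i + 1];
        }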

  6. Implementation of a parallel algorithm for thermo-chemical nonequilibrium flow simulations

    SciTech Connect

    Wong, C.C.; Blottner, F.G.; Payne, J.L.; Soetrisno, M.

    1995-01-01

    Massively parallel (MP) computing is considered to be the future direction of high performance computing. When engineers apply this new MP computing technology to solve large-scale problems, one major interest is the maximum problem size that an MP computer can handle. To determine the maximum size, it is important to address the code scalability issue. Scalability implies whether the code can provide an increase in performance proportional to an increase in problem size. If the size of the problem increases, then by utilizing more computer nodes the elapsed time to simulate the problem should ideally not increase much. Hence one important task in the development of MP computing technology is to ensure scalability. A scalable code is an efficient code. In order to obtain good scaled performance, it is necessary to first have the code optimized for single-node performance before proceeding to a large-scale simulation with a large number of computer nodes. This paper will discuss the implementation of a massively parallel computing strategy and the process of optimization to improve the scaled performance. Specifically, we will look at domain decomposition, resource management in the code, communication overhead, and problem mapping. By incorporating these improvements and adopting an efficient MP computing strategy, efficiencies of about 85% and 96% have been achieved using 64 nodes on MP computers for perfect gas and chemically reactive gas problems, respectively.

  7. Partitioning and packing mathematical simulation models for calculation on parallel computers

    NASA Technical Reports Server (NTRS)

    Arpasi, D. J.; Milner, E. J.

    1986-01-01

    The development of multiprocessor simulations from a serial set of ordinary differential equations describing a physical system is described. Degrees of parallelism (i.e., coupling between the equations) and their impact on parallel processing are discussed. The problem of identifying computational parallelism within sets of closely coupled equations that require the exchange of current values of variables is described. A technique is presented for identifying this parallelism and for partitioning the equations for parallel solution on a multiprocessor. An algorithm which packs the equations into a minimum number of processors is also described. The results of the packing algorithm when applied to a turbojet engine model are presented in terms of processor utilization.
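
    The packing step is essentially a bin-packing problem: assign per-equation computation costs to as few processors as possible without exceeding an update-interval budget. The paper's exact algorithm is not reproduced here; a generic first-fit-decreasing sketch conveys the idea (the function name and the capacity parameter are illustrative):

      def pack_equations(costs, capacity):
          # Greedy first-fit-decreasing packing; assumes each cost <= capacity.
          bins = []                         # each bin: [remaining capacity, [equation ids]]
          for eq in sorted(range(len(costs)), key=lambda i: -costs[i]):
              for b in bins:
                  if costs[eq] <= b[0]:
                      b[0] -= costs[eq]
                      b[1].append(eq)
                      break
              else:                         # no open bin fits: allocate a new processor
                  bins.append([capacity - costs[eq], [eq]])
          return [b[1] for b in bins]       # one list of equation ids per processor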

  8. SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeff S.

    1992-01-01

    Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.

  9. Parallel FEM Simulation of Electromechanics in the Heart

    NASA Astrophysics Data System (ADS)

    Xia, Henian; Wong, Kwai; Zhao, Xiaopeng

    2011-11-01

    Cardiovascular disease is the leading cause of death in America. Computer simulation of the complicated dynamics of the heart could provide valuable quantitative guidance for diagnosis and treatment of heart problems. In this paper, we present an integrated numerical model which encompasses the interaction of cardiac electrophysiology, electromechanics, and mechanoelectrical feedback. The model is solved by the finite element method on a Linux cluster and on Kraken, a Cray XT5 supercomputer. Dynamical interactions between electromechanical coupling and mechanoelectrical feedback are shown.

  10. LARGE-SCALE SIMULATION OF BEAM DYNAMICS IN HIGH INTENSITY ION LINACS USING PARALLEL SUPERCOMPUTERS

    SciTech Connect

    R. RYNE; J. QIANG

    2000-08-01

    In this paper we present results of using parallel supercomputers to simulate beam dynamics in next-generation high intensity ion linacs. Our approach uses a three-dimensional space charge calculation with six types of boundary conditions. The simulations use a hybrid approach involving transfer maps to treat externally applied fields (including rf cavities) and parallel particle-in-cell techniques to treat the space-charge fields. The large-scale simulation results presented here represent an improvement of three orders of magnitude in simulation capability, in terms of problem size and speed of execution, compared with typical two-dimensional serial simulations. Specific examples are presented, including simulation of the Spallation Neutron Source (SNS) linac and the Low Energy Demonstration Accelerator (LEDA) beam halo experiment.

  11. Dynamic stiffness removal for direct numerical simulations

    SciTech Connect

    Lu, Tianfeng; Law, Chung K.; Yoo, Chun Sang; Chen, Jacqueline H.

    2009-08-15

    A systematic approach was developed to derive non-stiff reduced mechanisms for direct numerical simulations (DNS) with explicit integration solvers. The stiffness reduction is achieved through on-the-fly elimination of the short time-scales induced by two features of fast chemical reactivity, namely quasi-steady-state (QSS) species and partial-equilibrium (PE) reactions. The sparse algebraic equations resulting from the QSS and PE approximations are exploited so that the dynamic stiffness reduction is highly efficient compared with general methods of time-scale reduction based on Jacobian decomposition. Using the dimension reduction strategies developed in our previous work, a reduced mechanism with 52 species was first derived from a detailed mechanism with 561 species. The reduced mechanism was validated for ignition and extinction over equivalence ratios between 0.5 and 1.5, pressures between 10 and 50 atm, and initial temperatures between 700 and 1600 K for ignition, with worst-case errors of approximately 30%. The reduced mechanism with dynamic stiffness removal was then applied in homogeneous and 1-D ignition applications, as well as a 2-D direct numerical simulation of constant-volume ignition with temperature inhomogeneities, using integration time-steps of 5-10 ns. The integration was numerically stable and good accuracy was achieved.
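
    As a toy illustration of the QSS idea (not the 52-species mechanism of the paper), consider the chain A -> B -> C with a fast intermediate B. Setting dB/dt to zero yields an algebraic expression for B and removes the short time-scale from the ODE system:

      def rhs_full(t, y, k1, k2):
          # full system; stiff when k2 >> k1 (B is consumed much faster than produced)
          a, b, c = y
          return [-k1 * a, k1 * a - k2 * b, k2 * b]

      def rhs_qss(t, y, k1, k2):
          # QSS-reduced system: production of B balances consumption, b ~= k1*a/k2,
          # so the fast time-scale 1/k2 disappears and explicit integration is stable
          a, c = y
          b = k1 * a / k2
          return [-k1 * a, k2 * b]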

  12. Direct Numerical Simulations of Transient Dispersion

    NASA Astrophysics Data System (ADS)

    Porter, M.; Valdes-Parada, F.; Wood, B.

    2008-12-01

    Transient dispersion is important in many engineering applications, including transport in porous media. A common theoretical approach involves upscaling the micro-scale mass balance equations for convection-diffusion to macro-scale equations that contain effective medium quantities. However, there are a number of assumptions implicit in the various upscaling methods. For example, results obtained from volume averaging are often dependent on a given set of length and time scale constraints. Additionally, a number of the classical models for dispersion do not fully capture the early-time dispersive behavior of the solute for a general set of initial conditions. In this work, we present direct numerical simulations of micro-scale transient mass balance equations for convection-diffusion in both capillary tubes and porous media. Special attention is paid to analysis of the influence of a new time-decaying coefficient that filters the effects of the initial conditions. The direct numerical simulations were compared to results obtained from solving the closure problem associated with volume averaging. These comparisons provide a quantitative measure of the significance of (1) the assumptions implicit in the volume averaging method and (2) the importance of the early-time dispersive behavior of the solute due to various initial conditions.

  13. Direct kinematics solution architectures for industrial robot manipulators: Bit-serial versus parallel

    NASA Technical Reports Server (NTRS)

    Lee, J.; Kim, K.

    1991-01-01

    A Very Large Scale Integration (VLSI) architecture for robot direct kinematic computation suitable for industrial robot manipulators was investigated. The Denavit-Hartenberg transformations are reviewed to identify a suitable processing element, namely an augmented CORDIC. Two distinct implementations, bit-serial and parallel, are elaborated on. The performance of each scheme is analyzed with respect to the time to compute one location of the end-effector of a 6-link manipulator and the number of transistors required.
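
    The processing element named in the abstract is an augmented CORDIC. The standard rotation-mode CORDIC below (a software sketch of the generic algorithm, not the paper's augmented hardware variant) shows how a rotation is evaluated with shifts and adds only:

      import math

      def cordic_sincos(theta, n=32):
          # rotation-mode CORDIC for |theta| <= pi/2; returns (sin(theta), cos(theta))
          gain = 1.0
          for i in range(n):
              gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))   # cumulative scaling of (x, y)
          x, y, z = 1.0, 0.0, theta
          for i in range(n):
              d = 1.0 if z >= 0.0 else -1.0              # rotate so z is driven to 0
              x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
              z -= d * math.atan(2.0 ** -i)
          return y * gain, x * gain

    A bit-serial implementation evaluates these shift-add iterations one bit at a time with minimal hardware, while a parallel implementation unrolls them across word-wide adders; that area-versus-speed trade-off is what the paper quantifies.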

  15. Direct numerical simulation of turbulent reacting flows

    SciTech Connect

    Chen, J.H.

    1993-12-01

    The development of turbulent combustion models that reflect some of the most important characteristics of turbulent reacting flows requires knowledge about the behavior of key quantities in well-defined combustion regimes. In turbulent flames, the coupling between the turbulence and the chemistry is so strong in certain regimes that it is very difficult to isolate the role played by one individual phenomenon. Direct numerical simulation (DNS) is an extremely useful tool to study in detail the turbulence-chemistry interactions in certain well-defined regimes. Globally, non-premixed flames are controlled by two limiting cases: the fast chemistry limit, where the reaction rates are fast compared with the turbulent fluctuations, and the frozen chemistry limit. In between these two limits, finite-rate chemical effects are important and the turbulence interacts strongly with the chemical processes. This regime is important because industrial burners operate in regimes in which, locally, the flame undergoes extinction or is at least in some nonequilibrium condition. Furthermore, these nonequilibrium conditions strongly influence the production of pollutants. To quantify the finite-rate chemistry effect, direct numerical simulations are performed to study the interaction between an initially laminar non-premixed flame and a three-dimensional field of homogeneous isotropic decaying turbulence. Emphasis is placed on the dynamics of extinction and on transient effects on the fine-scale mixing process. Differential molecular diffusion among species is also examined with this approach, both for nonreacting and reacting situations. To address the problem of large-scale mixing and to examine the effects of mean shear, efforts are underway to perform large eddy simulations of round three-dimensional jets.

  16. Bi-directional series-parallel elastic actuator and overlap of the actuation layers.

    PubMed

    Furnémont, Raphaël; Mathijssen, Glenn; Verstraten, Tom; Lefeber, Dirk; Vanderborght, Bram

    2016-02-01

    Several robotics applications require actuators with a high torque-to-weight ratio and high energy efficiency. Progress in that direction was made by introducing compliant elements into the actuation. A large variety of actuators were developed, such as series elastic actuators (SEAs), variable stiffness actuators and parallel elastic actuators (PEAs). SEAs can reduce the peak power while PEAs can reduce the torque requirement on the motor. Nonetheless, these actuators still cannot achieve performance close to that of humans. To combine both advantages, the series parallel elastic actuator (SPEA) was developed. The principle is inspired by biological muscles. Muscles are composed of motor units, placed in parallel, which are variably recruited as the required effort increases. This biological principle is exploited in the SPEA, where springs (layers), placed in parallel, can be recruited one by one. This recruitment is performed by an intermittent mechanism. This paper presents the development of a SPEA using the MACCEPA principle with a self-closing mechanism. This actuator can deliver a bi-directional output torque, variable stiffness and reduced friction. The load on the motor can also be reduced, leading to a lower power consumption. The variable recruitment of the parallel springs can also be tuned in order to further decrease the consumption of the actuator for a given task. First, an explanation of the concept and a brief description of prior work are given. Next, the design and the model of one of the layers are presented. The working principle of the full actuator is then given. At the end of this paper, experiments measuring the electric consumption of the actuator demonstrate the advantage of the SPEA over an equivalent stiff actuator. PMID:26813145

  17. A parallel algorithm for transient solid dynamics simulations with contact detection

    SciTech Connect

    Attaway, S.; Hendrickson, B.; Plimpton, S.; Gardner, D.; Vaughan, C.; Heinstein, M.; Peery, J.

    1996-06-01

    Solid dynamics simulations with Lagrangian finite elements are used to model a wide variety of problems, such as the calculation of impact damage to shipping containers for nuclear waste and the analysis of vehicular crashes. Using parallel computers for these simulations has been hindered by the difficulty of searching efficiently for material surface contacts in parallel. A new parallel algorithm for calculation of arbitrary material contacts in finite element simulations has been developed and implemented in the PRONTO3D transient solid dynamics code. This paper will explore some of the issues involved in developing efficient, portable, parallel finite element models for nonlinear transient solid dynamics simulations. The contact-detection problem poses interesting challenges for efficient implementation of a solid dynamics simulation on a parallel computer. The finite element mesh is typically partitioned so that each processor owns a localized region of the finite element mesh. This mesh partitioning is optimal for the finite element portion of the calculation since each processor must communicate only with the few connected neighboring processors that share boundaries with the decomposed mesh. However, contacts can occur between surfaces that may be owned by any two arbitrary processors. Hence, a global search across all processors is required at every time step to search for these contacts. Load-imbalance can become a problem since the finite element decomposition divides the volumetric mesh evenly across processors but typically leaves the surface elements unevenly distributed. In practice, these complications have been limiting factors in the performance and scalability of transient solid dynamics on massively parallel computers. In this paper the authors present a new parallel algorithm for contact detection that overcomes many of these limitations.

  18. Monte Carlo simulations of converging laser beam propagating in turbid media with parallel computing

    NASA Astrophysics Data System (ADS)

    Wu, Di; Lu, Jun Q.; Hu, Xin H.; Zhao, S. S.

    1999-11-01

    Due to its flexibility and simplicity, the Monte Carlo method is often used to study light propagation in turbid media, where photons are treated like classical particles being scattered and absorbed randomly according to radiative transfer theory. However, because a large number of photons is needed to produce statistically significant results, this type of calculation requires large computing resources. To overcome this difficulty, we implemented parallel computing techniques in our Monte Carlo simulations. The algorithm is based on the fact that the classical particles are uncorrelated, so the trajectories of multiple photons can be tracked simultaneously. When a beam of focused light is incident on the medium, the incident photons are divided into groups according to the processors available on a parallel machine and the calculations are carried out in parallel. Utilizing PVM (Parallel Virtual Machine, a parallel computing software package), parallel programs in both C and FORTRAN were developed on the massively parallel computer Cray T3E at the North Carolina Supercomputer Center and on a local PC-cluster network running UNIX/Sun Solaris. The parallel performance of our codes has been excellent on both the Cray T3E and the PC clusters. In this paper, we present results for a focused laser beam propagating through a highly scattering, diluted solution of intralipid. The dependence of the spatial distribution of light near the focal point on the concentration of the intralipid solution is studied and its significance is discussed.
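
    Because the trajectories are independent, the parallelization is embarrassingly parallel: give each worker its own random stream, trace a batch of photons, and merge the tallies. The toy 1-D absorption-only sketch below uses Python multiprocessing to show the structure (the paper's codes were C and FORTRAN under PVM; all names and parameters here are illustrative):

      import numpy as np
      from multiprocessing import Pool

      def photon_batch(args):
          seed, n_photons, mu_s, mu_a = args
          rng = np.random.default_rng(seed)        # independent stream per worker
          mu_t = mu_s + mu_a
          depths = []
          for _ in range(n_photons):
              z = 0.0
              while True:
                  z += -np.log(rng.random()) / mu_t    # sample a free path length
                  if rng.random() < mu_a / mu_t:       # absorbed (else scattered on)
                      depths.append(z)
                      break
          return depths

      if __name__ == "__main__":
          tasks = [(seed, 10_000, 10.0, 0.1) for seed in range(4)]
          with Pool(4) as pool:
              depths = np.concatenate(pool.map(photon_batch, tasks))
          print("mean absorption depth:", depths.mean())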

  19. Giant impacts during planet formation: Parallel tree code simulations using smooth particle hydrodynamics

    NASA Astrophysics Data System (ADS)

    Cohen, Randi L.

    There is both theoretical and observational evidence that giant planets collided with objects ≥ Mearth during their evolution. These impacts may play a key role in giant planet formation. This paper describes impacts of a ˜ Earth-mass object onto a suite of proto-giant-planets, as simulated using an SPH parallel tree code. We run 6 simulations, varying the impact angle and evolutionary stage of the proto-Jupiter. We find that it is possible for an impactor to free some mass from the core of the proto-planet it impacts through direct collision, as well as to make physical contact with the core yet escape partially, or even completely, intact. None of the 6 cases we consider produced a solid disk or resulted in a net decrease in the core mass of the proto-planet (since the mass decrease due to disruption was outweighed by the increase due to the addition of the impactor's mass to the core). However, we suggest parameters which may have these effects, and thus decrease core mass and formation time in protoplanetary models and/or create satellite systems. We find that giant impacts can remove significant envelope mass from forming giant planets, leaving only 2 MEarth of gas, similar to Uranus and Neptune. They can also create compositional inhomogeneities in planetary cores, which create differences in planetary thermal emission characteristics.

  20. Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations

    NASA Astrophysics Data System (ADS)

    Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.

    2016-07-01

    Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.

  1. Satisfiability Test with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak

    1996-01-01

    Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
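
    The sequential baseline that Generalized Speculative Computation reproduces is ordinary simulated annealing over single-variable flips. A compact MAX-SAT sketch of that baseline (illustrative only; the parallel method speculatively evaluates future moves on other processors while preserving exactly this decision sequence):

      import math, random

      def sa_maxsat(clauses, n_vars, steps=50_000, t0=2.0):
          # clauses: lists of signed ints; literal 3 means x3 is true, -3 means false
          assign = [random.random() < 0.5 for _ in range(n_vars + 1)]  # index 0 unused
          unsat = lambda a: sum(
              not any((lit > 0) == a[abs(lit)] for lit in cl) for cl in clauses)
          cost = unsat(assign)
          for step in range(steps):
              temp = t0 * (1.0 - step / steps) + 1e-9   # linear cooling schedule
              v = random.randint(1, n_vars)             # propose flipping one variable
              assign[v] = not assign[v]
              new = unsat(assign)
              if new <= cost or random.random() < math.exp((cost - new) / temp):
                  cost = new                            # accept the move
              else:
                  assign[v] = not assign[v]             # reject: undo the flip
          return assign, cost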

  2. Simulation of reflooding on two parallel heated channel by TRACE

    NASA Astrophysics Data System (ADS)

    Zakir, Md. Ghulam

    2016-07-01

    In case of a Loss-Of-Coolant Accident (LOCA) in a Boiling Water Reactor (BWR), heat generated in the nuclear fuel is not adequately removed because of the decrease of the coolant mass flow rate in the reactor core. This leads to an increase of the fuel temperature that can cause damage to the core and leakage of radioactive fission products. In order to reflood the core and to halt the temperature increase, an Emergency Core Cooling System (ECCS) delivers water under such conditions. This study investigates how the power distribution between two channels affects the process of reflooding when the emergency water is injected from the top of the channels. The peak cladding temperature (PCT) during the LOCA transient is also determined at different axial levels. The thermal-hydraulic system code TRACE has been used. A TRACE model of the two heated channels has been developed, and three hypothetical cases with different power distributions have been studied. Finally, a comparison between simulated and experimental data is shown.

  3. Parallel computation for reservoir thermal simulation: An overlapping domain decomposition approach

    NASA Astrophysics Data System (ADS)

    Wang, Zhongxiao

    2005-11-01

    This dissertation concerns parallel computing for the thermal simulation of multicomponent, multiphase fluid flow in petroleum reservoirs, and reports the development and applications of such a simulator. Unlike many efforts that parallelize only the linear-system solver, which affects performance the most, this research takes a global parallelization strategy by decomposing the computational domain into smaller subdomains. The dissertation reviews domain decomposition techniques and, based on a comparison, adopts an overlapping domain decomposition method. This global parallelization method hands each subdomain to a single processor of the parallel computer. Communication is required when handling overlapping regions between subdomains; for this purpose, MPI (message passing interface) is used for data communication and communication control. A physical and mathematical model is introduced for the reservoir thermal simulation. Numerical tests on two sets of industrial data from practical oilfields indicate that the model and the parallel implementation match the history data accurately. Therefore, we expect to use both the model and the parallel code to predict oil production and guide the design, implementation and real-time fine tuning of new well operating schemes. A new adaptive mechanism to synchronize processes on different processors has been introduced, which not only ensures computational accuracy but also improves time performance. To accelerate the convergence of the iterative solution of the large linear systems derived from the discretization of the governing equations in space and time, we adopt the ORTHOMIN method in conjunction with an incomplete LU factorization preconditioning technique. Important improvements have been made in both the ORTHOMIN method and the incomplete LU factorization in order to enhance time performance without affecting accuracy.

  4. Dependability analysis of parallel systems using a simulation-based approach. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Sawyer, Darren Charles

    1994-01-01

    The analysis of dependability in large, complex, parallel systems executing real applications or workloads is examined in this thesis. To demonstrate the wide range of dependability problems that can be analyzed through simulation, the analysis of three case studies is presented. For each case, the organization of the simulation model used is outlined, and the results from simulated fault injection experiments are explained, showing the usefulness of this method in dependability modeling of large parallel systems. The simulation models are constructed using DEPEND and C++. Where possible, methods to increase dependability are derived from the experimental results. Another interesting facet of all three cases is the presence of some kind of workload or application executing in the simulation while faults are injected. This provides a completely new dimension to this type of study, not possible to model accurately with analytical approaches.

  5. Application of integration algorithms in a parallel processing environment for the simulation of jet engines

    NASA Technical Reports Server (NTRS)

    Krosel, S. M.; Milner, E. J.

    1982-01-01

    The application of predictor-corrector integration algorithms developed for the digital parallel processing environment is investigated. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real-time performance, interprocessor communication, and algorithm startup are also discussed.
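
    A representative pair of the kind studied is a second-order Adams-Bashforth predictor with an Adams-Moulton (trapezoidal) corrector; in a parallel environment the predictor and corrector derivative evaluations can be assigned to different processors. A serial sketch of one PECE step (our own names, not the simulator's code):

      def ab2_am2_step(f, t, y, f_prev, h):
          # predict with AB2, evaluate, correct with AM2 (trapezoidal rule)
          f_n = f(t, y)
          y_pred = y + h * (1.5 * f_n - 0.5 * f_prev)       # predictor
          y_next = y + 0.5 * h * (f(t + h, y_pred) + f_n)   # corrector
          return y_next, f_n

      # usage: integrate dy/dt = -y, bootstrapping the multistep scheme with Euler
      f = lambda t, y: -y
      h, t, y = 0.01, 0.0, 1.0
      f_prev = f(t, y)
      y, t = y + h * f_prev, t + h                          # one Euler start-up step
      for _ in range(100):
          y, f_prev = ab2_am2_step(f, t, y, f_prev, h)
          t += h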

  6. A conflict-free, path-level parallelization approach for sequential simulation algorithms

    NASA Astrophysics Data System (ADS)

    Rasera, Luiz Gustavo; Machado, Péricles Lopes; Costa, João Felipe C. L.

    2015-07-01

    Pixel-based simulation algorithms are the most widely used geostatistical technique for characterizing the spatial distribution of natural resources. However, sequential simulation does not scale well for stochastic simulation on very large grids, which are now commonly found in many petroleum, mining, and environmental studies. With the availability of multiple-processor computers, there is an opportunity to develop parallelization schemes for these algorithms to increase their performance and efficiency. Here we present a conflict-free, path-level parallelization strategy for sequential simulation. The method consists of partitioning the simulation grid into a set of groups of nodes and delegating all available processors for simulation of multiple groups of nodes concurrently. An automated classification procedure determines which groups are simulated in parallel according to their spatial arrangement in the simulation grid. The major advantage of this approach is that it does not require conflict resolution operations, and thus allows exact reproduction of results. Besides offering a large performance gain when compared to the traditional serial implementation, the method provides efficient use of computational resources and is generic enough to be adapted to several sequential algorithms.
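
    The essence of conflict-free grouping can be pictured with a block coloring: blocks of the same color are spaced so that their nodes' search neighborhoods never overlap, and can therefore be simulated concurrently with results identical to the serial run. The sketch below is a simplified stand-in for the paper's automated classification procedure (block sizes are assumed to exceed the search neighborhood):

      def color_blocks(nx, ny, sx, sy):
          # partition an nx-by-ny grid into sx-by-sy blocks and 4-color them so
          # that same-colored blocks are never adjacent
          groups = {0: [], 1: [], 2: [], 3: []}
          for bi in range(0, nx, sx):
              for bj in range(0, ny, sy):
                  color = (bi // sx) % 2 + 2 * ((bj // sy) % 2)
                  groups[color].append((bi, bj))
          return groups    # simulate all blocks of one color concurrently, color by color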

  7. Direct numerical simulation of the Leidenfrost Effect

    NASA Astrophysics Data System (ADS)

    Rueda Villegas, Lucia; Tanguy, Sébastien

    2012-11-01

    We present direct numerical simulations of the impact of a single droplet on a heated flat surface in the Leidenfrost regime. To that end, we solve the Navier-Stokes equations, the energy equation, and the species mass fraction equation. The Level Set method is used to track the liquid-gas interface motion and the Ghost Fluid Method is implemented to treat the jump conditions. To remove the temporal stability restriction due to viscosity, an implicit temporal discretization is used. Specific numerical methods have been developed to deal with the jump conditions of droplet vaporization at the interface. Since the vapor layer is very thin compared to the droplet size, a non-uniform structured grid, strongly refined near the wall, is used to capture the droplet bounce. We present numerical simulations that enable an accurate study of the bouncing dynamics by analyzing the momentum balance during the droplet bounce. Moreover, we determine from such computations how the heat flux to the droplet is partitioned, by comparing the energy used for phase change (latent heat) to the energy used for droplet heating (specific heat). We then compare the shape of the droplet during the impact with experimental results.

  8. High performance Python for direct numerical simulations of turbulent flows

    NASA Astrophysics Data System (ADS)

    Mortensen, Mikael; Langtangen, Hans Petter

    2016-06-01

    Direct Numerical Simulation (DNS) of the Navier-Stokes equations is an invaluable research tool in fluid dynamics. Still, there are few publicly available research codes and, due to the heavy number crunching implied, available codes are usually written in low-level languages such as C/C++ or Fortran. In this paper we describe a pure scientific Python pseudo-spectral DNS code that nearly matches the performance of C++ for thousands of processors and billions of unknowns. We also describe a version optimized through Cython that is found to match the speed of C++. The solvers are written from scratch in Python, including the mesh, the MPI domain decomposition, and the temporal integrators. The solvers have been verified and benchmarked on the Shaheen supercomputer at the KAUST supercomputing laboratory, and we are able to show very good scaling up to several thousand cores. A very important part of the implementation is the mesh decomposition (we implement both slab and pencil decompositions) and the 3D parallel Fast Fourier Transforms (FFTs). The mesh decomposition and FFT routines have been implemented in Python using serial FFT routines (from NumPy, pyFFTW or any other serial FFT module), NumPy array manipulations, and MPI communications handled by MPI for Python (mpi4py). We show how we are able to execute a 3D parallel FFT in Python for a slab mesh decomposition using 4 lines of compact Python code, for which the parallel performance on Shaheen is found to be slightly better than that of similar routines provided through the FFTW library. For a pencil mesh decomposition, 7 lines of code are required.
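
    In the spirit of the slab decomposition described (a hedged reconstruction, not the authors' published code; it assumes the grid size N is divisible by the number of MPI ranks), the forward real-to-complex transform amounts to two local FFTs, one global transpose, and one final FFT:

      import numpy as np
      from mpi4py import MPI
      from numpy.fft import rfft2, fft

      comm = MPI.COMM_WORLD
      P = comm.Get_size()
      N = 64                                  # global grid size; assumes N % P == 0
      Np, Nf = N // P, N // 2 + 1
      u = np.random.rand(Np, N, N)            # this rank's slab of the real field
      u_hatT = np.empty((Np, N, Nf), complex)
      u_hat = np.empty((N, Np, Nf), complex)

      u_hatT[:] = rfft2(u, axes=(1, 2))       # transform the two slab-local axes
      u_hat[:] = np.rollaxis(u_hatT.reshape(Np, P, Np, Nf), 1).reshape(u_hat.shape)
      comm.Alltoall(MPI.IN_PLACE, u_hat)      # global transpose across ranks
      u_hat[:] = fft(u_hat, axis=0)           # transform the remaining axis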

  9. Special purpose parallel computer architecture for real-time control and simulation in robotic applications

    NASA Technical Reports Server (NTRS)

    Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

    1993-01-01

    This is a real-time robotic controller and simulator (RRCS): a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine-grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.

  10. Virtual Simulator: An infrastructure for design and performance-prediction of massively parallel codes

    NASA Astrophysics Data System (ADS)

    Perumalla, K.; Fujimoto, R.; Pande, S.; Karimabadi, H.; Driscoll, J.; Omelchenko, Y.

    2005-12-01

    Large parallel/distributed scientific simulations are very complex, and their dynamic behavior is hard to predict. Efficient development of massively parallel codes remains a computational challenge. For example, almost none of the kinetic codes in use in space physics today have dynamic load balancing capability. Here we present a new infrastructure for the design and performance prediction of parallel codes. Performance prediction is useful to analyze, understand and experiment with different partitioning schemes, multiple modeling alternatives and so on, without having to run the application on supercomputers. Instrumentation of the model (with the least perturbation of performance) is useful to glean key metrics and understand application-level behavior. Unfortunately, traditional approaches to virtual execution and instrumentation are limited by either slow execution speed or low resolution or both. We present a new high-resolution framework that provides a virtual CPU abstraction (with a full thread context per CPU), yet scales to thousands of virtual CPUs. The tool, called PDES2, presents different levels of modeling interfaces, from general-purpose parallel simulations to parallel grid-based particle-in-cell (PIC) codes. The tool itself runs on multiple processors in order to accommodate the high resolution by distributing the virtual execution across processors. Validation experiments of PIC models in the framework using a 1-D hybrid shock application show close agreement of results from virtual executions with results from actual supercomputer runs. The utility of this tool is further illustrated through an application to a parallel global hybrid code.

  11. Parallel simulation of tsunami inundation on a large-scale supercomputer

    NASA Astrophysics Data System (ADS)

    Oishi, Y.; Imamura, F.; Sugawara, D.

    2013-12-01

    An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the

  12. Co-ordination of directional overcurrent protection with load current for parallel feeders

    SciTech Connect

    Wright, J.W.; Lloyd, G.; Hindle, P.J.

    1999-11-01

    Directional phase overcurrent relays are commonly applied at the receiving ends of parallel feeders or transformer feeders. Their purpose is to ensure full discrimination of main or back-up power system overcurrent protection for a fault near the receiving end of one feeder. This paper reviews this type of relay application and highlights load current setting constraints for directional protection. Such constraints have not previously been publicized in well-known textbooks. A directional relay current setting constraint that is suggested in some textbooks is based purely on thermal rating considerations for older technology relays. This constraint may not exist with modern numerical relays. In the absence of any apparent constraint, there is a temptation to adopt lower current settings with modern directional relays in relation to reverse load current at the receiving ends of parallel feeders. This paper identifies the danger of adopting very low current settings without any special relay feature to ensure protection security with load current during power system faults. A system incident recorded by numerical relays is also offered to highlight this danger. For cases where there is a need to infringe the identified constraints, an implemented and tested relaying technique is proposed.

  13. Transient dynamics simulations: Parallel algorithms for contact detection and smoothed particle hydrodynamics

    SciTech Connect

    Hendrickson, B.; Plimpton, S.; Attaway, S.; Swegle, J.

    1996-09-01

    Transient dynamics simulations are commonly used to model phenomena such as car crashes, underwater explosions, and the response of shipping containers to high-speed impacts. Physical objects in such a simulation are typically represented by Lagrangian meshes because the meshes can move and deform with the objects as they undergo stress. Fluids (gasoline, water) or fluid-like materials (earth) in the simulation can be modeled using the techniques of smoothed particle hydrodynamics. Implementing a hybrid mesh/particle model on a massively parallel computer poses several difficult challenges. One challenge is to simultaneously parallelize and load-balance both the mesh and particle portions of the computation. A second challenge is to efficiently detect the contacts that occur within the deforming mesh and between mesh elements and particles as the simulation proceeds. These contacts impart forces to the mesh elements and particles which must be computed at each timestep to accurately capture the physics of interest. In this paper we describe new parallel algorithms for smoothed particle hydrodynamics and contact detection which turn out to have several key features in common. Additionally, we describe how to join the new algorithms with traditional parallel finite element techniques to create an integrated particle/mesh transient dynamics simulation. Our approach to this problem differs from previous work in that we use three different parallel decompositions, a static one for the finite element analysis and dynamic ones for particles and for contact detection. We have implemented our ideas in a parallel version of the transient dynamics code PRONTO-3D and present results for the code running on a large Intel Paragon.

  14. Direct numerical simulation of hot jets

    NASA Technical Reports Server (NTRS)

    Jacob, Marc C.

    1993-01-01

    The ultimate motivation of this work is to investigate the stability of two dimensional heated jets and its implications for aerodynamic sound generation from data obtained with direct numerical simulations (DNS). As pointed out in our last report, these flows undergo two types of instabilities, convective or absolute, depending on their temperature. We also described the limits of earlier experimental and theoretical studies and explained why a numerical investigation could give us new insight into the physics of these instabilities. The aeroacoustical interest of these flows was also underlined. In order to reach this goal, we first need to succeed in the DNS of heated jets. Our past efforts have been focused on this issue which encountered several difficulties. Our numerical difficulties are directly related to the physical problem we want to investigate since these absolutely or almost absolutely unstable flows are by definition very sensitive to the smallest disturbances and are very likely to reach nonlinear saturation through a numerical feedback mechanism. As a result, it is very difficult to compute a steady laminar solution using a spatial DNS. A steady state was reached only for strongly co-flowed jets, but these flows are almost equivalent to two independent mixing layers. Thus they are far from absolute instability and have much lower growth rates.

  15. A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

    NASA Technical Reports Server (NTRS)

    Jones, Mark Howard

    1987-01-01

    A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.

  16. Object-Oriented NeuroSys: Parallel Programs for Simulating Large Networks of Biologically Accurate Neurons

    SciTech Connect

    Pacheco, P; Miller, P; Kim, J; Leese, T; Zabiyaka, Y

    2003-05-07

    Object-oriented NeuroSys (ooNeuroSys) is a collection of programs for simulating very large networks of biologically accurate neurons on distributed memory parallel computers. It includes two principal programs: ooNeuroSys, a parallel program for solving the large systems of ordinary differential equations arising from the interconnected neurons, and Neurondiz, a parallel program for visualizing the results of ooNeuroSys. Both programs are designed to be run on clusters and use the MPI library to obtain parallelism. ooNeuroSys also includes an easy-to-use Python interface, which allows neuroscientists to quickly develop and test complex neuron models. Both ooNeuroSys and Neurondiz have a design that allows for both high performance and relative ease of maintenance.

  17. Efficient parallel algorithm for statistical ion track simulations in crystalline materials

    NASA Astrophysics Data System (ADS)

    Jeon, Byoungseon; Grønbech-Jensen, Niels

    2009-02-01

    We present an efficient parallel algorithm for statistical Molecular Dynamics simulations of ion tracks in solids. The method is based on the Rare Event Enhanced Domain following Molecular Dynamics (REED-MD) algorithm, which has been successfully applied to studies of, e.g., ion implantation into crystalline semiconductor wafers. We discuss strategies for parallelizing the method and settle on a host-client polling scheme in which multiple asynchronous client processors continuously report to the host, which, in turn, distributes the resulting feedback information to the clients. This real-time feedback consists of, e.g., cumulative damage information or statistics updates necessary for the cloning in the rare-event algorithm. We finally demonstrate the algorithm for radiation effects in a nuclear oxide fuel, and we show the balanced parallel approach with high parallel efficiency in multiple processor configurations.

  18. Direct numerical simulation of inertial flows in porous media

    NASA Astrophysics Data System (ADS)

    Apte, S.; Finn, J.; Wood, B. D.

    2010-12-01

    At modest flow rates (10 ≤ Re ≤ 300) through porous media and packed beds, fluid inertia can result in complex steady and unsteady recirculation regions, dependent on the local pore geometry. Body fitted CFD is a broadly used design and analysis tool for flows in porous media and packed bed type reactors. Unfortunately, the inherent complexities of porous media make unstructured mesh generation a difficult and time consuming step in the simulation process. To accurately capture the inertial dynamics using high-fidelity direct simulations, body fitted meshes must be high quality and sufficiently refined. We present methods to parameterize and simplify mesh generation for packed beds, with an eye toward obtaining efficient mesh independence for Reynolds numbers in the inertial and unsteady regimes. The crux of mesh generation for packed beds is dealing with sphere-sphere or sphere-wall contact points, where a geometric singularity exists. To handle the sphere-sphere and sphere-wall contact points, we use a fillet bridge model, in which every pair of contacting entities are bridged by a fillet, eliminating a small fluid region near the contact point. This results in a continuous surface mesh which does not require resizing of the spheres and can accommodate prism cells for improved boundary layer resolution. A second order accurate, parallel, incompressible flow solver [Moin and Apte, AIAA J. 2006] is used to simulate flow through three different sphere packings: a periodic simple cubic packing, a wall bounded hexagonal close packing, and a randomly packed tube. Mesh independence is assessed using several measures including Ergun pressure drop coefficients, viscous and pressure components of drag force, kinetic energy, kinetic energy dissipation and interstitial velocity profiles. The results of these test cases are used to determine the feasibility of accurate and very large scale simulations of flow through a randomly packed bed of 10^3 pores. Preliminary results

  19. Wake Encounter Analysis for a Closely Spaced Parallel Runway Paired Approach Simulation

    NASA Technical Reports Server (NTRS)

    Mckissick,Burnell T.; Rico-Cusi, Fernando J.; Murdoch, Jennifer; Oseguera-Lohr, Rosa M.; Stough, Harry P, III; O'Connor, Cornelius J.; Syed, Hazari I.

    2009-01-01

    A Monte Carlo simulation of simultaneous approaches performed by two transport category aircraft from the final approach fix to a pair of closely spaced parallel runways was conducted to explore the aft boundary of the safe zone in which separation assurance and wake avoidance are provided. The simulation included variations in runway centerline separation, initial longitudinal spacing of the aircraft, crosswind speed, and aircraft speed during the approach. The data from the simulation showed that the majority of the wake encounters occurred near or over the runway and the aft boundaries of the safe zones were identified for all simulation conditions.

  20. A parallel computational framework for integrated surface-subsurface flow and transport simulations

    NASA Astrophysics Data System (ADS)

    Park, Y.; Hwang, H.; Sudicky, E. A.

    2010-12-01

    HydroGeoSphere is a 3D control-volume finite element hydrologic model describing fully-integrated surface and subsurface water flow and solute and thermal energy transport. Because the model solves tightly-coupled, highly-nonlinear partial differential equations, often applied at regional and continental scales (for example, to analyze the impact of climate change on water resources), high performance computing (HPC) is essential. The target parallelization includes the composition of the Jacobian matrix for the iterative linearization method and the sparse-matrix solver, a preconditioned Bi-CGSTAB. The matrix assembly is parallelized using a coarse-grained scheme in which the local matrix compositions can be performed independently. The preconditioned Bi-CGSTAB algorithm performs a number of LU substitutions, matrix-vector multiplications, and inner products, where the parallelization of the LU substitution is not trivial. The parallelization of the solver is achieved by partitioning the domain into equal-size subdomains, with an efficient reordering scheme. The computational flow of the Bi-CGSTAB solver is also modified to reduce the parallelization overhead and to be suitable for parallel architectures. The parallelized model is tested on several benchmark simulations which include linear and nonlinear flow problems involving various domain sizes and degrees of hydrologic complexity. The performance is evaluated in terms of computational robustness and efficiency, using standard scaling performance measures. The results of simulation profiling indicate that the efficiency becomes higher with an increasing number of nodes/elements in the mesh, for increasingly nonlinear transient simulations, and with domains of irregular geometry. These characteristics are promising for the large-scale analysis of water resources problems involving integrated surface/subsurface flow regimes.
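
    The solver pattern described, Bi-CGSTAB preconditioned with an incomplete LU factorization, can be illustrated with SciPy on a small stand-in system (HydroGeoSphere's own solver, with its subdomain partitioning and reordering, is considerably more involved):

      import numpy as np
      from scipy.sparse import csc_matrix, diags
      from scipy.sparse.linalg import LinearOperator, bicgstab, spilu

      n = 1000                                      # stand-in for a discretized flow problem
      A = csc_matrix(diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n)))
      b = np.ones(n)

      ilu = spilu(A, drop_tol=1e-5)                 # incomplete LU factorization of A
      M = LinearOperator((n, n), matvec=ilu.solve)  # preconditioner: apply M ~ A^-1
      x, info = bicgstab(A, b, M=M)                 # info == 0 signals convergence
      print(info, np.linalg.norm(A @ x - b))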

  1. Molecular simulation workflows as parallel algorithms: the execution engine of Copernicus, a distributed high-performance computing platform.

    PubMed

    Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik

    2015-06-01

    Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers, particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state the dependencies of each constituent part, algorithms only need to be described at a conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus. PMID:26575558

  2. IB: a Monte Carlo Simulation Tool for Neutron Scattering Instrument Design under Parallel Virtual Machine

    SciTech Connect

    Zhao, Jinkui

    2011-01-01

    IB is a Monte Carlo simulation tool for aiding neutron scattering instrument design. It is written in C++ and implemented under Parallel Virtual Machine (PVM). The program has a few basic components, or modules, that can be used to build a virtual neutron scattering instrument. More complex components, such as neutron guides and multichannel beam benders, can be constructed using the grouping technique unique to IB: users can specify a collection of modules as a group. For example, a neutron guide can be constructed by grouping four neutron mirrors together that make up the four sides of the guide. IB's simulation engine ensures that neutrons entering a group will be properly operated upon by all members of the group. For simulations that require higher computing speed, the program can be run in parallel mode under the PVM architecture. Although initially written for designing instruments at pulsed neutron sources, it has since been used to simulate reactor-based instruments as well.

  3. Direct and Inverse Kinematics of a Novel Tip-Tilt-Piston Parallel Manipulator

    NASA Technical Reports Server (NTRS)

    Tahmasebi, Farhad

    2004-01-01

    Closed-form direct and inverse kinematics of a new three degree-of-freedom (DOF) parallel manipulator with inextensible limbs and base-mounted actuators are presented. The manipulator has higher resolution and precision than existing three-DOF mechanisms with extensible limbs. Since all of the manipulator's actuators are base-mounted, higher payload capacity, smaller actuator sizes, and lower power dissipation can be obtained. The manipulator is suitable for alignment applications where only tip, tilt, and piston motions are significant. The direct kinematics of the manipulator is reduced to solving an eighth-degree polynomial in the square of the tangent of the half-angle between one of the limbs and the base plane. Hence, there are at most 16 assembly configurations for the manipulator. In addition, it is shown that the 16 solutions are eight pairs of reflected configurations with respect to the base plane. Numerical examples for the direct and inverse kinematics of the manipulator are also presented.

  4. Parallel Brownian dynamics simulations with the message-passing and PGAS programming models

    NASA Astrophysics Data System (ADS)

    Teijeiro, C.; Sutmann, G.; Taboada, G. L.; Touriño, J.

    2013-04-01

    The simulation of particle dynamics is among the most important mechanisms to study the behavior of molecules in a medium under specific conditions of temperature and density. Several models can be used to compute efficiently the forces that act on each particle, and also the interactions between them. This work presents the design and implementation of a parallel simulation code for the Brownian motion of particles in a fluid. Two different parallelization approaches have been followed: (1) using traditional distributed memory message-passing programming with MPI, and (2) using the Partitioned Global Address Space (PGAS) programming model, oriented towards hybrid shared/distributed memory systems, with the Unified Parallel C (UPC) language. Different techniques for domain decomposition and work distribution are analyzed in terms of efficiency and programmability, in order to select the most suitable strategy. Performance results on a supercomputer using up to 2048 cores are also presented for both MPI and UPC codes.
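
    The per-particle update that both the MPI and UPC codes domain-decompose is a single Euler-Maruyama step of overdamped Brownian dynamics. A serial sketch of that kernel (the force model and parameters are illustrative):

      import numpy as np

      def brownian_step(x, force, dt, gamma, kT, rng):
          # drift from deterministic forces plus Gaussian noise whose variance
          # 2*kT*dt/gamma follows from the fluctuation-dissipation relation
          drift = force(x) * dt / gamma
          noise = rng.normal(0.0, np.sqrt(2.0 * kT * dt / gamma), size=x.shape)
          return x + drift + noise

      rng = np.random.default_rng(0)
      x = rng.normal(size=(1000, 3))          # 1000 particles in three dimensions
      harmonic = lambda pos: -pos             # toy restoring force toward the origin
      for _ in range(10_000):
          x = brownian_step(x, harmonic, dt=1e-3, gamma=1.0, kT=1.0, rng=rng)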

  5. cuTauLeaping: A GPU-Powered Tau-Leaping Stochastic Simulator for Massive Parallel Analyses of Biological Systems

    PubMed Central

    Besozzi, Daniela; Pescini, Dario; Mauri, Giancarlo

    2014-01-01

    Tau-leaping is a stochastic simulation algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of simulations, leading to high computational costs. Since each simulation can be executed independently from the others, a massive parallelization of tau-leaping can bring to relevant reductions of the overall running time. The emerging field of General Purpose Graphic Processing Units (GPGPU) provides power-efficient high-performance computing at a relatively low cost. In this work we introduce cuTauLeaping, a stochastic simulator of biological systems that makes use of GPGPU computing to execute multiple parallel tau-leaping simulations, by fully exploiting the Nvidia's Fermi GPU architecture. We show how a considerable computational speedup is achieved on GPU by partitioning the execution of tau-leaping into multiple separated phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms the CPU-based tau-leaping implementation when the number of parallel simulations increases, with a break-even directly depending on the size of the biological system and on the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a bidimensional parameter sweep analysis to study the oscillatory regimes in the Ras/cAMP/PKA pathway in S. cerevisiae. PMID:24663957
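
    The kernel replicated across GPU threads is the tau-leaping update itself: sample Poisson firing counts for every reaction over a leap of length tau and apply the stoichiometry. A serial, single-trajectory sketch (the GPU kernel partitioning and safeguards such as negativity checks are omitted):

      import numpy as np

      def tau_leap_step(x, tau, stoich, propensities, rng):
          # stoich has shape (n_reactions, n_species); x holds species counts
          a = propensities(x)             # propensity a_j(x) of each reaction channel
          k = rng.poisson(a * tau)        # number of firings of each reaction in tau
          return x + stoich.T @ k

      # usage: birth-death process, 0 -> X at rate 10, X -> 0 at rate 0.5 per molecule
      stoich = np.array([[1], [-1]])
      props = lambda x: np.array([10.0, 0.5 * x[0]])
      rng = np.random.default_rng(1)
      x = np.array([0])
      for _ in range(1000):
          x = tau_leap_step(x, 0.01, stoich, props, rng)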

  6. Explicit spatial scattering for load balancing in conservatively synchronized parallel discrete-event simulations

    SciTech Connect

    Thulasidasan, Sunil; Kasiviswanathan, Shiva; Eidenbenz, Stephan; Romero, Philip

    2010-01-01

    We re-examine the problem of load balancing in conservatively synchronized parallel discrete-event simulations executed on high-performance computing clusters, focusing on simulations where computational and messaging load tend to be spatially clustered. Such domains are frequently characterized by the presence of geographic 'hot-spots' - regions that generate significantly more simulation events than others. Examples of such domains include simulations of urban regions, transportation networks, and networks where interaction between entities is often constrained by physical proximity. Noting that in conservatively synchronized parallel simulations the speed of execution is determined by the slowest (i.e., most heavily loaded) simulation process, we study different partitioning strategies for achieving equitable processor-load distribution in domains with spatially clustered load. In particular, we study the effectiveness of partitioning via spatial scattering in achieving optimal load balance. In this partitioning technique, nearby entities are explicitly assigned to different processors, thereby scattering the load across the cluster. This is motivated by two observations, namely, (i) since load is spatially clustered, spatial scattering should, intuitively, spread the load across the compute cluster, and (ii) in parallel simulations, equitable distribution of CPU load is a greater determinant of execution speed than message-passing overhead. Through large-scale simulation experiments - both of abstracted and real simulation models - we observe that scatter partitioning, even with its greatly increased messaging overhead, significantly outperforms more conventional spatial partitioning techniques that seek to reduce messaging overhead. Further, even if hot-spots change over the course of the simulation, as long as the underlying feature of spatial clustering is retained, load continues to be balanced with spatial scattering, leading us to the observation that scatter partitioning is largely robust to temporal changes in the spatial distribution of load.
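
    The intuition behind scatter partitioning is easy to reproduce on a synthetic load map. A minimal sketch with a made-up hot-spot, comparing a contiguous strip ("block") partition against modulo-based spatial scattering; all numbers are illustrative.

```python
# Sketch: load imbalance of block vs. scatter partitioning on a grid
# with one spatial hot-spot (max/mean = 1.0 is perfect balance).
import numpy as np

rng = np.random.default_rng(1)
n, P = 64, 8                              # n x n grid, P processors
load = rng.random((n, n))
load[8:16, 8:16] += 50.0                  # a geographic hot-spot

i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
block = i // (n // P)                     # contiguous strips of rows
scatter = (i + j) % P                     # nearby cells on different ranks

def imbalance(owner):
    per_rank = np.array([load[owner == r].sum() for r in range(P)])
    return per_rank.max() / per_rank.mean()

print("block  :", round(imbalance(block), 2))    # hot-spot overloads one rank
print("scatter:", round(imbalance(scatter), 2))  # close to 1.0
```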

  7. Xyce parallel electronic simulator reference guide, Version 6.0.1.

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.

    2014-01-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].

  8. Study of Fracture in SiC by Parallel Molecular Dynamics Simulations

    NASA Astrophysics Data System (ADS)

    Chatterjee, A.; Omeltchenko, A.; Kalia, R. K.; Vashishta, P.

    1997-03-01

    Large scale molecular-dynamics simulations are performed on parallel architectures to investigate dynamic fracture in SiC. The simulations are based on an empirical bond-order potential proposed by Tersoff [J. Tersoff, Phys. Rev. B 39, 5566 (1989); M. Tang and S. Yip, Phys. Rev. B 52, 15150 (1995)]. Results will be presented for crack-front morphology, crack-tip speed, and the effect of strain rate on dynamic fracture.

  9. Massively parallel simulation of flow and transport in variably saturated porous and fractured media

    SciTech Connect

    Wu, Yu-Shu; Zhang, Keni; Pruess, Karsten

    2002-01-15

    This paper describes a massively parallel simulation method and its application to modeling multiphase flow and multicomponent transport in porous and fractured reservoirs. The parallel-computing method has been implemented in the TOUGH2 code and its numerical performance is tested on a Cray T3E-900 and an IBM SP. The efficiency and robustness of the parallel-computing algorithm are demonstrated by completing two simulations with more than one million gridblocks, using site-specific data obtained from a site-characterization study. The first application involves the development of a three-dimensional numerical model for flow in the unsaturated zone of Yucca Mountain, Nevada. The second application is the study of tracer/radionuclide transport through fracture-matrix rocks for the same site. The parallel-computing technique enhances modeling capabilities by achieving several-orders-of-magnitude speedup for large-scale, high-resolution modeling studies. The modeling results provide many new insights into flow and transport processes that could not be obtained from simulations using the single-CPU simulator.

  10. A Plane-Parallel Wind Solution for Testing Numerical Simulations of Photoevaporation

    NASA Astrophysics Data System (ADS)

    Hutchison, Mark A.; Laibe, Guillaume

    2016-04-01

    Here, we derive a Parker-wind-like solution for a stratified, plane-parallel atmosphere undergoing photoionisation. The difference compared to the standard Parker solar wind is that the sonic point is crossed only at infinity. The simplicity of the analytic solution makes it a convenient test problem for numerical simulations of photoevaporation in protoplanetary discs.
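
    For orientation, the classical spherically symmetric isothermal Parker wind, the reference case that this plane-parallel solution modifies, obeys the standard momentum equation below (textbook form, not the paper's new solution, whose sonic point moves to infinity):

```latex
% Standard isothermal Parker wind (spherical reference case):
\begin{equation}
  \left( v - \frac{c_s^2}{v} \right) \frac{dv}{dr}
    = \frac{2 c_s^2}{r} - \frac{G M_\ast}{r^2},
\end{equation}
% with the sonic point v = c_s crossed at the finite critical radius
% r_c = G M_\ast / (2 c_s^2).
```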

  11. Electro-optic directed XOR logic circuits based on parallel-cascaded micro-ring resonators.

    PubMed

    Tian, Yonghui; Zhao, Yongpeng; Chen, Wenjie; Guo, Anqi; Li, Dezhao; Zhao, Guolin; Liu, Zilong; Xiao, Huifu; Liu, Guipeng; Yang, Jianhong

    2015-10-01

    We report an electro-optic photonic integrated circuit which can perform the exclusive-OR (XOR) logic operation, based on two silicon parallel-cascaded microring resonators (MRRs) fabricated on the silicon-on-insulator (SOI) platform. PIN diodes embedded around the MRRs are employed to achieve carrier-injection modulation. Two electrical pulse sequences, regarded as the two operands of the operation, are applied to the PIN diodes to modulate the two MRRs through the free-carrier dispersion effect. The final result of the operation on the two operands is output at the Output port in the form of light. The scattering matrix method is employed to establish a numerical model of the device, and the numerical simulator SG-framework is used to simulate the electrical characteristics of the PIN diodes. XOR operation at a speed of 100 Mbps is demonstrated successfully. PMID:26480148

  12. Accelerating Markov chain Monte Carlo simulation through sequential updating and parallel computing

    NASA Astrophysics Data System (ADS)

    Ren, Ruichao

    Monte Carlo simulation is a statistical sampling method used in studies of physical systems with properties that cannot be easily obtained analytically. The phase behavior of the Restricted Primitive Model of electrolyte solutions on the simple cubic lattice is studied using grand canonical Monte Carlo simulations and finite-size scaling techniques. The transition between the disordered and ordered, NaCl-like structures is continuous and second-order at high temperatures, and discontinuous and first-order at low temperatures. The line of continuous transitions meets the line of first-order transitions at a tricritical point. A new algorithm, Random Skipping Sequential (RSS) Monte Carlo, is proposed, justified, and shown analytically to have better mobility over the phase space than the conventional Metropolis algorithm satisfying strict detailed balance. The new algorithm employs sequential updating and yields greatly enhanced sampling statistics compared with the Metropolis algorithm with random updating. A parallel version of Markov chain theory is introduced and applied to accelerating Monte Carlo simulation via cluster computing. It is shown that sequential updating is the key to reducing the inter-processor communication or synchronization which slows down parallel simulation as the number of processors increases. Parallel simulation results for the two-dimensional lattice gas model show a substantial reduction of simulation time by the new method for systems of large and moderate sizes.
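
    The two updating orders being contrasted are easy to sketch on a small 2-D Ising model as a stand-in for the lattice systems above; this is plain Metropolis, not the RSS algorithm itself, and all parameters are illustrative.

```python
# Sketch: sequential (raster-order) site updating in a Metropolis
# simulation; random updating would instead draw (i, j) uniformly.
import numpy as np

rng = np.random.default_rng(2)
n, beta, sweeps = 32, 0.4, 200
s = rng.choice([-1, 1], size=(n, n))

def metropolis_update(i, j):
    nb = s[(i + 1) % n, j] + s[(i - 1) % n, j] + s[i, (j + 1) % n] + s[i, (j - 1) % n]
    dE = 2.0 * s[i, j] * nb                  # energy change of flipping s[i, j]
    if dE <= 0 or rng.random() < np.exp(-beta * dE):
        s[i, j] = -s[i, j]

for _ in range(sweeps):
    for i in range(n):                       # fixed raster order: sequential updating
        for j in range(n):
            metropolis_update(i, j)

print("magnetization per site:", s.mean())
```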

  13. Direct methods for banded linear systems on massively parallel processor computers

    SciTech Connect

    Arbenz, P.; Gander, W.

    1995-12-01

    The authors discuss direct methods for solving systems of linear equations Ax = b, A ∈ ℝ^(n×n), on massively parallel processor (MPP) computers. Here, A is a real banded n×n matrix with lower and upper half-bandwidths r and s, respectively. We assume that the matrix A has a narrow band, meaning r + s << n. Only in this case is it worthwhile to take the zero structure of A into account, i.e., to store the matrix by diagonals and modify the algorithms accordingly.
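
    SciPy exposes exactly this storage-by-diagonals convention, so the narrow-band case is easy to illustrate; below, a tridiagonal system (r = s = 1) with arbitrary values.

```python
# Sketch: store a banded matrix by diagonals and solve in O(n) work.
import numpy as np
from scipy.linalg import solve_banded

n, r, s = 6, 1, 1                 # lower and upper half-bandwidths
ab = np.zeros((r + s + 1, n))     # one row of ab per diagonal of A
ab[0, 1:] = -1.0                  # superdiagonal
ab[1, :] = 2.0                    # main diagonal
ab[2, :-1] = -1.0                 # subdiagonal
b = np.ones(n)

x = solve_banded((r, s), ab, b)   # banded LU instead of dense O(n**3)
print(x)
```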

  14. Design and fabrication of diffractive microlens arrays with continuous relief for parallel laser direct writing.

    PubMed

    Tan, Jiubin; Shan, Mingguang; Zhao, Chenguang; Liu, Jian

    2008-04-01

    Diffractive microlens arrays with continuous relief are designed, fabricated, and characterized by using Fermat's principle to create an array of spots on the photoresist-coated surface of a substrate for parallel laser direct writing. Experimental results indicate that a diffraction efficiency of 71.4% and a spot size of 1.97 microm (FWHM) can be achieved at normal incidence and a writing laser wavelength of 441.6 nm with an array of F/4 fabricated on fused silica, and the developed array can be used to improve the utilization ratio of writing laser energy. PMID:18382568

  15. A parallel finite element simulator for ion transport through three-dimensional ion channel systems.

    PubMed

    Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo

    2013-09-15

    A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain to generate the surface and volume meshes for ion channel systems, which consists of a set of mesh generation tools. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG), developed by one of the authors, which provides the capability of doing large-scale parallel computations with high parallel efficiency and the flexibility of choosing high-order elements to achieve high-order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I-V curve, with which both the primitive and transformed PNP equations are studied and their numerical performances compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage-dependent anion channel (VDAC) and α-Hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite-size effects can now be included in the PNP model, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL. It is shown that the size effects in SMPNP can
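
    For reference, the steady-state PNP system in its standard textbook form (the primitive equations; the transformed and size-modified variants studied in the paper build on these):

```latex
% Steady-state Poisson-Nernst-Planck equations (standard form):
\begin{align}
  -\nabla \cdot \left( \epsilon \nabla \phi \right)
      &= \rho_{\mathrm{fixed}} + \sum_i z_i e \, c_i, \\
  \nabla \cdot J_i = 0, \qquad
  J_i &= -D_i \left( \nabla c_i + \frac{z_i e}{k_B T} \, c_i \nabla \phi \right),
\end{align}
% phi: electrostatic potential; c_i, z_i, D_i: concentration, valence
% and diffusivity of ion species i.
```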

  16. Direct Numerical Simulation of the Leidenfrost Effect

    NASA Astrophysics Data System (ADS)

    Tanguy, Sebastien; Rueda Villegas, Lucia; Fluid Mechanics Institute of Toulouse Team

    2015-11-01

    The development of numerical methods for the direct numerical simulation of two-phase flows with phase change is the main topic of this study. We propose a novel numerical method which allows dealing with both evaporation and boiling at the interface between a liquid and a gas. For instance, both can occur for a Leidenfrost droplet: a water drop levitating above a hot plate whose temperature is much higher than the boiling temperature. In this case, boiling occurs in the film of saturated vapor entrapped between the bottom of the drop and the plate, whereas the top of the water droplet evaporates in contact with ambient air. Thus, boiling and evaporation can occur simultaneously on different regions of the same liquid interface, or occur successively at different times in the history of an evaporating droplet. Usual numerical methods are not able to perform computations in these transient regimes; therefore, we propose a novel numerical method to achieve this challenging task. Finally, we present several accurate validations against experimental results on Leidenfrost droplets to strengthen the relevance of this new method.

  17. Direct numerical simulations of aeolian sand ripples

    PubMed Central

    Durán, Orencio; Claudin, Philippe; Andreotti, Bruno

    2014-01-01

    Aeolian sand beds exhibit regular patterns of ripples resulting from the interaction between topography and sediment transport. Their characteristics have been so far related to reptation transport caused by the impacts on the ground of grains entrained by the wind into saltation. By means of direct numerical simulations of grains interacting with a wind flow, we show that the instability turns out to be driven by resonant grain trajectories, whose length is close to a ripple wavelength and whose splash leads to a mass displacement toward the ripple crests. The pattern selection results from a compromise between this destabilizing mechanism and a diffusive downslope transport which stabilizes small wavelengths. The initial wavelength is set by the ratio of the sediment flux and the erosion/deposition rate, a ratio which increases linearly with the wind velocity. We show that this scaling law, in agreement with experiments, originates from an interfacial layer separating the saltation zone from the static sand bed, where momentum transfers are dominated by midair collisions. Finally, we provide quantitative support for the use of the propagation of these ripples as a proxy for remote measurements of sediment transport. PMID:25331873

  18. Direct numerical simulations of aeolian sand ripples.

    PubMed

    Durán, Orencio; Claudin, Philippe; Andreotti, Bruno

    2014-11-01

    Aeolian sand beds exhibit regular patterns of ripples resulting from the interaction between topography and sediment transport. Their characteristics have been so far related to reptation transport caused by the impacts on the ground of grains entrained by the wind into saltation. By means of direct numerical simulations of grains interacting with a wind flow, we show that the instability turns out to be driven by resonant grain trajectories, whose length is close to a ripple wavelength and whose splash leads to a mass displacement toward the ripple crests. The pattern selection results from a compromise between this destabilizing mechanism and a diffusive downslope transport which stabilizes small wavelengths. The initial wavelength is set by the ratio of the sediment flux and the erosion/deposition rate, a ratio which increases linearly with the wind velocity. We show that this scaling law, in agreement with experiments, originates from an interfacial layer separating the saltation zone from the static sand bed, where momentum transfers are dominated by midair collisions. Finally, we provide quantitative support for the use of the propagation of these ripples as a proxy for remote measurements of sediment transport. PMID:25331873

  19. Direct numerical simulation of active fiber composite

    NASA Astrophysics Data System (ADS)

    Kim, Seung J.; Hwang, Joon S.; Paik, Seung H.

    2003-08-01

    Active Fiber Composites (AFC) possess desirable characteristics for smart structure applications. One major advantage of AFC is the ability to create anisotropic laminate layers useful in applications requiring off-axis or twisting motions. AFC is naturally composed of two different constituents: piezoelectric fiber and matrix. Therefore, the homogenization method, which is utilized in the analysis of laminated composite materials, has been used to characterize the material properties. Using this approach, the global behaviors of the structures are predicted in an averaged sense. However, this approach has intrinsic limitations in describing the local behaviors at the level of the constituents. In fact, the failure analysis of AFC requires knowledge of the local behaviors, so a microscopic approach is necessary to predict the behaviors of AFC. In this work, a microscopic analysis of AFC was performed. Piezoelectric fiber and matrix were modeled separately, and the finite element method using three-dimensional solid elements was utilized. Because a fine mesh is essential, high-performance computing technology was applied to the solution of the resulting problem with its immense number of degrees of freedom. This approach is called Direct Numerical Simulation (DNS) of structures. Through the DNS of AFC, the local stress distribution around the interface of fiber and matrix was analyzed.

  20. Direct Numerical Simulation of Cell Printing

    NASA Astrophysics Data System (ADS)

    Qiao, Rui; He, Ping

    2010-11-01

    Structural cell printing, i.e., printing three-dimensional (3D) structures of cells held in a tissue matrix, is gaining significant attention in the biomedical community. The key idea is to use a desktop printer or similar device to print cells into 3D patterns with a resolution comparable to the size of mammalian cells, similar to that in living organs. Achieving such a resolution in vitro can lead to breakthroughs in areas such as organ transplantation and the understanding of cell-cell interactions in truly 3D spaces. Although the feasibility of cell printing has been demonstrated in recent years, the printing resolution and cell viability remain to be improved. In this work, we investigate one of the unit operations in cell printing, namely, the impact of a cell-laden droplet into a pool of highly viscous liquid, using direct numerical simulations. The dynamics of droplet impact (e.g., crater formation and droplet spreading and penetration) and the evolution of cell shape and internal stress are quantified in detail.

  1. Parallel peridynamics-SPH simulation of explosion induced soil fragmentation by using OpenMP

    NASA Astrophysics Data System (ADS)

    Fan, Houfu; Li, Shaofan

    2016-06-01

    In this work, we use OpenMP-based shared-memory parallel programming to implement the recently developed coupling of state-based peridynamics and smoothed particle hydrodynamics (PD-SPH), and we then employ the program to simulate dynamic soil fragmentation induced by the explosion of buried explosives. The paper offers a detailed technical description and discussion of the PD-SPH coupling algorithm and of how to use OpenMP shared-memory programming to implement such large-scale computation in a desktop environment, with an example to illustrate the basic computing principle and the parallel algorithm structure. Specifically, the paper provides a complete OpenMP parallel algorithm for the PD-SPH scheme with the programming and parallelization details. Numerical examples of soil fragmentation caused by buried explosives are also presented. Results show that the simulation carried out by the OpenMP parallel code is much faster than that by the corresponding serial code.

  2. The distributed diagonal force decomposition method for parallelizing molecular dynamics simulations.

    PubMed

    Borštnik, Urban; Miller, Benjamin T; Brooks, Bernard R; Janežič, Dušanka

    2011-11-15

    Parallelization is an effective way to reduce the computational time needed for molecular dynamics simulations. We describe a new parallelization method, the distributed-diagonal force decomposition method, with which we extend and improve the existing force decomposition methods. Our new method requires less data communication during molecular dynamics simulations than replicated-data and current force decomposition methods, increasing the parallel efficiency. It also dynamically balances the processors' computational load throughout the simulation. The method is readily implemented in existing molecular dynamics codes and has been incorporated into the CHARMM program, allowing its immediate use in conjunction with the many molecular dynamics simulation techniques already present in the program. We also present the design of the Force Decomposition Machine, a cluster of personal computers and networks that is tailored to running molecular dynamics simulations using the distributed-diagonal force decomposition method. The design is expandable and provides various degrees of fault resilience. The approach is easily adaptable to computers with Graphics Processing Units because it is independent of the processor type being used. PMID:21793007
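
    The basic idea of force decomposition is easy to sketch: partition the N x N pair-interaction matrix into blocks and give each processor the pairs of one block. The toy layout below is a generic block scheme, not the distributed-diagonal placement itself; note how the triangular i < j filter leaves some blocks with far fewer pairs, exactly the kind of imbalance the distributed-diagonal refinement is designed to remove.

```python
# Sketch: a generic block force decomposition of the pair matrix.
import numpy as np

N, P = 12, 4                    # particles, processors
side = int(np.sqrt(P))          # assume P is a perfect square
chunk = N // side

def owned_pairs(rank):
    bi, bj = divmod(rank, side)             # this rank's block coordinates
    rows = range(bi * chunk, (bi + 1) * chunk)
    cols = range(bj * chunk, (bj + 1) * chunk)
    return [(i, j) for i in rows for j in cols if i < j]  # count each pair once

for rank in range(P):
    print(f"rank {rank} evaluates {len(owned_pairs(rank)):2d} pairs")
```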

  3. Parallel Grand Canonical Monte Carlo (ParaGrandMC) Simulation Code

    NASA Technical Reports Server (NTRS)

    Yamakov, Vesselin I.

    2016-01-01

    This report provides an overview of the Parallel Grand Canonical Monte Carlo (ParaGrandMC) simulation code. This is a highly scalable parallel FORTRAN code for simulating the thermodynamic evolution of metal alloy systems at the atomic level, and predicting the thermodynamic state, phase diagram, chemical composition and mechanical properties. The code is designed to simulate multi-component alloy systems, predict solid-state phase transformations such as austenite-martensite transformations, precipitate formation, recrystallization, capillary effects at interfaces, surface absorption, etc., which can aid the design of novel metallic alloys. While the software is mainly tailored for modeling metal alloys, it can also be used for other types of solid-state systems, and to some degree for liquid or gaseous systems, including multiphase systems forming solid-liquid-gas interfaces.
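
    As a reminder of what a grand canonical move looks like, here is a minimal sketch of μVT insertion/deletion for a non-interacting lattice gas; the acceptance rules are the standard ones, while ParaGrandMC's alloy Hamiltonians and parallelism are far richer than this toy.

```python
# Sketch: grand canonical MC for an ideal lattice gas (energy = 0).
import numpy as np

rng = np.random.default_rng(3)
M, mu, beta, steps = 1000, -1.0, 2.0, 200_000   # sites, chem. potential, 1/kT
occ = np.zeros(M, dtype=bool)

for _ in range(steps):
    site = rng.integers(M)
    if rng.random() < 0.5:                          # attempt insertion
        if not occ[site] and rng.random() < np.exp(beta * mu):
            occ[site] = True
    else:                                           # attempt deletion
        if occ[site] and rng.random() < np.exp(-beta * mu):
            occ[site] = False

# Ideal lattice gas: occupancy approaches 1 / (1 + exp(-beta*mu)).
print(occ.mean(), "expected", 1.0 / (1.0 + np.exp(-beta * mu)))
```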

  4. Parallel Simulation Algorithms for the Three Dimensional Strong-Strong Beam-Beam Interaction

    SciTech Connect

    Kabel, A.C.; /SLAC

    2008-03-17

    The strong-strong beam-beam effect is one of the most important effects limiting the luminosity of ring colliders. Little is known about it analytically, so most studies utilize numerical simulations. The two-dimensional realm is readily accessible to workstation-class computers (cf., e.g., [1, 2]), while three dimensions, which add effects such as phase averaging and the hourglass effect, require vastly more CPU time. Thus, parallelization of three-dimensional simulation techniques is imperative; in the following we discuss parallelization strategies and describe the algorithms used in our simulation code, which reaches almost linear scaling of performance vs. number of CPUs for typical setups.

  5. Development of a MEMS electrostatic condenser lens array for nc-Si surface electron emitters of the Massive Parallel Electron Beam Direct-Write system

    NASA Astrophysics Data System (ADS)

    Kojima, A.; Ikegami, N.; Yoshida, T.; Miyaguchi, H.; Muroyama, M.; Yoshida, S.; Totsu, K.; Koshida, N.; Esashi, M.

    2016-03-01

    The development of a Micro Electro-Mechanical System (MEMS) electrostatic Condenser Lens Array (CLA) for a Massively Parallel Electron Beam Direct Write (MPEBDW) lithography system is described. The CLA converges parallel electron beams for fine patterning. The structure of the CLA was designed on the basis of finite element method (FEM) simulation. The lens was fabricated by precision machining and assembled with a nanocrystalline silicon (nc-Si) electron emitter array serving as the electron source of the MPEBDW system. The nc-Si electron emitter has the advantage that a vertically emitted surface electron beam can be obtained without any extractor electrodes. FEM simulation of the electron optics showed that the size of the beam emitted from the electron emitter is reduced to 15% in the radial direction, and the divergence angle is reduced to 1/18.

  6. Application of parallel computing to seismic damage process simulation of an arch dam

    NASA Astrophysics Data System (ADS)

    Zhong, Hong; Lin, Gao; Li, Jianbo

    2010-06-01

    The simulation of the damage process of a high arch dam subjected to strong earthquake shocks is significant for the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation requires substantial computational capacity. Conventional serial computing falls short of that, and parallel computing is a promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams, utilizing a damage model that accounts for the heterogeneity of concrete. Written in Fortran, the code uses a master/slave programming mode, the domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication, and solvers from the AZTEC library for the solution of large-scale equations. A speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of an arch dam under construction on a 4-node PC cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of a shaking table test, indicating that the proposed procedure and parallel code PDPAD have good potential for simulating the seismic damage mode of arch dams. With the rapidly growing need for massive computation emerging from engineering problems, parallel computing will find more and more applications in pertinent areas.

  7. Mesoscale Simulations of Particulate Flows with Parallel Distributed Lagrange Multiplier Technique

    SciTech Connect

    Kanarska, Y

    2010-03-24

    Fluid particulate flows are common phenomena in nature and industry. Modeling of such flows at the micro and macro levels, as well as establishing relationships between these approaches, is needed to understand the properties of particulate matter. We propose a computational technique based on the direct numerical simulation of particulate flows. The numerical method is based on the distributed Lagrange multiplier technique, following the ideas of Glowinski et al. (1999). Each particle is explicitly resolved on an Eulerian grid as a separate domain, using solid volume fractions. The fluid equations are solved through the entire computational domain; however, Lagrange multiplier constraints are applied inside the particle domain such that the fluid within any volume associated with a solid particle moves as an incompressible rigid body. Mutual forces for the fluid-particle interactions are internal to the system. Particles interact with the fluid via the fluid dynamic equations, resulting in implicit fluid-rigid-body coupling relations that produce realistic fluid flow around the particles (i.e., no-slip boundary conditions). The particle-particle interactions are implemented using explicit force-displacement interactions for frictional inelastic particles, similar to the DEM method of Cundall et al. (1979), with some modifications using the volume of the overlapping region as an input to the contact forces. The method is flexible enough to handle arbitrary particle shapes and size distributions. A parallel implementation of the method is based on the SAMRAI (Structured Adaptive Mesh Refinement Application Infrastructure) library, which allows handling of large numbers of rigid particles and enables local grid refinement. Accuracy and convergence of the presented method have been tested against known solutions for a falling sphere as well as by examining fluid flows through stationary particle beds (periodic and cubic packing). To evaluate code performance and validate particle

  8. Re-forming supercritical quasi-parallel shocks. I - One- and two-dimensional simulations

    NASA Technical Reports Server (NTRS)

    Thomas, V. A.; Winske, D.; Omidi, N.

    1990-01-01

    The process of reforming supercritical quasi-parallel shocks is investigated using one-dimensional and two-dimensional hybrid (particle ion, massless fluid electron) simulations, both of shocks and of simpler two-stream interactions. It is found that the supercritical quasi-parallel shock is not steady. Instead of a well-defined shock ramp between upstream and downstream states that remains at a fixed position in the flow, the ramp periodically steepens, broadens, and then reforms upstream of its former position. It is concluded that the wave generation process is localized at the shock ramp and that the reformation process proceeds in the absence of upstream perturbations intersecting the shock.

  9. Progress on the Multiphysics Capabilities of the Parallel Electromagnetic ACE3P Simulation Suite

    SciTech Connect

    Kononenko, Oleksiy

    2015-03-26

    ACE3P is a 3D parallel simulation suite that is being developed at SLAC National Accelerator Laboratory. Effectively utilizing supercomputer resources, ACE3P has become a key tool for the coupled electromagnetic, thermal and mechanical research and design of particle accelerators. Based on the existing finite-element infrastructure, a massively parallel eigensolver is developed for modal analysis of mechanical structures. It complements a set of the multiphysics tools in ACE3P and, in particular, can be used for the comprehensive study of microphonics in accelerating cavities ensuring the operational reliability of a particle accelerator.

  10. Parallel simulations of Grover's algorithm for closest match search in neutron monitor data

    NASA Astrophysics Data System (ADS)

    Kussainov, Arman; White, Yelena

    We study parallel implementations of Grover's closest-match search algorithm for neutron monitor data analysis. This includes data formatting and the matching of quantum parameters to the conventional structures of a chosen programming language and the selected experimental data type. We have employed several workload distribution models based on the acquired data and search parameters. As a result of these simulations, we have an understanding of potential problems that may arise during the configuration of real quantum computational devices and of the way they could run tasks in parallel. The work was supported by the Science Committee of the Ministry of Science and Education of the Republic of Kazakhstan, Grant #2532/GF3.
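
    A classical statevector simulation of the Grover iteration, the kernel whose parallel execution the abstract discusses, fits in a few lines; the marked index standing in for the closest match is a hypothetical placeholder.

```python
# Sketch: classical simulation of Grover's search over N = 2**n states.
import numpy as np

n_qubits = 10
N = 2 ** n_qubits
marked = 731                        # hypothetical index of the closest match

amp = np.full(N, 1.0 / np.sqrt(N))  # uniform superposition
iters = int(np.pi / 4 * np.sqrt(N))
for _ in range(iters):
    amp[marked] *= -1.0             # oracle: phase-flip the marked state
    amp = 2.0 * amp.mean() - amp    # diffusion: inversion about the mean

print(f"after {iters} iterations, P(marked) = {amp[marked]**2:.3f}")
```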

  11. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

    NASA Astrophysics Data System (ADS)

    Abraham, Mark James; Murtola, Teemu; Schulz, Roland; Páll, Szilárd; Smith, Jeremy C.; Hess, Berk; Lindahl, Erik

    2015-09-01

    GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types and preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU-GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.

  12. Application of parallel computing techniques to a large-scale reservoir simulation

    SciTech Connect

    Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten

    2001-02-01

    Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from the intensive computational requirements of detailed modeling investigations of real-world reservoirs. This paper presents the application of a massively parallel computing version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of future flow conditions at the site, aiding in the assessment of proposed repository performance.

  13. Design of a real-time wind turbine simulator using a custom parallel architecture

    NASA Technical Reports Server (NTRS)

    Hoffman, John A.; Gluck, R.; Sridhar, S.

    1995-01-01

    The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for the analysis of wind energy systems in real time. The new processor has been named the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CUs) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CUs are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an I/O operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CUs are interfaced to each other and to other portions of the simulator using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CUs can be added in any number to share a given computational load. This flexible bus feature is very different from that of many other parallel processors, which usually have a throughput limit because of rigid bus architecture.

  14. A method for data handling numerical results in parallel OpenFOAM simulations

    NASA Astrophysics Data System (ADS)

    Anton, Alin; Muntean, Sebastian

    2015-12-01

    Parallel computational fluid dynamics simulations produce vast amounts of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM® toolkit [1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating-point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large-scale simulation results than the regular algorithms.

  15. Adventures in Parallel Processing: Entry, Descent and Landing Simulation for the Genesis and Stardust Missions

    NASA Technical Reports Server (NTRS)

    Lyons, Daniel T.; Desai, Prasun N.

    2005-01-01

    This paper describes the Entry, Descent and Landing simulation tradeoffs and techniques that were used to provide the Monte Carlo data required to approve entry during a critical period just before entry of the Genesis Sample Return Capsule. The same techniques will be used again when Stardust returns on January 15, 2006. Only one hour was available for the simulation, which propagated 2000 dispersed entry states to the ground. Creative simulation tradeoffs combined with parallel processing were needed to provide the landing footprint statistics that were an essential part of the Go/NoGo decision that authorized release of the Sample Return Capsule a few hours before entry.

  16. A method for data handling numerical results in parallel OpenFOAM simulations

    SciTech Connect

    Anton, Alin; Muntean, Sebastian

    2015-12-31

    Parallel computational fluid dynamics simulations produce vast amounts of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM® toolkit [1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating-point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large-scale simulation results than the regular algorithms.

  17. Advances in turbulence physics and modeling by direct numerical simulations

    NASA Technical Reports Server (NTRS)

    Reynolds, W. C.

    1987-01-01

    The advent of direct numerical simulations of turbulence has opened avenues for research on turbulence physics and turbulence modeling. Direct numerical simulation provides values for anything that the scientist or modeler would like to know about the flow. An overview of some recent advances in the physical understanding of turbulence and in turbulence modeling obtained through such simulations is presented.

  18. Parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada.

    PubMed

    Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G S

    2003-01-01

    This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented in the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-1-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the 1-million-cell models produce better-resolved results and reveal some flow patterns that cannot be obtained using coarse-grid models. PMID:12714301

  19. Massively parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada

    SciTech Connect

    Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.

    2001-08-31

    This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented in the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-one-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce better-resolved results and reveal some flow patterns that cannot be obtained using coarse-grid models.

  20. Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN

    NASA Astrophysics Data System (ADS)

    Hammond, G. E.; Lichtner, P. C.; Mills, R. T.

    2014-01-01

    To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted.
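
    The two scalability metrics used in such evaluations reduce to one-line formulas; below is a minimal sketch with made-up timings.

```python
# Sketch: strong- and weak-scaling efficiency from wall-clock timings.
def strong_scaling_efficiency(t_ref, p_ref, t_p, p):
    """Measured speedup over the p_ref-core run, divided by the ideal p/p_ref."""
    return (t_ref / t_p) / (p / p_ref)

def weak_scaling_efficiency(t_ref, t_p):
    """With work per core held fixed, the ideal runtime stays constant."""
    return t_ref / t_p

# Hypothetical timings (seconds) when scaling from 128 to 1024 cores:
print(strong_scaling_efficiency(t_ref=900.0, p_ref=128, t_p=140.0, p=1024))
print(weak_scaling_efficiency(t_ref=900.0, t_p=1050.0))
```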

  1. Long-time atomistic simulations with the Parallel Replica Dynamics method

    NASA Astrophysics Data System (ADS)

    Perez, Danny

    Molecular Dynamics (MD) -- the numerical integration of atomistic equations of motion -- is a workhorse of computational materials science. Indeed, MD can in principle be used to obtain any thermodynamic or kinetic quantity, without introducing any approximation or assumptions beyond the adequacy of the interaction potential. It is therefore an extremely powerful and flexible tool to study materials with atomistic spatio-temporal resolution. These enviable qualities, however, come at a steep computational price, limiting the system sizes and simulation times that can be achieved in practice. While the size limitation can be efficiently addressed with massively parallel implementations of MD based on spatial decomposition strategies, allowing for the simulation of trillions of atoms, the same approach usually cannot extend the timescales much beyond microseconds. In this article, we discuss an alternative parallel-in-time approach, the Parallel Replica Dynamics (ParRep) method, which aims at addressing the timescale limitation of MD for systems that evolve through rare state-to-state transitions. We review the formal underpinnings of the method and demonstrate that it can provide arbitrarily accurate results for any definition of the states. When an adequate definition of the states is available, ParRep can simulate trajectories with a parallel speedup approaching the number of replicas used. We demonstrate the usefulness of ParRep by presenting different examples of materials simulations where access to long timescales was essential to reach the physical regime of interest, and we discuss practical considerations that must be addressed to carry out these simulations. Work supported by the United States Department of Energy (U.S. DOE), Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division.
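
    The time accounting that gives ParRep its speedup can be sketched on a toy memoryless state: with M replicas running independently, the first transition arrives roughly M times sooner in wall-clock time, while every replica's trajectory segment counts toward the simulation clock. The escape rate, replica count, and dephasing overhead below are illustrative assumptions.

```python
# Sketch: ParRep time accounting for a single exponential escape process.
import numpy as np

rng = np.random.default_rng(4)
k, M, n_events = 1e-3, 16, 10_000   # escape rate, replicas, transitions
t_corr = 20.0                       # per-segment dephasing cost (not parallelized)

wall, sim = 0.0, 0.0
for _ in range(n_events):
    first = rng.exponential(1.0 / k, size=M).min()  # first replica to escape
    wall += t_corr + first                          # wall-clock cost of the segment
    sim += M * first                                # all replicas' time is banked

print(f"speedup ~ {sim / wall:.1f} (ideal: {M})")
```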

  2. Application of Parallel Discrete Event Simulation to the Space Surveillance Network

    NASA Astrophysics Data System (ADS)

    Jefferson, D.; Leek, J.

    2010-09-01

    In this paper we describe how and why we chose parallel discrete event simulation (PDES) as the paradigm for modeling the Space Surveillance Network (SSN) in our modeling framework, TESSA (Testbed Environment for Space Situational Awareness). DES is a simulation paradigm appropriate for systems dominated by discontinuous state changes at times that must be calculated dynamically. It is used primarily for complex man-made systems like telecommunications, vehicular traffic, computer networks, economic models, etc., although it is also useful for natural systems that are not described by equations, such as particle systems, population dynamics, epidemics, and combat models. It is much less well known than simple time-stepped simulation methods, but has the great advantage of being time scale independent, so that one can freely mix processes that operate at time scales over many orders of magnitude with no runtime performance penalty. In simulating the SSN we model in some detail: (a) the orbital dynamics of up to 10^5 objects, (b) their reflective properties, (c) the ground- and space-based sensor systems in the SSN, (d) the recognition of orbiting objects and determination of their orbits, (e) the cueing and scheduling of sensor observations, (f) the 3-d structure of satellites, and (g) the generation of collision debris. TESSA is thus a mixed continuous-discrete model. But because many different types of discrete objects are involved with such a wide variation in time scale (milliseconds for collisions, hours for orbital periods) it is suitably described using discrete events. The PDES paradigm is surprising and unusual. In any instantaneous runtime snapshot some parts may be far ahead in simulation time while others lag behind, yet the required causal relationships are always maintained and synchronized correctly, exactly as if the simulation were executed sequentially. The TESSA simulator is custom-built, conservatively synchronized, and designed to scale to
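
    The discrete-event core that TESSA parallelizes is itself simple: a priority queue of timestamped events, with simulation time jumping directly from one event to the next. A minimal sequential sketch (event names are invented); conservative PDES distributes this loop across processors while guaranteeing the same timestamp order.

```python
# Sketch: a sequential discrete-event loop with a timestamp-ordered queue.
import heapq

events = [(0.0, "sensor tasking"), (3.1, "orbit update"), (12.7, "observation")]
heapq.heapify(events)

while events:
    now, what = heapq.heappop(events)      # always the earliest pending event
    print(f"t = {now:6.1f} s  {what}")
    if what == "observation":              # events may schedule future events
        heapq.heappush(events, (now + 5.0, "orbit determination"))
```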

  3. The relation between reconnected flux, the parallel electric field, and the reconnection rate in a three-dimensional kinetic simulation of magnetic reconnection

    SciTech Connect

    Wendel, D. E.; Olson, D. K.; Hesse, M.; Kuznetsova, M.; Adrian, M. L.; Aunai, N.; Karimabadi, H.; Daughton, W.

    2013-12-15

    We investigate the distribution of parallel electric fields and their relationship to the location and rate of magnetic reconnection in a large particle-in-cell simulation of 3D turbulent magnetic reconnection with open boundary conditions. The simulation's guide field geometry inhibits the formation of simple topological features such as null points. Therefore, we derive the location of potential changes in magnetic connectivity by finding the field lines that experience a large relative change between their endpoints, i.e., the quasi-separatrix layer. We find a good correspondence between the locus of changes in magnetic connectivity or the quasi-separatrix layer and the map of large gradients in the integrated parallel electric field (or quasi-potential). Furthermore, we investigate the distribution of the parallel electric field along the reconnecting field lines. We find the reconnection rate is controlled by only the low-amplitude, zeroth and first-order trends in the parallel electric field while the contribution from fluctuations of the parallel electric field, such as electron holes, is negligible. The results impact the determination of reconnection sites and reconnection rates in models and in situ spacecraft observations of 3D turbulent reconnection. It is difficult through direct observation to isolate the loci of the reconnection parallel electric field amidst the large amplitude fluctuations. However, we demonstrate that a positive slope of the running sum of the parallel electric field along the field line as a function of field line length indicates where reconnection is occurring along the field line.

  4. The relation between reconnected flux, the parallel electric field, and the reconnection rate in a three-dimensional kinetic simulation of magnetic reconnection

    NASA Astrophysics Data System (ADS)

    Wendel, D. E.; Olson, D. K.; Hesse, M.; Aunai, N.; Kuznetsova, M.; Karimabadi, H.; Daughton, W.; Adrian, M. L.

    2013-12-01

    We investigate the distribution of parallel electric fields and their relationship to the location and rate of magnetic reconnection in a large particle-in-cell simulation of 3D turbulent magnetic reconnection with open boundary conditions. The simulation's guide field geometry inhibits the formation of simple topological features such as null points. Therefore, we derive the location of potential changes in magnetic connectivity by finding the field lines that experience a large relative change between their endpoints, i.e., the quasi-separatrix layer. We find a good correspondence between the locus of changes in magnetic connectivity or the quasi-separatrix layer and the map of large gradients in the integrated parallel electric field (or quasi-potential). Furthermore, we investigate the distribution of the parallel electric field along the reconnecting field lines. We find the reconnection rate is controlled by only the low-amplitude, zeroth and first-order trends in the parallel electric field while the contribution from fluctuations of the parallel electric field, such as electron holes, is negligible. The results impact the determination of reconnection sites and reconnection rates in models and in situ spacecraft observations of 3D turbulent reconnection. It is difficult through direct observation to isolate the loci of the reconnection parallel electric field amidst the large amplitude fluctuations. However, we demonstrate that a positive slope of the running sum of the parallel electric field along the field line as a function of field line length indicates where reconnection is occurring along the field line.

  5. Vortex-induced vibration of two parallel risers: Experimental test and numerical simulation

    NASA Astrophysics Data System (ADS)

    Huang, Weiping; Zhou, Yang; Chen, Haiming

    2016-04-01

    The vortex-induced vibration of two identical rigidly mounted risers in a parallel arrangement was studied using Ansys-CFX and model tests. The vortex shedding and forces were recorded to determine the effect of spacing on the two-degree-of-freedom oscillation of the risers. CFX was used to study a single riser and two parallel risers at spacings of 2-8 D, taking the coupling effect into account. Because of the limited width of the water channel, only three riser spacings, 2 D, 3 D, and 4 D, were tested to validate the characteristics of the two parallel risers by comparison with the numerical simulation. The results indicate that the lift force changes significantly with increasing spacing, and at 3 D spacing the lift force of the two parallel risers reaches its maximum. The vortex shedding of the risers at 3 D spacing shows that a variable velocity field with the same frequency as the vortex shedding is generated in the overlapped area, making the period of the drag force equal to that of the lift force. It can be concluded that the interaction between the two parallel risers is significant when they are brought close together, because the trajectory of the riser changes from an oval to a figure eight as the spacing is increased. The phase difference of the lift force between the two risers also changes with spacing.

  6. Simulation and Gaming: Directions, Issues, Ponderables.

    ERIC Educational Resources Information Center

    Uretsky, Michael

    1995-01-01

    Discusses the current use of simulation and gaming in a variety of settings. Describes advances in technology that facilitate the use of simulation and gaming, including computer power, computer networks, software, object-oriented programming, video, multimedia, virtual reality, and artificial intelligence. Considers the future use of simulation…

  7. Relevance of the parallel nonlinearity in gyrokinetic simulations of tokamak plasmas

    SciTech Connect

    Candy, J.; Waltz, R. E.; Parker, S. E.; Chen, Y.

    2006-07-15

    The influence of the parallel nonlinearity on transport in gyrokinetic simulations is assessed for values of ρ* which are typical of current experiments. Here, ρ* = ρ_s/a is the ratio of the gyroradius, ρ_s, to the plasma minor radius, a. The conclusion, derived from simulations with both GYRO [J. Candy and R. E. Waltz, J. Comput. Phys. 186, 585 (2003)] and GEM [Y. Chen and S. E. Parker, J. Comput. Phys. 189, 463 (2003)], is that no measurable effect of the parallel nonlinearity is apparent for ρ* < 0.012. This result is consistent with scaling arguments, which suggest that the parallel nonlinearity should be O(ρ*) smaller than the E×B nonlinearity. Indeed, for the plasma parameters under consideration, the magnitude of the parallel nonlinearity is a factor of 8ρ* smaller (for 0.00075 < ρ* < 0.012) than the other retained terms in the nonlinear gyrokinetic equation.

  8. Direct numerical simulation of turbulent aerosol coagulation

    NASA Astrophysics Data System (ADS)

    Reade, Walter Caswell

    There are numerous systems, including both industrial applications and naturally occurring phenomena, in which the collision/coagulation rates of aerosols are of significant interest. Two examples are the production of fine powders (such as titanium dioxide) and the formation of rain drops in the atmosphere. During the last decade, it has become apparent that dense aerosol particles behave much differently in a turbulent fluid than had previously been assumed. Particles with a response time on the order of the small-scale fluid time scale tend to collect in regions of low vorticity. The result is a particle concentration field that can be highly non-uniform. Sundaram and Collins (1997) recently demonstrated the effect that turbulence can have on the particle collision rate of a monodisperse system. The collision rates of finite-inertia particles can be as much as two orders of magnitude greater than those of particles that precisely follow the fluid streamlines. Sundaram and Collins derived a general collision expression that explicitly accounts for the two phenomena that affect the collision rate: changes in the particle concentration field and changes in the particle relative velocities. The result of Sundaram and Collins has generated further interest in the turbulent-aerosol problem. This thesis shows that, in addition to changing the rate at which an aerosol size distribution might form, turbulence has the potential of dramatically changing the shape of the distribution. This result is demonstrated using direct numerical simulation of a turbulent-aerosol system over a wide range of particle parameters and a moderate range of turbulence levels. Results show that particles with a small (but finite) initial inertia have the greatest potential of forming broad size distributions. The shape of the resulting size distribution is also affected by the initial size of the particles. Observations are explained using the statistics identified by Sundaram and Collins (1997). A major

  9. Parallel Solutions for Voxel-Based Simulations of Reaction-Diffusion Systems

    PubMed Central

    D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan

    2014-01-01

    There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena that take into account both spatial effects and noise. However, the high computational effort characterizing such simulation approaches, coupled with the necessity of simulating the models several times to achieve statistically relevant information on the model behaviours, makes such algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we present how the characteristics of the algorithm can be exploited for an effective parallelization on present heterogeneous HPC architectures. PMID:25045716

  10. Parallel solutions for voxel-based simulations of reaction-diffusion systems.

    PubMed

    D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan

    2014-01-01

    There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This awareness has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena that take into account both spatial effects and noise. However, the high computational effort characterizing these simulation approaches, coupled with the necessity of simulating a model several times to achieve statistically relevant information on its behaviour, makes such algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we show how the characteristics of the algorithm can be exploited for an effective parallelization on present-day heterogeneous HPC architectures. PMID:25045716
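
    Records 9 and 10 describe a voxel-based tau-leaping scheme. As a rough illustration of the general idea (not the Sτ-DPP algorithm itself: the names, the 1-D voxel chain, and the crude handling of negative counts are simplifying assumptions), each step fires a Poisson number of reaction events per channel per voxel and moves Poisson numbers of molecules between neighbouring voxels:

        import numpy as np

        rng = np.random.default_rng(0)

        def tau_leap_step(x, propensities, stoich, d_jump, tau):
            # x:            (n_voxels, n_species) integer copy numbers
            # propensities: function mapping one voxel's state to (n_reactions,) rates
            # stoich:       (n_reactions, n_species) net stoichiometry per reaction
            # d_jump:       (n_species,) per-molecule jump rate to one neighbour
            new = x.copy()
            for v in range(x.shape[0]):
                k = rng.poisson(propensities(x[v]) * tau)  # reaction firings in voxel v
                new[v] += k @ stoich
                for dv in (-1, 1):                         # diffusion to both neighbours
                    if 0 <= v + dv < x.shape[0]:
                        hops = rng.poisson(d_jump * x[v] * tau)
                        new[v] -= hops
                        new[v + dv] += hops
            return np.maximum(new, 0)  # crude fix-up: plain leaping can overdraw a voxel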

  11. Adaptive finite element simulation of flow and transport applications on parallel computers

    NASA Astrophysics Data System (ADS)

    Kirk, Benjamin Shelton

    The subject of this work is the adaptive finite element simulation of problems arising in flow and transport applications on parallel computers. Of particular interest are new contributions to adaptive mesh refinement (AMR) in this parallel high-performance context, including novel work on data structures, treatment of constraints in a parallel setting, generality and extensibility via object-oriented programming, and the design and implementation of a flexible software framework. This technology and software capability then enables more robust, reliable treatment of multiscale, multiphysics problems and specific studies of fine-scale interaction such as those in biological chemotaxis (Chapter 4) and high-speed shock physics for compressible flows (Chapter 5). The work begins by presenting an overview of key concepts and data structures employed in AMR simulations. Of particular interest is how these concepts are applied in the physics-independent software framework which is developed here and is the basis for all the numerical simulations performed in this work. This open-source software framework has been adopted by a number of researchers in the U.S. and abroad for use in a wide range of applications. The dynamic nature of adaptive simulations poses particular issues for efficient implementation on distributed-memory parallel architectures. Communication cost, computational load balance, and memory requirements must all be considered when developing adaptive software for this class of machines. Specific extensions to the adaptive data structures that enable implementation on parallel computers are therefore considered in detail. The libMesh framework for performing adaptive finite element simulations on parallel computers is developed to provide a concrete implementation of the above ideas. This physics-independent framework is applied to two distinct flow and transport application classes in the subsequent application studies to illustrate the flexibility of the
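
    The solve/estimate/mark/refine cycle at the heart of such AMR frameworks can be summarized in a few lines. A hedged sketch (the mesh object and its methods are hypothetical stand-ins, not the libMesh API):

        def amr_cycle(mesh, solve, estimate_error, refine_frac=0.3,
                      coarsen_frac=0.1, max_steps=5):
            # Generic solve/estimate/mark/refine loop; 'estimate_error' returns a
            # mapping element -> error indicator. Illustrative only.
            for _ in range(max_steps):
                solution = solve(mesh)
                eta = estimate_error(mesh, solution)
                ranked = sorted(mesh.elements, key=lambda e: eta[e], reverse=True)
                n = len(ranked)
                for e in ranked[: int(refine_frac * n)]:
                    mesh.flag_for_refinement(e)      # largest errors: refine
                for e in ranked[int((1 - coarsen_frac) * n):]:
                    mesh.flag_for_coarsening(e)      # smallest errors: coarsen
                mesh.execute_flags()                 # apply flags, rebalance in parallel
            return solution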

  12. Object-Oriented Parallel Particle-in-Cell Code for Beam Dynamics Simulation in Linear Accelerators

    SciTech Connect

    Qiang, J.; Ryne, R.D.; Habib, S.; Decky, V.

    1999-11-13

    In this paper, we present an object-oriented three-dimensional parallel particle-in-cell code for beam dynamics simulation in linear accelerators. A two-dimensional parallel domain decomposition approach is employed within a message-passing programming paradigm, along with dynamic load balancing. The object-oriented software design gives the code better maintainability, reusability, and extensibility than conventional structure-based code, and also helps to encapsulate the details of the communication syntax. Performance tests on SGI/Cray T3E-900 and SGI Origin 2000 machines show good scalability of the object-oriented code. Other important features of this code include symplectic integration with linear maps for the external focusing elements and the use of z as the independent variable, as is typical in accelerators. The code was successfully applied to simulate beam transport through three superconducting sections in the APT linac design.
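
    The map-kick structure mentioned in the abstract (linear transfer maps for the external focusing, space charge applied as an impulse, z as the independent variable) might be sketched as follows; the helper names and the 4-D transverse phase space are illustrative assumptions, not this code's actual interface:

        import numpy as np

        def advance_slice(coords, M_half, space_charge_kick, dz):
            # One split-operator step in z: half linear map, space-charge kick,
            # half linear map. coords: (N, 4) phase space (x, x', y, y');
            # M_half: (4, 4) transfer matrix of half a step of external focusing;
            # space_charge_kick: function returning (N, 2) kicks per unit z
            # (in practice from the PIC field solve). Illustrative sketch only.
            coords = coords @ M_half.T             # external focusing, first half step
            kick = space_charge_kick(coords) * dz  # field solve -> momentum kicks
            coords[:, [1, 3]] += kick              # apply kicks to x', y'
            coords = coords @ M_half.T             # second half step
            return coords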

  13. Accurate reaction-diffusion operator splitting on tetrahedral meshes for parallel stochastic molecular simulations.

    PubMed

    Hepburn, I; Chen, W; De Schutter, E

    2016-08-01

    Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space in either a discrete-time or discrete-space framework, which in recent years has led to the development of parallel methods that can take advantage of the power of modern supercomputers. We systematically test components of stochastic reaction-diffusion operator splitting suggested in the literature and discuss their effects on accuracy. We introduce an operator-splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations, from simple diffusion models to realistic biological models, and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification. PMID:27497550

  14. Accurate reaction-diffusion operator splitting on tetrahedral meshes for parallel stochastic molecular simulations

    NASA Astrophysics Data System (ADS)

    Hepburn, I.; Chen, W.; De Schutter, E.

    2016-08-01

    Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space in either a discrete-time or discrete-space framework, which in recent years has led to the development of parallel methods that can take advantage of the power of modern supercomputers. We systematically test components of stochastic reaction-diffusion operator splitting suggested in the literature and discuss their effects on accuracy. We introduce an operator-splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations, from simple diffusion models to realistic biological models, and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification.
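
    The operator-splitting idea tested in records 13 and 14 alternates independent reaction and diffusion updates. A minimal second-order (Strang) splitting sketch, with diffuse and react as hypothetical stand-ins for the kinetic operators on a partitioned mesh:

        def strang_step(state, diffuse, react, dt):
            # Second-order operator splitting: half diffusion, full reaction,
            # half diffusion. In the parallel setting each partition runs its own
            # reaction step while diffusion handles inter-partition molecule
            # exchange. Illustrative sketch, not the paper's implementation.
            state = diffuse(state, dt / 2)
            state = react(state, dt)
            state = diffuse(state, dt / 2)
            return state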

  15. A Large Scale Simulation of Ultrasonic Wave Propagation in Concrete Using Parallelized EFIT

    NASA Astrophysics Data System (ADS)

    Nakahata, Kazuyuki; Tokunaga, Jyunichi; Kimoto, Kazushi; Hirose, Sohichi

    A time-domain simulation tool for ultrasonic propagation in concrete is developed using the elastodynamic finite integration technique (EFIT) and image-based modeling. EFIT is a grid-based time-domain technique that readily treats the different boundary conditions arising in an inhomogeneous material such as concrete. Here, the geometry of the concrete is determined from a scanned image, and the processed color bitmap image is fed into the EFIT. Because ultrasonic wave simulation in such a complex material is computationally expensive, we execute the EFIT with a parallel computing technique on a shared-memory computer system. In this study, the formulation of the EFIT and the treatment of the different boundary conditions are briefly described, and examples of shear horizontal wave propagation in reinforced concrete are demonstrated. The methodology and performance of the parallelization of the EFIT are also discussed.
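
    Grid-based time-domain schemes of the EFIT family advance interleaved velocity and stress fields with explicit updates, which is also what makes them straightforward to parallelize loop-by-loop on shared memory. A rough 2-D shear-horizontal sketch (a generic staggered-grid update, not the EFIT stencil from the paper; the material arrays rho and mu carry the image-based heterogeneity):

        import numpy as np

        def sh_wave_step(v, sxz, syz, rho, mu, dt, dx):
            # One explicit velocity-stress update for SH waves; all fields are
            # (nx, ny) arrays. Illustrative sketch only.
            # update out-of-plane velocity from the stress divergence
            v[1:-1, 1:-1] += dt / rho[1:-1, 1:-1] * (
                (sxz[1:-1, 1:-1] - sxz[:-2, 1:-1]) / dx
                + (syz[1:-1, 1:-1] - syz[1:-1, :-2]) / dx)
            # update shear stresses from the velocity gradients
            sxz[:-1, :] += dt * mu[:-1, :] * (v[1:, :] - v[:-1, :]) / dx
            syz[:, :-1] += dt * mu[:, :-1] * (v[:, 1:] - v[:, :-1]) / dx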

  16. A DIRECT METHOD TO DETERMINE THE PARALLEL MEAN FREE PATH OF SOLAR ENERGETIC PARTICLES WITH ADIABATIC FOCUSING

    SciTech Connect

    He, H.-Q.; Wan, W. E-mail: wanw@mail.iggcas.ac.cn

    2012-03-01

    The parallel mean free path of solar energetic particles (SEPs), which is determined by the physical properties of SEPs as well as those of the solar wind, is a very important parameter in space physics for studying the transport of charged energetic particles in the heliosphere, especially for space weather forecasting. In space weather practice, it is necessary to find a quick approach to obtain the parallel mean free path of SEPs for a solar event. In addition, the adiabatic focusing effect caused by the spatially varying mean magnetic field in the solar system is important to the transport processes of SEPs. Recently, Shalchi presented an analytical description of the parallel diffusion coefficient with adiabatic focusing. Based on Shalchi's results, in this paper we provide a direct analytical formula, as a function of parameters describing the physical properties of SEPs and the solar wind, to directly and quickly determine the parallel mean free path of SEPs with adiabatic focusing. Since all of the quantities in the analytical formula can be directly observed by spacecraft, this direct method would be a very useful tool in space weather research. As applications of the direct method, we investigate the inherent relations between the parallel mean free path and various parameters concerning the physical properties of SEPs and the solar wind. Comparisons of parallel mean free paths with and without adiabatic focusing are also presented.
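
    The paper's focusing-corrected formula is not reproduced here, but two standard ingredients behind such analyses are uncontroversial: the focusing length of the mean magnetic field and the textbook relation λ∥ = 3κ∥/v between the parallel diffusion coefficient and the parallel mean free path. A sketch under those assumptions (illustrative only; the full expression with adiabatic focusing is given in the paper):

        def focusing_length(B, dB_ds):
            # Focusing length L = -B / (dB/ds) of the mean magnetic field.
            return -B / dB_ds

        def parallel_mean_free_path(kappa_par, v):
            # Standard relation lambda_par = 3 * kappa_par / v. The paper's
            # direct formula additionally corrects kappa_par for adiabatic
            # focusing via L; that correction is not reproduced here.
            return 3.0 * kappa_par / v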

  17. Construction of a parallel processor for simulating manipulators and other mechanical systems

    NASA Technical Reports Server (NTRS)

    Hannauer, George

    1991-01-01

    This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.

  18. Spontaneous Hot Flow Anomalies at Quasi-Parallel Shocks: 2. Hybrid Simulations

    NASA Technical Reports Server (NTRS)

    Omidi, N.; Zhang, H.; Sibeck, D.; Turner, D.

    2013-01-01

    Motivated by recent THEMIS observations, this paper uses 2.5-D electromagnetic hybrid simulations to investigate the formation of Spontaneous Hot Flow Anomalies (SHFAs) upstream of quasi-parallel bow shocks during steady solar wind conditions and in the absence of discontinuities. The results show the formation of a large number of structures along and upstream of the quasi-parallel bow shock. Their outer edges exhibit density and magnetic field enhancements, while their cores exhibit drops in density, magnetic field, solar wind velocity and enhancements in ion temperature. Using virtual spacecraft in the simulation, we show that the signatures of these structures in the time series data are very similar to those of SHFAs seen in THEMIS data and conclude that they correspond to SHFAs. Examination of the simulation data shows that SHFAs form as the result of foreshock cavitons interacting with the bow shock. Foreshock cavitons in turn form due to the nonlinear evolution of ULF waves generated by the interaction of the solar wind with the backstreaming ions. Because foreshock cavitons are an inherent part of the shock dissipation process, the formation of SHFAs is also an inherent part of the dissipation process leading to a highly non-uniform plasma in the quasi-parallel magnetosheath including large-scale density and magnetic field cavities.

  19. Spontaneous hot flow anomalies at quasi-parallel shocks: 2. Hybrid simulations

    NASA Astrophysics Data System (ADS)

    Omidi, N.; Zhang, H.; Sibeck, D.; Turner, D.

    2013-01-01

    Motivated by recent THEMIS observations, this paper uses 2.5-D electromagnetic hybrid simulations to investigate the formation of Spontaneous Hot Flow Anomalies (SHFAs) upstream of quasi-parallel bow shocks during steady solar wind conditions and in the absence of discontinuities. The results show the formation of a large number of structures along and upstream of the quasi-parallel bow shock. Their outer edges exhibit density and magnetic field enhancements, while their cores exhibit drops in density, magnetic field, solar wind velocity, and enhancements in ion temperature. Using virtual spacecraft in the simulation, we show that the signatures of these structures in the time series data are very similar to those of SHFAs seen in THEMIS data and conclude that they correspond to SHFAs. Examination of the simulation data shows that SHFAs form as the result of foreshock cavitons interacting with the bow shock. Foreshock cavitons in turn form due to the nonlinear evolution of ULF waves generated by the interaction of the solar wind with the backstreaming ions. Because foreshock cavitons are an inherent part of the shock dissipation process, the formation of SHFAs is also an inherent part of the dissipation process leading to a highly nonuniform plasma in the quasi-parallel magnetosheath including large-scale density and magnetic field cavities.

  20. Efficient parallelization of short-range molecular dynamics simulations on many-core systems.

    PubMed

    Meyer, R

    2013-11-01

    This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly inhomogeneous systems like nanodevices or nanostructured materials. In the proposed scheme the calculation of the forces and the generation of neighbor lists are divided into small tasks. The tasks are then executed by a thread pool according to a dependent task schedule. This schedule is constructed in such a way that a particle is never accessed by two threads at the same time. Benchmark simulations on a typical 12-core machine show that the described algorithm achieves excellent parallel efficiencies above 80% for different kinds of systems and all numbers of cores. For inhomogeneous systems the speedups are strongly superior to those obtained with spatial decomposition. Further benchmarks were performed on an Intel Xeon Phi coprocessor. These simulations demonstrate that the algorithm scales well to large numbers of cores. PMID:24329381

  21. Efficient parallelization of short-range molecular dynamics simulations on many-core systems

    NASA Astrophysics Data System (ADS)

    Meyer, R.

    2013-11-01

    This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly inhomogeneous systems like nanodevices or nanostructured materials. In the proposed scheme the calculation of the forces and the generation of neighbor lists are divided into small tasks. The tasks are then executed by a thread pool according to a dependent task schedule. This schedule is constructed in such a way that a particle is never accessed by two threads at the same time. Benchmark simulations on a typical 12-core machine show that the described algorithm achieves excellent parallel efficiencies above 80% for different kinds of systems and all numbers of cores. For inhomogeneous systems the speedups are strongly superior to those obtained with spatial decomposition. Further benchmarks were performed on an Intel Xeon Phi coprocessor. These simulations demonstrate that the algorithm scales well to large numbers of cores.
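
    The dependent task schedule described in records 20 and 21 can be pictured as waves of mutually independent force tasks. A minimal sketch of that scheduling idea (the waves and compute_task arguments are hypothetical; the paper's scheduler is finer-grained and avoids global barriers):

        from concurrent.futures import ThreadPoolExecutor

        def run_task_schedule(waves, compute_task):
            # 'waves' is a list of task lists: tasks inside one wave touch
            # disjoint particle sets, so no particle is accessed by two threads
            # at once; a wave starts only after the previous one has finished.
            with ThreadPoolExecutor() as pool:
                for wave in waves:
                    futures = [pool.submit(compute_task, t) for t in wave]
                    for f in futures:
                        f.result()  # barrier before the next wave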

  22. Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment

    PubMed Central

    Khan, Md. Ashfaquzzaman; Herbordt, Martin C.

    2011-01-01

    Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations. PMID:21822327

  23. Large eddy simulations and direct numerical simulations of high speed turbulent reacting flows

    NASA Technical Reports Server (NTRS)

    Givi, P.; Frankel, S. H.; Adumitroaie, V.; Sabini, G.; Madnia, C. K.

    1993-01-01

    The primary objective of this research is to extend current capabilities of Large Eddy Simulations (LES) and Direct Numerical Simulations (DNS) for the computational analyses of high speed reacting flows. Our efforts in the first two years of this research have been concentrated on a priori investigations of single-point Probability Density Function (PDF) methods for providing subgrid closures in reacting turbulent flows. In the efforts initiated in the third year, our primary focus has been on performing actual LES by means of PDF methods. The approach is based on assumed PDF methods and we have performed extensive analysis of turbulent reacting flows by means of LES. This includes simulations of both three-dimensional (3D) isotropic compressible flows and two-dimensional reacting planar mixing layers. In addition to these LES analyses, some work is in progress to assess the extent of validity of our assumed PDF methods. This assessment is done by making detailed comparisons with recent laboratory data in predicting the rate of reactant conversion in parallel reacting shear flows. This report provides a summary of our achievements for the first six months of the third year of this program.

  24. Synchronous Parallel Emulation and Discrete Event Simulation System with Self-Contained Simulation Objects and Active Event Objects

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor)

    1998-01-01

    The present invention is embodied in a method of performing object-oriented simulation and a system having interconnected processor nodes operating in parallel to simulate mutual interactions of a set of discrete simulation objects distributed among the nodes as a sequence of discrete events changing state variables of respective simulation objects so as to generate new event-defining messages addressed to respective ones of the nodes. The object-oriented simulation is performed at each one of the nodes by assigning passive self-contained simulation objects to each one of the nodes, responding to messages received at one node by generating corresponding active event objects having user-defined inherent capabilities and individual time stamps and corresponding to respective events affecting one of the passive self-contained simulation objects of the one node, restricting the respective passive self-contained simulation objects to only providing and receiving information from the respective active event objects, requesting information and changing variables within a passive self-contained simulation object by the active event object, and producing corresponding messages specifying events resulting therefrom by the active event objects.

  25. Parallel PIC Simulations of Ultra-High Intensity Laser Plasma Interactions.

    NASA Astrophysics Data System (ADS)

    Lasinski, B. F.; Still, C. H.; Langdon, A. B.; Wilks, S. C.; Hatchett, S. P.; Hinkel, D. E.

    1999-11-01

    We extend our previous simulations of high intensity short pulse laser plasma interactions [B. F. Lasinski, A. B. Langdon, S. P. Hatchett, M. H. Key, and M. Tabak, Phys. Plasmas 6, 2041 (1999); S. C. Wilks and W. L. Kruer, IEEE Journal of Quantum Electronics 11, 1954 (1997)] to 3D and to much larger systems in 2D using our new, modern, 3D, electromagnetic, fully relativistic, massively parallel PIC code. Our simulation parameters are guided by the recent Petawatt experiments at Livermore. We study the generation of hot electrons and energetic ions and the associated complex phenomena. Laser light filamentation and the formation of high static magnetic fields are described.

  26. Parallel implementation of three-dimensional molecular dynamic simulation for laser-cluster interaction

    SciTech Connect

    Holkundkar, Amol R.

    2013-11-15

    The objective of this article is to report the parallel implementation of the 3D molecular dynamic simulation code for laser-cluster interactions. The benchmarking of the code has been done by comparing the simulation results with some of the experiments reported in the literature. Scaling laws for the computational time are established by varying the number of processor cores and the number of macroparticles used. The capabilities of the code are highlighted by implementing various diagnostic tools. To study the dynamics of the laser-cluster interactions, the executable version of the code is available from the author.

  27. Adaptive particle-based pore-level modeling of incompressible fluid flow in porous media: a direct and parallel approach

    NASA Astrophysics Data System (ADS)

    Ovaysi, S.; Piri, M.

    2009-12-01

    We present a three-dimensional fully dynamic parallel particle-based model for direct pore-level simulation of incompressible viscous fluid flow in disordered porous media. The model was developed from scratch and is capable of simulating flow directly in three-dimensional high-resolution microtomography images of naturally occurring or man-made porous systems. It reads the images as input, where the positions of the solid walls are given. The entire medium, i.e., solid and fluid, is then discretized using particles. The model is based on the Moving Particle Semi-implicit (MPS) technique, which we modify in order to improve its stability. The model handles highly irregular fluid-solid boundaries effectively. It takes into account viscous pressure drop in addition to the gravity forces. It conserves mass and can automatically detect any false connectivity with fluid particles in the neighboring pores and throats. It includes a sophisticated algorithm to automatically split and merge particles to maintain hydraulic connectivity of extremely narrow conduits. Furthermore, it uses novel methods to handle particle inconsistencies and open boundaries. To handle the computational load, we present a fully parallel version of the model that runs on distributed-memory computer clusters and exhibits excellent scalability. The model is used to simulate unsteady-state flow problems under different conditions, starting from straight noncircular capillary tubes with different cross-sectional shapes, i.e., circular/elliptical, square/rectangular and triangular cross-sections. We compare the predicted dimensionless hydraulic conductances with the data available in the literature and observe an excellent agreement. We then test the scalability of our parallel model with two samples of an artificial sandstone, samples A and B, with different volumes and different distributions (non-uniform and uniform) of solid particles among the processors. An excellent linear scalability is

  28. A generic simulation cell method for developing extensible, efficient and readable parallel computational models

    NASA Astrophysics Data System (ADS)

    Honkonen, I.

    2015-03-01

    I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring modification of existing code. This is an advantage for the development and testing of, e.g., geoscientific software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. An implementation of the generic simulation cell method presented here, the generic simulation cell class (gensimcell), also includes support for parallel programming by allowing model developers to select which simulation variables of, e.g., a domain-decomposed model to transfer between processes via a Message Passing Interface (MPI) library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class requires a C++ compiler that supports a version of the language standardized in 2011 (C++11). The code is available at https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those who do are kindly requested to acknowledge and cite this work.

  29. Experimental Studies of the Interaction Between a Parallel Shear Flow and a Directionally-Solidifying Front

    NASA Technical Reports Server (NTRS)

    Zhang, Meng; Maxworthy, Tony

    1999-01-01

    It has long been recognized that flow in the melt can have a profound influence on the dynamics of a solidifying interface and hence the quality of the solid material. In particular, flow affects the heat and mass transfer, and causes spatial and temporal variations in the flow and melt composition. This results in a crystal with nonuniform physical properties. Flow can be generated by buoyancy, expansion or contraction upon phase change, and thermo-soluto-capillary effects. In general, these flows cannot be avoided and can have an adverse effect on the stability of the crystal structures. This motivates crystal growth experiments in a microgravity environment, where buoyancy-driven convection is significantly suppressed. However, transient accelerations (g-jitter) caused by the acceleration of the spacecraft can affect the melt, while convection generated by effects other than buoyancy remains important. Rather than bemoan the presence of convection as a source of interfacial instability, Hurle in the 1960s suggested that flow in the melt, either forced or natural convection, might be used to stabilize the interface. Delves considered the imposition of both a parabolic velocity profile and a Blasius boundary layer flow over the interface. He concluded that fast stirring could stabilize the interface to perturbations whose wave vector is in the direction of the fluid velocity. Forth and Wheeler considered the effect of the asymptotic suction boundary layer profile. They showed that the effect of the shear flow was to generate travelling waves parallel to the flow with a speed proportional to the Reynolds number. There have been few quantitative experimental works reporting on the coupled effect of fluid flow and morphological instabilities. Huang studied plane Couette flow over cells and dendrites. It was found that this flow could greatly enhance the planar stability and even induce the cell-planar transition. A rotating impeller was buried inside the

  30. Field-Scale, Massively Parallel Simulation of Production from Oceanic Gas Hydrate Deposits

    NASA Astrophysics Data System (ADS)

    Reagan, M. T.; Moridis, G. J.; Freeman, C. M.; Pan, L.; Boyle, K. L.; Johnson, J. N.; Husebo, J. A.

    2012-12-01

    The quantity of hydrocarbon gases trapped in natural hydrate accumulations is enormous, leading to significant interest in the evaluation of their potential as an energy source. It has been shown that large volumes of gas can be readily produced at high rates for long times from some types of methane hydrate accumulations by means of depressurization-induced dissociation, using conventional technologies with horizontal or vertical well configurations. However, these systems are currently assessed using simplified or reduced-scale 3D or even 2D production simulations. In this study, we use the massively parallel TOUGH+HYDRATE code (pT+H) to assess the production potential of a large, deep-ocean hydrate reservoir and develop strategies for effective production. The simulations model a full 3D system of over 24 km² extent, examining the productivity of vertical and horizontal wells, single or multiple wells, and exploring variations in reservoir properties. Systems of up to 2.5M gridblocks, running on thousands of supercomputing nodes, are required to simulate such large systems at the highest level of detail. The simulations reveal the challenges inherent in producing from deep, relatively cold systems with extensive water-bearing channels and connectivity to large aquifers, including the difficulty of achieving depressurization, the challenges of high water removal rates, and the complexity of production design. Also highlighted are new frontiers in large-scale reservoir simulation of coupled flow, transport, thermodynamics, and phase behavior, including the construction of large meshes, the use of parallel numerical solvers and MPI, and large-scale, parallel 3D visualization of results.

  31. Massively parallel Monte Carlo for many-particle simulations on GPUs

    SciTech Connect

    Anderson, Joshua A.; Jankowski, Eric; Grubb, Thomas L.; Engel, Michael; Glotzer, Sharon C.

    2013-12-01

    Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.
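
    A common way to make particle Monte Carlo massively parallel while preserving detailed balance, and the general strategy behind GPU hard-disk studies of this kind, is a checkerboard decomposition: cells of the same colour are separated by more than the interaction range, so their local trial moves cannot conflict. A schematic sketch (sequential here, with hypothetical attempt_moves and cell bookkeeping; a random grid shift per sweep is part of keeping the sampling unbiased):

        def checkerboard_sweep(disks, cells, attempt_moves, shift):
            # cells: list of (cx, cy) cell indices; shift: random (sx, sy) offset
            # drawn once per sweep. Same-coloured cells are independent, so the
            # inner loop over 'active' is the part a GPU would run in parallel.
            for colour in range(4):                       # 2x2 colouring in 2-D
                active = [c for c in cells
                          if ((c[0] + shift[0]) % 2) * 2
                          + (c[1] + shift[1]) % 2 == colour]
                for c in active:
                    attempt_moves(disks, c)               # trial moves confined to cell c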

  32. Petascale direct numerical simulation of blood flow on 200K cores and heterogeneous architectures

    SciTech Connect

    Sampath, Rahul S; Veerapaneni, Shravan; Biros, George; Zorin, Denis; Vuduc, Richard; Vetter, Jeffrey S; Moon, Logan; Malhotra, Dhairya; Shringarpure, Aashay; Rahimian, Abtin; Lashuk, Ilya; Chandramowlishwaran, Aparna

    2010-01-01

    We present a fast, petaflop-scalable algorithm for Stokesian particulate flows. Our goal is the direct simulation of blood, which we model as a mixture of a Stokesian fluid (plasma) and red blood cells (RBCs). Directly simulating blood is a challenging multiscale, multiphysics problem. We report simulations with up to 260 million deformable RBCs. The largest simulation amounts to 90 billion unknowns in space. In terms of the number of cells, we improve the state-of-the-art by several orders of magnitude: the previous largest simulation, at the same physical fidelity as ours, resolved the flow of O(1,000-10,000) RBCs. Our approach has three distinct characteristics: (1) we faithfully represent the physics of RBCs by using nonlinear solid mechanics to capture the deformations of each cell; (2) we accurately resolve the long-range, N-body, hydrodynamic interactions between RBCs (which are caused by the surrounding plasma); and (3) we allow for highly non-uniform spatial distributions of RBCs. The new method has been implemented in the software library MOBO (for "Moving Boundaries"). We designed MOBO to support parallelism at all levels, including inter-node distributed memory parallelism, intra-node shared memory parallelism, data parallelism (vectorization), and fine-grained multithreading for GPUs. We have implemented and optimized the majority of the computation kernels on both Intel/AMD x86 and NVidia's Tesla/Fermi platforms for single and double floating point precision. Overall, the code has scaled on 256 CPU-GPUs on the Teragrid's Lincoln cluster and on 200,000 AMD cores of the Oak Ridge National Laboratory's Jaguar PF system. In our largest simulation, we have achieved 0.7 Petaflops/s of sustained performance on Jaguar.

  33. A novel parallel-rotation algorithm for atomistic Monte Carlo simulation of dense polymer systems

    NASA Astrophysics Data System (ADS)

    Santos, S.; Suter, U. W.; Müller, M.; Nievergelt, J.

    2001-06-01

    We develop and test a new elementary Monte Carlo move for use in the off-lattice simulation of polymer systems. This novel Parallel-Rotation algorithm (ParRot) permits moving very efficiently torsion angles that are deeply inside long chains in melts. The parallel-rotation move is extremely simple and is also demonstrated to be computationally efficient and appropriate for Monte Carlo simulation. The ParRot move does not affect the orientation of those parts of the chain outside the moving unit. The move consists of a concerted rotation around four adjacent skeletal bonds. No assumption is made concerning the backbone geometry other than that bond lengths and bond angles are held constant during the elementary move. Properly weighted sampling techniques are needed for ensuring detailed balance because the new move involves a correlated change in four degrees of freedom along the chain backbone. The ParRot move is supplemented with the classical Metropolis Monte Carlo, the Continuum-Configurational-Bias, and Reptation techniques in an isothermal-isobaric Monte Carlo simulation of melts of short and long chains. Comparisons are made with the capabilities of other Monte Carlo techniques to move the torsion angles in the middle of the chains. We demonstrate that ParRot constitutes a highly promising Monte Carlo move for the treatment of long polymer chains in the off-lattice simulation of realistic models of dense polymer systems.

  34. Modeling of fatigue crack induced nonlinear ultrasonics using a highly parallelized explicit local interaction simulation approach

    NASA Astrophysics Data System (ADS)

    Shen, Yanfeng; Cesnik, Carlos E. S.

    2016-04-01

    This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables highly parallelized supercomputing on powerful graphics cards. Both the explicit contact formulation and the parallel feature facilitate LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulation based on the penalty method is introduced and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.
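
    The penalty treatment of crack-face contact in record 34 amounts to inserting a stiff restoring force whenever the interface gap becomes negative, with Coulomb friction acting on the tangential motion. A scalar sketch of the slip branch only (a full implementation also tracks the stick state; the names and interface are illustrative, not the paper's code):

        def penalty_contact(gap, k_contact, mu, v_slide):
            # gap < 0 means the crack faces interpenetrate; k_contact is the
            # penalty stiffness whose choice the paper discusses; mu is the
            # friction coefficient; v_slide is the tangential sliding velocity.
            if gap >= 0.0:
                return 0.0, 0.0                  # faces apart: traction-free crack
            f_n = -k_contact * gap               # penalty force resists overlap
            # slip branch of Coulomb friction: oppose sliding, capped by mu * f_n
            f_t = -mu * f_n * (1.0 if v_slide > 0.0 else -1.0)
            return f_n, f_t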

  35. Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

    NASA Astrophysics Data System (ADS)

    Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

    2015-09-01

    The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.

  36. Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors

    SciTech Connect

    Aaby, Brandon G; Perumalla, Kalyan S; Seal, Sudip K

    2010-01-01

    An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as the heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering as much as over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular simulator in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.

  37. Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

    NASA Astrophysics Data System (ADS)

    Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro

    2016-08-01

    We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information from other nodes that is necessary for the interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions, which are necessary for efficient parallel execution of particle-based simulations, as "templates" that are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N²) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10⁷) to 300 ms (N = 10⁹). These are currently limited by the time for the calculation of the domain decomposition and communication

  38. Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

    NASA Astrophysics Data System (ADS)

    Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro

    2016-06-01

    We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information from other nodes that is necessary for the interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions, which are necessary for efficient parallel execution of particle-based simulations, as "templates" that are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N²) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10⁷) to 300 ms (N = 10⁹). These are currently limited by the time for the calculation of the domain decomposition and communication
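
    One of the "template" services described in records 37 and 38, sending particles that have left a node's domain to their new owner, can be pictured in a few lines of mpi4py. FDPS itself is C++ and implements this far more efficiently, so the sketch below only illustrates the communication pattern (owner_rank is a hypothetical domain lookup):

        from mpi4py import MPI

        def exchange_particles(particles, owner_rank):
            # After a position update, route each particle to the rank that owns
            # its region of the domain decomposition.
            comm = MPI.COMM_WORLD
            size = comm.Get_size()
            outbox = [[] for _ in range(size)]
            for p in particles:
                outbox[owner_rank(p)].append(p)   # bin particles by destination
            inbox = comm.alltoall(outbox)         # exchange lists with every rank
            return [p for batch in inbox for p in batch]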
For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient <span class="hlt">parallel</span> execution of particle-based <span class="hlt">simulations</span> as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale <span class="hlt">parallel</span> supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 107) to 300 ms (N = 109). These are currently limited by the time for the calculation of the domain decomposition and communication</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2016RScI...87g6101S&link_type=ABSTRACT','NASAADS'); return false;" href="http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2016RScI...87g6101S&link_type=ABSTRACT"><span id="translatedtitle">Note: Application of a novel 2(3HUS+S) <span class="hlt">parallel</span> manipulator for <span class="hlt">simulation</span> of hip joint motion</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Shan, X. L.; Cheng, G.; Liu, X. Z.</p> <p>2016-07-01</p> <p>In the paper, a novel 2(3HUS+S) <span class="hlt">parallel</span> manipulator, which has two moving platforms, is proposed. The <span class="hlt">parallel</span> manipulator is adopted to <span class="hlt">simulate</span> hip joint motion and can conduct an experiment for two hip joints simultaneously. Motion experiments are conducted in the paper, and the recommended hip joint motion curves from ISO14242 and actual hip joint motions during jogging and walking are selected as the <span class="hlt">simulated</span> motions. The experimental results indicate that the 2(3HUS+S) <span class="hlt">parallel</span> manipulator can realize the <span class="hlt">simulation</span> of many kinds of hip joint motions without changing the structure size.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/27475608','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/27475608"><span id="translatedtitle">Note: Application of a novel 2(3HUS+S) <span class="hlt">parallel</span> manipulator for <span class="hlt">simulation</span> of hip joint motion.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Shan, X L; Cheng, G; Liu, X Z</p> <p>2016-07-01</p> <p>In the paper, a novel 2(3HUS+S) <span class="hlt">parallel</span> manipulator, which has two moving platforms, is proposed. The <span class="hlt">parallel</span> manipulator is adopted to <span class="hlt">simulate</span> hip joint motion and can conduct an experiment for two hip joints simultaneously. 
  359. Note: Application of a novel 2(3HUS+S) parallel manipulator for simulation of hip joint motion

    NASA Astrophysics Data System (ADS)

    Shan, X. L.; Cheng, G.; Liu, X. Z.

    2016-07-01

    In the paper, a novel 2(3HUS+S) parallel manipulator, which has two moving platforms, is proposed. The parallel manipulator is adopted to simulate hip joint motion and can conduct an experiment for two hip joints simultaneously. Motion experiments are conducted in the paper, and the recommended hip joint motion curves from ISO14242 and actual hip joint motions during jogging and walking are selected as the simulated motions. The experimental results indicate that the 2(3HUS+S) parallel manipulator can realize the simulation of many kinds of hip joint motions without changing the structure size.

  360. Note: Application of a novel 2(3HUS+S) parallel manipulator for simulation of hip joint motion

    PubMed

    Shan, X L; Cheng, G; Liu, X Z

    2016-07-01

    In the paper, a novel 2(3HUS+S) parallel manipulator, which has two moving platforms, is proposed. The parallel manipulator is adopted to simulate hip joint motion and can conduct an experiment for two hip joints simultaneously. Motion experiments are conducted in the paper, and the recommended hip joint motion curves from ISO14242 and actual hip joint motions during jogging and walking are selected as the simulated motions. The experimental results indicate that the 2(3HUS+S) parallel manipulator can realize the simulation of many kinds of hip joint motions without changing the structure size. PMID:27475608

  361. Future Directions in Simulating Solar Geoengineering

    SciTech Connect

    Kravitz, Benjamin S.; Robock, Alan; Boucher, Olivier

    2014-08-05

    Solar geoengineering is a proposed set of technologies to temporarily alleviate some of the consequences of anthropogenic greenhouse gas emissions. The Geoengineering Model Intercomparison Project (GeoMIP) created a framework of geoengineering simulations in climate models that have been performed by modeling centers throughout the world (B. Kravitz et al., The Geoengineering Model Intercomparison Project (GeoMIP), Atmospheric Science Letters, 12(2), 162-167, doi:10.1002/asl.316, 2011). These experiments use state-of-the-art climate models to simulate solar geoengineering via uniform solar reduction, creation of stratospheric sulfate aerosol layers, or injecting sea spray into the marine boundary layer. GeoMIP has been quite successful in its mission of revealing robust features and key uncertainties of the modeled effects of solar geoengineering.

  362. De Novo Ultrascale Atomistic Simulations On High-End Parallel Supercomputers

    SciTech Connect

    Nakano, A; Kalia, R K; Nomura, K; Sharma, A; Vashishta, P; Shimojo, F; van Duin, A; Goddard, III, W A; Biswas, R; Srivastava, D; Yang, L H

    2006-09-04

    We present a de novo hierarchical simulation framework for first-principles based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters. In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) simulations explore a wide solution space to discover microscopic mechanisms that govern macroscopic material properties, into which highly accurate quantum mechanical (QM) simulations are embedded to validate the discovered mechanisms and quantify the uncertainty of the solution. The framework includes an embedded divide-and-conquer (EDC) algorithmic framework for the design of linear-scaling simulation algorithms with minimal bandwidth complexity and tight error control. The EDC framework also enables adaptive hierarchical simulation with automated model transitioning assisted by graph-based event tracking. A tunable hierarchical cellular decomposition parallelization framework then maps the O(N) EDC algorithms onto Petaflops computers, while achieving performance tunability through a hierarchy of parameterized cell data/computation structures, as well as its implementation using hybrid Grid remote procedure call + message passing + threads programming. High-end computing platforms such as IBM BlueGene/L, SGI Altix 3000 and the NSF TeraGrid provide an excellent test grounds for the framework. On these platforms, we have achieved unprecedented scales of quantum-mechanically accurate and well validated, chemically reactive atomistic simulations--1.06 billion-atom fast reactive force-field MD and 11.8 million-atom (1.04 trillion grid points) quantum-mechanical MD in the framework of the EDC density functional theory on adaptive multigrids--in addition to 134 billion-atom non-reactive space-time multiresolution MD, with the parallel efficiency as high as 0.998 on 65,536 dual-processor BlueGene/L nodes. We have also achieved an automated execution of hierarchical QM

  363. Billion-atom synchronous parallel kinetic Monte Carlo simulations of critical 3D Ising systems

    SciTech Connect

    Martinez, E.; Monasterio, P.R.; Marian, J.

    2011-02-20

    An extension of the synchronous parallel kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the parallel efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.
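    Two devices carry the algorithm in the record above: null events pad every processor's local rate up to a shared maximum so that all clocks advance with identical statistics, and a chessboard decomposition keeps concurrently updated sublattices non-interacting. A minimal Python reading of the null-event step follows; it is an illustrative sketch (one domain per MPI rank in a real code, looped over serially here), not the published implementation.

        import numpy as np

        rng = np.random.default_rng(0)

        def spkmc_step(domain_rates, r_max):
            # domain_rates: one array of event rates per (non-interacting)
            # sublattice domain; r_max: global upper bound on the total rate.
            chosen = []
            for rates in domain_rates:
                r_local = rates.sum()
                # With probability r_local / r_max a real event fires;
                # otherwise the domain executes a null (do-nothing) event.
                if rng.random() < r_local / r_max:
                    chosen.append(rng.choice(len(rates), p=rates / r_local))
                else:
                    chosen.append(None)
            dt = rng.exponential(1.0 / r_max)  # common clock advance
            return chosen, dt

    Because every domain draws from the same padded total rate r_max, the time increment is a single exponential deviate shared by all processors, which is what keeps the clocks current in a global sense.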
  364. Steepening of parallel propagating hydromagnetic waves into magnetic pulsations - A simulation study

    NASA Technical Reports Server (NTRS)

    Akimoto, K.; Winske, D.; Onsager, T. G.; Thomsen, M. F.; Gary, S. P.

    1991-01-01

    The steepening mechanism of parallel propagating low-frequency MHD-like waves observed upstream of the earth's quasi-parallel bow shock has been investigated by means of electromagnetic hybrid simulations. It is shown that an ion beam, through the resonant electromagnetic ion/ion instability, excites large-amplitude waves, which consequently pitch-angle scatter, decelerate, and eventually magnetically trap beam ions in regions where the wave amplitudes are largest. As a result, the beam ions become bunched in both space and gyrophase. As these higher-density, nongyrotropic beam segments are formed, the hydromagnetic waves rapidly steepen, resulting in magnetic pulsations, with properties generally in agreement with observations. This steepening process operates on the scale of the linear growth time of the resonant ion/ion instability. Many of the pulsations generated by this mechanism are left-hand polarized in the spacecraft frame.

  365. A parallelization scheme to simulate reactive transport in the subsurface environment with OGS#IPhreeqc

    NASA Astrophysics Data System (ADS)

    He, W.; Beyer, C.; Fleckenstein, J. H.; Jang, E.; Kolditz, O.; Naumov, D.; Kalbacher, T.

    2015-03-01

    This technical paper presents an efficient and performance-oriented method to model reactive mass transport processes in environmental and geotechnical subsurface systems. The open source scientific software packages OpenGeoSys and IPhreeqc have been coupled to combine their individual strengths and features to simulate thermo-hydro-mechanical-chemical coupled processes in porous and fractured media with simultaneous consideration of aqueous geochemical reactions. Furthermore, a flexible parallelization scheme using MPI (Message Passing Interface) grouping techniques has been implemented, which allows an optimized allocation of computer resources for the node-wise calculation of chemical reactions on the one hand, and the underlying processes such as groundwater flow or solute transport on the other hand. The coupling interface and parallelization scheme have been tested and verified in terms of precision and performance.

  366. Billion-atom synchronous parallel kinetic Monte Carlo simulations of critical 3D Ising systems

    NASA Astrophysics Data System (ADS)

    Martínez, E.; Monasterio, P. R.; Marian, J.

    2011-02-01

    An extension of the synchronous parallel kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the parallel efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.

  367. Parallel Tempering Monte Carlo Simulations of Spherical Fixed-Connectivity Model for Polymerized Membranes

    NASA Astrophysics Data System (ADS)

    Usui, Satoshi; Koibuchi, Hiroshi

    2016-02-01

    We study the first-order phase transition of the fixed-connectivity triangulated surface model using the Parallel Tempering Monte Carlo (PTMC) technique on relatively large lattices. From the PTMC results, we find that the transition is considerably stronger than the reported ones predicted by the conventional Metropolis MC (MMC) technique and the flat histogram MC technique. We also confirm that the results of the PTMC on relatively smaller lattices are in good agreement with those known results. This implies that the PTMC can successfully be used to simulate first-order phase transitions. The parallel computation in the PTMC is implemented by OpenMP, where the speed of the PTMC on multi-core CPUs is considerably faster than that on single-core CPUs.
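    As background to the PTMC technique used in the record above, the replica-exchange step can be written in a few lines. The sketch below assumes the standard acceptance rule for swapping configurations between neighbouring inverse temperatures; it is generic, not code from the paper.

        import numpy as np

        def attempt_swaps(energies, betas, rng):
            # One sweep over neighbouring temperature pairs; energies[k] is
            # the energy of the replica currently at inverse temperature
            # betas[k].  Detailed balance in the product ensemble gives
            # p = min(1, exp((beta_k - beta_{k+1}) * (E_k - E_{k+1}))).
            for k in range(len(betas) - 1):
                delta = (betas[k] - betas[k + 1]) * (energies[k] - energies[k + 1])
                if delta >= 0 or rng.random() < np.exp(delta):
                    # Accepted: the two replicas trade temperature slots,
                    # i.e. the configurations (represented here only by
                    # their energies) are exchanged.
                    energies[k], energies[k + 1] = energies[k + 1], energies[k]
            return energies

    In an OpenMP setting like the one described, each replica runs its local Monte Carlo sweeps on its own thread, and only this inexpensive swap step requires coordination between threads.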
  368. Parallel traffic flow simulation of freeway networks: Phase 2. Final report 1994--1995

    SciTech Connect

    Chronopoulos, A.

    1997-07-01

    Explicit and implicit numerical methods for solving simple macroscopic traffic flow continuum models have been studied and efficiently implemented in traffic simulation codes in the past. The authors have already studied and implemented explicit methods for solving the high-order flow conservation traffic model. Implicit methods allow a much larger time step size than explicit methods for the same accuracy. However, at each time step a nonlinear system must be solved. They use the Newton method coupled with a linear iterative solver (Orthomin). They accelerate the convergence of Orthomin with parallel incomplete LU factorization preconditionings. The authors implemented this implicit method on a 16-processor nCUBE2 parallel computer and obtained significant execution time speedup.
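    The solution strategy in the abstract above, a Newton iteration whose inner linear systems are solved by an incomplete-LU-preconditioned Krylov method, looks schematically as follows. This sketch assumes SciPy's sparse toolbox; Orthomin is not available there, so GMRES stands in as the inner Krylov solver, and residual and jacobian are caller-supplied functions.

        import scipy.sparse as sp
        import scipy.sparse.linalg as spla

        def newton_step(residual, jacobian, u):
            # One Newton update: solve J(u) du = -F(u), then u <- u + du.
            F = residual(u)
            J = sp.csc_matrix(jacobian(u))
            ilu = spla.spilu(J, drop_tol=1e-4)                  # incomplete LU
            M = spla.LinearOperator(J.shape, matvec=ilu.solve)  # preconditioner
            du, info = spla.gmres(J, -F, M=M)
            if info != 0:
                raise RuntimeError("inner Krylov iteration did not converge")
            return u + du

    The parallel content of the report lies in distributing the incomplete factorization and the matrix-vector products across processors; the larger time steps an implicit method permits must pay for this extra work per step.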
  369. Small-World Synchronized Computing Networks for Scalable Parallel Discrete-Event Simulations

    NASA Astrophysics Data System (ADS)

    Guclu, Hasan; Korniss, Gyorgy; Toroczkai, Zoltan; Novotny, Mark A.

    We study the scalability of parallel discrete-event simulations for arbitrary short-range interacting systems with asynchronous dynamics. When the synchronization topology mimics that of the short-range interacting underlying system, the virtual time horizon (corresponding to the progress of the processing elements) exhibits Kardar-Parisi-Zhang-like kinetic roughening. Although the virtual times, on average, progress at a nonzero rate, their statistical spread diverges with the number of processing elements, hindering efficient data collection. We show that when the synchronization topology is extended to include quenched random communication links between the processing elements, they make close-to-uniform progress with a nonzero rate, without global synchronization. We discuss in detail a coarse-grained description for the small-world synchronized virtual time horizon and compare the findings to those obtained by simulating the simulations based on the exact algorithmic rules.

  370. Xyce parallel electronic simulator design: mathematical formulation, version 2.0

    SciTech Connect

    Hoekstra, Robert John; Waters, Lon J.; Hutchinson, Scott Alan; Keiter, Eric Richard; Russo, Thomas V.

    2004-06-01

    This document is intended to contain a detailed description of the mathematical formulation of Xyce, a massively parallel SPICE-style circuit simulator developed at Sandia National Laboratories. The target audience of this document is people in the role of 'service provider'. An example of such a person would be a linear solver expert who is spending a small fraction of his time developing solver algorithms for Xyce. Such a person probably is not an expert in circuit simulation and would benefit from a description of the equations solved by Xyce. In this document, modified nodal analysis (MNA) is described in detail, with a number of examples. Issues that are unique to circuit simulation, such as voltage limiting, are also described in detail.
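    For readers coming to the Xyce document from outside circuit simulation, the essence of modified nodal analysis is that each element "stamps" its contribution into a system whose unknowns are the node voltages plus branch currents for elements such as voltage sources. A tiny hypothetical example follows (a 5 V source driving a two-resistor divider, chosen for illustration and not taken from the report):

        import numpy as np

        def mna_example(V1=5.0, R1=1e3, R2=2e3):
            # Circuit: V1 from node 1 to ground, R1 between nodes 1 and 2,
            # R2 from node 2 to ground.  Unknowns: [v1, v2, i_V1].
            G = np.zeros((3, 3))
            b = np.zeros(3)
            g1 = 1.0 / R1
            G[0, 0] += g1; G[0, 1] -= g1   # stamp R1
            G[1, 0] -= g1; G[1, 1] += g1
            G[1, 1] += 1.0 / R2            # stamp R2
            G[0, 2] += 1.0; G[2, 0] += 1.0 # stamp V1: current unknown
            b[2] = V1                      # and constraint v1 = V1
            return np.linalg.solve(G, b)

        print(mna_example())  # v1 = 5 V, v2 = 10/3 V, plus the source current

    Nonlinear devices enter the same matrix structure through Newton linearization, which is where the voltage-limiting issues mentioned above arise.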
  371. Parallel Beam Dynamics Simulation Tools for Future Light Source Linac Modeling

    SciTech Connect

    Qiang, Ji; Pogorelov, Ilya V.; Ryne, Robert D.

    2007-06-25

    Large-scale modeling on parallel computers is playing an increasingly important role in the design of future light sources. Such modeling provides a means to accurately and efficiently explore issues such as limits to beam brightness, emittance preservation, the growth of instabilities, etc. Recently the IMPACT code suite was enhanced to be applicable to future light source design. Simulations with IMPACT-Z were performed using up to one billion simulation particles for the main linac of a future light source to study the microbunching instability. Combined with the time domain code IMPACT-T, it is now possible to perform large-scale start-to-end linac simulations for future light sources, including the injector, main linac, chicanes, and transfer lines. In this paper we provide an overview of the IMPACT code suite, its key capabilities, and recent enhancements pertinent to accelerator modeling for future linac-based light sources.

  372. FLY. A parallel tree N-body code for cosmological simulations

    NASA Astrophysics Data System (ADS)

    Antonuccio-Delogu, V.; Becciani, U.; Ferro, D.

    2003-10-01

    FLY is a parallel treecode which makes heavy use of the one-sided communication paradigm to handle the management of the tree structure. In its public version the code implements the equations for cosmological evolution, and can be run for different cosmological models. This reference guide describes the actual implementation of the algorithms of the public version of FLY, and suggests how to modify them to implement other types of equations (for instance, the Newtonian ones).

    Program summary

    Title of program: FLY
    Catalogue identifier: ADSC
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADSC
    Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
    Computer for which the program is designed and others on which it has been tested: Cray T3E, Sgi Origin 3000, IBM SP
    Operating systems or monitors under which the program has been tested: Unicos 2.0.5.40, Irix 6.5.14, Aix 4.3.3
    Programming language used: Fortran 90, C
    Memory required to execute with typical data: about 100 Mwords with 2 million particles
    Number of bits in a word: 32
    Number of processors used: parallel program; the user can select the number of processors >= 1
    Has the code been vectorized or parallelized?: parallelized
    Number of bytes in distributed program, including test data, etc.: 4615604
    Distribution format: tar gzip file
    Keywords: Parallel tree N-body code for cosmological simulations
    Nature of physical problem: FLY is a parallel collisionless N-body code for the calculation of the gravitational force.
    Method of solution: It is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986).
    Restrictions on the complexity of the program: The program uses the leapfrog integrator schema, but could be changed by the user.
    Typical running time: 50 seconds for each time-step, running a 2-million-particle simulation on an Sgi Origin 3800 system with 8 processors having 512 Mbytes RAM for each processor.
    Unusual features of the program: FLY
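    The FLY summary notes that the code uses the leapfrog integrator schema. For reference, the kick-drift-kick form of that integrator is shown below in Python; acc_fn stands for the force evaluation (for example, a Barnes-Hut tree walk). This is a generic sketch, not FLY's Fortran 90 implementation.

        import numpy as np

        def leapfrog_step(pos, vel, acc_fn, dt):
            # Kick-drift-kick leapfrog: time-reversible and symplectic,
            # which is why it is a standard choice for N-body codes.
            vel_half = vel + 0.5 * dt * acc_fn(pos)          # half kick
            pos_new = pos + dt * vel_half                    # drift
            vel_new = vel_half + 0.5 * dt * acc_fn(pos_new)  # half kick
            return pos_new, vel_new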
  373. Direct NOE simulation from long MD trajectories

    NASA Astrophysics Data System (ADS)

    Chalmers, G.; Glushka, J. N.; Foley, B. L.; Woods, R. J.; Prestegard, J. H.

    2016-04-01

    A software package, MD2NOE, is presented which calculates Nuclear Overhauser Effect (NOE) build-up curves directly from molecular dynamics (MD) trajectories. It differs from traditional approaches in that it calculates correlation functions directly from the trajectory instead of extracting inverse sixth power distance terms as an intermediate step in calculating NOEs. This is particularly important for molecules that sample conformational states on a timescale similar to molecular reorientation. The package is tested on sucrose and results are shown to differ in small but significant ways from those calculated using an inverse sixth power assumption. Results are also compared to experiment and found to be in reasonable agreement despite an expected underestimation of water viscosity by the water model selected.
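    The methodological point of the MD2NOE record is to build time-correlation functions directly from the trajectory rather than collapsing it to an inverse sixth power distance average first. A deliberately simplified Python sketch for a single proton pair shows the distance part of such a correlation function; the angular (spherical harmonic) dependence of the full dipolar correlation function is omitted here, so this illustrates the idea rather than the package's actual calculation.

        import numpy as np

        def distance_correlation(r, max_lag):
            # r: trajectory of interproton distances, one value per frame.
            # Correlate the dipolar distance factor r(t)**-3 with itself.
            f = r ** -3.0
            n = len(f)
            return np.array([np.mean(f[:n - k] * f[k:]) for k in range(max_lag)])

    For a rigid geometry this function is flat at the <r^-6> level and both routes agree; conformational exchange on the timescale of molecular reorientation makes it decay, which is why the direct route can differ from the r^-6 shortcut for a flexible molecule like sucrose.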
  374. Direct numerical simulation of turbulent mixing

    PubMed

    Statsenko, V P; Yanilkin, Yu V; Zhmaylo, V A

    2013-11-28

    The results of three-dimensional numerical simulations of turbulent flows obtained by various authors are reviewed. The paper considers the turbulent mixing (TM) process caused by the development of the main types of instabilities: those due to gravitation (with either a fixed or an alternating-sign acceleration), shift and shock waves. The problem of a buoyant jet is described as an example of the mixed-type problem. Comparison is made with experimental data on the TM zone width, profiles of density, velocity and turbulent energy, and degree of homogeneity. PMID:24146009

  375. 3-dimensional magnetotelluric inversion including topography using deformed hexahedral edge finite elements and direct solvers parallelized on symmetric multiprocessor computers - Part II: direct data-space inverse solution

    NASA Astrophysics Data System (ADS)

    Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.

    2016-01-01

    Following the creation described in Part I of a deformable edge finite-element simulator for 3-D magnetotelluric (MT) responses using direct solvers, in Part II we develop an algorithm named HexMT for 3-D regularized inversion of MT data including topography. Direct solvers parallelized on large-RAM, symmetric multiprocessor (SMP) workstations are used also for the Gauss-Newton model update. By exploiting the data-space approach, the computational cost of the model update becomes much less in both time and computer memory than the cost of the forward simulation. In order to regularize using the second norm of the gradient, we factor the matrix related to the regularization term and apply its inverse to the Jacobian, which is done using the MKL PARDISO library. For dense matrix multiplication and factorization related to the model update, we use the PLASMA library which shows very good scalability across processor cores. A synthetic test inversion using a simple hill model shows that including topography can be important; in this case depression of the electric field by the hill can cause false conductors at depth or mask the presence of resistive structure. With a simple model of two buried bricks, a uniform spatial weighting for the norm of model smoothing recovered more accurate locations for the tomographic images compared to weightings which were a function of parameter Jacobians. We implement joint inversion for static distortion matrices tested using the Dublin secret model 2, for which we are able to reduce nRMS to ~1.1 while avoiding oscillatory convergence. Finally we test the code on field data by inverting full impedance and tipper MT responses collected around Mount St Helens in the Cascade volcanic chain. Among several prominent structures, the north-south trending, eruption-controlling shear zone is clearly imaged in the inversion.

  376. Direct Simulation of Shock Layer Plasmas

    SciTech Connect

    Farbar, E. D.; Boyd, I. D.

    2011-05-20

    Approximate models of the electric field used with the DSMC method all impose quasi-neutrality everywhere in the shock layer plasma. The shortcomings of these models are examined in this study by simulating a weak shock layer plasma with a coupled DSMC-Particle-In-Cell (PIC) method. The stagnation streamline of an axisymmetric shock layer is simulated for entry velocities in air that correspond to both lunar and Mars return trajectories. The atmospheric densities, particle diameters and chemical reaction rates are varied from the actual values to make the computations tractable while retaining the mean free path of air at 85 km altitude. In contrast to DSMC flow field predictions, regions of non-neutrality are predicted by the DSMC-PIC method, and the electrons are predicted to be isothermal. Perhaps the most important result of this study is that the DSMC-PIC results at both reentry energies yield a 14% increase in heat flux to the vehicle surface relative to the DSMC results. Rather unintuitively, this is mostly due to an increase in ion flux to the surface, rather than the potential energy gained by each ion as it traverses the plasma sheath. In this study, an approximate electric field model is presented, with the goal of accounting for this heat flux augmentation without the need for a computationally expensive DSMC-PIC calculation of the entire flow field. Convective heat flux results obtained with the new electric field model are compared to results from the rigorous DSMC-PIC calculations.

  377. BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations

    PubMed Central

    Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul

    2016-01-01

    Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and decay in large 3D domains. It has been parallelized with OpenMP, allowing efficient simulations on desktop workstations or single supercomputer nodes. The code is stable even for large time steps, with linear computational cost scalings. Solutions are first-order accurate in time and second-order accurate in space. The code can be run by itself or as part of a larger simulator. Availability and implementation: BioFVM is written in C++ with parallelization in OpenMP. It is maintained and available for download at http://BioFVM.MathCancer.org and http://BioFVM.sf.net under the Apache License (v2.0). Contact: paul.macklin@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26656933
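    BioFVM's combination of first-order accuracy in time and stability at large time steps is characteristic of implicit time stepping. The sketch below shows a backward-Euler diffusion-decay update in one dimension via a tridiagonal solve, written under that assumption purely for illustration; it is not BioFVM's C++/OpenMP source. A 3-D solver can apply such one-dimensional solves along each axis in turn (a locally one-dimensional splitting).

        import numpy as np
        from scipy.linalg import solve_banded

        def implicit_diffusion_step(rho, D, decay, dt, dx):
            # Backward Euler for d(rho)/dt = D d2(rho)/dx2 - decay*rho,
            # zero-flux boundaries; unconditionally stable in dt.
            n = len(rho)
            a = D * dt / dx**2
            ab = np.zeros((3, n))                  # banded (tridiagonal) storage
            ab[0, 1:] = -a                         # super-diagonal
            ab[1, :] = 1.0 + 2.0 * a + decay * dt  # main diagonal
            ab[2, :-1] = -a                        # sub-diagonal
            ab[1, 0] -= a                          # fold ghost cells for
            ab[1, -1] -= a                         # zero-flux boundaries
            return solve_banded((1, 1), ab, rho)

    Each such solve costs O(n), which is consistent with the linear computational cost scaling the abstract reports.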
  378. Using Speculative Execution to Reduce Communication in a Parallel Large Scale Earthquake Simulation

    NASA Astrophysics Data System (ADS)

    Heien, E. M.; Yikilmaz, M. B.; Sachs, M. K.; Rundle, J. B.; Turcotte, D. L.; Kellogg, L. H.

    2011-12-01

    Earthquake simulations on parallel systems can be communication intensive due to local events (rupture waves) which have global effects (stress transfer). These events require global communication to transmit the effects of increased stress to model elements on other computing nodes. We describe a method of using speculative execution in a large scale parallel computation to decrease communication and improve simulation speed. This method exploits the tendency of earthquake ruptures to remain physically localized even though their effects on stress will be over long ranges. In this method we assume the stress transfer caused by a rupture remains localized and avoid global communication until the rupture has a high probability of passing to another node. We then calculate the stress state of the system to ensure that the rupture in fact remained localized, proceeding if the assumption was correct or rolling back the calculation otherwise. Using this method we are able to reduce communication frequency by 78%, in turn decreasing communication time by up to 66% and improving simulation speed by up to 45%.
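    The speculate-verify-rollback cycle described above can be captured in a small generic driver. Everything below is an illustrative placeholder (the callables would be supplied by the host simulation); it shows only the control flow, not the earthquake model itself.

        def speculative_step(save, restore, local_update, stayed_local, global_update):
            # Run the cheap update that assumes the rupture's stress
            # transfer stays on this node, then verify that assumption.
            state = save()
            local_update()            # no inter-node communication
            if not stayed_local():
                restore(state)        # speculation failed: roll back
                global_update()       # redo the step with full stress exchange

    The 78% reduction in communication frequency quoted above reflects how often the ruptures did stay local, so that verification succeeded without the global exchange.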
  379. A Computer Simulation of the System-Wide Effects of Parallel-Offset Route Maneuvers

    NASA Technical Reports Server (NTRS)

    Lauderdale, Todd A.; Santiago, Confesor; Pankok, Carl

    2010-01-01

    Most aircraft managed by air-traffic controllers in the National Airspace System are capable of flying parallel-offset routes. This paper presents the results of two related studies on the effects of increased use of offset routes as a conflict resolution maneuver. The first study analyzes offset routes in the context of all standard resolution types which air-traffic controllers currently use. This study shows that by utilizing parallel-offset route maneuvers, significant system-wide savings in delay due to conflict resolution of up to 30% are possible. It also shows that most offset resolutions replace horizontal-vectoring resolutions. The second study builds on the results of the first and directly compares offset resolutions and standard horizontal-vectoring maneuvers to determine that in-trail conflicts are often more efficiently resolved by offset maneuvers.

  380. Accelerating groundwater flow simulation in MODFLOW using JASMIN-based parallel computing

    PubMed

    Cheng, Tangpei; Mo, Zeyao; Shao, Jingli

    2014-01-01

    To accelerate the groundwater flow simulation process, this paper reports our work on developing an efficient parallel simulator through rebuilding the well-known software MODFLOW on JASMIN (J Adaptive Structured Meshes applications Infrastructure). The rebuilding process is achieved by designing patch-based data structures and parallel algorithms as well as adding slight modifications to the compute flow and subroutines in MODFLOW. Both the memory requirements and computing efforts are distributed among all processors; and to reduce communication cost, data transfers are batched and conveniently handled by adding ghost nodes to each patch. To further improve performance, constant-head/inactive cells are tagged and neglected during the linear solving process and an efficient load balancing strategy is presented. The accuracy and efficiency are demonstrated through modeling three scenarios: The first application is a field flow problem located at Yanming Lake in China to help design a reasonable quantity of groundwater exploitation. Desirable numerical accuracy and significant performance enhancement are obtained. Typically, the tagged program with load balancing strategy running on 40 cores is six times faster than the fastest MICCG-based MODFLOW program. The second test is simulating flow in a highly heterogeneous aquifer. The AMG-based JASMIN program running on 40 cores is nine times faster than the GMG-based MODFLOW program. The third test is a simplified transient flow problem with the order of tens of millions of cells to examine the scalability. Compared to 32 cores, parallel efficiencies of 77% and 68% are obtained on 512 and 1024 cores, respectively, which indicates impressive scalability. PMID:23600445
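    The ghost-node device mentioned in the MODFLOW/JASMIN abstract gives each patch a halo of copied neighbour cells, so a stencil update can proceed without communicating mid-sweep, and the copies themselves can be batched into few messages. A one-dimensional, single-process Python illustration follows, with array copies standing in for the batched MPI transfers.

        def exchange_ghosts(patches):
            # Each patch stores one ghost cell at each end; interior cells
            # are patch[1:-1].  Fill ghosts from the neighbours' interiors.
            for i, p in enumerate(patches):
                if i > 0:
                    p[0] = patches[i - 1][-2]   # left ghost <- left neighbour
                if i < len(patches) - 1:
                    p[-1] = patches[i + 1][1]   # right ghost <- right neighbour
            return patches

    After the exchange, every patch can run its stencil over its interior independently, which is what allows the per-patch work to be distributed among processors.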
To numerically <span class="hlt">simulate</span> this 3-D, transient phenomenon, enormous computer resources (time, memory and disk storage) are required. Although current generation vector supercomputers are capable of providing adequate resources for <span class="hlt">simulations</span> of this nature, their high cost and limited availability, makes such machines less than satisfactory for routine use. The primary focus of this study is to assess the feasibility of using highly <span class="hlt">parallel</span> computer systems as a cost-effective alternative for conducting such unsteady flow <span class="hlt">simulations</span>. Towards this end, a large-eddy <span class="hlt">simulation</span> model for combustion instability was implemented on the Intel iPSC/860 and a careful study was conducted to determine the benefits and the problems associated with the use of such machines for transient <span class="hlt">simulations</span>. Details of this study along with the results obtained from the unsteady combustion <span class="hlt">simulations</span> carried out on the iPSC/860 are discussed in this paper.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/20150000244','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/20150000244"><span id="translatedtitle">LightForce Photon-Pressure Collision Avoidance: Updated Efficiency Analysis Utilizing a Highly <span class="hlt">Parallel</span> <span class="hlt">Simulation</span> Approach</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Stupl, Jan; Faber, Nicolas; Foster, Cyrus; Yang, Fan Yang; Nelson, Bron; Aziz, Jonathan; Nuttall, Andrew; Henze, Chris; Levit, Creon</p> <p>2014-01-01</p> <p>This paper provides an updated efficiency analysis of the LightForce space debris collision avoidance scheme. LightForce aims to prevent collisions on warning by utilizing photon pressure from ground based, commercial off the shelf lasers. Past research has shown that a few ground-based systems consisting of 10 kilowatt class lasers <span class="hlt">directed</span> by 1.5 meter telescopes with adaptive optics could lower the expected number of collisions in Low Earth Orbit (LEO) by an order of magnitude. Our <span class="hlt">simulation</span> approach utilizes the entire Two Line Element (TLE) catalogue in LEO for a given day as initial input. Least-squares fitting of a TLE time series is used for an improved orbit estimate. We then calculate the probability of collision for all LEO objects in the catalogue for a time step of the <span class="hlt">simulation</span>. The conjunctions that exceed a threshold probability of collision are then engaged by a <span class="hlt">simulated</span> network of laser ground stations. After those engagements, the perturbed orbits are used to re-assess the probability of collision and evaluate the efficiency of the system. This paper describes new <span class="hlt">simulations</span> with three updated aspects: 1) By utilizing a highly <span class="hlt">parallel</span> <span class="hlt">simulation</span> approach employing hundreds of processors, we have extended our analysis to a much broader dataset. The <span class="hlt">simulation</span> time is extended to one year. 2) We analyze not only the efficiency of LightForce on conjunctions that naturally occur, but also take into account conjunctions caused by orbit perturbations due to LightForce engagements. 
3) We use a new <span class="hlt">simulation</span> approach that is regularly updating the LightForce engagement strategy, as it would be during actual operations. In this paper we present our <span class="hlt">simulation</span> approach to <span class="hlt">parallelize</span> the efficiency analysis, its computational performance and the resulting expected efficiency of the LightForce collision avoidance system. Results indicate that utilizing a network of four LightForce stations with 20 kilowatt lasers, 85% of all conjunctions with a</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2014APS..DPPUP8047C','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2014APS..DPPUP8047C"><span id="translatedtitle">Enhancements, <span class="hlt">Parallelization</span> and Future <span class="hlt">Directions</span> of the V3FIT 3-D Equilibrium Reconstruction Code</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Cianciosa, M. R.; Hanson, J. D.; Maurer, D. A.; Hartwell, G. J.; Archmiller, M. C.; Ma, X.; Herfindal, J.</p> <p>2014-10-01</p> <p>Three-dimensional equilibrium reconstruction is spreading beyond its original application to stellarators. Three-dimensional effects in nominally axisymmetric systems, including quasi-helical states in reversed field pinches and error fields in tokamaks, are becoming increasingly important. V3FIT is a fully three dimensional equilibrium reconstruction code in widespread use throughout the fusion community. The code has recently undergone extensive revision to prepare for the next generation of equilibrium reconstruction problems. The most notable changes are the abstraction of the equilibrium model, the propagation of experimental errors to the reconstructed results, support for multicolor soft x-ray emissivity cameras, and recent efforts to add <span class="hlt">parallelization</span> for efficient computation on multi-processor system. Work presented will contain discussions on these new capabilities. We will compare probability distributions of reconstructed parameters with results from whole shot reconstructions. We will show benchmarking and profiling results of initial performance improvements through the addition of OpenMP and MPI support. We will discuss future <span class="hlt">directions</span> of the V3FIT code including steps taken for support of the W-7X stellarator. Work supported by US. Department of Energy Grant No. DEFG-0203-ER-54692B.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2014ApSS..290..301Z','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2014ApSS..290..301Z"><span id="translatedtitle"><span class="hlt">Direct</span> nanoimprinting of single crystalline gold: Experiments and dislocation dynamics <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Zhang, J.; Zhang, Y.; Mara, N. A.; Lou, J.; Nicola, L.</p> <p>2014-01-01</p> <p>This paper addresses the feasibility of <span class="hlt">direct</span> nanoimprinting and highlights the challenges involved in this technique. Our study focuses on experimental work supported by dislocation dynamics <span class="hlt">simulations</span>. A gold single crystal is imprinted by a tungsten indenter patterned with <span class="hlt">parallel</span> lines of various spacings. 
Dedicated dislocation dynamics <span class="hlt">simulations</span> give insight in the plastic deformation occurring into the crystal during imprinting. We find that good pattern transfer is achieved when the lines are sufficiently spaced such that dislocation activity can be effective in assisting deformation of the region underneath each line. Yet, the edges of the obtained imprints are not smooth, partly due to dislocation glide.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/19900007132','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/19900007132"><span id="translatedtitle">Stochastic <span class="hlt">simulation</span> of charged particle transport on the massively <span class="hlt">parallel</span> processor</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Earl, James A.</p> <p>1988-01-01</p> <p>Computations of cosmic-ray transport based upon finite-difference methods are afflicted by instabilities, inaccuracies, and artifacts. To avoid these problems, researchers developed a Monte Carlo formulation which is closely related not only to the finite-difference formulation, but also to the underlying physics of transport phenomena. Implementations of this approach are currently running on the Massively <span class="hlt">Parallel</span> Processor at Goddard Space Flight Center, whose enormous computing power overcomes the poor statistical accuracy that usually limits the use of stochastic methods. These <span class="hlt">simulations</span> have progressed to a stage where they provide a useful and realistic picture of solar energetic particle propagation in interplanetary space.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2001APS..DPPKP1112L','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2001APS..DPPKP1112L"><span id="translatedtitle"><span class="hlt">Parallel</span> PIC <span class="hlt">Simulations</span> of Short-Pulse High Intensity Laser Plasma Interactions.</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Lasinski, B. F.; Still, C. H.; Langdon, A. B.</p> <p>2001-10-01</p> <p>We extend our previous <span class="hlt">simulations</span> of high intensity short pulse laser plasma interactions footnote B. F. Lasinski, A. B. Langdon, S. P. Hatchett, M. H. Key, and M. Tabak, Phys. Plasmas 6, 2041 (1999); S. C. Wilks and W. L. Kruer, IEEE Journal of Quantum Electronics 11, 1954 (1997). to 3D and to much larger systems in 2D using our new, modern, 3D, electromagnetic, fully relativistic, massively <span class="hlt">parallel</span> PIC code. We study the generation of hot electrons and energetic ions and the associated complex phenomena. 
Laser light filamentation and the formation of high static magnetic fields are described.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/1028177','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/1028177"><span id="translatedtitle">A <span class="hlt">parallel</span> multigrid preconditioner for the <span class="hlt">simulation</span> of large fracture networks</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Sampath, Rahul S; Barai, Pallab; Nukala, Phani K</p> <p>2010-01-01</p> <p>Computational modeling of a fracture in disordered materials using discrete lattice models requires the solution of a linear system of equations every time a new lattice bond is broken. Solving these linear systems of equations successively is the most expensive part of fracture <span class="hlt">simulations</span> using large three-dimensional networks. In this paper, we present a <span class="hlt">parallel</span> multigrid preconditioned conjugate gradient algorithm to solve these linear systems. Numerical experiments demonstrate that this algorithm performs significantly better than the algorithms previously used to solve this problem.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/1035294','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/1035294"><span id="translatedtitle">Understanding Performance of <span class="hlt">Parallel</span> Scientific <span class="hlt">Simulation</span> Codes using Open|SpeedShop</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Ghosh, K K</p> <p>2011-11-07</p> <p>Conclusions of this presentation are: (1) Open SpeedShop's (OSS) is convenient to use for large, <span class="hlt">parallel</span>, scientific <span class="hlt">simulation</span> codes; (2) Large codes benefit from uninstrumented execution; (3) Many experiments can be run in a short time - might need multiple shots e.g. usertime for caller-callee, hwcsamp for HW counters; (4) Decent idea of code's performance is easily obtained; (5) Statistical sampling calls for decent number of samples; and (6) HWC data is very useful for micro-analysis but can be tricky to analyze.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/5574659','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/5574659"><span id="translatedtitle">Forced-convection boiling tests performed in <span class="hlt">parallel</span> <span class="hlt">simulated</span> LMR fuel assemblies</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Rose, S.D.; Carbajo, J.J.; Levin, A.E.; Lloyd, D.B.; Montgomery, B.H.; Wantland, J.L.</p> <p>1985-04-21</p> <p>Forced-convection tests have been carried out using <span class="hlt">parallel</span> <span class="hlt">simulated</span> Liquid Metal Reactor fuel assemblies in an engineering-scale sodium loop, the Thermal-Hydraulic Out-of-Reactor Safety facility. The tests, performed under single- and two-phase conditions, have shown that for low forced-convection flow there is significant flow augmentation by thermal convection, an important phenomenon under degraded shutdown heat removal conditions in an LMR. The power and flows required for boiling and dryout to occur are much higher than decay heat levels. 
The experimental evidence supports analytical results that heat removal from an LMR is possible with a degraded shutdown heat removal system.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/427933','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/427933"><span id="translatedtitle">A three-phase series-<span class="hlt">parallel</span> resonant converter -- analysis, design, <span class="hlt">simulation</span> and experimental results</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Bhat, A.K.S.; Zheng, L.</p> <p>1995-12-31</p> <p>A three-phase dc-to-dc series-<span class="hlt">parallel</span> resonant converter is proposed and its operating modes for 180{degree} wide gating pulse scheme are explained. A detailed analysis of the converter using constant current model and Fourier series approach is presented. Based on the analysis, design curves are obtained and a design example of 1 kW converter is given. SPICE <span class="hlt">simulation</span> results for the designed converter and experimental results for a 500 W converter are presented to verify the performance of the proposed converter for varying load conditions. The converter operates in lagging PF mode for the entire load range and requires a narrow variation in switching frequency.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2015CoPhC.188...21J','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2015CoPhC.188...21J"><span id="translatedtitle">Numerical approach to the <span class="hlt">parallel</span> gradient operator in tokamak scrape-off layer turbulence <span class="hlt">simulations</span> and application to the GBS code</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Jolliet, S.; Halpern, F. D.; Loizu, J.; Mosetto, A.; Riva, F.; Ricci, P.</p> <p>2015-03-01</p> <p>This paper presents two discretisation schemes for the <span class="hlt">parallel</span> gradient operator used in scrape-off layer plasma turbulence <span class="hlt">simulations</span>. First, a simple model describing the propagation of electrostatic shear-Alfvén waves, and retaining the key elements of the <span class="hlt">parallel</span> dynamics, is used to test the accuracy of the different schemes against analytical predictions. The most promising scheme is then tested in <span class="hlt">simulations</span> of limited scrape-off layer turbulence with the flux-driven 3D fluid code GBS (Ricci et al., 2012): the new approach is successfully benchmarked against the original <span class="hlt">parallel</span> gradient discretisation implemented in GBS. Finally, GBS <span class="hlt">simulations</span> using a radially varying safety profile, which were inapplicable with the original scheme are carried out for the first time: the well-known stabilisation of resistive ballooning modes at negative magnetic shear is recovered. 
393. Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN

    PubMed Central

    Hammond, G E; Lichtner, P C; Mills, R T

    2014-01-01

    To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates the performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted. PMID:25506097

394. Simulation of optical devices using parallel finite-difference time-domain method

    NASA Astrophysics Data System (ADS)

    Li, Kang; Kong, Fanmin; Mei, Liangmo; Liu, Xin

    2005-11-01

    This paper presents a new parallel finite-difference time-domain (FDTD) numerical method in a low-cost network environment to simulate optical waveguide characteristics. A PC-motherboard-based cluster is used, as it is relatively low-cost, reliable, and has high computing performance. Four nodes are networked by fast Ethernet technology. Due to the simplicity of the FDTD algorithm, a native Ethernet packet communication mechanism is used to reduce the overhead of communication between adjacent nodes. To validate the method, a microcavity ring resonator based on semiconductor waveguides is chosen as an instance of FDTD parallel computation. Speed-up rates under different division densities are calculated, and the results show that once the decomposition size reaches a certain point, a good parallel computing speed-up is maintained. This simulation shows that by overlapping computation and communication and by controlling the decomposition size, the overhead of communicating the shared data can be overcome. The results indicate that the implementation achieves significant speed-up for the FDTD algorithm, which will enable larger, real-world electromagnetic problems to be tackled on low-cost PC clusters.
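    The core of any such domain-decomposed FDTD run is the per-step exchange of boundary (ghost) cells between neighbouring subdomains. Below is a minimal 1-D sketch assuming mpi4py rather than the raw-Ethernet transport the paper describes; the grid size, Courant number, source, and file name are arbitrary choices for illustration.

        import numpy as np
        from mpi4py import MPI  # run with e.g. mpiexec -n 4 python fdtd1d.py

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        left = rank - 1 if rank > 0 else MPI.PROC_NULL
        right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

        n = 200                        # cells per process
        ez = np.zeros(n + 2)           # one ghost cell at each end
        hy = np.zeros(n + 2)
        c = 0.5                        # Courant number (normalised units)

        for step in range(1000):
            # right ghost of ez <- first interior ez of the right neighbour
            comm.Sendrecv(ez[1:2], dest=left, recvbuf=ez[-1:], source=right)
            hy[1:-1] += c * (ez[2:] - ez[1:-1])       # H update needs ez[k+1]
            # left ghost of hy <- last interior hy of the left neighbour
            comm.Sendrecv(hy[-2:-1], dest=right, recvbuf=hy[0:1], source=left)
            ez[1:-1] += c * (hy[1:-1] - hy[0:-2])     # E update needs hy[k-1]
            if rank == 0:
                ez[1] += np.sin(0.1 * step)           # soft source at the left edge

    Overlapping communication with interior updates, as the paper advocates, would replace the blocking Sendrecv calls with non-blocking Isend/Irecv pairs that complete while the interior cells are updated.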
395. Direct Numerical Simulation of Automobile Cavity Tones

    NASA Technical Reports Server (NTRS)

    Kurbatskii, Konstantin; Tam, Christopher K. W.

    2000-01-01

    The Navier-Stokes equation is solved computationally by the Dispersion-Relation-Preserving (DRP) scheme for the flow and acoustic fields associated with a laminar boundary layer flow over an automobile door cavity. In this work, the flow Reynolds number is restricted to R_δ* < 3400, the range of Reynolds number for which laminar flow may be maintained. This investigation focuses on two aspects of the problem, namely, the effect of boundary layer thickness on the cavity tone frequency and intensity and the effect of the size of the computation domain on the accuracy of the numerical simulation. It is found that the tone frequency decreases with an increase in boundary layer thickness. When the boundary layer is thicker than a certain critical value, depending on the flow speed, no tone is emitted by the cavity. Computationally, solutions of aeroacoustics problems are known to be sensitive to the size of the computation domain. Numerical experiments indicate that the use of a small domain could result in normal-mode-type acoustic oscillations in the entire computation domain, leading to an increase in tone frequency and intensity. When the computation domain is expanded so that the boundaries are at least one wavelength away from the noise source, the computed tone frequency and intensity are found to be independent of the computation domain size.

396. Unravelling how βCaMKII controls the direction of plasticity at parallel fibre-Purkinje cell synapses

    NASA Astrophysics Data System (ADS)

    Pinto, Thiago M.; Schilstra, Maria J.; Steuber, Volker; Roque, Antonio C.

    2015-12-01

    Long-term plasticity at parallel fibre (PF)-Purkinje cell (PC) synapses is thought to mediate cerebellar motor learning.
It is known that calcium-calmodulin dependent protein kinase II (CaMKII) is essential for plasticity in the cerebellum. Recently, Van Woerden et al. demonstrated that the β isoform of CaMKII regulates the bidirectional inversion of PF-PC plasticity. Because the cellular events that underlie these experimental findings are still poorly understood, our work aims at unravelling how βCaMKII controls the direction of plasticity at PF-PC synapses. We developed a bidirectional plasticity model that replicates the experimental observations by Van Woerden et al. Simulation results obtained from this model indicate the mechanisms that underlie the bidirectional inversion of cerebellar plasticity. As suggested by Van Woerden et al., filamentous actin binding enables βCaMKII to regulate the bidirectional plasticity at PF-PC synapses. Our model suggests that the reversal of long-term plasticity in PCs is based on a combination of mechanisms that occur at different calcium concentrations.

397. A coarse-grained model for DNA-functionalized spherical colloids, revisited: effective pair potential from parallel replica simulations

    PubMed

    Theodorakis, Panagiotis E; Dellago, Christoph; Kahl, Gerhard

    2013-01-14

    We discuss a coarse-grained model recently proposed by Starr and Sciortino [J. Phys.: Condens. Matter 18, L347 (2006)] for spherical particles functionalized with short single DNA strands. The model incorporates two key aspects of DNA hybridization, i.e., the specificity of binding between DNA bases and the strong directionality of hydrogen bonds. Here, we calculate the effective potential between two DNA-functionalized particles of equal size using a parallel replica protocol. We find that the transition from bonded to unbonded configurations takes place at considerably lower temperatures compared to those that were originally predicted using standard simulations in the canonical ensemble. We put particular focus on DNA decorations of tetrahedral and octahedral symmetry, as they are promising candidates for self-assembly into a single-component diamond structure. Increasing colloid size hinders hybridization of the DNA strands, in agreement with experimental findings. PMID:23320725

398. A parallel Monte Carlo code for simulating collisional N-body systems

    SciTech Connect

    Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A.

    2013-02-15

    We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ~ 10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme, as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce, along with our choice of decomposition scheme, minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within ~0.04%, throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10^5, 128 for N = 10^6, and 256 for N = 10^7. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60×, 100×, and 220×, respectively.
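    The sorting step is central because, under spherical symmetry, the potential follows from radially sorted stars by two cumulative sums (Hénon's method): Φ(r_k) = -G [M(≤r_k)/r_k + Σ_{j>k} m_j/r_j]. A serial numpy sketch of that step is given below, with the sort standing in for the parallel sample sort the abstract describes; the toy radial distribution is an assumption for illustration.

        import numpy as np

        def henon_potential(r, m, G=1.0):
            # Potential at every star of a spherically symmetric cluster:
            # Phi_k = -G * (M(<=r_k)/r_k + sum_{j>k} m_j/r_j).
            order = np.argsort(r)            # the step the paper parallelises
            r_s, m_s = r[order], m[order]
            m_enc = np.cumsum(m_s)           # mass enclosed within r_k
            a = m_s / r_s
            outer = np.cumsum(a[::-1])[::-1] - a   # sum of m_j/r_j over j > k
            phi = np.empty_like(r)
            phi[order] = -G * (m_enc / r_s + outer)
            return phi

        rng = np.random.default_rng(0)
        r = rng.random(100_000) ** (1.0 / 3.0)   # uniform-density sphere (toy, not Plummer)
        phi = henon_potential(r, np.full(r.size, 1.0 / r.size))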
399. GPU-accelerated Classical Trajectory Calculation Direct Simulation Monte Carlo applied to shock waves

    NASA Astrophysics Data System (ADS)

    Norman, Paul; Valentini, Paolo; Schwartzentruber, Thomas

    2013-08-01

    In this work we outline a Classical Trajectory Calculation Direct Simulation Monte Carlo (CTC-DSMC) implementation that uses the no-time-counter scheme with a cross-section determined by the interatomic potential energy surface (PES). CTC-DSMC solutions for translational and rotational relaxation in one-dimensional shock waves are compared directly to pure Molecular Dynamics simulations employing an identical PES, where exact agreement is demonstrated for all cases. For the flows considered, long-lived collisions occur within the simulations, and their implications for multi-body collisions, as well as the algorithmic implications for the CTC-DSMC method, are discussed. A parallelization technique for CTC-DSMC simulations using a heterogeneous multicore CPU/GPU system is demonstrated. Our approach shows good scaling as long as a sufficiently large number of collisions (~100,000) are calculated simultaneously per GPU at each DSMC iteration. We achieve a maximum speedup of 140× on a 4 GPU/CPU system vs. the performance on one CPU core in serial for a diatomic nitrogen shock. The parallelization approach presented here significantly reduces the cost of CTC-DSMC simulations and has the potential to scale to large CPU/GPU clusters, which could enable future application to 3D flows in strong thermochemical nonequilibrium.
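    The no-time-counter (NTC) selection this implementation builds on is compact enough to sketch. The hard-sphere stand-in below replaces the two CTC-specific ingredients, a PES-derived cross-section and the trajectory integration of accepted pairs (the part offloaded to the GPU); all parameter values are illustrative.

        import numpy as np

        rng = np.random.default_rng(1)

        def ntc_step(vel, cross_section, sigma_cr_max, f_num, dt, v_cell):
            # Candidate pairs per cell: 0.5*N*(N-1)*F_N*(sigma*c_r)_max*dt/V_cell,
            # each accepted with probability sigma(c_r)*c_r / (sigma*c_r)_max.
            n = len(vel)
            n_cand = int(0.5 * n * (n - 1) * f_num * sigma_cr_max * dt / v_cell)
            n_coll = 0
            for _ in range(n_cand):
                i, j = rng.choice(n, size=2, replace=False)
                c_r = np.linalg.norm(vel[i] - vel[j])
                if rng.random() * sigma_cr_max < cross_section(c_r) * c_r:
                    # isotropic hard-sphere scattering for equal masses:
                    # momentum and |c_r| (hence energy) are conserved
                    v_cm = 0.5 * (vel[i] + vel[j])
                    u = rng.normal(size=3)
                    u *= 0.5 * c_r / np.linalg.norm(u)
                    vel[i], vel[j] = v_cm + u, v_cm - u
                    n_coll += 1
            return n_coll

        vel = 300.0 * rng.normal(size=(500, 3))        # m/s, one cell's particles
        ntc_step(vel, lambda c_r: 1e-19, sigma_cr_max=1e-16,
                 f_num=1e12, dt=1e-6, v_cell=1e-9)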
400. L-PICOLA: A parallel code for fast dark matter simulation

    NASA Astrophysics Data System (ADS)

    Howlett, C.; Manera, M.; Percival, W. J.

    2015-09-01

    Robust measurements based on current large-scale structure surveys require precise knowledge of statistical and systematic errors. This can be obtained from large numbers of realistic mock galaxy catalogues that mimic the observed distribution of galaxies within the survey volume. To this end we present a fast, distributed-memory, planar-parallel code, L-PICOLA, which can be used to generate and evolve a set of initial conditions into a dark matter field much faster than a full non-linear N-body simulation. Additionally, L-PICOLA has the ability to include primordial non-Gaussianity in the simulation and simulate the past lightcone at run-time, with optional replication of the simulation volume. Through comparisons to fully non-linear N-body simulations we find that our code can reproduce the z = 0 power spectrum and reduced bispectrum of dark matter to within 2% and 5%, respectively, on all scales of interest to measurements of Baryon Acoustic Oscillations and Redshift Space Distortions, but 3 orders of magnitude faster. The accuracy, speed and scalability of this code, alongside the additional features we have implemented, make it extremely useful for both current and next generation large-scale structure surveys.
L-PICOLA is publicly available at https://cullanhowlett.github.io/l-picola.

401. Gait simulation via a 6-DOF parallel robot with iterative learning control

    PubMed

    Aubin, Patrick M; Cowley, Matthew S; Ledoux, William R

    2008-03-01

    We have developed a robotic gait simulator (RGS) by leveraging a 6-degree-of-freedom parallel robot, with the goal of overcoming three significant challenges of gait simulation, including: 1) operating at near physiologically correct velocities; 2) inputting full-scale ground reaction forces; and 3) simulating motion in all three planes (sagittal, coronal and transverse). The robot will eventually be employed with cadaveric specimens, but as a means of exploring the capability of the system, we have first used it with a prosthetic foot. Gait data were recorded from one transtibial amputee using a motion analysis system and force plate. Using the same prosthetic foot as the subject, the RGS accurately reproduced the recorded kinematics and kinetics, and the appropriate vertical ground reaction force was realized with a proportional iterative learning controller. After six gait iterations the controller reduced the root mean square (RMS) error between the simulated and in situ vertical ground reaction force to 35 N during a 1.5 s simulation of the stance phase of gait with a prosthetic foot. This paper addresses the design, methodology and validation of the novel RGS. PMID:18334421
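    The controller named here is simple in outline: a proportional iterative learning controller reuses the tracking error of gait iteration k to correct the feedforward input of iteration k+1, u_{k+1}[t] = u_k[t] + L·e_k[t+1]. The toy first-order plant, gain, and force profile below are invented stand-ins for the robot and prosthetic-foot dynamics, included only to show the update converging over the six iterations the abstract reports.

        import numpy as np

        def plant(u, a=0.9, b=0.1):
            # toy first-order lag standing in for the robot's force response
            y = np.zeros_like(u)
            for t in range(1, len(u)):
                y[t] = a * y[t - 1] + b * u[t - 1]
            return y

        target = 700.0 * np.sin(np.linspace(0.0, np.pi, 150))  # stance-like GRF in N
        u = np.zeros_like(target)
        for k in range(6):                      # six gait iterations, as reported
            e = target - plant(u)
            u[:-1] += 2.0 * e[1:]               # one-sample shift: plant has unit delay
            print(k, np.sqrt(np.mean(e ** 2)))  # RMS force error shrinks each pass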
402. PCSIM: A Parallel Simulation Environment for Neural Circuits Fully Integrated with Python

    PubMed Central

    Pecevski, Dejan; Natschläger, Thomas; Schuch, Klaus

    2008-01-01

    The Parallel Circuit SIMulator (PCSIM) is a software package for simulation of neural circuits. It is primarily designed for distributed simulation of large-scale networks of spiking point neurons. Although its computational core is written in C++, PCSIM's primary interface is implemented in the Python programming language, which is a powerful programming environment and allows the user to easily integrate the neural circuit simulator with data analysis and visualization tools to manage the full neural modeling life cycle. The main focus of this paper is to describe PCSIM's full integration into Python and the benefits thereof. In particular we will investigate how the automatically generated bidirectional interface and PCSIM's object-oriented modular framework enable the user to adopt a hybrid modeling approach: using and extending PCSIM's functionality employing either pure Python or C++, and thus combining the advantages of both worlds. Furthermore, we describe several supplementary PCSIM packages written in pure Python and tailored towards setting up and analyzing neural simulations. PMID:19543450

403. Parallel adaptive fluid-structure interaction simulation of explosions impacting on building structures

    SciTech Connect

    Deiterding, Ralf; Wood, Stephen L

    2013-01-01

    We pursue a level set approach to couple an Eulerian shock-capturing fluid solver with space-time refinement to an explicit solid dynamics solver for large deformations and fracture. The coupling algorithms, considering recursively finer fluid time steps as well as overlapping solver updates, are discussed in detail. Our ideas are implemented in the AMROC adaptive fluid solver framework and are used for effective fluid-structure coupling to the general-purpose solid dynamics code DYNA3D.
Besides simulations verifying the coupled fluid-structure solver and assessing its parallel scalability, the detailed structural analysis of a reinforced concrete column under blast loading and the simulation of a prototypical blast explosion in a realistic multistory building are presented.

404. Simulation/Emulation Techniques: Compressing Schedules With Parallel (HW/SW) Development

    NASA Technical Reports Server (NTRS)

    Mangieri, Mark L.; Hoang, June

    2014-01-01

    NASA has always been in the business of balancing new technologies and techniques to achieve human space travel objectives. NASA's Kedalion engineering analysis lab has been validating and using many contemporary avionics HW/SW development and integration techniques, which represent new paradigms to NASA's heritage culture. Kedalion has validated many of the Orion HW/SW engineering techniques borrowed from the adjacent commercial aircraft avionics solution space, inserting new techniques and skills into the Multi-Purpose Crew Vehicle (MPCV) Orion program. Using contemporary agile techniques, commercial-off-the-shelf (COTS) products, early rapid prototyping, in-house expertise and tools, and extensive use of simulators and emulators, NASA has achieved cost-effective paradigms that are currently serving the Orion program effectively. Elements of long-lead custom hardware on the Orion program have necessitated early use of simulators and emulators in advance of deliverable hardware to achieve parallel design and development on a compressed schedule.

405. Simulating massively parallel electron beam inspection for sub-20 nm defects

    NASA Astrophysics Data System (ADS)

    Bunday, Benjamin D.; Mukhtar, Maseeh; Quoi, Kathy; Thiel, Brad; Malloy, Matt

    2015-03-01

    SEMATECH has initiated a program to develop massively parallel electron beam defect inspection (MPEBI). Here we use JMONSEL simulations to generate the expected imaging responses of chosen test cases of patterns and defects, with the ability to vary parameters for beam energy, spot size, pixel size, and/or defect material and form factor. The patterns are representative of the design rules for an aggressively scaled FinFET-type design. With these simulated images and the resulting shot noise, a signal-to-noise framework is developed, which relates to defect detection probabilities. Additionally, with this infrastructure the effects of detection-chain noise and frequency-dependent system response can be assessed, allowing the best recipe parameters for MPEBI validation experiments to be targeted and ultimately leading to insights into how such parameters will impact MPEBI tool design, including the doses necessary for defect detection and estimates of the scanning speeds needed to achieve high throughput for HVM.
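    In its simplest shot-noise-limited form, the signal-to-noise bookkeeping in such a framework reduces to Poisson statistics: given per-electron detected yields for defect and background pixels (which JMONSEL-style image simulations would supply), the electron dose per pixel sets the detection SNR. A minimal sketch, with hypothetical yields and target SNR:

        import numpy as np

        def detection_snr(y_bg, y_def, dose):
            # Poisson shot-noise SNR between a defect pixel and background:
            # detected counts are yield*dose, noise ~ sqrt(mean counts)
            n_bg, n_def = y_bg * dose, y_def * dose
            return abs(n_def - n_bg) / np.sqrt(0.5 * (n_def + n_bg))

        def required_dose(y_bg, y_def, snr_target=5.0):
            # invert the SNR formula for the dose (electrons per pixel)
            return snr_target ** 2 * 0.5 * (y_bg + y_def) / (y_def - y_bg) ** 2

        dose = required_dose(0.80, 0.90)             # hypothetical pixel yields
        print(dose, detection_snr(0.80, 0.90, dose))  # ~2125 e-/pixel at SNR 5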
406. Xyce parallel electronic simulator users' guide, version 6.0

    SciTech Connect

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David G.

    2013-08-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows new types of analysis to be developed without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase (a message-passing parallel implementation), which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms.
Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

407. A Many-Task Parallel Approach for Multiscale Simulations of Subsurface Flow and Reactive Transport

    SciTech Connect

    Scheibe, Timothy D.; Yang, Xiaofan; Schuchardt, Karen L.; Agarwal, Khushbu; Chase, Jared M.; Palmer, Bruce J.; Tartakovsky, Alexandre M.

    2014-12-16

    Continuum-scale models have long been used to study subsurface flow, transport, and reactions but lack the ability to resolve processes that are governed by pore-scale mixing. Recently, pore-scale models, which explicitly resolve individual pores and soil grains, have been developed to more accurately model pore-scale phenomena, particularly reaction processes that are controlled by local mixing. However, pore-scale models are prohibitively expensive for modeling application-scale domains. This motivates the use of a hybrid multiscale approach in which continuum- and pore-scale codes are coupled either hierarchically or concurrently within an overall simulation domain (time and space). This approach is naturally suited to an adaptive, loosely coupled many-task methodology with three potential levels of concurrency. Each individual code (pore- and continuum-scale) can be implemented in parallel; multiple semi-independent instances of the pore-scale code are required at each time step, providing a second level of concurrency; and Monte Carlo simulations of the overall system to represent uncertainty in material property distributions provide a third level of concurrency. We have developed a hybrid multiscale model of a mixing-controlled reaction in a porous medium wherein the reaction occurs only over a limited portion of the domain. Loose, minimally invasive coupling of pre-existing parallel continuum- and pore-scale codes has been accomplished by an adaptive script-based workflow implemented in the Swift workflow system. We describe here the methods used to create the model system, adaptively control multiple coupled instances of pore- and continuum-scale simulations, and maximize the scalability of the overall system. We present results of numerical experiments conducted on NERSC supercomputing systems; our results demonstrate that loose many-task coupling provides a scalable solution for multiscale subsurface simulations with minimal overhead.
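    The many-task pattern itself is independent of the Swift tooling used in the paper. Below is a single-node Python analogy of the middle level of concurrency, with invented stand-ins for the two solvers and for the adaptive selection of subdomains that need pore-scale treatment; nothing here reflects the actual PNNL codes.

        from concurrent.futures import ProcessPoolExecutor

        def pore_scale_update(cell):
            # stand-in for one pore-scale solve (itself a parallel MPI job in
            # the paper); returns an effective reaction rate for that cell
            return 1e-3 * ((cell * 2654435761) % 97) / 97.0

        def continuum_step(state, rates):
            # stand-in for one continuum flow/transport time step
            return {k: v + rates.get(k, 0.0) for k, v in state.items()}

        if __name__ == "__main__":
            state = {cell: 0.0 for cell in range(64)}      # 64 continuum cells
            with ProcessPoolExecutor() as pool:
                for step in range(10):
                    active = [c for c in state if c % 8 == 0]  # adaptive task set
                    rates = dict(zip(active, pool.map(pore_scale_update, active)))
                    state = continuum_step(state, rates)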
408. SDA 7: A modular and parallel implementation of the simulation of diffusional association software

    PubMed Central

    Martinez, Michael; Romanowska, Julia; Kokh, Daria B.; Ozboyaci, Musa; Yu, Xiaofeng; Öztürk, Mehmet Ali; Richter, Stefan

    2015-01-01

    The simulation of diffusional association (SDA) Brownian dynamics software package has been widely used in the study of biomacromolecular association. Initially developed to calculate bimolecular protein-protein association rate constants, it has since been extended to study electron transfer rates, to predict the structures of biomacromolecular complexes, to investigate the adsorption of proteins to inorganic surfaces, and to simulate the dynamics of large systems containing many biomacromolecular solutes, allowing the study of concentration-dependent effects. These extensions have led to a number of divergent versions of the software. In this article, we report the development of the latest version of the software (SDA 7). This release was developed to consolidate the existing codes into a single framework, while improving the parallelization of the code to better exploit modern multicore shared-memory computer architectures. It is built using a modular object-oriented programming scheme, to allow for easy maintenance and extension of the software, and includes new features, such as adding flexible solute representations. We discuss a number of application examples, which describe some of the methods available in the release, and provide benchmarking data to demonstrate the parallel performance. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc. PMID:26123630

409. Optimized simulations of Olami-Feder-Christensen systems using parallel algorithms

    NASA Astrophysics Data System (ADS)

    Dominguez, Rachele; Necaise, Rance; Montag, Eric

    The sequential nature of the Olami-Feder-Christensen (OFC) model for earthquake simulations limits the benefits of parallel computing approaches because of the frequent communication required between processors. We developed a parallel version of the OFC algorithm for multi-core processors. Our data, even for relatively small system sizes and low numbers of processors, indicate that increasing the number of processors provides significantly faster simulations, producing more efficient results than previous attempts that used network-based Beowulf clusters. Our algorithm optimizes performance by exploiting the multi-core processor architecture, minimizing communication time in contrast to the networked Beowulf-cluster approaches. Our multi-core algorithm is the basis for a new algorithm using GPUs that will drastically increase the number of processors available. Previous studies incorporating realistic structural features of faults into OFC models have revealed spatial and temporal patterns observed in real earthquake systems. The computational advances presented here will allow for studying interacting networks of faults, rather than individual faults, further enhancing our understanding of the relationship between the earth's structure and the triggering process. Support for this project comes from the Chenery Research Fund, the Rashkind Family Endowment, the Walter Williams Craigie Teaching Endowment, and the Schapiro Undergraduate Research Fellowship.
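    For reference, the serial OFC update that such parallelisations start from fits in a few lines: drive the lattice to its next failure, then topple unstable sites, each shedding a fraction α of its stress to its neighbours (α < 0.25 makes the model non-conservative). The lattice size, α, and event count below are arbitrary choices for illustration.

        import numpy as np

        def ofc_avalanche(F, f_th=1.0, alpha=0.2):
            # relax one avalanche on an open-boundary square lattice;
            # returns the avalanche size (number of topplings)
            size = 0
            while True:
                unstable = np.argwhere(F >= f_th)
                if len(unstable) == 0:
                    return size
                for i, j in unstable:
                    f, F[i, j] = F[i, j], 0.0
                    for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                        if 0 <= ni < F.shape[0] and 0 <= nj < F.shape[1]:
                            F[ni, nj] += alpha * f   # open boundaries dissipate
                    size += 1

        rng = np.random.default_rng(2)
        F = rng.random((64, 64))
        sizes = []
        for event in range(1000):
            F += 1.0 - F.max()            # uniform drive to the next failure
            sizes.append(ofc_avalanche(F))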
410. SDA 7: A modular and parallel implementation of the simulation of diffusional association software

    PubMed

    Martinez, Michael; Bruce, Neil J; Romanowska, Julia; Kokh, Daria B; Ozboyaci, Musa; Yu, Xiaofeng; Öztürk, Mehmet Ali; Richter, Stefan; Wade, Rebecca C

    2015-08-01

    The simulation of diffusional association (SDA) Brownian dynamics software package has been widely used in the study of biomacromolecular association. Initially developed to calculate bimolecular protein-protein association rate constants, it has since been extended to study electron transfer rates, to predict the structures of biomacromolecular complexes, to investigate the adsorption of proteins to inorganic surfaces, and to simulate the dynamics of large systems containing many biomacromolecular solutes, allowing the study of concentration-dependent effects. These extensions have led to a number of divergent versions of the software. In this article, we report the development of the latest version of the software (SDA 7). This release was developed to consolidate the existing codes into a single framework, while improving the parallelization of the code to better exploit modern multicore shared-memory computer architectures. It is built using a modular object-oriented programming scheme, to allow for easy maintenance and extension of the software, and includes new features, such as adding flexible solute representations. We discuss a number of application examples, which describe some of the methods available in the release, and provide benchmarking data to demonstrate the parallel performance. PMID:26123630
411. On the efficiency of exchange in parallel tempering Monte Carlo simulations

    PubMed

    Predescu, Cristian; Predescu, Mihaela; Ciobanu, Cristian V

    2005-03-10

    We introduce the concept of effective fraction, defined as the expected probability that a configuration from the lowest index replica successfully reaches the highest index replica during a replica exchange Monte Carlo simulation. We then argue that the effective fraction represents an adequate measure of the quality of the sampling technique, as far as swapping is concerned. Under the hypothesis that the correlation between successive exchanges is negligible, we propose a technique for the computation of the effective fraction, a technique that relies solely on the values of the acceptance probabilities obtained at the end of the simulation. The effective fraction is then utilized for the study of the efficiency of a popular swapping scheme in the context of parallel tempering in the canonical ensemble. For large dimensional oscillators, we show that the swapping probability that minimizes the computational effort is 38.74%. By studying the parallel tempering swapping efficiency for a 13-atom Lennard-Jones cluster, we argue that the value of 38.74% remains roughly the optimal probability for most systems with continuous distributions that are likely to be encountered in practice. PMID:16851481
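    The swap move being tuned has a one-line acceptance rule: neighbouring replicas i and i+1 exchange configurations with probability min(1, exp[(β_i − β_{i+1})(E_i − E_{i+1})]). A minimal sketch follows, with replica labels tracked so one can count how often a configuration from the lowest replica reaches the highest, the event behind the paper's effective fraction; the temperature ladder and toy energy fluctuations are illustrative, not the paper's oscillator systems.

        import numpy as np

        rng = np.random.default_rng(3)

        def attempt_swaps(E, labels, betas, offset):
            # Metropolis exchange between neighbouring temperature slots;
            # alternating offset 0/1 sweeps even then odd pairs
            for i in range(offset, len(betas) - 1, 2):
                delta = (betas[i] - betas[i + 1]) * (E[i] - E[i + 1])
                if delta >= 0.0 or rng.random() < np.exp(delta):
                    E[i], E[i + 1] = E[i + 1], E[i]
                    labels[i], labels[i + 1] = labels[i + 1], labels[i]

        betas = 1.0 / np.geomspace(1.0, 10.0, 16)      # 16 temperatures
        E = rng.normal(loc=100.0 / betas, scale=10.0)  # stand-in energies
        labels = np.arange(16)                         # who started where
        for sweep in range(100):
            # (in a real run, sampling moves within each replica go here)
            E += rng.normal(scale=5.0, size=E.size)    # toy energy fluctuation
            attempt_swaps(E, labels, betas, sweep % 2)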
412. MDSLB: A new static load balancing method for parallel molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Wu, Yun-Long; Xu, Xin-Hai; Yang, Xue-Jun; Zou, Shun; Ren, Xiao-Guang

    2014-02-01

    Large-scale parallelization of molecular dynamics simulations faces challenges which seriously affect simulation efficiency, among which the load imbalance problem is the most critical. In this paper, we propose a new molecular dynamics static load balancing method (MDSLB). By analyzing the characteristics of the short-range force of molecular dynamics programs running in parallel, we divide the short-range force into three kinds of force models, and then package the computations of each force model into many tiny computational units called "cell loads", which provide the basic data structures for our load balancing method. In MDSLB, the spatial region is separated into sub-regions called "local domains", and the cell loads of each local domain are allocated to every processor in turn. Compared with dynamic load balancing methods, MDSLB can guarantee load balance by executing the algorithm only once at program startup, without migrating the loads dynamically. We implemented MDSLB in the OpenFOAM software and tested it on the TianHe-1A supercomputer with 16 to 512 processors. Experimental results show that MDSLB can save 34%-64% of execution time in load-imbalanced cases.

413. A scalable parallel Stokesian Dynamics method for the simulation of colloidal suspensions

    NASA Astrophysics Data System (ADS)

    Bülow, F.; Hamberger, P.; Nirschl, H.; Dörfler, W.

    2016-07-01

    We have developed a new method for the efficient numerical simulation of colloidal suspensions. This method is designed and especially well-suited for parallel code execution, but it can also be applied to single-core programs. It combines the Stokesian Dynamics method with a variant of the widely used Barnes-Hut algorithm in order to reduce computational costs. This combination and the inherent parallelization of the method make simulations of large numbers of particles within days possible. The level of accuracy can be determined by the user and is limited by the truncation of the used multipole expansion. Compared to the original Stokesian Dynamics method, the complexity can be reduced from O(N²) to linear complexity for dilute suspensions of strongly clustered particles, N being the number of particles. In the case of non-clustered particles in a dense suspension, the complexity depends on the particle configuration and is between O(N) and O(P·n_{p,max}²), where P is the number of processes used and n_{p,max} = ⌈N/P⌉.

414. pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation

    PubMed

    Halic, Tansel; Ahn, Woojin; De, Suvranu

    2014-01-01

    This work presents pWeb, a new language and compiler for parallelization of client-side compute-intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled the creation of unprecedented applications on the web. The low performance of the web browser, however, remains the bottleneck for computationally intensive applications, including visualization of complex scenes, real-time physical simulations and image processing, compared to native applications. The proposed language is built upon web workers for multithreaded programming in HTML5. The language provides the fundamental functionalities of parallel programming languages as well as the fork/join parallel model, which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard.
A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions. PMID:24732497

415. Channel simulation for direct-detection optical communication systems

    NASA Technical Reports Server (NTRS)

    Tycz, M.; Fitzmaurice, M. W.

    1974-01-01

    A technique is described for simulating the random modulation imposed by atmospheric scintillation and transmitter pointing jitter on a direct-detection optical communication system. The system is capable of providing signal fading statistics which obey log-normal, beta, Rayleigh, Ricean, or chi-square density functions. Experimental tests of the performance of the channel simulator are presented.

416. Channel simulation for direct detection optical communication systems

    NASA Technical Reports Server (NTRS)

    Tycz, M.; Fitzmaurice, M. W.

    1974-01-01

    A technique is described for simulating the random modulation imposed by atmospheric scintillation and transmitter pointing jitter on a direct detection optical communication system. The system is capable of providing signal fading statistics which obey log normal, beta, Rayleigh, Ricean or chi-squared density functions. Experimental tests of the performance of the Channel Simulator are presented.

417. Accurate, efficient, and scalable parallel simulation of mesoscale electrostatic/magnetostatic problems accelerated by a fast multipole method

    NASA Astrophysics Data System (ADS)

    Jiang, Xikai; Karpeev, Dmitry; Li, Jiyuan; de Pablo, Juan; Hernandez-Ortiz, Juan; Heinonen, Olle

    Boundary integrals arise in many electrostatic and magnetostatic problems. In computational modeling of these problems, although the integral is performed only on the boundary of a domain, its direct evaluation needs O(N²) operations, where N is the number of unknowns on the boundary. The O(N²) scaling impedes wider usage of the boundary integral method in the scientific and engineering communities. We have developed a parallel computational approach that utilizes the Fast Multipole Method to evaluate the boundary integral in O(N) operations. To demonstrate the accuracy, efficiency, and scalability of our approach, we consider two test cases.
In the first case, we solve a boundary value problem for a ferroelectric/ferromagnetic volume in free space using a hybrid finite element-boundary integral method. In the second case, we solve an electrostatic problem involving the polarization of dielectric objects in free space using the boundary element method. The results from the test cases show that our parallel approach can enable highly efficient and accurate simulations of mesoscale electrostatic/magnetostatic problems. Computing resources were provided by Blues, a high-performance cluster operated by the Laboratory Computing Resource Center at Argonne National Laboratory. Work at Argonne was supported by the U.S. DOE, Office of Science, under Contract No. DE-AC02-06CH11357.

418. Parallel simulation of particle transport in an advection field applied to volcanic explosive eruptions

    NASA Astrophysics Data System (ADS)

    Künzli, Pierre; Tsunematsu, Kae; Albuquerque, Paul; Falcone, Jean-Luc; Chopard, Bastien; Bonadonna, Costanza

    2016-04-01

    Volcanic ash transport and dispersal models typically describe particle motion via a turbulent velocity field. Particles are advected inside this field from the moment they leave the vent of the volcano until they deposit on the ground. Several techniques exist to simulate particles in an advection field, such as finite-difference Eulerian, Lagrangian-puff or pure Lagrangian techniques. In this paper, we present a new flexible simulation tool called TETRAS (TEphra TRAnsport Simulator) based on a hybrid Eulerian-Lagrangian model. This scheme offers the advantages of being numerically stable with no numerical diffusion and easily parallelizable. It also allows us to output particle atmospheric concentration or ground mass load at any given time. The model is validated using the advection-diffusion analytical equation. We also obtained a good agreement with field observations of the tephra deposit associated with the 2450 BP Pululagua (Ecuador) and the 1996 Ruapehu (New Zealand) eruptions. As this kind of model can lead to computationally intensive simulations, a parallelization on a distributed memory architecture was developed. A related performance model, taking into account load imbalance, is proposed and its accuracy tested.
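    The Lagrangian half of such a hybrid scheme is straightforward to sketch: advect each particle in the wind field, add a random-walk representation of turbulent diffusion, and let particles settle and deposit. This is a generic illustration, not the TETRAS implementation (which couples the particles to an Eulerian grid for concentration output); the wind field, diffusivity, and settling speed are placeholder values.

        import numpy as np

        rng = np.random.default_rng(4)

        def lagrangian_step(pos, wind, dt, kappa, v_settle):
            # advection by the wind field, random-walk diffusion with
            # std = sqrt(2*kappa*dt), and settling along z
            pos = pos + wind(pos) * dt
            pos = pos + rng.normal(scale=np.sqrt(2.0 * kappa * dt), size=pos.shape)
            pos[:, 2] -= v_settle * dt
            return pos

        wind = lambda p: np.tile([10.0, 0.0, 0.0], (len(p), 1))  # uniform 10 m/s
        pos = np.tile([0.0, 0.0, 5000.0], (10_000, 1))           # release at 5 km
        ground = []
        for step in range(3600):                                 # one hour, dt = 1 s
            pos = lagrangian_step(pos, wind, dt=1.0, kappa=50.0, v_settle=1.0)
            landed = pos[:, 2] <= 0.0
            ground.append(pos[landed])        # accumulates the ground mass load
            pos = pos[~landed]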
419. Efficient massively parallel simulation of dynamic channel assignment schemes for wireless cellular communications

    NASA Technical Reports Server (NTRS)

    Greenberg, Albert G.; Lubachevsky, Boris D.; Nicol, David M.; Wright, Paul E.

    1994-01-01

    Fast, efficient parallel algorithms are presented for discrete event simulations of dynamic channel assignment schemes for wireless cellular communication networks. The driving events are call arrivals and departures, in continuous time, to cells geographically distributed across the service area. A dynamic channel assignment scheme decides which call arrivals to accept, and which channels to allocate to the accepted calls, attempting to minimize call blocking while ensuring co-channel interference is tolerably low. Specifically, the scheme ensures that the same channel is used concurrently at different cells only if the pairwise distances between those cells are sufficiently large. Much of the complexity of the system comes from ensuring this separation. The network is modeled as a system of interacting continuous time automata, each corresponding to a cell. To simulate the model, conservative methods are used; i.e., methods in which no errors occur in the course of the simulation and so no rollback or relaxation is needed. Implemented on a 16K processor MasPar MP-1, an elegant and simple technique provides speedups of about 15 times over an optimized serial simulation running on a high speed workstation. A drawback of this technique, typical of conservative methods, is that processor utilization is rather low. To overcome this, new methods were developed that exploit slackness in event dependencies over short intervals of time, thereby raising the utilization to above 50 percent and the speedup over the optimized serial code to about 120 times.

420. MaMiCo: Software design for parallel molecular-continuum flow simulations

    NASA Astrophysics Data System (ADS)

    Neumann, Philipp; Flohr, Hanno; Arora, Rahul; Jarmatz, Piet; Tchipev, Nikola; Bungartz, Hans-Joachim

    2016-03-01

    The macro-micro-coupling tool (MaMiCo) was developed to ease the development of, and to modularize, molecular-continuum simulations, retaining sequential and parallel performance.
We demonstrate the functionality and performance of MaMiCo by coupling the spatially adaptive Lattice Boltzmann framework waLBerla with four molecular dynamics (MD) codes: the light-weight Lennard-Jones-based implementation SimpleMD, the node-level optimized software ls1 mardyn, and the community codes ESPResSo and LAMMPS. We detail the interface implementations used to connect each solver with MaMiCo. The coupling for each waLBerla-MD setup is validated in three-dimensional channel flow simulations which are solved by means of a state-based coupling method. We provide sequential and strong scaling measurements for the four molecular-continuum simulations. The overhead of MaMiCo is found to come at 10%-20% of the total (MD) runtime. The measurements further show that scalability of the hybrid simulations is reached on up to 500 Intel SandyBridge, and more than 1000 AMD Bulldozer, compute cores.

421. Application of Parallel Hybrid Algorithm in Massively Parallel GPGPU—The Improved Effective and Efficient Method for Calculating Coulombic Interactions in Simulations of Many Ions with SIMION

    NASA Astrophysics Data System (ADS)

    Saito, Kenichiro; Koizumi, Eiko; Koizumi, Hideya

    2012-09-01

    In our previous study, we introduced a new hybrid approach to effectively approximate the total force on each ion during a trajectory calculation in mass spectrometry device simulations, and the algorithm worked successfully with SIMION. We took one step further and applied the method in massively parallel general-purpose computing with GPU (GPGPU) to test its performance in simulations with thousands to over a million ions. We took extra care to minimize the barrier synchronization and data transfer between the host (CPU) and the device (GPU) memory, and took full advantage of latency hiding. Parallel codes were written in CUDA C++ and implemented into SIMION via the user-defined Lua program. In this study, we tested the parallel hybrid algorithm with a couple of basic models and analyzed the performance by comparing it to that of the original, fully explicit method written in serial code. The Coulomb explosion simulation with 128,000 ions was completed in 309 s, over 700 times faster than the 63 h taken by the original explicit method, in which we evaluated two-body Coulomb interactions explicitly on each ion with each of all the other ions. The simulation of 1,024,000 ions was completed in 2650 s. In another example, we applied the hybrid method to a simulation of ions in a simple quadrupole ion storage model with 100,000 ions, and it took less than 10 days. Based on our estimate, the same simulation is expected to take 5-7 years with the explicit method in serial code.
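    The fully explicit reference behind the 63-hour figure is the O(N²) pairwise sum, sketched below in vectorised form as a generic illustration (not the SIMION/CUDA code); the hybrid algorithm's gain comes from approximating the long-range part of exactly this computation. The pairwise arrays also make the N² memory cost plain, so this form is only practical up to ~10⁴ ions.

        import numpy as np

        K = 8.9875517923e9   # Coulomb constant, N*m^2/C^2

        def coulomb_forces(pos, q):
            # direct O(N^2) evaluation:
            # F_i = K * sum_j q_i*q_j*(r_i - r_j)/|r_i - r_j|^3
            d = pos[:, None, :] - pos[None, :, :]        # (N, N, 3) displacements
            r2 = np.einsum("ijk,ijk->ij", d, d)
            np.fill_diagonal(r2, np.inf)                 # exclude self-interaction
            w = q[:, None] * q[None, :] * r2 ** -1.5
            return K * np.einsum("ij,ijk->ik", w, d)     # forces in newtons

        rng = np.random.default_rng(5)
        pos = 1e-3 * rng.random((2000, 3))               # ions in a mm-scale cloud
        q = np.full(2000, 1.602176634e-19)               # singly charged, in C
        F = coulomb_forces(pos, q)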
  422. 6th International Special Session on Current Trends in Numerical Simulation for Parallel Engineering Environments

    SciTech Connect

    Schulz, M; Trinitis, C

    2007-07-09

    In today's world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Remarkable progress in CPU architecture (multi- and many-core, SMT, transactional memory, virtualization support, etc.), system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges for both system architects and software developers. These trends are paralleled by progress in parallel algorithms, simulation techniques, and software integration from multiple disciplines. In its 6th year ParSim continues to build a bridge between computer science and the application disciplines and to help with fostering cooperations between the different fields. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a shorter turn-around time. This offers the unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in parallel computation, serves as an ideal surrounding for ParSim. This combination enables the participants to present and discuss their work within the scope of both the session and the host conference. This year, ten papers with authors in ten countries were submitted to ParSim, and after a quick turn-around, yet thorough review process we decided to accept three of them for publication and presentation during the ParSim session. These three papers show the use of simulation in a range of different application fields including earthquake and turbulence simulation. At the same time, they also address computer science aspects and discuss different parallelization strategies, programming models and environments, as well as scalability. We are confident that this provides an attractive program and that ParSim will yet again be an informal setting for lively discussions and for fostering new
  423. Forced-to-natural convection transition tests in parallel simulated liquid metal reactor fuel assemblies

    SciTech Connect

    Levin, A.E.; Montgomery, B.H.

    1990-01-01

    The Thermal-Hydraulic Out of Reactor Safety (THORS) Program at Oak Ridge National Laboratory (ORNL) had as its objective the testing of simulated, electrically heated liquid metal reactor (LMR) fuel assemblies in an engineering-scale, sodium loop. Between 1971 and 1985, the THORS Program operated 11 simulated fuel bundles in conditions covering a wide range of normal and off-normal conditions. The last test series in the Program, THORS-SHRS Assembly 1, employed two parallel, 19-pin, full-length, simulated fuel assemblies of a design consistent with the large LMR (Large Scale Prototype Breeder -- LSPB) under development at that time. These bundles were installed in the THORS Facility, allowing single- and parallel-bundle testing in thermal-hydraulic conditions up to and including sodium boiling and dryout. As the name SHRS (Shutdown Heat Removal System) implies, a major objective of the program was testing under conditions expected during low-power reactor operation, including low-flow forced convection, natural convection, and forced-to-natural convection transition at various powers. The THORS-SHRS Assembly 1 experimental program was divided up into four phases. Phase 1 included preliminary and shakedown tests, including the collection of baseline steady-state thermal-hydraulic data. Phase 2 comprised natural convection testing. Forced convection testing was conducted in Phase 3. The final phase of testing included forced-to-natural convection transition tests. Phases 1, 2, and 3 have been discussed in previous papers. The fourth phase is described in this paper. 3 refs., 2 figs.

  424. Multiple-Instruction, Multiple-Data Path Computers: Parallel Processing Impact on Flight Simulation Software. Final Report.

    ERIC Educational Resources Information Center

    Lord, Robert E.; And Others

    The purpose of this study was to evaluate the parallel processing impact of multiple-instruction multiple-data path (MIMD) computers on flight simulation software. Basic mathematical functions and arithmetic expressions from typical flight simulation software were selected and run on an MIMD computer to evaluate the improvement in execution time…
Final Report.</span></a></p> <p><a target="_blank" href="http://www.eric.ed.gov/ERICWebPortal/search/extended.jsp?_pageLabel=advanced">ERIC Educational Resources Information Center</a></p> <p>Lord, Robert E.; And Others</p> <p></p> <p>The purpose of this study was to evaluate the <span class="hlt">parallel</span> processing impact of multiple-instruction multiple-data path (MIMD) computers on flight <span class="hlt">simulation</span> software. Basic mathematical functions and arithmetic expressions from typical flight <span class="hlt">simulation</span> software were selected and run on an MIMD computer to evaluate the improvement in execution time…</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://ntrs.nasa.gov/search.jsp?R=19910032684&hterms=elements+hardware&qs=Ntx%3Dmode%2Bmatchall%26Ntk%3DAll%26N%3D0%26No%3D60%26Ntt%3Delements%2Bhardware','NASA-TRS'); return false;" href="http://ntrs.nasa.gov/search.jsp?R=19910032684&hterms=elements+hardware&qs=Ntx%3Dmode%2Bmatchall%26Ntk%3DAll%26N%3D0%26No%3D60%26Ntt%3Delements%2Bhardware"><span id="translatedtitle">A comparison of real-time blade-element and rotor-map helicopter <span class="hlt">simulations</span> using <span class="hlt">parallel</span> processing</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Corliss, Lloyd; Du Val, Ronald W.; Gillman, Herbert, III; Huynh, Loc C.</p> <p>1990-01-01</p> <p>In recent efforts by NASA, the Army, and Advanced Rotorcraft Technology, Inc. (ART), the application of <span class="hlt">parallel</span> processing techniques to real-time <span class="hlt">simulation</span> have been studied. Traditionally, real-time helicopter <span class="hlt">simulations</span> have omitted the modeling of high-frequency phenomena in order to achieve real-time operation on affordable computers. <span class="hlt">Parallel</span> processing technology can now provide the means for significantly improving the fidelity of real-time <span class="hlt">simulation</span>, and one specific area for improvement is the modeling of rotor dynamics. This paper focuses on the results of a piloted <span class="hlt">simulation</span> in which a traditional rotor-map mathematical model was compared with a more sophisticated blade-element mathematical model that had been implemented using <span class="hlt">parallel</span> processing hardware and software technology.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2013AGUFMIN23A1416G&link_type=ABSTRACT','NASAADS'); return false;" href="http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2013AGUFMIN23A1416G&link_type=ABSTRACT"><span id="translatedtitle">Accelerating Dust Storm <span class="hlt">Simulation</span> by Balancing Task Allocation in <span class="hlt">Parallel</span> Computing Environment</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Gui, Z.; Yang, C.; XIA, J.; Huang, Q.; YU, M.</p> <p>2013-12-01</p> <p>Dust storm has serious negative impacts on environment, human health, and assets. The continuing global climate change has increased the frequency and intensity of dust storm in the past decades. 
  427. 8th International Special Session on Current Trends in Numerical Simulation for Parallel Engineering Environments

    SciTech Connect

    Trinitis, C; Bader, M; Schulz, M

    2009-06-09

    In today's world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Significant progress in CPU architecture (multi- and many-core CPUs, SMT, transactional memory, virtualization support, shared caches etc.), system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges for both system architects and software developers. These trends are paralleled by progress in algorithms, simulation techniques, and software integration from multiple disciplines. In its 8th year, ParSim continues to build a bridge between application disciplines and computer science and to help fostering closer cooperations between these fields. Since its successful introduction in 2002, ParSim has established itself as an integral part of the EuroPVM/MPI conference series. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a short turn-around time. We believe that this offers a unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in parallel computation, serves as an ideal surrounding for ParSim. This combination enables participants to present and discuss their work within the scope of both the session and the host conference. This year, five papers from authors in five countries were submitted to ParSim, and we selected three of them. They cover a range of different application fields including mechanical engineering, material science, and structural engineering simulations. We are confident that this resulted in an attractive special session and that this will be an informal setting for lively discussions as well as for fostering new collaborations. Several people contributed to this event. Thanks go to Jack Dongarra, the EuroPVM/MPI general chair, and to Jan Westerholm, Juha
  428. Predictions of hydrodynamic simulations for direct dark matter detection

    NASA Astrophysics Data System (ADS)

    Bozorgnia, Nassim; Calore, Francesca; Schaller, Matthieu; Lovell, Mark; Bertone, Gianfranco; Frenk, Carlos S.; Crain, Robert A.; Navarro, Julio F.; Schaye, Joop; Theuns, Tom

    2016-05-01

    We study the effects of galaxy formation on dark matter direct detection using hydrodynamic simulations obtained from the “Evolution and Assembly of GaLaxies and their Environments” (EAGLE) and APOSTLE projects. We extract the local dark matter density and velocity distribution of the simulated Milky Way analogues, and use them directly to perform an analysis of current direct detection data. The local dark matter density of the Milky Way-like galaxies is 0.41–0.73 GeV/cm³, and a Maxwellian distribution (with best fit peak speed of 223–289 km/s) describes well the local dark matter speed distribution. We find that the consistency between the result of different direct detection experiments cannot be improved by using the dark matter distribution of the simulated haloes.

  429. Massively parallel simulation with DOE's ASCI supercomputers: an overview of the Los Alamos Crestone project

    SciTech Connect

    Weaver, R. P.; Gittings, M. L.

    2004-01-01

    The Los Alamos Crestone Project is part of the Department of Energy's (DOE) Accelerated Strategic Computing Initiative, or ASCI Program. The main goal of this software development project is to investigate the use of continuous adaptive mesh refinement (CAMR) techniques for application to problems of interest to the Laboratory. There are many code development efforts in the Crestone Project, both unclassified and classified codes. In this overview I will discuss the unclassified SAGE and the RAGE codes. The SAGE (SAIC adaptive grid Eulerian) code is a one-, two-, and three-dimensional multimaterial Eulerian massively parallel hydrodynamics code for use in solving a variety of high-deformation flow problems. The RAGE CAMR code is built from the SAGE code by adding various radiation packages, improved setup utilities and graphics packages and is used for problems in which radiation transport of energy is important. The goal of these massively parallel versions of the codes is to run extremely large problems in a reasonable amount of calendar time. Our target is scalable performance to ~10,000 processors on a 1 billion CAMR computational cell problem that requires hundreds of variables per cell, multiple physics packages (e.g. radiation and hydrodynamics), and implicit matrix solves for each cycle. A general description of the RAGE code has been published in [1], [2], [3] and [4]. Currently, the largest simulations we do are three-dimensional, using around 500 million computation cells and running for literally months of calendar time using ~2000 processors. Current ASCI platforms range from several 3-teraOPS supercomputers to one 12-teraOPS machine at Lawrence Livermore National Laboratory, the White machine, and one 20-teraOPS machine installed at Los Alamos, the Q machine. Each machine is a system comprised of many component parts that must perform in unity for the successful run of these simulations. Key features of any massively parallel system
  430. Scalable parallel programming for high performance seismic simulation on petascale heterogeneous supercomputers

    NASA Astrophysics Data System (ADS)

    Zhou, Jun

    The 1994 Northridge earthquake in Los Angeles, California, killed 57 people, injured over 8,700 and caused an estimated $20 billion in damage. Petascale simulations are needed in California and elsewhere to provide society with a better understanding of the rupture and wave dynamics of the largest earthquakes at shaking frequencies required to engineer safe structures. As the heterogeneous supercomputing infrastructures are becoming more common, numerical developments in earthquake system research are particularly challenged by the dependence on the accelerator elements to enable "the Big One" simulations with higher frequency and finer resolution. Reducing time to solution and power consumption are two primary focus areas today for the enabling technology of fault rupture dynamics and seismic wave propagation in realistic 3D models of the crust's heterogeneous structure. This dissertation presents scalable parallel programming techniques for high performance seismic simulation running on petascale heterogeneous supercomputers. A real world earthquake simulation code, AWP-ODC, one of the most advanced earthquake codes to date, was chosen as the base code in this research, and the testbed is based on Titan at Oak Ridge National Laboratory, the world's largest heterogeneous supercomputer. The research work is primarily related to architecture study, computation performance tuning and software system scalability. An earthquake simulation workflow has also been developed to support the efficient production sets of simulations. The highlights of the technical development are an aggressive performance optimization focusing on data locality and a notable data communication model that hides the data communication latency. This development results in the optimal computation efficiency and throughput for the 13-point stencil code on heterogeneous systems, which can be extended to general high-order stencil codes. Started from scratch, the hybrid CPU/GPU version of AWP
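    The communication model that "hides the data communication latency" is conventionally organized as: post non-blocking halo exchanges, update the interior cells that need no remote data, then finish the boundary cells once the halos arrive. A schematic with mpi4py and a 1D row decomposition follows; it illustrates the idea only and is not the AWP-ODC implementation (ranks are assumed to have valid left/right neighbors, e.g. a periodic decomposition).

        # Overlapping halo exchange with interior stencil work (schematic).
        from mpi4py import MPI
        import numpy as np

        def smooth_step(u, comm, left, right):
            """One Jacobi smoothing step on a 2D slab u whose rows are
            distributed across ranks; rows 0 and -1 need remote halo rows."""
            recv_l, recv_r = np.empty_like(u[0]), np.empty_like(u[-1])
            reqs = [comm.Irecv(recv_l, source=left), comm.Irecv(recv_r, source=right),
                    comm.Isend(u[0].copy(), dest=left), comm.Isend(u[-1].copy(), dest=right)]
            new = u.copy()
            new[1:-1] = 0.5 * u[1:-1] + 0.25 * (u[:-2] + u[2:])  # interior: no remote data
            MPI.Request.Waitall(reqs)                            # halo rows have arrived
            new[0]  = 0.5 * u[0]  + 0.25 * (recv_l + u[1])       # finish boundary rows
            new[-1] = 0.5 * u[-1] + 0.25 * (u[-2] + recv_r)
            return new

    The interior update runs while messages are in flight, so on a well-balanced machine the exchange cost largely disappears from the critical path.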
  431. Parallel Algorithms for Monte Carlo Particle Transport Simulation on Exascale Computing Architectures

    NASA Astrophysics Data System (ADS)

    Romano, Paul Kollath

    Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O(√N) whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes---in doing so it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed simulations. The analysis demonstrated that load imbalances in domain decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency. The model predictions were verified with
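    The tally-batching idea above defers all tally-related network traffic to a single end-of-run reduction: each rank accumulates batch statistics locally, and the network is touched once. A hedged mpi4py sketch, with the transport kernel left as a user-supplied function and all names illustrative rather than OpenMC's:

        # Local batch accumulation with one final reduction (sketch).
        from mpi4py import MPI
        import numpy as np

        def run_tallies(n_batches, histories_per_batch, tally_bins, simulate_batch):
            comm = MPI.COMM_WORLD
            local_sum   = np.zeros(tally_bins)   # per-rank running sums over batches
            local_sumsq = np.zeros(tally_bins)   # kept for variance estimation
            for _ in range(n_batches):
                t = simulate_batch(histories_per_batch)  # user-supplied transport kernel
                local_sum += t
                local_sumsq += t * t
            total   = comm.reduce(local_sum,   op=MPI.SUM, root=0)  # the only network step
            totalsq = comm.reduce(local_sumsq, op=MPI.SUM, root=0)
            return total, totalsq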
  432. A package of Linux scripts for the parallelization of Monte Carlo simulations

    NASA Astrophysics Data System (ADS)

    Badal, Andreu; Sempau, Josep

    2006-09-01

    Despite the fact that fast computers are nowadays available at low cost, there are many situations where obtaining a reasonably low statistical uncertainty in a Monte Carlo (MC) simulation involves a prohibitively large amount of time. This limitation can be overcome by having recourse to parallel computing. Most tools designed to facilitate this approach require modification of the source code and the installation of additional software, which may be inconvenient for some users. We present a set of tools, named clonEasy, that implement a parallelization scheme of a MC simulation that is free from these drawbacks. In clonEasy, which is designed to run under Linux, a set of "clone" CPUs is governed by a "master" computer by taking advantage of the capabilities of the Secure Shell (ssh) protocol. Any Linux computer on the Internet that can be ssh-accessed by the user can be used as a clone. A key ingredient for the parallel calculation to be reliable is the availability of an independent string of random numbers for each CPU. Many generators—such as RANLUX, RANECU or the Mersenne Twister—can readily produce these strings by initializing them appropriately and, hence, they are suitable to be used with clonEasy. This work was primarily motivated by the need to find a straightforward way to parallelize PENELOPE, a code for MC simulation of radiation transport that (in its current 2005 version) employs the generator RANECU, which uses a combination of two multiplicative linear congruential generators (MLCGs). Thus, this paper is focused on this class of generators and, in particular, we briefly present an extension of RANECU that increases its period up to ~5×10¹⁸ and we introduce seedsMLCG, a tool that provides the information necessary to initialize disjoint sequences of an MLCG to feed different CPUs. This program, in combination with clonEasy, allows one to run PENELOPE in parallel easily, without requiring specific libraries or significant alterations of the
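    Producing disjoint MLCG sequences for each clone rests on the jump-ahead property of linear congruential generators: seed_{n+k} = a^k · seed_n mod m, which modular exponentiation evaluates cheaply. The sketch below uses the widely published RANECU multipliers and moduli (L'Ecuyer's combined generator); the stride and function names are illustrative and do not reproduce the seedsMLCG interface.

        # Jump-ahead seeding for a combined MLCG with RANECU-style constants.
        M1, A1 = 2147483563, 40014
        M2, A2 = 2147483399, 40692

        def jump(seed, a, m, k):
            """Seed advanced k steps: seed_k = a^k * seed mod m."""
            return (pow(a, k, m) * seed) % m

        def clone_seeds(seed1, seed2, n_clones, stride=10**12):
            """Disjoint starting seeds for each clone, spaced `stride` draws apart."""
            return [(jump(seed1, A1, M1, i * stride),
                     jump(seed2, A2, M2, i * stride)) for i in range(n_clones)]

    Each clone then draws at most `stride` numbers, so its sequence never overlaps a neighbor's, which is the reliability condition the abstract emphasizes.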
  433. Ion Dynamics at a Rippled Quasi-parallel Shock: 2D Hybrid Simulations

    NASA Astrophysics Data System (ADS)

    Hao, Yufei; Lu, Quanming; Gao, Xinliang; Wang, Shui

    2016-05-01

    In this paper, two-dimensional hybrid simulations are performed to investigate ion dynamics at a rippled quasi-parallel shock. The results show that the ripples around the shock front are inherent structures of a quasi-parallel shock, and the re-formation of the shock is not synchronous along the surface of the shock front. By following the trajectories of the upstream ions, we find that these ions behave differently when they interact with the shock front at different positions along the shock surface. The upstream particles are transmitted more easily through the upper part of a ripple, and the corresponding bulk velocity downstream is larger, where a high-speed jet is formed. In the lower part of the ripple, the upstream particles tend to be reflected by the shock. Ions reflected by the shock may suffer multiple-stage acceleration when moving along the shock surface or trapped between the upstream waves and the shock front. Finally, these ions may escape further upstream or move downstream; therefore, superthermal ions can be found both upstream and downstream.

  434. MPI parallelization of Vlasov codes for the simulation of nonlinear laser-plasma interactions

    NASA Astrophysics Data System (ADS)

    Savchenko, V.; Won, K.; Afeyan, B.; Decyk, V.; Albrecht-Marc, M.; Ghizzo, A.; Bertrand, P.

    2003-10-01

    The simulation of optical mixing driven KEEN waves [1] and electron plasma waves [1] in laser-produced plasmas require nonlinear kinetic models and massive parallelization. We use Message Passing Interface (MPI) libraries and Appleseed [2] to solve the Vlasov-Poisson system of equations on an 8 node dual processor MAC G4 cluster. We use the semi-Lagrangian time splitting method [3]. It requires only row-column exchanges in the global data redistribution, minimizing the total number of communications between processors. Recurrent communication patterns for 2D FFTs involves global transposition. In the Vlasov-Maxwell case, we use splitting into two 1D spatial advections and a 2D momentum advection [4]. Discretized momentum advection equations have a double loop structure with the outer index being assigned to different processors. We adhere to a code structure with separate routines for calculations and data management for parallel computations. [1] B. Afeyan et al., IFSA 2003 Conference Proceedings, Monterey, CA [2] V. K. Decyk, Computers in Physics, 7, 418 (1993) [3] Sonnendrucker et al., JCP 149, 201 (1998) [4] Begue et al., JCP 151, 458 (1999)
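    A semi-Lagrangian split step advances the distribution function by tracing characteristics backward over one time step and interpolating at the departure points. Below is a minimal 1D periodic advection kernel of the kind such split schemes are built from; it uses linear interpolation and is an illustration, not the authors' code.

        # Minimal 1D semi-Lagrangian advection step (periodic grid).
        import numpy as np

        def advect(f, velocity, dx, dt):
            """Advance f(x) under constant `velocity`: find the foot of each
            characteristic, x - velocity*dt, and interpolate f there."""
            n = len(f)
            x = np.arange(n) * dx
            foot = (x - velocity * dt) % (n * dx)      # departure points, wrapped
            i = np.floor(foot / dx).astype(int)
            w = foot / dx - i                          # linear interpolation weight
            return (1 - w) * f[i % n] + w * f[(i + 1) % n]

    In the split Vlasov scheme each x-advection row uses its own velocity (and vice versa for the momentum advection), which is what makes the row-column data exchanges mentioned above sufficient.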
  435. Experiences with serial and parallel algorithms for channel routing using simulated annealing

    NASA Technical Reports Server (NTRS)

    Brouwer, Randall Jay

    1988-01-01

    Two algorithms for channel routing using simulated annealing are presented. Simulated annealing is an optimization methodology which allows the solution process to back up out of local minima that may be encountered by inappropriate selections. By properly controlling the annealing process, it is very likely that the optimal solution to an NP-complete problem such as channel routing may be found. The algorithm presented proposes very relaxed restrictions on the types of allowable transformations, including overlapping nets. By freeing that restriction and controlling overlap situations with an appropriate cost function, the algorithm becomes very flexible and can be applied to many extensions of channel routing. The selection of the transformation utilizes a number of heuristics, still retaining the pseudorandom nature of simulated annealing. The algorithm was implemented as a serial program for a workstation, and a parallel program designed for a hypercube computer. The details of the serial implementation are presented, including many of the heuristics used and some of the resulting solutions.

  436. Three-dimensional parallel UNIPIC-3D code for simulations of high-power microwave devices

    NASA Astrophysics Data System (ADS)

    Wang, Jianguo; Chen, Zaigao; Wang, Yue; Zhang, Dianhui; Liu, Chunliang; Li, Yongdong; Wang, Hongguang; Qiao, Hailiang; Fu, Meiyan; Yuan, Yuan

    2010-07-01

    This paper introduces a self-developed, three-dimensional parallel fully electromagnetic particle simulation code UNIPIC-3D. In this code, the electromagnetic fields are updated using the second-order, finite-difference time-domain method, and the particles are moved using the relativistic Newton-Lorentz force equation. The electromagnetic field and particles are coupled through the current term in Maxwell's equations. Two numerical examples are used to verify the algorithms adopted in this code, numerical results agree well with theoretical ones. This code can be used to simulate the high-power microwave (HPM) devices, such as the relativistic backward wave oscillator, coaxial vircator, and magnetically insulated line oscillator, etc. UNIPIC-3D is written in the object-oriented C++ language and can be run on a variety of platforms including WINDOWS, LINUX, and UNIX. Users can use the graphical user's interface to create the complex geometric structures of the simulated HPM devices, which can be automatically meshed by UNIPIC-3D code. This code has a powerful postprocessor which can display the electric field, magnetic field, current, voltage, power, spectrum, momentum of particles, etc. For the sake of comparison, the results computed by using the two-and-a-half-dimensional UNIPIC code are also provided for the same parameters of HPM devices, the numerical results computed from these two codes agree well with each other.
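    Integration of the relativistic Newton-Lorentz equation in particle-in-cell codes is commonly done with a Boris-type rotation: a half electric kick, a magnetic rotation, and a second half kick. The sketch below shows that standard scheme, which is not necessarily the exact integrator used in UNIPIC-3D.

        # Standard relativistic Boris push (illustrative, SI units).
        import numpy as np

        def boris_push(u, E, B, q, m, dt, c=299792458.0):
            """u is gamma*v (momentum per unit mass); E, B are 3-vectors at the
            particle position; returns the updated u after one time step."""
            qmdt2 = q * dt / (2.0 * m)
            u_minus = u + qmdt2 * E                        # first half electric kick
            gamma = np.sqrt(1.0 + np.dot(u_minus, u_minus) / c**2)
            t = qmdt2 * B / gamma                          # magnetic rotation vector
            s = 2.0 * t / (1.0 + np.dot(t, t))
            u_prime = u_minus + np.cross(u_minus, t)
            u_plus = u_minus + np.cross(u_prime, s)        # completed rotation
            return u_plus + qmdt2 * E                      # second half electric kick

    The rotation step conserves |u| exactly in a pure magnetic field, which is why this family of pushers remains the default in electromagnetic PIC codes.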
  437. Parallel Adaptive High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics

    NASA Technical Reports Server (NTRS)

    Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak

    2016-01-01

    This paper presents large-scale MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta, and spatially fifth-order accurate WENO-5Z scheme was used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion computational cells shows excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregular numerical cost associated with blocks containing boundaries. Limits to scaling beyond 32k cores are identified, and targeted code optimizations are discussed.

  438. Mechanisms for the convergence of time-parallelized, parareal turbulent plasma simulations

    SciTech Connect

    Reynolds-Barredo, J.; Newman, David E; Sanchez, R.; Samaddar, D.; Berry, Lee A; Elwasif, Wael R

    2012-01-01

    Parareal is a recent algorithm able to parallelize the time dimension in spite of its sequential nature. It has been applied to several linear and nonlinear problems and, very recently, to a simulation of fully-developed, two-dimensional drift wave turbulence. The mere fact that parareal works in such a turbulent regime is in itself somewhat unexpected, due to the characteristic sensitivity of turbulence to any change in initial conditions. This fundamental property of any turbulent system should render the iterative correction procedure characteristic of the parareal method inoperative, but this seems not to be the case. In addition, the choices that must be made to implement parareal (division of the temporal domain, election of the coarse solver and so on) are currently made using trial-and-error approaches. Here, we identify the mechanisms responsible for the convergence of parareal of these simulations of drift wave turbulence. We also investigate which conditions these mechanisms impose on any successful parareal implementation. The results reported here should be useful to guide future implementations of parareal within the much wider context of fully-developed fluid and plasma turbulent simulations.
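    For reference, the parareal iteration under discussion combines a cheap coarse propagator G, applied sequentially along the time axis, with an accurate fine propagator F, applied in parallel across time slices. A generic sketch (F and G are user-supplied one-slice propagators acting on states that support + and -):

        # Generic parareal iteration (sketch).
        def parareal(u0, F, G, n_slices, n_iters):
            u = [u0]
            for i in range(n_slices):                    # initial coarse sweep
                u.append(G(u[i]))
            for _ in range(n_iters):
                f = [F(u[i]) for i in range(n_slices)]   # parallelizable across slices
                g_old = [G(u[i]) for i in range(n_slices)]
                new = [u0]
                for i in range(n_slices):                # sequential correction sweep
                    new.append(G(new[i]) + f[i] - g_old[i])
                u = new
            return u

    The correction term F(u) - G(u) is exactly what turbulence's sensitivity to initial conditions would be expected to defeat, which is why the convergence mechanisms identified in the record above are of practical interest.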
  439. Large-scale massively parallel atomistic simulations of short pulse laser interaction with metals

    NASA Astrophysics Data System (ADS)

    Wu, Chengping; Zhigilei, Leonid; Computational Materials Group Team

    2014-03-01

    Taking advantage of petascale supercomputing architectures, large-scale massively parallel atomistic simulations (10⁸-10⁹ atoms) are performed to study the microscopic mechanisms of short pulse laser interaction with metals. The results of the simulations reveal a complex picture of highly non-equilibrium processes responsible for material modification and/or ejection. At low laser fluences below the ablation threshold, fast melting and resolidification occur under conditions of extreme heating and cooling rates resulting in surface microstructure modification. At higher laser fluences in the spallation regime, the material is ejected by the relaxation of laser-induced stresses and proceeds through the nucleation, growth and percolation of multiple voids in the sub-surface region of the irradiated target. At a fluence of ~2.5 times the spallation threshold, the top part of the target reaches the conditions for an explosive decomposition into vapor and small droplets, marking the transition to the phase explosion regime of laser ablation. The dynamics of plume formation and the characteristics of the ablation plume are obtained from the simulations and compared with the results of time-resolved plume imaging experiments. Financial support for this work was provided by NSF (DMR-0907247 and CMMI-1301298) and AFOSR (FA9550-10-1-0541). Computational support was provided by the OLCF (MAT048) and XSEDE (TG-DMR110090).
  440. A 3D MPI-Parallel GPU-accelerated framework for simulating ocean wave energy converters

    NASA Astrophysics Data System (ADS)

    Pathak, Ashish; Raessi, Mehdi

    2015-11-01

    We present an MPI-parallel GPU-accelerated computational framework for studying the interaction between ocean waves and wave energy converters (WECs). The computational framework captures the viscous effects, nonlinear fluid-structure interaction (FSI), and breaking of waves around the structure, which cannot be captured in many potential flow solvers commonly used for WEC simulations. The full Navier-Stokes equations are solved using the two-step projection method, which is accelerated by porting the pressure Poisson equation to GPUs. The FSI is captured using the numerically stable fictitious domain method. A novel three-phase interface reconstruction algorithm is used to resolve three phases in a VOF-PLIC context. A consistent mass and momentum transport approach enables simulations at high density ratios. The accuracy of the overall framework is demonstrated via an array of test cases. Numerical simulations of the interaction between ocean waves and WECs are presented. Funding from the National Science Foundation CBET-1236462 grant is gratefully acknowledged.

  441. The role of the electron convection term for the parallel electric field and electron acceleration in MHD simulations

    SciTech Connect

    Matsuda, K.; Terada, N.; Katoh, Y.; Misawa, H.

    2011-08-15

    There has been a great concern about the origin of the parallel electric field in the frame of fluid equations in the auroral acceleration region. This paper proposes a new method to simulate magnetohydrodynamic (MHD) equations that include the electron convection term and shows its efficiency with simulation results in one dimension. We apply a third-order semi-discrete central scheme to investigate the characteristics of the electron convection term including its nonlinearity. At a steady state discontinuity, the sum of the ion and electron convection terms balances with the ion pressure gradient. We find that the electron convection term works like the gradient of the negative pressure and reduces the ion sound speed or amplifies the sound mode when parallel current flows. The electron convection term enables us to describe a situation in which a parallel electric field and parallel electron acceleration coexist, which is impossible for ideal or resistive MHD.
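    For orientation, the electron convection contribution to the parallel electric field can be read off a textbook generalized Ohm's law; a standard form (not necessarily the exact formulation used by the authors) is

        E_\parallel = -\frac{1}{e\,n_e}\nabla_\parallel p_e
                      - \frac{m_e}{e}\left(\frac{\partial u_{e\parallel}}{\partial t}
                      + (\mathbf{u}_e\cdot\nabla)\,u_{e\parallel}\right)

    where the second bracket is the electron convection (inertia) term that ideal and resistive MHD drop, which is why those models cannot sustain a parallel electric field together with parallel electron acceleration.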
  442. Parallel two-level domain decomposition based Jacobi-Davidson algorithms for pyramidal quantum dot simulation

    NASA Astrophysics Data System (ADS)

    Zhao, Tao; Hwang, Feng-Nan; Cai, Xiao-Chuan

    2016-07-01

    We consider a quintic polynomial eigenvalue problem arising from the finite volume discretization of a quantum dot simulation problem. The problem is solved by the Jacobi-Davidson (JD) algorithm. Our focus is on how to achieve the quadratic convergence of JD in a way that is not only efficient but also scalable when the number of processor cores is large. For this purpose, we develop a projected two-level Schwarz preconditioned JD algorithm that exploits multilevel domain decomposition techniques. The pyramidal quantum dot calculation is carefully studied to illustrate the efficiency of the proposed method. Numerical experiments confirm that the proposed method has a good scalability for problems with hundreds of millions of unknowns on a parallel computer with more than 10,000 processor cores.

  443. A three-phase series-parallel resonant converter -- analysis, design, simulation, and experimental results

    SciTech Connect

    Bhat, A.K.S.; Zheng, R.L.

    1996-07-01

    A three-phase dc-to-dc series-parallel resonant converter is proposed and its operating modes for a 180° wide gating pulse scheme are explained. A detailed analysis of the converter using a constant current model and the Fourier series approach is presented. Based on the analysis, design curves are obtained and a design example of a 1-kW converter is given. SPICE simulation results for the designed converter and experimental results for a 500-W converter are presented to verify the performance of the proposed converter for varying load conditions. The converter operates in lagging power factor (PF) mode for the entire load range and requires a narrow variation in switching frequency, to adequately regulate the output power.
  444. Simulation of the Quasi-Monoenergetic Protons Generation by Parallel Laser Pulses Interaction with Foils

    NASA Astrophysics Data System (ADS)

    Wang, Wei-Quan; Yin, Yan; Zou, De-Bin; Yu, Tong-Pu; Yang, Xiao-Hu; Xu, Han; Yu, Ming-Yang; Ma, Yan-Yun; Zhuo, Hong-Bin; Shao, Fu-Qiu

    2014-11-01

    A new scheme of radiation pressure acceleration for generating high-quality protons by using two overlapping-parallel laser pulses is proposed. Particle-in-cell simulation shows that the overlapping of two pulses with identical Gaussian profiles in space and trapezoidal profiles in the time domain can result in a composite light pulse with a spatial profile suitable for stable acceleration of protons to high energies. At ~2.46 × 10²¹ W/cm² intensity of the combination light pulse, a quasi-monoenergetic proton beam with peak energy ~200 MeV/nucleon, energy spread <15%, and divergence angle <4° is obtained, which is appropriate for tumor therapy. The proton beam quality can be controlled by adjusting the incidence points of two laser pulses.

  445. Hybrid parallel strategy for the simulation of fast transient accidental situations at reactor scale

    NASA Astrophysics Data System (ADS)

    Faucher, V.; Galon, P.; Beccantini, A.; Crouzet, F.; Debaud, F.; Gautier, T.

    2014-06-01

    This contribution is dedicated to the latest methodological developments implemented in the fast transient dynamics software EUROPLEXUS (EPX) to simulate the mechanical response of fully coupled fluid-structure systems to accidental situations to be considered at reactor scale, among which the Loss of Coolant Accident, the Core Disruptive Accident and the Hydrogen Explosion. Time integration is explicit and the search for reference solutions within the safety framework prevents any simplification and approximations in the coupled algorithm: for instance, all kinematic constraints are dealt with using Lagrange Multipliers, yielding a complex flow chart when non-permanent constraints such as unilateral contact or immersed fluid-structure boundaries are considered. The parallel acceleration of the solution process is then achieved through a hybrid approach, based on a weighted domain decomposition for distributed memory computing and the use of the KAAPI library for self-balanced shared memory processing inside subdomains.
  446. Non-equilibrium molecular dynamics simulation of nanojet injection with adaptive-spatial decomposition parallel algorithm

    PubMed

    Shin, Hyun-Ho; Yoon, Woong-Sup

    2008-07-01

    An Adaptive-Spatial Decomposition parallel algorithm was developed to increase computation efficiency for molecular dynamics simulations of nano-fluids. Injection of a liquid argon jet with a scale of 17.6 molecular diameters was investigated. A solid annular platinum injector was also solved simultaneously with the liquid injectant by adopting a solid modeling technique which incorporates phantom atoms. The viscous heat was naturally discharged through the solids so the liquid boiling problem was avoided with no separate use of temperature controlling methods. Parametric investigations of injection speed, wall temperature, and injector length were made. A sudden pressure drop at the orifice exit causes flash boiling of the liquid departing the nozzle exit with strong evaporation on the surface of the liquids, while rendering a slender jet. The elevation of the injection speed and the wall temperature causes an activation of the surface evaporation concurrent with reduction in the jet breakup length and the drop size. PMID:19051924

  447. Supernova Emulators: Connecting Massively Parallel SN Ia Radiative Transfer Simulations to Data with Gaussian Processes

    NASA Astrophysics Data System (ADS)

    Goldstein, Daniel; Thomas, Rollin; Kasen, Daniel

    2015-01-01

    Collaboration between the type Ia supernova (SN Ia) modeling and observation communities hinges on our ability to directly connect simulations to data. Here we introduce supernova emulation, a method for facilitating such a connection. Emulation allows us to instantaneously predict the observables (light curves, spectra, spectral time series) generated by arbitrary SN Ia radiative transfer simulations, with estimates of prediction error. Emulators learn the mapping between physically meaningful simulation inputs and the resulting synthetic observables from a training set of simulation input-output pairs. In our emulation framework, we model PCA-decomposed representations of simulated observables as an ensemble of Gaussian Processes. As a proof of concept, we train a bolometric light curve (BLC) emulator on a grid of 400 simulation inputs and BLCs synthesized with the publicly available, gray, time-dependent Monte Carlo expanding atmospheres code, SMOKE. We emulate SMOKE simulations evaluated at a set of 100 out-of-sample input parameters, and achieve excellent agreement between the emulator predictions and the simulated BLCs. In addition to predicting simulation outputs, emulators allow us to infer the regions of simulation input parameter space that correspond to observed SN Ia light curves and spectra. We present a Bayesian framework for solving this inverse problem using Markov Chain Monte Carlo sampling. We fit published bolometric light curves with our emulator and obtain reconstructed masses (nickel mass, total ejecta mass) in agreement with reconstructions from semi-analytic models. We discuss applications of emulation to supernova cosmology and physics, including how emulators can be used to identify and quantify astrophysical sources of systematic error affecting SNe Ia as distance indicators for cosmology.
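    The emulator architecture described above, PCA compression followed by one Gaussian Process per retained component, can be sketched with standard tools. scikit-learn below stands in for the authors' actual implementation, and all names are illustrative.

        # PCA + per-component Gaussian Process emulator (sketch).
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.gaussian_process import GaussianProcessRegressor

        def train_emulator(params, light_curves, n_components=5):
            """params: (N, P) simulation inputs; light_curves: (N, T) outputs."""
            pca = PCA(n_components=n_components).fit(light_curves)
            coeffs = pca.transform(light_curves)           # (N, n_components)
            gps = [GaussianProcessRegressor(normalize_y=True).fit(params, coeffs[:, k])
                   for k in range(n_components)]
            def predict(p):
                z = np.array([gp.predict(np.atleast_2d(p))[0] for gp in gps])
                return pca.inverse_transform(z)            # emulated light curve
            return predict

    Because the trained `predict` is cheap, it can sit inside an MCMC loop for the inverse problem the abstract describes, with the GP predictive variance supplying the quoted prediction-error estimates.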
  448. Acceleration of the matrix multiplication of Radiance three phase daylighting simulations with parallel computing on heterogeneous hardware of personal computer

    SciTech Connect

    Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor S.

    2013-05-23

    Building designers are increasingly relying on complex fenestration systems to reduce energy consumed for lighting and HVAC in low energy buildings. Radiance, a lighting simulation program, has been used to conduct daylighting simulations for complex fenestration systems. Depending on the configurations, the simulation can take hours or even days using a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight simulation by conducting parallel computing on heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in parallel using OpenCL. The speed of the new approach was evaluated using various daylighting simulation cases on a multicore central processing unit and a graphics processing unit. Based on the measurements and analysis of the time usage for the Radiance daylighting simulation, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.
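    The matrix multiplication being accelerated is the core of the Radiance three-phase method, in which sensor illuminance follows from the chained product commonly written i = VTDs (view matrix, BSDF transmission matrix, daylight matrix, sky vector). A NumPy sketch of that product, standing in for the paper's OpenCL kernels; shapes are assumptions for illustration:

        # The chained matrix product of the three-phase method, i = V T D s.
        import numpy as np

        def three_phase(V, T, D, sky_vectors):
            """V: (sensors, window directions); T: BSDF (window dirs, exterior dirs);
            D: (exterior dirs, sky patches); sky_vectors: (sky patches, timesteps).
            Returns illuminance with shape (sensors, timesteps)."""
            return V @ T @ D @ sky_vectors

    Evaluating this for an annual run multiplies the same V, T and D against thousands of sky-vector columns, which is why the operation maps well onto GPU and multicore OpenCL kernels.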
448. Acceleration of the matrix multiplication of Radiance three-phase daylighting simulations with parallel computing on heterogeneous hardware of personal computer

    SciTech Connect

    Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor S.

    2013-05-23

    Building designers are increasingly relying on complex fenestration systems to reduce energy consumed for lighting and HVAC in low energy buildings. Radiance, a lighting simulation program, has been used to conduct daylighting simulations for complex fenestration systems. Depending on the configuration, the simulation can take hours or even days on a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight simulation by conducting parallel computing on the heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in parallel using OpenCL. The speed of the new approach was evaluated using various daylighting simulation cases on a multicore central processing unit and a graphics processing unit. Based on the measurements and analysis of the time usage for the Radiance daylighting simulation, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.
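For context, the three-phase method evaluates a chain of matrices, often written i = V T D s (sky vector mapped through daylight, transmission, and view matrices to sensor illuminance), and the association order matters as much as the parallelization. A toy numpy illustration with made-up dimensions follows; the paper's OpenCL kernels are not reproduced here.

    import numpy as np

    # Illustrative sizes only: V (sensors x 145), T (145 x 145),
    # D (145 x sky patches), s (sky patches x time steps).
    rng = np.random.default_rng(0)
    V = rng.random((5000, 145))
    T = rng.random((145, 145))
    D = rng.random((145, 2305))
    s = rng.random((2305, 8760))

    # Right-to-left association keeps every intermediate small; forming
    # (V @ T @ D) @ s would multiply a 5000 x 2305 matrix by all 8760
    # time steps and cost over an order of magnitude more flops here.
    i = V @ (T @ (D @ s))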
449. Macro-scale phenomena of arterial coupled cells: a massively parallel simulation

    PubMed Central

    Shaikh, Mohsin Ahmed; Wall, David J. N.; David, Tim

    2012-01-01

    Impaired mass transfer characteristics of blood-borne vasoactive species such as adenosine triphosphate in regions such as an arterial bifurcation have been hypothesized as a prospective mechanism in the aetiology of atherosclerotic lesions. Arterial endothelial cells (ECs) and smooth muscle cells (SMCs) respond differentially to altered local haemodynamics and produce coordinated macro-scale responses via intercellular communication. Using a computationally designed arterial segment comprising large populations of mathematically modelled coupled ECs and SMCs, we investigate their response to spatial gradients of blood-borne agonist concentrations and the effect of micro-scale-driven perturbation on the macro-scale. Altering homocellular (between same cell type) and heterocellular (between different cell types) intercellular coupling, we simulated four cases of normal and pathological arterial segments experiencing an identical gradient in the concentration of the agonist. Results show that the heterocellular calcium (Ca2+) coupling between ECs and SMCs is important in eliciting a rapid response when the vessel segment is stimulated by the agonist gradient. In the absence of heterocellular coupling, homocellular Ca2+ coupling between SMCs is necessary for propagation of Ca2+ waves from downstream to upstream cells axially. Desynchronized intracellular Ca2+ oscillations in coupled SMCs are mandatory for this propagation. Upon decoupling the heterocellular membrane potential, the arterial segment loses the inhibitory effect of ECs on the Ca2+ dynamics of the underlying SMCs. The full system comprises hundreds of thousands of coupled nonlinear ordinary differential equations simulated on the massively parallel Blue Gene architecture. The use of massively parallel computational architectures shows the capability of this approach to address macro-scale phenomena driven by elementary micro-scale components of the system. PMID:21920960

450. Massively-parallel FDTD simulations to address mask electromagnetic effects in hyper-NA immersion lithography

    NASA Astrophysics Data System (ADS)

    Tirapu Azpiroz, Jaione; Burr, Geoffrey W.; Rosenbluth, Alan E.; Hibbs, Michael

    2008-03-01

    In the hyper-NA immersion lithography regime, the electromagnetic response of the reticle is known to deviate in a complicated manner from the idealized thin-mask behavior. Already, this is driving certain RET choices, such as the use of polarized illumination and the customization of reticle film stacks. Unfortunately, full 3-D electromagnetic mask simulations are computationally intensive. And while OPC-compatible mask electromagnetic field (EMF) models can offer a reasonable tradeoff between speed and accuracy for full-chip OPC applications, full understanding of these complex physical effects demands higher accuracy. Our paper describes recent advances in leveraging High Performance Computing as a critical step towards lithographic modeling of the full manufacturing process. In this paper, highly accurate full 3-D electromagnetic simulations of very large mask layouts are conducted in parallel with reasonable turnaround time, using a BlueGene/L supercomputer and a Finite-Difference Time-Domain (FDTD) code developed internally within IBM. A 3-D simulation of a large 2-D layout spanning 5μm×5μm at the wafer plane (and thus 20μm×20μm×0.5μm at the mask) requires roughly 12.5GB of memory (grid size of 10nm at the mask, single-precision computation, about 30 bytes per grid point). FDTD is flexible and easily parallelizable, enabling full simulations of such a large layout in approximately an hour using one BlueGene/L "midplane" containing 512 dual-processor nodes with 256MB of memory per processor. Our scaling studies on BlueGene/L demonstrate that simulations up to 100μm×100μm at the mask can be computed in a few hours. Finally, we show that the use of a subcell technique permits accurate simulation of features smaller than the grid discretization, thus improving the tradeoff between computational complexity and simulation accuracy. We demonstrate the correlation of the real and quadrature components that comprise the …
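The FDTD kernel behind such simulations is compact. A 1-D vacuum leapfrog update, purely illustrative and in normalized units (the paper's code is 3-D, parallel, and adds subcell models), looks like:

    import numpy as np

    nx, nt = 400, 600
    c, dx = 1.0, 1.0
    dt = 0.5 * dx / c                      # satisfies the CFL stability limit
    ez = np.zeros(nx)                      # E field on integer grid points
    hy = np.zeros(nx - 1)                  # H field on half-grid points
    for n in range(nt):
        hy += dt / dx * (ez[1:] - ez[:-1])            # H half-step
        ez[1:-1] += dt / dx * (hy[1:] - hy[:-1])      # E full step
        ez[nx // 2] += np.exp(-((n - 30) / 10.0)**2)  # soft Gaussian source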
451. 3D frequency modeling of elastic seismic wave propagation via a structured massively parallel direct Helmholtz solver

    NASA Astrophysics Data System (ADS)

    Wang, S.; De Hoop, M. V.; Xia, J.; Li, X.

    2011-12-01

    We consider the modeling of elastic seismic wave propagation on a rectangular domain via the discretization and solution of the inhomogeneous coupled Helmholtz equation in 3D, by exploiting a parallel multifrontal sparse direct solver equipped with Hierarchically Semi-Separable (HSS) structure to reduce the computational complexity and storage. In particular, we are concerned with solving this equation on a large domain, for a large number of different forcing terms, in the context of seismic problems in general and modeling in particular. We resort to a parsimonious mixed-grid finite difference scheme for discretizing the Helmholtz operator and Perfectly Matched Layer boundaries, resulting in a non-Hermitian matrix. We make use of a nested-dissection-based domain decomposition, and introduce an approximate direct solver by developing a parallel HSS matrix compression, factorization, and solution approach. We cast our massive parallelization in the framework of the multifrontal method. The assembly tree is partitioned into local trees and a global tree. The local trees are eliminated independently in each processor, while the global tree is eliminated through massive communication. The solver for the inhomogeneous equation is a parallel hybrid between multifrontal and HSS structure. The computational complexity associated with the factorization is almost linear in the size of the Helmholtz matrix. Our numerical approach can be compared with the spectral element method in 3D seismic applications.
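The economics driving a direct (factor-once) Helmholtz solver show up even in a 1-D toy: the factorization is paid once, and each new forcing term costs only a pair of triangular solves. A sketch with SciPy's sparse LU; the paper's structured multifrontal/HSS solver is far more scalable, and the sizes and frequency value here are arbitrary.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n, k, omega = 2000, 64, 0.5
    # 1-D Helmholtz operator with Dirichlet ends (toy stand-in for the
    # paper's 3-D mixed-grid discretization with PML boundaries).
    lap = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n))
    A = (lap + omega**2 * sp.eye(n)).tocsc().astype(complex)
    lu = spla.splu(A)                      # factor once
    B = np.random.default_rng(0).random((n, k)) + 0j  # k forcing terms
    X = lu.solve(B)                        # each column: two triangular solves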
452. A parallel 3-D staggered grid pseudospectral time domain method for ground-penetrating radar wave simulation

    NASA Astrophysics Data System (ADS)

    Huang, Qinghua; Li, Zhanhui; Wang, Yanbin

    2010-12-01

    We present a parallel 3-D staggered grid pseudospectral time domain (PSTD) method for simulating ground-penetrating radar (GPR) wave propagation. We adopt the staggered grid method to weaken the global effect in PSTD, and develop a modified fast Fourier transform (FFT) spatial derivative operator to eliminate the wraparound effect caused by the implicit periodic boundary condition of the FFT. With these improvements, we achieve parallel PSTD computation based on an overlap domain decomposition method without any absorbing condition for each subdomain, which significantly reduces the grids required in each overlap subdomain compared with other proposed algorithms. We tested our parallel technique on several numerical models and obtained results consistent with the analytical ones and/or those of the nonparallel PSTD method. These tests show that our parallel PSTD algorithm is effective in simulating 3-D GPR wave propagation: it saves computation time and handles complicated models more flexibly without losing accuracy. The application of our parallel PSTD method in applied geophysics and paleoseismology based on GPR data confirmed the efficiency of the algorithm and its potential in various subdisciplines of solid earth geophysics. This study also provides a useful parallel PSTD approach for simulating other geophysical problems on distributed-memory PC clusters.
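The staggered-grid trick in the PSTD record above can be written in a few lines: the FFT derivative picks up a half-cell phase shift, so the result lives on the staggered grid. A 1-D numpy sketch; the paper's wraparound suppression and 3-D parallel decomposition are not shown.

    import numpy as np

    def staggered_deriv(f, dx):
        """Spectral x-derivative evaluated on a grid shifted by dx/2, the
        staggering that weakens PSTD's global stencil. Periodic boundary
        conditions are still implied by the FFT."""
        n = f.size
        k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)
        return np.fft.ifft(1j * k * np.exp(1j * k * dx / 2.0) * np.fft.fft(f)).real

    x = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
    df = staggered_deriv(np.sin(x), x[1] - x[0])   # approximately cos(x + dx/2)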
453. Feasibility of using the Massively Parallel Processor for large eddy simulations and other Computational Fluid Dynamics applications

    NASA Technical Reports Server (NTRS)

    Bruno, John

    1984-01-01

    The results of an investigation into the feasibility of using the MPP for direct and large eddy simulations of the Navier-Stokes equations are presented. A major part of this study was devoted to the implementation of two of the standard numerical algorithms for CFD. These implementations were not run on the Massively Parallel Processor (MPP), since the machine delivered to NASA Goddard does not have sufficient capacity. Instead, a detailed implementation plan was designed, and from it were derived estimates of the time and space requirements of the algorithms on a suitably configured MPP. In addition, other issues related to the practical implementation of these algorithms on an MPP-like architecture were considered, namely adaptive grid generation, zonal boundary conditions, the table lookup problem, and the software interface. Performance estimates show that the architectural components of the MPP, the Staging Memory and the Array Unit, appear to be well suited to the numerical algorithms of CFD. This, combined with the prospect of building a faster and larger MPP-like machine, holds the promise of achieving the sustained gigaflop rates required for numerical simulations in CFD.

454. Parallel implementation of 3D FFT with volumetric decomposition schemes for efficient molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Jung, Jaewoon; Kobayashi, Chigusa; Imamura, Toshiyuki; Sugita, Yuji

    2016-03-01

    Three-dimensional Fast Fourier Transform (3D FFT) plays an important role in a wide variety of computer simulations and data analyses, including molecular dynamics (MD) simulations. In this study, we develop hybrid (MPI+OpenMP) parallelization schemes of 3D FFT based on two new volumetric decompositions, mainly for the particle mesh Ewald (PME) calculation in MD simulations. In one scheme (1d_Alltoall), five all-to-all communications in one dimension are carried out; in the other (2d_Alltoall), one two-dimensional all-to-all communication is combined with two all-to-all communications in one dimension. 2d_Alltoall is similar to the conventional volumetric decomposition scheme. We performed benchmark tests of 3D FFT for systems with different grid sizes using a large number of processors on the K computer in RIKEN AICS. The two schemes show comparable performance, and both are better than existing 3D FFTs. The performance of 1d_Alltoall and 2d_Alltoall depends on the supercomputer network system and the number of processors in each dimension, leaving users enough leeway to optimize performance for their conditions. In the PME method, short-range real-space interactions as well as long-range reciprocal-space interactions are calculated. Our volumetric decomposition schemes are particularly useful when used in conjunction with the recently developed midpoint cell method for short-range interactions, due to the identical decompositions of real and reciprocal space. The 1d_Alltoall scheme of 3D FFT takes 4.7 ms to simulate one MD cycle for a virus system containing more than 1 million atoms using 32,768 cores on the K computer.
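The volumetric/pencil decomposition described in the 3D FFT record reduces, on a single process, to "batch of 1-D FFTs, transpose, repeat"; in the MPI versions, each transpose is where an all-to-all redistribution takes place (the 1d_Alltoall/2d_Alltoall variants differ in how that communication is grouped). A serial analogue:

    import numpy as np

    def fft3d_pencil(a):
        """3-D FFT as three batches of 1-D FFTs with transposes between
        them, the single-process analogue of a pencil decomposition.
        In an MPI code each np.transpose below becomes an all-to-all."""
        a = np.fft.fft(a, axis=2)          # local FFTs along z
        a = np.transpose(a, (2, 0, 1))     # <- all-to-all in MPI
        a = np.fft.fft(a, axis=2)          # FFTs along y (now last axis)
        a = np.transpose(a, (2, 0, 1))     # <- all-to-all in MPI
        a = np.fft.fft(a, axis=2)          # FFTs along x
        return np.transpose(a, (2, 0, 1))  # back to (x, y, z) axis order

    grid = np.random.default_rng(0).random((32, 32, 32))
    assert np.allclose(fft3d_pencil(grid), np.fft.fftn(grid))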
455. Parallel kinetic Monte Carlo simulation framework incorporating accurate models of adsorbate lateral interactions

    SciTech Connect

    Nielsen, Jens; D’Avezac, Mayeul; Hetherington, James; Stamatakis, Michail

    2013-12-14

    Ab initio kinetic Monte Carlo (KMC) simulations have been successfully applied for over two decades to elucidate the underlying physico-chemical phenomena on the surfaces of heterogeneous catalysts. These simulations necessitate detailed knowledge of the kinetics of elementary reactions constituting the reaction mechanism, and the energetics of the species participating in the chemistry. The information about the energetics is encoded in the formation energies of gas and surface-bound species, and the lateral interactions between adsorbates on the catalytic surface, which can be modeled at different levels of detail. The majority of previous works accounted for only pairwise-additive first-nearest-neighbor interactions. More recently, cluster-expansion Hamiltonians incorporating long-range interactions and many-body terms have been used for detailed estimations of catalytic rate [C. Wu, D. J. Schmidt, C. Wolverton, and W. F. Schneider, J. Catal. 286, 88 (2012)]. In view of the increasing interest in accurate predictions of catalytic performance, there is a need for general-purpose KMC approaches incorporating detailed cluster expansion models for the adlayer energetics. We have addressed this need by building on the previously introduced graph-theoretical KMC framework, and we have developed Zacros, a FORTRAN2003 KMC package for simulating catalytic chemistries. To tackle the high computational cost in the presence of long-range interactions we introduce parallelization with OpenMP. We further benchmark our framework by simulating a KMC analogue of the NO oxidation system established by Schneider and co-workers [J. Catal. 286, 88 (2012)]. We show that taking into account only first-nearest-neighbor interactions may lead to large errors in the prediction of the catalytic rate, whereas for accurate estimates thereof, one needs to include long-range terms in the cluster expansion.
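A minimal flavor of lateral-interaction KMC: each possible event gets a rate that depends on its local environment, one event is drawn in proportion to its rate, and the clock advances by an exponential deviate. The sketch below uses a single event type and only first-nearest-neighbor repulsion (exactly the approximation the record warns is often insufficient); all parameters are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)
    L = 32
    occ = rng.random((L, L)) < 0.3          # adsorbate occupancy on a lattice
    beta_eps = 0.5                          # NN repulsion / kT (illustrative)

    def desorption_rates(occ):
        """Per-site desorption rate grows with the repulsive neighbor
        count, the kind of environment dependence a cluster expansion
        generalizes to long-range and many-body terms."""
        nn = (np.roll(occ, 1, 0) + np.roll(occ, -1, 0) +
              np.roll(occ, 1, 1) + np.roll(occ, -1, 1))
        return np.where(occ, np.exp(beta_eps * nn), 0.0)

    r = desorption_rates(occ).ravel()
    total = r.sum()
    event = np.searchsorted(np.cumsum(r), rng.random() * total)  # pick event
    occ.ravel()[event] = False                                   # execute it
    dt = -np.log(rng.random()) / total                           # advance clock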
456. Direct Numerical Simulations of the Flow around a Golf Ball: Effect of Rotation

    NASA Astrophysics Data System (ADS)

    Smith, Clinton; Beratlis, Nikolaos; Squires, Kyle; Balaras, Elias; Tsunoda, Masaya

    2008-11-01

    Golf ball flight is affected by rotation of the ball (lift generation) and dimpling on the surface (drag reduction). Direct Numerical Simulation (DNS) is being developed for the flow around a rotating golf ball using an immersed boundary method. Adding to the computational cost, the moving body must be re-located as the ball rotates. In the present effort, interface tracking of the moving body is optimized using the Approximate Nearest Neighbor (ANN) approach. The code is parallelized using domain decomposition and the Message Passing Interface (MPI), and parallel performance results are presented for a range of grid sizes. Results are presented from a series of validation cases for flow over a smooth sphere and a golf ball.

457. Direct numerical simulation of compressible free shear flows

    NASA Technical Reports Server (NTRS)

    Lele, Sanjiva K.

    1989-01-01

    Direct numerical simulations of compressible free shear layers in open domains are conducted. Compact finite-difference schemes of spectral-like accuracy are used for the simulations. Both temporally growing and spatially growing mixing layers are studied. The effect of intrinsic compressibility on the evolution of vortices is studied, and the use of the convective Mach number is validated. Details of vortex roll-up and pairing are studied. Acoustic radiation from vortex roll-up, pairing, and shape oscillations is studied and quantified.

458. Direct simulation of a turbulent oscillating boundary layer

    NASA Technical Reports Server (NTRS)

    Spalart, Philippe R.; Baldwin, Barrett S.

    1987-01-01

    The turbulent boundary layer driven by a freestream velocity that varies sinusoidally in time around a zero mean is considered. The flow has a rich behavior including strong pressure gradients, inflection points, and reversal. A theory for the velocity and stress profiles at high Reynolds number is formulated. Well-resolved direct Navier-Stokes simulations are conducted over a narrow range of Reynolds numbers, and the results are compared with the theoretical predictions. The flow is also computed over a wide range of Reynolds numbers using a new algebraic turbulence model; the results are compared with the direct simulations and the theory.

459. Simulating a Direction-Finder Search for an ELT

    NASA Technical Reports Server (NTRS)

    Bream, Bruce

    2005-01-01

    A computer program simulates the operation of direction-finding equipment engaged in a search for an emergency locator transmitter (ELT) aboard an aircraft that has crashed. The simulated equipment is patterned after the equipment used by the Civil Air Patrol to search for missing aircraft. The program is designed to be used for training in radio direction-finding and/or searching for missing aircraft without incurring the expense and risk of using real aircraft and ground search resources. The program places a hidden ELT on a map and enables the user to search for the location of the ELT by moving a small aircraft image around the map while observing signal-strength and direction readings on a simulated direction-finding locator instrument. As the simulated aircraft is turned and moved on the map, the program updates the readings on the direction-finding instrument to reflect the current position and heading of the aircraft relative to the location of the ELT. The software is distributed in a zip file that contains an installation program. The software runs on the Microsoft Windows 9x, NT, and XP operating systems.
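The geometry behind such a trainer fits in a few lines: the displayed needle is the bearing to the transmitter relative to the aircraft heading, and signal strength falls off with distance. The model below is illustrative only; the program's actual propagation and instrument models are not described in the record.

    import math

    def df_reading(ac_x, ac_y, ac_heading_deg, elt_x, elt_y):
        """Relative bearing and an inverse-square signal strength from
        aircraft to a hidden ELT (hypothetical model, toy units)."""
        dx, dy = elt_x - ac_x, elt_y - ac_y
        r = math.hypot(dx, dy)
        true_bearing = math.degrees(math.atan2(dx, dy)) % 360.0  # from north
        relative = (true_bearing - ac_heading_deg) % 360.0
        strength = 1.0 / max(r * r, 1e-9)
        return relative, strength

    rel, sig = df_reading(10.0, 5.0, 90.0, 42.0, 17.0)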
460. Simulation of electrostatic ion instabilities in the presence of parallel currents and transverse electric fields

    NASA Technical Reports Server (NTRS)

    Nishikawa, K.-I.; Ganguli, G.; Lee, Y. C.; Palmadesso, P. J.

    1989-01-01

    A spatially two-dimensional electrostatic PIC simulation code was used to study the stability of a plasma equilibrium characterized by a localized transverse dc electric field and a field-aligned drift, for L much less than Lx, where Lx is the simulation length in the x direction and L is the scale length associated with the dc electric field. It is found that the dc electric field and the field-aligned current can together play a synergistic role to enable the excitation of electrostatic waves even when the threshold values of the field-aligned drift and the E x B drift are individually subcritical. The simulation results show that the growing ion waves are associated with small vortices in the linear stage, which evolve to the nonlinear stage dominated by larger vortices with lower frequencies.
461. Progress on H5Part: A Portable High Performance Parallel Data Interface for Electromagnetics Simulations

    SciTech Connect

    Adelmann, Andreas; Gsell, Achim; Oswald, Benedikt; Schietinger, Thomas; Bethel, Wes; Shalf, John; Siegerist, Cristina; Stockinger, Kurt

    2007-06-22

    Significant problems facing all experimental and computational sciences arise from growing data size and complexity. Common to all these problems is the need to perform efficient data I/O on diverse computer architectures. In our scientific application, the largest parallel particle simulations generate vast quantities of six-dimensional data. Such a simulation run produces data for an aggregate data size up to several TB per run. Motivated by the need to address data I/O and access challenges, we have implemented H5Part, an open source data I/O API that simplifies the use of the Hierarchical Data Format v5 library (HDF5). HDF5 is an industry standard for high performance, cross-platform data storage and retrieval that runs on all contemporary architectures, from large parallel supercomputers to laptops. H5Part, which is oriented to the needs of the particle physics and cosmology communities, provides support for parallel storage and retrieval of particles, structured and, in the future, unstructured meshes. In this paper, we describe recent work focusing on I/O support for particles and structured meshes, and provide data showing performance on modern supercomputer architectures like the IBM POWER 5.
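The kind of layout H5Part prescribes is simple enough to sketch with plain h5py (the H5Part C API itself is not reproduced here, and the file and dataset names below are illustrative): one group per time step, one dataset per particle property, attributes for step metadata. In the parallel case, each rank would write its own slab of the same datasets.

    import numpy as np
    import h5py

    x = np.random.default_rng(0).random(100000)   # particle positions
    px = np.random.default_rng(1).random(100000)  # particle momenta

    with h5py.File('particles.h5', 'w') as f:
        step = f.create_group('Step#0')           # one group per time step
        step.create_dataset('x', data=x)          # one dataset per property
        step.create_dataset('px', data=px)
        step.attrs['time'] = 0.0                  # step metadata as attributes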
462. Radiation hydrodynamics using characteristics on adaptive decomposed domains for massively parallel star formation simulations

    NASA Astrophysics Data System (ADS)

    Buntemeyer, Lars; Banerjee, Robi; Peters, Thomas; Klassen, Mikhail; Pudritz, Ralph E.

    2016-02-01

    We present an algorithm for solving the radiative transfer problem on massively parallel computers using adaptive mesh refinement and domain decomposition. The solver is based on the method of characteristics, which requires an adaptive raytracer that integrates the equation of radiative transfer. The radiation field is split into local and global components which are handled separately to overcome the non-locality problem. The solver is implemented in the framework of the magnetohydrodynamics code FLASH and is coupled by an operator splitting step. The goal is the study of radiation in the context of star formation simulations, with a focus on early disc formation and evolution. This requires a proper treatment of radiation physics that covers both the optically thin and optically thick regimes, and the transition region in particular. We successfully show the accuracy and feasibility of our method in a series of standard radiative transfer problems and two 3D collapse simulations resembling the early stages of protostar and disc formation.

463. Monte Carlo Simulations of Nonlinear Particle Acceleration in Parallel Trans-relativistic Shocks

    NASA Astrophysics Data System (ADS)

    Ellison, Donald C.; Warren, Donald C.; Bykov, Andrei M.

    2013-10-01

    We present results from a Monte Carlo simulation of a parallel collisionless shock undergoing particle acceleration. Our simulation, which contains parameterized scattering and a particular thermal leakage injection model, calculates the feedback between accelerated particles ahead of the shock, which influence the shock precursor and "smooth" the shock, and thermal particle injection. We show that there is a transition between nonrelativistic shocks, where the acceleration efficiency can be extremely high and the nonlinear compression ratio can be substantially greater than the Rankine-Hugoniot value, and fully relativistic shocks, where diffusive shock acceleration is less efficient and the compression ratio remains at the Rankine-Hugoniot value. This transition occurs in the trans-relativistic regime and, for the particular parameters we use, occurs around a shock Lorentz factor γ0 = 1.5. We also find that nonlinear shock smoothing dramatically reduces the acceleration efficiency presumed to occur with large-angle scattering in ultra-relativistic shocks. Our ability to seamlessly treat the transition from ultra-relativistic to trans-relativistic to nonrelativistic shocks may be important for evolving relativistic systems, such as gamma-ray bursts and Type Ibc supernovae. We expect a substantial evolution of shock-accelerated spectra during this transition, from soft early on to much harder when the blast-wave shock becomes nonrelativistic.
464. Giant Impacts During Planet Formation: Parallel Tree Code Simulations Using Smooth Particle Hydrodynamics

    NASA Astrophysics Data System (ADS)

    Cohen, R.; Bodenheimer, P.; Asphaug, E.

    2000-12-01

    There is both theoretical and observational evidence that giant planets collided with objects of mass >= M_Earth during their evolution. These impacts may help shorten planetary formation timescales by changing the opacity of the planetary atmosphere to allow quicker cooling. They may also redistribute heavy metals within giant planets, affect the core/envelope mass ratio, and help determine the ratio of emitted to absorbed energy within giant planets. Thus, the researchers propose to simulate the impact of a ~Earth-mass object onto a proto-giant-planet with SPH. Results of the SPH collision models will be input into a steady-state planetary evolution code, and the effect of impacts on formation timescales, core/envelope mass ratios, density profiles, and thermal emissions of giant planets will be quantified. The collision will be modelled using a modified version of an SPH routine which simulates the collision of two polytropes. The Saumon-Chabrier and Tillotson equations of state will replace the polytropic equation of state. The parallel tree algorithm of Olson & Packer will be used for the domain decomposition and neighbor search necessary to calculate pressure and self-gravity efficiently. This work is funded by the NASA Graduate Student Researchers Program.

465. A study of Gd-based parallel plate avalanche counter for thermal neutrons by MC simulation

    NASA Astrophysics Data System (ADS)

    Rhee, J. T.; Kim, H. G.; Ahmad, Farzana; Jeon, Y. J.; Jamil, M.

    2013-12-01

    In this work, we demonstrate the feasibility and characteristics of a single-gap parallel plate avalanche counter (PPAC) as a low-energy neutron detector, based on a Gd-converter coating. Upon falling on the Gd-converter surface, the incident low-energy neutrons produce internal conversion electrons, which are evaluated and detected. For estimating the performance of the Gd-based PPAC, a simulation study has been performed using the GEANT4 Monte Carlo (MC) code. The detector response as a function of incident neutron energy in the range of 25-100 meV has been evaluated with two different physics lists. Using the QGSP_BIC_HP physics list and assuming a 5 μm converter thickness, detection efficiencies of 11.8%, 18.48%, and 30.28% were achieved for the forward, backward, and total response of the converter-based PPAC, respectively. With the same converter thickness and detector configuration, the QGSP_BERT_HP physics list yielded efficiencies of 12.19%, 18.62%, and 30.81%, respectively. These simulation results are briefly discussed.
466. A parallel overset-curvilinear-immersed boundary framework for simulating complex 3D incompressible flows

    PubMed Central

    Borazjani, Iman; Ge, Liang; Le, Trung; Sotiropoulos, Fotis

    2013-01-01

    We develop an overset-curvilinear immersed boundary (overset-CURVIB) method in a general non-inertial frame of reference to simulate a wide range of challenging biological flow problems. The method incorporates overset-curvilinear grids to efficiently handle multi-connected geometries and increase the resolution locally near immersed boundaries. Complex bodies undergoing arbitrarily large deformations may be embedded within the overset-curvilinear background grid and treated as sharp interfaces using the curvilinear immersed boundary (CURVIB) method (Ge and Sotiropoulos, Journal of Computational Physics, 2007). The incompressible flow equations are formulated in a general non-inertial frame of reference to enhance the overall versatility and efficiency of the numerical approach. Efficient search algorithms to identify areas requiring blanking, donor cells, and interpolation coefficients for constructing the boundary conditions at grid interfaces of the overset grid are developed and implemented using efficient parallel computing communication strategies to transfer information among sub-domains. The governing equations are discretized using a second-order accurate finite-volume approach and integrated in time via an efficient fractional-step method. Various strategies for ensuring globally conservative interpolation at grid interfaces suitable for incompressible flow fractional-step methods are implemented and evaluated. The method is verified and validated against experimental data, and its capabilities are demonstrated by simulating the flow past multiple aquatic swimmers and the systolic flow in an anatomic left ventricle with a mechanical heart valve implanted in the aortic position. PMID:23833331
467. Simulated Wake Characteristics Data for Closely Spaced Parallel Runway Operations Analysis

    NASA Technical Reports Server (NTRS)

    Guerreiro, Nelson M.; Neitzke, Kurt W.

    2012-01-01

    A simulation experiment was performed to generate and compile wake characteristics data relevant to the evaluation and feasibility analysis of closely spaced parallel runway (CSPR) operational concepts. While the experiment in this work is not tailored to any particular operational concept, the generated data apply to the broader class of CSPR concepts, where a trailing aircraft on a CSPR approach is required to stay ahead of the wake vortices generated by a lead aircraft on an adjacent CSPR. Data for wake age, circulation strength, and wake altitude change, at various lateral offset distances from the wake-generating lead aircraft approach path, were compiled for a set of nine aircraft spanning the full range of FAA and ICAO wake classifications. A total of 54 scenarios were simulated to generate data related to key parameters that determine wake behavior. Of particular interest are wake age characteristics that can be used to evaluate both time- and distance-based in-trail separation concepts for all aircraft wake-class combinations. A simple first-order difference model was developed to enable the computation of wake parameter estimates for aircraft models having weight, wingspan, and speed characteristics similar to those of the nine aircraft modeled in this work.
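The record mentions a simple first-order difference model for estimating wake parameters of similar aircraft. The coefficients are not given in the abstract, so the sketch below shows only the generic form such a model takes; the decay constant and initial circulation are placeholders, not the paper's values.

    def wake_circulation(gamma0, k, dt, nsteps):
        """First-order-difference decay of wake circulation,
        gamma(t+dt) = gamma(t) - k * gamma(t) * dt. Purely illustrative;
        gamma0 and k here are made-up numbers."""
        gamma = gamma0
        history = [gamma]
        for _ in range(nsteps):
            gamma += -k * gamma * dt
            history.append(gamma)
        return history

    hist = wake_circulation(gamma0=400.0, k=0.02, dt=1.0, nsteps=120)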
468. Monte Carlo Simulations of Nonlinear Particle Acceleration in Parallel Trans-relativistic Shocks

    SciTech Connect

    Ellison, Donald C.; Warren, Donald C.; Bykov, Andrei M.

    2013-10-10

    We present results from a Monte Carlo simulation of a parallel collisionless shock undergoing particle acceleration. Our simulation, which contains parameterized scattering and a particular thermal leakage injection model, calculates the feedback between accelerated particles ahead of the shock, which influence the shock precursor and "smooth" the shock, and thermal particle injection. We show that there is a transition between nonrelativistic shocks, where the acceleration efficiency can be extremely high and the nonlinear compression ratio can be substantially greater than the Rankine-Hugoniot value, and fully relativistic shocks, where diffusive shock acceleration is less efficient and the compression ratio remains at the Rankine-Hugoniot value. This transition occurs in the trans-relativistic regime and, for the particular parameters we use, occurs around a shock Lorentz factor γ0 = 1.5. We also find that nonlinear shock smoothing dramatically reduces the acceleration efficiency presumed to occur with large-angle scattering in ultra-relativistic shocks. Our ability to seamlessly treat the transition from ultra-relativistic to trans-relativistic to nonrelativistic shocks may be important for evolving relativistic systems, such as gamma-ray bursts and Type Ibc supernovae. We expect a substantial evolution of shock-accelerated spectra during this transition, from soft early on to much harder when the blast-wave shock becomes nonrelativistic.

469. Direct spectral/hp element simulation of piloted jet non-premixed flames

    NASA Astrophysics Data System (ADS)

    Nastase, Cristian R.

    2004-11-01

    The spectral/hp element method is used for direct numerical simulation (DNS) of piloted non-premixed methane jet flames. This method combines the accuracy of spectral methods with the versatility of finite element methods, and allows accurate simulations of complex flows on structured and unstructured grids. Here, the methodology is extended to the simulation of multi-species, reactive flows using the discontinuous Galerkin formulation. Parallel computation is performed via MPI standards coupled with a domain decomposition methodology. The overall computational scheme allows for an efficient partitioning of the flow configuration. Tests performed with up to 64 processors show quasi-linear parallel performance and scalability. The flame configurations are similar to the piloted jet non-premixed flame considered at the Combustion Research Facility at the Sandia National Laboratories. For a momentum-dominated flame, the simulated results portray many of the features observed experimentally. This pertains to both the spatial and the compositional structures of the flow. For a buoyancy-controlled flame (at elevated gravity levels), the results indicate an increase in both the turbulence levels and flow acceleration. Departure from equilibrium, including localized extinction, is observed over a significant portion of this flame.

470. Parallel contact detection algorithm for transient solid dynamics simulations using PRONTO3D

    SciTech Connect

    Attaway, S.W.; Hendrickson, B.A.; Plimpton, S.J.

    1996-09-01

    An efficient, scalable, parallel algorithm for treating material surface contacts in solid mechanics finite element programs has been implemented in a modular way for MIMD parallel computers. The serial contact detection algorithm that was developed previously for the transient dynamics finite element code PRONTO3D has been extended for use in parallel computation by devising a dynamic (adaptive) processor load balancing scheme.
471. Automated integration of genomic physical mapping data via parallel simulated annealing

    SciTech Connect

    Slezak, T.

    1994-06-01

    The Human Genome Center at the Lawrence Livermore National Laboratory (LLNL) is nearing closure on a high-resolution physical map of human chromosome 19. We have built automated tools to assemble 15,000 fingerprinted cosmid clones into 800 contigs with minimal spanning paths identified. These islands are being ordered, oriented, and spanned by a variety of other techniques, including Fluorescence In Situ Hybridization (FISH) at 3 levels of resolution, EcoRI restriction fragment mapping across all contigs, and a multitude of different hybridization and PCR techniques to link cosmid, YAC, BAC, PAC, and P1 clones. The FISH data provide us with partial order and distance data as well as orientation. We made the observation that map builders need a much rougher presentation of data than do map readers; the former wish to see raw data, since these can expose errors or interesting biology. We further noted that by ignoring our length and distance data we could simplify our problem into one that could be readily attacked with optimization techniques. The data integration problem could then be seen as an M x N ordering of our N cosmid clones which "intersect" M larger objects, by defining "intersection" to mean either contig/map membership or hybridization results. Clearly, the goal of making an integrated map is now to rearrange the N cosmid clone "columns" such that the number of gaps on the object "rows" is minimized. Our FISH partially-ordered cosmid clones provide us with a set of constraints that cannot be violated by the rearrangement process. We solved the optimization problem via simulated annealing performed on a network of 40+ Unix machines in parallel, using a server/client model built on explicit socket calls. For current maps we can create a map in about 4 hours on the parallel net versus 4+ days on a single workstation. Our biologists are now using this software on a daily basis to guide their efforts toward final closure.
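The optimization in the preceding record (rearranging clone "columns" to minimize gaps across object "rows") maps naturally onto simulated annealing with swap moves. A serial sketch follows; the paper's version distributes moves across 40+ machines and additionally enforces the FISH partial-order constraints, both omitted here, and all names are illustrative.

    import math
    import random

    def gaps(order, rows):
        """Total gap count: for each row (the clone set an object touches),
        count breaks between consecutive member positions."""
        pos = {c: i for i, c in enumerate(order)}
        total = 0
        for row in rows:
            idx = sorted(pos[c] for c in row)
            total += sum(1 for a, b in zip(idx, idx[1:]) if b - a > 1)
        return total

    def anneal(clones, rows, t0=5.0, cooling=0.9995, steps=20000):
        order = clones[:]
        cost, t = gaps(order, rows), t0
        for _ in range(steps):
            i, j = random.sample(range(len(order)), 2)
            order[i], order[j] = order[j], order[i]           # swap move
            new = gaps(order, rows)
            if new <= cost or random.random() < math.exp(-(new - cost) / t):
                cost = new                                    # accept
            else:
                order[i], order[j] = order[j], order[i]       # reject: undo
            t *= cooling
        return order, cost

    # Toy usage: 8 clones, 3 intersecting objects.
    clones = list('ABCDEFGH')
    rows = [set('ABC'), set('CDE'), set('FGH')]
    best_order, best_cost = anneal(clones, rows)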
472. Numerical investigation of parallel airfoil-vortex interaction using large eddy simulation

    NASA Astrophysics Data System (ADS)

    Felten, Frederic N.

    Helicopter Blade-Vortex Interaction (BVI) occurs under certain conditions of powered descent or during extreme maneuvering. The vibration and acoustic problems associated with the interaction of rotor tip vortices and the following blades are major aerodynamic concerns for the helicopter community. Researchers have performed numerous experimental and computational studies over the last two decades in order to gain a better understanding of the physical mechanisms involved in BVI. The most severe interaction, in terms of generated noise, happens when the vortex filament is parallel to the blade, thus affecting a great portion of it. The majority of the previous numerical studies of parallel BVI fall within a potential flow framework, therefore excluding all viscous phenomena. Some Navier-Stokes approaches using dissipative numerical methods in conjunction with RANS-type turbulence models have also been attempted, but with limited success. In this work, the situation is improved by increasing the fidelity of both the numerical method and the turbulence model. A kinetic-energy-conserving finite-volume scheme using a collocated-mesh arrangement, specially designed for simulation of turbulence in complex geometries, was implemented. For the turbulence model, a cost-effective zonal hybrid RANS/LES technique is used. A RANS zone covers the boundary layers on the airfoil and the wake region behind, while the remainder of the flow field, including the region occupied by the vortex, makes up the dynamic LES zone. The concentrated tip vortex is not attenuated as it is convected downstream and over a NACA 0012 airfoil. The lift, drag, moment, and friction coefficients induced by the passage of the vortex are monitored in time and compared with experimental data.
473. Three-Dimensional Parallel Lattice Boltzmann Hydrodynamic Simulations of Turbulent Flows in Interstellar Dark Clouds

    NASA Astrophysics Data System (ADS)

    Muders, Dirk

    1995-08-01

    Exploring the clumpy and filamentary structure of interstellar molecular clouds is one of the key problems of modern astrophysics. So far, we have little knowledge of the physical processes that cause the structure, but turbulence is suspected to be essential. In this thesis I study turbulent flows and how they contribute to the structure of interstellar dark clouds. To this end, three-dimensional numerical hydrodynamic simulations are needed, since the detailed turbulent spatial and velocity structure cannot be calculated analytically. I employ the "Lattice Boltzmann Method", a recently developed numerical method which solves the Boltzmann equation in a discretized phase space. Mesoscopic particle packets move with fixed velocities on a Cartesian lattice, and at each time step they exchange mass according to given rules. Because of its mainly local operations, the method is well suited to parallel or clustered computers. As part of my thesis I have developed a parallelized "Lattice Boltzmann Method" hydrodynamics code. I have improved the numerical stability for Reynolds numbers of up to 10^4.5 and Mach numbers of up to 0.9, and I have extended the method to include a second miscible fluid phase. The code has been used on the three currently most powerful workstations at the Max-Planck-Institut für Radioastronomie in Bonn and on the massively parallel mainframe CM-5 at the Gesellschaft für Mathematik und Datenverarbeitung in St. Augustin. The simulations consist of collimated shear flows and the motion of molecular clumps through an ambient medium. The dependence of the emerging structure on Reynolds and Mach numbers is studied. The main results are (1) that distinct clumps and filaments appear only at the transition between laminar and fully turbulent flow at Reynolds numbers between 500 and 5000, and (2) that subsonic viscous shear flows are capable of producing the dark cloud velocity structure. The unexpectedly low Reynolds numbers can …

474. A Comparative Analysis of Simulated and Direct Oral Proficiency Interviews

    ERIC Educational Resources Information Center

    Stansfield, Charles W.

    The simulated oral proficiency interview (SOPI) is a semi-direct speaking test that models the format of the oral proficiency interview (OPI). The OPI is a method of assessing general speaking proficiency in a second language. The SOPI is a tape-recorded test consisting of six parts: simple personal background questions posed in a simulated …
475. Particle simulation on radio frequency stabilization of flute modes in a tandem mirror. I. Parallel antenna

    SciTech Connect

    Kadoya, Y.; Abe, H.

    1988-04-01

    A two- and one-half-dimensional electromagnetic particle code (PS2M) (H. Abe and S. Nakajima, J. Phys. Soc. Jpn. 53, xxx (1987)) is used to study how an electric field applied parallel to the magnetic field affects the radio frequency stabilization of flute modes in a tandem mirror plasma. The parallel electric field E∥ perturbs the electron velocity v∥ parallel to the magnetic field and also induces a perpendicular magnetic field perturbation B⊥. The unstable growth of the flute mode in the absence of such a radio frequency electric field is first studied as a basis for comparison. The ponderomotive force originating from the time-averaged product ⟨v∥B⊥⟩ is then shown to stabilize the flute modes. The stabilizing wave power threshold, the frequency dependency, and the dependence on |E∥| all agree with the theoretical predictions.
477. Computer simulation in template-directed oligonucleotide synthesis

    NASA Technical Reports Server (NTRS)

    Kanavarioti, Anastassia; Benasconi, Claude F.

    1990-01-01

    It is commonly assumed that template-directed polymerizations have played a key role in prebiotic evolution. A computer simulation that models up to 33 competing reactions was used to investigate the product distribution in a template-directed oligonucleotide synthesis as a function of time and concentration of the reactants. The study focuses on the poly(C)-directed elongation reaction of oligoguanylates, and how it is affected by the competing processes of hydrolysis and dimerization of the activated monomer, which have the potential of severely curtailing the elongation and reducing the size and yield of the synthesized polymers. The simulations show that realistic and probably prebiotically plausible conditions can be found where hydrolysis and dimerization are either negligible or where a high degree of polymerization can be attained even in the face of substantial hydrolysis and/or dimerization.
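As a much smaller analogue of the competing-reaction network described above, the sketch below integrates a toy scheme in which an activated monomer M can elongate template-bound oligomers, hydrolyze to an unreactive form, or dimerize. All rate constants, the species set, and the initial conditions are invented for illustration; only the structure of the competition mirrors the study.

```python
import numpy as np
from scipy.integrate import solve_ivp

k_e, k_h, k_d = 1.0, 0.05, 0.02   # elongation, hydrolysis, dimerization (assumed)
N = 20                            # longest oligomer tracked

def rhs(t, y):
    m, p = y[0], y[1:]            # activated monomer, oligomers P_1..P_N
    growth = k_e * m * p          # P_n + M -> P_{n+1}
    dp = np.zeros_like(p)
    dp[:-1] -= growth[:-1]
    dp[1:] += growth[:-1]
    # monomer lost to hydrolysis, dimerization (2M per event), and elongation
    dm = -k_h * m - 2 * k_d * m**2 - growth[:-1].sum()
    return np.concatenate([[dm], dp])

y0 = np.zeros(N + 1)
y0[0], y0[1] = 5.0, 0.1           # start: monomer pool plus a short primer
sol = solve_ivp(rhs, (0.0, 200.0), y0, t_eval=[200.0])
dist = sol.y[1:, -1]              # final oligomer length distribution
print("mean length:", (np.arange(1, N + 1) * dist).sum() / dist.sum())
```

Raising k_h or k_d in this toy shows the qualitative effect studied in the paper: the competing sinks drain the monomer pool and shorten the product distribution.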
478. A Three Dimensional Parallel Time Accurate Turbopump Simulation Procedure Using Overset Grid Systems

    NASA Technical Reports Server (NTRS)

    Kiris, Cetin; Chan, William; Kwak, Dochan

    2001-01-01

    The objective of the current effort is to provide a computational framework for design and analysis of the entire fuel supply system of a liquid rocket engine, including high-fidelity unsteady turbopump flow analysis. This capability is needed to support the design of pump sub-systems for advanced space transportation vehicles that are likely to involve liquid propulsion systems. To date, computational tools for design/analysis of turbopump flows are based on relatively lower fidelity methods. An unsteady, three-dimensional viscous flow analysis tool involving stationary and rotational components for the entire turbopump assembly has not been available for real-world engineering applications. The present effort provides developers with information such as transient flow phenomena at start-up and non-uniform inflows, and will eventually impact system vibration and structures. In the proposed paper, the progress toward the capability of complete simulation of the turbo-pump for a liquid rocket engine is reported. The Space Shuttle Main Engine (SSME) turbo-pump is used as a test case for evaluation of the hybrid MPI/OpenMP and MLP versions of the INS3D code. CAD-to-solution auto-scripting capability is being developed for turbopump applications. The relative motion of the grid systems for the rotor-stator interaction was obtained using overset grid techniques. Unsteady computations for the SSME turbo-pump, which contains 114 zones with 34.5 million grid points, are carried out on Origin 3000 systems at NASA Ames Research Center. Results from these time-accurate simulations with moving boundary capability will be presented along with the performance of parallel versions of the code.

479. A Three-Dimensional Parallel Time-Accurate Turbopump Simulation Procedure Using Overset Grid System

    NASA Technical Reports Server (NTRS)

    Kiris, Cetin; Chan, William; Kwak, Dochan

    2002-01-01

    The objective of the current effort is to provide a computational framework for design and analysis of the entire fuel supply system of a liquid rocket engine, including high-fidelity unsteady turbopump flow analysis. This capability is needed to support the design of pump sub-systems for advanced space transportation vehicles that are likely to involve liquid propulsion systems. To date, computational tools for design/analysis of turbopump flows are based on relatively lower fidelity methods. An unsteady, three-dimensional viscous flow analysis tool involving stationary and rotational components for the entire turbopump assembly has not been available for real-world engineering applications. The present effort provides developers with information such as transient flow phenomena at start-up and nonuniform inflows, and will eventually impact system vibration and structures. In the proposed paper, the progress toward the capability of complete simulation of the turbo-pump for a liquid rocket engine is reported. The Space Shuttle Main Engine (SSME) turbo-pump is used as a test case for evaluation of the hybrid MPI/OpenMP and MLP versions of the INS3D code. CAD-to-solution auto-scripting capability is being developed for turbopump applications. The relative motion of the grid systems for the rotor-stator interaction was obtained using overset grid techniques. Unsteady computations for the SSME turbo-pump, which contains 114 zones with 34.5 million grid points, are carried out on Origin 3000 systems at NASA Ames Research Center.
    Results from these time-accurate simulations with moving boundary capability are presented along with the performance of parallel versions of the code.

480. Parallel Higher-order Finite Element Method for Accurate Field Computations in Wakefield and PIC Simulations

    SciTech Connect

    Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Limborg, C.; Ng, C.; Prudencio, E.; Schussman, G.; Uplenchwar, R.; Ko, K.

    2009-06-19

    Over the past years, SLAC's Advanced Computations Department (ACD), under SciDAC sponsorship, has developed a suite of 3D (2D) parallel higher-order finite element (FE) codes, T3P (T2P) and Pic3P (Pic2P), aimed at accurate, large-scale simulation of wakefields and particle-field interactions in radio-frequency (RF) cavities of complex shape. The codes are built on the FE infrastructure that supports SLAC's frequency domain codes, Omega3P and S3P, to utilize conformal tetrahedral (triangular) meshes, higher-order basis functions and quadratic geometry approximation. For time integration, they adopt an unconditionally stable implicit scheme. Pic3P (Pic2P) extends T3P (T2P) to treat charged-particle dynamics self-consistently using the PIC (particle-in-cell) approach, the first such implementation on a conformal, unstructured grid using Whitney basis functions. Examples from applications to the International Linear Collider (ILC), Positron Electron Project-II (PEP-II), Linac Coherent Light Source (LCLS) and other accelerators will be presented to compare the accuracy and computational efficiency of these codes versus their counterparts using structured grids.
481. Simulation of hydraulic fracture networks in three dimensions utilizing massively parallel computing platforms

    NASA Astrophysics Data System (ADS)

    Settgast, R. R.; Johnson, S.; Fu, P.; Walsh, S. D.; Ryerson, F. J.; Antoun, T.

    2012-12-01

    Hydraulic fracturing has been an enabling technology for commercially stimulating fracture networks for over half a century. It has become one of the most widespread technologies for engineering subsurface fracture systems. Despite the ubiquity of this technique in the field, understanding and prediction of the hydraulically induced propagation of the fracture network in realistic, heterogeneous reservoirs has been limited. A number of developments in multiscale modeling in recent years have allowed researchers in related fields to tackle the modeling of complex fracture propagation as well as the mechanics of heterogeneous materials. These developments, combined with advances in quantifying solution uncertainties, provide possibilities for the geologic modeling community to capture both the fracturing behavior and longer-term permeability evolution of rock masses under hydraulic loading across both dynamic and viscosity-dominated regimes. Here we will demonstrate the first phase of this effort through illustrations of fully three-dimensional, tightly coupled hydromechanical simulations of hydraulically induced fracture network propagation run at massively parallel computing scales, and discuss preliminary results regarding the mechanisms by which fracture interactions and the accompanying changes to the stress field can lead to deleterious or beneficial changes to the fracture network.

482. Performance Evaluation of Lattice-Boltzmann Magnetohydrodynamics Simulations on Modern Parallel Vector Systems

    SciTech Connect

    Carter, Jonathan; Oliker, Leonid

    2006-01-09

    The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end computing (HEC) platforms, primarily because of their generality, scalability, and cost effectiveness. However, the growing gap between sustained and peak performance for full-scale scientific applications on such platforms has become a major concern in high performance computing. The latest generation of custom-built parallel vector systems have the potential to address this concern for numerical algorithms with sufficient regularity in their computational structure. In this work, we explore two- and three-dimensional implementations of a lattice-Boltzmann magnetohydrodynamics (MHD) physics application on some of today's most powerful supercomputing platforms. Results compare performance between the vector-based Cray X1, Earth Simulator, and newly-released NEC SX-8, and the commodity-based superscalar platforms of the IBM Power3, Intel Itanium2, and AMD Opteron. Overall results show that the SX-8 attains unprecedented aggregate performance across our evaluated applications.
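To make the method in this record concrete, here is a minimal single-grid D2Q9 lattice-Boltzmann (BGK) loop showing the stream/collide structure whose regularity makes the method attractive on vector and parallel machines. It is plain isothermal hydrodynamics on a periodic box, not the MHD formulation benchmarked in the paper, and all parameters are arbitrary.

```python
import numpy as np

nx, ny, tau = 64, 64, 0.8                      # grid and relaxation time (assumed)
c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],
              [1,1],[-1,1],[-1,-1],[1,-1]])    # D2Q9 lattice velocities
w = np.array([4/9] + [1/9]*4 + [1/36]*4)       # lattice weights

def equilibrium(rho, ux, uy):
    cu = c[:, 0, None, None]*ux + c[:, 1, None, None]*uy
    usq = ux**2 + uy**2
    return w[:, None, None]*rho*(1 + 3*cu + 4.5*cu**2 - 1.5*usq)

rho = np.ones((nx, ny))
ux = 0.05*np.sin(2*np.pi*np.arange(nx)/nx)[:, None]*np.ones((1, ny))
uy = np.zeros((nx, ny))
f = equilibrium(rho, ux, uy)

for step in range(100):
    # streaming: shift each population along its lattice velocity (periodic box)
    for i in range(9):
        f[i] = np.roll(np.roll(f[i], c[i, 0], axis=0), c[i, 1], axis=1)
    # macroscopic moments
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    # BGK collision: relax toward local equilibrium
    f += (equilibrium(rho, ux, uy) - f) / tau

print("kinetic energy:", 0.5*float((rho*(ux**2 + uy**2)).sum()))
```

The entirely local collision step and nearest-neighbor streaming step are what give the method the "sufficient regularity" the abstract credits for its vector performance.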
483. Monte Carlo simulation of photoelectron energization in parallel electric fields: Electroglow on Uranus

    SciTech Connect

    Singhal, R.P.; Bhardwaj, A.

    1991-09-01

    A Monte Carlo simulation of photoelectron energization and energy degradation in H₂ gas in the presence of parallel electric fields has been carried out. Numerical yield spectra, which contain information about the electron energy degradation process and can be used to calculate the yield for any inelastic event, are obtained. The variation of yield spectra with incident electron energy, electric field, pitch angle, and cutoff limit has been studied. The yield function is employed to determine the photoelectron fluxes. H₂ Lyman and Werner band excitation rates and integrated column intensity are computed for three different electric field profiles taking various low-energy cutoff limits. It is found that an electric field profile with a peak value of 4 mV/m at a neutral number density of 3×10¹⁰ cm⁻³ produces enhanced volume emission rates of H₂ bands (λ < 1100 Å), explaining about 20% of the observed electroglow emission on Uranus. The effect of solar zenith angle and solar cycle variation on the peak excitation rate is discussed.
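The yield-spectrum idea in this record can be illustrated with a deliberately crude Monte Carlo: electrons lose energy through competing inelastic channels until no channel remains open, and the mean number of events per electron is tallied. The thresholds and energy-independent branching ratios below are placeholders, not the H₂ cross sections (or the electric-field acceleration) used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
# energy loss per event in eV; values and channel set are assumptions
channels = {"ionization": 15.4, "band excitation": 12.4, "vibration": 0.5}
probs = np.array([0.3, 0.3, 0.4])        # assumed, energy-independent branching

def degrade(e0, n=2000):
    """Follow n electrons of energy e0 down to the lowest threshold."""
    yields = {k: 0 for k in channels}
    for _ in range(n):
        e = e0
        while e > min(channels.values()):
            k = list(channels)[rng.choice(3, p=probs)]
            if e < channels[k]:
                continue                  # this channel is closed at energy e
            e -= channels[k]
            yields[k] += 1
    return {k: v / n for k, v in yields.items()}

print(degrade(100.0))   # mean number of events per 100 eV photoelectron
```

A table of such yields versus incident energy is, in simplified form, the "numerical yield spectrum" from which excitation rates are then assembled.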
484. Evaluating the Accuracy of Hessian Approximations for Direct Dynamics Simulations

    PubMed

    Zhuang, Yu; Siebert, Matthew R.; Hase, William L.; Kay, Kenneth G.; Ceotto, Michele

    2013-01-01

    Direct dynamics simulations are a very useful and general approach for studying the atomistic properties of complex chemical systems, since an electronic structure theory representation of a system's potential energy surface is possible without the need for fitting an analytic potential energy function. In this paper, recently introduced compact finite difference (CFD) schemes for approximating the Hessian [J. Chem. Phys. 2010, 133, 074101] are tested by employing the monodromy matrix equations of motion. Several systems, including carbon dioxide and benzene, are simulated, using both analytic potential energy surfaces and on-the-fly direct dynamics. The results show, depending on the molecular system, that electronic structure theory Hessian direct dynamics can be accelerated up to 2 orders of magnitude. The CFD approximation is found to be robust enough to deal with chaotic motion, concomitant with floppy and stiff mode dynamics, Fermi resonances, and other kinds of molecular couplings. Finally, the CFD approximations allow parametric tuning of the different CFD parameters to attain the best possible accuracy for different molecular systems. Thus, a direct dynamics simulation requiring the Hessian at every integration step may be replaced with an approximate Hessian updating by tuning the appropriate accuracy. PMID:26589009
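The closing idea of this record, replacing the exact Hessian at every step with an approximate, periodically refreshed one, can be sketched independently of electronic structure theory. Below, a finite-difference Hessian of an assumed toy potential is recomputed only every k steps of a Newton-type walker; k is the accuracy/cost dial. This is the reuse idea only, not the paper's compact-finite-difference scheme.

```python
import numpy as np

def potential(x):                      # stand-in for an electronic-structure PES
    return 0.5*np.sum(x**2) + 0.1*np.sum(x**4)

def grad(x, h=1e-5):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (potential(x + e) - potential(x - e)) / (2*h)
    return g

def fd_hessian(x, h=1e-4):
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = h
        H[:, i] = (grad(x + e) - grad(x - e)) / (2*h)
    return 0.5*(H + H.T)               # symmetrize the estimate

x = np.array([0.3, -0.2, 0.5])
for step in range(10):
    if step % 5 == 0:                  # refresh the Hessian every k = 5 steps
        H = fd_hessian(x)
    x = x - np.linalg.solve(H + 1e-8*np.eye(3), grad(x))  # Newton-like step
print("stationary point:", x)
```

Each full Hessian here costs 2n gradient calls, which is exactly why reusing it between refreshes pays off when gradients come from quantum chemistry.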
485. Simulations of the loading and radiated sound of airfoils and wings in unsteady flow using computational aeroacoustics and parallel computers

    NASA Astrophysics Data System (ADS)

    Lockard, David Patrick

    This thesis makes contributions towards the use of computational aeroacoustics (CAA) as a tool for noise analysis. CAA uses numerical methods to simulate acoustic phenomena. CAA algorithms have been shown to reproduce wave propagation much better than traditional computational fluid dynamics (CFD) methods. In the current approach, a finite-difference, time-domain algorithm is used to simulate unsteady, compressible flows. Dispersion-relation-preserving methodology is used to extend the range of frequencies that can be represented properly by the scheme. Since CAA algorithms are relatively inefficient at obtaining a steady-state solution, multigrid methods are applied to accelerate the convergence. All of the calculations are performed on parallel computers. Excellent speedup ratios are obtained for the explicit, time-stepping algorithm used in this research. A common problem in the area of broadband noise is the prediction of the acoustic field generated by a vortical gust impinging on a solid body. The problem is modeled initially in two dimensions by a flat plate experiencing a uniform mean flow with a sinusoidal, vertical velocity perturbation. Good agreement is obtained with results from semi-analytic methods for several gust frequencies. Then, a cascade of plates is used to simulate a turbomachinery blade row. A new approach is used to impose the vortical disturbance inside the computational domain rather than imposing it at the computational boundary. The influence of the mean flow on the radiated noise is examined by considering NACA0012 and RAE2822 airfoils. After a steady state is obtained from the multigrid method, the unsteady simulation is used to model the vortical gust's interaction with the airfoil. The mean loading on the airfoil is shown to have a significant effect on the directivity of the sound, with the strongest influence observed at high frequencies. Camber is shown to have a similar effect as the angle of attack. A three-dimensional problem
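A quick way to see what "dispersion-relation-preserving" means in the thesis above is to compute a stencil's modified wavenumber: applying a central-difference stencil to exp(ikx) yields i·k′·exp(ikx), and the scheme resolves a wave only where k′ ≈ k. The sketch below does this for the standard 4th-order stencil; a DRP scheme instead optimizes the coefficients so k′ tracks k over a wider band. The 1% tolerance is an arbitrary choice.

```python
import numpy as np

offsets = np.arange(-2, 3)                       # stencil points -2..2
a = np.array([1/12, -2/3, 0.0, 2/3, -1/12])      # 4th-order central coefficients

k_dx = np.linspace(0.01, np.pi, 400)
# for an antisymmetric stencil, the modified wavenumber is
# k'*dx = sum_j a_j sin(j * k * dx)
k_mod = np.array([np.dot(a, np.sin(offsets * kh)) for kh in k_dx])

err = np.abs(k_mod - k_dx) / k_dx
resolved = k_dx[err < 0.01]
print(f"waves resolved to 1% dispersion error: k*dx < {resolved.max():.2f}")
```

Coefficients tuned to minimize this error over a target band, rather than to maximize formal order, are precisely what distinguish DRP stencils.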
486. Direct numerical simulation of wall turbulent flows with microbubbles

    NASA Astrophysics Data System (ADS)

    Kanai, Akihiro; Miyata, Hideaki

    2001-03-01

    The marker-density-function (MDF) method has been developed to conduct direct numerical simulation (DNS) for bubbly flows. The method is applied to turbulent bubbly channel flows to elucidate the interaction between bubbles and wall turbulence. The simulation is designed to clarify the structure of the turbulent boundary layer containing microbubbles and the mechanism of frictional drag reduction. It is deduced from the numerical tests that the interaction between bubbles and wall turbulence depends on the Weber and Froude numbers. The reduction of the frictional resistance on the wall is attained, and its mechanism is explained from the modulation of the three-dimensional structure of the turbulent flow.

487. Chaining direct memory access data transfer operations for compute nodes in a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.

    2010-09-28

    Methods, systems, and products are disclosed for chaining DMA data transfer operations for compute nodes in a parallel computer that include: receiving, by an origin DMA engine on an origin node in an origin injection FIFO buffer for the origin DMA engine, an RGET data descriptor specifying a DMA transfer operation data descriptor on the origin node and a second RGET data descriptor on the origin node, the second RGET data descriptor specifying a target RGET data descriptor on the target node, the target RGET data descriptor specifying an additional DMA transfer operation data descriptor on the origin node; creating, by the origin DMA engine, an RGET packet in dependence upon the RGET data descriptor, the RGET packet containing the DMA transfer operation data descriptor and the second RGET data descriptor; and transferring, by the origin DMA engine to a target DMA engine on the target node, the RGET packet.

488. Self-pacing direct memory access data transfer operations for compute nodes in a parallel computer

    SciTech Connect

    Blocksome, Michael A.

    2015-02-17

    Methods, apparatus, and products are disclosed for self-pacing DMA data transfer operations for nodes in a parallel computer that include: transferring, by an origin DMA on an origin node, an RTS message to a target node, the RTS message specifying a message on the origin node for transfer to the target node; receiving, in an origin injection FIFO for the origin DMA from a target DMA on the target node in response to transferring the RTS message, a target RGET descriptor followed by a DMA transfer operation descriptor, the DMA descriptor for transmitting a message portion to the target node, the target RGET descriptor specifying an origin RGET descriptor on the origin node that specifies an additional DMA descriptor for transmitting an additional message portion to the target node; processing, by the origin DMA, the target RGET descriptor; and processing, by the origin DMA, the DMA transfer operation descriptor.
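The two patent abstracts above are dense, but the underlying chaining idea can be caricatured in a few lines: a descriptor processed by one DMA engine may deposit further descriptors into the peer engine's injection FIFO, so a multi-part transfer paces itself with no processor involvement. The model below is a loose toy with invented descriptor types, not the patented protocol or any real DMA engine's interface.

```python
from collections import deque

fifos = {"origin": deque(), "target": deque()}   # injection FIFOs, one per node
received = []

def process(node):
    """Drain a node's injection FIFO, as a stand-in for its DMA engine."""
    while fifos[node]:
        kind, payload = fifos[node].popleft()
        if kind == "PUT":                 # move one message portion
            received.append(payload)
        elif kind == "RGET":              # deposit descriptors at the peer engine
            peer, descriptors = payload
            fifos[peer].extend(descriptors)
            process(peer)

# The origin seeds one RGET; the target then instructs the origin, descriptor
# by descriptor, to send two message portions (self-pacing, no CPU in the loop).
fifos["origin"].append(("RGET", ("target", [
    ("RGET", ("origin", [("PUT", "portion-1"), ("PUT", "portion-2")])),
])))
process("origin")
print(received)   # ['portion-1', 'portion-2']
```

The point of the chained RGETs in both patents is exactly this hand-off: each engine's work is described by data sitting in a FIFO, so the transfer schedule travels with the data.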
<span class="hlt">Direct</span> numerical <span class="hlt">simulation</span> of the pressure swirl injection process has the potential to provide much-needed information about the complex physics of atomization in swirling flows, but has yet to be used due to the interaction of a complex turbulent multiphase flow with complicated injector geometries. A variety of novel numerical methods are used to facilitate the numerical <span class="hlt">simulations</span> including a conservative implementation of immersed boundaries used to represent the injector geometry, an accurate interface transport scheme with mass conservation properties based on a discontinuous Galerkin discretization of the conservative level set method, and a novel discretization of the Navier-Stokes convective term allowing for robust <span class="hlt">simulations</span> at high density ratios. <span class="hlt">Simulations</span> were conducted by combining the methods with a fully <span class="hlt">parallelized</span> computational code called NGA.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2009AGUFM.H31B0792H','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2009AGUFM.H31B0792H"><span id="translatedtitle">Massively <span class="hlt">Parallel</span> <span class="hlt">Simulation</span> of Uranium Migration at the Hanford 300 Area</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Hammond, G. E.; Lichtner, P. C.</p> <p>2009-12-01</p> <p>Effectively utilized, high-performance computing can have a significant impact on subsurface science by enabling researchers to employ models with ever increasing sophistication and complexity that provide a more accurate and mechanistic representation of subsurface processes. As part of the U.S. Department of Energy’s SciDAC-2 program, the petascale subsurface reactive multiphase flow and transport code PFLOTRAN has been developed and is currently being employed to <span class="hlt">simulate</span> uranium migration at the Hanford 300 Area. PFLOTRAN has been run on subsurface problems composed of up to two billion degrees of freedom and utilizing up to 131,072 processor cores on the world’s largest open science supercomputer Jaguar. This presentation focuses on the application of PFLOTRAN to <span class="hlt">simulate</span> geochemical transport of uranium at Hanford using the Jaguar supercomputer. The Hanford 300 Area presents many challenges with regard to <span class="hlt">simulating</span> radionuclide transport. Aside from the many conceptual uncertainties in the problem such as the choice of initial conditions, rapid fluctuations in the Columbia River stage, which occur on an hourly basis with several meter variations, can have a dramatic impact on the size of the uranium plume, its migration <span class="hlt">direction</span>, and the rate at which it migrates to the river. 
    Due to the immense size of the physical domain needed to include the transient river boundary condition, the grid resolution required to preserve accuracy, and the number of chemical components simulated, 3D simulation of the Hanford 300 Area would be unsustainable on a single workstation, and thus high-performance computing is essential.

492. Direct numerical simulation of turbulent flow in a rotating square duct

    SciTech Connect

    Dai, Yi-Jun; Huang, Wei-Xi; Xu, Chun-Xiao; Cui, Gui-Xiang

    2015-06-15

    A fully developed turbulent flow in a rotating straight square duct is simulated by direct numerical simulation at Re_τ = 300 and 0 ≤ Ro_τ ≤ 40. The rotation axis is parallel to two opposite walls of the duct and normal to the main flow. Variations of the turbulence statistics with the rotation rate are presented, and a comparison with the rotating turbulent channel flow is discussed. Rich secondary flow patterns in the cross section are observed by varying the rotation rate. The appearance of a pair of additional vortices above the pressure wall is carefully examined, and the underlying mechanism is explained according to the budget analysis of the mean momentum equations.

493. Direct numerical simulation of steady state, three dimensional, laminar flow around a wall mounted cube

    NASA Astrophysics Data System (ADS)

    Liakos, Anastasios; Malamataris, Nikolaos A.

    2014-05-01

    The topology and evolution of flow around a surface mounted cubical object in three dimensional channel flow is examined for low to moderate Reynolds numbers. Direct numerical simulations were performed via a home-made parallel finite element code. The computational domain has been designed according to actual laboratory experiment conditions. Analysis of the results is performed using the three dimensional theory of separation. Our findings indicate that a tornado-like vortex by the side of the cube is present for all Reynolds numbers for which flow was simulated. A horseshoe vortex upstream from the cube was formed at a Reynolds number of approximately 1266. Pressure distributions are shown along with three dimensional images of the tornado-like vortex and the horseshoe vortex at selected Reynolds numbers.
    Finally, and in accordance with previous work, our results indicate that the upper limit of the Reynolds number for which steady state results are physically realizable is roughly 2000.

494. Swarm-NG: A CUDA library for Parallel n-body Integrations with focus on simulations of planetary systems

    NASA Astrophysics Data System (ADS)

    Dindar, Saleh; Ford, Eric B.; Juric, Mario; Yeo, Young In; Gao, Jianwei; Boley, Aaron C.; Nelson, Benjamin; Peters, Jörg

    2013-10-01

    We present Swarm-NG, a C++ library for the efficient direct integration of many n-body systems using a Graphics Processing Unit (GPU), such as NVIDIA's Tesla T10 and M2070 GPUs. While previous studies have demonstrated the benefit of GPUs for n-body simulations with thousands to millions of bodies, Swarm-NG focuses on many few-body systems, e.g., thousands of systems with 3…15 bodies each, as is typical for the study of planetary systems. Swarm-NG parallelizes the simulation, including both the numerical integration of the equations of motion and the evaluation of forces, using NVIDIA's "Compute Unified Device Architecture" (CUDA) on the GPU. Swarm-NG includes optimized implementations of 4th order time-symmetrized Hermite integration and mixed variable symplectic integration, as well as several sample codes for other algorithms to illustrate how non-CUDA-savvy users may themselves introduce customized integrators into the Swarm-NG framework. To optimize performance, we analyze the effect of GPU-specific parameters on performance under double precision. For an ensemble of 131072 planetary systems, each containing three bodies, the NVIDIA Tesla M2070 GPU outperforms a 6-core Intel Xeon X5675 CPU by a factor of ~2.75. Thus, we conclude that modern GPUs offer an attractive alternative to a cluster of CPUs for the integration of an ensemble of many few-body systems. Applications of Swarm-NG include studying the late stages of planet formation, testing the stability of planetary systems and evaluating the goodness-of-fit between many planetary system models and observations of extrasolar planet host stars (e.g., radial velocity, astrometry, transit timing). While Swarm-NG focuses on the parallel integration of many planetary systems, the underlying integrators could be applied to a wide variety of problems that require repeatedly integrating a set of ordinary differential equations many times using different initial conditions and/or parameter values.
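Swarm-NG's workload, many small independent systems advanced in lockstep, maps naturally onto array-parallel code. The sketch below integrates 1024 assumed three-body systems with a shared-timestep leapfrog, using numpy broadcasting as a CPU stand-in for the GPU kernels; Swarm-NG itself provides time-symmetrized Hermite and symplectic integrators in CUDA, not this scheme.

```python
import numpy as np

rng = np.random.default_rng(2)
S, N, G, dt = 1024, 3, 1.0, 1e-3          # systems, bodies each, units, step (assumed)
m = np.abs(rng.normal(1.0, 0.1, (S, N, 1)))
x = rng.normal(0.0, 1.0, (S, N, 3))
v = rng.normal(0.0, 0.1, (S, N, 3))

def accel(x):
    """Pairwise gravity for every system at once: a[s,i] = G sum_j m_j d_ij / r^3."""
    d = x[:, None, :, :] - x[:, :, None, :]          # d[s,i,j] = x_j - x_i
    r2 = (d**2).sum(-1) + np.eye(N)                  # unit self-distance, no 0-div
    inv_r3 = r2**-1.5 * (1 - np.eye(N))              # zero out self-interaction
    return G * (inv_r3[..., None] * d * m[:, None, :, :]).sum(2)

for _ in range(1000):                                # kick-drift-kick leapfrog
    v += 0.5*dt*accel(x)
    x += dt*v
    v += 0.5*dt*accel(x)

print("advanced", S, "systems; sample position:", x[0, 0])
```

Because every system follows the same instruction stream, the ensemble dimension S is embarrassingly parallel, which is the property the library exploits on the GPU.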
495. A heterogeneous and parallel computing framework for high-resolution hydrodynamic simulations

    NASA Astrophysics Data System (ADS)

    Smith, Luke; Liang, Qiuhua

    2015-04-01

    Shock-capturing hydrodynamic models are now widely applied in the context of flood risk assessment and forecasting, accurately capturing the behaviour of surface water over ground and within rivers. Such models are generally explicit in their numerical basis, and can be computationally expensive; this has prohibited full use of high-resolution topographic data for complex urban environments, now easily obtainable through airborne altimetric surveys (LiDAR). As processor clock speed advances have stagnated in recent years, further computational performance gains are largely dependent on the use of parallel processing. Heterogeneous computing architectures (e.g. graphics processing units or compute accelerator cards) provide a cost-effective means of achieving high throughput in cases where the same calculation is performed with a large input dataset. In recent years this technique has been applied successfully for flood risk mapping, such as within the national surface water flood risk assessment for the United Kingdom. We present a flexible software framework for hydrodynamic simulations across multiple processors of different architectures, within multiple computer systems, enabled using OpenCL and Message Passing Interface (MPI) libraries. A finite-volume Godunov-type scheme is implemented using the HLLC approach to solving the Riemann problem, with optional extension to second-order accuracy in space and time using the MUSCL-Hancock approach. The framework is successfully applied on personal computers and a small cluster to provide considerable improvements in performance. The most significant performance gains were achieved across two servers, each containing four NVIDIA GPUs, with a mix of K20, M2075 and C2050 devices. Advantages are found with respect to decreased parametric sensitivity, and thus in reducing uncertainty, for a major fluvial flood within a large catchment during 2005 in Carlisle, England. Simulations for the three-day event could be performed
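The finite-volume Godunov scheme described in this record rests on an approximate Riemann solver at every cell face. As a minimal illustration, the sketch below advances a 1-D shallow-water dam break with first-order HLL fluxes (HLL rather than the HLLC variant named above, and without the MUSCL-Hancock extension); the grid, timestep, and initial state are arbitrary.

```python
import numpy as np

g, nx, dt, dx = 9.81, 200, 1e-3, 0.01
h = np.where(np.arange(nx) < nx//2, 2.0, 1.0)    # dam-break initial depth
hu = np.zeros(nx)                                # momentum h*u

def flux(h, hu):
    u = hu / h
    return np.array([hu, hu*u + 0.5*g*h*h])      # shallow-water fluxes

for _ in range(200):
    hL, huL, hR, huR = h[:-1], hu[:-1], h[1:], hu[1:]
    cL = np.abs(huL/hL) + np.sqrt(g*hL)          # crude wave-speed bounds
    cR = np.abs(huR/hR) + np.sqrt(g*hR)
    sR = np.maximum(cL, cR)
    sL = -sR
    FL, FR = flux(hL, huL), flux(hR, huR)
    UL, UR = np.array([hL, huL]), np.array([hR, huR])
    F = (sR*FL - sL*FR + sL*sR*(UR - UL)) / (sR - sL)   # HLL interface flux
    h[1:-1] -= dt/dx * (F[0, 1:] - F[0, :-1])
    hu[1:-1] -= dt/dx * (F[1, 1:] - F[1, :-1])

print("depth range:", h.min(), "to", h.max())
```

Since each face flux depends only on its two neighboring cells, the update is the "same calculation over a large dataset" pattern that the paper offloads to GPUs via OpenCL.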
496. Direct simulations of chemically reacting turbulent mixing layers

    NASA Technical Reports Server (NTRS)

    Riley, J. J.; Metcalfe, R. W.

    1984-01-01

    The report presents the results of direct numerical simulations of chemically reacting turbulent mixing layers. The work consists of two parts: (1) the development and testing of a spectral numerical computer code that treats the diffusion reaction equations; and (2) the simulation of a series of cases of chemical reactions occurring in mixing layers. The reaction considered is a binary, irreversible reaction with no heat release. The reacting species are nonpremixed. The results of the numerical tests indicate that the high accuracy of the spectral methods observed for rigid body rotation is also obtained when diffusion, reaction, and more complex flows are considered. In the simulations, the effects of vortex rollup and smaller scale turbulence on the overall reaction rates are investigated. The simulation results are found to be in approximate agreement with similarity theory. Comparisons of simulation results with certain modeling hypotheses indicate limitations in these hypotheses. The nondimensional product thickness computed from the simulations is compared with laboratory values and is found to be in reasonable agreement, especially since there are no adjustable constants in the method.

497. Direct simulations of chemically reacting turbulent mixing layers, part 2

    NASA Astrophysics Data System (ADS)

    Metcalfe, Ralph W.; McMurtry, Patrick A.; Jou, Wen-Huei; Riley, James J.; Givi, Peyman

    1988-06-01

    The results of direct numerical simulations of chemically reacting turbulent mixing layers are presented. This is an extension of earlier work to a more detailed study of previous three dimensional simulations of cold reacting flows plus the development, validation, and use of codes to simulate chemically reacting shear layers with heat release. Additional analysis of earlier simulations showed good agreement with self similarity theory and laboratory data. Simulations with a two dimensional code including the effects of heat release showed that the rate of chemical product formation, the thickness of the mixing layer, and the amount of mass entrained into the layer all decrease with increasing rates of heat release. Subsequent three dimensional simulations showed similar behavior, in agreement with laboratory observations. Baroclinic torques and thermal expansion in the mixing layer were found to produce changes in the flame vortex structure that act to diffuse the pairing vortices, resulting in a net reduction in vorticity. Previously unexplained anomalies observed in the mean velocity profiles of reacting jets and mixing layers were shown to result from vorticity generation by baroclinic torques.
498. Direct simulations of chemically reacting turbulent mixing layers, part 2

    NASA Technical Reports Server (NTRS)

    Metcalfe, Ralph W.; McMurtry, Patrick A.; Jou, Wen-Huei; Riley, James J.; Givi, Peyman

    1988-01-01

    The results of direct numerical simulations of chemically reacting turbulent mixing layers are presented. This is an extension of earlier work to a more detailed study of previous three dimensional simulations of cold reacting flows plus the development, validation, and use of codes to simulate chemically reacting shear layers with heat release. Additional analysis of earlier simulations showed good agreement with self similarity theory and laboratory data. Simulations with a two dimensional code including the effects of heat release showed that the rate of chemical product formation, the thickness of the mixing layer, and the amount of mass entrained into the layer all decrease with increasing rates of heat release. Subsequent three dimensional simulations showed similar behavior, in agreement with laboratory observations. Baroclinic torques and thermal expansion in the mixing layer were found to produce changes in the flame vortex structure that act to diffuse the pairing vortices, resulting in a net reduction in vorticity. Previously unexplained anomalies observed in the mean velocity profiles of reacting jets and mixing layers were shown to result from vorticity generation by baroclinic torques.
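The model problem shared by these three records, a binary, irreversible reaction with no heat release between nonpremixed species, fits in a short pseudo-spectral sketch. Below, A + B → P is advanced in one dimension with Fourier-space diffusion and an explicit reaction step; the rate constant, diffusivity, and initial profiles are assumed, and the papers' simulations are of course two- and three-dimensional shear layers rather than this periodic line.

```python
import numpy as np

n, L = 256, 2*np.pi
D, k_r, dt = 1e-2, 5.0, 1e-3            # diffusivity, rate constant, step (assumed)
x = np.linspace(0.0, L, n, endpoint=False)
A = 0.5*(1.0 + np.tanh(3.0*np.cos(x)))  # nonpremixed: A on one side...
B = 1.0 - A                             # ...B on the other

k = 2.0*np.pi*np.fft.rfftfreq(n, d=L/n) # angular wavenumbers
decay = 1.0 / (1.0 + dt*D*k**2)         # implicit diffusion factor per step

for _ in range(2000):
    r = k_r*A*B                         # irreversible A + B -> P
    A, B = A - dt*r, B - dt*r
    A = np.fft.irfft(np.fft.rfft(A)*decay, n)
    B = np.fft.irfft(np.fft.rfft(B)*decay, n)

P = 0.5*(1.0 - A - B)                   # product from element conservation
print(f"product thickness: {P.sum()*L/n:.3f}")
```

The integrated product profile is a 1-D cousin of the "nondimensional product thickness" that the 1984 report compares against laboratory data.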
499. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    NASA Astrophysics Data System (ADS)

    Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

    2013-08-01

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i − f(x_{i−1})]_{i=1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H₂O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing
500. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    SciTech Connect

    Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i − f(x_{i−1})]_{i=1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H₂O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up
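The root-finding formulation in these last two records is easy to demonstrate on a toy problem. The sketch below collects a whole trajectory X = (x_1, …, x_M) into one unknown with residual F(X) = [x_i − f(x_{i−1})] and solves it by the simplest stationary iteration, in which all M propagator evaluations per sweep are independent; that independence is where time-parallelism enters. Here f is one velocity-Verlet step of an assumed harmonic oscillator, and the papers use unconditionally convergent quasi-Newton schemes rather than this plain fixed point, which needs up to M sweeps.

```python
import numpy as np

dt, M = 0.05, 200

def f(x):
    """One velocity-Verlet step for a unit harmonic oscillator (force = -q)."""
    q, p = x
    p_half = p - 0.5*dt*q
    q_new = q + dt*p_half
    return np.array([q_new, p_half - 0.5*dt*q_new])

x0 = np.array([1.0, 0.0])
X = np.tile(x0, (M, 1))                 # initial guess: constant trajectory

for sweep in range(M + 1):
    prev = np.vstack([x0, X[:-1]])      # x_{i-1} for every i
    FX = np.array([f(p) for p in prev]) # M independent evaluations per sweep
    if np.abs(X - FX).max() < 1e-14:    # residual F(X) = X - f(X_prev)
        break
    X = FX                              # fixed-point update x_i <- f(x_{i-1})

print(f"converged after {sweep} sweeps; final state: {X[-1]}")
```

In a real implementation each of the M evaluations inside a sweep would run on its own processor, and a quasi-Newton update on F(X) would propagate information across many time steps per sweep instead of just one.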