Science.gov

Sample records for parallel pde-based simulations

  1. Parallel PDE-Based Simulations Using the Common Component Architecture

    SciTech Connect

    McInnes, Lois C.; Allan, Benjamin A.; Armstrong, Robert; Benson, Steven J.; Bernholdt, David E.; Dahlgren, Tamara L.; Diachin, Lori; Krishnan, Manoj Kumar; Kohl, James A.; Larson, J. Walter; Lefantzi, Sophia; Nieplocha, Jarek; Norris, Boyana; Parker, Steven G.; Ray, Jaideep; Zhou, Shujia

    2006-03-05

    Summary. The complexity of parallel PDE-based simulations continues to increase as multimodel, multiphysics, and multi-institutional projects become widespread. A goal of component-based software engineering in such large-scale simulations is to help manage this complexity by enabling better interoperability among various codes that have been independently developed by different groups. The Common Component Architecture (CCA) Forum is defining a component architecture specification to address the challenges of high-performance scientific computing. In addition, several execution frameworks, supporting infrastructure, and general-purpose components are being developed. Furthermore, this group is collaborating with others in the high-performance computing community to design suites of domain-specific component interface specifications and underlying implementations. This chapter discusses recent work on leveraging these CCA efforts in parallel PDE-based simulations involving accelerator design, climate modeling, combustion, and accidental fires and explosions. We explain how component technology helps to address the different challenges posed by each of these applications, and we highlight how component interfaces built on existing parallel toolkits facilitate the reuse of software for parallel mesh manipulation, discretization, linear algebra, integration, optimization, and parallel data redistribution. We also present performance data to demonstrate the suitability of this approach, and we discuss strategies for applying component technologies to both new and existing applications.
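
    The core idea above, independently developed codes interoperating through agreed-upon component interfaces, can be illustrated with a small sketch. The port name, classes, and solver below are hypothetical illustrations of the pattern, not the actual CCA specification or any of the toolkits mentioned: application code depends only on an abstract interface, so independently developed implementations can be swapped in without modifying the caller.

```cpp
// Hypothetical sketch of interface-based component reuse (not the CCA API itself):
// the application calls an abstract "port"; any provider implementing that port can
// be plugged in, e.g. a wrapper around an existing parallel linear-algebra toolkit.
#include <cstdio>
#include <vector>

struct LinearSolverPort {                      // abstract port (interface)
  virtual ~LinearSolverPort() = default;
  // Solve A*x = b for a diagonal matrix A; a stand-in for a real solver call.
  virtual std::vector<double> solve(const std::vector<double>& diagA,
                                    const std::vector<double>& b) = 0;
};

struct DiagonalSolver : LinearSolverPort {     // one provider of the port
  std::vector<double> solve(const std::vector<double>& diagA,
                            const std::vector<double>& b) override {
    std::vector<double> x(b.size());
    for (std::size_t i = 0; i < b.size(); ++i) x[i] = b[i] / diagA[i];
    return x;
  }
};

// The application depends only on the port, not on any concrete solver library.
void runApplication(LinearSolverPort& solver) {
  const std::vector<double> diagA = {2.0, 4.0, 5.0}, b = {2.0, 8.0, 15.0};
  for (double xi : solver.solve(diagA, b)) std::printf("%g ", xi);  // prints: 1 2 3
  std::printf("\n");
}

int main() {
  DiagonalSolver solver;   // a different component could be substituted here
  runApplication(solver);
}
```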

  2. Parallel simulation today

    NASA Technical Reports Server (NTRS)

    Nicol, David; Fujimoto, Richard

    1992-01-01

    This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

  3. Parallel Atomistic Simulations

    SciTech Connect

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories: those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.

  4. Parallel system simulation

    SciTech Connect

    Tai, H.M.; Saeks, R.

    1984-03-01

    A relaxation algorithm for solving large-scale system simulation problems in parallel is proposed. The algorithm, which is composed of both a time-step parallel algorithm and a component-wise parallel algorithm, is described. The interconnected nature of the system, which is characterized by the component connection model, is fully exploited by this approach. A technique for finding an optimal number of time steps is also described. Finally, this algorithm is illustrated via several examples in which the possible trade-offs between the speed-up ratio, efficiency, and waiting time are analyzed.
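
    As a concrete illustration of component-wise relaxation (a minimal sketch in the spirit of the abstract, not the paper's algorithm), the toy below applies Jacobi-style waveform relaxation to two coupled scalar ODEs: each component is integrated over the whole time window using the other component's waveform from the previous sweep, so the two integrations are independent and could run on separate processors, and sweeps repeat until the waveforms stop changing. The model, step sizes, and tolerances are illustrative assumptions.

```cpp
// Minimal sketch of component-wise (Jacobi) waveform relaxation for the coupled system
//   x1' = -x1 + c*x2,   x2' = -x2 + c*x1.
// Each component is integrated over the full window against the other component's
// waveform from the previous sweep; within a sweep the two loops are independent.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  const int    nSteps = 1000;
  const double dt = 0.01, c = 0.5;
  std::vector<double> x1(nSteps + 1, 1.0), x2(nSteps + 1, 0.0);  // initial waveform guesses

  for (int sweep = 1; sweep <= 50; ++sweep) {
    std::vector<double> x1new(nSteps + 1), x2new(nSteps + 1);
    x1new[0] = 1.0; x2new[0] = 0.0;                              // initial conditions
    // These two integrations read only the previous sweep's waveforms (x1, x2),
    // so each "component" could be assigned to its own processor.
    for (int k = 0; k < nSteps; ++k)
      x1new[k + 1] = x1new[k] + dt * (-x1new[k] + c * x2[k]);
    for (int k = 0; k < nSteps; ++k)
      x2new[k + 1] = x2new[k] + dt * (-x2new[k] + c * x1[k]);

    double change = 0.0;                                         // relaxation convergence check
    for (int k = 0; k <= nSteps; ++k)
      change = std::max(change, std::fabs(x1new[k] - x1[k]) + std::fabs(x2new[k] - x2[k]));
    x1 = x1new; x2 = x2new;
    if (change < 1e-10) { std::printf("converged after %d sweeps\n", sweep); break; }
  }
  std::printf("x1(T) = %g, x2(T) = %g\n", x1[nSteps], x2[nSteps]);
}
```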

  5. Xyce parallel electronic simulator.

    SciTech Connect

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

  6. Parallel Dislocation Simulator

    Energy Science and Technology Software Center (ESTSC)

    2006-10-30

    ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.

  7. A PDE-Based Regularization Algorithm Toward Reducing Speckle Tracking Noise: A Feasibility Study for Ultrasound Breast Elastography.

    PubMed

    Guo, Li; Xu, Yan; Xu, Zhengfu; Jiang, Jingfeng

    2015-10-01

    Obtaining accurate ultrasonically estimated displacements along both axial (parallel to the acoustic beam) and lateral (perpendicular to the beam) directions is an important task for various clinical elastography applications (e.g., modulus reconstruction and temperature imaging). In this study, a partial differential equation (PDE)-based regularization algorithm was proposed to enhance motion tracking accuracy. More specifically, the proposed PDE-based algorithm, utilizing two-dimensional (2D) displacement estimates from a conventional elastography system, attempted to iteratively reduce noise contained in the original displacement estimates by mathematical regularization. In this study, tissue incompressibility was the physical constraint used by the above-mentioned mathematical regularization. This proposed algorithm was tested using computer-simulated data, a tissue-mimicking phantom, and in vivo breast lesion data. Computer simulation results demonstrated that the method significantly improved the accuracy of lateral tracking (e.g., a factor of 17 at 0.5% compression). From in vivo breast lesion data investigated, we have found that, as compared with the conventional method, higher quality axial and lateral strain images (e.g., at least 78% improvements among the estimated contrast-to-noise ratios of lateral strain images) were obtained. Our initial results demonstrated that this conceptually and computationally simple method could be useful for improving the image quality of ultrasound elastography with current clinical equipment as a post-processing tool. PMID:25452434
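
    A minimal sketch of the general idea (not the authors' algorithm) is shown below: noisy axial/lateral displacement estimates are regularized by gradient descent on an energy that combines data fidelity with a penalty on the divergence of the displacement field, which is a discrete form of the tissue-incompressibility constraint mentioned above. The grid size, penalty weight, and step size are illustrative assumptions.

```cpp
// Sketch: regularize a noisy 2D displacement field (u, v) by gradient descent on
//   E = sum (u - u0)^2 + (v - v0)^2 + lambda * (du/dx + dv/dy)^2,
// i.e., stay close to the measured displacements while pushing the divergence
// (incompressibility violation) toward zero. All parameters are illustrative.
#include <cstdio>
#include <vector>

int main() {
  const int    N = 32;                          // grid points per side
  const double h = 1.0, lambda = 10.0, step = 0.02;
  using Field = std::vector<std::vector<double>>;
  Field u0(N, std::vector<double>(N, 0.0)), v0 = u0;

  // Synthetic "measured" displacements: a smooth field plus deterministic pseudo-noise.
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j) {
      double noise = 0.01 * (((i * 7919 + j * 104729) % 100) / 50.0 - 1.0);
      u0[i][j] = 0.001 * j + noise;             // lateral estimate (noisy)
      v0[i][j] = -0.001 * i + noise;            // axial estimate (noisy)
    }
  Field u = u0, v = v0, div(N, std::vector<double>(N, 0.0));

  for (int iter = 0; iter < 1000; ++iter) {
    // Central-difference divergence of the current field at interior points.
    for (int i = 1; i + 1 < N; ++i)
      for (int j = 1; j + 1 < N; ++j)
        div[i][j] = (u[i][j + 1] - u[i][j - 1]) / (2 * h) +
                    (v[i + 1][j] - v[i - 1][j]) / (2 * h);
    // Gradient-descent update: data-fidelity term plus divergence-penalty term.
    for (int i = 2; i + 2 < N; ++i)
      for (int j = 2; j + 2 < N; ++j) {
        double gu = 2 * (u[i][j] - u0[i][j]) + (lambda / h) * (div[i][j - 1] - div[i][j + 1]);
        double gv = 2 * (v[i][j] - v0[i][j]) + (lambda / h) * (div[i - 1][j] - div[i + 1][j]);
        u[i][j] -= step * gu;
        v[i][j] -= step * gv;
      }
  }
  std::printf("center u: measured %.4f -> regularized %.4f\n", u0[N / 2][N / 2], u[N / 2][N / 2]);
}
```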

  8. Parallel Power Grid Simulation Toolkit

    SciTech Connect

    Smith, Steve; Kelley, Brian; Banks, Lawrence; Top, Philip; Woodward, Carol

    2015-09-14

    ParGrid is a 'wrapper' that integrates a coupled Power Grid Simulation toolkit consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGrid, named FSKIT, is intended to support the coupling of multiple continuous and discrete event parallel simulations. The code is designed using modern object-oriented C++ methods utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.

  9. Parallelizing Timed Petri Net simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1993-01-01

    The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

  10. Massively parallel quantum computer simulator

    NASA Astrophysics Data System (ADS)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massive parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
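
    The heart of such a state-vector simulator can be shown in a few lines: the state of n qubits is a vector of 2^n complex amplitudes (hence the memory figures above), and a single-qubit gate mixes pairs of amplitudes whose indices differ only in that qubit's bit. The standalone sketch below is an illustration of the technique, not the software described in the record.

```cpp
// Sketch of the state-vector technique: apply a 2x2 single-qubit gate to qubit q of an
// n-qubit register stored as 2^n complex amplitudes.
#include <cmath>
#include <complex>
#include <cstdio>
#include <vector>

using Amp = std::complex<double>;

// Apply the gate {{g00, g01}, {g10, g11}} to qubit q of the state vector.
void applySingleQubitGate(std::vector<Amp>& state, int q,
                          Amp g00, Amp g01, Amp g10, Amp g11) {
  const std::size_t bit = std::size_t{1} << q;
  for (std::size_t i = 0; i < state.size(); ++i) {
    if (i & bit) continue;                      // handle each (|..0..>, |..1..>) pair once
    const Amp a0 = state[i], a1 = state[i | bit];
    state[i]       = g00 * a0 + g01 * a1;
    state[i | bit] = g10 * a0 + g11 * a1;
  }
}

int main() {
  const int n = 3;
  std::vector<Amp> state(std::size_t{1} << n, 0.0);
  state[0] = 1.0;                               // start in |000>
  const double s = 1.0 / std::sqrt(2.0);
  applySingleQubitGate(state, 0, s, s, s, -s);  // Hadamard on qubit 0
  for (std::size_t i = 0; i < state.size(); ++i)
    std::printf("P(|%zu>) = %.3f\n", i, std::norm(state[i]));
}
```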

  11. A new parallel simulation technique

    NASA Astrophysics Data System (ADS)

    Blanco-Pillado, Jose J.; Olum, Ken D.; Shlaer, Benjamin

    2012-01-01

    We develop a "semi-parallel" simulation technique suggested by Pretorius and Lehner, in which the simulation spacetime volume is divided into a large number of small 4-volumes that have only initial and final surfaces. Thus there is no two-way communication between processors, and the 4-volumes can be simulated independently and potentially at different times. This technique allows us to simulate much larger volumes than we otherwise could, because we are not limited by total memory size. No processor time is lost waiting for other processors. We compare a cosmic string simulation we developed using the semi-parallel technique with our previous MPI-based code for several test cases and find a factor of 2.6 improvement in the total amount of processor time required to accomplish the same job for strings evolving in the matter-dominated era.

  12. Xyce parallel electronic simulator design.

    SciTech Connect

    Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

    2010-09-01

    This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.

  13. Parallel Network Simulations with NEURON

    PubMed Central

    Migliore, M.; Cannia, C.; Lytton, W.W; Markram, Henry; Hines, M. L.

    2009-01-01

    The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored. PMID:16732488

  14. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  15. Parallel execution and scriptability in micromagnetic simulations

    NASA Astrophysics Data System (ADS)

    Fischbacher, Thomas; Franchin, Matteo; Bordignon, Giuliano; Knittel, Andreas; Fangohr, Hans

    2009-04-01

    We demonstrate the feasibility of an "encapsulated parallelism" approach toward micromagnetic simulations that combines offering a high degree of flexibility to the user with the efficient utilization of parallel computing resources. While parallelization is obviously desirable to address the high numerical effort required for realistic micromagnetic simulations through utilizing now widely available multiprocessor systems (including desktop multicore CPUs and computing clusters), conventional approaches toward parallelization impose strong restrictions on the structure of programs: numerical operations have to be executed across all processors in a synchronized fashion. This means that from the user's perspective, either the structure of the entire simulation is rigidly defined from the beginning and cannot be adjusted easily, or making modifications to the computation sequence requires advanced knowledge in parallel programming. We explain how this dilemma is resolved in the NMAG simulation package in such a way that the user can, without any additional effort, utilize both the computational power of multiple CPUs and the flexibility to tailor execution sequences for specific problems: simulation scripts written for single-processor machines can just as well be executed on parallel machines and behave in precisely the same way, up to increased speed. We provide a simple instructive magnetic resonance simulation example that demonstrates utilizing both custom execution sequences and parallelism at the same time. Furthermore, we show that this strategy of encapsulating parallelism even allows one to benefit from speed gains through parallel execution in simulations controlled by interactive commands given at a command line interface.

  16. Structured building model reduction toward parallel simulation

    SciTech Connect

    Dobbs, Justin R.; Hencey, Brondon M.

    2013-08-26

    Building energy model reduction exchanges accuracy for improved simulation speed by reducing the number of dynamical equations. Parallel computing aims to improve simulation times without loss of accuracy but is poorly utilized by contemporary simulators and is inherently limited by inter-processor communication. This paper bridges these disparate techniques to implement efficient parallel building thermal simulation. We begin with a survey of three structured reduction approaches that compares their performance to a leading unstructured method. We then use structured model reduction to find thermal clusters in the building energy model and allocate processing resources. Experimental results demonstrate faster simulation and low error without any interprocessor communication.

  17. Data parallel sorting for particle simulation

    NASA Technical Reports Server (NTRS)

    Dagum, Leonardo

    1992-01-01

    Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O (N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimum performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
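
    For context, the O(N) sequential bound mentioned above comes from the fact that particle simulations only need to order particles by an integer key, for example a cell index, which a counting sort handles in a couple of linear passes. The serial sketch below illustrates that baseline; the paper's contribution, the data-parallel merging formulation, is not reproduced here.

```cpp
// Counting sort of particles by integer cell index: the O(N) serial baseline that
// particle simulations use before computing neighbor interactions.
#include <cstdio>
#include <vector>

int main() {
  const int nCells = 4;
  std::vector<int> cellOf = {2, 0, 3, 0, 1, 2, 2};   // cell index of each particle

  // Count particles per cell, then compute each cell's starting offset (prefix sum).
  std::vector<int> count(nCells, 0), offset(nCells, 0);
  for (int c : cellOf) ++count[c];
  for (int c = 1; c < nCells; ++c) offset[c] = offset[c - 1] + count[c - 1];

  // Scatter particle ids into cell-sorted order in a single O(N) pass.
  std::vector<int> sorted(cellOf.size());
  std::vector<int> next = offset;
  for (std::size_t p = 0; p < cellOf.size(); ++p)
    sorted[next[cellOf[p]]++] = static_cast<int>(p);

  for (int p : sorted) std::printf("particle %d (cell %d)\n", p, cellOf[p]);
}
```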

  18. Simulating Billion-Task Parallel Programs

    SciTech Connect

    Perumalla, Kalyan S; Park, Alfred J

    2014-01-01

    In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.

  19. Acoustic simulation in architecture with parallel algorithm

    NASA Astrophysics Data System (ADS)

    Li, Xiaohong; Zhang, Xinrong; Li, Dan

    2004-03-01

    To address the complexity of architectural environments and the need for real-time simulation of architectural acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in the scene is solved with this method. The impulse responses between sources and receivers at each frequency segment, which are calculated with multiple processes, are then combined into the whole frequency response. The numerical experiment shows that the parallel algorithm can improve the acoustic simulation efficiency for complex scenes.

  20. Xyce parallel electronic simulator : users' guide.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique

  1. Stochastic Parallel PARticle Kinetic Simulator

    Energy Science and Technology Software Center (ESTSC)

    2008-07-01

    SPPARKS is a kinetic Monte Carlo simulator which implements kinetic and Metropolis Monte Carlo solvers in a general way so that they can be hooked to applications of various kinds. Specific applications are implemented in SPPARKS as physical models which generate events (e.g. a diffusive hop or chemical reaction) and execute them one-by-one. Applications can run in parallel so long as the simulation domain can be partitioned spatially so that multiple events can be invoked simultaneously. SPPARKS is used to model various kinds of mesoscale materials science scenarios such as grain growth, surface deposition and growth, and reaction kinetics. It can also be used to develop new Monte Carlo models that hook to the existing solver and parallel infrastructure provided by the code.

  2. Visualization and Tracking of Parallel CFD Simulations

    NASA Technical Reports Server (NTRS)

    Vaziri, Arsi; Kremenetsky, Mark

    1995-01-01

    We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, is handled by CM/AVS. Partitioning of the visualization task, between CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate → store → visualize' post-processing approach.

  3. Parallel processing of a rotating shaft simulation

    NASA Technical Reports Server (NTRS)

    Arpasi, Dale J.

    1989-01-01

    A FORTRAN program describing the vibration modes of a rotor-bearing system is analyzed for parallelism in this simulation using a Pascal-like structured language. Potential vector operations are also identified. A critical path through the simulation is identified and used in conjunction with somewhat fictitious processor characteristics to determine the time to calculate the problem on a parallel processing system having those characteristics. A parallel processing overhead time is included as a parameter for proper evaluation of the gain over serial calculation. The serial calculation time is determined for the same fictitious system. An improvement of up to 640 percent is possible depending on the value of the overhead time. Based on the analysis, certain conclusions are drawn pertaining to the development needs of parallel processing technology, and to the specification of parallel processing systems to meet computational needs.

  4. The Xyce Parallel Electronic Simulator - An Overview

    SciTech Connect

    HUTCHINSON,SCOTT A.; KEITER,ERIC R.; HOEKSTRA,ROBERT J.; WATTS,HERMAN A.; WATERS,ARLON J.; SCHELLS,REGINA L.; WIX,STEVEN D.

    2000-12-08

    The Xyce{trademark} Parallel Electronic Simulator has been written to support the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on providing the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). In addition, they are providing improved performance for numerical kernels using state-of-the-art algorithms, support for modeling circuit phenomena at a variety of abstraction levels, and using object-oriented and modern coding practices that ensure the code will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows.

  5. Parallel and Distributed System Simulation

    NASA Technical Reports Server (NTRS)

    Dongarra, Jack

    1998-01-01

    This exploratory study initiated our research into the software infrastructure necessary to support the modeling and simulation techniques that are most appropriate for the Information Power Grid. Such computational power grids will use high-performance networking to connect hardware, software, instruments, databases, and people into a seamless web that supports a new generation of computation-rich problem solving environments for scientists and engineers. In this context we looked at evaluating the NetSolve software environment for network computing that leverages the potential of such systems while addressing their complexities. NetSolve's main purpose is to enable the creation of complex applications that harness the immense power of the grid, yet are simple to use and easy to deploy. NetSolve uses a modular, client-agent-server architecture to create a system that is very easy to use. Moreover, it is designed to be highly composable in that it readily permits new resources to be added by anyone willing to do so. In these respects NetSolve is to the Grid what the World Wide Web is to the Internet. But like the Web, the design that makes these wonderful features possible can also impose significant limitations on the performance and robustness of a NetSolve system. This project explored the design innovations that push the performance and robustness of the NetSolve paradigm as far as possible without sacrificing the Web-like ease of use and composability that make it so powerful.

  6. Xyce parallel electronic simulator release notes.

    SciTech Connect

    Keiter, Eric Richard; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

    2010-05-01

    The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: hardware and software requirements; new features and enhancements; any defects fixed since the last release; and current known defects and defect workarounds. For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.

  7. On extending parallelism to serial simulators

    NASA Technical Reports Server (NTRS)

    Nicol, David; Heidelberger, Philip

    1994-01-01

    This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops submodels using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permit automatically synchronized communication between submodels; however, the automation requires that any such communication must take a nonzero amount of simulation time. Within this modeling paradigm, a variety of conservative synchronization protocols can transparently support conservative execution of submodels on potentially different processors. A specific implementation of this approach, U.P.S. (Utilitarian Parallel Simulator), is described, along with performance results on the Intel Paragon.

  8. Parallel Performance of a Combustion Chemistry Simulation

    DOE PAGES Beta

    Skinner, Gregg; Eigenmann, Rudolf

    1995-01-01

    We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.

  9. Denoising of brain MRI images using modified PDE based on pixel similarity

    NASA Astrophysics Data System (ADS)

    Jin, Renchao; Song, Enmin; Zhang, Lijuan; Min, Zhifang; Xu, Xiangyang; Huang, Chih-Cheng

    2008-03-01

    Although various image denoising methods such as PDE-based algorithms have made remarkable progress in the past years, the trade-off between noise reduction and edge preservation is still an interesting and difficult problem in the field of image processing and analysis. A new image denoising algorithm, using a modified PDE model based on pixel similarity, is proposed to deal with the problem. The pixel similarity measures the similarity between two pixels. Then the neighboring consistency of the center pixel can be calculated. Informally, if a pixel is not consistent enough with its surrounding pixels, it can be considered as noise, but an extremely strong inconsistency suggests an edge. The pixel similarity is a probability measure, and its value is between 0 and 1. According to the neighboring consistency of the pixel, a diffusion control factor can be determined by a simple thresholding rule. The factor is combined into the primary partial differential equation as an adjusting factor for controlling the speed of diffusion for different types of pixels. An evaluation of the proposed algorithm on simulated brain MRI images was carried out. The initial experimental results showed that the new algorithm can smooth the MRI images better while preserving the edges better, and achieves a higher peak signal to noise ratio (PSNR) compared with several existing denoising algorithms.
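
    A heavily simplified, single-threshold sketch of this kind of scheme is shown below (it is not the authors' model: the pixel-similarity measure, the thresholding rule, and all parameters are illustrative assumptions). Each pixel's similarity to its neighbors is converted into a diffusion control factor that scales an explicit diffusion step, so an isolated noisy pixel is smoothed strongly while pixels that fit their surroundings, including edge pixels, are diffused only weakly.

```cpp
// Sketch: similarity-controlled diffusion. A pixel resembling none of its neighbors
// is treated as noise and smoothed fully; otherwise diffusion is throttled so that
// structure (e.g. edges) is largely preserved.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  const int N = 16;
  const double sigma = 20.0, tNoise = 0.2, dt = 0.2;
  std::vector<std::vector<double>> img(N, std::vector<double>(N, 50.0));
  for (int i = 0; i < N; ++i)
    for (int j = N / 2; j < N; ++j) img[i][j] = 150.0;   // a vertical intensity edge
  img[5][5] += 60.0;                                     // an isolated noise spike

  for (int iter = 0; iter < 10; ++iter) {
    auto out = img;
    for (int i = 1; i + 1 < N; ++i)
      for (int j = 1; j + 1 < N; ++j) {
        // Neighboring consistency: best similarity to any 4-neighbor, in (0, 1].
        double consistency = 0.0;
        const int di[4] = {-1, 1, 0, 0}, dj[4] = {0, 0, -1, 1};
        for (int k = 0; k < 4; ++k) {
          double d = img[i][j] - img[i + di[k]][j + dj[k]];
          consistency = std::max(consistency, std::exp(-(d * d) / (2 * sigma * sigma)));
        }
        // Diffusion control factor from a simple thresholding rule.
        double factor = (consistency < tNoise) ? 1.0 : 0.15;
        double lap = img[i - 1][j] + img[i + 1][j] + img[i][j - 1] + img[i][j + 1] - 4 * img[i][j];
        out[i][j] = img[i][j] + dt * factor * lap;        // explicit diffusion step
      }
    img = out;
  }
  std::printf("spike pixel after denoising: %.1f (background 50)\n", img[5][5]);
}
```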

  10. Parallel Simulation of Unsteady Turbulent Flames

    NASA Technical Reports Server (NTRS)

    Menon, Suresh

    1996-01-01

    Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, the high cost and their limited availability make practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable. This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. Recently, a new model for turbulent combustion was developed, in which the combustion is modeled, within the subgrid (small-scales) using a methodology that simulates the mixing and the molecular transport and the chemical kinetics

  11. Parallel algorithm strategies for circuit simulation.

    SciTech Connect

    Thornquist, Heidi K.; Schiek, Richard Louis; Keiter, Eric Richard

    2010-01-01

    Circuit simulation tools (e.g., SPICE) have become invaluable in the development and design of electronic circuits. However, they have been pushed to their performance limits in addressing circuit design challenges that come from the technology drivers of smaller feature scales and higher integration. Improving the performance of circuit simulation tools through exploiting new opportunities in widely-available multi-processor architectures is a logical next step. Unfortunately, not all traditional simulation applications are inherently parallel, and quickly adapting mature application codes (even codes designed as parallel applications) to new parallel paradigms can be prohibitively difficult. In general, performance is influenced by many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, the use of mini-applications, small self-contained proxies for real applications, is an excellent approach for rapidly exploring the parameter space of all these choices. In this report we present a multi-core performance study of Xyce, a transistor-level circuit simulation tool, and describe the future development of a mini-application for circuit simulation.

  12. Inflated speedups in parallel simulations via malloc()

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1990-01-01

    Discrete-event simulation programs make heavy use of dynamic memory allocation in order to support simulation's very dynamic space requirements. When programming in C one is likely to use the malloc() routine. However, a parallel simulation which uses the standard Unix System V malloc() implementation may achieve an overly optimistic speedup, possibly superlinear. An alternate implementation provided on some (but not all) systems can avoid the speedup anomaly, but at the price of significantly reduced available free space. This is especially severe on most parallel architectures, which tend not to support virtual memory. It is shown how a simply implemented user-constructed interface to malloc() can both avoid artificially inflated speedups, and make efficient use of the dynamic memory space. The interface simply caches blocks on the basis of their size. The problem is demonstrated empirically, and the effectiveness of the solution is shown both empirically and analytically.
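
    A minimal sketch of such a user-level interface is shown below, assuming the caller can supply the allocation size on release (as a simulator managing fixed-size event structures typically can). Freed blocks are cached on free lists keyed by rounded-up size and reused for later requests, so most allocation traffic bypasses the system allocator; the bin granularity and function names are illustrative, not the paper's implementation.

```cpp
// Sketch of a size-binned caching wrapper around malloc()/free(). Freed blocks are
// kept on per-size free lists and handed back on later requests of the same bin.
#include <cstdio>
#include <cstdlib>
#include <map>
#include <vector>

namespace {
std::map<std::size_t, std::vector<void*>> freeLists;   // size bin -> cached blocks

std::size_t roundUp(std::size_t n) { return (n + 63) / 64 * 64; }   // 64-byte bins

void* simMalloc(std::size_t n) {
  const std::size_t bin = roundUp(n);
  auto it = freeLists.find(bin);
  if (it != freeLists.end() && !it->second.empty()) {   // reuse a cached block
    void* p = it->second.back();
    it->second.pop_back();
    return p;
  }
  return std::malloc(bin);                              // fall back to the system allocator
}

void simFree(void* p, std::size_t n) {                  // caller supplies the allocation size
  freeLists[roundUp(n)].push_back(p);                   // cache instead of returning to malloc
}
}  // namespace

int main() {
  void* a = simMalloc(100);
  simFree(a, 100);
  void* b = simMalloc(80);          // same 128-byte bin: the cached block is reused
  std::printf("reused cached block: %s\n", (a == b) ? "yes" : "no");
  simFree(b, 80);                   // cached blocks are simply leaked at program exit
}
```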

  13. Xyce parallel electronic simulator : reference guide.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.

  14. Parallel Implicit Kinetic Simulation with PARSEK

    NASA Astrophysics Data System (ADS)

    Markidis, Stefano; Lapenta, Giovanni

    2004-11-01

    Kinetic plasma simulation is the ultimate tool for plasma analysis. One of the prime tools for kinetic simulation is the particle in cell (PIC) method. The explicit or semi-implicit (i.e. implicit only on the fields) PIC method requires exceedingly small time steps and grid spacing, limited by the necessity to resolve the electron plasma frequency, the Debye length and the speed of light (for fully explicit schemes). A different approach is to consider fully implicit PIC methods where both particles and fields are discretized implicitly. This approach allows radically larger time steps and grid spacing, reducing the cost of a simulation by orders of magnitude while keeping the full kinetic treatment. In our previous work, simulations impossible for the explicit PIC method even on massively parallel computers have been made possible on a single processor machine using the implicit PIC code CELESTE3D [1]. We propose here another quantum leap: PARSEK, a parallel cousin of CELESTE3D, based on the same approach but sporting a radically redesigned software architecture (object oriented C++, where CELESTE3D was structured and written in FORTRAN77/90) and fully parallelized using MPI for both particle and grid communication. [1] G. Lapenta, J.U. Brackbill, W.S. Daughton, Phys. Plasmas, 10, 1577 (2003).

  15. Parallel node placement method by bubble simulation

    NASA Astrophysics Data System (ADS)

    Nie, Yufeng; Zhang, Weiwei; Qi, Nan; Li, Yiqiang

    2014-03-01

    An efficient Parallel Node Placement method by Bubble Simulation (PNPBS), employing METIS-based domain decomposition (DD) for an arbitrary number of processors, is introduced. In accordance with the desired nodal density and Newton's Second Law of Motion, automatic generation of node sets by bubble simulation has been demonstrated in previous work. Since the interaction force between nodes is short-range, for two distant nodes, their positions and velocities can be updated simultaneously and independently during dynamic simulation, which indicates an inherent property of parallelism and makes the method quite suitable for parallel computing. In this PNPBS method, the METIS-based DD scheme has been investigated for uniform and non-uniform node sets, and dynamic load balancing is obtained by evenly distributing work among the processors. For the nodes near the common interface of two neighboring subdomains, there is no need for special treatment after dynamic simulation. These nodes have good geometrical properties and a smooth density distribution which is desirable in the numerical solution of partial differential equations (PDEs). The results of numerical examples show that quasi-linear speedup in the number of processors and high efficiency are achieved.
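
    The bubble-simulation kernel referred to above can be illustrated with a small 1D sketch (the paper's contribution is the METIS-based parallel decomposition and load balancing, not this serial kernel): interior nodes feel short-range spring-like forces toward a desired spacing and are moved by damped Newtonian dynamics until they settle into a well-spaced distribution. The force law, damping, and target spacing below are illustrative assumptions.

```cpp
// 1D sketch of node placement by bubble simulation: nodes relax under short-range
// forces toward a target spacing d0, integrated with damped explicit dynamics.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  const int    n = 11;
  const double d0 = 0.1, k = 5.0, damping = 2.0, dt = 0.01, mass = 1.0;
  std::vector<double> x(n), v(n, 0.0);
  for (int i = 0; i < n; ++i) x[i] = std::pow(double(i) / (n - 1), 2.0);  // uneven initial nodes
  x[0] = 0.0; x[n - 1] = 1.0;                                             // endpoints held fixed

  for (int step = 0; step < 5000; ++step) {
    std::vector<double> f(n, 0.0);
    for (int i = 1; i + 1 < n; ++i) {
      // Short-range "bubble" force: spring toward spacing d0 with each nearest neighbor.
      // Only near neighbors interact, so distant nodes could be updated in parallel.
      f[i] = -k * ((x[i] - x[i - 1]) - d0) + k * ((x[i + 1] - x[i]) - d0) - damping * v[i];
    }
    for (int i = 1; i + 1 < n; ++i) {          // Newton's second law, explicit Euler update
      v[i] += dt * f[i] / mass;
      x[i] += dt * v[i];
    }
  }
  for (int i = 0; i < n; ++i) std::printf("x[%d] = %.3f\n", i, x[i]);     // ~uniform spacing 0.1
}
```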

  16. Parallel-distributed mobile robot simulator

    NASA Astrophysics Data System (ADS)

    Okada, Hiroyuki; Sekiguchi, Minoru; Watanabe, Nobuo

    1996-06-01

    The aim of this project is to achieve an autonomous learning and growth function based on active interaction with the real world. It should also be able to autonomically acquire knowledge about the context in which jobs take place, and how the jobs are executed. This article describes a parallel distributed movable robot system simulator with an autonomous learning and growth function. The autonomous learning and growth function which we are proposing is characterized by its ability to learn and grow through interaction with the real world. When the movable robot interacts with the real world, the system compares the virtual environment simulation with the interaction result in the real world. The system then improves the virtual environment to match the real-world result more closely. In this way the system learns and grows. It is very important that such a simulation is time-realistic. The parallel distributed movable robot simulator was developed to simulate the space of a movable robot system with an autonomous learning and growth function. The simulator constructs a virtual space faithful to the real world and also integrates the interfaces between the user, the actual movable robot and the virtual movable robot. Using an ultrafast CG (computer graphics) system (FUJITSU AG series), time-realistic 3D CG is displayed.

  17. Xyce(™) Parallel Electronic Simulator

    Energy Science and Technology Software Center (ESTSC)

    2013-10-03

    The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models, including several age and radiation aware devices. It supports a variety of computing platforms (both serial and parallel computers). Lastly, it uses a variety of modern solution algorithms, including dynamic parallel load-balancing and iterative solvers. Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits. Kirchhoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.

  18. Xyce(™) Parallel Electronic Simulator

    SciTech Connect

    2013-10-03

    The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models, including several age and radiation aware devices. It supports a variety of computing platforms (both serial and parallel computers). Lastly, it uses a variety of modern solution algorithms, including dynamic parallel load-balancing and iterative solvers. Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits. Kirchhoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.
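
    The solution path named in the two records above (nodal equations, a nonlinear system, Newton's method, a linear solve per iteration) can be illustrated with a toy DC operating-point problem; this is not Xyce code, and the component values and initial guess are arbitrary. For a voltage source Vs driving a resistor R in series with a diode to ground, Kirchhoff's current law at the diode node gives f(v) = (v - Vs)/R + Is*(exp(v/Vt) - 1) = 0, and the scalar division below stands in for the large sparse linear solve a real simulator performs at each Newton iteration.

```cpp
// Toy Newton solve for the DC operating point of a source-resistor-diode circuit.
// Real circuit simulators assemble thousands of such nodal equations (via modified
// nodal analysis) and replace the scalar division with a sparse linear solve.
#include <cmath>
#include <cstdio>

int main() {
  const double Vs = 5.0, R = 1000.0, Is = 1e-14, Vt = 0.025852;
  double v = 0.7;                            // initial guess (real codes add limiting/homotopy)
  for (int iter = 1; iter <= 50; ++iter) {
    const double f  = (v - Vs) / R + Is * (std::exp(v / Vt) - 1.0);  // KCL residual at the node
    const double df = 1.0 / R + (Is / Vt) * std::exp(v / Vt);        // Jacobian (conductance)
    const double dv = -f / df;                                       // "linear solve" (1x1 here)
    v += dv;
    if (std::fabs(dv) < 1e-12) { std::printf("converged in %d iterations\n", iter); break; }
  }
  std::printf("diode node voltage: %.4f V, current: %.4f mA\n", v, (Vs - v) / R * 1000.0);
}
```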

  19. Parallelism extraction and program restructuring for parallel simulation of digital systems

    SciTech Connect

    Vellandi, B.L.

    1990-01-01

    Two topics currently of interest to the computer-aided design (CAD) for the very-large-scale integrated circuit (VLSI) community are using the VHSIC Hardware Description Language (VHDL) effectively and decreasing simulation times of VLSI designs through parallel execution of the simulator. The goal of this research is to increase the degree of parallelism obtainable in VHDL simulation, and consequently to decrease simulation times. The research targets simulation on massively parallel architectures. Experimentation and instrumentation were done on the SIMD Connection Machine. The author discusses her method used to extract parallelism and restructure a VHDL program, experimental results using this method, and requirements for a parallel architecture for fast simulation.

  20. Parallel Strategies for Crash and Impact Simulations

    SciTech Connect

    Attaway, S.; Brown, K.; Hendrickson, B.; Plimpton, S.

    1998-12-07

    We describe a general strategy we have found effective for parallelizing solid mechanics simulations. Such simulations often have several computationally intensive parts, including finite element integration, detection of material contacts, and particle interaction if smoothed particle hydrodynamics is used to model highly deforming materials. The need to balance all of these computations simultaneously is a difficult challenge that has kept many commercial and government codes from being used effectively on parallel supercomputers with hundreds or thousands of processors. Our strategy is to load-balance each of the significant computations independently with whatever balancing technique is most appropriate. The chief benefit is that each computation can be scalably parallelized. The drawback is the data exchange between processors and extra coding that must be written to maintain multiple decompositions in a single code. We discuss these trade-offs and give performance results showing this strategy has led to a parallel implementation of a widely-used solid mechanics code that can now be run efficiently on thousands of processors of the Pentium-based Sandia/Intel TFLOPS machine. We illustrate with several examples the kinds of high-resolution, million-element models that can now be simulated routinely. We also look to the future and discuss what possibilities this new capability promises, as well as the new set of challenges it poses in material models, computational techniques, and computing infrastructure.

  1. Massively Parallel Direct Simulation of Multiphase Flow

    SciTech Connect

    COOK,BENJAMIN K.; PREECE,DALE S.; WILLIAMS,J.R.

    2000-08-10

    The authors' understanding of multiphase physics and the associated predictive capability for multi-phase systems are severely limited by current continuum modeling methods and experimental approaches. This research will deliver an unprecedented modeling capability to directly simulate three-dimensional multi-phase systems at the particle-scale. The model solves the fully coupled equations of motion governing the fluid phase and the individual particles comprising the solid phase using a newly discovered, highly efficient coupled numerical method based on the discrete-element method and the Lattice-Boltzmann method. A massively parallel implementation will enable the solution of large, physically realistic systems.

  2. Empirical study of parallel LRU simulation algorithms

    NASA Technical Reports Server (NTRS)

    Carr, Eric; Nicol, David M.

    1994-01-01

    This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithms are more complex, but have costs that are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
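
    For reference, the serial computation these algorithms parallelize can be stated in a few lines: keep the referenced tags in LRU order and, for each new reference, record its current depth (its stack distance); the reference then hits in every LRU cache larger than that distance, so a single pass over the trace yields miss ratios for all cache sizes. The sketch below is that straightforward list-based version, not any of the five parallel algorithms.

```cpp
// Serial LRU stack-distance computation: O(stack depth) per reference.
#include <cstdio>
#include <list>
#include <vector>

int main() {
  std::vector<int> trace = {1, 2, 3, 1, 2, 4, 1, 3};    // reference tags
  std::list<int> stack;                                 // most recently used at the front

  for (int tag : trace) {
    int distance = 0;
    auto it = stack.begin();
    for (; it != stack.end() && *it != tag; ++it) ++distance;   // find the tag's depth
    if (it == stack.end()) distance = -1;               // first reference: infinite distance
    else stack.erase(it);                               // remove so it can move to the front
    stack.push_front(tag);                              // tag is now most recently used
    std::printf("ref %d: stack distance %d\n", tag, distance);
  }
}
```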

  3. Parallel Proximity Detection for Computer Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1998-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
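
    A minimal sketch of grid-based proximity detection in the spirit of this description is shown below; it omits the patent's event scheduling, fuzzy grids, and lookahead handling, and its cell size, ranges, and data layout are illustrative assumptions. Movers "check in" to uniform grid cells, and a sensor then examines only the movers registered in cells its coverage overlaps, instead of testing every mover in the simulation.

```cpp
// Sketch of grid-based proximity detection: movers register in uniform grid cells,
// and a sensor's coverage only queries the cells it overlaps.
#include <cmath>
#include <cstdio>
#include <map>
#include <utility>
#include <vector>

struct Mover { int id; double x, y; };

int main() {
  const double cell = 10.0;                              // grid cell size
  std::vector<Mover> movers = {{0, 3, 4}, {1, 25, 7}, {2, 14, 12}, {3, 48, 40}};

  // Movers check in: each grid cell keeps the ids of the movers currently inside it.
  std::map<std::pair<int, int>, std::vector<int>> grid;
  for (const Mover& m : movers)
    grid[{int(std::floor(m.x / cell)), int(std::floor(m.y / cell))}].push_back(m.id);

  // A sensor at (sx, sy) with range r examines only movers in overlapping cells.
  const double sx = 10, sy = 10, r = 12;
  for (int cx = int(std::floor((sx - r) / cell)); cx <= int(std::floor((sx + r) / cell)); ++cx)
    for (int cy = int(std::floor((sy - r) / cell)); cy <= int(std::floor((sy + r) / cell)); ++cy) {
      auto it = grid.find({cx, cy});
      if (it == grid.end()) continue;
      for (int id : it->second) {
        double dx = movers[id].x - sx, dy = movers[id].y - sy;
        if (dx * dx + dy * dy <= r * r)
          std::printf("sensor detects mover %d\n", id);
      }
    }
}
```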

  4. Parallel Proximity Detection for Computer Simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1997-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.

  5. A polymorphic reconfigurable emulator for parallel simulation

    NASA Technical Reports Server (NTRS)

    Parrish, E. A., Jr.; Mcvey, E. S.; Cook, G.

    1980-01-01

    Microprocessor and arithmetic support chip technology was applied to the design of a reconfigurable emulator for real time flight simulation. The system developed consists of a master control system to perform all man machine interactions and to configure the hardware to emulate a given aircraft, and numerous slave compute modules (SCM) which comprise the parallel computational units. It is shown that all parts of the state equations can be worked on simultaneously but that the algebraic equations cannot (unless they are slowly varying). Attempts to obtain algorithms that will allow parallel updates are reported. The word length and step size to be used in the SCM's are determined and the architecture of the hardware and software is described.

  6. Parallel multiscale simulations of a brain aneurysm

    SciTech Connect

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in

  7. Parallel multiscale simulations of a brain aneurysm

    NASA Astrophysics Data System (ADS)

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NɛκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NɛκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future

  8. Parallel multiscale simulations of a brain aneurysm.

    PubMed

    Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future

  9. A parallel algorithm for implicit depletant simulations

    NASA Astrophysics Data System (ADS)

    Glaser, Jens; Karas, Andrew S.; Glotzer, Sharon C.

    2015-11-01

    We present an algorithm to simulate the many-body depletion interaction between anisotropic colloids in an implicit way, integrating out the degrees of freedom of the depletants, which we treat as an ideal gas. Because the depletant particles are statistically independent and the depletion interaction is short-ranged, depletants are randomly inserted in parallel into the excluded volume surrounding a single translated and/or rotated colloid. A configurational bias scheme is used to enhance the acceptance rate. The method is validated and benchmarked both on multi-core processors and graphics processing units for the case of hard spheres, hemispheres, and discoids. With depletants, we report novel cluster phases in which hemispheres first assemble into spheres, which then form ordered hcp/fcc lattices. The method is significantly faster than any method without cluster moves and that tracks depletants explicitly, for systems of colloid packing fraction ϕc < 0.50, and additionally enables simulation of the fluid-solid transition.

  10. A scalable parallel black oil simulator on distributed memory parallel computers

    NASA Astrophysics Data System (ADS)

    Wang, Kun; Liu, Hui; Chen, Zhangxin

    2015-11-01

    This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
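
    As a generic illustration of the inexact Newton idea mentioned above (a toy two-equation system with a few Jacobi sweeps standing in for a preconditioned Krylov solver; none of this is the simulator's actual code), the outer iteration tolerates an approximate Jacobian solve at every step:

    ```python
    # Minimal, generic inexact Newton sketch: the Jacobian system is solved
    # only approximately at each step, here with a few Jacobi sweeps.
    import numpy as np

    def residual(x):
        # toy nonlinear system F(x) = 0 with solution (1, 2)
        return np.array([x[0]**2 + x[1] - 3.0,
                         x[0] + x[1]**2 - 5.0])

    def jacobian(x):
        return np.array([[2.0 * x[0], 1.0],
                         [1.0, 2.0 * x[1]]])

    def jacobi_solve(A, b, sweeps=5):
        """Approximate solve of A dx = b; the loose inner tolerance is what
        makes the outer iteration an *inexact* Newton method."""
        dx = np.zeros_like(b)
        D = np.diag(A)
        R = A - np.diag(D)
        for _ in range(sweeps):
            dx = (b - R @ dx) / D
        return dx

    x = np.array([1.0, 1.0])
    for k in range(20):
        F = residual(x)
        if np.linalg.norm(F) < 1e-10:
            break
        x = x + jacobi_solve(jacobian(x), -F)
    print(k, x)
    ```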

  11. Parallelization of Rocket Engine Simulator Software (PRESS)

    NASA Technical Reports Server (NTRS)

    Cezzar, Ruknet

    1997-01-01

    The Parallelization of Rocket Engine System Software (PRESS) project is part of a collaborative effort with Southern University at Baton Rouge (SUBR), the University of West Florida (UWF), and Jackson State University (JSU). The second-year funding, which supports two graduate students enrolled in our new Master's program in Computer Science at Hampton University and the principal investigator, has been obtained for the period from October 19, 1996 through October 18, 1997. The key part of the interim report was new directions for the second-year funding. This came about from discussions during the Rocket Engine Numeric Simulator (RENS) project meeting in Pensacola on January 17-18, 1997. At that time, a software agreement between Hampton University and NASA Lewis Research Center had already been concluded. That agreement concerns off-NASA-site experimentation with the PUMPDES/TURBDES software. Before this agreement, during the first year of the project, another large-scale FORTRAN-based package, Two-Dimensional Kinetics (TDK), was being used for translation to an object-oriented language and for parallelization experiments. However, that package proved to be too complex and too poorly documented for an effective translation to object-oriented C++ source code. The focus, this time with the better documented and more manageable PUMPDES/TURBDES package, was still on translation to C++ with design improvements. At the RENS meeting, however, the impetus for the RENS projects in general, and PRESS in particular, shifted in two important ways. One was closer alignment with the work on the Numerical Propulsion System Simulator (NPSS) through cooperation and collaboration with the LERC ACLU organization. The other was to see whether and how NASA's various rocket design software can be run over local networks and intranets without any radical effort to redesign and translate it into object-oriented source code. There were also suggestions that the Fortran-based code be

  12. Parallel magnetic field perturbations in gyrokinetic simulations

    SciTech Connect

    Joiner, N.; Hirose, A.; Dorland, W.

    2010-07-15

    At low β it is common to neglect parallel magnetic field perturbations on the basis that they are of order β². This is only true if effects of order β are canceled by a term in the ∇B drift also of order β [H. L. Berk and R. R. Dominguez, J. Plasma Phys. 18, 31 (1977)]. To our knowledge this has not been rigorously tested with modern gyrokinetic codes. In this work we use the gyrokinetic code GS2 [Kotschenreuther et al., Comput. Phys. Commun. 88, 128 (1995)] to investigate whether the compressional magnetic field perturbation B∥ is required for accurate gyrokinetic simulations at low β for microinstabilities commonly found in tokamaks. The kinetic ballooning mode (KBM) demonstrates the principle described by Berk and Dominguez strongly, as does the trapped electron mode, in a less dramatic way. The ion and electron temperature gradient (ETG) driven modes do not typically exhibit this behavior; the effects of B∥ are found to depend on the pressure gradients. The terms which are seen to cancel at long wavelength in KBM calculations can be cumulative in the ion temperature gradient case and increase with η_e. The effect of B∥ on the ETG instability is shown to depend on the normalized pressure gradient β′ at constant β.

  13. PDE-based geophysical modelling using finite elements: examples from 3D resistivity and 2D magnetotellurics

    NASA Astrophysics Data System (ADS)

    Schaa, R.; Gross, L.; du Plessis, J.

    2016-04-01

    We present a general finite-element solver, escript, tailored to solve geophysical forward and inverse modeling problems in terms of partial differential equations (PDEs) with suitable boundary conditions. Escript's abstract interface allows geoscientists to focus on solving the actual problem without being experts in numerical modeling. General-purpose finite element solvers have found wide use, especially in engineering fields, and find increasing application in the geophysical disciplines as they offer a single interface to tackle different geophysical problems. These solvers are useful for data interpretation and for research, but can also be a useful tool in educational settings. This paper serves as an introduction to PDE-based modeling with escript, where we demonstrate in detail how escript is used to solve two different forward modeling problems from applied geophysics (3D DC resistivity and 2D magnetotellurics). Based on these two cases, other geophysical modeling work can easily be realized. The escript package is implemented as a Python library and allows the solution of coupled, linear or non-linear, time-dependent PDEs. Parallel execution for both shared and distributed memory architectures is supported and can be used without modifications to the scripts.
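
    As a minimal sketch of the escript style described above, a Poisson-type problem with a fixed value on one boundary face might be set up as follows; the module layout (esys.escript, esys.finley) and coefficient names (A, Y, q, r) follow the escript documentation but may differ between escript versions:

    ```python
    # Minimal escript-style sketch: -div(grad u) = 1 on the unit square,
    # u = 0 on the x0 = 0 face.  Illustrative only; requires esys-escript.
    from esys.escript import kronecker, whereZero, Lsup
    from esys.escript.linearPDEs import LinearPDE
    from esys.finley import Rectangle

    dom = Rectangle(n0=40, n1=40, l0=1.0, l1=1.0)   # 40x40 FEM mesh
    x = dom.getX()

    pde = LinearPDE(dom)
    pde.setValue(A=kronecker(dom),      # second-order coefficient (Laplacian)
                 Y=1.0,                 # right-hand side
                 q=whereZero(x[0]),     # constraint mask: the x0 = 0 face
                 r=0.0)                 # constrained value
    u = pde.getSolution()
    print("max |u| =", Lsup(u))
    ```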

  14. Parallel Simulation of Explosion in AN Unlimited Atmosphere

    NASA Astrophysics Data System (ADS)

    Ma, Tianbao; Wang, Cheng; Fei, Guanglei; Ning, Jianguo

    In this paper, a parallel Eulerian hydrocode for the simulation of large-scale, complicated explosion and impact problems is developed. The data dependency in the parallel algorithm is studied in particular. As a test, a three-dimensional numerical simulation of the explosion field in an unlimited atmosphere is performed. The numerical results are in good agreement with the empirical results, indicating that the parallel algorithm proposed in this paper is valid. Finally, the parallel speedup and parallel efficiency for different domain partitionings are analyzed.

  15. Improving the performance of molecular dynamics simulations on parallel clusters.

    PubMed

    Borstnik, Urban; Hodoscek, Milan; Janezic, Dusanka

    2004-01-01

    In this article a procedure is derived to obtain a performance gain for molecular dynamics (MD) simulations on existing parallel clusters. Parallel clusters connect multiple processors through a wide array of interconnection technologies that often operate at different speeds, ranging from the links within multiprocessor computers to the network between nodes. It is demonstrated how to configure existing programs for MD simulations to efficiently handle collective communication on parallel clusters with processor interconnections of different speeds. PMID:15032512

  16. Parallelization and automatic data distribution for nuclear reactor simulations

    SciTech Connect

    Liebrock, L.M.

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  17. Parallel methods for dynamic simulation of multiple manipulator systems

    NASA Technical Reports Server (NTRS)

    Mcmillan, Scott; Sadayappan, P.; Orin, David E.

    1993-01-01

    In this paper, efficient dynamic simulation algorithms for a system of m manipulators, cooperating to manipulate a large load, are developed; their performance, using two possible forms of parallelism on a general-purpose parallel computer, is investigated. One form, temporal parallelism, is obtained with the use of parallel numerical integration methods. A speedup of 3.78 on four processors of CRAY Y-MP8 was achieved with a parallel four-point block predictor-corrector method for the simulation of a four manipulator system. These multi-point methods suffer from reduced accuracy, and when comparing these runs with a serial integration method, the speedup can be as low as 1.83 for simulations with the same accuracy. To regain the performance lost due to accuracy problems, a second form of parallelism is employed. Spatial parallelism allows most of the dynamics of each manipulator chain to be computed simultaneously. Used exclusively in the four processor case, this form of parallelism in conjunction with a serial integration method results in a speedup of 3.1 on four processors over the best serial method. In cases where there are either more processors available or fewer chains in the system, the multi-point parallel integration methods are still advantageous despite the reduced accuracy because both forms of parallelism can then combine to generate more parallel tasks and achieve greater effective speedups. This paper also includes results for these cases.

  18. Theory and simulation of collisionless parallel shocks

    NASA Technical Reports Server (NTRS)

    Quest, K. B.

    1988-01-01

    This paper presents a self-consistent theoretical model for collisionless parallel shock structure, based on the hypothesis that shock dissipation and heating can be provided by electromagnetic ion beam-driven instabilities. It is shown that shock formation and plasma heating can result from parallel propagating electromagnetic ion beam-driven instabilities for a wide range of Mach numbers and upstream plasma conditions. The theoretical predictions are compared with recently published observations of quasi-parallel interplanetary shocks. It was found that low Mach number interplanetary shock observations were consistent with the explanation that group-standing waves are providing the dissipation; two high Mach number observations confirmed the theoretically predicted rapid thermalization across the shock.

  19. Parallel-Processing Test Bed For Simulation Software

    NASA Technical Reports Server (NTRS)

    Blech, Richard; Cole, Gary; Townsend, Scott

    1996-01-01

    Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-the-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).

  20. Parallel discrete-event simulation of FCFS stochastic queueing networks

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1988-01-01

    Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments) which has proven effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction of a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations; performance data are given that demonstrate the method's effectiveness under moderate to heavy loads, and the tradeoff between the quality of lookahead and the cost of computing it is discussed.
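
    A minimal sketch of the lookahead idea for a FCFS server (illustrative only; the appointments protocol in the paper is richer than this single bound):

    ```python
    # Generic lookahead for a conservative FCFS queue simulation.  Because
    # service is FCFS and service times are bounded below, a server can
    # promise that no departure will be sent downstream before a known time.

    def lookahead(now, in_service_completion, min_service_time):
        """Earliest possible timestamp of the next departure this server
        could send to a downstream processor."""
        if in_service_completion is not None:
            # A job is being served; nothing else can leave before it does.
            return in_service_completion
        # Server idle: any future arrival still needs at least min_service_time.
        return now + min_service_time

    # The promise (an "appointment") lets the downstream processor safely
    # simulate up to that time without waiting for a message.
    print(lookahead(now=12.0, in_service_completion=None, min_service_time=1.5))
    ```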

  1. Xyce parallel electronic simulator : users' guide. Version 5.1.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-11-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical

  2. Xyce Parallel Electronic Simulator : users' guide, version 4.1.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-02-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical

  3. A conservative approach to parallelizing the Sharks World simulation

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Riffe, Scott E.

    1990-01-01

    Parallelizing a benchmark problem for parallel simulation, the Sharks World, is described. The described solution is conservative, in the sense that no state information is saved, and no 'rollbacks' occur. The approach used illustrates both the principal advantage and the principal disadvantage of conservative parallel simulation. The advantage is that by exploiting lookahead an approach was found that dramatically improves the serial execution time and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.

  4. Iterative Schemes for Time Parallelization with Application to Reservoir Simulation

    SciTech Connect

    Garrido, I; Fladmark, G E; Espedal, M S; Lee, B

    2005-04-18

    Parallel methods are usually not applied to the time domain because of the inherent sequential nature of time evolution. But for many evolutionary problems, computer simulation can benefit substantially from time-parallelization methods. In this paper, the authors present several such algorithms that actually exploit the sequential nature of time evolution through a predictor-corrector procedure. This sequential structure ensures convergence of a parallel predictor-corrector scheme within a fixed number of iterations. The performance of these novel algorithms, which are derived from the classical alternating Schwarz method, is illustrated through several numerical examples using the reservoir simulator Athena.
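
    The predictor-corrector structure can be sketched generically as a parareal-style iteration on a toy ODE; this is an illustrative assumption, not the algorithms implemented in Athena, and in a real code the fine-propagator stage would run concurrently across time slices:

    ```python
    # Parareal-style predictor-corrector time parallelization (generic sketch).
    import numpy as np

    f = lambda t, u: -u                      # toy ODE du/dt = -u

    def step(u, t, dt, nsub):                # explicit Euler with nsub substeps
        h = dt / nsub
        for i in range(nsub):
            u = u + h * f(t + i * h, u)
        return u

    T, N, u0 = 2.0, 10, 1.0
    dt = T / N
    G = lambda u, t: step(u, t, dt, 1)       # cheap coarse predictor
    F = lambda u, t: step(u, t, dt, 100)     # accurate fine propagator

    U = [u0]                                 # initial coarse prediction
    for n in range(N):
        U.append(G(U[n], n * dt))

    for k in range(5):                       # predictor-corrector iterations
        Fu = [F(U[n], n * dt) for n in range(N)]   # parallel-friendly stage
        Unew = [u0]
        for n in range(N):                   # cheap sequential correction sweep
            Unew.append(G(Unew[n], n * dt) + Fu[n] - G(U[n], n * dt))
        U = Unew

    print(U[-1], np.exp(-T))                 # converges toward the fine solution
    ```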

  5. n-body simulations using message passing parallel computers.

    NASA Astrophysics Data System (ADS)

    Grama, A. Y.; Kumar, V.; Sameh, A.

    The authors present new parallel formulations of the Barnes-Hut method for n-body simulations on message passing computers. These parallel formulations partition the domain efficiently incurring minimal communication overhead. This is in contrast to existing schemes that are based on sorting a large number of keys or on the use of global data structures. The new formulations are augmented by alternate communication strategies which serve to minimize communication overhead. The impact of these communication strategies is experimentally studied. The authors report on experimental results obtained from an astrophysical simulation on an nCUBE2 parallel computer.

  6. Running Parallel Discrete Event Simulators on Sierra

    SciTech Connect

    Barnes, P. D.; Jefferson, D. R.

    2015-12-03

    In this proposal we consider porting the ROSS/Charm++ simulator and the discrete event models that run under its control so that they run on the Sierra architecture and make efficient use of the Volta GPUs.

  7. Parallel discrete event simulation: A shared memory approach

    NASA Technical Reports Server (NTRS)

    Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

    1987-01-01

    With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.

  8. Traffic simulations on parallel computers using domain decomposition techniques

    SciTech Connect

    Hanebutte, U.R.; Tentner, A.M.

    1995-12-31

    Large scale simulations of Intelligent Transportation Systems (ITS) can only be achieved by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic simulations with the standard simulation package TRAF-NETSIM on a 128-node IBM SPx parallel supercomputer as well as on a cluster of SUN workstations. Whilst this particular parallel implementation is based on NETSIM, a microscopic traffic simulation model, the presented strategy is applicable to a broad class of traffic simulations. An outer iteration loop must be introduced in order to converge to a global solution. A performance study that utilizes a scalable test network consisting of square grids is presented, which addresses the performance penalty introduced by the additional iteration loop.

  9. Parallel Signal Processing and System Simulation using aCe

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2003-01-01

    Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C-based parallel language (aCe C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of aCe C and present a signal processing application (FFT).

  10. A CUDA based parallel multi-phase oil reservoir simulator

    NASA Astrophysics Data System (ADS)

    Zaza, Ayham; Awotunde, Abeeb A.; Fairag, Faisal A.; Al-Mouhamed, Mayez A.

    2016-09-01

    Forward Reservoir Simulation (FRS) is a challenging process that models fluid flow and mass transfer in porous media to draw conclusions about the behavior of certain flow variables and well responses. Besides the operational cost associated with matrix assembly, FRS repeatedly solves huge and computationally expensive sparse, ill-conditioned, and unsymmetric linear systems. Moreover, as the computation for practical reservoir dimensions lasts for a long time, speeding up the process by taking advantage of parallel platforms is indispensable. By considering the state-of-the-art advances in massively parallel computing and the accompanying parallel architecture, this work aims primarily at developing a CUDA-based parallel simulator for oil reservoirs. In addition to the initially reported 33-fold speed gain compared to the serial version, running experiments showed that BiCGSTAB is a stable and fast solver which could be incorporated in such simulations instead of the more expensive, storage-demanding, and usually utilized GMRES.
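
    As a rough CPU-side illustration of the kind of Krylov solve involved (generic SciPy code on an invented toy matrix, not the CUDA implementation described in the paper), BiCGSTAB with an ILU preconditioner might be applied as:

    ```python
    # Generic sketch: solve a sparse, unsymmetric system with BiCGSTAB.
    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n = 1000
    # toy unsymmetric convection-diffusion-like tridiagonal operator
    A = sp.diags([-1.2, 2.5, -0.8], [-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)

    ilu = spla.spilu(A.tocsc())                         # ILU preconditioner
    M = spla.LinearOperator((n, n), matvec=ilu.solve)

    x, info = spla.bicgstab(A, b, M=M)
    print("converged" if info == 0 else f"info={info}",
          "residual:", np.linalg.norm(b - A @ x))
    ```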

  11. Parallelization of Rocket Engine Simulator Software (PRESS)

    NASA Technical Reports Server (NTRS)

    Cezzar, Ruknet

    1998-01-01

    We have outlined our work in the last half of the funding period. We have shown how a demo package for RESSAP using MPI can be done. However, we also mentioned the difficulties with the UNIX platform. We have reiterated some of the suggestions made during the presentation on the progress of the project at the Fourth Annual HBCU Conference. Although we have discussed, in some detail, how the TURBDES/PUMPDES software can be run in parallel using MPI, at present we are unable to experiment any further with either MPI or PVM. Because X windows is not implemented, we are also not able to experiment further with XPVM, which, it will be recalled, has a nice GUI interface. There are also some concerns, on our part, about MPI being an appropriate tool. The best thing about MPI is that it is in the public domain. Although plenty of documentation exists for the intricacies of using MPI, little information is available on its actual implementations. Other than very typical, somewhat contrived examples, such as the Jacobi algorithm for solving Laplace's equation, there are few examples which can readily be applied to real situations, such as ours. In effect, the review of the literature on both MPI and PVM, and there is a lot of it, indicates something similar to the enormous effort which was spent on LISP and LISP-like languages as tools for artificial intelligence research. During the development of a book on programming languages [12], when we searched the literature for very simple examples like taking averages, reading and writing records, multiplying matrices, etc., we could hardly find any! Yet, so much was said and done on that topic in academic circles. It appears that we faced the same problem with MPI, where, despite significant documentation, we could not find even a simple example which supports coarse-grained parallelism involving only a few processes. From the foregoing, it appears that a new direction may be required for more productive research during the extension period (10/19/98 - 10
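
    For reference, a coarse-grained MPI example of the sort the report found lacking can be quite small; the following mpi4py sketch (the file name and work items are invented) farms independent work to a few processes and reduces the result:

    ```python
    # Minimal coarse-grained MPI example using mpi4py.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Each rank takes a disjoint slice of the work: coarse-grained parallelism.
    work = list(range(16))
    mine = work[rank::size]
    partial = sum(x * x for x in mine)

    total = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        print("sum of squares:", total)   # run with: mpiexec -n 4 python demo.py
    ```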

  12. Improved task scheduling for parallel simulations. Master's thesis

    SciTech Connect

    McNear, A.E.

    1991-12-01

    The objective of this investigation is to design, analyze, and validate the generation of optimal schedules for simulation systems. Improved simulation execution times can greatly improve the return rate of information provided by such simulations, resulting in reduced development costs of future computer/electronic systems. Optimal schedule generation for precedence-constrained task systems, including iterative feedback systems such as VHDL or war gaming simulations, for execution on a parallel computer is known to be NP-hard. Efficiently parallelizing such problems takes full advantage of present computer technology to achieve a significant reduction in the search times required. Unfortunately, the extreme combinatoric 'explosion' of possible task assignments to processors creates an exponential search space that is prohibitive on any computer for search algorithms which maintain more than one branch of the search graph at any one time. This work develops various parallel modified backtracking (MBT) search algorithms for execution on an iPSC/2 hypercube that bound the space requirements and produce an optimal minimum-length schedule with linear speedup. The parallel MBT search algorithm is validated using various feedback task simulation systems which are scheduled for execution on an iPSC/2 hypercube. The search time, the size of the enumerated search space, and the communications overhead required to ensure efficient utilization during the parallel search process are analyzed. The various applications indicated appreciable improvement in performance using this method.

  13. Xyce Parallel Electronic Simulator - User's Guide, Version 1.0

    SciTech Connect

    HUTCHINSON, SCOTT A; KEITER, ERIC R.; HOEKSTRA, ROBERT J.; WATERS, LON J.; RUSSO, THOMAS V.; RANKIN, ERIC LAMONT; WIX, STEVEN D.

    2002-11-01

    This manual describes the use of the Xyce Parallel Electronic Simulator code for simulating electrical circuits at a variety of abstraction levels. The Xyce Parallel Electronic Simulator has been written to support,in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on improving the capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (4) Object-oriented code design and implementation using modern coding-practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows. Another feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce Parallel Electronic Simulator is designed to support a variety of device model inputs. These input formats include standard analytical models, behavioral models

  14. Parallel Optimization with Large Eddy Simulations

    NASA Astrophysics Data System (ADS)

    Talnikar, Chaitanya; Blonigan, Patrick; Bodart, Julien; Wang, Qiqi; Alex Gorodetsky Collaboration; Jasper Snoek Collaboration

    2014-11-01

    For design optimization results to be useful, the model used must be trustworthy. For turbulent flows, Large Eddy Simulations (LES) can capture separation and other phenomena that traditional models such as RANS struggle with. However, optimization with LES can be challenging because of noisy objective function evaluations. This noise is a consequence of the sampling error of turbulent statistics, or long-time-averaged quantities of interest, such as the drag of an airfoil or heat transfer to a turbine blade. The sampling error causes the objective function to vary noisily with respect to design parameters for finite-time simulations. Furthermore, the noise decays very slowly as computational time increases. Therefore, robustness to noisy objective functions is a crucial prerequisite for any optimization method that is a candidate for use with LES. One way of dealing with noisy objective functions is to filter the noise using a surrogate model. Bayesian optimization, which uses Gaussian processes as surrogates, has shown promise in optimizing expensive objective functions. This talk presents a new approach for optimization with LES incorporating these ideas. Applications to flow control of a turbulent channel and the design of a turbine blade trailing edge are also discussed.
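
    A compact sketch of the surrogate idea described above: fit a Gaussian process to noisy objective samples and pick the next design point by expected improvement. This is generic scikit-learn/SciPy code with an invented toy objective, not the authors' LES optimization framework:

    ```python
    # Generic Bayesian optimization step with a noise-aware GP surrogate.
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    f = lambda x: np.sin(3 * x) + 0.1 * rng.standard_normal(x.shape)  # noisy objective

    X = rng.uniform(0, 2, size=(6, 1))     # designs already evaluated
    y = f(X).ravel()

    kernel = RBF(length_scale=0.5) + WhiteKernel(noise_level=0.01)    # noise term
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

    grid = np.linspace(0, 2, 200).reshape(-1, 1)
    mu, sigma = gp.predict(grid, return_std=True)
    best = y.min()                                    # minimizing the objective
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement

    x_next = grid[np.argmax(ei)]
    print("next design point to evaluate:", x_next)
    ```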

  15. Applying Parallel Processing Techniques to Tether Dynamics Simulation

    NASA Technical Reports Server (NTRS)

    Wells, B. Earl

    1996-01-01

    The focus of this research has been to determine the effectiveness of applying parallel processing techniques to a sizable real-world problem, the simulation of the dynamics associated with a tether which connects two objects in low earth orbit, and to explore the degree to which the parallelization process can be automated through the creation of new software tools. The goal has been to utilize this specific application problem as a base to develop more generally applicable techniques.

  16. Parallel Simulation of Underdense Plasma Photocathode Experiments

    NASA Astrophysics Data System (ADS)

    Bruhwiler, David; Hidding, Bernhard; Xi, Yunfeng; Andonian, Gerard; Rosenzweig, James; Cormier-Michel, Estelle

    2013-10-01

    The underdense plasma photocathode concept (aka Trojan horse) is a promising approach to achieving fs-scale electron bunches with pC-scale charge and transverse normalized emittance below 0.01 mm-mrad, yielding peak currents of order 100 A and beam brightness as high as 10^19 A/m^2/rad^2, for a wide range of achievable beam energies up to 10 GeV. A proof-of-principle experiment will be conducted at the FACET user facility in early 2014. We present 2D and 3D simulations with physical parameters relevant to the planned experiment. Work supported by DOE under Contract Nos. DE-SC0009533, DE-FG02-07ER46272 and DE-FG03-92ER40693, and by ONR under Contract No. N00014-06-1-0925. NERSC computing resources are supported by DOE.

  17. Efficient parallel simulation of CO2 geologic sequestration in saline aquifers

    SciTech Connect

    Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten

    2007-01-01

    An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling single and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, as well as salt precipitation or dissolution. The new parallel simulator uses MPI for parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations by multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that a linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2.

  18. Massively parallel switch-level simulation: A feasibility study

    SciTech Connect

    Kravitz, S.A.

    1989-01-01

    This thesis addresses the feasibility of mapping the COSMOS switch-level simulator onto computers with thousands of simple processors. COSMOS preprocesses transistor networks into equivalent Boolean behavioral models, capturing the switch-level behavior of a circuit in a set of Boolean formulas. The author shows that thousand-fold parallelism exists in the formulas derived by COSMOS for some actual circuits. He exposes this parallelism by eliminating the event list from the simulator, and he demonstrates that this represents an attractive tradeoff given sufficient parallelism in the circuit model. To investigate the feasibility of this approach, he has developed a prototype implementation of the COSMOS simulator on a 32k-processor Connection Machine.

  19. Xyce Parallel Electronic Simulator : users' guide, version 2.0.

    SciTech Connect

    Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.

    2004-06-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. (4) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (5) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce

  20. A hybrid parallel framework for the cellular Potts model simulations

    SciTech Connect

    Jiang, Yi; He, Kejing; Dong, Shoubin

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which cannot be used for large-scale, complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large-scale simulation (~10^8 sites) of the complex collective behavior of numerous cells (~10^6).

  1. Parallel runway requirement analysis study. Volume 2: Simulation manual

    NASA Technical Reports Server (NTRS)

    Ebrahimi, Yaghoob S.; Chun, Ken S.

    1993-01-01

    This document is a user manual for operating the PLAND_BLUNDER (PLB) simulation program. This simulation is based on two aircraft approaching parallel runways independently and using parallel Instrument Landing System (ILS) equipment during Instrument Meteorological Conditions (IMC). If an aircraft should deviate from its assigned localizer course toward the opposite runway, this constitutes a blunder which could endanger the aircraft on the adjacent path. The worst case scenario would be if the blundering aircraft were unable to recover and continue toward the adjacent runway. PLAND_BLUNDER is a Monte Carlo-type simulation which employs the events and aircraft positioning during such a blunder situation. The model simulates two aircraft performing parallel ILS approaches using Instrument Flight Rules (IFR) or visual procedures. PLB uses a simple movement model and control law in three dimensions (X, Y, Z). The parameters of the simulation inputs and outputs are defined in this document along with a sample of the statistical analysis. This document is the second volume of a two volume set. Volume 1 is a description of the application of the PLB to the analysis of close parallel runway operations.

  2. Parallelization of a Monte Carlo particle transport simulation code

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures, including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors, and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for the study of higher particle energies with the use of more accurate physical models, and improve statistics as more particle tracks can be simulated in a short response time.
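
    A generic sketch of the per-process independent-stream idea, using NumPy's SeedSequence spawning as a stand-in for libraries such as SPRNG or DCMT (the toy transport kernel is invented):

    ```python
    # Independent random streams for parallel Monte Carlo workers.
    import numpy as np
    from multiprocessing import Pool

    def track_particles(seed_seq, n=100_000):
        rng = np.random.default_rng(seed_seq)     # independent stream per worker
        # toy "transport": path lengths sampled from an exponential distribution
        return rng.exponential(scale=1.0, size=n).mean()

    if __name__ == "__main__":
        children = np.random.SeedSequence(2010).spawn(4)   # 4 statistically
        with Pool(4) as pool:                              # independent streams
            means = pool.map(track_particles, children)
        print(means)
    ```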

  3. Efficient parallel CFD-DEM simulations using OpenMP

    NASA Astrophysics Data System (ADS)

    Amritkar, Amit; Deb, Surya; Tafti, Danesh

    2014-01-01

    The paper describes parallelization strategies for the Discrete Element Method (DEM) used for simulating dense particulate systems coupled to Computational Fluid Dynamics (CFD). While the field equations of CFD are best parallelized by spatial domain decomposition techniques, the N-body particulate phase is best parallelized over the number of particles. When the two are coupled together, both modes are needed for efficient parallelization. It is shown that under these requirements, OpenMP thread-based parallelization has advantages over MPI processes. Two representative examples, fairly typical of dense fluid-particulate systems, are investigated, including the validation of the DEM-CFD and thermal-DEM implementation with experiments. Fluidized bed calculations are performed on beds with uniform particle loading, parallelized with MPI and OpenMP. It is shown that as the number of processing cores and the number of particles increase, the communication overhead of building ghost particle lists at processor boundaries dominates the time to solution, and OpenMP, which does not require this step, is about twice as fast as MPI. In rotary kiln heat transfer calculations, which are characterized by spatially non-uniform particle distributions, the low overhead of switching the parallelization mode in OpenMP eliminates the load imbalances, but introduces increased overheads in fetching non-local data. In spite of this, it is shown that OpenMP is between 50% and 90% faster than MPI.

  4. PDE-based random-valued impulse noise removal based on new class of controlling functions.

    PubMed

    Wu, Jian; Tang, Chen

    2011-09-01

    This paper is concerned with partial differential equation (PDE)-based image denoising for random-valued impulse noise. We introduce the notion of ENI (an abbreviation for "edge pixels, noisy pixels, and interior pixels"), which denotes the number of homogeneous pixels in a local neighborhood and is significantly different for edge pixels, noisy pixels, and interior pixels. We redefine the controlling speed function and the controlling fidelity function to depend on ENI. According to our two controlling functions, the diffusion and fidelity processes at edge pixels, noisy pixels, and interior pixels can be selectively carried out. Furthermore, a class of second-order improved and edge-preserving PDE denoising models is proposed based on the two new controlling functions in order to deal with random-valued impulse noise reliably. We demonstrate the performance of the proposed PDEs via application to five standard test images, corrupted by random-valued impulse noise with various noise levels, and comparison with the related second-order PDE models and other special filtering methods for random-valued impulse noise. Our two controlling functions extend automatically to other PDE models. PMID:21435980
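
    To make the diffusion-plus-fidelity structure concrete, the following minimal sketch uses the classical Perona-Malik controlling function rather than the ENI-based functions proposed in the paper; all parameters are illustrative:

    ```python
    # Generic diffusion-plus-fidelity denoiser: u_t = div(c(|grad u|) grad u) + lam*(f - u).
    import numpy as np

    def denoise(f, steps=100, dt=0.1, kappa=0.1, lam=0.05):
        u = f.astype(float)
        for _ in range(steps):
            # one-sided differences to the four neighbours
            dn = np.roll(u, -1, 0) - u
            ds = np.roll(u, 1, 0) - u
            de = np.roll(u, -1, 1) - u
            dw = np.roll(u, 1, 1) - u
            c = lambda g: 1.0 / (1.0 + (g / kappa) ** 2)   # diffusion "speed"
            diffusion = c(np.abs(dn)) * dn + c(np.abs(ds)) * ds \
                      + c(np.abs(de)) * de + c(np.abs(dw)) * dw
            u += dt * (diffusion + lam * (f - u))          # fidelity pulls u toward f
        return u

    noisy = np.random.default_rng(1).random((64, 64))
    print(denoise(noisy).shape)
    ```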

  5. Xyce Parallel Electronic Simulator : reference guide, version 4.1.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-02-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to exhaustively list (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.

  6. Xyce parallel electronic simulator reference guide, version 6.1

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-03-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to exhaustively list (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].

  7. Xyce Parallel Electronic Simulator : reference guide, version 2.0.

    SciTech Connect

    Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.

    2004-06-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to exhaustively list (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

  8. Xyce parallel electronic simulator reference guide, version 6.0.

    SciTech Connect

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David G.

    2013-08-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to exhaustively list (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1].

  9. Parallel Computing Environments and Methods for Power Distribution System Simulation

    SciTech Connect

    Lu, Ning; Taylor, Zachary T.; Chassin, David P.; Guttromson, Ross T.; Studham, Scott S.

    2005-11-10

    The development of cost-effective high-performance parallel computing on multi-processor supercomputers makes it attractive to port excessively time-consuming simulation software from personal computers (PCs) to supercomputers. The power distribution system simulator (PDSS) takes a bottom-up approach and simulates load at the appliance level, where detailed thermal models for appliances are used. This approach works well for a small power distribution system consisting of a few thousand appliances. When the number of appliances increases, the simulation uses up the PC memory and its run time increases to a point where the approach is no longer feasible for modeling a practical large power distribution system. This paper presents an effort made to port a PC-based power distribution system simulator (PDSS) to a 128-processor shared-memory supercomputer. The paper offers an overview of the parallel computing environment and a description of the modifications made to the PDSS model. The performance of the PDSS running on a standalone PC and on the supercomputer is compared. Future research directions for utilizing parallel computing in power distribution system simulation are also addressed.

  10. An adaptive synchronization protocol for parallel discrete event simulation

    SciTech Connect

    Bisset, K.R.

    1998-12-01

    Simulation, especially discrete event simulation (DES), is used in a variety of disciplines where numerical methods are difficult or impossible to apply. One problem with this method is that a sufficiently detailed simulation may take hours or days to execute, and multiple runs may be needed in order to generate the desired results. Parallel discrete event simulation (PDES) has been explored for many years as a method to decrease the time taken to execute a simulation. Many protocols have been developed which work well for particular types of simulations, but perform poorly when used for other types of simulations. Often it is difficult to know a priori whether a particular protocol is appropriate for a given problem. In this work, an adaptive synchronization method (ASM) is developed which works well on an entire spectrum of problems. The ASM determines, using an artificial neural network (ANN), the likelihood that a particular event is safe to process.

  11. Parallel Performance Optimization of the Direct Simulation Monte Carlo Method

    NASA Astrophysics Data System (ADS)

    Gao, Da; Zhang, Chonglin; Schwartzentruber, Thomas

    2009-11-01

    Although the direct simulation Monte Carlo (DSMC) particle method is more computationally intensive than continuum methods, it is accurate for conditions ranging from continuum to free-molecular, accurate in highly non-equilibrium flow regions, and holds potential for incorporating advanced molecular-based models for gas-phase and gas-surface interactions. As available computer resources continue their rapid growth, the DSMC method is continually being applied to increasingly complex flow problems. Although processor clock speed continues to increase, a trend of increasing multi-core-per-node parallel architectures is emerging. To effectively utilize such current and future parallel computing systems, a combined shared/distributed memory parallel implementation (using both Open Multi-Processing (OpenMP) and the Message Passing Interface (MPI)) of the DSMC method is under development. The parallel implementation of a new state-of-the-art 3D DSMC code employing an embedded 3-level Cartesian mesh will be outlined. The presentation will focus on performance optimization strategies for DSMC, which include, but are not limited to, modified algorithm designs, practical code-tuning techniques, and parallel performance optimization. Specifically, key issues important to the DSMC shared-memory (OpenMP) parallel performance are identified as (1) granularity, (2) load balancing, (3) locality, and (4) synchronization. Challenges and solutions associated with these issues as they pertain to the DSMC method will be discussed.

  12. Molecular simulation of rheological properties using massively parallel supercomputers

    SciTech Connect

    Bhupathiraju, R.K.; Cui, S.T.; Gupta, S.A.; Cummings, P.T.; Cochran, H.D.

    1996-11-01

    Advances in parallel supercomputing now make possible molecular-based engineering and science calculations that will soon revolutionize many technologies, such as those involving polymers and those involving aqueous electrolytes. We have developed a suite of message-passing codes for classical molecular simulation of such complex fluids and amorphous materials and have completed a number of demonstration calculations of problems of scientific and technological importance with each. In this paper, we will focus on the molecular simulation of rheological properties, particularly viscosity, of simple and complex fluids using parallel implementations of non-equilibrium molecular dynamics. Such calculations represent significant challenges computationally because, in order to reduce the thermal noise in the calculated properties within acceptable limits, large systems and/or long simulated times are required.

  13. PRATHAM: Parallel Thermal Hydraulics Simulations using Advanced Mesoscopic Methods

    SciTech Connect

    Joshi, Abhijit S; Jain, Prashant K; Mudrich, Jaime A; Popov, Emilian L

    2012-01-01

    At the Oak Ridge National Laboratory, efforts are under way to develop a 3D, parallel LBM code called PRATHAM (PaRAllel Thermal Hydraulic simulations using Advanced Mesoscopic Methods) to demonstrate the accuracy and scalability of LBM for turbulent flow simulations in nuclear applications. The code has been developed using FORTRAN-90 and parallelized using the Message Passing Interface (MPI) library. The Silo library is used to compact and write the data files, and the VisIt visualization software is used to post-process the simulation data in parallel. Both the single-relaxation-time (SRT) and multiple-relaxation-time (MRT) LBM schemes have been implemented in PRATHAM. To capture turbulence without prohibitively increasing the grid resolution requirements, an LES approach [5] is adopted, allowing large-scale eddies to be numerically resolved while modeling the smaller (subgrid) eddies. In this work, a Smagorinsky model has been used, which modifies the fluid viscosity by an additional eddy viscosity depending on the magnitude of the rate-of-strain tensor. In LBM, this is achieved by locally varying the relaxation time of the fluid.
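
    As an illustration of the last point, the sketch below shows one common way a Smagorinsky eddy viscosity can be folded into a local BGK relaxation time. It is not taken from PRATHAM; it assumes lattice units, a sound speed of c_s^2 = 1/3, and that the rate-of-strain magnitude |S| has already been computed elsewhere.

      def smagorinsky_relaxation_time(tau0, strain_rate_mag, delta=1.0, c_s2=1.0 / 3.0, C_s=0.17):
          """Locally adjusted BGK relaxation time (illustrative lattice-unit form).

          tau0            : molecular relaxation time
          strain_rate_mag : |S|, magnitude of the rate-of-strain tensor (assumed known)
          delta           : filter width (here the lattice spacing)
          """
          nu0 = c_s2 * (tau0 - 0.5)                      # molecular viscosity from tau0
          nu_t = (C_s * delta) ** 2 * strain_rate_mag    # Smagorinsky eddy viscosity
          return (nu0 + nu_t) / c_s2 + 0.5               # total viscosity mapped back to a relaxation time

      if __name__ == "__main__":
          print(smagorinsky_relaxation_time(tau0=0.6, strain_rate_mag=0.05))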

  14. Parallelization of Program to Optimize Simulated Trajectories (POST3D)

    NASA Technical Reports Server (NTRS)

    Hammond, Dana P.; Korte, John J. (Technical Monitor)

    2001-01-01

    This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) approach on a distributed-memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.
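
    The abstract does not give implementation details, but the core idea of distributing independent gradient components across workers can be sketched as follows. The objective function, step size, and use of a Python process pool are illustrative assumptions, not POST3D's actual SPMD/MPI code.

      from multiprocessing import Pool

      import numpy as np

      def objective(x):
          # Hypothetical design objective; the real POST3D cost function is not reproduced here.
          return float(np.sum((x - 1.0) ** 2))

      def fd_component(args):
          # One-sided finite-difference estimate of a single gradient component.
          x, i, h = args
          xp = x.copy()
          xp[i] += h
          return (objective(xp) - objective(x)) / h

      def parallel_gradient(x, h=1e-6, workers=4):
          # Each component is independent, so the finite-difference evaluations can be
          # spread across processes, mirroring an SPMD split of the gradient work.
          with Pool(workers) as pool:
              return np.array(pool.map(fd_component, [(x, i, h) for i in range(len(x))]))

      if __name__ == "__main__":
          print(parallel_gradient(np.zeros(8)))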

  15. Numerical simulation of polymer flows: A parallel computing approach

    SciTech Connect

    Aggarwal, R.; Keunings, R.; Roux, F.X.

    1993-12-31

    We present a parallel algorithm for the numerical simulation of viscoelastic fluids on distributed memory computers. The algorithm has been implemented within a general-purpose commercial finite element package used in polymer processing applications. Results obtained on the Intel iPSC/860 computer demonstrate high parallel efficiency in complex flow problems. However, since the computational load is unknown a priori, load balancing is a challenging issue. We have developed an adaptive allocation strategy which dynamically reallocates the work load to the processors based upon the history of the computational procedure. We compare the results obtained with the adaptive and static scheduling schemes.
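
    The paper's adaptive allocation strategy is not spelled out in the abstract; the sketch below only illustrates generic history-based rebalancing, using a simple greedy largest-cost-first assignment of elements to processors based on costs measured during earlier steps. All names are hypothetical.

      def rebalance(work_items, measured_cost, n_procs):
          """Greedy reassignment: place each item on the currently least-loaded processor,
          using per-item costs measured during earlier steps (the computation 'history')."""
          loads = [0.0] * n_procs
          assignment = {}
          for item in sorted(work_items, key=lambda it: measured_cost[it], reverse=True):
              p = loads.index(min(loads))        # least-loaded processor so far
              assignment[item] = p
              loads[p] += measured_cost[item]
          return assignment

      if __name__ == "__main__":
          costs = {"e1": 5.0, "e2": 1.0, "e3": 4.0, "e4": 2.0}
          print(rebalance(costs.keys(), costs, n_procs=2))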

  16. Reusable Component Model Development Approach for Parallel and Distributed Simulation

    PubMed Central

    Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng

    2014-01-01

    Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have diversiform interfaces, couple tightly, and bind closely to simulation platforms. As a result, they are difficult to reuse across different simulation platforms and applications. To address the problem, this paper first proposes a reusable component model framework. Based on this framework, our reusable model development approach is then elaborated, which contains two phases: (1) domain experts create simulation computational modules observing three principles to achieve their independence; (2) the model developer encapsulates these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that the model developed using our approach has good reusability and is easy to use in different simulation platforms and applications. PMID:24729751
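
    The six standard service interfaces are not named in the abstract; the sketch below only illustrates the general idea of hiding a domain computation behind a platform-neutral service interface, with entirely hypothetical method names.

      from abc import ABC, abstractmethod

      class ReusableComponent(ABC):
          """Hypothetical platform-neutral wrapper around a simulation computational module."""

          @abstractmethod
          def initialize(self, parameters: dict) -> None: ...

          @abstractmethod
          def set_input(self, name: str, value) -> None: ...

          @abstractmethod
          def advance(self, dt: float) -> None: ...

          @abstractmethod
          def get_output(self, name: str): ...

          @abstractmethod
          def finalize(self) -> None: ...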

  17. Xyce Parallel Electronic Simulator Users Guide Version 6.2.

    SciTech Connect

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-09-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright (c) 2002-2014 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright (c) 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are

  18. Parallelizing a DNA simulation code for the Cray MTA-2.

    PubMed

    Bokhari, Shahid H; Glaser, Matthew A; Jordan, Harry F; Lansac, Yves; Sauer, Jon R; Van Zeghbroeck, Bart

    2002-01-01

    The Cray MTA-2 (Multithreaded Architecture) is an unusual parallel supercomputer that promises ease of use and high performance. We describe our experience on the MTA-2 with a molecular dynamics code, SIMU-MD, that we are using to simulate the translocation of DNA through a nanopore in a silicon-based ultrafast sequencer. Our sequencer is constructed using standard VLSI technology and consists of a nanopore surrounded by Field Effect Transistors (FETs). We propose to use the FETs to sense variations in charge as a DNA molecule translocates through the pore and thus differentiate between the four building-block nucleotides of DNA. We were able to port SIMU-MD, a serial C code, to the MTA with only a modest effort and with good performance. Our porting process needed neither a parallelism support platform nor attention to the intimate details of parallel programming and interprocessor communication, as would have been the case with more conventional supercomputers. PMID:15838145

  19. Casting Pearls Ballistically: Efficient Massively Parallel Simulation of Particle Deposition

    NASA Astrophysics Data System (ADS)

    Lubachevsky, Boris D.; Privman, Vladimir; Roy, Subhas C.

    1996-06-01

    We simulate ballistic particle deposition wherein a large number of spherical particles are "cast" vertically over a planar horizontal surface. Upon first contact (with the surface or with a previously deposited particle) each particle stops. This model helps materials scientists to study adsorption and sediment formation. The model is sequential, with particles deposited one by one. We have found an equivalent formulation using a continuous-time random process, and we simulate the latter in parallel using a method similar to the one previously employed for simulating Ising spins. We augment the parallel algorithm for simulating Ising spins with several techniques aimed at increasing the efficiency of producing the particle configurations and collecting statistics. Some of these techniques are similar to earlier ones. We implement the resulting algorithm on a 16K PE MasPar MP-1 and a 4K PE MasPar MP-2. The parallel code runs on the MasPar computers nearly two orders of magnitude faster than an optimized sequential code runs on a fast workstation.
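
    A serial, on-lattice analogue of the deposition rule (each particle falls down a random column and sticks at its first contact with the surface or the deposit) is sketched below. The paper's model uses off-lattice spheres and a parallel continuous-time reformulation, neither of which is reproduced here.

      import random

      def ballistic_deposition(width=100, n_particles=10000, seed=0):
          """Lattice ballistic deposition: a particle dropped in column i sticks at the
          first contact, so the new column height is max(h[i-1], h[i]+1, h[i+1])."""
          random.seed(seed)
          h = [0] * width
          for _ in range(n_particles):
              i = random.randrange(width)
              left = h[i - 1] if i > 0 else 0
              right = h[i + 1] if i < width - 1 else 0
              h[i] = max(left, h[i] + 1, right)
          return h

      if __name__ == "__main__":
          heights = ballistic_deposition()
          print(max(heights), sum(heights) / len(heights))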

  20. Casting pearls ballistically: Efficient massively parallel simulation of particle deposition

    SciTech Connect

    Lubachevsky, B.D.; Privman, V.; Roy, S.C.

    1996-06-01

    We simulate ballistic particle deposition wherein a large number of spherical particles are "cast" vertically over a planar horizontal surface. Upon first contact (with the surface or with a previously deposited particle) each particle stops. This model helps materials scientists to study adsorption and sediment formation. The model is sequential, with particles deposited one by one. We have found an equivalent formulation using a continuous-time random process, and we simulate the latter in parallel using a method similar to the one previously employed for simulating Ising spins. We augment the parallel algorithm for simulating Ising spins with several techniques aimed at increasing the efficiency of producing the particle configurations and collecting statistics. Some of these techniques are similar to earlier ones. We implement the resulting algorithm on a 16K PE MasPar MP-1 and a 4K PE MasPar MP-2. The parallel code runs on the MasPar computers nearly two orders of magnitude faster than an optimized sequential code runs on a fast workstation. 17 refs., 9 figs.

  1. The cost of conservative synchronization in parallel discrete event simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1990-01-01

    The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approaches the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.

  2. Numerical simulation of supersonic wake flow with parallel computers

    SciTech Connect

    Wong, C.C.; Soetrisno, M.

    1995-07-01

    Simulating a supersonic wake flow field behind a conical body is a computing-intensive task. It requires a large number of computational cells to capture the dominant flow physics and a robust numerical algorithm to obtain a reliable solution. High-performance parallel computers, with their unique distributed processing and data storage capability, can meet this need. They have larger computational memory and faster computing time than conventional vector computers. We apply the PINCA Navier-Stokes code to simulate a wind-tunnel supersonic wake experiment on Intel Gamma, Intel Paragon, and IBM SP2 parallel computers. These simulations are performed to study the mean flow in the near-wake region of a sharp, 7-degree half-angle, adiabatic cone at a Mach number of 4.3 and a freestream Reynolds number of 40,600. Overall, the numerical solutions capture the general features of the hypersonic laminar wake flow and compare favorably with the wind tunnel data. With a refined and clustered grid distribution in the recirculation zone, the calculated location of the rear stagnation point is consistent with the 2D axisymmetric and 3D experiments. In this study, we also demonstrate the importance of having a large local memory capacity within a computer node and of effectively utilizing the number of computer nodes to achieve good parallel performance when simulating a complex, large-scale wake flow problem.

  3. Modularized Parallel Neutron Instrument Simulation on the TeraGrid

    SciTech Connect

    Chen, Meili; Cobb, John W; Hagen, Mark E; Miller, Stephen D; Lynch, Vickie E

    2007-01-01

    In order to build a bridge between the TeraGrid (TG), a national-scale cyberinfrastructure resource, and neutron science, the Neutron Science TeraGrid Gateway (NSTG) is focused on introducing productive HPC usage to the neutron science community, primarily the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL). Monte Carlo simulations are used as a powerful tool for instrument design and optimization at SNS. One of the successful efforts of a collaboration team composed of NSTG HPC experts and SNS instrument scientists is the development of a software facility named PSoNI, Parallelizing Simulations of Neutron Instruments. By parallelizing the traditional serial instrument simulation on TeraGrid resources, PSoNI quickly computes full instrument simulations at sufficient statistical levels for instrument design. Upon successful commissioning of the SNS, by the end of 2007 three of the five commissioned instruments in the SNS target station will be available to initial users. Advanced instrument study, proposal feasibility evaluation, and experiment planning are on the immediate schedule of SNS, which poses further requirements, such as flexibility and high runtime efficiency, on fast instrument simulation. PSoNI has been redesigned to meet the new challenges, and a preliminary version has been developed on TeraGrid. This paper explores the motivation and goals of the new design and the improved software structure. Further, it describes the new features realized in MPI-parallelized McStas running high-resolution design simulations of the SEQUOIA and BSS instruments at SNS. A discussion of future work, targeted at fast simulation for automated experiment adjustment and at comparing models to data in analysis, is also presented.

  4. Scalability study of parallel spatial direct numerical simulation code on IBM SP1 parallel supercomputer

    NASA Technical Reports Server (NTRS)

    Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad

    1994-01-01

    The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances associated with laminar-to-turbulent transition in three-dimensional boundary-layer flows are computed with the PSDNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.

  5. Parallel algorithms for simulating continuous time Markov chains

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Heidelberger, Philip

    1992-01-01

    We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
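
    For readers unfamiliar with uniformization itself, a serial sketch of the basic construction the parallel methods build on is given below. The generator matrix, initial state, and end time are illustrative; the paper's parallel synchronization machinery is not shown.

      import random

      def simulate_ctmc_uniformized(Q, x0, t_end, seed=0):
          """Simulate a continuous-time Markov chain with generator matrix Q by uniformization:
          potential events arrive as a Poisson process of rate Lam >= max exit rate, and at
          each event the chain jumps to state j with probability Q[i][j]/Lam (else it stays)."""
          random.seed(seed)
          n = len(Q)
          Lam = max(-Q[i][i] for i in range(n))        # uniformization rate
          t, x, path = 0.0, x0, [(0.0, x0)]
          while True:
              t += random.expovariate(Lam)             # time of the next potential event
              if t > t_end:
                  break
              u, acc, nxt = random.random(), 0.0, x
              for j in range(n):
                  if j == x:
                      continue
                  acc += Q[x][j] / Lam
                  if u < acc:
                      nxt = j
                      break
              if nxt != x:                             # otherwise a "pseudo" event: no jump
                  x = nxt
                  path.append((t, x))
          return path

      if __name__ == "__main__":
          Q = [[-1.0, 1.0], [2.0, -2.0]]
          print(simulate_ctmc_uniformized(Q, 0, 5.0))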

  6. Synchronous parallel system for emulation and discrete event simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor)

    1992-01-01

    A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to state variables of the simulation object attributable to the event object, and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring the events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.
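
    A minimal sketch of the event-horizon bookkeeping described above is given below, in the conventional style of such synchronous protocols; the data structures are hypothetical, and the patent covers considerably more (state saving, message cancellation, host interaction).

      def split_by_global_horizon(pending_by_node):
          """pending_by_node: per-node lists of (timestamp, event) pairs generated this cycle.
          The local horizon is each node's earliest new timestamp; the global horizon is the
          minimum over all nodes. Events at or before the global horizon cannot be affected
          by any other node's messages this cycle; later events are deferred."""
          local_horizons = [min((t for t, _ in evs), default=float("inf"))
                            for evs in pending_by_node]
          global_horizon = min(local_horizons)
          safe = [[(t, e) for t, e in evs if t <= global_horizon] for evs in pending_by_node]
          deferred = [[(t, e) for t, e in evs if t > global_horizon] for evs in pending_by_node]
          return global_horizon, safe, deferred

      if __name__ == "__main__":
          print(split_by_global_horizon([[(3.0, "a"), (7.0, "b")], [(5.0, "c")]]))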

  7. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    NASA Astrophysics Data System (ADS)

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summary: Program title: SWsolver; Catalogue identifier: AEGY_v1_0; Program summary URL
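
    The structured-grid, explicit-time-integration pattern being accelerated can be illustrated with a one-dimensional scalar upwind step. This is only a generic sketch: the actual SWsolver code solves the shallow-water equations across MPI, Cell, and CUDA back ends, none of which is shown here.

      import numpy as np

      def upwind_advection_step(u, c, dx, dt):
          """One explicit upwind step for u_t + c u_x = 0 (c > 0) on a periodic 1D grid.
          The stencil is local, which is what makes this class of solvers easy to
          distribute across nodes and across data-parallel devices."""
          return u - c * dt / dx * (u - np.roll(u, 1))

      if __name__ == "__main__":
          x = np.linspace(0.0, 1.0, 200, endpoint=False)
          u = np.exp(-100.0 * (x - 0.5) ** 2)
          dx, c = x[1] - x[0], 1.0
          dt = 0.4 * dx / c                  # respects the explicit (CFL) stability limit
          for _ in range(100):
              u = upwind_advection_step(u, c, dx, dt)
          print(float(u.max()))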

  8. Xyce Parallel Electronic Simulator - Users' Guide Version 2.1.

    SciTech Connect

    Hutchinson, Scott A; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric; Pawlowski, Roger P.; Fixel, Deborah A; Schiek, Richard; Bogdan, Carolyn W.; Shirley, David N.; Campbell, Phillip M.; Keiter, Eric R.

    2005-06-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an "in-house" capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to

  9. Parallel conjugate gradient algorithms for manipulator dynamic simulation

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Scheld, Robert E.

    1989-01-01

    Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithm is guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n^2) on a serial processor. A conjugate gradient algorithm is presented that provides greater efficiency by using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inverses in O(log_2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves a computational time of O(log_2 n) for each iteration. Simulation results for a seven-degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
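
    A serial sketch of the diagonally preconditioned conjugate gradient iteration underlying the first of the two PCG variants is shown below; the tridiagonal preconditioner, the O(log_2 n) parallel steps, and the RMP-specific variants are not reproduced.

      import numpy as np

      def jacobi_pcg(A, b, tol=1e-10, max_iter=200):
          """Conjugate gradient with a diagonal (Jacobi) preconditioner M = diag(A)."""
          M_inv = 1.0 / np.diag(A)             # applying the preconditioner is a cheap elementwise scale
          x = np.zeros_like(b)
          r = b - A @ x
          z = M_inv * r
          p = z.copy()
          rz = r @ z
          for _ in range(max_iter):
              Ap = A @ p
              alpha = rz / (p @ Ap)
              x += alpha * p
              r -= alpha * Ap
              if np.linalg.norm(r) < tol:
                  break
              z = M_inv * r
              rz_new = r @ z
              p = z + (rz_new / rz) * p
              rz = rz_new
          return x

      if __name__ == "__main__":
          A = np.array([[4.0, 1.0], [1.0, 3.0]])
          b = np.array([1.0, 2.0])
          print(jacobi_pcg(A, b))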

  10. Particle simulation of plasmas on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Gledhill, I. M. A.; Storey, L. R. O.

    1987-01-01

    Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.
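
    The FFT-based field solve mentioned above can be sketched for a periodic two-dimensional grid as follows. The normalization, units, and Poisson sign convention are illustrative assumptions, and the charge-deposition and particle-push stages of the full particle-in-cell cycle are omitted.

      import numpy as np

      def poisson_fft(rho, L=1.0):
          """Solve nabla^2 phi = -rho on a periodic square grid with an FFT:
          in Fourier space, phi_hat(k) = rho_hat(k) / |k|^2 (mean mode set to zero)."""
          n = rho.shape[0]
          k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)   # angular wavenumbers
          kx, ky = np.meshgrid(k, k, indexing="ij")
          k2 = kx ** 2 + ky ** 2
          k2[0, 0] = 1.0                                 # avoid division by zero for the mean mode
          phi_hat = np.fft.fft2(rho) / k2
          phi_hat[0, 0] = 0.0                            # zero-mean potential
          return np.real(np.fft.ifft2(phi_hat))

      if __name__ == "__main__":
          n = 64
          rho = np.zeros((n, n))
          rho[n // 4, n // 4] = 1.0
          rho[3 * n // 4, 3 * n // 4] = -1.0
          phi = poisson_fft(rho)
          print(float(phi.max()), float(phi.min()))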

  11. Massively Parallel Processing for Fast and Accurate Stamping Simulations

    NASA Astrophysics Data System (ADS)

    Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu

    2005-08-01

    The competitive automotive market drives automotive manufacturers to speed up vehicle development cycles and reduce lead time. Fast tooling development is one of the key areas supporting fast and short vehicle development programs (VDP). In the past ten years, stamping simulation has become the most effective validation tool for predicting and resolving all potential formability and quality problems before the dies are physically made. Stamping simulation and formability analysis have become a critical business segment in GM's math-based die engineering process. As simulation becomes one of the major production tools in the engineering factory, simulation speed and accuracy are two of the most important measures of stamping simulation technology. The speed and time-in-system of forming analysis become even more critical to supporting fast VDPs and tooling readiness. Since 1997, the General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of simulation software for mass-production analysis applications. By 2001, this technology had matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed-memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DMP/MPP technology, as well as performance benchmarks, are discussed in this publication.

  12. Mapping a battlefield simulation onto message-passing parallel architectures

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1987-01-01

    Perhaps the most critical problem in distributed simulation is that of mapping: without an effective mapping of workload to processors the speedup potential of parallel processing cannot be realized. Mapping a simulation onto a message-passing architecture is especially difficult when the computational workload dynamically changes as a function of time and space; this is exactly the situation faced by battlefield simulations. This paper studies an approach where the simulated battlefield domain is first partitioned into many regions of equal size; typically there are more regions than processors. The regions are then assigned to processors; a processor is responsible for performing all simulation activity associated with the regions. The assignment algorithm is quite simple and attempts to balance load by exploiting locality of workload intensity. The performance of this technique is studied on a simple battlefield simulation implemented on the Flex/32 multiprocessor. Measurements show that the proposed method achieves reasonable processor efficiencies. Furthermore, the method shows promise for use in dynamic remapping of the simulation.

  13. Conservative parallel simulation of priority class queueing networks

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1990-01-01

    A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. A low-priority job in service can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than those for lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol is analyzed and it is demonstrated that good performance can be expected on the simulation of large queueing networks.

  14. Conservative parallel simulation of priority class queueing networks

    NASA Technical Reports Server (NTRS)

    Nicol, David

    1992-01-01

    A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. A low-priority job in service can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than those for lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol is analyzed and it is demonstrated that good performance can be expected on the simulation of large queueing networks.

  15. Xyce Parallel Electronic Simulator Users Guide Version 6.4

    SciTech Connect

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory

    2015-12-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright (c) 2002-2015 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright (c) 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are

  16. Numerical Simulation of Flow Field Within Parallel Plate Plastometer

    NASA Technical Reports Server (NTRS)

    Antar, Basil N.

    2002-01-01

    The Parallel Plate Plastometer (PPP) is a device commonly used for measuring the viscosity of high polymers at low rates of shear, in the range of 10^4 to 10^9 poises. This device is being validated for use in measuring the viscosity of liquid glasses at high temperatures, which have similar ranges of viscosity values. The PPP instrument consists of two similar parallel plates, each about 1 inch in diameter, with the upper plate movable while the lower one is kept stationary. Load is applied to the upper plate by means of a beam connected to a shaft attached to the upper plate. The viscosity of the fluid is deduced from measuring the variation of the plate separation, h, as a function of time when a specified fixed load is applied to the beam. Operating plate speeds measured with the PPP are usually in the range of 10.3 cm/s or lower. The flow field within the PPP can be simulated using the equations of motion of fluid flow for this configuration. With flow speeds in the range quoted above, the flow field between the two plates is certainly incompressible and laminar. Such flows can be easily simulated using numerical modeling with computational fluid dynamics (CFD) codes. We present below the mathematical model used to simulate this flow field and the solutions obtained for the flow using a commercially available finite element CFD code.

  17. Massively parallel simulations of multiphase flows using Lattice Boltzmann methods

    NASA Astrophysics Data System (ADS)

    Ahrenholz, Benjamin

    2010-03-01

    In the last two decades the lattice Boltzmann method (LBM) has matured as an alternative and efficient numerical scheme for the simulation of fluid flows and transport problems. Unlike conventional numerical schemes based on discretizations of macroscopic continuum equations, the LBM is based on microscopic models and mesoscopic kinetic equations. The fundamental idea of the LBM is to construct simplified kinetic models that incorporate the essential physics of microscopic or mesoscopic processes so that the macroscopic averaged properties obey the desired macroscopic equations. Applications involving interfacial dynamics, complex and/or changing boundaries, and complicated constitutive relationships that can be derived from a microscopic picture are especially suitable for the LBM. In this talk a modified and optimized version of a Gunstensen color model is presented to describe the dynamics of the fluid/fluid interface, where the flow field is based on a multi-relaxation-time model. Based on that modeling approach, validation studies of contact line motion are shown. Because the LB method generally needs only nearest-neighbor information, the algorithm is an ideal candidate for parallelization. Hence, it is possible to perform efficient simulations in complex geometries at a large scale by massively parallel computations. Here, the results of drainage and imbibition (more than 2E11 degrees of freedom) in natural porous media obtained from microtomography methods are presented. Those fully resolved pore-scale simulations are essential for a better understanding of the physical processes in porous media and therefore important for the determination of constitutive relationships.

  18. High Performance Parallel Methods for Space Weather Simulations

    NASA Technical Reports Server (NTRS)

    Hunter, Paul (Technical Monitor); Gombosi, Tamas I.

    2003-01-01

    This is the final report of our NASA AISRP grant entitled 'High Performance Parallel Methods for Space Weather Simulations'. The main thrust of the proposal was to achieve significant progress towards new high-performance methods which would greatly accelerate global MHD simulations and eventually make it possible to develop first-principles based space weather simulations which run much faster than real time. We are pleased to report that with the help of this award we made major progress in this direction and developed the first parallel implicit global MHD code with adaptive mesh refinement. The main limitation of all earlier global space physics MHD codes was the explicit time stepping algorithm. Explicit time steps are limited by the Courant-Friedrichs-Lewy (CFL) condition, which essentially ensures that no information travels more than a cell size during a time step. This condition represents a non-linear penalty for highly resolved calculations, since finer grid resolution (and consequently smaller computational cells) not only results in more computational cells, but also in smaller time steps.
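
    For concreteness, the explicit-scheme restriction the report refers to can be written as a small helper; this is the generic form of the CFL limit, not the code's actual fast-magnetosonic speed evaluation.

      def cfl_time_step(dx, max_wave_speed, cfl=0.8):
          """Explicit time-step limit: information may not cross more than one cell per step,
          so dt <= cfl * dx / (maximum signal speed). Halving dx therefore halves dt as well,
          which is the nonlinear cost penalty of refinement described above."""
          return cfl * dx / max_wave_speed

      if __name__ == "__main__":
          print(cfl_time_step(dx=1.0e-2, max_wave_speed=5.0))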

  19. Massively parallel algorithms for trace-driven cache simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.

    1991-01-01

    Trace-driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t-th instant, reference x_t is hashed into a set of cache locations, the contents of which are then compared with x_t. If at the t-th instant x_t is not present in the cache, then it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line and making x_t present for the (t+1)-st instant. The problem of parallel simulation of a subtrace of N references directed to a C-line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which, regardless of the set size C, runs in time O(log N) using N processors on the exclusive-read, exclusive-write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference-based line replacement policies is considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that, on any trace of length N directed to a C-line set, runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
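
    As a reference for what is being parallelized, a plain serial simulation of a single C-line LRU set is sketched below; the paper's O(log N)-time EREW algorithms are not reproduced here.

      from collections import OrderedDict

      def lru_misses(trace, set_size):
          """Count misses for one LRU cache set fed by a reference trace."""
          cache = OrderedDict()               # keys ordered from least to most recently used
          misses = 0
          for line in trace:
              if line in cache:
                  cache.move_to_end(line)     # hit: mark as most recently used
              else:
                  misses += 1
                  cache[line] = None
                  if len(cache) > set_size:
                      cache.popitem(last=False)   # evict the least recently used line
          return misses

      if __name__ == "__main__":
          print(lru_misses([1, 2, 3, 1, 4, 2, 5, 1], set_size=3))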

  20. Molecular Dynamics Simulations from SNL's Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS)

    DOE Data Explorer

    Plimpton, Steve; Thompson, Aidan; Crozier, Paul

    LAMMPS (http://lammps.sandia.gov/index.html) stands for Large-scale Atomic/Molecular Massively Parallel Simulator and is a code that can be used to model atoms or, as the LAMMPS website says, as a parallel particle simulator at the atomic, meso, or continuum scale. This Sandia-based website provides a long list of animations from large simulations. These were created using different visualization packages to read LAMMPS output, and each one provides the name of the PI and a brief description of the work done or the visualization package used. See also the static images produced from simulations at http://lammps.sandia.gov/pictures.html. The foundation paper for LAMMPS is: S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995), but the website also lists other papers describing contributions to LAMMPS over the years.

  1. Parallel Unsteady Turbopump Simulations for Liquid Rocket Engines

    NASA Technical Reports Server (NTRS)

    Kiris, Cetin C.; Kwak, Dochan; Chan, William

    2000-01-01

    This paper reports the progress being made towards a complete turbo-pump simulation capability for liquid rocket engines. The Space Shuttle Main Engine (SSME) turbo-pump impeller is used as a test case for the performance evaluation of the MPI and hybrid MPI/OpenMP versions of the INS3D code. A computational model of a turbo-pump has then been developed for the shuttle upgrade program. Relative motion of the grid system for rotor-stator interaction was obtained by employing overset grid techniques. The time-accuracy of the scheme has been evaluated by using simple test cases. Unsteady computations for the SSME turbo-pump, which contains 136 zones with 35 million grid points, are currently underway on Origin 2000 systems at NASA Ames Research Center. Results from time-accurate simulations with moving boundary capability and the performance of the parallel versions of the code will be presented in the final paper.

  2. A Generic Scheduling Simulator for High Performance Parallel Computers

    SciTech Connect

    Yoo, B S; Choi, G S; Jette, M A

    2001-08-01

    It is well known that efficient job scheduling plays a crucial role in achieving high system utilization in large-scale high performance computing environments. A good scheduling algorithm should schedule jobs to achieve high system utilization while satisfying various user demands in an equitable fashion. Designing such a scheduling algorithm is a non-trivial task even in a static environment. In practice, the computing environment and workload are constantly changing. There are several reasons for this. First, computing platforms constantly evolve as the technology advances. For example, the availability of relatively powerful commodity off-the-shelf (COTS) components at steadily diminishing prices has made it feasible to construct ever larger massively parallel computers in recent years [1, 4]. Second, the workload imposed on the system also changes constantly. Rapidly increasing compute resources have provided many application developers with the opportunity to radically alter program characteristics and take advantage of these additional resources. New developments in software technology may also trigger changes in user applications. Finally, changes in the political climate may alter user priorities or the mission of the organization. System designers in such dynamic environments must be able to accurately forecast the effect of changes in the hardware, software, and/or policies under consideration. If the environmental changes are significant, one must also reassess scheduling algorithms. Simulation has frequently been relied upon for this analysis, because other methods such as analytical modeling or actual measurements are usually too difficult or costly. A drawback of the simulation approach, however, is that developing a simulator is a time-consuming process. Furthermore, an existing simulator cannot be easily adapted to a new environment. In this research, we attempt to develop a generic job-scheduling simulator, which facilitates the evaluation of

  3. Massively Parallel Simulations of Diffusion in Dense Polymeric Structures

    SciTech Connect

    Faulon, Jean-Loup; Wilcox, R.T.; Hobbs, J.D.; Ford, D.M.

    1997-11-01

    An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases is studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon (1840 processors). Compared to the current state-of-the-art equilibration methods, this new technique appears to be faster by some orders of magnitude. The main advantage of the technique is that one can circumvent the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in the center of a regular tetrahedron and that regular tetrahedrons can be used to mesh three-dimensional space. Thus, the problem of polymer equilibration described by continuous equations in molecular dynamics is reduced to a discrete problem where solutions are approximated by simple algorithms. Practical modeling applications include the construction of butyl rubber and ethylene-propylene-diene-monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally on both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. In order to better understand the diffusion phenomena, 10,000-atom models were generated and equilibrated for butyl and EPDM. The models were submitted to a massively parallel molecular dynamics simulation to monitor the trajectories of the diffusing species.

  4. Roadmap for efficient parallelization of breast anatomy simulation

    NASA Astrophysics Data System (ADS)

    Chui, Joseph H.; Pokrajac, David D.; Maidment, Andrew D. A.; Bakic, Predrag R.

    2012-03-01

    A roadmap has been proposed to optimize the simulation of breast anatomy by parallel implementation, in order to reduce the time needed to generate software breast phantoms. The rapid generation of high-resolution phantoms is needed to support virtual clinical trials of breast imaging systems. We have recently developed an octree-based recursive partitioning algorithm for breast anatomy simulation. The algorithm has good asymptotic complexity; however, its current MATLAB implementation cannot provide optimal execution times. The proposed roadmap for efficient parallelization includes the following steps: (i) migrate the current code to a C/C++ platform and optimize it for single-threaded implementation; (ii) modify the code to allow for multi-threaded CPU implementation; (iii) identify and migrate the code to a platform designed for multi-threaded GPU implementation. In this paper, we describe our results in optimizing the C/C++ code for single-threaded and multi-threaded CPU implementations. As the first step of the proposed roadmap, we identified a bottleneck component in the MATLAB implementation using MATLAB's profiling tool and created a single-threaded CPU implementation of the algorithm using C/C++'s overloaded operators and the standard template library. The C/C++ implementation has been compared to the MATLAB version in terms of accuracy and simulation time. A 520-fold reduction of the execution time was observed in a test of phantoms with 50-400 μm voxels. In addition, we have identified several places in the code which will be modified to allow for the next roadmap milestone of the multi-threaded CPU implementation.

  5. Parallel Molecular Dynamics Stencil : a new parallel computing environment for a large-scale molecular dynamics simulation of solids

    NASA Astrophysics Data System (ADS)

    Shimizu, Futoshi; Kimizuka, Hajime; Kaburaki, Hideo

    2002-08-01

    A new parallel computing environment, called ``Parallel Molecular Dynamics Stencil'', has been developed to carry out large-scale short-range molecular dynamics simulations of solids. The stencil is written in C using MPI for parallelization and is successfully designed to separate and conceal the parts of the programs describing cutoff schemes and parallel algorithms for data communication. This has been made possible by introducing the concept of image atoms. Therefore, only sequential programming of the force calculation routine is required for executing the stencil in a parallel environment. Typical molecular dynamics routines, such as various ensembles, time integration methods, and empirical potentials, have been implemented in the stencil. In the presentation, the performance of the stencil on parallel computers from Hitachi, IBM, and SGI and on a PC cluster, using Lennard-Jones and EAM-type potentials for a fracture problem, will be reported.

  6. Parallel grid library for rapid and flexible simulation development

    NASA Astrophysics Data System (ADS)

    Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.

    2013-04-01

    We present an easy to use and flexible grid library for developing highly scalable parallel simulations. The distributed cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously allowing one to overlap computation and communication. This enables excellent scalability at least up to 32 k cores in magnetohydrodynamic tests depending on the problem and hardware. In the version of dccrg presented here part of the mesh metadata is replicated between MPI processes reducing the scalability of adaptive mesh refinement (AMR) to between 200 and 600 processes. Dccrg is free software that anyone can use, study and modify and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg. Catalogue identifier: AEOM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: GNU Lesser General Public License version 3 No. of lines in distributed program, including test data, etc.: 54975 No. of bytes in distributed program, including test data, etc.: 974015 Distribution format: tar.gz Programming language: C++. Computer: PC, cluster, supercomputer. Operating system: POSIX. The code has been parallelized using MPI and tested with 1-32768 processes RAM: 10 MB-10 GB per process Classification: 4.12, 4.14, 6.5, 19.3, 19.10, 20. External routines: MPI-2 [1], boost [2], Zoltan [3], sfc++ [4] Nature of problem: Grid library supporting arbitrary data in grid cells, parallel adaptive mesh refinement, transparent remote neighbor data updates and

  7. A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Rao, Hariprasad Nannapaneni

    1989-01-01

    The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.

  8. Parallel finite element simulation of large ram-air parachutes

    NASA Astrophysics Data System (ADS)

    Kalro, V.; Aliabadi, S.; Garrard, W.; Tezduyar, T.; Mittal, S.; Stein, K.

    1997-06-01

    In the near future, large ram-air parachutes are expected to provide the capability of delivering 21-ton payloads from altitudes as high as 25,000 ft. In the development, test, and evaluation of these parachutes, the size of the parachute needed and the deployment stages involved make high-performance computing (HPC) simulations a desirable alternative to costly airdrop tests. Although computational simulations based on realistic, 3D, time-dependent models will continue to be a major computational challenge, advanced finite element simulation techniques recently developed for this purpose and the execution of these techniques on HPC platforms are significant steps towards meeting this challenge. In this paper, two approaches for the analysis of the inflation and gliding of ram-air parachutes are presented. In one of the approaches, the point-mass flight mechanics equations are solved with the time-varying drag and lift areas obtained from empirical data. This approach is limited to parachutes with configurations similar to those for which data are available. The other approach is 3D finite element computation based on the Navier-Stokes equations governing the airflow around the parachute canopy and Newton's law of motion governing the 3D dynamics of the canopy, with the forces acting on the canopy calculated from the simulated flow field. At the earlier stages of canopy inflation the parachute is modelled as an expanding box, whereas at the later stages, as it expands, the box transforms into a parafoil and glides. These finite element computations are carried out on the massively parallel supercomputers CRAY T3D and Thinking Machines CM-5, typically with millions of coupled, non-linear finite element equations solved simultaneously at every time step or pseudo-time step of the simulation.

  9. MPSim: A Massively Parallel General Simulation Program for Materials

    NASA Astrophysics Data System (ADS)

    Iotov, Mihail; Gao, Guanghua; Vaidehi, Nagarajan; Cagin, Tahir; Goddard, William A., III

    1997-08-01

    In this talk, we describe a general purpose Massively Parallel Simulation (MPSim) program used for computational materials science and the life sciences. We also present scaling aspects of the program along with several case studies. The program incorporates a highly efficient cell multipole method (CMM) to accurately calculate the interactions. For studying bulk materials, the program uses the reduced CMM to account for infinite-range sums. The software embodies various advanced molecular dynamics algorithms and energy and structure optimization techniques, with a set of analysis tools suitable for large-scale structures. Applications of the program range from amorphous polymers, liquid-polymer interfaces, large viruses, million-atom clusters, and surfaces to gas diffusion in polymers. The program was originally developed on the KSR in an object-oriented fashion and has been ported to the SGI-PC and HP-Exemplar. The message-passing version was originally implemented on the Intel Paragon using NX, then MPI, and was later tested on Cray T3D and IBM SP2 platforms.

  10. Parallelizing N-Body Simulations on a Heterogeneous Cluster

    NASA Astrophysics Data System (ADS)

    Stenborg, T. N.

    2009-10-01

    This thesis evaluates quantitatively the effectiveness of a new technique for parallelising direct gravitational N-body simulations on a heterogeneous computing cluster. In addition to being an investigation into how a specific computational physics task can be optimally load balanced across the heterogeneity factors of a distributed computing cluster, it is also, more generally, a case study in effective heterogeneous parallelisation of an all-pairs programming task. If high-performance computing clusters are not designed to be heterogeneous initially, they tend to become so over time as new nodes are added, or existing nodes are replaced or upgraded. As a result, effective techniques for application parallelisation on heterogeneous clusters are needed if maximum cluster utilisation is to be achieved, and such techniques are an active area of research. A custom C/MPI parallel particle-particle N-body simulator was developed, validated and deployed for this evaluation. Simulation communication proceeds over cluster nodes arranged in a logical ring and employs nonblocking message passing to encourage overlap of communication with computation. Redundant calculations arising from force symmetry given by Newton's third law are removed by combining chordal data transfer of accumulated forces with ring passing data transfer. Heterogeneity in node computation speed is addressed by decomposing system data across nodes in proportion to node computation speed, in conjunction with use of evenly sized communication buffers. This scheme is shown experimentally to have some potential in improving simulation performance in comparison with an even decomposition of data across nodes. Techniques for further heterogeneous cluster load balancing are discussed and remain an opportunity for further work.
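
    The speed-proportional decomposition can be illustrated in a few lines of code. The node speeds below are hypothetical benchmark figures, not measurements from the thesis.

    ```python
    # Minimal sketch of speed-proportional decomposition; node speeds are hypothetical
    # benchmark figures, not measurements from the thesis.
    def proportional_counts(n_particles, speeds):
        """Split n_particles across nodes in proportion to their measured speeds."""
        total = sum(speeds)
        counts = [int(n_particles * s / total) for s in speeds]
        remainder = n_particles - sum(counts)          # left over after rounding down
        for i in sorted(range(len(speeds)), key=lambda i: speeds[i], reverse=True)[:remainder]:
            counts[i] += 1                             # hand the remainder to the fastest nodes
        return counts

    speeds = [1.0, 1.0, 2.5, 4.0]                      # relative speeds of 4 heterogeneous nodes
    print(proportional_counts(100_000, speeds))        # counts sum to exactly 100000
    ```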

  11. Parallel continuous simulated tempering and its applications in large-scale molecular simulations

    SciTech Connect

    Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng

    2014-07-28

    In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method of our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or the replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of the total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed because the exchange rate is independent of the total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on the two-dimensional Ising model, a Lennard-Jones liquid, and an all-atom folding simulation of the small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and is particularly effective in simulating systems with long relaxation or correlation times. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transitions and the dynamics of macromolecules in explicit solvent.
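
    For readers unfamiliar with tempering exchanges, the sketch below shows the generic Metropolis criterion used in parallel tempering, whose spirit PCST adopts; it is not the PCST-specific exchange rule, and the temperatures and energies are arbitrary examples.

    ```python
    # Generic Metropolis exchange test from parallel tempering (whose spirit PCST adopts);
    # this is not the PCST-specific rule, and the numbers below are arbitrary examples.
    import math, random

    def accept_swap(beta_i, beta_j, energy_i, energy_j, rng=random):
        """Accept exchanging configurations between two copies with probability
        min(1, exp[(beta_i - beta_j) * (E_i - E_j)])."""
        delta = (beta_i - beta_j) * (energy_i - energy_j)
        return delta >= 0.0 or rng.random() < math.exp(delta)

    kB = 0.0019872                                   # kcal/(mol K)
    beta_300, beta_320 = 1.0 / (kB * 300.0), 1.0 / (kB * 320.0)
    print(accept_swap(beta_300, beta_320, energy_i=-120.0, energy_j=-118.5))
    ```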

  12. Parallel continuous simulated tempering and its applications in large-scale molecular simulations

    PubMed Central

    Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng

    2014-01-01

    In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method of our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or the replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of the total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed because the exchange rate is independent of the total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on the two-dimensional Ising model, a Lennard-Jones liquid, and an all-atom folding simulation of the small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and is particularly effective in simulating systems with long relaxation or correlation times. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transitions and the dynamics of macromolecules in explicit solvent. PMID:25084887

  13. Parallel continuous simulated tempering and its applications in large-scale molecular simulations

    NASA Astrophysics Data System (ADS)

    Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng

    2014-07-01

    In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method of our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or the replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of the total temperature range, the PCST method requires very few copies of simulations, typically 2-3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed because the exchange rate is independent of the total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on the two-dimensional Ising model, a Lennard-Jones liquid, and an all-atom folding simulation of the small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and is particularly effective in simulating systems with long relaxation or correlation times. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transitions and the dynamics of macromolecules in explicit solvent.

  14. Investigation of reflective notching with massively parallel simulation

    NASA Astrophysics Data System (ADS)

    Tadros, Karim H.; Neureuther, Andrew R.; Gamelin, John K.; Guerrieri, Roberto

    1990-06-01

    A massively parallel simulation program, TEMPEST, is used to investigate the role of topography in generating reflective notching and to study the possibility of reducing these effects through the introduction of special properties of resists and antireflection coating materials. The emphasis is on examining physical scattering mechanisms such as focused specular reflections, resist thickness interference effects, reflections from substrate grains, and focusing of incident light by the resist curvature. Specular reflection from topography can focus incident radiation, causing a 10-fold increase in effective exposure. Further complications, such as dimples in the surface of positive resist features, can result from a second reflection of focused energy by the resist/air interface. Variations in line-edge exposure due to substrate grain structure are primarily specular in nature and can become significant for grains larger than λresist. Local exposure variations due to vertical standing waves and changes in energy coupling due to changes in resist thickness are displaced laterally and are significant effects, even though they are slightly less severe than vertical wave propagation theory suggests. Focusing effects due to refraction by the curved surface of the resist produce only minor changes in exposure. Increased resist contrast and resist absorption offer some improvement in reducing notching effects, though minimizing substrate reflectivity is more effective. CPU time using 32 virtual nodes to simulate a 4 μm by 2 μm isolated domain with 13 bleaching steps was 30 minutes.

  15. A PDE-based methodology for modeling, parameter estimation and feedback control in structural and structural acoustic systems

    NASA Technical Reports Server (NTRS)

    Banks, H. T.; Brown, D. E.; Metcalf, Vern L.; Silcox, R. J.; Smith, Ralph C.; Wang, Yun

    1994-01-01

    A problem of continued interest concerns the control of vibrations in a flexible structure and the related problem of reducing structure-borne noise in structural acoustic systems. In both cases, piezoceramic patches bonded to the structures have been successfully used as control actuators. Through the application of a controlling voltage, the patches can be used to reduce structural vibrations which in turn lead to methods for reducing structure-borne noise. A PDE-based methodology for modeling, estimating physical parameters, and implementing a feedback control scheme for problems of this type is discussed. While the illustrating example is a circular plate, the methodology is sufficiently general so as to be applicable in a variety of structural and structural acoustic systems.

  16. Xyce Parallel Electronic Simulator Reference Guide Version 6.4

    SciTech Connect

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory

    2015-12-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1]. Trademarks The information herein is subject to change without notice. Copyright © 2002-2015 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)

  17. Parallel implementation of the particle simulation method with dynamic load balancing: Toward realistic geodynamical simulation

    NASA Astrophysics Data System (ADS)

    Furuichi, M.; Nishiura, D.

    2015-12-01

    Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and the Discrete Element Method (DEM) have been widely used to solve continuum and particle motions in computational geodynamics. These mesh-free methods are suitable for problems with complex geometries and boundaries. In addition, their Lagrangian nature allows non-diffusive advection, which is useful for tracking history-dependent properties (e.g. rheology) of the material. These potential advantages over mesh-based methods offer effective numerical applications to geophysical flow and tectonic processes, for example tsunamis with free surfaces and floating bodies, magma intrusion with fracture of rock, and shear-zone pattern generation in granular deformation. In order to investigate such geodynamical problems with particle-based methods, millions to billions of particles are required for a realistic simulation. Parallel computing is therefore important for handling such a huge computational cost. An efficient parallel implementation of SPH and DEM methods is, however, known to be difficult, especially for distributed-memory architectures. Lagrangian methods inherently show a workload imbalance problem when parallelized with domains fixed in space, because particles move around and workloads change during the simulation. Dynamic load balancing is therefore a key technique for performing large-scale SPH and DEM simulations. In this work, we present a parallel implementation technique for SPH and DEM methods utilizing dynamic load balancing algorithms, aimed at high-resolution simulations over large domains on massively parallel supercomputer systems. Our method uses the imbalance of the execution time of each MPI process as the nonlinear term of the parallel domain decomposition and minimizes it with a Newton-like iteration method. In order to perform flexible domain decomposition in space, the slice-grid algorithm is used. Numerical tests show that our
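
    A much-simplified sketch of the slice-grid rebalancing idea follows: measured per-slice execution times are treated as a piecewise-uniform cost density, and the slice boundaries are moved so that the estimated work per slice becomes equal. This is only an illustration of the principle, not the authors' Newton-like scheme.

    ```python
    # Illustrative sketch of slice-grid rebalancing (not the authors' exact Newton-like
    # scheme): measured per-slice times are treated as a piecewise-uniform cost density,
    # and slice boundaries are re-placed at equal-work points.
    import numpy as np

    def rebalance(boundaries, times):
        """boundaries: P+1 slice edges along one axis; times: measured time per slice."""
        cum_work = np.concatenate(([0.0], np.cumsum(times)))   # cumulative work at each edge
        targets = np.linspace(0.0, cum_work[-1], len(boundaries))
        # invert the piecewise-linear cumulative-work curve at the equal-work targets
        return np.interp(targets, cum_work, boundaries)

    edges = np.array([0.0, 0.25, 0.5, 0.75, 1.0])    # four equally wide slices
    times = np.array([1.0, 3.0, 3.0, 1.0])           # the middle slices are overloaded
    print(rebalance(edges, times))                   # overloaded slices shrink: [0. 0.333 0.5 0.667 1.]
    ```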

  18. Rasterizing geological models for parallel finite difference simulation using seismic simulation as an example

    NASA Astrophysics Data System (ADS)

    Zehner, Björn; Hellwig, Olaf; Linke, Maik; Görz, Ines; Buske, Stefan

    2016-01-01

    3D geological underground models are often represented by vector data, such as triangulated networks representing the boundaries of geological bodies and geological structures. If such models are to be used for numerical simulations based on the finite difference method, they have to be converted into a representation that discretizes the full volume of the model into hexahedral cells. Often the simulations require a high grid resolution and are done using parallel computing. The storage of such a high-resolution raster model would require a large amount of storage space, and it is difficult to create such a model using standard geomodelling packages. Since the raster representation is only required for the calculation, but not for the geometry description, we present an algorithm and concept for rasterizing geological models on the fly for use in finite difference codes that are parallelized by domain decomposition. As a proof of concept we implemented a rasterizer library and integrated it into seismic simulation software that is run as parallel code on a UNIX cluster using the Message Passing Interface. We can thus run the simulation with realistic and complicated surface-based geological models that are created using 3D geomodelling software, instead of using a simplified representation of the geological subsurface based on mathematical functions or geometric primitives. We tested this set-up using an example model that we provide along with the implemented library.
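
    A toy sketch of on-the-fly rasterization within one subdomain is given below. The geometry query is a trivial analytic shape standing in for the triangulated geological surfaces handled by the actual rasterizer library, and all names are hypothetical.

    ```python
    # Toy sketch of on-the-fly rasterization of one process's subdomain: each cell
    # centre is classified by a geometry query.  The "model" is a trivial analytic
    # sphere standing in for triangulated geological surfaces; all names are made up.
    import numpy as np

    def unit_at(x, y, z):
        """Material index at a point; placeholder for a real geological-model query."""
        return 1 if (x - 0.5) ** 2 + (y - 0.5) ** 2 + (z - 0.5) ** 2 < 0.16 else 0

    def rasterize_subdomain(i0, i1, ny, nz, h):
        """Fill the slab of cells i0 <= i < i1 owned by this process (spacing h)."""
        grid = np.empty((i1 - i0, ny, nz), dtype=np.int8)
        for i in range(i0, i1):
            for j in range(ny):
                for k in range(nz):
                    grid[i - i0, j, k] = unit_at((i + 0.5) * h, (j + 0.5) * h, (k + 0.5) * h)
        return grid

    # e.g. the process owning slabs 20..29 of a 40x40x40 grid with spacing 1/40
    sub = rasterize_subdomain(20, 30, 40, 40, 1.0 / 40)
    print(sub.shape, int(sub.sum()), "cells lie inside the example body")
    ```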

  19. Particle/Continuum Hybrid Simulation in a Parallel Computing Environment

    NASA Technical Reports Server (NTRS)

    Baganoff, Donald

    1996-01-01

    The objective of this study was to modify an existing parallel particle code based on the direct simulation Monte Carlo (DSMC) method to include a Navier-Stokes (NS) calculation so that a hybrid solution could be developed. In carrying out this work, it was determined that the following five issues had to be addressed before extensive program development of a three dimensional capability was pursued: (1) find a set of one-sided kinetic fluxes that are fully compatible with the DSMC method, (2) develop a finite volume scheme to make use of these one-sided kinetic fluxes, (3) make use of the one-sided kinetic fluxes together with DSMC type boundary conditions at a material surface so that velocity slip and temperature slip arise naturally for near-continuum conditions, (4) find a suitable sampling scheme so that the values of the one-sided fluxes predicted by the NS solution at an interface between the two domains can be converted into the correct distribution of particles to be introduced into the DSMC domain, (5) carry out a suitable number of tests to confirm that the developed concepts are valid, individually and in concert for a hybrid scheme.

  20. Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    NASA Technical Reports Server (NTRS)

    Hsieh, Shang-Hsien

    1993-01-01

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

  1. A natural partitioning scheme for parallel simulation of multibody systems

    NASA Technical Reports Server (NTRS)

    Chiou, J. C.; Park, K. C.; Farhat, C.

    1993-01-01

    A parallel partitioning scheme based on physical-co-ordinate variables is presented to systematically eliminate system constraint forces and yield the equations of motion of multibody dynamics systems in terms of their independent coordinates. Key features of the present scheme include an explicit determination of the independent coordinates, a parallel construction of the null space matrix of the constraint Jacobian matrix, an easy incorporation of the previously developed two-stage staggered solution procedure and a Schur complement based parallel preconditioned conjugate gradient numerical algorithm.

  2. A natural partitioning scheme for parallel simulation of multibody systems

    NASA Technical Reports Server (NTRS)

    Chiou, J. C.; Park, K. C.; Farhat, C.

    1991-01-01

    A parallel partitioning scheme based on physical-coordinate variables is presented to systematically eliminate system constraint forces and yield the equations of motion of multibody dynamics systems in terms of their independent coordinates. Key features of the present scheme include an explicit determination of the independent coordinates, a parallel construction of the null space matrix of the constraint Jacobian matrix, an easy incorporation of the previously developed two-stage staggered solution procedure, and a Schur complement-based parallel preconditioned conjugate gradient numerical algorithm.

  3. Scalable Parallel Formulations of the Barnes-Hut Method for n-Body Simulations

    NASA Astrophysics Data System (ADS)

    Grama, Ananth Y.; Kumar, Vipin; Sameh, Ahmed

    In this paper, we present two new parallel formulations of the Barnes-Hut method. These parallel formulations are especially suited for simulations with irregular particle densities. We first present a parallel formulation that uses a static partitioning of the domain and assignment of subdomains to processors. We demonstrate that this scheme delivers acceptable load balance and, coupled with two collective communication operations, yields good performance. We present a second parallel formulation which combines static decomposition of the domain with an assignment of subdomains to processors based on Morton ordering. This alleviates the load imbalance inherent in the first scheme. The second parallel formulation is inspired by the two currently best-known parallel algorithms for the Barnes-Hut method. We present an experimental evaluation of these schemes on a 256-processor nCUBE2 parallel computer for an astrophysical simulation.
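
    The Morton-ordering assignment can be sketched briefly: interleaving the bits of the subdomain coordinates yields a space-filling-curve order with good locality, so cutting the sorted list into contiguous chunks keeps nearby subdomains on the same processor. This is a generic illustration, not the paper's implementation.

    ```python
    # Minimal sketch of Morton (Z-order) keys for assigning subdomains to processors:
    # interleaving the bits of the subdomain coordinates gives an ordering with good
    # spatial locality, so cutting the sorted list into P contiguous chunks keeps
    # nearby subdomains on the same processor.  Generic illustration, not the paper's code.
    def morton_key(ix, iy, iz, bits=10):
        """Interleave the bits of three integer coordinates (x in the lowest positions)."""
        key = 0
        for b in range(bits):
            key |= ((ix >> b) & 1) << (3 * b)
            key |= ((iy >> b) & 1) << (3 * b + 1)
            key |= ((iz >> b) & 1) << (3 * b + 2)
        return key

    subdomains = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
    ordered = sorted(subdomains, key=lambda c: morton_key(*c))

    P = 8                                     # number of processors
    chunk = len(ordered) // P
    assignment = {p: ordered[p * chunk:(p + 1) * chunk] for p in range(P)}
    print(assignment[0])                      # processor 0 gets one spatially compact block
    ```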

  4. Parallel climate model (PCM) control and transient simulations

    NASA Astrophysics Data System (ADS)

    Washington, W. M.; Weatherly, J. W.; Meehl, G. A.; Semtner, A. J., Jr.; Bettge, T. W.; Craig, A. P.; Strand, W. G., Jr.; Arblaster, J.; Wayland, V. B.; James, R.; Zhang, Y.

    The Department of Energy (DOE) supported Parallel Climate Model (PCM) makes use of the NCAR Community Climate Model (CCM3) and Land Surface Model (LSM) for the atmospheric and land surface components, respectively, the DOE Los Alamos National Laboratory Parallel Ocean Program (POP) for the ocean component, and the Naval Postgraduate School sea-ice model. The PCM executes on several distributed and shared memory computer systems. The coupling method is similar to that used in the NCAR Climate System Model (CSM) in that a flux coupler ties the components together, with interpolations between the different grids of the component models. Flux adjustments are not used in the PCM. The ocean component has 2/3° average horizontal grid spacing with 32 vertical levels and a free surface that allows calculation of sea level changes. Near the equator, the grid spacing is approximately 1/2° in latitude to better capture the ocean equatorial dynamics. The North Pole is rotated over northern North America, thus producing resolution smaller than 2/3° in the North Atlantic where the sinking part of the world conveyor circulation largely takes place. Because this ocean model component does not have a computational point at the North Pole, the Arctic Ocean circulation systems are more realistic and similar to the observed. The elastic viscous plastic sea ice model has a grid spacing of 27 km to represent small-scale features such as ice transport through the Canadian Archipelago and the East Greenland current region. Results from a 300-year present-day coupled climate control simulation are presented, as well as for a transient 1% per year compound CO2 increase experiment, which shows a global warming of 1.27°C for a 10-year average at the doubling point of CO2 and 2.89°C at the quadrupling point. There is a gradual warming beyond the doubling and quadrupling points with CO2 held constant. Globally averaged sea level rise at the time of CO2 doubling is approximately 7 cm and at the

  5. Parallel Vehicular Traffic Simulation using Reverse Computation-based Optimistic Execution

    SciTech Connect

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2008-01-01

    Vehicular traffic simulations are useful in applications such as emergency management and homeland security planning tools. High speed of traffic simulation translates directly to speed of response and level of resilience in those applications. Here, a parallel traffic simulation approach is presented that is aimed at reducing the time for simulating emergency vehicular traffic scenarios. Three unique aspects of this effort are: (1) exploration of optimistic simulation applied to vehicular traffic simulation; (2) addressing reverse computation challenges specific to optimistic vehicular traffic simulation; and (3) achieving absolute (as opposed to self-relative) speedup with a sequential speed equal to that of a fast, de facto standard sequential simulator for emergency traffic. The design and development of the parallel simulation system is presented, along with a performance study that demonstrates excellent sequential performance as well as parallel performance.
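
    The reverse-computation idea can be shown with a deliberately tiny example: each forward event handler has an inverse handler, so an optimistic simulator can roll back by undoing events rather than restoring saved state. The events below are generic, not the traffic-specific events of the paper.

    ```python
    # Tiny sketch of reverse computation: every forward event handler has an inverse,
    # so an optimistic simulator rolls back by undoing events instead of restoring
    # saved state.  Generic counters, not the paper's traffic-specific events.
    class Intersection:
        def __init__(self):
            self.queue_len = 0

    def vehicle_arrives(lp):          # forward event handler
        lp.queue_len += 1

    def undo_vehicle_arrives(lp):     # reverse handler: exactly inverts the forward one
        lp.queue_len -= 1

    lp = Intersection()
    undo_log = []
    for _ in range(3):                # optimistically process three events
        vehicle_arrives(lp)
        undo_log.append(undo_vehicle_arrives)

    # a straggler event with an earlier timestamp arrives: roll back the last two events
    for undo in reversed(undo_log[-2:]):
        undo(lp)
    print(lp.queue_len)               # 1, as if only the first event had been processed
    ```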

  6. Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

    SciTech Connect

    Jin, Shuangshuang; Chen, Yousu; Wu, Di; Diao, Ruisheng; Huang, Zhenyu

    2015-12-09

    Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor-based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and the Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performance for running parallel dynamic simulation is compared and demonstrated.

  7. Efficient solid state NMR powder simulations using SMP and MPP parallel computation.

    PubMed

    Kristensen, Jørgen Holm; Farnan, Ian

    2003-04-01

    Methods for parallel simulation of solid state NMR powder spectra are presented for both shared and distributed memory parallel supercomputers. For shared memory architectures the performance of simulation programs implementing the OpenMP application programming interface is evaluated. It is demonstrated that the design of correct and efficient shared memory parallel programs is difficult as the performance depends on data locality and cache memory effects. The distributed memory parallel programming model is examined for simulation programs using the MPI message passing interface. The results reveal that both shared and distributed memory parallel computation are very efficient with an almost perfect application speedup and may be applied to the most advanced powder simulations. PMID:12713968

  8. Empirical development of parallelization guidelines for time-driven simulation. Master's thesis

    SciTech Connect

    Huson, M.L.

    1989-12-01

    Distributed simulation is an area of research which offers great promise for speeding up simulations. Program parallelization is usually an iterative process requiring several attempts to produce an efficient parallel implementation of a sequential program. This is due to the lack of any standards or guidelines for program parallelization. In this research effort a Ballistic Missile Defense (BMD) time-driven simulation program, developed by DESE Research and Engineering, was used as a test vehicle for investigating parallelization options for distributed and shared memory architectures. Implementations were developed to address issues of functional versus data program decomposition, computation versus communications overhead, and shared versus distributed memory architectures. Performance data collected from each implementation was used to develop guidelines for implementing parallel versions of sequential time-driven simulations. These guidelines were based on the relative performance of the various implementations and on general observations made during the course of the research.

  9. Parallel direct numerical simulation of three-dimensional spray formation

    NASA Astrophysics Data System (ADS)

    Chergui, Jalel; Juric, Damir; Shin, Seungwon; Kahouadji, Lyes; Matar, Omar

    2015-11-01

    We present numerical results for the breakup mechanism of a liquid jet surrounded by a fast coaxial flow of air with density ratio (water/air) ~ 1000 and kinematic viscosity ratio ~ 60. We use the code BLUE, a three-dimensional, two-phase, high-performance, parallel numerical code based on a hybrid Front-Tracking/Level Set algorithm for Lagrangian tracking of arbitrarily deformable phase interfaces and a precise treatment of surface tension forces. The parallelization of the code is based on the technique of domain decomposition, where the velocity field is solved by a parallel GMRes method for the viscous terms and the pressure by a parallel multigrid/GMRes method. Communication is handled by MPI message passing procedures. The interface method is also parallelized; it defines the interface both by a discontinuous density field and by a triangular Lagrangian mesh, and allows the interface to undergo large deformations, including the rupture and/or coalescence of interfaces. EPSRC Programme Grant, MEMPHIS, EP/K0039761/1.

  10. A sweep algorithm for massively parallel simulation of circuit-switched networks

    NASA Technical Reports Server (NTRS)

    Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.

    1992-01-01

    A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384-processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel iPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.

  11. ANNarchy: a code generation approach to neural simulations on parallel hardware

    PubMed Central

    Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows one to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  12. Parallel implementation of VHDL simulations on the Intel iPSC/2 hypercube. Master's thesis

    SciTech Connect

    Comeau, R.C.

    1991-12-01

    VHDL models are executed sequentially in current commercial simulators. As chip designs grow larger and more complex, simulations must run faster. One approach to increasing simulation speed is through parallel processors. This research transforms the behavioral and structural models created by Intermetrics' sequential VHDL simulator into models for parallel execution. The models are simulated on an Intel iPSC/2 hypercube, with synchronization of the nodes achieved by utilizing the Chandy-Misra paradigm for discrete-event simulations. Three eight-bit adders, the ripple-carry, the carry-save, and the carry-lookahead, are each run through the parallel simulator. Simulation time is cut at least in half for all three test cases compared with the sequential Intermetrics model. Results with regard to speedup are given to show the effects of different mappings, varying workloads per node, and overhead due to output messages.

  13. ANNarchy: a code generation approach to neural simulations on parallel hardware.

    PubMed

    Vitay, Julien; Dinkelbach, Helge Ü; Hamker, Fred H

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows one to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  14. Towards parallel I/O in finite element simulations

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel; Pramono, Eddy; Felippa, Carlos

    1989-01-01

    I/O issues in finite element analysis on parallel processors are addressed. Viable solutions for both local and shared memory multiprocessors are presented. The approach is simple but limited by currently available hardware and software systems. Implementation is carried out on a CRAY-2 system. Performance results are reported.

  15. Comparison of serial and parallel simulations of a corridor fire using FDS

    NASA Astrophysics Data System (ADS)

    Valasek, L.

    2015-09-01

    Current fire simulators allow modelling of the course of a fire in large areas and its impact on structures and equipment. This paper deals with a comparison of serial and parallel calculations of a simulation of a corridor fire using the FDS (Fire Dynamics Simulator) system. In the parallel case, the whole computational domain is divided into several computational meshes, the computation on each mesh is treated as a single MPI (Message Passing Interface) process executed on one computational core, and communication between MPI processes is provided by MPI. The aim of this paper is to determine the size of the error caused by parallelization of the computation, which occurs at the interfaces between computational meshes.

  16. The two-level Newton method and its application to electronic simulation.

    SciTech Connect

    Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Hutchinson, Scott Alan; Keiter, Eric Richard; Russo, Thomas V.

    2004-06-01

    Coupling between transient simulation codes of different fidelity can often be performed at the nonlinear solver level, if the time scales of the two codes are similar. A good example is electrical mixed-mode simulation, in which an analog circuit simulator is coupled to a PDE-based semiconductor device simulator. Semiconductor simulation problems, such as single-event upset (SEU), often require the fidelity of a mesh-based device simulator but are only meaningful when dynamically coupled with an external circuit. For such problems a mixed-level simulator is desirable, but the two types of simulation generally have different (somewhat conflicting) numerical requirements. To address these considerations, we have investigated variations of the two-level Newton algorithm, which preserves tight coupling between the circuit and the PDE device, while optimizing the numerics for both. The research was done within Xyce, a massively parallel electronic simulator under development at Sandia National Laboratories.
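
    A toy sketch of a two-level Newton iteration on a scalar "circuit" coupled to a scalar "device" follows. It is only meant to convey the nesting of the two solves; the equations and parameters are invented, and it is in no way Xyce's implementation.

    ```python
    # Toy sketch of a two-level Newton iteration (illustrative only, not Xyce's
    # implementation): an outer Newton solves the circuit node voltage v, and every
    # outer residual evaluation solves the "device" internal unknown u with its own
    # inner Newton, mimicking a circuit simulator coupled to a device simulator.
    import math

    Is, Vt, r_int = 1e-12, 0.02585, 5.0      # made-up diode-like device parameters
    Vs, R = 5.0, 1000.0                      # external source and series resistor

    def solve_device(v, u0=0.6):
        """Inner Newton: find internal node u with (v-u)/r_int = Is*(exp(u/Vt)-1)."""
        u = u0
        for _ in range(100):
            f = (v - u) / r_int - Is * (math.exp(u / Vt) - 1.0)
            df = -1.0 / r_int - Is / Vt * math.exp(u / Vt)
            step = f / df
            u -= step
            if abs(step) < 1e-14:
                break
        return u

    def outer_residual(v):
        """KCL at the external node: source current in minus device current out."""
        u = solve_device(v)
        return (Vs - v) / R - (v - u) / r_int

    v = 1.0
    for _ in range(50):                       # outer Newton with a finite-difference slope
        f = outer_residual(v)
        if abs(f) < 1e-12:
            break
        df = (outer_residual(v + 1e-7) - f) / 1e-7
        v -= f / df
    print(f"node voltage {v:.4f} V, device internal node {solve_device(v):.4f} V")
    ```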

  17. Modelling and simulation of parallel triangular triple quantum dots (TTQD) by using SIMON 2.0

    NASA Astrophysics Data System (ADS)

    Fathany, Maulana Yusuf; Fuada, Syifaul; Lawu, Braham Lawas; Sulthoni, Muhammad Amin

    2016-04-01

    This research presents an analysis of the modeling of parallel Triple Quantum Dots (TQD) using SIMON (SIMulation Of Nano-structures). The Single Electron Transistor (SET) is used as the basic modeling concept. We design the parallel TQD structure from metallic material in a triangular geometry, referred to as Triangular Triple Quantum Dots (TTQD). We simulate it in several scenarios with different parameters, such as different capacitance values, gate voltages, and thermal conditions.

  18. The IDES framework: A case study in development of a parallel discrete-event simulation system

    SciTech Connect

    Nicol, D.M.; Johnson, M.M.; Yoshimura, A.S.

    1997-12-31

    This tutorial describes considerations in the design and development of the IDES parallel simulation system. IDES is a Java-based parallel/distributed simulation system designed to support the study of complex large-scale enterprise systems. Using the IDES system as an example, the authors discuss how anticipated model and system constraints molded the design decisions with respect to modeling, synchronization, and communication strategies.

  19. Parallel Adaptive Multi-Mechanics Simulations using Diablo

    SciTech Connect

    Parsons, D; Solberg, J

    2004-12-03

    Coupled multi-mechanics simulations (such as thermal-stress and fluid-structure interaction problems) are of substantial interest to engineering analysts. In addition, adaptive mesh refinement techniques present an attractive alternative to current mesh generation procedures and provide quantitative error bounds that can be used for model verification. This paper discusses spatially adaptive multi-mechanics implicit simulations using the Diablo computer code. (U)

  20. A high resolution finite volume method for efficient parallel simulation of casting processes on unstructured meshes

    SciTech Connect

    Kothe, D.B.; Turner, J.A.; Mosso, S.J.; Ferrell, R.C.

    1997-03-01

    We discuss selected aspects of a new parallel three-dimensional (3-D) computational tool for the unstructured mesh simulation of Los Alamos National Laboratory (LANL) casting processes. This tool, known as Telluride, draws upon robust, high resolution finite volume solutions of metal alloy mass, momentum, and enthalpy conservation equations to model the filling, cooling, and solidification of LANL castings. We briefly describe the current Telluride physical models and solution methods, then detail our parallelization strategy as implemented with Fortran 90 (F90). This strategy has yielded straightforward and efficient parallelization on distributed and shared memory architectures, aided in large part by the new parallel libraries JTpack90 for Krylov-subspace iterative solution methods and PGSLib for efficient gather/scatter operations. We illustrate our methodology and current capabilities with source code examples and parallel efficiency results for a LANL casting simulation.

  1. Large Eddy simulation of parallel blade-vortex interaction

    NASA Astrophysics Data System (ADS)

    Felten, Frederic; Lund, Thomas

    2002-11-01

    Helicopter Blade-Vortex Interaction (BVI) generally occurs under certain conditions of powered descent or during extreme maneuvering. The vibration and acoustic problems associated with the interaction of rotor tip vortices and the following blades are a major aerodynamic concern for the helicopter community. Numerous experimental and computational studies have been done over the last two decades in order to gain a better understanding of the physical mechanisms involved in BVI. The most severe interaction, in terms of generated noise, happens when the vortex filament is parallel to the blade, thus affecting a great portion of it. The majority of the previous numerical studies of parallel BVI fall within a potential flow framework. Some Navier-Stokes approaches using dissipative numerical methods and RANS-type turbulence models have also been attempted, but with limited success. The current investigation makes use of an incompressible, non-dissipative, kinetic-energy-conserving collocated mesh scheme in conjunction with a dynamic subgrid-scale model. The concentrated tip vortex is not attenuated as it is convected downstream and over a NACA-0012 airfoil. The lift, drag, moment and pressure coefficients induced by the passage of the vortex are monitored in time and compared with experimental data.

  2. Parallelized modelling and solution scheme for hierarchically scaled simulations

    NASA Technical Reports Server (NTRS)

    Padovan, Joe

    1995-01-01

    This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers are capable of solving are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that, by combining several of these techniques, a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.

  3. Xyce parallel electronic simulator users' guide, Version 6.0.1.

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.

    2014-01-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  4. Xyce parallel electronic simulator users guide, version 6.1

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-03-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  5. Pelegant : a parallel accelerator simulation code for electron generation and tracking.

    SciTech Connect

    Wang, Y.; Borland, M. D.; Accelerator Systems Division

    2006-01-01

    elegant is a general-purpose code for electron accelerator simulation that has a worldwide user base. Recently, many of the time-intensive elements were parallelized using MPI. Development has used modest Linux clusters and the BlueGene/L supercomputer at Argonne National Laboratory. This has provided very good performance for some practical simulations, such as multiparticle tracking with synchrotron radiation and emittance blow-up in the vertical rf kick scheme. The effort began with development of a concept that allowed for gradual parallelization of the code, using the existing beamline-element classification table in elegant. This was crucial as it allowed parallelization without major changes in code structure and without major conflicts with the ongoing evolution of elegant. Because of rounding error and finite machine precision, validating a parallel program against a uniprocessor program with the requirement of bitwise identical results is notoriously difficult. We will report validating simulation results of parallel elegant against those of serial elegant by applying Kahan's algorithm to improve accuracy dramatically for both versions. The quality of random numbers in a parallel implementation is very important for some simulations. Some practical experience with generating parallel random numbers by offsetting the seed of each random sequence according to the processor ID will be reported.
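
    Kahan's (compensated) summation, mentioned above as the tool used to make serial and parallel accumulations comparable, is short enough to show in full; the test values below are arbitrary.

    ```python
    # Sketch of Kahan (compensated) summation; the test values are arbitrary.
    def kahan_sum(values):
        total, comp = 0.0, 0.0           # running sum and running compensation
        for x in values:
            y = x - comp                 # apply the correction carried from the last step
            t = total + y                # low-order digits of y are lost here ...
            comp = (t - total) - y       # ... and recovered into the compensation term
            total = t
        return total

    vals = [1.0] + [1e-16] * 10**6
    print(sum(vals), kahan_sum(vals))    # naive: 1.0 (small terms vanish); Kahan: ~1.0000000001
    ```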

  6. Molecular Dynamic Simulations of Nanostructured Ceramic Materials on Parallel Computers

    SciTech Connect

    Vashishta, Priya; Kalia, Rajiv

    2005-02-24

    Large-scale molecular-dynamics (MD) simulations have been performed to gain insight into: (1) sintering, structure, and mechanical behavior of nanophase SiC and SiO2; (2) effects of dynamic charge transfers on the sintering of nanophase TiO2; (3) high-pressure structural transformation in bulk SiC and GaAs nanocrystals; (4) nanoindentation in Si3N4; and (5) lattice mismatched InAs/GaAs nanomesas. In addition, we have designed a multiscale simulation approach that seamlessly embeds MD and quantum-mechanical (QM) simulations in a continuum simulation. The above research activities have involved strong interactions with researchers at various universities, government laboratories, and industries. 33 papers have been published and 22 talks have been given based on the work described in this report.

  7. A parallel implementation of the Cellular Potts Model for simulation of cell-based morphogenesis

    PubMed Central

    Chen, Nan; Glazier, James A.; Izaguirre, Jesús A.; Alber, Mark S.

    2007-01-01

    The Cellular Potts Model (CPM) has been used in a wide variety of biological simulations. However, most current CPM implementations use a sequential modified Metropolis algorithm which restricts the size of simulations. In this paper we present a parallel CPM algorithm for simulations of morphogenesis, which includes cell–cell adhesion, a cell volume constraint, and cell haptotaxis. The algorithm uses appropriate data structures and checkerboard subgrids for parallelization. Communication and updating algorithms synchronize properties of cells simulated on different processor nodes. Tests show that the parallel algorithm has good scalability, permitting large-scale simulations of cell morphogenesis (107 or more cells) and broadening the scope of CPM applications. The new algorithm satisfies the balance condition, which is sufficient for convergence of the underlying Markov chain. PMID:18084624
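
    The checkerboard-subgrid idea can be sketched as follows: blocks of the same colour share no boundary sites, so they can be updated concurrently without conflicting energy evaluations. The local update below is a placeholder, not a full CPM Metropolis step, and the sizes are arbitrary.

    ```python
    # Sketch of the checkerboard-subgrid idea for parallelizing Metropolis-style lattice
    # updates: blocks of the same colour share no boundary sites, so they can be updated
    # concurrently without conflicting energy evaluations.  The per-block update is a
    # placeholder, not a full CPM step, and the loop stands in for parallel workers.
    import numpy as np

    L, B = 16, 4                              # lattice size and block size
    rng = np.random.default_rng(0)
    lattice = rng.integers(0, 2, size=(L, L))
    blocks = [(bi, bj) for bi in range(L // B) for bj in range(L // B)]

    def update_block(bi, bj):
        """Local update confined to one block (a real CPM step would propose copying a
        neighbouring cell id and accept it with a Metropolis test)."""
        i = bi * B + rng.integers(B)
        j = bj * B + rng.integers(B)
        lattice[i, j] = rng.integers(0, 2)

    for colour in [(0, 0), (0, 1), (1, 0), (1, 1)]:       # the four block colours
        active = [b for b in blocks if (b[0] % 2, b[1] % 2) == colour]
        for bi, bj in active:                 # in the parallel code these run concurrently
            update_block(bi, bj)
    print("ones on the lattice after one sweep of all colours:", int(lattice.sum()))
    ```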

  8. Virtual reality visualization of parallel molecular dynamics simulation

    SciTech Connect

    Disz, T.; Papka, M.; Stevens, R.; Pellegrino, M.; Taylor, V.

    1995-12-31

    When performing communications mapping experiments for massively parallel processors, it is important to be able to visualize the mappings and resulting communications. In a molecular dynamics model, visualization of the atom-to-atom interactions and the processor mappings provides insight into the effectiveness of the communications algorithms. The basic quantities available for visualization in a model of this type are the number of molecules per unit volume and the mass and velocity of each molecule. The computational information available for visualization is the atom-to-atom interaction within each time step, the atom-to-processor mapping, and the energy rescaling events. We use the CAVE (CAVE Automatic Virtual Environment) to provide interactive, immersive visualization experiences.

  9. Toward parallel, adaptive mesh refinement for chemically reacting flow simulations

    SciTech Connect

    Devine, K.D.; Shadid, J.N.; Salinger, A.G. Hutchinson, S.A.; Hennigan, G.L.

    1997-12-01

    Adaptive numerical methods offer greater efficiency than traditional numerical methods by concentrating computational effort in regions of the problem domain where the solution is difficult to obtain. In this paper, the authors describe progress toward adding mesh refinement to MPSalsa, a computer program developed at Sandia National Laboratories to solve coupled three-dimensional fluid flow and detailed reaction chemistry systems for modeling chemically reacting flow on large-scale parallel computers. Data structures that support refinement and dynamic load-balancing are discussed. Results using uniform refinement with mesh sequencing to improve convergence to steady-state solutions are also presented. Three examples are presented: a lid-driven cavity, a thermal convection flow, and a tilted chemical vapor deposition reactor.

  10. Parallel performance optimizations on unstructured mesh-based simulations

    SciTech Connect

    Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid

    2015-06-01

    This paper addresses two key parallelization challenges in the unstructured-mesh-based ocean modeling code MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh and develops methods to generate mesh partitionings with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.

  11. An Optimization Algorithm for Multipath Parallel Allocation for Service Resource in the Simulation Task Workflow

    PubMed Central

    Zhang, Hongjun; Zhang, Rui; Li, Yong; Zhang, Xuliang

    2014-01-01

    Service-oriented modeling and simulation are hot issues in the field of modeling and simulation, and there is a need to call service resources when a simulation task workflow is running. How to optimize the service resource allocation to ensure that the task is completed effectively is an important issue in this area. In the military modeling and simulation field, it is important to improve the probability of success and the timeliness of the simulation task workflow. Therefore, this paper proposes an optimization algorithm for multipath service resource parallel allocation, in which a multipath service resource parallel allocation model is built and a multiple-chains coding scheme quantum optimization algorithm is used for optimization and solution. The multiple-chains coding scheme quantum optimization algorithm extends the parallel search space to improve search efficiency. Through simulation experiments, this paper investigates the effect of different optimization algorithms, service allocation strategies, and path numbers on the probability of success in the simulation task workflow, and the simulation results show that the optimization algorithm for multipath service resource parallel allocation is an effective method to improve the probability of success and timeliness in the simulation task workflow. PMID:24963506

  12. Massively Parallel Reactive and Quantum Molecular Dynamics Simulations

    NASA Astrophysics Data System (ADS)

    Vashishta, Priya

    2015-03-01

    In this talk I will discuss two simulations: Cavitation bubbles readily occur in fluids subjected to rapid changes in pressure. We use billion-atom reactive molecular dynamics simulations on a 163,840-processor BlueGene/P supercomputer to investigate chemical and mechanical damage caused by shock-induced collapse of nanobubbles in water near a silica surface. Collapse of an empty nanobubble generates a high-speed nanojet, resulting in the formation of a pit on the surface. The gas-filled bubbles undergo partial collapse, and consequently the damage on the silica surface is mitigated. Quantum molecular dynamics (QMD) simulations are performed on a 786,432-processor Blue Gene/Q to study on-demand production of hydrogen gas from water using Al nanoclusters. QMD simulations reveal rapid hydrogen production from water by an Al nanocluster. We find a low activation-barrier mechanism, in which a pair of Lewis acid and base sites on the Aln surface preferentially catalyzes hydrogen production. I will also discuss on-demand production of hydrogen gas from water using LiAl alloy particles. Research reported in this lecture was carried out in collaboration with Rajiv Kalia, Aiichiro Nakano and Ken-ichi Nomura from the University of Southern California, and Fuyuki Shimojo and Kohei Shimamura from Kumamoto University, Japan.

  13. Parallel performance optimizations on unstructured mesh-based simulations

    DOE PAGESBeta

    Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid

    2015-06-01

    This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh and develops methods to generate mesh partitions with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores of the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
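
    One common way to realize the kind of cache-friendly data ordering mentioned above is to sort mesh cells by a spatial locality key and permute the cell-centered data arrays accordingly. The sketch below uses a simple Morton (Z-order) key over 2-D cell centroids; the function and array names are hypothetical, and the paper's actual reordering heuristic may differ.

      // Sketch: reorder unstructured-mesh cells by a Morton-order locality key and
      // return the permutation (new index -> old cell id).  Assumes cell centroids
      // have been normalized to [0,1] x [0,1].  Illustrative only.
      #include <algorithm>
      #include <cstdint>
      #include <vector>

      static uint64_t interleave2(uint32_t x, uint32_t y) {
          uint64_t key = 0;
          for (int b = 0; b < 32; ++b) {
              key |= (uint64_t)((x >> b) & 1u) << (2 * b);
              key |= (uint64_t)((y >> b) & 1u) << (2 * b + 1);
          }
          return key;
      }

      std::vector<int> locality_order(const std::vector<double>& cx,
                                      const std::vector<double>& cy) {
          const int n = (int)cx.size();
          std::vector<int> perm(n);
          for (int i = 0; i < n; ++i) perm[i] = i;
          // Keys could be precomputed; recomputed here for brevity.
          auto q = [](double v) { return (uint32_t)(v * 65535.0); };
          std::sort(perm.begin(), perm.end(), [&](int a, int b) {
              return interleave2(q(cx[a]), q(cy[a])) < interleave2(q(cx[b]), q(cy[b]));
          });
          return perm;
      }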

  14. Exploiting quantum parallelism to simulate quantum random many-body systems.

    PubMed

    Paredes, B; Verstraete, F; Cirac, J I

    2005-09-30

    We present an algorithm that exploits quantum parallelism to simulate randomness in a quantum system. In our scheme, all possible realizations of the random parameters are encoded quantum mechanically in a superposition state of an auxiliary system. We show how our algorithm allows for the efficient simulation of dynamics of quantum random spin chains with known numerical methods. We propose an experimental realization based on atoms in optical lattices in which disorder could be simulated in parallel and in a controlled way through the interaction with another atomic species. PMID:16241634

  15. Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

    SciTech Connect

    Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor

    2011-09-06

    We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both redundant data input/output operations and floating-point operations. To further accelerate the simulation, the matrix multiplication calculations were implemented using parallel computing on a graphics processing unit. We used OpenCL, a cross-platform parallel programming language. Numerical experiments show that the combination of these measures can speed up the annual daylighting simulations by factors of 101.7 or 28.6 when the sky vector has 146 or 2306 elements, respectively.

  16. Parallel simulation of subsonic fluid dynamics on a cluster of workstations

    NASA Astrophysics Data System (ADS)

    Skordos, Panayotis A.

    1994-11-01

    An effective approach for simulating fluid dynamics on a cluster of non-dedicated workstations is presented. The approach uses local interaction algorithms, small communication capacity, and automatic migration of parallel processes from busy hosts to free hosts. The approach is well suited for simulating subsonic flow problems that involve both hydrodynamics and acoustic waves, for example, the flow of air inside wind musical instruments. Typical simulations achieve 80% parallel efficiency (speedup/processors) using 20 HP-Apollo workstations. Detailed measurements of the parallel efficiency of 2D and 3D simulations are presented, and a theoretical model of efficiency is developed that closely fits the measurements. Two numerical methods of fluid dynamics are tested: explicit finite differences and the lattice Boltzmann method.
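
    For reference, the 80% parallel efficiency quoted above corresponds to a speedup of 16 on 20 workstations under the standard definition (a generic textbook relation, not a formula from the paper):

      E(p) \;=\; \frac{S(p)}{p} \;=\; \frac{T(1)}{p\,T(p)},
      \qquad \text{e.g. } E(20) = 0.8 \;\Rightarrow\; S(20) = 0.8 \times 20 = 16 .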

  17. Characterization of parallel-hole collimator using Monte Carlo Simulation

    PubMed Central

    Pandey, Anil Kumar; Sharma, Sanjay Kumar; Karunanithi, Sellam; Kumar, Praveen; Bal, Chandrasekhar; Kumar, Rakesh

    2015-01-01

    Objective: Accuracy of in vivo activity quantification improves after the correction of penetrated and scattered photons. However, accurate assessment is not possible with physical experiments. We have used Monte Carlo simulation to accurately assess the contribution of penetrated and scattered photons in the photopeak window. Materials and Methods: Simulations were performed with the Simulation of Imaging Nuclear Detectors Monte Carlo code. The simulations were set up so that they provide geometric, penetration, and scatter components after each simulation and write binary images to a data file. These components were analyzed graphically using Microsoft Excel (Microsoft Corporation, USA). Each binary image was imported into the ImageJ software and a logarithmic transformation was applied for visual assessment of image quality, plotting profiles across the center of the images, and calculating the full width at half maximum (FWHM) in the horizontal and vertical directions. Results: The geometric, penetration, and scatter components at 140 keV for the low-energy general-purpose (LEGP) collimator were 93.20%, 4.13%, and 2.67%, respectively. Similarly, the geometric, penetration, and scatter components at 140 keV for the low-energy high-resolution (LEHR), medium-energy general-purpose (MEGP), and high-energy general-purpose (HEGP) collimators were (94.06%, 3.39%, 2.55%), (96.42%, 1.52%, 2.06%), and (96.70%, 1.45%, 1.85%), respectively. For the MEGP collimator at 245 keV and the HEGP collimator at 364 keV, the corresponding values were 89.10%, 7.08%, 3.82% and 67.78%, 18.63%, 13.59%, respectively. Conclusion: The LEGP and LEHR collimators are best for imaging 140 keV photons. The HEGP collimator can be used for 245 keV and 364 keV; however, corrections for penetration and scatter must be applied if one is interested in quantifying in vivo activity at 364 keV. Due to heavy penetration and scattering, 511 keV photons should not be imaged with the HEGP collimator. PMID:25829730

  18. Partitioning and packing mathematical simulation models for calculation on parallel computers

    NASA Technical Reports Server (NTRS)

    Arpasi, D. J.; Milner, E. J.

    1986-01-01

    The development of multiprocessor simulations from a serial set of ordinary differential equations describing a physical system is described. Degrees of parallelism (i.e., coupling between the equations) and their impact on parallel processing are discussed. The problem of identifying computational parallelism within sets of closely coupled equations that require the exchange of current values of variables is described. A technique is presented for identifying this parallelism and for partitioning the equations for parallel solution on a multiprocessor. An algorithm which packs the equations into a minimum number of processors is also described. The results of the packing algorithm when applied to a turbojet engine model are presented in terms of processor utilization.
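
    As a hedged illustration of the packing idea described above (not the paper's actual algorithm, which targets a minimum number of processors under coupling and timing constraints), a simple "largest cost first, least-loaded processor" greedy assignment of equations to processors can be sketched as follows; the cost estimates and function names are hypothetical.

      // Sketch of a greedy packing of equations onto processors: sort equations by
      // estimated evaluation cost and assign each to the currently least-loaded
      // processor.  Illustrative heuristic only.
      #include <algorithm>
      #include <iostream>
      #include <vector>

      std::vector<int> pack(const std::vector<double>& cost, int nproc) {
          std::vector<int> order(cost.size());
          for (size_t i = 0; i < cost.size(); ++i) order[i] = (int)i;
          std::sort(order.begin(), order.end(),
                    [&](int a, int b) { return cost[a] > cost[b]; });

          std::vector<double> load(nproc, 0.0);
          std::vector<int> owner(cost.size(), -1);
          for (int eq : order) {
              int p = (int)(std::min_element(load.begin(), load.end()) - load.begin());
              owner[eq] = p;              // owner[i] = processor assigned to equation i
              load[p] += cost[eq];
          }
          return owner;
      }

      int main() {
          std::vector<double> cost = {3.0, 1.0, 4.0, 1.5, 2.0, 0.5};
          for (int p : pack(cost, 2)) std::cout << p << ' ';
          std::cout << '\n';
      }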

  19. SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeff S.

    1992-01-01

    Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.

  20. Parallel FEM Simulation of Electromechanics in the Heart

    NASA Astrophysics Data System (ADS)

    Xia, Henian; Wong, Kwai; Zhao, Xiaopeng

    2011-11-01

    Cardiovascular disease is the leading cause of death in America. Computer simulation of the complicated dynamics of the heart could provide valuable quantitative guidance for diagnosis and treatment of heart problems. In this paper, we present an integrated numerical model that encompasses the interaction of cardiac electrophysiology, electromechanics, and mechanoelectrical feedback. The model is solved by the finite element method on a Linux cluster and on Kraken, a Cray XT5 supercomputer. Dynamical interactions between electromechanical coupling and mechanoelectrical feedback are shown.

  1. LARGE-SCALE SIMULATION OF BEAM DYNAMICS IN HIGH INTENSITY ION LINACS USING PARALLEL SUPERCOMPUTERS

    SciTech Connect

    R. RYNE; J. QIANG

    2000-08-01

    In this paper we present results of using parallel supercomputers to simulate beam dynamics in next-generation high intensity ion linacs. Our approach uses a three-dimensional space charge calculation with six types of boundary conditions. The simulations use a hybrid approach involving transfer maps to treat externally applied fields (including rf cavities) and parallel particle-in-cell techniques to treat the space-charge fields. The large-scale simulation results presented here represent a three order of magnitude improvement in simulation capability, in terms of problem size and speed of execution, compared with typical two-dimensional serial simulations. Specific examples will be presented, including simulation of the spallation neutron source (SNS) linac and the Low Energy Demonstrator Accelerator (LEDA) beam halo experiment.

  2. A parallel algorithm for transient solid dynamics simulations with contact detection

    SciTech Connect

    Attaway, S.; Hendrickson, B.; Plimpton, S.; Gardner, D.; Vaughan, C.; Heinstein, M.; Peery, J.

    1996-06-01

    Solid dynamics simulations with Lagrangian finite elements are used to model a wide variety of problems, such as the calculation of impact damage to shipping containers for nuclear waste and the analysis of vehicular crashes. Using parallel computers for these simulations has been hindered by the difficulty of searching efficiently for material surface contacts in parallel. A new parallel algorithm for calculation of arbitrary material contacts in finite element simulations has been developed and implemented in the PRONTO3D transient solid dynamics code. This paper will explore some of the issues involved in developing efficient, portable, parallel finite element models for nonlinear transient solid dynamics simulations. The contact-detection problem poses interesting challenges for efficient implementation of a solid dynamics simulation on a parallel computer. The finite element mesh is typically partitioned so that each processor owns a localized region of the finite element mesh. This mesh partitioning is optimal for the finite element portion of the calculation since each processor must communicate only with the few connected neighboring processors that share boundaries with the decomposed mesh. However, contacts can occur between surfaces that may be owned by any two arbitrary processors. Hence, a global search across all processors is required at every time step to search for these contacts. Load-imbalance can become a problem since the finite element decomposition divides the volumetric mesh evenly across processors but typically leaves the surface elements unevenly distributed. In practice, these complications have been limiting factors in the performance and scalability of transient solid dynamics on massively parallel computers. In this paper the authors present a new parallel algorithm for contact detection that overcomes many of these limitations.

  3. Monte Carlo simulations of converging laser beam propagating in turbid media with parallel computing

    NASA Astrophysics Data System (ADS)

    Wu, Di; Lu, Jun Q.; Hu, Xin H.; Zhao, S. S.

    1999-11-01

    Due to its flexibility and simplicity, the Monte Carlo method is often used to study light propagation in turbid media, where the photons are treated like classical particles that are scattered and absorbed randomly according to radiative transfer theory. However, because a large number of photons is needed to produce statistically significant results, this type of calculation requires large computing resources. To overcome this difficulty, we implemented parallel computing techniques in our Monte Carlo simulations. The algorithm is based on the fact that the classical particles are uncorrelated, so the trajectories of multiple photons can be tracked simultaneously. When a beam of focused light is incident on the medium, the incident photons are divided into groups according to the processors available on a parallel machine and the calculations are carried out in parallel. Using PVM (Parallel Virtual Machine, a parallel computing software package), parallel programs in both C and FORTRAN were developed on the massively parallel computer Cray T3E at the North Carolina Supercomputer Center and on a local PC-cluster network running UNIX/Sun Solaris. The parallel performance of our codes has been excellent on both the Cray T3E and the PC clusters. In this paper, we present results for a focused laser beam propagating through a highly scattering, dilute solution of intralipid. The dependence of the spatial distribution of light near the focal point on the concentration of the intralipid solution is studied and its significance is discussed.
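
    Because the photon histories are statistically independent, the parallel structure reduces to splitting the photon count across processes and summing the tallies at the end. The minimal MPI sketch below shows that pattern; the tally array and the per-photon "random walk" are placeholders, not the authors' C/FORTRAN PVM code.

      // Sketch of embarrassingly parallel photon Monte Carlo: each MPI rank traces an
      // equal share of photons with its own random stream, then the tallies are summed
      // on rank 0 with MPI_Reduce.  The per-photon physics is a placeholder.
      #include <mpi.h>
      #include <cstdio>
      #include <random>
      #include <vector>

      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          const long total_photons = 1000000;
          const long my_photons = total_photons / size + (rank < total_photons % size);
          std::mt19937_64 rng(12345 + rank);            // independent stream per rank
          std::uniform_real_distribution<double> u(0.0, 1.0);

          std::vector<double> local_tally(64, 0.0);     // e.g. radial fluence bins
          for (long i = 0; i < my_photons; ++i) {
              double r = u(rng);                        // placeholder "random walk" result
              local_tally[(size_t)(r * 63.999)] += 1.0;
          }

          std::vector<double> global_tally(64, 0.0);
          MPI_Reduce(local_tally.data(), global_tally.data(), 64, MPI_DOUBLE,
                     MPI_SUM, 0, MPI_COMM_WORLD);
          if (rank == 0) std::printf("bin0 = %g\n", global_tally[0]);
          MPI_Finalize();
          return 0;
      }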

  4. Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations

    NASA Astrophysics Data System (ADS)

    Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.

    2016-07-01

    Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.

  5. Scalable simulations for directed self-assembly patterning with the use of GPU parallel computing

    NASA Astrophysics Data System (ADS)

    Yoshimoto, Kenji; Peters, Brandon L.; Khaira, Gurdaman S.; de Pablo, Juan J.

    2012-03-01

    Directed self-assembly (DSA) patterning has been increasingly investigated as an alternative lithographic process for future technology nodes. One of the critical specs for DSA patterning is defects generated through annealing process or by roughness of pre-patterned structure. Due to their high sensitivity to the process and wafer conditions, however, characterization of those defects still remain challenging. DSA simulations can be a powerful tool to predict the formation of the DSA defects. In this work, we propose a new method to perform parallel computing of DSA Monte Carlo (MC) simulations. A consumer graphics card was used to access its hundreds of processing units for parallel computing. By partitioning the simulation system into non-interacting domains, we were able to run MC trial moves in parallel on multiple graphics-processing units (GPUs). Our results show a significant improvement in computational performance.

  6. Satisfiability Test with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak

    1996-01-01

    Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.

  7. Robust large-scale parallel nonlinear solvers for simulations.

    SciTech Connect

    Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson

    2005-11-01

    This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write
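
    For context, the rank-one Jacobian update at the heart of Broyden's method (standard textbook form, not quoted from the report; the limited-memory variant mentioned above stores only the most recent update vectors) is

      B_{k+1} \;=\; B_k \;+\; \frac{\left(\Delta y_k - B_k\,\Delta s_k\right)\Delta s_k^{\mathsf T}}{\Delta s_k^{\mathsf T}\Delta s_k},
      \qquad \Delta s_k = x_{k+1}-x_k,\quad \Delta y_k = F(x_{k+1})-F(x_k).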

  8. Simulation of reflooding on two parallel heated channel by TRACE

    NASA Astrophysics Data System (ADS)

    Zakir, Md. Ghulam

    2016-07-01

    In the case of a Loss-Of-Coolant Accident (LOCA) in a Boiling Water Reactor (BWR), heat generated in the nuclear fuel is not adequately removed because of the decrease of the coolant mass flow rate in the reactor core. This leads to an increase of the fuel temperature that can cause damage to the core and leakage of radioactive fission products. In order to reflood the core and to stop the temperature increase, an Emergency Core Cooling System (ECCS) delivers water under such conditions. This study investigates how the power distribution between two channels can affect the process of reflooding when the emergency water is injected from the top of the channels. The peak cladding temperature (PCT) during the LOCA transient at different axial levels is determined as well. The thermal-hydraulic system code TRACE has been used. A TRACE model of the two heated channels has been developed, and three hypothetical cases with different power distributions have been studied. Finally, a comparison between simulated and experimental data is shown.

  9. Parallel computation for reservoir thermal simulation: An overlapping domain decomposition approach

    NASA Astrophysics Data System (ADS)

    Wang, Zhongxiao

    2005-11-01

    This dissertation concerns parallel computing for the thermal simulation of multicomponent, multiphase fluid flow in petroleum reservoirs. We report the development and applications of such a simulator. Unlike many efforts that parallelize only the linear-system solver, which affects performance the most, this research takes a global parallelization strategy by decomposing the computational domain into smaller subdomains. This dissertation compares domain decomposition techniques and adopts an overlapping domain decomposition method. This global parallelization method hands each subdomain to a single processor of the parallel computer. Communication is required when handling the overlapping regions between subdomains; for this purpose, MPI (message passing interface) is used for data communication and communication control. A physical and mathematical model is introduced for the reservoir thermal simulation. Numerical tests on two sets of industrial data from practical oilfields indicate that this model and the parallel implementation match the history data accurately. Therefore, we expect to use both the model and the parallel code to predict oil production and guide the design, implementation, and real-time fine tuning of new well operating schemes. A new adaptive mechanism to synchronize processes on different processors has been introduced, which not only ensures computational accuracy but also improves time performance. To accelerate the convergence rate of the iterative solution of the large linear systems derived from the discretization of the governing equations of our physical and mathematical model in space and time, we adopt the ORTHOMIN method in conjunction with an incomplete LU factorization preconditioning technique. Important improvements have been made in both the ORTHOMIN method and the incomplete LU factorization in order to enhance time performance without affecting

  10. Dependability analysis of parallel systems using a simulation-based approach. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Sawyer, Darren Charles

    1994-01-01

    The analysis of dependability in large, complex, parallel systems executing real applications or workloads is examined in this thesis. To effectively demonstrate the wide range of dependability problems that can be analyzed through simulation, the analysis of three case studies is presented. For each case, the organization of the simulation model used is outlined, and the results from simulated fault injection experiments are explained, showing the usefulness of this method in dependability modeling of large parallel systems. The simulation models are constructed using DEPEND and C++. Where possible, methods to increase dependability are derived from the experimental results. Another interesting facet of all three cases is the presence of some kind of workload or application executing in the simulation while faults are injected. This provides a completely new dimension to this type of study, one not possible to model accurately with analytical approaches.

  11. Application of integration algorithms in a parallel processing environment for the simulation of jet engines

    NASA Technical Reports Server (NTRS)

    Krosel, S. M.; Milner, E. J.

    1982-01-01

    The application of predictor-corrector integration algorithms developed for a digital parallel processing environment is investigated. The algorithms are implemented and evaluated through the use of a software simulator that provides an approximate representation of the parallel processing hardware. Test cases that focus on the use of the algorithms are presented, and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real-time performance, interprocessor communication, and algorithm startup are also discussed.

  12. A conflict-free, path-level parallelization approach for sequential simulation algorithms

    NASA Astrophysics Data System (ADS)

    Rasera, Luiz Gustavo; Machado, Péricles Lopes; Costa, João Felipe C. L.

    2015-07-01

    Pixel-based simulation algorithms are the most widely used geostatistical technique for characterizing the spatial distribution of natural resources. However, sequential simulation does not scale well for stochastic simulation on very large grids, which are now commonly found in many petroleum, mining, and environmental studies. With the availability of multiple-processor computers, there is an opportunity to develop parallelization schemes for these algorithms to increase their performance and efficiency. Here we present a conflict-free, path-level parallelization strategy for sequential simulation. The method consists of partitioning the simulation grid into a set of groups of nodes and delegating all available processors for simulation of multiple groups of nodes concurrently. An automated classification procedure determines which groups are simulated in parallel according to their spatial arrangement in the simulation grid. The major advantage of this approach is that it does not require conflict resolution operations, and thus allows exact reproduction of results. Besides offering a large performance gain when compared to the traditional serial implementation, the method provides efficient use of computational resources and is generic enough to be adapted to several sequential algorithms.

  13. Special purpose parallel computer architecture for real-time control and simulation in robotic applications

    NASA Technical Reports Server (NTRS)

    Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

    1993-01-01

    This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.

  14. Virtual Simulator: An infrastructure for design and performance-prediction of massively parallel codes

    NASA Astrophysics Data System (ADS)

    Perumalla, K.; Fujimoto, R.; Pande, S.; Karimabadi, H.; Driscoll, J.; Omelchenko, Y.

    2005-12-01

    Large parallel/distributed scientific simulations are very complex, and their dynamic behavior is hard to predict. Efficient development of massively parallel codes remains a computational challenge. For example, almost none of the kinetic codes in use in space physics today have dynamic load balancing capability. Here we present a new infrastructure for the design and prediction of parallel codes. Performance prediction is useful to analyze, understand and experiment with different partitioning schemes, multiple modeling alternatives and so on, without having to run the application on supercomputers. Instrumentation of the model (with minimal perturbation to performance) is useful to glean key metrics and understand application-level behavior. Unfortunately, traditional approaches to virtual execution and instrumentation are limited by either slow execution speed or low resolution, or both. We present a new high-resolution framework that provides a virtual CPU abstraction (with a full thread context per CPU), yet scales to thousands of virtual CPUs. The tool, called PDES2, presents different levels of modeling interfaces, from general-purpose parallel simulations to parallel grid-based particle-in-cell (PIC) codes. The tool itself runs on multiple processors in order to accommodate the high resolution by distributing the virtual execution across processors. Validation experiments of PIC models in the framework using a 1-D hybrid shock application show close agreement of results from virtual executions with results from actual supercomputer runs. The utility of this tool is further illustrated through an application to a parallel global hybrid code.

  15. Parallel simulation of tsunami inundation on a large-scale supercomputer

    NASA Astrophysics Data System (ADS)

    Oishi, Y.; Imamura, F.; Sugawara, D.

    2013-12-01

    An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. A bottleneck of this approach, however, is the large computational cost of the non-linear inundation simulation, and the computational power of recent massively parallel supercomputers is helpful for enabling faster-than-real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), so it is expected that very fast parallel computers will become more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we target very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on the TUNAMI-N2 model of Tohoku University, which uses a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only in the coastal regions. To balance the computational load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using the CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the
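
    The layer-proportional CPU allocation described above can be written down in a few lines. The sketch below uses a simple largest-remainder rounding and a hypothetical helper name; it assumes at least one CPU per layer and is not the authors' exact scheme.

      // Sketch: allocate ncpu CPUs to nested grid layers in proportion to each layer's
      // number of grid points, with at least one CPU per layer (assumes ncpu >= layers).
      #include <algorithm>
      #include <vector>

      std::vector<int> allocate_cpus(const std::vector<long>& grid_points, int ncpu) {
          const int nl = (int)grid_points.size();
          long total = 0;
          for (long g : grid_points) total += g;

          const int extra = ncpu - nl;             // CPUs beyond the 1-per-layer minimum
          std::vector<int> alloc(nl, 1);
          std::vector<double> frac(nl);
          int assigned = 0;
          for (int i = 0; i < nl; ++i) {
              double share = (double)extra * (double)grid_points[i] / (double)total;
              int whole = (int)share;
              alloc[i] += whole;
              assigned += whole;
              frac[i] = share - whole;             // fractional remainder
          }
          // Hand out the few leftover CPUs to layers with the largest remainders.
          std::vector<int> idx(nl);
          for (int i = 0; i < nl; ++i) idx[i] = i;
          std::sort(idx.begin(), idx.end(),
                    [&](int a, int b) { return frac[a] > frac[b]; });
          for (int k = 0; k < extra - assigned; ++k) alloc[idx[k]] += 1;
          return alloc;                            // alloc[i] = CPUs for layer i
      }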

  16. Transient dynamics simulations: Parallel algorithms for contact detection and smoothed particle hydrodynamics

    SciTech Connect

    Hendrickson, B.; Plimpton, S.; Attaway, S.; Swegle, J.

    1996-09-01

    Transient dynamics simulations are commonly used to model phenomena such as car crashes, underwater explosions, and the response of shipping containers to high-speed impacts. Physical objects in such a simulation are typically represented by Lagrangian meshes because the meshes can move and deform with the objects as they undergo stress. Fluids (gasoline, water) or fluid-like materials (earth) in the simulation can be modeled using the techniques of smoothed particle hydrodynamics. Implementing a hybrid mesh/particle model on a massively parallel computer poses several difficult challenges. One challenge is to simultaneously parallelize and load-balance both the mesh and particle portions of the computation. A second challenge is to efficiently detect the contacts that occur within the deforming mesh and between mesh elements and particles as the simulation proceeds. These contacts impart forces to the mesh elements and particles which must be computed at each timestep to accurately capture the physics of interest. In this paper we describe new parallel algorithms for smoothed particle hydrodynamics and contact detection which turn out to have several key features in common. Additionally, we describe how to join the new algorithms with traditional parallel finite element techniques to create an integrated particle/mesh transient dynamics simulation. Our approach to this problem differs from previous work in that we use three different parallel decompositions, a static one for the finite element analysis and dynamic ones for particles and for contact detection. We have implemented our ideas in a parallel version of the transient dynamics code PRONTO-3D and present results for the code running on a large Intel Paragon.

  17. A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

    NASA Technical Reports Server (NTRS)

    Jones, Mark Howard

    1987-01-01

    A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.

  18. Object-Oriented NeuroSys: Parallel Programs for Simulating Large Networks of Biologically Accurate Neurons

    SciTech Connect

    Pacheco, P; Miller, P; Kim, J; Leese, T; Zabiyaka, Y

    2003-05-07

    Object-oriented NeuroSys (ooNeuroSys) is a collection of programs for simulating very large networks of biologically accurate neurons on distributed-memory parallel computers. It includes two principal programs: ooNeuroSys, a parallel program for solving the large systems of ordinary differential equations arising from the interconnected neurons, and Neurondiz, a parallel program for visualizing the results of ooNeuroSys. Both programs are designed to run on clusters and use the MPI library to obtain parallelism. ooNeuroSys also includes an easy-to-use Python interface, which allows neuroscientists to quickly develop and test complex neuron models. Both ooNeuroSys and Neurondiz have a design that allows for both high performance and relative ease of maintenance.

  19. Efficient parallel algorithm for statistical ion track simulations in crystalline materials

    NASA Astrophysics Data System (ADS)

    Jeon, Byoungseon; Grønbech-Jensen, Niels

    2009-02-01

    We present an efficient parallel algorithm for statistical molecular dynamics simulations of ion tracks in solids. The method is based on the Rare Event Enhanced Domain following Molecular Dynamics (REED-MD) algorithm, which has been successfully applied to studies of, e.g., ion implantation into crystalline semiconductor wafers. We discuss strategies for parallelizing the method and settle on a host-client polling scheme in which multiple asynchronous client processes continuously feed results to the host, which, in turn, distributes the resulting feedback information to the clients. This real-time feedback consists of, e.g., cumulative damage information or statistics updates necessary for the cloning step in the rare-event algorithm. We finally demonstrate the algorithm for radiation effects in a nuclear oxide fuel and show that the balanced parallel approach achieves high parallel efficiency in multiple-processor configurations.

  20. Wake Encounter Analysis for a Closely Spaced Parallel Runway Paired Approach Simulation

    NASA Technical Reports Server (NTRS)

    Mckissick, Burnell T.; Rico-Cusi, Fernando J.; Murdoch, Jennifer; Oseguera-Lohr, Rosa M.; Stough, Harry P., III; O'Connor, Cornelius J.; Syed, Hazari I.

    2009-01-01

    A Monte Carlo simulation of simultaneous approaches performed by two transport category aircraft from the final approach fix to a pair of closely spaced parallel runways was conducted to explore the aft boundary of the safe zone in which separation assurance and wake avoidance are provided. The simulation included variations in runway centerline separation, initial longitudinal spacing of the aircraft, crosswind speed, and aircraft speed during the approach. The data from the simulation showed that the majority of the wake encounters occurred near or over the runway and the aft boundaries of the safe zones were identified for all simulation conditions.

  1. A parallel computational framework for integrated surface-subsurface flow and transport simulations

    NASA Astrophysics Data System (ADS)

    Park, Y.; Hwang, H.; Sudicky, E. A.

    2010-12-01

    HydroGeoSphere is a 3D control-volume finite element hydrologic model describing fully integrated surface and subsurface water flow and solute and thermal energy transport. Because the model solves tightly coupled, highly nonlinear partial differential equations, often applied at regional and continental scales (for example, to analyze the impact of climate change on water resources), high performance computing (HPC) is essential. The target parallelization includes the composition of the Jacobian matrix for the iterative linearization method and the sparse-matrix solver, a preconditioned Bi-CGSTAB. The matrix assembly is parallelized using a coarse-grained scheme in which the local matrix contributions can be computed independently. The preconditioned Bi-CGSTAB algorithm performs a number of LU substitutions, matrix-vector multiplications, and inner products, where the parallelization of the LU substitution is not trivial. The parallelization of the solver is achieved by partitioning the domain into equal-size subdomains, with an efficient reordering scheme. The computational flow of the Bi-CGSTAB solver is also modified to reduce the parallelization overhead and to be suitable for parallel architectures. The parallelized model is tested on several benchmark simulations which include linear and nonlinear flow problems involving various domain sizes and degrees of hydrologic complexity. The performance is evaluated in terms of computational robustness and efficiency, using standard scaling performance measures. The results of simulation profiling indicate that the efficiency becomes higher with an increasing number of nodes/elements in the mesh, for increasingly nonlinear transient simulations, and with domains of irregular geometry. These characteristics are promising for the large-scale analysis of water resources problems involving integrated surface/subsurface flow regimes.

  2. IB: a Monte Carlo Simulation Tool for Neutron Scattering Instrument Design under Parallel Virtual Machine

    SciTech Connect

    Zhao, Jinkui

    2011-01-01

    IB is a Monte Carlo simulation tool for aiding neutron scattering instrument design. It is written in C++ and implemented under Parallel Virtual Machine. The program has a few basic components, or modules, that can be used to build a virtual neutron scattering instrument. More complex components, such as neutron guides and multichannel beam benders, can be constructed using the grouping technique unique to IB. Users can specify a collection of modules as a group; for example, a neutron guide can be constructed by grouping four neutron mirrors together that make up the four sides of the guide. IB's simulation engine ensures that neutrons entering a group will be properly operated upon by all members of the group. For simulations that require higher computational speed, the program can be run in parallel mode under the PVM architecture. Initially written for designing instruments at pulsed neutron sources, the program has since been used to simulate reactor-based instruments as well.
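
    A grouping mechanism of this kind can be expressed as a composite module whose propagate call forwards each neutron to all of its members, so that, e.g., four mirrors grouped together act as one guide. The class names, toy physics, and interfaces below are hypothetical illustrations, not IB's actual C++ API.

      // Sketch of a composite "group" of beamline modules.
      #include <memory>
      #include <vector>

      struct Neutron { double x, y, z, vx, vy, vz, weight; };

      struct Module {
          virtual ~Module() = default;
          virtual void propagate(Neutron& n) const = 0;
      };

      struct Mirror : Module {
          explicit Mirror(double reflectivity) : r(reflectivity) {}
          void propagate(Neutron& n) const override { n.weight *= r; }  // toy physics
          double r;
      };

      struct Group : Module {
          void add(std::unique_ptr<Module> mod) { members.push_back(std::move(mod)); }
          void propagate(Neutron& n) const override {
              for (const auto& mod : members) mod->propagate(n);  // every member acts
          }
          std::vector<std::unique_ptr<Module>> members;
      };

      int main() {
          Group guide;                              // a "guide" built from four mirror sides
          for (int side = 0; side < 4; ++side)
              guide.add(std::make_unique<Mirror>(0.99));
          Neutron n{0, 0, 0, 0, 0, 1, 1.0};
          guide.propagate(n);                       // weight becomes 0.99^4
      }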

  3. Parallel Brownian dynamics simulations with the message-passing and PGAS programming models

    NASA Astrophysics Data System (ADS)

    Teijeiro, C.; Sutmann, G.; Taboada, G. L.; Touriño, J.

    2013-04-01

    The simulation of particle dynamics is among the most important mechanisms to study the behavior of molecules in a medium under specific conditions of temperature and density. Several models can be used to compute efficiently the forces that act on each particle, and also the interactions between them. This work presents the design and implementation of a parallel simulation code for the Brownian motion of particles in a fluid. Two different parallelization approaches have been followed: (1) using traditional distributed memory message-passing programming with MPI, and (2) using the Partitioned Global Address Space (PGAS) programming model, oriented towards hybrid shared/distributed memory systems, with the Unified Parallel C (UPC) language. Different techniques for domain decomposition and work distribution are analyzed in terms of efficiency and programmability, in order to select the most suitable strategy. Performance results on a supercomputer using up to 2048 cores are also presented for both MPI and UPC codes.
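
    The core per-particle update that such codes distribute over domains is short. Below is a minimal serial sketch of the overdamped (Euler-Maruyama) Brownian step for free diffusion, a generic textbook form rather than the paper's exact force model; the function names are hypothetical.

      // Sketch of one Brownian dynamics step for non-interacting particles:
      // x += sqrt(2 D dt) * N(0,1)  (free diffusion, no external force).
      #include <cmath>
      #include <random>
      #include <vector>

      void brownian_step(std::vector<double>& x, double D, double dt,
                         std::mt19937& rng) {
          std::normal_distribution<double> gauss(0.0, 1.0);
          const double sigma = std::sqrt(2.0 * D * dt);
          for (double& xi : x) xi += sigma * gauss(rng);   // independent per particle
      }

      int main() {
          std::mt19937 rng(42);
          std::vector<double> x(1000, 0.0);                // 1-D positions
          for (int step = 0; step < 10000; ++step) brownian_step(x, 1.0, 1e-3, rng);
      }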

  4. Explicit spatial scattering for load balancing in conservatively synchronized parallel discrete-event simulations

    SciTech Connect

    Thulasidasan, Sunil; Kasiviswanathan, Shiva; Eidenbenz, Stephan; Romero, Philip

    2010-01-01

    We re-examine the problem of load balancing in conservatively synchronized parallel, discrete-event simulations executed on high-performance computing clusters, focusing on simulations where computational and messaging load tend to be spatially clustered. Such domains are frequently characterized by the presence of geographic 'hot-spots' - regions that generate significantly more simulation events than others. Examples of such domains include simulation of urban regions, transportation networks and networks where interaction between entities is often constrained by physical proximity. Noting that in conservatively synchronized parallel simulations, the speed of execution of the simulation is determined by the slowest (i.e most heavily loaded) simulation process, we study different partitioning strategies in achieving equitable processor-load distribution in domains with spatially clustered load. In particular, we study the effectiveness of partitioning via spatial scattering to achieve optimal load balance. In this partitioning technique, nearby entities are explicitly assigned to different processors, thereby scattering the load across the cluster. This is motivated by two observations, namely, (i) since load is spatially clustered, spatial scattering should, intuitively, spread the load across the compute cluster, and (ii) in parallel simulations, equitable distribution of CPU load is a greater determinant of execution speed than message passing overhead. Through large-scale simulation experiments - both of abstracted and real simulation models - we observe that scatter partitioning, even with its greatly increased messaging overhead, significantly outperforms more conventional spatial partitioning techniques that seek to reduce messaging overhead. Further, even if hot-spots change over the course of the simulation, if the underlying feature of spatial clustering is retained, load continues to be balanced with spatial scattering leading us to the observation that
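
    The contrast between conventional block partitioning and the scatter partitioning studied here can be sketched in a few lines. The example below assumes entities are already ordered by a 1-D spatial index; the paper's actual partitioner is more elaborate.

      // Sketch: two ways to map spatially ordered entities onto P processors.
      // Block partitioning keeps neighbours together (low messaging, poor balance when
      // load is spatially clustered); scatter partitioning deals nearby entities to
      // different processors round-robin (higher messaging, better balance).
      #include <vector>

      std::vector<int> block_partition(int nentities, int nproc) {
          std::vector<int> owner(nentities);
          int per = (nentities + nproc - 1) / nproc;
          for (int i = 0; i < nentities; ++i) owner[i] = i / per;
          return owner;
      }

      std::vector<int> scatter_partition(int nentities, int nproc) {
          std::vector<int> owner(nentities);
          for (int i = 0; i < nentities; ++i) owner[i] = i % nproc;  // round-robin
          return owner;
      }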

  5. Model for the evolution of the time profile in optimistic parallel discrete event simulations

    NASA Astrophysics Data System (ADS)

    Ziganurova, L.; Novotny, M. A.; Shchur, L. N.

    2016-02-01

    We investigate synchronisation aspects of an optimistic algorithm for parallel discrete event simulations (PDES). We present a model for the time evolution in optimistic PDES. This model evaluates the local virtual time profile of the processing elements. We argue that the evolution of the time profile is reminiscent of the surface profile in the directed percolation problem and in unrestricted surface growth. We present results of the simulation of the model and emphasise predictive features of our approach.

  6. Xyce parallel electronic simulator reference guide, Version 6.0.1.

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.

    2014-01-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].

  7. Study of Fracture in SiC by Parallel Molecular Dynamics Simulations

    NASA Astrophysics Data System (ADS)

    Chatterjee, A.; Omeltchenko, A.; Kalia, R. K.; Vashishta, P.

    1997-03-01

    Large scale molecular-dynamics simulations are performed on parallel architectures to investigate dynamic fracture in SiC. The simulations are based on an empirical bond-order potential proposed by Tersoff (J. Tersoff, Phys. Rev. B 39, 5566 (1989); M. Tang and S. Yip, Phys. Rev. B 52, 15150 (1995)). Results will be presented for crack-front morphology, crack-tip speed, and the effect of strain rate on dynamic fracture.

  8. Massively parallel simulation of flow and transport in variably saturated porous and fractured media

    SciTech Connect

    Wu, Yu-Shu; Zhang, Keni; Pruess, Karsten

    2002-01-15

    This paper describes a massively parallel simulation method and its application for modeling multiphase flow and multicomponent transport in porous and fractured reservoirs. The parallel-computing method has been implemented into the TOUGH2 code and its numerical performance is tested on a Cray T3E-900 and IBM SP. The efficiency and robustness of the parallel-computing algorithm are demonstrated by completing two simulations with more than one million gridblocks, using site-specific data obtained from a site-characterization study. The first application involves the development of a three-dimensional numerical model for flow in the unsaturated zone of Yucca Mountain, Nevada. The second application is the study of tracer/radionuclide transport through fracture-matrix rocks for the same site. The parallel-computing technique enhances modeling capabilities by achieving several-orders-of-magnitude speedup for large-scale and high resolution modeling studies. The resulting modeling results provide many new insights into flow and transport processes that could not be obtained from simulations using the single-CPU simulator.

  9. A Plane-Parallel Wind Solution for Testing Numerical Simulations of Photoevaporation

    NASA Astrophysics Data System (ADS)

    Hutchison, Mark A.; Laibe, Guillaume

    2016-04-01

    Here, we derive a Parker-wind-like solution for a stratified, plane-parallel atmosphere undergoing photoionisation. The difference compared to the standard Parker solar wind is that the sonic point is crossed only at infinity. The simplicity of the analytic solution makes it a convenient test problem for numerical simulations of photoevaporation in protoplanetary discs.

  10. A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

    NASA Technical Reports Server (NTRS)

    Carroll, Chester C.; Owen, Jeffrey E.

    1988-01-01

    A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations onto a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high-level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer, without the sacrifice in execution speed normally expected with digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.

  11. Accelerating Markov chain Monte Carlo simulation through sequential updating and parallel computing

    NASA Astrophysics Data System (ADS)

    Ren, Ruichao

    Monte Carlo simulation is a statistical sampling method used in studies of physical systems with properties that cannot be easily obtained analytically. The phase behavior of the Restricted Primitive Model of electrolyte solutions on the simple cubic lattice is studied using grand canonical Monte Carlo simulations and finite-size scaling techniques. The transition between disordered and ordered, NaCl-like structures is continuous (second-order) at high temperatures and discontinuous (first-order) at low temperatures. The line of continuous transitions meets the line of first-order transitions at a tricritical point. A new algorithm, Random Skipping Sequential (RSS) Monte Carlo, is proposed, justified, and shown analytically to have better mobility over phase space than the conventional Metropolis algorithm satisfying strict detailed balance. The new algorithm employs sequential updating and yields greatly enhanced sampling statistics compared with the Metropolis algorithm with random updating. A parallel version of Markov chain theory is introduced and applied to accelerating Monte Carlo simulation via cluster computing. It is shown that sequential updating is the key to reducing the inter-processor communication or synchronization that slows down parallel simulation as the number of processors increases. Parallel simulation results for the two-dimensional lattice gas model show a substantial reduction of simulation time by the new method for systems of moderate and large sizes.
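
    The distinction between random and sequential updating that underlies the RSS idea can be illustrated with a simple Metropolis sweep over a lattice. The sketch below uses a generic 1-D Ising chain (spins ±1, coupling J = 1) as a toy stand-in, not the dissertation's lattice electrolyte model.

      // Sketch: one Metropolis sweep with sequential updating (sites visited in order)
      // versus random updating (a random site per attempt).  Toy Ising chain only.
      #include <cmath>
      #include <random>
      #include <vector>

      void sequential_sweep(std::vector<int>& s, double beta, std::mt19937& rng) {
          std::uniform_real_distribution<double> u(0.0, 1.0);
          const int n = (int)s.size();
          for (int i = 0; i < n; ++i) {                    // fixed visiting order
              int left = s[(i + n - 1) % n], right = s[(i + 1) % n];
              double dE = 2.0 * s[i] * (left + right);     // energy change of a flip
              if (dE <= 0.0 || u(rng) < std::exp(-beta * dE)) s[i] = -s[i];
          }
      }

      void random_sweep(std::vector<int>& s, double beta, std::mt19937& rng) {
          std::uniform_real_distribution<double> u(0.0, 1.0);
          std::uniform_int_distribution<int> pick(0, (int)s.size() - 1);
          const int n = (int)s.size();
          for (int t = 0; t < n; ++t) {
              int i = pick(rng);                           // random site each attempt
              int left = s[(i + n - 1) % n], right = s[(i + 1) % n];
              double dE = 2.0 * s[i] * (left + right);
              if (dE <= 0.0 || u(rng) < std::exp(-beta * dE)) s[i] = -s[i];
          }
      }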

  12. Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube

    NASA Technical Reports Server (NTRS)

    Joslin, Ronald D.; Zubair, Mohammad

    1993-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors, nearly ideal linear speedups are achieved with non-optimized routines; slower-than-linear speedups are achieved with optimized (machine-dependent library) routines. This slower-than-linear speedup occurs because the Fast Fourier Transform (FFT) routine dominates the computational cost and exhibits less-than-ideal speedups. However, with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise, wall-normal, and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single-processor time to complete a comparable simulation; however, it is estimated that a subgrid-scale model, which reduces the required number of grid points and turns the approach into a parallel spatial large-eddy simulation (PSLES), would reduce the computational cost and memory requirements by a factor of 10 compared to the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.

  13. A parallel finite element simulator for ion transport through three-dimensional ion channel systems.

    PubMed

    Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo

    2013-09-15

    A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain to get the surface and volume mesh for ion channel systems, which consists of a set of mesh generation tools. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG) developed by one of the authors, which provides the capability of doing large scale parallel computations with high parallel efficiency and the flexibility of choosing high order elements to achieve high order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I - V curve, with which both primitive and transformed PNP equations are studied and their numerical performances are compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage dependent anion channel (VDAC) and α-Hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite size effects can be included in PNP model now, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL. It is shown that the size effects in SMPNP can

  14. Parallel peridynamics-SPH simulation of explosion induced soil fragmentation by using OpenMP

    NASA Astrophysics Data System (ADS)

    Fan, Houfu; Li, Shaofan

    2016-06-01

    In this work, we use OpenMP-based shared-memory parallel programming to implement the recently developed coupling method of state-based peridynamics and smoothed particle hydrodynamics (PD-SPH), and we then employ the program to simulate dynamic soil fragmentation induced by the explosion of buried explosives. The paper offers a detailed technical description and discussion of the PD-SPH coupling algorithm and of how to use OpenMP shared-memory programming to implement such large-scale computations in a desktop environment, with an example to illustrate the basic computing principle and the parallel algorithm structure. Specifically, the paper provides a complete OpenMP parallel algorithm for the PD-SPH scheme with the programming and parallelization details. Numerical examples of soil fragmentation caused by buried explosives are also presented. Results show that the simulation carried out by the OpenMP parallel code is much faster than that by the corresponding serial computer code.
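
    A minimal sketch of the kind of OpenMP loop-level parallelism described in this record follows; it is not the authors' PD-SPH code. The pair kernel, particle layout, and scheduling parameters are illustrative assumptions; only the pattern of threading the per-particle force loop is the point.

      #include <math.h>
      #include <omp.h>
      #include <stdio.h>

      #define N 10000          /* number of particles (illustrative size)       */
      #define CUTOFF 0.05      /* interaction cutoff, stands in for SPH support */

      /* Toy shared-memory force loop.  The kernel below is a simple cutoff
         repulsion, NOT the PD-SPH interaction; it only shows the threading. */
      int main(void)
      {
          static double x[N], f[N];

          for (int i = 0; i < N; ++i) {        /* place particles on a line */
              x[i] = (double)i / N;
              f[i] = 0.0;
          }

          /* Each thread owns a disjoint range of i, so the writes to f[i]
             never race; the positions x[] are only read inside the loop.  */
          #pragma omp parallel for schedule(dynamic, 128)
          for (int i = 0; i < N; ++i) {
              double fi = 0.0;
              for (int j = 0; j < N; ++j) {
                  if (j == i) continue;
                  double r = x[i] - x[j];
                  if (fabs(r) < CUTOFF)
                      fi += r / (r * r + 1.0e-12);   /* toy repulsive kernel */
              }
              f[i] = fi;
          }

          printf("force on particle 0: %g\n", f[0]);
          return 0;
      }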

  15. Relationship between parallel faults and stress field in rock mass based on numerical simulation

    NASA Astrophysics Data System (ADS)

    Imai, Y.; Mikada, H.; Goto, T.; Takekawa, J.

    2012-12-01

    Parallel cracks and faults, caused by earthquakes and crustal deformations, are often observed at various scales, from regional to laboratory scales. However, the mechanism of formation of these parallel faults has not been quantitatively clarified yet. Since the stress field plays a key role in the nucleation of parallel faults, it is fundamental to investigate the failure and the extension of cracks in a large-scale rock mass (not in a laboratory-scale specimen) due to a mechanically loaded stress field. In this study, we developed a numerical simulation code for rock mass failure under different loading conditions, and conducted rock failure experiments using this code. We assumed a numerical rock mass consisting of basalt with a rectangular shape for the model. We also assumed that the failure of the rock mass follows the Mohr-Coulomb criterion, and that the distribution of the initial tensile and compressive strength of the rock elements follows the Weibull model. In this study, we use the Hamiltonian Particle Method (HPM), one of the particle methods, to represent large deformation and the destruction of materials. Our simulation results suggest that the confining pressure would have a dominant influence on the initiation of parallel faults and their conjugates under compressive conditions. We conclude that the shearing force would promote the propagation of parallel fractures along the shearing direction, but prevent that of fractures in the conjugate direction.

  16. The distributed diagonal force decomposition method for parallelizing molecular dynamics simulations.

    PubMed

    Borštnik, Urban; Miller, Benjamin T; Brooks, Bernard R; Janežič, Dušanka

    2011-11-15

    Parallelization is an effective way to reduce the computational time needed for molecular dynamics simulations. We describe a new parallelization method, the distributed-diagonal force decomposition method, with which we extend and improve the existing force decomposition methods. Our new method requires less data communication during molecular dynamics simulations than replicated data and current force decomposition methods, increasing the parallel efficiency. It also dynamically balances the processors' computational load throughout the simulation. The method is readily implemented in existing molecular dynamics codes and it has been incorporated into the CHARMM program, allowing its immediate use in conjunction with the many molecular dynamics simulation techniques that are already present in the program. We also present the design of the Force Decomposition Machine, a cluster of personal computers and networks that is tailored to running molecular dynamics simulations using the distributed diagonal force decomposition method. The design is expandable and provides various degrees of fault resilience. This approach is easily adaptable to computers with Graphics Processing Units because it is independent of the processor type being used. PMID:21793007

  17. Parallel Grand Canonical Monte Carlo (ParaGrandMC) Simulation Code

    NASA Technical Reports Server (NTRS)

    Yamakov, Vesselin I.

    2016-01-01

    This report provides an overview of the Parallel Grand Canonical Monte Carlo (ParaGrandMC) simulation code. This is a highly scalable parallel FORTRAN code for simulating the thermodynamic evolution of metal alloy systems at the atomic level, and predicting the thermodynamic state, phase diagram, chemical composition and mechanical properties. The code is designed to simulate multi-component alloy systems, predict solid-state phase transformations such as austenite-martensite transformations, precipitate formation, recrystallization, capillary effects at interfaces, surface absorption, etc., which can aid the design of novel metallic alloys. While the software is mainly tailored for modeling metal alloys, it can also be used for other types of solid-state systems, and to some degree for liquid or gaseous systems, including multiphase systems forming solid-liquid-gas interfaces.

  18. Parallel Simulation Algorithms for the Three Dimensional Strong-Strong Beam-Beam Interaction

    SciTech Connect

    Kabel, A.C.; /SLAC

    2008-03-17

    The strong-strong beam-beam effect is one of the most important effects limiting the luminosity of ring colliders. Little is known about it analytically, so most studies rely on numerical simulations. The two-dimensional realm is readily accessible to workstation-class computers (cf., e.g., [1, 2]), while three dimensions, which add effects such as phase averaging and the hourglass effect, require vastly higher amounts of CPU time. Thus, parallelization of three-dimensional simulation techniques is imperative; in the following we discuss parallelization strategies and describe the algorithms used in our simulation code, which reaches almost linear scaling of performance with the number of CPUs for typical setups.

  19. Parallel 3D Multi-Stage Simulation of a Turbofan Engine

    NASA Technical Reports Server (NTRS)

    Turner, Mark G.; Topp, David A.

    1998-01-01

    A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. 20 processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scalable with the number of blade rows. Enough flips are run (between 50 and 200) so the solution in the entire machine is not changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee the parallelization was done correctly. The domain decomposition is done only in the axial direction since the number of points axially is much larger than the other two directions. This code uses MPI for message passing. The parallel speed up of the solver portion (no I/O or body force

  20. Application of parallel computing to seismic damage process simulation of an arch dam

    NASA Astrophysics Data System (ADS)

    Zhong, Hong; Lin, Gao; Li, Jianbo

    2010-06-01

    The simulation of the damage process of a high arch dam subjected to strong earthquake shocks is significant to the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation requires rigorous computational capacity. Conventional serial computing falls short of that, and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams, utilizing a damage model that accounts for the heterogeneity of concrete. Developed with the programming language Fortran, the code uses a master/slave mode for programming, the domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication, and solvers from the AZTEC library for the solution of large-scale equations. A speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of an arch dam under construction on a 4-node PC cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of a shaking table test, indicating that the proposed procedure and the parallel code PDPAD have good potential in simulating the seismic damage modes of arch dams. With the rapidly growing need for massive computation emerging from engineering problems, parallel computing will find more and more applications in pertinent areas.

  1. Large-scale numerical simulation of laser propulsion by parallel computing

    NASA Astrophysics Data System (ADS)

    Zeng, Yaoyuan; Zhao, Wentao; Wang, Zhenghua

    2013-05-01

    As one of the most significant methods to study laser-propelled rockets, the numerical simulation of laser propulsion has drawn ever-increasing attention. Nevertheless, the traditional serial simulation model cannot satisfy practical needs because of excessive memory overhead and considerable computation time. In order to solve this problem, we study a general algorithm for laser propulsion design and parallelize it by using a two-level hybrid parallel programming model. The total computing domain is decomposed into distributed data spaces, and each partition is assigned to an MPI process. A single computation step operates at the inner loop level, where a compiler directive is used to split each MPI process into several OpenMP threads. Finally, the parallel efficiency of the hybrid program for two typical configurations on a China-made supercomputer with 4 to 256 cores is compared with that of a pure MPI program. The hybrid program exhibits better performance than the pure MPI program on the whole, roughly as expected. The result indicates that our hybrid parallel approach is effective and practical in large-scale numerical simulation of laser propulsion.
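
    The two-level model described above can be illustrated with a minimal hybrid MPI+OpenMP sketch, assuming one MPI process per subdomain and OpenMP threads inside each process; the cell update, diagnostic, and problem sizes below are placeholders, not the laser-propulsion solver itself.

      #include <mpi.h>
      #include <omp.h>
      #include <stdio.h>
      #include <stdlib.h>

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);

          int rank, nprocs;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

          const int ncells = 1000000 / nprocs;          /* cells owned by this rank */
          double *u = calloc((size_t)ncells, sizeof *u);

          for (int step = 0; step < 100; ++step) {
              /* Inner level: OpenMP threads split the loop over local cells. */
              #pragma omp parallel for
              for (int i = 0; i < ncells; ++i)
                  u[i] += 1.0e-3 * (double)step;        /* placeholder update */

              /* Outer level: MPI ranks combine a diagnostic (a global sum). */
              double local = 0.0, global = 0.0;
              for (int i = 0; i < ncells; ++i) local += u[i];
              MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
              if (rank == 0 && step % 50 == 0)
                  printf("step %d, global sum %g\n", step, global);
          }

          free(u);
          MPI_Finalize();
          return 0;
      }

    A program of this form is typically built with an MPI compiler wrapper plus OpenMP enabled (e.g. mpicc -fopenmp) and launched with one rank per node and OMP_NUM_THREADS set to the number of cores per node.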

  2. Re-forming supercritical quasi-parallel shocks. I - One- and two-dimensional simulations

    NASA Technical Reports Server (NTRS)

    Thomas, V. A.; Winske, D.; Omidi, N.

    1990-01-01

    The process of reforming supercritical quasi-parallel shocks is investigated using one-dimensional and two-dimensional hybrid (particle ion, massless fluid electron) simulations, both of shocks and of simpler two-stream interactions. It is found that the supercritical quasi-parallel shock is not steady. Instead of a well-defined shock ramp between upstream and downstream states that remains at a fixed position in the flow, the ramp periodically steepens, broadens, and then reforms upstream of its former position. It is concluded that the wave generation process is localized at the shock ramp and that the reformation process proceeds in the absence of upstream perturbations intersecting the shock.

  3. Progress on the Multiphysics Capabilities of the Parallel Electromagnetic ACE3P Simulation Suite

    SciTech Connect

    Kononenko, Oleksiy

    2015-03-26

    ACE3P is a 3D parallel simulation suite that is being developed at SLAC National Accelerator Laboratory. Effectively utilizing supercomputer resources, ACE3P has become a key tool for the coupled electromagnetic, thermal and mechanical research and design of particle accelerators. Based on the existing finite-element infrastructure, a massively parallel eigensolver is developed for modal analysis of mechanical structures. It complements a set of the multiphysics tools in ACE3P and, in particular, can be used for the comprehensive study of microphonics in accelerating cavities ensuring the operational reliability of a particle accelerator.

  4. Parallel simulations of Grover's algorithm for closest match search in neutron monitor data

    NASA Astrophysics Data System (ADS)

    Kussainov, Arman; White, Yelena

    We are studying parallel implementations of Grover's closest match search algorithm for neutron monitor data analysis. This includes data formatting and matching the quantum parameters to the conventional structures of the chosen programming language and the selected experimental data type. We have employed several workload distribution models based on the acquired data and search parameters. As a result of these simulations, we have an understanding of potential problems that may arise during configuration of real quantum computational devices and the way they could run tasks in parallel. The work was supported by the Science Committee of the Ministry of Science and Education of the Republic of Kazakhstan, Grant #2532/GF3.

  5. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

    NASA Astrophysics Data System (ADS)

    Abraham, Mark James; Murtola, Teemu; Schulz, Roland; Páll, Szilárd; Smith, Jeremy C.; Hess, Berk; Lindahl, Erik

    2015-09-01

    GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU-GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.

  6. Application of parallel computing techniques to a large-scale reservoir simulation

    SciTech Connect

    Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten

    2001-02-01

    Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from intensive computational requirement for detailed modeling investigations of real-world reservoirs. This paper presents the application of a massive parallel-computing version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of the future flow conditions at the site, aiding in the assessment of proposed repository performance.

  7. Design of a real-time wind turbine simulator using a custom parallel architecture

    NASA Technical Reports Server (NTRS)

    Hoffman, John A.; Gluck, R.; Sridhar, S.

    1995-01-01

    The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CUs) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CUs are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an I/O operation and a combined multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CUs are interfaced to each other and to other portions of the simulator using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CUs can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors, which usually have a throughput limit because of rigid bus architecture.

  8. A method for data handling numerical results in parallel OpenFOAM simulations

    NASA Astrophysics Data System (ADS)

    Anton, Alin; Muntean, Sebastian

    2015-12-01

    Parallel computational fluid dynamics simulations produce vast amounts of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM® toolkit [1]. The space savings obtained with classic algorithms remain constant for more than 60 GB of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large-scale simulation results than the regular algorithms.

  9. Adventures in Parallel Processing: Entry, Descent and Landing Simulation for the Genesis and Stardust Missions

    NASA Technical Reports Server (NTRS)

    Lyons, Daniel T.; Desai, Prasun N.

    2005-01-01

    This paper will describe the Entry, Descent and Landing simulation tradeoffs and techniques that were used to provide the Monte Carlo data required to approve entry during a critical period just before entry of the Genesis Sample Return Capsule. The same techniques will be used again when Stardust returns on January 15, 2006. Only one hour was available for the simulation which propagated 2000 dispersed entry states to the ground. Creative simulation tradeoffs combined with parallel processing were needed to provide the landing footprint statistics that were an essential part of the Go/NoGo decision that authorized release of the Sample Return Capsule a few hours before entry.

  10. A method for data handling numerical results in parallel OpenFOAM simulations

    SciTech Connect

    Anton, Alin; Muntean, Sebastian

    2015-12-31

    Parallel computational fluid dynamics simulations produce vast amounts of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM® toolkit [1]. The space savings obtained with classic algorithms remain constant for more than 60 GB of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large-scale simulation results than the regular algorithms.

  11. A new parallel P3M code for very large-scale cosmological simulations

    NASA Astrophysics Data System (ADS)

    MacFarland, Tom; Couchman, H. M. P.; Pearce, F. R.; Pichlmeier, Jakob

    1998-12-01

    We have developed a parallel Particle-Particle, Particle-Mesh (P3M) simulation code for the Cray T3E parallel supercomputer that is well suited to studying the time evolution of systems of particles interacting via gravity and gas forces in cosmological contexts. The parallel code is based upon the public-domain serial Adaptive P3M-SPH (http://coho.astro.uwo.ca/pub/hydra/hydra.html) code of Couchman et al. (1995) [ApJ, 452, 797]. The algorithm resolves gravitational forces into a long-range component, computed by discretizing the mass distribution and solving Poisson's equation on a grid using an FFT convolution method, and a short-range component, computed by direct force summation for sufficiently close particle pairs. The code consists primarily of a particle-particle computation parallelized by domain decomposition over blocks of neighbour-cells, a more regular mesh calculation distributed in planes along one dimension, and several transformations between the two distributions. The load balancing of the P3M code is static, since this greatly aids the ongoing implementation of parallel adaptive refinements of the particle and mesh systems. Great care was taken throughout to make optimal use of the available memory, so that a version of the current implementation has been used to simulate systems of up to 10^9 particles with a 1024^3 mesh for the long-range force computation. These are the largest cosmological N-body simulations of which we are aware. We discuss these memory optimizations as well as those motivated by computational performance. Performance results are very encouraging, and, even without refinements, the code has been used effectively for simulations in which the particle distribution becomes highly clustered, as well as for other non-uniform systems of astrophysical interest.
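
    In a generic P3M formulation (a textbook decomposition, not necessarily the exact kernels of this code), the force on particle i is split into a mesh-resolved long-range part and a direct-sum short-range correction:

      \[
      F_i \;=\; F_i^{\mathrm{PM}} \;+\; \sum_{j:\, r_{ij} < r_{\mathrm{cut}}} F^{\mathrm{short}}(r_{ij}),
      \qquad
      \nabla^2 \phi_{\mathrm{mesh}} \;=\; 4\pi G\,\rho_{\mathrm{mesh}},
      \]

    where the mesh potential is obtained by FFT convolution of the gridded density with the Green's function of the Poisson equation, and the short-range term restores the full Newtonian force for pairs closer than the cutoff r_cut.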

  12. Parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada.

    PubMed

    Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G S

    2003-01-01

    This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented into the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-1-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the 1-million-cell models produce results with better resolution and reveal some flow patterns that cannot be obtained using coarse-grid models. PMID:12714301

  13. Massively parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada

    SciTech Connect

    Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.

    2001-08-31

    This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented into the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-one-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce results with better resolution and reveal some flow patterns that cannot be obtained using coarse-grid models.

  14. Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN

    NASA Astrophysics Data System (ADS)

    Hammond, G. E.; Lichtner, P. C.; Mills, R. T.

    2014-01-01

    To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted.
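
    For reference, the scalability measures discussed above are commonly defined as follows (standard definitions, not specific to PFLOTRAN), where T(p) is the wall-clock time on p processes:

      \[
      S(p) = \frac{T(1)}{T(p)}, \qquad
      E_{\mathrm{strong}}(p) = \frac{S(p)}{p}, \qquad
      E_{\mathrm{weak}}(p) = \frac{T(1)}{T(p)} \;\; \text{(work per process held fixed)}.
      \]

    Strong scaling holds the total problem size fixed as p grows, while weak scaling grows the problem in proportion to p so that the work per process stays constant; the ideal values are S(p) = p and E = 1.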

  15. A 3D parallel simulator for crystal growth and solidification in complex alloy systems

    NASA Astrophysics Data System (ADS)

    Nestler, Britta

    2005-02-01

    A 3D parallel simulator is developed to numerically solve the evolution equations of a new non-isothermal phase-field model for crystal growth and solidification in complex alloy systems. The new model and the simulator are capable of simultaneously describing the diffusion processes of multiple components, the phase transitions between multiple phases and the development of the temperature field. Weak and facetted formulations of both surface energy and kinetic anisotropies are incorporated into the phase-field model. Multicomponent bulk diffusion effects, including interdiffusion coefficients, as well as diffusion in the interfacial region of phase or grain boundaries are considered. We introduce our parallel simulator, which is based on a finite difference discretization including effective adaptive strategies and multigrid methods to reduce computation time and memory usage. The parallelization is realized for distributed as well as shared memory computer architectures using MPI libraries and OpenMP concepts. Applying the new computer model, we present a variety of simulated crystal structures such as dendrites, grains, and binary and ternary eutectics in 2D and 3D. The influence of anisotropy on the microstructure evolution shows the formation of facets in preferred crystallographic directions. Phase transformations and solidification processes in a real multi-component alloy can be described by incorporating the physical data (e.g. surface tensions, kinetic coefficients, specific heat, heat and mass diffusion coefficients) and the specific phase diagram (in particular latent heats and melting temperatures) into the diffuse interface model via the free energies.

  16. Long-time atomistic simulations with the Parallel Replica Dynamics method

    NASA Astrophysics Data System (ADS)

    Perez, Danny

    Molecular Dynamics (MD) -- the numerical integration of atomistic equations of motion -- is a workhorse of computational materials science. Indeed, MD can in principle be used to obtain any thermodynamic or kinetic quantity, without introducing any approximation or assumptions beyond the adequacy of the interaction potential. It is therefore an extremely powerful and flexible tool to study materials with atomistic spatio-temporal resolution. These enviable qualities, however, come at a steep computational price, limiting the system sizes and simulation times that can be achieved in practice. While the size limitation can be efficiently addressed with massively parallel implementations of MD based on spatial decomposition strategies, allowing for the simulation of trillions of atoms, the same approach usually cannot extend the timescales much beyond microseconds. In this article, we discuss an alternative parallel-in-time approach, the Parallel Replica Dynamics (ParRep) method, that aims at addressing the timescale limitation of MD for systems that evolve through rare state-to-state transitions. We review the formal underpinnings of the method and demonstrate that it can provide arbitrarily accurate results for any definition of the states. When an adequate definition of the states is available, ParRep can simulate trajectories with a parallel speedup approaching the number of replicas used. We demonstrate the usefulness of ParRep by presenting different examples of materials simulations where access to long timescales was essential to reach the physical regime of interest, and we discuss practical considerations that must be addressed to carry out these simulations. Work supported by the United States Department of Energy (U.S. DOE), Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division.
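
    A brief textbook argument (not a derivation from this article) for why the speedup approaches the number of replicas: if escapes from a state occur as a Poisson process with rate k, the first escape among N independent replicas occurs at rate Nk, so the expected wall-clock time to observe a transition is

      \[
      \langle t_{\mathrm{wall}} \rangle \;\approx\; \frac{1}{N k} \;=\; \frac{\langle t_{\mathrm{escape}} \rangle}{N},
      \]

    and crediting the state with the simulation time accumulated by all replicas up to that first escape preserves the correct state-to-state statistics.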

  17. Application of Parallel Discrete Event Simulation to the Space Surveillance Network

    NASA Astrophysics Data System (ADS)

    Jefferson, D.; Leek, J.

    2010-09-01

    In this paper we describe how and why we chose parallel discrete event simulation (PDES) as the paradigm for modeling the Space Surveillance Network (SSN) in our modeling framework, TESSA (Testbed Environment for Space Situational Awareness). DES is a simulation paradigm appropriate for systems dominated by discontinuous state changes at times that must be calculated dynamically. It is used primarily for complex man-made systems like telecommunications, vehicular traffic, computer networks, economic models, etc., although it is also useful for natural systems that are not described by equations, such as particle systems, population dynamics, epidemics, and combat models. It is much less well known than simple time-stepped simulation methods, but has the great advantage of being time scale independent, so that one can freely mix processes that operate at time scales over many orders of magnitude with no runtime performance penalty. In simulating the SSN we model in some detail: (a) the orbital dynamics of up to 10^5 objects, (b) their reflective properties, (c) the ground- and space-based sensor systems in the SSN, (d) the recognition of orbiting objects and determination of their orbits, (e) the cueing and scheduling of sensor observations, (f) the 3-d structure of satellites, and (g) the generation of collision debris. TESSA is thus a mixed continuous-discrete model. But because many different types of discrete objects are involved, with such a wide variation in time scale (milliseconds for collisions, hours for orbital periods), it is suitably described using discrete events. The PDES paradigm is surprising and unusual. In any instantaneous runtime snapshot some parts may be far ahead in simulation time while others lag behind, yet the required causal relationships are always maintained and synchronized correctly, exactly as if the simulation were executed sequentially. The TESSA simulator is custom-built, conservatively synchronized, and designed to scale to

  18. Vortex-induced vibration of two parallel risers: Experimental test and numerical simulation

    NASA Astrophysics Data System (ADS)

    Huang, Weiping; Zhou, Yang; Chen, Haiming

    2016-04-01

    The vortex-induced vibration of two identical rigidly mounted risers in a parallel arrangement was studied using Ansys-CFX and model tests. The vortex shedding and force were recorded to determine the effect of spacing on the two-degree-of-freedom oscillation of the risers. CFX was used to study the single riser and two parallel risers at 2 D to 8 D spacing, considering the coupling effect. Because of the limited width of the water channel, only three different riser spacings, 2 D, 3 D, and 4 D, were tested to validate the characteristics of the two parallel risers by comparison with the numerical simulation. The results indicate that the lift force changes significantly with increasing spacing, and in the case of 3 D spacing the lift force of the two parallel risers reaches its maximum. The vortex shedding of the risers at 3 D spacing shows that a variable velocity field with the same frequency as the vortex shedding is generated in the overlapped area, thus equalizing the period of the drag force to that of the lift force. It can be concluded that the interaction between the two parallel risers is significant when the risers are brought close together, since the trajectory of the riser changes from an oval to a figure-8 curve as the spacing is increased. The phase difference of the lift force between the two risers also changes as the spacing changes.

  19. Relevance of the parallel nonlinearity in gyrokinetic simulations of tokamak plasmas

    SciTech Connect

    Candy, J.; Waltz, R. E.; Parker, S. E.; Chen, Y.

    2006-07-15

    The influence of the parallel nonlinearity on transport in gyrokinetic simulations is assessed for values of ρ* which are typical of current experiments. Here, ρ* = ρ_s/a is the ratio of gyroradius, ρ_s, to plasma minor radius, a. The conclusion, derived from simulations with both GYRO [J. Candy and R. E. Waltz, J. Comput. Phys. 186, 585 (2003)] and GEM [Y. Chen and S. E. Parker, J. Comput. Phys. 189, 463 (2003)], is that no measurable effect of the parallel nonlinearity is apparent for ρ* < 0.012. This result is consistent with scaling arguments, which suggest that the parallel nonlinearity should be O(ρ*) smaller than the E×B nonlinearity. Indeed, for the plasma parameters under consideration, the magnitude of the parallel nonlinearity is a factor of 8ρ* smaller (for 0.00075 < ρ* < 0.012) than the other retained terms in the nonlinear gyrokinetic equation.
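
    As a quick worked check of the quoted scaling (simple arithmetic on the numbers above, not additional analysis), the upper end of the studied range gives

      \[
      8\rho_* \,\big|_{\rho_* = 0.012} \;=\; 0.096 \;\approx\; 0.1,
      \]

    i.e., the parallel nonlinearity is roughly a factor of ten smaller than the retained E×B terms, consistent with its lack of measurable effect on transport.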

  20. Parallel Solutions for Voxel-Based Simulations of Reaction-Diffusion Systems

    PubMed Central

    D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan

    2014-01-01

    There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This awareness has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena that take into account both spatial effects and noise. However, the high computational effort characterizing simulation approaches, coupled with the necessity to simulate the models several times to achieve statistically relevant information on the model behaviours, makes such algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we present how the characteristics of the algorithm can be exploited for an effective parallelization on present heterogeneous HPC architectures. PMID:25045716

  1. A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows

    NASA Technical Reports Server (NTRS)

    Bui, Trong T.

    1999-01-01

    A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
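
    The reduced upwind dissipation referred to above can be written schematically as a scaled Roe-type interface flux (a generic form, not necessarily the solver's exact discretization):

      \[
      F_{i+1/2} \;=\; \tfrac{1}{2}\,(F_L + F_R) \;-\; \tfrac{\varepsilon}{2}\,\lvert \tilde{A} \rvert\,(u_R - u_L),
      \qquad \varepsilon \approx 0.03 \text{ to } 0.05,
      \]

    where |Ã| is the Roe-averaged flux Jacobian and the factor ε corresponds to the 3-5 percent of the standard Roe dissipation found adequate for the LES.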

  2. Parallel solutions for voxel-based simulations of reaction-diffusion systems.

    PubMed

    D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan

    2014-01-01

    There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This awareness has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena that take into account both spatial effects and noise. However, the high computational effort characterizing simulation approaches, coupled with the necessity to simulate the models several times to achieve statistically relevant information on the model behaviours, makes such algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we present how the characteristics of the algorithm can be exploited for an effective parallelization on present heterogeneous HPC architectures. PMID:25045716

  3. Adaptive finite element simulation of flow and transport applications on parallel computers

    NASA Astrophysics Data System (ADS)

    Kirk, Benjamin Shelton

    The subject of this work is the adaptive finite element simulation of problems arising in flow and transport applications on parallel computers. Of particular interest are new contributions to adaptive mesh refinement (AMR) in this parallel high-performance context, including novel work on data structures, treatment of constraints in a parallel setting, generality and extensibility via object-oriented programming, and the design/implementation of a flexible software framework. This technology and software capability then enables more robust, reliable treatment of multiscale-multiphysics problems and specific studies of fine scale interaction such as those in biological chemotaxis (Chapter 4) and high-speed shock physics for compressible flows (Chapter 5). The work begins by presenting an overview of key concepts and data structures employed in AMR simulations. Of particular interest is how these concepts are applied in the physics-independent software framework which is developed here and is the basis for all the numerical simulations performed in this work. This open-source software framework has been adopted by a number of researchers in the U.S. and abroad for use in a wide range of applications. The dynamic nature of adaptive simulations poses particular issues for efficient implementation on distributed-memory parallel architectures. Communication cost, computational load balance, and memory requirements must all be considered when developing adaptive software for this class of machines. Specific extensions to the adaptive data structures to enable implementation on parallel computers are therefore considered in detail. The libMesh framework for performing adaptive finite element simulations on parallel computers is developed to provide a concrete implementation of the above ideas. This physics-independent framework is applied to two distinct flow and transport applications classes in the subsequent application studies to illustrate the flexibility of the

  4. Object-Oriented Parallel Particle-in-Cell Code for Beam Dynamics Simulation in Linear Accelerators

    SciTech Connect

    Qiang, J.; Ryne, R.D.; Habib, S.; Decky, V.

    1999-11-13

    In this paper, we present an object-oriented three-dimensional parallel particle-in-cell code for beam dynamics simulation in linear accelerators. A two-dimensional parallel domain decomposition approach is employed within a message-passing programming paradigm, along with dynamic load balancing. Implementing an object-oriented software design provides the code with better maintainability, reusability, and extensibility compared with conventional structure-based code. This also helps to encapsulate the details of the communication syntax. Performance tests on SGI/Cray T3E-900 and SGI Origin 2000 machines show good scalability of the object-oriented code. Important features of this code include symplectic integration with linear maps of external focusing elements and the use of z as the independent variable, as is typical in accelerators. A successful application was done to simulate beam transport through three superconducting sections in the APT linac design.
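
    A minimal sketch of setting up a two-dimensional process decomposition with MPI's Cartesian topology routines follows; it shows only the kind of communicator setup such a decomposition might use, not the object-oriented particle-in-cell code itself, and the non-periodic boundaries and automatic factoring are assumptions made here.

      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);

          int nprocs;
          MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

          int dims[2] = {0, 0};                 /* let MPI factor nprocs into a 2D grid */
          MPI_Dims_create(nprocs, 2, dims);

          int periods[2] = {0, 0};              /* non-periodic transverse domain */
          MPI_Comm cart;
          MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);

          int cart_rank, coords[2];
          MPI_Comm_rank(cart, &cart_rank);
          MPI_Cart_coords(cart, cart_rank, 2, coords);

          /* Neighbours for field/particle exchange in each transverse direction. */
          int left, right, down, up;
          MPI_Cart_shift(cart, 0, 1, &left, &right);
          MPI_Cart_shift(cart, 1, 1, &down, &up);

          printf("rank %d -> grid (%d,%d) of %dx%d, x-neighbours %d/%d, y-neighbours %d/%d\n",
                 cart_rank, coords[0], coords[1], dims[0], dims[1], left, right, down, up);

          MPI_Comm_free(&cart);
          MPI_Finalize();
          return 0;
      }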

  5. Accurate reaction-diffusion operator splitting on tetrahedral meshes for parallel stochastic molecular simulations.

    PubMed

    Hepburn, I; Chen, W; De Schutter, E

    2016-08-01

    Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which has led to the development of parallel methods that can take advantage of the power of modern supercomputers in recent years. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification. PMID:27497550
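
    The simulator above operates stochastically on tetrahedral meshes; as a much-simplified deterministic sketch of the operator-splitting idea itself (alternating a diffusion substep and a reaction substep within each time step), consider the following toy 1-D example, in which the species, rates, and grid are arbitrary assumptions:

      #include <stdio.h>

      #define N 100

      /* First-order (Lie) operator splitting on a 1-D grid: each time step
         applies a diffusion substep, then a pointwise reaction substep.   */
      int main(void)
      {
          double u[N] = {0.0}, unew[N];
          u[N / 2] = 1.0;                       /* initial pulse of species u   */

          const double D = 0.1, k = 0.05;       /* diffusion coeff., decay rate */
          const double dx = 1.0, dt = 0.1;      /* grid spacing, time step      */
          const double r = D * dt / (dx * dx);  /* explicit stability: r <= 0.5 */

          for (int step = 0; step < 1000; ++step) {
              /* Substep 1: explicit diffusion. */
              for (int i = 1; i < N - 1; ++i)
                  unew[i] = u[i] + r * (u[i + 1] - 2.0 * u[i] + u[i - 1]);
              unew[0] = unew[1];                /* zero-flux boundaries */
              unew[N - 1] = unew[N - 2];

              /* Substep 2: pointwise reaction (here simple first-order decay). */
              for (int i = 0; i < N; ++i)
                  u[i] = unew[i] - dt * k * unew[i];
          }

          double total = 0.0;
          for (int i = 0; i < N; ++i) total += u[i];
          printf("remaining mass after splitting run: %g\n", total);
          return 0;
      }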

  6. Task parallel sensitivity analysis and parameter estimation of groundwater simulations through the SALSSA framework

    SciTech Connect

    Schuchardt, Karen L.; Agarwal, Khushbu; Chase, Jared M.; Rockhold, Mark L.; Freedman, Vicky L.; Elsethagen, Todd O.; Scheibe, Timothy D.; Chin, George; Sivaramakrishnan, Chandrika

    2010-07-15

    The Support Architecture for Large-Scale Subsurface Analysis (SALSSA) provides an extensible framework, sophisticated graphical user interface, and underlying data management system that simplifies the process of running subsurface models, tracking provenance information, and analyzing the model results. Initially, SALSSA supported two styles of job control: user directed execution and monitoring of individual jobs, and load balancing of jobs across multiple machines taking advantage of many available workstations. Recent efforts in subsurface modelling have been directed at advancing simulators to take advantage of leadership class supercomputers. We describe two approaches, current progress, and plans toward enabling efficient application of the subsurface simulator codes via the SALSSA framework: automating sensitivity analysis problems through task parallelism, and task parallel parameter estimation using the PEST framework.

  7. Accurate reaction-diffusion operator splitting on tetrahedral meshes for parallel stochastic molecular simulations

    NASA Astrophysics Data System (ADS)

    Hepburn, I.; Chen, W.; De Schutter, E.

    2016-08-01

    Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which has led to the development of parallel methods that can take advantage of the power of modern supercomputers in recent years. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification.

  8. A Large Scale Simulation of Ultrasonic Wave Propagation in Concrete Using Parallelized EFIT

    NASA Astrophysics Data System (ADS)

    Nakahata, Kazuyuki; Tokunaga, Jyunichi; Kimoto, Kazushi; Hirose, Sohichi

    A time domain simulation tool for ultrasonic propagation in concrete is developed using the elastodynamic finite integration technique (EFIT) and image-based modeling. The EFIT is a grid-based time domain differential technique and easily treats the different boundary conditions in an inhomogeneous material such as concrete. Here, the geometry of the concrete is determined from a scanned image, and the processed color bitmap image is fed into the EFIT. Although ultrasonic wave simulation in such a complex material requires substantial computation time, we execute the EFIT in parallel on a shared-memory computer system. In this study, the formulation of the EFIT and the treatment of the different boundary conditions are briefly described, and examples of shear horizontal wave propagation in reinforced concrete are demonstrated. The methodology and performance of the parallelization of the EFIT are also discussed.
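
    The parallelization described in this record amounts to threading grid sweeps on a shared-memory machine; the following sketch shows only that loop structure with a placeholder finite-difference update, not the actual staggered-grid EFIT velocity/stress scheme or its image-based material handling.

      #include <omp.h>
      #include <stdio.h>

      #define NX 1024
      #define NY 1024

      static double v[NX][NY], s[NX][NY];   /* velocity and stress components */

      /* One OpenMP-threaded grid sweep; the update is a placeholder. */
      static void sweep(double dt, double dx)
      {
          #pragma omp parallel for collapse(2) schedule(static)
          for (int i = 1; i < NX - 1; ++i)
              for (int j = 1; j < NY - 1; ++j)
                  v[i][j] += dt / dx * (s[i][j] - s[i - 1][j]);
      }

      int main(void)
      {
          s[NX / 2][NY / 2] = 1.0;          /* point source in the stress field */
          for (int step = 0; step < 100; ++step)
              sweep(1.0e-3, 1.0e-2);
          printf("v at source neighbour: %g\n", v[NX / 2 + 1][NY / 2]);
          return 0;
      }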

  9. Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms

    SciTech Connect

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2013-01-01

    With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware as has traditionally been done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the poor scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature in the sense that any fairness-based VM scheduler implementation would exhibit this mismatch with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in run time with the PDES-optimized scheduler relative to the regular VM scheduler, with over a 20-fold reduction in the run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.

  10. Construction of a parallel processor for simulating manipulators and other mechanical systems

    NASA Technical Reports Server (NTRS)

    Hannauer, George

    1991-01-01

    This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.

  11. Spontaneous Hot Flow Anomalies at Quasi-Parallel Shocks: 2. Hybrid Simulations

    NASA Technical Reports Server (NTRS)

    Omidi, N.; Zhang, H.; Sibeck, D.; Turner, D.

    2013-01-01

    Motivated by recent THEMIS observations, this paper uses 2.5-D electromagnetic hybrid simulations to investigate the formation of Spontaneous Hot Flow Anomalies (SHFA) upstream of quasi-parallel bow shocks during steady solar wind conditions and in the absence of discontinuities. The results show the formation of a large number of structures along and upstream of the quasi-parallel bow shock. Their outer edges exhibit density and magnetic field enhancements, while their cores exhibit drops in density, magnetic field, solar wind velocity and enhancements in ion temperature. Using virtual spacecraft in the simulation, we show that the signatures of these structures in the time series data are very similar to those of SHFAs seen in THEMIS data and conclude that they correspond to SHFAs. Examination of the simulation data shows that SHFAs form as the result of foreshock cavitons interacting with the bow shock. Foreshock cavitons in turn form due to the nonlinear evolution of ULF waves generated by the interaction of the solar wind with the backstreaming ions. Because foreshock cavitons are an inherent part of the shock dissipation process, the formation of SHFAs is also an inherent part of the dissipation process leading to a highly non-uniform plasma in the quasi-parallel magnetosheath including large scale density and magnetic field cavities.

  12. Spontaneous hot flow anomalies at quasi-parallel shocks: 2. Hybrid simulations

    NASA Astrophysics Data System (ADS)

    Omidi, N.; Zhang, H.; Sibeck, D.; Turner, D.

    2013-01-01

    Abstract<p label="1">Motivated by recent THEMIS observations, this paper uses 2.5-D electromagnetic hybrid <span class="hlt">simulations</span> to investigate the formation of Spontaneous Hot Flow Anomalies (SHFAs) upstream of quasi-<span class="hlt">parallel</span> bow shocks during steady solar wind conditions and in the absence of discontinuities. The results show the formation of a large number of structures along and upstream of the quasi-<span class="hlt">parallel</span> bow shock. Their outer edges exhibit density and magnetic field enhancements, while their cores exhibit drops in density, magnetic field, solar wind velocity, and enhancements in ion temperature. Using virtual spacecraft in the <span class="hlt">simulation</span>, we show that the signatures of these structures in the time series data are very similar to those of SHFAs seen in THEMIS data and conclude that they correspond to SHFAs. Examination of the <span class="hlt">simulation</span> data shows that SHFAs form as the result of foreshock cavitons interacting with the bow shock. Foreshock cavitons in turn form due to the nonlinear evolution of ULF waves generated by the interaction of the solar wind with the backstreaming ions. Because foreshock cavitons are an inherent part of the shock dissipation process, the formation of SHFAs is also an inherent part of the dissipation process leading to a highly nonuniform plasma in the quasi-<span class="hlt">parallel</span> magnetosheath including large-scale density and magnetic field cavities.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/24329381','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/24329381"><span id="translatedtitle">Efficient <span class="hlt">parallelization</span> of short-range molecular dynamics <span class="hlt">simulations</span> on many-core systems.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Meyer, R</p> <p>2013-11-01</p> <p>This article introduces a highly <span class="hlt">parallel</span> algorithm for molecular dynamics <span class="hlt">simulations</span> with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high <span class="hlt">parallel</span> speedups for strongly inhomogeneous systems like nanodevices or nanostructured materials. In the proposed scheme the calculation of the forces and the generation of neighbor lists are divided into small tasks. The tasks are then executed by a thread pool according to a dependent task schedule. This schedule is constructed in such a way that a particle is never accessed by two threads at the same time. Benchmark <span class="hlt">simulations</span> on a typical 12-core machine show that the described algorithm achieves excellent <span class="hlt">parallel</span> efficiencies above 80% for different kinds of systems and all numbers of cores. For inhomogeneous systems the speedups are strongly superior to those obtained with spatial decomposition. Further benchmarks were performed on an Intel Xeon Phi coprocessor. These <span class="hlt">simulations</span> demonstrate that the algorithm scales well to large numbers of cores. 
PMID:24329381</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2013PhRvE..88e3309M','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2013PhRvE..88e3309M"><span id="translatedtitle">Efficient <span class="hlt">parallelization</span> of short-range molecular dynamics <span class="hlt">simulations</span> on many-core systems</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Meyer, R.</p> <p>2013-11-01</p> <p>This article introduces a highly <span class="hlt">parallel</span> algorithm for molecular dynamics <span class="hlt">simulations</span> with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high <span class="hlt">parallel</span> speedups for strongly inhomogeneous systems like nanodevices or nanostructured materials. In the proposed scheme the calculation of the forces and the generation of neighbor lists are divided into small tasks. The tasks are then executed by a thread pool according to a dependent task schedule. This schedule is constructed in such a way that a particle is never accessed by two threads at the same time. Benchmark <span class="hlt">simulations</span> on a typical 12-core machine show that the described algorithm achieves excellent <span class="hlt">parallel</span> efficiencies above 80% for different kinds of systems and all numbers of cores. For inhomogeneous systems the speedups are strongly superior to those obtained with spatial decomposition. Further benchmarks were performed on an Intel Xeon Phi coprocessor. These <span class="hlt">simulations</span> demonstrate that the algorithm scales well to large numbers of cores.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/20040200740','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/20040200740"><span id="translatedtitle">Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow <span class="hlt">Simulations</span> Using Massively <span class="hlt">Parallel</span> Computers</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Morgan, Philip E.</p> <p>2004-01-01</p> <p>This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow <span class="hlt">Simulations</span> Using Massively <span class="hlt">Parallel</span> Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two <span class="hlt">parallel</span> flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order <span class="hlt">parallel</span> solver for Direct Numerical <span class="hlt">Simulations</span> (DNS) and Large Eddy <span class="hlt">Simulation</span> (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. 
    The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in the high-order/spectral solution of swirling 3D jets and apply them to the electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend the high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.

Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment
    PubMed Central
    Khan, Md. Ashfaquzzaman; Herbordt, Martin C.
    2011-01-01
    Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6x on an 8-core and over 9x on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations. PMID:21822327
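The speculate-then-commit control flow described in the DMD record above can be illustrated with a deliberately simplified toy, not the authors' code: events in a window are processed speculatively in parallel against a state snapshot and then committed strictly in timestamp order, with invalidated speculations re-executed at commit time. The event model, window size, and update rule below are invented for the example.

```cpp
// Toy speculative event processing with in-order commitment.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Event { double t; int a, b; };            // an event couples two items

int main() {
    std::vector<double> state(8, 1.0);           // toy "particle" state
    std::vector<Event> queue = {
        {0.1, 0, 1}, {0.2, 2, 3}, {0.3, 1, 2}, {0.4, 4, 5}, {0.5, 3, 4}};
    std::sort(queue.begin(), queue.end(),
              [](const Event& x, const Event& y) { return x.t < y.t; });

    const std::size_t window = 4;                // speculation depth
    for (std::size_t base = 0; base < queue.size(); base += window) {
        std::size_t n = std::min(window, queue.size() - base);
        std::vector<double> snapshot = state;
        std::vector<std::pair<double, double>> tentative(n);

        // Speculative phase: embarrassingly parallel, reads only the snapshot.
        #pragma omp parallel for
        for (long k = 0; k < static_cast<long>(n); ++k) {
            const Event& e = queue[base + k];
            double avg = 0.5 * (snapshot[e.a] + snapshot[e.b]);
            tentative[k] = {avg + 1.0, avg - 1.0};
        }

        // Commit phase: strictly in timestamp order.
        std::vector<char> dirty(state.size(), 0);
        for (std::size_t k = 0; k < n; ++k) {
            const Event& e = queue[base + k];
            if (dirty[e.a] || dirty[e.b]) {       // speculation was invalidated:
                double avg = 0.5 * (state[e.a] + state[e.b]);
                tentative[k] = {avg + 1.0, avg - 1.0};   // re-execute serially
            }
            state[e.a] = tentative[k].first;
            state[e.b] = tentative[k].second;
            dirty[e.a] = dirty[e.b] = 1;
        }
    }
    for (double v : state) std::printf("%g ", v);
    std::printf("\n");
}
```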
Synchronous Parallel Emulation and Discrete Event Simulation System with Self-Contained Simulation Objects and Active Event Objects
    NASA Technical Reports Server (NTRS)
    Steinman, Jeffrey S. (Inventor)
    1998-01-01
    The present invention is embodied in a method of performing object-oriented simulation and a system having interconnected processor nodes operating in parallel to simulate mutual interactions of a set of discrete simulation objects distributed among the nodes as a sequence of discrete events changing state variables of respective simulation objects so as to generate new event-defining messages addressed to respective ones of the nodes. The object-oriented simulation is performed at each one of the nodes by assigning passive self-contained simulation objects to each one of the nodes, responding to messages received at one node by generating corresponding active event objects having user-defined inherent capabilities and individual time stamps and corresponding to respective events affecting one of the passive self-contained simulation objects of the one node, restricting the respective passive self-contained simulation objects to only providing and receiving information from the respective active event objects, requesting information and changing variables within a passive self-contained simulation object by the active event object, and producing corresponding messages specifying events resulting therefrom by the active event objects.

Parallel PIC Simulations of Ultra-High Intensity Laser Plasma Interactions
    NASA Astrophysics Data System (ADS)
    Lasinski, B. F.; Still, C. H.; Langdon, A. B.; Wilks, S. C.; Hatchett, S. P.; Hinkel, D. E.
    1999-11-01
    We extend our previous simulations of high intensity short pulse laser plasma interactions [B. F. Lasinski, A. B. Langdon, S. P. Hatchett, M. H. Key, and M. Tabak, Phys. Plasmas 6, 2041 (1999); S. C. Wilks and W. L. Kruer, IEEE Journal of Quantum Electronics 11, 1954 (1997)] to 3D and to much larger systems in 2D using our new, modern, 3D, electromagnetic, fully relativistic, massively parallel PIC code. Our simulation parameters are guided by the recent Petawatt experiments at Livermore. We study the generation of hot electrons and energetic ions and the associated complex phenomena.
    Laser light filamentation and the formation of high static magnetic fields are described.

Parallel implementation of three-dimensional molecular dynamic simulation for laser-cluster interaction
    SciTech Connect
    Holkundkar, Amol R.
    2013-11-15
    The objective of this article is to report the parallel implementation of the 3D molecular dynamic simulation code for laser-cluster interactions. The benchmarking of the code has been done by comparing the simulation results with some of the experiments reported in the literature. Scaling laws for the computational time are established by varying the number of processor cores and the number of macroparticles used. The capabilities of the code are highlighted by implementing various diagnostic tools. To study the dynamics of laser-cluster interactions, the executable version of the code is available from the author.

A generic simulation cell method for developing extensible, efficient and readable parallel computational models
    NASA Astrophysics Data System (ADS)
    Honkonen, I.
    2015-03-01
    I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring modification of existing code. This is an advantage for the development and testing of, e.g., geoscientific software, as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. An implementation of the generic simulation cell method presented here, the generic simulation cell class (gensimcell), also includes support for parallel programming by allowing model developers to select which simulation variables of, e.g., a domain-decomposed model to transfer between processes via a Message Passing Interface (MPI) library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class requires a C++ compiler that supports the version of the language standardized in 2011 (C++11). The code is available at https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those who do are kindly requested to acknowledge and cite this work.
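A minimal sketch of the generic-cell idea described in the record above, assuming nothing about the actual gensimcell API: each cell stores the variables of several submodels in a tuple, and a submodel selects which variables to pack into a buffer that could then be exchanged between processes (for instance with MPI). The tag types, pack helper, and all names are invented for the example.

```cpp
// A cell that aggregates variables from independent submodels, with selective
// packing of the variables one submodel needs to transfer between processes.
#include <cstdio>
#include <tuple>
#include <vector>

struct Density   { double value = 0.0; };        // tag types declared by submodels
struct Momentum  { double value = 0.0; };
struct Potential { double value = 0.0; };

template <class... Variables>
struct Cell {
    std::tuple<Variables...> data;
    template <class V> double&       get()       { return std::get<V>(data).value; }
    template <class V> const double& get() const { return std::get<V>(data).value; }
};

// Pack only the selected variables (e.g. for an MPI send of buf.data());
// adding a new submodel's variable does not disturb existing code.
template <class... Selected, class CellT>
std::vector<double> pack(const CellT& c) {
    return {c.template get<Selected>()...};
}

int main() {
    using FluidCell = Cell<Density, Momentum, Potential>;
    std::vector<FluidCell> grid(4);
    for (std::size_t i = 0; i < grid.size(); ++i) {
        grid[i].get<Density>()  = 1.0 + i;
        grid[i].get<Momentum>() = 0.1 * i;
    }
    // The flow solver transfers only Density and Momentum; a gravity solver
    // could call pack<Potential>(...) on the same cells without any conflict.
    std::vector<double> buf = pack<Density, Momentum>(grid[2]);
    std::printf("packed %zu values: %g %g\n", buf.size(), buf[0], buf[1]);
}
```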
Field-Scale, Massively Parallel Simulation of Production from Oceanic Gas Hydrate Deposits
    NASA Astrophysics Data System (ADS)
    Reagan, M. T.; Moridis, G. J.; Freeman, C. M.; Pan, L.; Boyle, K. L.; Johnson, J. N.; Husebo, J. A.
    2012-12-01
    The quantity of hydrocarbon gases trapped in natural hydrate accumulations is enormous, leading to significant interest in the evaluation of their potential as an energy source. It has been shown that large volumes of gas can be readily produced at high rates for long times from some types of methane hydrate accumulations by means of depressurization-induced dissociation, and using conventional technologies with horizontal or vertical well configurations. However, these systems are currently assessed using simplified or reduced-scale 3D or even 2D production simulations. In this study, we use the massively parallel TOUGH+HYDRATE code (pT+H) to assess the production potential of a large, deep-ocean hydrate reservoir and develop strategies for effective production. The simulations model a full 3D system of over 24 km^2 extent, examining the productivity of vertical and horizontal wells, single or multiple wells, and explore variations in reservoir properties. Systems of up to 2.5M gridblocks, running on thousands of supercomputing nodes, are required to simulate such large systems at the highest level of detail.
    The simulations reveal the challenges inherent in producing from deep, relatively cold systems with extensive water-bearing channels and connectivity to large aquifers, including the difficulty of achieving depressurization, the challenges of high water removal rates, and the complexity of production design. Also highlighted are new frontiers in large-scale reservoir simulation of coupled flow, transport, thermodynamics, and phase behavior, including the construction of large meshes, the use of parallel numerical solvers and MPI, and large-scale, parallel 3D visualization of results.

Massively parallel Monte Carlo for many-particle simulations on GPUs
    SciTech Connect
    Anderson, Joshua A.; Jankowski, Eric; Grubb, Thomas L.; Engel, Michael; Glotzer, Sharon C.
    2013-12-01
    Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.
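The record above concerns a GPU implementation; the following is a schematic CPU analogue (OpenMP) of the same cell-based checkerboard idea: cells wider than the disk diameter, only one checkerboard class updated at a time, and trial moves confined to a particle's own cell so that concurrently updated cells can never interact. The bookkeeping a production code needs for strict detailed balance (for example shuffling the order of cell sets) is omitted, and all parameters are invented for the example.

```cpp
// Checkerboard-decomposition Monte Carlo sweep for hard disks (schematic).
#include <cstdio>
#include <random>
#include <vector>

struct Disk { double x, y; };

int main() {
    const double L = 16.0, sigma = 1.0, cellw = 2.0, maxdisp = 0.3;
    const int nc = static_cast<int>(L / cellw);            // 8 cells per direction (even)
    std::vector<std::vector<Disk>> cells(nc * nc);
    for (int cy = 0; cy < nc; ++cy)                        // start with one disk per cell
        for (int cx = 0; cx < nc; ++cx)
            cells[cy * nc + cx].push_back({(cx + 0.5) * cellw, (cy + 0.5) * cellw});

    // Overlap test against the 3x3 cell neighbourhood (self excluded). Periodic
    // minimum-image handling is omitted because disks never leave their cells here.
    auto overlaps = [&](const Disk& trial, const Disk* self, int cx, int cy) {
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int nx = (cx + dx + nc) % nc, ny = (cy + dy + nc) % nc;
                for (const Disk& o : cells[ny * nc + nx]) {
                    if (&o == self) continue;
                    double ddx = trial.x - o.x, ddy = trial.y - o.y;
                    if (ddx * ddx + ddy * ddy < sigma * sigma) return true;
                }
            }
        return false;
    };

    for (int sweep = 0; sweep < 100; ++sweep) {
        for (int color = 0; color < 4; ++color) {          // 2x2 checkerboard classes
            #pragma omp parallel for collapse(2)
            for (int cy = color / 2; cy < nc; cy += 2)
                for (int cx = color % 2; cx < nc; cx += 2) {
                    std::mt19937 rng(1 + sweep * 7919 + cy * nc + cx);
                    std::uniform_real_distribution<double> u(-maxdisp, maxdisp);
                    for (Disk& d : cells[cy * nc + cx]) {
                        Disk trial{d.x + u(rng), d.y + u(rng)};
                        // keep the disk inside its own cell so active cells stay independent
                        if (trial.x < cx * cellw || trial.x >= (cx + 1) * cellw ||
                            trial.y < cy * cellw || trial.y >= (cy + 1) * cellw) continue;
                        if (!overlaps(trial, &d, cx, cy)) d = trial;
                    }
                }
        }
    }
    std::printf("disk 0 is now at (%.3f, %.3f)\n", cells[0][0].x, cells[0][0].y);
}
```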
A novel parallel-rotation algorithm for atomistic Monte Carlo simulation of dense polymer systems
    NASA Astrophysics Data System (ADS)
    Santos, S.; Suter, U. W.; Müller, M.; Nievergelt, J.
    2001-06-01
    We develop and test a new elementary Monte Carlo move for use in the off-lattice simulation of polymer systems. This novel Parallel-Rotation algorithm (ParRot) permits moving very efficiently torsion angles that are deeply inside long chains in melts. The parallel-rotation move is extremely simple and is also demonstrated to be computationally efficient and appropriate for Monte Carlo simulation. The ParRot move does not affect the orientation of those parts of the chain outside the moving unit. The move consists of a concerted rotation around four adjacent skeletal bonds. No assumption is made concerning the backbone geometry other than that bond lengths and bond angles are held constant during the elementary move. Properly weighted sampling techniques are needed for ensuring detailed balance because the new move involves a correlated change in four degrees of freedom along the chain backbone. The ParRot move is supplemented with the classical Metropolis Monte Carlo, the Continuum-Configurational-Bias, and Reptation techniques in an isothermal-isobaric Monte Carlo simulation of melts of short and long chains. Comparisons are made with the capabilities of other Monte Carlo techniques to move the torsion angles in the middle of the chains. We demonstrate that ParRot constitutes a highly promising Monte Carlo move for the treatment of long polymer chains in the off-lattice simulation of realistic models of dense polymer systems.

Modeling of fatigue crack induced nonlinear ultrasonics using a highly parallelized explicit local interaction simulation approach
    NASA Astrophysics Data System (ADS)
    Shen, Yanfeng; Cesnik, Carlos E. S.
    2016-04-01
    This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables highly parallelized supercomputing on powerful graphics cards. Both the explicit contact formulation and the parallel feature facilitate LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulation based on the penalty method is introduced and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined.
    A numerical benchmark problem is used to investigate the new LISA formulation and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.

Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing
    NASA Astrophysics Data System (ADS)
    Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
    2015-09-01
    The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
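For orientation, the kind of OpenMP kernel benchmarked in the SPH record above typically looks like the gather-style density summation below: each thread writes only the density of the particles it owns, so no atomics are required, and a cell-linked list limits the neighbour search. The 1D setup, the kernel function, and all parameters are invented for the example.

```cpp
// Gather-style OpenMP SPH density summation with a cell-linked list (sketch).
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int n = 10000;
    const double h = 0.02, mass = 1.0 / n;     // smoothing length, particle mass
    std::vector<double> x(n), rho(n, 0.0);
    for (int i = 0; i < n; ++i) x[i] = (i + 0.5) / n;   // 1D particle column

    // Cell size = 2h so all neighbours of a particle lie in adjacent cells.
    const double cellw = 2.0 * h;
    const int ncell = static_cast<int>(1.0 / cellw) + 1;
    std::vector<std::vector<int>> cells(ncell);
    for (int i = 0; i < n; ++i)
        cells[std::min(ncell - 1, static_cast<int>(x[i] / cellw))].push_back(i);

    auto kernel = [&](double r) {              // simple compact kernel, support 2h
        double q = r / h;
        return q < 2.0 ? (2.0 - q) * (2.0 - q) * (2.0 - q) / (4.0 * h) : 0.0;
    };

    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < n; ++i) {
        int c = std::min(ncell - 1, static_cast<int>(x[i] / cellw));
        double sum = 0.0;
        for (int dc = -1; dc <= 1; ++dc) {
            int nb = c + dc;
            if (nb < 0 || nb >= ncell) continue;
            for (int j : cells[nb]) sum += mass * kernel(std::fabs(x[i] - x[j]));
        }
        rho[i] = sum;                          // only the owning thread writes rho[i]
    }
    std::printf("rho at mid-column: %g\n", rho[n / 2]);
}
```

On GPUs the same gather pattern is usually kept, but the memory layout of the particle arrays and the cell lists is rearranged for coalesced access, which is exactly the kind of allocation question the record investigates.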
Simulations of structural and dynamic anisotropy in nano-confined water between parallel graphite plates
    PubMed
    Mosaddeghi, Hamid; Alavi, Saman; Kowsari, M. H.; Najafi, Bijan
    2012-11-14
    We use molecular dynamics simulations to study the structure, dynamics, and transport properties of nano-confined water between parallel graphite plates with separation distances (H) from 7 to 20 Å at different water densities, with an emphasis on anisotropies generated by confinement. The behavior of the confined water phase is compared to non-confined bulk water under similar pressure and temperature conditions. Our simulations show anisotropic structure and dynamics of the confined water phase in directions parallel and perpendicular to the graphite plate. The magnitude of these anisotropies depends on the slit width H. Confined water shows "solid-like" structure and slow dynamics for the water layers near the plates. The mean square displacements (MSDs) and velocity autocorrelation functions (VACFs) for directions parallel and perpendicular to the graphite plates are calculated. By increasing the confinement distance from H = 7 Å to H = 20 Å, the MSD increases and the behavior of the VACF indicates that the confined water changes from solid-like to liquid-like dynamics. If the initial density of the water phase is set up using geometric criteria (i.e., the distance between the graphite plates), large pressures (on the order of ~10 katm) and large pressure anisotropies are established within the water. By decreasing the density of the water between the confined plates to about 0.9 g cm^-3, bubble formation and restructuring of the water layers are observed. PMID:23163385
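The anisotropic analysis mentioned in the record above (MSD resolved parallel and perpendicular to the plates) reduces to a simple decomposition of displacements into in-plane and out-of-plane components. The sketch below uses synthetic placeholder trajectory data; a real analysis would read unwrapped frames from an MD trajectory.

```cpp
// Mean square displacement split into components parallel (x, y) and
// perpendicular (z) to the confining plates.
#include <array>
#include <cstdio>
#include <vector>

using Vec3 = std::array<double, 3>;

int main() {
    const int nframes = 200, natoms = 64;
    // traj[t][i] = unwrapped position of atom i at frame t (synthetic here).
    std::vector<std::vector<Vec3>> traj(nframes, std::vector<Vec3>(natoms));
    for (int t = 0; t < nframes; ++t)
        for (int i = 0; i < natoms; ++i)
            traj[t][i] = {0.01 * t + 0.1 * i, 0.02 * t, 0.001 * t};  // placeholder data

    for (int lag = 1; lag < nframes; lag *= 2) {
        double msd_par = 0.0, msd_perp = 0.0;
        int count = 0;
        for (int t0 = 0; t0 + lag < nframes; ++t0)
            for (int i = 0; i < natoms; ++i) {
                const Vec3& a = traj[t0][i];
                const Vec3& b = traj[t0 + lag][i];
                double dx = b[0] - a[0], dy = b[1] - a[1], dz = b[2] - a[2];
                msd_par  += dx * dx + dy * dy;   // in-plane (parallel) part
                msd_perp += dz * dz;             // out-of-plane (perpendicular) part
                ++count;
            }
        std::printf("lag %3d  MSD_par %.4f  MSD_perp %.6f\n",
                    lag, msd_par / count, msd_perp / count);
    }
}
```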
Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors
    SciTech Connect
    Aaby, Brandon G.; Perumalla, Kalyan S.; Seal, Sudip K.
    2010-01-01
    An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering as much as over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular simulator in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.

Implementation and performance of FDPS: a framework for developing parallel particle simulation codes
    NASA Astrophysics Data System (ADS)
    Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro
    2016-08-01
    We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N^2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10^7) to 300 ms (N = 10^9).
    These are currently limited by the time for the calculation of the domain decomposition and communication

Current Trends in Numerical Simulation for Parallel Engineering Environments: New Directions and Work-in-Progress
    SciTech Connect
    Trinitis, C.; Schulz, M.
    2006-06-29
    In today's world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Remarkable progress in CPU architecture, system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges for both system architects and software developers. These trends are paralleled by progress in parallel algorithms, simulation techniques, and software integration from multiple disciplines. ParSim brings together researchers from both application disciplines and computer science and aims at fostering closer cooperation between these fields. Since its successful introduction in 2002, ParSim has established itself as an integral part of the EuroPVM/MPI conference series. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a short turn-around time. This offers a unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in parallel computation, serves as an ideal surrounding for ParSim. This combination enables the participants to present and discuss their work within the scope of both the session and the host conference. This year, eleven papers from authors in nine countries were submitted to ParSim, and we selected five of them. They cover a wide range of different application fields including gas flow simulations, thermo-mechanical processes in nuclear waste storage, and cosmological simulations. At the same time, the selected contributions also address the computer science side of their codes and discuss different parallelization strategies, programming models and languages, as well as the use of nonblocking collective operations in MPI.
    We are confident that this provides an attractive program and that ParSim will be an informal setting for lively discussions and for fostering new

Implementation and performance of FDPS: a framework for developing parallel particle simulation codes
    NASA Astrophysics Data System (ADS)
    Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro
    2016-06-01
    We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N^2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10^7) to 300 ms (N = 10^9).
    These are currently limited by the time for the calculation of the domain decomposition and communication

Note: Application of a novel 2(3HUS+S) parallel manipulator for simulation of hip joint motion
    NASA Astrophysics Data System (ADS)
    Shan, X. L.; Cheng, G.; Liu, X. Z.
    2016-07-01
    In the paper, a novel 2(3HUS+S) parallel manipulator, which has two moving platforms, is proposed. The parallel manipulator is adopted to simulate hip joint motion and can conduct an experiment for two hip joints simultaneously. Motion experiments are conducted in the paper, and the recommended hip joint motion curves from ISO 14242 and actual hip joint motions during jogging and walking are selected as the simulated motions. The experimental results indicate that the 2(3HUS+S) parallel manipulator can realize the simulation of many kinds of hip joint motions without changing the structure size.

Note: Application of a novel 2(3HUS+S) parallel manipulator for simulation of hip joint motion
    PubMed
    Shan, X. L.; Cheng, G.; Liu, X. Z.
    2016-07-01
    In the paper, a novel 2(3HUS+S) parallel manipulator, which has two moving platforms, is proposed. The parallel manipulator is adopted to simulate hip joint motion and can conduct an experiment for two hip joints simultaneously. Motion experiments are conducted in the paper, and the recommended hip joint motion curves from ISO 14242 and actual hip joint motions during jogging and walking are selected as the simulated motions. The experimental results indicate that the 2(3HUS+S) parallel manipulator can realize the simulation of many kinds of hip joint motions without changing the structure size.
    PMID:27475608

GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit
    PubMed Central
    Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik
    2013-01-01
    Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358

De Novo Ultrascale Atomistic Simulations On High-End Parallel Supercomputers
    SciTech Connect
    Nakano, A.; Kalia, R. K.; Nomura, K.; Sharma, A.; Vashishta, P.; Shimojo, F.; van Duin, A.; Goddard, W. A., III; Biswas, R.; Srivastava, D.; Yang, L. H.
    2006-09-04
    We present a de novo hierarchical simulation framework for first-principles based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters.
    In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) simulations explore a wide solution space to discover microscopic mechanisms that govern macroscopic material properties, into which highly accurate quantum mechanical (QM) simulations are embedded to validate the discovered mechanisms and quantify the uncertainty of the solution. The framework includes an embedded divide-and-conquer (EDC) algorithmic framework for the design of linear-scaling simulation algorithms with minimal bandwidth complexity and tight error control. The EDC framework also enables adaptive hierarchical simulation with automated model transitioning assisted by graph-based event tracking. A tunable hierarchical cellular decomposition parallelization framework then maps the O(N) EDC algorithms onto petaflops computers, while achieving performance tunability through a hierarchy of parameterized cell data/computation structures, as well as its implementation using hybrid Grid remote procedure call + message passing + threads programming. High-end computing platforms such as IBM BlueGene/L, SGI Altix 3000 and the NSF TeraGrid provide an excellent test ground for the framework. On these platforms, we have achieved unprecedented scales of quantum-mechanically accurate and well validated, chemically reactive atomistic simulations--1.06 billion-atom fast reactive force-field MD and 11.8 million-atom (1.04 trillion grid points) quantum-mechanical MD in the framework of the EDC density functional theory on adaptive multigrids--in addition to 134 billion-atom non-reactive space-time multiresolution MD, with parallel efficiency as high as 0.998 on 65,536 dual-processor BlueGene/L nodes. We have also achieved an automated execution of hierarchical QM

Billion-atom synchronous parallel kinetic Monte Carlo simulations of critical 3D Ising systems
    SciTech Connect
    Martinez, E.; Monasterio, P. R.; Marian, J.
    2011-02-20
    An extension of the synchronous parallel kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the parallel efficiency of spkMC and find that it scales consistently with problem size and sublattice partition.
    We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.

Steepening of parallel propagating hydromagnetic waves into magnetic pulsations - A simulation study
    NASA Technical Reports Server (NTRS)
    Akimoto, K.; Winske, D.; Onsager, T. G.; Thomsen, M. F.; Gary, S. P.
    1991-01-01
    The steepening mechanism of parallel propagating low-frequency MHD-like waves observed upstream of the earth's quasi-parallel bow shock has been investigated by means of electromagnetic hybrid simulations. It is shown that an ion beam through the resonant electromagnetic ion/ion instability excites large-amplitude waves, which consequently pitch angle scatter, decelerate, and eventually magnetically trap beam ions in regions where the wave amplitudes are largest. As a result, the beam ions become bunched in both space and gyrophase. As these higher-density, nongyrotropic beam segments are formed, the hydromagnetic waves rapidly steepen, resulting in magnetic pulsations, with properties generally in agreement with observations. This steepening process operates on the scale of the linear growth time of the resonant ion/ion instability. Many of the pulsations generated by this mechanism are left-hand polarized in the spacecraft frame.

A parallelization scheme to simulate reactive transport in the subsurface environment with OGS#IPhreeqc
    NASA Astrophysics Data System (ADS)
    He, W.; Beyer, C.; Fleckenstein, J. H.; Jang, E.; Kolditz, O.; Naumov, D.; Kalbacher, T.
    2015-03-01
    This technical paper presents an efficient and performance-oriented method to model reactive mass transport processes in environmental and geotechnical subsurface systems. The open source scientific software packages OpenGeoSys and IPhreeqc have been coupled to combine their individual strengths and features to simulate thermo-hydro-mechanical-chemical coupled processes in porous and fractured media with simultaneous consideration of aqueous geochemical reactions.
    Furthermore, a flexible parallelization scheme using MPI (Message Passing Interface) grouping techniques has been implemented, which allows an optimized allocation of computer resources for the node-wise calculation of chemical reactions on the one hand, and the underlying processes such as groundwater flow or solute transport on the other hand. The coupling interface and parallelization scheme have been tested and verified in terms of precision and performance.

Billion-atom synchronous parallel kinetic Monte Carlo simulations of critical 3D Ising systems
    NASA Astrophysics Data System (ADS)
    Martínez, E.; Monasterio, P. R.; Marian, J.
    2011-02-01
    An extension of the synchronous parallel kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the parallel efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.
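The null-event synchronisation described in the two spkMC records above can be illustrated with a short MPI toy, which is schematic and not the authors' implementation: every rank advances by the same time increment, drawn from the globally maximal rate, and ranks whose local rate is smaller execute a null event with the complementary probability. The local rates and step count are invented; boundary handling (the chessboard sublattices) is only indicated by a comment.

```cpp
// Toy synchronous parallel kMC step with null events.
// Compile with an MPI wrapper, e.g.  mpicxx -O2 spkmc_toy.cpp
#include <cmath>
#include <cstdio>
#include <random>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nprocs = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    std::mt19937 rng(1234 + rank);
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    double local_rate = 1.0 + rank;          // stand-in for the sum of local event rates
    double t = 0.0, dt = 0.0;
    long real_events = 0, null_events = 0;

    for (int step = 0; step < 1000; ++step) {
        double max_rate = 0.0;
        MPI_Allreduce(&local_rate, &max_rate, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);

        if (rank == 0) dt = -std::log(uni(rng)) / max_rate;   // common time increment
        MPI_Bcast(&dt, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        t += dt;                                              // all clocks stay in step

        if (uni(rng) < local_rate / max_rate) ++real_events;  // execute a local event
        else                                  ++null_events;  // null event: do nothing
        // (boundary conflicts would be handled here, e.g. by sublattice colouring)
    }
    std::printf("rank %d of %d: t=%.3f real=%ld null=%ld\n",
                rank, nprocs, t, real_events, null_events);
    MPI_Finalize();
}
```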
Parallel Tempering Monte Carlo Simulations of Spherical Fixed-Connectivity Model for Polymerized Membranes
    NASA Astrophysics Data System (ADS)
    Usui, Satoshi; Koibuchi, Hiroshi
    2016-02-01
    We study the first order phase transition of the fixed-connectivity triangulated surface model using the Parallel Tempering Monte Carlo (PTMC) technique on relatively large lattices. From the PTMC results, we find that the transition is considerably stronger than the reported ones predicted by the conventional Metropolis MC (MMC) technique and the flat histogram MC technique. We also confirm that the results of the PTMC on relatively smaller lattices are in good agreement with those known results. This implies that the PTMC is successfully used to simulate first order phase transitions. The parallel computation in the PTMC is implemented by OpenMP, where the speed of the PTMC on multi-core CPUs is considerably faster than that on a single-core CPU.
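The OpenMP organisation mentioned in the parallel tempering record above is easy to sketch: replicas at different temperatures are updated concurrently, one replica per thread, and neighbouring temperatures then attempt configuration swaps with the standard replica-exchange acceptance rule. The "system" below is a single particle in a double-well potential, chosen purely for illustration; the temperature ladder and move sizes are invented.

```cpp
// Minimal OpenMP parallel tempering (replica exchange) sketch.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

double energy(double x) { return (x * x - 1.0) * (x * x - 1.0); }  // double well

int main() {
    const int nrep = 8, sweeps = 2000;
    std::vector<double> T(nrep), x(nrep, -1.0);
    for (int r = 0; r < nrep; ++r) T[r] = 0.1 * std::pow(2.0, r);   // temperature ladder

    std::vector<std::mt19937> rng;
    for (int r = 0; r < nrep; ++r) rng.emplace_back(777 + r);

    for (int sweep = 0; sweep < sweeps; ++sweep) {
        // Replica updates are independent, so they run in parallel.
        #pragma omp parallel for
        for (int r = 0; r < nrep; ++r) {
            std::uniform_real_distribution<double> u(0.0, 1.0);
            for (int k = 0; k < 50; ++k) {                          // local Metropolis moves
                double trial = x[r] + 0.5 * (2.0 * u(rng[r]) - 1.0);
                double dE = energy(trial) - energy(x[r]);
                if (dE <= 0.0 || u(rng[r]) < std::exp(-dE / T[r])) x[r] = trial;
            }
        }
        // Replica-exchange step (serial; alternate even/odd pairs each sweep).
        std::uniform_real_distribution<double> u(0.0, 1.0);
        for (int r = sweep % 2; r + 1 < nrep; r += 2) {
            double delta = (1.0 / T[r] - 1.0 / T[r + 1]) *
                           (energy(x[r]) - energy(x[r + 1]));
            if (delta >= 0.0 || u(rng[0]) < std::exp(delta)) std::swap(x[r], x[r + 1]);
        }
    }
    for (int r = 0; r < nrep; ++r)
        std::printf("T=%.3f  x=% .3f  E=%.3f\n", T[r], x[r], energy(x[r]));
}
```

Because the replicas only interact during the swap step, the per-replica loop scales straightforwardly with the number of cores, which is the behaviour the record reports for multi-core CPUs.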
Parallel traffic flow simulation of freeway networks: Phase 2. Final report 1994--1995
    SciTech Connect
    Chronopoulos, A.
    1997-07-01
    Explicit and implicit numerical methods for solving simple macroscopic traffic flow continuum models have been studied and efficiently implemented in traffic simulation codes in the past. The authors have already studied and implemented explicit methods for solving the high-order flow conservation traffic model. Implicit methods allow a much larger time step size than explicit methods, for the same accuracy. However, at each time step a nonlinear system must be solved. They use the Newton method coupled with a linear iterative solver (Orthomin). They accelerate the convergence of Orthomin with parallel incomplete LU factorization preconditionings. The authors implemented this implicit method on a 16-processor nCUBE2 parallel computer and obtained significant execution time speedup.

Implementation of unsteady sampling procedures for the parallel direct simulation Monte Carlo method
    NASA Astrophysics Data System (ADS)
    Cave, H. M.; Tseng, K.-C.; Wu, J.-S.; Jermy, M. C.; Huang, J.-C.; Krumdieck, S. P.
    2008-06-01
    An unsteady sampling routine for a general parallel direct simulation Monte Carlo method called PDSC is introduced, allowing the simulation of time-dependent flow problems in the near continuum range. A post-processing procedure called DSMC rapid ensemble averaging method (DREAM) is developed to improve the statistical scatter in the results while minimising both memory and simulation time. This method builds an ensemble average of repeated runs over a small number of sampling intervals prior to the sampling point of interest by restarting the flow using either a Maxwellian distribution based on macroscopic properties for near equilibrium flows (DREAM-I) or the instantaneous particle data output by the original unsteady sampling of PDSC for strongly non-equilibrium flows (DREAM-II). The method is validated by simulating shock tube flow and the development of simple Couette flow. Unsteady PDSC is found to accurately predict the flow field in both cases with significantly reduced run-times over single processor code, and DREAM greatly reduces the statistical scatter in the results while maintaining accurate particle velocity distributions. Simulations are then conducted of two applications involving the interaction of shocks over wedges. The results of these simulations are compared to experimental data and simulations from the literature where these are available. In general, it was found that 10 ensembled runs of DREAM processing could reduce the statistical uncertainty in the raw PDSC data by 2.5-3.3 times, based on the limited number of cases in the present study.
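A structural sketch of the DREAM-I idea in the record above: instead of one noisy instantaneous sample, the macroscopic state at an output time is rebuilt by ensemble-averaging several short runs restarted from a Maxwellian velocity distribution consistent with the current macroscopic properties. The "solver" below is a stochastic stand-in rather than a DSMC code, and all numbers are invented for the example.

```cpp
// Ensemble averaging of repeated, Maxwellian-restarted short runs (schematic).
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

struct Macro { double temperature; double velocity; };

int main() {
    std::mt19937 rng(42);
    Macro current{300.0, 50.0};                        // macroscopic state before output
    const int ensembles = 10;                           // repeated short runs (DREAM-I)
    const int intervals = 5;                            // sampling intervals per run

    auto maxwellian_restart = [&](const Macro& m) {     // resample particle velocities
        std::normal_distribution<double> g(m.velocity, std::sqrt(m.temperature));
        std::vector<double> v(1000);
        for (double& vi : v) vi = g(rng);
        return v;
    };

    double t_sum = 0.0, u_sum = 0.0;
    for (int e = 0; e < ensembles; ++e) {
        std::vector<double> v = maxwellian_restart(current);
        for (int s = 0; s < intervals; ++s) {
            // A real DSMC solver would move and collide particles here.
            double mean = 0.0, var = 0.0;
            for (double vi : v) mean += vi;
            mean /= v.size();
            for (double vi : v) var += (vi - mean) * (vi - mean);
            var /= v.size();
            u_sum += mean;                              // accumulate ensemble samples
            t_sum += var;                               // (temperature ~ velocity variance)
        }
    }
    Macro averaged{t_sum / (ensembles * intervals), u_sum / (ensembles * intervals)};
    std::printf("ensemble-averaged: u = %.2f, T = %.2f\n",
                averaged.velocity, averaged.temperature);
}
```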
282. Acceleration of hybrid MPI parallel NBODY6++ for large N-body globular cluster simulations
  NASA Astrophysics Data System (ADS)
  Wang, Long; Spurzem, Rainer; Aarseth, Sverre; Nitadori, Keigo; Berczik, Peter; Kouwenhoven, M. B. N.; Naab, Thorsten
  2016-02-01
  Previous research on globular cluster (GC) dynamics is mostly based on semi-analytic, Fokker-Planck and Monte-Carlo methods and on direct N-body (NB) simulations. These works have great advantages but also limits, since GCs are massive and compact and close encounters and binaries play very important roles in their dynamics. The former three methods make approximations and assumptions, while expensive computing time and number of stars limit the latter method. The current largest direct NB simulation has ~500k stars (Heggie 2014). Here, we accelerate the direct NB code NBODY6++ (which extends NBODY6 to supercomputers by using MPI) with new parallel computing technologies (GPU, OpenMP + SSE/AVX). Our aim is to handle large N (up to 10^6) direct NB simulations to obtain a better understanding of the dynamical evolution of GCs.

283. Small-World Synchronized Computing Networks for Scalable Parallel Discrete-Event Simulations
  NASA Astrophysics Data System (ADS)
  Guclu, Hasan; Korniss, Gyorgy; Toroczkai, Zoltan; Novotny, Mark A.
  We study the scalability of parallel discrete-event simulations for arbitrary short-range interacting systems with asynchronous dynamics. When the synchronization topology mimics that of the short-range interacting underlying system, the virtual time horizon (corresponding to the progress of the processing elements) exhibits Kardar-Parisi-Zhang-like kinetic roughening. Although the virtual times, on average, progress at a nonzero rate, their statistical spread diverges with the number of processing elements, hindering efficient data collection. We show that when the synchronization topology is extended to include quenched random communication links between the processing elements, they make a close-to-uniform progress with a nonzero rate, without global synchronization. We discuss in detail a coarse-grained description for the small-world synchronized virtual time horizon and compare the findings to those obtained by simulating the simulations based on the exact algorithmic rules.
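The model studied in the record above is easy to emulate serially: each processing element carries a local virtual time and may perform its next update only if it is not ahead of the elements it must synchronize with, and adding one quenched random partner per element is what suppresses the divergence of the spread. The sketch below is a minimal illustration of that rule, with illustrative parameters; it is not the authors' simulation code.

    # Minimal serial emulation of a conservative virtual-time-horizon model with
    # small-world synchronization links; parameters are illustrative only.
    import numpy as np

    def simulate_horizon(n_pe=1000, steps=100000, p_check_link=0.1, seed=0):
        rng = np.random.default_rng(seed)
        tau = np.zeros(n_pe)                      # virtual times of the processing elements
        partner = rng.permutation(n_pe)           # one quenched random partner per element
        for _ in range(steps):
            i = rng.integers(n_pe)                # asynchronous update attempt
            left, right = (i - 1) % n_pe, (i + 1) % n_pe
            ok = tau[i] <= tau[left] and tau[i] <= tau[right]
            if ok and rng.random() < p_check_link:
                ok = tau[i] <= tau[partner[i]]    # occasional check of the random partner
            if ok:
                tau[i] += rng.exponential(1.0)    # local update advances the virtual time
        return tau.mean(), tau.std()              # mean progress and spread of the horizon

Tracking the returned spread as the number of processing elements grows is the quantity of interest: without the random links it diverges with system size, while with them it stays bounded.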
284. Xyce parallel electronic simulator design: mathematical formulation, version 2.0
  SciTech Connect
  Hoekstra, Robert John; Waters, Lon J.; Hutchinson, Scott Alan; Keiter, Eric Richard; Russo, Thomas V.
  2004-06-01
  This document is intended to contain a detailed description of the mathematical formulation of Xyce, a massively parallel SPICE-style circuit simulator developed at Sandia National Laboratories. The target audience of this document is people in the role of 'service provider'. An example of such a person would be a linear solver expert who is spending a small fraction of his time developing solver algorithms for Xyce. Such a person probably is not an expert in circuit simulation, and would benefit from a description of the equations solved by Xyce. In this document, modified nodal analysis (MNA) is described in detail, with a number of examples. Issues that are unique to circuit simulation, such as voltage limiting, are also described in detail.

285. Parallel Beam Dynamics Simulation Tools for Future Light Source Linac Modeling
  SciTech Connect
  Qiang, Ji; Pogorelov, Ilya V.; Ryne, Robert D.
  2007-06-25
  Large-scale modeling on parallel computers is playing an increasingly important role in the design of future light sources. Such modeling provides a means to accurately and efficiently explore issues such as limits to beam brightness, emittance preservation, the growth of instabilities, etc. Recently the IMPACT code suite was enhanced to be applicable to future light source design. Simulations with IMPACT-Z were performed using up to one billion simulation particles for the main linac of a future light source to study the microbunching instability. Combined with the time domain code IMPACT-T, it is now possible to perform large-scale start-to-end linac simulations for future light sources, including the injector, main linac, chicanes, and transfer lines. In this paper we provide an overview of the IMPACT code suite, its key capabilities, and recent enhancements pertinent to accelerator modeling for future linac-based light sources.

286. FLY. A parallel tree N-body code for cosmological simulations
  NASA Astrophysics Data System (ADS)
  Antonuccio-Delogu, V.; Becciani, U.; Ferro, D.
  2003-10-01
  FLY is a parallel treecode which makes heavy use of the one-sided communication paradigm to handle the management of the tree structure. In its public version the code implements the equations for cosmological evolution, and can be run for different cosmological models. This reference guide describes the actual implementation of the algorithms of the public version of FLY, and suggests how to modify them to implement other types of equations (for instance, the Newtonian ones). Program summary -- Title of program: FLY. Catalogue identifier: ADSC. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADSC. Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland. Computer for which the program is designed and others on which it has been tested: Cray T3E, Sgi Origin 3000, IBM SP. Operating systems or monitors under which the program has been tested: Unicos 2.0.5.40, Irix 6.5.14, Aix 4.3.3. Programming language used: Fortran 90, C. Memory required to execute with typical data: about 100 Mwords with 2 million particles. Number of bits in a word: 32. Number of processors used: parallel program.
The user can select the number of processors >= 1. Has the code been vectorized or parallelized?: parallelized. Number of bytes in distributed program, including test data, etc.: 4615604. Distribution format: tar gzip file. Keywords: parallel tree N-body code for cosmological simulations. Nature of physical problem: FLY is a parallel collisionless N-body code for the calculation of the gravitational force. Method of solution: it is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986). Restrictions on the complexity of the program: the program uses the leapfrog integrator scheme, but this could be changed by the user. Typical running time: 50 seconds for each time step, running a 2-million-particle simulation on an Sgi Origin 3800 system with 8 processors having 512 Mbytes RAM for each processor. Unusual features of the program: FLY

287. BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations
  PubMed Central
  Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul
  2016-01-01
  Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and decay in large 3D domains. It has been parallelized with OpenMP, allowing efficient simulations on desktop workstations or single supercomputer nodes. The code is stable even for large time steps, with linear computational cost scalings. Solutions are first-order accurate in time and second-order accurate in space. The code can be run by itself or as part of a larger simulator. Availability and implementation: BioFVM is written in C++ with parallelization in OpenMP. It is maintained and available for download at http://BioFVM.MathCancer.org and http://BioFVM.sf.net under the Apache License (v2.0). Contact: paul.macklin@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26656933
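The PDE family BioFVM advances is, per substrate, a diffusion-decay equation with cell and bulk sources and uptake. The abstract does not spell out the discretization, so the sketch below is only a toy forward-Euler, centered-difference version of that operator for a single substrate, meant to make the equation concrete rather than to reproduce BioFVM's (more robust, OpenMP-parallelized) scheme.

    # Toy explicit update for one substrate: d(rho)/dt = D*laplacian(rho) - decay*rho + source.
    # Not BioFVM's actual scheme; stability requires dt <= dx**2 / (6*D).
    import numpy as np

    def diffuse_decay_source_step(rho, D, decay, source, dx, dt):
        """One forward-Euler step on a 3-D grid with zero-flux (replicated-edge) boundaries."""
        p = np.pad(rho, 1, mode="edge")
        lap = (p[2:, 1:-1, 1:-1] + p[:-2, 1:-1, 1:-1] +
               p[1:-1, 2:, 1:-1] + p[1:-1, :-2, 1:-1] +
               p[1:-1, 1:-1, 2:] + p[1:-1, 1:-1, :-2] - 6.0 * rho) / dx**2
        return rho + dt * (D * lap - decay * rho + source)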
288. Using Speculative Execution to Reduce Communication in a Parallel Large Scale Earthquake Simulation
  NASA Astrophysics Data System (ADS)
  Heien, E. M.; Yikilmaz, M. B.; Sachs, M. K.; Rundle, J. B.; Turcotte, D. L.; Kellogg, L. H.
  2011-12-01
  Earthquake simulations on parallel systems can be communication intensive due to local events (rupture waves) which have global effects (stress transfer). These events require global communication to transmit the effects of increased stress to model elements on other computing nodes. We describe a method of using speculative execution in a large scale parallel computation to decrease communication and improve simulation speed. This method exploits the tendency of earthquake ruptures to remain physically localized even though their effects on stress will be over long ranges. In this method we assume the stress transfer caused by a rupture remains localized and avoid global communication until the rupture has a high probability of passing to another node. We then calculate the stress state of the system to ensure that the rupture in fact remained localized, proceeding if the assumption was correct or rolling back the calculation otherwise. Using this method we are able to reduce communication frequency by 78%, in turn decreasing communication time by up to 66% and improving simulation speed by up to 45%.
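The speculation pattern in the record above is generic: advance assuming the rupture stays on the local node, verify afterwards, and roll back to a checkpoint with full global communication only when the assumption fails. The control-flow sketch below uses placeholder callbacks for the application-specific pieces; it is not the earthquake simulator's API.

    # Control-flow sketch of speculative execution with rollback.
    # advance_locally, rupture_left_node and resume_with_global_sync are placeholders.
    import copy

    def speculative_step(state, advance_locally, rupture_left_node, resume_with_global_sync):
        """Advance one step, deferring global stress-transfer communication when possible."""
        checkpoint = copy.deepcopy(state)                 # saved in case the guess is wrong
        state = advance_locally(state)                    # speculate: stress transfer stays on-node
        if rupture_left_node(state):                      # verification failed: the rupture spread
            state = resume_with_global_sync(checkpoint)   # roll back and redo with communication
        return state

The saving reported above (78% fewer communications) comes from the fact that most ruptures never trigger the rollback branch.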
289. Accelerating groundwater flow simulation in MODFLOW using JASMIN-based parallel computing
  PubMed
  Cheng, Tangpei; Mo, Zeyao; Shao, Jingli
  2014-01-01
  To accelerate the groundwater flow simulation process, this paper reports our work on developing an efficient parallel simulator through rebuilding the well-known software MODFLOW on JASMIN (J Adaptive Structured Meshes applications Infrastructure). The rebuilding process is achieved by designing patch-based data structures and parallel algorithms as well as adding slight modifications to the compute flow and subroutines in MODFLOW. Both the memory requirements and computing efforts are distributed among all processors, and to reduce communication cost, data transfers are batched and conveniently handled by adding ghost nodes to each patch. To further improve performance, constant-head/inactive cells are tagged and neglected during the linear solving process, and an efficient load balancing strategy is presented. The accuracy and efficiency are demonstrated through modeling three scenarios. The first application is a field flow problem located at Yanming Lake in China to help design a reasonable quantity of groundwater exploitation; desirable numerical accuracy and significant performance enhancement are obtained, and the tagged program with the load balancing strategy running on 40 cores is typically six times faster than the fastest MICCG-based MODFLOW program. The second test simulates flow in a highly heterogeneous aquifer; the AMG-based JASMIN program running on 40 cores is nine times faster than the GMG-based MODFLOW program. The third test is a simplified transient flow problem with on the order of tens of millions of cells to examine the scalability. Compared to 32 cores, parallel efficiencies of 77% and 68% are obtained on 512 and 1024 cores, respectively, which indicates impressive scalability. PMID:23600445

290. Simulation of Unsteady Combustion in a Ramjet Engine Using a Highly Parallel Computer
  NASA Technical Reports Server (NTRS)
  Menon, Suresh; Weeratunga, Sisira; Cooper, D. M. (Technical Monitor)
  1994-01-01
  Combustion instability in ramjets is a complex phenomenon that involves nonlinear interaction between acoustic waves, vortex motion and unsteady heat release in the combustor. To numerically simulate this 3-D, transient phenomenon, enormous computer resources (time, memory and disk storage) are required. Although current generation vector supercomputers are capable of providing adequate resources for simulations of this nature, their high cost and limited availability make such machines less than satisfactory for routine use. The primary focus of this study is to assess the feasibility of using highly parallel computer systems as a cost-effective alternative for conducting such unsteady flow simulations. Towards this end, a large-eddy simulation model for combustion instability was implemented on the Intel iPSC/860 and a careful study was conducted to determine the benefits and the problems associated with the use of such machines for transient simulations. Details of this study along with the results obtained from the unsteady combustion simulations carried out on the iPSC/860 are discussed in this paper.

291. Stochastic simulation of charged particle transport on the massively parallel processor
  NASA Technical Reports Server (NTRS)
  Earl, James A.
  1988-01-01
  Computations of cosmic-ray transport based upon finite-difference methods are afflicted by instabilities, inaccuracies, and artifacts. To avoid these problems, researchers developed a Monte Carlo formulation which is closely related not only to the finite-difference formulation, but also to the underlying physics of transport phenomena. Implementations of this approach are currently running on the Massively Parallel Processor at Goddard Space Flight Center, whose enormous computing power overcomes the poor statistical accuracy that usually limits the use of stochastic methods.
These <span class="hlt">simulations</span> have progressed to a stage where they provide a useful and realistic picture of solar energetic particle propagation in interplanetary space.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2001APS..DPPKP1112L','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2001APS..DPPKP1112L"><span id="translatedtitle"><span class="hlt">Parallel</span> PIC <span class="hlt">Simulations</span> of Short-Pulse High Intensity Laser Plasma Interactions.</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Lasinski, B. F.; Still, C. H.; Langdon, A. B.</p> <p>2001-10-01</p> <p>We extend our previous <span class="hlt">simulations</span> of high intensity short pulse laser plasma interactions footnote B. F. Lasinski, A. B. Langdon, S. P. Hatchett, M. H. Key, and M. Tabak, Phys. Plasmas 6, 2041 (1999); S. C. Wilks and W. L. Kruer, IEEE Journal of Quantum Electronics 11, 1954 (1997). to 3D and to much larger systems in 2D using our new, modern, 3D, electromagnetic, fully relativistic, massively <span class="hlt">parallel</span> PIC code. We study the generation of hot electrons and energetic ions and the associated complex phenomena. Laser light filamentation and the formation of high static magnetic fields are described.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/1028177','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/1028177"><span id="translatedtitle">A <span class="hlt">parallel</span> multigrid preconditioner for the <span class="hlt">simulation</span> of large fracture networks</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Sampath, Rahul S; Barai, Pallab; Nukala, Phani K</p> <p>2010-01-01</p> <p>Computational modeling of a fracture in disordered materials using discrete lattice models requires the solution of a linear system of equations every time a new lattice bond is broken. Solving these linear systems of equations successively is the most expensive part of fracture <span class="hlt">simulations</span> using large three-dimensional networks. In this paper, we present a <span class="hlt">parallel</span> multigrid preconditioned conjugate gradient algorithm to solve these linear systems. Numerical experiments demonstrate that this algorithm performs significantly better than the algorithms previously used to solve this problem.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/1035294','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/1035294"><span id="translatedtitle">Understanding Performance of <span class="hlt">Parallel</span> Scientific <span class="hlt">Simulation</span> Codes using Open|SpeedShop</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Ghosh, K K</p> <p>2011-11-07</p> <p>Conclusions of this presentation are: (1) Open SpeedShop's (OSS) is convenient to use for large, <span class="hlt">parallel</span>, scientific <span class="hlt">simulation</span> codes; (2) Large codes benefit from uninstrumented execution; (3) Many experiments can be run in a short time - might need multiple shots e.g. 
294. Understanding Performance of Parallel Scientific Simulation Codes using Open|SpeedShop
  SciTech Connect
  Ghosh, K. K.
  2011-11-07
  Conclusions of this presentation are: (1) Open|SpeedShop (OSS) is convenient to use for large, parallel, scientific simulation codes; (2) large codes benefit from uninstrumented execution; (3) many experiments can be run in a short time, though multiple shots may be needed (e.g. usertime for caller-callee, hwcsamp for HW counters); (4) a decent idea of a code's performance is easily obtained; (5) statistical sampling calls for a decent number of samples; and (6) HWC data is very useful for micro-analysis but can be tricky to analyze.

295. Forced-convection boiling tests performed in parallel simulated LMR fuel assemblies
  SciTech Connect
  Rose, S. D.; Carbajo, J. J.; Levin, A. E.; Lloyd, D. B.; Montgomery, B. H.; Wantland, J. L.
  1985-04-21
  Forced-convection tests have been carried out using parallel simulated Liquid Metal Reactor fuel assemblies in an engineering-scale sodium loop, the Thermal-Hydraulic Out-of-Reactor Safety facility. The tests, performed under single- and two-phase conditions, have shown that for low forced-convection flow there is significant flow augmentation by thermal convection, an important phenomenon under degraded shutdown heat removal conditions in an LMR. The power and flows required for boiling and dryout to occur are much higher than decay heat levels. The experimental evidence supports analytical results that heat removal from an LMR is possible with a degraded shutdown heat removal system.

296. A three-phase series-parallel resonant converter -- analysis, design, simulation and experimental results
  SciTech Connect
  Bhat, A. K. S.; Zheng, L.
  1995-12-31
  A three-phase dc-to-dc series-parallel resonant converter is proposed and its operating modes for a 180° wide gating pulse scheme are explained. A detailed analysis of the converter using a constant current model and a Fourier series approach is presented. Based on the analysis, design curves are obtained and a design example of a 1 kW converter is given. SPICE simulation results for the designed converter and experimental results for a 500 W converter are presented to verify the performance of the proposed converter for varying load conditions.
The converter operates in lagging PF mode for the entire load range and requires a narrow variation in switching frequency.

297. Parallel direct numerical simulation of wake vortex detection using monostatic and bistatic radio acoustic sounding systems
  NASA Astrophysics Data System (ADS)
  Boluriaan Esfahaani, Said
  A parallel two-dimensional code is developed in this thesis to numerically simulate wake vortex detection using a Radio Acoustic Sounding System (RASS). The Maxwell equations for media with non-uniform permittivity and the linearized Euler equations for media with non-uniform mean flow are the main framework for the simulations. The code is written in Fortran 90 with the Message Passing Interface (MPI) for parallel implementation. The main difficulty encountered with a time accurate simulation of a RASS is the number of samples required to resolve the Doppler shift in the scattered electromagnetic signal. Even for a 1D simulation with a typical scatterer size, the CPU time required to run the code is far beyond currently available computer resources. Two solutions that overcome this problem are described. In the first, the actual electromagnetic wave propagation speed is replaced with a much lower value. This allows an explicit, time accurate numerical scheme to be used. In the second, the governing differential equations are recast in order to remove the carrier frequency and solve only for the frequency shift using an implicit scheme with large time steps. The numerical stability characteristics of the resulting discretized equation with complex coefficients are examined. A number of cases for both the monostatic and bistatic configurations are considered. First, a uniform mean flow is considered and the RASS simulation is performed for two different types of incident acoustic field, namely a short single frequency acoustic pulse and a continuous broadband acoustic source. Both the explicit and implicit schemes are examined and the mean flow velocity is determined from the spectrum of the backscattered electromagnetic signal with very good accuracy. Second, the Taylor and Oseen vortex models are considered and their velocity field along the incident electromagnetic beam is retrieved.
The Abel transform is then applied to the velocity profiles determined by both

298. Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN
  PubMed Central
  Hammond, G. E.; Lichtner, P. C.; Mills, R. T.
  2014-01-01
  To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted. PMID:25506097
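Strong and weak scalability, as used in the PFLOTRAN study above, reduce to two simple ratios of measured wall-clock times. The helper below is illustrative only and is not part of PFLOTRAN.

    # Illustrative helpers for the two scalability measures discussed above.

    def strong_scaling_efficiency(t_base, p_base, t_p, p):
        """Fixed total problem size: ideal time on p cores is t_base * p_base / p."""
        return (t_base * p_base) / (t_p * p)

    def weak_scaling_efficiency(t_base, t_p):
        """Problem size grows with the core count: ideal time stays at t_base."""
        return t_base / t_p

For example, 100 s on 64 cores versus 30 s on 256 cores gives a strong-scaling efficiency of (100*64)/(30*256), about 0.83.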
This <span class="hlt">simulation</span> shows that through the overlapping of computation and communication method and controlling the decomposing size, the overhead of the communication of the shared data will be conquered. The result indicates that the implementation can achieve significant speed up for the FDTD algorithm. This will enable us to tackle the larger real electromagnetic problem by the low-cost PC clusters.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/22089677','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/22089677"><span id="translatedtitle">A <span class="hlt">PARALLEL</span> MONTE CARLO CODE FOR <span class="hlt">SIMULATING</span> COLLISIONAL N-BODY SYSTEMS</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A.</p> <p>2013-02-15</p> <p>We present a new <span class="hlt">parallel</span> code for computing the dynamical evolution of collisional N-body systems with up to N {approx} 10{sup 7} particles. Our code is based on the Henon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a <span class="hlt">parallel</span> random number generation scheme as well as a <span class="hlt">parallel</span> sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10{sup 5} to 10{sup 7}. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within {approx}< 0.04% throughout all <span class="hlt">simulations</span>. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10{sup 5}, 128 for N = 10{sup 6} and 256 for N = 10{sup 7}. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the <span class="hlt">parallel</span> sorting algorithm. 
300. A parallel Monte Carlo code for simulating collisional N-body systems
  SciTech Connect
  Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A.
  2013-02-15
  We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ~ 10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce, along with our choice of decomposition scheme, minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within ≲0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10^5, 128 for N = 10^6 and 256 for N = 10^7. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60×, 100×, and 220×, respectively.

301. L-PICOLA: A parallel code for fast dark matter simulation
  NASA Astrophysics Data System (ADS)
  Howlett, C.; Manera, M.; Percival, W. J.
  2015-09-01
  Robust measurements based on current large-scale structure surveys require precise knowledge of statistical and systematic errors. This can be obtained from large numbers of realistic mock galaxy catalogues that mimic the observed distribution of galaxies within the survey volume. To this end we present a fast, distributed-memory, planar-parallel code, L-PICOLA, which can be used to generate and evolve a set of initial conditions into a dark matter field much faster than a full non-linear N-body simulation. Additionally, L-PICOLA has the ability to include primordial non-Gaussianity in the simulation and simulate the past lightcone at run-time, with optional replication of the simulation volume. Through comparisons to fully non-linear N-body simulations we find that our code can reproduce the z = 0 power spectrum and reduced bispectrum of dark matter to within 2% and 5% respectively on all scales of interest to measurements of Baryon Acoustic Oscillations and Redshift Space Distortions, but 3 orders of magnitude faster. The accuracy, speed and scalability of this code, alongside the additional features we have implemented, make it extremely useful for both current and next generation large-scale structure surveys.
L-PICOLA is publicly available at https://cullanhowlett.github.io/l-picola.

302. Large-scale modeling of epileptic seizures: scaling properties of two parallel neuronal network simulation algorithms
  PubMed
  Pesce, Lorenzo L.; Lee, Hyong C.; Hereld, Mark; Visser, Sid; Stevens, Rick L.; Wildeman, Albert; van Drongelen, Wim
  2013-01-01
  Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers. PMID:24416069

303. Large-Scale Modeling of Epileptic Seizures: Scaling Properties of Two Parallel Neuronal Network Simulation Algorithms
  DOE PAGES Beta
  Pesce, Lorenzo L.; Lee, Hyong C.; Hereld, Mark; Visser, Sid; Stevens, Rick L.; Wildeman, Albert; van Drongelen, Wim
  2013-01-01
  Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us.
Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.

304. Gait simulation via a 6-DOF parallel robot with iterative learning control
  PubMed
  Aubin, Patrick M.; Cowley, Matthew S.; Ledoux, William R.
  2008-03-01
  We have developed a robotic gait simulator (RGS) by leveraging a 6-degree-of-freedom parallel robot, with the goal of overcoming three significant challenges of gait simulation, including: 1) operating at near physiologically correct velocities; 2) inputting full scale ground reaction forces; and 3) simulating motion in all three planes (sagittal, coronal and transverse). The robot will eventually be employed with cadaveric specimens, but as a means of exploring the capability of the system, we have first used it with a prosthetic foot. Gait data were recorded from one transtibial amputee using a motion analysis system and force plate. Using the same prosthetic foot as the subject, the RGS accurately reproduced the recorded kinematics and kinetics, and the appropriate vertical ground reaction force was realized with a proportional iterative learning controller. After six gait iterations the controller reduced the root mean square (RMS) error between the simulated and in situ vertical ground reaction force to 35 N during a 1.5 s simulation of the stance phase of gait with a prosthetic foot. This paper addresses the design, methodology and validation of the novel RGS. PMID:18334421
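The force tracking in the record above relies on a proportional iterative learning controller: after each gait trial, the commanded signal is corrected in proportion to the force-tracking error of the previous trial. The update below is a generic P-type ILC step; the gain value and signal names are illustrative and are not taken from the paper.

    # Generic P-type iterative learning control update (illustrative gain and names):
    # command for trial k+1 = command for trial k + gain * (target - measured) of trial k.
    import numpy as np

    def ilc_update(command_k, target_force, measured_force_k, gain=0.5):
        error_k = np.asarray(target_force) - np.asarray(measured_force_k)
        return np.asarray(command_k) + gain * error_k

Repeating such an update over successive trials is what drove the RMS force error down to 35 N after six iterations in the experiment described above.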
305. PCSIM: A Parallel Simulation Environment for Neural Circuits Fully Integrated with Python
  PubMed Central
  Pecevski, Dejan; Natschläger, Thomas; Schuch, Klaus
  2008-01-01
  The Parallel Circuit SIMulator (PCSIM) is a software package for simulation of neural circuits. It is primarily designed for distributed simulation of large scale networks of spiking point neurons. Although its computational core is written in C++, PCSIM's primary interface is implemented in the Python programming language, which is a powerful programming environment and allows the user to easily integrate the neural circuit simulator with data analysis and visualization tools to manage the full neural modeling life cycle. The main focus of this paper is to describe PCSIM's full integration into Python and the benefits thereof. In particular we will investigate how the automatically generated bidirectional interface and PCSIM's object-oriented modular framework enable the user to adopt a hybrid modeling approach: using and extending PCSIM's functionality either employing pure Python or C++ and thus combining the advantages of both worlds. Furthermore, we describe several supplementary PCSIM packages written in pure Python and tailored towards setting up and analyzing neural simulations. PMID:19543450

306. Parallel adaptive fluid-structure interaction simulation of explosions impacting on building structures
  SciTech Connect
  Deiterding, Ralf; Wood, Stephen L.
  2013-01-01
  We pursue a level set approach to couple an Eulerian shock-capturing fluid solver with space-time refinement to an explicit solid dynamics solver for large deformations and fracture. The coupling algorithms considering recursively finer fluid time steps as well as overlapping solver updates are discussed in detail. Our ideas are implemented in the AMROC adaptive fluid solver framework and are used for effective fluid-structure coupling to the general purpose solid dynamics code DYNA3D.
Beside <span class="hlt">simulations</span> verifying the coupled fluid-structure solver and assessing its <span class="hlt">parallel</span> scalability, the detailed structural analysis of a reinforced concrete column under blast loading and the <span class="hlt">simulation</span> of a prototypical blast explosion in a realistic multistory building are presented.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/20140009920','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/20140009920"><span id="translatedtitle"><span class="hlt">Simulation</span>/Emulation Techniques: Compressing Schedules With <span class="hlt">Parallel</span> (HW/SW) Development</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Mangieri, Mark L.; Hoang, June</p> <p>2014-01-01</p> <p>NASA has always been in the business of balancing new technologies and techniques to achieve human space travel objectives. NASA's Kedalion engineering analysis lab has been validating and using many contemporary avionics HW/SW development and integration techniques, which represent new paradigms to NASA's heritage culture. Kedalion has validated many of the Orion HW/SW engineering techniques borrowed from the adjacent commercial aircraft avionics solution space, inserting new techniques and skills into the Multi - Purpose Crew Vehicle (MPCV) Orion program. Using contemporary agile techniques, Commercial-off-the-shelf (COTS) products, early rapid prototyping, in-house expertise and tools, and extensive use of <span class="hlt">simulators</span> and emulators, NASA has achieved cost effective paradigms that are currently serving the Orion program effectively. Elements of long lead custom hardware on the Orion program have necessitated early use of <span class="hlt">simulators</span> and emulators in advance of deliverable hardware to achieve <span class="hlt">parallel</span> design and development on a compressed schedule.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2015SPIE.9424E..0JB','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2015SPIE.9424E..0JB"><span id="translatedtitle"><span class="hlt">Simulating</span> massively <span class="hlt">parallel</span> electron beam inspection for sub-20 nm defects</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Bunday, Benjamin D.; Mukhtar, Maseeh; Quoi, Kathy; Thiel, Brad; Malloy, Matt</p> <p>2015-03-01</p> <p>SEMATECH has initiated a program to develop massively-<span class="hlt">parallel</span> electron beam defect inspection (MPEBI). Here we use JMONSEL <span class="hlt">simulations</span> to generate expected imaging responses of chosen test cases of patterns and defects with ability to vary parameters for beam energy, spot size, pixel size, and/or defect material and form factor. The patterns are representative of the design rules for an aggressively-scaled FinFET-type design. With these <span class="hlt">simulated</span> images and resulting shot noise, a signal-to-noise framework is developed, which relates to defect detection probabilities. 
Additionally, with this infrastructure the effect of detection chain noise and frequency-dependent system response can be included, allowing for targeting of the best recipe parameters for MPEBI validation experiments, ultimately leading to insights into how such parameters will impact MPEBI tool design, including necessary doses for defect detection and estimations of scanning speeds for achieving high throughput for HVM.

309. Xyce parallel electronic simulator users' guide, version 6.0
  SciTech Connect
  Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David G.
  2013-08-01
  This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state of the art in the following areas: the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows new types of analysis to be developed without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms.
Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

310. A Many-Task Parallel Approach for Multiscale Simulations of Subsurface Flow and Reactive Transport
  SciTech Connect
  Scheibe, Timothy D.; Yang, Xiaofan; Schuchardt, Karen L.; Agarwal, Khushbu; Chase, Jared M.; Palmer, Bruce J.; Tartakovsky, Alexandre M.
  2014-12-16
  Continuum-scale models have long been used to study subsurface flow, transport, and reactions but lack the ability to resolve processes that are governed by pore-scale mixing. Recently, pore-scale models, which explicitly resolve individual pores and soil grains, have been developed to more accurately model pore-scale phenomena, particularly reaction processes that are controlled by local mixing. However, pore-scale models are prohibitively expensive for modeling application-scale domains. This motivates the use of a hybrid multiscale approach in which continuum- and pore-scale codes are coupled either hierarchically or concurrently within an overall simulation domain (time and space). This approach is naturally suited to an adaptive, loosely-coupled many-task methodology with three potential levels of concurrency. Each individual code (pore- and continuum-scale) can be implemented in parallel; multiple semi-independent instances of the pore-scale code are required at each time step, providing a second level of concurrency; and Monte Carlo simulations of the overall system to represent uncertainty in material property distributions provide a third level of concurrency. We have developed a hybrid multiscale model of a mixing-controlled reaction in a porous medium wherein the reaction occurs only over a limited portion of the domain. Loose, minimally-invasive coupling of pre-existing parallel continuum- and pore-scale codes has been accomplished by an adaptive script-based workflow implemented in the Swift workflow system. We describe here the methods used to create the model system, adaptively control multiple coupled instances of pore- and continuum-scale simulations, and maximize the scalability of the overall system.
We present results of numerical experiments conducted on NERSC supercomputing systems; our results demonstrate that loose many-task coupling provides a scalable solution for multiscale subsurface simulations with minimal overhead.

311. SDA 7: A modular and parallel implementation of the simulation of diffusional association software
  PubMed Central
  Martinez, Michael; Romanowska, Julia; Kokh, Daria B.; Ozboyaci, Musa; Yu, Xiaofeng; Öztürk, Mehmet Ali; Richter, Stefan
  2015-01-01
  The simulation of diffusional association (SDA) Brownian dynamics software package has been widely used in the study of biomacromolecular association. Initially developed to calculate bimolecular protein-protein association rate constants, it has since been extended to study electron transfer rates, to predict the structures of biomacromolecular complexes, to investigate the adsorption of proteins to inorganic surfaces, and to simulate the dynamics of large systems containing many biomacromolecular solutes, allowing the study of concentration-dependent effects. These extensions have led to a number of divergent versions of the software. In this article, we report the development of the latest version of the software (SDA 7). This release was developed to consolidate the existing codes into a single framework, while improving the parallelization of the code to better exploit modern multicore shared memory computer architectures. It is built using a modular object-oriented programming scheme, to allow for easy maintenance and extension of the software, and includes new features, such as adding flexible solute representations. We discuss a number of application examples, which describe some of the methods available in the release, and provide benchmarking data to demonstrate the parallel performance. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc. PMID:26123630
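SDA propagates rigid biomacromolecular solutes with Brownian dynamics. The textbook building block of such propagation is an Ermak-McCammon-type translational step, sketched below for one solute with isotropic diffusion; this is only the generic step, not SDA's full algorithm, which also handles rotations, interaction grids and encounter criteria.

    # Textbook Ermak-McCammon translational Brownian dynamics step (illustrative only):
    # dr = (D / kT) * F * dt + sqrt(2 * D * dt) * N(0, 1) per coordinate.
    import numpy as np

    def brownian_step(position, force, D, kT, dt, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        drift = (D / kT) * force * dt                          # deterministic drift along the force
        noise = rng.normal(scale=np.sqrt(2.0 * D * dt), size=np.shape(position))
        return position + drift + noise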
  313. SDA 7: A modular and parallel implementation of the simulation of diffusional association software.

    PubMed

    Martinez, Michael; Bruce, Neil J.; Romanowska, Julia; Kokh, Daria B.; Ozboyaci, Musa; Yu, Xiaofeng; Öztürk, Mehmet Ali; Richter, Stefan; Wade, Rebecca C.

    2015-08-01

    The simulation of diffusional association (SDA) Brownian dynamics software package has been widely used in the study of biomacromolecular association. Initially developed to calculate bimolecular protein-protein association rate constants, it has since been extended to study electron transfer rates, to predict the structures of biomacromolecular complexes, to investigate the adsorption of proteins to inorganic surfaces, and to simulate the dynamics of large systems containing many biomacromolecular solutes, allowing the study of concentration-dependent effects. These extensions have led to a number of divergent versions of the software. In this article, we report the development of the latest version of the software (SDA 7). This release was developed to consolidate the existing codes into a single framework, while improving the parallelization of the code to better exploit modern multicore shared memory computer architectures. It is built using a modular object-oriented programming scheme, to allow for easy maintenance and extension of the software, and includes new features, such as adding flexible solute representations. We discuss a number of application examples, which describe some of the methods available in the release, and provide benchmarking data to demonstrate the parallel performance. PMID:26123630

  314. On the efficiency of exchange in parallel tempering Monte Carlo simulations.

    PubMed

    Predescu, Cristian; Predescu, Mihaela; Ciobanu, Cristian V.

    2005-03-10

    We introduce the concept of effective fraction, defined as the expected probability that a configuration from the lowest index replica successfully reaches the highest index replica during a replica exchange Monte Carlo simulation. We then argue that the effective fraction represents an adequate measure of the quality of the sampling technique, as far as swapping is concerned. Under the hypothesis that the correlation between successive exchanges is negligible, we propose a technique for the computation of the effective fraction, a technique that relies solely on the values of the acceptance probabilities obtained at the end of the simulation. The effective fraction is then utilized for the study of the efficiency of a popular swapping scheme in the context of parallel tempering in the canonical ensemble. For large dimensional oscillators, we show that the swapping probability that minimizes the computational effort is 38.74%. By studying the parallel tempering swapping efficiency for a 13-atom Lennard-Jones cluster, we argue that the value of 38.74% remains roughly the optimal probability for most systems with continuous distributions that are likely to be encountered in practice. PMID:16851481
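    The swap move whose acceptance statistics the effective fraction summarizes is the standard replica-exchange Metropolis criterion, sketched below (Python; the temperature ladder and toy per-replica energies are illustrative, and a real simulation would tune the ladder so that neighbour swaps are accepted at roughly the 38.74% rate quoted above):

        # Minimal sketch of a replica-exchange (parallel tempering) swap between
        # neighbouring temperature levels, using the Metropolis exchange criterion.
        import math, random

        def attempt_swap(energies, betas, i, j):
            """Accept the i<->j configuration swap with probability min(1, exp(delta))."""
            delta = (betas[i] - betas[j]) * (energies[i] - energies[j])
            return delta >= 0 or random.random() < math.exp(delta)

        random.seed(1)
        betas = [1.0 / t for t in (0.5, 0.7, 1.0, 1.4, 2.0)]     # illustrative ladder
        energies = [random.gauss(1.0 / b, 0.3) for b in betas]   # toy per-replica energies
        accepted = sum(attempt_swap(energies, betas, k, k + 1) for k in range(len(betas) - 1))
        print(f"accepted {accepted} of {len(betas) - 1} neighbour swaps")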
  315. MDSLB: A new static load balancing method for parallel molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Wu, Yun-Long; Xu, Xin-Hai; Yang, Xue-Jun; Zou, Shun; Ren, Xiao-Guang

    2014-02-01

    Large-scale parallelization of molecular dynamics simulations is facing challenges which seriously affect the simulation efficiency, among which the load imbalance problem is the most critical. In this paper, we propose a new molecular dynamics static load balancing method (MDSLB). By analyzing the characteristics of the short-range force of molecular dynamics programs running in parallel, we divide the short-range force into three kinds of force models, and then package the computations of each force model into many tiny computational units called "cell loads", which provide the basic data structures for our load balancing method. In MDSLB, the spatial region is separated into sub-regions called "local domains", and the cell loads of each local domain are allocated to every processor in turn. Compared with the dynamic load balancing method, MDSLB can guarantee load balance by executing the algorithm only once at program startup, without migrating the loads dynamically. We implement MDSLB in OpenFOAM software and test it on the TianHe-1A supercomputer with 16 to 512 processors. Experimental results show that MDSLB can save 34%-64% of execution time in the load-imbalanced cases.
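    The one-time, startup-only allocation of cell loads can be sketched with a simple static balancer (Python; this uses a lightest-bucket greedy heuristic rather than the in-turn allocation described above, and the per-cell costs are invented for illustration):

        # Minimal sketch of a static, startup-time assignment of per-cell force work
        # ("cell loads") to processors, in the spirit of a static load-balancing scheme.
        def assign_cell_loads(cell_costs, num_procs):
            """Greedily give each cell load to the currently lightest processor."""
            buckets = [[] for _ in range(num_procs)]
            totals = [0.0] * num_procs
            for cell, cost in sorted(enumerate(cell_costs), key=lambda c: -c[1]):
                p = totals.index(min(totals))      # lightest rank so far
                buckets[p].append(cell)
                totals[p] += cost
            return buckets, totals

        costs = [3.0, 1.0, 1.0, 2.5, 0.5, 2.0, 1.5, 1.5]   # per-cell work, arbitrary units
        buckets, totals = assign_cell_loads(costs, num_procs=3)
        print(buckets)   # which cells each rank owns
        print(totals)    # per-rank load after the one-time assignment

    Because the assignment is computed once before the run, no load migration is needed during the simulation, which is the property the abstract contrasts with dynamic balancing.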
  316. A scalable parallel Stokesian Dynamics method for the simulation of colloidal suspensions

    NASA Astrophysics Data System (ADS)

    Bülow, F.; Hamberger, P.; Nirschl, H.; Dörfler, W.

    2016-07-01

    We have developed a new method for the efficient numerical simulation of colloidal suspensions. This method is designed for, and especially well-suited to, parallel code execution, but it can also be applied to single-core programs. It combines the Stokesian Dynamics method with a variant of the widely used Barnes-Hut algorithm in order to reduce computational costs. This combination and the inherent parallelization of the method make simulations of large numbers of particles within days possible. The level of accuracy can be determined by the user and is limited by the truncation of the used multipole expansion. Compared to the original Stokesian Dynamics method, the complexity can be reduced from O(N²) to linear complexity for dilute suspensions of strongly clustered particles, N being the number of particles. In the case of non-clustered particles in a dense suspension, the complexity depends on the particle configuration and is between O(N) and O(P·n_p,max²), where P is the number of processes and n_p,max = ⌈N/P⌉.

  317. pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.

    PubMed

    Halic, Tansel; Ahn, Woojin; De, Suvranu

    2014-01-01

    This work presents pWeb, a new language and compiler for parallelization of client-side compute-intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. The low performance of the web browser, however, remains the bottleneck for computationally intensive applications (including visualization of complex scenes, real-time physical simulations, and image processing) compared to native ones. The new proposed language is built upon web workers for multithreaded programming in HTML5. The language provides fundamental functionalities of parallel programming languages as well as the fork/join parallel model, which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions. PMID:24732497
  318. Parallel simulation of particle transport in an advection field applied to volcanic explosive eruptions

    NASA Astrophysics Data System (ADS)

    Künzli, Pierre; Tsunematsu, Kae; Albuquerque, Paul; Falcone, Jean-Luc; Chopard, Bastien; Bonadonna, Costanza

    2016-04-01

    Volcanic ash transport and dispersal models typically describe particle motion via a turbulent velocity field. Particles are advected inside this field from the moment they leave the vent of the volcano until they deposit on the ground. Several techniques exist to simulate particles in an advection field, such as finite difference Eulerian, Lagrangian-puff or pure Lagrangian techniques. In this paper, we present a new flexible simulation tool called TETRAS (TEphra TRAnsport Simulator) based on a hybrid Eulerian-Lagrangian model. This scheme offers the advantages of being numerically stable with no numerical diffusion and easily parallelizable. It also allows us to output particle atmospheric concentration or ground mass load at any given time. The model is validated using the advection-diffusion analytical equation. We also obtained a good agreement with field observations of the tephra deposit associated with the 2450 BP Pululagua (Ecuador) and the 1996 Ruapehu (New Zealand) eruptions. As this kind of model can lead to computationally intensive simulations, a parallelization on a distributed memory architecture was developed. A related performance model, taking into account load imbalance, is proposed and its accuracy tested.
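    The Lagrangian half of such a hybrid scheme advects particles through a wind field while they settle; a minimal sketch follows (Python; the uniform wind, constant settling velocity, and Gaussian kick standing in for turbulent diffusion are illustrative simplifications, not the TETRAS model):

        # Minimal sketch of Lagrangian tephra particles advected by wind,
        # settling under gravity, and deposited when they reach the ground.
        import random

        def advect(particles, wind, settling, dt):
            """One explicit step; returns (airborne, newly_deposited)."""
            airborne, deposited = [], []
            for x, y, z in particles:
                x += wind[0] * dt + random.gauss(0, 5.0)   # crude turbulent diffusion
                y += wind[1] * dt + random.gauss(0, 5.0)
                z -= settling * dt
                (deposited if z <= 0 else airborne).append((x, y, max(z, 0.0)))
            return airborne, deposited

        random.seed(0)
        plume = [(0.0, 0.0, 5000.0 + random.uniform(0, 2000)) for _ in range(1000)]
        ground = []
        while plume:
            plume, dep = advect(plume, wind=(10.0, 2.0), settling=4.0, dt=30.0)
            ground.extend(dep)
        print(len(ground), "particles deposited")

    Binning the deposited positions onto a grid would give the ground mass load, while binning the airborne positions gives the atmospheric concentration mentioned above; in the hybrid scheme the Eulerian grid also carries concentration directly.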
  319. Efficient massively parallel simulation of dynamic channel assignment schemes for wireless cellular communications

    NASA Technical Reports Server (NTRS)

    Greenberg, Albert G.; Lubachevsky, Boris D.; Nicol, David M.; Wright, Paul E.

    1994-01-01

    Fast, efficient parallel algorithms are presented for discrete event simulations of dynamic channel assignment schemes for wireless cellular communication networks. The driving events are call arrivals and departures, in continuous time, to cells geographically distributed across the service area. A dynamic channel assignment scheme decides which call arrivals to accept, and which channels to allocate to the accepted calls, attempting to minimize call blocking while ensuring co-channel interference is tolerably low. Specifically, the scheme ensures that the same channel is used concurrently at different cells only if the pairwise distances between those cells are sufficiently large. Much of the complexity of the system comes from ensuring this separation. The network is modeled as a system of interacting continuous time automata, each corresponding to a cell. To simulate the model, conservative methods are used; i.e., methods in which no errors occur in the course of the simulation and so no rollback or relaxation is needed. Implemented on a 16K processor MasPar MP-1, an elegant and simple technique provides speedups of about 15 times over an optimized serial simulation running on a high speed workstation. A drawback of this technique, typical of conservative methods, is that processor utilization is rather low. To overcome this, new methods were developed that exploit slackness in event dependencies over short intervals of time, thereby raising the utilization to above 50 percent and the speedup over the optimized serial code to about 120 times.

  320. MaMiCo: Software design for parallel molecular-continuum flow simulations

    NASA Astrophysics Data System (ADS)

    Neumann, Philipp; Flohr, Hanno; Arora, Rahul; Jarmatz, Piet; Tchipev, Nikola; Bungartz, Hans-Joachim

    2016-03-01

    The macro-micro-coupling tool (MaMiCo) was developed to ease the development of and modularize molecular-continuum simulations, retaining sequential and parallel performance. We demonstrate the functionality and performance of MaMiCo by coupling the spatially adaptive Lattice Boltzmann framework waLBerla with four molecular dynamics (MD) codes: the light-weight Lennard-Jones-based implementation SimpleMD, the node-level optimized software ls1 mardyn, and the community codes ESPResSo and LAMMPS. We detail interface implementations to connect each solver with MaMiCo. The coupling for each waLBerla-MD setup is validated in three-dimensional channel flow simulations which are solved by means of a state-based coupling method. We provide sequential and strong scaling measurements for the four molecular-continuum simulations. The overhead of MaMiCo is found to come at 10%-20% of the total (MD) runtime. The measurements further show that scalability of the hybrid simulations is reached on up to 500 Intel SandyBridge and more than 1000 AMD Bulldozer compute cores.
  321. Application of Parallel Hybrid Algorithm in Massively Parallel GPGPU—The Improved Effective and Efficient Method for Calculating Coulombic Interactions in Simulations of Many Ions with SIMION

    NASA Astrophysics Data System (ADS)

    Saito, Kenichiro; Koizumi, Eiko; Koizumi, Hideya

    2012-09-01

    In our previous study, we introduced a new hybrid approach to effectively approximate the total force on each ion during a trajectory calculation in mass spectrometry device simulations, and the algorithm worked successfully with SIMION. We took one step further and applied the method in massively parallel general-purpose computing with GPU (GPGPU) to test its performance in simulations with thousands to over a million ions. We took extra care to minimize the barrier synchronization and data transfer between the host (CPU) and the device (GPU) memory, and took full advantage of the latency hiding. Parallel codes were written in CUDA C++ and implemented in SIMION via the user-defined Lua program. In this study, we tested the parallel hybrid algorithm with a couple of basic models and analyzed the performance by comparing it to that of the original, fully-explicit method written in serial code. The Coulomb explosion simulation with 128,000 ions was completed in 309 s, over 700 times faster than the 63 h taken by the original explicit method, in which we evaluated two-body Coulomb interactions explicitly on one ion with each of all the other ions. The simulation of 1,024,000 ions was completed in 2650 s. In another example, we applied the hybrid method on a simulation of ions in a simple quadrupole ion storage model with 100,000 ions, and it only took less than 10 d. Based on our estimate, the same simulation is expected to take 5-7 y by the explicit method in serial code.
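    The baseline that the hybrid GPU method accelerates is the fully explicit pairwise Coulomb sum, which scales as O(N²) in the number of ions; a minimal vectorized sketch is shown below (Python/NumPy, with unit charges and an arbitrary force constant for illustration, not the CUDA/SIMION implementation):

        # Minimal sketch of the explicit O(N^2) pairwise Coulomb force evaluation that
        # hybrid and GPU approaches accelerate. Units and charges are illustrative.
        import numpy as np

        def coulomb_forces(pos, charge, k=1.0, eps=1e-12):
            """Return the net Coulomb force on every ion (brute-force over all pairs)."""
            diff = pos[:, None, :] - pos[None, :, :]        # (N, N, 3) separations
            r2 = np.sum(diff * diff, axis=-1) + eps         # avoid division by zero
            inv_r3 = r2 ** -1.5
            np.fill_diagonal(inv_r3, 0.0)                   # no self-interaction
            qq = charge[:, None] * charge[None, :]
            return k * np.sum((qq * inv_r3)[..., None] * diff, axis=1)

        rng = np.random.default_rng(2)
        positions = rng.normal(scale=1e-3, size=(500, 3))   # 500 ions
        charges = np.ones(500)
        forces = coulomb_forces(positions, charges)
        print(forces.shape, float(np.abs(forces).max()))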
  322. 6th International Special Session on Current Trends in Numerical Simulation for Parallel Engineering Environments

    SciTech Connect

    Schulz, M.; Trinitis, C.

    2007-07-09

    In today's world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Remarkable progress in CPU architecture (multi- and many-core, SMT, transactional memory, virtualization support, etc.), system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges for both system architects and software developers. These trends are paralleled by progress in parallel algorithms, simulation techniques, and software integration from multiple disciplines. In its 6th year, ParSim continues to build a bridge between computer science and the application disciplines and to help foster cooperations between the different fields. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a shorter turn-around time. This offers the unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in parallel computation, serves as an ideal surrounding for ParSim. This combination enables the participants to present and discuss their work within the scope of both the session and the host conference. This year, ten papers with authors in ten countries were submitted to ParSim, and after a quick turn-around, yet thorough review process we decided to accept three of them for publication and presentation during the ParSim session. These three papers show the use of simulation in a range of different application fields including earthquake and turbulence simulation. At the same time, they also address computer science aspects and discuss different parallelization strategies, programming models and environments, as well as scalability. We are confident that this provides an attractive program and that ParSim will yet again be an informal setting for lively discussions and for fostering new collaborations.
  323. An evaluation of parallelization strategies for low-frequency electromagnetic induction simulators using staggered grid discretizations

    NASA Astrophysics Data System (ADS)

    Weiss, C. J.; Schultz, A.

    2011-12-01

    The high computational cost of the forward solution for modeling low-frequency electromagnetic induction phenomena is one of the primary impediments against broad-scale adoption by the geoscience community of exploration techniques, such as magnetotellurics and geomagnetic depth sounding, that rely on fast and cheap forward solutions to make tractable the inverse problem. As geophysical observables, electromagnetic fields are direct indicators of Earth's electrical conductivity - a physical property independent of (but in some cases correlative with) seismic wavespeed. Electrical conductivity is known to be a function of Earth's physiochemical state and temperature, and to be especially sensitive to the presence of fluids, melts and volatiles. Hence, electromagnetic methods offer a critical and independent constraint on our understanding of Earth's interior processes. Existing methods for parallelization of time-harmonic electromagnetic simulators, as applied to geophysics, have relied heavily on a combination of strategies: coarse-grained decompositions of the model domain; and/or a high-order functional decomposition across spectral components, which in turn can be domain-decomposed themselves. Hence, in terms of scaling, both approaches are ultimately limited by the growing communication cost as the granularity of the forward problem increases. In this presentation we examine alternate parallelization strategies based on OpenMP shared-memory parallelization and CUDA-based GPU parallelization. As a test case, we use two different numerical simulation packages, each based on a staggered Cartesian grid: FDM3D (Weiss, 2006), which solves the curl-curl equation directly in terms of the scattered electric field (available under the LGPL at www.openem.org); and APHID, the A-Phi Decomposition based on mixed vector and scalar potentials, in which the curl-curl operator is replaced operationally by the vector Laplacian. We describe progress made in modifying the code to
  324. Forced-to-natural convection transition tests in parallel simulated liquid metal reactor fuel assemblies

    SciTech Connect

    Levin, A. E.; Montgomery, B. H.

    1990-01-01

    The Thermal-Hydraulic Out of Reactor Safety (THORS) Program at Oak Ridge National Laboratory (ORNL) had as its objective the testing of simulated, electrically heated liquid metal reactor (LMR) fuel assemblies in an engineering-scale, sodium loop. Between 1971 and 1985, the THORS Program operated 11 simulated fuel bundles in conditions covering a wide range of normal and off-normal conditions. The last test series in the Program, THORS-SHRS Assembly 1, employed two parallel, 19-pin, full-length, simulated fuel assemblies of a design consistent with the large LMR (Large Scale Prototype Breeder - LSPB) under development at that time. These bundles were installed in the THORS Facility, allowing single- and parallel-bundle testing in thermal-hydraulic conditions up to and including sodium boiling and dryout. As the name SHRS (Shutdown Heat Removal System) implies, a major objective of the program was testing under conditions expected during low-power reactor operation, including low-flow forced convection, natural convection, and forced-to-natural convection transition at various powers. The THORS-SHRS Assembly 1 experimental program was divided into four phases. Phase 1 included preliminary and shakedown tests, including the collection of baseline steady-state thermal-hydraulic data. Phase 2 comprised natural convection testing. Forced convection testing was conducted in Phase 3. The final phase of testing included forced-to-natural convection transition tests. Phases 1, 2, and 3 have been discussed in previous papers. The fourth phase is described in this paper. 3 refs., 2 figs.

  325. Multiple-Instruction, Multiple-Data Path Computers: Parallel Processing Impact on Flight Simulation Software. Final Report.

    ERIC Educational Resources Information Center

    Lord, Robert E.; And Others

    The purpose of this study was to evaluate the parallel processing impact of multiple-instruction multiple-data path (MIMD) computers on flight simulation software. Basic mathematical functions and arithmetic expressions from typical flight simulation software were selected and run on an MIMD computer to evaluate the improvement in execution time…
  326. A comparison of real-time blade-element and rotor-map helicopter simulations using parallel processing

    NASA Technical Reports Server (NTRS)

    Corliss, Lloyd; Du Val, Ronald W.; Gillman, Herbert, III; Huynh, Loc C.

    1990-01-01

    In recent efforts by NASA, the Army, and Advanced Rotorcraft Technology, Inc. (ART), the application of parallel processing techniques to real-time simulation has been studied. Traditionally, real-time helicopter simulations have omitted the modeling of high-frequency phenomena in order to achieve real-time operation on affordable computers. Parallel processing technology can now provide the means for significantly improving the fidelity of real-time simulation, and one specific area for improvement is the modeling of rotor dynamics. This paper focuses on the results of a piloted simulation in which a traditional rotor-map mathematical model was compared with a more sophisticated blade-element mathematical model that had been implemented using parallel processing hardware and software technology.

  327. Accelerating Dust Storm Simulation by Balancing Task Allocation in Parallel Computing Environment

    NASA Astrophysics Data System (ADS)

    Gui, Z.; Yang, C.; Xia, J.; Huang, Q.; Yu, M.

    2013-12-01

    Dust storms have serious negative impacts on the environment, human health, and assets. The continuing global climate change has increased the frequency and intensity of dust storms in the past decades. To better understand and predict the distribution, intensity and structure of dust storms, a series of dust storm models have been developed, such as the Dust Regional Atmospheric Model (DREAM), the NMM meteorological module (NMM-dust) and the Chinese Unified Atmospheric Chemistry Environment for Dust (CUACE/Dust). The developments and applications of these models have contributed significantly to both scientific research and our daily life. However, dust storm simulation is a data and computing intensive process. Normally, a simulation for a single dust storm event may take several days or hours to run. This seriously impacts the timeliness of prediction and potential applications. To speed up the process, high performance computing is widely adopted. By partitioning a large study area into small subdomains according to their geographic location and executing them on different computing nodes in a parallel fashion, the computing performance can be significantly improved. Since spatiotemporal correlations exist in the geophysical process of dust storm simulation, each subdomain allocated to a node needs to communicate with other geographically adjacent subdomains to exchange data. Inappropriate allocations may introduce imbalanced task loads and unnecessary communications among computing nodes. Therefore, the task allocation method is the key factor that may impact the feasibility of the parallelization. The allocation algorithm needs to carefully balance the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire system. This presentation introduces two algorithms for such allocation and compares them with the evenly distributed allocation method. Specifically, 1) In order to get optimized solutions, a
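    A task-allocation heuristic of the kind compared in this work can be sketched as a greedy assignment that weighs per-subdomain compute cost against the communication created by separating adjacent subdomains (Python; the costs, adjacency graph, and penalty weight are invented for illustration and do not reproduce the two algorithms of the presentation):

        # Minimal sketch of allocating spatial subdomains to nodes while balancing
        # compute cost and penalizing cut edges between geographically adjacent subdomains.
        def allocate(compute_cost, adjacency, num_nodes, comm_penalty=0.3):
            owner, load = {}, [0.0] * num_nodes
            for sd in sorted(compute_cost, key=compute_cost.get, reverse=True):
                best_node, best_score = None, None
                for node in range(num_nodes):
                    # neighbours already placed on a different node imply communication
                    cut = sum(1 for nb in adjacency[sd] if owner.get(nb, node) != node)
                    score = load[node] + compute_cost[sd] + comm_penalty * cut
                    if best_score is None or score < best_score:
                        best_node, best_score = node, score
                owner[sd] = best_node
                load[best_node] += compute_cost[sd]
            return owner, load

        cost = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 2.0, "E": 1.0, "F": 1.0}
        adj = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
               "D": ["B", "F"], "E": ["C", "F"], "F": ["D", "E"]}
        print(allocate(cost, adj, num_nodes=2))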
  328. 8th International Special Session on Current Trends in Numerical Simulation for Parallel Engineering Environments

    SciTech Connect

    Trinitis, C.; Bader, M.; Schulz, M.

    2009-06-09

    In today's world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Significant progress in CPU architecture (multi- and many-core CPUs, SMT, transactional memory, virtualization support, shared caches, etc.), system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges for both system architects and software developers. These trends are paralleled by progress in algorithms, simulation techniques, and software integration from multiple disciplines. In its 8th year, ParSim continues to build a bridge between application disciplines and computer science and to help foster closer cooperations between these fields. Since its successful introduction in 2002, ParSim has established itself as an integral part of the EuroPVM/MPI conference series. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a short turn-around time. We believe that this offers a unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in parallel computation, serves as an ideal surrounding for ParSim. This combination enables participants to present and discuss their work within the scope of both the session and the host conference. This year, five papers from authors in five countries were submitted to ParSim, and we selected three of them. They cover a range of different application fields including mechanical engineering, material science, and structural engineering simulations. We are confident that this resulted in an attractive special session and that this will be an informal setting for lively discussions as well as for fostering new collaborations. Several people contributed to this event. Thanks go to Jack Dongarra, the EuroPVM/MPI general chair, and to Jan Westerholm, Juha
  329. Development of a Massively Parallel Particle-Mesh Algorithm for Simulations of Galaxy Dynamics and Plasmas

    NASA Astrophysics Data System (ADS)

    Wallin, John

    1996-01-01

    Particle-mesh calculations treat forces and potentials as field quantities which are represented approximately on a mesh. A system of particles is mapped onto this mesh as a density distribution of mass or charge. The Fourier transform is used to convolve this distribution with the Green's function of the potential, and a finite difference scheme is used to calculate the forces acting on the particles. The computation time scales as Ng log Ng, where Ng is the size of the computational grid. In contrast, the particle-particle method's computing time relies on direct summation, so the time for each calculation is given by Np², where Np is the number of particles. The particle-mesh method is best suited for simulations with a fixed minimum resolution and for collisionless systems, while hierarchical tree codes have proven to be superior for collisional systems where two-body interactions are important. Particle-mesh methods still dominate in plasma physics, where collisionless systems are modeled. The CM-200 Connection Machine produced by Thinking Machines Corp. is a data parallel system. On this system, the front-end computer controls the timing and execution of the parallel processing units. The programming paradigm is Single-Instruction, Multiple-Data (SIMD). The processors on the CM-200 are connected in an N-dimensional hypercube; the largest number of links a message will ever have to make is N. As in all parallel computing, the efficiency of an algorithm is primarily determined by the fraction of the time spent communicating compared to that spent computing. Because of the topology of the processors, nearest neighbor communication is more efficient than general communication.
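    The particle-mesh cycle summarized above (deposit particles onto a grid, convolve with the Green's function via the FFT, difference the potential, and interpolate forces back to the particles) can be sketched in a one-dimensional periodic toy (Python/NumPy; cloud-in-cell deposition and the simple spectral Green's function used here are illustrative choices, not the massively parallel implementation):

        # Minimal 1D periodic particle-mesh sketch: density deposition, FFT solve of
        # the Poisson equation, and a centred-difference force interpolated back to particles.
        import numpy as np

        def pm_forces(x, ng=64, box=1.0):
            dx = box / ng
            rho = np.zeros(ng)
            cell = x / dx
            left = np.floor(cell).astype(int) % ng
            frac = cell - np.floor(cell)
            np.add.at(rho, left, 1.0 - frac)               # cloud-in-cell deposition
            np.add.at(rho, (left + 1) % ng, frac)
            k = 2.0 * np.pi * np.fft.fftfreq(ng, d=dx)
            rho_k = np.fft.fft(rho)
            phi_k = np.zeros(ng, dtype=complex)
            phi_k[1:] = -rho_k[1:] / k[1:] ** 2            # Poisson equation in Fourier space
            phi = np.fft.ifft(phi_k).real
            force_grid = -(np.roll(phi, -1) - np.roll(phi, 1)) / (2 * dx)
            # linear interpolation of the grid force back to the particle positions
            return (1.0 - frac) * force_grid[left] + frac * force_grid[(left + 1) % ng]

        rng = np.random.default_rng(3)
        positions = rng.uniform(0.0, 1.0, 512)
        print(pm_forces(positions)[:5])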
  330. Massively parallel simulation with DOE's ASCI supercomputers: an overview of the Los Alamos Crestone project

    SciTech Connect

    Weaver, R. P.; Gittings, M. L.

    2004-01-01

    The Los Alamos Crestone Project is part of the Department of Energy's (DOE) Accelerated Strategic Computing Initiative, or ASCI Program. The main goal of this software development project is to investigate the use of continuous adaptive mesh refinement (CAMR) techniques for application to problems of interest to the Laboratory. There are many code development efforts in the Crestone Project, both unclassified and classified codes. In this overview I will discuss the unclassified SAGE and RAGE codes. The SAGE (SAIC adaptive grid Eulerian) code is a one-, two-, and three-dimensional multimaterial Eulerian massively parallel hydrodynamics code for use in solving a variety of high-deformation flow problems. The RAGE CAMR code is built from the SAGE code by adding various radiation packages, improved setup utilities and graphics packages, and is used for problems in which radiation transport of energy is important. The goal of these massively parallel versions of the codes is to run extremely large problems in a reasonable amount of calendar time. Our target is scalable performance to ~10,000 processors on a 1 billion CAMR computational cell problem that requires hundreds of variables per cell, multiple physics packages (e.g. radiation and hydrodynamics), and implicit matrix solves for each cycle. A general description of the RAGE code has been published in [1], [2], [3] and [4]. Currently, the largest simulations we do are three-dimensional, using around 500 million computational cells and running for literally months of calendar time using ~2000 processors. Current ASCI platforms range from several 3-teraOPS supercomputers to one 12-teraOPS machine at Lawrence Livermore National Laboratory, the White machine, and one 20-teraOPS machine installed at Los Alamos, the Q machine. Each machine is a system comprised of many component parts that must perform in unity for the successful run of these simulations. Key features of any massively parallel system
  331. Scalable parallel programming for high performance seismic simulation on petascale heterogeneous supercomputers

    NASA Astrophysics Data System (ADS)

    Zhou, Jun

    The 1994 Northridge earthquake in Los Angeles, California, killed 57 people, injured over 8,700 and caused an estimated $20 billion in damage. Petascale simulations are needed in California and elsewhere to provide society with a better understanding of the rupture and wave dynamics of the largest earthquakes at shaking frequencies required to engineer safe structures. As heterogeneous supercomputing infrastructures are becoming more common, numerical developments in earthquake system research are particularly challenged by the dependence on accelerator elements to enable "the Big One" simulations with higher frequency and finer resolution. Reducing time to solution and power consumption are two primary focus areas today for the enabling technology of fault rupture dynamics and seismic wave propagation in realistic 3D models of the crust's heterogeneous structure. This dissertation presents scalable parallel programming techniques for high performance seismic simulation running on petascale heterogeneous supercomputers. A real world earthquake simulation code, AWP-ODC, one of the most advanced earthquake codes to date, was chosen as the base code in this research, and the testbed is based on Titan at Oak Ridge National Laboratory, the world's largest heterogeneous supercomputer. The research work is primarily related to architecture study, computation performance tuning and software system scalability. An earthquake simulation workflow has also been developed to support efficient production sets of simulations. The highlights of the technical development are an aggressive performance optimization focusing on data locality and a notable data communication model that hides the data communication latency. This development results in optimal computation efficiency and throughput for the 13-point stencil code on heterogeneous systems, which can be extended to general high-order stencil codes. Started from scratch, the hybrid CPU/GPU version of AWP

  332. Parallel Algorithms for Monte Carlo Particle Transport Simulation on Exascale Computing Architectures

    NASA Astrophysics Data System (ADS)

    Romano, Paul Kollath

    Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O(√N), whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes; in doing so it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed simulations. The analysis demonstrated that load imbalances in domain decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency. The model predictions were verified with
  333. A package of Linux scripts for the parallelization of Monte Carlo simulations

    NASA Astrophysics Data System (ADS)

    Badal, Andreu; Sempau, Josep

    2006-09-01

    Despite the fact that fast computers are nowadays available at low cost, there are many situations where obtaining a reasonably low statistical uncertainty in a Monte Carlo (MC) simulation involves a prohibitively large amount of time. This limitation can be overcome by having recourse to parallel computing. Most tools designed to facilitate this approach require modification of the source code and the installation of additional software, which may be inconvenient for some users. We present a set of tools, named clonEasy, that implement a parallelization scheme of a MC simulation that is free from these drawbacks. In clonEasy, which is designed to run under Linux, a set of "clone" CPUs is governed by a "master" computer by taking advantage of the capabilities of the Secure Shell (ssh) protocol. Any Linux computer on the Internet that can be ssh-accessed by the user can be used as a clone. A key ingredient for the parallel calculation to be reliable is the availability of an independent string of random numbers for each CPU. Many generators, such as RANLUX, RANECU or the Mersenne Twister, can readily produce these strings by initializing them appropriately and, hence, they are suitable to be used with clonEasy. This work was primarily motivated by the need to find a straightforward way to parallelize PENELOPE, a code for MC simulation of radiation transport that (in its current 2005 version) employs the generator RANECU, which uses a combination of two multiplicative linear congruential generators (MLCGs). Thus, this paper is focused on this class of generators and, in particular, we briefly present an extension of RANECU that increases its period up to ~5×10, and we introduce seedsMLCG, a tool that provides the information necessary to initialize disjoint sequences of an MLCG to feed different CPUs. This program, in combination with clonEasy, allows one to run PENELOPE in parallel easily, without requiring specific libraries or significant alterations of the
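    Disjoint per-clone streams of a multiplicative linear congruential generator can be obtained by jumping each clone ahead a fixed number of draws, which requires only modular exponentiation of the multiplier; a minimal sketch using one of the two component generators employed by RANECU is shown below (Python; the jump length and clone count are illustrative, and this is not the seedsMLCG implementation itself):

        # Minimal sketch of seeding disjoint MLCG sequences for parallel clones by
        # skipping ahead: x_{n+k} = (a^k mod m) * x_n mod m.
        A, M = 40014, 2147483563          # one RANECU component generator (L'Ecuyer, 1988)

        def skip_ahead(seed, k, a=A, m=M):
            """State reached after k draws, via modular exponentiation of the multiplier."""
            return (pow(a, k, m) * seed) % m

        def mlcg_stream(seed, n, a=A, m=M):
            x, out = seed, []
            for _ in range(n):
                x = (a * x) % m
                out.append(x)
            return out

        base_seed, jump = 12345, 10**5    # illustrative: 10^5 draws reserved per clone
        clone_seeds = [skip_ahead(base_seed, c * jump) for c in range(4)]
        print(clone_seeds)
        # sanity check: clone 1 starts exactly where clone 0 would be after `jump` draws
        assert clone_seeds[1] == mlcg_stream(base_seed, jump)[-1]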
  334. Ion Dynamics at a Rippled Quasi-parallel Shock: 2D Hybrid Simulations

    NASA Astrophysics Data System (ADS)

    Hao, Yufei; Lu, Quanming; Gao, Xinliang; Wang, Shui

    2016-05-01

    In this paper, two-dimensional hybrid simulations are performed to investigate ion dynamics at a rippled quasi-parallel shock. The results show that the ripples around the shock front are inherent structures of a quasi-parallel shock, and the re-formation of the shock is not synchronous along the surface of the shock front. By following the trajectories of the upstream ions, we find that these ions behave differently when they interact with the shock front at different positions along the shock surface. The upstream particles are transmitted more easily through the upper part of a ripple, and the corresponding bulk velocity downstream is larger, where a high-speed jet is formed. In the lower part of the ripple, the upstream particles tend to be reflected by the shock. Ions reflected by the shock may suffer multiple-stage acceleration when moving along the shock surface or when trapped between the upstream waves and the shock front. Finally, these ions may escape further upstream or move downstream; therefore, superthermal ions can be found both upstream and downstream.

  335. MPI parallelization of Vlasov codes for the simulation of nonlinear laser-plasma interactions

    NASA Astrophysics Data System (ADS)

    Savchenko, V.; Won, K.; Afeyan, B.; Decyk, V.; Albrecht-Marc, M.; Ghizzo, A.; Bertrand, P.

    2003-10-01

    The simulation of optical mixing driven KEEN waves [1] and electron plasma waves [1] in laser-produced plasmas requires nonlinear kinetic models and massive parallelization. We use Message Passing Interface (MPI) libraries and Appleseed [2] to solve the Vlasov-Poisson system of equations on an 8-node dual-processor MAC G4 cluster. We use the semi-Lagrangian time splitting method [3]. It requires only row-column exchanges in the global data redistribution, minimizing the total number of communications between processors. Recurrent communication patterns for 2D FFTs involve global transposition. In the Vlasov-Maxwell case, we use splitting into two 1D spatial advections and a 2D momentum advection [4]. Discretized momentum advection equations have a double loop structure with the outer index being assigned to different processors. We adhere to a code structure with separate routines for calculations and data management for parallel computations. [1] B. Afeyan et al., IFSA 2003 Conference Proceedings, Monterey, CA. [2] V. K. Decyk, Computers in Physics, 7, 418 (1993). [3] Sonnendrucker et al., JCP 149, 201 (1998). [4] Begue et al., JCP 151, 458 (1999).
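    The semi-Lagrangian building block used in such time-splitting Vlasov solvers traces each grid point back along its characteristic and interpolates the old distribution there; a minimal one-dimensional periodic sketch follows (Python/NumPy; linear interpolation is used for brevity where production codes typically use splines, and constant advection stands in for the split Vlasov steps):

        # Minimal sketch of one 1D semi-Lagrangian advection step, the building block
        # of split Vlasov solvers: follow the characteristic backwards and interpolate.
        import numpy as np

        def advect_periodic(f, velocity, dt, dx):
            """Advance f_t + v f_x = 0 on a periodic grid by one step."""
            n = f.size
            x = np.arange(n) * dx
            departure = (x - velocity * dt) % (n * dx)      # foot of the characteristic
            idx = np.floor(departure / dx).astype(int) % n
            w = departure / dx - np.floor(departure / dx)
            return (1.0 - w) * f[idx] + w * f[(idx + 1) % n]  # periodic linear interpolation

        n, dx, dt, v = 128, 1.0 / 128, 0.004, 1.0
        grid = np.arange(n) * dx
        f = np.exp(-200.0 * (grid - 0.5) ** 2)              # initial pulse
        for _ in range(250):                                # advect over one full period
            f = advect_periodic(f, v, dt, dx)
        print(float(np.argmax(f)) * dx)                     # pulse returns near x = 0.5

    Because each step only needs values along one coordinate direction, the 2D problem can be split into row and column sweeps, which is what reduces the parallel data exchange to the row-column transpositions mentioned above.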
  336. Experiences with serial and parallel algorithms for channel routing using simulated annealing

    NASA Technical Reports Server (NTRS)

    Brouwer, Randall Jay

    1988-01-01

    Two algorithms for channel routing using simulated annealing are presented. Simulated annealing is an optimization methodology which allows the solution process to back up out of local minima that may be encountered by inappropriate selections. By properly controlling the annealing process, it is very likely that the optimal solution to an NP-complete problem such as channel routing may be found. The algorithm presented proposes very relaxed restrictions on the types of allowable transformations, including overlapping nets. By freeing that restriction and controlling overlap situations with an appropriate cost function, the algorithm becomes very flexible and can be applied to many extensions of channel routing. The selection of the transformation utilizes a number of heuristics, still retaining the pseudorandom nature of simulated annealing. The algorithm was implemented as a serial program for a workstation, and a parallel program designed for a hypercube computer. The details of the serial implementation are presented, including many of the heuristics used and some of the resulting solutions.
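    The accept/reject rule at the heart of such an annealer is the Metropolis criterion with a decreasing temperature, which is what lets the search climb out of local minima; a minimal sketch on a toy ordering problem (Python; the cost function, move set, and cooling schedule are illustrative and unrelated to the channel-routing formulation of the paper):

        # Minimal sketch of simulated annealing: accept uphill moves with probability
        # exp(-delta/T) so the search can escape local minima while T decreases.
        import math, random

        def anneal(cost, neighbour, state, t0=10.0, cooling=0.995, steps=20000):
            best = current = state
            t = t0
            for _ in range(steps):
                candidate = neighbour(current)
                delta = cost(candidate) - cost(current)
                if delta <= 0 or random.random() < math.exp(-delta / t):
                    current = candidate
                    if cost(current) < cost(best):
                        best = current
                t *= cooling
            return best

        # toy problem: order items to minimise the sum of adjacent differences
        random.seed(4)
        items = list(range(20))
        random.shuffle(items)
        cost = lambda s: sum(abs(a - b) for a, b in zip(s, s[1:]))
        def neighbour(s):
            s = s[:]
            i, j = random.sample(range(len(s)), 2)
            s[i], s[j] = s[j], s[i]
            return s

        print(cost(items), "->", cost(anneal(cost, neighbour, items)))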
  337. Three-dimensional parallel UNIPIC-3D code for simulations of high-power microwave devices

    NASA Astrophysics Data System (ADS)

    Wang, Jianguo; Chen, Zaigao; Wang, Yue; Zhang, Dianhui; Liu, Chunliang; Li, Yongdong; Wang, Hongguang; Qiao, Hailiang; Fu, Meiyan; Yuan, Yuan

    2010-07-01

    This paper introduces a self-developed, three-dimensional parallel fully electromagnetic particle simulation code, UNIPIC-3D. In this code, the electromagnetic fields are updated using the second-order, finite-difference time-domain method, and the particles are moved using the relativistic Newton-Lorentz force equation. The electromagnetic field and particles are coupled through the current term in Maxwell's equations. Two numerical examples are used to verify the algorithms adopted in this code; the numerical results agree well with theoretical ones. This code can be used to simulate high-power microwave (HPM) devices, such as the relativistic backward wave oscillator, coaxial vircator, and magnetically insulated line oscillator. UNIPIC-3D is written in the object-oriented C++ language and can be run on a variety of platforms including WINDOWS, LINUX, and UNIX. Users can use the graphical user interface to create the complex geometric structures of the simulated HPM devices, which can be automatically meshed by the UNIPIC-3D code. This code has a powerful postprocessor which can display the electric field, magnetic field, current, voltage, power, spectrum, momentum of particles, etc. For the sake of comparison, the results computed by using the two-and-a-half-dimensional UNIPIC code are also provided for the same parameters of the HPM devices; the numerical results computed from these two codes agree well with each other.

  338. Parallel Adjective High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics

    NASA Technical Reports Server (NTRS)

    Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak

    2016-01-01

    This paper presents large-scale MPI-parallel computational fluid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta and spatially fifth-order accurate WENO-5Z scheme was used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion computational cells show excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregular numerical cost associated with blocks containing boundaries. Limits to scaling beyond 32k cores are identified, and targeted code optimizations are discussed.
It has been applied to several linear and nonlinear problems and, very recently, to a <span class="hlt">simulation</span> of fully-developed, two-dimensional drift wave turbulence. The mere fact that parareal works in such a turbulent regime is in itself somewhat unexpected, due to the characteristic sensitivity of turbulence to any change in initial conditions. This fundamental property of any turbulent system should render the iterative correction procedure characteristic of the parareal method inoperative, but this seems not to be the case. In addition, the choices that must be made to implement parareal (division of the temporal domain, election of the coarse solver and so on) are currently made using trial-and-error approaches. Here, we identify the mechanisms responsible for the convergence of parareal of these <span class="hlt">simulations</span> of drift wave turbulence. We also investigate which conditions these mechanisms impose on any successful parareal implementation. The results reported here should be useful to guide future implementations of parareal within the much wider context of fully-developed fluid and plasma turbulent <span class="hlt">simulations</span>.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2014APS..MARM27008W','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2014APS..MARM27008W"><span id="translatedtitle">Large-scale massively <span class="hlt">parallel</span> atomistic <span class="hlt">simulations</span> of short pulse laser interaction with metals</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Wu, Chengping; Zhigilei, Leonid; Computational Materials Group Team</p> <p>2014-03-01</p> <p>Taking advantage of petascale supercomputing architectures, large-scale massively <span class="hlt">parallel</span> atomistic <span class="hlt">simulations</span> (108-109 atoms) are performed to study the microscopic mechanisms of short pulse laser interaction with metals. The results of the <span class="hlt">simulations</span> reveal a complex picture of highly non-equilibrium processes responsible for material modification and/or ejection. At low laser fluences below the ablation threshold, fast melting and resolidification occur under conditions of extreme heating and cooling rates resulting in surface microstructure modification. At higher laser fluences in the spallation regime, the material is ejected by the relaxation of laser-induced stresses and proceeds through the nucleation, growth and percolation of multiple voids in the sub-surface region of the irradiated target. At a fluence of ~ 2.5 times the spallation threshold, the top part of the target reaches the conditions for an explosive decomposition into vapor and small droplets, marking the transition to the phase explosion regime of laser ablation. The dynamics of plume formation and the characteristics of the ablation plume are obtained from the <span class="hlt">simulations</span> and compared with the results of time-resolved plume imaging experiments. Financial support for this work was provided by NSF (DMR-0907247 and CMMI-1301298) and AFOSR (FA9550-10-1-0541). 
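
    The parareal predictor-corrector structure whose convergence is analyzed above is compact enough to sketch. The fragment below (a toy linear ODE with forward-Euler coarse and fine propagators, chosen purely for illustration and unrelated to the drift-wave solver of the paper) shows the update u_{n+1}^{k+1} = G(u_n^{k+1}) + F(u_n^k) - G(u_n^k); the loop over slices that applies F is the part that would be distributed across processors.

        import numpy as np

        f = lambda u: -u                  # toy right-hand side: du/dt = -u

        def coarse(u, t0, t1):
            """Cheap propagator G: a single forward-Euler step over [t0, t1]."""
            return u + (t1 - t0) * f(u)

        def fine(u, t0, t1, m=50):
            """Expensive propagator F: m small Euler steps over [t0, t1]."""
            dt = (t1 - t0) / m
            for _ in range(m):
                u = u + dt * f(u)
            return u

        T, N, K = 5.0, 20, 5              # horizon, time slices, parareal iterations
        t = np.linspace(0.0, T, N + 1)
        u = np.zeros(N + 1)
        u[0] = 1.0

        for n in range(N):                # initial guess: coarse sweep only
            u[n + 1] = coarse(u[n], t[n], t[n + 1])

        for k in range(K):
            # fine propagation of every slice; in a real code this loop runs in parallel
            F = [fine(u[n], t[n], t[n + 1]) for n in range(N)]
            G_old = [coarse(u[n], t[n], t[n + 1]) for n in range(N)]
            u_new = u.copy()
            for n in range(N):            # sequential correction sweep
                u_new[n + 1] = coarse(u_new[n], t[n], t[n + 1]) + F[n] - G_old[n]
            u = u_new

        print(abs(u[-1] - np.exp(-T)))    # error vs. the exact solution e^{-T}
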
Large-scale massively parallel atomistic simulations of short pulse laser interaction with metals

    NASA Astrophysics Data System (ADS)

    Wu, Chengping; Zhigilei, Leonid; Computational Materials Group Team

    2014-03-01

    Taking advantage of petascale supercomputing architectures, large-scale massively parallel atomistic simulations (10^8-10^9 atoms) are performed to study the microscopic mechanisms of short pulse laser interaction with metals. The results of the simulations reveal a complex picture of highly non-equilibrium processes responsible for material modification and/or ejection. At low laser fluences below the ablation threshold, fast melting and resolidification occur under conditions of extreme heating and cooling rates, resulting in surface microstructure modification. At higher laser fluences in the spallation regime, the material is ejected by the relaxation of laser-induced stresses and proceeds through the nucleation, growth and percolation of multiple voids in the sub-surface region of the irradiated target. At a fluence of ~2.5 times the spallation threshold, the top part of the target reaches the conditions for an explosive decomposition into vapor and small droplets, marking the transition to the phase explosion regime of laser ablation. The dynamics of plume formation and the characteristics of the ablation plume are obtained from the simulations and compared with the results of time-resolved plume imaging experiments. Financial support for this work was provided by NSF (DMR-0907247 and CMMI-1301298) and AFOSR (FA9550-10-1-0541). Computational support was provided by the OLCF (MAT048) and XSEDE (TG-DMR110090).

A 3D MPI-Parallel GPU-accelerated framework for simulating ocean wave energy converters

    NASA Astrophysics Data System (ADS)

    Pathak, Ashish; Raessi, Mehdi

    2015-11-01

    We present an MPI-parallel GPU-accelerated computational framework for studying the interaction between ocean waves and wave energy converters (WECs). The computational framework captures the viscous effects, nonlinear fluid-structure interaction (FSI), and breaking of waves around the structure, which cannot be captured in many potential flow solvers commonly used for WEC simulations. The full Navier-Stokes equations are solved using the two-step projection method, which is accelerated by porting the pressure Poisson equation to GPUs. The FSI is captured using the numerically stable fictitious domain method. A novel three-phase interface reconstruction algorithm is used to resolve three phases in a VOF-PLIC context. A consistent mass and momentum transport approach enables simulations at high density ratios. The accuracy of the overall framework is demonstrated via an array of test cases. Numerical simulations of the interaction between ocean waves and WECs are presented.
    Funding from the National Science Foundation CBET-1236462 grant is gratefully acknowledged.

The role of the electron convection term for the parallel electric field and electron acceleration in MHD simulations

    SciTech Connect

    Matsuda, K.; Terada, N.; Katoh, Y.; Misawa, H.

    2011-08-15

    There has been a great concern about the origin of the parallel electric field in the frame of fluid equations in the auroral acceleration region. This paper proposes a new method to simulate magnetohydrodynamic (MHD) equations that include the electron convection term and shows its efficiency with simulation results in one dimension. We apply a third-order semi-discrete central scheme to investigate the characteristics of the electron convection term, including its nonlinearity. At a steady state discontinuity, the sum of the ion and electron convection terms balances with the ion pressure gradient. We find that the electron convection term works like the gradient of the negative pressure and reduces the ion sound speed or amplifies the sound mode when parallel current flows. The electron convection term enables us to describe a situation in which a parallel electric field and parallel electron acceleration coexist, which is impossible for ideal or resistive MHD.
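
    For orientation, the electron convection term of this abstract can be placed in the parallel component of the generalized Ohm's law obtained from the electron momentum equation; one common way of writing it (our notation, not necessarily the authors') is

        E_\parallel = -\frac{1}{e n_e}\,\nabla_\parallel p_e
                      \;-\; \frac{m_e}{e}\Big(\frac{\partial}{\partial t}
                      + \mathbf{u}_e \cdot \nabla\Big)\, u_{e\parallel},

    where the second, bracketed term is the electron inertia/convection contribution that ideal and resistive MHD discard, which is why those models cannot sustain a parallel electric field together with parallel electron acceleration.
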
Parallel two-level domain decomposition based Jacobi-Davidson algorithms for pyramidal quantum dot simulation

    NASA Astrophysics Data System (ADS)

    Zhao, Tao; Hwang, Feng-Nan; Cai, Xiao-Chuan

    2016-07-01

    We consider a quintic polynomial eigenvalue problem arising from the finite volume discretization of a quantum dot simulation problem. The problem is solved by the Jacobi-Davidson (JD) algorithm. Our focus is on how to achieve the quadratic convergence of JD in a way that is not only efficient but also scalable when the number of processor cores is large. For this purpose, we develop a projected two-level Schwarz preconditioned JD algorithm that exploits multilevel domain decomposition techniques. The pyramidal quantum dot calculation is carefully studied to illustrate the efficiency of the proposed method. Numerical experiments confirm that the proposed method has good scalability for problems with hundreds of millions of unknowns on a parallel computer with more than 10,000 processor cores.

A three-phase series-parallel resonant converter -- analysis, design, simulation, and experimental results

    SciTech Connect

    Bhat, A.K.S.; Zheng, R.L.

    1996-07-01

    A three-phase dc-to-dc series-parallel resonant converter is proposed and its operating modes for a 180° wide gating pulse scheme are explained. A detailed analysis of the converter using a constant current model and the Fourier series approach is presented. Based on the analysis, design curves are obtained and a design example of a 1-kW converter is given. SPICE simulation results for the designed converter and experimental results for a 500-W converter are presented to verify the performance of the proposed converter for varying load conditions. The converter operates in lagging power factor (PF) mode for the entire load range and requires only a narrow variation in switching frequency to adequately regulate the output power.

Simulation of the Quasi-Monoenergetic Protons Generation by Parallel Laser Pulses Interaction with Foils

    NASA Astrophysics Data System (ADS)

    Wang, Wei-Quan; Yin, Yan; Zou, De-Bin; Yu, Tong-Pu; Yang, Xiao-Hu; Xu, Han; Yu, Ming-Yang; Ma, Yan-Yun; Zhuo, Hong-Bin; Shao, Fu-Qiu

    2014-11-01

    A new scheme of radiation pressure acceleration for generating high-quality protons by using two overlapping parallel laser pulses is proposed. Particle-in-cell simulation shows that the overlapping of two pulses with identical Gaussian profiles in space and trapezoidal profiles in the time domain can result in a composite light pulse with a spatial profile suitable for stable acceleration of protons to high energies. At ~2.46 × 10^21 W/cm² intensity of the combined light pulse, a quasi-monoenergetic proton beam with peak energy ~200 MeV/nucleon, energy spread <15%, and divergence angle <4° is obtained, which is appropriate for tumor therapy.
    The proton beam quality can be controlled by adjusting the incidence points of the two laser pulses.

Hybrid parallel strategy for the simulation of fast transient accidental situations at reactor scale

    NASA Astrophysics Data System (ADS)

    Faucher, V.; Galon, P.; Beccantini, A.; Crouzet, F.; Debaud, F.; Gautier, T.

    2014-06-01

    This contribution is dedicated to the latest methodological developments implemented in the fast transient dynamics software EUROPLEXUS (EPX) to simulate the mechanical response of fully coupled fluid-structure systems to accidental situations to be considered at reactor scale, among which the Loss of Coolant Accident, the Core Disruptive Accident and the Hydrogen Explosion. Time integration is explicit, and the search for reference solutions within the safety framework prevents any simplification and approximations in the coupled algorithm: for instance, all kinematic constraints are dealt with using Lagrange Multipliers, yielding a complex flow chart when non-permanent constraints such as unilateral contact or immersed fluid-structure boundaries are considered. The parallel acceleration of the solution process is then achieved through a hybrid approach, based on a weighted domain decomposition for distributed memory computing and the use of the KAAPI library for self-balanced shared memory processing inside subdomains.

Non-equilibrium molecular dynamics simulation of nanojet injection with adaptive-spatial decomposition parallel algorithm

    PubMed

    Shin, Hyun-Ho; Yoon, Woong-Sup

    2008-07-01

    An Adaptive-Spatial Decomposition parallel algorithm was developed to increase computation efficiency for molecular dynamics simulations of nano-fluids. Injection of a liquid argon jet with a scale of 17.6 molecular diameters was investigated. A solid annular platinum injector was also solved simultaneously with the liquid injectant by adopting a solid modeling technique which incorporates phantom atoms. The viscous heat was naturally discharged through the solids, so the liquid boiling problem was avoided with no separate use of temperature controlling methods. Parametric investigations of injection speed, wall temperature, and injector length were made. A sudden pressure drop at the orifice exit causes flash boiling of the liquid departing the nozzle exit with strong evaporation on the surface of the liquid, while rendering a slender jet. The elevation of the injection speed and the wall temperature causes an activation of the surface evaporation concurrent with reduction in the jet breakup length and the drop size.
    PMID:19051924

Acceleration of the matrix multiplication of Radiance three phase daylighting simulations with parallel computing on heterogeneous hardware of personal computer

    SciTech Connect

    Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor S.

    2013-05-23

    Building designers are increasingly relying on complex fenestration systems to reduce energy consumed for lighting and HVAC in low energy buildings. Radiance, a lighting simulation program, has been used to conduct daylighting simulations for complex fenestration systems. Depending on the configuration, the simulation can take hours or even days using a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight simulation by conducting parallel computing on heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in parallel using OpenCL. The speed of the new approach was evaluated using various daylighting simulation cases on a multicore central processing unit and a graphics processing unit. Based on the measurements and analysis of the time usage for the Radiance daylighting simulation, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.
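
    The matrix-multiplication step referred to above has the standard three-phase structure in which the illuminance time series is a chain of precomputed matrices applied to sky vectors. A minimal NumPy sketch (random stand-in matrices and sizes; the names V, T, D, S follow common three-phase notation and are not taken from the paper) is:

        import numpy as np

        # stand-in dimensions: sensor points, window/BSDF patches, sky patches, timesteps
        n_sens, n_win, n_sky, n_hours = 1000, 145, 145, 8760

        V = np.random.rand(n_sens, n_win)   # view matrix: window patches -> sensors
        T = np.random.rand(n_win, n_win)    # BSDF transmission matrix of the fenestration
        D = np.random.rand(n_win, n_sky)    # daylight matrix: sky patches -> window patches
        S = np.random.rand(n_sky, n_hours)  # sky vectors, one column per timestep

        # illuminance for all timesteps at once; grouping the small products first keeps
        # the arithmetic low, and this dense chain is the part worth offloading to a GPU.
        E = V @ (T @ (D @ S))
        print(E.shape)                      # (n_sens, n_hours)
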
Macro-scale phenomena of arterial coupled cells: a massively parallel simulation

    PubMed Central

    Shaikh, Mohsin Ahmed; Wall, David J. N.; David, Tim

    2012-01-01

    Impaired mass transfer characteristics of blood-borne vasoactive species such as adenosine triphosphate in regions such as an arterial bifurcation have been hypothesized as a prospective mechanism in the aetiology of atherosclerotic lesions. Arterial endothelial cells (ECs) and smooth muscle cells (SMCs) respond differentially to altered local haemodynamics and produce coordinated macro-scale responses via intercellular communication. Using a computationally designed arterial segment comprising large populations of mathematically modelled coupled ECs and SMCs, we investigate their response to spatial gradients of blood-borne agonist concentrations and the effect of micro-scale-driven perturbation on the macro-scale. Altering homocellular (between same cell type) and heterocellular (between different cell types) intercellular coupling, we simulated four cases of normal and pathological arterial segments experiencing an identical gradient in the concentration of the agonist. Results show that the heterocellular calcium (Ca2+) coupling between ECs and SMCs is important in eliciting a rapid response when the vessel segment is stimulated by the agonist gradient. In the absence of heterocellular coupling, homocellular Ca2+ coupling between SMCs is necessary for propagation of Ca2+ waves from downstream to upstream cells axially. Desynchronized intracellular Ca2+ oscillations in coupled SMCs are mandatory for this propagation. Upon decoupling the heterocellular membrane potential, the arterial segment loses the inhibitory effect of ECs on the Ca2+ dynamics of the underlying SMCs. The full system comprises hundreds of thousands of coupled nonlinear ordinary differential equations simulated on the massively parallel Blue Gene architecture. The use of massively parallel computational architectures shows the capability of this approach to address macro-scale phenomena driven by elementary micro-scale components of the system. PMID:21920960

Implementation of a parallel algorithm for thermo-chemical nonequilibrium flow simulations

    SciTech Connect

    Wong, C.C.; Blottner, F.G.; Payne, J.L.; Soetrisno, M.

    1995-01-01

    Massively parallel (MP) computing is considered to be the future direction of high performance computing. When engineers apply this new MP computing technology to solve large-scale problems, one major interest is the maximum problem size that an MP computer can handle. To determine the maximum size, it is important to address the code scalability issue. Scalability implies whether the code can provide an increase in performance proportional to an increase in problem size. If the size of the problem increases, by utilizing more computer nodes, the ideal elapsed time to simulate a problem should not increase much. Hence one important task in the development of MP computing technology is to ensure scalability. A scalable code is an efficient code. In order to obtain good scaled performance, it is necessary to first have the code optimized for single-node performance before proceeding to a large-scale simulation with a large number of computer nodes. This paper will discuss the implementation of a massively parallel computing strategy and the process of optimization to improve the scaled performance. Specifically, we will look at domain decomposition, resource management in the code, communication overhead, and problem mapping. By incorporating these improvements and adopting an efficient MP computing strategy, an efficiency of about 85% and 96%, respectively, has been achieved using 64 nodes on MP computers for both perfect gas and chemically reactive gas problems. A comparison of the performance between MP computers and a vectorized computer, such as the Cray-YMP, will also be presented.
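
    The domain-decomposition strategy and communication overhead discussed above come down to a simple pattern: each node updates its own block and exchanges ghost (halo) cells with its neighbours every step. A minimal 1-D sketch with mpi4py (illustrative stencil and sizes, not the actual flow solver) is:

        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        nloc = 128                          # interior cells owned by this rank
        u = np.zeros(nloc + 2)              # plus one ghost cell at each end
        u[1:-1] = rank                      # dummy initial data

        left = rank - 1 if rank > 0 else MPI.PROC_NULL
        right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

        for step in range(10):
            # halo exchange: send edge cells, receive the neighbours' edges
            comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
            comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
            # purely local update (diffusion-like stencil as a stand-in)
            u[1:-1] += 0.1 * (u[:-2] - 2.0 * u[1:-1] + u[2:])

    Scalability in the sense used by the abstract then hinges on keeping the cost of the two Sendrecv calls (surface work) small relative to the local update (volume work) as the per-node block size is held fixed.
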
Massively-parallel FDTD simulations to address mask electromagnetic effects in hyper-NA immersion lithography

    NASA Astrophysics Data System (ADS)

    Tirapu Azpiroz, Jaione; Burr, Geoffrey W.; Rosenbluth, Alan E.; Hibbs, Michael

    2008-03-01

    In the hyper-NA immersion lithography regime, the electromagnetic response of the reticle is known to deviate in a complicated manner from the idealized thin-mask-like behavior. Already, this is driving certain RET choices, such as the use of polarized illumination and the customization of reticle film stacks. Unfortunately, full 3-D electromagnetic mask simulations are computationally intensive. And while OPC-compatible mask electromagnetic field (EMF) models can offer a reasonable tradeoff between speed and accuracy for full-chip OPC applications, full understanding of these complex physical effects demands higher accuracy. Our paper describes recent advances in leveraging High Performance Computing as a critical step towards lithographic modeling of the full manufacturing process. In this paper, highly accurate full 3-D electromagnetic simulations of very large mask layouts are conducted in parallel with reasonable turnaround time, using a BlueGene/L supercomputer and a Finite-Difference Time-Domain (FDTD) code developed internally within IBM. A 3-D simulation of a large 2-D layout spanning 5μm×5μm at the wafer plane (and thus 20μm×20μm×0.5μm at the mask) results in a simulation with roughly 12.5GB of memory (grid size of 10nm at the mask, single-precision computation, about 30 bytes/grid point). FDTD is flexible and easily parallelizable to enable full simulations of such large layouts in approximately an hour using one BlueGene/L "midplane" containing 512 dual-processor nodes with 256MB of memory per processor. Our scaling studies on BlueGene/L demonstrate that simulations up to 100μm × 100μm at the mask can be computed in a few hours. Finally, we will show that the use of a subcell technique permits accurate simulation of features smaller than the grid discretization, thus improving on the tradeoff between computational complexity and simulation accuracy. We demonstrate the correlation of the real and quadrature components that comprise the
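
    At the core of any FDTD solver like the one described above is the leap-frog Yee update of interleaved E and H fields. A minimal 1-D, normalized-units sketch in NumPy (hypothetical grid sizes and source, nothing like the production BlueGene/L code) is:

        import numpy as np

        nx, nt = 400, 1000
        ez = np.zeros(nx)          # electric field at integer grid points
        hy = np.zeros(nx - 1)      # magnetic field staggered half a cell
        c = 0.5                    # Courant number (c0*dt/dx), must be <= 1 for stability

        for n in range(nt):
            hy += c * np.diff(ez)                           # update H from curl of E
            ez[1:-1] += c * np.diff(hy)                     # update E from curl of H
            ez[nx // 2] += np.exp(-((n - 30) / 10.0) ** 2)  # soft Gaussian source

    The same pair of nearest-neighbour updates is what makes FDTD "flexible and easily parallelizable": a 3-D domain can be cut into blocks that only exchange one layer of boundary values per time step.
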
A parallel 3-D staggered grid pseudospectral time domain method for ground-penetrating radar wave simulation

    NASA Astrophysics Data System (ADS)

    Huang, Qinghua; Li, Zhanhui; Wang, Yanbin

    2010-12-01

    We presented a parallel 3-D staggered grid pseudospectral time domain (PSTD) method for simulating ground-penetrating radar (GPR) wave propagation. We took the staggered grid method to weaken the global effect in PSTD and developed a modified fast Fourier transform (FFT) spatial derivative operator to eliminate the wraparound effect due to the implicit periodic boundary condition in the FFT operator. After the above improvements, we achieved the parallel PSTD computation based on an overlap domain decomposition method without any absorbing condition for each subdomain, which can significantly reduce the required grids in each overlap subdomain compared with other proposed algorithms. We tested our parallel technique for some numerical models and obtained consistent results with the analytical ones and/or those of the nonparallel PSTD method. The above numerical tests showed that our parallel PSTD algorithm is effective in simulating 3-D GPR wave propagation, with the merits of saving computation time as well as more flexibility in dealing with complicated models without losing accuracy. The application of our parallel PSTD method in applied geophysics and paleoseismology based on GPR data confirmed the efficiency of our algorithm and its potential applications in various subdisciplines of solid earth geophysics. This study would also provide a useful parallel PSTD approach to the simulation of other geophysical problems on distributed memory PC clusters.

Implementation and efficiency analysis of parallel computation using OpenACC: a case study using flow field simulations

    NASA Astrophysics Data System (ADS)

    Zhang, Shanghong; Yuan, Rui; Wu, Yu; Yi, Yujun

    2016-01-01

    The Open Accelerator (OpenACC) application programming interface is a relatively new parallel computing standard. In this paper, particle-based flow field simulations are examined as a case study of OpenACC parallel computation. The parallel conversion process of the OpenACC standard is explained, and further, the performance of the flow field parallel model is analysed using different directive configurations and grid schemes. With careful implementation and optimisation of the data transportation in the parallel algorithm, a speedup factor of 18.26× is possible. In contrast, a speedup factor of just 11.77× was achieved with the conventional Open Multi-Processing (OpenMP) parallel mode on a 20-kernel computer. These results demonstrate that optimised feature settings greatly influence the degree of speedup, and models involving larger numbers of calculations exhibit greater efficiency and higher speedup factors. In addition, the OpenACC parallel mode is found to have good portability, making it easy to implement parallel computation from the original serial model.
Parallel implementation of 3D FFT with volumetric decomposition schemes for efficient molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Jung, Jaewoon; Kobayashi, Chigusa; Imamura, Toshiyuki; Sugita, Yuji

    2016-03-01

    Three-dimensional Fast Fourier Transform (3D FFT) plays an important role in a wide variety of computer simulations and data analyses, including molecular dynamics (MD) simulations. In this study, we develop hybrid (MPI+OpenMP) parallelization schemes of 3D FFT based on two new volumetric decompositions, mainly for the particle mesh Ewald (PME) calculation in MD simulations. In one scheme (1d_Alltoall), five all-to-all communications in one dimension are carried out, and in the other (2d_Alltoall), one two-dimensional all-to-all communication is combined with two all-to-all communications in one dimension. 2d_Alltoall is similar to the conventional volumetric decomposition scheme. We performed benchmark tests of 3D FFT for systems with different grid sizes using a large number of processors on the K computer in RIKEN AICS. The two schemes show comparable performance, and are better than existing 3D FFTs. The performance of 1d_Alltoall and 2d_Alltoall depends on the supercomputer network system and the number of processors in each dimension. There is enough leeway for users to optimize performance for their conditions. In the PME method, short-range real-space interactions as well as long-range reciprocal-space interactions are calculated. Our volumetric decomposition schemes are particularly useful when used in conjunction with the recently developed midpoint cell method for short-range interactions, due to the same decompositions of real and reciprocal spaces. The 1d_Alltoall scheme of 3D FFT takes 4.7 ms to simulate one MD cycle for a virus system containing more than 1 million atoms using 32,768 cores on the K computer.
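
    For comparison with the volumetric (1d_Alltoall/2d_Alltoall) schemes benchmarked above, the simplest distributed 3D FFT is a slab decomposition with a single all-to-all transpose. The mpi4py/NumPy sketch below (illustrative only; it assumes the grid size is divisible by the number of ranks and is not the scheme of the paper) shows where the all-to-all communication enters:

        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        P, rank = comm.Get_size(), comm.Get_rank()

        N = 64                              # global grid size, assume N % P == 0
        nxl = N // P                        # local slab thickness along x
        a = np.random.rand(nxl, N, N) + 0j  # this rank's slab of the N^3 array

        # 1) FFT along the two locally complete axes (y and z)
        a = np.fft.fftn(a, axes=(1, 2))

        # 2) all-to-all transpose: make x local and distribute y instead
        send = np.ascontiguousarray(a.reshape(nxl, P, N // P, N).transpose(1, 0, 2, 3))
        recv = np.empty_like(send)
        comm.Alltoall(send, recv)
        a = recv.reshape(N, N // P, N)      # now (x global, y local, z)

        # 3) FFT along the now-local x axis completes the 3-D transform
        a = np.fft.fft(a, axis=0)

    The volumetric decompositions in the paper replace this one large transpose with several smaller all-to-all exchanges over one- or two-dimensional processor subgroups, which is what allows scaling beyond the slab limit of at most N ranks.
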
Parallel kinetic Monte Carlo simulation framework incorporating accurate models of adsorbate lateral interactions

    SciTech Connect

    Nielsen, Jens; D'Avezac, Mayeul; Hetherington, James; Stamatakis, Michail

    2013-12-14

    Ab initio kinetic Monte Carlo (KMC) simulations have been successfully applied for over two decades to elucidate the underlying physico-chemical phenomena on the surfaces of heterogeneous catalysts. These simulations necessitate detailed knowledge of the kinetics of elementary reactions constituting the reaction mechanism, and the energetics of the species participating in the chemistry. The information about the energetics is encoded in the formation energies of gas and surface-bound species, and the lateral interactions between adsorbates on the catalytic surface, which can be modeled at different levels of detail. The majority of previous works accounted for only pairwise-additive first nearest-neighbor interactions. More recently, cluster-expansion Hamiltonians incorporating long-range interactions and many-body terms have been used for detailed estimations of catalytic rate [C. Wu, D. J. Schmidt, C. Wolverton, and W. F. Schneider, J. Catal. 286, 88 (2012)]. In view of the increasing interest in accurate predictions of catalytic performance, there is a need for general-purpose KMC approaches incorporating detailed cluster expansion models for the adlayer energetics. We have addressed this need by building on the previously introduced graph-theoretical KMC framework, and we have developed Zacros, a FORTRAN2003 KMC package for simulating catalytic chemistries. To tackle the high computational cost in the presence of long-range interactions we introduce parallelization with OpenMP. We further benchmark our framework by simulating a KMC analogue of the NO oxidation system established by Schneider and co-workers [J. Catal. 286, 88 (2012)]. We show that taking into account only first nearest-neighbor interactions may lead to large errors in the prediction of the catalytic rate, whereas for accurate estimates thereof, one needs to include long-range terms in the cluster expansion.

Progress on H5Part: A Portable High Performance Parallel Data Interface for Electromagnetics Simulations

    SciTech Connect

    Adelmann, Andreas; Gsell, Achim; Oswald, Benedikt; Schietinger, Thomas; Bethel, Wes; Shalf, John; Siegerist, Cristina; Stockinger, Kurt

    2007-06-22

    Significant problems facing all experimental and computational sciences arise from growing data size and complexity. Common to all these problems is the need to perform efficient data I/O on diverse computer architectures. In our scientific application, the largest parallel particle simulations generate vast quantities of six-dimensional data. Such a simulation run produces data with an aggregate size of up to several TB per run. Motivated by the need to address data I/O and access challenges, we have implemented H5Part, an open source data I/O API that simplifies the use of the Hierarchical Data Format v5 library (HDF5). HDF5 is an industry standard for high performance, cross-platform data storage and retrieval that runs on all contemporary architectures from large parallel supercomputers to laptops. H5Part, which is oriented to the needs of the particle physics and cosmology communities, provides support for parallel storage and retrieval of particles, structured and, in the future, unstructured meshes. In this paper, we describe recent work focusing on I/O support for particles and structured meshes and provide data showing performance on modern supercomputer architectures like the IBM POWER 5.
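
    H5Part is a thin particle-oriented layer over HDF5, and the collective write pattern it provides can be approximated with h5py's MPI driver. The sketch below is a stand-in, not the H5Part API itself; it assumes h5py built against a parallel (MPI-enabled) HDF5, and the "Step#0" group name simply mimics the per-timestep grouping used for particle data.

        from mpi4py import MPI
        import h5py
        import numpy as np

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        n_local = 100_000                       # particles owned by this rank
        x = np.random.rand(n_local)             # stand-in particle coordinate

        # collective file/dataset creation, independent writes of disjoint slices
        with h5py.File("particles.h5", "w", driver="mpio", comm=comm) as f:
            step = f.create_group("Step#0")
            dset = step.create_dataset("x", shape=(size * n_local,), dtype="f8")
            dset[rank * n_local:(rank + 1) * n_local] = x
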
Radiation hydrodynamics using characteristics on adaptive decomposed domains for massively parallel star formation simulations

    NASA Astrophysics Data System (ADS)

    Buntemeyer, Lars; Banerjee, Robi; Peters, Thomas; Klassen, Mikhail; Pudritz, Ralph E.

    2016-02-01

    We present an algorithm for solving the radiative transfer problem on massively parallel computers using adaptive mesh refinement and domain decomposition. The solver is based on the method of characteristics, which requires an adaptive raytracer that integrates the equation of radiative transfer. The radiation field is split into local and global components which are handled separately to overcome the non-locality problem. The solver is implemented in the framework of the magneto-hydrodynamics code FLASH and is coupled by an operator splitting step. The goal is the study of radiation in the context of star formation simulations with a focus on early disc formation and evolution. This requires a proper treatment of radiation physics that covers both the optically thin as well as the optically thick regimes and the transition region in particular. We successfully show the accuracy and feasibility of our method in a series of standard radiative transfer problems and two 3D collapse simulations resembling the early stages of protostar and disc formation.

Monte Carlo Simulations of Nonlinear Particle Acceleration in Parallel Trans-relativistic Shocks

    NASA Astrophysics Data System (ADS)

    Ellison, Donald C.; Warren, Donald C.; Bykov, Andrei M.

    2013-10-01

    We present results from a Monte Carlo simulation of a parallel collisionless shock undergoing particle acceleration. Our simulation, which contains parameterized scattering and a particular thermal leakage injection model, calculates the feedback between accelerated particles ahead of the shock, which influence the shock precursor and "smooth" the shock, and thermal particle injection. We show that there is a transition between nonrelativistic shocks, where the acceleration efficiency can be extremely high and the nonlinear compression ratio can be substantially greater than the Rankine-Hugoniot value, and fully relativistic shocks, where diffusive shock acceleration is less efficient and the compression ratio remains at the Rankine-Hugoniot value. This transition occurs in the trans-relativistic regime and, for the particular parameters we use, occurs around a shock Lorentz factor γ0 = 1.5. We also find that nonlinear shock smoothing dramatically reduces the acceleration efficiency presumed to occur with large-angle scattering in ultra-relativistic shocks. Our ability to seamlessly treat the transition from ultra-relativistic to trans-relativistic to nonrelativistic shocks may be important for evolving relativistic systems, such as gamma-ray bursts and Type Ibc supernovae. We expect a substantial evolution of shock accelerated spectra during this transition from soft early on to much harder when the blast-wave shock becomes nonrelativistic.
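
    As context for the departures described above, the standard test-particle result for diffusive shock acceleration at a parallel, nonrelativistic shock ties the spectrum to the compression ratio r alone:

        f(p) \propto p^{-q}, \qquad q = \frac{3r}{r - 1},

    so a strong unmodified shock with r = 4 gives q = 4 (equivalently N(E) ∝ E^{-2}). In the usual nonlinear picture, compression ratios above the Rankine-Hugoniot value harden the spectrum at high energies, while the smoothed subshock softens it at low energies, which is the feedback the Monte Carlo calculation above resolves self-consistently.
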
Giant impacts during planet formation: Parallel tree code simulations using smooth particle hydrodynamics

    NASA Astrophysics Data System (ADS)

    Cohen, Randi L.

    There is both theoretical and observational evidence that giant planets collided with objects ≥ Mearth during their evolution. These impacts may play a key role in giant planet formation. This paper describes impacts of a ~Earth-mass object onto a suite of proto-giant-planets, as simulated using an SPH parallel tree code. We run 6 simulations, varying the impact angle and evolutionary stage of the proto-Jupiter. We find that it is possible for an impactor to free some mass from the core of the proto-planet it impacts through direct collision, as well as to make physical contact with the core yet escape partially, or even completely, intact. None of the 6 cases we consider produced a solid disk or resulted in a net decrease in the core mass of the proto-planet (since the mass decrease due to disruption was outweighed by the increase due to the addition of the impactor's mass to the core). However, we suggest parameters which may have these effects, and thus decrease core mass and formation time in protoplanetary models and/or create satellite systems. We find that giant impacts can remove significant envelope mass from forming giant planets, leaving only 2 MEarth of gas, similar to Uranus and Neptune. They can also create compositional inhomogeneities in planetary cores, which creates differences in planetary thermal emission characteristics.

Giant Impacts During Planet Formation: Parallel Tree Code Simulations Using Smooth Particle Hydrodynamics

    NASA Astrophysics Data System (ADS)

    Cohen, R.; Bodenheimer, P.; Asphaug, E.

    2000-12-01

    There is both theoretical and observational evidence that giant planets collided with objects with mass >= Mearth during their evolution. These impacts may help shorten planetary formation timescales by changing the opacity of the planetary atmosphere to allow quicker cooling. They may also redistribute heavy metals within giant planets, affect the core/envelope mass ratio, and help determine the ratio of emitted to absorbed energy within giant planets. Thus, the researchers propose to simulate the impact of a ~Earth-mass object onto a proto-giant-planet with SPH. Results of the SPH collision models will be input into a steady-state planetary evolution code and the effect of impacts on formation timescales, core/envelope mass ratios, density profiles, and thermal emissions of giant planets will be quantified. The collision will be modelled using a modified version of an SPH routine which simulates the collision of two polytropes. The Saumon-Chabrier and Tillotson equations of state will replace the polytropic equation of state. The parallel tree algorithm of Olson & Packer will be used for the domain decomposition and neighbor search necessary to calculate pressure and self-gravity efficiently. This work is funded by the NASA Graduate Student Researchers Program.

A study of Gd-based parallel plate avalanche counter for thermal neutrons by MC simulation

    NASA Astrophysics Data System (ADS)

    Rhee, J. T.; Kim, H. G.; Ahmad, Farzana; Jeon, Y. J.; Jamil, M.

    2013-12-01

    In this work, we demonstrate the feasibility and characteristics of a single-gap parallel plate avalanche counter (PPAC) as a low energy neutron detector, based on a Gd-converter coating. Upon falling on the Gd-converter surface, the incident low energy neutrons produce internal conversion electrons, which are evaluated and detected. For estimating the performance of the Gd-based PPAC, a simulation study has been performed using the GEANT4 Monte Carlo (MC) code. The detector response as a function of incident neutron energies in the range of 25-100 meV has been evaluated with two different physics lists. Using the QGSP_BIC_HP physics list and assuming 5 μm converter thickness, detection efficiencies of 11.8%, 18.48%, and 30.28% have been achieved for the forward, the backward, and the total response of the converter-based PPAC. On the other hand, considering the same converter thickness and detector configuration, with the QGSP_BERT_HP physics list efficiencies of 12.19%, 18.62%, and 30.81%, respectively, were obtained.
    These simulation results are briefly discussed.

A parallel overset-curvilinear-immersed boundary framework for simulating complex 3D incompressible flows

    PubMed Central

    Borazjani, Iman; Ge, Liang; Le, Trung; Sotiropoulos, Fotis

    2013-01-01

    We develop an overset-curvilinear immersed boundary (overset-CURVIB) method in a general non-inertial frame of reference to simulate a wide range of challenging biological flow problems. The method incorporates overset-curvilinear grids to efficiently handle multi-connected geometries and increase the resolution locally near immersed boundaries. Complex bodies undergoing arbitrarily large deformations may be embedded within the overset-curvilinear background grid and treated as sharp interfaces using the curvilinear immersed boundary (CURVIB) method (Ge and Sotiropoulos, Journal of Computational Physics, 2007). The incompressible flow equations are formulated in a general non-inertial frame of reference to enhance the overall versatility and efficiency of the numerical approach. Efficient search algorithms to identify areas requiring blanking, donor cells, and interpolation coefficients for constructing the boundary conditions at grid interfaces of the overset grid are developed and implemented using efficient parallel computing communication strategies to transfer information among sub-domains. The governing equations are discretized using a second-order accurate finite-volume approach and integrated in time via an efficient fractional-step method. Various strategies for ensuring globally conservative interpolation at grid interfaces suitable for incompressible flow fractional step methods are implemented and evaluated. The method is verified and validated against experimental data, and its capabilities are demonstrated by simulating the flow past multiple aquatic swimmers and the systolic flow in an anatomic left ventricle with a mechanical heart valve implanted in the aortic position. PMID:23833331

Simulated Wake Characteristics Data for Closely Spaced Parallel Runway Operations Analysis

    NASA Technical Reports Server (NTRS)

    Guerreiro, Nelson M.; Neitzke, Kurt W.

    2012-01-01

    A simulation experiment was performed to generate and compile wake characteristics data relevant to the evaluation and feasibility analysis of closely spaced parallel runway (CSPR) operational concepts.
    While the experiment in this work is not tailored to any particular operational concept, the generated data applies to the broader class of CSPR concepts, where a trailing aircraft on a CSPR approach is required to stay ahead of the wake vortices generated by a lead aircraft on an adjacent CSPR. Data for wake age, circulation strength, and wake altitude change, at various lateral offset distances from the wake-generating lead aircraft approach path, were compiled for a set of nine aircraft spanning the full range of FAA and ICAO wake classifications. A total of 54 scenarios were simulated to generate data related to key parameters that determine wake behavior. Of particular interest are wake age characteristics that can be used to evaluate both time- and distance-based in-trail separation concepts for all aircraft wake-class combinations. A simple first-order difference model was developed to enable the computation of wake parameter estimates for aircraft models having weight, wingspan and speed characteristics similar to those of the nine aircraft modeled in this work.

MONTE CARLO SIMULATIONS OF NONLINEAR PARTICLE ACCELERATION IN PARALLEL TRANS-RELATIVISTIC SHOCKS

    SciTech Connect

    Ellison, Donald C.; Warren, Donald C.; Bykov, Andrei M. E-mail: ambykov@yahoo.com

    2013-10-10

    We present results from a Monte Carlo simulation of a parallel collisionless shock undergoing particle acceleration. Our simulation, which contains parameterized scattering and a particular thermal leakage injection model, calculates the feedback between accelerated particles ahead of the shock, which influence the shock precursor and 'smooth' the shock, and thermal particle injection. We show that there is a transition between nonrelativistic shocks, where the acceleration efficiency can be extremely high and the nonlinear compression ratio can be substantially greater than the Rankine-Hugoniot value, and fully relativistic shocks, where diffusive shock acceleration is less efficient and the compression ratio remains at the Rankine-Hugoniot value. This transition occurs in the trans-relativistic regime and, for the particular parameters we use, occurs around a shock Lorentz factor γ0 = 1.5. We also find that nonlinear shock smoothing dramatically reduces the acceleration efficiency presumed to occur with large-angle scattering in ultra-relativistic shocks. Our ability to seamlessly treat the transition from ultra-relativistic to trans-relativistic to nonrelativistic shocks may be important for evolving relativistic systems, such as gamma-ray bursts and Type Ibc supernovae.
    We expect a substantial evolution of shock accelerated spectra during this transition from soft early on to much harder when the blast-wave shock becomes nonrelativistic.

Parallel contact detection algorithm for transient solid dynamics simulations using PRONTO3D

    SciTech Connect

    Attaway, S.W.; Hendrickson, B.A.; Plimpton, S.J.

    1996-09-01

    An efficient, scalable, parallel algorithm for treating material surface contacts in solid mechanics finite element programs has been implemented in a modular way for MIMD parallel computers. The serial contact detection algorithm that was developed previously for the transient dynamics finite element code PRONTO3D has been extended for use in parallel computation by devising a dynamic (adaptive) processor load balancing scheme.

Automated integration of genomic physical mapping data via parallel simulated annealing

    SciTech Connect

    Slezak, T.

    1994-06-01

    The Human Genome Center at the Lawrence Livermore National Laboratory (LLNL) is nearing closure on a high-resolution physical map of human chromosome 19. We have built automated tools to assemble 15,000 fingerprinted cosmid clones into 800 contigs with minimal spanning paths identified. These islands are being ordered, oriented, and spanned by a variety of other techniques including: Fluorescence In Situ Hybridization (FISH) at 3 levels of resolution, ECO restriction fragment mapping across all contigs, and a multitude of different hybridization and PCR techniques to link cosmid, YAC, BAC, PAC, and P1 clones. The FISH data provide us with partial order and distance data as well as orientation. We made the observation that map builders need a much rougher presentation of data than do map readers; the former wish to see raw data since these can expose errors or interesting biology. We further noted that by ignoring our length and distance data we could simplify our problem into one that could be readily attacked with optimization techniques. The data integration problem could then be seen as an M x N ordering of our N cosmid clones which "intersect" M larger objects, by defining "intersection" to mean either contig/map membership or hybridization results. Clearly, the goal of making an integrated map is now to rearrange the N cosmid clone "columns" such that the number of gaps on the object "rows" is minimized. Our FISH partially-ordered cosmid clones provide us with a set of constraints that cannot be violated by the rearrangement process. We solved the optimization problem via simulated annealing performed on a network of 40+ Unix machines in parallel, using a server/client model built on explicit socket calls.
Numerical investigation of parallel airfoil-vortex interaction using large eddy simulation

NASA Astrophysics Data System (ADS)

Felten, Frederic N.

Helicopter Blade-Vortex Interaction (BVI) occurs under certain conditions of powered descent or during extreme maneuvering. The vibration and acoustic problems associated with the interaction of rotor tip vortices and the following blades are major aerodynamic concerns for the helicopter community. Researchers have performed numerous experimental and computational studies over the last two decades in order to gain a better understanding of the physical mechanisms involved in BVI. The most severe interaction, in terms of generated noise, happens when the vortex filament is parallel to the blade, thus affecting a great portion of it. The majority of the previous numerical studies of parallel BVI fall within a potential flow framework, therefore excluding all viscous phenomena. Some Navier-Stokes approaches using dissipative numerical methods in conjunction with RANS-type turbulence models have also been attempted, but with limited success. In this work, the situation is improved by increasing the fidelity of both the numerical method and the turbulence model. A kinetic-energy conserving finite-volume scheme using a collocated-mesh arrangement, specially designed for simulation of turbulence in complex geometries, was implemented. For the turbulence model, a cost-effective zonal hybrid RANS/LES technique is used. A RANS zone covers the boundary layers on the airfoil and the wake region behind, while the remainder of the flow field, including the region occupied by the vortex, makes up the dynamic LES zone. The concentrated tip vortex is not attenuated as it is convected downstream and over a NACA 0012 airfoil. The lift, drag, moment and friction coefficients induced by the passage of the vortex are monitored in time and compared with experimental data.

Direct numerical simulation of instabilities in parallel flow with spherical roughness elements

NASA Technical Reports Server (NTRS)

Deanna, R. G.

1992-01-01

Results from a direct numerical simulation of laminar flow over a flat surface with spherical roughness elements using a spectral-element method are given. The numerical simulation approximates roughness as a cellular pattern of identical spheres protruding from a smooth wall.
Periodic boundary conditions on the domain's horizontal faces simulate an infinite array of roughness elements extending in the streamwise and spanwise directions, which implies the parallel-flow assumption, and results in a closed domain. A body force, designed to yield the horizontal Blasius velocity in the absence of roughness, sustains the flow. Instabilities above a critical Reynolds number reveal negligible oscillations in the recirculation regions behind each sphere and in the free stream, high-amplitude oscillations in the layer directly above the spheres, and a mean profile with an inflection point near the sphere's crest. The inflection point yields an unstable layer above the roughness (where U''(y) is less than 0) and a stable region within the roughness (where U''(y) is greater than 0). Evidently, the instability begins when the low-momentum or wake region behind an element, being the region most affected by disturbances (purely numerical in this case), goes unstable and moves. In compressible flow with periodic boundaries, this motion sends disturbances to all regions of the domain. In the unstable layer just above the inflection point, the disturbances grow while being carried downstream with a propagation speed equal to the local mean velocity; they do not grow amid the low energy region near the roughness patch. The most amplified disturbance eventually arrives at the next roughness element downstream, perturbing its wake and inducing a global response at a frequency governed by the streamwise spacing between spheres and the mean velocity of the most amplified layer.

Three-Dimensional Parallel Lattice Boltzmann Hydrodynamic Simulations of Turbulent Flows in Interstellar Dark Clouds

NASA Astrophysics Data System (ADS)

Muders, Dirk

1995-08-01

Exploring the clumpy and filamentary structure of interstellar molecular clouds is one of the key problems of modern astrophysics. So far, we have little knowledge of the physical processes that cause the structure, but turbulence is suspected to be essential. In this thesis I study turbulent flows and how they contribute to the structure of interstellar dark clouds. To this end, three-dimensional numerical hydrodynamic simulations are needed since the detailed turbulent spatial and velocity structure cannot be analytically calculated. I employ the "Lattice Boltzmann Method", a recently developed numerical method which solves the Boltzmann equation in a discretized phase space. Mesoscopic particle packets move with fixed velocities on a Cartesian lattice and at each time step they exchange mass according to given rules. Because of its mainly local operations the method is well suited for application on parallel or clustered computers. As part of my thesis I have developed a parallelized "Lattice Boltzmann Method" hydrodynamics code. I have improved the numerical stability for Reynolds numbers of up to 10^4.5 and Mach numbers of up to 0.9, and I have extended the method to include a second miscible fluid phase. The code has been used on the three currently most powerful workstations at the Max-Planck-Institut für Radioastronomie in Bonn and on the massively parallel mainframe CM-5 at the Gesellschaft für Mathematik und Datenverarbeitung in St. Augustin. The simulations consist of collimated shear flows and the motion of molecular clumps through an ambient medium. The dependence of the emerging structure on Reynolds and Mach numbers is studied. The main results are (1) that distinct clumps and filaments appear only at the transition between laminar and fully turbulent flow at Reynolds numbers between 500 and 5000 and (2) that subsonic viscous shear flows are capable of producing the dark cloud velocity structure. The unexpectedly low Reynolds numbers can
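The streaming-and-collision cycle sketched in the abstract above can be illustrated in a few lines. This is a generic, hypothetical D2Q9 lattice-Boltzmann BGK update in Python/NumPy, not the thesis code; the relaxation time tau and the nine-velocity lattice are standard textbook choices, and the example omits forcing, boundaries, and the second fluid phase.

```python
import numpy as np

# D2Q9 lattice: discrete velocities c_i and weights w_i
c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, ux, uy):
    """Second-order equilibrium distributions for density rho and velocity (ux, uy)."""
    cu = 3.0 * (c[:, 0, None, None] * ux + c[:, 1, None, None] * uy)
    usq = 1.5 * (ux**2 + uy**2)
    return rho * w[:, None, None] * (1.0 + cu + 0.5 * cu**2 - usq)

def lbm_step(f, tau=0.6):
    """One streaming + BGK collision step on a periodic grid."""
    # streaming: shift each population along its lattice velocity
    for i, (cx, cy) in enumerate(c):
        f[i] = np.roll(np.roll(f[i], cx, axis=0), cy, axis=1)
    # macroscopic moments
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    # BGK relaxation toward local equilibrium
    f += (equilibrium(rho, ux, uy) - f) / tau
    return f

# usage: start from rest on a 64x64 periodic grid and run a few steps
f = equilibrium(np.ones((64, 64)), np.zeros((64, 64)), np.zeros((64, 64)))
for _ in range(10):
    f = lbm_step(f)
```

Because every update touches only a cell and its immediate neighbours, the grid can be partitioned across processors with thin halo exchanges, which is the locality property the abstract credits for the method's parallel efficiency.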
Particle simulation on radio frequency stabilization of flute modes in a tandem mirror. I. Parallel antenna

SciTech Connect

Kadoya, Y.; Abe, H.

1988-04-01

A two- and one-half-dimensional electromagnetic particle code (PS2M) (H. Abe and S. Nakajima, J. Phys. Soc. Jpn. 53, xxx (1987)) is used to study how an electric field applied parallel to the magnetic field affects the radio frequency stabilization of flute modes in a tandem mirror plasma. The parallel electric field E∥ perturbs the electron velocity v∥ parallel to the magnetic field and also induces a perpendicular magnetic field perturbation B⊥. The unstable growth of the flute mode in the absence of such a radio frequency electric field is first studied as a basis for comparison. The ponderomotive force originating from the time-averaged product ⟨v∥B⊥⟩ is then shown to stabilize the flute modes. The stabilizing wave power threshold, the frequency dependency, and the dependence on ∇|E∥| all agree with the theoretical predictions.

Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2

NASA Technical Reports Server (NTRS)

Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad

1995-01-01

The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers is documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined.
The results indicate that the PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided so that computational costs can be estimated and matched against the actual costs as the number of grid points changes. By increasing the number of processors, slower-than-linear speedups are achieved with optimized (machine-dependent library) routines. This slower-than-linear speedup results because the computational cost is dominated by the FFT routine, which yields less than ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 Mflops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32 nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32-node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: Spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.

A Three Dimensional Parallel Time Accurate Turbopump Simulation Procedure Using Overset Grid Systems

NASA Technical Reports Server (NTRS)

Kiris, Cetin; Chan, William; Kwak, Dochan

2001-01-01

The objective of the current effort is to provide a computational framework for design and analysis of the entire fuel supply system of a liquid rocket engine, including high-fidelity unsteady turbopump flow analysis. This capability is needed to support the design of pump sub-systems for advanced space transportation vehicles that are likely to involve liquid propulsion systems. To date, computational tools for design/analysis of turbopump flows are based on relatively lower fidelity methods. An unsteady, three-dimensional viscous flow analysis tool involving stationary and rotational components for the entire turbopump assembly has not been available for real-world engineering applications. The present effort provides developers with information such as transient flow phenomena at start up, and non-uniform inflows, and will eventually impact on system vibration and structures. In the proposed paper, the progress toward the capability of complete simulation of the turbo-pump for a liquid rocket engine is reported. The Space Shuttle Main Engine (SSME) turbo-pump is used as a test case for evaluation of the hybrid MPI/Open-MP and MLP versions of the INS3D code. CAD to solution auto-scripting capability is being developed for turbopump applications.
The relative motion of the grid systems for the rotor-stator interaction was obtained using overset grid techniques. Unsteady computations for the SSME turbo-pump, which contains 114 zones with 34.5 million grid points, are carried out on Origin 3000 systems at NASA Ames Research Center. Results from these time-accurate simulations with moving boundary capability will be presented along with the performance of parallel versions of the code.

A Three-Dimensional Parallel Time-Accurate Turbopump Simulation Procedure Using Overset Grid System

NASA Technical Reports Server (NTRS)

Kiris, Cetin; Chan, William; Kwak, Dochan

2002-01-01

The objective of the current effort is to provide a computational framework for design and analysis of the entire fuel supply system of a liquid rocket engine, including high-fidelity unsteady turbopump flow analysis. This capability is needed to support the design of pump sub-systems for advanced space transportation vehicles that are likely to involve liquid propulsion systems. To date, computational tools for design/analysis of turbopump flows are based on relatively lower fidelity methods. An unsteady, three-dimensional viscous flow analysis tool involving stationary and rotational components for the entire turbopump assembly has not been available for real-world engineering applications. The present effort provides developers with information such as transient flow phenomena at start up, and nonuniform inflows, and will eventually impact on system vibration and structures. In the proposed paper, the progress toward the capability of complete simulation of the turbo-pump for a liquid rocket engine is reported. The Space Shuttle Main Engine (SSME) turbo-pump is used as a test case for evaluation of the hybrid MPI/Open-MP and MLP versions of the INS3D code. CAD to solution auto-scripting capability is being developed for turbopump applications. The relative motion of the grid systems for the rotor-stator interaction was obtained using overset grid techniques. Unsteady computations for the SSME turbo-pump, which contains 114 zones with 34.5 million grid points, are carried out on Origin 3000 systems at NASA Ames Research Center. Results from these time-accurate simulations with moving boundary capability are presented along with the performance of parallel versions of the code.

Mesoscale Simulations of Particulate Flows with Parallel Distributed Lagrange Multiplier Technique

SciTech Connect

Kanarska, Y

2010-03-24

Fluid particulate flows are common phenomena in nature and industry.
Modeling of such flows at micro and macro levels, as well as establishing relationships between these approaches, is needed to understand properties of the particulate matter. We propose a computational technique based on the direct numerical simulation of the particulate flows. The numerical method is based on the distributed Lagrange multiplier technique following the ideas of Glowinski et al. (1999). Each particle is explicitly resolved on an Eulerian grid as a separate domain, using solid volume fractions. The fluid equations are solved through the entire computational domain; however, Lagrange multiplier constraints are applied inside the particle domain such that the fluid within any volume associated with a solid particle moves as an incompressible rigid body. Mutual forces for the fluid-particle interactions are internal to the system. Particles interact with the fluid via fluid dynamic equations, resulting in implicit fluid-rigid-body coupling relations that produce realistic fluid flow around the particles (i.e., no-slip boundary conditions). The particle-particle interactions are implemented using explicit force-displacement interactions for frictional inelastic particles similar to the DEM method of Cundall et al. (1979), with some modifications using a volume of an overlapping region as an input to the contact forces. The method is flexible enough to handle arbitrary particle shapes and size distributions. A parallel implementation of the method is based on the SAMRAI (Structured Adaptive Mesh Refinement Application Infrastructure) library, which allows handling of large amounts of rigid particles and enables local grid refinement. Accuracy and convergence of the presented method have been tested against known solutions for a falling sphere as well as by examining fluid flows through stationary particle beds (periodic and cubic packing). To evaluate code performance and validate particle
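The force-displacement contact model referenced in the abstract above can be sketched compactly. The snippet below is a hypothetical, minimal soft-sphere DEM normal-contact law in Python (a linear spring-dashpot acting on the overlap depth), not the SAMRAI-based implementation; feeding an overlap volume rather than the overlap depth into the spring term, as the abstract describes, would only change the first input to the force.

```python
import numpy as np

def normal_contact_force(x1, r1, v1, x2, r2, v2, k=1.0e4, c=5.0):
    """Linear spring-dashpot contact force on particle 1 from particle 2.

    x*, v* are 3-vectors (position, velocity), r* are radii; k is the contact
    stiffness and c the damping constant. Returns zero when there is no overlap."""
    d = x1 - x2
    dist = np.linalg.norm(d)
    overlap = (r1 + r2) - dist
    if overlap <= 0.0 or dist == 0.0:
        return np.zeros(3)
    n = d / dist                      # unit normal pointing toward particle 1
    vn = np.dot(v1 - v2, n)           # normal component of the relative velocity
    f = k * overlap - c * vn          # elastic repulsion plus inelastic damping
    return max(f, 0.0) * n            # contacts push, they never pull

# usage: two slightly overlapping unit spheres approaching each other
f = normal_contact_force(np.array([0.0, 0.0, 0.0]), 1.0, np.array([0.1, 0.0, 0.0]),
                         np.array([1.9, 0.0, 0.0]), 1.0, np.array([-0.1, 0.0, 0.0]))
```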
Parallel Higher-order Finite Element Method for Accurate Field Computations in Wakefield and PIC Simulations

SciTech Connect

Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Limborg, C.; Ng, C.; Prudencio, E.; Schussman, G.; Uplenchwar, R.; Ko, K.; /SLAC

2009-06-19

Over the past years, SLAC's Advanced Computations Department (ACD), under SciDAC sponsorship, has developed a suite of 3D (2D) parallel higher-order finite element (FE) codes, T3P (T2P) and Pic3P (Pic2P), aimed at accurate, large-scale simulation of wakefields and particle-field interactions in radio-frequency (RF) cavities of complex shape. The codes are built on the FE infrastructure that supports SLAC's frequency domain codes, Omega3P and S3P, to utilize conformal tetrahedral (triangular) meshes, higher-order basis functions and quadratic geometry approximation. For time integration, they adopt an unconditionally stable implicit scheme. Pic3P (Pic2P) extends T3P (T2P) to treat charged-particle dynamics self-consistently using the PIC (particle-in-cell) approach, the first such implementation on a conformal, unstructured grid using Whitney basis functions. Examples from applications to the International Linear Collider (ILC), Positron Electron Project-II (PEP-II), Linac Coherent Light Source (LCLS) and other accelerators will be presented to compare the accuracy and computational efficiency of these codes versus their counterparts using structured grids.

Simulation of hydraulic fracture networks in three dimensions utilizing massively parallel computing platforms

NASA Astrophysics Data System (ADS)

Settgast, R. R.; Johnson, S.; Fu, P.; Walsh, S. D.; Ryerson, F. J.; Antoun, T.

2012-12-01

Hydraulic fracturing has been an enabling technology for commercially stimulating fracture networks for over half of a century. It has become one of the most widespread technologies for engineering subsurface fracture systems. Despite the ubiquity of this technique in the field, understanding and prediction of the hydraulically induced propagation of the fracture network in realistic, heterogeneous reservoirs has been limited. A number of developments in multiscale modeling in recent years have allowed researchers in related fields to tackle the modeling of complex fracture propagation as well as the mechanics of heterogeneous materials. These developments, combined with advances in quantifying solution uncertainties, provide possibilities for the geologic modeling community to capture both the fracturing behavior and longer-term permeability evolution of rock masses under hydraulic loading across both dynamic and viscosity-dominated regimes. Here we will demonstrate the first phase of this effort through illustrations of fully three-dimensional, tightly coupled hydromechanical simulations of hydraulically induced fracture network propagation run on massively parallel computing scales, and discuss preliminary results regarding the mechanisms by which fracture interactions and the accompanying changes to the stress field can lead to deleterious or beneficial changes to the fracture network.

Performance Evaluation of Lattice-Boltzmann Magnetohydrodynamics Simulations on Modern Parallel Vector Systems

SciTech Connect

Carter, Jonathan; Oliker, Leonid

2006-01-09

The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end computing (HEC) platforms, primarily because of their generality, scalability, and cost effectiveness. However, the growing gap between sustained and peak performance for full-scale scientific applications on such platforms has become a major concern in high performance computing. The latest generation of custom-built parallel vector systems have the potential to address this concern for numerical algorithms with sufficient regularity in their computational structure.
In this work, we explore two- and three-dimensional implementations of a lattice-Boltzmann magnetohydrodynamics (MHD) physics application on some of today's most powerful supercomputing platforms. Results compare the performance of the vector-based Cray X1, Earth Simulator, and newly released NEC SX-8 with the commodity-based superscalar platforms of the IBM Power3, Intel Itanium2, and AMD Opteron. Overall results show that the SX-8 attains unprecedented aggregate performance across our evaluated applications.

Monte Carlo simulation of photoelectron energization in parallel electric fields: Electroglow on Uranus

SciTech Connect

Singhal, R.P.; Bhardwaj, A.

1991-09-01

A Monte Carlo simulation of photoelectron energization and energy degradation in H2 gas in the presence of parallel electric fields has been carried out. Numerical yield spectra, which contain information about the electron energy degradation process and can be used to calculate the yield for any inelastic event, are obtained. The variation of yield spectra with incident electron energy, electric field, pitch angle, and cutoff limit has been studied. The yield function is employed to determine the photoelectron fluxes. H2 Lyman and Werner band excitation rates and integrated column intensity are computed for three different electric field profiles taking various low-energy cutoff limits. It is found that an electric field profile with a peak value of 4 mV/m at a neutral number density of 3×10^10 cm^-3 produces enhanced volume emission rates of H2 bands (λ < 1100 Å), explaining about 20% of the observed electroglow emission on Uranus. The effect of solar zenith angle and solar cycle variation on the peak excitation rate is discussed.

A heterogeneous and parallel computing framework for high-resolution hydrodynamic simulations

NASA Astrophysics Data System (ADS)

Smith, Luke; Liang, Qiuhua

2015-04-01

Shock-capturing hydrodynamic models are now widely applied in the context of flood risk assessment and forecasting, accurately capturing the behaviour of surface water over ground and within rivers. Such models are generally explicit in their numerical basis, and can be computationally expensive; this has prohibited full use of high-resolution topographic data for complex urban environments, now easily obtainable through airborne altimetric surveys (LiDAR). As processor clock speed advances have stagnated in recent years, further computational performance gains are largely dependent on the use of parallel processing. Heterogeneous computing architectures (e.g. graphics processing units or compute accelerator cards) provide a cost-effective means of achieving high throughput in cases where the same calculation is performed with a large input dataset. In recent years this technique has been applied successfully for flood risk mapping, such as within the national surface water flood risk assessment for the United Kingdom. We present a flexible software framework for hydrodynamic simulations across multiple processors of different architectures, within multiple computer systems, enabled using OpenCL and Message Passing Interface (MPI) libraries. A finite-volume Godunov-type scheme is implemented using the HLLC approach to solving the Riemann problem, with optional extension to second-order accuracy in space and time using the MUSCL-Hancock approach. The framework is successfully applied on personal computers and a small cluster to provide considerable improvements in performance. The most significant performance gains were achieved across two servers, each containing four NVIDIA GPUs, with a mix of K20, M2075 and C2050 devices. Advantages are found with respect to decreased parametric sensitivity, and thus in reducing uncertainty, for a major fluvial flood within a large catchment during 2005 in Carlisle, England. Simulations for the three-day event could be performed
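As a rough illustration of the finite-volume Godunov-type update described in the abstract above, the sketch below implements a first-order scheme for the 1D shallow water equations with an HLL approximate Riemann solver, a simpler cousin of the HLLC solver the framework actually uses. It is a hypothetical, single-threaded Python example and omits the MUSCL-Hancock extension, source terms, and the OpenCL/MPI parallelisation.

```python
import numpy as np

g = 9.81  # gravitational acceleration

def flux(h, hu):
    """Physical flux of the 1D shallow water equations for depth h and discharge hu."""
    u = hu / h
    return np.array([hu, hu * u + 0.5 * g * h * h])

def hll_flux(hL, huL, hR, huR):
    """HLL approximate Riemann solver at a single cell interface."""
    uL, uR = huL / hL, huR / hR
    cL, cR = np.sqrt(g * hL), np.sqrt(g * hR)
    sL = min(uL - cL, uR - cR)        # leftmost wave speed estimate
    sR = max(uL + cL, uR + cR)        # rightmost wave speed estimate
    fL, fR = flux(hL, huL), flux(hR, huR)
    if sL >= 0.0:
        return fL
    if sR <= 0.0:
        return fR
    qL, qR = np.array([hL, huL]), np.array([hR, huR])
    return (sR * fL - sL * fR + sL * sR * (qR - qL)) / (sR - sL)

def step(h, hu, dx, dt):
    """One first-order Godunov update on a 1D grid with transmissive ends."""
    n = len(h)
    F = np.zeros((n + 1, 2))
    for i in range(n + 1):
        l, r = max(i - 1, 0), min(i, n - 1)   # copy end cells at the boundaries
        F[i] = hll_flux(h[l], hu[l], h[r], hu[r])
    h = h - dt / dx * (F[1:, 0] - F[:-1, 0])
    hu = hu - dt / dx * (F[1:, 1] - F[:-1, 1])
    return h, hu

# usage: a small dam-break problem (time step chosen well inside the CFL limit)
h = np.where(np.arange(100) < 50, 2.0, 1.0).astype(float)
hu = np.zeros(100)
for _ in range(50):
    h, hu = step(h, hu, dx=1.0, dt=0.1)
```

On a GPU, every interface flux and every cell update in this loop is independent, which is exactly the "same calculation over a large input dataset" pattern the abstract identifies as well suited to heterogeneous architectures.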
Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

NASA Astrophysics Data System (ADS)

Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

2013-08-01

Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i - f(x_{i-1})]_{i=1,…,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing
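The root-finding formulation in the abstract above can be made concrete with a toy example. The following is a minimal, hypothetical Python sketch, not the authors' code: it uses velocity Verlet for a 1D harmonic oscillator as the fine propagator f and then satisfies F(X) = [x_i - f(x_{i-1})] = 0 with a plain Jacobi-style fixed-point sweep rather than the quasi-Newton schemes discussed in the record; the point is only that every slice update inside a sweep is independent and could be assigned to its own processor.

```python
import numpy as np

def f(state, dt=0.05, k=1.0, m=1.0):
    """Fine propagator: one velocity-Verlet step of a 1D harmonic oscillator."""
    r, v = state
    a = -k * r / m
    r_new = r + dt * v + 0.5 * dt * dt * a
    a_new = -k * r_new / m
    v_new = v + 0.5 * dt * (a + a_new)
    return np.array([r_new, v_new])

def residual(X, x0):
    """F(X) = [x_i - f(x_{i-1})], with x_0 fixed by the initial condition."""
    prev = np.vstack([x0[None, :], X[:-1]])
    return X - np.array([f(p) for p in prev])

def parallel_in_time(x0, M=200, sweeps=200):
    """Drive F(X) to zero by Jacobi-style fixed-point sweeps.

    Each sweep recomputes all M slices from the previous iterate, so the M
    updates are mutually independent and could run on M processors."""
    X = np.tile(x0, (M, 1))                       # crude initial guess: constant trajectory
    for _ in range(sweeps):
        prev = np.vstack([x0[None, :], X[:-1]])
        X = np.array([f(p) for p in prev])        # all time slices updated "in parallel"
    return X

# usage: the iterated solution matches plain sequential integration
x0 = np.array([1.0, 0.0])
X = parallel_in_time(x0)
assert np.max(np.abs(residual(X, x0))) == 0.0     # F(X) = 0 at the solution
x_seq = x0.copy()
for _ in range(200):
    x_seq = f(x_seq)
assert np.allclose(X[-1], x_seq)
```

In this naive form one sweep only propagates exact information one slice forward, which is why the record emphasises quasi-Newton iterations and cheap preconditioning models to obtain real speedups.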
Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

SciTech Connect

Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

2013-08-21

Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i - f(x_{i-1})]_{i=1,…,M} = 0, for the trajectory variables.
The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up

Comparisons of elastic and rigid blade-element rotor models using parallel processing technology for piloted simulations

NASA Technical Reports Server (NTRS)

Hill, Gary; Duval, Ronald W.; Green, John A.; Huynh, Loc C.

1991-01-01

A piloted comparison of rigid and aeroelastic blade-element rotor models was conducted at the Crew Station Research and Development Facility (CSRDF) at Ames Research Center. A simulation development and analysis tool, FLIGHTLAB, was used to implement these models in real time using parallel processing technology. Pilot comments and quantitative analysis performed both on-line and off-line confirmed that elastic degrees of freedom significantly affect perceived handling qualities. Trim comparisons show improved correlation with flight test data when elastic modes are modeled.
The results demonstrate the efficiency with which the mathematical modeling sophistication of existing simulation facilities can be upgraded using parallel processing, and the importance of these upgrades to simulation fidelity.

Parallel processing of real-time dynamic systems simulation on OSCAR (Optimally SCheduled Advanced multiprocessoR)

NASA Technical Reports Server (NTRS)

Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke

1989-01-01

Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, generally, the same calculations are repeated every time step. However, we cannot apply the Do-all or Do-across techniques for parallel processing of the simulation, since there exist data dependencies from the end of an iteration to the beginning of the next iteration, and furthermore data input and data output are required every sampling time period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near-fine-grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and assigned to processors by using optimal static scheduling at compile time in order to reduce the large run-time overhead caused by the use of near-fine-grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR), which has been developed to extract the advantageous features of static scheduling algorithms to the maximum extent.

Obtaining identical results with double precision global accuracy on different numbers of processors in parallel particle Monte Carlo simulations

SciTech Connect

Cleveland, Mathew A.; Brunner, Thomas A.; Gentile, Nicholas A.; Keasler, Jeffrey A.

2013-10-15

We describe and compare different approaches for achieving numerical reproducibility in photon Monte Carlo simulations. Reproducibility is desirable for code verification, testing, and debugging. Parallelism creates a unique problem for achieving reproducibility in Monte Carlo simulations because it changes the order in which values are summed. This is a numerical problem because double precision arithmetic is not associative. Parallel Monte Carlo, both domain replicated and decomposed simulations, will run their particles in a different order during different runs of the same simulation because of the non-reproducibility of communication between processors. In addition, runs of the same simulation using different domain decompositions will also result in particles being simulated in a different order. In [1], a way of eliminating non-associative accumulations using integer tallies was described. This approach successfully achieves reproducibility at the cost of lost accuracy by rounding double precision numbers to fewer significant digits. This integer approach, and other extended and reduced precision reproducibility techniques, are described and compared in this work. Increased precision alone is not enough to ensure reproducibility of photon Monte Carlo simulations. Non-arbitrary precision approaches require a varying degree of rounding to achieve reproducibility. For the problems investigated in this work, double precision global accuracy was achievable by using 100 bits of precision or greater on all unordered sums, which were subsequently rounded to double precision at the end of every time-step.
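The non-associativity problem and the integer-tally remedy described in the abstract above are easy to demonstrate. The snippet below is a self-contained illustration, not the authors' code: it shows that reordering a double precision sum can change the result, while accumulating values as integers after scaling by a fixed quantum gives the same tally in any order, at the cost of the rounding the abstract mentions. The quantum value is an illustrative choice.

```python
import random

random.seed(0)
values = [random.uniform(0.0, 1.0) * 10.0 ** random.randint(-8, 8) for _ in range(100000)]

# double precision sums depend on the order of accumulation
s_forward = sum(values)
s_reversed = sum(reversed(values))
print(s_forward == s_reversed)          # usually False: floating point addition is not associative

# fixed-point integer tallies are order-independent but round each contribution
QUANTUM = 2.0 ** -30                    # resolution of the tally (illustrative)
t_forward = sum(int(round(v / QUANTUM)) for v in values)
t_reversed = sum(int(round(v / QUANTUM)) for v in reversed(values))
print(t_forward == t_reversed)          # True: integer addition is exact and associative
print(t_forward * QUANTUM)              # reproducible, slightly rounded, total
```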
Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

SciTech Connect

Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

2013-08-21

Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., the Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i - f(x_{i-1})]_{i=1,…,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of optimization techniques, including quasi-Newton and preconditioned quasi-Newton optimization schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm.
The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow TCP/IP networks. Scripts

Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations.

PubMed

Bylaska, Eric J; Weare, Jonathan Q; Weare, John H

2013-08-21

Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i - f(x_{i-1})]_{i=1,…,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0.
For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a

Efficient parallel seismic simulations including topography and 3-D material heterogeneities on locally refined composite grids

NASA Astrophysics Data System (ADS)

Petersson, Anders; Rodgers, Arthur

2010-05-01

conserving, coupling procedure for the elastic wave equation at grid refinement interfaces. When used together with our single grid finite difference scheme, it results in a method which is provably stable, without artificial dissipation, for arbitrary heterogeneous isotropic elastic materials. The new coupling procedure is based on satisfying the summation-by-parts principle across refinement interfaces. From a practical standpoint, an important advantage of the proposed method is the absence of tunable numerical parameters, which seldom are appreciated by application experts. In WPP, the composite grid discretization is combined with a curvilinear grid approach that enables accurate modeling of free surfaces on realistic (non-planar) topography. The overall method satisfies the summation-by-parts principle and is stable under a CFL time step restriction. A feature of great practical importance is that WPP automatically generates the composite grid based on the user provided topography and the depths of the grid refinement interfaces. The WPP code has been verified extensively, for example using the method of manufactured solutions, by solving Lamb's problem, by solving various layer-over-half-space problems and comparing to semi-analytic (FK) results, and by simulating scenario earthquakes where results from other seismic simulation codes are available. WPP has also been validated against seismographic recordings of moderate earthquakes. WPP performs well on large parallel computers and has been run on up to 32,768 processors using about 26 billion grid points (78 billion DOF) and 41,000 time steps. WPP is an open source code that is available under the Gnu general public license.

Parallel Processing of Numerical Tsunami Simulations on a High Performance Cluster based on the GDAL Library

NASA Astrophysics Data System (ADS)

Schroeder, Matthias; Jankowski, Cedric; Hammitzsch, Martin; Wächter, Joachim

2014-05-01

Thousands of numerical tsunami simulations allow the computation of inundation and run-up along the coast for vulnerable areas over time. A so-called Matching Scenario Database (MSDB) [1] contains this large number of simulations in text file format. In order to visualize these wave propagations, the scenarios have to be reprocessed automatically. In the TRIDEC project, funded by the Seventh Framework Programme of the European Union, a Virtual Scenario Database (VSDB) and a Matching Scenario Database (MSDB) were established, amongst others, by the working group of the University of Bologna (UniBo) [1]. One part of TRIDEC was the development of a new generation of Decision Support Systems (DSS) for tsunami Early Warning Systems (TEWS) [2]. A working group of the GFZ German Research Centre for Geosciences was responsible for developing the Command and Control User Interface (CCUI) as the central software application which supports operator activities, incident management and message dissemination. For the integration and visualization in the CCUI, the numerical tsunami simulations from the MSDB must be converted into the shapefile format. The usage of shapefiles enables a much easier integration into standard Geographic Information Systems (GIS). The CCUI itself is based on two widely used open source products (the GeoTools library and uDig), which already provide shapefile integration. In this case, for an example area around the Western Iberian margin, several thousand tsunami variations were processed. Due to the mass of data, only a program-controlled process was conceivable. In order to optimize the computing effort and operating time, the use of an existing GFZ High Performance Computing Cluster (HPC) was chosen. Thus, geospatial software capable of parallel processing was sought. The FOSS tool Geospatial Data Abstraction Library (GDAL/OGR) was used to match the coordinates with the wave heights and to generate the
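How one such conversion step might look is sketched below. This is a hypothetical, minimal Python example using the GDAL/OGR bindings, not the TRIDEC processing chain: it assumes a scenario is already available in memory as (lon, lat, wave_height) tuples and writes it to a point shapefile; the file name and field name are made up for illustration.

```python
from osgeo import ogr, osr

def write_wave_heights(points, out_path="scenario_0001.shp"):
    """Write (lon, lat, wave_height) tuples to an ESRI Shapefile of points."""
    driver = ogr.GetDriverByName("ESRI Shapefile")
    ds = driver.CreateDataSource(out_path)
    srs = osr.SpatialReference()
    srs.ImportFromEPSG(4326)                       # WGS84 geographic coordinates
    layer = ds.CreateLayer("wave_height", srs, ogr.wkbPoint)
    layer.CreateField(ogr.FieldDefn("height", ogr.OFTReal))
    for lon, lat, height in points:
        feature = ogr.Feature(layer.GetLayerDefn())
        feature.SetField("height", height)
        point = ogr.Geometry(ogr.wkbPoint)
        point.AddPoint(lon, lat)
        feature.SetGeometry(point)
        layer.CreateFeature(feature)
        feature = None                             # release the feature
    ds = None                                      # close and flush the data source

# usage: one tiny, made-up scenario; on a cluster each job would handle its own subset
write_wave_heights([(-10.1, 38.7, 0.42), (-10.2, 38.8, 0.57)])
```

Because each scenario file is independent, a cluster run can simply distribute scenarios across nodes and invoke a conversion like this one per file, which is the embarrassingly parallel pattern the abstract relies on.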
GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations

PubMed Central

Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

2015-01-01

GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310–323. doi: 10.1002/wcms.1220 PMID:26753008
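The temperature replica-exchange (T-REMD) machinery mentioned in the abstract above rests on a simple Metropolis criterion for swapping neighbouring replicas. The sketch below is a generic, hypothetical illustration of that acceptance test, not GENESIS code; energies are expressed in the same units as kT, and the per-replica simulation is reduced to fixed numbers so that the exchange logic stands alone.

```python
import math, random

def exchange_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for swapping replicas i and j:
    p = min(1, exp[(beta_i - beta_j) * (E_i - E_j)])."""
    return min(1.0, math.exp((beta_i - beta_j) * (energy_i - energy_j)))

def attempt_exchanges(betas, energies):
    """Attempt swaps between neighbouring replicas; return the new mapping of
    configuration index -> inverse temperature."""
    order = list(range(len(betas)))
    for i in range(len(betas) - 1):
        p = exchange_probability(betas[order[i]], betas[order[i + 1]],
                                 energies[i], energies[i + 1])
        if random.random() < p:
            order[i], order[i + 1] = order[i + 1], order[i]   # swap temperatures, keep coordinates
    return [betas[k] for k in order]

# usage: four replicas with made-up potential energies (units chosen so that kT = T)
betas = [1.0 / t for t in (300.0, 320.0, 342.0, 366.0)]
energies = [-105.2, -101.7, -98.4, -95.9]
print(attempt_exchanges(betas, energies))
```

In a production package each replica runs as an independent MD simulation between exchange attempts, so the scheme parallelises naturally across replicas on top of the domain decomposition used within each one.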
Numerical Simulation of Unsteady Flow Field around Helicopter in Forward Flight Using a Parallel Dynamic Overset Unstructured Grids Method

NASA Astrophysics Data System (ADS)

Tian, Shuling; Wu, Yizhao; Xia, Jian

A parallel Navier-Stokes solver based on a dynamic overset unstructured grids method is presented to simulate the unsteady turbulent flow field around a helicopter in forward flight. The grid method combines the advantages of unstructured grids and Chimera grids and is well suited to handling multiple bodies in relative motion. The unsteady Navier-Stokes equations are solved on overset unstructured grids by an explicit dual time-stepping, finite volume method. A preconditioning method applied to the inner iteration of the dual time-stepping is used to speed up the convergence of the numerical simulation. The Spalart-Allmaras one-equation turbulence model is used to evaluate the turbulent viscosity. Parallel computation is based on the dynamic domain decomposition method in the overset unstructured grids system at each physical time step. A generic helicopter Robin with a four-blade rotor in forward flight is considered to validate the method presented in this paper.
  390. Numerical Simulation of Unsteady Flow Field around Helicopter in Forward Flight Using a Parallel Dynamic Overset Unstructured Grids Method

    NASA Astrophysics Data System (ADS)

    Tian, Shuling; Wu, Yizhao; Xia, Jian

    A parallel Navier-Stokes solver based on a dynamic overset unstructured grids method is presented to simulate the unsteady turbulent flow field around a helicopter in forward flight. The grid method has the advantages of unstructured grids and Chimera grids and is suitable for dealing with multiple bodies in relative motion. Unsteady Navier-Stokes equations are solved on overset unstructured grids by an explicit dual time-stepping, finite volume method. A preconditioning method applied to the inner iteration of the dual-time stepping is used to speed up the convergence of the numerical simulation. The Spalart-Allmaras one-equation turbulence model is used to evaluate the turbulent viscosity. Parallel computation is based on the dynamic domain decomposition method in the overset unstructured grids system at each physical time step. A generic Robin helicopter with a four-blade rotor in forward flight is considered to validate the method presented in this paper. Numerical simulation results show that the parallel dynamic overset unstructured grids method is very efficient for the simulation of the helicopter flow field and the results are reliable.

  391. SCORPIO: A Scalable Two-Phase Parallel I/O Library With Application To A Large Scale Subsurface Simulator

    SciTech Connect

    Sreepathi, Sarat; Sripathi, Vamsi; Mills, Richard T; Hammond, Glenn; Mahinthakumar, Kumar

    2013-01-01

    Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library, SCORPIO (SCalable block-ORiented Parallel I/O) that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5) and implements optimized I/O access patterns that can scale on larger number of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.
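    The SCORPIO record describes a two-phase I/O pattern in which the global communicator is split into sub-communicators and only each group root touches the file system. Below is a minimal mpi4py illustration of that pattern; the group size, array contents and output layout are assumptions, and this is not the SCORPIO API.

      # Two-phase I/O sketch: gather within a sub-communicator, then let only
      # the group root write. Generic mpi4py, not SCORPIO itself.
      from mpi4py import MPI
      import numpy as np

      comm = MPI.COMM_WORLD
      rank = comm.Get_rank()

      GROUP_SIZE = 4                            # assumed aggregation factor
      color = rank // GROUP_SIZE                # which I/O group this rank joins
      io_comm = comm.Split(color, rank)

      local_data = np.full(8, rank, dtype=np.float64)   # stand-in for field data

      # Phase 1: communication -- gather the group's data onto the group root.
      gathered = io_comm.gather(local_data, root=0)

      # Phase 2: disk I/O -- only group roots touch the file system.
      if io_comm.Get_rank() == 0:
          block = np.concatenate(gathered)
          np.save(f"output_group_{color}.npy", block)   # hypothetical file layout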
  392. A Framework for Parallel Unstructured Grid Generation for Complex Aerodynamic Simulations

    NASA Technical Reports Server (NTRS)

    Zagaris, George; Pirzadeh, Shahyar Z.; Chrisochoides, Nikos

    2009-01-01

    A framework for parallel unstructured grid generation targeting both shared memory multi-processors and distributed memory architectures is presented. The two fundamental building-blocks of the framework consist of: (1) the Advancing-Partition (AP) method used for domain decomposition and (2) the Advancing Front (AF) method used for mesh generation. Starting from the surface mesh of the computational domain, the AP method is applied recursively to generate a set of sub-domains. Next, the sub-domains are meshed in parallel using the AF method. The recursive nature of domain decomposition naturally maps to a divide-and-conquer algorithm which exhibits inherent parallelism. For the parallel implementation, the Master/Worker pattern is employed to dynamically balance the varying workloads of each task on the set of available CPUs. Performance results by this approach are presented and discussed in detail as well as future work and improvements.

  393. Parallel Computing of Magnetic Field Analysis for Rotating Machines Driven by Voltage Source on the Earth Simulator

    NASA Astrophysics Data System (ADS)

    Nakano, Tomohito; Kawase, Yoshihiro; Yamaguchi, Tadashi; Shibayama, Yoshiyasu; Nakamura, Masanori; Nishikawa, Noriaki; Uehara, Hitoshi

    A parallel computing method for rotating machines excited by a voltage source with the three-dimensional finite element method is developed. In this method, the matrix equations, which contain voltage equations, are divided into multiple subdomains and the matrix-vector products for the voltage equations in each subdomain are calculated efficiently.
    The validity and the usefulness of the method are verified through the computation of an IPM motor with the off-centered rotor on the Earth Simulator.

  394. Process Simulation of Complex Biological Pathways in Physical Reactive Space and Reformulated for Massively Parallel Computing Platforms

    PubMed

    Ganesan, Narayan; Li, Jie; Sharma, Vishakha; Jiang, Hanyu; Compagnoni, Adriana

    2016-01-01

    Biological systems encompass complexity that far surpasses many artificial systems. Modeling and simulation of large and complex biochemical pathways is a computationally intensive challenge. Traditional tools, such as ordinary differential equations, partial differential equations, stochastic master equations, and Gillespie type methods, are all limited either by their modeling fidelity or computational efficiency or both. In this work, we present a scalable computational framework based on modeling biochemical reactions in explicit 3D space, that is suitable for studying the behavior of large and complex biological pathways. The framework is designed to exploit parallelism and scalability offered by commodity massively parallel processors such as the graphics processing units (GPUs) and other parallel computing platforms. The reaction modeling in 3D space is aimed at enhancing the realism of the model compared to traditional modeling tools and framework. We introduce the Parallel Select algorithm that is key to breaking the sequential bottleneck limiting the performance of most other tools designed to study biochemical interactions. The algorithm is designed to be computationally tractable, handle hundreds of interacting chemical species and millions of independent agents by considering all-particle interactions within the system. We also present an implementation of the framework on the popular graphics processing units and apply it to the simulation study of JAK-STAT Signal Transduction Pathway. The computational framework will offer a deeper insight into various biological processes within the cell and help us observe key events as they unfold in space and time. This will advance the current state-of-the-art in simulation study of large scale biological systems and also enable the realistic simulation study of macro-biological cultures, where inter-cellular interactions are prevalent. PMID:27045833
  395. cuTauLeaping: A GPU-Powered Tau-Leaping Stochastic Simulator for Massive Parallel Analyses of Biological Systems

    PubMed Central

    Besozzi, Daniela; Pescini, Dario; Mauri, Giancarlo

    2014-01-01

    Tau-leaping is a stochastic simulation algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of simulations, leading to high computational costs. Since each simulation can be executed independently from the others, a massive parallelization of tau-leaping can bring to relevant reductions of the overall running time. The emerging field of General Purpose Graphic Processing Units (GPGPU) provides power-efficient high-performance computing at a relatively low cost. In this work we introduce cuTauLeaping, a stochastic simulator of biological systems that makes use of GPGPU computing to execute multiple parallel tau-leaping simulations, by fully exploiting the Nvidia's Fermi GPU architecture. We show how a considerable computational speedup is achieved on GPU by partitioning the execution of tau-leaping into multiple separated phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms the CPU-based tau-leaping implementation when the number of parallel simulations increases, with a break-even directly depending on the size of the biological system and on the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a bidimensional parameter sweep analysis to study the oscillatory regimes in the Ras/cAMP/PKA pathway in S. cerevisiae. PMID:24663957
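    cuTauLeaping parallelizes many independent tau-leaping runs on a GPU. As a reference point, the sketch below shows one serial tau-leaping update, in which each reaction channel fires a Poisson-distributed number of times over the leap interval; the two-species toy system and rate constants are invented for illustration.

      # One tau-leaping step for a toy A <-> B system with NumPy.
      import numpy as np

      rng = np.random.default_rng(0)

      x = np.array([1000, 0])                       # species counts [A, B]
      stoich = np.array([[-1, +1],                  # reaction 1: A -> B
                         [+1, -1]])                 # reaction 2: B -> A
      rates = np.array([0.1, 0.05])                 # mass-action rate constants

      def propensities(x):
          return rates * x                          # a1 = k1*A, a2 = k2*B

      def tau_leap_step(x, tau):
          a = propensities(x)
          k = rng.poisson(a * tau)                  # firings of each channel
          x_new = x + stoich.T @ k
          return np.maximum(x_new, 0)               # guard against negative counts

      t, tau = 0.0, 0.01
      for _ in range(100):
          x = tau_leap_step(x, tau)
          t += tau
      print(t, x)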
  396. Large-eddy simulation of the Rayleigh-Taylor instability on a massively parallel computer

    SciTech Connect

    Amala, P.A.K.

    1995-03-01

    A computational model for the solution of the three-dimensional Navier-Stokes equations is developed. This model includes a turbulence model: a modified Smagorinsky eddy-viscosity with a stochastic backscatter extension. The resultant equations are solved using finite difference techniques: the second-order explicit Lax-Wendroff schemes. This computational model is implemented on a massively parallel computer. Programming models on massively parallel computers are next studied. It is desired to determine the best programming model for the developed computational model. To this end, three different codes are tested on a current massively parallel computer: the CM-5 at Los Alamos. Each code uses a different programming model: one is a data parallel code; the other two are message passing codes. Timing studies are done to determine which method is the fastest. The data parallel approach turns out to be the fastest method on the CM-5 by at least an order of magnitude. The resultant code is then used to study a current problem of interest to the computational fluid dynamics community. This is the Rayleigh-Taylor instability. The Lax-Wendroff methods handle shocks and sharp interfaces poorly. To this end, the Rayleigh-Taylor linear analysis is modified to include a smoothed interface. The linear growth rate problem is then investigated. Finally, the problem of the randomly perturbed interface is examined. Stochastic backscatter breaks the symmetry of the stationary unstable interface and generates a mixing layer growing at the experimentally observed rate. 115 refs., 51 figs., 19 tabs.
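    The record above mentions a modified Smagorinsky eddy-viscosity model. The sketch below evaluates only the standard (unmodified) Smagorinsky closure, nu_t = (C_s * Delta)^2 * |S|, on a uniform grid with NumPy; the grid, the value of C_s and the velocity field are assumptions, and the stochastic backscatter extension is not modelled.

      # Standard Smagorinsky eddy viscosity on a uniform grid (illustrative only).
      import numpy as np

      def smagorinsky_nu_t(u, v, w, dx, c_s=0.17):
          """u, v, w: 3-D velocity components on a uniform grid of spacing dx."""
          dudx, dudy, dudz = np.gradient(u, dx)
          dvdx, dvdy, dvdz = np.gradient(v, dx)
          dwdx, dwdy, dwdz = np.gradient(w, dx)

          # Symmetric strain-rate tensor components S_ij.
          s_xx, s_yy, s_zz = dudx, dvdy, dwdz
          s_xy = 0.5 * (dudy + dvdx)
          s_xz = 0.5 * (dudz + dwdx)
          s_yz = 0.5 * (dvdz + dwdy)

          # |S| = sqrt(2 * S_ij * S_ij)
          s_mag = np.sqrt(2.0 * (s_xx**2 + s_yy**2 + s_zz**2
                                 + 2.0 * (s_xy**2 + s_xz**2 + s_yz**2)))
          return (c_s * dx) ** 2 * s_mag

      n, dx = 32, 1.0 / 32
      u = np.random.default_rng(1).standard_normal((n, n, n))
      nu_t = smagorinsky_nu_t(u, u, u, dx)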
  397. Molecular simulation workflows as parallel algorithms: the execution engine of Copernicus, a distributed high-performance computing platform

    PubMed

    Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik

    2015-06-01

    Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers, particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on a conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus. PMID:26575558

  398. Real-time dynamic simulation of the Cassini spacecraft using DARTS. Part 2: Parallel/vectorized real-time implementation

    NASA Technical Reports Server (NTRS)

    Fijany, A.; Roberts, J. A.; Jain, A.; Man, G. K.

    1993-01-01

    Part 1 of this paper presented the requirements for the real-time simulation of Cassini spacecraft along with some discussion of the DARTS algorithm. Here, in Part 2 we discuss the development and implementation of the parallel/vectorized DARTS algorithm and architecture for real-time simulation. Development of the fast algorithms and architecture for real-time hardware-in-the-loop simulation of spacecraft dynamics is motivated by the fact that it represents a hard real-time problem, in the sense that the correctness of the simulation depends on both the numerical accuracy and the exact timing of the computation. For a given model fidelity, the computation should be computed within a predefined time period. Further reduction in computation time allows increasing the fidelity of the model (i.e., inclusion of more flexible modes) and the integration routine.

  399. Modeling and simulation of a 6-DOF parallel platform for telescope secondary mirror

    NASA Astrophysics Data System (ADS)

    Yue, Zhongyu; Ye, Yu; Gu, Bozhong

    2014-07-01

    The 6-DOF parallel platform in this paper is a kind of Stewart platform. It can be used as supporting structure for telescope secondary mirror. In order to adapt the special dynamic environment of the telescope secondary mirror and to be installed in extremely narrow space, a unique parallel platform is designed. PSS Stewart platform and SPS Stewart platform are analyzed and compared. Then the PSS Stewart platform is chosen for detailed design. The virtual prototyping model of the parallel platform is built. The model is used for the analysis and calculation of multi-body dynamics. With the help of ANSYS, the finite element model of the platform is built and then the analysis is performed.
    According to the above analysis the experimental prototype of the platform is built.

  400. Coupled models and parallel simulations for three-dimensional full-Stokes ice sheet modeling

    SciTech Connect

    Zhang, Huai; Ju, Lili; Gunzburger, Max; Ringler, Todd; Price, Stephen

    2011-01-01

    A three-dimensional full-Stokes computational model is considered for determining the dynamics, temperature, and thickness of ice sheets. The governing thermomechanical equations consist of the three-dimensional full-Stokes system with nonlinear rheology for the momentum, an advective-diffusion energy equation for temperature evolution, and a mass conservation equation for ice-thickness changes. Here, we discuss the variable resolution meshes, the finite element discretizations, and the parallel algorithms employed by the model components. The solvers are integrated through a well-designed coupler for the exchange of parametric data between components. The discretization utilizes high-quality, variable-resolution centroidal Voronoi Delaunay triangulation meshing and existing parallel solvers. We demonstrate the gridding technology, discretization schemes, and the efficiency and scalability of the parallel solvers through computational experiments using both simplified geometries arising from benchmark test problems and a realistic Greenland ice sheet geometry.

  401. Neurite, a Finite Difference Large Scale Parallel Program for the Simulation of Electrical Signal Propagation in Neurites under Mechanical Loading
    PubMed Central

    García-Grajales, Julián A.; Rucabado, Gabriel; García-Dopico, Antonio; Peña, José-María; Jérusalem, Antoine

    2015-01-01

    With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computationally expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite (explicit and implicit) were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon, a segmented
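    Neurite couples cable theory and Hodgkin-Huxley models under mechanical loading. The sketch below is limited to an explicit finite-difference update of the passive cable equation, tau * dV/dt = lambda^2 * d2V/dx2 - V, with made-up parameter values; active channels, myelination and the mechanical coupling are not included.

      # Explicit finite differences for the passive cable equation (illustrative).
      import numpy as np

      n, dx = 200, 1.0e-4          # compartments and spacing [m] (assumed)
      tau, lam = 10e-3, 1.0e-3     # membrane time constant [s], space constant [m]
      dt = 1.0e-5                  # time step [s], well below the stability limit

      v = np.zeros(n)              # membrane potential deviation from rest

      for step in range(2000):
          lap = np.zeros(n)
          lap[1:-1] = (v[2:] - 2.0 * v[1:-1] + v[:-2]) / dx**2   # sealed ends
          v += dt / tau * (lam**2 * lap - v)
          v[0] += 0.5 * dt / tau   # crude stand-in for a current injection at x = 0

      print(v[:5])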
  402. Convergence order vs. parallelism in the numerical simulation of the bidomain equations

    NASA Astrophysics Data System (ADS)

    Sharomi, Oluwaseun; Spiteri, Raymond J.

    2012-10-01

    The propagation of electrical activity in the human heart can be modelled mathematically by the bidomain equations. The bidomain equations represent a multi-scale reaction-diffusion model that consists of a set of ordinary differential equations governing the dynamics at the cellular level coupled with a set of partial differential equations governing the dynamics at the tissue level. Significant computation is generally required to generate clinically useful data from the bidomain equations. Contemporary developments in computer architecture, in particular multi- and many-core computers and graphics processing units, have made such computations feasible. However, the zeal to take advantage of parallel architectures has typically caused another important aspect of numerical methods for the solution of differential equations to be overlooked, namely the convergence order. It is well known that higher-order methods are generally more efficient than lower-order ones when solutions are smooth and relatively high accuracy is desired. In these situations, serial implementations of high-order methods may remain surprisingly competitive with parallel implementations of low-order methods. In this paper, we examine the effect of order on the numerical solution of the bidomain equations in parallel. We find that high-order methods, in particular high-order time-integration methods with relatively better stability properties, tend to outperform their low-order counterparts, even when the latter are run in parallel. In other words, increasing integration order often trumps increasing available computational resources, especially when relatively high accuracy is desired.
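    The bidomain record argues that raising the integration order can pay off more than adding processors when high accuracy is needed. The small experiment below illustrates the underlying point by comparing the error decay of second- and fourth-order Runge-Kutta methods on an arbitrary test ODE; it involves no bidomain physics.

      # Error of RK2 vs RK4 on y' = -y, y(0) = 1, integrated to t = 1.
      import math

      def rk2_step(f, t, y, h):                      # midpoint method, order 2
          k1 = f(t, y)
          k2 = f(t + 0.5 * h, y + 0.5 * h * k1)
          return y + h * k2

      def rk4_step(f, t, y, h):                      # classical RK4, order 4
          k1 = f(t, y)
          k2 = f(t + 0.5 * h, y + 0.5 * h * k1)
          k3 = f(t + 0.5 * h, y + 0.5 * h * k2)
          k4 = f(t + h, y + h * k3)
          return y + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

      f = lambda t, y: -y
      exact = math.exp(-1.0)

      for steps in (10, 20, 40, 80):
          h = 1.0 / steps
          y2 = y4 = 1.0
          for i in range(steps):
              t = i * h
              y2 = rk2_step(f, t, y2, h)
              y4 = rk4_step(f, t, y4, h)
          print(f"h={h:7.4f}  RK2 err={abs(y2 - exact):.2e}  RK4 err={abs(y4 - exact):.2e}")

      # Halving h cuts the RK2 error by about 4x but the RK4 error by about 16x,
      # so for tight tolerances the higher-order method needs far fewer steps.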
  403. Qualitative Simulation of Photon Transport in Free Space Based on Monte Carlo Method and Its Parallel Implementation

    PubMed Central

    Chen, Xueli; Gao, Xinbo; Qu, Xiaochao; Chen, Duofang; Ma, Bin; Wang, Lin; Peng, Kuan; Liang, Jimin; Tian, Jie

    2010-01-01

    During the past decade, Monte Carlo method has obtained wide applications in optical imaging to simulate photon transport process inside tissues. However, this method has not been effectively extended to the simulation of free-space photon transport at present. In this paper, a uniform framework for noncontact optical imaging is proposed based on Monte Carlo method, which consists of the simulation of photon transport both in tissues and in free space. Specifically, the simplification theory of lens system is utilized to model the camera lens equipped in the optical imaging system, and Monte Carlo method is employed to describe the energy transformation from the tissue surface to the CCD camera. Also, the focusing effect of camera lens is considered to establish the relationship of corresponding points between tissue surface and CCD camera. Furthermore, a parallel version of the framework is realized, making the simulation much more convenient and effective. The feasibility of the uniform framework and the effectiveness of the parallel version are demonstrated with a cylindrical phantom based on real experimental results. PMID:20689705

  404. Influence of the parallel nonlinearity on zonal flows and heat transport in global gyrokinetic particle-in-cell simulations

    SciTech Connect

    Jolliet, S.; McMillan, B. F.; Vernay, T.; Villard, L.; Hatzky, R.; Bottino, A.; Angelino, P.

    2009-07-15

    In this paper, the influence of the parallel nonlinearity on zonal flows and heat transport in global particle-in-cell ion-temperature-gradient simulations is studied. Although this term is in theory orders of magnitude smaller than the others, several authors [L. Villard, P. Angelino, A. Bottino et al., Plasma Phys. Contr. Fusion 46, B51 (2004); L. Villard, S. J. Allfrey, A. Bottino et al., Nucl. Fusion 44, 172 (2004); J. C. Kniep, J. N. G. Leboeuf, and V. C. Decyck, Comput. Phys. Commun. 164, 98 (2004); J. Candy, R. E. Waltz, S. E. Parker et al., Phys. Plasmas 13, 074501 (2006)] found different results on its role. The study is performed using the global gyrokinetic particle-in-cell codes TORB (theta-pinch) [R. Hatzky, T. M. Tran, A. Koenies et al., Phys. Plasmas 9, 898 (2002)] and ORB5 (tokamak geometry) [S. Jolliet, A. Bottino, P. Angelino et al., Comput. Phys. Commun. 177, 409 (2007)]. In particular, it is demonstrated that the parallel nonlinearity, while important for energy conservation, affects the zonal electric field only if the simulation is noise dominated. When a proper convergence is reached, the influence of parallel nonlinearity on the zonal electric field, if any, is shown to be small for both the cases of decaying and driven turbulence.

  405. Parallel adaptive mesh refinement method based on WENO finite difference scheme for the simulation of multi-dimensional detonation

    NASA Astrophysics Data System (ADS)

    Wang, Cheng; Dong, XinZhuang; Shu, Chi-Wang

    2015-10-01

    For numerical simulation of detonation, computational cost using uniform meshes is large due to the vast separation in both time and space scales. Adaptive mesh refinement (AMR) is advantageous for problems with vastly different scales. This paper aims to propose an AMR method with high order accuracy for numerical investigation of multi-dimensional detonation. A well-designed AMR method based on finite difference weighted essentially non-oscillatory (WENO) scheme, named as AMR&WENO is proposed. A new cell-based data structure is used to organize the adaptive meshes. The new data structure makes it possible for cells to communicate with each other quickly and easily. In order to develop an AMR method with high order accuracy, high order prolongations in both space and time are utilized in the data prolongation procedure. Based on the message passing interface (MPI) platform, we have developed a workload balancing parallel AMR&WENO code using the Hilbert space-filling curve algorithm. Our numerical experiments with detonation simulations indicate that the AMR&WENO is accurate and has a high resolution. Moreover, we evaluate and compare the performance of the uniform mesh WENO scheme and the parallel AMR&WENO method. The comparison results provide us further insight into the high performance of the parallel AMR&WENO method.
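    The AMR&WENO record above balances workload with a Hilbert space-filling curve. The sketch below shows the basic idea in 2-D: map cell indices to positions along a Hilbert curve, sort, and hand out contiguous chunks of cells to ranks. The grid size and rank count are assumptions, and the paper's actual 3-D partitioner is not reproduced.

      # Hilbert-curve workload balancing sketch (2-D, illustrative sizes).
      def hilbert_xy2d(order, x, y):
          """Distance along a Hilbert curve of side 2**order for cell (x, y)."""
          d = 0
          s = 1 << (order - 1)
          while s > 0:
              rx = 1 if (x & s) > 0 else 0
              ry = 1 if (y & s) > 0 else 0
              d += s * s * ((3 * rx) ^ ry)
              if ry == 0:                      # rotate quadrant
                  if rx == 1:
                      x, y = s - 1 - x, s - 1 - y
                  x, y = y, x
              s >>= 1
          return d

      def partition(cells, n_ranks, order):
          ordered = sorted(cells, key=lambda c: hilbert_xy2d(order, *c))
          chunk = (len(ordered) + n_ranks - 1) // n_ranks
          return [ordered[i * chunk:(i + 1) * chunk] for i in range(n_ranks)]

      order = 4                                # 16 x 16 grid of leaf cells
      cells = [(i, j) for i in range(1 << order) for j in range(1 << order)]
      parts = partition(cells, n_ranks=4, order=order)
      print([len(p) for p in parts])           # roughly equal, spatially compact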
  406. In-Series Versus In-Parallel Mechanical Circulatory Support for the Right Heart: A Simulation Study

    PubMed

    Hsu, Po-Lin; McIntyre, Madeleine; Boehning, Fiete; Dang, Weiguo; Parker, Jack; Autschbach, Rüdiger; Schmitz-Rode, Thomas; Steinseifer, Ulrich

    2016-06-01

    Right heart failure (RHF) is a serious health issue with increasing incidence and high mortality. Right ventricular assist devices (RVADs) have been used to support the end-stage failing right ventricle (RV). Current RVADs operate in parallel with native RV, which alter blood flow pattern and increase RV afterload, associated with high tension in cardiac muscles and long-term valve complications. We are developing an in-series RVAD for better RV unloading. This article presents a mathematical model to compare the effects of RV unloading and hemodynamic restoration on an overloaded or failing RV. The model was used to simulate both in-series (sRVAD) and in-parallel (pRVAD) (right atrium-pulmonary artery cannulation) support for severe RHF. The results demonstrated that sRVAD more effectively unloads the RV and restores the balance between RV oxygen supply and demand in RHF patients. In comparison to simulated pRVAD and published clinical and in silico studies, the sRVAD was able to provide comparable restoration of key hemodynamic parameters and demonstrated superior afterload and volume reduction. This study concluded that in-series support was able to produce effective afterload reduction and preserve the valve functionality and native blood flow pattern, eliminating complications associated with in-parallel support. PMID:26511211

  407. Analysis and Simulation of the Dynamic Spectrum Allocation Based on Parallel Immune Optimization in Cognitive Wireless Networks

    PubMed Central

    Huixin, Wu; Duo, Mo; He, Li

    2014-01-01

    Spectrum allocation is one of the key issues to improve spectrum efficiency and has become the hot topic in the research of cognitive wireless network. This paper discusses the real-time feature and efficiency of dynamic spectrum allocation and presents a new spectrum allocation algorithm based on the master-slave parallel immune optimization model. The algorithm designs a new encoding scheme for the antibody based on the demand for convergence rate and population diversity. For improving the calculating efficiency, the antibody affinity in the population is calculated in multiple computing nodes at the same time. Simulation results show that the algorithm reduces the total spectrum allocation time and can achieve higher network profits. Compared with traditional serial algorithms, the algorithm proposed in this paper has better speedup ratio and parallel efficiency. PMID:25254255
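    The spectrum-allocation record computes antibody affinities on multiple computing nodes at once. A minimal master-worker sketch of that evaluation step using Python's multiprocessing pool is given below; the bit-string encoding and the affinity function are placeholders rather than the paper's model.

      # Parallel evaluation of a population's affinities with a worker pool.
      import random
      from multiprocessing import Pool

      N_CHANNELS = 32

      def affinity(antibody):
          # Placeholder objective: reward allocations that use many channels
          # while penalising (pretend) interference between adjacent channels.
          used = sum(antibody)
          clashes = sum(1 for a, b in zip(antibody, antibody[1:]) if a and b)
          return used - 2 * clashes

      def random_antibody():
          return [random.randint(0, 1) for _ in range(N_CHANNELS)]

      if __name__ == "__main__":
          population = [random_antibody() for _ in range(200)]
          with Pool(processes=4) as pool:          # workers play the "slave" role
              scores = pool.map(affinity, population)
          best = max(zip(scores, population))[0]
          print("best affinity:", best)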
  408. A Computer Simulation of the System-Wide Effects of Parallel-Offset Route Maneuvers

    NASA Technical Reports Server (NTRS)

    Lauderdale, Todd A.; Santiago, Confesor; Pankok, Carl

    2010-01-01

    Most aircraft managed by air-traffic controllers in the National Airspace System are capable of flying parallel-offset routes. This paper presents the results of two related studies on the effects of increased use of offset routes as a conflict resolution maneuver. The first study analyzes offset routes in the context of all standard resolution types which air-traffic controllers currently use. This study shows that by utilizing parallel-offset route maneuvers, significant system-wide savings in delay due to conflict resolution of up to 30% are possible. It also shows that most offset resolutions replace horizontal-vectoring resolutions. The second study builds on the results of the first and directly compares offset resolutions and standard horizontal-vectoring maneuvers to determine that in-trail conflicts are often more efficiently resolved by offset maneuvers.

  409. Parallel finite element simulations of incompressible viscous fluid flow by domain decomposition with Lagrange multipliers

    NASA Astrophysics Data System (ADS)

    Rivera, Christian A.; Heniche, Mourad; Glowinski, Roland; Tanguy, Philippe A.

    2010-07-01

    A parallel approach to solve three-dimensional viscous incompressible fluid flow problems using discontinuous pressure finite elements and a Lagrange multiplier technique is presented. The strategy is based on non-overlapping domain decomposition methods, and Lagrange multipliers are used to enforce continuity at the boundaries between subdomains.
    The novelty of the work is the coupled approach for solving the velocity-pressure-Lagrange multiplier algebraic system of the discrete Navier-Stokes equations by a distributed memory parallel ILU(0) preconditioned Krylov method. A penalty function on the interface constraints equations is introduced to avoid the failure of the ILU factorization algorithm. To ensure portability of the code, a message based memory distributed model with MPI is employed. The method has been tested over different benchmark cases such as the lid-driven cavity and pipe flow with unstructured tetrahedral grids. It is found that the partition algorithm and the order of the physical variables are central to parallelization performance. A speed-up in the range of 5-13 is obtained with 16 processors. Finally, the algorithm is tested over an industrial case using up to 128 processors. In considering the literature, the obtained speed-ups on distributed and shared memory computers are found very competitive.

  410. Effects of rotation on turbulent convection: Direct numerical simulation using parallel processors

    NASA Astrophysics Data System (ADS)

    Chan, Daniel Chiu-Leung

    A new parallel implicit adaptive mesh refinement (AMR) algorithm is developed for the prediction of unsteady behaviour of laminar flames. The scheme is applied to the solution of the system of partial-differential equations governing time-dependent, two- and three-dimensional, compressible laminar flows for reactive thermally perfect gaseous mixtures. A high-resolution finite-volume spatial discretization procedure is used to solve the conservation form of these equations on body-fitted multi-block hexahedral meshes. A local preconditioning technique is used to remove numerical stiffness and maintain solution accuracy for low-Mach-number, nearly incompressible flows. A flexible block-based octree data structure has been developed and is used to facilitate automatic solution-directed mesh adaptation according to physics-based refinement criteria. The data structure also enables an efficient and scalable parallel implementation via domain decomposition. The parallel implicit formulation makes use of a dual-time-stepping like approach with an implicit second-order backward discretization of the physical time, in which a Jacobian-free inexact Newton method with a preconditioned generalized minimal residual (GMRES) algorithm is used to solve the system of nonlinear algebraic equations arising from the temporal and spatial discretization procedures. An additive Schwarz global preconditioner is used in conjunction with block incomplete LU type local preconditioners for each sub-domain. The Schwarz preconditioning and block-based data structure readily allow efficient and scalable parallel implementations of the implicit AMR approach on distributed-memory multi-processor architectures. The scheme was applied to solutions of steady and unsteady laminar diffusion and premixed methane-air combustion and was found to accurately predict key flame characteristics. For a premixed flame under terrestrial gravity, the scheme accurately predicted the frequency of the natural
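    The record above describes a Jacobian-free inexact Newton method with preconditioned GMRES. The sketch below shows the Jacobian-free idea on a tiny made-up nonlinear system: Jacobian-vector products are approximated by finite differences and passed to SciPy's GMRES inside a Newton loop. No Schwarz/ILU preconditioning or dual-time stepping is included.

      # Jacobian-free Newton-Krylov sketch on an invented 2x2 nonlinear system.
      import numpy as np
      from scipy.sparse.linalg import LinearOperator, gmres

      def residual(u):
          # Made-up nonlinear system F(u) = 0 with solution (1, 2).
          return np.array([u[0]**2 + u[1] - 3.0,
                           u[0] + u[1]**2 - 5.0])

      def jfnk_solve(u, tol=1e-10, max_newton=20):
          for _ in range(max_newton):
              f = residual(u)
              if np.linalg.norm(f) < tol:
                  break
              eps = 1e-7 * (1.0 + np.linalg.norm(u))

              def jv(v):                      # finite-difference J(u) @ v
                  return (residual(u + eps * v) - f) / eps

              J = LinearOperator((u.size, u.size), matvec=jv)
              du, info = gmres(J, -f)
              u = u + du
          return u

      print(jfnk_solve(np.array([1.0, 1.0])))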
  411. Dislocation emission at the Silicon/Silicon nitride interface: A million atom molecular dynamics simulation on parallel computers

    PubMed

    Bachlechner; Omeltchenko; Nakano; Kalia; Vashishta; Ebbsjo; Madhukar

    2000-01-10

    Mechanical behavior of the Si(111)/Si3N4(0001) interface is studied using million atom molecular dynamics simulations. At a critical value of applied strain parallel to the interface, a crack forms on the silicon nitride surface and moves toward the interface. The crack does not propagate into the silicon substrate; instead, dislocations are emitted when the crack reaches the interface. The dislocation loop propagates in the (1; 1;1) plane of the silicon substrate with a speed of 500 (+/-100) m/s. Time evolution of the dislocation emission and nature of defects is studied. PMID:11015901

  412. Simulation of Large Parallel Plasma Flows in the Tokamak SOL Driven by Cross-Field Transport Asymmetries

    SciTech Connect

    Pigarov, A Y; Krasheninnikov, S I; LaBombard, B; Rognlien, T D

    2006-06-06

    Large-Mach-number parallel plasma flows in the single-null SOL of different tokamaks are simulated with multi-fluid transport code UEDGE. The key role of poloidal asymmetry of cross-field plasma transport as the driving mechanism for such flows is discussed. The impact of ballooning-like diffusive and convective transport and plasma flows on divertor detachment, material migration, impurity flows, and erosion/deposition profiles is studied. The results on well-balanced double null plasma modeling that are indicative of strong asymmetry of cross-field transport are presented.

  413. Parallel, Multigrid Finite Element Simulator for Fractured/Faulted and Other Complex Reservoirs based on Common Component Architecture (CCA)

    SciTech Connect

    Milind Deo; Chung-Kan Huang; Huabing Wang

    2008-08-31

    volume of injection at lower rates. However, if oil production can be continued at high water cuts, the discounted cumulative production usually favors higher production rates.
    The workflow developed during the project was also used to perform multiphase simulations in heterogeneous, fracture-matrix systems. Compositional and thermal-compositional simulators were developed for fractured reservoirs using the generalized framework. The thermal-compositional simulator was based on a novel 'equation-alignment' approach that helped choose the correct variables to solve depending on the number of phases present and the prescribed component partitioning. The simulators were used in steamflooding and in in situ combustion applications. The framework was constructed to be inherently parallel. The partitioning routines employed in the framework allowed generalized partitioning on highly complex fractured reservoirs and in instances when wells (incorporated in these models as line sources) were divided between two or more processors.

  414. Supernova Emulators: Connecting Massively Parallel SN Ia Radiative Transfer Simulations to Data with Gaussian Processes

    NASA Astrophysics Data System (ADS)

    Goldstein, Daniel; Thomas, Rollin; Kasen, Daniel

    2015-01-01

    Collaboration between the type Ia supernova (SN Ia) modeling and observation communities hinges on our ability to directly connect simulations to data. Here we introduce supernova emulation, a method for facilitating such a connection. Emulation allows us to instantaneously predict the observables (light curves, spectra, spectral time series) generated by arbitrary SN Ia radiative transfer simulations, with estimates of prediction error. Emulators learn the mapping between physically meaningful simulation inputs and the resulting synthetic observables from a training set of simulation input-output pairs. In our emulation framework, we model PCA-decomposed representations of simulated observables as an ensemble of Gaussian Processes. As a proof of concept, we train a bolometric light curve (BLC) emulator on a grid of 400 simulation inputs and BLCs synthesized with the publicly available, gray, time-dependent Monte Carlo expanding atmospheres code, SMOKE. We emulate SMOKE simulations evaluated at a set of 100 out-of-sample input parameters, and achieve excellent agreement between the emulator predictions and the simulated BLCs. In addition to predicting simulation outputs, emulators allow us to infer the regions of simulation input parameter space that correspond to observed SN Ia light curves and spectra. We present a Bayesian framework for solving this inverse problem using Markov Chain Monte Carlo sampling. We fit published bolometric light curves with our emulator and obtain reconstructed masses (nickel mass, total ejecta mass) in agreement with reconstructions from semi-analytic models. We discuss applications of emulation to supernova cosmology and physics, including how emulators can be used to identify and quantify astrophysical sources of systematic error affecting SNe Ia as distance indicators for cosmology.
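    The supernova-emulator record models PCA-decomposed observables with Gaussian processes. The sketch below reproduces that pattern with scikit-learn on randomly generated stand-in "light curves": PCA over the training outputs, one Gaussian-process regressor per retained component, and reconstruction of a predicted curve for a new input. None of the data are SMOKE outputs, and the kernel choice is an assumption.

      # PCA + Gaussian-process emulator pattern on synthetic data.
      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import RBF

      rng = np.random.default_rng(42)

      n_sims, n_inputs, n_times = 200, 3, 50
      X = rng.uniform(0.0, 1.0, size=(n_sims, n_inputs))      # simulation inputs
      t = np.linspace(0.0, 1.0, n_times)
      # Synthetic "light curves": smooth functions of the inputs plus noise.
      Y = (X[:, :1] * np.exp(-t) + X[:, 1:2] * t
           + 0.01 * rng.standard_normal((n_sims, n_times)))

      pca = PCA(n_components=4)
      scores = pca.fit_transform(Y)                            # (n_sims, 4)

      gps = []
      for k in range(scores.shape[1]):
          gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), normalize_y=True)
          gp.fit(X, scores[:, k])
          gps.append(gp)

      def emulate(x_new):
          """Predict an observable vector for a new input x_new of shape (n_inputs,)."""
          pred_scores = np.array([gp.predict(x_new[None, :])[0] for gp in gps])
          return pca.inverse_transform(pred_scores[None, :])[0]

      print(emulate(np.array([0.5, 0.2, 0.9]))[:5])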
  415. Progress in the Simulation of Steady and Time-Dependent Flows with 3D Parallel Unstructured Cartesian Methods

    NASA Technical Reports Server (NTRS)

    Aftosmis, M. J.; Berger, M. J.; Murman, S. M.; Kwak, Dochan (Technical Monitor)

    2002-01-01

    The proposed paper will present recent extensions in the development of an efficient Euler solver for adaptively-refined Cartesian meshes with embedded boundaries. The paper will focus on extensions of the basic method to include solution adaptation, time-dependent flow simulation, and arbitrary rigid domain motion. The parallel multilevel method makes use of on-the-fly parallel domain decomposition to achieve extremely good scalability on large numbers of processors, and is coupled with an automatic coarse mesh generation algorithm for efficient processing by a multigrid smoother. Numerical results are presented demonstrating parallel speed-ups of up to 435 on 512 processors. Solution-based adaptation may be keyed off truncation error estimates using tau-extrapolation or a variety of feature detection based refinement parameters. The multigrid method is extended to time-dependent flows through the use of a dual-time approach. The extension to rigid domain motion uses an Arbitrary Lagrangian-Eulerian (ALE) formulation, and results will be presented for a variety of two- and three-dimensional example problems with both simple and complex geometry.

  416. Parallel, out-of-core methods for N-body simulation

    SciTech Connect

    Salmon, J.; Warren, M.S.

    1997-03-01

    Hierarchical treecodes have, to a large extent, converted the compute-bound N-body problem into a memory-bound problem. The large ratio of DRAM to disk pricing suggests use of out-of-core techniques to overcome memory capacity limitations. The authors describe a parallel, out-of-core treecode library, targeted at machines with independent secondary storage associated with each processor. Borrowing the space-filling curve techniques from the in-core library and paging manually results in excellent spatial and temporal locality and very good performance.
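    The out-of-core treecode record relies on space-filling-curve ordering to keep paging local. The sketch below uses a Morton (Z-order) key as a simpler stand-in for that ordering: particles are sorted by key, stored in a memory-mapped file, and streamed back in contiguous, spatially coherent blocks. The sizes, the key choice and the file name are assumptions, not the library's design.

      # Space-filling-curve ordering plus memory-mapped storage (illustrative).
      import numpy as np

      def morton_key(ix, iy, iz, bits=10):
          """Interleave the bits of 3 integer grid coordinates into one key."""
          key = np.zeros_like(ix, dtype=np.uint64)
          for b in range(bits):
              key |= ((ix >> b) & 1).astype(np.uint64) << np.uint64(3 * b)
              key |= ((iy >> b) & 1).astype(np.uint64) << np.uint64(3 * b + 1)
              key |= ((iz >> b) & 1).astype(np.uint64) << np.uint64(3 * b + 2)
          return key

      rng = np.random.default_rng(7)
      n = 1_000_000
      pos = rng.random((n, 3))

      grid = (pos * 1024).astype(np.uint64)                 # 2**10 cells per axis
      order = np.argsort(morton_key(grid[:, 0], grid[:, 1], grid[:, 2]))

      store = np.memmap("particles.dat", dtype=np.float64, mode="w+", shape=(n, 3))
      store[:] = pos[order]                                 # write curve-ordered data
      store.flush()

      # Later passes stream spatially coherent blocks instead of random particles.
      block = 65_536
      for start in range(0, n, block):
          chunk = np.array(store[start:start + block])      # one page-in per block
          # ... compute on chunk ...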
  417. Applying Parallel Adaptive Methods with GeoFEST/PYRAMID to Simulate Earth Surface Crustal Dynamics

    NASA Technical Reports Server (NTRS)

    Norton, Charles D.; Lyzenga, Greg; Parker, Jay; Glasscoe, Margaret; Donnellan, Andrea; Li, Peggy

    2006-01-01

    This viewgraph presentation reviews the use of Adaptive Mesh Refinement (AMR) in simulating the crustal dynamics of Earth's surface. AMR simultaneously improves solution quality, time to solution, and computer memory requirements when compared to generating/running on a globally fine mesh. The use of AMR in simulating the dynamics of the Earth's surface is spurred by future proposed NASA missions, such as InSAR for Earth surface deformation and other measurements. These missions will require support for large-scale adaptive numerical methods using AMR to model observations. AMR was chosen because it has been successful in computational fluid dynamics for predictive simulation of complex flows around complex structures.

  418. Simulation of radioactive waste transmutation on the T. Node parallel computer

    SciTech Connect

    Bacha, F.; Maillard, J.; Silva, J.

    1995-09-15

    Before any experiment on a reactor driven by an accelerator, computer simulation supplies tools for optimization. Some of the key parameters are neutron production on a heavy target and the neutronic flux distribution in the core. During two code benchmarks organized by the NEA-OECD, simulations of energetic incident proton collisions on a thin lead target for the first one, and on a thick lead target for the second one, are described. One validation of our numeric codes is based on these results.
    A preliminary design of a burning waste system using benchmark result analysis and fission focused simulations is proposed.

  419. LightForce Photon-Pressure Collision Avoidance: Updated Efficiency Analysis Utilizing a Highly Parallel Simulation Approach

    NASA Technical Reports Server (NTRS)

    Stupl, Jan; Faber, Nicolas; Foster, Cyrus; Yang, Fan Yang; Nelson, Bron; Aziz, Jonathan; Nuttall, Andrew; Henze, Chris; Levit, Creon

    2014-01-01

    This paper provides an updated efficiency analysis of the LightForce space debris collision avoidance scheme. LightForce aims to prevent collisions on warning by utilizing photon pressure from ground based, commercial off the shelf lasers. Past research has shown that a few ground-based systems consisting of 10 kilowatt class lasers directed by 1.5 meter telescopes with adaptive optics could lower the expected number of collisions in Low Earth Orbit (LEO) by an order of magnitude. Our simulation approach utilizes the entire Two Line Element (TLE) catalogue in LEO for a given day as initial input. Least-squares fitting of a TLE time series is used for an improved orbit estimate. We then calculate the probability of collision for all LEO objects in the catalogue for a time step of the simulation. The conjunctions that exceed a threshold probability of collision are then engaged by a simulated network of laser ground stations. After those engagements, the perturbed orbits are used to re-assess the probability of collision and evaluate the efficiency of the system. This paper describes new simulations with three updated aspects: 1) By utilizing a highly parallel simulation approach employing hundreds of processors, we have extended our analysis to a much broader dataset. The simulation time is extended to one year. 2) We analyze not only the efficiency of LightForce on conjunctions that naturally occur, but also take into account conjunctions caused by orbit perturbations due to LightForce engagements. 3) We use a new simulation approach that is regularly updating the LightForce engagement strategy, as it would be during actual operations. In this paper we present our simulation approach to parallelize the efficiency analysis, its computational performance and the resulting expected efficiency of the LightForce collision avoidance system.
Results indicate that utilizing a network of four LightForce stations with 20 kilowatt lasers, 85% of all conjunctions with a</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2012IJMPC..2350015H','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2012IJMPC..2350015H"><span id="translatedtitle">a Novel Mode and its Verification of <span class="hlt">Parallel</span> Molecular Dynamics <span class="hlt">Simulation</span> with the Coupling of Gpu and Cpu</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Hou, Chaofeng; Ge, Wei</p> <p></p> <p>Graphics processing unit (GPU) is becoming a powerful computational tool in scientific and engineering fields. In this paper, for the purpose of the full employment of computing capability, a novel mode for <span class="hlt">parallel</span> molecular dynamics (MD) <span class="hlt">simulation</span> is presented and implemented on basis of multiple GPUs and hybrids with central processing units (CPUs). Taking into account the interactions between CPUs, GPUs, and the threads on GPU in a multi-scale and multilevel computational architecture, several cases, such as polycrystalline silicon and heat transfer on the surface of silicon crystals, are provided and taken as model systems to verify the feasibility and validity of the mode. Furthermore, the mode can be extended to MD <span class="hlt">simulation</span> of other areas such as biology, chemistry and so forth.</p> </li> </ol> <div class="pull-right"> <ul class="pagination"> <li><a href="#" onclick='return showDiv("page_1");'>«</a></li> <li><a href="#" onclick='return showDiv("page_19");'>19</a></li> <li><a href="#" onclick='return showDiv("page_20");'>20</a></li> <li class="active"><span>21</span></li> <li><a href="#" onclick='return showDiv("page_22");'>22</a></li> <li><a href="#" onclick='return showDiv("page_23");'>23</a></li> <li><a href="#" onclick='return showDiv("page_25");'>»</a></li> </ul> </div> </div><!-- col-sm-12 --> </div><!-- row --> </div><!-- page_21 --> <div id="page_22" class="hiddenDiv"> <div class="row"> <div class="col-sm-12"> <div class="pull-right"> <ul class="pagination"> <li><a href="#" onclick='return showDiv("page_1");'>«</a></li> <li><a href="#" onclick='return showDiv("page_20");'>20</a></li> <li><a href="#" onclick='return showDiv("page_21");'>21</a></li> <li class="active"><span>22</span></li> <li><a href="#" onclick='return showDiv("page_23");'>23</a></li> <li><a href="#" onclick='return showDiv("page_24");'>24</a></li> <li><a href="#" onclick='return showDiv("page_25");'>»</a></li> </ul> </div> </div> </div> <div class="row"> <div class="col-sm-12"> <ol class="result-class" start="421"> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=2672016','PMC'); return false;" href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=2672016"><span id="translatedtitle">Probing the Nanosecond Dynamics of a Designed Three-Stranded Beta-Sheet with a Massively <span class="hlt">Parallel</span> Molecular Dynamics <span class="hlt">Simulation</span></span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pmc">PubMed Central</a></p> <p>Voelz, Vincent A.; Luttmann, Edgar; Bowman, Gregory R.; Pande, Vijay S.</p> <p>2009-01-01</p> <p>Recently a temperature-jump FTIR study of a designed three-stranded sheet showing a 
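Editorial note: the screening step described in the preceding abstract (estimate a probability of collision for every catalogued conjunction at each time step, then engage only those above a threshold) parallelizes naturally because the conjunctions are independent. The following is only a minimal sketch of that filter-and-engage pattern; the toy probability model, the threshold value, and all function names are assumptions for illustration and are not part of the LightForce code.

```python
from concurrent.futures import ProcessPoolExecutor
from math import erf, sqrt

PROBABILITY_THRESHOLD = 1e-5  # hypothetical engagement threshold

def collision_probability(conjunction):
    """Placeholder probability-of-collision model (assumed, not LightForce's).

    Uses a toy 1-D Gaussian model: probability that the miss distance falls
    below the combined object radius given a positional uncertainty sigma.
    """
    miss_distance_km, combined_radius_km, sigma_km = conjunction
    z = (combined_radius_km - miss_distance_km) / (sqrt(2.0) * sigma_km)
    return 0.5 * (1.0 + erf(z))

def screen_conjunctions(conjunctions):
    """Evaluate all conjunctions in parallel and keep those above threshold."""
    with ProcessPoolExecutor() as pool:
        probabilities = list(pool.map(collision_probability, conjunctions))
    return [c for c, p in zip(conjunctions, probabilities)
            if p >= PROBABILITY_THRESHOLD]

if __name__ == "__main__":
    # (miss distance, combined radius, positional uncertainty), all in km
    sample = [(0.05, 0.02, 0.10), (5.0, 0.02, 0.10), (0.01, 0.02, 0.05)]
    print(screen_conjunctions(sample))
```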
420. A Novel Mode and its Verification of Parallel Molecular Dynamics Simulation with the Coupling of GPU and CPU

NASA Astrophysics Data System (ADS)

Hou, Chaofeng; Ge, Wei

Graphics processing unit (GPU) is becoming a powerful computational tool in scientific and engineering fields. In this paper, for the purpose of the full employment of computing capability, a novel mode for parallel molecular dynamics (MD) simulation is presented and implemented on the basis of multiple GPUs and hybrids with central processing units (CPUs). Taking into account the interactions between CPUs, GPUs, and the threads on the GPU in a multi-scale and multilevel computational architecture, several cases, such as polycrystalline silicon and heat transfer on the surface of silicon crystals, are provided and taken as model systems to verify the feasibility and validity of the mode. Furthermore, the mode can be extended to MD simulation of other areas such as biology, chemistry and so forth.

421. Probing the Nanosecond Dynamics of a Designed Three-Stranded Beta-Sheet with a Massively Parallel Molecular Dynamics Simulation

PubMed Central

Voelz, Vincent A.; Luttmann, Edgar; Bowman, Gregory R.; Pande, Vijay S.

2009-01-01

Recently a temperature-jump FTIR study of a designed three-stranded sheet showing a fast relaxation time of ~140 ± 20 ns was published. We performed massively parallel molecular dynamics simulations in explicit solvent to probe the structural events involved in this relaxation. While our simulations produce similar relaxation rates, the structural ensemble is broad. We observe the formation of turn structure, but only very weak interaction in the strand regions, which is consistent with the lack of strong backbone-backbone NOEs in previous structural NMR studies. These results suggest that either DPDP-II folds at time scales longer than 240 ns, or that DPDP-II is not a well-defined three-stranded β-sheet. This work also provides an opportunity to compare the performance of several popular forcefield models against one another. PMID:19399235

422. CMAD: A Self-consistent Parallel Code to Simulate the Electron Cloud Build-up and Instabilities

SciTech Connect

Pivi, M.T.F.; /SLAC

2007-11-07

We present the features of CMAD, a newly developed self-consistent code which simulates both the electron cloud build-up and related beam instabilities. By means of parallel (Message Passing Interface - MPI) computation, the code tracks the beam in an existing (MAD-type) lattice and continuously resolves the interaction between the beam and the cloud at each element location, with different cloud distributions at each magnet location. The goal of CMAD is to simulate single- and coupled-bunch instability, allowing tune shift, dynamic aperture and frequency map analysis and the determination of the secondary electron yield instability threshold. The code is in its phase of development and benchmarking with existing codes. Preliminary results on benchmarking are presented in this paper.

423. One-dimensional Vlasov simulation of parallel electric fields in two-electron population plasma

SciTech Connect

Saharia, K.; Goswami, K. S.

2007-09-15

One-dimensional Vlasov simulation in electron current carrying multicomponent plasma seeded with a density depression is presented. Considering two electron populations [one is sufficiently hot (~keV) and the other is cold along with cold background ions], the formation of weak double layers is investigated. Simulation results show that in this numerical setting, formation of such double layers needs the majority of the hot electrons.

424. Formation of downstream high-speed jets by a rippled nonstationary quasi-parallel shock: 2-D hybrid simulations

NASA Astrophysics Data System (ADS)

Hao, Y.; Lembege, B.; Lu, Q.; Guo, F.

2016-03-01

Experimental observations from space missions (including more recently Cluster and Time History of Events and Macroscale Interactions during Substorms data) have clearly revealed the existence of high-speed jets (HSJs) in the downstream region of the quasi-parallel terrestrial bow shock. Presently, two-dimensional hybrid simulations are performed in order to investigate the formation of such HSJs through a rippled quasi-parallel shock front. The simulation results show that (i) such shock fronts are strongly nonstationary along the shock normal, and (ii) ripples are evidenced along the shock front as the upstream ULF waves (excited by interaction between incident and reflected ions) are convected back to the front by the solar wind and contribute to the rippling formation. Then, these ripples are inherent structures of a quasi-parallel shock. As a consequence, new incident solar wind ions interact differently at different locations along the shock surface, and the ion bulk velocity strongly differs locally as ions are transmitted downstream. Preliminary results show that (i) local bursty patterns of turbulent magnetic field may form within the rippled front and play the role of local secondary shocks; (ii) some incident ion flows penetrate the front, suffer some deflection (instead of being decelerated) at the locations of these secondary shocks, and are at the origin of well-structured (filamentary) HSJs downstream; and (iii) the spatial scales of HSJs are in good agreement with experimental observations. Such downstream HSJs are shown to be generated by local curvature effects (front rippling) and the nonstationarity of the shock front itself.

425. Molecular Dynamics Simulations on Parallel Computers: a Study of Polar Versus Nonpolar Media Effects in Small Molecule Solvation

NASA Astrophysics Data System (ADS)

Debolt, Stephen Edward

Solvent effects were studied and described via molecular dynamics (MD) and free energy perturbation (FEP) simulations using the molecular mechanics program AMBER. The following specific topics were explored. Polar solvents cause a blue shift of the n→π* transition band of simple alkyl carbonyl compounds. The ground- versus excited-state solvation effects responsible for the observed solvatochromism are described in terms of the molecular level details of solute-solvent interactions in several modeled solvents spanning the range from polar to nonpolar, including water, methanol, and carbon tetrachloride. The structure and dynamics of octanol media were studied to explore the question: "why is octanol/water media such a good biophase analog?". The formation of linear and cyclic polymers of hydrogen-bonded solvent molecules, micelle-like clusters, and the effects of saturating waters are described. Two small drug-sized molecules, benzene and phenol, were solvated in water-saturated octanol. The solute-solvent structure and dynamics were analysed. The difference in their partitioning free energies was calculated. MD and FEP calculations were adapted for parallel computation, increasing their "speed" or the time span accessible by a simulation. The non-cyclic polyether ionophore salinomycin was studied in methanol solvent via parallel FEP. The path of binding and release for a potassium ion was investigated by calculating the potential of mean force along the "exit vector".

426. EvoL: the new Padova Tree-SPH parallel code for cosmological simulations. I. Basic code: gravity and hydrodynamics

NASA Astrophysics Data System (ADS)

Merlin, E.; Buonomo, U.; Grassi, T.; Piovan, L.; Chiosi, C.

2010-04-01

Context. We present the new release of the Padova N-body code for cosmological simulations of galaxy formation and evolution, EvoL. The basic Tree + SPH code is presented and analysed, together with an overview of the software architectures. Aims: EvoL is a flexible parallel Fortran95 code, specifically designed for simulations of cosmological structure formations on cluster, galactic and sub-galactic scales. Methods: EvoL is a fully Lagrangian self-adaptive code, based on the classical oct-tree by Barnes & Hut (1986, Nature, 324, 446) and on the smoothed particle hydrodynamics algorithm (SPH, Lucy 1977, AJ, 82, 1013). It includes special features like adaptive softening lengths with correcting extra-terms, and modern formulations of SPH and artificial viscosity. It is designed to be run in parallel on multiple CPUs to optimise the performance and save computational time. Results: We describe the code in detail, and present the results of a number of standard hydrodynamical tests.

427. CLUSTEREASY: A program for lattice simulations of scalar fields in an expanding universe on parallel computing clusters

NASA Astrophysics Data System (ADS)

Felder, Gary

2008-10-01

We describe an MPI C++ program that we have written and made available for calculating the evolution of interacting scalar fields in an expanding universe on parallel clusters. The program is a parallel programming extension of the simulation program LATTICEEASY. The ability to run these simulations on parallel clusters, however, greatly extends the range of scales and times that can be simulated. The program is particularly useful for the study of reheating and thermalization after inflation. The program and its full documentation are available on the Web at http://www.science.smith.edu/departments/Physics/fstaff/gfelder/latticeeasy/. In this paper we provide a brief overview of what the program does and what it is useful for. Catalogue identifier: AEBJ_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEBJ_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 7469. No. of bytes in distributed program, including test data, etc.: 613 334. Distribution format: tar.gz. Programming language: C++/MPI. Computer: Cluster; must have the library FFTW installed. Operating system: Any. RAM: Typically 4 MB to 1 GB per processor. Classification: 1.9. External routines: A single-precision version of the FFTW library (http://www.fftw.org/) must be available on the target machine. Nature of problem: After inflation the universe consisted of interacting fields in a high energy, nonthermal state [1]. The evolution of these fields cannot be described with standard approximation techniques such as linearization, kinetic theory, or Hartree expansion, and must thus be simulated numerically. Fortunately, the fields rapidly acquire large occupation numbers over a range of frequencies, so their evolution can be accurately modeled with classical field theory [2]. The specific fields and
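Editorial note: CLUSTEREASY evolves interacting scalar fields classically on a lattice in an expanding background. The sketch below shows only that generic idea under simplifying assumptions: a single free scalar field, a prescribed radiation-like scale factor, and a plain kick-drift (symplectic Euler) update. The rescaled program variables, field couplings, and MPI domain decomposition of the actual code are not represented; all parameter values here are illustrative.

```python
import numpy as np

# Generic illustration of evolving phi(x, t) on a periodic 1-D lattice in a
# prescribed expanding background, integrating
#   phi'' + 3 H phi' - lap(phi) / a^2 + m^2 phi = 0.
# This is NOT the rescaled scheme used by LATTICEEASY/CLUSTEREASY.

N, L = 256, 10.0            # grid points, box size (arbitrary units)
dx, dt, m = L / N, 1e-3, 1.0
t0 = 1.0                    # start time; a(t) = (t / t0)**0.5, radiation-like era

def laplacian(f):
    return (np.roll(f, 1) + np.roll(f, -1) - 2.0 * f) / dx**2

phi = 1e-3 * np.random.default_rng(0).standard_normal(N)  # small fluctuations
phidot = np.zeros(N)

t = t0
for step in range(5000):
    a = (t / t0) ** 0.5
    H = 0.5 / t
    # acceleration from the equation of motion
    phiddot = laplacian(phi) / a**2 - 3.0 * H * phidot - m**2 * phi
    # kick-drift (symplectic Euler) update
    phidot += phiddot * dt
    phi += phidot * dt
    t += dt

print("field variance after evolution:", phi.var())
```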
428. Scalability of the parallel CFD simulations of flow past a fluttering airfoil in OpenFOAM

NASA Astrophysics Data System (ADS)

Šidlof, Petr; Řidký, Václav

2015-05-01

The paper is devoted to investigation of unsteady subsonic airflow past an elastically supported airfoil during onset of the flutter instability. Based on the geometry, boundary conditions and airfoil motion data identified from wind-tunnel measurements, a 3D CFD model has been set up in OpenFOAM. The model is based on incompressible Navier-Stokes equations. The turbulence is modelled by Menter's k-omega shear stress transport turbulence model. The computational mesh was generated in GridPro, a mesh generator capable of producing highly orthogonal structured C-type meshes. The mesh totals 3.1 million elements. Parallel scalability was measured on a small shared-memory SGI Altix UV 100 supercomputer.

429. Investigation of intrinsic variability in one-dimensional parallel shocks using steady state hybrid simulations

NASA Technical Reports Server (NTRS)

Bennett, Lee; Ellison, Donald C.

1995-01-01

We have developed a means of producing a steady state hybrid simulation of a collisionless shock. The shock is stopped in the simulation box by transforming into the shock frame and by modifying the downstream boundary conditions to allow the plasma to flow through the simulation box. Once the shock is stationary in the box frame, the simulation can be run for an arbitrary time with a fixed box size and a fixed number of simulation particles. Using this technique, we have shown that certain gross properties associated with the shock, such as the particle distribution function (including energetic particles produced by Fermi acceleration) and the flow speed profile, are constant (except for statistical variations) over hundreds of gyroperiods when averaged over times short compared to the average residence time of energetic particles. Our results imply that any microphysical processes responsible for particle heating and/or injection into the Fermi mechanism can be viewed as smooth and continuous on timescales longer than a few gyroperiods.

430. Design of a high-speed digital processing element for parallel simulation

NASA Technical Reports Server (NTRS)

Milner, E. J.; Cwynar, D. S.

1983-01-01

A prototype of a custom designed computer to be used as a processing element in a multiprocessor based jet engine simulator is described. The purpose of the custom design was to give the computer the speed and versatility required to simulate a jet engine in real time. Real time simulations are needed for closed loop testing of digital electronic engine controls. The prototype computer has a microcycle time of 133 nanoseconds. This speed was achieved by: prefetching the next instruction while the current one is executing, transporting data using high speed data busses, and using state of the art components such as a very large scale integration (VLSI) multiplier. Included are discussions of processing element requirements, design philosophy, the architecture of the custom designed processing element, the comprehensive instruction set, the diagnostic support software, and the development status of the custom design.

431. Virtual Petaflop Simulation: Parallel Potential Solvers and New Integrators for Gravitational Systems

NASA Technical Reports Server (NTRS)

Lake, George; Quinn, Thomas; Richardson, Derek C.; Stadel, Joachim

1999-01-01

"The orbit of any one planet depends on the combined motion of all the planets, not to mention the actions of all these on each other. To consider simultaneously all these causes of motion and to define these motions by exact laws allowing of convenient calculation exceeds, unless I am mistaken, the forces of the entire human intellect" - Isaac Newton, 1687. Epochal surveys are throwing down the gauntlet for cosmological simulation. We describe three keys to meeting the challenge of N-body simulation: adaptive potential solvers, adaptive integrators and volume renormalization. With these techniques and a dedicated Teraflop facility, simulation can stay even with observation of the Universe. We also describe some problems in the formation and stability of planetary systems. Here, the challenge is to perform accurate integrations that retain Hamiltonian properties for 10^13 timesteps.

432. GPU-Based Parallelized Solver for Large Scale Vascular Blood Flow Modeling and Simulations

PubMed

Santhanam, Anand P; Neylon, John; Eldredge, Jeff; Teran, Joseph; Dutson, Erik; Benharash, Peyman

2016-01-01

Cardio-vascular blood flow simulations are essential in understanding the blood flow behavior during normal and disease conditions. To date, such blood flow simulations have only been done at a macro scale level due to computational limitations. In this paper, we present a GPU based large scale solver that enables modeling the flow even in the smallest arteries. A mechanical equivalent of the circuit based flow modeling system is first developed to employ the GPU computing framework. Numerical studies were employed using a set of 10 million connected vascular elements. Run-time flow analyses were performed to simulate vascular blockages, as well as arterial cut-off. Our results showed that we can achieve ~100 FPS using a GTX 680m and ~40 FPS using a Tegra K1 computing platform. PMID:27046603

433. Simulation of electrostatic ion instabilities in the presence of parallel currents and transverse electric fields

NASA Technical Reports Server (NTRS)

Nishikawa, K.-I.; Ganguli, G.; Lee, Y. C.; Palmadesso, P. J.

1989-01-01

A spatially two-dimensional electrostatic PIC simulation code was used to study the stability of a plasma equilibrium characterized by a localized transverse dc electric field and a field-aligned drift for L much less than Lx, where Lx is the simulation length in the x direction and L is the scale length associated with the dc electric field. It is found that the dc electric field and the field-aligned current can together play a synergistic role to enable the excitation of electrostatic waves even when the threshold values of the field-aligned drift and the E x B drift are individually subcritical. The simulation results show that the growing ion waves are associated with small vortices in the linear stage, which evolve to the nonlinear stage dominated by larger vortices with lower frequencies.

434. 2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation

DOE PAGES Beta

Warren, Michael S.

2014-01-01

We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k (2^18) processors. We present error analysis and scientific application results from a series of more than ten 69 billion (4096^3) particle cosmological simulations, accounting for 4×10^20 floating point operations. These results include the first simulations using the new constraints on the standard model of cosmology from the Planck satellite. Our simulations set a new standard for accuracy and scientific throughput, while meeting or exceeding the computational efficiency of the latest generation of hybrid TreePM N-body methods.
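Editorial note: the treecode idea behind HOT/2HOT is to replace the O(N^2) direct sum with a tree walk that treats sufficiently distant cells as single point masses. The sketch below is only a plain Barnes-Hut style monopole approximation with an assumed opening angle; the hashed keys, higher-order multipoles, error control, and parallel decomposition that distinguish 2HOT are not shown, and all names and values are illustrative.

```python
import numpy as np

THETA = 0.5   # opening angle; illustrative value
G = 1.0       # gravitational constant in code units

class Cell:
    """Oct-tree cell storing the total mass and center of mass of its particles."""
    def __init__(self, center, size, points, masses):
        self.center, self.size = center, size
        self.mass = masses.sum()
        self.com = (points * masses[:, None]).sum(axis=0) / self.mass
        self.leaf = len(points) == 1
        self.children = []
        if not self.leaf:
            # assign each particle to one of the eight octants around the center
            octant = ((points > center).astype(int) * np.array([1, 2, 4])).sum(axis=1)
            for o in range(8):
                mask = octant == o
                if mask.any():
                    bits = np.array([(o >> k) & 1 for k in range(3)])
                    child_center = center + (bits - 0.5) * (size / 2.0)
                    self.children.append(Cell(child_center, size / 2.0,
                                              points[mask], masses[mask]))

def acceleration(cell, p, eps=1e-3):
    """Barnes-Hut acceleration at position p due to the particles in `cell`."""
    d = cell.com - p
    r = np.sqrt((d * d).sum()) + eps
    # leaf, or cell far enough away: treat as a single (monopole) point mass
    if cell.leaf or cell.size / r < THETA:
        return G * cell.mass * d / r**3
    return sum((acceleration(c, p, eps) for c in cell.children), np.zeros(3))

rng = np.random.default_rng(1)
pts = rng.random((500, 3))
ms = np.full(500, 1.0 / 500)
root = Cell(np.array([0.5, 0.5, 0.5]), 1.0, pts, ms)
acc = np.array([acceleration(root, p) for p in pts])
print("mean |a| =", np.linalg.norm(acc, axis=1).mean())
```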
435. Parallel adaptive Cartesian upwind methods for shock-driven multiphysics simulation

SciTech Connect

Deiterding, Ralf

2011-01-01

The multiphysics fluid-structure interaction simulation of shock-loaded thin-walled structures requires the dynamic coupling of a shock-capturing flow solver to a solid mechanics solver for large deformations. By combining a Cartesian embedded boundary approach with dynamic mesh adaptation, a generic software framework for such flow solvers has been constructed that allows easy exchange of the specific hydrodynamic finite volume upwind scheme and coupling to various explicit finite element solid dynamics solvers. The paper gives an overview of the computational approach and presents first simulations that couple the software to the general purpose solid dynamics code DYNA3D.

436. Optimal Use of Data in Parallel Tempering Simulations for the Construction of Discrete-State Markov Models of Biomolecular Dynamics

SciTech Connect

Prinz, Jan-Hendrik; Chondera, John D; Pande, Vijay S; Swope, William C; Smith, Jeremy C; Noe, F

2011-01-01

Parallel tempering (PT) molecular dynamics simulations have been extensively investigated as a means of efficient sampling of the configurations of biomolecular systems. Recent work has demonstrated how the short physical trajectories generated in PT simulations of biomolecules can be used to construct the Markov models describing biomolecular dynamics at each simulated temperature. While this approach describes the temperature-dependent kinetics, it does not make optimal use of all available PT data, instead estimating the rates at a given temperature using only data from that temperature. This can be problematic, as some relevant transitions or states may not be sufficiently sampled at the temperature of interest, but might be readily sampled at nearby temperatures. Further, the comparison of temperature-dependent properties can suffer from the false assumption that data collected from different temperatures are uncorrelated. We propose here a strategy in which, by a simple modification of the PT protocol, the harvested trajectories can be reweighted, permitting data from all temperatures to contribute to the estimated kinetic model. The method reduces the statistical uncertainty in the kinetic model relative to the single temperature approach and provides estimates of transition probabilities even for transitions not observed at the temperature of interest. Further, the method allows the kinetics to be estimated at temperatures other than those at which simulations were run. We illustrate this method by applying it to the generation of a Markov model of the conformational dynamics of the solvated terminally blocked alanine peptide.
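Editorial note: the building block of the discrete-state Markov models discussed in the preceding abstract is a transition matrix estimated from state-to-state transition counts at a chosen lag time. The sketch below shows only that basic row-normalized (maximum-likelihood) estimator, with an optional per-trajectory weight standing in for the paper's temperature reweighting; the weights, state definitions, and example data are placeholders, not the paper's method.

```python
import numpy as np

def count_transitions(dtrajs, n_states, lag, weights=None):
    """Accumulate (optionally weighted) transition counts at a given lag time.

    dtrajs  : list of 1-D integer arrays of state indices (discretized trajectories)
    weights : per-trajectory weights standing in for temperature-reweighting
              factors (placeholders here; default 1.0 for every trajectory)
    """
    counts = np.zeros((n_states, n_states))
    weights = weights if weights is not None else [1.0] * len(dtrajs)
    for traj, w in zip(dtrajs, weights):
        for i, j in zip(traj[:-lag], traj[lag:]):
            counts[i, j] += w
    return counts

def transition_matrix(counts):
    """Row-normalized maximum-likelihood estimate of the transition matrix."""
    rows = counts.sum(axis=1, keepdims=True)
    rows[rows == 0.0] = 1.0        # avoid division by zero for unvisited states
    return counts / rows

# Example: two short discretized trajectories over 3 states
dtrajs = [np.array([0, 0, 1, 1, 2, 2, 1, 0]), np.array([2, 2, 1, 0, 0, 1])]
T = transition_matrix(count_transitions(dtrajs, n_states=3, lag=1))
print(np.round(T, 2))
```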
437. Large-Scale Atomistic Simulations of Solid State Materials -- Modeling Many Millions of Atoms on Parallel Computers

NASA Astrophysics Data System (ADS)

Vashishta, Priya

2000-03-01

Structural and dynamical correlations including crack propagation and fracture in nanophase materials, atomic level stresses in nanopixels, nanoindentation in crystalline and amorphous materials, and dynamics of oxidation in metallic nanoparticles will be discussed using large-scale atomistic simulations. A multiresolution molecular-dynamics (MRMD) approach for multimillion atom simulations has been used to carry out the 10-100 million atom simulations on a variety of parallel computer architectures including Cray T3E, SGI Origin, IBM SP, and large workstation clusters. Issues related to matching of length scales to carry out seamless simulations of electronic, atomic and continuum degrees of freedom will also be briefly discussed. Research presented in this talk is carried out in collaboration with Martina E. Bachlechner, Timothy Campbell, Ingvar Ebbsjo, Rajiv K. Kalia, Hideaki Kikuchi, Sanjay Kodiyalam, Elefterios Lidorikis, Anupam Madhukar, Aiichiro Nakano, Shuji Ogata, Subhash Saini, Fuyuki Shimojo, and Phillip Walsh. Research supported by the US DOE, NSF, AFOSR, ARO, USC-LSU MURI (DARPA & AFOSR), NASA, and LEQSF.

438. A parallel framework for the FE-based simulation of knee joint motion

PubMed

Wawro, Martin; Fathi-Torbaghan, Madjid

2004-08-01

We present an object-oriented framework for the finite-element (FE)-based simulation of human knee joint motion. The FE model of the knee joint is acquired from the patients in vivo by using magnetic resonance imaging. The MRI images are converted into a three-dimensional model and finally an all-hexahedral mesh for the FE analysis is generated. The simulation environment uses nonlinear finite-element analysis (FEA) and is capable of handling contact of the model to handle the complex rolling/sliding motion of the knee joint. The software strictly follows object-oriented concepts of software engineering in order to guarantee maximum extensibility and maintainability. The final goal of this work-in-progress is the creation of a computer-based biomechanical model of the knee joint which can be used in a variety of applications, ranging from prosthesis design and treatment planning (e.g., optimal reconstruction of ruptured ligaments) over surgical simulation to impact computations in crashworthiness simulations. PMID:15311837

439. Massively Parallel Simulation of Uranium Migration at the Hanford 300 Area

NASA Astrophysics Data System (ADS)

Hammond, G. E.; Lichtner, P. C.

2009-12-01

Effectively utilized, high-performance computing can have a significant impact on subsurface science by enabling researchers to employ models with ever increasing sophistication and complexity that provide a more accurate and mechanistic representation of subsurface processes. As part of the U.S. Department of Energy's SciDAC-2 program, the petascale subsurface reactive multiphase flow and transport code PFLOTRAN has been developed and is currently being employed to simulate uranium migration at the Hanford 300 Area. PFLOTRAN has been run on subsurface problems composed of up to two billion degrees of freedom and utilizing up to 131,072 processor cores on the world's largest open science supercomputer, Jaguar. This presentation focuses on the application of PFLOTRAN to simulate geochemical transport of uranium at Hanford using the Jaguar supercomputer. The Hanford 300 Area presents many challenges with regard to simulating radionuclide transport. Aside from the many conceptual uncertainties in the problem such as the choice of initial conditions, rapid fluctuations in the Columbia River stage, which occur on an hourly basis with several meter variations, can have a dramatic impact on the size of the uranium plume, its migration direction, and the rate at which it migrates to the river. Due to the immense size of the physical domain needed to include the transient river boundary condition, the grid resolution required to preserve accuracy, and the number of chemical components simulated, 3D simulation of the Hanford 300 Area would be unsustainable on a single workstation, and thus high-performance computing is essential.

440. Towards large-scale multi-socket, multicore parallel simulations: Performance of an MPI-only semiconductor device simulator

NASA Astrophysics Data System (ADS)

Lin, Paul T.; Shadid, John N.

2010-09-01

This preliminary study considers the scaling and performance of a finite element (FE) semiconductor device simulator on a set of multi-socket, multicore architectures with nonuniform memory access (NUMA) compute nodes. These multicore architectures include two Linux clusters with multicore processors: a quad-socket, quad-core AMD Opteron platform and a dual-socket, quad-core Intel Xeon Nehalem platform; and a dual-socket, six-core AMD Opteron workstation. These platforms have complex memory hierarchies that include local core-based cache, local socket-based memory, access to memory on the same mainboard from another socket, and then memory across network links to different nodes. The specific semiconductor device simulator used in this study employs a fully-coupled Newton-Krylov solver with domain decomposition and multilevel preconditioners. Scaling results presented include a large-scale problem of 100+ million unknowns on 4096 cores and a comparison with the Cray XT3/4 Red Storm capability platform. Although the MPI-only device simulator employed for this work can take advantage of all the cores of quad-core and six-core CPUs, the efficiency of the linear system solve is decreasing with increased core count and eventually a different programming paradigm will be needed.

441. Parallelization of Rocket Engine Simulator Software (P.R.E.S.S.)

NASA Technical Reports Server (NTRS)

Cezzar, Ruknet

1999-01-01

The Parallelization of Rocket Engine System Software (PRESS) project is part of a collaborative effort with Southern University at Baton Rouge (SUBR), University of West Florida (UWF), and Jackson State University (JSU). The project started on October 19, 1995, and after a three-year period corresponding to project phases and fiscal-year funding by NASA Lewis Research Center (now Glenn Research Center), ended on October 18, 1998. A one-year no-cost extension period was granted on June 7, 1998, until October 19, 1999. The aim of this one-year no-cost extension period was to carry out further research to complete the work and lay the groundwork for subsequent research in the area of aerospace engine design optimization software tools. The previous progress for the research has been reported in great detail in respective interim and final research progress reports, seven of them in all. While the purpose of this report is to be a final summary and an evaluative view of the entire work since the first year of funding, the following is a quick recap of the most important sections of the interim report dated April 30, 1999.

442. Modeling and simulation of a Stewart platform type parallel structure robot

NASA Technical Reports Server (NTRS)

Lim, Gee Kwang; Freeman, Robert A.; Tesar, Delbert

1989-01-01

The kinematics and dynamics of a Stewart Platform type parallel structure robot (NASA's Dynamic Docking Test System) were modeled using the method of kinematic influence coefficients (KIC) and isomorphic transformations of system dependence from one set of generalized coordinates to another. By specifying the end-effector (platform) time trajectory, the required generalized input forces which would theoretically yield the desired motion were determined. It was found that the relationship between the platform motion and the actuators motion was nonlinear. In addition, the contribution to the total generalized forces, required at the actuators, from the acceleration related terms was found to be more significant than the velocity related terms. Hence, the curve representing the total required actuator force generally resembled the curve for the acceleration related force. Another observation revealed that the acceleration related effective inertia matrix I_dd had the tendency to decouple, with the elements on the main diagonal of I_dd being larger than the off-diagonal elements, while the velocity related inertia power array P_ddd did not show such tendency. This tendency results in the acceleration related force curve of a given actuator resembling the acceleration profile of that particular actuator. Furthermore, it was indicated that the effective inertia matrix for the legs is more decoupled than that for the platform. These observations provide essential information for further research to develop an effective control strategy for real-time control of the Dynamic Docking Test System.
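Editorial note: for a Stewart platform such as the Dynamic Docking Test System, the basic kinematic relation between platform pose and actuator motion is the inverse-kinematics map from pose to leg lengths, and that map is what makes the platform-to-actuator relationship nonlinear. The sketch below shows only that standard geometric relation with made-up attachment coordinates; it does not reproduce the kinematic influence coefficient (KIC) formulation or the dynamics described in the report above.

```python
import numpy as np

def rotation_rpy(roll, pitch, yaw):
    """Rotation matrix from roll-pitch-yaw angles (radians)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def leg_lengths(base_pts, plat_pts, position, rpy):
    """Inverse kinematics: actuator (leg) lengths for a given platform pose.

    base_pts : (6, 3) attachment points on the base, in the fixed frame
    plat_pts : (6, 3) attachment points on the platform, in the platform frame
    """
    R = rotation_rpy(*rpy)
    world_plat = position + plat_pts @ R.T   # platform points in the fixed frame
    return np.linalg.norm(world_plat - base_pts, axis=1)

# Hypothetical attachment geometry (radius 1.0 base, radius 0.5 platform)
ang_b = np.radians([0, 60, 120, 180, 240, 300])
ang_p = np.radians([30, 90, 150, 210, 270, 330])
base = np.c_[np.cos(ang_b), np.sin(ang_b), np.zeros(6)]
plat = 0.5 * np.c_[np.cos(ang_p), np.sin(ang_p), np.zeros(6)]

print(leg_lengths(base, plat, position=np.array([0.0, 0.0, 1.2]),
                  rpy=(0.05, -0.02, 0.1)))
```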
443. Highly efficient numerical algorithm based on random trees for accelerating parallel Vlasov-Poisson simulations

NASA Astrophysics Data System (ADS)

Acebrón, Juan A.; Rodríguez-Rozas, Ángel

2013-10-01

An efficient numerical method based on a probabilistic representation for the Vlasov-Poisson system of equations in the Fourier space has been derived. This has been done theoretically for arbitrary dimensional problems, and particularized to unidimensional problems for numerical purposes. Such a representation has been validated theoretically in the linear regime comparing the solution obtained with the classical results of the linear Landau damping. The numerical strategy followed requires generating suitable random trees combined with a Padé approximant for approximating accurately a given divergent series. Such series are obtained by summing the partial contributions to the solution coming from trees with arbitrary numbers of branches. These contributions, coming in general from multi-dimensional definite integrals, are efficiently computed by a quasi-Monte Carlo method. It is shown how the accuracy of the method can be effectively increased by considering more terms of the series. The new representation was used successfully to develop a Probabilistic Domain Decomposition method suited for massively parallel computers, which improves the scalability found in classical methods. Finally, a few numerical examples based on classical phenomena such as the non-linear Landau damping and the two-stream instability are given, illustrating the remarkable performance of the algorithm, when comparing the results with those obtained using a classical method.
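Editorial note: a Padé approximant replaces a truncated (possibly divergent) power series by a rational function whose Taylor expansion matches the available coefficients, which is the resummation role it plays in the algorithm above. The sketch below builds a diagonal [L/M] approximant from given series coefficients by solving the standard linear system for the denominator; the example function (ln(1+x)) and the orders are illustrative and unrelated to the paper's random-tree expansion.

```python
import numpy as np
from math import log

def pade(coeffs, L, M):
    """Construct the [L/M] Padé approximant from Taylor coefficients c_0..c_{L+M}.

    Returns numerator and denominator coefficient arrays (lowest order first).
    """
    c = np.asarray(coeffs, dtype=float)
    # Solve for denominator coefficients b_1..b_M (with b_0 = 1):
    #   sum_{j=0..M} b_j * c_{L+k-j} = 0   for k = 1..M
    A = np.array([[c[L + k - j] for j in range(1, M + 1)] for k in range(1, M + 1)])
    rhs = -np.array([c[L + k] for k in range(1, M + 1)])
    b = np.concatenate(([1.0], np.linalg.solve(A, rhs)))
    a = np.array([sum(b[j] * c[i - j] for j in range(0, min(i, M) + 1))
                  for i in range(L + 1)])
    return a, b

def eval_ratio(a, b, x):
    num = sum(ai * x**i for i, ai in enumerate(a))
    den = sum(bj * x**j for j, bj in enumerate(b))
    return num / den

# Taylor coefficients of ln(1+x): 0, 1, -1/2, 1/3, -1/4, ...
c = [0.0] + [(-1.0) ** (n + 1) / n for n in range(1, 8)]
a, b = pade(c, L=3, M=3)
x = 2.0                                    # outside the series' radius of convergence
truncated = sum(ci * x**i for i, ci in enumerate(c))
print("truncated series:", truncated)          # oscillates badly
print("Pade [3/3]      :", eval_ratio(a, b, x))  # close to ln(3)
print("exact           :", log(1.0 + x))
```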
444. SIESTA-PEXSI: Massively parallel method for efficient and accurate ab initio materials simulation

NASA Astrophysics Data System (ADS)

Lin, Lin; Huhs, Georg; Garcia, Alberto; Yang, Chao

2014-03-01

We describe how to combine the pole expansion and selected inversion (PEXSI) technique with the SIESTA method, which uses numerical atomic orbitals for Kohn-Sham density functional theory (KSDFT) calculations. The PEXSI technique can efficiently utilize the sparsity pattern of the Hamiltonian matrix and the overlap matrix generated from codes such as SIESTA, and solves KSDFT without using a cubic scaling matrix diagonalization procedure. The complexity of PEXSI scales at most quadratically with respect to the system size, and the accuracy is comparable to that obtained from full diagonalization. One distinct feature of PEXSI is that it achieves low order scaling without using the near-sightedness property and can therefore be applied to metals as well as insulators and semiconductors, at room temperature or even lower temperature. The PEXSI method is highly scalable, and the recently developed massively parallel PEXSI technique can make efficient usage of 10,000 to 100,000 processors on high performance machines. We demonstrate the performance of the SIESTA-PEXSI method using several examples for large scale electronic structure calculation, including long DNA chains and graphene-like structures with more than 20000 atoms. Funded by the Luis Alvarez fellowship in LBNL, and the DOE SciDAC project in partnership with BES.

445. Parallel algorithms and architectures

SciTech Connect

Albrecht, A.; Jung, H.; Mehlhorn, K.

1987-01-01

Contents of this book are the following: Preparata: Deterministic simulation of idealized parallel computers on more realistic ones; Convex hull of randomly chosen points from a polytope; Dataflow computing; Parallel in sequence; Towards the architecture of an elementary cortical processor; Parallel algorithms and static analysis of parallel programs; Parallel processing of combinatorial search; Communications; An O(nlogn) cost parallel algorithm for the single function coarsest partition problem; Systolic algorithms for computing the visibility polygon and triangulation of a polygonal region; RELACS - A recursive layout computing system; and Parallel linear conflict-free subtree access.

446. Using GPU parallelization to perform realistic simulations of the LPCTrap experiments

NASA Astrophysics Data System (ADS)

Fabian, X.; Mauger, F.; Quéméner, G.; Velten, Ph.; Ban, G.; Couratin, C.; Delahaye, P.; Durand, D.; Fabre, B.; Finlay, P.; Fléchard, X.; Liénard, E.; Méry, A.; Naviliat-Cuncic, O.; Pons, B.; Porobic, T.; Severijns, N.; Thomas, J. C.

2015-11-01

The LPCTrap setup is a sensitive tool to measure the β-ν angular correlation coefficient, a_βν, which can yield the mixing ratio ρ of a β decay transition. The latter enables the extraction of the Cabibbo-Kobayashi-Maskawa (CKM) matrix element V_ud. In such a measurement, the most relevant observable is the energy distribution of the recoiling daughter nuclei following the nuclear β decay, which is obtained using a time-of-flight technique. In order to maximize the precision, one can reduce the systematic errors through a thorough simulation of the whole set-up, especially with a correct model of the trapped ion cloud. This paper presents such a simulation package and focuses on the ion cloud features; particular attention is therefore paid to realistic descriptions of trapping field dynamics, buffer gas cooling and the N-body space charge effects.

447. Final Report for 'ParSEC - Parallel Simulation of Electron Cooling'

SciTech Connect

David L Bruhwiler

2005-09-16

The Department of Energy has plans, during the next two or three years, to design an electron cooling section for the collider ring at RHIC (Relativistic Heavy Ion Collider) [1]. Located at Brookhaven National Laboratory (BNL), RHIC is the premier nuclear physics facility. The new cooling section would be part of a proposed luminosity upgrade [2] for RHIC. This electron cooling section will be different from previous electron cooling facilities in three fundamental ways. First, the electron energy will be 50 MeV, as opposed to 100's of keV (or 4 MeV for the electron cooling system now operating at Fermilab [3]). Second, both the electron beam and the ion beam will be bunched, rather than being essentially continuous. Third, the cooling will take place in a collider rather than in a storage ring. Analytical work, in combination with the use and further development of the semi-analytical codes BETACOOL [4,5] and SimCool [6,7], is being pursued at BNL [8] and at other laboratories around the world. However, there is a growing consensus in the field that high-fidelity 3-D particle simulations are required to fully understand the critical cooling physics issues in this new regime. Simulations of the friction coefficient, using the VORPAL code [9], for single gold ions passing once through the interaction region, have been compared with theoretical calculations [10,11], and the results have been presented in conference proceedings papers [8,12,13,14] and presentations [15,16,17]. Charged particles are advanced using a fourth-order Hermite predictor-corrector algorithm [18]. The fields in the beam frame are obtained from direct calculation of Coulomb's law, which is more efficient than multipole-type algorithms for less than ~10^6 particles. Because the interaction time is so short, it is necessary to suppress the diffusive aspect of the ion dynamics through the careful use of positrons in the simulations, and to run 100's of simulations with the same

448. Accurate, efficient, and scalable parallel simulation of mesoscale electrostatic/magnetostatic problems accelerated by a fast multipole method

NASA Astrophysics Data System (ADS)

Jiang, Xikai; Karpeev, Dmitry; Li, Jiyuan; de Pablo, Juan; Hernandez-Ortiz, Juan; Heinonen, Olle

Boundary integrals arise in many electrostatic and magnetostatic problems. In computational modeling of these problems, although the integral is performed only on the boundary of a domain, its direct evaluation needs O(N^2) operations, where N is the number of unknowns on the boundary. The O(N^2) scaling impedes a wider usage of the boundary integral method in scientific and engineering communities. We have developed a parallel computational approach that utilizes the Fast Multipole Method to evaluate the boundary integral in O(N) operations. To demonstrate the accuracy, efficiency, and scalability of our approach, we consider two test cases. In the first case, we solve a boundary value problem for a ferroelectric/ferromagnetic volume in free space using a hybrid finite element-boundary integral method. In the second case, we solve an electrostatic problem involving the polarization of dielectric objects in free space using the boundary element method. The results from the test cases show that our parallel approach can enable highly efficient and accurate simulations of mesoscale electrostatic/magnetostatic problems. Computing resources were provided by Blues, a high-performance cluster operated by the Laboratory Computing Resource Center at Argonne National Laboratory. Work at Argonne was supported by the U.S. DOE, Office of Science under Contract No. DE-AC02-06CH11357.
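Editorial note: the quadratic cost that motivates the fast multipole acceleration in the preceding abstract is easy to see in the brute-force evaluation of a boundary potential, where every unknown interacts with every other. The sketch below is only that direct O(N^2) baseline for simple point sources; it is not the authors' FMM or boundary element implementation, and the node positions and source strengths are illustrative.

```python
import numpy as np

def direct_potential(points, charges):
    """Direct O(N^2) evaluation of a Coulomb-type potential at every point
    due to all other points.  This is the brute-force baseline whose quadratic
    cost the fast multipole method reduces to O(N); it is not the FMM itself.
    """
    n = len(points)
    phi = np.zeros(n)
    for i in range(n):
        d = points - points[i]
        r = np.sqrt((d * d).sum(axis=1))
        r[i] = np.inf                 # exclude the self-interaction
        phi[i] = (charges / r).sum()
    return phi

rng = np.random.default_rng(0)
pts = rng.random((2000, 3))           # N boundary nodes (illustrative)
q = rng.standard_normal(2000)         # source strengths (illustrative)
print(direct_potential(pts, q)[:5])
```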
In the third problem, a NIF capsule is driven with a Planckian radiation source.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/23320725','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/23320725"><span id="translatedtitle">A coarse-grained model for DNA-functionalized spherical colloids, revisited: effective pair potential from <span class="hlt">parallel</span> replica <span class="hlt">simulations</span>.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Theodorakis, Panagiotis E; Dellago, Christoph; Kahl, Gerhard</p> <p>2013-01-14</p> <p>We discuss a coarse-grained model recently proposed by Starr and Sciortino [J. Phys.: Condens. Matter 18, L347 (2006)] for spherical particles functionalized with short single DNA strands. The model incorporates two key aspects of DNA hybridization, i.e., the specificity of binding between DNA bases and the strong directionality of hydrogen bonds. Here, we calculate the effective potential between two DNA-functionalized particles of equal size using a <span class="hlt">parallel</span> replica protocol. We find that the transition from bonded to unbonded configurations takes place at considerably lower temperatures compared to those that were originally predicted using standard <span class="hlt">simulations</span> in the canonical ensemble. We put particular focus on DNA-decorations of tetrahedral and octahedral symmetry, as they are promising candidates for the self-assembly into a single-component diamond structure. Increasing colloid size hinders hybridization of the DNA strands, in agreement with experimental findings. PMID:23320725</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2016AIPC.1734p0006G&link_type=ABSTRACT','NASAADS'); return false;" href="http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2016AIPC.1734p0006G&link_type=ABSTRACT"><span id="translatedtitle">A three-degree-of-freedom <span class="hlt">parallel</span> manipulator for concentrated solar power towers: Modeling, <span class="hlt">simulation</span> and design</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Ghosal, Ashitava; Shyam, R. B. Ashith</p> <p>2016-05-01</p> <p>There is an increased thrust to harvest solar energy in India to meet increasing energy requirements and to minimize imported fossil fuels. In a solar power tower system, an array of tracking mirrors or heliostats are used to concentrate the incident solar energy on an elevated stationary receiver and then the thermal energy converted to electricity using a heat engine. The conventional method of tracking are the Azimuth-Elevation (Az-El) or Target-Aligned (T-A) mount. In both the cases, the mirror is rotated about two mutually perpendicular axes and is supported at the center using a pedestal which is fixed to the ground. In this paper, a three degree-of-freedom <span class="hlt">parallel</span> manipulator, namely the 3-RPS, is proposed for tracking the sun in a solar power tower system. We present modeling, <span class="hlt">simulation</span> and design of the 3-RPS <span class="hlt">parallel</span> manipulator and show its advantages over conventional Az-El and T-A mounts. 
A three-degree-of-freedom parallel manipulator for concentrated solar power towers: Modeling, simulation and design

NASA Astrophysics Data System (ADS)

Ghosal, Ashitava; Shyam, R. B. Ashith

2016-05-01

There is an increased thrust to harvest solar energy in India to meet increasing energy requirements and to minimize imported fossil fuels. In a solar power tower system, an array of tracking mirrors or heliostats is used to concentrate the incident solar energy on an elevated stationary receiver, and the thermal energy is then converted to electricity using a heat engine. The conventional tracking methods are the Azimuth-Elevation (Az-El) and Target-Aligned (T-A) mounts. In both cases, the mirror is rotated about two mutually perpendicular axes and is supported at the center using a pedestal which is fixed to the ground. In this paper, a three-degree-of-freedom parallel manipulator, namely the 3-RPS, is proposed for tracking the sun in a solar power tower system. We present modeling, simulation and design of the 3-RPS parallel manipulator and show its advantages over conventional Az-El and T-A mounts. The 3-RPS manipulator consists of three rotary (R), three prismatic (P) and three spherical (S) joints, and the mirror assembly is mounted at three points, in contrast to the Az-El and T-A mounts. The kinematic equations for sun tracking are derived for the 3-RPS manipulator, and from the simulations we obtain the range of motion of the rotary, prismatic and spherical joints. Since the mirror assembly is mounted at three points, the wind load and self-weight are distributed and, as a consequence, the deflections due to loading are smaller than in conventional mounts. It is shown that the weight of the supporting structure is between 15% and 65% less than that of conventional systems. Hence, even though one additional actuator is used, larger-area mirrors can be used and costs can be reduced.

The relation between reconnected flux, the parallel electric field, and the reconnection rate in a three-dimensional kinetic simulation of magnetic reconnection

SciTech Connect

Wendel, D. E.; Olson, D. K.; Hesse, M.; Kuznetsova, M.; Adrian, M. L.; Aunai, N.; Karimabadi, H.; Daughton, W.

2013-12-15

We investigate the distribution of parallel electric fields and their relationship to the location and rate of magnetic reconnection in a large particle-in-cell simulation of 3D turbulent magnetic reconnection with open boundary conditions. The simulation's guide field geometry inhibits the formation of simple topological features such as null points. Therefore, we derive the location of potential changes in magnetic connectivity by finding the field lines that experience a large relative change between their endpoints, i.e., the quasi-separatrix layer. We find a good correspondence between the locus of changes in magnetic connectivity or the quasi-separatrix layer and the map of large gradients in the integrated parallel electric field (or quasi-potential). Furthermore, we investigate the distribution of the parallel electric field along the reconnecting field lines. We find the reconnection rate is controlled by only the low-amplitude, zeroth- and first-order trends in the parallel electric field, while the contribution from fluctuations of the parallel electric field, such as electron holes, is negligible. The results impact the determination of reconnection sites and reconnection rates in models and in situ spacecraft observations of 3D turbulent reconnection. It is difficult through direct observation to isolate the loci of the reconnection parallel electric field amidst the large amplitude fluctuations.
However, we demonstrate that a positive slope of the running sum of the parallel electric field along the field line as a function of field line length indicates where reconnection is occurring along the field line.
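The field-line diagnostic described in this record (and repeated in the duplicate entry that follows) amounts to accumulating the parallel electric field along a field line and flagging intervals where the running sum rises. A minimal illustration, assuming uniformly spaced samples and a simple moving-average trend, neither of which is taken from the paper:

    import numpy as np

    def quasi_potential(e_parallel, ds):
        """Running integral of E_parallel along a field line sampled at
        uniform arc-length spacing ds (the quasi-potential)."""
        return np.cumsum(e_parallel) * ds

    def reconnecting_segments(e_parallel, ds, window=50):
        """Flag points where the smoothed quasi-potential has positive slope,
        i.e. where the low-order trend of E_parallel is positive."""
        phi = quasi_potential(e_parallel, ds)
        trend = np.convolve(phi, np.ones(window) / window, mode="same")
        return np.gradient(trend, ds) > 0

    # Hypothetical field line: a weak positive trend buried in large fluctuations
    s = np.linspace(0.0, 100.0, 2001)
    ds = s[1] - s[0]
    e_par = 1e-3 + 0.05 * np.sin(40 * s) + 0.02 * np.random.randn(s.size)
    mask = reconnecting_segments(e_par, ds)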
The relation between reconnected flux, the parallel electric field, and the reconnection rate in a three-dimensional kinetic simulation of magnetic reconnection

NASA Astrophysics Data System (ADS)

Wendel, D. E.; Olson, D. K.; Hesse, M.; Aunai, N.; Kuznetsova, M.; Karimabadi, H.; Daughton, W.; Adrian, M. L.

2013-12-01

We investigate the distribution of parallel electric fields and their relationship to the location and rate of magnetic reconnection in a large particle-in-cell simulation of 3D turbulent magnetic reconnection with open boundary conditions. The simulation's guide field geometry inhibits the formation of simple topological features such as null points. Therefore, we derive the location of potential changes in magnetic connectivity by finding the field lines that experience a large relative change between their endpoints, i.e., the quasi-separatrix layer. We find a good correspondence between the locus of changes in magnetic connectivity or the quasi-separatrix layer and the map of large gradients in the integrated parallel electric field (or quasi-potential). Furthermore, we investigate the distribution of the parallel electric field along the reconnecting field lines. We find the reconnection rate is controlled by only the low-amplitude, zeroth- and first-order trends in the parallel electric field, while the contribution from fluctuations of the parallel electric field, such as electron holes, is negligible. The results impact the determination of reconnection sites and reconnection rates in models and in situ spacecraft observations of 3D turbulent reconnection. It is difficult through direct observation to isolate the loci of the reconnection parallel electric field amidst the large amplitude fluctuations. However, we demonstrate that a positive slope of the running sum of the parallel electric field along the field line as a function of field line length indicates where reconnection is occurring along the field line.

A Moving Window Technique in Parallel Finite Element Time Domain Electromagnetic Simulation

SciTech Connect

Lee, Lie-Quan; Candel, Arno; Ng, Cho; Ko, Kwok

2010-06-07

A moving window technique for the finite element time domain (FETD) method is developed to simulate the propagation of electromagnetic waves induced by the transit of a charged particle beam inside large and long structures. The window moving along with the beam in the computational domain adopts high-order finite-element basis functions through p refinement and/or a high-resolution mesh through h refinement so that a sufficient accuracy is attained with substantially reduced computational costs. Algorithms to transfer discretized fields from one mesh to another, which are the key to implementing a moving window in a finite-element unstructured mesh, are presented. Numerical experiments are carried out using the moving window technique to compute short-range wakefields in long accelerator structures. The results are compared with those obtained from the normal FETD method and the advantages of using the moving window technique are discussed.

Dynamic analysis of the parallel-plate EMP (Electromagnetic Pulse) simulator using a wire-mesh approximation and the numerical electromagnetics code. Final report

SciTech Connect

Gedney, S.D.

1987-09-01

The electromagnetic pulse (EMP) produced by a high-altitude nuclear blast presents a severe threat to electronic systems due to its extreme characteristics. To test the vulnerability of large systems, such as airplanes, missiles, or satellites, they must be subjected to a simulated EMP environment. One type of simulator that has been used to approximate the EMP environment is the Large Parallel-Plate Bounded-Wave Simulator. It is a guided-wave simulator which has properties of a transmission line and supports a single TEM mode at sufficiently low frequencies. This type of simulator consists of finite-width parallel-plate waveguides, which are excited by a wave launcher and terminated by a wave receptor. This study addresses the field distribution within a finite-width parallel-plate waveguide that is matched to a conical tapered waveguide at either end.
Characteristics of a parallel-plate bounded-wave EMP simulator were developed using scattering theory, a thin-wire mesh approximation of the conducting surfaces, and the Numerical Electromagnetics Code (NEC). Background is provided for readers to use the NEC as a tool in solving thin-wire scattering problems.

Advancing predictive models for particulate formation in turbulent flames via massively parallel direct numerical simulations

PubMed Central

Bisetti, Fabrizio; Attili, Antonio; Pitsch, Heinz

2014-01-01

Combustion of fossil fuels is likely to continue for the near future due to the growing trends in energy consumption worldwide. The increase in efficiency and the reduction of pollutant emissions from combustion devices are pivotal to achieving meaningful levels of carbon abatement as part of the ongoing climate change efforts. Computational fluid dynamics featuring adequate combustion models will play an increasingly important role in the design of more efficient and cleaner industrial burners, internal combustion engines, and combustors for stationary power generation and aircraft propulsion. Today, turbulent combustion modelling is hindered severely by the lack of data that are accurate and sufficiently complete to assess and remedy model deficiencies effectively. In particular, the formation of pollutants is a complex, nonlinear and multi-scale process characterized by the interaction of molecular and turbulent mixing with a multitude of chemical reactions with disparate time scales. The use of direct numerical simulation (DNS) featuring a state of the art description of the underlying chemistry and physical processes has contributed greatly to combustion model development in recent years. In this paper, the analysis of the intricate evolution of soot formation in turbulent flames demonstrates how DNS databases are used to illuminate relevant physico-chemical mechanisms and to identify modelling needs. PMID:25024412

Advancing predictive models for particulate formation in turbulent flames via massively parallel direct numerical simulations.

PubMed

Bisetti, Fabrizio; Attili, Antonio; Pitsch, Heinz

2014-08-13

Combustion of fossil fuels is likely to continue for the near future due to the growing trends in energy consumption worldwide. The increase in efficiency and the reduction of pollutant emissions from combustion devices are pivotal to achieving meaningful levels of carbon abatement as part of the ongoing climate change efforts.
Computational fluid dynamics featuring adequate combustion models will play an increasingly important role in the design of more efficient and cleaner industrial burners, internal combustion engines, and combustors for stationary power generation and aircraft propulsion. Today, turbulent combustion modelling is hindered severely by the lack of data that are accurate and sufficiently complete to assess and remedy model deficiencies effectively. In particular, the formation of pollutants is a complex, nonlinear and multi-scale process characterized by the interaction of molecular and turbulent mixing with a multitude of chemical reactions with disparate time scales. The use of direct numerical simulation (DNS) featuring a state of the art description of the underlying chemistry and physical processes has contributed greatly to combustion model development in recent years. In this paper, the analysis of the intricate evolution of soot formation in turbulent flames demonstrates how DNS databases are used to illuminate relevant physico-chemical mechanisms and to identify modelling needs. PMID:25024412

Feasibility of using the Massively Parallel Processor for large eddy simulations and other Computational Fluid Dynamics applications

NASA Technical Reports Server (NTRS)

Bruno, John

1984-01-01

The results of an investigation into the feasibility of using the MPP for direct and large eddy simulations of the Navier-Stokes equations are presented. A major part of this study was devoted to the implementation of two of the standard numerical algorithms for CFD. These implementations were not run on the Massively Parallel Processor (MPP) since the machine delivered to NASA Goddard does not have sufficient capacity. Instead, a detailed implementation plan was designed and from these were derived estimates of the time and space requirements of the algorithms on a suitably configured MPP. In addition, other issues related to the practical implementation of these algorithms on an MPP-like architecture were considered; namely, adaptive grid generation, zonal boundary conditions, the table lookup problem, and the software interface. Performance estimates show that the architectural components of the MPP, the Staging Memory and the Array Unit, appear to be well suited to the numerical algorithms of CFD.
This, combined with the prospect of building a faster and larger MPP-like machine, holds the promise of achieving the sustained gigaflop rates that are required for the numerical simulations in CFD.

SEISMIC SIMULATIONS USING PARALLEL COMPUTING AND THREE-DIMENSIONAL EARTH MODELS TO IMPROVE NUCLEAR EXPLOSION PHENOMENOLOGY AND MONITORING

SciTech Connect

Rodgers, A; Matzel, E; Pasyanos, M; Petersson, A; Sjogreen, B; Bono, C; Vorobiev, O; Antoun, T; Walter, W; Myers, S; Lomov, I

2008-07-07

The development of accurate numerical methods to simulate wave propagation in three-dimensional (3D) earth models and advances in computational power offer exciting possibilities for modeling the motions excited by underground nuclear explosions. This presentation will describe recent work to use new numerical techniques and parallel computing to model earthquakes and underground explosions to improve understanding of the wave excitation at the source and path-propagation effects. Firstly, we are using the spectral element method (SEM, SPECFEM3D code of Komatitsch and Tromp, 2002) to model earthquakes and explosions at regional distances using available 3D models. SPECFEM3D simulates anelastic wave propagation in fully 3D earth models in spherical geometry with the ability to account for free surface topography, anisotropy, ellipticity, rotation and gravity. Results show in many cases that 3D models are able to reproduce features of the observed seismograms that arise from path-propagation effects (e.g. enhanced surface wave dispersion, refraction, amplitude variations from focusing and defocusing, tangential component energy from isotropic sources). We are currently investigating the ability of different 3D models to predict path-specific seismograms as a function of frequency. A number of models developed using a variety of methodologies are available for testing. These include the WENA/Unified model of Eurasia (e.g. Pasyanos et al 2004), the global CUB 2.0 model (Shapiro and Ritzwoller, 2002), the partitioned waveform model for the Mediterranean (van der Lee et al., 2007) and stochastic models of the Yellow Sea Korean Peninsula region (Pasyanos et al., 2006). Secondly, we are extending our Cartesian anelastic finite difference code (WPP of Nilsson et al., 2007) to model the effects of free-surface topography. WPP models anelastic wave propagation in fully 3D earth models using mesh refinement to increase computational speed and improve memory efficiency.
Thirdly

Simulations of the loading and radiated sound of airfoils and wings in unsteady flow using computational aeroacoustics and parallel computers

NASA Astrophysics Data System (ADS)

Lockard, David Patrick

This thesis makes contributions towards the use of computational aeroacoustics (CAA) as a tool for noise analysis. CAA uses numerical methods to simulate acoustic phenomena. CAA algorithms have been shown to reproduce wave propagation much better than traditional computational fluid dynamics (CFD) methods. In the current approach, a finite-difference, time-domain algorithm is used to simulate unsteady, compressible flows. Dispersion-relation-preserving methodology is used to extend the range of frequencies that can be represented properly by the scheme. Since CAA algorithms are relatively inefficient at obtaining a steady-state solution, multigrid methods are applied to accelerate the convergence. All of the calculations are performed on parallel computers. Excellent speedup ratios are obtained for the explicit, time-stepping algorithm used in this research. A common problem in the area of broadband noise is the prediction of the acoustic field generated by a vortical gust impinging on a solid body. The problem is modeled initially in two dimensions by a flat plate experiencing a uniform mean flow with a sinusoidal, vertical velocity perturbation. Good agreement is obtained with results from semi-analytic methods for several gust frequencies. Then, a cascade of plates is used to simulate a turbomachinery blade row. A new approach is used to impose the vortical disturbance inside the computational domain rather than imposing it at the computational boundary. The influence of the mean flow on the radiated noise is examined by considering NACA0012 and RAE2822 airfoils. After a steady-state is obtained from the multigrid method, the unsteady simulation is used to model the vortical gust's interaction with the airfoil. The mean loading on the airfoil is shown to have a significant effect on the directivity of the sound with the strongest influence observed for high frequencies. Camber is shown to have a similar effect as the angle of attack.
A three-dimensional problem

Swarm-NG: A CUDA library for Parallel n-body Integrations with focus on simulations of planetary systems

NASA Astrophysics Data System (ADS)

Dindar, Saleh; Ford, Eric B.; Juric, Mario; Yeo, Young In; Gao, Jianwei; Boley, Aaron C.; Nelson, Benjamin; Peters, Jörg

2013-10-01

We present Swarm-NG, a C++ library for the efficient direct integration of many n-body systems using a Graphics Processing Unit (GPU), such as NVIDIA's Tesla T10 and M2070 GPUs. While previous studies have demonstrated the benefit of GPUs for n-body simulations with thousands to millions of bodies, Swarm-NG focuses on many few-body systems, e.g., thousands of systems with 3-15 bodies each, as is typical for the study of planetary systems. Swarm-NG parallelizes the simulation, including both the numerical integration of the equations of motion and the evaluation of forces using NVIDIA's "Compute Unified Device Architecture" (CUDA) on the GPU. Swarm-NG includes optimized implementations of 4th order time-symmetrized Hermite integration and mixed variable symplectic integration, as well as several sample codes for other algorithms to illustrate how non-CUDA-savvy users may themselves introduce customized integrators into the Swarm-NG framework. To optimize performance, we analyze the effect of GPU-specific parameters on performance under double precision. For an ensemble of 131072 planetary systems, each containing three bodies, the NVIDIA Tesla M2070 GPU outperforms a 6-core Intel Xeon X5675 CPU by a factor of ~2.75. Thus, we conclude that modern GPUs offer an attractive alternative to a cluster of CPUs for the integration of an ensemble of many few-body systems.
Applications of Swarm-NG include studying the late stages of planet formation, testing the stability of planetary systems and evaluating the goodness-of-fit between many planetary system models and observations of extrasolar planet host stars (e.g., radial velocity, astrometry, transit timing). While Swarm-NG focuses on the parallel integration of many planetary systems, the underlying integrators could be applied to a wide variety of problems that require repeatedly integrating a set of ordinary differential equations many times using different initial conditions and/or parameter values.
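Swarm-NG itself is CUDA C++; the NumPy sketch below only illustrates the ensemble idea that the record describes, advancing thousands of independent few-body systems in lock-step with a vectorized leapfrog step. It is not Swarm-NG code, and the library's Hermite and symplectic integrators are considerably more sophisticated.

    import numpy as np

    def accelerations(pos, mass):
        """Pairwise Newtonian gravity (G = 1) within each system,
        vectorized over the whole ensemble of independent systems."""
        n = pos.shape[1]
        dx = pos[:, None, :, :] - pos[:, :, None, :]   # dx[s, i, j] = x_j - x_i
        r2 = np.sum(dx * dx, axis=-1)
        idx = np.arange(n)
        r2[:, idx, idx] = np.inf                        # suppress self-interaction
        inv_r3 = r2 ** -1.5
        return np.einsum('sijd,sj,sij->sid', dx, mass, inv_r3)

    def leapfrog_step(pos, vel, mass, dt):
        """One kick-drift-kick step for every system in the ensemble.
        pos and vel have shape (n_systems, n_bodies, 3)."""
        vel = vel + 0.5 * dt * accelerations(pos, mass)
        pos = pos + dt * vel
        vel = vel + 0.5 * dt * accelerations(pos, mass)
        return pos, vel

    # Hypothetical ensemble: 4096 independent three-body systems advanced together
    rng = np.random.default_rng(0)
    pos = rng.normal(size=(4096, 3, 3))
    vel = rng.normal(scale=0.1, size=(4096, 3, 3))
    mass = np.ones((4096, 3))
    for _ in range(10):
        pos, vel = leapfrog_step(pos, vel, mass, 1e-3)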
Molecular Dynamics Simulation Study of Parallel Telomeric DNA Quadruplexes at Different Ionic Strengths: Evaluation of Water and Ion Models.

PubMed

Rebič, Matúš; Laaksonen, Aatto; Šponer, Jiří; Uličný, Jozef; Mocci, Francesca

2016-08-01

Most molecular dynamics (MD) simulations of DNA quadruplexes have been performed under minimal salt conditions using the Åqvist potential parameters for the cation with the TIP3P water model. Recently, this combination of parameters has been reported to be problematic for the stability of quadruplex DNA, especially caused by the ion interactions inside or near the quadruplex channel. Here, we verify how the choice of ion parameters and water model can affect the quadruplex structural stability and the interactions with the ions outside the channel. We have performed a series of MD simulations of the human full-parallel telomeric quadruplex by neutralizing its negative charge with K(+) ions. Three combinations of different cation potential parameters and water models have been used: (a) Åqvist ion parameters, TIP3P water model; (b) Joung and Cheatham ion parameters, TIP3P water model; and (c) Joung and Cheatham ion parameters, TIP4Pew water model. For the combinations (b) and (c), the effect of the ionic strength has been evaluated by adding increasing amounts of KCl salt (50, 100, and 200 mM). Two independent simulations using the Åqvist parameters with the TIP3P model show that this combination is clearly less suited for the studied quadruplex with K(+) as counterions. In both simulations, one ion escapes from the channel, followed by significant deformation of the structure, leading to a deviating conformation compared to that in the reference crystallographic data. For the other combinations of ion and water potentials, no tendency is observed for the channel ions to escape from the quadruplex channel. In addition, the internal mobility of the three loops, torsion angles, and counterion affinity have been investigated at varied salt concentrations. In summary, the selection of ion and water models is crucial as it can affect both the structure and dynamics as well as the interactions of the quadruplex with its counterions. The results obtained with the TIP4Pew

Numerical approach to the parallel gradient operator in tokamak scrape-off layer turbulence simulations and application to the GBS code

NASA Astrophysics Data System (ADS)

Jolliet, S.; Halpern, F. D.; Loizu, J.; Mosetto, A.; Riva, F.; Ricci, P.

2015-03-01

This paper presents two discretisation schemes for the parallel gradient operator used in scrape-off layer plasma turbulence simulations. First, a simple model describing the propagation of electrostatic shear-Alfvén waves, and retaining the key elements of the parallel dynamics, is used to test the accuracy of the different schemes against analytical predictions. The most promising scheme is then tested in simulations of limited scrape-off layer turbulence with the flux-driven 3D fluid code GBS (Ricci et al., 2012): the new approach is successfully benchmarked against the original parallel gradient discretisation implemented in GBS. Finally, GBS simulations using a radially varying safety factor profile, which were inapplicable with the original scheme, are carried out for the first time: the well-known stabilisation of resistive ballooning modes at negative magnetic shear is recovered. The main conclusion of this paper is that a simple approach to the parallel gradient, namely centred finite differences in the poloidal and toroidal direction, is able to simulate scrape-off layer turbulence provided that a higher resolution and higher convergence order are used.
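The "simple approach" that the GBS record ends on, centred finite differences along the field-aligned direction, can be written in a few lines. The stencil below is a generic second-order centred difference on a periodic coordinate and is only a schematic stand-in for the actual GBS discretisation (which combines poloidal and toroidal derivatives through the field-line pitch):

    import numpy as np

    def parallel_gradient(f, dz, pitch=1.0):
        """Second-order centred difference along the field-aligned (here: z)
        direction of a periodic 1D array f; 'pitch' rescales the derivative
        by the field-line pitch as a crude stand-in for b . grad."""
        return pitch * (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * dz)

    z = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
    f = np.sin(3.0 * z)
    err = np.max(np.abs(parallel_gradient(f, z[1] - z[0]) - 3.0 * np.cos(3.0 * z)))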
Simulations of ice shelves in the Parallel Ocean Program (POP), the ocean model of the Community Earth System Model (CESM)

NASA Astrophysics Data System (ADS)

Asay-Davis, Xylar

2013-04-01

We present a series of simulations using POP2X, a modified version of the LANL Parallel Ocean Program version 2 (POP2) that includes circulations in ice-shelf cavities. The geometry of the ice-shelf/ocean interface is represented using partial-top cells, following the approach developed by Losch (2008) for the Massachusetts Institute of Technology General Circulation Model (MITgcm). The model domain is an idealized domain reminiscent of the Ronne-Filchner Ice Shelf cavity. Our simulations show relatively warm circumpolar deep water (CDW) flowing into the Filchner trough, causing a large increase in melting under the ice shelf. Using more realistic geometry and climate forcing, Hellmer et al. (2012) saw a drastic increase in melting in the late twenty-first century as a result of similar processes. We show that vertical model resolution can have a strong impact on the melt rate and circulation in the vicinity of the ice shelf. The results suggest that a resolution-conscious parameterization of the buoyancy-driven plume under ice shelves is needed. This work is an early step toward coupling POP2X to the Community Ice Sheet Model (CISM) in order to perform more advanced modeling of ice-sheet/ocean interactions. Remarkable advances in ice-sheet model physics and numerical methods in recent years mean that a number of these models (e.g. the CISM; the Ice Sheet System Model; the Elmer Ice Sheet Model) have both sufficient physical accuracy and numerical scalability to be ready for inclusion in Earth System Models (ESMs). A significant stumbling block preventing full ice-sheet/ocean coupling is the inability of ocean models to handle ice-shelf cavity geometries that change in time. This is a major focus of our ongoing research.

A parallel computing tool for large-scale simulation of massive fluid injection in thermo-poro-mechanical systems

NASA Astrophysics Data System (ADS)

Karrech, Ali; Schrank, Christoph; Regenauer-Lieb, Klaus

2015-10-01

Massive fluid injections into the earth's upper crust are commonly used to stimulate permeability in geothermal reservoirs, enhance recovery in oil reservoirs, store carbon dioxide and so forth. Currently used models for reservoir simulation are limited to small perturbations and/or hydraulic aspects that are insufficient to describe the complex thermal-hydraulic-mechanical behaviour of natural geomaterials. Comprehensive approaches, which take into account the non-linear mechanical deformations of rock masses, fluid flow in percolating pore spaces, and changes of temperature due to heat transfer, are necessary to predict the behaviour of deep geo-materials subjected to high pressure and temperature changes. In this paper, we introduce a thermodynamically consistent poromechanics formulation which includes coupled thermal, hydraulic and mechanical processes. Moreover, we propose a numerical integration strategy based on massively parallel computing. The proposed formulations and numerical integration are validated using analytical solutions of simple multi-physics problems. As a representative application, we investigate the massive injection of fluids within a deep formation to mimic the conditions of reservoir stimulation.
The model showed, for instance, the effects of initial pre-existing stress fields on the orientations of stimulation-induced failures.

Simulations of flow mode distributions on rough fracture surfaces using a parallelized Smoothed Particle Hydrodynamics (SPH) model

NASA Astrophysics Data System (ADS)

Kordilla, J.; Shigorina, E.; Tartakovsky, A. M.; Pan, W.; Geyer, T.

2015-12-01

Under idealized conditions (smooth surfaces, linear relationship between Bond number and Capillary number of droplets), steady-state flow modes on fracture surfaces have been shown to develop from sliding droplets to rivulets and finally (wavy) film flow, depending on the specified flux. In a recent study we demonstrated the effect of surface roughness on droplet flow in unsaturated wide aperture fractures; however, its effect on other prevailing flow modes is still an open question. The objective of this work is to investigate the formation of complex flow modes on fracture surfaces employing an efficient three-dimensional parallelized SPH model. The model is able to simulate highly intermittent, gravity-driven free-surface flows under dynamic wetting conditions. The effect of surface tension is included via efficient pairwise interaction forces. We validate the model using various analytical and semi-analytical relationships for droplet and complex flow dynamics. To investigate the effect of surface roughness on flow dynamics we construct surfaces with a self-affine fractal geometry and roughness characterized by the Hurst exponent. We demonstrate the effect of surface roughness (on macroscopic scales this can be understood as a tortuosity) on the steady-state distribution of flow modes. Furthermore we show the influence of a wide range of natural wetting conditions (defined by static contact angles) on the final distribution of surface coverage, which is of high importance for matrix-fracture interaction processes.

A Scalable O(N) Algorithm for Large-Scale Parallel First-Principles Molecular Dynamics Simulations

SciTech Connect

Osei-Kuffuor, Daniel; Fattebert, Jean-Luc

2014-01-01

Traditional algorithms for first-principles molecular dynamics (FPMD) simulations only gain a modest capability increase from current petascale computers, due to their O(N³) complexity and their heavy use of global communications. To address this issue, we are developing a truly scalable O(N) complexity FPMD algorithm, based on density functional theory (DFT), which avoids global communications. The computational model uses a general nonorthogonal orbital formulation for the DFT energy functional, which requires knowledge of selected elements of the inverse of the associated overlap matrix.
We present a scalable algorithm for approximately computing selected entries of the inverse of the overlap matrix, based on an approximate inverse technique, by inverting local blocks corresponding to principal submatrices of the global overlap matrix. The new FPMD algorithm exploits sparsity and uses nearest neighbor communication to provide a computational scheme capable of extreme scalability. Accuracy is controlled by the mesh spacing of the finite difference discretization, the size of the localization regions in which the electronic orbitals are confined, and a cutoff beyond which the entries of the overlap matrix can be omitted when computing selected entries of its inverse. We demonstrate the algorithm's excellent parallel scaling for up to O(100K) atoms on O(100K) processors, with a wall-clock time of O(1) minute per molecular dynamics time step.
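The core idea of the record above, recovering selected entries of the inverse overlap matrix from inverses of local principal submatrices, can be mocked up serially for a banded overlap matrix. The toy sketch below is an assumption-laden illustration (dense NumPy, one diagonal entry per orbital, a hand-picked halo width), not the distributed production algorithm:

    import numpy as np

    def selected_inverse_entries(S, centers, halo):
        """Approximate the diagonal entries of inv(S) by inverting, for each
        index i, the principal submatrix of S restricted to indices within
        'halo' of i. Toy version of a localized approximate-inverse scheme."""
        n = S.shape[0]
        approx = np.zeros(len(centers))
        for k, i in enumerate(centers):
            lo, hi = max(0, i - halo), min(n, i + halo + 1)
            block = S[lo:hi, lo:hi]
            approx[k] = np.linalg.inv(block)[i - lo, i - lo]
        return approx

    # Hypothetical near-identity overlap matrix with short-range coupling
    n = 200
    S = np.eye(n) + 0.1 * np.diag(np.ones(n - 1), 1) + 0.1 * np.diag(np.ones(n - 1), -1)
    exact = np.diag(np.linalg.inv(S))
    approx = selected_inverse_entries(S, range(n), halo=8)
    max_err = np.max(np.abs(exact - approx))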
Toward Improved Nuclear Explosion Monitoring With Complete Waveform Simulations Using Three-Dimensional Models and Parallel Computing

NASA Astrophysics Data System (ADS)

Vorobiev, O.; Antoun, T.; Rodgers, A.; Matzel, E.; Myers, S.; Walter, W.; Petersson, A.; Bono, C.; Sjogreen, B.

2008-12-01

Next generation methods for lowering seismic monitoring thresholds and reducing uncertainties will likely rely on complete waveform simulations using three-dimensional (3D) earth models. Recent advances in numerical methods for both non-linear (shock wave) and linear (anelastic, seismic wave) propagation, improved 3D models and the steady growth of parallel computing promise to improve the accuracy and efficiency of explosion simulations. These methods implemented in new computer codes can advance physics-based understanding of nuclear explosions as well as the propagation effects caused by path-dependent earth structure. This presentation will summarize new 3D modeling capabilities developed to improve understanding of the seismic waves emerging from an explosion. Specifically we are working in three thrust areas: 1) computation of regional distance intermediate-period (50-10 seconds) synthetic seismograms in 3D earth models to assess the ability of these models to predict observed seismograms from well-characterized events; 2) coupling of non-linear hydrodynamic simulations of explosion shock waves with an anelastic finite difference code for modeling the dependence of seismic wave observables on explosion emplacement conditions and near-source heterogeneity; and 3) implementation of surface topography in our anelastic finite difference code to include scattering and mode-conversion due to a non-planar free surface. Current 3D continental-to-global scale seismic models represent long-wavelength (greater than 100 km) heterogeneity. We are investigating the efficacy of current 3D models to predict complete intermediate (50-10 seconds) waveforms for well-characterized events (mostly earthquakes) using the spectral element code, SPECFEM3D. Intermediate period seismograms for crustal events at regional distance are strongly impacted by path propagation effects due to laterally variable crustal and upper mantle structure. We are also modeling shock wave propagation

Parallel rendering techniques for massively parallel visualization

SciTech Connect

Hansen, C.; Krogh, M.; Painter, J.

1995-07-01

As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications, rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP's abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper will describe recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, sphere, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume renderers use a MIMD approach. Implementations of these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.

A graph model, ParaDiGM, and a software tool, VISA, for the representation, design, and simulation of parallel, distributed computations

SciTech Connect

Demeure, I.M.

1989-01-01

The research presented here is concerned with representation techniques and tools to support the design, prototyping, simulation, and evaluation of message-based parallel, distributed computations. The author describes ParaDiGM (Parallel, Distributed computation Graph Model), a visual representation technique for parallel, message-based distributed computations. ParaDiGM provides several views of a computation depending on the aspect of concern. It is made of two complementary submodels, the DCPG (Distributed Computing Precedence Graph) model and the PAM (Process Architecture Model) model. DCPGs are precedence graphs used to express the functionality of a computation in terms of tasks, message-passing, and data. PAM graphs are used to represent the partitioning of a computation into schedulable units or processes, and the pattern of communication among those units. There is a natural mapping between the two models.
He illustrates the utility of ParaDiGM as a representation technique by applying it to various computations (e.g., an adaptive global optimization algorithm, the client-server model). ParaDiGM representations are concise. They can be used in documenting the design and the implementation of parallel, distributed computations, in describing such computations to colleagues, and in comparing and contrasting various implementations of the same computation. He then describes VISA (VISual Assistant), a software tool to support the design, prototyping, and simulation of message-based parallel, distributed computations. VISA is based on the ParaDiGM model. In particular, it supports the editing of ParaDiGM graphs to describe the computations of interest, and the animation of these graphs to provide visual feedback during simulations. The graphs are supplemented with various attributes, simulation parameters, and interpretations, which are procedures that can be executed by VISA.

PDE Based Algorithms for Smooth Watersheds.

PubMed

Hodneland, Erlend; Tai, Xue-Cheng; Kalisch, Henrik

2016-04-01

Watershed segmentation is useful for a number of image segmentation problems with a wide range of practical applications. Traditionally, the tracking of the immersion front is done by applying a fast sorting algorithm. In this work, we explore a continuous approach based on a geometric description of the immersion front which gives rise to a partial differential equation. The main advantage of using a partial differential equation to track the immersion front is that the method becomes versatile and may easily be stabilized by introducing regularization terms. Coupling the geometric approach with a proper "merging strategy" creates a robust algorithm which minimizes over- and under-segmentation even without predefined markers. Since reliable markers defined prior to segmentation can be difficult to construct automatically for various reasons, being able to treat marker-free situations is a major advantage of the proposed method over earlier watershed formulations. The motivation for the methods developed in this paper is taken from high-throughput screening of cells. A fully automated segmentation of single cells enables the extraction of cell properties from large data sets, which can provide substantial insight into a biological model system. Applying smoothing to the boundaries can improve the accuracy in many image analysis tasks requiring a precise delineation of the plasma membrane of the cell. The proposed segmentation method is applied to real images containing fluorescently labeled cells, and the experimental results show that our implementation is robust and reliable for a variety of challenging segmentation tasks. PMID:26625408
User's guide of TOUGH2-EGS-MP: A Massively Parallel Simulator with Coupled Geomechanics for Fluid and Heat Flow in Enhanced Geothermal Systems VERSION 1.0

SciTech Connect

Xiong, Yi; Fakcharoenphol, Perapon; Wang, Shihao; Winterfeld, Philip H.; Zhang, Keni; Wu, Yu-Shu

2013-12-01

TOUGH2-EGS-MP is a parallel numerical simulation program coupling geomechanics with fluid and heat flow in fractured and porous media, and is applicable for simulation of enhanced geothermal systems (EGS). TOUGH2-EGS-MP is based on the TOUGH2-MP code, the massively parallel version of TOUGH2. In TOUGH2-EGS-MP, the fully-coupled flow-geomechanics model is developed from linear elastic theory for thermo-poro-elastic systems and is formulated in terms of mean normal stress as well as pore pressure and temperature. Reservoir rock properties such as porosity and permeability depend on rock deformation, and the relationships between these two, obtained from poro-elasticity theories and empirical correlations, are incorporated into the simulation. This report provides the user with detailed information on the TOUGH2-EGS-MP mathematical model and instructions for using it for Thermal-Hydrological-Mechanical (THM) simulations. The mathematical model includes the fluid and heat flow equations, the geomechanical equation, and discretization of those equations. In addition, the parallel aspects of the code, such as domain partitioning and communication between processors, are also included. Although TOUGH2-EGS-MP has the capability for simulating fluid and heat flows coupled with geomechanical effects, it is up to the user to select the specific coupling process, such as THM or only TH, in a simulation. There are several example problems illustrating applications of this program. These example problems are described in detail and their input data are presented. Their results demonstrate that this program can be used for field-scale geothermal reservoir simulation in porous and fractured media with fluid and heat flow coupled with geomechanical effects.

The Parallelized Large-Eddy Simulation Model (PALM) version 4.0 for atmospheric and oceanic flows: model formulation, recent developments, and future perspectives

NASA Astrophysics Data System (ADS)

Maronga, B.; Gryschka, M.; Heinze, R.; Hoffmann, F.; Kanani-Sühring, F.; Keck, M.; Ketelsen, K.; Letzel, M. O.; Sühring, M.; Raasch, S.

2015-02-01

In this paper we present the current version of the Parallelized Large-Eddy Simulation Model (PALM) whose core has been developed at the Institute of Meteorology and Climatology at Leibniz Universität Hannover (Germany). PALM is a Fortran 95-based code with some Fortran 2003 extensions and has been applied for the simulation of a variety of atmospheric and oceanic boundary layers for more than 15 years. PALM is optimized for use on massively parallel computer architectures and was recently ported to general-purpose graphics processing units. In the present paper we give a detailed description of the current version of the model and its features, such as an embedded Lagrangian cloud model and the possibility to use Cartesian topography. Moreover, we discuss recent model developments and future perspectives for LES applications.

The Parallelized Large-Eddy Simulation Model (PALM) version 4.0 for atmospheric and oceanic flows: model formulation, recent developments, and future perspectives

NASA Astrophysics Data System (ADS)

Maronga, B.; Gryschka, M.; Heinze, R.; Hoffmann, F.; Kanani-Sühring, F.; Keck, M.; Ketelsen, K.; Letzel, M. O.; Sühring, M.; Raasch, S.

2015-08-01

In this paper we present the current version of the Parallelized Large-Eddy Simulation Model (PALM) whose core has been developed at the Institute of Meteorology and Climatology at Leibniz Universität Hannover (Germany). PALM is a Fortran 95-based code with some Fortran 2003 extensions and has been applied for the simulation of a variety of atmospheric and oceanic boundary layers for more than 15 years. PALM is optimized for use on massively parallel computer architectures and was recently ported to general-purpose graphics processing units. In the present paper we give a detailed description of the current version of the model and its features, such as an embedded Lagrangian cloud model and the possibility to use Cartesian topography.
Moreover, we discuss recent model developments and future perspectives for LES applications.

[Design and study of parallel computing environment of Monte Carlo simulation for particle therapy planning using a public cloud-computing infrastructure].

PubMed

Yokohama, Noriya

2013-07-01

This report was aimed at structuring the design of architectures and studying performance measurement of a parallel computing environment using a Monte Carlo simulation for particle therapy using a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed an approximately 28 times faster speed than seen with single-thread architecture, combined with improved stability. A study of methods of optimizing the system operations also indicated lower cost. PMID:23877155

Development of the 3D Parallel Particle-In-Cell Code IMPACT to Simulate the Ion Beam Transport System of VENUS (Abstract)

SciTech Connect

Qiang, J.; Leitner, D.; Todd, D.S.; Ryne, R.D.

2005-03-15

The superconducting ECR ion source VENUS serves as the prototype injector ion source for the Rare Isotope Accelerator (RIA) driver linac. The RIA driver linac requires a great variety of high charge state ion beams with up to an order of magnitude higher intensity than currently achievable with conventional ECR ion sources. In order to design the beam line optics of the low energy beam line for the RIA front end for the wide parameter range required for the RIA driver accelerator, reliable simulations of the ion beam extraction from the ECR ion source through the ion mass analyzing system are essential. The RIA low energy beam transport line must be able to transport intense beams (up to 10 mA) of light and heavy ions at 30 keV. For this purpose, LBNL is developing the parallel 3D particle-in-cell code IMPACT to simulate the ion beam transport from the ECR extraction aperture through the analyzing section of the low energy transport system. IMPACT, a parallel, particle-in-cell code, is currently used to model the superconducting RF linac section of RIA and is being modified in order to simulate DC beams from the ECR ion source extraction. By using the high performance of parallel supercomputing we will be able to account consistently for the changing space charge in the extraction region and the analyzing section.
Development of the 3D Parallel Particle-In-Cell Code IMPACT to Simulate the Ion Beam Transport System of VENUS (Abstract)

NASA Astrophysics Data System (ADS)

Qiang, J.; Leitner, D.; Todd, D. S.; Ryne, R. D.

2005-03-01

The superconducting ECR ion source VENUS serves as the prototype injector ion source for the Rare Isotope Accelerator (RIA) driver linac. The RIA driver linac requires a great variety of high charge state ion beams with up to an order of magnitude higher intensity than currently achievable with conventional ECR ion sources. In order to design the beam line optics of the low energy beam line for the RIA front end for the wide parameter range required for the RIA driver accelerator, reliable simulations of the ion beam extraction from the ECR ion source through the ion mass analyzing system are essential. The RIA low energy beam transport line must be able to transport intense beams (up to 10 mA) of light and heavy ions at 30 keV. For this purpose, LBNL is developing the parallel 3D particle-in-cell code IMPACT to simulate the ion beam transport from the ECR extraction aperture through the analyzing section of the low energy transport system. IMPACT, a parallel, particle-in-cell code, is currently used to model the superconducting RF linac section of RIA and is being modified in order to simulate DC beams from the ECR ion source extraction. By using the high performance of parallel supercomputing we will be able to account consistently for the changing space charge in the extraction region and the analyzing section. A progress report and early results in the modeling of the VENUS source will be presented.

Accelerating population balance-Monte Carlo simulation for coagulation dynamics from the Markov jump model, stochastic algorithm and GPU parallel computing

NASA Astrophysics Data System (ADS)

Xu, Zuwei; Zhao, Haibo; Zheng, Chuguang

2015-01-01

This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining Markov jump model, weighted majorant kernel and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved.
The Markov jump model constructs a coagulation-rule matrix of differentially-weighted simulation particles, so as to capture the time evolution of particle size distribution with low statistical noise over the full size range and as far as possible to reduce the number of time loopings. Here three coagulation rules are highlighted and it is found that constructing appropriate coagulation rule provides a route to attain the compromise between accuracy and cost of PBMC methods. Further, in order to avoid double looping over all simulation particles when considering the two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates being used for acceptance-rejection processes by single-looping over all particles, and meanwhile the mean time-step of coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially-weighted PBMC simulations (based on the Markov jump model) is reduced greatly to be proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are parallel processed by multi-cores on a GPU that can implement the massively threaded data-parallel tasks to obtain remarkable speedup ratio (comparing with CPU computation, the speedup ratio of GPU parallel computing is as high as 200 in a case of 100 cells with 10 000 simulation particles per cell). These accelerating approaches of PBMC are
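The acceptance-rejection idea in the record above can be made concrete with a much simpler serial toy: an upper bound (majorant) of the coagulation kernel sets the event rate and the waiting time, a candidate pair is drawn cheaply, and the pair actually coagulates only with probability kernel/majorant. The Python sketch below assumes a simple sum kernel and equally weighted particles, so it shows the classical acceptance-rejection step rather than the differentially-weighted, GPU-parallel scheme of the paper; all names and parameters are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)

    def kernel(vi, vj):
        # Toy coagulation kernel (sum kernel); real kernels are more complex.
        return vi + vj

    v = rng.uniform(1.0, 2.0, 1000)           # particle volumes in a single cell
    t, t_end = 0.0, 0.5
    while t < t_end and len(v) > 1:
        n = len(v)
        k_max = 2.0 * v.max()                 # majorant: upper bound of kernel(vi, vj)
        rate_max = 0.5 * n * (n - 1) * k_max  # total majorant rate over all pairs
        t += rng.exponential(1.0 / rate_max)  # waiting time from the majorant rate
        i, j = rng.choice(n, size=2, replace=False)
        # Accept the candidate pair with probability kernel / majorant.
        if rng.random() < kernel(v[i], v[j]) / k_max:
            v[i] += v[j]                      # merge particle j into particle i
            v = np.delete(v, j)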
Accelerating population balance-Monte Carlo simulation for coagulation dynamics from the Markov jump model, stochastic algorithm and GPU parallel computing

SciTech Connect

Xu, Zuwei; Zhao, Haibo; Zheng, Chuguang

2015-01-15

This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining Markov jump model, weighted majorant kernel and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved. The Markov jump model constructs a coagulation-rule matrix of differentially-weighted simulation particles, so as to capture the time evolution of particle size distribution with low statistical noise over the full size range and as far as possible to reduce the number of time loopings. Here three coagulation rules are highlighted and it is found that constructing appropriate coagulation rule provides a route to attain the compromise between accuracy and cost of PBMC methods. Further, in order to avoid double looping over all simulation particles when considering the two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates being used for acceptance-rejection processes by single-looping over all particles, and meanwhile the mean time-step of coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially-weighted PBMC simulations (based on the Markov jump model) is reduced greatly to be proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are parallel processed by multi-cores on a GPU that can implement the massively threaded data-parallel tasks to obtain remarkable speedup ratio (comparing with CPU computation, the speedup ratio of GPU parallel computing is as high as 200 in a case of 100 cells with 10 000 simulation particles per cell). These accelerating approaches of PBMC are

Evidence of downstream high speed jets by a non-stationary and rippled front of quasi-parallel shock: 2-D hybrid simulations

NASA Astrophysics Data System (ADS)

Hao, Yufei; Lu, Quanming; Lembege, Bertrand; Huang, Can; Wu, Mingyu; Guo, Fan; Shan, Lican; Zheng, Jian; Wang, Shui

2015-04-01

Experimental observations from space missions (including Cluster more recently) have clearly revealed the existence of high speed jets (HSJ) in the downstream region of the quasi-parallel terrestrial bow shock. Presently, two-dimensional (2-D) hybrid simulations are performed to reproduce and investigate the formation of such HSJ through a rippled quasi-parallel shock front. The simulation results show (i) that such shock fronts are strongly nonstationary (self reformation) along the shock normal, and (ii) that ripples are evidenced along the shock front as the upstream ULF waves (excited by interaction between incoming and reflected ions) are convected back to the front by the solar wind and contribute to the rippling formation. Then, these ripples are inherent structures of a quasi-parallel shock and the self reformation of the shock is not synchronous along the surface of the shock front. As a consequence, new incoming solar wind ions interact differently at different locations along the shock surface, and some can be only deflected (instead of being decelerated) at locations where ripples are large enough to play the role of a local "secondary" shock. Therefore, the ion bulk velocity is also different locally after ions are transmitted downstream, and local high-speed jet patterns are formed somewhere downstream.
After a short reminder of main quasi-parallel shock features, this presentation will focus (i) on experimental observations of HSJ, (ii) on our preliminary simulation results obtained on HSJ, (iii) on their relationship with local bursty patterns of (turbulent) magnetic field evidenced at the front, and (iv) on the spatial and time scales of HSJ to be compared later on with experimental observations. Such downstream HSJ are shown to be generated by the nonstationary shock front itself and do not require any upstream perturbations (such as tangential/rotational discontinuity, HFA, etc.) to be convected by the solar wind and to interact with the shock

Continuation of the Application of Parallel PIC Simulations to Laser and Electron Transport Through Plasmas Under Conditions Relevant to ICF and SBSS

SciTech Connect

Warren B. Mori

2007-04-20

One of the important research questions in high energy density science (HEDS) is how intense laser and electron beams penetrate into and interact with matter. At high beam intensities the self-fields of the laser and particle beams can fully ionize matter so that beam-matter interactions become beam-plasma interactions. These interactions involve a disparity of length and time scales, and they involve interactions between particles, between particles and waves, and between waves and waves. In a plasma what happens in one region can significantly impact another because the particles are free to move and many types of waves can be excited. Therefore, simulating these interactions requires tools that include wave-particle interactions and that include wave nonlinearities. One methodology for studying such interactions is particle-in-cell (PIC) simulations. While PIC codes include most of the relevant physics they are also the most computer intensive.
However, with the development of sophisticated software and the use of massively parallel computers, PIC codes can now be used to accurately study a wide range of problems in HEDS. The research in this project involved building, maintaining, and using the UCLA parallel computing infrastructure. This infrastructure includes the codes OSIRIS and UPIC which have been improved or developed during this grant period. Specifically, we used this PIC infrastructure to study laser-plasma interactions relevant to future NIF experiments and high-intensity laser and beam plasma interactions relevant to fast ignition fusion. The research has led to fundamental knowledge in how to write parallel PIC codes and use parallel PIC simulations, as well as increased the fundamental knowledge of HEDS. This fundamental knowledge will not only impact Inertial Confinement Fusion but other fields such as plasma-based acceleration and astrophysics.

The effect of high-resolution parallel-hole collimator materials with a pixelated semiconductor SPECT system at equivalent sensitivities: Monte Carlo simulation studies

NASA Astrophysics Data System (ADS)

Lee, Young-Jin; Kim, Dae-Hong; Kim, Hee-Joung

2014-04-01

In nuclear medicine, the use of a pixelated semiconductor detector with cadmium telluride (CdTe) or cadmium zinc telluride (CdZnTe) is of growing interest for new devices. Especially, the spatial resolution can be improved by using a pixelated parallel-hole collimator with equal hole and pixel sizes based on the above-mentioned detector. High-absorption and high-stopping-power pixelated parallel-hole collimator materials are often chosen because of their good spatial resolution. Capturing more gamma rays, however, may result in decreased sensitivity with the same collimator geometric designs. Therefore, a trade-off between spatial resolution and sensitivity is very important in nuclear medicine imaging. The purpose of this study was to compare spatial resolutions using a pixelated semiconductor single photon emission computed tomography (SPECT) system with lead, tungsten, gold, and depleted uranium pixelated parallel-hole collimators at equal sensitivity. We performed a simulation study of the PID 350 (Ajat Oy Ltd., Finland) CdTe pixelated semiconductor detector (pixel size: 0.35 × 0.35 mm²) by using a Geant4 Application for Tomographic Emission (GATE) simulation. Spatial resolutions were measured with different collimator materials at equivalent sensitivities. Additionally, hot-rod phantom images were acquired for each source-to-collimator distance by using a GATE simulation. At equivalent sensitivities, measured averages of the full width at half maximum (FWHM) using lead, tungsten, and gold were 4.32, 2.93, and 2.23% higher than that of depleted uranium, respectively. Furthermore, for the full width at tenth maximum (FWTM), measured averages when using lead, tungsten, and gold were 6.29, 4.10, and 2.65% higher than that of depleted uranium, respectively. Although the spatial resolution showed little difference among the different pixelated parallel-hole collimator materials, lower absorption and stopping power materials such as lead and tungsten had
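The collimator studies in this listing quote spatial resolution as the full width at half maximum (FWHM) and full width at tenth maximum (FWTM) of a measured profile. As a small, hedged helper in Python (assuming a sampled single-peaked profile and linear interpolation between samples; this is not code from the studies), those widths can be estimated like this:

    import numpy as np

    def width_at_fraction(x, y, fraction):
        # Width of a single-peaked profile y(x) at `fraction` of its maximum,
        # with linear interpolation between samples (0.5 -> FWHM, 0.1 -> FWTM).
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        level = fraction * y.max()
        above = y >= level
        i = int(np.argmax(above))                      # first sample at/above the level
        j = len(y) - 1 - int(np.argmax(above[::-1]))   # last sample at/above the level
        left = np.interp(level, [y[i - 1], y[i]], [x[i - 1], x[i]]) if i > 0 else x[0]
        right = np.interp(level, [y[j + 1], y[j]], [x[j + 1], x[j]]) if j < len(y) - 1 else x[-1]
        return right - left

    # Example: a Gaussian line-spread function with sigma = 1 mm.
    x = np.linspace(-10.0, 10.0, 2001)
    y = np.exp(-0.5 * x**2)
    print(width_at_fraction(x, y, 0.5))   # ~2.355 mm, i.e. 2*sqrt(2*ln 2)*sigma
    print(width_at_fraction(x, y, 0.1))   # ~4.292 mm, i.e. 2*sqrt(2*ln 10)*sigma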
A fine grained parallel smooth particle mesh Ewald algorithm for biophysical simulation studies: Application to the 6-D torus QCDOC supercomputer

NASA Astrophysics Data System (ADS)

Fang, Bin; Martyna, Glenn; Deng, Yuefan

2007-08-01

In order to model complex heterogeneous biophysical macrostructures with non-trivial charge distributions such as globular proteins in water, it is important to evaluate the long range forces present in these systems accurately and efficiently. The Smooth Particle Mesh Ewald summation technique (SPME) is commonly used to determine the long range part of electrostatic energy in large scale molecular simulations. While the SPME technique does not give rise to a performance bottleneck on a single processor, current implementations of SPME on massively parallel supercomputers become problematic at large processor numbers, limiting the time and length scales that can be reached. Here, a synergistic investigation involving method improvement, parallel programming and novel architectures is employed to address this difficulty. A relatively simple modification of the SPME technique is described which gives rise to both improved accuracy and efficiency on both massively parallel and scalar computing platforms. Our fine grained parallel implementation of the modified SPME method for the novel QCDOC supercomputer with its 6D-torus architecture is then given. Numerical tests of algorithm performance on up to 1024 processors of the QCDOC machine at BNL are presented for two systems of interest: a β-hairpin solvated in explicit water, a system which consists of 1142 water molecules and a 20 residue protein for a total of 3579 atoms, and the HIV-1 protease solvated in explicit water, a system which consists of 9331 water molecules and a 198 residue protein for a total of 29508 atoms.
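The central trick of SPME mentioned in the record above is to spread point charges onto a regular mesh with cardinal B-splines, after which the reciprocal-space part of the Ewald sum becomes an FFT-sized convolution. The Python sketch below shows only that spreading step, in one dimension, with order-4 (cubic) B-spline weights; the FFT stage, the real-space sum, and the fine-grained parallel decomposition discussed in the record are omitted, and all names and parameters are illustrative.

    import numpy as np

    def bspline4_weights(f):
        # Order-4 (cubic) cardinal B-spline weights for a fractional mesh offset
        # f in [0, 1); the four weights sum to one.
        return np.array([
            (1.0 - f) ** 3 / 6.0,
            (3.0 * f**3 - 6.0 * f**2 + 4.0) / 6.0,
            (-3.0 * f**3 + 3.0 * f**2 + 3.0 * f + 1.0) / 6.0,
            f**3 / 6.0,
        ])

    def spread_charges_1d(positions, charges, n_mesh, box):
        # Spread point charges onto a periodic 1-D mesh with cubic B-splines,
        # as SPME does per dimension before its FFT-based reciprocal-space sum.
        q_mesh = np.zeros(n_mesh)
        h = box / n_mesh
        for x, q in zip(positions, charges):
            g = x / h
            i = int(np.floor(g))
            for k, w in enumerate(bspline4_weights(g - i)):
                q_mesh[(i - 1 + k) % n_mesh] += q * w
        return q_mesh

    # Example: two opposite unit charges in a box of length 10 on a 32-point mesh.
    mesh = spread_charges_1d([2.3, 7.9], [1.0, -1.0], 32, 10.0)
    print(mesh.sum())   # ~0: the spreading conserves total charge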
TMVOC-MP: a parallel numerical simulator for Three-Phase Non-isothermal Flows of Multicomponent Hydrocarbon Mixtures in porous/fractured media

SciTech Connect

Zhang, Keni; Yamamoto, Hajime; Pruess, Karsten

2008-02-15

TMVOC-MP is a massively parallel version of the TMVOC code (Pruess and Battistelli, 2002), a numerical simulator for three-phase non-isothermal flow of water, gas, and a multicomponent mixture of volatile organic chemicals (VOCs) in multidimensional heterogeneous porous/fractured media. TMVOC-MP was developed by introducing massively parallel computing techniques into TMVOC. It retains the physical process model of TMVOC, designed for applications to contamination problems that involve hydrocarbon fuels or organic solvents in saturated and unsaturated zones. TMVOC-MP can model contaminant behavior under 'natural' environmental conditions, as well as for engineered systems, such as soil vapor extraction, groundwater pumping, or steam-assisted source remediation. With its sophisticated parallel computing techniques, TMVOC-MP can handle much larger problems than TMVOC, and can be much more computationally efficient. TMVOC-MP models multiphase fluid systems containing variable proportions of water, non-condensible gases (NCGs), and water-soluble volatile organic chemicals (VOCs). The user can specify the number and nature of NCGs and VOCs. There are no intrinsic limitations to the number of NCGs or VOCs, although the arrays for fluid components are currently dimensioned as 20, accommodating water plus 19 components that may be either NCGs or VOCs. Among them, NCG arrays are dimensioned as 10. The user may select NCGs from a data bank provided in the software. The currently available choices include O2, N2, CO2, CH4, ethane, ethylene, acetylene, and air (a pseudo-component treated with properties averaged from N2 and O2). Thermophysical property data of VOCs can be selected from a chemical data bank, included with TMVOC-MP, that provides parameters for 26 commonly encountered chemicals. Users also can input their own data for other fluids.
The fluid components may partition (volatilize and/or dissolve) among gas, aqueous, and NAPL

Modeling of Calcite Precipitation Driven by Bacteria-facilitated Urea Hydrolysis in A Flow Column Using A Fully Coupled, Fully Implicit Parallel Reactive Transport Simulator

NASA Astrophysics Data System (ADS)

Guo, L.; Huang, H.; Gaston, D.; Redden, G. D.

2009-12-01

One approach for immobilizing subsurface metal contaminants involves stimulating the in situ production of mineral phases that sequester or isolate contaminants. One example is using calcium carbonate to immobilize strontium. The success of such approaches depends on understanding how various processes of flow, transport, reaction and resulting porosity-permeability change couple in subsurface systems. Reactive transport models are often used for such purposes. Current subsurface reactive transport simulators typically involve a de-coupled solution approach, such as operator-splitting, that solves the transport equations for components and batch chemistry sequentially, which has limited applicability for many biogeochemical processes with fast kinetics and strong medium property-reaction interactions. A massively parallel, fully coupled, fully implicit reactive transport simulator has been developed based on a parallel multi-physics object oriented software environment computing framework (MOOSE) developed at the Idaho National Laboratory. Within this simulator, the system of transport and reaction equations is solved simultaneously in a fully coupled manner using the Jacobian Free Newton-Krylov (JFNK) method with preconditioning. The simulator was applied to model reactive transport in a one-dimensional column where conditions that favor calcium carbonate precipitation are generated by urea hydrolysis that is catalyzed by urease enzyme. Simulation results are compared to both laboratory column experiments and those obtained using the reactive transport simulator STOMP in terms of: the spatial and temporal distributions of precipitates, reaction rates and other major species in the reaction system; the changes in porosity and permeability; and the computing efficiency based on wall clock simulation time.
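The Jacobian-Free Newton-Krylov (JFNK) method named in the record above solves each nonlinear step with Newton's method, but the Krylov linear solver only ever needs Jacobian-vector products, which are approximated by finite differences of the residual so the Jacobian is never assembled. A minimal, hedged Python sketch with a toy one-dimensional residual and SciPy's GMRES (no preconditioning, simplified perturbation size; this is not the MOOSE implementation) is:

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, gmres

    def residual(u):
        # Toy nonlinear residual F(u) = 0 standing in for the coupled
        # transport-reaction system: Dirichlet ends, diffusion plus a nonlinear source.
        F = np.empty_like(u)
        F[0], F[-1] = u[0] - 1.0, u[-1]
        F[1:-1] = u[:-2] - 2.0 * u[1:-1] + u[2:] - 0.1 * np.exp(u[1:-1])
        return F

    def jfnk_solve(u0, tol=1e-10, max_newton=25):
        u = u0.copy()
        for _ in range(max_newton):
            F = residual(u)
            if np.linalg.norm(F) < tol:
                break
            eps = 1e-7 * (1.0 + np.linalg.norm(u))
            # Jacobian-vector product by finite differences: J s ~ (F(u + eps*s) - F(u)) / eps
            J = LinearOperator((u.size, u.size),
                               matvec=lambda s: (residual(u + eps * s) - F) / eps)
            du, info = gmres(J, -F)
            u = u + du
        return u

    u = jfnk_solve(np.zeros(50))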
<span class="hlt">Simulation</span> results are compared to both laboratory column experiments and those obtained using the reactive transport <span class="hlt">simulator</span> STOMP in terms of: the spatial and temporal distributions of precipitates and reaction rates and other major species in the reaction system; the changes in porosity and permeability; and the computing efficiency based on wall clock <span class="hlt">simulation</span> time.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2013JInst...8.3009L','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2013JInst...8.3009L"><span id="translatedtitle">A Monte Carlo <span class="hlt">simulation</span> study of the feasibility of a high resolution <span class="hlt">parallel</span>-hole collimator with a CdTe pixelated semiconductor SPECT system</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Lee, Y.-J.; Park, S.-J.; Lee, S.-W.; Kim, D.-H.; Kim, Y.-S.; Jo, B.-D.; Kim, H.-J.</p> <p>2013-03-01</p> <p>It is recommended that a pixelated <span class="hlt">parallel</span>-hole collimator in which the hole and pixel sizes are equal be used to improve the sensitivity and spatial resolution when using a small pixel size and a single-photon emission computed tomography (SPECT) system with pixelated semiconductor detector materials (e.g., CdTe and CZT). However, some significant problems arise in the manufacturing of a pixelated <span class="hlt">parallel</span>-hole collimator. Therefore, we sought to <span class="hlt">simulate</span> a pixelated semiconductor SPECT system with various collimator geometric designs. The purpose of this study was to compare the quality of images generated with a pixelated semiconductor SPECT system <span class="hlt">simulated</span> with pixelated <span class="hlt">parallel</span>-hole collimators of various geometric designs. The sensitivity and spatial resolution of the various collimator geometric designs with varying septal heights and hole sizes were measured. Moreover, to evaluate the overall performance of the imaging system, a hot-rod phantom was designed using a Monte Carlo <span class="hlt">simulation</span>. According to the results, the average sensitivity using a 15 mm septal height was 1.80, 2.87, and 4.16 times higher than that obtained with septal heights of 20, 25, and 30 mm, respectively. Also, the average spatial resolution using the 30 mm septal height was 44.33, 22.08, and 9.26% better than that attained with 15, 20, and 25 mm septal heights, respectively. When the results acquired with 0.3 and 0.6 mm hole sizes were compared, the average sensitivity with the 0.6 mm hole size was 3.97 times higher than that obtained with the 0.3 mm hole size, and the average spatial resolution with the 0.3 mm hole size was 45.76% better than that with the 0.6 mm hole size. We have presented the pixelated <span class="hlt">parallel</span>-hole collimators of various collimator geometric designs and evaluations. 
Our results showed that the effect of various collimator geometric designs can be investigated by Monte Carlo simulation so as to evaluate the feasibility of a high resolution parallel

Multi-layer Parallel Beta-Sheet Structure of Amyloid Beta peptide (1-40) aggregate observed by discrete molecular dynamics simulations

NASA Astrophysics Data System (ADS)

Peng, Shouyong; Urbanc, Brigita; Ding, Feng; Cruz, Luis; Buldyrev, Sergey; Dokholyan, Nikolay; Stanley, H. E.

2003-03-01

New evidence shows that oligomeric forms of Amyloid-Beta are potent neurotoxins that play a major role in the neurodegeneration of Alzheimer's disease. Detailed knowledge of the structure and assembly dynamics of Amyloid-Beta is important for the development of new therapeutic strategies. Here we apply a two-atom model with Go interactions to model aggregation of Amyloid-Beta (1-40) peptides using discrete molecular dynamics simulation. At temperatures above the transition temperature from an alpha-helical to a random coil conformation, we obtain two types of parallel beta-sheet structures: (a) a helical beta-sheet structure at a lower temperature and (b) a parallel beta-sheet structure at a higher temperature, both with an inter-sheet distance of 10 Å and with free edges which possibly enable further fibrillar elongation.

Three pillars for achieving quantum mechanical molecular dynamics simulations of huge systems: Divide-and-conquer, density-functional tight-binding, and massively parallel computation.

PubMed

Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi

2016-08-01

The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to the density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. The functions to perform large scale geometry optimization and molecular dynamics with the DC-DFTB potential energy surface are implemented in the program called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program: a single-point energy gradient calculation of a one-million-atom system is completed within 60 s using 7290 nodes of the K computer. © 2016 Wiley Periodicals, Inc. PMID:27317328
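The Fermi level mentioned in the DC-DFTB record above is the one global quantity that couples all subsystems: it must be adjusted so that the Fermi-Dirac occupations of all orbital energies add up to the total number of electrons. The paper's contribution is an interpolation-based parallel algorithm for this step; the hedged Python sketch below only shows the underlying root-finding problem, solved serially by bisection with made-up orbital energies.

    import numpy as np

    def electron_count(mu, energies, beta):
        # Total electron number for Fermi level mu (factor 2 for spin);
        # 1/(1 + exp(x)) is written via tanh to avoid overflow for large beta.
        occ = 0.5 * (1.0 - np.tanh(0.5 * beta * (energies - mu)))
        return 2.0 * occ.sum()

    def fermi_level(energies, n_electrons, beta=200.0, tol=1e-12):
        lo, hi = energies.min() - 1.0, energies.max() + 1.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if electron_count(mid, energies, beta) < n_electrons:
                lo = mid            # too few electrons: raise the Fermi level
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # Example: 20 made-up orbital energies and 10 electrons in total.
    eps = np.sort(np.random.default_rng(2).normal(0.0, 0.2, 20))
    mu = fermi_level(eps, 10)
    print(mu, electron_count(mu, eps, 200.0))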
A parallelization scheme to simulate reactive transport in the subsurface environment with OGS#IPhreeqc 5.5.7-3.1.2

NASA Astrophysics Data System (ADS)

He, W.; Beyer, C.; Fleckenstein, J. H.; Jang, E.; Kolditz, O.; Naumov, D.; Kalbacher, T.

2015-10-01

The open-source scientific software packages OpenGeoSys and IPhreeqc have been coupled to set up and simulate thermo-hydro-mechanical-chemical coupled processes with simultaneous consideration of aqueous geochemical reactions faster and easier on high-performance computers. In combination with the elaborated and extendable chemical database of IPhreeqc, it will be possible to set up a wide range of multiphysics problems with numerous chemical reactions that are known to influence water quality in porous and fractured media. A flexible parallelization scheme using MPI (Message Passing Interface) grouping techniques has been implemented, which allows an optimized allocation of computer resources for the node-wise calculation of chemical reactions on the one hand and the underlying processes such as groundwater flow or solute transport on the other. This technical paper presents the implementation, verification, and parallelization scheme of the coupling interface, and discusses its performance and precision.

Parallel three-dimensional Monte Carlo simulations for effects of precipitates and sub-boundaries on abnormal grain growth of Goss grains in Fe-3%Si steel

NASA Astrophysics Data System (ADS)

Park, Chang-Soo; Na, Tae-Wook; Kang, Jul-Ki; Lee, Byeong-Joo; Han, Chan-Hee; Hwang, Nong-Moon

2013-12-01

Using parallel three-dimensional Monte Carlo simulations, we investigated the effects of precipitates and sub-boundaries on abnormal grain growth (AGG) of Goss grains based on real orientation data of primary recrystallized Fe-3%Si steel. The simulations showed that AGG occurred in the presence of precipitates which inhibited the grain growth of matrix grains, whereas it did not in the absence of precipitates. The role of precipitates in enhancing AGG is to maintain a relatively high fraction of high energy boundaries between matrix grains, which increases the probability of sub-boundary-enhanced solid-state wetting of an abnormally growing grain. The microstructure evolved by the simulation could reproduce many realistic features of abnormally growing grains, such as the formation of island and peninsular grains and merging of abnormally growing grains which appeared to be separated initially on the cross-section.
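The grain-growth study above uses Monte Carlo simulations of the Potts type, where every lattice site carries a grain orientation and a site is re-oriented to a neighbouring orientation with a Metropolis probability set by the change in boundary energy. The 2-D serial Python sketch below is only meant to make that update rule concrete; the actual study is three-dimensional, parallel, uses real orientation data, and adds precipitates and sub-boundaries, none of which appear here.

    import numpy as np

    rng = np.random.default_rng(3)
    N = 64
    spins = rng.integers(0, 50, size=(N, N))   # grain orientation index per lattice site
    kT = 0.1                                    # temperature in units of the boundary energy

    def boundary_energy(s, i, j, orientation):
        # Isotropic boundary energy: the number of unlike nearest neighbours.
        nbrs = (s[(i - 1) % N, j], s[(i + 1) % N, j], s[i, (j - 1) % N], s[i, (j + 1) % N])
        return sum(orientation != n for n in nbrs)

    def mc_sweep(s):
        for _ in range(N * N):
            i, j = rng.integers(0, N, size=2)
            nbrs = (s[(i - 1) % N, j], s[(i + 1) % N, j], s[i, (j - 1) % N], s[i, (j + 1) % N])
            trial = nbrs[rng.integers(4)]       # try adopting a neighbour's orientation
            dE = boundary_energy(s, i, j, trial) - boundary_energy(s, i, j, s[i, j])
            if dE <= 0 or rng.random() < np.exp(-dE / kT):
                s[i, j] = trial                 # Metropolis acceptance
        return s

    for _ in range(10):
        spins = mc_sweep(spins)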
The Application of a Massively Parallel Computer to the Simulation of Electrical Wave Propagation Phenomena in the Heart Muscle Using Simplified Models

NASA Technical Reports Server (NTRS)

Karpoukhin, Mikhii G.; Kogan, Boris Y.; Karplus, Walter J.

1995-01-01

The simulation of heart arrhythmia and fibrillation is a very important and challenging task. The solution of these problems using sophisticated mathematical models is beyond the capabilities of modern supercomputers. To overcome these difficulties it is proposed to break the whole simulation problem into two tightly coupled stages: generation of the action potential using sophisticated models, and propagation of the action potential using simplified models. The well known simplified models are compared and modified to bring the rate of depolarization and action potential duration restitution closer to reality. The modified method of lines is used to parallelize the computational process. The conditions for the appearance of 2D spiral waves after the application of a premature beat and the subsequent traveling of the spiral wave inside the simulated tissue are studied.

Performance evaluation of high-resolution square parallel-hole collimators with a CZT room temperature pixelated semiconductor SPECT system: a Monte Carlo simulation study

NASA Astrophysics Data System (ADS)

Lee, Y.; Kang, W.

2015-07-01

The pixelated semiconductor based on cadmium zinc telluride (CZT) is a promising imaging device that provides many benefits compared with conventional scintillation detectors. By using a high-resolution square parallel-hole collimator with a pixelated semiconductor detector, we were able to improve both sensitivity and spatial resolution. Here, we present a simulation of a CZT pixelated semiconductor single-photon emission computed tomography (SPECT) system with a high-resolution square parallel-hole collimator using various geometric designs of 0.5, 1.0, 1.5, and 2.0 mm X-axis hole size. We performed a simulation study of the eValuator-2500 (eV Microelectronics Inc., Saxonburg, PA, U.S.A.)
CZT pixelated semiconductor detector using a Geant4 Application for Tomographic Emission (GATE). To evaluate the performance of these systems, the sensitivity and spatial resolution were evaluated. Moreover, to evaluate the overall performance of the imaging system, a hot-rod phantom was designed. Our results showed that the average sensitivity of the 2.0 mm collimator X-axis hole size was 1.34, 1.95, and 3.92 times higher than that of the 1.5, 1.0, and 0.5 mm collimator X-axis hole sizes, respectively. Also, the average spatial resolution of the 0.5 mm collimator X-axis hole size was 28.69, 44.65, and 55.73% better than that of the 1.0, 1.5, and 2.0 mm collimator X-axis hole sizes, respectively. We discuss the high-resolution square parallel-hole collimators of various collimator geometric designs and our evaluations. In conclusion, we have successfully designed a high-resolution square parallel-hole collimator with a CZT pixelated semiconductor SPECT system.

Parallel processor engine model program

NASA Technical Reports Server (NTRS)

Mclaughlin, P.

1984-01-01

The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.

CELLS v1.0: updated and parallelized version of an electrical scheme to simulate multiple electrified clouds and flashes over large domains

NASA Astrophysics Data System (ADS)

Barthe, C.; Chong, M.; Pinty, J.-P.; Bovalo, C.; Escobar, J.

2012-01-01

The paper describes the fully parallelized electrical scheme CELLS which is suitable to simulate explicitly electrified storm systems on parallel computers. Our motivation here is to show that a cloud electricity scheme can be developed for use on large grids with complex terrain. Large computational domains are needed to perform real case meteorological simulations with many independent convective cells. The scheme computes the bulk electric charge attached to each cloud particle and hydrometeor. Positive and negative ions are also taken into account. Several parametrizations of the dominant non-inductive charging process are included and an inductive charging process as well. The electric field is obtained by inverting the Gauss equation with an extension to terrain-following coordinates. The new feature concerns the lightning flash scheme which is a simplified version of an older detailed sequential scheme. Flashes are composed of a bidirectional leader phase (vertical extension from the triggering point) and a phase obeying a fractal law (with horizontal extension on electrically charged zones). The originality of the scheme lies in the way the branching phase is treated to get a parallel code. The complete electrification scheme is tested for the 10 July 1996 STERAO case and for the 21 July 1998 EULINOX case. Flash characteristics are analysed in detail and additional sensitivity experiments are performed for the STERAO case. Although the simulations were run for flat terrain conditions, they show that the model behaves well on multiprocessor computers. This opens a wide area of application for this electrical scheme with the next objective of running real meteorological cases on large domains.
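The CELLS record above obtains the electric field by inverting the Gauss equation, i.e. by solving for the electric potential generated by the bulk charge density and then differentiating it. The Python toy below does this on a doubly periodic 2-D grid with FFTs; the real scheme works in three dimensions with terrain-following coordinates and different boundary conditions, so this is only a hedged illustration of the inversion step, with illustrative grid sizes and charge values.

    import numpy as np

    EPS0 = 8.854e-12  # vacuum permittivity (F/m)

    def electric_field(rho, dx, dy):
        # Invert Gauss's equation div(E) = rho/eps0 on a doubly periodic grid:
        # solve laplacian(phi) = -rho/eps0 with FFTs, then return E = -grad(phi).
        ny, nx = rho.shape
        kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)
        ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=dy)
        KX, KY = np.meshgrid(kx, ky)
        k2 = KX**2 + KY**2
        rho_k = np.fft.fft2(rho)
        phi_k = np.zeros_like(rho_k)
        mask = k2 > 0.0                     # the k = 0 mode is fixed by charge neutrality
        phi_k[mask] = rho_k[mask] / (EPS0 * k2[mask])
        Ex = np.fft.ifft2(-1j * KX * phi_k).real
        Ey = np.fft.ifft2(-1j * KY * phi_k).real
        return Ex, Ey

    # Example: a vertical dipole of charge density in a 10 km x 10 km periodic box.
    rho = np.zeros((128, 128))
    rho[40, 64], rho[90, 64] = 1.0e-9, -1.0e-9   # C/m^3, illustrative magnitudes
    Ex, Ey = electric_field(rho, dx=78.125, dy=78.125)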
CELLS v1.0: updated and parallelized version of an electrical scheme to simulate multiple electrified clouds and flashes over large domains

NASA Astrophysics Data System (ADS)

Barthe, C.; Chong, M.; Pinty, J.-P.; Bovalo, C.; Escobar, J.

2011-10-01

The paper describes the fully parallelized electrical scheme CELLS which is suitable to simulate explicitly electrified storm systems on parallel computers. Our motivation here is to show that a cloud electricity scheme can be developed for use on large grids with complex terrain. Large computational domains are needed to perform real case meteorological simulations with many independent convective cells. The scheme computes the bulk electric charge attached to each cloud particle. Positive and negative ions are also taken into account. Several parametrizations of the dominant non-inductive charging process are included and an inductive charging process as well. The electric field is obtained by inverting the Gauss equation with an extension to terrain-following coordinates. The new feature concerns the lightning flash scheme which is a simplified version of an older detailed sequential scheme. Flashes are composed of a bidirectional leader phase (vertical extension from the triggering point) and a phase obeying a fractal law (with horizontal extension on electrically charged zones). The originality of the scheme lies in the way the branching phase is treated to get a parallel code. The comp