While these samples are representative of the content of Science.gov,

they are neither comprehensive nor the most current set.

We encourage you to perform a real-time search of Science.gov

to obtain the most current and comprehensive results.

Last update: August 15, 2014.

1

Automatic parallelization of discrete event simulation programs

Developing parallel discrete event simulation code is currently very time-consuming and requires a high level of expertise. Few tools, if any, exist to aid conversion of existing sequential simulation programs to efficient parallel code. Traditional approaches to automatic parallelization, as used in many parallelizing compilers, are not well-suited for this application because of the irregular, data dependent nature of discrete

Jya-Jang Tsai; Richard M. Fujimoto

1993-01-01

2

Automatic Parallelization of Discrete Event Simulation Programs

Developing parallel discrete event simulation code is currently very time-consuming and requires a high level of expertise. Few tools, if any, exist to aid conversion of existing sequential simulation programs to efficient parallel code. Traditional approaches to automatic parallelization, as used in many parallelizing compilers, are not well-suited for this application because of the irregular, data dependent nature of discrete

Jya-Jang Tsai; Richard M. Fujimoto

1993-01-01

3

Synchronization Of Parallel Discrete Event Simulations

NASA Technical Reports Server (NTRS)

Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.

Steinman, Jeffrey S.

1992-01-01

4

Program For Parallel Discrete-Event Simulation

NASA Technical Reports Server (NTRS)

User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.

Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.

1991-01-01
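
The virtual-time discipline TWOS implements can be sketched in a few lines. This is a minimal illustration of Time Warp's state-saving-and-rollback idea only, not TWOS's actual C interface; the state fields, event names, and methods are invented for the example.

```python
class LogicalProcess:
    """Time Warp sketch: a state snapshot is saved before every event,
    so a straggler (an event with a timestamp behind local virtual
    time) triggers a rollback and re-execution instead of an error."""

    def __init__(self):
        self.state = {"count": 0, "last": None}
        self.lvt = 0.0                  # local virtual time
        self.log = []                   # (timestamp, event, state-before)

    def _apply(self, ts, ev):
        self.log.append((ts, ev, dict(self.state)))
        self.state["count"] += 1
        self.state["last"] = ev
        self.lvt = ts

    def _rollback(self, ts):
        """Undo every processed event at or after ts; return the undone
        events so they can be re-executed after the straggler."""
        undone = []
        while self.log and self.log[-1][0] >= ts:
            t, ev, before = self.log.pop()
            undone.append((t, ev))
            self.state = before
        self.lvt = self.log[-1][0] if self.log else 0.0
        return undone

    def handle(self, ts, ev):
        redo = self._rollback(ts) if ts < self.lvt else []
        for t, e in sorted(redo + [(ts, ev)]):
            self._apply(t, e)
```

After a straggler arrives, the final state is exactly what sequential, timestamp-order execution would have produced.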

5

Cost of Conservative Synchronization in Parallel Discrete Event Simulations.

National Technical Information Service (NTIS)

The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic ...

D. M. Nicol

1990-01-01

6

Optimal topology for parallel discrete-event simulations

NASA Astrophysics Data System (ADS)

The effect of shortcuts on the task completion landscape in parallel discrete-event simulation (PDES) is investigated. The morphology of the task completion landscape in PDES is known to be described well by the Langevin-type equation for nonequilibrium interface growth phenomena, such as the Kardar-Parisi-Zhang equation. From the numerical simulations, we find that the root-mean-squared fluctuation of the task completion landscape, W(t,N), scales as W(t→∞,N)~N when the number of shortcuts, p, is finite. Here N is the number of nodes. This behavior can be understood from the mean-field-type argument with effective defects when p is finite. We also study the behavior of W(t,N) when p increases as N increases and provide a criterion to design an optimal topology to achieve a better synchronizability in PDES.

Kim, Yup; Kim, Jung-Hwa; Yook, Soon-Hyung

2011-05-01
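
The kind of model behind this line of work can be reproduced in a few lines. The sketch below assumes the usual toy setup (a ring without shortcuts, exponential time increments, simultaneous parallel updates) and measures the utilization and the width W of the virtual-time landscape:

```python
import random, math

def pdes_surface(n_sites, n_steps, seed=1):
    """Conservative PDES model on a ring: each processor holds a local
    virtual time and may advance (by a random increment) only if it is
    not ahead of either neighbor.  Returns the mean utilization (the
    fraction of processors advancing per step) and the final surface
    width W (rms fluctuation of the local times)."""
    rng = random.Random(seed)
    tau = [0.0] * n_sites
    util = 0.0
    for _ in range(n_steps):
        active = [i for i in range(n_sites)
                  if tau[i] <= tau[(i - 1) % n_sites]
                  and tau[i] <= tau[(i + 1) % n_sites]]
        for i in active:
            tau[i] += rng.expovariate(1.0)
        util += len(active) / n_sites
    mean = sum(tau) / n_sites
    width = math.sqrt(sum((t - mean) ** 2 for t in tau) / n_sites)
    return util / n_steps, width
```

In this basic model the width grows with N (the desynchronization the paper addresses); shortcuts would add extra neighbor conditions to the `active` test.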

7

The cost of conservative synchronization in parallel discrete event simulations

NASA Technical Reports Server (NTRS)

The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approach the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.

Nicol, David M.

1990-01-01
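
A synchronous conservative window protocol in the spirit of the one analyzed here can be sketched as follows; the event population, routing, and timing details are invented for illustration:

```python
import heapq, random

def synchronous_windows(n_lps, lookahead, until, seed=1):
    """Each round, the LPs agree on a horizon: the earliest pending
    timestamp plus the model's lookahead.  Every event strictly below
    the horizon is safe to process, because nothing generated this
    round can land earlier than that bound."""
    rng = random.Random(seed)
    queues = [[(0.0, lp)] for lp in range(n_lps)]   # per-LP event heaps
    processed = 0
    while True:
        pending = [q[0][0] for q in queues if q]
        if not pending or min(pending) >= until:
            return processed
        horizon = min(pending) + lookahead
        for lp, q in enumerate(queues):
            while q and q[0][0] < horizon:
                t, _ = heapq.heappop(q)
                processed += 1
                # follow-on event for the neighbor, at least `lookahead`
                # in the future: this is what makes the window safe
                heapq.heappush(queues[(lp + 1) % n_lps],
                               (t + lookahead + rng.random(), lp))
```

The per-round barrier and the min-reduction are the synchronization overheads whose per-event cost the paper shows approaching that of a serial simulation as workload grows.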

8

Synchronous Parallel System for Emulation and Discrete Event Simulation

NASA Technical Reports Server (NTRS)

A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to the state variables of the simulation object attributable to the event object and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.

Steinman, Jeffrey S. (Inventor)

2001-01-01
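
The local/global event horizon computation the patent describes can be illustrated with a simplified single-cycle sketch; the per-node event lists and the fixed message delay are assumptions of the example:

```python
def btb_cycle(node_events, delay):
    """One cycle of the global-event-horizon idea, simplified: each
    node would emit a message `delay` after each event it processes;
    its local event horizon is the earliest such message; the global
    event horizon is the minimum over nodes; and only events at or
    below the global horizon are safely committed this cycle.
    node_events maps node -> list of pending event timestamps."""
    local = {n: min(ts) + delay if ts else float("inf")
             for n, ts in node_events.items()}
    global_horizon = min(local.values())
    committed = {n: [t for t in ts if t <= global_horizon]
                 for n, ts in node_events.items()}
    return global_horizon, committed
```

Events beyond the horizon are the ones the system refrains from committing, since a message released this cycle could still supersede them.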

9

The effects of parallel processing architectures on discrete event simulation

As systems become more complex, particularly those containing embedded decision algorithms, mathematical modeling presents a rigid framework that often impedes representation to a sufficient level of detail. Using discrete event simulation, one can build models that more closely represent physical reality, with actual algorithms incorporated in the simulations. Higher levels of detail increase simulation run time. Hardware designers have succeeded

William Cave; Edward Slatt; Robert E. Wassmer

2005-01-01

10

Scalable Algorithms for Parallel Discrete Event Simulation Systems in Multicore Environments.

National Technical Information Service (NTIS)

This project investigated techniques for improving performance and scalability of parallel discrete event simulation systems on multicore and many core processors and clusters of multicores. Specifically, we designed and optimized a multithreaded version ...

D. Ponomarev; N. Abu-Ghazaleh

2013-01-01

11

PARALLEL DISCRETE EVENT SIMULATION FOR BUSINESS PROCESS RE-ENGINEERING

This paper describes the development of a parallel simulation model for analysis of a business process system. Today's increasing complexity of such systems, together with the need for more accurate results, means that the task of simulating business processes frequently outgrows the computational power of economically viable single-processor systems. One solution is to partition the computational task into

Behrouz Zarei

12

Discrete event system simulation

This book provides a basic treatment of one of the most widely used operations research tools: discrete-event simulation. Prerequisites are calculus, probability theory, and elementary statistics. Contents, abridged: Introduction to discrete-event system simulation. Mathematical and statistical models. Random numbers. Analysis of simulation data. Index.

J. Banks; J. S. Carson

1984-01-01
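
The next-event-time-advance loop that such a text builds on can be sketched for an M/M/1 queue. The example is illustrative, not taken from the book: pop the most imminent event from the future event list, advance the clock, update state, schedule new events.

```python
import heapq, random

def mm1_mean_wait(arrival_rate, service_rate, n_customers, seed=1):
    """Classic discrete-event loop for a single-server queue with
    Poisson arrivals and exponential service; returns the observed
    mean wait in queue."""
    rng = random.Random(seed)
    fel = [(rng.expovariate(arrival_rate), "arrival")]  # future event list
    waiting, busy = [], False
    served, total_wait = 0, 0.0
    while served < n_customers:
        clock, kind = heapq.heappop(fel)                # next event
        if kind == "arrival":
            heapq.heappush(fel, (clock + rng.expovariate(arrival_rate),
                                 "arrival"))
            if busy:
                waiting.append(clock)                   # joins the queue
            else:
                busy = True
                heapq.heappush(fel, (clock + rng.expovariate(service_rate),
                                     "departure"))
        else:                                           # departure
            served += 1
            if waiting:
                total_wait += clock - waiting.pop(0)
                heapq.heappush(fel, (clock + rng.expovariate(service_rate),
                                     "departure"))
            else:
                busy = False
    return total_wait / n_customers
```

For comparison with the statistical analysis such books cover: queueing theory gives the expected wait as λ/(μ(μ−λ)), so rates 0.5 and 1.0 should yield a mean wait near 1.0 for long runs.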

13

On the Trade-off between Time and Space in Optimistic Parallel Discrete-Event Simulation

Optimistically synchronized parallel discrete-event simulation is based on the use of communicating sequential processes. Optimistic synchronization means that the processes execute under the assumption that synchronization is fortuitous. Periodic checkpointing of the state of a process allows the process to roll back to an earlier state when synchronization errors occur. This paper examines the effects of varying the

Bruno R. Preiss; Ian D. MacIntyre; Wayne M. Loucks

1992-01-01
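
The trade-off the paper examines, fewer checkpoints up front versus a longer coast-forward after each rollback, can be put in a toy cost model; the cost constants below are illustrative, not the paper's measurements:

```python
def checkpoint_overhead(k, n_events, rollback_targets,
                        save_cost=0.5, event_cost=1.0):
    """Checkpointing every k events pays the save cost n_events/k
    times, but rolling back to event index r must restore the nearest
    earlier checkpoint and "coast forward" by re-executing r % k
    events before the straggler can be handled."""
    saving = (n_events // k) * save_cost
    coasting = sum((r % k) * event_cost for r in rollback_targets)
    return saving + coasting
```

With 100 events and rollbacks to events 7 and 55, checkpointing every event costs 50.0 in saves alone, while k = 10 costs 5.0 in saves plus 12.0 in coast-forward: the optimum interval depends on the rollback rate, which is the paper's point.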

14

Surface Growth Modeling of Load Balancing in Parallel Discrete Event Simulations (PDES)

NASA Astrophysics Data System (ADS)

We study a non-equilibrium surface growth model of load balancing for conservative Parallel Discrete Event Simulations (PDES) [Korniss et al., Science 299, 677 (2003); Guclu et al., Phys. Rev. E 73, 066115 (2006)]. Load balancing improves the performance of the parallel simulations by distributing the work load over all processors evenly. These models for static load balancing are in the Kardar-Parisi-Zhang (KPZ) universality class, with the KPZ process often mixed with a Random Deposition (RD) process [Kolakowska et al., Phys. Rev. E 73, 011603 (2006)]. We study how the utilization and the desynchronization behave when the load changes randomly during the simulation. We compare the static and dynamic load balancing results for the models of PDES. The underlying framework, proposed in [L. N. Shchur and M. A. Novotny, Phys. Rev. E 70, 026703 (2004)], is that the Local Simulated Time (LST) is associated with the nodes and not with the processing elements.

Verma, Poonam; Novotny, Mark

2007-03-01

15

SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation

NASA Technical Reports Server (NTRS)

Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.

Steinman, Jeff S.

1992-01-01

16

Application of Parallel Discrete Event Simulation to the Space Surveillance Network

NASA Astrophysics Data System (ADS)

In this paper we describe how and why we chose parallel discrete event simulation (PDES) as the paradigm for modeling the Space Surveillance Network (SSN) in our modeling framework, TESSA (Testbed Environment for Space Situational Awareness). DES is a simulation paradigm appropriate for systems dominated by discontinuous state changes at times that must be calculated dynamically. It is used primarily for complex man-made systems like telecommunications, vehicular traffic, computer networks, economic models, etc., although it is also useful for natural systems that are not described by equations, such as particle systems, population dynamics, epidemics, and combat models. It is much less well known than simple time-stepped simulation methods, but has the great advantage of being time scale independent, so that one can freely mix processes that operate at time scales over many orders of magnitude with no runtime performance penalty. In simulating the SSN we model in some detail: (a) the orbital dynamics of up to 10^5 objects, (b) their reflective properties, (c) the ground- and space-based sensor systems in the SSN, (d) the recognition of orbiting objects and determination of their orbits, (e) the cueing and scheduling of sensor observations, (f) the 3-d structure of satellites, and (g) the generation of collision debris. TESSA is thus a mixed continuous-discrete model. But because many different types of discrete objects are involved with such a wide variation in time scale (milliseconds for collisions, hours for orbital periods) it is suitably described using discrete events. The PDES paradigm is surprising and unusual. In any instantaneous runtime snapshot some parts may be far ahead in simulation time while others lag behind, yet the required causal relationships are always maintained and synchronized correctly, exactly as if the simulation were executed sequentially.
The TESSA simulator is custom-built, conservatively synchronized, and designed to scale to thousands of nodes. There are many PDES platforms we might have used, but two requirements led us to build our own. First, the parallel components of our SSN simulation are coded and maintained by separate teams, so TESSA is designed to support transparent coupling and interoperation of separately compiled components written in any of six programming languages. Second, conventional PDES simulators are designed so that while the parallel components run concurrently, each of them is internally sequential, whereas for TESSA we needed to support MPI-based parallelism within each component. The TESSA simulator is still a work in progress and currently has some significant limitations. The paper describes those as well.

Jefferson, D.; Leek, J.

2010-09-01

17

Algorithmic Scalability of Constrained Parallel Conservative Discrete Event Simulations in 1-D

NASA Astrophysics Data System (ADS)

We consider the parallel simulation of asynchronous systems employing processing elements that are arranged on a ring and that communicate only among nearest neighbors. Each processor has its own local simulated time. A processor is allowed to update its local time if it is guaranteed not to violate causality. Otherwise, it remains idle. At each simultaneously performed parallel update step, the utilization is defined as the fraction of non-idling processors and the simulated time horizon (STH) is defined as the set of local times generated at all sites. In the limit of large systems utilization scales with the system size; however, the width of the STH (i.e., measurement phase of the algorithm) diverges (Korniss et al., PRL 84). In this presentation, we introduce a technique of a moving window, which modifies the algorithm by imposing a global constraint on the times generated at sites locally. Our studies indicate that discrete event simulations with the constraint of a moving window are scalable. This result may find numerous applications in modeling the evolution of general spatially extended interacting systems, including dynamic Monte Carlo studies. This work is supported in part by NSF grant DMR-0113049.

Kolakowska, Alice; Novotny, Mark; Korniss, Gyorgy

2002-08-01
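
The moving-window constraint can be added to the standard conservative ring update in one extra condition. This sketch assumes the usual toy dynamics (exponential time increments, simultaneous updates) and returns the per-step utilization:

```python
import random

def constrained_step(tau, window, rng):
    """One parallel update of the conservative ring model with a moving
    window: a site may advance only if it lags both neighbors AND its
    local time lies within `window` of the globally slowest site, so
    the simulated time horizon cannot spread without bound."""
    n = len(tau)
    floor = min(tau)
    active = [i for i in range(n)
              if tau[i] <= tau[(i - 1) % n]
              and tau[i] <= tau[(i + 1) % n]
              and tau[i] < floor + window]
    for i in active:
        tau[i] += rng.expovariate(1.0)
    return len(active) / n
```

The globally slowest site always satisfies all three conditions, so progress is guaranteed while the width of the time horizon stays controlled by the window.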

18

Multi-core processors are commonly available now, but most traditional computer architectural simulators still use single-thread execution. In this paper we use parallel discrete event simulation (PDES) to speedup a cycle-accurate event-driven many-core processor simulator. Evaluation against the sequential version shows that the parallelized one achieves an average speedup of 10.9x (up to 13.6x) running SPLASH-2 kernel on a 16-core host

Huiwei Lv; Yuan Cheng; Lu Bai; Mingyu Chen; Dongrui Fan; Ninghui Sun

2010-01-01

19

Writing parallel, discrete-event simulations in ModSim: Insight and experience

The Time Warp Operating System (TWOS) has been the focus of much research in parallel simulation. A new language, called ModSim, has been developed for use in conjunction with TWOS. The coupling of ModSim and TWOS provides a tool to construct large, complex simulation models that will run on several parallel and distributed computer systems. As part of the "Griffin Project" underway here at Los Alamos National Laboratory, there is strong interest in assessing the coupling of ModSim and TWOS from an application-oriented perspective. To this end, a key component of the Eagle combat simulation has been implemented in ModSim for execution on TWOS. In this paper brief overviews of ModSim and TWOS will be presented. Finally, the compatibility of the computational models presented by the language and the operating system will be examined in light of experience gained to date. 18 refs., 4 figs.

Rich, D.O.; Michelsen, R.E.

1989-09-11

20

Scalability of Parallel Discrete-Event Algorithms

NASA Astrophysics Data System (ADS)

We continue our previous studies [1-4] of scalability of parallel discrete event simulations (PDES). Previously, ignoring communication overhead, we have shown that ALL short-ranged PDES can be made perfectly scalable [2]. These works simulated the PDES simulations and used ideas of non-equilibrium surface growth to analyze the virtual time surfaces of PDES. We present results that expand on these results in two ways. First, we also include communication times in the simulations. For short-ranged simulations we observe perfect scalability including communication times. Second, we study relaxation of the short-ranged requirement. Rather, we limit the number of sites each processing element can communicate with. Hence we study scalability of systems with sparse communication patterns. [1] G. Korniss et al., Phys. Rev. Lett., vol. 84, p. 1341 (2000). [2] G. Korniss et al., Science, vol. 299, p. 677 (2003). [3] A. Kolakowska et al., Phys. Rev. E, vol. 68, 046705 (2003). [4] L.N. Shchur and M.A. Novotny, Phys. Rev. E, vol. 70, 026703 (2004).

Novotny, M. A.; Korniss, Gyorgy

2005-03-01

21

A discrete event method for wave simulation

This article describes a discrete event interpretation of the finite difference time domain (FDTD) and digital wave guide network (DWN) wave simulation schemes. The discrete event method is formalized using the discrete event system specification (DEVS). The scheme is shown to have errors that are proportional to the resolution of the spatial grid. A numerical example demonstrates the relative efficiency of the scheme with respect to FDTD and DWN schemes. The potential for the discrete event scheme to reduce numerical dispersion and attenuation errors is discussed.

Nutaro, James J [ORNL]

2006-01-01
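
A minimal DEVS atomic model, the formalism this article uses, can be sketched as follows. The relay behavior and the toy coordinator are invented for illustration; they show the DEVS interface (time advance, output, internal and external transitions), not the article's wave scheme:

```python
INF = float("inf")

class Relay:
    """DEVS atomic model that holds each value it receives for `delay`
    time units and then emits it (a one-slot delay line)."""
    def __init__(self, delay):
        self.delay = delay
        self.value = None
        self.sigma = INF               # time until next internal event

    def ta(self):                      # time-advance function
        return self.sigma

    def delta_ext(self, elapsed, x):   # external transition: input x
        self.value = x
        self.sigma = self.delay

    def output(self):                  # fires just before delta_int
        return self.value

    def delta_int(self):               # internal transition: go passive
        self.value = None
        self.sigma = INF

def simulate(model, inputs, until):
    """Toy root coordinator: inputs is a time-ordered list of
    (time, value) pairs; returns the (time, value) outputs the model
    produces before `until`."""
    last, out = 0.0, []
    inputs = list(inputs)
    while True:
        t_int = last + model.ta()               # next internal event
        t_ext = inputs[0][0] if inputs else INF  # next external event
        t = min(t_int, t_ext)
        if t >= until:
            return out
        if t_int <= t_ext:
            out.append((t, model.output()))
            model.delta_int()
        else:
            _, x = inputs.pop(0)
            model.delta_ext(t - last, x)
        last = t
```

A wave scheme like the article's would replace the relay's transition functions with the grid-cell update rules while keeping this same event-driven skeleton.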

22

Distributed discrete-event simulation (Computing Surveys)

Traditional discrete-event simulations employ an inherently sequential algorithm. In practice, simulations of large systems are limited by this sequentiality, because only a modest number of events can be simulated. Distributed discrete-event simulation (carried out on a network of processors with asynchronous message-communicating capabilities) is proposed as an alternative; it may provide better performance by partitioning the simulation among the component

Jayadev Misra

1986-01-01

23

Distributed discrete event simulation. Final report

The presentation given here is restricted to discrete event simulation. The complexity of and time required for many present and potential discrete simulations exceeds the reasonable capacity of most present serial computers. The desire, then, is to implement the simulations on a parallel machine. However, certain problems arise in an effort to program the simulation on a parallel machine. In one category of methods, deadlock can arise, and some method is required either to detect deadlock and recover from it or to avoid deadlock through information passing. In the second category of methods, potentially incorrect simulations are allowed to proceed. If the situation is later determined to be incorrect, recovery from the error must be initiated. In either case, computation and information passing are required which would not be required in a serial implementation. The net effect is that the parallel simulation may not be much better than a serial simulation. In an effort to determine alternate approaches, important papers in the area were reviewed. As a part of that review process, each of the papers was summarized. The summary of each paper is presented in this report in the hope that those doing future work in the area will be able to gain insight that might not otherwise be available, and to aid in deciding which papers would be most beneficial to pursue in more detail. The papers are broken down into categories and then by author. Conclusions reached after examining the papers and other material, such as direct talks with an author, are presented in the last section. Also presented there are some ideas that surfaced late in the research effort. These promise to be of some benefit in limiting information which must be passed between processes and in better understanding the structure of a distributed simulation. Pursuit of these ideas seems appropriate.

De Vries, R.C. [Univ. of New Mexico, Albuquerque, NM (United States). EECE Dept.]

1988-02-01
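
The "avoid deadlock through information passing" category can be illustrated with Chandy-Misra-Bryant-style null messages; the unidirectional ring topology and the numbers below are assumptions of the sketch:

```python
def null_message_rounds(clocks, lookahead, rounds):
    """clocks[i] is LP i's local clock on a unidirectional ring, where
    LP i receives only from LP i-1.  With no real traffic, every LP
    would block forever on its empty input channel: the classic
    deadlock.  Instead, each round every LP sends a null message
    promising "nothing earlier than my clock + lookahead", and the
    receiver can safely advance its clock to that bound."""
    n = len(clocks)
    for _ in range(rounds):
        promises = [c + lookahead for c in clocks]       # null messages
        clocks = [promises[(i - 1) % n] for i in range(n)]
    return clocks
```

Each round advances every clock by the lookahead, so simulated time progresses even though no real events flow; this is the information-passing cost that has no serial counterpart.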

24

C based discrete event simulation support system

The C programming language is gaining in popularity with professional programmers because of its powerful constructs, versatility and portability. The state of development of C compilers has advanced to the degree that applications generally require less memory and execute more quickly when coded in C compared to other languages such as BASIC, FORTRAN, or Pascal. DISC (DIscrete event Simulation in

Sathyakumar Selvaraj; Eric L. Blair; Milton L. Smith; William M. Marcy

1988-01-01

25

Using Java for Discrete Event Simulation

A discrete event simulation library has been written in the Java language, based on the SIM++ library for C++. This allows live simulations to be incorporated into web pages and run remotely. This paper presents a performance comparison with the equivalent C++ simulator and discusses advantages and disadvantages of Java as a simulation language. The primary purpose of writing simulations in the Java language was

R. McNab; F. W. Howell

1996-01-01

26

Distributed simulation offers a faster means of executing complex and time-consuming discrete event simulations than does conventional simulation. However, many problems remain to be solved before distributed simulation can become commonplace. Exploitation of the natural parallelism in hierarchical discrete event models is considered in this article, and a methodology for finding an optimal mapping of a given hierarchical model structure

Bernard P. Zeigler; Guoqing Zhang

1990-01-01

27

A Message-Based Approach to Discrete-Event Simulation

This paper develops a message-based approach to discrete-event simulation. Although message-based simulators have the same expressive power as traditional discrete-event simulation languages, they provide a more natural environment for simulating distributed systems. In message-based simulations, a physical system is modeled by a set of message-communicating processes. The events in the system are modeled by message communications. The paper proposes the entity

Rajive Bagrodia; K. Mani Chandy; Jayadev Misra

1987-01-01

28

Maisie: A Language for the Design of Efficient Discrete-Event Simulations

Maisie is a C-based discrete-event simulation language that was designed to cleanly separate a simulation model from the underlying algorithm (sequential or parallel) used for the execution of the model. With few modifications, a Maisie program may be executed using a sequential simulation algorithm, a parallel conservative algorithm, or a parallel optimistic algorithm. The language constructs allow the runtime system to implement optimizations that

Rajive L. Bagrodia; Wen-toh Liao

1994-01-01

29

Non-item based discrete-event simulation tools

Discrete event simulation has traditionally been defined by items (or entities). This modeling paradigm has served the simulation industry well, but falls far short for many industries in which the parts/pieces mindset simply does not accurately portray their particular processes. For the last ten years Simulation Dynamics has been working with industries where the item paradigm falls short as a

R. A. Phelps; D. J. Parsons; A. J. Siprelle

2002-01-01

30

Discrete-Event Simulation and the Event Horizon Part 2: Event List Management

The event horizon is a very important concept that applies to both parallel and sequential discrete-event simulations. By exploiting the event horizon, parallel simulations can process events optimistically in a risk-free manner (i.e., without requiring antimessages) using adaptable

Jeffrey S. Steinman

1996-01-01
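
The event-list idea the title refers to can be sketched as a two-part structure; the details below are a simplified illustration, not the paper's exact data structure:

```python
import heapq

class HorizonEventList:
    """Newly scheduled events accumulate in an unsorted bag; the bag is
    merged into the sorted queue only when the next dequeue would cross
    the bag's minimum timestamp (the event horizon), so sorting work
    happens in batches rather than per insertion."""
    def __init__(self):
        self.queue = []                 # sorted (heap) pending events
        self.bag = []                   # unsorted newly scheduled events
        self.bag_min = float("inf")

    def schedule(self, ts, ev):
        self.bag.append((ts, ev))
        self.bag_min = min(self.bag_min, ts)

    def next_event(self):
        if not self.queue or self.queue[0][0] >= self.bag_min:
            for item in self.bag:       # crossed the horizon: merge
                heapq.heappush(self.queue, item)
            self.bag, self.bag_min = [], float("inf")
        return heapq.heappop(self.queue)
```

Events scheduled after the current time but before the horizon never pay the heap cost until the horizon is actually reached.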

31

Discrete-event simulation for evaluating virtual organizations

Outsourcing is a major driver in the growth of virtual organizations (VOs) in the manufacturing domain. These outsourcing networks can be classified as VOs as they are created from different organizational entities for a specific purpose and exist for a specified period of time. This paper proposes the use of discrete event simulation (DES) as a means of evaluating proposed

P. Liston; J. Byrne; C. Heavey; P. J. Byrne

2008-01-01

32

Optimization of Operations Resources via Discrete Event Simulation Modeling

NASA Technical Reports Server (NTRS)

The resource levels required for operation and support of reusable launch vehicles are typically defined through discrete event simulation modeling. Minimizing these resources constitutes an optimization problem involving discrete variables and simulation. Conventional approaches to solve such optimization problems involving integer valued decision variables are the pattern search and statistical methods. However, in a simulation environment that is characterized by search spaces of unknown topology and stochastic measures, these optimization approaches often prove inadequate. In this paper, we have explored the applicability of genetic algorithms to the simulation domain. Genetic algorithms provide a robust search strategy that does not require continuity and differentiability of the problem domain. The genetic algorithm successfully minimized the operation and support activities for a space vehicle, through a discrete event simulation model. The practical issues associated with simulation optimization, such as stochastic variables and constraints, were also taken into consideration.

Joshi, B.; Morris, D.; White, N.; Unal, R.

1996-01-01
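
The approach described above can be sketched generically. In this illustration `fitness` stands in for a run of the discrete event simulation model, and the operators (truncation selection, one-point crossover, +/-1 mutation) are assumed choices, not the paper's exact configuration:

```python
import random

def genetic_minimize(fitness, n_vars, lo, hi, pop_size=20, gens=30, seed=1):
    """Genetic algorithm over integer-valued decision variables
    (e.g., resource levels), requiring no continuity or
    differentiability of the objective."""
    rng = random.Random(seed)
    pop = [[rng.randint(lo, hi) for _ in range(n_vars)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)             # evaluate via the "simulation"
        elite = pop[:pop_size // 2]       # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_vars)            # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_vars)                 # point mutation
            child[i] = min(hi, max(lo, child[i] + rng.choice((-1, 1))))
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)
```

For a stochastic simulation objective, each fitness evaluation would average several replications, which is one of the practical issues the paper discusses.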

33

On constructing optimistic simulation algorithms for the discrete event system specification

This article describes a Time Warp simulation algorithm for discrete event models that are described in terms of the Discrete Event System Specification (DEVS). The article shows how the total state transition and total output function of a DEVS atomic model can be transformed into an event processing procedure for a logical process. A specific Time Warp algorithm is constructed around this logical process, and it is shown that the algorithm correctly simulates a DEVS coupled model that consists entirely of interacting atomic models. The simulation algorithm is presented abstractly; it is intended to provide a basis for implementing efficient and scalable parallel algorithms that correctly simulate DEVS models.

Nutaro, James J [ORNL

2008-01-01

34

Reversible Discrete Event Formulation and Optimistic Parallel Execution of Vehicular Traffic Models

Vehicular traffic simulations are useful in applications such as emergency planning and traffic management. High speed of traffic simulations translates to speed of response and level of resilience in those applications. Discrete event formulation of traffic flow at the level of individual vehicles affords both the flexibility of simulating complex scenarios of vehicular flow behavior as well as rapid simulation time advances. However, efficient parallel/distributed execution of the models becomes challenging due to synchronization overheads. Here, a parallel traffic simulation approach is presented that is aimed at reducing the time for simulating emergency vehicular traffic scenarios. Our approach resolves the challenges that arise in parallel execution of microscopic, vehicular-level models of traffic. We apply a reverse computation-based optimistic execution approach to address the parallel synchronization problem. This is achieved by formulating a reversible version of a discrete event model of vehicular traffic, and by utilizing this reversible model in an optimistic execution setting. Three unique aspects of this effort are: (1) exploration of optimistic simulation applied to vehicular traffic simulation, (2) addressing reverse computation challenges specific to optimistic vehicular traffic simulation, and (3) achieving absolute (as opposed to self-relative) speedup with a sequential speed close to that of a fast, de facto standard sequential simulator for emergency traffic. The design and development of the parallel simulation system is presented, along with a performance study that demonstrates excellent sequential performance as well as parallel performance. The benefits of optimistic execution are demonstrated, including a speedup of nearly 20 on 32 processors, observed on a vehicular network of over 65,000 intersections and over 13 million vehicles.

Yoginath, Srikanth B [ORNL]; Perumalla, Kalyan S [ORNL]

2009-01-01
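
Reverse computation, the rollback mechanism this work relies on, can be shown in miniature. The traffic state below is a toy, not the paper's model: the point is that each handler records only the few bits needed to invert itself, instead of snapshotting state:

```python
class ReversibleIntersection:
    """Forward event handlers mutate state and push minimal undo
    information; a rollback simply replays inverses in reverse order,
    avoiding the memory cost of full state saving."""
    def __init__(self):
        self.queue_len = 0
        self.history = []               # minimal undo records

    def vehicle_arrives(self):          # forward event
        self.queue_len += 1
        self.history.append(("arrive",))

    def vehicle_departs(self):          # forward event with a branch
        departed = self.queue_len > 0
        if departed:
            self.queue_len -= 1
        self.history.append(("depart", departed))   # one saved bit

    def reverse(self):                  # undo the most recent event
        record = self.history.pop()
        if record[0] == "arrive":
            self.queue_len -= 1
        elif record[1]:                 # undo only if it really departed
            self.queue_len += 1
```

Only the branch outcome of the departure needs saving; the arrival is perfectly invertible with no saved state at all.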

35

Reversible Parallel Discrete Event Formulation of a TLM-based Radio Signal Propagation Model

Radio signal strength estimation is essential in many applications, including the design of military radio communications and industrial wireless installations. For scenarios with large or richly-featured geographical volumes, parallel processing is required to meet the memory and computation time demands. Here, we present a scalable and efficient parallel execution of the sequential model for radio signal propagation recently developed by Nutaro et al. Starting with that model, we (a) provide a vector-based reformulation that has significantly lower computational overhead for event handling, (b) develop a parallel decomposition approach that is amenable to reversibility with minimal computational overheads, (c) present a framework for transparently mapping the conservative time-stepped model into an optimistic parallel discrete event execution, (d) present a new reversible method, along with its analysis and implementation, for inverting the vector-based event model to be executed in an optimistic parallel style of execution, and (e) present performance results from implementation on Cray XT platforms. We demonstrate scalability, with the largest runs tested on up to 127,500 cores of a Cray XT5, enabling simulation of larger scenarios and with faster execution than reported before on the radio propagation model. This also represents the first successful demonstration of the ability to efficiently map a conservative time-stepped model to an optimistic discrete-event execution.

Seal, Sudip K [ORNL]; Perumalla, Kalyan S [ORNL]

2011-01-01

36

Discrete event simulation in an artificial intelligence environment: Some examples

Several Los Alamos National Laboratory (LANL) object-oriented discrete-event simulation efforts have been completed during the past three years. One of these systems has been put into production and has a growing customer base. Another (started two years earlier than the first project) was completed but has not yet been used. This paper will describe these simulation projects. Factors which were pertinent to the success of the one project, and to the failure of the second project will be discussed (success will be measured as the extent to which the simulation model was used as originally intended). 5 figs.

Roberts, D.J.; Farish, T.

1991-01-01

37

Discrete Event Modeling and Massively Parallel Execution of Epidemic Outbreak Phenomena

In complex phenomena such as epidemiological outbreaks, the intensity of inherent feedback effects and the significant role of transients in the dynamics make simulation the only effective method for proactive, reactive or post-facto analysis. The spatial scale, runtime speed, and behavioral detail needed in detailed simulations of epidemic outbreaks make it necessary to use large-scale parallel processing. Here, an optimistic parallel execution of a new discrete event formulation of a reaction-diffusion simulation model of epidemic propagation is presented to dramatically increase the fidelity and speed with which epidemiological simulations can be performed. Rollback support needed during optimistic parallel execution is achieved by combining reverse computation with a small amount of incremental state saving. Parallel speedup of over 5,500 and other runtime performance metrics of the system are observed with weak-scaling execution on a small (8,192-core) Blue Gene/P system, while scalability with a weak-scaling speedup of over 10,000 is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes exceeding several hundreds of millions of individuals in the largest cases are successfully exercised to verify model scalability.
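The rollback mechanism mentioned above, combining reverse computation with a small amount of incremental state saving, can be sketched roughly as follows (illustrative only; the state variables are invented for the example): invertible updates are undone arithmetically, while non-invertible ones, such as taking a maximum, save the old value instead.

```python
def step(state, undo):
    """One forward event on a toy epidemic state (hypothetical model)."""
    state["infected"] += 1                      # invertible: reverse by -1
    undo.append(lambda s: s.__setitem__("infected", s["infected"] - 1))
    old = state["peak"]                         # max() is not invertible,
    undo.append(lambda s, v=old: s.__setitem__("peak", v))  # so save old value
    state["peak"] = max(state["peak"], state["infected"])

def rollback(state, undo):
    """Undo every logged action in LIFO order."""
    while undo:
        undo.pop()(state)

s, log = {"infected": 0, "peak": 0}, []
for _ in range(5):
    step(s, log)
assert (s["infected"], s["peak"]) == (5, 5)
rollback(s, log)
assert (s["infected"], s["peak"]) == (0, 0)
```

Only the non-invertible portion of each event incurs state-saving cost, which is why the hybrid keeps rollback memory overhead small.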

Perumalla, Kalyan S [ORNL]; Seal, Sudip K [ORNL]

2011-01-01

38

Enhancing Complex System Performance Using Discrete-Event Simulation

In this paper, we utilize discrete-event simulation (DES) merged with human factors analysis to provide the venue within which the separation and deconfliction of the system/human operating principles can occur. A concrete example is presented to illustrate the performance enhancement gains for an aviation cargo flow and security inspection system achieved through the development and use of a process DES. The overall performance of the system is computed, analyzed, and optimized for the different system dynamics. Various performance measures are considered such as system capacity, residual capacity, and total number of pallets waiting for inspection in the queue. These metrics are performance indicators of the system's ability to service current needs and respond to additional requests. We studied and analyzed different scenarios by changing various model parameters such as the number of pieces per pallet ratio, number of inspectors and cargo handling personnel, number of forklifts, number and types of detection systems, inspection modality distribution, alarm rate, and cargo closeout time. The increased physical understanding resulting from execution of the queuing model utilizing these vetted performance measures identified effective ways to meet inspection requirements while maintaining or reducing overall operational cost and eliminating any shipping delays associated with any proposed changes in inspection requirements. With this understanding effective operational strategies can be developed to optimally use personnel while still maintaining plant efficiency, reducing process interruptions, and holding or reducing costs.

Allgood, Glenn O [ORNL]; Olama, Mohammed M [ORNL]; Lake, Joe E [ORNL]

2010-01-01

39

The main problem associated with comparing distributed discrete event simulation mechanisms is the need to base the comparisons on some common problem specification. This paper presents a specification strategy and language which allows the same simulation problem specification to be used for both distributed discrete event simulation mechanisms and the traditional single event list mechanism. This paper includes:
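For contrast with the distributed mechanisms, the traditional single event list mechanism referred to above can be sketched minimally (names and structure are illustrative, not taken from the paper): a priority queue of timestamped events, popped in timestamp order up to an end time.

```python
import heapq

def run(events, until):
    """events: iterable of (time, label) pairs; returns those fired in order."""
    pq = list(events)
    heapq.heapify(pq)       # the single event list, ordered by timestamp
    trace = []
    while pq and pq[0][0] <= until:
        clock, label = heapq.heappop(pq)
        # a full kernel would execute the event here, possibly
        # scheduling new (time, label) pairs back into pq
        trace.append((clock, label))
    return trace

trace = run([(2.0, "depart"), (1.0, "arrive")], until=10.0)
assert [t for t, _ in trace] == [1.0, 2.0]  # fired in timestamp order
```

A common problem specification then amounts to feeding the same event definitions to this sequential kernel and to a distributed one, so their results can be compared directly.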

Bruno R. Preiss; Wayne M. Loucks; V. Carl Hamacher

1988-01-01

40

Ordinal optimization concentrates on finding a subset of good designs, by approximately evaluating a parallel set of designs, and reduces the required simulation time dramatically for discrete-event simulation and optimization. The estimation of the confidence probability (CP) that the selected designs contain at least one good design is crucial to ordinal optimization. However, it is very difficult to estimate this
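The setting can be sketched as follows (an illustrative Monte Carlo estimate, not Chen's estimator; all names are invented): designs are ranked by noisy, approximate evaluations, a subset is selected, and CP is the probability that the selected subset contains at least one truly good design.

```python
import random

def estimate_cp(true_perf, noise, n_select, n_good, trials=2000, seed=1):
    """Monte Carlo estimate of P(selected subset contains a truly good design)."""
    rng = random.Random(seed)
    # the truly good designs: best n_good by true performance (smaller is better)
    good = set(sorted(range(len(true_perf)), key=lambda i: true_perf[i])[:n_good])
    hits = 0
    for _ in range(trials):
        # approximate (noisy) evaluation of every design in parallel
        noisy = [p + rng.gauss(0, noise) for p in true_perf]
        chosen = sorted(range(len(noisy)), key=lambda i: noisy[i])[:n_select]
        hits += bool(good & set(chosen))
    return hits / trials

# 20 hypothetical designs; with mild noise the CP of a top-5 selection is high
cp = estimate_cp(list(range(20)), noise=3.0, n_select=5, n_good=3)
assert 0.0 < cp <= 1.0
```

Estimating CP analytically, rather than by brute-force replication as here, is exactly the difficulty the abstract points to.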

Chun-Hung Chen

1996-01-01

41

Graphical object-oriented discrete-event simulation system

Building graphical models and running simulations directly on the models provides an attractive way of performing simulation. Most existing network-based simulation systems use graphical models to represent real world systems; however, users have to create separate simulation input files (usually text) to run the simulation. Inconsistency and mistakes between the two representations of a model are common and difficult to

Liang Y. Liu; Photios G. Ioannou

1992-01-01

42

Using a discrete event simulation program in an engineering probability and statistics course

The discrete event simulation software QUEST is being used in a sophomore level engineering probability and statistics course. This course is required for all engineering students in the school. Students learn about the basics of the QUEST operating environment and how to assemble and run simple simulation models. For more complex systems, the students modify and run existing simulation models

Robert Van Til; Michael Banachowski; Christian Wagner; Sankar Sengupta; Patrick Hillberg

2009-01-01

43

Non-item based tools: non-item based discrete-event simulation tools

Discrete event simulation has traditionally been defined by items (or entities). This modeling paradigm has served the simulation industry well, but falls far short for many industries in which the parts/pieces mindset simply does not accurately portray their particular processes. For the last ten years Simulation Dynamics has been working with industries where the item paradigm falls short

Richard A. Phelps; David J. Parsons; Andrew J. Siprelle

2002-01-01

44

Simjava: A Discrete Event Simulation Library For Java

simjava is a toolkit for building working models of complex systems. It is based around a discrete event simulation kernel and includes facilities for representing simulation objects as animated icons on screen. simjava simulations may be incorporated as "live diagrams" into web documents. This paper describes the design, component model, applications and future of simjava. Our motivation for writing simulations in Java was to allow "live diagrams" to

Fred Howell; Ross Mcnab

1998-01-01

45

Discrete-event based simulation conceptual modeling of systems biology

The production of protein from DNA via RNA is a very complicated process, known as the central dogma. In this paper, we used event-based simulation to model, simulate, analyze and specify the three main processes involved in protein production: replication, transcription, and translation. The whole control flow of event-based simulation is composed

Joe W. Yeol; Issac Barjis; Yeong S. Ryu; Joseph Barjis

2005-01-01

46

A graphical, intelligent interface for discrete-event simulations

This paper will present a prototype of an engagement analysis simulation tool. This simulation environment is to assist a user (analyst) in performing sensitivity analysis via the repeated execution of user-specified engagement scenarios. This analysis tool provides an intelligent front-end which is easy to use and modify. The intelligent front-end provides the capabilities to assist the user in the selection of appropriate scenario values. The incorporated graphics capabilities also provide additional insight into the simulation events as they are "unfolding." 4 refs., 4 figs.

Michelsen, C.; Dreicer, J.; Morgeson, D.

1988-01-01

47

Business process modelling and analysis using discrete-event simulation

Globalisation and competitive pressure urge many organisations to radically change business processes. Although this approach can provide significant benefits such as reducing costs or improving efficiency, there are substantial risks associated with it. Using simulation for modelling and analysis of business processes can reduce that risk and increase the chance for success of Business Process Re-engineering projects. This paper investigates

Vlatka Hlupic; Stewart Robinson

1998-01-01

48

A history of discrete event simulation programming languages

The history of simulation programming languages is organized as a progression in periods of similar developments. The five periods, spanning 1955-1986, are labeled: The Period Search (1955-1960); The Advent (1961-1965); The Formative Period (1966-1970); The Expansional Period (1971-1978), and The Period of Consolidation and Regeneration (1979-1986). The focus is on recognizing the people and places that have made important contributions

Richard E. Nance

1993-01-01

49

Improving the rigor of discrete-event simulation in logistics and supply chain research

Purpose – The purpose of this paper is to present an eight-step simulation model development process (SMDP) for the design, implementation, and evaluation of logistics and supply chain simulation models, and to identify rigor criteria for each step. Design/methodology/approach – An extensive review of literature is undertaken to identify logistics and supply chain studies that employ discrete-event simulation modeling. From

Ila Manuj; John T. Mentzer; Melissa R. Bowers

2009-01-01

50

Discrete event simulation of the Defense Waste Processing Facility (DWPF) analytical laboratory

A discrete event simulation of the Savannah River Site (SRS) Defense Waste Processing Facility (DWPF) analytical laboratory has been constructed in the GPSS language. It was used to estimate laboratory analysis times at process analytical hold points and to study the effect of sample number on those times. Typical results are presented for three different simulations representing increasing levels of complexity, and for different sampling schemes. Example equipment utilization time plots are also included. SRS DWPF laboratory management and chemists found the simulations very useful for resource and schedule planning.

Shanahan, K.L.

1992-02-01

51

Using discrete event simulation to design a more efficient hospital pharmacy for outpatients.

We present the findings of a discrete event simulation study of the hospital pharmacy outpatient dispensing systems at two London hospitals. Having created a model and established its face validity, we tested scenarios to estimate the likely impact of changes in prescription workload, staffing levels and skill-mix, and utilisation of the dispensaries' automatic dispensing robots. The scenarios were compared in terms of mean prescription turnaround times and percentage of prescriptions completed within 45 min. The findings are being used to support business cases for changes in staffing levels and skill-mix in response to changes in workload. PMID:21344201

Reynolds, Matthew; Vasilakis, Christos; McLeod, Monsey; Barber, Nicholas; Mounsey, Ann; Newton, Sue; Jacklin, Ann; Franklin, Bryony Dean

2011-09-01

52

Discrete-event simulation for the design and evaluation of physical protection systems

This paper explores the use of discrete-event simulation for the design and control of physical protection systems for fixed-site facilities housing items of significant value. It begins by discussing several modeling and simulation activities currently performed in designing and analyzing these protection systems and then discusses capabilities that design/analysis tools should have. The remainder of the article then discusses in detail how some of these new capabilities have been implemented in software to achieve a prototype design and analysis tool. The simulation software technology provides a communications mechanism between a running simulation and one or more external programs. In the prototype security analysis tool, these capabilities are used to facilitate human-in-the-loop interaction and to support a real-time connection to a virtual reality (VR) model of the facility being analyzed. This simulation tool can be used for both training (in real-time mode) and facility analysis and design (in fast mode).

Jordan, S.E.; Snell, M.K.; Madsen, M.M. [Sandia National Labs., Albuquerque, NM (United States)]; Smith, J.S.; Peters, B.A. [Texas A and M Univ., College Station, TX (United States). Industrial Engineering Dept.]

1998-08-01

53

A Framework for the Optimization of Discrete-Event Simulation Models

NASA Technical Reports Server (NTRS)

With the growing use of computer modeling and simulation in all aspects of engineering, the scope of traditional optimization has to be extended to include simulation models. Some unique aspects have to be addressed while optimizing via stochastic simulation models. The optimization procedure has to explicitly account for the randomness inherent in the stochastic measures predicted by the model. This paper outlines a general purpose framework for optimization of terminating discrete-event simulation models. The methodology combines a chance constraint approach for problem formulation, together with standard statistical estimation and analysis techniques. The applicability of the optimization framework is illustrated by minimizing the operation and support resources of a launch vehicle, through a simulation model.
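The chance-constraint idea can be sketched as follows (a hypothetical toy model, not the paper's launch-vehicle application): a resource level is accepted only if the fraction of simulation replications meeting a performance target exceeds a required probability.

```python
import random

def meets_target(resources, rng):
    """Stand-in for one terminating simulation replication (hypothetical
    model): more resources make a sub-unit completion time more likely."""
    return rng.expovariate(resources) < 1.0

def min_feasible(levels, alpha=0.9, reps=1000, seed=7):
    """Smallest resource level whose estimated success probability
    satisfies the chance constraint P(meet target) >= alpha."""
    rng = random.Random(seed)
    for r in sorted(levels):
        p_hat = sum(meets_target(r, rng) for _ in range(reps)) / reps
        if p_hat >= alpha:
            return r
    return None

# In this toy model P(meet target) = 1 - exp(-r): about 0.63 at r=1, 0.95 at r=3
assert min_feasible([1, 3, 5]) == 3
```

Replacing the hard constraint with a probability threshold is what lets the optimization tolerate the randomness in the stochastic measures the abstract mentions.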

Joshi, B. D.; Unal, R.; White, N. H.; Morris, W. D.

1996-01-01

54

DeMO: An Ontology for Discrete-event Modeling and Simulation

Several fields have created ontologies for their subdomains. For example, the biological sciences have developed extensive ontologies such as the Gene Ontology, which is considered a great success. Ontologies could provide similar advantages to the Modeling and Simulation community. They provide a way to establish common vocabularies and capture knowledge about a particular domain with community-wide agreement. Ontologies can support significantly improved (semantic) search and browsing, integration of heterogeneous information sources, and improved knowledge discovery capabilities. This paper discusses the design and development of an ontology for Modeling and Simulation called the Discrete-event Modeling Ontology (DeMO), and it presents prototype applications that demonstrate various uses and benefits that such an ontology may provide to the Modeling and Simulation community.

Silver, Gregory A; Miller, John A; Hybinette, Maria; Baramidze, Gregory; York, William S

2011-01-01

55

Parallel processing and simulation

Summary form only given. High instruction execution rates may be achieved through a collection of inexpensive processors operating in parallel. The harnessing of this raw computing power to discrete event simulation applications is an active area of research. Three major approaches to the problem of assigning computational tasks to processing elements may be identified: (1) model based assignment, (2) local function based assignment, and (3) global function based assignment.

John C. Comfort; David Jefferson; Y. V. Reddy; Paul Reynolds; Sallie Sheppard

1983-01-01

56

Statistical and Probabilistic Extensions to Ground Operations' Discrete Event Simulation Modeling

NASA Technical Reports Server (NTRS)

NASA's human exploration initiatives will invest in technologies, public/private partnerships, and infrastructure, paving the way for the expansion of human civilization into the solar system and beyond. As it has been for the past half century, the Kennedy Space Center will be the embarkation point for humankind's journey into the cosmos. Functioning as a next generation space launch complex, Kennedy's launch pads, integration facilities, processing areas, and launch and recovery ranges will bustle with the activities of the world's space transportation providers. In developing this complex, KSC teams work through the potential operational scenarios: conducting trade studies, planning and budgeting for expensive and limited resources, and simulating alternative operational schemes. Numerous tools, among them discrete event simulation (DES), were matured during the Constellation Program to conduct such analyses with the purpose of optimizing the launch complex for maximum efficiency, safety, and flexibility while minimizing life cycle costs. Discrete event simulation is a computer-based modeling technique for complex and dynamic systems where the state of the system changes at discrete points in time and whose inputs may include random variables. DES is used to assess timelines and throughput, and to support operability studies and contingency analyses. It is applicable to any space launch campaign and informs decision-makers of the effects of varying numbers of expensive resources and the impact of off-nominal scenarios on measures of performance. In order to develop representative DES models, methods were adopted, exploited, or created to extend traditional uses of DES. The Delphi method was adopted and utilized for task duration estimation. DES software was exploited for probabilistic event variation. A roll-up process was developed to reuse models and model elements in other, less-detailed models.
The DES team continues to innovate and expand DES capabilities to address KSC's planning needs.

Trocine, Linda; Cummings, Nicholas H.; Bazzana, Ashley M.; Rychlik, Nathan; LeCroy, Kenneth L.; Cates, Grant R.

2010-01-01

57

Developing Flexible Discrete Event Simulation Models in an Uncertain Policy Environment

NASA Technical Reports Server (NTRS)

On February 1st, 2010 U.S. President Barack Obama submitted to Congress his proposed budget request for Fiscal Year 2011. This budget included significant changes to the National Aeronautics and Space Administration (NASA), including the proposed cancellation of the Constellation Program. This change proved to be controversial, and Congressional approval of the program's official cancellation would take many months to complete. During this same period an end-to-end discrete event simulation (DES) model of Constellation operations was being built through the joint efforts of Productivity Apex Inc. (PAI) and Science Applications International Corporation (SAIC) teams under the guidance of NASA. The uncertainty regarding the Constellation program presented a major challenge to the DES team: how to continue developing this program-of-record simulation while at the same time remaining prepared for possible changes to the program. This required the team to rethink how it would develop its model and make it flexible enough to support possible future vehicles while at the same time being specific enough to support the program-of-record. This challenge was compounded by the fact that the model was being developed through the traditional DES process-orientation, which lacks the flexibility of object-oriented approaches. The team met this challenge through significant pre-planning that led to the "modularization" of the model's structure by identifying what was generic, finding natural logic break points, and standardizing the interlogic numbering system. This work resulted in a model that not only was ready to be easily modified to support any future rocket programs, but also was extremely structured and organized in a way that facilitated rapid verification.
This paper discusses in detail the process the team followed to build this model and the many advantages this method provides builders of traditional process-oriented discrete event simulations.

Miranda, David J.; Fayez, Sam; Steele, Martin J.

2011-01-01

58

NASA Astrophysics Data System (ADS)

Sudden Cardiac Death (SCD) is responsible for at least 180,000 deaths a year and incurs an average cost of $286 billion annually in the United States alone. Herein, we present a novel discrete event simulation model of SCD, which quantifies the chains of events associated with the formation, growth, and rupture of atheroma plaques, and the subsequent formation of clots, thrombosis and on-set of arrhythmias within a population. The predictions generated by the model are in good agreement both with results obtained from pathological examinations on the frequencies of three major types of atheroma, and with epidemiological data on the prevalence and risk of SCD. These model predictions allow for identification of interventions and importantly for the optimal time of intervention leading to high potential impact on SCD risk reduction (up to 8-fold reduction in the number of SCDs in the population) as well as the increase in life expectancy.

Andreev, Victor P.; Head, Trajen; Johnson, Neil; Deo, Sapna K.; Daunert, Sylvia; Goldschmidt-Clermont, Pascal J.

2013-05-01

59

Sudden Cardiac Death (SCD) is responsible for at least 180,000 deaths a year and incurs an average cost of $286 billion annually in the United States alone. Herein, we present a novel discrete event simulation model of SCD, which quantifies the chains of events associated with the formation, growth, and rupture of atheroma plaques, and the subsequent formation of clots, thrombosis and on-set of arrhythmias within a population. The predictions generated by the model are in good agreement both with results obtained from pathological examinations on the frequencies of three major types of atheroma, and with epidemiological data on the prevalence and risk of SCD. These model predictions allow for identification of interventions and importantly for the optimal time of intervention leading to high potential impact on SCD risk reduction (up to 8-fold reduction in the number of SCDs in the population) as well as the increase in life expectancy.

Andreev, Victor P.; Head, Trajen; Johnson, Neil; Deo, Sapna K.; Daunert, Sylvia; Goldschmidt-Clermont, Pascal J.

2013-01-01

60

The effects of indoor environmental exposures on pediatric asthma: a discrete event simulation model

Background In the United States, asthma is the most common chronic disease of childhood across all socioeconomic classes and is the most frequent cause of hospitalization among children. Asthma exacerbations have been associated with exposure to residential indoor environmental stressors such as allergens and air pollutants as well as numerous additional factors. Simulation modeling is a valuable tool that can be used to evaluate interventions for complex multifactorial diseases such as asthma but in spite of its flexibility and applicability, modeling applications in either environmental exposures or asthma have been limited to date. Methods We designed a discrete event simulation model to study the effect of environmental factors on asthma exacerbations in school-age children living in low-income multi-family housing. Model outcomes include asthma symptoms, medication use, hospitalizations, and emergency room visits. Environmental factors were linked to percent predicted forced expiratory volume in 1 second (FEV1%), which in turn was linked to risk equations for each outcome. Exposures affecting FEV1% included indoor and outdoor sources of NO2 and PM2.5, cockroach allergen, and dampness as a proxy for mold. Results Model design parameters and equations are described in detail. We evaluated the model by simulating 50,000 children over 10 years and showed that pollutant concentrations and health outcome rates are comparable to values reported in the literature. In an application example, we simulated what would happen if the kitchen and bathroom exhaust fans were improved for the entire cohort, and showed reductions in pollutant concentrations and healthcare utilization rates. Conclusions We describe the design and evaluation of a discrete event simulation model of pediatric asthma for children living in low-income multi-family housing. 
Our model simulates the effect of environmental factors (combustion pollutants and allergens), medication compliance, seasonality, and medical history on asthma outcomes (symptom-days, medication use, hospitalizations, and emergency room visits). The model can be used to evaluate building interventions and green building construction practices on pollutant concentrations, energy savings, and asthma healthcare utilization costs, and demonstrates the value of a simulation approach for studying complex diseases such as asthma.

2012-01-01

61

A statistical process control approach to selecting a warm-up period for a discrete-event simulation

The selection of a warm-up period for a discrete-event simulation continues to be problematic. A variety of selection methods have been devised, and are briefly reviewed. It is apparent that no one method can be recommended above any other. A new approach, based upon the principles of statistical process control, is described (SPC method). Because simulation output data are often
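One way to apply control-chart thinking to warm-up selection can be sketched as follows (a simplified illustration, not Robinson's exact SPC method): control limits are computed from the presumed-steady tail of the output series, and the warm-up ends at the last observation falling outside those limits.

```python
from statistics import mean, stdev

def warmup_end(series, tail_frac=0.5, k=3.0):
    """Return the number of initial observations to discard as warm-up,
    using k-sigma control limits estimated from the tail of the series."""
    tail = series[int(len(series) * tail_frac):]
    mu, sigma = mean(tail), stdev(tail)
    last_out = -1
    for i, y in enumerate(series):
        if abs(y - mu) > k * sigma:   # outside the control limits
            last_out = i
    return last_out + 1

# synthetic output: a transient decaying toward a steady level around 10
data = [30.0, 22.0, 16.0, 12.0] + [10.0 + 0.1 * ((-1) ** i) for i in range(40)]
assert warmup_end(data) == 4
```

The appeal of the SPC framing is that "in control" gives an operational definition of steady state; the weakness, as with the methods the paper reviews, is its sensitivity to how the limits are estimated.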

Stewart Robinson

2007-01-01

62

Discrete-event simulation of a wide-area health care network.

OBJECTIVE: Predict the behavior and estimate the telecommunication cost of a wide-area message store-and-forward network for health care providers that uses the telephone system. DESIGN: A tool with which to perform large-scale discrete-event simulations was developed. Network models for star and mesh topologies were constructed to analyze the differences in performances and telecommunication costs. The distribution of nodes in the network models approximates the distribution of physicians, hospitals, medical labs, and insurers in the Province of Saskatchewan, Canada. Modeling parameters were based on measurements taken from a prototype telephone network and a survey conducted at two medical clinics. Simulation studies were conducted for both topologies. RESULTS: For either topology, the telecommunication cost of a network in Saskatchewan is projected to be less than $100 (Canadian) per month per node. The estimated telecommunication cost of the star topology is approximately half that of the mesh. Simulations predict that a mean end-to-end message delivery time of two hours or less is achievable at this cost. A doubling of the data volume results in an increase of less than 50% in the mean end-to-end message transfer time. CONCLUSION: The simulation models provided an estimate of network performance and telecommunication cost in a specific Canadian province. At the expected operating point, network performance appeared to be relatively insensitive to increases in data volume. Similar results might be anticipated in other rural states and provinces in North America where a telephone-based network is desired.

McDaniel, J G

1995-01-01

63

Discrete event simulation tool for analysis of qualitative models of continuous processing systems

NASA Technical Reports Server (NTRS)

An artificial intelligence design and qualitative modeling tool is disclosed for creating computer models and simulating continuous activities, functions, and/or behavior using developed discrete event techniques. Conveniently, the tool is organized in four modules: library design module, model construction module, simulation module, and experimentation and analysis. The library design module supports the building of library knowledge including component classes and elements pertinent to a particular domain of continuous activities, functions, and behavior being modeled. The continuous behavior is defined discretely with respect to invocation statements, effect statements, and time delays. The functionality of the components is defined in terms of variable cluster instances, independent processes, and modes, further defined in terms of mode transition processes and mode dependent processes. Model construction utilizes the hierarchy of libraries and connects them with appropriate relations. The simulation executes a specialized initialization routine and executes events in a manner that includes selective inherency of characteristics through a time and event schema until the event queue in the simulator is emptied. The experimentation and analysis module supports analysis through the generation of appropriate log files and graphics developments and includes the ability of log file comparisons.

Malin, Jane T. (inventor); Basham, Bryan D. (inventor); Harris, Richard A. (inventor)

1990-01-01

64

Towards High Performance Discrete-Event Simulations of Smart Electric Grids

Future electric grid technology is envisioned on the notion of a smart grid in which responsive end-user devices play an integral part of the transmission and distribution control systems. Detailed simulation is often the primary choice in analyzing small network designs, and the only choice in analyzing large-scale electric network designs. Here, we identify and articulate the high-performance computing needs underlying high-resolution discrete event simulation of smart electric grid operation in large network scenarios such as the entire Eastern Interconnect. We focus on the simulator's most computationally intensive operation, namely, the dynamic numerical solution for the electric grid state, for both time-integration and event-detection. We explore solution approaches using general-purpose dense and sparse solvers, and propose a scalable solver specialized for the sparse structures of actual electric networks. Based on experiments with an implementation in the THYME simulator, we identify performance issues and possible solution approaches for smart grid experimentation in the large.

Perumalla, Kalyan S [ORNL]; Nutaro, James J [ORNL]; Yoginath, Srikanth B [ORNL]

2011-01-01

65

Performance Prediction of Large Parallel Applications using Parallel Simulations

Accurate simulation of large parallel applications can be facilitated with the use of direct execution and parallel discrete event simulation. This paper describes the use of COMPASS, a direct execution-driven, parallel simulator for performance prediction of programs that include both communication and I\\/O intensive applications. The simulator has been used to predict the performance of such applications on both distributed

Rajive Bagrodia; Ewa Deeljman; Steven Docy; Thomas Phan

1999-01-01

66

This study uses a simulation model as a tool for strategic capacity planning for an outpatient physical therapy clinic in Taipei, Taiwan. The clinic provides a wide range of physical treatments, with 6 full-time therapists in each session. We constructed a discrete-event simulation model to study the dynamics of patient mixes with realistic treatment plans, and to estimate the practical capacity of the physical therapy room. The changes in time-related and space-related performance measurements were used to evaluate the impact of various strategies on the capacity of the clinic. The simulation results confirmed that the clinic is extremely patient-oriented, with a bottleneck occurring at the traction units for Intermittent Pelvic Traction (IPT), with usage at 58.9%. Sensitivity analysis showed that attending to more patients would significantly increase the number of patients staying for overtime sessions. We found that pooling the therapists produced beneficial results. The average waiting time per patient could be reduced by 45% when we pooled 2 therapists. We found that treating up to 12 new patients per session had no significantly negative impact on returning patients. Moreover, we found that the average waiting time for new patients decreased if they were given priority over returning patients when called by the therapists. PMID:23525907

Rau, Chi-Lun; Tsai, Pei-Fang Jennifer; Liang, Sheau-Farn Max; Tan, Jhih-Cian; Syu, Hong-Cheng; Jheng, Yue-Ling; Ciou, Ting-Syuan; Jaw, Fu-Shan

2013-12-01

67

Parallel processing and simulation

Summary form only given. High instruction execution rates may be achieved through a number of inexpensive processors operating in parallel. The harnessing of this raw computing power to discrete event simulation applications is an active area of research. Three major approaches to the problem of assigning computational tasks to processing elements may be identified: (1) model based assignment, (2) local function based assignment, and (3) global function based assignment.

Comfort, J.C.

1983-01-01

68

A Discrete-Event Simulation Framework for the Validation of Agent-based and Multi-Agent Systems

Simulation of agent-based systems is an inherent requirement of the development process which provides developers with a powerful means to validate both agents' dynamic behavior and the agent system as a whole and investigate the implications of alternative architectures and coordination strategies. In this paper, we present a discrete-event simulation framework which supports the validation activity of agent-based and multi-

Giancarlo Fortino; Alfredo Garro; Wilma Russo

2005-01-01

69

Large-scale emergencies require groups of response personnel to seek and handle information from an evolving range of sources in order to meet an evolving set of goals, often under conditions of high risk. Because emergencies induce time constraint, efforts spent on planning activities reduce the time available for execution activities. This paper discusses the design and implementation of a discrete-event

Qing Gu; David Mendonça

2006-01-01

70

Combining Simulation and Animation of Queueing Scenarios in a Flash-Based Discrete Event Simulator

eLearning is an effective medium for delivering knowledge and skills to scattered learners. In spite of improvements in electronic delivery technologies, eLearning is still a long way from offering anything close to efficient and effective learning environments. Much of the literature suggests that eLearning materials which embed simulation will improve the eLearning experience and promise many benefits for both teachers and

Ruzelan Khalid; Wolfgang Kreutzer; Tim Bell

2009-01-01

71

This report outlines a methodology to study the effects of disruptive events on nuclear waste material in stable geologic sites. The methodology is based upon developing a discrete events model that can be simulated on the computer. This methodology allows a natural development of simulation models that use computer resources in an efficient manner. Accurate modeling in this area depends in large part upon accurate modeling of ion transport behavior in the storage media. Unfortunately, developments in this area are not at a stage where there is any consensus on proper models for such transport. Consequently, our work is directed primarily towards showing how disruptive events can be properly incorporated in such a model, rather than as a predictive tool at this stage. When and if proper geologic parameters can be determined, then it would be possible to use this as a predictive model. Assumptions and their bases are discussed, and the mathematical and computer model are described.

Aggarwal, S.; Ryland, S.; Peck, R.

1980-06-19

72

Introduction to military training simulation: a guide for discrete event simulationists

An overview of military training simulation in the form of an introductory tutorial is provided. Basic terminology is introduced, and current trends and research focus in the military training simulation domain are described.

Ernest H. Page; Roger Smith

1998-01-01

73

Using X-machines to model and test discrete event simulation programs

Simulation is a powerful technique for the study of real-life complex problems. Simulation requires the development of a program which mimics the problem under consideration. The simulation program development follows closely the software engineering process. Although many techniques for modelling problems have been proposed, the testing phase of the developed programs has been under-discussed. In this

E. Kehris; G. Eleftherakis; P. Kefalas

2000-01-01

74

Discrete event fuzzy airport control

A discrete event simulation that uses a modified expert system as a controller is described. Fuzzy logic concepts from analog controllers are applied in the expert system controller to mimic human control of an airport, modeled with a combined discrete and continuous state space. The controller is adaptive, so rule confidences are automatically varied to achieve near-optimum system performance.

John R. Clymer; Philip D. Corey; Judith A. Gardner

1992-01-01

75

PID controller tuning for mixed continuous/discrete event processes using dynamic simulation

The paper discusses the use of dynamic simulation as a PID controller tuning tool for application with real plant or simulated worksheet data. The case study presented is but one example of a broad class of problems in control of processes with multi-loop interactions and combined discrete/continuous processes where these tools and methodologies may be helpful. The described example centers

Charlie Kern; Michael Manness

1997-01-01

76

Simulation modeling combined with decision control can offer important benefits for analysis, design, and operation of semiconductor supply-chain network systems. Detailed simulation of physical processes provides information for its controller to account for (expected) stochasticity present in the manufacturing processes. In turn, the controller can provide (near) optimal decisions for the operation of the processes and thus handle uncertainty in

Hessam S. Sarjoughian; Dongping Huang; Gary W. Godding; Karl G. Kempf; Wenlin Wang; Daniel E. Rivera; Hans D. Mittelmann

2005-01-01

77

Discrete Event Simulation and Social Science: The XeriScape Artificial Society

Social scientists use artificial society simulations to explore complex behaviors that result from the interaction of agents. One such artificial society is Epstein and Axtell's Sugarscape simulation. In Sugarscape, agents are born, eat 'sugar', travel, reproduce, and die in a torus-shaped virtual world. Sugarscape uses simple rules to create a virtual society where agent interactions aggregate to form surprisingly complicated

Gordon C. Zaft; Bernard P. Zeigler

78

Background Computer simulation studies of the emergency department (ED) are often patient driven and consider the physician as a human resource whose primary activity is interacting directly with the patient. In many EDs, physicians supervise delegates such as residents, physician assistants and nurse practitioners each with different skill sets and levels of independence. The purpose of this study is to present an alternative approach where physicians and their delegates in the ED are modeled as interacting pseudo-agents in a discrete event simulation (DES) and to compare it with the traditional approach ignoring such interactions. Methods The new approach models a hierarchy of heterogeneous interacting pseudo-agents in a DES, where pseudo-agents are entities with embedded decision logic. The pseudo-agents represent a physician and delegate, where the physician plays a senior role to the delegate (i.e. treats high acuity patients and acts as a consult for the delegate). A simple model without the complexity of the ED is first created in order to validate the building blocks (programming) used to create the pseudo-agents and their interaction (i.e. consultation). Following validation, the new approach is implemented in an ED model using data from an Ontario hospital. Outputs from this model are compared with outputs from the ED model without the interacting pseudo-agents. They are compared based on physician and delegate utilization, patient waiting time for treatment, and average length of stay. Additionally, we conduct sensitivity analyses on key parameters in the model. Results In the hospital ED model, comparisons between the approach with interaction and without showed physician utilization increase from 23% to 41% and delegate utilization increase from 56% to 71%. Results show statistically significant mean time differences for low acuity patients between models. 
Interaction time between physician and delegate results in increased ED length of stay and longer waits for beds. Conclusion This example shows the importance of accurately modeling physician relationships and the roles in which they treat patients. Neglecting these relationships could lead to inefficient resource allocation due to inaccurate estimates of physician and delegate time spent on patient related activities and length of stay.

2013-01-01

79

A methodology for fabrication of intelligent discrete-event simulation models

In this article a meta-specification for the software requirements and design of intelligent discrete next-event simulation models has been presented. The specification is consistent with established practices for software development as presented in the software engineering literature. The specification has been adapted to take into consideration the specialized needs of object-oriented programming resulting in the actor-centered taxonomy. The heart of the meta-specification is the methodology for requirements specification and design specification of the model. The software products developed by use of the methodology proposed herein are at the leading edge of technology in two very synergistic disciplines - expert systems and simulation. By incorporating simulation concepts into expert systems a deeper reasoning capability is obtained - one that is able to emulate the dynamics or behavior of the object system or process over time. By including expert systems concepts into simulation, the capability to emulate the reasoning functions of decision-makers involved with (and subsumed by) the object system is attained. In either case the robustness of the technology is greatly enhanced.

Morgeson, J.D.; Burns, J.R.

1987-01-01

80

Forest biomass supply logistics for a power plant using the discrete-event simulation approach

This study investigates the logistics of supplying forest biomass to a potential power plant. Due to the complexities in such a supply logistics system, a simulation model based on the framework of Integrated Biomass Supply Analysis and Logistics (IBSAL) is developed in this study to evaluate the cost of delivered forest biomass, the equilibrium moisture content, and carbon emissions from the logistics operations. The model is applied to a proposed 300 MW power plant in Quesnel, BC, Canada. The results show that the biomass demand of the power plant would not be met every year. The weighted average cost of delivered biomass to the gate of the power plant is about C$ 90 per dry tonne. Estimates of the equilibrium moisture content of delivered biomass and of CO2 emissions resulting from the processes are also provided.

Mobini, Mahdi [University of British Columbia, Vancouver; Sowlati, T. [University of British Columbia, Vancouver; Sokhansanj, Shahabaddine [ORNL

2011-04-01

81

Objective Develop and validate particular, concrete, and abstract yet plausible in silico mechanistic explanations for large intra- and interindividual variability observed for eleven bioequivalence study participants. Do so in the face of considerable uncertainty about mechanisms. Methods We constructed an object-oriented, discrete event model called subject (we use small caps to distinguish computational objects from their biological counterparts). It maps abstractly to a dissolution test system and study subject to whom product was administered orally. A subject comprises four interconnected grid spaces and event mechanisms that map to different physiological features and processes. Drugs move within and between spaces. We followed an established, Iterative Refinement Protocol. Individualized mechanisms were made sufficiently complicated to achieve prespecified Similarity Criteria, but no more so. Within subjects, the dissolution space is linked to both a product-subject Interaction Space and the GI tract. The GI tract and Interaction Space connect to plasma, from which drug is eliminated. Results We discovered parameterizations that enabled the eleven subject simulation results to achieve the most stringent Similarity Criteria. Simulated profiles closely resembled those with normal, odd, and double peaks. We observed important subject-by-formulation interactions within subjects. Conclusion We hypothesize that there were interactions within bioequivalence study participants corresponding to the subject-by-formulation interactions within subjects. Further progress requires methods to transition currently abstract subject mechanisms iteratively and parsimoniously to be more physiologically realistic. As that objective is achieved, the approach presented is expected to become beneficial to drug development (e.g., controlled release) and to a reduction in the number of subjects needed per study plus faster regulatory review.

2012-01-01

82

Part 2 of the paper presents the neural network as a meta-model for one-dimensional fibrous materials, which was elaborated on the basis of a discrete-event simulation model that was presented in Part 1 of the paper. The architecture of the full-size and part-size network was developed and tested. The training set for the neural networks was obtained from the simulation

Arkady Cherkassky

2011-01-01

83

Parallelized direct execution simulation of message-passing parallel programs

NASA Technical Reports Server (NTRS)

As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

1994-01-01

84

Symbolic discrete event system specification

NASA Technical Reports Server (NTRS)

Extending discrete event modeling formalisms to facilitate greater symbol manipulation capabilities is important to further their use in intelligent control and design of high autonomy systems. An extension to the DEVS formalism that facilitates symbolic expression of event times by extending the time base from the real numbers to the field of linear polynomials over the reals is defined. A simulation algorithm is developed to generate the branching trajectories resulting from the underlying nondeterminism. To efficiently manage symbolic constraints, a consistency checking algorithm for linear polynomial constraints based on feasibility checking algorithms borrowed from linear programming has been developed. The extended formalism offers a convenient means to conduct multiple, simultaneous explorations of model behaviors. Examples of application are given with concentration on fault model analysis.

Zeigler, Bernard P.; Chi, Sungdo

1992-01-01

85

Simulating Billion-Task Parallel Programs

In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.

Perumalla, Kalyan S [ORNL] [ORNL; Park, Alfred J [ORNL] [ORNL

2014-01-01

86

M/G/C/C state-dependent queuing networks consider service rates as a function of the number of residing entities (e.g., pedestrians, vehicles, and products). However, modeling such dynamic rates is not supported in modern discrete event simulation (DES) software. We designed an approach to cater for this limitation and used it to construct the M/G/C/C state-dependent queuing model in Arena software. Using the model, we have evaluated and analyzed the impacts of various arrival rates on the throughput, the blocking probability, the expected service time and the expected number of entities in a complex network topology. Results indicated that there is a range of arrival rates for each network where the simulation results fluctuate drastically across replications, causing the simulation results and analytical results to exhibit discrepancies. Detailed results that show how the simulation results tally with the analytical results, in both tabular and graphical forms, together with some scientific justifications, have been documented and discussed.

Khalid, Ruzelan; M. Nawawi, Mohd Kamal; Kawsar, Luthful A.; Ghani, Noraida A.; Kamil, Anton A.; Mustafa, Adli

2013-01-01
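
The occupancy-dependent service rate at the heart of an M/G/C/C state-dependent queue can be illustrated with a minimal discrete-event simulation. This is a sketch only: the linear slowdown function, the capacity, and the rate parameters below are illustrative assumptions, not values from the study, and each service time is sampled once at admission (a simplification of the true state-dependent dynamics).

```python
import heapq
import random

random.seed(1)

CAPACITY = 20          # blocking limit: arrivals are lost when occupancy == CAPACITY
ARRIVAL_RATE = 2.0     # Poisson arrival rate (illustrative assumption)
BASE_RATE = 1.5        # service rate seen by a lone entity (illustrative assumption)

def service_rate(n):
    # Assumed linear congestion effect: service slows as occupancy n grows.
    return BASE_RATE * (CAPACITY - n + 1) / CAPACITY

def simulate(horizon=10_000.0):
    t, occupancy = 0.0, 0
    served = blocked = arrived = 0
    events = [(random.expovariate(ARRIVAL_RATE), "arrival")]  # (time, kind) min-heap
    while events:
        t, kind = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "arrival":
            arrived += 1
            heapq.heappush(events, (t + random.expovariate(ARRIVAL_RATE), "arrival"))
            if occupancy < CAPACITY:
                occupancy += 1
                # Simplification: service time sampled once, at admission,
                # using the occupancy at that moment.
                heapq.heappush(
                    events, (t + random.expovariate(service_rate(occupancy)), "departure"))
            else:
                blocked += 1   # arrival blocked: the M/G/C/C loss behavior
        else:
            occupancy -= 1
            served += 1
    return served, blocked, arrived
```

Running `simulate()` over several replications with different seeds is how one would observe the across-replication fluctuation the abstract reports for certain arrival-rate ranges.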

87

Background Recent reforms in Portugal aimed at strengthening the role of the primary care system, in order to improve the quality of the health care system. Since 2006 new policies aiming to change the organization, incentive structures and funding of the primary health care sector were designed, promoting the evolution of traditional primary health care centres (PHCCs) into a new type of organizational unit - family health units (FHUs). This study aimed to compare performances of PHCC and FHU organizational models and to assess the potential gains from converting PHCCs into FHUs. Methods Stochastic discrete event simulation models for the two types of organizational models were designed and implemented using Simul8 software. These models were applied to data from nineteen primary care units in three municipalities of the Greater Lisbon area. Results The conversion of PHCCs into FHUs seems to have the potential to generate substantial improvements in productivity and accessibility, while not having a significant impact on costs. This conversion might entail a 45% reduction in the average number of days required to obtain a medical appointment and a 7% and 9% increase in the average number of medical and nursing consultations, respectively. Conclusions Reorganization of PHCC into FHUs might increase accessibility of patients to services and efficiency in the provision of primary care services.

2011-01-01

88

Objective To assess the budgetary impact of switching from screen-film mammography to full-field digital mammography in a population-based breast cancer screening program. Methods A discrete-event simulation model was built to reproduce the breast cancer screening process (biennial mammographic screening of women aged 50 to 69 years) combined with the natural history of breast cancer. The simulation started with 100,000 women and, during a 20-year simulation horizon, new women were dynamically entered according to the aging of the Spanish population. Data on screening were obtained from Spanish breast cancer screening programs. Data on the natural history of breast cancer were based on US data adapted to our population. A budget impact analysis comparing digital with screen-film screening mammography was performed in a sample of 2,000 simulation runs. A sensitivity analysis was performed for crucial screening-related parameters. Distinct scenarios for recall and detection rates were compared. Results Statistically significant savings were found for overall costs, treatment costs and the costs of additional tests in the long term. The overall cost saving was 1,115,857€ (95%CI from 932,147 to 1,299,567) in the 10th year and 2,866,124€ (95%CI from 2,492,610 to 3,239,638) in the 20th year, representing 4.5% and 8.1% of the overall cost associated with screen-film mammography. The sensitivity analysis showed net savings in the long term. Conclusions Switching to digital mammography in a population-based breast cancer screening program saves long-term budget expense, in addition to providing technical advantages. Our results were consistent across distinct scenarios representing the different results obtained in European breast cancer screening programs.

Comas, Merce; Arrospide, Arantzazu; Mar, Javier; Sala, Maria; Vilaprinyo, Ester; Hernandez, Cristina; Cots, Francesc; Martinez, Juan; Castells, Xavier

2014-01-01

89

On extending parallelism to serial simulators

NASA Technical Reports Server (NTRS)

This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops submodels using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permits automatically synchronized communication between submodels; however, the automation requires that any such communication must take a nonzero amount of simulation time. Within this modeling paradigm, a variety of conservative synchronization protocols can transparently support conservative execution of submodels on potentially different processors. A specific implementation of this approach, U.P.S. (Utilitarian Parallel Simulator), is described, along with performance results on the Intel Paragon.

Nicol, David; Heidelberger, Philip

1994-01-01

90

Inflated speedups in parallel simulations via malloc()

NASA Technical Reports Server (NTRS)

Discrete-event simulation programs make heavy use of dynamic memory allocation in order to support simulation's very dynamic space requirements. When programming in C one is likely to use the malloc() routine. However, a parallel simulation which uses the standard Unix System V malloc() implementation may achieve an overly optimistic speedup, possibly superlinear. An alternate implementation provided on some (but not all) systems can avoid the speedup anomaly, but at the price of significantly reduced available free space. This is especially severe on most parallel architectures, which tend not to support virtual memory. It is shown how a simply implemented user-constructed interface to malloc() can both avoid artificially inflated speedups and make efficient use of the dynamic memory space. The interface simply caches blocks on the basis of their size. The problem is demonstrated empirically, and the effectiveness of the solution is shown both empirically and analytically.

Nicol, David M.

1990-01-01
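
The core idea of the interface described above, keeping freed blocks in per-size free lists so that repeated allocations of a common size are reused without searching the backing allocator, can be sketched as follows. This is a toy model of the concept, not Nicol's actual C interface; the class and counter names are invented for illustration.

```python
from collections import defaultdict

class SizeBinnedPool:
    """Toy allocator cache: freed blocks are kept in per-size free lists,
    mimicking a user-level wrapper over malloc()/free()."""

    def __init__(self):
        self.free_lists = defaultdict(list)   # size -> stack of reusable blocks
        self.fresh = 0                        # blocks taken from the backing allocator
        self.reused = 0                       # blocks served from a free list

    def alloc(self, size):
        bin_ = self.free_lists[size]
        if bin_:
            self.reused += 1
            return bin_.pop()                 # O(1) reuse, no allocator search
        self.fresh += 1
        return bytearray(size)                # stands in for malloc(size)

    def free(self, block):
        # Returned blocks go to the free list for their exact size.
        self.free_lists[len(block)].append(block)

pool = SizeBinnedPool()
a = pool.alloc(64)
pool.free(a)
b = pool.alloc(64)   # served from the 64-byte free list, not the backing allocator
```

Because event records in a simulator tend to come in a few fixed sizes, nearly every allocation after warm-up hits a free list, which is what makes the cost of allocation uniform across processors and keeps measured speedups honest.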

91

Optimistic synchronization protocols for parallel discrete event simulation employ rollback techniques to ensure causally consistent execution of simulation events. Although event-preemptive rollback (i.e., rollback based on timely interruption of event execution upon the arrival of a message revealing a causality inconsistency) is recognized as an approach for increasing the performance and tackling run-time anomalies of this type of synchronization

Andrea Santoro; Francesco Quaglia

2005-01-01
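
The rollback mechanism these optimistic protocols rely on can be sketched minimally: save a state snapshot per executed event, and when a straggler message arrives in the simulated past, restore the latest snapshot taken before its timestamp and re-execute. This sketch is an assumption-laden illustration (the state update and class names are invented), and it omits anti-messages and re-delivery of rolled-back events.

```python
class TimeWarpLP:
    """Sketch of optimistic execution with rollback. State snapshots are saved
    per event; a straggler (message in the simulated past) restores the latest
    snapshot taken strictly before its timestamp and re-executes from there."""

    def __init__(self):
        self.lvt = 0.0                    # local virtual time
        self.state = 0
        self.snapshots = [(0.0, 0)]       # (timestamp, copy of state)
        self.executed = []                # timestamps processed, in commit order

    def execute(self, timestamp):
        self.lvt = timestamp
        self.state += 1                   # stand-in for real event handling
        self.snapshots.append((timestamp, self.state))
        self.executed.append(timestamp)

    def on_message(self, timestamp):
        if timestamp < self.lvt:          # straggler: causality violated
            # Roll back to the latest snapshot strictly before the straggler.
            while self.snapshots[-1][0] >= timestamp:
                self.snapshots.pop()
            self.lvt, self.state = self.snapshots[-1]
            self.executed = [t for t in self.executed if t < timestamp]
        self.execute(timestamp)

lp = TimeWarpLP()
for t in (1.0, 2.0, 4.0):
    lp.on_message(t)
lp.on_message(3.0)   # straggler: rolls back past t=4.0, then executes at t=3.0
```

A full Time Warp engine would additionally send anti-messages to cancel the effects of the rolled-back event at t=4.0 and re-execute it later; event-preemptive rollback, the subject of the paper above, further interrupts an event that is mid-execution when the straggler arrives.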

92

Parallel Atomistic Simulations

Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.

HEFFELFINGER,GRANT S.

2000-01-18

93

Validation of a language measure for discrete event supervisory control

This paper validates a signed real measure of regular languages and analyzes its robustness by simulation experiments. The simulation scenario consists of four different finite state automaton models of a fighter airplane, each under control of four different discrete-event supervisors. The results of simulation experiments reveal that the language measure is a usable

T. Ortogero; A. Ray; S. Phoha

2003-01-01

94

Automated Trace Analysis of Discrete-Event System Models

In this paper, we describe a novel technique that helps a modeler gain insight into the dynamic behavior of a complex stochastic discrete event simulation model based on trace analysis. We propose algorithms to distinguish progressive from repetitive behavior in a trace and to extract a minimal progressive fragment of a trace. The implied combinatorial optimization problem for trace reduction

Peter Kemper; Carsten Tepper

2009-01-01

95

Scaling Time Warp-based Discrete Event Execution to 10^{4} Processors on Blue Gene Supercomputer

Lately, important large-scale simulation applications, such as emergency/event planning and response, are emerging that are based on discrete event models. The applications are characterized by their scale (several millions of simulated entities), their fine-grained nature of computation (microseconds per event), and their highly dynamic inter-entity event interactions. The desired scale and speed together call for highly scalable parallel discrete event simulation (PDES) engines. However, few such parallel engines have been designed or tested on platforms with thousands of processors. Here an overview is given of a unique PDES engine that has been designed to support Time Warp-style optimistic parallel execution as well as a more generalized mixed, optimistic-conservative synchronization. The engine is designed to run on massively parallel architectures with minimal overheads. A performance study of the engine is presented, including the first results to date of PDES benchmarks demonstrating scalability to as many as 16,384 processors, on an IBM Blue Gene supercomputer. The results show, for the first time, the promise of effectively sustaining very large scale discrete event execution on up to 10^{4} processors.

Perumalla, Kalyan S [ORNL

2007-01-01

96

Using Trace Theory to Model Discrete Events.

National Technical Information Service (NTIS)

Discrete processes are defined by means of trace structures. Every symbol in a trace denotes (the occurrence of) a discrete event. The trace alphabet is split into two disjoint sets, one denoting the communication events, the other denoting the exogenous ...

R. Smedinga

1987-01-01

97

An algebra of discrete event processes

NASA Technical Reports Server (NTRS)

This report deals with an algebraic framework for modeling and control of discrete event processes. The report consists of two parts. The first part is introductory, and consists of a tutorial survey of the theory of concurrency in the spirit of Hoare's CSP, and an examination of the suitability of such an algebraic framework for dealing with various aspects of discrete event control. To this end a new concurrency operator is introduced and it is shown how the resulting framework can be applied. It is further shown that a suitable theory that deals with the new concurrency operator must be developed. In the second part of the report the formal algebra of discrete event control is developed. At the present time the second part of the report is still an incomplete and occasionally tentative working paper.

Heymann, Michael; Meyer, George

1991-01-01

98

Optimal Discrete Event Supervisory Control of Aircraft Gas Turbine Engines

NASA Technical Reports Server (NTRS)

This report presents an application of the recently developed theory of optimal Discrete Event Supervisory (DES) control that is based on a signed real measure of regular languages. The DES control techniques are validated on an aircraft gas turbine engine simulation test bed. The test bed is implemented on a networked computer system in which two computers operate in the client-server mode. Several DES controllers have been tested for engine performance and reliability.

Litt, Jonathan (Technical Monitor); Ray, Asok

2004-01-01

99

Diagnosability of discrete-event systems

Fault detection and isolation is a crucial and challenging task in the automatic control of large complex systems. We propose a discrete-event system (DES) approach to the problem of failure diagnosis. We introduce two related notions of diagnosability of DES's in the framework of formal languages and compare diagnosability with the related notions of observability and invertibility. We present a

M. Sampath; R. Sengupta; S. Lafortune; K. Sinnamohideen; D. Teneketzis

1995-01-01

100

Discrete Events as Units of Perceived Time

ERIC Educational Resources Information Center

In visual images, we perceive both space (as a continuous visual medium) and objects (that inhabit space). Similarly, in dynamic visual experience, we perceive both continuous time and discrete events. What is the relationship between these units of experience? The most intuitive answer may be similar to the spatial case: time is perceived as an…

Liverence, Brandon M.; Scholl, Brian J.

2012-01-01

101

Control of Parameterized Discrete Event Systems

This paper investigates the control of parameterized discrete event systems when specifications are given in terms of predicates and satisfy a similarity assumption. This study is motivated by a weakness in current synthesis methods that do not scale well to huge systems. For systems consisting of similar processes under total or partial observation, conditions are given to deduce properties of

Hans Bherer; Jules Desharnais; Richard St.-Denis

2009-01-01

102

Parallel implementation of VHDL simulations on the Intel iPSC/2 hypercube. Master's thesis

VHDL models are executed sequentially in current commercial simulators. As chip designs grow larger and more complex, simulations must run faster. One approach to increasing simulation speed is through parallel processors. This research transforms the behavioral and structural models created by Intermetrics' sequential VHDL simulator into models for parallel execution. The models are simulated on an Intel iPSC/2 hypercube, with synchronization of the nodes achieved by utilizing the Chandy-Misra paradigm for discrete-event simulations. Three eight-bit adders, the ripple carry, the carry save, and the carry-lookahead, are each run through the parallel simulator. Simulation time is cut at least in half for all three test cases relative to the sequential Intermetrics model. Results with regard to speedup are given to show the effects of different mappings, varying workloads per node, and overhead due to output messages.

Comeau, R.C.

1991-12-01
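
The Chandy-Misra paradigm used in the thesis above can be sketched in a few lines. The class and method names below are illustrative, not from the thesis: each logical process may only consume events up to the minimum clock over its input channels, and null messages advance an otherwise idle channel's clock so the simulation does not deadlock.

```python
import heapq

class Channel:
    """One input channel of a logical process in a Chandy-Misra simulation."""
    def __init__(self):
        self.clock = 0.0          # timestamp of the last message received
        self.queue = []           # pending (timestamp, payload) events

    def put(self, ts, payload=None):
        # A message with no payload is a null message: it only advances the clock.
        self.clock = max(self.clock, ts)
        if payload is not None:
            heapq.heappush(self.queue, (ts, payload))

class LP:
    """A logical process that conservatively processes safe events."""
    def __init__(self, name, inputs):
        self.name, self.inputs, self.processed = name, inputs, []

    def safe_time(self):
        # Events up to the minimum input-channel clock cannot be preceded
        # by any message still to arrive, so they are safe to process.
        return min(ch.clock for ch in self.inputs)

    def step(self):
        horizon = self.safe_time()
        for ch in self.inputs:
            while ch.queue and ch.queue[0][0] <= horizon:
                self.processed.append(heapq.heappop(ch.queue))

a, b = Channel(), Channel()
lp = LP("adder", [a, b])
a.put(1.0, "x")          # real event on channel a
b.put(2.0)               # null message: only advances b's clock
lp.step()
print(lp.processed)      # [(1.0, 'x')] -- the event at t=1.0 is now safe
```

Without the null message on `b`, the event at t=1.0 would be stuck forever, which is exactly the deadlock problem null messages exist to solve.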

103

GVT Algorithms and Discrete Event Dynamics on 128K+ Processor Cores

Parallel discrete event simulation (PDES) represents a class of codes that are challenging to scale to large numbers of processors due to tight global timestamp-ordering and fine-grained event execution. One of the critical factors in scaling PDES is the efficiency of the underlying global virtual time (GVT) algorithm needed for correctness of parallel execution and speed of progress. Although many GVT algorithms have been proposed previously, few have been proposed for scalable asynchronous execution and none has been customized to exploit one-sided communication. Moreover, the detailed performance effects of actual GVT algorithm implementations on large platforms are unknown. Here, three major GVT algorithms intended for scalable execution on high-performance systems are studied: (1) a synchronous GVT algorithm that affords ease of implementation, (2) an asynchronous GVT algorithm that is more complex to implement but can relieve blocking latencies, and (3) a variant of the asynchronous GVT algorithm, proposed and studied for the first time here, to exploit one-sided communication in extant supercomputing platforms. Performance results are presented of implementations of these algorithms on over 64,000 cores of a Cray XT5 system, exercised on a range of parameters: optimistic and conservative synchronization, fine- to medium-grained event computation, synthetic and non-synthetic applications, and different lookahead values. Performance of tens of billions of events executed per second is registered, exceeding the speeds of any known PDES engine, and showing asynchronous GVT algorithms to outperform state-of-the-art synchronous GVT algorithms. Detailed PDES-specific runtime metrics are presented to further the understanding of tightly-coupled discrete event execution dynamics on massively parallel platforms.

Perumalla, Kalyan S. [ORNL]; Park, Alfred J. [ORNL]; Tipparaju, Vinod [ORNL]

2011-01-01
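
The role GVT plays above can be illustrated with a minimal synchronous computation (hypothetical data structures, not the ORNL implementations): all logical processes pause at a barrier, and GVT is the minimum over every LP's next-event time and the timestamps of messages still in transit.

```python
# GVT lower-bounds every timestamp the simulation can still generate, so
# events with timestamps below GVT are irrevocable: they can be committed
# and their rollback state reclaimed ("fossil collected").

def compute_gvt(local_clocks, in_transit_timestamps):
    """Synchronous GVT: global min over local virtual times and the
    timestamps of messages that have been sent but not yet received."""
    candidates = list(local_clocks) + list(in_transit_timestamps)
    return min(candidates) if candidates else float("inf")

lvts = [12.5, 9.0, 30.1]      # next unprocessed event time per LP
transit = [8.2]               # one message sent but not yet received
print(compute_gvt(lvts, transit))   # 8.2 -- the in-transit message bounds GVT
```

The asynchronous variants studied in the paper exist because this barrier-style computation forces every LP to stop; the arithmetic itself stays the same, only when and how the minima are gathered changes.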

104

Discrete Event Execution with One-Sided and Two-Sided GVT Algorithms on 216,000 Processor Cores

Global virtual time (GVT) computation is a key determinant of the efficiency and runtime dynamics of parallel discrete event simulations (PDES), especially on large-scale parallel platforms. Here, three execution modes of a generalized GVT computation algorithm are studied on high-performance parallel computing systems: (1) a synchronous GVT algorithm that affords ease of implementation, (2) an asynchronous GVT algorithm that is more complex to implement but can relieve blocking latencies, and (3) a variant of the asynchronous GVT algorithm to exploit one-sided communication in extant supercomputing platforms. Performance results are presented of implementations of these algorithms on up to 216,000 cores of a Cray XT5 system, exercised on a range of parameters: optimistic and conservative synchronization, fine- to medium-grained event computation, synthetic and non-synthetic applications, and different lookahead values. Performance of up to 54 billion events executed per second is registered. Detailed PDES-specific runtime metrics are presented to further the understanding of tightly-coupled discrete event dynamics on massively parallel platforms.

Perumalla, Kalyan S [ORNL] [ORNL; Park, Alfred J [ORNL] [ORNL; Tipparaju, Vinod [ORNL] [ORNL

2014-01-01

105

Planning and supervision of reactor defueling using discrete event techniques

New fuel handling and conditioning activities for the defueling of the Experimental Breeder Reactor II are being performed at Argonne National Laboratory. Research is being conducted to investigate the use of discrete event simulation, analysis, and optimization techniques to plan, supervise, and perform these activities in such a way that productivity can be improved. The central idea is to characterize this defueling operation as a collection of interconnected serving cells, and then apply operational research techniques to identify appropriate planning schedules for given scenarios. In addition, a supervisory system is being developed to provide personnel with on-line information on the progress of fueling tasks and to suggest courses of action to accommodate changing operational conditions. This paper provides an introduction to the research in progress at ANL. In particular, it briefly describes the fuel handling configuration for reactor defueling at ANL, presenting the flow of material from the reactor grid to the interim storage location, and the expected contributions of this work. As an example of the studies being conducted for planning and supervision of fuel handling activities at ANL, an application of discrete event simulation techniques to evaluate different fuel cask transfer strategies is given at the end of the paper.

Garcia, H.E.; Imel, G.R. [Argonne National Lab., IL (United States); Houshyar, A. [Western Michigan Univ., Kalamazoo, MI (United States). Dept. of Physics

1995-12-31

106

Evolution of time horizons in parallel and grid simulations.

We analyze the evolution of the local simulation times (LST) in parallel discrete event simulations. The new ingredients introduced are (i) we associate the LST with the nodes and not with the processing elements, and (ii) we propose to minimize the exchange of information between different processing elements by freezing the LST on the boundaries between processing elements for some processing time and then releasing them by a wide-stream memory exchange between processing elements. The highlights of our approach are (i) it keeps the highest level of processor time utilization during the algorithm evolution, (ii) it takes a reasonable time for the memory exchange, excluding the time-consuming and complicated process of message exchange between processors, and (iii) the communication between processors is decoupled from the calculations performed on a processor. The effectiveness of our algorithm grows with the number of nodes (or threads). This algorithm should be applicable for any parallel simulation with short-range interactions, including parallel or grid simulations of partial differential equations.

Shchur, L N; Novotny, M A

2004-08-01

107

Parallelizing Timed Petri Net Simulations.

National Technical Information Service (NTIS)

The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then si...

D. M. Nicol

1993-01-01

108

PSSA: Parallel Stretched Simulated Annealing

NASA Astrophysics Data System (ADS)

We consider the problem of finding all the global (and some local) minimizers of a given nonlinear optimization function (a class of problems also known as multi-local programming problems), using a novel approach based on Parallel Computing. The approach, named Parallel Stretched Simulated Annealing (PSSA), combines simulated annealing with the stretching-function technique in a parallel execution environment. Our PSSA software makes it possible to increase the resolution of the search domains (thus facilitating the discovery of new solutions) while keeping the search time bounded. The software was tested on a set of well-known problems, and some numerical results are presented.

Ribeiro, Tiago; Rufino, José; Pereira, Ana I.

2011-09-01

109

Exploiting lookahead in parallel simulation

Parallel simulation is an important practical technique for improving the performance of simulations. The most effective approach to parallel simulation depends on the characteristics of the system being simulated. One key characteristic is called lookahead. A second kind of lookahead, called implicit lookahead, was introduced for simulating FCFS stochastic queueing systems; implicit lookahead can be exploited to yield performance benefits even when explicit lookahead does not exist. In this paper, the authors show the feasibility of implicit lookahead for non-FCFS systems. They propose several lookahead-exploiting techniques for round-robin (RR) system simulations. They design an algorithm that generates lookahead in O(1) time. Both analytical models and experiments are constructed to evaluate these techniques. The authors also evaluate a lookahead technique for preemptive priority (PP) systems using an analytical model.

Lin, Y.B.; Lazowska, E.D. (Washington Univ., Seattle, WA (USA). Dept. of Computer Science)

1990-10-01
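
The idea of implicit lookahead for an FCFS server can be sketched as follows (an illustrative construction, not the authors' O(1) round-robin algorithm): by pre-sampling the next service time, a logical process can promise its neighbors that it will send nothing earlier than its clock plus that sampled time, even while its queue is empty.

```python
import random

class FCFSServer:
    """FCFS server whose next service time is sampled ahead of need,
    turning the service-time distribution into exploitable lookahead."""
    def __init__(self, rate=1.0, seed=0):
        self.rng = random.Random(seed)
        self.rate = rate
        self.clock = 0.0
        self.next_service = self.rng.expovariate(rate)  # pre-sampled

    def lookahead_bound(self):
        # Earliest possible future departure; constant time to compute.
        # Neighboring LPs may safely process events up to this bound.
        return self.clock + self.next_service

    def serve(self, arrival_time):
        start = max(self.clock, arrival_time)
        depart = start + self.next_service
        self.clock = depart
        self.next_service = self.rng.expovariate(self.rate)  # pre-sample next
        return depart

s = FCFSServer()
bound = s.lookahead_bound()
depart = s.serve(arrival_time=0.0)
print(depart >= bound)   # the promise made to neighbors is honored
```

The point is that the bound is available before any customer arrives, which is exactly what a conservative synchronization protocol needs to keep neighbors from blocking.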

110

Parallelizing Timed Petri Net simulations

NASA Technical Reports Server (NTRS)

The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

Nicol, David M.

1993-01-01

111

Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment

Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations.

Khan, Md. Ashfaquzzaman; Herbordt, Martin C.

2011-01-01
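
The speculate-then-commit idea above can be caricatured in a few lines (hypothetical structures, inspired by the event-based decomposition the paper describes, with conflict handling omitted): events are executed optimistically in whatever order workers pick them up, but results are retired strictly in timestamp order, much like a CPU reorder buffer retires instructions.

```python
def run_speculative(events):
    """events: list of (timestamp, thunk) in arbitrary arrival order.
    Execute speculatively, then commit results in timestamp order."""
    speculative = {}
    for ts, thunk in events:            # execute out of timestamp order
        speculative[ts] = thunk()
    # Commit phase: retire results strictly by timestamp, which is what
    # guarantees the parallel run matches the sequential one.
    return [(ts, speculative[ts]) for ts in sorted(speculative)]

events = [(3, lambda: "c"), (1, lambda: "a"), (2, lambda: "b")]
print(run_speculative(events))   # [(1, 'a'), (2, 'b'), (3, 'c')]
```

The hard part the paper actually addresses, detecting when a speculatively executed event read state that an earlier event later modified, is deliberately left out here.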

112

PARALLEL AND DISTRIBUTED SIMULATION SYSTEMS

Originating from basic research conducted in the 1970s and 1980s, the parallel and distributed simulation field has matured over the last few decades. Today, operational systems have been fielded for applications such as military training, analysis of communication networks, and air traffic control systems, to mention a few. This tutorial gives an overview of technologies to distribute the execution

Richard M. Fujimoto

1999-01-01

113

Discrete Event Supervisory Control Applied to Propulsion Systems

NASA Technical Reports Server (NTRS)

The theory of discrete event supervisory (DES) control was applied to the optimal control of a twin-engine aircraft propulsion system and demonstrated in a simulation. The supervisory control, which is implemented as a finite-state automaton, oversees the behavior of a system and manages it in such a way that it maximizes a performance criterion, similar to a traditional optimal control problem. DES controllers can be nested such that a high-level controller supervises multiple lower level controllers. This structure can be expanded to control huge, complex systems, providing optimal performance and increasing autonomy with each additional level. The DES control strategy for propulsion systems was validated using a distributed testbed consisting of multiple computers--each representing a module of the overall propulsion system--to simulate real-time hardware-in-the-loop testing. In the first experiment, DES control was applied to the operation of a nonlinear simulation of a turbofan engine (running in closed loop using its own feedback controller) to minimize engine structural damage caused by a combination of thermal and structural loads. This enables increased on-wing time for the engine through better management of the engine-component life usage. Thus, the engine-level DES acts as a life-extending controller through its interaction with and manipulation of the engine's operation.

Litt, Jonathan S.; Shah, Neerav

2005-01-01
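
A finite-state supervisor of the kind described above can be sketched as a toy (the plant, event names, and safety condition below are hypothetical, not the NASA engine controller): the supervisor tracks the plant state and disables controllable events that would drive the system toward an unsafe state, while uncontrollable events can never be vetoed.

```python
# Plant model: state -> {event: next_state}
PLANT = {
    "cool":     {"throttle_up": "hot", "idle": "cool"},
    "hot":      {"throttle_up": "overtemp", "cool_down": "cool"},
    "overtemp": {},
}
CONTROLLABLE = {"throttle_up"}   # events the supervisor may disable
UNSAFE = {"overtemp"}            # states the specification forbids

def allowed(state, event):
    """Supervisor policy: veto a controllable event whose next state is
    unsafe; uncontrollable events are always passed through if defined."""
    nxt = PLANT[state].get(event)
    if nxt is None:
        return False             # event not defined in this state
    if event in CONTROLLABLE and nxt in UNSAFE:
        return False             # supervisor disables it
    return True

print(allowed("cool", "throttle_up"))  # True: "hot" is a safe state
print(allowed("hot", "throttle_up"))   # False: would reach "overtemp"
```

Nesting, as the abstract describes, amounts to letting a higher-level automaton of this shape treat each lower-level supervised plant as a single state machine.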

114

Xyce parallel electronic simulator design.

This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the ground up to be a SPICE-compatible, distributed-memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator, so having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines, and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus, and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort involving a number of researchers, engineers, scientists, mathematicians, and computer scientists. In addition to diversity of background, a certain amount of staff turnover is to be expected on long-term projects as people move on to different work. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. It is also hoped that this document will be a good source of information for new developers.

Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

2010-09-01

115

A case study of Web server benchmarking using parallel WAN emulation

This paper describes the use of a parallel discrete-event network emulator called the Internet Protocol Traffic and Network Emulator (IP-TNE) for Web server benchmarking. The experiments in this paper demonstrate the feasibility of high-performance WAN emulation using parallel discrete-event simulation techniques on a single shared-memory multiprocessor. Our experiments with an Apache Web server achieve up to 8000 HTTP/1.1 transactions

Carey L. Williamson; Rob Simmonds; Martin F. Arlitt

2002-01-01

116

State Estimation and Detectability of Probabilistic Discrete Event Systems

A probabilistic discrete event system (PDES) is a nondeterministic discrete event system where the probabilities of nondeterministic transitions are specified. State estimation problems of PDES are more difficult than those of non-probabilistic discrete event systems. In our previous papers, we investigated state estimation problems for non-probabilistic discrete event systems. We defined four types of detectabilities and derived necessary and sufficient conditions for checking these detectabilities. In this paper, we extend our study to state estimation problems for PDES by considering the probabilities. The first step in our approach is to convert a given PDES into a nondeterministic discrete event system and find sufficient conditions for checking probabilistic detectabilities. Next, to find necessary and sufficient conditions for checking probabilistic detectabilities, we investigate the “convergence” of event sequences in PDES. An event sequence is convergent if along this sequence, it is more and more certain that the system is in a particular state. We derive conditions for convergence and hence for detectabilities. We focus on systems with complete event observation and no state observation. For better presentation, the theoretical development is illustrated by a simplified example of nephritis diagnosis.

Shu, Shaolong; Ying, Hao; Chen, Xinguang

2009-01-01

117

Parallel simulation of the Sharks World problem

The Sharks World problem has been suggested as a suitable application to evaluate the effectiveness of parallel simulation algorithms. This paper develops a simulation model in Maisie, a C-based simulation language. With minor modifications, a Maisie program may be executed using either sequential or parallel simulation algorithms. The paper presents the results of executing the Maisie model on a multicomputer

Rajive L. Bagrodia; Wen-Toh Liao

1990-01-01

118

Parallel manipulator robots design and simulation

Parallel manipulator robots have complex kinematics and present singular positions within their workspace. For these reasons, in most software simulating parallel robots, each kinematic model should be given in advance by users or programmers. In this paper we present a new tool used to design and to simulate parallel manipulator robots. Explicit kinematic equations are generated automatically depending on the

Samir Lahouar; Said Zeghloul; Lotfi Romdhane

119

CAISSON: Interconnect Network Simulator

NASA Technical Reports Server (NTRS)

Cray response to HPCS initiative. Model future petaflop computer interconnect. Parallel discrete event simulation techniques for large scale network simulation. Built on WarpIV engine. Run on laptop and Altix 3000. Can be sized up to 1000 simulated nodes per host node. Good parallel scaling characteristics. Flexible: multiple injectors, arbitration strategies, queue iterators, network topologies.

Springer, Paul L.

2006-01-01

120

Hierarchical Discrete Event Supervisory Control of Aircraft Propulsion Systems

NASA Technical Reports Server (NTRS)

This paper presents a hierarchical application of Discrete Event Supervisory (DES) control theory for intelligent decision and control of a twin-engine aircraft propulsion system. A dual layer hierarchical DES controller is designed to supervise and coordinate the operation of two engines of the propulsion system. The two engines are individually controlled to achieve enhanced performance and reliability, necessary for fulfilling the mission objectives. Each engine is operated under a continuously varying control system that maintains the specified performance and a local discrete-event supervisor for condition monitoring and life extending control. A global upper level DES controller is designed for load balancing and overall health management of the propulsion system.

Yasar, Murat; Tolani, Devendra; Ray, Asok; Shah, Neerav; Litt, Jonathan S.

2004-01-01

121

PST: A Simulation Tool for Parallel Systems.

National Technical Information Service (NTIS)

The objective of this research effort was to develop a tool to simulate various parallel computer systems. The tool would give users insight into the different classes of parallel machines in terms of architecture, software, synchronization, communication, ...

D. J. Potter; W. A. Rivet; H. Awad

1994-01-01

122

Optimal supervisory control of aircraft propulsion: a discrete event approach

This paper presents an application of discrete event supervisory (DES) control theory for intelligent decision and control of a twin-engine aircraft propulsion system. A dual layer hierarchical DES controller is designed to supervise and coordinate the operation of two engines of the propulsion system. The two engines are individually controlled to achieve enhanced performance and reliability, necessary for fulfilling the

Murat Yasar; Devendra Tolani; Asok Ray

2005-01-01

123

Synchronization and Linearity: an algebra for discrete event systems

This book proposes a unified mathematical treatment of a class of 'linear' discrete event systems, which contains important subclasses of Petri nets and queuing networks with synchronization constraints. The linearity has to be understood with respect to nonstandard algebraic structures, e.g. the 'max-plus algebra'. A calculus is developed based on such structures, which is followed by tools for computing the

F. Baccelli; G. Cohen; G. J. Olsder; J. P. Quadrat

1992-01-01
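
The max-plus linearity described above can be made concrete in a few lines (standard textbook definitions, not code from the book): "addition" is max, "multiplication" is +, so the recursion x(k+1) = A (x) x(k) propagates earliest-event times through synchronization constraints.

```python
NEG_INF = float("-inf")   # the max-plus "zero" element (absorbing for +)

def maxplus_matvec(A, x):
    """Max-plus matrix-vector product: (A (x) x)_i = max_j (A[i][j] + x[j])."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]

# Hypothetical two-machine line: machine 0 takes 3 time units per part;
# machine 1 takes 2, and must also wait 1 unit for machine 0's transfer.
A = [[3, NEG_INF],
     [1, 2]]
x = [0, 0]                         # both machines start at time 0
for _ in range(2):                 # two firings of the event graph
    x = maxplus_matvec(A, x)
print(x)                           # [6, 4]: earliest completion times
```

The first iteration gives [3, 2]; the second gives [max(3+3), max(1+3, 2+2)] = [6, 4], showing how the max operation encodes the synchronization ("wait for the slower predecessor") that the book treats as linear algebra.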

124

Maximally Permissive Hierarchical Control of Decentralized Discrete Event Systems

The subject of this paper is the synthesis of natural projections that serve as nonblocking and maximally permissive abstractions for the hierarchical and decentralized control of large-scale discrete event systems. To this end, existing concepts for nonblocking abstractions such as natural observers and marked string accepting (msa)-observers are extended by local control consistency (LCC) as a novel

Klaus Schmidt; Christian Breindl

2011-01-01

125

Online diagnosis of discrete event systems based on Petri nets

A novel approach to fault diagnosis of discrete event systems is presented in this paper. The standard approach is based on the offline computation of the set of fault events that may have occurred at each reachable state, providing a fast online diagnosis at a price of excessive memory requirements. A different approach is here adopted, which is based on

Francesco Basile; Pasquale Chiacchio; Gianmaria De Tommasi

2008-01-01

126

Stochastic quasigradient methods for optimization of discrete event systems

In this paper, stochastic programming techniques are adapted and further developed for applications to discrete event systems. We consider cases where the sample path of the system depends discontinuously on control parameters (e.g. modeling of failures, several competing processes), which could make the computation of estimates of the gradient difficult. Methods which use only samples of the performance criterion are

Yury M. Ermoliev; Alexei A. Gaivoronski

1992-01-01

127

Accelerated Waveform Methods for Parallel Transient Simulation

In this paper we compare accelerated waveform relaxation algorithms to pointwise direct and iterative methods for the parallel transient simulation of semiconductor devices on parallel machines. Experimental results are presented for simulations on single (serial) workstations, clusters of workstations, and an IBM SP-2. The results show that accelerated waveform methods are competitive with standard pointwise methods on serial

Andrew Lumsdaine; Mark W. Reichelt; Jeffrey M. Squyres; Jacob K. White

128

Hybrid parallel tempering and simulated annealing method

In this paper, we propose a new hybrid scheme of parallel tempering and simulated annealing (hybrid PT/SA). Within the hybrid PT/SA scheme, a composite system with multiple conformations is evolving in parallel on a temperature ladder with various transition step sizes. The simulated annealing (SA) process uses a cooling scheme to decrease the temperature values in the temperature ladder

Yaohang Li; Vladimir A. Protopopescu; Nikita Arnold; Xinyu Zhang; Andrey Gorin

2009-01-01
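
A minimal version of such a hybrid scheme can be sketched as follows (illustrative ladder, step sizes, and cooling schedule, not the authors' algorithm): replicas take Metropolis steps on a temperature ladder, neighboring replicas occasionally swap, and the whole ladder is cooled as in SA.

```python
import math, random

def hybrid_pt_sa(f, temps=(4.0, 1.0), cool=0.99, steps=500, seed=1):
    """Minimize a 1-D function f with parallel tempering plus SA cooling."""
    rng = random.Random(seed)
    temps = list(temps)
    xs = [rng.uniform(-5.0, 5.0) for _ in temps]   # one replica per temperature
    for _ in range(steps):
        for i, t in enumerate(temps):              # Metropolis move per replica
            cand = xs[i] + rng.gauss(0.0, math.sqrt(t))
            if f(cand) < f(xs[i]) or rng.random() < math.exp((f(xs[i]) - f(cand)) / t):
                xs[i] = cand
        i = rng.randrange(len(temps) - 1)          # attempt a neighbor swap
        d = (1.0 / temps[i] - 1.0 / temps[i + 1]) * (f(xs[i + 1]) - f(xs[i]))
        if d < 0.0 or rng.random() < math.exp(-d):
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
        temps = [max(t * cool, 1e-3) for t in temps]   # SA-style cooling
    return min(xs, key=f)

best = hybrid_pt_sa(lambda x: (x - 2.0) ** 2)
print(best)
```

The hot replica explores widely early on, the swap moves let good configurations migrate down the ladder, and the cooling shrinks the step sizes so the cold replica refines the best basin found.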

129

Hierarchical control of aircraft propulsion systems: Discrete event supervisor approach

This paper presents an application of the recently developed theory of language-measure-based discrete event supervisory (DES) control to aircraft propulsion systems. A two-layer hierarchical architecture is proposed to coordinate the operations of a twin-engine propulsion system. The two engines are individually controlled to achieve enhanced performance and reliability, as necessary for fulfilling the mission objectives. Each engine, together with its

Murat Yasar; Asok Ray

2007-01-01

130

Pade approximation for stochastic discrete-event systems

We show that Pade approximation can be effectively used for approximation of performance functions in discrete-event systems. The method consists of (1) obtaining the MacLaurin coefficients of the performance function and (2) finding a Pade approximant from the MacLaurin coefficients and using it to approximate the function. We use the method with the expected number of renewals in a random interval,

Wei-Bo Gong; S. Nananukul; A. Yan

1995-01-01
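
Step (2) above can be illustrated in the simplest case, a [1/1] Pade approximant built from three MacLaurin coefficients (standard closed-form formulas, not the authors' code): p(x)/q(x) with p = p0 + p1 x and q = 1 + q1 x, chosen to match the series through the x^2 term.

```python
import math

def pade_1_1(c0, c1, c2):
    """[1/1] Pade approximant of c0 + c1*x + c2*x^2 + O(x^3).
    Matching coefficients gives q1 = -c2/c1, p0 = c0, p1 = c1 + c0*q1."""
    q1 = -c2 / c1
    p0 = c0
    p1 = c1 + c0 * q1
    return lambda x: (p0 + p1 * x) / (1.0 + q1 * x)

# MacLaurin coefficients of exp(x): 1, 1, 1/2  ->  (1 + x/2) / (1 - x/2)
approx = pade_1_1(1.0, 1.0, 0.5)
print(abs(approx(0.2) - math.exp(0.2)) < 1e-3)   # True: close from 3 coefficients
```

The appeal for performance functions is the same as for exp here: a rational function built from a few series coefficients often stays accurate well beyond the radius where the truncated series itself degrades.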

131

State-space supervision of reconfigurable discrete event systems

The Discrete Event Systems (DES) theory of supervisory and state feedback control offers many advantages for implementing supervisory systems. Algorithmic concepts have been introduced to assure that the supervising algorithms are correct and meet the specifications. It is often assumed that the supervisory specifications are invariant, or at least fixed until a given supervisory task is completed. However, there are many practical applications where the supervisory specifications are updated in real time. For example, in a Reconfigurable Discrete Event System (RDES) architecture, a bank of supervisors is defined to accommodate each identified operational condition or different supervisory specifications. This adaptive supervisory control system changes the supervisory configuration to accept coordinating commands or to adjust for changes in the controlled process. This paper addresses reconfiguration at the supervisory level of hybrid systems along with a RDES underlying architecture. It reviews the state-based supervisory control theory and extends it to the paradigm of RDES in view of process control applications. The paper addresses theoretical issues with a limited number of practical examples. This control approach is particularly suitable for hierarchical reconfigurable hybrid implementations.

Garcia, H.E. [Argonne National Lab., IL (United States); Ray, A. [Pennsylvania State Univ., University Park, PA (United States)

1995-12-31

132

Simulating the scheduling of parallel supercomputer applications

An Event Driven Simulator for Evaluating Multiprocessing Scheduling (EDSEMS) disciplines is presented. The simulator is made up of three components: a machine model; parallel workload characterization; and scheduling disciplines for mapping parallel applications (many processes cooperating on the same computation) onto processors. A detailed description of how the simulator is constructed, how to use it, and how to interpret the output is also given. Initial results are presented from the simulation of parallel supercomputer workloads using the 'Dog-Eat-Dog,' 'Family,' and 'Gang' scheduling disciplines. These results indicate that Gang scheduling is far better at giving the number of processors that a job requests than Dog-Eat-Dog or Family scheduling. In addition, the system throughput and turnaround time are not adversely affected by this strategy. 10 refs., 8 figs., 1 tab.

Seager, M.K.; Stichnoth, J.M.

1989-09-19

133

Parallel simulation for aviation applications

The Detailed Policy Assessment Tool (DPAT) is a widely used simulation of air traffic control that incorporates advanced technology for user-friendly operation. DPAT computes congestion-related air traffic delays, throughputs, traffic densities, and arrival/departure schedules while incorporating ground delay and ground stop programs, in-trail restrictions, historical, current, or future traffic demand, a fixed or free-flight route structure, and other relevant

Frederick Wieland (McLean, VA)

1998-01-01

134

Visualization and Tracking of Parallel CFD Simulations

NASA Technical Reports Server (NTRS)

We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, is handled by CM/AVS. Partitioning of the visualization task between the CM-5 and the workstation can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate, then store, then visualize' post-processing approach.

Vaziri, Arsi; Kremenetsky, Mark

1995-01-01

135

Parallel processing of a rotating shaft simulation

NASA Technical Reports Server (NTRS)

A FORTRAN program describing the vibration modes of a rotor-bearing system is analyzed for parallelism in this simulation using a Pascal-like structured language. Potential vector operations are also identified. A critical path through the simulation is identified and used in conjunction with somewhat fictitious processor characteristics to determine the time to calculate the problem on a parallel processing system having those characteristics. A parallel-processing overhead time is included as a parameter for proper evaluation of the gain over serial calculation. The serial calculation time is determined for the same fictitious system. An improvement of up to 640 percent is possible, depending on the value of the overhead time. Based on the analysis, certain conclusions are drawn pertaining to the development needs of parallel processing technology and to the specification of parallel processing systems to meet computational needs.

Arpasi, Dale J.

1989-01-01

136

Representing Dynamic Social Networks in Discrete Event Social Simulation.

National Technical Information Service (NTIS)

One of the key structural components of social systems is the social network. The representation of this network structure is key to providing a valid representation of the society under study. The social science concept of homophily provides a conceptual...

J. K. Alt; S. Lieberman

2010-01-01

137

Discrete Event Simulation for Complex High-Power Medical Systems

This position paper describes research activities in the field of targeted lifetime extension of components which are used in medical devices and systems as well as in high-energy physics and industrial applications. The considered medical areas are mainly in the therapy field and imaging diagnostics. Industrial areas are cargo and container scanning as well as food treatment and nondestructive material

Oliver Heuermann; Alexander Fleischer; Wolfgang Fengler

2012-01-01

138

Parallel distributed-time logic simulation

The Chandy-Misra algorithm offers more parallelism than the standard event-driven algorithm for digital logic simulation. With suitable enhancements, the Chandy-Misra algorithm also offers significantly better parallel performance. The authors present methods to optimize the algorithm using information about the large number of global synchronization points, called deadlocks, that limit performance. They classify deadlocks and describe them in terms of circuit

L. Soule; A. Gupta

1989-01-01

139

Parallel logic simulation on general purpose machines

Three parallel algorithms for logic simulation have been developed and implemented on a general purpose shared-memory parallel machine. The first algorithm is a synchronous version of a traditional event-driven algorithm which achieves speed-ups of 6 to 9 with 15 processors. The second algorithm is a synchronous unit-delay compiled mode algorithm which achieves speed-ups of 10 to 13 with 15 processors.

Larry Soulé; Tom Blank

1988-01-01

140

Xyce parallel electronic simulator release notes.

The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: Hardware and software requirements New features and enhancements Any defects fixed since the last release Current known defects and defect workarounds For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.

Keiter, Eric Richard; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

2010-05-01

141

Unsteady flow simulation on a parallel computer

NASA Astrophysics Data System (ADS)

For the simulation of the flow through compressor stages, an interactive flow simulation system is set up on an MIMD-type parallel computer. An explicit scheme is used in order to resolve the time-dependent interaction between the blades. The 2D Navier-Stokes equations are transformed into their general moving coordinates. The parallelization of the solver is based on the idea of domain decomposition. Results are presented for a problem of fixed size (4096 grid nodes for the Hakkinen case).

Faden, M.; Pokorny, S.; Engel, K.

142

Parallel and Distributed System Simulation

NASA Technical Reports Server (NTRS)

This exploratory study initiated our research into the software infrastructure necessary to support the modeling and simulation techniques that are most appropriate for the Information Power Grid. Such computational power grids will use high-performance networking to connect hardware, software, instruments, databases, and people into a seamless web that supports a new generation of computation-rich problem solving environments for scientists and engineers. In this context we looked at evaluating the NetSolve software environment for network computing that leverages the potential of such systems while addressing their complexities. NetSolve's main purpose is to enable the creation of complex applications that harness the immense power of the grid, yet are simple to use and easy to deploy. NetSolve uses a modular, client-agent-server architecture to create a system that is very easy to use. Moreover, it is designed to be highly composable in that it readily permits new resources to be added by anyone willing to do so. In these respects NetSolve is to the Grid what the World Wide Web is to the Internet. But like the Web, the design that makes these wonderful features possible can also impose significant limitations on the performance and robustness of a NetSolve system. This project explored the design innovations that push the performance and robustness of the NetSolve paradigm as far as possible without sacrificing the Web-like ease of use and composability that make it so powerful.

Dongarra, Jack

1998-01-01

143

Daphne: Data Parallelism Neural Network Simulator

NASA Astrophysics Data System (ADS)

In this paper we describe the guideline of Daphne, a parallel simulator for supervised recurrent neural networks trained by Backpropagation through time. The simulator has a modular structure, based on a parallel training kernel running on the CM-2 Connection Machine. The training kernel is written in CM Fortran in order to exploit some advantages of the slicewise execution model. The other modules are written in serial C code. They are used for designing and testing the network, and for interfacing with the training data. A dedicated language is available for defining the network architecture, which allows the use of linked modules. The implementation of the learning procedures is based on training example parallelism. This dimension of parallelism has been found to be effective for learning static patterns using feedforward networks. We extend training example parallelism for learning sequences with full recurrent networks. Daphne is mainly conceived for applications in the field of Automatic Speech Recognition, though it can also serve for simulating feedforward networks.

Frasconi, Paolo; Gori, Marco; Soda, Giovanni

144

A parallel computational model for GATE simulations.

GATE/Geant4 Monte Carlo simulations are computationally demanding applications, requiring thousands of processor hours to produce realistic results. The classical strategy of distributing the simulation of individual events does not apply efficiently for Positron Emission Tomography (PET) experiments, because it requires a centralized coincidence processing and large communication overheads. We propose a parallel computational model for GATE that handles event generation and coincidence processing in a simple and efficient way by decentralizing event generation and processing but maintaining a centralized event and time coordinator. The model is implemented with the inclusion of a new set of factory classes that can run the same executable in sequential or parallel mode. A Mann-Whitney test shows that the output produced by this parallel model in terms of number of tallies is equivalent (but not equal) to its sequential counterpart. Computational performance evaluation shows that the software is scalable and well balanced. PMID:24070545

Rannou, F R; Vega-Acevedo, N; El Bitar, Z

2013-12-01

145

Efficient optimistic parallel simulations using reverse computation

In optimistic parallel simulations, state-saving techniques have traditionally been used to realize rollback. In this article, we propose reverse computation as an alternative approach, and compare its execution performance against that of state-saving. Using compiler techniques, we describe an approach to automatically generate reversible computations, and to optimize them to reap the performance benefits of reverse computation transparently. For certain…
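
The idea is easy to state concretely. In the sketch below (illustrative only; the event names are invented, and the paper's compiler generates such inverses automatically), a constructive forward event is undone on rollback by running its exact inverse rather than restoring a saved copy of the state:

```python
def fwd_arrival(state):
    """Forward event handler: a job arrives at a queue. Increments are
    constructive, i.e., perfectly invertible, so no state is saved."""
    state["queue_len"] += 1
    state["arrivals"] += 1

def rev_arrival(state):
    """Reverse event handler: undoes fwd_arrival, statement by statement
    in reverse order, when an optimistic simulator rolls the event back."""
    state["arrivals"] -= 1
    state["queue_len"] -= 1
```

Destructive assignments (e.g., overwriting a field) are not invertible and still require saving the overwritten value; minimizing such cases is where the compiler analysis earns its keep.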

Christopher D. Carothers; Kaylan S. Perumalla; Richard M. Fujimoto

1999-01-01

146

Parallel simulation of cellular neural networks

In this paper a new simulator for cellular neural networks (CNN) called PSIMCNN is presented. It has been studied for a parallel general-purpose computing architecture based on transputers. The Gauss-Jacobi waveform relaxation (WR) algorithm has been adopted. It has been analytically proved that the WR algorithm is convergent for the most common CNN models. Implementation issues have been described and…

L. Fortuna; G. Manganaro; G. Muscato; G. Nunnari

1996-01-01

147

NASA Technical Reports Server (NTRS)

Fast, efficient parallel algorithms are presented for discrete event simulations of dynamic channel assignment schemes for wireless cellular communication networks. The driving events are call arrivals and departures, in continuous time, to cells geographically distributed across the service area. A dynamic channel assignment scheme decides which call arrivals to accept, and which channels to allocate to the accepted calls, attempting to minimize call blocking while ensuring co-channel interference is tolerably low. Specifically, the scheme ensures that the same channel is used concurrently at different cells only if the pairwise distances between those cells are sufficiently large. Much of the complexity of the system comes from ensuring this separation. The network is modeled as a system of interacting continuous time automata, each corresponding to a cell. To simulate the model, conservative methods are used; i.e., methods in which no errors occur in the course of the simulation and so no rollback or relaxation is needed. Implemented on a 16K processor MasPar MP-1, an elegant and simple technique provides speedups of about 15 times over an optimized serial simulation running on a high speed workstation. A drawback of this technique, typical of conservative methods, is that processor utilization is rather low. To overcome this, new methods were developed that exploit slackness in event dependencies over short intervals of time, thereby raising the utilization to above 50 percent and the speedup over the optimized serial code to about 120 times.
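
The co-channel separation constraint at the heart of the model can be expressed directly. A hedged sketch with invented names (`channel_ok`, a caller-supplied `distance` function), using one-dimensional cell positions purely for illustration:

```python
def channel_ok(cell, channel, in_use, reuse_dist, distance):
    """A call may be assigned `channel` at `cell` only if every cell
    currently using that channel is at least `reuse_dist` away, keeping
    co-channel interference tolerably low.

    in_use: iterable of (cell, channel) pairs for active calls.
    distance: function giving the distance between two cells.
    """
    return all(distance(cell, other) >= reuse_dist
               for other, ch in in_use if ch == channel)
```

Much of the simulation's complexity comes from evaluating this predicate consistently across processors: a cell's decision depends on the concurrent state of every cell within the reuse distance, which is exactly the event-dependency slackness the improved methods exploit.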

Greenberg, Albert G.; Lubachevsky, Boris D.; Nicol, David M.; Wright, Paul E.

1994-01-01

148

Parallel algorithm strategies for circuit simulation.

Circuit simulation tools (e.g., SPICE) have become invaluable in the development and design of electronic circuits. However, they have been pushed to their performance limits in addressing circuit design challenges that come from the technology drivers of smaller feature scales and higher integration. Improving the performance of circuit simulation tools through exploiting new opportunities in widely-available multi-processor architectures is a logical next step. Unfortunately, not all traditional simulation applications are inherently parallel, and quickly adapting mature application codes (even codes designed as parallel applications) to new parallel paradigms can be prohibitively difficult. In general, performance is influenced by many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, the use of mini-applications, small self-contained proxies for real applications, is an excellent approach for rapidly exploring the parameter space of all these choices. In this report we present a multi-core performance study of Xyce, a transistor-level circuit simulation tool, and describe the future development of a mini-application for circuit simulation.

Thornquist, Heidi K.; Schiek, Richard Louis; Keiter, Eric Richard

2010-01-01

149

Parallelism extraction and program restructuring for parallel simulation of digital systems

Two topics currently of interest to the computer-aided design (CAD) community for very-large-scale integrated (VLSI) circuits are using the VHSIC Hardware Description Language (VHDL) effectively and decreasing simulation times of VLSI designs through parallel execution of the simulator. The goal of this research is to increase the degree of parallelism obtainable in VHDL simulation, and consequently to decrease simulation times. The research targets simulation on massively parallel architectures. Experimentation and instrumentation were done on the SIMD Connection Machine. The author discusses her method used to extract parallelism and restructure a VHDL program, experimental results using this method, and requirements for a parallel architecture for fast simulation.

Vellandi, B.L.

1990-01-01

150

Improving the Teaching of Discrete-Event Control Systems Using a LEGO Manufacturing Prototype

ERIC Educational Resources Information Center

This paper discusses the usefulness of employing LEGO as a teaching-learning aid in a post-graduate-level first course on the control of discrete-event systems (DESs). The final assignment of the course is presented, which asks students to design and implement a modular hierarchical discrete-event supervisor for the coordination layer of a…

Sanchez, A.; Bucio, J.

2012-01-01

151

A Survey of Petri Net Methods for Controlled Discrete Event Systems

This paper surveys recent research on the application of Petri net models to the analysis and synthesis of controllers for discrete event systems. Petri nets have been used extensively in applications such as automated manufacturing, and there exists a large body of tools for qualitative and quantitative analysis of Petri nets. The goal of Petri net research in discrete event…
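
The basic Petri net semantics these control methods build on fits in a few lines. A minimal sketch (function names invented; markings and arc weights represented as plain dicts):

```python
def enabled(marking, pre):
    """A transition is enabled when every input place holds at least as
    many tokens as its input-arc weight (pre: place -> weight)."""
    return all(marking.get(p, 0) >= w for p, w in pre.items())

def fire(marking, pre, post):
    """Fire an enabled transition: consume tokens along input arcs,
    produce tokens along output arcs; returns the new marking."""
    if not enabled(marking, pre):
        raise ValueError("transition not enabled")
    m = dict(marking)
    for p, w in pre.items():
        m[p] -= w
    for p, w in post.items():
        m[p] = m.get(p, 0) + w
    return m
```

Supervisory control amounts to restricting which enabled transitions may fire so the reachable markings stay within a legal set.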

L. E. HOLLOWAY; B. H. KROGH; A. GIUA

1997-01-01

152

Fracture simulations via massively parallel molecular dynamics

Fracture simulations at the atomistic level have heretofore been carried out for relatively small systems of particles, typically 10,000 or less. In order to study anything approaching a macroscopic system, massively parallel molecular dynamics (MD) must be employed. In two spatial dimensions (2D), it is feasible to simulate a sample that is 0.1 μm on a side. We report on recent MD simulations of mode I crack extension under tensile loading at high strain rates. The method of uniaxial, homogeneously expanding periodic boundary conditions was employed to represent tensile stress conditions near the crack tip. The effects of strain rate, temperature, material properties (equation of state and defect energies), and system size were examined. We found that, in order to mimic a bulk sample, several tricks (in addition to expansion boundary conditions) need to be employed: (1) the sample must be pre-strained to nearly the condition at which the crack will spontaneously open; (2) to relieve the stresses at free surfaces, such as the initial notch, annealing by kinetic-energy quenching must be carried out to prevent unwanted rarefactions; (3) sound waves emitted as the crack tip opens and dislocations emitted from the crack tip during blunting must be absorbed by special reservoir regions. The tricks described briefly in this paper will be especially important to carrying out feasible massively parallel 3D simulations via MD.

Holian, B.L. [Los Alamos National Lab., NM (United States); Abraham, F.F. [IBM Research Div., San Jose, CA (United States). Almaden Research Center; Ravelo, R. [Texas Univ., El Paso, TX (United States)

1993-09-01

153

Parallel Strategies for Crash and Impact Simulations

We describe a general strategy we have found effective for parallelizing solid mechanics simulations. Such simulations often have several computationally intensive parts, including finite element integration, detection of material contacts, and particle interaction if smoothed particle hydrodynamics is used to model highly deforming materials. The need to balance all of these computations simultaneously is a difficult challenge that has kept many commercial and government codes from being used effectively on parallel supercomputers with hundreds or thousands of processors. Our strategy is to load-balance each of the significant computations independently with whatever balancing technique is most appropriate. The chief benefit is that each computation can be scalably parallelized. The drawback is the data exchange between processors and extra coding that must be written to maintain multiple decompositions in a single code. We discuss these trade-offs and give performance results showing this strategy has led to a parallel implementation of a widely-used solid mechanics code that can now be run efficiently on thousands of processors of the Pentium-based Sandia/Intel TFLOPS machine. We illustrate with several examples the kinds of high-resolution, million-element models that can now be simulated routinely. We also look to the future and discuss what possibilities this new capability promises, as well as the new set of challenges it poses in material models, computational techniques, and computing infrastructure.

Attaway, S.; Brown, K.; Hendrickson, B.; Plimpton, S.

1998-12-07

154

Massively Parallel Direct Simulation of Multiphase Flow

The authors understanding of multiphase physics and the associated predictive capability for multi-phase systems are severely limited by current continuum modeling methods and experimental approaches. This research will deliver an unprecedented modeling capability to directly simulate three-dimensional multi-phase systems at the particle-scale. The model solves the fully coupled equations of motion governing the fluid phase and the individual particles comprising the solid phase using a newly discovered, highly efficient coupled numerical method based on the discrete-element method and the Lattice-Boltzmann method. A massively parallel implementation will enable the solution of large, physically realistic systems.

COOK,BENJAMIN K.; PREECE,DALE S.; WILLIAMS,J.R.

2000-08-10

155

Simulation of a master-slave event set processor

Event set manipulation may consume a considerable amount of the computation time spent in performing a discrete-event simulation. One way of minimizing this time is to allow event set processing to proceed in parallel with the remainder of the simulation computation. The paper describes a multiprocessor simulation computer, in which all non-event set processing is performed by the principal processor…
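
The operation being offloaded is the classic event-set "hold": remove the most imminent event, then schedule a new one. A minimal heap-based sketch of the data structure (illustrative only; the paper's dedicated hardware obviously differs):

```python
import heapq

class EventSet:
    """Pending-event set ordered by timestamp. In a master-slave design,
    these operations run on a dedicated event-set processor, overlapping
    with the model computation on the principal processor."""
    def __init__(self):
        self._heap = []

    def schedule(self, time, event):
        heapq.heappush(self._heap, (time, event))

    def next_event(self):
        """Remove and return the (time, event) pair with the least time."""
        return heapq.heappop(self._heap)
```

Because `schedule` and `next_event` touch only the event set, they can proceed concurrently with event execution as long as the two processors synchronize at each hold.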

J. C. Comfort

1984-01-01

156

PARALLELIZATION OF THE PENELOPE MONTE CARLO PARTICLE TRANSPORT SIMULATION PACKAGE

We have parallelized the PENELOPE Monte Carlo particle transport simulation package (1). The motivation is to increase efficiency of Monte Carlo simulations for medical applications. Our parallelization is based on the standard MPI message passing interface. The parallel code is especially suitable for a distributed memory environment, and has been run on up to 256 processors on the Indiana University…

R. B. Cruise; R. W. Sheppard; V. P. Moskvin

2003-01-01

157

Parallel Simulation of Cloth on Distributed Memory Architectures

The physically based simulation of clothes in virtual environments is a highly demanding problem. It involves both modeling the internal material properties of the textile and the interaction with the surrounding scene. We present a parallel cloth simulation approach designed for distributed memory parallel architectures, in particular clusters built of commodity components. In this paper, we focus on the parallelization…

Bernhard Thomaszewski; Wolfgang Blochinger

2006-01-01

158

Using zpl to develop a parallel chaos router simulator

This paper reports on our experience in writing a parallel version of a chaos router simulator using the new data-driven parallel language ZPL. The simulator is a large program that tests the capabilities of ZPL. The (parallel) ZPL program is compared with the existing serial implementation on two very different architectures: a 16-processor Intel Paragon and a cluster of eight…

Wilkey Richardson; Mary L. Bailey; William H. Sanders

1996-01-01

159

Using ZPL to develop a parallel chaos router simulator

This paper reports on our experience in writing a parallel version of a chaos router simulator using the new data-driven parallel language ZPL. The simulator is a large program that tests the capabilities of ZPL. The (parallel) ZPL program is compared with the existing serial implementation on two very different architectures: a 16-processor Intel Paragon and a cluster of…

Wilkey Richardson; Mary L. Bailey; William H. Sanders

1996-01-01

160

Empirical study of parallel LRU simulation algorithms

NASA Technical Reports Server (NTRS)

This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithms are more complex, but have costs that are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
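
The serial baseline the parallel algorithms reproduce is the stack-distance computation: a reference hits in any fully associative LRU cache whose size exceeds its distance, so one pass over the trace characterizes every cache size at once. A compact, deliberately naive sketch (not any of the paper's five algorithms):

```python
def stack_distances(trace):
    """Return the LRU stack distance of each reference in `trace`:
    its depth in the LRU stack at the time of reference, or infinity
    on first touch."""
    stack, dists = [], []
    for ref in trace:
        if ref in stack:
            d = stack.index(ref)   # depth = number of distinct refs since last touch
            stack.pop(d)
        else:
            d = float("inf")
        stack.insert(0, ref)       # ref becomes most recently used
        dists.append(d)
    return dists
```

The loop-carried dependence on `stack` is what makes the parallelization nontrivial and is the step in which the two MIMD variants differ.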

Carr, Eric; Nicol, David M.

1994-01-01

161

A polymorphic reconfigurable emulator for parallel simulation

NASA Technical Reports Server (NTRS)

Microprocessor and arithmetic support chip technology was applied to the design of a reconfigurable emulator for real time flight simulation. The system developed consists of a master control system to perform all man-machine interactions and to configure the hardware to emulate a given aircraft, and numerous slave compute modules (SCM) which comprise the parallel computational units. It is shown that all parts of the state equations can be worked on simultaneously but that the algebraic equations cannot (unless they are slowly varying). Attempts to obtain algorithms that will allow parallel updates are reported. The word length and step size to be used in the SCM's is determined and the architecture of the hardware and software is described.

Parrish, E. A., Jr.; Mcvey, E. S.; Cook, G.

1980-01-01

162

Parallel Proximity Detection for Computer Simulation

NASA Technical Reports Server (NTRS)

The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
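
The fuzzy-grid idea can be illustrated with a toy 2-D version (names and parameters invented; the patent's actual data structures are richer): instead of computing exact grid-boundary crossings, an object checks into every cell that a disc of its radius, padded by a fuzzy resolution parameter, could overlap.

```python
def touched_cells(pos, radius, cell_size, fuzz):
    """Return the set of (i, j) grid cells a mover or sensor coverage
    should check into: all cells overlapped by a square bounding a disc
    of radius `radius + fuzz` centered at `pos`. The pad means check-ins
    need not be recomputed at every exact boundary crossing."""
    r = radius + fuzz
    x, y = pos
    x0, x1 = int((x - r) // cell_size), int((x + r) // cell_size)
    y0, y1 = int((y - r) // cell_size), int((y + r) // cell_size)
    return {(i, j) for i in range(x0, x1 + 1) for j in range(y0, y1 + 1)}
```

A mover near a cell boundary simply appears in both cells, trading a few redundant sensor checks for far fewer grid events to synchronize.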

Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

1997-01-01

163

Parallel Proximity Detection for Computer Simulations

NASA Technical Reports Server (NTRS)

The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.

Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

1998-01-01

164

Parallel multiscale simulations of a brain aneurysm

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NekTar. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NekTar and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.

Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

2012-01-01

165

Parallel multiscale simulations of a brain aneurysm

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NekTar. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NekTar and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.

Grinberg, Leopold [Division of Applied Mathematics, Brown University, Providence, RI 02912 (United States)]; Fedosov, Dmitry A. [Institute of Complex Systems and Institute for Advanced Simulation, Forschungszentrum Jülich, Jülich 52425 (Germany)]; Karniadakis, George Em, E-mail: george_karniadakis@brown.edu [Division of Applied Mathematics, Brown University, Providence, RI 02912 (United States)]

2013-07-01

166

Parallelization of Rocket Engine Simulator Software (PRESS)

NASA Technical Reports Server (NTRS)

Parallelization of Rocket Engine System Software (PRESS) project is part of a collaborative effort with Southern University at Baton Rouge (SUBR), University of West Florida (UWF), and Jackson State University (JSU). The second-year funding, which supports two graduate students enrolled in our new Master's program in Computer Science at Hampton University and the principal investigator, has been obtained for the period from October 19, 1996 through October 18, 1997. The key part of the interim report was new directions for the second year funding. This came about from discussions during the Rocket Engine Numeric Simulator (RENS) project meeting in Pensacola on January 17-18, 1997. At that time, a software agreement between Hampton University and NASA Lewis Research Center had already been concluded. That agreement concerns off-NASA-site experimentation with PUMPDES/TURBDES software. Before this agreement, during the first year of the project, another large-scale FORTRAN-based software, Two-Dimensional Kinetics (TDK), was being used for translation to an object-oriented language and parallelization experiments. However, that package proved to be too complex and lacking sufficient documentation for effective translation effort to the object-oriented C++ source code. The focus, this time with the better-documented and more manageable PUMPDES/TURBDES package, was still on translation to C++ with design improvements. At the RENS Meeting, however, the new impetus for the RENS projects in general, and PRESS in particular, has shifted in two important ways. One was closer alignment with the work on Numerical Propulsion System Simulator (NPSS) through cooperation and collaboration with LERC ACLU organization. The other was to see whether and how NASA's various rocket design software can be run over local and intranets without any radical efforts for redesign and translation into object-oriented source code.
There were also suggestions that the Fortran-based code be encapsulated in C++ code, thereby facilitating reuse without undue development effort. The details are covered in the aforementioned section of the interim report filed on April 28, 1997.

Cezzar, Ruknet

1997-01-01

167

Parallel magnetic field perturbations in gyrokinetic simulations

At low β it is common to neglect parallel magnetic field perturbations on the basis that they are of order β². This is only true if effects of order β are canceled by a term in the ∇B drift also of order β [H. L. Berk and R. R. Dominguez, J. Plasma Phys. 18, 31 (1977)]. To our knowledge this has not been rigorously tested with modern gyrokinetic codes. In this work we use the gyrokinetic code GS2 [Kotschenreuther et al., Comput. Phys. Commun. 88, 128 (1995)] to investigate whether the compressional magnetic field perturbation B‖ is required for accurate gyrokinetic simulations at low β for microinstabilities commonly found in tokamaks. The kinetic ballooning mode (KBM) demonstrates the principle described by Berk and Dominguez strongly, as does the trapped electron mode, in a less dramatic way. The ion and electron temperature gradient (ETG) driven modes do not typically exhibit this behavior; the effects of B‖ are found to depend on the pressure gradients. The terms which are seen to cancel at long wavelength in KBM calculations can be cumulative in the ion temperature gradient case and increase with η_e. The effect of B‖ on the ETG instability is shown to depend on the normalized pressure gradient β′ at constant β.

Joiner, N.; Hirose, A. [Department of Physics and Engineering Physics, University of Saskatchewan, Saskatoon, Saskatchewan S7N 5E2 (Canada); Dorland, W. [University of Maryland, College Park, Maryland 20742 (United States)

2010-07-15

168

Discrete event command and control for networked teams with multiple missions

NASA Astrophysics Data System (ADS)

During mission execution in military applications, the TRADOC Pamphlet 525-66 Battle Command and Battle Space Awareness capabilities prescribe expectations that networked teams will perform in a reliable manner under changing mission requirements, varying resource availability and reliability, and resource faults. In this paper, a Command and Control (C2) structure is presented that allows for computer-aided execution of the networked team decision-making process, control of force resources, shared resource dispatching, and adaptability to change based on battlefield conditions. A mathematically justified networked computing environment is provided called the Discrete Event Control (DEC) Framework. DEC has the ability to provide the logical connectivity among all team participants including mission planners, field commanders, war-fighters, and robotic platforms. The proposed data management tools are developed and demonstrated on a simulation study and an implementation on a distributed wireless sensor network. The results show that the tasks of multiple missions are correctly sequenced in real-time, and that shared resources are suitably assigned to competing tasks under dynamically changing conditions without conflicts and bottlenecks.

Lewis, Frank L.; Hudas, Greg R.; Pang, Chee Khiang; Middleton, Matthew B.; McMurrough, Christopher

2009-05-01

169

Comparison of the Accuracy of Discrete Event and Discrete Time.

National Technical Information Service (NTIS)

Many combat and agent-based models use time-step as their simulation time advance mechanism. Since time discretization is known to affect the results when numerically solving differential equations, it stands to reason that it might likewise affect the re...

A. Al Rowaei; A. Buss

2010-01-01

170

Parssec: A Parallel Simulation Environment for Complex Systems

ulating large-scale systems. Widespread use of parallel simulation, however, has been significantly hindered by a lack of tools for integrating parallel model execution into the overall framework of system simulation. Although a number of algorithmic alternatives exist for parallel execution of discrete-event simulation models, performance analysts not expert in parallel simulation have relatively few tools giving them flexibility to experiment with multiple algorithmic or architectural...

Rajive Bagrodia; Richard A. Meyer; Mineo Takai; Yu-An Chen; Xiang Zeng; Jay Martin; Ha Yoon Song

1998-01-01

171

Parallel methods for dynamic simulation of multiple manipulator systems

NASA Technical Reports Server (NTRS)

In this paper, efficient dynamic simulation algorithms for a system of m manipulators, cooperating to manipulate a large load, are developed; their performance, using two possible forms of parallelism on a general-purpose parallel computer, is investigated. One form, temporal parallelism, is obtained with the use of parallel numerical integration methods. A speedup of 3.78 on four processors of CRAY Y-MP8 was achieved with a parallel four-point block predictor-corrector method for the simulation of a four manipulator system. These multi-point methods suffer from reduced accuracy, and when comparing these runs with a serial integration method, the speedup can be as low as 1.83 for simulations with the same accuracy. To regain the performance lost due to accuracy problems, a second form of parallelism is employed. Spatial parallelism allows most of the dynamics of each manipulator chain to be computed simultaneously. Used exclusively in the four processor case, this form of parallelism in conjunction with a serial integration method results in a speedup of 3.1 on four processors over the best serial method. In cases where there are either more processors available or fewer chains in the system, the multi-point parallel integration methods are still advantageous despite the reduced accuracy because both forms of parallelism can then combine to generate more parallel tasks and achieve greater effective speedups. This paper also includes results for these cases.

Mcmillan, Scott; Sadayappan, P.; Orin, David E.

1993-01-01

172

Implementation of a parallel 3-D vacuum electronics simulation code

A three-dimensional electromagnetic particle-in-cell simulation code has been implemented on the NRL Connection Machine-2. Internally, standard algorithms are used as much as possible, but in a parallel formulation. To use advantageously the parallel machine architecture, which includes a large local memory at each processor, optimized nearest-neighbor communication, and flexible hypercube communication, new algorithms have been applied. A fully parallel particle

E. Zaidman

1989-01-01

173

Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

NASA Technical Reports Server (NTRS)

Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.
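The serial core that both parallel algorithms distribute is standard simulated annealing. A minimal sketch of the Metropolis acceptance rule and a geometric cooling schedule follows; the cost and move functions here are placeholders, not the cell-placement objective of the paper:

```python
import math
import random

def anneal(cost, propose, state, t0=10.0, cooling=0.95, steps_per_t=100, t_min=0.01):
    """Simulated annealing: always accept downhill moves, accept uphill
    moves with probability exp(-delta/T), and cool T geometrically."""
    t, c = t0, cost(state)
    best, best_c = state, c
    while t > t_min:
        for _ in range(steps_per_t):
            cand = propose(state)
            delta = cost(cand) - c
            if delta < 0 or random.random() < math.exp(-delta / t):
                state, c = cand, c + delta
                if c < best_c:
                    best, best_c = state, c
        t *= cooling                      # geometric cooling schedule
    return best, best_c

# Toy usage: minimize x^2 over the reals with random-walk moves.
random.seed(0)
best, best_c = anneal(lambda x: x * x,
                      lambda x: x + random.uniform(-1.0, 1.0),
                      5.0)
```

In the hypercube algorithms above, many such moves are proposed concurrently on different processors, and the annealing schedule additionally has to estimate and control the error introduced by interacting parallel moves.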

Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.

1990-01-01

174

Bridging the gap: Discrete-Event Systems for software engineering (short position paper)

Discrete-Event System Theory (DES) allows the automatic control of a system with respect to a specification describing desirable sequences of events. It offers a large body of work with strong theoretical results and tool support. In this paper, we advocate the application of DES to software engineering problems. We summarize preliminary results and provide a list of directions for future

Juergen Dingel; Karen Rudie; Christopher Dragert

2009-01-01

175

This paper is concerned with the logical control of hybrid control systems (HCS). It is assumed that a discrete-event system (DES) plant model has already been extracted from the continuous-time plant. The problem of hybrid control system design can then be solved by applying logical DES controller synthesis techniques to the extracted DES plant. Traditional DES synthesis methods, however, are

Xiaojun Yang; Michael D. Lemmon; Panos J. Antsaklis

1995-01-01

176

A discrete event model of clinical trial enrollment at Eli Lilly and Company

Clinical trials constitute large, complex, and resource intensive activities for pharmaceutical companies. Accurate prediction of patient enrollment would represent a major step forward in optimizing clinical trials. Currently models for patient enrollment that are both accurate and fast are not available. We present a discrete event model of the patient enrollment process that is accurate and uses relatively small CPU

Bernard M. McGarvey; Nancy J. Dynes; Burch C. Lin; Wesley H. Anderson; James P. Kremidas; James C. Felli

2007-01-01

177

A discrete event model of clinical trial enrollment at Eli Lilly and company

Clinical trials constitute large, complex, and resource intensive activities for pharmaceutical companies. Accurate prediction of patient enrollment would represent a major step forward in optimizing clinical trials. Currently models for patient enrollment that are both accurate and fast are not available. We present a discrete event model of the patient enrollment process that is accurate and uses relatively

Bernard M. Mcgarvey; Nancy J. Dynes; Burch C. Lin; Wesley H. Anderson; James P. Kremidas; James C. Felli

2007-01-01

178

Scheduling trains on a railway network using a discrete event model of railway traffic

Scheduling trains in a railway network is a fundamental operational problem in the railway industry. A local feedback-based travel advance strategy is developed using a discrete event model of train advances along lines of the railway. This approach can quickly handle perturbations in the schedule and is shown to perform well on three time-performance criteria while maintaining the local nature

M. J. Dorfman; J. Medanic

2004-01-01

179

Design of discrete event control systems for programmable logic controllers using T-Timed Petri nets

As automated manufacturing systems become more complex, the need for an effective design tool to produce both a high level discrete event control system (DECS) and a low level ladder logic implementation, becomes increasingly more important. Petri nets represent the most effective method for the design of such DECSs. The conversion of such Petri net controllers into ladder logic has

A. H. Jones; M. Uzam; N. Ajlouni

1996-01-01

180

Classification of keystroke dynamics - a case study of fuzzified discrete event handling

The expressiveness of discrete events can be enhanced by incorporating additional attributes, which can be employed for a more detailed classification of an event or to formulate certain requirements towards future instantiations of an event in greater depth. This work is dedicated especially to attributes gained from potentially unreliable sources, such as virtually all sensor systems are. The two main

Gernot Herbst; Steffen F. Bocklisch

2008-01-01

181

Discrete-event requirements model for sensor fusion to provide real-time diagnostic feedback

NASA Astrophysics Data System (ADS)

Minimally invasive surgical techniques reduce the size of the access corridor and the affected zones, limiting the real-time perceptual information available to practitioners. A real-time feedback system is required to offset these deficiencies in perceptual information. This feedback system acquires data from multiple sensors and fuses these data to extract pertinent information within defined time windows. To perform this task, a set of computing components interact with each other, resulting in a discrete event dynamic system. In this work, a new discrete event requirements model for sensor fusion has been proposed to ensure the logical and temporal correctness of the operation of the real-time diagnostic feedback system. The proposed scheme models system requirements as a Petri net based discrete event dynamic machine. A graphical representation and quantitative analysis of this model have been developed. Having a natural graphical property, this Petri net based model enables the requirements engineer to communicate intuitively with the client to avoid faults in the early phase of the development process. The quantitative analysis helps justify the logical and temporal correctness of the operation of the system. It has been shown that this model can be analyzed to check for deadlock, reachability, and repetitiveness of the operation of the sensor fusion system. This novel technique for modeling the requirements of sensor fusion as a discrete event dynamic system has the potential to realize a highly reliable real-time diagnostic feedback system for many applications, such as minimally invasive instrumentation.
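The deadlock and reachability checks mentioned above amount to an exhaustive search of the net's reachability graph. A minimal sketch for a small place/transition net follows; the encoding of places and transitions is illustrative, not the paper's sensor-fusion model:

```python
from collections import deque

def enabled(marking, transitions):
    """Return the transitions whose every input place holds enough tokens."""
    return [t for t, (pre, _post) in transitions.items()
            if all(marking[p] >= n for p, n in pre.items())]

def fire(marking, pre, post):
    """Fire one transition: consume input tokens, produce output tokens."""
    m = dict(marking)
    for p, n in pre.items():
        m[p] -= n
    for p, n in post.items():
        m[p] = m.get(p, 0) + n
    return m

def reachable_deadlocks(m0, transitions):
    """Breadth-first search of the reachability graph; a marking with no
    enabled transition is a deadlock."""
    key = lambda m: tuple(sorted(m.items()))
    seen, frontier, deadlocks = {key(m0)}, deque([m0]), []
    while frontier:
        m = frontier.popleft()
        en = enabled(m, transitions)
        if not en:
            deadlocks.append(m)
        for t in en:
            pre, post = transitions[t]
            m2 = fire(m, pre, post)
            if key(m2) not in seen:
                seen.add(key(m2))
                frontier.append(m2)
    return deadlocks

# A two-place net: t1 moves the single token from p1 to p2, where it is stuck.
net = {"t1": ({"p1": 1}, {"p2": 1})}
deadlocks = reachable_deadlocks({"p1": 1, "p2": 0}, net)
```

Full Petri net analysis also checks boundedness and repetitiveness, but the exhaustive enumeration of markings shown here is the common core.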

Rokonuzzaman, Mohd; Gosine, Raymond G.

1998-06-01

182

As automated manufacturing systems become more complex, the need for an effective design tool to produce both high-level discrete event control systems (DECS) and low-level implementations becomes more important. Petri nets represent the most effective method for both the design and implementation of DECSs. In this paper, automation Petri nets (APN) are introduced to provide a new method for the

M. Uzam; A. H. Jones

1998-01-01

183

A Model for PLC Implementation of Supervisory Control of Discrete Event Systems

This paper deals with the implementation of supervisory control of discrete event systems in programmable logic controllers. A modular approach to supervisory control theory is applied for performing the formal synthesis of supervisors. Some problems that arise in the implementation of supervisory control are reviewed from the literature. The main contribution of this paper is the introduction of conceptual

Agnelo Denis Vieira; José Eduardo Ribeiro Cury; Max Hering De Queiroz

2006-01-01

184

The concept of postponed event in timed discrete event systems and its PLC implementation

Supervisory Control Theory (SCT) is one of the most important frameworks for establishing formal discrete event control systems. Despite some difficulties, the programmable logic controller (PLC) implementation of supervisory controllers obtained from SCT for untimed systems is well established within the literature. However, there is no general and easy to use method for obtaining supervisory controllers with time delay functions.

Gokhan Gelen; Murat Uzam; Recep Dalci

2010-01-01

185

Knowledge representation for Petri net based PLC stage program of discrete-event control design

In a programmable logic controller of a discrete-event control system, up to 60% of the coding effort is devoted to dealing with interlocking. Stage programming is a new concept that breaks a program into logical stages, making the design of complex systems easier. The stages can then be programmed individually without concern for how they will affect the rest of the program.

ShihSen Peng; MengChu Zhou

2002-01-01

186

Towards scalable parallel-in-time turbulent flow simulations

NASA Astrophysics Data System (ADS)

We present a reformulation of unsteady turbulent flow simulations. The initial condition is relaxed and information is allowed to propagate both forward and backward in time. Simulations of chaotic dynamical systems with this reformulation can be proven to be well-conditioned time domain boundary value problems. The reformulation can enable scalable parallel-in-time simulation of turbulent flows.

Wang, Qiqi; Gomez, Steven A.; Blonigan, Patrick J.; Gregory, Alastair L.; Qian, Elizabeth Y.

2013-11-01

187

Use of Parallel Tempering for the Simulation of Polymer Melts

The parallel tempering algorithm (C. J. Geyer, Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface, 156 (1991)) is based on simulating several systems in parallel, each of which has a slightly different Hamiltonian. The systems are put in equilibrium with each other by stochastic swaps between neighboring Hamiltonians. Previous implementations have mainly focused on the temperature as
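The swap move at the heart of the algorithm can be sketched as follows. Here the replicas differ only in inverse temperature beta, the most common special case; the Hamiltonian-space generalization the abstract alludes to replaces beta with some other Hamiltonian parameter. The double-well energy function is a placeholder:

```python
import math
import random

def metropolis_sweep(x, beta, energy, step=0.5, n_moves=50):
    """Local Metropolis updates of one replica at inverse temperature beta."""
    e = energy(x)
    for _ in range(n_moves):
        cand = x + random.uniform(-step, step)
        delta = energy(cand) - e
        if delta < 0 or random.random() < math.exp(-beta * delta):
            x, e = cand, e + delta
    return x

def tempering_round(replicas, betas, energy):
    """One parallel-tempering round: independent local updates, then
    stochastic swaps between neighbouring replicas, accepted with
    probability min(1, exp((beta_i - beta_j) * (E_i - E_j)))."""
    replicas = [metropolis_sweep(x, b, energy) for x, b in zip(replicas, betas)]
    for i in range(len(replicas) - 1):
        d = (betas[i] - betas[i + 1]) * (energy(replicas[i]) - energy(replicas[i + 1]))
        if d >= 0 or random.random() < math.exp(d):
            replicas[i], replicas[i + 1] = replicas[i + 1], replicas[i]
    return replicas

# Toy usage: four replicas of a double-well energy at different temperatures.
random.seed(1)
energy = lambda x: (x * x - 1.0) ** 2
betas = [4.0, 2.0, 1.0, 0.5]
replicas = [0.0] * 4
for _ in range(20):
    replicas = tempering_round(replicas, betas, energy)
```

The swap acceptance rule preserves detailed balance of the joint distribution, so hot replicas explore freely while cold replicas still sample their own Boltzmann distribution.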

Alex Bunker; Burkhard Duenweg; Doros Theodorou

2000-01-01

188

Traffic simulations on parallel computers using domain decomposition techniques.

National Technical Information Service (NTIS)

Large scale simulations of Intelligent Transportation Systems (ITS) can only be achieved by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic simu...

U. R. Hanebutte; A. M. Tentner

1995-01-01

189

Turbulent Transport Reduction by Zonal Flows: Massively Parallel Simulations.

National Technical Information Service (NTIS)

The dynamics of turbulence-driven E x B zonal flows has been systematically studied in fully 3-dimensional gyrokinetic simulations of microturbulence in magnetically confined toroidal plasmas using recently available massively parallel computers. Linear f...

Z. Lin; T. S. Hahm; W. W. Lee; W. M. Tang; R. B. White

1998-01-01

190

Parallel computing in conceptual sewer simulations.

Integrated urban drainage modelling is used to analyze how existing urban drainage systems respond to particular conditions. Based on these integrated models, researchers and engineers are able, for example, to estimate long-term pollution effects, optimize the behaviour of a system by comparing the impacts of different measures on the desired target value, or gain new insights into system interactions. Although the use of simplified conceptual models reduces the computational time significantly, searching the enormous vector space spanned by the input parameters or by the set of candidate measures means that computational time is still a limiting factor. Owing to the stagnation of single-thread performance in computers and the rising number of cores, algorithms need to be adapted to the parallel nature of the new CPUs to fully utilize the available computing power. In this work a newly developed software tool named CD3 for parallel computing in integrated urban drainage systems is introduced. Of three investigated parallel strategies, two showed promising results; one yields a speedup of up to 4.2 on an eight-way hyperthreaded quad-core CPU and shows significant run-time reductions for all investigated sewer systems. PMID:20107253

Burger, G; Fach, S; Kinzel, H; Rauch, W

2010-01-01

191

Parallel Temperature-Accelerated Dynamics Simulations of Epitaxial Growth

NASA Astrophysics Data System (ADS)

The temperature-accelerated dynamics (TAD) method is a powerful tool for carrying out non-equilibrium simulations of systems with infrequent events over extended timescales. However, since the computational time for a typical TAD simulation increases rapidly with the number of atoms N, TAD simulations have so far been limited to relatively small system sizes. By applying a recently proposed synchronous sublattice algorithm to parallel TAD simulations, we have been able to simulate the evolution of systems over much larger length- as well as time-scales. As a first test of our method, we have carried out simulations of the surface diffusion of Cu atoms on the Cu(100) surface. In contrast to serial TAD simulations for which the computational time scales as N^2.5 - N^3, in our parallel TAD simulations the computational time scales as log(N) and may even be independent of N for larger system sizes. In particular we find that for intermediate size systems our parallel TAD simulations are several orders of magnitude faster than the corresponding serial TAD simulations. Preliminary results for low-temperature multilayer Cu/Cu(100) growth obtained using parallel TAD simulations are also presented.

Shim, Y.; Amar, J. G.; Uberuaga, B. P.; Voter, A. F.

2006-03-01

192

Efficient Parallel Execution of Event-Driven Electromagnetic Hybrid Models

New discrete-event formulations of physics simulation models are emerging that can outperform traditional time-stepped models, especially in simulations containing multiple timescales. Detailed simulation of the Earth's magnetosphere, for example, requires execution of sub-models that operate at timescales that differ by orders of magnitude. In contrast to time-stepped simulation which requires tightly coupled updates to almost the entire system state at regular time intervals, the new discrete event simulation (DES) approaches help evolve the states of sub-models on relatively independent timescales. However, in contrast to relative ease of parallelization of time-stepped codes, the parallelization of DES-based models raises challenges with respect to their scalability and performance. One of the key challenges is to improve the computation granularity to offset synchronization and communication overheads within and across processors. Our previous work on parallelization was limited in scalability and runtime performance due to such challenges. Here we report on optimizations we performed on DES-based plasma simulation models to improve parallel execution performance. The mapping of the model to simulation processes is optimized via aggregation techniques, and the parallel runtime engine is optimized for communication and memory efficiency. The net result is the capability to simulate hybrid particle-in-cell (PIC) models with over 2 billion ion particles using 512 processors on supercomputing platforms.
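The contrast with time-stepping comes down to the event loop: the clock jumps directly to the next pending event, so a slow sub-model is only touched when it actually has something to do. A minimal sketch follows; the two handlers stand in for sub-models on timescales differing by orders of magnitude, not for the hybrid PIC models of the paper:

```python
import heapq
import itertools

def run_des(initial_events, horizon):
    """Minimal discrete-event loop: pop the earliest event, execute its
    handler, and schedule whatever future events the handler returns."""
    seq = itertools.count()                   # tie-breaker for equal times
    queue = [(t, next(seq), name, h) for t, name, h in initial_events]
    heapq.heapify(queue)
    log = []
    while queue:
        t, _, name, handler = heapq.heappop(queue)
        if t > horizon:
            break
        log.append((t, name))
        for nt, nname, nh in handler(t):      # handlers schedule future events
            heapq.heappush(queue, (nt, next(seq), nname, nh))
    return log

# Two sub-models on very different timescales: the slow one is not forced
# to update at every fast tick, unlike in a time-stepped scheme.
fast = lambda t: [(t + 0.25, "fast", fast)]
slow = lambda t: [(t + 100.0, "slow", slow)]
log = run_des([(0.0, "fast", fast), (0.0, "slow", slow)], 1.0)
```

Parallelizing this loop is what raises the synchronization challenges discussed above: events on different processors must not be executed out of timestamp order, and fine event granularity magnifies the communication overhead.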

Perumalla, Kalyan S. [ORNL]; Karimabadi, Dr. Homa [SciberQuest Inc.]; Fujimoto, Richard [ORNL]

2007-01-01

193

HPC Infrastructure for Solid Earth Simulation on Parallel Computers

NASA Astrophysics Data System (ADS)

Recently, various types of parallel computers with various architectures and processing elements (PE) have emerged, including PC clusters and the Earth Simulator. Moreover, users can easily access these computer resources through the network in a Grid environment. It is well known that thorough tuning is required for programmers to achieve excellent performance on each computer, and the tuning method strongly depends on the type of PE and architecture. Optimization by tuning is very demanding work, especially for application developers. Moreover, parallel programming using a message passing library such as MPI is another big task for application programmers. In the GeoFEM project (http://gefeom.tokyo.rist.or.jp), the authors have developed a parallel FEM platform for solid earth simulation on the Earth Simulator, which supports parallel I/O, parallel linear solvers and parallel visualization. This platform can efficiently hide the complicated procedures for parallel programming and optimization on vector processors from application programmers. This type of infrastructure is very useful: source code developed on a single-processor PC is easily optimized on a massively parallel computer by linking it to the parallel platform installed on the target computer. This parallel platform, called HPC Infrastructure, will provide dramatic efficiency, portability and reliability in the development of scientific simulation codes. For example, the line count of the source codes is expected to be less than 10,000, and porting legacy codes to a parallel computer takes 2 or 3 weeks. The original GeoFEM platform supports only I/O, linear solvers and visualization. In the present work, further development for adaptive mesh refinement (AMR) and dynamic load-balancing (DLB) has been carried out. In this presentation, examples of large-scale solid earth simulation using the Earth Simulator will be demonstrated. 
Moreover, recent results of a parallel computational steering tool using an MxN communication model will be shown. In an MxN communication model, the large-scale computation modules run on M PEs and high-performance parallel visualization modules run on N PEs concurrently. This allows computation and visualization each to select a suitable parallel hardware environment. Meanwhile, real-time steering can be performed during computation so that users can check and adjust the computation process in real time. Furthermore, different numbers of PEs can achieve a better configuration between computation and visualization in a Grid environment.

Nakajima, K.; Chen, L.; Okuda, H.

2004-12-01

194

Xyce parallel electronic simulator : users' guide. Version 5.1.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. 
As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

2009-11-01

195

VORPAL: a multidimensional, parallel plasma simulation code

We have developed an object-oriented framework for a new plasma physics simulation code, VORPAL. Through the use of recursion and template specialization, VORPAL is designed to run in any number of physical dimensions without loss of performance. The dimension of the simulation is set at runtime, allowing a quick run to be done in 2D to get qualitative

Chet Nieter; John R. Cary

2001-01-01

196

Large nonadiabatic quantum molecular dynamics simulations on parallel computers

NASA Astrophysics Data System (ADS)

We have implemented a quantum molecular dynamics simulation incorporating nonadiabatic electronic transitions on massively parallel computers to study photoexcitation dynamics of electrons and ions. The nonadiabatic quantum molecular dynamics (NAQMD) simulation is based on Casida's linear response time-dependent density functional theory to describe electronic excited states and Tully's fewest-switches surface hopping approach to describe nonadiabatic electron-ion dynamics. To enable large NAQMD simulations, a series of techniques are employed for efficiently calculating long-range exact exchange correction and excited-state forces. The simulation program is parallelized using hybrid spatial and band decomposition, and is tested for various materials.

Shimojo, Fuyuki; Ohmura, Satoshi; Mou, Weiwei; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

2013-01-01

197

Traffic simulations on parallel computers using domain decomposition techniques

Large scale simulations of Intelligent Transportation Systems (ITS) can only be achieved by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic simulations with the standard simulation package TRAF-NETSIM on a 128-node IBM SPx parallel supercomputer as well as on a cluster of SUN workstations. Whilst this particular parallel implementation is based on NETSIM, a microscopic traffic simulation model, the presented strategy is applicable to a broad class of traffic simulations. An outer iteration loop must be introduced in order to converge to a global solution. A performance study that utilizes a scalable test network consisting of square grids is presented, which addresses the performance penalty introduced by the additional iteration loop.
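The essence of the technique is to give each processor a contiguous block of the network plus read-only copies (ghosts) of the links just over the boundary; the outer iteration then reconciles boundary traffic until a global solution is reached. A sketch of the block split over a hypothetical linear network (not TRAF-NETSIM's actual data model):

```python
def decompose(n_links, n_procs):
    """Block decomposition of a linear road network: each process owns a
    contiguous slice of links and keeps one ghost link on each side,
    whose state is refreshed from the neighbouring process every iteration."""
    base, extra = divmod(n_links, n_procs)
    domains, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        lo, hi = start, start + size
        ghosts = [i for i in (lo - 1, hi) if 0 <= i < n_links]
        domains.append({"owned": list(range(lo, hi)), "ghost": ghosts})
        start = hi
    return domains

domains = decompose(10, 3)   # 10 links over 3 processes: sizes 4, 3, 3
```

Because vehicles crossing a boundary are only seen by the neighbour at the next exchange, the decomposed simulation differs from the serial one, which is exactly why the outer iteration loop described above is needed.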

Hanebutte, U.R.; Tentner, A.M.

1995-12-31

198

Parallel stochastic simulation of macroscopic calcium currents.

This work introduces MACACO, a macroscopic calcium currents simulator. It provides a parameter-sweep framework which computes macroscopic Ca(2+) currents from the individual aggregation of unitary currents, using a stochastic model for L-type Ca(2+) channels. MACACO uses a simplified 3-state Markov model to simulate the response of each Ca(2+) channel to different voltage inputs to the cell. In order to provide an accurate systematic view for the stochastic nature of the calcium channels, MACACO is composed of an experiment generator, a central simulation engine and a post-processing script component. Due to the computational complexity of the problem and the dimensions of the parameter space, the MACACO simulation engine employs a grid-enabled task farm. Having been designed as a computational biology tool, MACACO heavily borrows from the way cell physiologists conduct and report their experimental work. PMID:17688315

González-Vélez, Virginia; González-Vélez, Horacio

2007-06-01

199

Parallel Performance in Multi-physics Simulation

A comprehensive simulation of solidification/melting processes requires the simultaneous representation of free surface fluid flow, heat transfer, phase change, non-linear solid mechanics and, possibly, electromagnetics, together with their interactions, in what is now referred to as 'multi-physics' simulation. A 3D computational procedure and software tool, PHYSICA, embedding the above multi-physics models using finite volume methods on unstructured meshes

Kevin Mcmanus; Mark Cross; Chris Walshaw; Nick Croft; Alison Williams

2002-01-01

200

Beam dynamics simulations using a parallel version of PARMILA

The computer code PARMILA has been the primary tool for the design of proton and ion linacs in the United States for nearly three decades. Previously it was sufficient to perform simulations with of order 10000 particles, but recently the need to perform high resolution halo studies for next-generation, high intensity linacs has made it necessary to perform simulations with of order 100 million particles. With the advent of massively parallel computers such simulations are now within reach. Parallel computers already make it possible, for example, to perform beam dynamics calculations with tens of millions of particles, requiring over 10 GByte of core memory, in just a few hours. Also, parallel computers are becoming easier to use thanks to the availability of mature, Fortran-like languages such as Connection Machine Fortran and High Performance Fortran. We will describe our experience developing a parallel version of PARMILA and the performance of the new code.

Ryne, R.D.

1996-12-01

201

High Temperature Materials Simulations on Parallel Computers.

National Technical Information Service (NTIS)

Final Progress (1 May 99 - 30 Apr 03): This project deals with properties and processes in high-temperature materials (HTMs) that are vital to the DoD technology base. In this project, molecular-dynamics (MD) simulations have been performed to investigate...

P. Vashishta; R. K. Kalia; A. Nakano

2003-01-01

202

Parallelization of sequential Gaussian, indicator and direct simulation algorithms

NASA Astrophysics Data System (ADS)

Improving the performance and robustness of algorithms on new high-performance parallel computing architectures is a key issue in efficiently performing 2D and 3D studies with large amounts of data. In geostatistics, sequential simulation algorithms are good candidates for parallelization. When compared with other computational applications in geosciences (such as fluid flow simulators), sequential simulation software is not extremely computationally intensive, but parallelization can make it more efficient and creates alternatives for its integration in inverse modelling approaches. This paper describes the implementation and benchmarking of a parallel version of the three classic sequential simulation algorithms: direct sequential simulation (DSS), sequential indicator simulation (SIS) and sequential Gaussian simulation (SGS). For this purpose, the source used was GSLIB, but the entire code was extensively modified to take into account the parallelization approach and was also rewritten in the C programming language. The paper also explains the parallelization strategy and the main modifications in detail. Regarding the integration of secondary information, the DSS algorithm is able to perform simple kriging with local means, kriging with an external drift and collocated cokriging with both local and global correlations. SIS includes a local correction of probabilities. Finally, a brief comparison is presented of simulation results using one, two and four processors. All performance tests were carried out on 2D soil data samples. The source code is completely open source and easy to read. It should be noted that the code is only fully compatible with Microsoft Visual C and should be adapted for other systems/compilers.
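The common skeleton of all three algorithms is: visit the grid nodes along a random path, estimate a local distribution at each node by kriging on previously simulated values, and draw from it. A deliberately reduced 1-D Gaussian sketch, conditioning only on the single nearest simulated node (real SGS, SIS and DSS use a full kriging neighbourhood and conditioning data):

```python
import math
import random

def sgs_1d(n, a=5.0, mean=0.0, seed=0):
    """Toy sequential Gaussian simulation on a 1-D grid: visit nodes along
    a random path; condition each node on its nearest previously simulated
    node via simple kriging with an exponential covariance C(h) = exp(-h/a)."""
    rng = random.Random(seed)
    path = list(range(n))
    rng.shuffle(path)
    z = {}
    for i in path:
        if z:
            j = min(z, key=lambda k: abs(k - i))    # nearest simulated node
            w = math.exp(-abs(i - j) / a)           # simple-kriging weight
            mu = mean + w * (z[j] - mean)
            var = 1.0 - w * w                       # kriging variance
        else:
            mu, var = mean, 1.0                     # unconditioned first node
        z[i] = rng.gauss(mu, math.sqrt(var))        # draw from local Gaussian
    return [z[i] for i in range(n)]

z = sgs_1d(50)
```

The sequential dependence along the random path (each node conditions on earlier ones) is precisely what makes parallelizing these algorithms non-trivial.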

Nunes, Ruben; Almeida, José A.

2010-08-01
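The sequential simulation loop shared by DSS, SIS and SGS (random visiting path, kriging from previously simulated neighbors, draw from the local conditional distribution) can be sketched as follows. This is a minimal 1D illustration assuming simple kriging with a unit-sill exponential covariance; it is not the GSLIB-derived C code the paper describes.

```python
import math
import random

def solve(A, b):
    """Tiny Gaussian elimination with partial pivoting for the kriging system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def sgs_1d(n_cells, n_neighbors=4, cov_range=5.0, seed=0):
    """Sequential Gaussian simulation on a 1D grid: visit cells along a
    random path; at each cell, simple kriging from already-simulated
    neighbors gives a conditional mean/variance to draw from."""
    rng = random.Random(seed)
    cov = lambda h: math.exp(-abs(h) / cov_range)   # exponential covariance
    values = {}
    path = list(range(n_cells))
    rng.shuffle(path)                               # random visiting order
    for cell in path:
        # nearest previously simulated cells
        known = sorted(values, key=lambda c: abs(c - cell))[:n_neighbors]
        if not known:
            values[cell] = rng.gauss(0.0, 1.0)      # unconditional draw
            continue
        K = [[cov(a - b) for b in known] for a in known]
        k = [cov(a - cell) for a in known]
        w = solve(K, k)                             # simple-kriging weights
        mean = sum(wi * values[c] for wi, c in zip(w, known))
        var = max(1.0 - sum(wi * ki for wi, ki in zip(w, k)), 1e-9)
        values[cell] = rng.gauss(mean, math.sqrt(var))
    return [values[i] for i in range(n_cells)]
```

SIS and DSS replace the Gaussian conditional draw with indicator-coded probabilities or a draw from the local distribution of the original variable, but the visiting-path and neighborhood-search skeleton is the same, which is why the paper can parallelize all three with one strategy.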

203

Parallel solution for railway power network simulation

The Streaming SIMD extension (SSE) is a special feature that is available in the Intel Pentium III and P4 classes of microprocessors. As its name implies, SSE enables the execution of SIMD (Single Instruction Multiple Data) operations upon 32-bit floating-point data; therefore, the performance of floating-point algorithms can be improved. In electrified railway system simulation, the computation involves the solving of

Y. F. Fung; T. K. Ho; W. L. Cheung; M. F. Ercan

2001-01-01

204

Improved task scheduling for parallel simulations. Master's thesis

The objective of this investigation is to design, analyze, and validate the generation of optimal schedules for simulation systems. Improved performance in simulation execution times can greatly improve the return rate of information provided by such simulations, resulting in reduced development costs of future computer/electronic systems. Optimal schedule generation of precedence-constrained task systems, including iterative feedback systems such as VHDL or war gaming simulations, for execution on a parallel computer is known to be NP-hard. Efficiently parallelizing such problems takes full advantage of present computer technology to achieve a significant reduction in the search times required. Unfortunately, the extreme combinatorial 'explosion' of possible task assignments to processors creates an exponential search space that is prohibitive on any computer for search algorithms which maintain more than one branch of the search graph at any one time. This work develops various parallel modified backtracking (MBT) search algorithms for execution on an iPSC/2 hypercube that bound the space requirements and produce an optimal minimum-length schedule with linear speed-up. The parallel MBT search algorithm is validated using various feedback task simulation systems which are scheduled for execution on an iPSC/2 hypercube. The search time, size of the enumerated search space, and communications overhead required to ensure efficient utilization during the parallel search process are analyzed. The various applications indicated appreciable improvement in performance using this method.

McNear, A.E.

1991-12-01
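The core of a backtracking scheduler for precedence-constrained tasks can be sketched as a serial branch-and-bound search: assign each ready task to a processor, recurse, and prune any partial schedule already as long as the best complete one. This is an illustrative sketch only, not the thesis's parallel MBT algorithm for the iPSC/2.

```python
def best_schedule(tasks, durations, preds, n_procs):
    """Branch-and-bound backtracking search for a minimum-makespan schedule
    of precedence-constrained tasks on n_procs identical processors."""
    best = {"span": float("inf"), "assign": None}

    def recurse(assign, finish, proc_free):
        if len(assign) == len(tasks):
            span = max(finish.values())
            if span < best["span"]:
                best["span"], best["assign"] = span, dict(assign)
            return
        # bound: prune partial schedules already as long as the incumbent
        if max(proc_free) >= best["span"]:
            return
        # tasks whose predecessors have all been scheduled
        ready = [t for t in tasks if t not in assign
                 and all(p in assign for p in preds.get(t, ()))]
        for t in ready:
            earliest = max((finish[p] for p in preds.get(t, ())), default=0.0)
            for proc in range(n_procs):
                start = max(proc_free[proc], earliest)
                assign[t] = proc
                finish[t] = start + durations[t]
                saved = proc_free[proc]
                proc_free[proc] = finish[t]
                recurse(assign, finish, proc_free)
                proc_free[proc] = saved        # backtrack
                del assign[t], finish[t]

    recurse({}, {}, [0.0] * n_procs)
    return best["span"], best["assign"]
```

Because only the current branch of the search graph is kept, the space requirement stays linear in the number of tasks, which is the property that makes the combinatorial search space tractable in memory.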

205

Liveness Verification of Discrete Event Systems Modeled by n-Safe Ordinary Petri Nets

This paper discusses liveness verification of discrete-event systems modeled by n-safe ordinary Petri nets. A Petri net is live if it is possible to fire any transition from any reachable marking. The verification method we propose is based on a partial-order method called network unfolding. Network unfolding maps the original Petri net to an acyclic occurrence net. A finite

Kevin X. He; Michael D. Lemmon

2000-01-01

206

Asynchronous implementation of discrete event controllers based on safe automation Petri nets

In this paper, a new method is proposed for digital hardware implementation of Petri net-based specifications. The purpose of this paper is to introduce a new discrete event control system paradigm, where the control system is modeled with extended Petri nets and implemented as an asynchronous controller using circuit elements. The applicability of the proposed method is demonstrated by an

Murat Uzam; ?. Burak Koç; Gökhan Gelen; B. Hakan Aksebzeci

2009-01-01

207

Xyce Parallel Electronic Simulator - User's Guide, Version 1.0

This manual describes the use of the Xyce Parallel Electronic Simulator code for simulating electrical circuits at a variety of abstraction levels. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on improving the capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows. Another feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce Parallel Electronic Simulator is designed to support a variety of device model inputs.
These input formats include standard analytical models, behavioral models and look-up tables. Combined with this flexible interface is an architectural design that greatly simplifies the addition of circuit models. One of the most important contributions Xyce makes to the designers at Sandia National Laboratories is in providing a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia now has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods) research and development can be performed. Furthermore, these capabilities will then be migrated to the end users.

HUTCHINSON, SCOTT A; KEITER, ERIC R.; HOEKSTRA, ROBERT J.; WATERS, LON J.; RUSSO, THOMAS V.; RANKIN, ERIC LAMONT; WIX, STEVEN D.

2002-11-01

208

Parallel circuit simulation based on nonlinear relaxation methods

The authors describe a number of techniques to perform parallel circuit simulation on a shared-memory multiprocessor using nonlinear relaxation algorithms. These approaches have been implemented on the Alliant FX/80 multiprocessor, which is a shared-memory machine with 8 processors. Various schemes based on the iterated timing analysis (ITA) algorithm exploiting subcircuit-level parallelism are described, including multiple barrier, single barrier, event-driven and

G. G. Hung; K. Gallivan; R. Saleh

1991-01-01

209

Applying Parallel Processing Techniques to Tether Dynamics Simulation

NASA Technical Reports Server (NTRS)

The focus of this research has been to determine the effectiveness of applying parallel processing techniques to a sizable real-world problem, the simulation of the dynamics associated with a tether which connects two objects in low earth orbit, and to explore the degree to which the parallelization process can be automated through the creation of new software tools. The goal has been to utilize this specific application problem as a base to develop more generally applicable techniques.

Wells, B. Earl

1996-01-01

210

Xyce Parallel Electronic Simulator : users' guide, version 2.0.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving the capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. (4) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (5) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code.
To this end, the device package in the Xyce Parallel Electronic Simulator is designed to support a variety of device model inputs. These input formats include standard analytical models, behavioral models, look-up tables, and mesh-level PDE device models. Combined with this flexible interface is an architectural design that greatly simplifies the addition of circuit models. One of the most important features of Xyce is that it provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia now has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods) research and development can be performed. Ultimately, these capabilities are migrated to end users.

Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.

2004-06-01

211

Parallel runway requirement analysis study. Volume 2: Simulation manual

NASA Technical Reports Server (NTRS)

This document is a user manual for operating the PLAND_BLUNDER (PLB) simulation program. This simulation is based on two aircraft approaching parallel runways independently and using parallel Instrument Landing System (ILS) equipment during Instrument Meteorological Conditions (IMC). If an aircraft should deviate from its assigned localizer course toward the opposite runway, this constitutes a blunder which could endanger the aircraft on the adjacent path. The worst case scenario would be if the blundering aircraft were unable to recover and continued toward the adjacent runway. PLAND_BLUNDER is a Monte Carlo-type simulation which models the sequence of events and aircraft positions during such a blunder situation. The model simulates two aircraft performing parallel ILS approaches using Instrument Flight Rules (IFR) or visual procedures. PLB uses a simple movement model and control law in three dimensions (X, Y, Z). The parameters of the simulation inputs and outputs are defined in this document along with a sample of the statistical analysis. This document is the second volume of a two-volume set. Volume 1 is a description of the application of the PLB to the analysis of close parallel runway operations.

Ebrahimi, Yaghoob S.; Chun, Ken S.

1993-01-01

212

Parallelization of a Monte Carlo particle transport simulation code

NASA Astrophysics Data System (ADS)

We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures, including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for the study of higher particle energies with the use of more accurate physical models, and improve statistics as more particle tracks can be simulated in a short response time.

Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

2010-05-01
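The structure described in the abstract (independent particle histories split across workers, each with its own pseudo-random stream so results are reproducible) can be sketched as follows. This is a toy Python analogue, not the MC4 code: the physics is a made-up multiplicative energy-loss model, and seeded `random.Random` instances stand in for per-process generator libraries such as SPRNG/DCMT.

```python
import random
from multiprocessing import Pool

def track_particles(args):
    """Run n toy particle histories with an independent, seeded RNG stream."""
    seed, n = args
    rng = random.Random(seed)
    deposited = 0.0
    for _ in range(n):
        energy = 1.0                      # hypothetical unit starting energy
        while energy > 1e-3:              # track until a cutoff energy
            loss = energy * rng.random() * 0.5   # toy step-wise energy loss
            deposited += loss
            energy -= loss
    return deposited

def parallel_mc(n_histories, n_workers=4, base_seed=1234):
    """Split histories across worker processes, one RNG stream each."""
    per = n_histories // n_workers
    jobs = [(base_seed + i, per) for i in range(n_workers)]
    with Pool(n_workers) as pool:
        return sum(pool.map(track_particles, jobs)) / (per * n_workers)
```

Because histories are independent, near-linear speedup is expected as long as each worker's stream is statistically independent; on platforms that spawn rather than fork processes, `parallel_mc` should be called under an `if __name__ == "__main__":` guard.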

213

Study of a Multilevel Approach to Partitioning for Parallel Logic Simulation

Parallel simulation techniques are often employed to meet the computational requirements of large hardware simulations in order to reduce simulation time. In addition, partitioning for parallel simulations has been shown to be vital for achieving higher simulation throughput. This paper presents the results of our partitioning studies conducted on an optimistic parallel logic simulation framework based on

Swaminathan Subramanian; Dhananjai Madhava Rao; Philip A. Wilsey

2000-01-01

214

Efficient parallel CFD-DEM simulations using OpenMP

NASA Astrophysics Data System (ADS)

The paper describes parallelization strategies for the Discrete Element Method (DEM) used for simulating dense particulate systems coupled to Computational Fluid Dynamics (CFD). While the field equations of CFD are best parallelized by spatial domain decomposition techniques, the N-body particulate phase is best parallelized over the number of particles. When the two are coupled together, both modes are needed for efficient parallelization. It is shown that under these requirements, OpenMP thread-based parallelization has advantages over MPI processes. Two representative examples, fairly typical of dense fluid-particulate systems, are investigated, including the validation of the DEM-CFD and thermal-DEM implementation with experiments. Fluidized bed calculations are performed on beds with uniform particle loading, parallelized with MPI and OpenMP. It is shown that as the number of processing cores and the number of particles increase, the communication overhead of building ghost particle lists at processor boundaries dominates the time to solution, and OpenMP, which does not require this step, is about twice as fast as MPI. In rotary kiln heat transfer calculations, which are characterized by spatially non-uniform particle distributions, the low overhead of switching the parallelization mode in OpenMP eliminates the load imbalances, but introduces increased overheads in fetching non-local data. In spite of this, it is shown that OpenMP is between 50% and 90% faster than MPI.

Amritkar, Amit; Deb, Surya; Tafti, Danesh

2014-01-01
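The particle-decomposed mode the abstract contrasts with spatial domain decomposition can be sketched as follows: each worker owns a contiguous slice of the particle array and computes forces for its slice, reading the shared position array without any ghost-particle exchange. This is a structural illustration with a made-up linear pairwise force; CPython threads show the decomposition, not actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def particle_forces(positions, chunks=4):
    """Particle-decomposed force loop: worker k computes forces for its
    contiguous slice of particles, reading all positions (shared memory),
    so no ghost-particle lists are needed at worker boundaries."""
    n = len(positions)
    forces = [0.0] * n

    def work(lo, hi):
        for i in range(lo, hi):
            f = 0.0
            for j in range(n):
                if j != i:
                    f += positions[j] - positions[i]   # toy linear attraction
            forces[i] = f

    bounds = [(k * n // chunks, (k + 1) * n // chunks) for k in range(chunks)]
    with ThreadPoolExecutor(max_workers=chunks) as ex:
        for lo, hi in bounds:
            ex.submit(work, lo, hi)       # context exit waits for all workers
    return forces
```

In an MPI domain decomposition, by contrast, each rank would own only the particles inside its spatial subdomain and would have to build and exchange ghost-particle lists at the boundaries, which is exactly the overhead the paper measures.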

215

Simulation of branching blood flows on parallel computers.

We present a fully parallel nonlinearly implicit algorithm for the numerical simulation of some branching blood flow problems, which require efficient and robust solver technologies in order to handle the high nonlinearity and the complex geometry. Parallel processing is necessary because of the large number of mesh points needed to accurately discretize the system of differential equations. In this paper we introduce a parallel Newton-Krylov-Schwarz based implicit method, and software for distributed memory parallel computers, for solving the nonlinear algebraic systems arising from a Q2-Q1 finite element discretization of the incompressible Navier-Stokes equations that we use to model the blood flow in the left anterior descending coronary artery. PMID:15133979

Yue, Xue; Hwang, Feng-Nan; Shandas, Robin; Cai, Xiao-Chuan

2004-01-01
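The outer loop of a Newton-type implicit solve can be illustrated with the following serial sketch, which uses a finite-difference Jacobian and a direct linear solve; the paper's Krylov (GMRES-style) inner solver and Schwarz domain-decomposition preconditioner, which make the method scalable in parallel, are deliberately omitted here.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for the Newton update."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def newton_fd(F, x0, tol=1e-10, max_iter=50, eps=1e-7):
    """Newton's method for F(x) = 0 with a finite-difference Jacobian."""
    x = list(x0)
    n = len(x)
    for _ in range(max_iter):
        f = F(x)
        if max(abs(v) for v in f) < tol:
            break
        J = [[0.0] * n for _ in range(n)]
        for j in range(n):                 # Jacobian column by column
            xp = list(x)
            xp[j] += eps
            fp = F(xp)
            for i in range(n):
                J[i][j] = (fp[i] - f[i]) / eps
        dx = solve(J, [-v for v in f])     # Newton correction J*dx = -F
        x = [xi + d for xi, d in zip(x, dx)]
    return x
```

In the paper's setting, the nonlinear system comes from a Q2-Q1 finite element discretization with far too many unknowns for a direct solve, which is why the linear step is replaced by a preconditioned Krylov iteration distributed over processors.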

216

Xyce Parallel Electronic Simulator : reference guide, version 2.0.

This document is a reference guide to the Xyce Parallel Electronic Simulator and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.

2004-06-01

217

A Parallel Multiblock/Multidomain Approach for Reservoir Simulation

Our approach for parallel multiphysics and multiscale simulation uses two levels of domain decomposition: physical and computational. First, the physical domain is decomposed into subdomains or blocks according to the geometry, geology, and physics/chemistry/biology. Each subdomain represents a single physical system, on a reasonable range of scales, such as a black oil region, a compositional region, a region to one

Mary F. Wheeler; Arbogast Todd; Bryant Steven; Eaton Joe; Lu Qin; Peszynska Malgorzata; Yotov Ivan

1999-01-01

218

DEVELOPMENT OF PALPATION SIMULATOR USING PNEUMATIC PARALLEL MANIPULATOR

Nowadays breast cancer, having overtaken stomach cancer, holds the highest cancer incidence rate among Japanese women. Detection of breast cancer is done by palpation diagnosis, which requires the doctor to have a highly experienced technique. In this study, we develop a palpation simulator which forms a realistic model of a woman's breast with actively controlled variable stiffness. A pneumatic parallel

Masahiro TAKAIWA; Toshiro NORITSUGU

2005-01-01

219

Kinematic calibration of parallel robots for docking mechanism motion simulation

A new method for calibrating a parallel robot is proposed as part of a project aimed at developing a calibration method for a spacecraft docking mechanism motion simulator. To implement this method, a calibration equation is built by generating the constraint conditions of the end-effector's motion in the workspace using a three-dimensional coordinate measuring machine. According to the established calibration equation

Dayong Yu; Xiwei Sun; Sheng Liu

2009-01-01

220

Hermite spline interpolation on patches for parallel Vlasov beam simulations

NASA Astrophysics Data System (ADS)

In this paper we present a novel interpolation technique for Vlasov simulations of intense space-charge-dominated beams. This new technique makes it possible to localize the cubic spline interpolation generally performed in semi-Lagrangian Vlasov codes, and thus to improve the scalability of the parallel version. This new method is applied to the propagation of a potassium beam in a periodic focusing channel.

Crouseilles, N.; Latu, G.; Sonnendrücker, E.

2007-07-01

221

Parallel simulation of structural VHDL circuits on Intel Hypercubes

NASA Astrophysics Data System (ADS)

Many VLSI circuit designs are too large to be simulated with VHDL in a reasonable amount of time. One approach to reducing the simulation time is to distribute the simulation over several processors. This research creates an environment for designing and simulating structural VHDL circuits on the Intel iPSC/2 and iPSC/860 Hypercubes. Logic gates and system behaviors are partitioned among the processors, and signal changes are shared via event messages. Circuit simulations are run over the SPECTRUM parallel simulation testbed, and the null-message paradigm is used to avoid deadlock. Structural circuits ranging from forty to over one thousand logic gates are correctly simulated. Although no attempt is made to find optimal partitioning strategies, speedups are obtained for some configurations.

Breeden, Thomas A.

1992-12-01

222

Sequential Window Diagnoser for Discrete-Event Systems Under Unreliable Observations

This paper addresses the issue of counting the occurrence of special events in the framework of partially observed discrete-event dynamical systems (DEDS). The developed diagnosers, referred to as sequential window diagnosers (SWDs), utilize the stochastic diagnoser probability transition matrices developed in [9] along with a resetting mechanism that allows on-line monitoring of special event occurrences. To illustrate their performance, the SWDs are applied to detect and count the occurrence of special events in a particular DEDS. Results show that SWDs are able to accurately track the number of times special events occur.

Wen-Chiao Lin; Humberto E. Garcia; David Thorsley; Tae-Sic Yoo

2009-09-01

223

Stochastic Event Counter for Discrete-Event Systems Under Unreliable Observations

This paper addresses the issue of counting the occurrence of special events in the framework of partially observed discrete-event dynamical systems (DEDS). First, we develop a novel recursive procedure that updates the active counter information state sequentially with available observations. In general, the cardinality of the active counter information state is unbounded, which makes the exact recursion computationally infeasible. To overcome this difficulty, we develop an approximated recursive procedure that regulates and bounds the size of the active counter information state. Using the approximated active counter information state, we give an approximated minimum mean square error (MMSE) counter. The developed algorithms are then applied to count special routing events in a material flow system.

Tae-Sic Yoo; Humberto E. Garcia

2008-06-01

224

Supervisor Localization: A Top-Down Approach to Distributed Control of Discrete-Event Systems

NASA Astrophysics Data System (ADS)

A purely distributed control paradigm is proposed for discrete-event systems (DES). In contrast to control by one or more external supervisors, distributed control aims to design built-in strategies for individual agents. First a distributed optimal nonblocking control problem is formulated. To solve it, a top-down localization procedure is developed which systematically decomposes an external supervisor into local controllers while preserving optimality and nonblockingness. An efficient localization algorithm is provided to carry out the computation, and an automated guided vehicles (AGV) example presented for illustration. Finally, the 'easiest' and 'hardest' boundary cases of localization are discussed.

Cai, K.; Wonham, W. M.

2009-03-01

225

We review and develop techniques to determine associations between series of discrete events. The bootstrap, a nonparametric statistical method, allows the determination of the significance of associations with minimal assumptions about the underlying processes. We find the key requirement for this method: one of the series must be widely spaced in time to guarantee the theoretical applicability of the bootstrap. If this condition is met, the calculated significance passes a reasonableness test. We conclude with some potential future extensions and caveats on the applicability of these methods. The techniques presented have been implemented in a Python-based software toolkit.

Niehof, Jonathan T.; Morley, Steven K.

2012-01-01
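The bootstrap test for association between two event series can be sketched as follows: measure an association statistic on the observed series, then compare it against the same statistic on surrogate series drawn under a null of no association. The surrogate here is uniform resampling over the observation span, an illustrative null rather than the exact resampling scheme of the authors' toolkit; the function names and window parameter are hypothetical.

```python
import random

def association(a_times, b_times, window=1.0):
    """Count series-B events falling within +/-window of any series-A event."""
    return sum(any(abs(b - a) <= window for a in a_times) for b in b_times)

def bootstrap_p_value(a_times, b_times, window=1.0, n_boot=500,
                      span=100.0, seed=0):
    """One-sided bootstrap p-value for the observed association count:
    the fraction of surrogate (uniformly redrawn) series-B event sets
    whose association with series A is at least as large as observed."""
    rng = random.Random(seed)
    observed = association(a_times, b_times, window)
    exceed = 0
    for _ in range(n_boot):
        surrogate = [rng.uniform(0.0, span) for _ in b_times]
        if association(a_times, surrogate, window) >= observed:
            exceed += 1
    return exceed / n_boot
```

The key requirement noted in the abstract maps directly onto this sketch: if the series-A events are widely spaced relative to the window, the per-surrogate hit probability is small and well defined, so the bootstrap distribution of the count is meaningful.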

226

Supervisor Localization: A Top-Down Approach to Distributed Control of Discrete-Event Systems

A purely distributed control paradigm is proposed for discrete-event systems (DES). In contrast to control by one or more external supervisors, distributed control aims to design built-in strategies for individual agents. First a distributed optimal nonblocking control problem is formulated. To solve it, a top-down localization procedure is developed which systematically decomposes an external supervisor into local controllers while preserving optimality and nonblockingness. An efficient localization algorithm is provided to carry out the computation, and an automated guided vehicles (AGV) example presented for illustration. Finally, the 'easiest' and 'hardest' boundary cases of localization are discussed.

Cai, K.; Wonham, W. M. [Systems Control Group, Department of Electrical and Computer Engineering, University of Toronto, 10 King's College Road, Toronto, ON, M5S 3G4 (Canada)

2009-03-05

227

This paper describes a Java-based system for allocating simulation trials to a set of P parallel processors for carrying out a simulation study involving direct-search optimization or response surface methodology. Unlike distributed simulation, where a simulation model is decomposed and its parts run in a parallel environment, the parallel replications approach allows a simulation model to run to completion with

William E. Biles; Jack P. C. Kleijnen

1999-01-01

228

Reusable Component Model Development Approach for Parallel and Distributed Simulation

Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have diversiform interfaces, couple tightly, and bind closely with simulation platforms. As a result, they are difficult to reuse across different simulation platforms and applications. To address the problem, this paper first proposes a reusable component model framework. Based on this framework, our reusable model development approach is then elaborated, which contains two phases: (1) domain experts create simulation computational modules observing three principles to achieve their independence; (2) model developers encapsulate these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that the model developed using our approach has good reusability and is easy to use in different simulation platforms and applications.

Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng

2014-01-01

229

PRATHAM: Parallel Thermal Hydraulics Simulations using Advanced Mesoscopic Methods

At the Oak Ridge National Laboratory, efforts are under way to develop a 3D, parallel LBM code called PRATHAM (PaRAllel Thermal Hydraulic simulations using Advanced Mesoscopic Methods) to demonstrate the accuracy and scalability of LBM for turbulent flow simulations in nuclear applications. The code has been developed in FORTRAN-90 and parallelized using the Message Passing Interface (MPI) library. The Silo library is used to compact and write the data files, and the VisIt visualization software is used to post-process the simulation data in parallel. Both the single relaxation time (SRT) and multiple relaxation time (MRT) LBM schemes have been implemented in PRATHAM. To capture turbulence without prohibitively increasing the grid resolution requirements, an LES approach [5] is adopted, allowing large-scale eddies to be numerically resolved while modeling the smaller (subgrid) eddies. In this work, a Smagorinsky model has been used, which modifies the fluid viscosity by an additional eddy viscosity depending on the magnitude of the rate-of-strain tensor. In LBM, this is achieved by locally varying the relaxation time of the fluid.

Joshi, Abhijit S. [ORNL]; Jain, Prashant K. [ORNL]; Mudrich, Jaime A. [ORNL]; Popov, Emilian L. [ORNL]

2012-01-01
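The last step the abstract describes, folding the Smagorinsky eddy viscosity into a locally varying BGK relaxation time, reduces to a one-line formula in lattice units. The sketch below uses the standard D2Q9/D3Q19 relation tau = 3*nu + 0.5; the Smagorinsky constant and filter width are illustrative defaults, not the values used in PRATHAM.

```python
def local_relaxation_time(nu0, strain_rate_mag, cs_smag=0.17, delta=1.0):
    """Smagorinsky closure in LBM units: the eddy viscosity
    nu_t = (Cs * delta)**2 * |S| is added to the molecular viscosity
    nu0 and folded into the local BGK relaxation time tau = 3*nu + 0.5
    (lattice-unit relation with sound speed c_s^2 = 1/3)."""
    nu_t = (cs_smag * delta) ** 2 * strain_rate_mag   # subgrid eddy viscosity
    return 3.0 * (nu0 + nu_t) + 0.5                   # local relaxation time
```

Where the local strain-rate magnitude |S| is large, tau grows, increasing the effective viscosity and damping the unresolved subgrid motion, which is exactly the mechanism the abstract describes.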

230

Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows

NASA Technical Reports Server (NTRS)

A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.

Moitra, Stuti; Gatski, Thomas B.

1997-01-01

231

Parallelization of Program to Optimize Simulated Trajectories (POST3D)

NASA Technical Reports Server (NTRS)

This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) approach on a distributed-memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.

Hammond, Dana P.; Korte, John J. (Technical Monitor)

2001-01-01

232

Numerical simulation of supersonic wake flow with parallel computers

Simulating a supersonic wake flow field behind a conical body is a compute-intensive task. It requires a large number of computational cells to capture the dominant flow physics and a robust numerical algorithm to obtain a reliable solution. High performance parallel computers with unique distributed processing and data storage capability can meet this need. They have larger computational memory and faster computing time than conventional vector computers. We apply the PINCA Navier-Stokes code to simulate a wind-tunnel supersonic wake experiment on Intel Gamma, Intel Paragon, and IBM SP2 parallel computers. These simulations are performed to study the mean flow in the near wake region of a sharp, 7-degree half-angle, adiabatic cone at Mach number 4.3 and freestream Reynolds number of 40,600. Overall the numerical solutions capture the general features of the hypersonic laminar wake flow and compare favorably with the wind tunnel data. With a refined and clustered grid distribution in the recirculation zone, the calculated location of the rear stagnation point is consistent with the 2D axisymmetric and 3D experiments. In this study, we also demonstrate the importance of having a large local memory capacity within a computer node and the effective utilization of the number of computer nodes to achieve good parallel performance when simulating a complex, large-scale wake flow problem.

Wong, C.C. [Sandia National Labs., Albuquerque, NM (United States); Soetrisno, M. [Amtec Engineering, Inc., Bellevue, WA (United States)

1995-07-01

233

A new approach for high-speed simulation is applied to the analysis of nuclear power system dynamics. The proposed approach is to first identify inherent parallelism and then to develop suitable parallel computation algorithms. The latter includes numerical integration and table lookup techniques that can be used for achieving high-speed simulation. A performance evaluation of the proposed methodology has been completed, which is based on benchmark simulation for pressurized water reactor plant dynamics. The multirate integration algorithm and an innovative table lookup technique running on a parallel processing computer system have proved to be the most advantageous in computational speed.

Yeh, H.C.; Kastenberg, W.E.; Karplus, W.J.

1989-01-01

234

Non-intrusive parallelization of multibody system dynamic simulations

NASA Astrophysics Data System (ADS)

This paper evaluates two non-intrusive parallelization techniques for multibody system dynamics: parallel sparse linear equation solvers and OpenMP. Both techniques can be applied to existing simulation software with minimal changes in the code structure; this is a major advantage over Message Passing Interface, the standard parallelization method in multibody dynamics. Both techniques have been applied to parallelize a starting sequential implementation of a global index-3 augmented Lagrangian formulation combined with the trapezoidal rule as numerical integrator, in order to solve the forward dynamics of a variable-loop four-bar mechanism. Numerical experiments have been performed to measure the efficiency as a function of problem size and matrix filling. Results show that the best parallel solver (Pardiso) performs better than the best sequential solver (CHOLMOD) for multibody problems of large and medium sizes leading to matrix fillings above 10. OpenMP also proved to be advantageous even for problems of small sizes. Both techniques delivered speedups above 70% of the maximum theoretical values for a wide range of multibody problems.
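
The "non-intrusive" point above, that the sparse linear solver inside an implicit integrator is a pluggable backend, can be sketched as follows (the stand-in matrix and function names are illustrative; the paper's actual backends are Pardiso and CHOLMOD):

```python
import numpy as np

# Inside an implicit multibody integrator, each Newton iteration
# reduces to solving A x = b. Swapping the solver backend (e.g.
# sequential -> parallel sparse) needs no change to the surrounding
# integration code: only the 'solve' callable is replaced.
def newton_step(A, b, solve=np.linalg.solve):
    return solve(A, b)   # pluggable solver backend

n = 200
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # SPD stand-in matrix
b = np.ones(n)
x = newton_step(A, b)
print(np.allclose(A @ x, b))  # True
```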

González, Francisco; Luaces, Alberto; Lugrís, Urbano; González, Manuel

2009-09-01

235

Numerical simulation of parallel hole cut blasting with uncharged holes

The cavity formation and propagation process of stress wave from parallel hole cut blasting was simulated with ANSYS/LS-DYNA 3D nonlinear dynamic finite element software. The distribution of element plastic strain, node velocity, node time-acceleration history and the blasting cartridge volume ratio during the process were analyzed. It was found that the detonation of charged holes would cause the interaction of

Shijie Qu; Xiangbin Zheng; Lihua Fan; Ying Wang

2008-01-01

236

Massively parallel simulations of diffusion in dense polymeric structures

An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases is studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon (1840 processors). Compared to the current state-of-the-art equilibration methods, this new technique appears to be faster by some orders of

Jean-Loup Faulon; J. David Hobbs; David M. Ford; Robert T. Wilcox

1997-01-01

237

Parallel density matrix propagation in spin dynamics simulations.

Several methods for density matrix propagation in parallel computing environments are proposed and evaluated. It is demonstrated that the large communication overhead associated with each propagation step (two-sided multiplication of the density matrix by an exponential propagator and its conjugate) may be avoided and the simulation recast in a form that requires virtually no inter-thread communication. Good scaling is demonstrated on a 128-core (16 nodes, 8 cores each) cluster. PMID:22299862
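
The propagation step described above, two-sided multiplication of the density matrix by an exponential propagator and its conjugate, looks roughly like this (a generic NumPy sketch of one serial step, not the paper's communication-free reformulation):

```python
import numpy as np

# One step of density matrix propagation: rho -> P rho P^dagger
# with P = exp(-i H dt), built here via the eigenbasis of H.
rng = np.random.default_rng(0)
n = 8
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (M + M.conj().T) / 2                  # Hermitian Hamiltonian (made up)
rho = np.zeros((n, n), dtype=complex)
rho[0, 0] = 1.0                           # pure initial state

w, V = np.linalg.eigh(H)                  # exp(-i H dt) from eigendecomposition
P = V @ np.diag(np.exp(-1j * w * 0.1)) @ V.conj().T
rho_next = P @ rho @ P.conj().T           # the two-sided multiplication

print(np.isclose(np.trace(rho_next).real, 1.0))  # True: trace preserved
```

In a naive parallel decomposition, every such step requires exchanging blocks of rho between threads; the paper's contribution is recasting the simulation so this exchange is avoided.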

Edwards, Luke J; Kuprov, Ilya

2012-01-28

238

Parallel algorithms for simulating continuous time Markov chains

NASA Technical Reports Server (NTRS)

We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
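
The basic uniformization method the paper builds on can be sketched for a tiny chain (a serial illustration; the paper's subject is the parallel synchronization built on top of it):

```python
import numpy as np
from math import exp

# Uniformization: convert the CTMC generator Q into a DTMC matrix
# P = I + Q/Lam, and weight its k-step distributions by
# Poisson(Lam*t) probabilities to get the transient distribution.
def transient(Q, pi0, t, k_max=200):
    Lam = max(-Q[i, i] for i in range(len(Q)))   # uniformization rate
    P = np.eye(len(Q)) + Q / Lam
    acc = np.zeros_like(pi0)
    w = exp(-Lam * t)                            # Poisson weight for k = 0
    v = pi0.copy()
    for k in range(k_max):
        acc += w * v
        v = v @ P                                # advance the DTMC one step
        w *= Lam * t / (k + 1)                   # next Poisson weight
    return acc

# Two-state chain: 0 -> 1 at rate 2, 1 -> 0 at rate 1.
Q = np.array([[-2.0, 2.0], [1.0, -1.0]])
pi = transient(Q, np.array([1.0, 0.0]), t=10.0)
print(np.allclose(pi.sum(), 1.0))  # True
```

By t = 10 this chain has essentially reached its stationary distribution (1/3, 2/3), which the truncated Poisson sum reproduces.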

Nicol, David M.; Heidelberger, Philip

1992-01-01

239

Micromagnetic simulation of tunneling magnetoresistance junctions with parallel hard bias

Tunneling magnetoresistance (TMR) films and devices were simulated to understand the response of the free layer with a parallel hard bias. In order to determine the effect of the granular hard bias material, micromagnetic simulation was used to model both the hard bias and TMR material. Minimizing hysteresis and Barkhausen jumps in the response of the device involves an optimization of the spacing between the free layer and the hard bias coupled with the shape of the device edges. © 2001 American Institute of Physics.

Gibbons, M. R.; Sin, K.; Funada, S.; Mao, M.; Rao, D.; Chien, C.; Tong, H. C.

2001-06-01

240

A package for the simulation of asynchronous parallel systems

High interaction rate colliders impose stringent requirements on data acquisition and triggering systems. These systems can only be realised by using asynchronous parallel networks of processors. The accurate prediction of the performance of these networks is essential for the design of the system. We have written a package to simulate networks of asynchronous processors. The package is general and has been successfully used to help design the data acquisition system of the ZEUS central tracking detector.

Hallam-Baker, P.M.; Gingrich, D.M.; McArthur, I.C. (Nuclear Physics Laboratory, University of Oxford, Oxford (United Kingdom))

1990-08-01

241

Time parallelization of advanced operation scenario simulations of ITER plasma

NASA Astrophysics Data System (ADS)

This work demonstrates that simulations of advanced burning plasma operation scenarios can be successfully parallelized in time using the parareal algorithm. CORSICA, an advanced operation scenario code for tokamak plasmas, is used as a test case. This is a unique application, since the parareal algorithm has so far been applied to much simpler systems, except for the case of turbulence. In the present application, a computational gain of an order of magnitude has been achieved, which is extremely promising. A successful implementation of the parareal algorithm in codes like CORSICA ushers in the possibility of time-efficient simulations of ITER plasmas.
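
The parareal algorithm referenced here can be illustrated on a scalar ODE (a toy sketch: the fine-propagator calls on different time slices are independent, which is where the parallelism in time comes from):

```python
import numpy as np

# Parareal for dy/dt = -y on [0, 1]: a cheap coarse propagator G
# (one Euler step per slice) corrected by an accurate fine
# propagator F (many Euler sub-steps, parallelizable across slices).
lam, T, N = -1.0, 1.0, 10
dt = T / N

def G(y, dt):                    # coarse: single Euler step
    return y * (1 + lam * dt)

def F(y, dt, m=100):             # fine: m Euler sub-steps
    for _ in range(m):
        y = y * (1 + lam * dt / m)
    return y

y = np.zeros(N + 1); y[0] = 1.0
for n in range(N):               # initial serial coarse sweep
    y[n + 1] = G(y[n], dt)

for _ in range(5):               # parareal correction iterations
    y_new = y.copy()
    for n in range(N):           # F(y[n], dt) calls are independent
        y_new[n + 1] = G(y_new[n], dt) + F(y[n], dt) - G(y[n], dt)
    y = y_new

print(abs(y[-1] - np.exp(lam * T)) < 1e-3)  # True: close to exact e^-1
```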

Samaddar, D.; Casper, T. A.; Kim, S. H.; Berry, L. A.; Elwasif, W. R.; Batchelor, D.; Houlberg, W. A.

2013-02-01

242

NASA Technical Reports Server (NTRS)

The development process for a large software development project is very complex and dependent on many variables that are dynamic and interrelated. Factors such as size, productivity and defect injection rates will have substantial impact on the project in terms of cost and schedule. These factors can be affected by the intricacies of the process itself as well as human behavior because the process is very labor intensive. The complex nature of the development process can be investigated with software development process models that utilize discrete event simulation to analyze the effects of process changes. The organizational environment and its effects on the workforce can be analyzed with system dynamics that utilizes continuous simulation. Each has unique strengths and the benefits of both types can be exploited by combining a system dynamics model and a discrete event process model. This paper will demonstrate how the two types of models can be combined to investigate the impacts of human resource interactions on productivity and ultimately on cost and schedule.
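
The discrete event side of such a combined model rests on a timestamp-ordered event queue; a minimal sketch (class, method, and event names are illustrative, not from the paper):

```python
import heapq

# Minimal discrete event engine: events are (time, action) pairs
# executed in timestamp order; each action may schedule more events.
class Simulator:
    def __init__(self):
        self.now, self._queue, self._seq = 0.0, [], 0

    def schedule(self, delay, action):
        self._seq += 1   # tie-breaker keeps heap comparisons well-defined
        heapq.heappush(self._queue, (self.now + delay, self._seq, action))

    def run(self):
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action()

sim = Simulator()
log = []
sim.schedule(2.0, lambda: log.append(("defect_found", sim.now)))
sim.schedule(1.0, lambda: log.append(("task_done", sim.now)))
sim.run()
print(log)  # [('task_done', 1.0), ('defect_found', 2.0)]
```

Coupling to a system dynamics model then amounts to advancing the continuous state between consecutive event timestamps.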

Mizell, Carolyn Barrett; Malone, Linda

2007-01-01

243

Large Scale Parallel 3D Simulation of Regional Wave Propagation Using the Earth Simulator

This paper presents an efficient parallel code for seismic wave propagation in 3D heterogeneous structures developed for implementation on the Earth Simulator (5120 CPUs, 40 TFLOPS) at the JAMSTEC Yokohama Institute, a high-performance vector parallel system suitable for large-scale simulations. The equations of motion for the 3D wavefield are solved using a higher-order (8, 16, 32, etc.) staggered-grid finite-difference method (FDM) in

T. Furumura

2003-01-01

244

Xyce Parallel Electronic Simulator - Users' Guide Version 2.1.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices; and object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase (a message passing parallel implementation), which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an "in-house" capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed.
As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

Hutchinson, Scott A; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric; Pawlowski, Roger P.; Fixel, Deborah A; Schiek, Richard; Bogdan, Carolyn W.; Shirley, David N.; Campbell, Phillip M.; Keiter, Eric R.

2005-06-01

245

Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

NASA Astrophysics Data System (ADS)

Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. 
Program summary
Program title: SWsolver
Catalogue identifier: AEGY_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GPL v3
No. of lines in distributed program, including test data, etc.: 59 168
No. of bytes in distributed program, including test data, etc.: 453 409
Distribution format: tar.gz
Programming language: C, CUDA
Computer: Parallel computing clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator.
Operating system: Linux
Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell processors, and 1-32 NVIDIA GPUs.
RAM: Tested on problems requiring up to 4 GB per compute node.
Classification: 12
External routines: MPI, CUDA, IBM Cell SDK
Nature of problem: MPI-parallel simulation of shallow water equations using a high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell processor, and NVIDIA GPU using CUDA.
Solution method: SWsolver provides three implementations of a high-resolution 2D shallow water equation solver on regular Cartesian grids, for CPU, Cell processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster.
Additional comments: Sub-program numdiff is used for the test run.

Rostrup, Scott; De Sterck, Hans

2010-12-01

246

Particle simulation of plasmas on the massively parallel processor

NASA Technical Reports Server (NTRS)

Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.

Gledhill, I. M. A.; Storey, L. R. O.

1987-01-01

247

Long-range interactions and parallel scalability in molecular simulations

NASA Astrophysics Data System (ADS)

Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single processor and parallel performance up to 8 nodes. We have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and a nearly uniform memory architecture in which communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.

Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko

2007-01-01

248

Simulation of 2D Magnetoconvection on Parallel Computers

NASA Astrophysics Data System (ADS)

Magnetoconvection, or the interaction of magnetic fields with a thermally-driven plasma flow, is a phenomenon of great importance to stellar interiors---for example, it is thought to be the underlying cause of solar and stellar dynamos. Computer simulation is necessary for investigating problems of magnetoconvection because of its inherent nonlinearities. The advent of high-speed parallel computers opens up fresh possibilities for high-resolution simulation studies of the nonlinear dynamics of magnetoconvection. This presentation will outline the approach taken in parallelizing a 2D Boussinesq MHD model code designed to run on the Cornell Theory Center's IBM SP2 computer. Here, the plasma is imagined to be confined in a rotating, heated cylindrical annulus and threaded with azimuthal magnetic fields; this configuration is roughly analogous to a near-equatorial slice of the Sun. Such a geometry is germane to the study of many interesting types of nonlinear MHD waves, some of which may self-modulate or even reverse their direction of propagation (Lantz, Ph.D. thesis, 1992). Computational issues such as domain decomposition, parallel solution of a Poisson equation, and algorithmic optimization will be discussed.

Lantz, Steven R.

1996-05-01

249

Exception handling controllers: An application of pushdown systems to discrete event control

Recent work by the author has extended the Supervisory Control Theory to include the class of control languages defined by pushdown machines. A pushdown machine is a finite state machine extended by an infinite stack memory. In this paper, we define a specific type of deterministic pushdown machine that is particularly useful as a discrete event controller. Checking controllability of pushdown machines requires computing the complement of the controller machine. We show that Exception Handling Controllers have the property that algorithms for taking their complements and determining their prefix closures are nearly identical to the algorithms available for finite state machines. Further, they exhibit an important property that makes checking for controllability extremely simple. Hence, they maintain the simplicity of the finite state machine, while providing the extra power associated with a pushdown stack memory. We provide an example of a useful control specification that cannot be implemented using a finite state machine, but can be implemented using an Exception Handling Controller.
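
The extra power a stack adds over a finite state machine can be illustrated with a toy controller that tracks nested exceptions (event names and the specification are illustrative, not the paper's example):

```python
# Sketch of a pushdown discrete event controller: finite control
# state plus an unbounded stack. The stack tracks nested faults so
# that 'handle' is enabled only while a fault is pending and
# 'shutdown' only once all faults are handled -- a specification no
# finite state machine can express for unbounded nesting depth.
class PushdownController:
    def __init__(self):
        self.stack = []

    def enabled(self, event):
        if event == "handle":
            return bool(self.stack)      # must have a pending fault
        if event == "shutdown":
            return not self.stack        # all faults handled
        return True                      # 'fault' and others always enabled

    def step(self, event):
        assert self.enabled(event)
        if event == "fault":
            self.stack.append(event)     # push on exception raise
        elif event == "handle":
            self.stack.pop()             # pop on exception handled

c = PushdownController()
for e in ["fault", "fault", "handle", "handle"]:
    c.step(e)
print(c.enabled("shutdown"))  # True: every fault was handled
```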

Griffin, Christopher H [ORNL

2008-01-01

250

NASA Astrophysics Data System (ADS)

In the framework of decentralized supervisory control of timed discrete event systems (TDESs), each local supervisor decides the set of events to be enabled to occur and the set of events to be forced to occur under its own local observation in order for a given specification to be satisfied. In this paper, we focus on fusion rules for the enforcement decisions and adopt the combined fusion rule using the AND rule and the OR rule. We first derive necessary and sufficient conditions for the existence of a decentralized supervisor under the combined fusion rule for a given partition of the set of forcible events. We next study how to find a suitable partition.

Nomura, Masashi; Takai, Shigemasa

251

Safety analysis of discrete event systems using a simplified Petri net controller.

This paper deals with the problem of forbidden states in discrete event systems based on Petri net models. A method is presented to prevent the system from entering these states by constructing a small number of generalized mutual exclusion constraints. This goal is achieved by solving three types of Integer Linear Programming problems, designed to verify the constraints: some relate to verifying authorized states, and the others to avoiding forbidden states. The obtained constraints can be enforced on the system using a small number of control places. Moreover, the number of arcs connected to these places is small, and the resulting controller is maximally permissive. PMID:24074873
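
A generalized mutual exclusion constraint of the kind constructed here is enforced by adding a control place; a minimal sketch of the standard place-invariant construction (the toy net is made up):

```python
import numpy as np

# A generalized mutual exclusion constraint w . m <= k on markings m
# is enforced by adding a control place whose incidence-matrix row
# is -w . D and whose initial marking is k - w . m0 (the classical
# supervision-based-on-place-invariants construction).
def control_place(D, m0, w, k):
    return -w @ D, k - w @ m0      # (arc row, initial control marking)

# Toy net: two transitions move one token between places p0 and p1.
D = np.array([[-1, 1],             # p0: t0 consumes, t1 produces
              [ 1, -1]])           # p1: t0 produces, t1 consumes
m0 = np.array([1, 0])
w, k = np.array([0, 1]), 1         # constraint: m(p1) <= 1

row, mc0 = control_place(D, m0, w, k)
print(row.tolist(), mc0)           # [-1, 1] 1
```

The control place consumes a token when t0 fires (putting a token in p1) and regains it when t1 fires, so t0 is blocked exactly when the constraint would be violated.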

Zareiee, Meysam; Dideban, Abbas; Asghar Orouji, Ali

2014-01-01

252

New techniques for parallel simulation of high-temperature superconductors

In this paper we discuss several new techniques used for the simulation of high-temperature superconductors on parallel computers. We introduce an innovative methodology to study the effects of temperature fluctuations on the vortex lattice configuration of these materials. We have found that the use of uniform orthogonal meshes results in several limitations. To address these limitations, we consider nonorthogonal meshes and describe a new discrete formulation that solves the difficult problem of maintaining gauge invariance on nonorthogonal meshes. With this discretization, adaptive refinement strategies are used to concentrate grid points where error contributions are large (in this case, near vortex cores). We describe the algorithm used for the parallel implementation of this refinement strategy, and we present computational results obtained on the Intel DELTA.

Freitag, L.; Plassmann, P. [Argonne National Lab., IL (United States)] [Argonne National Lab., IL (United States); Jones, M. [Tennessee Univ., Knoxville, TN (United States). Dept. of Computer Science] [Tennessee Univ., Knoxville, TN (United States). Dept. of Computer Science

1994-06-01

253

A fast ultrasonic simulation tool based on massively parallel implementations

NASA Astrophysics Data System (ADS)

This paper presents a CIVA optimized ultrasonic inspection simulation tool, which takes benefit of the power of massively parallel architectures: graphical processing units (GPU) and multi-core general purpose processors (GPP). This tool is based on the classical approach used in CIVA: the interaction model is based on Kirchhoff, and the ultrasonic field around the defect is computed by the pencil method. The model has been adapted and parallelized for both architectures. At this stage, the configurations addressed by the tool are: multi- and mono-element probes, planar specimens made of simple isotropic materials, planar rectangular defects or side drilled holes of small diameter. Validations of model accuracy and performance measurements are presented.

Lambert, Jason; Rougeron, Gilles; Lacassagne, Lionel; Chatillon, Sylvain

2014-02-01

254

MRISIMUL: a GPU-based parallel approach to MRI simulations.

A new step-by-step comprehensive MR physics simulator (MRISIMUL) of the Bloch equations is presented. The aim was to develop a magnetic resonance imaging (MRI) simulator that makes no assumptions with respect to the underlying pulse sequence and also allows for complex large-scale analysis on a single computer without requiring simplifications of the MRI model. We hypothesized that such a simulation platform could be developed with parallel acceleration of the executable core within the graphic processing unit (GPU) environment. MRISIMUL integrates realistic aspects of the MRI experiment from signal generation to image formation and solves the entire complex problem for densely spaced isochromats and for a densely spaced time axis. The simulation platform was developed in MATLAB whereas the computationally demanding core services were developed in CUDA-C. The MRISIMUL simulator imaged three different computer models: a user-defined phantom, a human brain model and a human heart model. The high computational power of GPU-based simulations was compared against other computer configurations. A speedup of about 228 times was achieved when compared to serially executed C-code on the CPU, whereas a speedup between 31 and 115 times was achieved when compared to the OpenMP parallel executed C-code on the CPU, depending on the number of threads used in multithreading (2-8 threads). The high performance of MRISIMUL allows its application in large-scale analysis and can bring the computational power of a supercomputer or a large computer cluster to a single-GPU personal computer. PMID:24595337

Xanthis, Christos G; Venetis, Ioannis E; Chalkias, A V; Aletras, Anthony H

2014-03-01

255

Parallel grid library for rapid and flexible simulation development

NASA Astrophysics Data System (ADS)

As the single CPU core performance is saturating while the number of cores in the fastest supercomputers increases exponentially, the parallel performance of simulations on distributed memory machines is crucial. At the same time, utilizing efficiently the large number of available cores presents a challenge, especially in simulations with run-time adaptive mesh refinement which can be the key to high performance. We have developed a generic grid library (dccrg) that is easy to use and scales well up to tens of thousands of cores. The grid has several attractive features: It 1) allows an arbitrary C++ class or structure to be used as cell data; 2) is easy to use and provides a simple interface for run-time adaptive mesh refinement; 3) transfers the data of neighboring cells between processes transparently and asynchronously; and 4) provides a simple interface to run-time load balancing, e.g. domain decomposition, through the Zoltan library. Dccrg is freely available from https://gitorious.org/dccrg for anyone to use, study and modify under the GNU Lesser General Public License version 3. We present an overview of the implementation of dccrg, its parallel scalability and several source code examples of its usage in different types of simulations.

Honkonen, Ilja; von Alfthan, Sebastian; Sandroos, Arto; Janhunen, Pekka; Palmroth, Minna

2013-04-01

256

High Performance Parallel Methods for Space Weather Simulations

NASA Technical Reports Server (NTRS)

This is the final report of our NASA AISRP grant entitled 'High Performance Parallel Methods for Space Weather Simulations'. The main thrust of the proposal was to achieve significant progress towards new high-performance methods which would greatly accelerate global MHD simulations and eventually make it possible to develop first-principles based space weather simulations which run much faster than real time. We are pleased to report that with the help of this award we made major progress in this direction and developed the first parallel implicit global MHD code with adaptive mesh refinement. The main limitation of all earlier global space physics MHD codes was the explicit time stepping algorithm. Explicit time steps are limited by the Courant-Friedrichs-Lewy (CFL) condition, which essentially ensures that no information travels more than a cell size during a time step. This condition represents a non-linear penalty for highly resolved calculations, since finer grid resolution (and consequently smaller computational cells) not only results in more computational cells, but also in smaller time steps.
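
The CFL restriction described above can be made concrete with a small helper (the numbers are illustrative, not from the report):

```python
# CFL condition for an explicit scheme: dt <= C * dx / v_max.
# Halving the cell size halves the allowed time step, on top of
# multiplying the number of cells -- the non-linear resolution
# penalty that motivates implicit time stepping.
def cfl_dt(dx, v_max, courant=0.5):
    return courant * dx / v_max

dt_coarse = cfl_dt(dx=1.0e5, v_max=1.0e3)   # 100 km cells, 1000 km/s... wave speed 1 km/s
dt_fine = cfl_dt(dx=0.5e5, v_max=1.0e3)     # refined 50 km cells
print(dt_coarse, dt_fine)  # 50.0 25.0
```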

Hunter, Paul (Technical Monitor); Gombosi, Tamas I.

2003-01-01

257

Numerical Simulation of Flow Field Within Parallel Plate Plastometer

NASA Technical Reports Server (NTRS)

Parallel Plate Plastometer (PPP) is a device commonly used for measuring the viscosity of high polymers at low rates of shear in the range 10^4 to 10^9 poises. This device is being validated for use in measuring the viscosity of liquid glasses at high temperatures having similar ranges for the viscosity values. The PPP instrument consists of two similar parallel plates, both in the range of 1 inch in diameter, with the upper plate being movable while the lower one is kept stationary. Load is applied to the upper plate by means of a beam connected to a shaft attached to the upper plate. The viscosity of the fluid is deduced from measuring the variation of the plate separation, h, as a function of time when a specified fixed load is applied on the beam. Operating plate speeds measured with the PPP are usually in the range of 10^-3 cm/s or lower. The flow field within the PPP can be simulated using the equations of motion of fluid flow for this configuration. With flow speeds in the range quoted above, the flow field between the two plates is certainly incompressible and laminar. Such flows can be easily simulated using numerical modeling with computational fluid dynamics (CFD) codes. We present below the mathematical model used to simulate this flow field and also the solutions obtained for the flow using a commercially available finite element CFD code.
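
The viscosity deduction described can be illustrated with the classical Stefan squeeze-flow relation for the constant-radius, no-slip case (a simplified stand-in for the full PPP analysis; all values are made up):

```python
from math import pi

# Stefan squeeze-flow relation between two disks of radius R at
# separation h: F = (3*pi*mu*R**4 / (2*h**3)) * (-dh/dt),
# inverted here to recover viscosity from the measured plate
# approach rate under a fixed load F.
def viscosity(F, R, h, dhdt):
    return 2.0 * F * h**3 / (3.0 * pi * R**4 * (-dhdt))

# Hypothetical measurement: 10 N load, half-inch radius plates,
# 1 mm gap closing at 10 um/s.
mu = viscosity(F=10.0, R=0.0127, h=1.0e-3, dhdt=-1.0e-5)
print(mu > 0)  # True: a positive viscosity in Pa*s
```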

Antar, Basil N.

2002-01-01

258

Simulation of hypervelocity impact on massively parallel supercomputer

Hypervelocity impact studies are important for debris shield and armor/anti-armor research and development. Numerical simulations are frequently performed to complement experimental studies, and to evaluate code accuracy. Parametric computational studies involving material properties, geometry and impact velocity can be used to understand hypervelocity impact processes. These impact simulations normally need to address shock wave physics phenomena, material deformation and failure, and motion of debris particles. Detailed, three-dimensional calculations of such events have large memory and processing time requirements. At Sandia National Laboratories, many impact problems of interest require tens of millions of computational cells. Furthermore, even the inadequately resolved problems often require tens or hundreds of Cray CPU hours to complete. Recent numerical studies done by Grady and Kipp at Sandia using the Eulerian shock wave physics code CTH demonstrated very good agreement with many features of a copper sphere-on-steel plate oblique impact experiment, fully utilizing the compute power and memory of Sandia's Cray supercomputer. To satisfy requirements for more finely resolved simulations in order to obtain a better understanding of the crater formation process and impact ejecta motion, the numerical work has been moved from the shared-memory Cray to a large, distributed-memory, massively parallel supercomputing system using PCTH, a parallel version of CTH. The current work is a continuation of the studies, but done on Sandia's Intel 1840-processor Paragon X/PS parallel computer. With the great compute power and large memory provided by the Paragon, a highly detailed PCTH calculation has been completed for the copper sphere impacting steel plate experiment. Although the PCTH calculation used a mesh which is 4.5 times bigger than the original Cray setup, it finished in much less CPU time.

Fang, H.E.

1994-12-31

259

Highly parallelized detection of single fluorescent molecules: simulation and experiment

NASA Astrophysics Data System (ADS)

We are developing an ultrasensitive, fluorescence-based detection system in highly parallel microchannels. Multichannel microfluidic devices have been fabricated by direct femtosecond laser machining of fused silica substrates. We approach single-molecule detection sensitivity by introducing dilute aqueous solutions (~pM) of fluorescently labeled molecules into the microchannels. In a custom-built, wide-field microscope, a line-generating red diode laser provides narrow epi-illumination across a 500 μm field of view. Fluorescence is detected with an electron-multiplying CCD camera allowing readout rates of several kHz. Rapid initial assessment is performed through digital filtering derived from simulations based on experimental parameters. Good agreement has been shown between simulation and experimental data. Fluorescence correlation spectroscopy then provides more detailed analysis of each separate channel. Following optimization, microfluidic devices could easily be mass-produced in low-cost polymers using imprint lithography.

Canfield, Brian K.; King, Jason K.; Robinson, William N.; Hofmeister, William H.; Davis, Lloyd M.

2011-10-01

260

Financial simulations on a massively parallel Connection Machine

This paper reports on the valuation of complex financial instruments that appear in the banking and insurance industries, which requires simulations of their cashflow behavior in a volatile interest rate environment. These simulations are complex and computationally intensive. Their use, thus far, has been limited to intra-day analysis and planning. Researchers at the Wharton School and Thinking Machines Corporation have developed model formulations for massively parallel architectures, like the Connection Machine CM-2. A library of financial modeling primitives has been designed and used to implement a model for the valuation of mortgage-backed securities. Analyzing a portfolio of these securities, which would require 2 days on a large mainframe, is carried out in 1 hour on a CM-2a.

Hutchinson, J.M.; Zenios, S.A. (Thinking Machine Corp., Cambridge, MA (US))

1991-01-01

261

Parallelism and pipelining in high-speed digital simulators

NASA Technical Reports Server (NTRS)

The attainment of high computing speed as measured by computational throughput is seen as one of the most challenging requirements. It is noted that high speed is cardinal in several distinct classes of applications. These classes are then discussed; they comprise (1) the real-time simulation of dynamic systems, (2) distributed parameter systems, and (3) mixed lumped and distributed systems. From the 1950s on, the quest for high speed in digital simulators concentrated on overcoming the limitations imposed by the so-called von Neumann bottleneck. Two major architectural approaches have made it possible to circumvent this bottleneck and attain high speeds. These are pipelining and parallelism. Supercomputers, peripheral array processors, and microcomputer networks are then discussed.

Karplus, W. J.

1983-01-01

262

Parallel programming in MIMD type parallel systems using transputer and i860 in physical simulations

NASA Astrophysics Data System (ADS)

Parallel programming and calculation performance were examined by using two types of MIMD parallel systems, that is, a transputer (T800) network and iPSC/860. Some interface subroutines were developed to apply the programs parallelized by using a transputer network to iPSC/860. Compatibility and performance of parallelized programs are discussed.

Ido, S.; Hikosaka, S.

1992-05-01

263

A Generic Scheduling Simulator for High Performance Parallel Computers

It is well known that efficient job scheduling plays a crucial role in achieving high system utilization in large-scale high performance computing environments. A good scheduling algorithm should schedule jobs to achieve high system utilization while satisfying various user demands in an equitable fashion. Designing such a scheduling algorithm is a non-trivial task even in a static environment. In practice, the computing environment and workload are constantly changing. There are several reasons for this. First, the computing platforms constantly evolve as the technology advances. For example, the availability of relatively powerful commodity off-the-shelf (COTS) components at steadily diminishing prices has made it feasible to construct ever larger massively parallel computers in recent years [1, 4]. Second, the workload imposed on the system also changes constantly. The rapidly increasing compute resources have provided many applications developers with the opportunity to radically alter program characteristics and take advantage of these additional resources. New developments in software technology may also trigger changes in user applications. Finally, changes in the political climate may alter user priorities or the mission of the organization. System designers in such dynamic environments must be able to accurately forecast the effect of changes in the hardware, software, and/or policies under consideration. If the environmental changes are significant, one must also reassess scheduling algorithms. Simulation has frequently been relied upon for this analysis, because other methods such as analytical modeling or actual measurements are usually too difficult or costly. A drawback of the simulation approach, however, is that developing a simulator is a time-consuming process. Furthermore, an existing simulator cannot be easily adapted to a new environment.
In this research, we attempt to develop a generic job-scheduling simulator, which facilitates the evaluation of different scheduling algorithms in various computing environments. The following are our design objectives for this generic simulator. (1) Accept descriptions of varied workloads for a wide range of computing environments. (2) Provide an easy-to-use interface for describing the scheduling policies being evaluated. (3) Accurately calculate the overhead induced by various scheduling algorithms. (4) Accurately model a variety of machine architectures. In summary, we have developed a generic scheduling simulator for high performance parallel computers. This generic simulator supports standard and user-defined job attributes and generates the job attribute values from different input sources, allowing users to model a wide range of workloads, and produces performance parameters with reliability measures. All overheads caused by scheduling algorithms are considered in measuring the performance parameters. The simulator simulates a queuing network to which users can bind a specific scheduling algorithm written as a C function. A set of APIs is provided to help users describe their scheduling algorithms. With these features, the simulator can accurately simulate any scheduling algorithm under various workloads and computing platforms. The simulator does not currently model dynamic events such as message passing between tasks in detail, but we plan to add this crucial functionality to our simulator in the future.
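
The queuing-network design described above, with a user-supplied policy bound to the simulator, can be sketched in a few lines. The Python below is an illustrative stand-in for the simulator's C-function interface; the job fields and the `first_fit` policy are invented for the example:

```python
import heapq
import itertools

def simulate(jobs, policy, total_nodes):
    """Event-driven sketch of a job scheduler with a pluggable policy.

    jobs: list of dicts with 'id', 'submit', 'nodes', 'runtime'.
    policy: function(queue, free_nodes) -> job to start next, or None.
    Returns a dict mapping job id -> (start_time, finish_time).
    """
    seq = itertools.count()              # tie-breaker for simultaneous events
    events = [(j['submit'], next(seq), 'submit', j) for j in jobs]
    heapq.heapify(events)
    queue, free, schedule = [], total_nodes, {}
    while events:
        t, _, kind, job = heapq.heappop(events)
        if kind == 'submit':
            queue.append(job)
        else:                            # a running job finished
            free += job['nodes']
        while True:                      # let the policy start runnable jobs
            pick = policy(queue, free)
            if pick is None:
                break
            queue.remove(pick)
            free -= pick['nodes']
            schedule[pick['id']] = (t, t + pick['runtime'])
            heapq.heappush(events,
                           (t + pick['runtime'], next(seq), 'finish', pick))
    return schedule

def first_fit(queue, free_nodes):
    """Example policy: start the earliest-submitted job that fits."""
    for job in queue:
        if job['nodes'] <= free_nodes:
            return job
    return None
```

A policy receives the current queue and the number of free nodes and returns the next job to start, mirroring the callback role the abstract assigns to the user-supplied C function.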

Yoo, B S; Choi, G S; Jette, M A

2001-08-01

264

Integrating System Performance Engineering into MASCOT Methodology through Discrete-Event Simulation

Software design methodologies are incorporating non-functional features in design system descriptions. MASCOT, which has been a traditional design methodology for European defence companies, has no performance extension. In this paper we present a set of performance annotations to the MASCOT methodology, called MASCOTime. These annotations extend the MDL (MASCOT Description Language) design components transparently. Thus, in order to evaluate the

Pere P. Sancho; Carlos Juiz; Ramón Puigjaner

2004-01-01

265

Large-scale emergencies require groups of response personnel to seek and handle information from an evolving range of sources in order to meet an evolving set of goals, often under conditions of high risk. Because emergencies induce time constraints, efforts spent on planning activities reduce the time available for execution activities. This paper discusses the design and implementation of a discrete-event

Qing Gu; David Mendonça

2006-01-01

266

National Technical Information Service (NTIS)

Reconnaissance missions are not only one of the vital modes of intelligence gathering, they are also one of the most important contributors of military intelligence. They show the battlefield as it is to the commander. A simplified reconnaissan...

E. Kemik O. Arslan

2010-01-01

267

Oscillator Model for High-Precision Synchronization Protocol Discrete Event Simulation.

National Technical Information Service (NTIS)

It is well known that a common notion of time in distributed systems can be used to ensure additional properties such as real-time behavior or the identification of the order of events. As large-scale hardware testbeds for such systems are neither efficie...

A. Nagy G. Gaderer J. Mad P. Loschmidt R. Beigelbeck

2007-01-01

268

A System for Patient Management Based Discrete-Event Simulation and Hierarchical Clustering

Hospital Accident and Emergency (A&E) departments in England have a 4 hour target to treat 98% of patients from arrival to discharge, admission or transfer. Managing resources to meet the target and deliver care across the range of A&E services is a huge challenge for A&E managers. This paper develops an intelligent patient management tool to help managers and

Anthony Codrington-virtue; Thierry J. Chaussalet; Peter H. Millard; Paul Whittlestone; John Kelly

2006-01-01

269

Concurrent simulation of a parallel jaw end effector

NASA Technical Reports Server (NTRS)

A system of programs developed to aid in the design and development of the command/response protocol between a parallel jaw end effector and the strategic planner program controlling it are presented. The system executes concurrently with the LISP controlling program to generate a graphical image of the end effector that moves in approximately real time in response to commands sent from the controlling program. Concurrent execution of the simulation program is useful for revealing flaws in the communication command structure arising from the asynchronous nature of the message traffic between the end effector and the strategic planner. Software simulation helps to minimize the number of hardware changes necessary to the microprocessor driving the end effector because of changes in the communication protocol. The simulation of other actuator devices can be easily incorporated into the system of programs by using the underlying support that was developed for the concurrent execution of the simulation process and the communication between it and the controlling program.

Bynum, Bill

1985-01-01

270

Massively Parallel Simulations of Diffusion in Dense Polymeric Structures

An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases is studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon (1840 processors). Compared to the current state-of-the-art equilibration methods, this new technique appears to be faster by some orders of magnitude. The main advantage of the technique is that one can circumvent the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in the center of a regular tetrahedron and that regular tetrahedrons can be used to mesh three-dimensional space. Thus, the problem of polymer equilibration described by continuous equations in molecular dynamics is reduced to a discrete problem where solutions are approximated by simple algorithms. Practical modeling applications include constructing butyl rubber and ethylene-propylene-diene monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally on both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. In order to better understand the diffusion phenomena, 10,000-atom models were generated and equilibrated for butyl and EPDM. The models were submitted to a massively parallel molecular dynamics simulation to monitor the trajectories of the diffusing species.

Faulon, Jean-Loup; Wilcox, R.T. [Sandia National Labs., Albuquerque, NM (United States)]; Hobbs, J.D. [Montana Tech of the Univ. of Montana, Butte, MT (United States). Dept. of Chemistry and Geochemistry]; Ford, D.M. [Texas A and M Univ., College Station, TX (United States). Dept. of Chemical Engineering]

1997-11-01

271

Roadmap for efficient parallelization of breast anatomy simulation

NASA Astrophysics Data System (ADS)

A roadmap has been proposed to optimize the simulation of breast anatomy by parallel implementation, in order to reduce the time needed to generate software breast phantoms. The rapid generation of high resolution phantoms is needed to support virtual clinical trials of breast imaging systems. We have recently developed an octree-based recursive partitioning algorithm for breast anatomy simulation. The algorithm has good asymptotic complexity; however, its current MATLAB implementation cannot provide optimal execution times. The proposed roadmap for efficient parallelization includes the following steps: (i) migrate the current code to a C/C++ platform and optimize it for single-threaded implementation; (ii) modify the code to allow for multi-threaded CPU implementation; (iii) identify and migrate the code to a platform designed for multithreaded GPU implementation. In this paper, we describe our results in optimizing the C/C++ code for single-threaded and multi-threaded CPU implementations. As the first step of the proposed roadmap, we have identified a bottleneck component in the MATLAB implementation using MATLAB's profiling tool, and created a single-threaded CPU implementation of the algorithm using C/C++'s overloaded operators and standard template library. The C/C++ implementation has been compared to the MATLAB version in terms of accuracy and simulation time. A 520-fold reduction of the execution time was observed in a test of phantoms with 50-400 μm voxels. In addition, we have identified several places in the code which will be modified to allow for the next roadmap milestone of the multithreaded CPU implementation.
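
The octree-based recursive partitioning mentioned above can be sketched as follows. This is a generic illustration of recursive spatial subdivision, not the authors' breast-anatomy algorithm; the point-capacity rule and half-open octant bounds are assumptions of the example:

```python
def build_octree(points, lo, hi, capacity=4, depth=0, max_depth=8):
    """Recursively split the box [lo, hi) into 8 octants until each
    leaf holds at most `capacity` points (or max_depth is reached)."""
    if len(points) <= capacity or depth == max_depth:
        return {'bounds': (lo, hi), 'points': points}
    mid = tuple((l + h) / 2 for l, h in zip(lo, hi))
    children = []
    for octant in range(8):     # bit k of octant selects the half in axis k
        clo = tuple(m if (octant >> k) & 1 else l
                    for k, (l, m) in enumerate(zip(lo, mid)))
        chi = tuple(h if (octant >> k) & 1 else m
                    for k, (m, h) in enumerate(zip(mid, hi)))
        sub = [p for p in points
               if all(a <= c < b for c, a, b in zip(p, clo, chi))]
        children.append(build_octree(sub, clo, chi, capacity,
                                     depth + 1, max_depth))
    return {'bounds': (lo, hi), 'children': children}
```

A recursion of this shape translates directly to C++ with overloaded operators and STL containers, which is the single-threaded porting step the roadmap describes.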

Chui, Joseph H.; Pokrajac, David D.; Maidment, Andrew D. A.; Bakic, Predrag R.

2012-02-01

272

A deadlock avoidance supervisory controller for Discrete Event (DE) Systems is implemented. The DE controller uses a novel rule-based matrix dispatching formulation (US patent received). This matrix formulation makes it straightforward to write down the DE controller from standard manufacturing tools such as the bill of materials or the assembly tree. It is shown that the DE controller's matrix form

Ayla Gürel

273

Optimistic Simulations of Physical Systems using Reverse Computation

Efficient computer simulation of complex physical phenomena has long been challenging due to their multi-physics and multi-scale nature. In contrast to traditional time-stepped execution methods, we describe an approach using optimistic parallel discrete event simulation (PDES) and reverse computation techniques to execute plasma physics codes. We show that reverse computation-based optimistic parallel execution can significantly reduce the execution time of an example plasma simulation without requiring a significant amount of additional memory compared to conservative execution techniques. We describe an application-level reverse computation technique that is efficient and suitable for complex scientific simulations.

Tang, Yarong [Georgia Institute of Technology]; Perumalla, Kalyan S. [ORNL]; Fujimoto, Richard [ORNL]; Karimabadi, Homa [SciberQuest Inc.]; Driscoll, Jonathan [SciberQuest Inc.]; Omelchenko, Yuri [SciberQuest Inc.]

2006-01-01

274

Sensor Configuration Selection for Discrete-Event Systems under Unreliable Observations

Algorithms for counting the occurrences of special events in the framework of partially-observed discrete event dynamical systems (DEDS) were developed in previous work. Their performance typically improves as the sensors providing the observations become more costly or increase in number. This paper addresses the problem of finding a sensor configuration that achieves an optimal balance between cost and the performance of the special event counting algorithm, while satisfying given observability requirements and constraints. Since this problem is generally computationally hard in the framework considered, a sensor optimization algorithm is developed using two greedy heuristics, one myopic and the other based on projected performances of candidate sensors. The two heuristics are executed sequentially in order to find the best sensor configurations. The developed algorithm is then applied to a sensor optimization problem for a multiunit-operation system. Results show that improved sensor configurations can be found that may significantly reduce the sensor configuration cost but still yield acceptable performance for counting the occurrences of special events.
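
The myopic greedy heuristic can be illustrated with a small sketch. This is a generic cost-benefit greedy loop under assumed inputs (a cost table and a black-box performance function), not the paper's algorithm:

```python
def greedy_sensor_selection(candidates, perf, budget):
    """Myopic greedy sketch: repeatedly add the sensor with the best
    marginal performance gain per unit cost until no affordable sensor
    improves performance. `candidates` maps sensor name -> cost;
    `perf` maps a frozenset of chosen sensors -> performance score."""
    chosen, spent = set(), 0.0
    while True:
        best, best_ratio = None, 0.0
        base = perf(frozenset(chosen))
        for name, cost in candidates.items():
            if name in chosen or spent + cost > budget or cost <= 0:
                continue
            ratio = (perf(frozenset(chosen | {name})) - base) / cost
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:          # nothing affordable improves performance
            return chosen, spent
        chosen.add(best)
        spent += candidates[best]
```

The paper's second heuristic would replace `perf` with a projection of the counting algorithm's performance under each candidate configuration rather than its current score.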

Wen-Chiao Lin; Tae-Sic Yoo; Humberto E. Garcia

2010-08-01

275

Parallel algorithm for spin and spin-lattice dynamics simulations

NASA Astrophysics Data System (ADS)

To control numerical errors accumulated over tens of millions of time steps during the integration of a set of highly coupled equations of motion is not a trivial task. In this paper, we propose a parallel algorithm for spin dynamics and the newly developed spin-lattice dynamics simulation [P. W. Ma, Phys. Rev. B 78, 024434 (2008)]. The algorithm is successfully tested in both types of dynamic calculations involving a million spins. It shows good stability and numerical accuracy over millions of time steps (~1 ns). The scheme is based on the second-order Suzuki-Trotter decomposition (STD). Its use avoids numerical energy dissipation despite trajectory and machine errors. The mathematical basis of the symplecticity, for properly decomposed evolution operators, is presented. Due to the noncommutative nature of the spin in the present STD scheme, a unique parallel algorithm is needed. The efficiency and stability are tested. The algorithm attains a six- to sevenfold speedup when eight threads are used. The run time per time step is linearly proportional to the system size.
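
The second-order Suzuki-Trotter (Strang) decomposition underlying the scheme can be illustrated on a toy Hamiltonian. This is a generic sketch of one symplectic splitting step, not the paper's spin-lattice integrator:

```python
def std_step(q, p, dt):
    """One second-order Suzuki-Trotter step,
    exp((A+B)dt) ~= exp(A dt/2) exp(B dt) exp(A dt/2),
    applied to the toy Hamiltonian H = (p**2 + q**2) / 2."""
    p -= 0.5 * dt * q   # half kick  (operator A acts on p)
    q += dt * p         # full drift (operator B acts on q)
    p -= 0.5 * dt * q   # half kick  (operator A acts on p)
    return q, p
```

Because the composed step is symplectic, the energy error stays bounded over millions of steps instead of drifting, which is the property the abstract highlights.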

Ma, Pui-Wai; Woo, C. H.

2009-04-01

276

Parallel algorithm for spin and spin-lattice dynamics simulations.

To control numerical errors accumulated over tens of millions of time steps during the integration of a set of highly coupled equations of motion is not a trivial task. In this paper, we propose a parallel algorithm for spin dynamics and the newly developed spin-lattice dynamics simulation [P. W. Ma, Phys. Rev. B 78, 024434 (2008)]. The algorithm is successfully tested in both types of dynamic calculations involving a million spins. It shows good stability and numerical accuracy over millions of time steps (approximately 1 ns). The scheme is based on the second-order Suzuki-Trotter decomposition (STD). Its use avoids numerical energy dissipation despite trajectory and machine errors. The mathematical basis of the symplecticity, for properly decomposed evolution operators, is presented. Due to the noncommutative nature of the spin in the present STD scheme, a unique parallel algorithm is needed. The efficiency and stability are tested. The algorithm attains a six- to sevenfold speedup when eight threads are used. The run time per time step is linearly proportional to the system size. PMID:19518376

Ma, Pui-Wai; Woo, C H

2009-04-01

277

Simulation of multidimensional gaseous detonations with a parallel adaptive method

NASA Astrophysics Data System (ADS)

A detonation wave is a self-sustained, violent form of shock-induced combustion that is characterized by a subtle energetic interplay between the leading hydrodynamic shock wave and the following chemical reaction. Multidimensional gaseous detonations never remain planar and instead exhibit transverse shocks that form triple points with transient Mach reflection patterns. Their accurate numerical simulation requires very high resolution around the shock and reaction zone. A parallel adaptive finite volume method for the chemically reactive Euler equations for mixtures of thermally perfect gases has been developed for this purpose. Its key components are a high-resolution shock-capturing scheme of Roe type, block-structured Cartesian mesh adaptation, and operator splitting to handle stiff, detailed kinetics. Besides simple verification examples to quantify the savings in wall time from mesh adaptation and parallelization, large-scale computations of Chapman-Jouguet detonations in low-pressure hydrogen-oxygen-argon mixtures will be discussed. These computations allowed the detailed analysis of triple point structures under transient conditions and a comparison between two and three space dimensions.

Deiterding, Ralf

2008-11-01

278

This paper describes staged simulation, a technique for improving the run time performance and scale of discrete event simulators. Typical wireless network simulations are limited in speed and scale due to redundant computations, both within a single simulation run and between successive runs. Staged simulation proposes to reduce the amount of redundant computation within a simulation by restructuring discrete event

Kevin Walsh; Emin Gün Sirer

2003-01-01

279

Parallel grid library for rapid and flexible simulation development

NASA Astrophysics Data System (ADS)

We present an easy to use and flexible grid library for developing highly scalable parallel simulations. The distributed cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time, allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously, allowing one to overlap computation and communication. This enables excellent scalability at least up to 32k cores in magnetohydrodynamic tests, depending on the problem and hardware. In the version of dccrg presented here, part of the mesh metadata is replicated between MPI processes, reducing the scalability of adaptive mesh refinement (AMR) to between 200 and 600 processes. Dccrg is free software that anyone can use, study and modify and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg.
Program summary: Program title: DCCRG. Catalogue identifier: AEOM_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOM_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: GNU Lesser General Public License version 3. No. of lines in distributed program, including test data, etc.: 54975. No. of bytes in distributed program, including test data, etc.: 974015. Distribution format: tar.gz. Programming language: C++. Computer: PC, cluster, supercomputer. Operating system: POSIX. The code has been parallelized using MPI and tested with 1-32768 processes. RAM: 10 MB-10 GB per process. Classification: 4.12, 4.14, 6.5, 19.3, 19.10, 20. External routines: MPI-2 [1], boost [2], Zoltan [3], sfc++ [4]. Nature of problem: Grid library supporting arbitrary data in grid cells, parallel adaptive mesh refinement, transparent remote neighbor data updates and load balancing. Solution method: The simulation grid is represented by an adjacency list (graph) with vertices stored in a hash table and edges in contiguous arrays. The Message Passing Interface standard is used for parallelization. Cell data is given as a template parameter when instantiating the grid. Restrictions: Logically cartesian grid. Running time: Running time depends on the hardware, problem and the solution method. Small problems can be solved in under a minute and very large problems can take weeks. The examples and tests provided with the package take less than about one minute using default options. In the version of dccrg presented here the speed of adaptive mesh refinement is at most of the order of 10^6 total created cells per second.
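
The storage scheme named in the solution method (cells keyed by id in a hash table, neighbor ids in contiguous arrays, arbitrary user data per cell) can be sketched generically. The toy class below is illustrative only and is not dccrg's actual C++ API:

```python
class ToyGrid:
    """Cell storage in the style described above: a hash table (dict)
    from cell id to user data, plus contiguous neighbor-id arrays."""

    def __init__(self):
        self.data = {}       # cell id -> arbitrary user data
        self.neighbors = {}  # cell id -> list of neighbor cell ids

    def add_cell(self, cid, value):
        self.data[cid] = value
        self.neighbors[cid] = []

    def link(self, a, b):
        """Record an undirected neighbor relation (a graph edge)."""
        self.neighbors[a].append(b)
        self.neighbors[b].append(a)

    def neighbor_values(self, cid):
        """Gather neighbor data, as a solver would before a local update."""
        return [self.data[n] for n in self.neighbors[cid]]
```

In dccrg itself the cell data type is a C++ template parameter and remote neighbor copies are updated over MPI; the dict here merely stands in for that hash-table layout.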

Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.

2013-04-01

280

Simulation of the charge transfer inefficiency of column parallel CCDs

NASA Astrophysics Data System (ADS)

Charge-Coupled Devices (CCDs) have been successfully used in several high-energy physics experiments over the past two decades. Their high spatial resolution and thin sensitive layer make them an excellent tool for studying short-lived particles. The Linear Collider Flavour Identification (LCFI) collaboration is developing Column Parallel CCDs (CPCCDs) for the vertex detector of the International Linear Collider (ILC). The CPCCDs can be read out many times faster than standard CCDs, significantly increasing their operating speed. The results of detailed simulations of the Charge Transfer Inefficiency (CTI) of a prototype CPCCD chip are reported. The effects of radiation damage on the CTI of a Si-based CCD particle detector are studied by simulating two electron trap levels, Ec-0.17 eV and Ec-0.44 eV, at different concentrations and operating temperatures. The dependence of the CTI on different occupancy levels (percentage of hit pixels) and readout frequencies is also studied. The optimal operating temperature, at which the effects of trapping are at a minimum, is found to be ~230 K for the range of readout speeds proposed for the ILC.

Maneuski, Dzmitry

2008-06-01

281

Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator

The design of future parallel computers requires rapid simulation of target designs running realistic workloads. These simulations have been accelerated using two techniques: direct execution and the use of a parallel host. Historically, these techniques have been considered to have poor portability. This paper identifies and describes the implementation of four key operations necessary to make such simulation

Shubhendu S. Mukherjee; Steven K. Reinhardt; Babak Falsafi; Mike Litzkow; Steve Huss-Lederman; Mark D. Hill; James R. Larus; David A. Wood

1997-01-01

282

Staged simulation: A general technique for improving simulation scale and performance

This article describes staged simulation, a technique for improving the run time performance and scale of discrete event simulators. Typical network simulations are limited in speed and scale due to redundant computations encountered both within a single simulation run and between successive runs. Staged simulation proposes to restructure discrete event simulators to operate in stages that precompute, cache, and reuse

Kevin Walsh; Emin Gün Sirer

2004-01-01

283

Staged simulation for improving scale and performance of wireless network simulations

This paper describes staged simulation, a technique for improving the run time performance and scale of discrete event simulators. Typical wireless network simulations are limited in speed and scale due to redundant computations, both within a single simulation run and between successive runs. Staged simulation proposes to reduce the amount of redundant computation within a simulation by restructuring discrete event

Kevin Walsh; Emin Gün Sirer

2003-01-01

284

This paper proposes a systematic approach for the design of a supervisory controller for discrete event systems (DES) and their ladder logic diagrams (LLD). The method is based on Petri nets, which are used for modeling the systems. It involves defining the control policy and simplifying it by using the Espresso software to form the compiled controller by adding inhibiting and

G. Cansever; I. B. Kucukdemiral

2006-01-01

285

The evolutionary model of physics large-scale simulation on parallel dataflow architecture

The problem of effectively mapping computational algorithms to parallel architectures is very important in large-scale simulation. The developed model allows us to explore and utilize fine-grain parallelism, as well as coarse-grain parallelism. The model was tested with the nonlinear 3D magnetohydrodynamic (MHD) code.

A. V. Nikitin; L. I. Nikitina

2003-01-01

286

We investigate the applicability of the synchronous relaxation (SR) algorithm to parallel kinetic Monte Carlo simulations of simple models of thin film growth. A variety of techniques for optimizing the parallel efficiency are also presented. We find that the parallel efficiency is determined by three main factors: the calculation overhead due to relaxation iterations to correct boundary events in neighboring processors,

Yunsic Shim; Jacques G. Amar

2005-01-01

287

Particle/Continuum Hybrid Simulation in a Parallel Computing Environment

NASA Technical Reports Server (NTRS)

The objective of this study was to modify an existing parallel particle code based on the direct simulation Monte Carlo (DSMC) method to include a Navier-Stokes (NS) calculation so that a hybrid solution could be developed. In carrying out this work, it was determined that the following five issues had to be addressed before extensive program development of a three dimensional capability was pursued: (1) find a set of one-sided kinetic fluxes that are fully compatible with the DSMC method, (2) develop a finite volume scheme to make use of these one-sided kinetic fluxes, (3) make use of the one-sided kinetic fluxes together with DSMC type boundary conditions at a material surface so that velocity slip and temperature slip arise naturally for near-continuum conditions, (4) find a suitable sampling scheme so that the values of the one-sided fluxes predicted by the NS solution at an interface between the two domains can be converted into the correct distribution of particles to be introduced into the DSMC domain, (5) carry out a suitable number of tests to confirm that the developed concepts are valid, individually and in concert for a hybrid scheme.

Baganoff, Donald

1996-01-01

288

Variance reduction algorithms for parallel replicated simulation of uniformized Markov chains

We discuss the simulation of M replications of a uniformizable Markov chain simultaneously and in parallel (the so-called parallel replicated approach). Distributed implementation on a number of processors and parallel SIMD implementation on massively parallel computers are described. We investigate various ways of inducing correlation across replications in order to reduce the variance of estimators obtained from the M replications. In particular,
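
The correlation-induction idea can be illustrated with common random numbers on a toy two-state uniformized chain. This sketch compares two parameter settings driven by one shared uniform stream; it is a generic example, not the paper's specific schemes:

```python
import random
import statistics

def occupancy(us, p_up):
    """Fraction of steps a two-state chain spends in state 1 when its
    uniformized transitions are driven by the uniform stream `us`
    (inverse-transform style: move when the draw is below the rate)."""
    state, occ = 0, 0
    for u in us:
        if state == 0:
            state = 1 if u < p_up else 0
        else:
            state = 0 if u < 0.2 else 1
        occ += state
    return occ / len(us)

def diff_samples(m, n, common, seed=7):
    """m estimates of occupancy(p_up=0.35) - occupancy(p_up=0.30).
    With common=True both chains reuse one uniform stream per
    replication (common random numbers), coupling their paths."""
    rng = random.Random(seed)
    out = []
    for _ in range(m):
        us = [rng.random() for _ in range(n)]
        a = occupancy(us, 0.35)
        vs = us if common else [rng.random() for _ in range(n)]
        out.append(a - occupancy(vs, 0.30))
    return out

var_indep = statistics.pvariance(diff_samples(200, 500, common=False))
var_crn = statistics.pvariance(diff_samples(200, 500, common=True))
```

Sharing the stream makes the two chains differ only on draws falling between the two thresholds, so the variance of the difference estimator drops sharply compared with independent streams.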

Simon Streltsov; Pirooz Vakili

1996-01-01

289

Hybrid asynchronous algorithm for parallel kinetic Monte Carlo simulations of thin film growth

We have generalized and implemented the hybrid asynchronous algorithm, originally proposed for parallel simulations of the spin-flip Ising model, in order to carry out parallel kinetic Monte Carlo (KMC) simulations. The parallel performance has been tested using a simple model of thin-film growth in both 1D and 2D. We also briefly describe how the data collection must be modified as

Yunsic Shim; Jacques G. Amar

2006-01-01

290

Integration of parallel computation and dynamic mesh refinement for transient spray simulation

This study developed parallel computing schemes to enhance the computational efficiency of engine spray simulations when adaptive mesh refinement was used. Spray simulations have been shown to be grid dependent and thus fine mesh is often used to improve solution accuracy. In this study, dynamic mesh refinement adaptive to the spray region was developed and parallelized. The change of element

Yuanhong Li; Song-Charng Kong

2009-01-01

291

Parallel-in-time implementation of transient stability simulations on a transputer network

The most time consuming computer simulation in power system studies is the transient stability analysis. Parallel processing has been applied for time domain simulations of power system transient behavior. In this paper, a parallel implementation of an algorithm based on Shifted-Picard dynamic iterations is presented. The main idea is that a set of nonlinear Differential Algebraic Equations (DAEs), which describes

M. La Scala; G. Sblendorio; R. Sbrizzai

1994-01-01

292

Parallel standard cell placement algorithms with quality equivalent to simulated annealing

Parallel algorithms with quality equivalent to the simulated annealing placement algorithm for standard cells (23) are presented. The first, called heuristic spanning, creates parallelism by simultaneously investigating different areas of the plausible combinatorial search space. It is used to replace the high-temperature portion of simulated annealing. The low-temperature portion of simulated annealing is sped

Jonathan S. Rose; W. Martin Snelgrove; Zvonko G. Vranesic

1988-01-01

293

A natural partitioning scheme for parallel simulation of multibody systems

NASA Technical Reports Server (NTRS)

A parallel partitioning scheme based on physical-coordinate variables is presented to systematically eliminate system constraint forces and yield the equations of motion of multibody dynamics systems in terms of their independent coordinates. Key features of the present scheme include an explicit determination of the independent coordinates, a parallel construction of the null space matrix of the constraint Jacobian matrix, an easy incorporation of the previously developed two-stage staggered solution procedure, and a Schur complement-based parallel preconditioned conjugate gradient numerical algorithm.

Chiou, J. C.; Park, K. C.; Farhat, C.

1991-01-01

294

A natural partitioning scheme for parallel simulation of multibody systems

NASA Technical Reports Server (NTRS)

A parallel partitioning scheme based on physical-coordinate variables is presented to systematically eliminate system constraint forces and yield the equations of motion of multibody dynamics systems in terms of their independent coordinates. Key features of the present scheme include an explicit determination of the independent coordinates, a parallel construction of the null space matrix of the constraint Jacobian matrix, an easy incorporation of the previously developed two-stage staggered solution procedure, and a Schur complement-based parallel preconditioned conjugate gradient numerical algorithm.

Chiou, J. C.; Park, K. C.; Farhat, C.

1993-01-01

295

Parallel Vehicular Traffic Simulation using Reverse Computation-based Optimistic Execution

Vehicular traffic simulations are useful in applications such as emergency management and homeland security planning tools. High speed of traffic simulations translates directly to speed of response and level of resilience in those applications. Here, a parallel traffic simulation approach is presented that is aimed at reducing the time for simulating emergency vehicular traffic scenarios. Three unique aspects of this effort are: (1) exploration of optimistic simulation applied to vehicular traffic simulation; (2) addressing reverse computation challenges specific to optimistic vehicular traffic simulation; and (3) achieving absolute (as opposed to self-relative) speedup with a sequential speed equal to that of a fast, de facto standard sequential simulator for emergency traffic. The design and development of the parallel simulation system is presented, along with a performance study that demonstrates excellent sequential performance as well as parallel performance.
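As a toy illustration of the reverse-computation idea in aspect (2) (the `Segment`/`arrive` model below is a hypothetical example, not the authors' simulator): instead of saving full state before each speculative event, each event handler is paired with an inverse that undoes it during rollback.

```python
# Minimal reverse-computation pair for an optimistic simulator: the forward
# handler mutates state, and its inverse restores it exactly when a straggler
# event forces a rollback of speculatively executed events.

class Segment:
    def __init__(self):
        self.count = 0          # vehicles currently on this road segment

def arrive(seg, event):
    # Forward: a vehicle enters the segment.  Nothing extra is saved because
    # the inverse can reconstruct the prior state from the increment alone.
    seg.count += 1

def arrive_reverse(seg, event):
    # Reverse: undo the increment during rollback.
    seg.count -= 1

# Optimistic forward execution, then rollback in LIFO order:
seg = Segment()
events = ["e1", "e2", "e3"]
for e in events:
    arrive(seg, e)
for e in reversed(events):
    arrive_reverse(seg, e)
assert seg.count == 0           # state fully restored without checkpoints
```

Real handlers are rarely this simple; destructive updates (e.g., overwriting a vehicle's lane) need a small saved residue, which is exactly the challenge the abstract alludes to.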

Yoginath, Srikanth B. [ORNL]; Perumalla, Kalyan S. [ORNL]

2008-01-01

296

Package for the Simulation of Asynchronous Parallel Systems.

National Technical Information Service (NTIS)

High interaction rate colliders impose stringent requirements on data acquisition and triggering systems. These systems can only be realized by using asynchronous parallel networks of processors. The accurate prediction of the performance of these network...

D. M. Gingrich; I. C. McArthur; P. M. Hallam-Baker

1990-01-01

297

Parallel climate model (PCM) control and transient simulations

The Department of Energy (DOE) supported Parallel Climate Model (PCM) makes use of the NCAR Community Climate Model (CCM3) and Land Surface Model (LSM) for the atmospheric and land surface components, respectively, the DOE Los Alamos National Laboratory Parallel Ocean Program (POP) for the ocean component, and the Naval Postgraduate School sea-ice model. The PCM executes on several distributed and

W. M. Washington; J. W. Weatherly; G. A. Meehl; A. J. Semtner Jr.; T. W. Bettge; A. P. Craig; W. G. Strand Jr.; J. M. Arblaster; V. B. Wayland; R. James; Y. Zhang

2000-01-01

298

GloMoSim: A Library for Parallel Simulation of Large-Scale Wireless Networks

A number of library-based parallel and sequential network simulators have been designed. This paper describes a library, called GloMoSim (for Global Mobile system Simulator), for parallel simulation of wireless networks. GloMoSim has been designed to be extensible and composable: the communication protocol stack for wireless networks is divided into a set of layers, each with its own API. Models of

Xiang Zeng; Rajive Bagrodia; Mario Gerla

1998-01-01

299

High-fidelity and time-driven simulation of large wireless networks with parallel processing

Parallel processing is a promising technique for reducing the execution time of a large and complex wireless network simulation. In this article, a parallel processing technique is presented for time-driven simulations of large and complex wireless networks. The technique explicitly considers the physical-layer details of wireless network simulators such as shadowing and co-channel interference. The technique uses multiple processors interconnected

Hyunok Lee; Vahideh Manshadi; Donald C. Cox

2009-01-01

300

Parallel Simulation of Oil Reservoirs on a Multi-core Stream Computer

With the oil barrel price presently crippling the world economy, developing fast oil reservoir simulators is as important as ever. This article describes the parallelization and development of a 2-phase oil-water reservoir simulator on the state-of-the-art IBM Cell computer. The interdependent linear algebraic equations of the reservoir simulator are presented, as well as the pipelined time-step parallelization approach adopted

Fadi N. Sibai; Hashir Karim Kidwai

2009-01-01

301

Large Scale Parallel 3D Simulation of Regional Wave Propagation Using the Earth Simulator

NASA Astrophysics Data System (ADS)

This paper presents an efficient parallel code for seismic wave propagation in 3D heterogeneous structures developed for implementation on the Earth Simulator (5120 CPUs, 40 TFLOPS) at the JAMSTEC Yokohama Institute, a high-performance vector parallel system suitable for large-scale simulations. The equations of motion for the 3D wavefield are solved using a higher-order (8, 16, 32, etc.) staggered-grid finite-difference method (FDM) in the horizontal (x, y) directions and a conventional fourth-order FDM in the vertical (z) direction. Compared to the traditional Fourier pseudospectral method (PSM), the higher-order FDM achieves very good performance on vector processors as well as on the latest high-performance microprocessors (Intel Pentium 4, Itanium 2, etc.). The parallel computing is based on a partition of the computational domain, with each subregion assigned to a node of the Earth Simulator. Message Passing Interface (MPI) inter-node communication is employed for data exchange between subregions. Small-scale heterogeneities such as low-velocity sedimentary basins are accounted for in large-scale models by adopting a multi-grid approach that combines a coarse mesh model with an embedded finer mesh model. An accurate interpolation procedure, based on the fast Fourier transform (FFT), is used to combine the wavefield in the different grids. The results of applying the multi-grid, parallel FDM code on the Earth Simulator to modeling strong ground motions from recent large earthquakes are also presented. Events such as the 1993 Kushiro (Mj7.8) and 2000 Tottori-ken Seibu (Mj7.3) earthquakes were simulated using 3D structural models of northern and western Japan. The subsurface structures in Japan were derived by combining data from a number of reflection and refraction experiments, Bouguer anomaly data, and travel-time tomography studies of P and S waves.
The scale of the 3D model is about 500 km by 1000 km by 350 km, which is divided into grid intervals of 0.5 to 1 km. The simulation required 128 to 364 GB of memory, and computation took 1 to 2 h using 256 to 1408 processors of the Earth Simulator. Assuming a minimum shear-wave velocity of Vs = 1.7 km/s, the modeling is capable of treating high-frequency seismic wave propagation of over 1 to 2 Hz. The volume rendering technique was employed to illuminate the 3D wavefield, and a set of snapshots was combined into a video sequence. The high-resolution 3D simulations for frequencies over 1 Hz provide a good representation of wave propagation in Japan during the large earthquakes. The computer simulation also matches the observations from the dense seismic arrays (K-NET and KiK-net, over 1700 stations) well, demonstrating the effectiveness of the simulation model. The combined use of high-resolution computer simulation and dense seismic observation can therefore be expected to be highly valuable in understanding the complex seismic behavior associated with heterogeneities in the subsurface structure, and in predicting the pattern of ground motions expected for future earthquake scenarios.

Furumura, T.

2003-12-01

302

Parallel Simulation of a High-Speed Wormhole Routing Network

A flexible simulator has been developed to simulate a two-level metropolitan area network which uses wormhole routing. To accurately model the nature of wormhole routing, the simulator performs discrete-byte rather than discrete-packet simulation. Despite the increased computational workload that this implies, it has been possible to create a simulator with acceptable performance by writing it in Maisie, a parallel discrete-event simulation language. The simulator provides an accurate model of...

Rajive Bagrodia; Yu-an Chen; Mario Gerla; Bruce Kwan; Jay Martin; Prasasth Palnati; Simon Walton

1996-01-01

303

Parallel computing in enterprise modeling.

This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent-based simulation. What simulations of this class share, and where they differ from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language, which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.

Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

2008-08-01

304

Scalable Simulation of Electromagnetic Hybrid Codes

New discrete-event formulations of physics simulation models are emerging that can outperform models based on traditional time-stepped techniques. Detailed simulation of the Earth's magnetosphere, for example, requires execution of sub-models that are at widely differing timescales. In contrast to time-stepped simulation which requires tightly coupled updates to entire system state at regular time intervals, the new discrete event simulation (DES) approaches help evolve the states of sub-models on relatively independent timescales. However, parallel execution of DES-based models raises challenges with respect to their scalability and performance. One of the key challenges is to improve the computation granularity to offset synchronization and communication overheads within and across processors. Our previous work was limited in scalability and runtime performance due to the parallelization challenges. Here we report on optimizations we performed on DES-based plasma simulation models to improve parallel performance. The net result is the capability to simulate hybrid particle-in-cell (PIC) models with over 2 billion ion particles using 512 processors on supercomputing platforms.

Perumalla, Kalyan S. [ORNL]; Fujimoto, Richard [ORNL]; Karimabadi, Dr. Homa [SciberQuest Inc.]

2006-01-01

305

An Integrated Simulation Environment for Parallel and Distributed System Prototyping

to given workloads. The scope and interaction of applications, operating systems, communication networks, processors, and other hardware and software lead to substantial system complexity. Development of virtual prototypes in lieu of physical prototypes can result in tremendous savings, especially when created in concert with a powerful model development tool. When high-fidelity models of parallel architecture are coupled with workloads generated

Alan D. George; Ryan B. Fogarty; Jeff S. Markwell; Michael D. Miars

1999-01-01

306

Towards parallel I/O in finite element simulations

NASA Technical Reports Server (NTRS)

I/O issues in finite element analysis on parallel processors are addressed. Viable solutions for both local and shared memory multiprocessors are presented. The approach is simple but limited by currently available hardware and software systems. Implementation is carried out on a CRAY-2 system. Performance results are reported.

Farhat, Charbel; Pramono, Eddy; Felippa, Carlos

1989-01-01

307

A three-dimensional, relativistic, electromagnetic particle simulation code is parallelized in distributed memories by High Performance Fortran (HPF). In this code, the “Exact Charge Conservation Scheme” is used as a method for calculating current densities. In this paper, some techniques to optimize this code for a vector-parallel supercomputer are presented. In particular, methods for parallelization and vectorization are discussed. Examination

Hiroki Hasegawa; Seiji Ishiguro; Masao Okamoto

308

Parallel Two-phase Flow Simulation and Representative Elementary Volume

Two-phase flow simulation using the Lattice-Boltzmann method has drawn great attention, as it can simulate a variety of two-fluid flow situations that are difficult to reproduce in the laboratory. The capability to use the real pore structure geometry is a particular strength of the model, compared to the more widely used pore network simulators. We have already shown that two-phase flow simulation can

Y. Keehm; T. Mukerji; A. Nur

2002-01-01

309

The standard kinetic Monte Carlo algorithm is an extremely efficient method to carry out serial simulations of dynamical processes such as thin film growth. However, in some cases it is necessary to study systems over extended time and length scales, and therefore a parallel algorithm is desired. Here we describe an efficient, semirigorous synchronous sublattice algorithm for parallel kinetic Monte
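The "standard kinetic Monte Carlo algorithm" the abstract refers to is the rejection-free (BKL-style) event selection; a minimal serial sketch of one step (the rate list in the test is a made-up example, and the parallel sublattice machinery described in the abstract is not shown):

```python
import math
import random

def kmc_step(rates, rng=random):
    """One rejection-free kinetic Monte Carlo step: choose event i with
    probability r_i / R, where R = sum of all rates, and advance the clock by
    an exponentially distributed increment with mean 1/R."""
    R = sum(rates)
    x = rng.random() * R
    chosen = len(rates) - 1          # fallback guards against rounding at the top
    acc = 0.0
    for i, r in enumerate(rates):
        acc += r
        if x < acc:
            chosen = i
            break
    # 1 - u lies in (0, 1], so the log is always finite.
    dt = -math.log(1.0 - rng.random()) / R
    return chosen, dt
```

Events with zero rate are never selected, and the time increment shrinks as the total rate grows, which is what makes the method efficient for activated processes like film growth.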

Yunsic Shim; Jacques G. Amar

2005-01-01

310

National Technical Information Service (NTIS)

This reference manual provides instructions for determining atomistic material properties important for modeling dislocations in the energetic molecular crystal RDX using the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) molecular dyn...

L. B. Munday; P. W. Chung

2013-01-01

311

Computer simulation program for parallel SITAN. [Sandia Inertia Terrain-Aided Navigation, in FORTRAN

This computer program simulates the operation of parallel SITAN using digitized terrain data. An actual trajectory is modeled including the effects of inertial navigation errors and radar altimeter measurements.

Andreas, R.D.; Sheives, T.C.

1980-11-01

312

National Technical Information Service (NTIS)

Direct numerical simulations of separated-reattaching and separated flows have been performed on massively parallel processing computers. Two basic geometrical configurations have been studied: the separated-reattaching flow past a normal flat plate with ...

F. M. Najjar

1994-01-01

313

Three-dimensional shock wave physics simulations on massively parallel supercomputers.

National Technical Information Service (NTIS)

Three applications of massively parallel computing to weapons development at Sandia National Laboratories are briefly described, including armor/antiarmor simulations. The numerical modeling of penetrator-armor interactions requires detailed, three-dimens...

D. R. Gardner; H. E. Fang

1992-01-01

314

PROTEUS: A High-Performance Parallel-Architecture Simulator

Proteus is a high-performance simulator for MIMD multiprocessors. It is fast, accurate, and flexible: it is one to two orders of magnitude faster than comparable simulators, it can reproduce results from real multiprocessors, and it is easily configured to simulate a wide range of architectures. Proteus provides a modular structure that simplifies customization and independent replacement of parts of the architecture. There are typically multiple

Eric A. Brewer; Chrysanthos N. Dellarocas; Adrian Colbrook; William E. Weihl

1992-01-01

315

Parallel Adaptive Multi-Mechanics Simulations using Diablo

Coupled multi-mechanics simulations (such as thermal-stress and fluid-structure interaction problems) are of substantial interest to engineering analysts. In addition, adaptive mesh refinement techniques present an attractive alternative to current mesh generation procedures and provide quantitative error bounds that can be used for model verification. This paper discusses spatially adaptive multi-mechanics implicit simulations using the Diablo computer code. (U)

Parsons, D; Solberg, J

2004-12-03

316

Simulation of hypervelocity impact on massively parallel supercomputer

Hypervelocity impact studies are important for debris shield and armor/anti-armor research and development. Numerical simulations are frequently performed to complement experimental studies, and to evaluate code accuracy. Parametric computational studies involving material properties, geometry and impact velocity can be used to understand hypervelocity impact processes. These impact simulations normally need to address shock wave physics phenomena, material deformation and failure,

1994-01-01

317

Parallel Adaptive Numerical Simulation of Dry Avalanches over Natural Terrain

High fidelity computational simulation can be an invaluable tool in planning strategies for hazard risk mitigation. The accuracy and reliability of the predictions are crucial elements of these tools being successful. We present here a new simulation tool for dry granular avalanches using several new techniques to create a highly accurate tool with a combination of good models and

A. K. Patra; A. C. Bauer; C. C. Nichita; E. B. Pitman; M. F. Sheridan; M. Namikawa; S. Renschler

2003-01-01

318

Xyce parallel electronic simulator users' guide, Version 6.0.1.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state of the art in the following areas: the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase (a message-passing parallel implementation), which allows it to run efficiently on a wide range of computing platforms, including serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David Gregory [Raytheon, Albuquerque, NM]

2014-01-01

319

Pelegant: A Parallel Accelerator Simulation Code for Electron Generation and Tracking

elegant is a general-purpose code for electron accelerator simulation that has a worldwide user base. Recently, many of the time-intensive elements were parallelized using MPI. Development has used modest Linux clusters and the BlueGene/L supercomputer at Argonne National Laboratory. This has provided very good performance for some practical simulations, such as multiparticle tracking with synchrotron radiation and emittance blow-up in the vertical rf kick scheme. The effort began with development of a concept that allowed for gradual parallelization of the code, using the existing beamline-element classification table in elegant. This was crucial as it allowed parallelization without major changes in code structure and without major conflicts with the ongoing evolution of elegant. Because of rounding error and finite machine precision, validating a parallel program against a uniprocessor program with the requirement of bitwise identical results is notoriously difficult. We will report validating simulation results of parallel elegant against those of serial elegant by applying Kahan's algorithm to improve accuracy dramatically for both versions. The quality of random numbers in a parallel implementation is very important for some simulations. Some practical experience with generating parallel random numbers by offsetting the seed of each random sequence according to the processor ID will be reported.
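Kahan's compensated-summation algorithm, which the abstract credits with making the serial-versus-parallel validation meaningful, can be sketched as follows (a generic textbook version, not Pelegant's actual code):

```python
def kahan_sum(values):
    """Kahan compensated summation: a running correction term captures the
    low-order bits lost in each floating-point addition and feeds them back
    into the next one, greatly reducing accumulated rounding error."""
    total = 0.0
    c = 0.0                      # running compensation for lost low-order bits
    for v in values:
        y = v - c                # apply the correction to the incoming term
        t = total + y
        c = (t - total) - y      # algebraically zero; numerically, the rounding error
        total = t
    return total
```

Because both the serial and parallel versions then accumulate with far smaller rounding error, their results can be compared meaningfully even though the order of additions differs between the two.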

Wang, Y.; Borland, M. [Advanced Photon Source, Argonne National Laboratory, Argonne, IL 60439 (United States)

2006-11-27

320

Pelegant : a parallel accelerator simulation code for electron generation and tracking.

elegant is a general-purpose code for electron accelerator simulation that has a worldwide user base. Recently, many of the time-intensive elements were parallelized using MPI. Development has used modest Linux clusters and the BlueGene/L supercomputer at Argonne National Laboratory. This has provided very good performance for some practical simulations, such as multiparticle tracking with synchrotron radiation and emittance blow-up in the vertical rf kick scheme. The effort began with development of a concept that allowed for gradual parallelization of the code, using the existing beamline-element classification table in elegant. This was crucial as it allowed parallelization without major changes in code structure and without major conflicts with the ongoing evolution of elegant. Because of rounding error and finite machine precision, validating a parallel program against a uniprocessor program with the requirement of bitwise identical results is notoriously difficult. We will report validating simulation results of parallel elegant against those of serial elegant by applying Kahan's algorithm to improve accuracy dramatically for both versions. The quality of random numbers in a parallel implementation is very important for some simulations. Some practical experience with generating parallel random numbers by offsetting the seed of each random sequence according to the processor ID will be reported.

Wang, Y.; Borland, M. D.; Accelerator Systems Division (APS)

2006-01-01

321

Development of Multi Agent Simulation Modeling System

The mathematical model of multi-agent resource conversion processes (RCP) is developed by means of discrete-event simulation systems and expert systems. Within the framework of the mathematical model, the RCP are defined as a production system over the RCP structure that takes the origin of conflicts into account. The discrete-event simulation and expert system

Konstantin A. Aksyonov; Elena F. Smoliy; Natalia V. Goncharova; Alexey A. Khrenov

2007-01-01

322

Parallel simulation of compressible flow using automatic differentiation and PETSc.

Many aerospace applications require parallel implicit solution strategies and software. The use of two computational tools, the Portable, Extensible Toolkit for Scientific computing (PETSc) and ADIFOR, to implement a Newton-Krylov-Schwarz method with pseudo-transient continuation for a particular application, namely, a steady-state, fully implicit, three-dimensional compressible Euler model of flow over an M6 wing is considered. How automatic differentiation (AD) can be used within the PETSc framework to compute the required derivatives is described. Performance data demonstrating the suitability of AD and PETSc for this problem are presented. A synopsis of results and a description of opportunities for future work conclude this paper.

Hovland, P. D.; McInnes, L. C.; Mathematics and Computer Science

2001-03-01

323

Molecular dynamics simulations investigate local and global motion in molecules. Several parallel computing approaches have been taken to attack the most computationally expensive phase of molecular simulations, the evaluation of long range interactions. This paper develops a straightforward but effective algorithm for molecular dynamics simulations using the machine-independent parallel programming language, Linda. The algorithm was run both on a shared memory parallel computer and on a network of high performance Unix workstations. Performance benchmarks were performed on both systems using two proteins. This algorithm offers a portable cost-effective alternative for molecular dynamics simulations. In view of the increasing numbers of networked workstations, this approach could help make molecular dynamics simulations more easily accessible to the research community.

Shifman, M. A.; Windemuth, A.; Schulten, K.; Miller, P. L.

1991-01-01

324

Potts-model grain growth simulations: Parallel algorithms and applications.

National Technical Information Service (NTIS)

Microstructural morphology and grain boundary properties often control the service properties of engineered materials. This report uses the Potts-model to simulate the development of microstructures in realistic materials. Three areas of microstructural m...

S. A. Wright; S. J. Plimpton; T. P. Swiler

1997-01-01

325

Massively Parallel Simulation and Optimization of Queueing Networks.

National Technical Information Service (NTIS)

We simulate several variants of a class of queueing networks corresponding to different system parameter values or operating policies - simultaneously. One clock mechanism is used to drive all the variants. This clock synchronizes the system trajectories ...

P. Vakili; E. Lau

1992-01-01

326

Parallelized modelling and solution scheme for hierarchically scaled simulations

NASA Technical Reports Server (NTRS)

This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers are capable of solving are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that by combining several of these techniques, a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.

Padovan, Joe

1995-01-01

327

Parallel simulation of the global epidemiology of Avian Influenza

SEARUMS is an eco-modeling, bio-simulation, and analysis environment to study the global epidemiology of Avian Influenza. Originally developed in Java, SEARUMS enables comprehensive epidemiological analysis, forecasting epicenters and time lines of epidemics for prophylaxis, thereby mitigating disease outbreaks. However, SEARUMS-based simulations were time consuming due to the size and complexity of the models. In an endeavor to

Dhananjai M. Rao; Alexander Chernyakhovsky

2008-01-01

328

NWO-P: Parallel Simulation of the Alewife Machine

This paper provides a brief overview of that effort, sample results indicating the performance of the current implementation, and a few comments about future work. The CM-5 port of our simulator has been operational since June 1992 and has proved invaluable, especially for running simulations of large Alewife systems (64 to 512 nodes). Alewife is an experimental distributed-memory multiprocessor under construction at the MIT Laboratory for

Kirk Johnson; David Chaiken; Alan Mainwaring; Alewife CMMU

1993-01-01

329

Parallel adaptive numerical simulation of dry avalanches over natural terrain

High-fidelity computational simulation can be an invaluable tool in planning strategies for hazard risk mitigation. The accuracy and reliability of the predictions are crucial elements of these tools being successful. We present here a new simulation tool for dry granular avalanches using several new techniques for enhancing numerical solution accuracy. Highlights of our new methodology are the use of a depth-averaged

A. K. Patra; A. C. Bauer; C. C. Nichita; E. B. Pitman; M. F. Sheridan; M. Bursik; B. Rupp; A. Webber; A. J. Stinton; L. M. Namikawa; C. S. Renschler

2005-01-01

330

Modular high-temperature gas-cooled reactor simulation using parallel processors

The MHPP (Modular HTGR Parallel Processor) code has been developed to simulate modular high-temperature gas-cooled reactor (MHTGR) transients and accidents. MHPP incorporates a very detailed model for predicting the dynamics of the reactor core, vessel, and cooling systems over a wide variety of scenarios ranging from expected transients to very-low-probability severe accidents. The simulation routines, which had originally been developed entirely as serial code, were readily adapted to parallel processing Fortran. The resulting parallelized simulation speed was enhanced significantly. Workstation interfaces are being developed to provide for user (''operator'') interaction. The benefits realized by adapting previous MHTGR codes to run on a parallel processor are discussed, along with results of typical accident analyses. 3 refs., 3 figs.

Ball, S.J.; Conklin, J.C.

1989-01-01

331

A Parallel Simulated Annealing Approach to Solve for Earthquake Rupture Rates

NASA Astrophysics Data System (ADS)

We present a parallel approach to the classic simulated annealing algorithm (Kirkpatrick 1983) in order to solve for the rates of earthquake ruptures in California's complex fault system, developed for the 3rd Uniform California Earthquake Rupture Forecast (UCERF3). Through the use of distributed computing, we have achieved substantial speedup when compared to serial simulated annealing. We will describe the parallel simulated annealing algorithm in detail, as well as the parallelization parameters used and their effect on speedup (time to convergence or, alternatively, to a specified energy level) and communications efficiency. Additionally we will discuss the correlation between performance of the parallel algorithm and the degree of constraints on the solution. We will present scaling results to thousands of processors, and experiences with the MPJ Express Java Message Passing Library (Baker 2006) on the University of Southern California's High Performance Computing and Communications cluster.

Milner, K.; Page, M. T.; Field, E. H.

2011-12-01
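The parallel-chain idea described in the abstract can be sketched compactly: several annealing chains run independently and periodically synchronize on the best solution found so far. A minimal illustration follows; the chains are run serially for clarity, and the tiny non-negative least-squares system (matrix `A`, data `d`), cooling schedule, and all parameter values are made-up assumptions, not the UCERF3 implementation.

```python
# Sketch of parallel-chain simulated annealing with periodic redistribution
# of the best state.  Problem and parameters are illustrative only.
import math
import random

def energy(x, A, d):
    """Sum of squared residuals ||A x - d||^2."""
    r = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - d_i
         for row, d_i in zip(A, d)]
    return sum(v * v for v in r)

def anneal_chain(x, A, d, temp, steps, rng):
    """One annealing segment: perturb a single rate, Metropolis-accept."""
    e = energy(x, A, d)
    for _ in range(steps):
        i = rng.randrange(len(x))
        cand = list(x)
        cand[i] = max(0.0, cand[i] + rng.gauss(0.0, 0.1))  # rates stay >= 0
        e_cand = energy(cand, A, d)
        if e_cand < e or rng.random() < math.exp((e - e_cand) / temp):
            x, e = cand, e_cand
    return x, e

def parallel_anneal(A, d, n_chains=4, rounds=20, steps=50):
    rng = random.Random(42)
    chains = [[rng.random() for _ in range(len(A[0]))] for _ in range(n_chains)]
    temp = 1.0
    best_x, best_e = None, float("inf")
    for _ in range(rounds):
        results = [anneal_chain(x, A, d, temp, steps, rng) for x in chains]
        best_x, best_e = min(results, key=lambda r: r[1])
        # Synchronization step: every chain restarts from the current best.
        chains = [list(best_x) for _ in range(n_chains)]
        temp *= 0.8  # geometric cooling schedule
    return best_x, best_e

A = [[1.0, 1.0], [1.0, 2.0]]   # toy "rupture participation" matrix
d = [3.0, 5.0]                  # toy observed rates (exact solution x = [1, 2])
x, e = parallel_anneal(A, d)
```

In the paper's distributed setting each chain would run on its own processor and only the best solution would be communicated at each synchronization point.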

332

Service-oriented modeling and simulation are hot issues in the field of modeling and simulation, and service resources need to be called when a simulation task workflow is running. How to optimize the allocation of service resources to ensure that tasks complete effectively is an important issue in this area. In the military modeling and simulation field, it is important to improve the probability of success and the timeliness of simulation task workflows. Therefore, this paper proposes an optimization algorithm for multipath parallel allocation of service resources, in which a multipath parallel allocation model is built and a quantum optimization algorithm with a multiple-chain coding scheme is used for optimization and solution. The multiple-chain coding scheme extends the parallel search space to improve search efficiency. Through simulation experiments, this paper investigates how the choice of optimization algorithm, service allocation strategy, and number of paths affects the probability of success of the simulation task workflow, and the results show that the proposed multipath parallel allocation algorithm is an effective method to improve both the probability of success and the timeliness of simulation task workflows.

Zhang, Hongjun; Zhang, Rui; Li, Yong; Zhang, Xuliang

2014-01-01

333

Parallelism and pipelining in high-speed digital simulators

The requirements of engineers and scientists for ever more powerful simulators have been one of the principal driving forces in the development of scientific computers. The attainment of high computing speed, as measured by computational throughput, has always been one of the most, if not the most, challenging requirements. High speed becomes of paramount importance in several distinct classes of applications: real-time simulation of dynamic systems, distributed-parameter systems, and mixed lumped and distributed systems. The author considers the development of supercomputers, peripheral array processors, and microcomputer networks for these applications. 9 references.

Karplus, W.J.

1982-01-01

334

We explore the emerging application area of physics-based simulation for computer animation and visual special effects. In particular, we examine its parallelization potential and characterize its behavior on a chip multiprocessor (CMP). Applications in this domain model and simulate natural phenomena, and often direct visual components of motion pictures. We study a set of three workloads that exemplify the span

Christopher J. Hughes; Radek Grzeszczuk; Eftychios Sifakis; Daehyun Kim; Sanjeev Kumar; Andrew P. Selle; Jatin Chhugani; Matthew Holliman; Yen-Kuang Chen

2007-01-01

335

Traditional approaches to the distributed simulation of digital designs are limited in that they are inefficient and prone to deadlock for systems with feedback loops. This paper proposes an asynchronous distributed algorithm for the simulation and verification of behavior-level models and describes its implementation on an actual loosely-coupled parallel processor. The approach is relatively efficient for realistic digital designs and

Sumit Ghosh; Meng-lin Yu

1995-01-01

336

Parallel proton fire hose instability in the expanding solar wind: Hybrid simulations

We report a study of the properties of the parallel proton fire hose instability comparing the results obtained by the linear analysis, from one-dimensional (1-D) standard hybrid simulations and 1-D hybrid expanding box simulations. The three different approaches converge toward the same instability threshold condition which is in good agreement with in situ observations, suggesting that such instability is relevant

Lorenzo Matteini; Simone Landi; Petr Hellinger; Marco Velli

2006-01-01

337

Development of a parallel three-phase transient stability simulator for power systems

This paper discusses the development of a parallel three-phase transient stability simulator that is built using the high-performance computing library PETSc. Unlike the existing transient stability simulators that use a balanced per-phase transmission network, the authors use a three-phase transmission network. This three-phase representation allows a more realistic analysis of unbalanced conditions due to untransposed transmission

Shrirang Abhyankar; Alexander Flueck; Xu Zhang; Hong Zhang

2011-01-01

338

We present a simulation tool of parallel architectures for digital image processing applications, which is characterized by the simulation of different architectures based on RISC DLX processors and an interconnection network based on wormhole routing. Each node is provided with a computational DLX processor and another of similar features, devoted to controlling the communication network

V. Valero; F. Cuartero; A. Garrido; F. Quiles

1995-01-01

339

The COAST (for Computational Astrophysics) project is a program of massively parallel numerical simulations in astrophysics involving astrophysicists and software engineers from CEA/IRFU Saclay. The scientific objective is the understanding of the formation of structures in the Universe, including the study of large-scale cosmological structures and galaxy formation, turbulence in the interstellar medium, stellar magnetohydrodynamics, and protoplanetary systems. The simulations

B. Thooris; E. Audit; A. S. Brun; Y. Fidaali; F. Masset; D. Pomarède; R. Teyssier

340

Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.

Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor

2011-09-06

341

Characterization of parallelism and deadlocks in distributed digital logic simulation

This paper explores the suitability of the Chandy-Misra algorithm for digital logic simulation. We use four realistic circuits as benchmarks for our analysis, with one of them being the vector-unit controller for the Titan supercomputer from Ardent. Our results show that the average number of logic elements available for concurrent execution ranges from 10 to 111 for the four circuits,

Larry Soulé; Anoop Gupta

1989-01-01

342

Simulation of multidimensional gaseous detonations with a parallel adaptive method

A detonation wave is a self-sustained, violent form of shock-induced combustion that is characterized by a subtle energetic interplay between leading hydrodynamic shock wave and following chemical reaction. Multidimensional gaseous detonations never remain planar and instead exhibit transverse shocks that form triple points with transient Mach reflection patterns. Their accurate numerical simulation requires a very high resolution around shock and

Ralf Deiterding

2008-01-01

343

Distributed intelligence in large scale traffic simulations on parallel computers

Transportation systems can be seen as displaying meta-intelligence, in the sense that intelligent actors (travelers) conspire to make the system function as a whole. In simulations one can model this by resolving each traveler individually, and giving each traveler rules according to which she/he generates goals and then attempts to achieve them. The system as a whole has no

Kai Nagel

344

A Parallel Processing System for Simulations of Vortex Blob Interactions

The vortex method in the simulation of 2D incompressible flows with complex interacting circulations is very attractive compared with other currently widespread methods. However, the vortex method is computationally very expensive and suggests the adoption of suitably powerful computing units. Our group analysed some parallelisation techniques for the algorithm, in order to obtain the best performance on a dedicated

G. Braschi; Giovanni Danese; Ivo De Lotto; D. Dotti; M. Gallati; Francesco Leporati; M. Mazzoleni

1996-01-01

345

Parallel FEM Simulation of Crack Propagation - Challenges, Status, and Perspectives

Understanding how fractures develop in materials is crucial to many disciplines, e.g., aeronautical engineering, materials science, and geophysics. Fast and accurate computer simulation of crack propagation in realistic 3D structures would be a valuable tool for engineers and scientists exploring the fracture process in materials. In the following, we will describe a next-generation crack propagation simulation software that aims

Bruce Carter; Chuin-shan Chen; L. Paul Chew; Nikos Chrisochoides; Guang R. Gao; Gerd Heber; Anthony R. Ingraffea; Roland Krause; Chris Myers; Démian Nave; Keshav Pingali; Paul Stodghill; Stephen A. Vavasis; Paul A. Wawrzynek

2000-01-01

346

Parallel Simulation for Parameter Estimation of Optical Tissue Properties

Several important laser-based medical treatments rest on the crucial knowledge of the response of tissues to laser penetration. Optical properties are often localised and are measured using optically active fluorescent microspheres injected into the tissue. However, the measurement process combines the tissue properties with the optical characteristics of the measuring device, which in turn requires numerically intensive mathematical simulations for

Mihai Duta; Jeyarajan Thiyagalingam; Anne E. Trefethen; Ayush Goyal; Vicente Grau; Nic Smith

2010-01-01

347

Partitioning and packing mathematical simulation models for calculation on parallel computers

NASA Technical Reports Server (NTRS)

The development of multiprocessor simulations from a serial set of ordinary differential equations describing a physical system is described. Degrees of parallelism (i.e., coupling between the equations) and their impact on parallel processing are discussed. The problem of identifying computational parallelism within sets of closely coupled equations that require the exchange of current values of variables is described. A technique is presented for identifying this parallelism and for partitioning the equations for parallel solution on a multiprocessor. An algorithm which packs the equations into a minimum number of processors is also described. The results of the packing algorithm when applied to a turbojet engine model are presented in terms of processor utilization.

Arpasi, D. J.; Milner, E. J.

1986-01-01
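The packing step described in the abstract amounts to a bin-packing problem: assign equations, each with an estimated evaluation cost, to the fewest processors whose per-frame compute budget is fixed. The report's actual algorithm is not reproduced here; the sketch below uses a standard first-fit-decreasing heuristic, and the costs and budget are illustrative assumptions.

```python
# First-fit-decreasing sketch of packing equations into a minimum number
# of processors.  Costs and budget are made-up illustrative values.
def pack_equations(costs, budget):
    """Return a list of processors, each a list of equation indices."""
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    processors, loads = [], []
    for i in order:
        for p, load in enumerate(loads):
            if load + costs[i] <= budget:     # first processor with room
                processors[p].append(i)
                loads[p] += costs[i]
                break
        else:                                 # no room anywhere: new processor
            processors.append([i])
            loads.append(costs[i])
    return processors

costs = [7, 5, 4, 3, 2, 2, 1]   # per-equation evaluation cost (arbitrary units)
assignment = pack_equations(costs, budget=10)
```

With these numbers the 24 units of work fit into three processors, matching the lower bound of ceil(24/10).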

348

xSim: The Extreme-Scale Simulator

Investigating parallel application performance properties at scale is becoming an important part of high-performance computing (HPC) application development and deployment. The Extreme-scale Simulator (xSim) is a performance investigation toolkit that permits running an application in a controlled environment at extreme scale without the need for a respective extreme-scale HPC system. Using a lightweight parallel discrete event simulation, xSim executes a parallel application with a virtual wall clock time, such that performance data can be extracted based on a processor model and a network model. This paper presents significant enhancements to the xSim toolkit prototype that provide a more complete Message Passing Interface (MPI) support and improve its versatility. These enhancements include full virtual MPI group, communicator and collective communication support, and global variables support. The new capabilities are demonstrated by executing the entire NAS Parallel Benchmark suite in a simulated HPC environment.

Boehm, Swen [ORNL]; Engelmann, Christian [ORNL]

2011-01-01
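The core mechanism the abstract names, a lightweight discrete event simulation advancing a virtual wall clock, can be sketched in a few lines: events carry virtual timestamps in a priority queue, and the clock jumps from event to event. This is a generic illustration, not xSim's actual API; all class and handler names are hypothetical.

```python
# Minimal discrete-event engine with a virtual clock (illustrative only).
import heapq

class VirtualTimeSimulator:
    def __init__(self):
        self._queue = []   # min-heap of (virtual_time, seq, handler, payload)
        self._seq = 0      # tie-breaker so equal-time events stay FIFO
        self.now = 0.0     # current virtual time

    def schedule(self, delay, handler, payload=None):
        heapq.heappush(self._queue, (self.now + delay, self._seq, handler, payload))
        self._seq += 1

    def run(self):
        while self._queue:
            t, _, handler, payload = heapq.heappop(self._queue)
            self.now = t          # virtual clock jumps to the event time
            handler(self, payload)

# Example: two modeled "ranks" exchanging a message with network latency.
log = []
def send(sim, payload):
    log.append(("send", sim.now))
    sim.schedule(5.0, recv, payload)      # 5 time units of modeled latency
def recv(sim, payload):
    log.append(("recv", sim.now, payload))

sim = VirtualTimeSimulator()
sim.schedule(1.0, send, "hello")
sim.run()
```

In xSim's setting the handlers would model MPI calls, and the processor and network models would supply the delays.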

349

This project focuses on the design, development, implementation, and optimization of methods, algorithms, and software for large-scale simulations of free surface and multi-phase flows based on the generalized lattice Boltzmann method (GLBM). Parallel solvers and cache-optimized algorithms have been developed to simulate multi-phase and turbulent transient flows in complex three-dimensional geometries. For the simulation of free surface problems

Jonas Tölke; Benjamin Ahrenholz; Jan Hegewald; Manfred Krafczyk

350

Multimillion atom molecular-dynamics (MD) simulations are performed to investigate dynamics of oxidation of aluminum nanoclusters and properties and processes in nanostructured silicon carbide (n-SiC) and nanostructured amorphous silica (n-a-SiO2). The simulations are based on reliable interatomic interactions that include both ionic and covalent effects. The simulations are carried out on parallel architectures using highly efficient O(N) multiresolution algorithms

Rajiv K. Kalia; Timothy J. Campbell; Alok Chatterjee; Aiichiro Nakano; Priya Vashishta; Shuji Ogata

2000-01-01

351

Parallel macromolecular simulations and the replicated data strategy II. The RD-SHAKE algorithm

NASA Astrophysics Data System (ADS)

A parallel version of the SHAKE algorithm for the simulation on a distributed memory parallel computer of macromolecules with rigid atom-atom bonds is described. The method is based on the replicated data parallelisation strategy, which has been shown to be effective for macromolecules. The implementation of the algorithm on an Intel iPSC/860 computer is described and the performance issues discussed.

Smith, W.; Forester, T. R.

1994-02-01
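The serial core that RD-SHAKE parallelizes is the SHAKE constraint iteration: after an unconstrained position update, each rigid bond is corrected in turn until every bond length is back within tolerance. A minimal 2-D sketch follows; in the replicated data strategy every node would hold a full copy of the coordinates while the constraint list is split across processors. The coordinates, tolerance, and equal-mass assumption here are illustrative.

```python
# Serial SHAKE iteration for distance constraints (equal masses, 2-D).
def shake(pos, bonds, tol=1e-10, max_iter=500):
    """pos: list of [x, y]; bonds: list of (i, j, target_length)."""
    for _ in range(max_iter):
        converged = True
        for i, j, d0 in bonds:
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            d2 = dx * dx + dy * dy
            diff = d2 - d0 * d0
            if abs(diff) > tol:
                converged = False
                # Linearized correction: move both atoms symmetrically
                # along the bond direction (equal masses assumed).
                g = diff / (4.0 * d2)
                pos[i][0] += g * dx; pos[i][1] += g * dy
                pos[j][0] -= g * dx; pos[j][1] -= g * dy
        if converged:
            return pos
    raise RuntimeError("SHAKE did not converge")

# Three atoms that drifted off their unit bond lengths after a free update:
atoms = [[0.0, 0.0], [1.2, 0.1], [2.1, -0.2]]
bonds = [(0, 1, 1.0), (1, 2, 1.0)]
shake(atoms, bonds)
```

Because neighbouring constraints share atoms, the sweep must be iterated to convergence, which is why the parallel version needs coordinate replication or communication between processors that own adjacent constraints.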

352

Parallel molecular dynamics simulations for short-ranged many-body potentials

A new method is described that permits the efficient execution of parallel molecular dynamics simulations for irregular problems with several thousands of atoms on Single-Instruction Multiple-Data computers. The approach is based on a data-parallel atomic decomposition scheme and has overall time-complexity O(N), where N is the size of the system. The method has been implemented on a MasPar MP-1

C. F. Cornwell; L. T. Wille

2000-01-01
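The O(N) scaling quoted above is typical of short-ranged MD codes that restrict the pair search to nearby atoms. One standard way to do this (not necessarily the paper's exact scheme) is a cell list: space is binned into cells no smaller than the interaction cutoff, so each atom only examines its own and neighbouring cells. The 2-D points and cutoff below are illustrative.

```python
# Cell-list neighbour search: O(N) pair finding for a fixed cutoff.
from collections import defaultdict

def neighbour_pairs(points, cutoff):
    """All pairs (i, j), i < j, closer than cutoff, via a 2-D cell list."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cutoff), int(y // cutoff))].append(idx)
    pairs = set()
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for i in members:
                    for j in cells.get((cx + dx, cy + dy), ()):
                        if i < j:
                            xi, yi = points[i]; xj, yj = points[j]
                            if (xi - xj) ** 2 + (yi - yj) ** 2 < cutoff ** 2:
                                pairs.add((i, j))
    return pairs

points = [(0.1, 0.1), (0.3, 0.2), (2.5, 2.5), (0.15, 0.35)]
close = neighbour_pairs(points, cutoff=0.5)
```

Since each cell holds O(1) atoms at fixed density, the total work grows linearly with the number of atoms, which is what makes thousand-atom many-body simulations tractable.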

353

Parallel FEM Simulation of Electromechanics in the Heart

NASA Astrophysics Data System (ADS)

Cardiovascular disease is the leading cause of death in America. Computer simulation of complicated dynamics of the heart could provide valuable quantitative guidance for diagnosis and treatment of heart problems. In this paper, we present an integrated numerical model which encompasses the interaction of cardiac electrophysiology, electromechanics, and mechanoelectrical feedback. The model is solved by finite element method on a Linux cluster and the Cray XT5 supercomputer, kraken. Dynamical influences between the effects of electromechanics coupling and mechanic-electric feedback are shown.

Xia, Henian; Wong, Kwai; Zhao, Xiaopeng

2011-11-01

354

A new parallel P3M code for very large-scale cosmological simulations

We have developed a parallel Particle–Particle, Particle–Mesh (P3M) simulation code for the Cray T3E parallel supercomputer that is well suited to studying the time evolution of systems of particles interacting via gravity and gas forces in cosmological contexts. The parallel code is based upon the public-domain serial Adaptive P3M-SPH (http://coho.astro.uwo.ca/pub/hydra/hydra.html) code of Couchman et al. (1995) [ApJ 452, 797]. The algorithm

Tom MacFarland; H. M. P. Couchman; Frazer Pearce; Jakob Pichlmeier

1998-01-01

355

GPU-based simulation of the long-range Potts model via parallel tempering

NASA Astrophysics Data System (ADS)

We discuss the efficiency of parallelization on graphical processing units (GPUs) for the simulation of the one-dimensional Potts model with long-range interactions via parallel tempering. We investigate the behavior of some thermodynamic properties, such as equilibrium energy and magnetization, critical temperatures as well as the separation between the first- and second-order regimes. By implementing multispin coding techniques and an efficient parallelization of the interaction energy computation among threads, the GPU-accelerated approach reached speedup factors of up to 37.

Boer, Attila

2014-07-01
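The parallel tempering move at the heart of the method can be written down directly: replicas run Metropolis at different inverse temperatures and periodically attempt to swap configurations between neighbouring temperatures with probability min(1, exp(Δβ·ΔE)). The sketch below uses a tiny nearest-neighbour q-state Potts chain, not the long-range model or the GPU multispin coding of the paper; sizes, temperatures, and sweep counts are illustrative.

```python
# Parallel tempering (replica exchange) for a small 1-D Potts chain.
import math, random

Q, N = 3, 16                       # Potts states, chain length
def energy(s):                     # E = -(number of matching neighbour pairs)
    return -sum(1 for i in range(N - 1) if s[i] == s[i + 1])

def metropolis_sweep(s, beta, rng):
    for _ in range(N):
        i = rng.randrange(N)
        old, new = s[i], rng.randrange(Q)
        e0 = energy(s); s[i] = new; e1 = energy(s)
        if e1 > e0 and rng.random() >= math.exp(-beta * (e1 - e0)):
            s[i] = old                      # reject the flip
    return s

def parallel_tempering(betas, sweeps=200, seed=1):
    rng = random.Random(seed)
    replicas = [[rng.randrange(Q) for _ in range(N)] for _ in betas]
    for _ in range(sweeps):
        for r, beta in zip(replicas, betas):
            metropolis_sweep(r, beta, rng)
        for k in range(len(betas) - 1):     # attempt neighbour swaps
            d_beta = betas[k + 1] - betas[k]
            d_e = energy(replicas[k + 1]) - energy(replicas[k])
            # Detailed balance gives acceptance min(1, exp(d_beta * d_e)).
            if rng.random() < min(1.0, math.exp(min(0.0, d_beta * d_e))) or d_beta * d_e >= 0:
                replicas[k], replicas[k + 1] = replicas[k + 1], replicas[k]
    return replicas

replicas = parallel_tempering(betas=[0.2, 0.6, 1.0, 1.5])
```

On a GPU, each replica (and each spin within a replica) maps naturally onto threads, which is where the paper's speedup comes from; only the energies need exchanging for the swap step.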

356

A component-based parallel infrastructure for the simulation of fluid-structure interaction

The Uintah computational framework is a component-based infrastructure, designed for highly parallel simulations of complex fluid–structure interaction problems. Uintah utilizes an abstract representation of parallel computation and communication to express data dependencies between multiple physics components. These features allow parallelism to be integrated between multiple components while maintaining overall scalability. Uintah provides mechanisms for load-balancing, data communication, data I/O, and

Steven G. Parker; James Guilkey; Todd Harman

2006-01-01

357

NASA Technical Reports Server (NTRS)

Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.

Sohn, Andrew; Biswas, Rupak

1996-01-01

358

Parallelization issues of a code for physically-based simulation of fabrics

NASA Astrophysics Data System (ADS)

The simulation of fabrics, clothes, and flexible materials is an essential topic in computer animation of realistic virtual humans and dynamic sceneries. New emerging technologies, such as interactive digital TV and multimedia products, make necessary the development of powerful tools to perform real-time simulations. Parallelism is one such tool. When analyzing fabric simulations computationally, we found these codes belong to the complex class of irregular applications. Frequently this kind of code includes reduction operations in its core, so that an important fraction of the computational time is spent on such operations. In fabric simulators these operations appear when evaluating forces, giving rise to the equation system to be solved. For this reason, this paper discusses only this phase of the simulation. This paper analyzes and evaluates different irregular reduction parallelization techniques on ccNUMA shared memory machines, applied to a real, physically-based fabric simulator we have developed. Several issues are taken into account in order to achieve high code performance, such as exploitation of data access locality and parallelism, as well as careful use of memory resources (memory overhead). In this paper we use the concept of data affinity to develop various efficient algorithms for reduction parallelization exploiting data locality.

Romero, Sergio; Gutiérrez, Eladio; Romero, Luis F.; Plata, Oscar; Zapata, Emilio L.

2004-10-01
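One classic technique for the irregular reductions described above (a member of the family the paper evaluates, not its specific data-affinity algorithm) is array privatization: each worker accumulates forces into a private copy of the force array, and the copies are summed afterwards, so no two workers ever write the same element concurrently. The spring mesh and sizes below are illustrative.

```python
# Irregular reduction via private per-worker buffers (array privatization).
from concurrent.futures import ThreadPoolExecutor

def reduce_forces(n_nodes, springs, n_workers=4):
    """springs: list of (i, j, f) meaning +f on node i, -f on node j."""
    chunks = [springs[k::n_workers] for k in range(n_workers)]

    def local_accumulate(chunk):
        local = [0.0] * n_nodes          # private buffer: no write conflicts
        for i, j, f in chunk:
            local[i] += f
            local[j] -= f
        return local

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_accumulate, chunks))
    # Final merge over the node index (itself parallelizable by index range):
    return [sum(p[k] for p in partials) for k in range(n_nodes)]

springs = [(0, 1, 2.0), (1, 2, 1.0), (0, 2, 0.5), (2, 3, 1.5)]
forces = reduce_forces(4, springs)
```

The memory overhead of one buffer per worker is exactly the trade-off the paper weighs against locality-aware alternatives.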

359

Parallel computation for reservoir thermal simulation: An overlapping domain decomposition approach

NASA Astrophysics Data System (ADS)

In this dissertation, we are involved in parallel computing for the thermal simulation of multicomponent, multiphase fluid flow in petroleum reservoirs. We report the development and applications of such a simulator. Unlike many efforts made to locally parallelize the solver of the linear equation system, which affects the performance the most, this research takes a global parallelization strategy by decomposing the computational domain into smaller subdomains. This dissertation addresses the domain decomposition techniques and, based on the comparison, adopts an overlapping domain decomposition method. This global parallelization method hands over each subdomain to a single processor of the parallel computer to process. Communication is required when handling overlapping regions between subdomains. For this purpose, MPI (message passing interface) is used for data communication and communication control. A physical and mathematical model is introduced for the reservoir thermal simulation. Numerical tests on two sets of industrial data from practical oilfields indicate that this model and the parallel implementation match the history data accurately. Therefore, we expect to use both the model and the parallel code to predict oil production and guide the design, implementation, and real-time fine tuning of new well operating schemes. A new adaptive mechanism to synchronize processes on different processors has been introduced, which not only ensures the computational accuracy but also improves the time performance. To accelerate the convergence rate of the iterative solution of the large linear equation systems derived from the discretization of the governing equations of our physical and mathematical model in space and time, we adopt the ORTHOMIN method in conjunction with an incomplete LU factorization preconditioning technique. Important improvements have been made in both the ORTHOMIN method and the incomplete LU factorization in order to enhance time performance without affecting the convergence rate of the iterative solution. More importantly, the parallel implementation may serve as a working platform for any further research, for example, building and testing new physical and mathematical models, or developing and testing new solvers for the pertinent linear equation systems.

Wang, Zhongxiao
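The overlapping decomposition idea can be illustrated on a 1-D model problem: each subdomain relaxes its own interior using boundary values taken from its neighbour's overlap region, and the process iterates until the subdomain solutions agree (a Schwarz-type iteration). This is a generic sketch, not the dissertation's reservoir scheme; the model problem (steady 1-D diffusion, u'' = 0 with u(0)=0, u(1)=1), grid size, and overlap width are assumptions, and in a real code each subdomain would live on its own MPI rank.

```python
# Overlapping (multiplicative Schwarz) domain decomposition in 1-D.
def schwarz_1d(n=21, overlap=2, iters=400):
    u = [0.0] * n
    u[-1] = 1.0                      # boundary conditions u(0)=0, u(1)=1
    mid = n // 2
    # Two overlapping subdomains: [0, mid+overlap] and [mid-overlap, n-1].
    subdomains = [(0, mid + overlap), (mid - overlap, n - 1)]
    for _ in range(iters):
        for lo, hi in subdomains:
            # Gauss-Seidel relaxation of u'' = 0 on the subdomain interior;
            # the endpoints lo and hi act as (exchanged) boundary values.
            for i in range(lo + 1, hi):
                u[i] = 0.5 * (u[i - 1] + u[i + 1])
    return u

u = schwarz_1d()
# The exact solution is u(x) = x, i.e. u[k] close to k/20 on this grid.
```

The overlap region is what each "processor" would communicate to its neighbour between sweeps; a wider overlap generally speeds convergence at the cost of more communication.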

360

Robust large-scale parallel nonlinear solvers for simulations.

This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. 
The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write and easily portable. However, the method usually takes twice as long to solve as Newton-GMRES on general problems because it solves two linear systems at each iteration. In this paper, we discuss modifications to Bouaricha's method for a practical implementation, including a special globalization technique and other modifications for greater efficiency. We present numerical results showing computational advantages over Newton-GMRES on some realistic problems. We further discuss a new approach for dealing with singular (or ill-conditioned) matrices. In particular, we modify an algorithm for identifying a turning point so that an increasingly ill-conditioned Jacobian does not prevent convergence.

Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson (Sandia National Laboratories, Livermore, CA)

2005-11-01
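The key property of Broyden's method discussed above, converging without ever evaluating a Jacobian, is easy to show in miniature: the Jacobian approximation B is corrected by a rank-one secant update from successive residual differences. The 2x2 test system below is illustrative, and the dense B shown here is exactly what the report's limited-memory variant avoids storing at scale.

```python
# Broyden's (rank-one secant) quasi-Newton method on a 2x2 nonlinear system.
def solve2(B, rhs):
    """Solve a 2x2 linear system B x = rhs by Cramer's rule."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(rhs[0] * B[1][1] - rhs[1] * B[0][1]) / det,
            (B[0][0] * rhs[1] - B[1][0] * rhs[0]) / det]

def broyden(f, x, tol=1e-10, max_iter=200):
    B = [[1.0, 0.0], [0.0, 1.0]]           # initial Jacobian guess: identity
    fx = f(x)
    for _ in range(max_iter):
        if max(abs(v) for v in fx) < tol:
            return x
        s = solve2(B, [-fx[0], -fx[1]])    # quasi-Newton step: B s = -f(x)
        x = [x[0] + s[0], x[1] + s[1]]
        fx_new = f(x)
        y = [fx_new[0] - fx[0], fx_new[1] - fx[1]]
        ss = s[0] * s[0] + s[1] * s[1]
        # Rank-one secant update: B += ((y - B s) s^T) / (s^T s)
        Bs = [B[0][0] * s[0] + B[0][1] * s[1], B[1][0] * s[0] + B[1][1] * s[1]]
        for i in range(2):
            for j in range(2):
                B[i][j] += (y[i] - Bs[i]) * s[j] / ss
        fx = fx_new
    raise RuntimeError("Broyden did not converge")

# f(x, y) = (x + y - 3, x^2 - y - 1); no Jacobian is ever computed.
root = broyden(lambda p: [p[0] + p[1] - 3.0, p[0] ** 2 - p[1] - 1.0], [1.0, 1.0])
```

Because only residual evaluations are needed, codes with inaccurate or unavailable Jacobians can still converge, which is exactly the regime the report identifies for Broyden's method.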

361

NASA Technical Reports Server (NTRS)

The application of predictor-corrector integration algorithms developed for the digital parallel processing environment is investigated. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real-time performance, interprocessor communication, and algorithm startup are also discussed.

Krosel, S. M.; Milner, E. J.

1982-01-01

362

NASA Astrophysics Data System (ADS)

The paper presents a novel parallelized micromagnetic solver, engineered to run on massively parallel architectures based on graphical processing units and thus suitable for the efficient simulation of large magnetic nanostructures. The code adopts a multipole expansion technique for the evaluation of long-range magnetostatic interactions and enables the calculation of the exchange field also on non-structured meshes. Its computational performances are here tested in the simulation of the hysteresis loops of dot array films with maximum size in the order of 5 μm.

Bottauscio, O.; Manzin, A.

2014-05-01

363

Parallel simulation of tsunami inundation on a large-scale supercomputer

NASA Astrophysics Data System (ADS)

An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. 
In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the finite difference calculation, (2) communication between adjacent layers for the calculations to connect each layer, and (3) global communication to obtain the time step which satisfies the CFL condition in the whole domain. A preliminary test on the K computer showed the parallel efficiency on 1024 cores was 57% relative to 64 cores. We estimate that the parallel efficiency will be considerably improved by applying a 2-D domain decomposition instead of the present 1-D domain decomposition in future work. The present parallel tsunami model was applied to the 2011 Great Tohoku tsunami. The coarsest resolution layer covers a 758 km × 1155 km region with a 405 m grid spacing. A nesting of five layers was used with the resolution ratio of 1/3 between nested layers. The finest resolution region has 5 m resolution and covers most of the coastal region of Sendai city. To complete 2 hours of simulation time, the serial (non-parallel) computation took approximately 4 days on a workstation. To complete the same simulation on 1024 cores of the K computer, it took 45 minutes which is more than two times faster than real-time. This presentation discusses the updated parallel computational performance and the efficient use of the K computer when considering the characteristics of the tsunami inundation simulation model in relation to the characteristics and capabilities of the K computer.

Oishi, Y.; Imamura, F.; Sugawara, D.

2013-12-01

364

A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

NASA Technical Reports Server (NTRS)

A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
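A serial sketch of the annealing moves described above may help. The cost model (half-perimeter wirelength), cooling schedule, and net list below are illustrative assumptions, not taken from the paper; only the two move types, cell exchanges and cell displacements, follow the abstract.

```python
import math
import random

def wirelength(placement, nets):
    """Half-perimeter wirelength; placement maps cell -> (x, y) slot."""
    total = 0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal(placement, nets, free_slots, t0=5.0, cooling=0.99, steps=3000, seed=7):
    """Simulated-annealing placement with two move types:
    cell exchanges and cell displacements into free slots."""
    rng = random.Random(seed)
    cells = list(placement)
    cost = wirelength(placement, nets)
    best, best_cost = dict(placement), cost
    t = t0
    for _ in range(steps):
        if free_slots and rng.random() < 0.5:
            # Displacement: swap a cell's location with a free slot.
            c, i = rng.choice(cells), rng.randrange(len(free_slots))
            def swap(c=c, i=i):
                placement[c], free_slots[i] = free_slots[i], placement[c]
        else:
            # Exchange: swap the locations of two cells.
            a, b = rng.sample(cells, 2)
            def swap(a=a, b=b):
                placement[a], placement[b] = placement[b], placement[a]
        swap()
        new_cost = wirelength(placement, nets)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:
            swap()  # both move types are swaps, so swapping again undoes them
        t *= cooling
    return best, best_cost

# Toy instance: 16 cells on a 5x5 slot grid, a chain of 2-pin nets.
rng0 = random.Random(0)
slots = [(x, y) for x in range(5) for y in range(5)]
rng0.shuffle(slots)
cells = [f"c{i}" for i in range(16)]
placement = dict(zip(cells, slots[:16]))
free = slots[16:]
nets = [[f"c{i}", f"c{(i + 1) % 16}"] for i in range(16)]
initial = wirelength(placement, nets)
best, best_cost = anneal(placement, nets, free)
```

In the paper's parallel version, such moves run concurrently on hypercube nodes, with the tree broadcast keeping cell locations consistent.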

Jones, Mark Howard

1987-01-01

365

The Parallel-Plate Bounded-Wave EMP Simulator is typically used to test the vulnerability of electronic systems to the electromagnetic pulse (EMP) produced by a high altitude nuclear burst by subjecting the systems to a simulated EMP environment. However, when large test objects are placed within the simulator for investigation, the desired EMP environment may be affected by the interaction between the simulator and the test object. This simulator/obstacle interaction can be attributed to the following phenomena: (1) mutual coupling between the test object and the simulator, (2) fringing effects due to the finite width of the conducting plates of the simulator, and (3) multiple reflections between the object and the simulator's tapered end-sections. When the interaction is significant, the measurement of currents coupled into the system may not accurately represent those induced by an actual EMP. To better understand the problem of simulator/obstacle interaction, a dynamic analysis of the fields within the parallel-plate simulator is presented. The fields are computed using a moment method solution based on a wire mesh approximation of the conducting surfaces of the simulator. The fields within an empty simulator are found to be predominantly transverse electromagnetic (TEM) for frequencies within the simulator's bandwidth, properly simulating the properties of the EMP propagating in free space. However, when a large test object is placed within the simulator, it is found that the currents induced on the object can be quite different from those on an object situated in free space. A comprehensive study of the mechanisms contributing to this deviation is presented.

Gedney, S.D.

1990-12-01

366

This article describes the application of multiphysics simulation on parallel computing platforms to model aeroelastic instabilities and flow-induced vibrations. Multiphysics simulation is based on a single computational framework for the modeling of multiple interacting physical phenomena. Within the multiphysics framework, the finite element treatment of fluids is based on the Galerkin-Least-Squares (GLS) method with discontinuity capturing operators. The arbitrary Lagrangian-Eulerian (ALE)

Steven M. Rifai; Zdeněk Johan; Wen-Ping Wang; Jean-Pierre Grisval; Thomas J. R. Hughes; Robert M. Ferencz

1999-01-01

367

Real-time simulation of MHD/steam power plants by digital parallel processors

Attention is given to a large FORTRAN coded program which simulates the dynamic response of the MHD/steam plant on either a SEL 32/55 or VAX 11/780 computer. The code realizes a detailed first-principle model of the plant. Quite recently, in addition to the VAX 11/780, an AD-10 has been installed for usage as a real-time simulation facility. The parallel processor

R. M. Johnson; D. A. Rudberg

1981-01-01

368

GRAPE3: highly parallelized special-purpose computer for gravitational many-body simulations

The authors have developed a highly parallelized special-purpose computer GRAPE (GRAvity PipE)-3 for gravitational many-body simulations. It accelerates gravitational force calculations which are the most expensive part of the many-body simulations. The peak computing speed is equivalent to about 15 GFLOPS. The GRAPE-3 system consists of two identical boards connected to a host computer through a VME bus. Each board

S. K. Okumura; J. Makino; T. Ebisuzaki; T. Ito; T. Fukushige; D. Sugimoto; E. Hashimoto; K. Tomida; N. Miyakawa

1992-01-01

369

Xyce parallel electronic simulator reference guide, Version 6.0.1.

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1].

Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David Gregory [Raytheon, Albuquerque, NM]

2014-01-01

370

A hybrid simulation approach is developed to study chemical reactions coupled with long-range mechanical phenomena in materials. The finite-element method for continuum mechanics is coupled with the molecular dynamics method for an atomic system that embeds a cluster of atoms described quantum-mechanically with the electronic density-functional method based on real-space multigrids. The hybrid simulation approach is implemented on parallel computers

Shuji Ogata; Elefterios Lidorikis; Fuyuki Shimojo; Aiichiro Nakano; Priya Vashishta; Rajiv K. Kalia

2001-01-01

371

We explore the emerging application area of physics-based simulation for computer animation and visual special effects. In particular, we examine its parallelization potential and characterize its behavior on a chip multiprocessor (CMP). Applications in this domain model and simulate natural phenomena, and often direct visual components of motion pictures. We study a set of

Christopher J. Hughes; Radek Grzeszczuk; Eftychios Sifakis; Daehyun Kim; Sanjeev Kumar; Andrew Selle; Jatin Chhugani; Matthew J. Holliman; Yen-kuang Chen

2007-01-01

372

PARALLEL MESH GENERATION FOR CFD SIMULATIONS OF COMPLEX REAL WORLD AERODYNAMIC PROBLEMS

The primary focus of this project is to design and implement a parallel framework for an unstructured mesh generator based on the advancing front method (AFM). In particular, we target large-scale Computational Fluid Dynamics (CFD) simulations of complex problems.

George Zagaris; Shahyar Pirzadeh; Andrey Chernikov; Nikos Chrisochoides

373

Parallel Octree-Based Finite Element Method for Large-Scale Earthquake Ground Motion Simulation

We present a parallel octree-based finite element method for large-scale earthquake ground motion simulation in realistic basins. The octree representation combines the low memory per node and good cache performance of finite difference methods with the spatial adaptivity to local seismic wavelengths characteristic of unstructured finite element methods. Several tests are provided to verify the numerical

J. Bielak; O. Ghattas; E.-J. Kim

2005-01-01

374

Comparison of gyrokinetic simulations of parallel plasma conductivity with analytical models

NASA Astrophysics Data System (ADS)

A full f gyrokinetic particle-in-cell simulation including Coulomb collisions is shown to reproduce the results from analytic estimates for parallel plasma conductivity in the collisional parameter regime, with reasonably good agreement, when varying the temperature and impurity content of the plasma. Differences between the models are discussed.

Kiviniemi, T. P.; Leerink, S.; Niskala, P.; Heikkinen, J. A.; Korpilo, T.; Janhunen, S.

2014-07-01

375

Simulations on Interruption of Circuit Breaker in HVDC System with Two Parallel Transmission Lines

This paper describes the Electro Magnetic Transients Program (EMTP) simulation results for a study on specifications and duty of an HVDC circuit breaker in an HVDC system. The HVDC breaker employs an inverse current injection method and a capacitor precharged from the line voltage. The HVDC system is monopolar, but has two parallel transmission lines and is the simplest multiterminal

S. Tokuyama; K. Hirasawa; Y. Yoshioka; Y. Kato

1987-01-01

376

Hardware In the Loop Simulation of a Diesel Parallel Mild-Hybrid Electric Vehicle

Hybrid vehicles present a real potential to reduce CO2 emission and energy dependency. The simulation of these vehicles is well adapted to highlight the first order influent parameters. However, more realistic components and HEVs performance versus cost could be identified and improved by testing using the HIL concept. This paper deals with the test and validation of a parallel mild-hybrid

R. Trigui; B. Jeanneret; B. Malaquin; F. Badin; C. Plasse

2007-01-01

377

Parallel Implementation of a Cellular Automaton Model for the Simulation of Laser Dynamics

A parallel implementation for distributed-memory MIMD systems of a 2D discrete model of laser dynamics based on cellular automata is presented. The model has been implemented on a PC cluster using a message passing library. A good performance has been obtained, allowing us to run realistic simulations of laser systems in clusters of workstations, which could not be afforded
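The parallelization strategy, strip decomposition with ghost-row exchange, can be demonstrated with a toy cellular automaton. The rule below is Conway's Game of Life, a stand-in for the paper's laser-dynamics rule, and message passing is emulated by copying boundary rows between strips; the check is that the decomposed update matches the monolithic one exactly.

```python
import numpy as np

def step(grid):
    """One synchronous update of a 2D outer-totalistic CA with periodic
    boundaries (Game-of-Life rule as a stand-in for the laser rule)."""
    n = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)) - grid
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(grid.dtype)

def parallel_step(grid, nstrips):
    """Emulate a distributed-memory update: split the lattice into
    horizontal strips, exchange one ghost row with each neighbour
    (periodic ring), and update each strip's interior locally."""
    strips = np.array_split(grid, nstrips, axis=0)
    out = []
    for r, s in enumerate(strips):
        top = strips[(r - 1) % nstrips][-1:]  # ghost row from neighbour above
        bot = strips[(r + 1) % nstrips][:1]   # ghost row from neighbour below
        padded = np.vstack([top, s, bot])
        out.append(step(padded)[1:-1])        # update interior rows only
    return np.vstack(out)

rng = np.random.default_rng(42)
grid = (rng.random((24, 24)) < 0.35).astype(int)
serial = step(grid)
decomposed = parallel_step(grid, nstrips=4)
```

On a real cluster the two ghost rows would travel as messages (e.g. MPI sends/receives), but the arithmetic is identical.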

Jose Luis Guisado; Francisco Fernández De Vega; Francisco Jiménez-morales; K. A. Iskra

2006-01-01

378

Monte-Carlo simulation of electron properties in rf parallel plate capacitively coupled discharges

Electron properties in a parallel plate capacitively coupled rf discharge are studied with results from a Monte-Carlo simulation. Time averaged, spatially dependent electron distributions are computed by integrating, in time, electron trajectories as a function of position while oscillating the applied electric field at rf frequencies. The dc component of the sheath potential is solved for in a self-consistent manner

M. J. Kushner

1983-01-01

379

Discharge Growth Simulation between Parallel-Plate Electrodes Using Various Computation Techniques

In this work, a simulation of the discharge growth between parallel-plate electrodes is built up based on various computation techniques. The continuity equations of different particle species existing in the discharge are solved by the characteristic method. The electric field is evaluated by a disc method and one dimensional Poisson's equation. Excited atoms or molecules and photons due

A. H. MUFTI

1999-01-01

380

A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

NASA Technical Reports Server (NTRS)

A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.

Carroll, Chester C.; Owen, Jeffrey E.

1988-01-01

381

Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube

NASA Technical Reports Server (NTRS)

The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors, nearly ideal linear speedups are achieved with nonoptimized routines; slower-than-linear speedups are achieved with optimized (machine-dependent library) routines. This slower-than-linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and exhibits less-than-ideal speedups. However, with the machine-dependent routines, the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise, wall-normal, and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single-processor time to complete a comparable simulation; however, it is estimated that a subgrid-scale model, which reduces the required number of grid points and becomes a large-eddy simulation (PSLES), would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.

Joslin, Ronald D.; Zubair, Mohammad

1993-01-01

382

Advocates of Business Process (BP) approaches argue that the real value of IT is that it provokes innovative changes in business processes. Despite the fact that many BP and IT academics and practitioners agree on this idea, BP and IT design are still performed separately. Moreover, there is very little research that is concerned with studying the ways in which

JULIE EATOCK; RAY J. PAUL; ALAN SERRANO

383

A 3D gyrokinetic particle-in-cell simulation of fusion plasma microturbulence on parallel computers

NASA Astrophysics Data System (ADS)

One of the grand challenge problems now supported by HPCC is the Numerical Tokamak Project. A goal of this project is the study of low-frequency micro-instabilities in tokamak plasmas, which are believed to cause energy loss via turbulent thermal transport across the magnetic field lines. An important tool in this study is gyrokinetic particle-in-cell (PIC) simulation. Gyrokinetic, as opposed to fully-kinetic, methods are particularly well suited to the task because they are optimized to study the frequency and wavelength domain of the microinstabilities. Furthermore, many researchers now employ low-noise delta(f) methods to greatly reduce statistical noise by modelling only the perturbation of the gyrokinetic distribution function from a fixed background, not the entire distribution function. In spite of the increased efficiency of these improved algorithms over conventional PIC algorithms, gyrokinetic PIC simulations of tokamak micro-turbulence are still highly demanding of computer power, even for fully vectorized codes on vector supercomputers. For this reason, we have worked for several years to redevelop these codes on massively parallel computers. We have developed 3D gyrokinetic PIC simulation codes for SIMD and MIMD parallel processors, using control-parallel, data-parallel, and domain-decomposition message-passing (DDMP) programming paradigms. This poster summarizes our earlier work on codes for the Connection Machine and BBN TC2000 and our development of a generic DDMP code for distributed-memory parallel machines. We discuss the memory-access issues which are of key importance in writing parallel PIC codes, with special emphasis on issues peculiar to gyrokinetic PIC. We outline the domain decompositions in our new DDMP code and discuss the interplay of different domain decompositions suited for the particle-pushing and field-solution components of the PIC algorithm.

Williams, T. J.

1992-12-01

384

In this paper, parallel recombinative simulated annealing (PRSA), a hybrid method with features of simulated annealing and genetic algorithms, is examined. PRSA inherits the global convergence property from simulated annealing and the parallelism property from genetic algorithms. PRSA was implemented on a monoprocessor system as well as on a transputer. The algorithm, its parallel implementation, and its application to an NP-hard problem, namely standard cell placement in very large scale integration (VLSI) chip design, are described. PRSA was run for a large range of test cases. Since its performance depends on many parameters, the effects of parameter variations are studied in detail. Some important parameters are migration of individuals to other transputer nodes and selection strategies for constructing new populations. In comparison with simulated annealing and genetic algorithms, PRSA was found to produce better solutions.
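The PRSA recipe, GA-style recombination plus a Boltzmann survival trial, can be sketched serially. The bit-string objective, cooling schedule, and parameter values below are illustrative assumptions; the actual paper applies the method to VLSI cell placement on transputers, with migration between nodes.

```python
import math
import random

def prsa(fitness, pop, generations=200, t0=2.0, cooling=0.98, p_mut=0.02, seed=3):
    """Serial sketch of parallel recombinative simulated annealing (PRSA):
    one-point crossover and mutation propose children, and a Boltzmann
    (Metropolis) trial between child and parent decides who survives."""
    rng = random.Random(seed)
    best = min(pop, key=fitness)
    best_f = fitness(best)
    t = t0
    for _ in range(generations):
        rng.shuffle(pop)
        nxt = []
        for i in range(0, len(pop) - 1, 2):
            p1, p2 = pop[i], pop[i + 1]
            cut = rng.randrange(1, len(p1))  # one-point crossover
            for parent, child in ((p1, p1[:cut] + p2[cut:]),
                                  (p2, p2[:cut] + p1[cut:])):
                child = [b ^ (1 if rng.random() < p_mut else 0) for b in child]
                d = fitness(child) - fitness(parent)
                # Boltzmann trial: worse children survive with prob exp(-d/t).
                survive = d <= 0 or rng.random() < math.exp(-d / t)
                winner = child if survive else parent
                nxt.append(list(winner))
                f = fitness(winner)
                if f < best_f:
                    best, best_f = list(winner), f
        pop = nxt + pop[len(nxt):]  # carry over an unpaired individual, if any
        t *= cooling
    return best, best_f

# Toy objective standing in for a placement cost: count of zero bits.
rng0 = random.Random(0)
cost = lambda bits: bits.count(0)
population = [[rng0.randint(0, 1) for _ in range(32)] for _ in range(20)]
init_best = min(cost(ind) for ind in population)
best, best_cost = prsa(cost, population)
```

High temperature makes the trial behave like a genetic algorithm's stochastic selection; as t cools, it degenerates toward greedy replacement, which is the simulated-annealing half of the hybrid.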

Kurbel, K; Schneider, B; Singh, K

1998-01-01

385

Petascale turbulence simulation using a highly parallel fast multipole method on GPUs

NASA Astrophysics Data System (ADS)

This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as numerical engine, and match the current record in mesh size for this application, a cube of 4096^3 computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as the numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method is able to achieve just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. The calculation time for one time step was 108 s for the vortex method and 154 s for the spectral method, under these conditions. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex-method calculations to date.

Yokota, Rio; Barba, L. A.; Narumi, Tetsu; Yasuoka, Kenji

2013-03-01

386

Application of parallel computing to seismic damage process simulation of an arch dam

NASA Astrophysics Data System (ADS)

The simulation of the damage process of a high arch dam subjected to strong earthquake shocks is significant to the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation requires rigorous computational capacity. Conventional serial computing falls short of that, and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams, utilizing a damage model that accounts for the heterogeneity of concrete. Developed in the programming language Fortran, the code uses a master/slave mode for programming, the domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication, and solvers from the AZTEC library for the solution of large-scale equations. A speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of an arch dam under construction on a 4-node PC cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of a shaking table test, indicating that the proposed procedure and parallel code PDPAD have good potential for simulating the seismic damage mode of arch dams. With the rapidly growing need for massive computation emerging from engineering problems, parallel computing will find more and more applications in pertinent areas.

Zhong, Hong; Lin, Gao; Li, Jianbo

2010-06-01

387

NASA Technical Reports Server (NTRS)

This paper will describe the Entry, Descent and Landing simulation tradeoffs and techniques that were used to provide the Monte Carlo data required to approve entry during a critical period just before entry of the Genesis Sample Return Capsule. The same techniques will be used again when Stardust returns on January 15, 2006. Only one hour was available for the simulation which propagated 2000 dispersed entry states to the ground. Creative simulation tradeoffs combined with parallel processing were needed to provide the landing footprint statistics that were an essential part of the Go/NoGo decision that authorized release of the Sample Return Capsule a few hours before entry.

Lyons, Daniel T.; Desai, Prasun N.

2005-01-01

388

Parallel Monte Carlo simulations on an ARC-enabled computing grid

NASA Astrophysics Data System (ADS)

Grid computing opens new possibilities for running heavy Monte Carlo simulations of physical systems in parallel. The presentation gives an overview of GaMPI, a system for running an MPI-based random walker simulation on grid resources. Integrating the ARC middleware and the new storage system Chelonia with the Ganga grid job submission and control system, we show that MPI jobs can be run on a world-wide computing grid with good performance and promising scaling properties. Results for relatively communication-heavy Monte Carlo simulations run on multiple heterogeneous, ARC-enabled computing clusters in several countries are presented.
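The pattern of such grid-parallel Monte Carlo runs, independent seeded streams whose results are merged afterwards, can be sketched without any middleware. Everything below (walker model, seeds, job counts) is an illustrative assumption; in GaMPI the batches would be MPI jobs submitted through ARC/Ganga rather than a local loop.

```python
import random
import statistics

def walker(seed, nsteps):
    """One 1-D random walker; each job gets its own seeded stream so that
    runs are independent and reproducible across grid resources."""
    rng = random.Random(seed)
    x = 0
    for _ in range(nsteps):
        x += rng.choice((-1, 1))
    return x

def run_grid(njobs, nsteps, base_seed=1234):
    """Emulate fanning walkers out over grid jobs and merging the results.
    In a real deployment each call would be a separately submitted job."""
    finals = [walker(base_seed + j, nsteps) for j in range(njobs)]
    # Diffusive scaling: the mean-square displacement grows like nsteps.
    msd = statistics.fmean(x * x for x in finals)
    return finals, msd

finals, msd = run_grid(njobs=4000, nsteps=100)
```

Because each stream is seeded independently, adding more grid resources just extends the list of jobs; merging is a trivial reduction over the returned endpoints.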

Nilsen, Jon K.; Samset, Bjørn H.

2011-12-01

389

Parallel 3D Multi-Stage Simulation of a Turbofan Engine

NASA Technical Reports Server (NTRS)

A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. 20 processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scaleable with the number of blade rows. Enough flips are run (between 50 and 200) so the solution in the entire machine is not changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee the parallelization was done correctly. The domain decomposition is done only in the axial direction since the number of points axially is much larger than the other two directions. This code uses MPI for message passing. 
The parallel speed-up of the solver portion (excluding I/O and the body force calculation) is reported for a grid which has 227 points axially.
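The time-marching scheme named in the abstract, a 4-stage explicit Runge-Kutta method, can be illustrated on a scalar model problem. The stage coefficients below are the commonly used Jameson-style values, which is an assumption; residual smoothing, variable local time steps, and the turbulence model are omitted.

```python
import math

def rk4_stage_step(u, dt, rhs, alphas=(0.25, 1.0 / 3.0, 0.5, 1.0)):
    """One step of a 4-stage explicit Runge-Kutta scheme of the kind used
    for time marching in CFD solvers (Jameson-style stage coefficients)."""
    u0 = u
    for a in alphas:
        u = u0 + a * dt * rhs(u)  # each stage restarts from u0
    return u

def march(u, t_end, dt, rhs):
    """March to t_end, shrinking the final step to land exactly on it."""
    t = 0.0
    while t < t_end - 1e-12:
        step_dt = min(dt, t_end - t)
        u = rk4_stage_step(u, step_dt, rhs)
        t += step_dt
    return u

# Model problem du/dt = -u, u(0) = 1; exact solution is exp(-t).
u1 = march(1.0, t_end=1.0, dt=0.01, rhs=lambda u: -u)
```

For a linear right-hand side this stage form reproduces the classical fourth-order accuracy, which is why the error against exp(-1) is tiny at this step size.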

Turner, Mark G.; Topp, David A.

1998-01-01

390

libMesh: a C++ library for parallel adaptive mesh refinement/coarsening simulations

In this paper we describe the libMesh (http://libmesh.sourceforge.net) framework for parallel adaptive finite element applications. libMesh is an open-source software library that has been developed to facilitate serial and parallel simulation of multiscale, multiphysics applications using adaptive mesh refinement and coarsening strategies. The main software development is being carried out in the CFDLab (http://cfdlab.ae.utexas.edu) at the University of Texas, but

Benjamin S. Kirk; John W. Peterson; Roy H. Stogner; Graham F. Carey

2006-01-01

391

Application of parallel computing techniques to a large-scale reservoir simulation

Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from the intensive computational requirements of detailed modeling investigations of real-world reservoirs. This paper presents the application of a massively parallel-computing version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of the future flow conditions at the site, aiding in the assessment of proposed repository performance.

Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten

2001-02-01

392

Design of a real-time wind turbine simulator using a custom parallel architecture

NASA Technical Reports Server (NTRS)

The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named: the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is very much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an IO operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's are interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors which usually have a throughput limit because of rigid bus architecture.

Hoffman, John A.; Gluck, R.; Sridhar, S.

1995-01-01

393

A new parallel P3M code for very large-scale cosmological simulations

NASA Astrophysics Data System (ADS)

We have developed a parallel Particle-Particle, Particle-Mesh (P3M) simulation code for the Cray T3E parallel supercomputer that is well suited to studying the time evolution of systems of particles interacting via gravity and gas forces in cosmological contexts. The parallel code is based upon the public-domain serial Adaptive P3M-SPH (http://coho.astro.uwo.ca/pub/hydra/hydra.html) code of Couchman et al. (1995)[ApJ, 452, 797]. The algorithm resolves gravitational forces into a long-range component computed by discretizing the mass distribution and solving Poisson's equation on a grid using an FFT convolution method, and a short-range component computed by direct force summation for sufficiently close particle pairs. The code consists primarily of a particle-particle computation parallelized by domain decomposition over blocks of neighbour-cells, a more regular mesh calculation distributed in planes along one dimension, and several transformations between the two distributions. The load balancing of the P3M code is static, since this greatly aids the ongoing implementation of parallel adaptive refinements of the particle and mesh systems. Great care was taken throughout to make optimal use of the available memory, so that a version of the current implementation has been used to simulate systems of up to 10^9 particles with a 1024^3 mesh for the long-range force computation. These are the largest Cosmological N-body simulations of which we are aware. We discuss these memory optimizations as well as those motivated by computational performance. Performance results are very encouraging, and, even without refinements, the code has been used effectively for simulations in which the particle distribution becomes highly clustered as well as for other non-uniform systems of astrophysical interest.
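The long-range half of the P3M force split can be illustrated in one dimension: deposit density on a periodic grid, divide by -k^2 in Fourier space, and transform back. The grid size and test density below are arbitrary choices; the actual code works in 3-D and adds a short-range direct sum over close particle pairs.

```python
import numpy as np

# Long-range part of a P3M-style solver: solve Poisson's equation
# lap(phi) = rho on a periodic grid with an FFT (1-D toy version).
n = 256
L = 2 * np.pi
x = np.linspace(0.0, L, n, endpoint=False)
rho = np.sin(x)  # test density whose exact potential is -sin(x)

k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)  # angular wavenumbers
rho_k = np.fft.fft(rho)
phi_k = np.zeros_like(rho_k)
nonzero = k != 0
phi_k[nonzero] = -rho_k[nonzero] / k[nonzero] ** 2  # -k^2 phi_k = rho_k
# The k = 0 (mean) mode is left at zero, fixing the potential's gauge.
phi = np.fft.ifft(phi_k).real

# Mesh force field; production codes typically differentiate in k-space.
force = -np.gradient(phi, x)
```

The short-range correction then handles pairs closer than a few mesh cells by direct summation, which is what keeps the grid resolution requirement modest.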

MacFarland, Tom; Couchman, H. M. P.; Pearce, F. R.; Pichlmeier, Jakob

1998-12-01

394

The best approach to the characterization of spatial resolution under various conditions is through the use of massively parallel Monte Carlo electron trajectory simulation. The spatial resolution is systematically studied in multiphase materials at intermediate voltages as a function of beam size, specimen thickness and electron beam energy by massively parallel Monte Carlo electron trajectory simulation. To provide a baseline

A. D. Romig Jr.; J. R. Michael; J. I. Goldstein

1991-01-01

395

This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented into the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-one-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce better-resolved results and reveal flow patterns that cannot be obtained using coarse-grid models.

Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.

2001-08-31

396

GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit

Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on a massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online.

Pronk, Sander; Pall, Szilard; Schulz, Roland; Larsson, Per; Bjelkmar, Par; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik

2013-01-01

397

NASA Astrophysics Data System (ADS)

Optimizing gas turbines is a complex multi-physical and multi-component problem that has long been based on expensive experiments. Today, computer simulation can reduce design process costs and is acknowledged as a promising path for optimization. However, performing such computations using high-fidelity methods such as large eddy simulation (LES) on gas turbines is challenging. Nevertheless, such simulations are becoming accessible for specific components of gas turbines. These stand-alone simulations face a new challenge: to improve the quality of the results, new physics must be introduced. Therefore, an efficient massively parallel coupling methodology is investigated. The flow solver modeling relies on the LES code AVBP, which has already been ported to massively parallel architectures. The conduction solver is based on the same data structure and thus shares its scalability. Accurately coupling these solvers while maintaining their scalability is challenging and is the main objective of this work. To reach this goal, a methodology is proposed and several key issues in implementing the coupling are addressed: convergence, stability, parallel geometry mapping, transfers and interpolation. This methodology is then applied to a real burner configuration, hence demonstrating the possibilities and limitations of the solution.

Jaure, S.; Duchaine, F.; Staffelbach, G.; Gicquel, L. Y. M.

2013-01-01

398

PCSIM: A Parallel Simulation Environment for Neural Circuits Fully Integrated with Python

The Parallel Circuit SIMulator (PCSIM) is a software package for simulation of neural circuits. It is primarily designed for distributed simulation of large scale networks of spiking point neurons. Although its computational core is written in C++, PCSIM's primary interface is implemented in the Python programming language, which is a powerful programming environment and allows the user to easily integrate the neural circuit simulator with data analysis and visualization tools to manage the full neural modeling life cycle. The main focus of this paper is to describe PCSIM's full integration into Python and the benefits thereof. In particular we will investigate how the automatically generated bidirectional interface and PCSIM's object-oriented modular framework enable the user to adopt a hybrid modeling approach: using and extending PCSIM's functionality either employing pure Python or C++ and thus combining the advantages of both worlds. Furthermore, we describe several supplementary PCSIM packages written in pure Python and tailored towards setting up and analyzing neural simulations.

Pecevski, Dejan; Natschlager, Thomas; Schuch, Klaus

2008-01-01

399

Parallel-vector algorithms for particle simulations on shared-memory multiprocessors

Over the last few decades, the computational demands of massive particle-based simulations for both scientific and industrial purposes have been continuously increasing. Hence, considerable efforts are being made to develop parallel computing techniques on various platforms. In such simulations, particles freely move within a given space, and so on a distributed-memory system, load balancing, i.e., assigning an equal number of particles to each processor, is not guaranteed. Shared-memory systems, in contrast, achieve better load balancing for particle models, but suffer from the intrinsic drawback of memory access competition, particularly during (1) pairing of contact candidates from among neighboring particles and (2) force summation for each particle. Here, novel algorithms are proposed to overcome these two problems. For the first problem, the key is a pre-conditioning process during which particle labels are sorted by a cell label in the domain to which the particles belong. Then, a list of contact candidates is constructed by pairing the sorted particle labels. For the latter problem, a table comprising the list indexes of the contact candidate pairs is created and used to sum the contact forces acting on each particle for all contacts according to Newton's third law. With just these methods, memory access competition is avoided without additional redundant procedures. The parallel efficiency and compatibility of these two algorithms were evaluated in discrete element method (DEM) simulations on four types of shared-memory parallel computers: a multicore multiprocessor computer, scalar supercomputer, vector supercomputer, and graphics processing unit. The computational efficiency of a DEM code was found to be drastically improved with our algorithms on all but the scalar supercomputer. Thus, the developed parallel algorithms are useful on shared-memory parallel computers with sufficient memory bandwidth.
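The pre-conditioning step this abstract describes can be sketched as follows: sort particle labels by the label of the cell each particle occupies, then pair candidates from the same and neighbouring cells. The names (`positions`, `cell_size`) and the serial form are illustrative, not taken from the paper.

```python
import numpy as np
from collections import defaultdict

# Sketch of cell-sorted contact-candidate pairing for a DEM-style code.
rng = np.random.default_rng(0)
npart = 200
positions = rng.random((npart, 2))            # particles in a unit square
cell_size = 0.1
ix = (positions / cell_size).astype(int)      # 2-D cell index per particle
cell_label = ix[:, 0] * 10 + ix[:, 1]         # flattened cell label
order = np.argsort(cell_label)                # particle labels sorted by cell

cells = defaultdict(list)                     # cell index -> particle labels
for p in order:
    cells[tuple(ix[p])].append(p)

pairs = set()                                 # candidate contact pairs
for (cx, cy), plist in cells.items():
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for q in cells.get((cx + dx, cy + dy), []):
                for p in plist:
                    if p < q:                 # dedupe via ordered labels
                        pairs.add((p, q))

# Sanity check: every pair closer than one cell width is a candidate.
close = {(i, j) for i in range(npart) for j in range(i + 1, npart)
         if np.linalg.norm(positions[i] - positions[j]) < cell_size}
assert close <= pairs
```

In the parallel algorithm of the paper, the sorted label array and the pair-index table are what let each thread sum forces (with Newton's third law) without memory-access competition; the serial loops above only show the data layout.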

Nishiura, Daisuke, E-mail: nishiura@jamstec.go.j [Institute for Research on Earth Evolution, Japan Agency for Marine-Earth Science and Technology, Kanagawa 236-0001 (Japan); Sakaguchi, Hide [Institute for Research on Earth Evolution, Japan Agency for Marine-Earth Science and Technology, Kanagawa 236-0001 (Japan)

2011-03-01

400

Parallel Solutions for Voxel-Based Simulations of Reaction-Diffusion Systems

There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This awareness has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena that take into account both spatial effects and noise. However, the high computational effort characterizing such simulation approaches, coupled with the necessity of simulating a model several times to achieve statistically relevant information on its behaviour, makes these algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we present how the characteristics of the algorithm can be exploited for an effective parallelization on present heterogeneous HPC architectures.
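The tau-leaping idea underlying this simulator can be shown in a minimal, single-voxel form: fire a Poisson-distributed number of reaction events per fixed time leap. This sketch uses one decay reaction with assumed rate and leap values; the actual voxel-based reaction-diffusion method is far richer.

```python
import numpy as np

# Minimal tau-leaping sketch for one decay reaction A -> 0 with stochastic
# rate constant c (propensity c*A). One Poisson draw per leap of length tau.
rng = np.random.default_rng(1)
A, c, tau = 10_000, 0.1, 0.05
t = 0.0
while t < 10.0 and A > 0:
    k = rng.poisson(c * A * tau)   # number of firings during the leap
    A -= min(k, A)                 # never go below zero molecules
    t += tau

# Mean-field prediction: A(10) ~ 10000 * exp(-1) ~ 3679, plus noise.
assert 3000 < A < 4400
```

Exact stochastic simulation (Gillespie's SSA) draws one event at a time; tau-leaping trades a small bias for the large speedup that makes repeated runs, and hence parallel statistics gathering, practical.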

D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan

2014-01-01

401

A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows

NASA Technical Reports Server (NTRS)

A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
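The finding above, that only 3-5 percent of the standard Roe dissipation should be kept, can be illustrated on linear advection with a scaled Roe-type flux. This is a hypothetical 1-D sketch (forward-Euler time marching, not the paper's MacCormack scheme), with the caveat noted in the comments.

```python
import numpy as np

# Roe-type flux for linear advection u_t + a u_x = 0 on a periodic grid:
#   F_{i+1/2} = 0.5*a*(u_i + u_{i+1}) - 0.5*eps*|a|*(u_{i+1} - u_i),
# where eps = 1 is the standard Roe (upwind) dissipation and the study
# above retains only 3-5% of it (eps ~ 0.03-0.05) for accurate LES.
def advect(u, a, nu, eps, nsteps):
    for _ in range(nsteps):
        uR = np.roll(u, -1)
        F = 0.5 * a * (u + uR) - 0.5 * eps * abs(a) * (uR - u)
        u = u - nu / a * (F - np.roll(F, 1))  # nu = a*dt/dx (CFL number)
    return u

n = 128
u0 = np.sin(2 * np.pi * np.arange(n) / n)
# Forward-Euler marching needs nu <= eps for stability, hence the small
# CFL number here; the paper itself uses second-order MacCormack.
u_full = advect(u0, 1.0, 0.04, 1.0, 1000)
u_low = advect(u0, 1.0, 0.04, 0.05, 1000)

# Reduced Roe dissipation preserves more of the resolved wave energy.
assert np.sum(u_low**2) > np.sum(u_full**2) > 0
```

The excess damping of the full-dissipation run is exactly the mechanism by which standard Roe flux-difference splitting degrades resolved turbulence in an LES.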

Bui, Trong T.

1999-01-01

402

Relevance of the parallel nonlinearity in gyrokinetic simulations of tokamak plasmas

The influence of the parallel nonlinearity on transport in gyrokinetic simulations is assessed for values of ρ* which are typical of current experiments. Here, ρ* = ρ_s/a is the ratio of the gyroradius, ρ_s, to the plasma minor radius, a. The conclusion, derived from simulations with both GYRO [J. Candy and R. E. Waltz, J. Comput. Phys. 186, 585 (2003)] and GEM [Y. Chen and S. E. Parker, J. Comput. Phys. 189, 463 (2003)], is that no measurable effect of the parallel nonlinearity is apparent for ρ* < 0.012. This result is consistent with scaling arguments, which suggest that the parallel nonlinearity should be O(ρ*) smaller than the E×B nonlinearity. Indeed, for the plasma parameters under consideration, the magnitude of the parallel nonlinearity is a factor of 8ρ* smaller (for 0.00075 < ρ* < 0.012) than the other retained terms in the nonlinear gyrokinetic equation.

Candy, J.; Waltz, R. E.; Parker, S. E.; Chen, Y. [General Atomics, San Diego, California 92121 (United States); Center for Integrated Plasma Studies, University of Colorado at Boulder, Boulder, Colorado 80309 (United States)

2006-07-15

403

Wolf: a rollback algorithm for optimistic distributed simulation systems

Discrete event dynamical systems are used to model a number of engineering applications, ranging from communication networks and distributed computing systems to manufacturing systems. A new analytical model is proposed for the analysis of asynchronous, optimistic distributed simulation of discrete event dynamical systems. The performance of traditional timewarp algorithms and that of a new rollback algorithm, Wolf, are examined in this

Vijay Madisetti; Jean Walrand; David Messerschmitt

1988-01-01

404

Adaptive finite element simulation of flow and transport applications on parallel computers

NASA Astrophysics Data System (ADS)

The subject of this work is the adaptive finite element simulation of problems arising in flow and transport applications on parallel computers. Of particular interest are new contributions to adaptive mesh refinement (AMR) in this parallel high-performance context, including novel work on data structures, treatment of constraints in a parallel setting, generality and extensibility via object-oriented programming, and the design/implementation of a flexible software framework. This technology and software capability then enables more robust, reliable treatment of multiscale--multiphysics problems and specific studies of fine scale interaction such as those in biological chemotaxis (Chapter 4) and high-speed shock physics for compressible flows (Chapter 5). The work begins by presenting an overview of key concepts and data structures employed in AMR simulations. Of particular interest is how these concepts are applied in the physics-independent software framework which is developed here and is the basis for all the numerical simulations performed in this work. This open-source software framework has been adopted by a number of researchers in the U.S. and abroad for use in a wide range of applications. The dynamic nature of adaptive simulations poses particular challenges for efficient implementation on distributed-memory parallel architectures. Communication cost, computational load balance, and memory requirements must all be considered when developing adaptive software for this class of machines. Specific extensions to the adaptive data structures to enable implementation on parallel computers are therefore considered in detail. The libMesh framework for performing adaptive finite element simulations on parallel computers is developed to provide a concrete implementation of the above ideas. 
This physics-independent framework is applied to two distinct flow and transport applications classes in the subsequent application studies to illustrate the flexibility of the design and to demonstrate the capability for resolving complex multiscale processes efficiently and reliably. The first application considered is the simulation of chemotactic biological systems such as colonies of Escherichia coli. This work appears to be the first application of AMR to chemotactic processes. These systems exhibit transient, highly localized features and are important in many biological processes, which make them ideal for simulation with adaptive techniques. A nonlinear reaction-diffusion model for such systems is described and a finite element formulation is developed. The solution methodology is described in detail. Several phenomenological studies are conducted to study chemotactic processes and resulting biological patterns which use the parallel adaptive refinement capability developed in this work. The other application study is much more extensive and deals with fine scale interactions for important hypersonic flows arising in aerospace applications. These flows are characterized by highly nonlinear, convection-dominated flowfields with very localized features such as shock waves and boundary layers. These localized features are well-suited to simulation with adaptive techniques. A novel treatment of the inviscid flux terms arising in a streamline-upwind Petrov-Galerkin finite element formulation of the compressible Navier-Stokes equations is also presented and is found to be superior to the traditional approach. The parallel adaptive finite element formulation is then applied to several complex flow studies, culminating in fully three-dimensional viscous flows about complex geometries such as the Space Shuttle Orbiter. 
Physical phenomena such as viscous/inviscid interaction, shock wave/boundary layer interaction, shock/shock interaction, and unsteady acoustic-driven flowfield response are considered in detail. A computational investigation of a 25°/55° double cone configuration details the complex multiscale flow features and investigates a potential source of experimentally-observed unsteady flowfield response.
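The AMR control loop at the heart of this thesis (solve, estimate element-wise error, refine where the indicator is large) can be sketched generically. Plain 1-D intervals and a hand-picked error indicator stand in here for libMesh's actual mesh and error-estimator classes.

```python
import numpy as np

# Generic AMR sketch: split the worst `frac` of cells in half each pass.
def refine(cells, indicator, frac=0.3):
    errs = np.array([indicator(a, b) for a, b in cells])
    cut = np.quantile(errs, 1 - frac)
    out = []
    for (a, b), e in zip(cells, errs):
        if e >= cut:
            m = 0.5 * (a + b)
            out += [(a, m), (m, b)]          # h-refinement: bisect the cell
        else:
            out.append((a, b))
    return out

# Error indicator: variation of a steep profile across each cell, so the
# refinement should cluster around the "shock" at x = 0.5.
f = lambda x: np.tanh(50 * (x - 0.5))
ind = lambda a, b: abs(f(b) - f(a))

cells = [(i / 8, (i + 1) / 8) for i in range(8)]
for _ in range(3):
    cells = refine(cells, ind)

widths = {round(b - a, 6) for a, b in cells}
assert min(widths) < 1 / 8 / 4   # at least two extra levels near the shock
```

The parallel difficulty the abstract emphasizes is that this loop changes the mesh dynamically, so the element-to-processor assignment (and hence load balance and communication pattern) must be recomputed as refinement proceeds.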

Kirk, Benjamin Shelton

405

NASA Astrophysics Data System (ADS)

We present the results of a unique, parallel scaling study using a 3-D variably saturated flow problem including land surface processes that ranges from a single processor to a maximum of 16,384 processors. In the applied finite difference framework and for a fixed problem size per processor, this results in a maximum of approximately 8 × 10^9 grid cells (unknowns). Detailed timing information shows that the applied simulation platform ParFlow exhibits excellent parallel efficiency. This study demonstrates that regional scale hydrologic simulations on the order of 10^3 km^2 are feasible at hydrologic resolution (~10^0-10^1 m laterally, 10^-2-10^-1 m vertically) with reasonable computation times, which has previously been assumed to be an intractable computational problem.
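The weak-scaling arithmetic implied by this abstract is worth making explicit: a fixed per-processor load grown to 16,384 processors yields roughly 8 × 10^9 cells, i.e. about half a million cells per processor. The per-processor figure and the timing numbers below are inferred or hypothetical, not quoted by the authors.

```python
# Weak-scaling arithmetic for a fixed problem size per processor.
nprocs = 16_384
total_cells = 8e9
per_proc = total_cells / nprocs      # inferred: ~4.9e5 cells per processor
assert 4.8e5 < per_proc < 5.0e5

# Weak-scaling (parallel) efficiency is commonly defined as T(1)/T(P) for
# fixed per-processor load; the timings here are purely illustrative.
t1, tP = 100.0, 112.0
efficiency = t1 / tP
assert 0.85 < efficiency < 0.95
```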

Kollet, Stefan J.; Maxwell, Reed M.; Woodward, Carol S.; Smith, Steve; Vanderborght, Jan; Vereecken, Harry; Simmer, Clemens

2010-04-01

406

Object-Oriented Parallel Particle-in-Cell Code for Beam Dynamics Simulation in Linear Accelerators

In this paper, we present an object-oriented three-dimensional parallel particle-in-cell code for beam dynamics simulation in linear accelerators. A two-dimensional parallel domain decomposition approach is employed within a message-passing programming paradigm, along with dynamic load balancing. Implementing object-oriented software design provides the code with better maintainability, reusability, and extensibility compared with conventional structure-based code. It also helps to encapsulate the details of the communication syntax. Performance tests on SGI/Cray T3E-900 and SGI Origin 2000 machines show good scalability of the object-oriented code. Other important features of this code include symplectic integration with linear maps of external focusing elements and the use of z as the independent variable, typical in accelerators. The code was successfully applied to simulate beam transport through three superconducting sections in the APT linac design.

Qiang, J.; Ryne, R.D.; Habib, S.; Decky, V.

1999-11-13

407

Parallel Electromagnetic Simulation for Electric Large Target by EMS-FMM

The development framework of the parallel electromagnetic simulation software EMS-FMM, based on the multilevel fast multipole method (MLFMM), is presented in this paper. EMS-FMM can solve large-scale scattering problems at the 10-million scale with a complexity of O(N log N), so it is efficient for electrically large targets. It is implemented in C and Fortran with MPI communication on the DeepComp 7000 cluster. Implementation of

Wu Wang; Yangde Feng; Xuebin Chi

2010-01-01

408

3D simulation in electrical engineering is based on recent research work (Whitney's elements, auto-gauged formulations, discretization of the source terms) and it results in complex and irregular codes. Generally, explicit message passing is used to parallelize this kind of application, requiring tedious and error-prone low-level coding of complex communication schedules to deal with irregularity. In this paper, we focus on a high level approach

Emmanuel Cagniot; Thomas Brandes; Jean-luc Dekeyser; Francis Piriou; Pierre Boulet; Stéphance Clénet

2000-01-01

409

Parallel 3D Simulation of Seismic Wave Propagation in the Structure of Nobi Plain, Central Japan

We performed large-scale parallel simulations of seismic wave propagation to understand the complex wave behavior in the 3D basin structure of the Nobi Plain, one of the most densely populated areas of central Japan. Many large earthquakes occurred in this area in the past, such as the 1891 Nobi earthquake (M8.0), the 1944 Tonankai earthquake (M7.9) and the

A. Kotani; T. Furumura; K. Hirahara

2003-01-01

410

Research on co-simulation of rigid- flexible coupling system of parallel robot

A co-simulation method for rigid-flexible coupling systems is proposed in this paper; it is simple and easily carried out. To solve the dynamics of the 3-TPT parallel robot more accurately, its three driven links are regarded as flexible bodies and the other parts as rigid bodies, forming the rigid-flexible coupling system, and the dynamics equation of the coupling

Yongxian Liu; Chunxia Zhu; Zhao Jinfu

2009-01-01

411

A parallel adaptive finite volume method for nanoscale double-gate MOSFETs simulation

We propose in this paper a quantum correction transport model for nanoscale double-gate metal-oxide-semiconductor field effect transistor (MOSFET) device simulation. Based on adaptive finite volume, parallel domain decomposition, monotone iterative, and a posteriori error estimation methods, the model is solved numerically on a PC-based Linux cluster with MPI libraries. Quantum mechanical effects play an important role in semiconductor nanoscale device

Yiming Li; Shao-Ming Yu

2005-01-01

412

Parallel Linear System Solution and Its Application to Railway Power Network Simulation

The Streaming SIMD Extension (SSE) is a special feature embedded in the Intel Pentium III and IV classes of microprocessors. It enables the execution of SIMD-type operations to exploit data parallelism. This article presents improvements to the computational performance of a railway network simulator by means of SSE. Voltage and current at various points of the supply system to an electrified

Muhammet Fikret Ercan; Yu-fai Fung; Tin-kin Ho; Wai-leung Cheung

2003-01-01

413

On the design, simulation and analysis of parallel concatenated Gallager codes

We present a framework for designing a class of concatenated codes based on low-density parity-check component codes that we call parallel concatenated Gallager codes (PCGC). The iterative decoding trajectories technique is used to explain the performance of PCGC. Simulation results show an enhanced performance in the practical Eb/N0 range in both AWGN and flat Rayleigh fading channels. The reduced decoding

Hatim Behairy; Shih-Chun Chang

2002-01-01

414

A parallel finite volume algorithm for large-eddy simulation of turbulent flows

A parallel unstructured finite volume algorithm is developed for large-eddy simulation of compressible turbulent flows. Major components of the algorithm include piecewise linear least-square reconstruction of the unknown variables, trilinear finite element interpolation for the spatial coordinates, Roe flux difference splitting, and second-order MacCormack explicit time marching. The computer code is designed from the start to take full advantage of

Trong Tri Bui

1998-01-01

415

Construction of a parallel processor for simulating manipulators and other mechanical systems

NASA Technical Reports Server (NTRS)

This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.

Hannauer, George

1991-01-01

416

NASA Technical Reports Server (NTRS)

An AFRL/NRL team has recently been selected to develop a scalable, parallel, reacting, multidimensional (SUPREM) Direct Simulation Monte Carlo (DSMC) code for the DoD user community under the High Performance Computing Modernization Office (HPCMO) Common High Performance Computing Software Support Initiative (CHSSI). This paper will introduce the JANNAF Exhaust Plume community to this three-year development effort and present the overall goals, schedule, and current status of this new code.

Campbell, David; Wysong, Ingrid; Kaplan, Carolyn; Mott, David; Wadsworth, Dean; VanGilder, Douglas

2000-01-01

417

Over the past years, SLAC's Advanced Computations Department (ACD), under SciDAC sponsorship, has developed a suite of 3D (2D) parallel higher-order finite element (FE) codes, T3P (T2P) and Pic3P (Pic2P), aimed at accurate, large-scale simulation of wakefields and particle-field interactions in radio-frequency (RF) cavities of complex shape. The codes are built on the FE infrastructure that supports SLAC's frequency domain

A. Candel; A. Kabel; L. Lee; Z. Li; C. Limborg; C. Ng; E. Prudencio; G. Schussman; R. Uplenchwar; K. Ko

2009-01-01

418

NASA Astrophysics Data System (ADS)

Methods for simulating macromolecules by molecular dynamics on distributed-memory parallel computers are described. The methods are based on the replicated data parallelisation strategy, which is both simple and effective. In Part I the methods for handling the force calculations, including Coulombic interactions, are discussed. The implementation of the methods on an Intel iPSC/860 computer is described and the performance issues are discussed with reference to a model macromolecule in solution.
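The replicated-data strategy this abstract describes can be sketched without real message passing: every "processor" holds all coordinates, computes only its assigned share of the pairwise force terms, and a global sum (an MPI allreduce in a real code) produces the full force array on every rank. The rank loop and the toy pair force below are illustrative assumptions.

```python
import numpy as np

# Replicated-data force evaluation: partial force arrays per "rank",
# globally summed at the end. A serial loop stands in for MPI ranks.
rng = np.random.default_rng(2)
N, nranks = 32, 4
x = rng.random((N, 3))                       # all ranks hold all positions

def pair_force(xi, xj):
    r = xi - xj
    return r / np.dot(r, r) ** 1.5           # toy 1/r^2 repulsion

# Deal the unique pairs out round-robin over the ranks.
pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]
partial = np.zeros((nranks, N, 3))
for rank in range(nranks):
    for i, j in pairs[rank::nranks]:
        f = pair_force(x[i], x[j])
        partial[rank, i] += f                 # Newton's third law:
        partial[rank, j] -= f                 # equal and opposite
forces = partial.sum(axis=0)                  # stands in for MPI_Allreduce

# Compare against a serial reference computation.
ref = np.zeros((N, 3))
for i, j in pairs:
    f = pair_force(x[i], x[j])
    ref[i] += f
    ref[j] -= f
assert np.allclose(forces, ref)
```

The appeal of the strategy, as the paper notes, is its simplicity: no particle migration or domain decomposition is needed, at the cost of a global communication step and full coordinate storage on every processor.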

Smith, W.; Forester, T. R.

1994-02-01

419

A 3D Finite Difference Method (FDM) with spatially irregular grids is developed to simulate seismic propagation in anisotropic media. Staggered irregular-grid finite difference operators with second-order accuracy in time and space are used to approximate the velocity-stress elastic wave equations. The parallel codes are implemented with the Message Passing Interface (MPI) library and the C language. The 3D model with complex

Weitao Sun; Jiwu Shu; Weimin Zheng

2005-01-01

420

A parallel multiphase flow code for the 3D simulation of explosive volcanic eruptions

A new parallel code for the simulation of the transient, 3D dispersal of volcanic particles in the atmosphere is presented. The model equations, describing the multiphase flow dynamics of gas and solid pyroclasts ejected from the volcanic vent during explosive eruptions, are solved by a finite-volume discretization scheme and a pressure-based iterative non-linear solver suited to compressible multiphase flows. The

T. Esposti Ongaro; C. Cavazzoni; G. Erbacci; A. Neri; M. V. Salvetti

2007-01-01

421

NASA Astrophysics Data System (ADS)

Parallel molecular dynamics (MD) simulations are performed to investigate pressure-induced solid-to-solid structural phase transformations in cadmium selenide (CdSe) nanorods. The effects of the size and shape of nanorods on different aspects of structural phase transformations are studied. Simulations are based on interatomic potentials validated extensively by experiments. Simulations range from 10^5 to 10^6 atoms. These simulations are enabled by highly scalable algorithms executed on massively parallel Beowulf computing architectures. Pressure-induced structural transformations are studied using a hydrostatic pressure medium simulated by atoms interacting via a Lennard-Jones potential. Four single-crystal CdSe nanorods, each 44 Å in diameter but varying in length in the range between 44 Å and 600 Å, are studied independently in two sets of simulations. The first simulation is the downstroke simulation, where each rod is embedded in the pressure medium and subjected to increasing pressure during which it undergoes a forward transformation from a 4-fold coordinated wurtzite (WZ) crystal structure to a 6-fold coordinated rocksalt (RS) crystal structure. In the second so-called upstroke simulation, the pressure on the rods is decreased and a reverse transformation from 6-fold RS to a 4-fold coordinated phase is observed. The transformation pressure in the forward transformation depends on the nanorod size, with longer rods transforming at lower pressures close to the bulk transformation pressure. Spatially-resolved structural analyses, including pair-distributions, atomic-coordinations and bond-angle distributions, indicate nucleation begins at the surface of nanorods and spreads inward. The transformation results in a single RS domain, in agreement with experiments. The microscopic mechanism for transformation is observed to be the same as for bulk CdSe. 
A nanorod size dependency is also found in the reverse structural transformations, with longer nanorods transforming more readily than shorter ones. Nucleation initiates at the center of the rod and grows outward.
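The coordination-number analysis used above to distinguish the 4-fold wurtzite from the 6-fold rocksalt phase amounts to counting neighbours within a cutoff radius. As a minimal sketch, a rocksalt-like simple-cubic arrangement of sites (unit spacing, names assumed) should give interior atoms exactly six nearest neighbours:

```python
import numpy as np

# Count neighbours within a cutoff to classify local coordination,
# as in the spatially-resolved structural analyses described above.
pts = np.array([[i, j, k] for i in range(4)
                          for j in range(4)
                          for k in range(4)], dtype=float)

def coordination(points, cutoff):
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return ((d > 0) & (d < cutoff)).sum(axis=1)

# Cutoff 1.1 admits nearest neighbours (distance 1) but excludes the
# next shell at sqrt(2) ~ 1.414.
coord = coordination(pts, cutoff=1.1)
interior = np.all((pts > 0) & (pts < 3), axis=1)
assert np.all(coord[interior] == 6)   # 6-fold, rocksalt-like coordination
```

In the actual analysis the cutoff is chosen from the first minimum of the pair-distribution function, and the drop from 4-fold to 6-fold counts as pressure increases marks the WZ-to-RS transformation front.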

Lee, Nicholas Jabari Ouma

422

Development of Multi Agent Resource Conversion Processes Model and Simulation System

The mathematical model of multi agent resource conversion processes (RCP) is developed by means of discrete-event simulation systems and expert systems. Within the framework of the mathematical model, a production system for the RCP structure is defined that takes the origin of conflicts into account. The discrete-event simulation system

Konstantin A. Aksyonov; Elena F. Smoliy; Natalia V. Goncharova; Alexey A. Khrenov; Anastasia A. Baronikhina

2006-01-01

423

Creation of Multi Agent Resource Conversion Processes Model and Simulation Modeling System

The mathematical model of multi agent resource conversion processes (RCP) is developed by means of discrete-event simulation systems and expert systems. Within the framework of the mathematical model, a production system for the RCP structure and a rule-transformation structure are defined that take the origin of conflicts into account. The discrete-event simulation system

Konstantin A. Aksyonov; Elena F. Smoliy; Natalia V. Goncharova; Alexey A. Khrenov; Anastasia A. Baronikhina

2006-01-01

424

Spontaneous hot flow anomalies at quasi-parallel shocks: 2. Hybrid simulations

NASA Astrophysics Data System (ADS)