For comprehensive and current results, perform a real-time search at Science.gov.

1

Synchronization Of Parallel Discrete Event Simulations

NASA Technical Reports Server (NTRS)

Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages of each. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.

Steinman, Jeffrey S.

1992-01-01

2

Parallel Discrete Event Simulation of Lyme Disease

Ewa Deelman, Thomas Caraco, and Boleslaw K. ... distribution of Lyme disease, currently the most frequently reported vector-borne disease of humans. Our goal is to understand patterns in the Lyme disease epidemic at the regional scale through studying

Varela, Carlos

3

Synchronous Parallel System for Emulation and Discrete Event Simulation

NASA Technical Reports Server (NTRS)

A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to the state variables of the simulation object attributable to the event object and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.

Steinman, Jeffrey S. (Inventor)

2001-01-01
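The event-horizon bookkeeping described in this patent abstract can be sketched compactly. The fragment below is an illustrative reconstruction, not the patent's implementation; the function names and the plain-list event representation are ours. Each node's local event horizon is the earliest time stamp among its newly generated events, the global event horizon is the minimum over all nodes, and optimistically processed events beyond that horizon must be rolled back using the saved state variables.

```python
def local_horizon(new_event_times):
    # Earliest time stamp among a node's newly generated events;
    # infinity if the node generated nothing this cycle.
    return min(new_event_times, default=float("inf"))

def global_horizon(per_node_new_event_times):
    # Earliest local horizon across all nodes.
    return min(local_horizon(t) for t in per_node_new_event_times)

def split_commit(processed_times, horizon):
    # Events optimistically processed at or before the global horizon
    # are safe to commit; later ones are rolled back (their unchanged
    # state variables restored) and retried in the next cycle.
    commit = [t for t in processed_times if t <= horizon]
    rollback = [t for t in processed_times if t > horizon]
    return commit, rollback
```

For example, with three nodes generating new events at times {5, 9}, {3, 7}, and {8}, the global horizon is 3, so an event processed optimistically at time 6 would be rolled back.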

4

Synchronous parallel system for emulation and discrete event simulation

NASA Technical Reports Server (NTRS)

A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to state variables of the simulation object attributable to the event object, and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring the events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.

Steinman, Jeffrey S. (inventor)

1992-01-01

5

Optimistic Parallel Discrete Event Simulation on a Beowulf Cluster of Multi-core Machines

Optimistic Parallel Discrete Event Simulation on a Beowulf Cluster of Multi-core Machines The trend towards multi-core and many-core CPUs is forever changing the composition of the Beowulf cluster. The modern Beowulf cluster is now a heterogeneous cluster of single core, multi-core, and even many

Wilsey, Philip A.

6

SPaDES/Java: object-oriented parallel discrete-event simulation

We describe the design, implementation and performance optimizations of SPaDES/Java, a process-oriented discrete-event simulation library in Java that supports sequential and parallel simulation. Parallel event synchronization is facilitated through a hybrid carrier-, demand-driven flushing conservative message mechanism. Inter-processor message communication is coordinated by a shared persistent memory implemented using Java Jini/JavaSpaces. We present the stepwise performance optimizations we have carried

Yong Meng Teo; Yew Kwong Ng

2002-01-01

7

library for discrete event simulation, SimPy, was chosen as the foundation for the tool. The developed tool can handle spatial objects such as moving machines, trees and boulders. Support for continuous linear movements was also added, which has resulted in a model that partially overlaps continuous and

Forest Technology; Linus Jundén

2011-01-01

8

Application of Parallel Discrete Event Simulation to the Space Surveillance Network

NASA Astrophysics Data System (ADS)

In this paper we describe how and why we chose parallel discrete event simulation (PDES) as the paradigm for modeling the Space Surveillance Network (SSN) in our modeling framework, TESSA (Testbed Environment for Space Situational Awareness). DES is a simulation paradigm appropriate for systems dominated by discontinuous state changes at times that must be calculated dynamically. It is used primarily for complex man-made systems like telecommunications, vehicular traffic, computer networks, and economic models, although it is also useful for natural systems that are not described by equations, such as particle systems, population dynamics, epidemics, and combat models. It is much less well known than simple time-stepped simulation methods, but has the great advantage of being time-scale independent, so that one can freely mix processes that operate at time scales spanning many orders of magnitude with no runtime performance penalty. In simulating the SSN we model in some detail: (a) the orbital dynamics of up to 10^5 objects, (b) their reflective properties, (c) the ground- and space-based sensor systems in the SSN, (d) the recognition of orbiting objects and determination of their orbits, (e) the cueing and scheduling of sensor observations, (f) the 3-d structure of satellites, and (g) the generation of collision debris. TESSA is thus a mixed continuous-discrete model. But because many different types of discrete objects are involved, with such a wide variation in time scale (milliseconds for collisions, hours for orbital periods), it is suitably described using discrete events. The PDES paradigm is counterintuitive: in any instantaneous runtime snapshot some parts may be far ahead in simulation time while others lag behind, yet the required causal relationships are always maintained and synchronized correctly, exactly as if the simulation were executed sequentially.
The TESSA simulator is custom-built, conservatively synchronized, and designed to scale to thousands of nodes. There are many PDES platforms we might have used, but two requirements led us to build our own. First, the parallel components of our SSN simulation are coded and maintained by separate teams, so TESSA is designed to support transparent coupling and interoperation of separately compiled components written in any of six programming languages. Second, conventional PDES simulators are designed so that while the parallel components run concurrently, each of them is internally sequential, whereas for TESSA we needed to support MPI-based parallelism within each component. The TESSA simulator is still a work in progress and currently has some significant limitations, which the paper also describes.

Jefferson, D.; Leek, J.

2010-09-01

9

We re-examine the problem of load balancing in conservatively synchronized parallel discrete-event simulations executed on high-performance computing clusters, focusing on simulations where computational and messaging load tend to be spatially clustered. Such domains are frequently characterized by the presence of geographic 'hot-spots' - regions that generate significantly more simulation events than others. Examples of such domains include simulations of urban regions, transportation networks, and networks where interaction between entities is often constrained by physical proximity. Noting that in conservatively synchronized parallel simulations the speed of execution is determined by the slowest (i.e., most heavily loaded) simulation process, we study different partitioning strategies for achieving equitable processor-load distribution in domains with spatially clustered load. In particular, we study the effectiveness of partitioning via spatial scattering in achieving optimal load balance. In this partitioning technique, nearby entities are explicitly assigned to different processors, thereby scattering the load across the cluster. This is motivated by two observations, namely, (i) since load is spatially clustered, spatial scattering should, intuitively, spread the load across the compute cluster, and (ii) in parallel simulations, equitable distribution of CPU load is a greater determinant of execution speed than message-passing overhead. Through large-scale simulation experiments - both of abstracted and real simulation models - we observe that scatter partitioning, even with its greatly increased messaging overhead, significantly outperforms more conventional spatial partitioning techniques that seek to reduce messaging overhead. Further, even if hot-spots change over the course of the simulation, load continues to be balanced as long as the underlying feature of spatial clustering is retained, leading us to observe that spatial scattering can often obviate the need for dynamic load balancing.

Thulasidasan, Sunil [Los Alamos National Laboratory]; Kasiviswanathan, Shiva [Los Alamos National Laboratory]; Eidenbenz, Stephan [Los Alamos National Laboratory]; Romero, Philip [Los Alamos National Laboratory]

2010-01-01
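As a concrete illustration of the scatter idea, nearby entities can be dealt out round-robin after a spatial sort, so a geographic hot-spot's load spreads across all ranks. This sketch and its coordinate-tuple entity representation are ours, not the authors' code:

```python
def scatter_partition(entities, n_procs):
    # Sort entities by a spatial key (here simply their coordinates),
    # then deal them out round-robin: spatial neighbours land on
    # different processors, so a hot-spot's events are spread evenly
    # across the cluster at the cost of extra messaging.
    ordered = sorted(entities)
    parts = [[] for _ in range(n_procs)]
    for i, ent in enumerate(ordered):
        parts[i % n_procs].append(ent)
    return parts
```

For a 2x2 grid of entities on 2 processors, `scatter_partition([(0, 0), (0, 1), (1, 0), (1, 1)], 2)` yields `[[(0, 0), (1, 0)], [(0, 1), (1, 1)]]`: each processor receives entities from both rows, so a hot-spot confined to one row still loads both ranks.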

10

Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms

With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly on hardware as has been done traditionally for decades. While mature VM-based parallel systems now offer compelling new benefits such as serviceability, dynamic reconfigurability, and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the poor scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature, in the sense that any fairness-based VM scheduler implementation would exhibit it with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in run time for the PDES-optimized scheduler relative to the regular VM scheduler, with over 20x reduction in the run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high-performance computing installations, highlighting to the community the need for PDES-specific support and the feasibility of significantly reducing the runtime overhead of scalable PDES on VM platforms.

Yoginath, Srikanth B [ORNL]; Perumalla, Kalyan S [ORNL]

2013-01-01
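The core mismatch is that a fairness-based scheduler ignores simulation progress, while PDES throughput is gated by the straggler. One plausible PDES-aware policy - our illustrative sketch, not necessarily the paper's exact algorithm - is to always run the VM whose logical process has the lowest local virtual time:

```python
def pick_next_vm(vms):
    # A fairness-based scheduler rotates through VMs regardless of
    # simulation progress; a PDES-aware policy instead favours the VM
    # furthest behind in virtual time, since that straggler gates
    # global progress (and triggers rollbacks in optimistic runs).
    # `vms` maps VM id -> local virtual time (LVT); hypothetical names.
    return min(vms, key=vms.get)
```

For example, `pick_next_vm({"vm0": 120.0, "vm1": 45.5, "vm2": 300.0})` selects `"vm1"`, the laggard.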

11

SPaDES/Java: Object-Oriented Parallel Discrete-Event Simulation

Operations Research Department, and JADIS [17] from the United States Air Force. ... optimizations of SPaDES/Java, a process-oriented discrete-event simulation library in Java that supports ... message synchronization overhead, and the cost of inter-processor communication. Two benchmark programs

Teo, Yong-Meng

12

NASA Technical Reports Server (NTRS)

The present invention is embodied in a method of performing object-oriented simulation and a system having inter-connected processor nodes operating in parallel to simulate mutual interactions of a set of discrete simulation objects distributed among the nodes as a sequence of discrete events changing state variables of respective simulation objects so as to generate new event-defining messages addressed to respective ones of the nodes. The object-oriented simulation is performed at each one of the nodes by assigning passive self-contained simulation objects to each one of the nodes, responding to messages received at one node by generating corresponding active event objects having user-defined inherent capabilities and individual time stamps and corresponding to respective events affecting one of the passive self-contained simulation objects of the one node, restricting the respective passive self-contained simulation objects to only providing and receiving information from the respective active event objects, requesting information and changing variables within a passive self-contained simulation object by the active event object, and producing corresponding messages specifying events resulting therefrom by the active event objects.

Steinman, Jeffrey S. (Inventor)

1998-01-01

13

SimPy is a Python-based, interpreted simulation tool that offers the power and convenience of Python. It is able to launch processes and sub-processes using generators, which act autonomously and may interact using interrupts. SimPy offers other advantages over competing commercial codes in that it allows for modular development, use of a version control system such as CVS, can be made

V. Castillo

2006-01-01
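The generator mechanism this entry relies on is easy to demonstrate without SimPy itself. The scheduler below is our minimal stand-in (real SimPy wraps this in an Environment/Process API with richer event types and interrupts): each process is a Python generator that yields the delay until its next activation, and the scheduler resumes generators in time-stamp order.

```python
import heapq
import itertools

def run(processes):
    # Minimal process-oriented event loop: each process is a generator
    # yielding the delay until it next wants control. A heap keyed on
    # (wake-up time, insertion order) resumes them deterministically.
    order = itertools.count()
    heap = [(0.0, next(order), p) for p in processes]
    heapq.heapify(heap)
    while heap:
        now, _, proc = heapq.heappop(heap)
        try:
            delay = proc.send(None)   # resume the generator at time `now`
        except StopIteration:
            continue                  # process finished
        heapq.heappush(heap, (now + delay, next(order), proc))

def clock(name, period, ticks, trace):
    # A toy process: record `name`, then sleep `period`, `ticks` times.
    for _ in range(ticks):
        trace.append(name)
        yield period
```

Running `run([clock("fast", 1, 3, trace), clock("slow", 2, 2, trace)])` with an empty `trace` list interleaves the two processes as `['fast', 'slow', 'fast', 'slow', 'fast']`, exactly the autonomous-but-interacting behaviour the abstract describes.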

14

SimPy is a Python-based, interpreted simulation tool that offers the power and convenience of Python. It is able to launch processes and sub-processes using generators, which act autonomously and may interact using interrupts. SimPy offers other advantages over competing commercial codes in that it allows for modular development, use of a version control system such as CVS, can be made

Victor Castillo

2006-01-01

15

Writing parallel, discrete-event simulations in ModSim: Insight and experience

The Time Warp Operating System (TWOS) has been the focus of much research in parallel simulation. A new language, called ModSim, has been developed for use in conjunction with TWOS. The coupling of ModSim and TWOS provides a tool to construct large, complex simulation models that will run on several parallel and distributed computer systems. As part of the "Griffin Project" underway here at Los Alamos National Laboratory, there is strong interest in assessing the coupling of ModSim and TWOS from an application-oriented perspective. To this end, a key component of the Eagle combat simulation has been implemented in ModSim for execution on TWOS. In this paper, brief overviews of ModSim and TWOS are presented. Finally, the compatibility of the computational models presented by the language and the operating system is examined in light of experience gained to date. 18 refs., 4 figs.

Rich, D.O.; Michelsen, R.E.

1989-09-11

16

Distributed discrete-event simulation (ACM Computing Surveys)

Traditional discrete-event simulations employ an inherently sequential algorithm. In practice, simulations of large systems are limited by this sequentiality, because only a modest number of events can be simulated. Distributed discrete-event simulation (carried out on a network of processors with asynchronous message-communicating capabilities) is proposed as an alternative; it may provide better performance by partitioning the simulation among the component

Jayadev Misra

1986-01-01

17

Distributed discrete event simulation. Final report

The presentation given here is restricted to discrete event simulation. The complexity of and time required for many present and potential discrete simulations exceed the reasonable capacity of most present serial computers. The desire, then, is to implement the simulations on a parallel machine. However, certain problems arise in an effort to program the simulation on a parallel machine. In one category of methods, deadlock can arise, and some method is required either to detect deadlock and recover from it or to avoid deadlock through information passing. In the second category of methods, potentially incorrect simulations are allowed to proceed. If the situation is later determined to be incorrect, recovery from the error must be initiated. In either case, computation and information passing are required that would not be required in a serial implementation. The net effect is that the parallel simulation may not be much better than a serial simulation. In an effort to determine alternate approaches, important papers in the area were reviewed. As part of that review process, each of the papers was summarized. The summary of each paper is presented in this report in the hope that those doing future work in the area will be able to gain insight that might not otherwise be available, and to aid in deciding which papers would be most beneficial to pursue in more detail. The papers are broken down into categories and then by author. Conclusions reached after examining the papers and other material, such as direct talks with an author, are presented in the last section. Also presented there are some ideas that surfaced late in the research effort. These promise to be of some benefit in limiting the information which must be passed between processes and in better understanding the structure of a distributed simulation. Pursuit of these ideas seems appropriate.
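The information-passing escape from deadlock that this report surveys is typified by Chandy-Misra-Bryant null messages. Below is a hedged sketch of the core bound in our own formulation, with a plain heap standing in for a process's pending-event queue:

```python
import heapq

def safe_events(pending, channel_clocks):
    # A conservative logical process may handle any event whose time
    # stamp does not exceed the minimum time promised on its input
    # channels. `pending` is a heapified list of event time stamps;
    # `channel_clocks` holds the latest promise seen on each channel.
    safe = min(channel_clocks)
    done = []
    while pending and pending[0] <= safe:
        done.append(heapq.heappop(pending))
    return done

def null_message_time(local_clock, lookahead):
    # A null message promises neighbours that no event earlier than
    # local_clock + lookahead will arrive, even when no real event is
    # sent; circulating these promises advances the channel clocks and
    # breaks the waiting cycles that would otherwise deadlock.
    return local_clock + lookahead
```

With pending events at times 1, 2, and 6 and channel clocks of 4 and 5, events 1 and 2 are safe to process; event 6 must wait for further (possibly null) messages to raise the channel clocks.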

De Vries, R.C. [Univ. of New Mexico, Albuquerque, NM (United States). EECE Dept.]

1988-02-01

18

A literature survey on distributed discrete event simulation

Much literature over the past decade has examined using multiprocessors to increase the speed and lower the cost of discrete event simulation. Three orthogonal approaches have been suggested, using simulation parallelism in support functions, in model functions and on the application level. This overview brings together these past approaches into a new framework wherein all three can be used simultaneously

Fred J. Kaudel

1987-01-01

19

We describe the various aspects involved in building FastTrans, a scalable, parallel microsimulator for transportation networks that can simulate and route tens of millions of vehicles on real-world road networks in a fraction of real time. Vehicular trips are generated using agent-based simulations that provide realistic, daily activity schedules for a synthetic population of millions of intelligent agents. We use

Sunil Thulasidasan; Shiva Kasiviswanathan; Stephan Eidenbenz; Emanuele Galli; Susan M. Mniszewski; Phillip Romero

2009-01-01

20

Performance bounds on parallel self-initiating discrete-event simulations

NASA Technical Reports Server (NTRS)

The use is considered of massively parallel architectures to execute discrete-event simulations of what is termed self-initiating models. A logical process in a self-initiating model schedules its own state re-evaluation times, independently of any other logical process, and sends its new state to other logical processes following the re-evaluation. The interest is in the effects of that communication on synchronization. The performance is considered of various synchronization protocols by deriving upper and lower bounds on optimal performance, upper bounds on Time Warp's performance, and lower bounds on the performance of a new conservative protocol. The analysis of Time Warp includes the overhead costs of state-saving and rollback. The analysis points out sufficient conditions for the conservative protocol to outperform Time Warp. The analysis also quantifies the sensitivity of performance to message fan-out, lookahead ability, and the probability distributions underlying the simulation.

Nicol, David M.

1990-01-01

21

Discrete event simulation in the artificial intelligence environment

Discrete Event Simulations performed in an Artificial Intelligence (AI) environment provide benefits in two major areas. The productivity provided by Object Oriented Programming, Rule Based Programming, and AI development environments allows simulations to be developed and maintained more efficiently than conventional environments allow. Secondly, the use of AI techniques allows direct simulation of human decision making processes and Command and Control aspects of a system under study. An introduction to AI techniques is presented. Two discrete event simulations produced in these environments are described. Finally, a software engineering methodology is discussed that allows simulations to be designed for use in these environments. 3 figs.

Egdorf, H.W.; Roberts, D.J.

1987-01-01

22

Enhancing Discrete Event Network Simulators with Analytical Network Cloud Models

... models for network clouds, which are much more efficient, if less accurate, than node-by-node models ... suitable analytical network cloud models, and also describe how these models can be implemented in the ns

Braun, Torsten

23

Optimization of Operations Resources via Discrete Event Simulation Modeling

NASA Technical Reports Server (NTRS)

The resource levels required for operation and support of reusable launch vehicles are typically defined through discrete event simulation modeling. Minimizing these resources constitutes an optimization problem involving discrete variables and simulation. Conventional approaches to solve such optimization problems involving integer valued decision variables are the pattern search and statistical methods. However, in a simulation environment that is characterized by search spaces of unknown topology and stochastic measures, these optimization approaches often prove inadequate. In this paper, we have explored the applicability of genetic algorithms to the simulation domain. Genetic algorithms provide a robust search strategy that does not require continuity and differentiability of the problem domain. The genetic algorithm successfully minimized the operation and support activities for a space vehicle, through a discrete event simulation model. The practical issues associated with simulation optimization, such as stochastic variables and constraints, were also taken into consideration.

Joshi, B.; Morris, D.; White, N.; Unal, R.

1996-01-01

24

Multi-threaded, discrete event simulation of distributed computing systems

NASA Astrophysics Data System (ADS)

The LHC experiments have envisaged computing systems of unprecedented complexity, for which it is necessary to provide a realistic description and modeling of data access patterns and of many jobs running concurrently on large-scale distributed systems and exchanging very large amounts of data. A process-oriented approach to discrete event simulation is well suited to describing various activities running concurrently, as well as the stochastic arrival patterns specific to this type of simulation. Threaded objects, or "Active Objects", can provide a natural way to map the specific behaviour of distributed data processing into the simulation program. The simulation tool developed within MONARC is based on Java (TM) technology, which provides adequate tools for developing a flexible and distributed process-oriented simulation. Proper graphics tools, and ways to analyze data interactively, are essential in any simulation project. The design elements, status, and features of the MONARC simulation tool are presented. The program allows realistic modeling of complex data access patterns by multiple concurrent users in large-scale computing systems across a wide range of possible architectures, from centralized to highly distributed. A comparison between queuing theory and realistic client-server measurements is also presented.

Legrand, Iosif; MONARC Collaboration

2001-10-01

25

Metrics for Availability Analysis Using a Discrete Event Simulation Method

The system performance metric 'availability' is a central concept with respect to the concerns of a plant's operators and owners, yet it can be abstract enough to resist explanation at system levels. Hence, there is a need for a system-level metric more closely aligned with a plant's (or, more generally, a system's) raison d'etre. Historically, availability of repairable systems - intrinsic, operational, or otherwise - has been defined as a ratio of times. This paper introduces a new concept of availability, called endogenous availability, defined in terms of a ratio of quantities of product yield. Endogenous availability can be evaluated using a discrete event simulation analysis methodology. A simulation example shows that endogenous availability reduces to conventional availability in a simple series system with different processing rates and without intermediate storage capacity, but diverges from conventional availability when storage capacity is progressively increased. It is shown that conventional availability tends to be conservative when a design includes features, such as in-process storage, that partially decouple the components of a larger system.

Schryver, Jack C [ORNL]; Nutaro, James J [ORNL]; Haire, Marvin Jonathan [ORNL]

2012-01-01

26

Predicting Liver Transplant Capacity Using Discrete Event Simulation.

The number of liver transplants (LTs) performed in the US increased until 2006 but has since declined despite an ongoing increase in demand. This decline may be due in part to decreased donor liver quality and increasing discard of poor-quality livers. We constructed a discrete event simulation (DES) model informed by current donor characteristics to predict future LT trends through the year 2030. The data source for our model is the United Network for Organ Sharing database, which contains patient-level information on all organ transplants performed in the US. Previous analysis showed that liver discard is increasing and that discarded organs are more often from donors who are older, are obese, have diabetes, and donated after cardiac death. Given that the prevalence of these factors is increasing, the DES model quantifies the reduction in the number of LTs performed through 2030. In addition, the model estimates the total number of future donors needed to maintain the current volume of LTs and the effect of a hypothetical scenario of improved reperfusion technology. We also forecast the number of patients on the waiting list and compare this with the estimated number of LTs to illustrate the impact that decreased LTs will have on patients needing transplants. By altering assumptions about the future donor pool, this model can be used to develop policy interventions to prevent a further decline in this lifesaving therapy. To our knowledge, there are no similar predictive models of future LT use based on epidemiological trends. PMID:25391681

Toro-Diaz, Hector; Mayorga, Maria E; Barritt, A Sidney; Orman, Eric S; Wheeler, Stephanie B

2014-11-12

27

Dessert, an Open-Source .NET Framework for Process-Based Discrete-Event Simulation

... for process-based discrete-event simulation, designed to retain the simplicity and flexibility of SimPy ... optimizations; indeed, benchmarks show that our Dessert outperforms SimPy. ... SimPy [1] [2], which exploits Python generators [3], a special form of coroutines [4], for writing

Robbiano, Lorenzo

28

Parallel discrete event simulation with predictors

t_F: single event processing time; t_CF: CPU time for single message formulation; t_CT: CPU time for single message transmission; t_CR: CPU time for single message reception; t_LM: link time for message transmission; t_LV: link time...

Gummadi, Vidya

1995-01-01

29

Discrete event simulation and production system design for Rockwell hardness test blocks

The research focuses on increasing production volume and decreasing costs at a hardness test block manufacturer. A discrete event simulation model is created to investigate potential system-wide improvements. Using the ...

Scheinman, David Eliot

2009-01-01

30

Discrete-event based simulation conceptual modeling of systems biology

The production of protein from DNA via RNA is a very complicated process, known as the central dogma. In this paper, we used event-based simulation to model, simulate, analyze, and specify the three main processes involved in protein production: replication, transcription, and translation. The whole control flow of event-based simulation is composed

Joe W. Yeol; Issac Barjis; Yeong S. Ryu; Joseph Barjis

2005-01-01

31

Discrete-event simulation on the World Wide Web using Java

This paper introduces Simkit, a small set of Java classes for creating discrete event simulation models. Simkit may be used to either implement stand-alone models or Web page applets. Exploiting network capabilities of Java, the lingua franca of the World Wide Web (WWW), Simkit models can easily be implemented as applets and executed in a Web browser. Java's graphical capabilities

Arnold H. Buss; Kirk A. Stork

1996-01-01

32

DISCRETE EVENT SIMULATION OF OPTICAL SWITCH MATRIX PERFORMANCE IN COMPUTER NETWORKS

In this paper, we present the application of a Discrete Event Simulator (DES) for performance modeling of optical switching devices in computer networks. Network simulators are valuable tools in situations where one cannot investigate the system directly. This situation may arise if the system under study does not yet exist or if the cost of studying the system directly is prohibitive. Most available network simulators are based on the paradigm of discrete-event-based simulation. As computer networks become increasingly large and complex, sophisticated DES tool chains have become available for both commercial and academic research. Some well-known simulators are NS2, NS3, OPNET, and OMNEST. For this research, we have applied OMNEST to simulate the multi-wavelength performance of optical switch matrices in computer interconnection networks. Our results suggest that the application of DES to computer interconnection networks provides valuable insight into device performance and aids in topology and system optimization.

Imam, Neena [ORNL]; Poole, Stephen W [ORNL]

2013-01-01

33

Integration of the FreeBSD TCP/IP Stack into the Discrete Event Simulator OMNeT++

The discrete event simulator OMNeT++, that is programmed in C++, shows a steady growing popularity. Due to its well-structured nature, it is easy to understand and easy to use. A shortcoming of it, however, is the limited number of available simulation models. Especially, for network simulations a validated TCP implementation was missing. In order to avoid a re-implementation of a

Roland Bless; Mark Doll

2004-01-01

34

SimBA: A Discrete Event Simulator for Performance Prediction of Volunteer Computing Projects

SimBA (Simulator of BOINC Applications) is a discrete event simulator that models the main functions of BOINC, which is a well-known framework used in Volunteer Computing (VC) projects. SimBA simulates the generation and distribution of tasks that are executed in a highly volatile, heterogeneous, and distributed environment as well as the collection and validation of completed tasks. To

Michela Taufer; Andre Kerstens; Trilce Estrada; David A. Flores; Patricia J. Teller

2007-01-01

35

A New Approach to Modeling Physical Systems: Discrete Event Simulations of Grid-based Models

The traditional technique to simulate physical systems modeled by partial differential equations is by means of a time-stepped methodology where the state of the system is updated at regular discrete time intervals. This method has inherent inefficiencies. In contrast, we propose a new asynchronous type of simulation based on a discrete-event-driven (as opposed to time-driven) approach, where the state of

H. Karimabadi; Y. Omelchenko; J. Driscoll; N. Omidi; R. Fujimoto; S. Pande; K. S. Perumalla

36

Discrete event model-based simulation for train movement on a single-line railway

NASA Astrophysics Data System (ADS)

The aim of this paper is to present a discrete event model-based approach to simulate train movement, taking the energy-saving factor into consideration. We conduct extensive case studies to show the dynamic characteristics of the traffic flow and demonstrate the effectiveness of the proposed approach. The simulation results indicate that the proposed discrete event model-based simulation approach is suitable for characterizing the movements of a group of trains on a single railway line with fewer iterations and less CPU time. Additionally, some other qualitative and quantitative characteristics are investigated. In particular, because of the cumulative influence from the previous trains, the following trains should be accelerated or braked frequently to control the headway distance, leading to more energy consumption.

Xu, Xiao-Ming; Li, Ke-Ping; Yang, Li-Xing

2014-08-01

37

Revisiting the Issue of Performance Enhancement of Discrete Event Simulation Software

New approaches are considered for performance enhancement of discrete-event simulation software. Instead of taking a purely algorithmic analysis view, we supplement algorithmic considerations with a focus on system factors such as compiler/interpreter efficiency, hybrid interpreted/compiled code, virtual and cache memory issues, and so on. The work here consists of a case study of the SimPy language, in which we

Alex Bahouth; Steven Crites; Norman Matloff; Todd Williamson

2007-01-01

38

A General Purpose Discrete Event Simulator

Homi Bodhanwala; Luis Miguel Campos; Calvin Chai; Chris …

…of change, multi-person development and testing and allow for component-based simulations to be built

Scherson, Isaac D.

39

The Fix-Point Method for Discrete Events Simulation Using SQL and UDF

In this work we focus on leveraging SQL's expressive power and the query engine's data processing capability for large-scale discrete event simulation.

The challenge of using SQL queries for discrete event simulation is that the simulation involves concurrent operations where the past events, the future events and the pending events (not fully instantiated yet) must be taken into account; but

Qiming Chen; Meichun Hsu; Bin Zhang

40

Modeling and Solution Issues in Discrete Event Simulation

Manufacturing systems: design/operation (planning, scheduling); supply chain management; logistics. Simulation systems: SIMAN V, SIMSCRIPT II.5, SLAM II, Enterprise Dynamics (http://www.incontrol.nl/), SIMUL8

Grossmann, Ignacio E.

41

In most decision-analytic models in health care, it is assumed that there is treatment without delay and availability of all required resources. Therefore, waiting times caused by limited resources and their impact on treatment effects and costs often remain unconsidered. Queuing theory enables mathematical analysis and the derivation of several performance measures of queuing systems. Nevertheless, an analytical approach with closed formulas is not always possible. Therefore, simulation techniques are used to evaluate systems that include queuing or waiting, for example, discrete event simulation. To include queuing in decision-analytic models requires a basic knowledge of queuing theory and of the underlying interrelationships. This tutorial introduces queuing theory. Analysts and decision-makers get an understanding of queue characteristics, modeling features, and its strength. Conceptual issues are covered, but the emphasis is on practical issues like modeling the arrival of patients. The treatment of coronary artery disease with percutaneous coronary intervention including stent placement serves as an illustrative queuing example. Discrete event simulation is applied to explicitly model resource capacities, to incorporate waiting lines and queues in the decision-analytic modeling example. PMID:20345550

Jahn, Beate; Theurl, Engelbert; Siebert, Uwe; Pfeiffer, Karl-Peter

2010-01-01
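The patient-arrival modeling emphasized in this queuing tutorial can be illustrated with a minimal single-server (M/M/1-style) queue simulation using Lindley's recursion with exponential interarrival and service times. The rates, patient count, and function name below are illustrative assumptions, not values from the paper.

```python
import random

def simulate_mm1(arrival_rate, service_rate, n_patients, seed=42):
    """Simulate a single-server FIFO queue with exponential interarrival
    and service times; return the mean wait (arrival to start of service)."""
    rng = random.Random(seed)

    # Exponential interarrival times -> absolute arrival times of patients.
    t = 0.0
    arrivals = []
    for _ in range(n_patients):
        t += rng.expovariate(arrival_rate)
        arrivals.append(t)

    server_free_at = 0.0
    waits = []
    for arrival in arrivals:
        start = max(arrival, server_free_at)   # wait if the server is busy
        waits.append(start - arrival)
        server_free_at = start + rng.expovariate(service_rate)
    return sum(waits) / len(waits)

# Utilization rho = 0.8: queues build up and waiting becomes substantial.
mean_wait = simulate_mm1(arrival_rate=0.8, service_rate=1.0, n_patients=50_000)
print(f"mean wait: {mean_wait:.2f}")
```

For these rates, queuing theory predicts a mean wait near lambda / (mu * (mu - lambda)) = 4.0, which the simulation approaches as the patient count grows.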

42

A Framework for the Optimization of Discrete-Event Simulation Models

NASA Technical Reports Server (NTRS)

With the growing use of computer modeling and simulation in all aspects of engineering, the scope of traditional optimization has to be extended to include simulation models. Some unique aspects have to be addressed when optimizing via stochastic simulation models. The optimization procedure has to explicitly account for the randomness inherent in the stochastic measures predicted by the model. This paper outlines a general-purpose framework for optimization of terminating discrete-event simulation models. The methodology combines a chance-constraint approach for problem formulation with standard statistical estimation and analysis techniques. The applicability of the optimization framework is illustrated by minimizing the operation and support resources of a launch vehicle through a simulation model.

Joshi, B. D.; Unal, R.; White, N. H.; Morris, W. D.

1996-01-01

43

DeMO: An Ontology for Discrete-event Modeling and Simulation

Several fields have created ontologies for their subdomains. For example, the biological sciences have developed extensive ontologies such as the Gene Ontology, which is considered a great success. Ontologies could provide similar advantages to the Modeling and Simulation community. They provide a way to establish common vocabularies and capture knowledge about a particular domain with community-wide agreement. Ontologies can support significantly improved (semantic) search and browsing, integration of heterogeneous information sources, and improved knowledge discovery capabilities. This paper discusses the design and development of an ontology for Modeling and Simulation called the Discrete-event Modeling Ontology (DeMO), and it presents prototype applications that demonstrate various uses and benefits that such an ontology may provide to the Modeling and Simulation community. PMID:22919114

Silver, Gregory A; Miller, John A; Hybinette, Maria; Baramidze, Gregory; York, William S

2011-01-01

44

Incorporating discrete event simulation into quality improvement efforts in health care systems.

Quality improvement (QI) efforts are an indispensable aspect of health care delivery, particularly in an environment of increasing financial and regulatory pressures. The ability to test predictions of proposed changes to flow, policy, staffing, and other process-level changes using discrete event simulation (DES) has shown significant promise and is well reported in the literature. This article describes how to incorporate DES into QI departments and programs in order to support QI efforts, develop high-fidelity simulation models, conduct experiments, make recommendations, and support adoption of results. The authors describe how DES-enabled QI teams can partner with clinical services and administration to plan, conduct, and sustain QI investigations. PMID:24324280

Rutberg, Matthew Harris; Wenczel, Sharon; Devaney, John; Goldlust, Eric Jonathan; Day, Theodore Eugene

2015-01-01

45

NASA Technical Reports Server (NTRS)

CONFIG is a modeling and simulation tool prototype for analyzing the normal and faulty qualitative behaviors of engineered systems. Qualitative modeling and discrete-event simulation have been adapted and integrated, to support early development, during system design, of software and procedures for management of failures, especially in diagnostic expert systems. Qualitative component models are defined in terms of normal and faulty modes and processes, which are defined by invocation statements and effect statements with time delays. System models are constructed graphically by using instances of components and relations from object-oriented hierarchical model libraries. Extension and reuse of CONFIG models and analysis capabilities in hybrid rule- and model-based expert fault-management support systems are discussed.

Malin, Jane T.; Basham, Bryan D.

1989-01-01

46

Statistical and Probabilistic Extensions to Ground Operations' Discrete Event Simulation Modeling

NASA Technical Reports Server (NTRS)

NASA's human exploration initiatives will invest in technologies, public/private partnerships, and infrastructure, paving the way for the expansion of human civilization into the solar system and beyond. As it has been for the past half century, the Kennedy Space Center will be the embarkation point for humankind's journey into the cosmos. Functioning as a next generation space launch complex, Kennedy's launch pads, integration facilities, processing areas, launch and recovery ranges will bustle with the activities of the world's space transportation providers. In developing this complex, KSC teams work through the potential operational scenarios: conducting trade studies, planning and budgeting for expensive and limited resources, and simulating alternative operational schemes. Numerous tools, among them discrete event simulation (DES), were matured during the Constellation Program to conduct such analyses with the purpose of optimizing the launch complex for maximum efficiency, safety, and flexibility while minimizing life cycle costs. Discrete event simulation is a computer-based modeling technique for complex and dynamic systems where the state of the system changes at discrete points in time and whose inputs may include random variables. DES is used to assess timelines and throughput, and to support operability studies and contingency analyses. It is applicable to any space launch campaign and informs decision-makers of the effects of varying numbers of expensive resources and the impact of off-nominal scenarios on measures of performance. In order to develop representative DES models, methods were adopted, exploited, or created to extend traditional uses of DES. The Delphi method was adopted and utilized for task duration estimation. DES software was exploited for probabilistic event variation. A roll-up process was developed and used to reuse models and model elements in other, less-detailed models. The DES team continues to innovate and expand DES capabilities to address KSC's planning needs.

Trocine, Linda; Cummings, Nicholas H.; Bazzana, Ashley M.; Rychlik, Nathan; LeCroy, Kenneth L.; Cates, Grant R.

2010-01-01
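The DES definition in this abstract (system state changing at discrete points in time, with random inputs) can be sketched as a classic event-list loop. The task names and triangular-duration parameters below are invented for illustration and are not taken from the KSC models.

```python
import heapq
import random

def run_des(seed=1):
    """Classic event-list loop: pop the earliest event, advance the
    simulation clock to it, and record the state change."""
    rng = random.Random(seed)
    events = []      # min-heap of (time, event_name)
    completed = []
    clock = 0.0

    # Schedule three serial processing tasks with random (triangular)
    # durations, each starting when the previous one finishes.
    start = 0.0
    for name in ("stack", "integrate", "launch"):
        duration = rng.triangular(1.0, 5.0, 2.0)  # low, high, mode
        heapq.heappush(events, (start + duration, name))
        start += duration

    while events:
        clock, name = heapq.heappop(events)  # jump to the next event time
        completed.append(name)

    return clock, completed

total_time, order = run_des()
print(order, round(total_time, 2))
```

Because durations are random variables, repeating the run with different seeds yields a distribution of campaign completion times, which is how DES supports the throughput and contingency analyses described above.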

47

Developing Flexible Discrete Event Simulation Models in an Uncertain Policy Environment

NASA Technical Reports Server (NTRS)

On February 1st, 2010 U.S. President Barack Obama submitted to Congress his proposed budget request for Fiscal Year 2011. This budget included significant changes to the National Aeronautics and Space Administration (NASA), including the proposed cancellation of the Constellation Program. This change proved to be controversial, and Congressional approval of the program's official cancellation would take many months to complete. During this same period an end-to-end discrete event simulation (DES) model of Constellation operations was being built through the joint efforts of Productivity Apex Inc. (PAI) and Science Applications International Corporation (SAIC) teams under the guidance of NASA. The uncertainty regarding the Constellation program presented a major challenge to the DES team: it had to continue developing this program-of-record simulation while at the same time remaining prepared for possible changes to the program. This required the team to rethink how it would develop its model and make it flexible enough to support possible future vehicles while at the same time being specific enough to support the program-of-record. This challenge was compounded by the fact that the model was being developed through the traditional DES process orientation, which lacks the flexibility of object-oriented approaches. The team met this challenge through significant pre-planning that led to the "modularization" of the model's structure by identifying what was generic, finding natural logic break points, and standardizing the interlogic numbering system. The outcome of this work was a model that not only was ready to be easily modified to support any future rocket program, but was also extremely structured and organized in a way that facilitated rapid verification. This paper discusses in detail the process the team followed to build this model and the many advantages this method provides builders of traditional process-oriented discrete event simulations.

Miranda, David J.; Fayez, Sam; Steele, Martin J.

2011-01-01

48

The impact of inpatient boarding on ED efficiency: a discrete-event simulation study.

In this study, a discrete-event simulation approach was used to model the Emergency Department's (ED) patient flow to investigate the effect of inpatient boarding on ED efficiency in terms of the National Emergency Department Crowding Scale (NEDOCS) score and the rate of patients who leave without being seen (LWBS). The decision variable in this model was the boarder-released-ratio, defined as the ratio of admitted patients whose boarding time is zero to all admitted patients. Our analysis shows that the Overcrowded(+) (a NEDOCS score over 100) ratio decreased from 88.4% to 50.4%, and the rate of LWBS patients decreased from 10.8% to 8.4% when the boarder-released-ratio changed from 0% to 100%. These results show that inpatient boarding significantly impacts both the NEDOCS score and the rate of LWBS patients, and this analysis provides a quantification of the impact of boarding on emergency department patient crowding. PMID:20703616

Bair, Aaron E; Song, Wheyming T; Chen, Yi-Chun; Morris, Beth A

2010-10-01

49

NASA Astrophysics Data System (ADS)

Sudden Cardiac Death (SCD) is responsible for at least 180,000 deaths a year and incurs an average cost of $286 billion annually in the United States alone. Herein, we present a novel discrete event simulation model of SCD, which quantifies the chains of events associated with the formation, growth, and rupture of atheroma plaques, and the subsequent formation of clots, thrombosis and on-set of arrhythmias within a population. The predictions generated by the model are in good agreement both with results obtained from pathological examinations on the frequencies of three major types of atheroma, and with epidemiological data on the prevalence and risk of SCD. These model predictions allow for identification of interventions and importantly for the optimal time of intervention leading to high potential impact on SCD risk reduction (up to 8-fold reduction in the number of SCDs in the population) as well as the increase in life expectancy.

Andreev, Victor P.; Head, Trajen; Johnson, Neil; Deo, Sapna K.; Daunert, Sylvia; Goldschmidt-Clermont, Pascal J.

2013-05-01

50

Sudden Cardiac Death (SCD) is responsible for at least 180,000 deaths a year and incurs an average cost of $286 billion annually in the United States alone. Herein, we present a novel discrete event simulation model of SCD, which quantifies the chains of events associated with the formation, growth, and rupture of atheroma plaques, and the subsequent formation of clots, thrombosis and on-set of arrhythmias within a population. The predictions generated by the model are in good agreement both with results obtained from pathological examinations on the frequencies of three major types of atheroma, and with epidemiological data on the prevalence and risk of SCD. These model predictions allow for identification of interventions and importantly for the optimal time of intervention leading to high potential impact on SCD risk reduction (up to 8-fold reduction in the number of SCDs in the population) as well as the increase in life expectancy. PMID:23648451

Andreev, Victor P.; Head, Trajen; Johnson, Neil; Deo, Sapna K.; Daunert, Sylvia; Goldschmidt-Clermont, Pascal J.

2013-01-01

51

A Formal Framework for Stochastic Discrete Event System Specification Modeling and Simulation

We introduce an extension of the classic Discrete Event System Specification (DEVS) formalism that includes stochastic features. Based on the use of probability space theory, we define the stochastic DEVS (STDEVS) specification, which provides a formal framework for modeling and simulation of general non-deterministic discrete event systems. The main theoretical properties of the STDEVS framework are treated, including

Rodrigo Castro; Ernesto Kofman; Gabriel A. Wainer

2010-01-01
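The classic DEVS structure that STDEVS extends specifies each atomic model by a time-advance function, external and internal transition functions, and an output function. The idle/busy processor below is a common textbook illustration of that structure, not the stochastic STDEVS extension itself, and all names are invented for the sketch.

```python
class AtomicDEVS:
    """Minimal classic-DEVS atomic model: a processor that is 'idle'
    until a job arrives, then 'busy' for a fixed service time."""

    def __init__(self, service_time=2.0):
        self.service_time = service_time
        self.phase = "idle"

    def time_advance(self):
        # ta(s): how long the model remains in the current state.
        return self.service_time if self.phase == "busy" else float("inf")

    def ext_transition(self, elapsed, event):
        # delta_ext: react to an input event (a job arrival).
        if self.phase == "idle" and event == "job":
            self.phase = "busy"

    def int_transition(self):
        # delta_int: fires when time_advance expires.
        self.phase = "idle"

    def output(self):
        # lambda(s): emitted just before the internal transition.
        return "done"


# Drive the model by hand: idle -> busy -> (after 2.0 time units) -> idle.
m = AtomicDEVS()
assert m.time_advance() == float("inf")   # idle is a passive state
m.ext_transition(elapsed=0.0, event="job")
assert m.phase == "busy" and m.time_advance() == 2.0
out = m.output()
m.int_transition()
assert out == "done" and m.phase == "idle"
```

STDEVS, roughly speaking, replaces the deterministic transition functions with transitions governed by probability spaces, so that the next state is drawn from a distribution rather than fixed.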

52

This study used discrete event simulation to model the personnel recruiting process for a U.S. Army recruiting company. Actual data from the company was collected and used to build the simulation model. The model is run under various conditions...

Fancher, Robert H.

1997-01-01

53

Koala: A Discrete-Event Simulation Model of Infrastructure Clouds

Koala is a discrete-event simulation model, written in SLX, that facilitates investigation of global behavior throughout a single IaaS cloud. Koala scales. Koala is based loosely on the Amazon Elastic Compute Cloud (EC2) and on the Eucalyptus open

54

Discrete event simulation tool for analysis of qualitative models of continuous processing systems

NASA Technical Reports Server (NTRS)

An artificial intelligence design and qualitative modeling tool is disclosed for creating computer models and simulating continuous activities, functions, and/or behavior using developed discrete event techniques. Conveniently, the tool is organized in four modules: library design module, model construction module, simulation module, and experimentation and analysis. The library design module supports the building of library knowledge including component classes and elements pertinent to a particular domain of continuous activities, functions, and behavior being modeled. The continuous behavior is defined discretely with respect to invocation statements, effect statements, and time delays. The functionality of the components is defined in terms of variable cluster instances, independent processes, and modes, further defined in terms of mode transition processes and mode dependent processes. Model construction utilizes the hierarchy of libraries and connects them with appropriate relations. The simulation executes a specialized initialization routine and executes events in a manner that includes selective inherency of characteristics through a time and event schema until the event queue in the simulator is emptied. The experimentation and analysis module supports analysis through the generation of appropriate log files and graphics developments and includes the ability of log file comparisons.

Malin, Jane T. (inventor); Basham, Bryan D. (inventor); Harris, Richard A. (inventor)

1990-01-01

55

Discrete Event Simulation of QoS of a SCADA System Interconnecting a Power Grid and a Telco Network

Quality indicators of the Fault Isolation and System Restoration (FISR) service delivered by the SCADA system are computed, discussed and correlated to quality indicators of power supplied to customers. In delivering the FISR service, the SCADA system

Paris-Sud XI, UniversitÃ© de

56

Discrete event simulation for healthcare organizations: a tool for decision making.

Healthcare organizations face challenges in efficiently accommodating increased patient demand with limited resources and capacity. The modern reimbursement environment prioritizes the maximization of operational efficiency and the reduction of unnecessary costs (i.e., waste) while maintaining or improving quality. As healthcare organizations adapt, significant pressures are placed on leaders to make difficult operational and budgetary decisions. In lieu of hard data, decision makers often base these decisions on subjective information. Discrete event simulation (DES), a computerized method of imitating the operation of a real-world system (e.g., healthcare delivery facility) over time, can provide decision makers with an evidence-based tool to develop and objectively vet operational solutions prior to implementation. DES in healthcare commonly focuses on (1) improving patient flow, (2) managing bed capacity, (3) scheduling staff, (4) managing patient admission and scheduling procedures, and (5) using ancillary resources (e.g., labs, pharmacies). This article describes applicable scenarios, outlines DES concepts, and describes the steps required for development. An original DES model developed to examine crowding and patient flow for staffing decision making at an urban academic emergency department serves as a practical example. PMID:23650696

Hamrock, Eric; Paige, Kerrie; Parks, Jennifer; Scheulen, James; Levin, Scott

2013-01-01

57

DiMSim: a discrete-event simulator of metabolic networks.

A novel, scalable, quantitative, discrete-event simulator of metabolic and more general reaction pathways, DiMSim, has been developed. Rather than being modeled by systems of differential equations, metabolic pathways are viewed as bipartite graphs consisting of metabolites and reactions, linked by unidirectional or bidirectional arcs, and fluxes of metabolites emerge as the product of flows of the metabolites through the individual reactions. If required, DiMSim is able to model reactions involving single molecules up to molar concentrations, so it is able to cope with the special characteristics of biochemical systems, including reversible reactions and discontinuous behavior (e.g. due to competition between reactions for limited quantities of reactants, product or allosteric inhibition) and highly nonlinear behavior (e.g. due to cascades). It is also able to model membrane-bound compartments and the channels used to transport metabolites between them (both passive diffusion and active transport). While Michaelis-Menten kinetics is supported, DiMSim makes almost no assumptions other than each reaction having a fixed stoichiometry and that each reaction takes a stated amount of time. PMID:12767160

Xia, Xiao-Qin; Wise, Michael J

2003-01-01

58

Tutorial: Parallel Simulation on Supercomputers

This tutorial introduces typical hardware and software characteristics of extant and emerging supercomputing platforms, and presents issues and solutions in executing large-scale parallel discrete event simulation scenarios on such high performance computing systems. Covered topics include synchronization, model organization, example applications, and observed performance from illustrative large-scale runs.

Perumalla, Kalyan S [ORNL

2012-01-01

59

Flexi-Cluster: A Simulator for a Single Compute Cluster

Flexi-Cluster is a flexible, discrete-event simulation model for a single compute cluster, such as might be deployed within a compute grid. The model addresses management and scheduling within a single compute cluster. The key innovation in the model is to permit users

60

Discrete Event Simulation of manufacturing systems has become widely accepted as an important tool to aid the design of such systems. Often, however, it is applied by practitioners in a manner which largely ignores an important element of industry; namely, the workforce. Workers are usually represented as simple resources, often with deterministic performance values. This approach ignores the potentially large

Tim Baines; Linda Hadfield; Steve Mason; John Ladbrook

2003-01-01

61

NASA Technical Reports Server (NTRS)

While the ability to model the state of a space system over time is essential during spacecraft operations, the use of time-based simulations remains rare in preliminary design. The absence of the time dimension in most traditional early design tools can however become a hurdle when designing complex systems whose development and operations can be disrupted by various events, such as delays or failures. As the value delivered by a space system is highly affected by such events, exploring the trade space for designs that yield the maximum value calls for the explicit modeling of time. This paper discusses the use of discrete-event models to simulate spacecraft development schedule as well as operational scenarios and on-orbit resources in the presence of uncertainty. It illustrates how such simulations can be utilized to support trade studies, through the example of a tool developed for DARPA's F6 program to assist the design of "fractionated spacecraft".

Dubos, Gregory F.; Cornford, Steven

2012-01-01

62

On extending parallelism to serial simulators

This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops sub-models using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permit

David Nicol; Philip Heidelberger

1995-01-01

63

Background: Computer simulation studies of the emergency department (ED) are often patient-driven and consider the physician as a human resource whose primary activity is interacting directly with the patient. In many EDs, physicians supervise delegates such as residents, physician assistants and nurse practitioners, each with different skill sets and levels of independence. The purpose of this study is to present an alternative approach where physicians and their delegates in the ED are modeled as interacting pseudo-agents in a discrete event simulation (DES) and to compare it with the traditional approach ignoring such interactions.

Methods: The new approach models a hierarchy of heterogeneous interacting pseudo-agents in a DES, where pseudo-agents are entities with embedded decision logic. The pseudo-agents represent a physician and delegate, where the physician plays a senior role to the delegate (i.e. treats high-acuity patients and acts as a consult for the delegate). A simple model without the complexity of the ED is first created in order to validate the building blocks (programming) used to create the pseudo-agents and their interaction (i.e. consultation). Following validation, the new approach is implemented in an ED model using data from an Ontario hospital. Outputs from this model are compared with outputs from the ED model without the interacting pseudo-agents. They are compared based on physician and delegate utilization, patient waiting time for treatment, and average length of stay. Additionally, we conduct sensitivity analyses on key parameters in the model.

Results: In the hospital ED model, comparisons between the approach with interaction and without showed physician utilization increasing from 23% to 41% and delegate utilization increasing from 56% to 71%. Results show statistically significant mean time differences for low-acuity patients between models. Interaction time between physician and delegate results in increased ED length of stay and longer waits for beds.

Conclusion: This example shows the importance of accurately modeling physician relationships and the roles in which they treat patients. Neglecting these relationships could lead to inefficient resource allocation due to inaccurate estimates of physician and delegate time spent on patient-related activities and length of stay. PMID:23692710

2013-01-01

64

Controlled Sequential Bifurcation: A New Factor-Screening Method for Discrete-Event Simulation

Screening experiments are performed to eliminate unimportant factors so that the remaining important factors can be more thoroughly studied in later experiments. Sequential bifurcation (SB) is a screening method that is well suited for simulation experiments; the challenge is to prove the

Hong Wan; Bruce E. Ankenman; Barry L. Nelson

2006-01-01
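The group-screening idea behind sequential bifurcation can be sketched as a recursive loop: estimate the aggregate effect of a whole group of factors from two runs (group at high vs. group at low), discard groups whose effect falls below a threshold, and bisect the rest. The toy response function below stands in for a real simulation run; all names, coefficients, and the threshold are illustrative, and the sketch assumes (as SB does) that effect signs are known.

```python
def group_effect(run, k, lo_idx, hi_idx):
    """Effect of factors lo_idx..hi_idx: response with those factors
    set high minus response with everything low."""
    base = [0] * k
    high = [1 if lo_idx <= i <= hi_idx else 0 for i in range(k)]
    return run(high) - run(base)

def sequential_bifurcation(run, k, threshold):
    """Return indices of factors whose group effect exceeds threshold."""
    important = []
    stack = [(0, k - 1)]
    while stack:
        lo, hi = stack.pop()
        if group_effect(run, k, lo, hi) <= threshold:
            continue  # whole group screened out with a single comparison
        if lo == hi:
            important.append(lo)  # isolated an important factor
        else:
            mid = (lo + hi) // 2  # bisect and test both halves
            stack.append((lo, mid))
            stack.append((mid + 1, hi))
    return sorted(important)

# Toy deterministic "simulation": only factors 2 and 7 matter much.
def toy_run(x):
    return 5.0 * x[2] + 3.0 * x[7] + 0.1 * sum(x)

print(sequential_bifurcation(toy_run, k=10, threshold=1.0))  # prints [2, 7]
```

The controlled variant this paper proposes additionally governs the error of each comparison when `run` is a noisy stochastic simulation rather than a deterministic function like the toy above.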

65

Controlled sequential bifurcation: a new factor-screening method for discrete-event simulation

Screening experiments are performed to eliminate unimportant factors so that the remaining important factors can be more thoroughly studied in later experiments. Sequential bifurcation (SB) is a screening method that is well suited for simulation experiments; the challenge is to prove the

Hong Wan; Bruce Ankenman; Bany L. Nelson

2003-01-01

66

We present a model that integrates real-time process control charting with simulation modeling to illustrate the effects and benefits of SPC charts for quality improvement efforts. The integrated model is particularly significant in addressing transition issues arising from changes in the input material. A case study based on a medical manufacturing industry process is used to illustrate the approach.

Harriet Black Nembhard; Ming-Shu Kao; Gino Lim

1999-01-01

67

We present a model that integrates real-time process control charting with simulation modeling to illustrate the effects and benefits of SPC charts for quality improvement efforts. The integrated model is particularly significant in addressing transition issues arising from changes in the input material. A case study based on a medical manufacturing industry process is used to illustrate the approach.

Harriet Black Nembhard; Ming-Shu Kao; Gino Lim

1999-01-01

68

A methodology for fabrication of intelligent discrete-event simulation models

In this article a meta-specification for the software requirements and design of intelligent discrete next-event simulation models has been presented. The specification is consistent with established practices for software development as presented in the software engineering literature. The specification has been adapted to take into consideration the specialized needs of object-oriented programming resulting in the actor-centered taxonomy. The heart of the meta-specification is the methodology for requirements specification and design specification of the model. The software products developed by use of the methodology proposed herein are at the leading edge of technology in two very synergistic disciplines - expert systems and simulation. By incorporating simulation concepts into expert systems a deeper reasoning capability is obtained - one that is able to emulate the dynamics or behavior of the object system or process over time. By including expert systems concepts into simulation, the capability to emulate the reasoning functions of decision-makers involved with (and subsumed by) the object system is attained. In either case the robustness of the technology is greatly enhanced.

Morgeson, J.D.; Burns, J.R.

1987-01-01

69

Forest biomass supply logistics for a power plant using the discrete-event simulation approach

This study investigates the logistics of supplying forest biomass to a potential power plant. Due to the complexities in such a supply logistics system, a simulation model based on the framework of Integrated Biomass Supply Analysis and Logistics (IBSAL) is developed in this study to evaluate the cost of delivered forest biomass, the equilibrium moisture content, and carbon emissions from the logistics operations. The model is applied to a proposed case of a 300 MW power plant in Quesnel, BC, Canada. The results show that the biomass demand of the power plant would not be met every year. The weighted average cost of delivered biomass at the gate of the power plant is about C$90 per dry tonne. Estimates of the equilibrium moisture content of delivered biomass and of CO2 emissions resulting from the processes are also provided.

Mobini, Mahdi [University of British Columbia, Vancouver; Sowlati, T. [University of British Columbia, Vancouver; Sokhansanj, Shahabaddine [ORNL

2011-04-01

70

Screening experiments are performed to eliminate unimportant factors so that the remaining important factors can be more thoroughly studied in later experiments. Sequential bifurcation (SB) is a screening method that is well suited for simulation experiments; the challenge is to prove the …

Hong Wan; Bruce E. Ankenman; Barry L. Nelson

2003-01-01

71

Simulating Billion-Task Parallel Programs

In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.
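
The scale figures reported in the abstract can be cross-checked with a quick arithmetic sketch (the assumption that one real MPI task ran per core is ours, not stated in the abstract):

```python
# Consistency check of the reported scaling figures (numbers from the abstract).
# Assumption (ours): one real MPI task per core on the Cray XT5.
real_tasks = 216_000           # cores used on the Cray XT5
multiplexing_ratio = 1024      # simulated MPI tasks per real task
virtual_tasks = real_tasks * multiplexing_ratio
print(virtual_tasks)           # 221184000, i.e. over 0.22 billion virtual MPI processes
```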

Perumalla, Kalyan S [ORNL] [ORNL; Park, Alfred J [ORNL] [ORNL

2014-01-01

72

Threaded WARPED: An Optimistic Parallel Discrete Event Simulator for Cluster of Multi-Core Machines

The emergence of low-cost multi-core and many-core processors makes them suitable for use in Beowulf clusters. WARPED is an optimistic parallel discrete event simulation kernel designed for efficient execution on single-core Beowulf clusters. The work of this thesis extends the WARPED kernel to a Beowulf cluster of many-core processors.

Wilsey, Philip A.

73

ISSUES IN PARALLEL DISCRETE EVENT SIMULATION FOR AN INTERNET TELEPHONY CALL SIGNALING PROTOCOL

In the world of Internet telephony, features are implemented somewhat differently than they are in the existing circuit-switched world, where features are provided mostly by originating and terminating switches (it's actually a little bit more complicated than that).

Dickens, Phillip M.

74

On extending parallelism to serial simulators

NASA Technical Reports Server (NTRS)

This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops submodels using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permit automatically synchronized communication between submodels; however, the automation requires that any such communication must take a nonzero amount of simulation time. Within this modeling paradigm, a variety of conservative synchronization protocols can transparently support conservative execution of submodels on potentially different processors. A specific implementation of this approach, U.P.S. (Utilitarian Parallel Simulator), is described, along with performance results on the Intel Paragon.
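
The nonzero-delay requirement is what makes conservative synchronization safe: a submodel may process any local event earlier than the minimum, over incoming channels, of (neighbor clock + lookahead). A minimal sketch of that rule (illustrative names, not the U.P.S. API):

```python
import heapq

# Sketch of the conservative rule: since every inter-submodel message consumes
# at least `lookahead` simulation time, local events with timestamps before
# min(neighbor clock + lookahead) can be processed without risk of a straggler.
def safe_horizon(neighbor_clocks, lookahead):
    """Earliest time at which a message from any neighbor could still arrive."""
    return min(t + lookahead for t in neighbor_clocks)

def process_safe_events(event_queue, neighbor_clocks, lookahead):
    horizon = safe_horizon(neighbor_clocks, lookahead)
    processed = []
    while event_queue and event_queue[0] < horizon:
        processed.append(heapq.heappop(event_queue))
    return processed

events = [1.0, 2.5, 4.0, 7.0]
heapq.heapify(events)
# Neighbors have advanced to t=3 and t=5; messages take at least 1.0 time unit,
# so the horizon is min(3+1, 5+1) = 4 and only events before t=4 are safe.
safe = process_safe_events(events, [3.0, 5.0], 1.0)
print(safe)  # [1.0, 2.5]
```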

Nicol, David; Heidelberger, Philip

1994-01-01

75

The growing understanding of the use of biomarkers in Alzheimer's disease (AD) may enable physicians to make more accurate and timely diagnoses. Florbetaben, a beta-amyloid tracer used with positron emission tomography (PET), is one of these diagnostic biomarkers. This analysis was undertaken to explore the potential value of florbetaben PET in the diagnosis of AD among patients with suspected dementia and to identify key data that are needed to further substantiate its value. A discrete event simulation was developed to conduct exploratory analyses from both US payer and societal perspectives. The model simulates the lifetime course of disease progression for individuals, evaluating the impact of their patient management from initial diagnostic work-up to final diagnosis. Model inputs were obtained from specific analyses of a large longitudinal dataset from the New England Veterans Healthcare System and supplemented with data from public data sources and assumptions. The analyses indicate that florbetaben PET has the potential to improve patient outcomes and reduce costs under certain scenarios. Key data on the use of florbetaben PET, such as its influence on time to confirmation of final diagnosis, treatment uptake, and treatment persistency, are unavailable and would be required to confirm its value. PMID:23326754

Guo, Shien; Getsios, Denis; Hernandez, Luis; Cho, Kelly; Lawler, Elizabeth; Altincatal, Arman; Lanes, Stephan; Blankenburg, Michael

2012-01-01

76

Background Recent reforms in Portugal aimed at strengthening the role of the primary care system, in order to improve the quality of the health care system. Since 2006 new policies aiming to change the organization, incentive structures and funding of the primary health care sector were designed, promoting the evolution of traditional primary health care centres (PHCCs) into a new type of organizational unit - family health units (FHUs). This study aimed to compare performances of PHCC and FHU organizational models and to assess the potential gains from converting PHCCs into FHUs. Methods Stochastic discrete event simulation models for the two types of organizational models were designed and implemented using Simul8 software. These models were applied to data from nineteen primary care units in three municipalities of the Greater Lisbon area. Results The conversion of PHCCs into FHUs seems to have the potential to generate substantial improvements in productivity and accessibility, while not having a significant impact on costs. This conversion might entail a 45% reduction in the average number of days required to obtain a medical appointment and a 7% and 9% increase in the average number of medical and nursing consultations, respectively. Conclusions Reorganization of PHCC into FHUs might increase accessibility of patients to services and efficiency in the provision of primary care services. PMID:21999336

2011-01-01

77

Inflated speedups in parallel simulations via malloc()

NASA Technical Reports Server (NTRS)

Discrete-event simulation programs make heavy use of dynamic memory allocation in order to support simulation's very dynamic space requirements. When programming in C one is likely to use the malloc() routine. However, a parallel simulation which uses the standard Unix System V malloc() implementation may achieve an overly optimistic speedup, possibly superlinear. An alternate implementation provided on some (but not all) systems can avoid the speedup anomaly, but at the price of significantly reduced available free space. This is especially severe on most parallel architectures, which tend not to support virtual memory. It is shown how a simply implemented user-constructed interface to malloc() can both avoid artificially inflated speedups and make efficient use of the dynamic memory space. The interface simply caches blocks on the basis of their size. The problem is demonstrated empirically, and the effectiveness of the solution is shown both empirically and analytically.
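
The size-based interface can be pictured as a user-level layer that caches freed blocks keyed by their size (an illustrative reconstruction of the idea, not Nicol's actual code):

```python
from collections import defaultdict

# Illustrative sketch: a user-level layer over the allocator that caches freed
# blocks keyed by their size, so reuse is fast and free space is not wasted.
class SizeCachingAllocator:
    def __init__(self):
        self.free_lists = defaultdict(list)     # size -> cached free blocks

    def alloc(self, size):
        if self.free_lists[size]:
            return self.free_lists[size].pop()  # reuse a cached block
        return bytearray(size)                  # else go to the system allocator

    def free(self, block):
        self.free_lists[len(block)].append(block)  # cache by exact size

alloc = SizeCachingAllocator()
b1 = alloc.alloc(64)
alloc.free(b1)
b2 = alloc.alloc(64)
print(b2 is b1)  # True: the 64-byte block was served from the cache
```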

Nicol, David M.

1990-01-01

78

M/G/C/C state dependent queuing networks consider service rates as a function of the number of residing entities (e.g., pedestrians, vehicles, and products). However, modeling such dynamic rates is not supported in modern discrete event simulation (DES) software. We designed an approach to cater for this limitation and used it to construct the M/G/C/C state-dependent queuing model in Arena software. Using the model, we have evaluated and analyzed the impacts of various arrival rates on the throughput, the blocking probability, the expected service time and the expected number of entities in a complex network topology. Results indicated that there is a range of arrival rates for each network where the simulation results fluctuate drastically across replications, causing the simulation results and analytical results to exhibit discrepancies. Detailed results showing how the simulation results tally with the analytical results, in both abstract and graphical forms, together with some scientific justifications, have been documented and discussed. PMID:23560037
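
The core difficulty, service rates that depend on current occupancy, can be illustrated with a minimal event-driven loss-system sketch (ours, not the authors' Arena model; the congestion law and the simplification of freezing each entity's rate at admission are assumptions):

```python
import heapq, random

# Illustrative sketch of an event-driven M/M/C/C-style loss system whose
# per-entity service rate slows as occupancy n grows -- the "state-dependent
# rate" feature the abstract says stock DES software lacks.
# Simplification (ours): each entity's rate is frozen at its admission time.
def simulate(lam=5.0, mu=1.0, C=10, horizon=10_000.0, seed=1):
    rng = random.Random(seed)
    rate = lambda n: mu * (1 - (n - 1) / C)    # assumed congestion law
    t, n, blocked, arrivals = 0.0, 0, 0, 0
    next_arrival = rng.expovariate(lam)
    departures = []                            # heap of departure times
    while t < horizon:
        if departures and departures[0] < next_arrival:
            t = heapq.heappop(departures)
            n -= 1                             # a service completes
        else:
            t = next_arrival
            arrivals += 1
            if n < C:
                n += 1                         # admit and schedule departure
                heapq.heappush(departures, t + rng.expovariate(rate(n)))
            else:
                blocked += 1                   # all C slots busy: entity lost
            next_arrival = t + rng.expovariate(lam)
    return blocked / arrivals                  # empirical blocking probability

print(round(simulate(), 3))
```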

Khalid, Ruzelan; M. Nawawi, Mohd Kamal; Kawsar, Luthful A.; Ghani, Noraida A.; Kamil, Anton A.; Mustafa, Adli

2013-01-01

79

The combination of simulation with the maintenance analysis of mining equipment has been proven to be an effective tool to assess the impact of equipment failures on mining equipment. Genetic algorithms have been applied to multiple areas of mine design, mostly involving optimization solutions. With regard to maintenance analysis, past research in mining focused on the design of a genetic

Greg Yuriy; Nick Vayenas

2008-01-01

80

Objective To assess the budgetary impact of switching from screen-film mammography to full-field digital mammography in a population-based breast cancer screening program. Methods A discrete-event simulation model was built to reproduce the breast cancer screening process (biennial mammographic screening of women aged 50 to 69 years) combined with the natural history of breast cancer. The simulation started with 100,000 women and, during a 20-year simulation horizon, new women were dynamically entered according to the aging of the Spanish population. Data on screening were obtained from Spanish breast cancer screening programs. Data on the natural history of breast cancer were based on US data adapted to our population. A budget impact analysis comparing digital with screen-film screening mammography was performed in a sample of 2,000 simulation runs. A sensitivity analysis was performed for crucial screening-related parameters. Distinct scenarios for recall and detection rates were compared. Results Statistically significant savings were found for overall costs, treatment costs and the costs of additional tests in the long term. The overall cost saving was 1,115,857€ (95%CI from 932,147 to 1,299,567) in the 10th year and 2,866,124€ (95%CI from 2,492,610 to 3,239,638) in the 20th year, representing 4.5% and 8.1% of the overall cost associated with screen-film mammography. The sensitivity analysis showed net savings in the long term. Conclusions Switching to digital mammography in a population-based breast cancer screening program saves long-term budget expense, in addition to providing technical advantages. Our results were consistent across distinct scenarios representing the different results obtained in European breast cancer screening programs. PMID:24832200

Comas, Mercè; Arrospide, Arantzazu; Mar, Javier; Sala, Maria; Vilaprinyó, Ester; Hernández, Cristina; Cots, Francesc; Martínez, Juan; Castells, Xavier

2014-01-01

81

Modelling in economic evaluation is an unavoidable fact of life. Cohort-based state transition models are most common, though discrete event simulation (DES) is increasingly being used to implement more complex model structures. The benefits of DES relate to the greater flexibility around the implementation and population of complex models, which may provide more accurate or valid estimates of the incremental costs and benefits of alternative health technologies. The costs of DES relate to the time and expertise required to implement and review complex models, when perhaps a simpler model would suffice. The costs are not borne solely by the analyst, but also by reviewers. In particular, modelled economic evaluations are often submitted to support reimbursement decisions for new technologies, for which detailed model reviews are generally undertaken on behalf of the funding body. This paper reports the results from a review of published DES-based economic evaluations. Factors underlying the use of DES were defined, and the characteristics of applied models were considered, to inform options for assessing the potential benefits of DES in relation to each factor. Four broad factors underlying the use of DES were identified: baseline heterogeneity, continuous disease markers, time varying event rates, and the influence of prior events on subsequent event rates. If relevant, individual-level data are available, representation of the four factors is likely to improve model validity, and it is possible to assess the importance of their representation in individual cases. A thorough model performance evaluation is required to overcome the costs of DES from the users' perspective, but few of the reviewed DES models reported such a process. More generally, further direct, empirical comparisons of complex models with simpler models would better inform the benefits of DES to implement more complex models, and the circumstances in which such benefits are most likely. PMID:24627341

Karnon, Jonathan; Haji Ali Afzali, Hossein

2014-06-01

82

Background Osteoporotic fractures cause a large health burden and substantial costs. This study estimated the expected fracture numbers and costs for the remaining lifetime of postmenopausal women in Germany. Methods A discrete event simulation (DES) model which tracks changes in fracture risk due to osteoporosis, a previous fracture or institutionalization in a nursing home was developed. Expected lifetime fracture numbers and costs per capita were estimated for postmenopausal women (aged 50 and older) at average osteoporosis risk (AOR) and for those never suffering from osteoporosis. Direct and indirect costs were modeled. Deterministic univariate and probabilistic sensitivity analyses were conducted. Results The expected fracture numbers over the remaining lifetime of a 50 year old woman with AOR for each fracture type (% attributable to osteoporosis) were: hip 0.282 (57.9%), wrist 0.229 (18.2%), clinical vertebral 0.206 (39.2%), humerus 0.147 (43.5%), pelvis 0.105 (47.5%), and other femur 0.033 (52.1%). Expected discounted fracture lifetime costs (excess cost attributable to osteoporosis) per 50 year old woman with AOR amounted to €4,479 (€1,995). Most costs were accrued in the hospital €1,743 (€751) and long-term care sectors €1,210 (€620). Univariate sensitivity analysis resulted in percentage changes between -48.4% (if fracture rates decreased by 2% per year) and +83.5% (if fracture rates increased by 2% per year) compared to base case excess costs. Costs for women with osteoporosis were about 3.3 times of those never getting osteoporosis (€7,463 vs. €2,247), and were markedly increased for women with a previous fracture. Conclusion The results of this study indicate that osteoporosis causes a substantial share of fracture costs in postmenopausal women, which strongly increase with age and previous fractures. PMID:24981316

2014-01-01

83

NASA Technical Reports Server (NTRS)

This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

Nicol, David; Fujimoto, Richard

1992-01-01

84

Parallel Atomistic Simulations

Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories: those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.
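
Of the three molecular dynamics strategies, the spatial decomposition is the easiest to sketch: each processor owns a region of the simulation box and the particles currently inside it (a 1-D illustrative sketch with names of our own choosing):

```python
# Sketch of the spatial-decomposition strategy: each processor owns one slab
# of the simulation box and the particles in it, so most force computations
# stay local to a processor.
def assign_to_domains(positions, box_len, n_domains):
    """1-D spatial decomposition: map each particle index to the processor
    that owns the slab containing its position."""
    slab = box_len / n_domains
    domains = [[] for _ in range(n_domains)]
    for i, x in enumerate(positions):
        domains[min(int(x / slab), n_domains - 1)].append(i)
    return domains

print(assign_to_domains([0.5, 2.4, 7.9, 9.99], box_len=10.0, n_domains=4))
# [[0, 1], [], [], [2, 3]]
```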

HEFFELFINGER,GRANT S.

2000-01-18

85

CSIM is a simulator for parallel Lisp, based on a continuation passing interpreter. It models a shared-memory multiprocessor executing programs written in Common Lisp, extended with several primitives for creating and controlling processes. This paper describes the structure of the simulator, measures its performance, and gives an example of its use with a parallel Lisp program.

Weening, J.S.

1988-05-01

86

On time diagnosis of discrete event systems

A formulation and solution methodology for on-time fault diagnosis in discrete event systems is presented. This formulation and solution methodology captures the timeliness aspect of fault diagnosis and is therefore different from all other approaches to fault diagnosis in discrete event systems which are asymptotic in nature. A monitor observes a projection of the events that occur in the system.

Aditya Mahajan; Demosthenis Teneketzis

2008-01-01

87

Xyce parallel electronic simulator.

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

2010-05-01

88

Optimal Discrete Event Supervisory Control of Aircraft Gas Turbine Engines

NASA Technical Reports Server (NTRS)

This report presents an application of the recently developed theory of optimal Discrete Event Supervisory (DES) control that is based on a signed real measure of regular languages. The DES control techniques are validated on an aircraft gas turbine engine simulation test bed. The test bed is implemented on a networked computer system in which two computers operate in the client-server mode. Several DES controllers have been tested for engine performance and reliability.

Litt, Jonathan (Technical Monitor); Ray, Asok

2004-01-01

89

Constraint-based modeling of discrete event dynamic systems

Numerous frameworks dedicated to the modeling of discrete event dynamic systems have been proposed to deal with programming, simulation, validation, situation tracking, or decision tasks: automata, Petri nets, Markov chains, synchronous languages, temporal logics, event and situation calculi, STRIPS… All these frameworks present significant similarities, but none offers the flexibility of more generic frameworks such as logic or constraints. In this

Gérard Verfaillie; Cédric Pralet; Michel Lemaître

2010-01-01

90

An algebra of discrete event processes

NASA Technical Reports Server (NTRS)

This report deals with an algebraic framework for modeling and control of discrete event processes. The report consists of two parts. The first part is introductory, and consists of a tutorial survey of the theory of concurrency in the spirit of Hoare's CSP, and an examination of the suitability of such an algebraic framework for dealing with various aspects of discrete event control. To this end a new concurrency operator is introduced and it is shown how the resulting framework can be applied. It is further shown that a suitable theory that deals with the new concurrency operator must be developed. In the second part of the report the formal algebra of discrete event control is developed. At the present time the second part of the report is still an incomplete and occasionally tentative working paper.
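
The CSP-style concurrency the tutorial surveys can be sketched for finite automata with the classical synchronous composition, in which shared events synchronize and private events interleave (this is the standard construction, not the report's new concurrency operator):

```python
def parallel_compose(T1, T2, E1, E2):
    """Classical synchronous (CSP-style) composition: shared events must occur
    jointly in both automata, private events interleave. Automata are given
    as transition dicts {(state, event): next_state} over alphabets E1, E2."""
    shared = E1 & E2
    states1 = {s for (s, _) in T1} | set(T1.values())
    states2 = {s for (s, _) in T2} | set(T2.values())
    trans = {}
    for (s1, e), n1 in T1.items():
        if e in shared:
            for (s2, e2), n2 in T2.items():
                if e2 == e:
                    trans[((s1, s2), e)] = (n1, n2)   # joint move
        else:
            for s2 in states2:
                trans[((s1, s2), e)] = (n1, s2)       # T1 moves alone
    for (s2, e), n2 in T2.items():
        if e not in shared:
            for s1 in states1:
                trans[((s1, s2), e)] = (s1, n2)       # T2 moves alone
    return trans

# Two tiny machines synchronizing on the shared event 'sync'.
T1 = {('a', 'go'): 'b', ('b', 'sync'): 'a'}
T2 = {('x', 'sync'): 'y'}
C = parallel_compose(T1, T2, {'go', 'sync'}, {'sync'})
print(C[(('a', 'x'), 'go')])    # ('b', 'x'): private event interleaves
print(C[(('b', 'x'), 'sync')])  # ('a', 'y'): shared event synchronizes
```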

Heymann, Michael; Meyer, George

1991-01-01

91

INTERACTING DISCRETE EVENT SYSTEMS: MODELLING, VERIFICATION, AND SUPERVISORY CONTROL

represented - as interaction specification - in the modelling structure. A multilevel extension to the model information can be used to solve the external verification problem modularly by converting the problem

Abdelwahed, Sherif

92

Nonlinear Control and Discrete Event Systems

NASA Technical Reports Server (NTRS)

As the operation of large systems becomes ever more dependent on extensive automation, the need for an effective solution to the problem of design and validation of the underlying software becomes more critical. Large systems possesses much detailed structure, typically hierarchical, and they are hybrid. Information processing at the top of the hierarchy is by means of formal logic and sentences; on the bottom it is by means of simple scalar differential equations and functions of time; and in the middle it is by an interacting mix of nonlinear multi-axis differential equations and automata, and functions of time and discrete events. The lecture will address the overall problem as it relates to flight vehicle management, describe the middle level, and offer a design approach that is based on Differential Geometry and Discrete Event Dynamic Systems Theory.

Meyer, George; Null, Cynthia H. (Technical Monitor)

1995-01-01

93

Multiple Autonomous Discrete Event Controllers for Constellations

NASA Technical Reports Server (NTRS)

The Multiple Autonomous Discrete Event Controllers for Constellations (MADECC) project is an effort within the National Aeronautics and Space Administration Goddard Space Flight Center's (NASA/GSFC) Information Systems Division to develop autonomous positioning and attitude control for constellation satellites. It will be accomplished using traditional control theory and advanced coordination algorithms developed by the Johns Hopkins University Applied Physics Laboratory (JHU/APL). This capability will be demonstrated in the discrete event control test-bed located at JHU/APL. This project will be modeled for the Leonardo constellation mission, but is intended to be adaptable to any constellation mission. To develop a common software architecture, the controllers will only model very high-level responses. For instance, after determining that a maneuver must be made, the MADECC system will output a (Delta)V (velocity change) value. Lower-level systems must then decide which thrusters to fire and for how long to achieve that (Delta)V.

Esposito, Timothy C.

2003-01-01

94

Generalized Detectability for Discrete Event Systems

In our previous work, we investigated detectability of discrete event systems, which is defined as the ability to determine the current and subsequent states of a system based on observation. For different applications, we defined four types of detectabilities: (weak) detectability, strong detectability, (weak) periodic detectability, and strong periodic detectability. In this paper, we extend our results in three aspects. (1) We extend detectability from deterministic systems to nondeterministic systems. Such a generalization is necessary because there are many systems that need to be modeled as nondeterministic discrete event systems. (2) We develop polynomial algorithms to check strong detectability. The previous algorithms are based on observer whose construction is of exponential complexity, while the new algorithms are based on a new automaton called detector. (3) We extend detectability to D-detectability. While detectability requires determining the exact state of a system, D-detectability relaxes this requirement by asking only to distinguish certain pairs of states. With these extensions, the theory on detectability of discrete event systems becomes more applicable in solving many practical problems. PMID:21691432
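
The state-estimation idea underlying detectability can be sketched as a subset-construction update: after each observed event, the estimate is the set of states the nondeterministic system could currently be in, and detectability asks whether this estimate eventually becomes a singleton (the toy system below is ours, not from the paper):

```python
def observe(estimate, event, trans):
    """One step of current-state estimation for a possibly nondeterministic
    system: trans maps (state, event) -> set of possible next states."""
    new_estimate = set()
    for s in estimate:
        new_estimate |= trans.get((s, event), set())
    return new_estimate

# Tiny illustrative system: event 'a' is nondeterministic, but after
# observing 'b' the current state is known exactly.
trans = {
    (0, 'a'): {1, 2},
    (1, 'b'): {3},
    (2, 'b'): {3},
    (3, 'c'): {3},
}
est = {0}
for e in ['a', 'b', 'c']:
    est = observe(est, e, trans)
print(est)  # {3}: the estimate has collapsed to a single state
```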

Shu, Shaolong; Lin, Feng

2011-01-01

96

Evolution of time horizons in parallel and grid simulations

NASA Astrophysics Data System (ADS)

We analyze the evolution of the local simulation times (LST) in parallel discrete event simulations. The new ingredients introduced are (i) we associate the LST with the nodes and not with the processing elements, and (ii) we propose to minimize the exchange of information between different processing elements by freezing the LST on the boundaries between processing elements for some time of processing and then releasing them by a wide-stream memory exchange between processing elements. The highlights of our approach are (i) it keeps the highest level of processor time utilization during the algorithm evolution, (ii) it takes a reasonable time for the memory exchange, excluding the time consuming and complicated process of message exchange between processors, and (iii) the communication between processors is decoupled from the calculations performed on a processor. The effectiveness of our algorithm grows with the number of nodes (or threads). This algorithm should be applicable for any parallel simulation with short-range interactions, including parallel or grid simulations of partial differential equations.

Shchur, L. N.; Novotny, M. A.

2004-08-01

97

Complex system analysis through discrete event simulation

E-commerce is generally thought of as a world without walls. Although a computer monitor may replace a storefront window, the products that are purchased online have to be distributed from a brick and mortar warehouse. ...

Faranca, Anthony G. (Anthony Gilbert), 1971-

2004-01-01

98

ZAMBEZI: a parallel pattern parallel fault sequential circuit fault simulator

Sequential circuit fault simulators use the multiple bits in a computer data word to accelerate simulation. We introduce, and implement, a new sequential circuit fault simulator, a parallel pattern parallel fault simulator, ZAMBEZI, which simultaneously simulates multiple faults with multiple vectors in one data word. ZAMBEZI is developed by enhancing the control flow of existing parallel pattern algorithms. For a
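
The underlying parallel-pattern idea is that each bit position of a machine word carries one input pattern, so a single bitwise operation evaluates a gate under all the packed patterns at once (an illustrative sketch of the technique, not the ZAMBEZI implementation):

```python
# Bit-parallel gate evaluation: each bit position of a word carries one input
# pattern, so one bitwise operation simulates a gate under 8 patterns at once.
a = 0b10110010                  # signal 'a' under 8 patterns (1 bit each)
b = 0b11010110                  # signal 'b' under the same 8 patterns
good = a & b                    # fault-free AND gate, 8 patterns in one op

faulty = good & 0b00000000      # inject a stuck-at-0 fault on the output
detected = good ^ faulty        # 1-bits mark patterns that expose the fault
print(f"{detected:08b}")        # 10010010
```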

Minesh B. Amin; Bapiraju Vinnakota

1996-01-01

99

Planning and supervision of reactor defueling using discrete event techniques

New fuel handling and conditioning activities for the defueling of the Experimental Breeder Reactor II are being performed at Argonne National Laboratory. Research is being conducted to investigate the use of discrete event simulation, analysis, and optimization techniques to plan, supervise, and perform these activities in such a way that productivity can be improved. The central idea is to characterize this defueling operation as a collection of interconnected serving cells, and then apply operational research techniques to identify appropriate planning schedules for given scenarios. In addition, a supervisory system is being developed to provide personnel with on-line information on the progress of fueling tasks and to suggest courses of action to accommodate changing operational conditions. This paper provides an introduction to the research in progress at ANL. In particular, it briefly describes the fuel handling configuration for reactor defueling at ANL, presenting the flow of material from the reactor grid to the interim storage location, and the expected contributions of this work. As an example of the studies being conducted for planning and supervision of fuel handling activities at ANL, an application of discrete event simulation techniques to evaluate different fuel cask transfer strategies is given at the end of the paper.

Garcia, H.E.; Imel, G.R. [Argonne National Lab., IL (United States); Houshyar, A. [Western Michigan Univ., Kalamazoo, MI (United States). Dept. of Physics

1995-12-31

100

Parallel circuit simulation on supercomputers

Circuit simulation is a very time-consuming and numerically intensive application, especially when the problem size is large as in the case of VLSI circuits. To improve the performance of circuit simulators without sacrificing accuracy, a variety of parallel processing algorithms have been investigated due to the recent availability of a number of commercial multiprocessor machines. In this paper, research in

R. A. Saleh; K. A. Gallivan; M.-C. Chang; I. N. Hajj; T. N. Trick; D. Smart

1989-01-01

101

PARALLEL IMPLEMENTATION OF VLSI HED CIRCUIT SIMULATION

PARALLEL IMPLEMENTATION OF VLSI HED CIRCUIT SIMULATION. INFORMATICA 2/91. Keywords: circuit simulation, direct method, waveform relaxation, parallel algorithm, parallel computer architecture. Srilata, India; Jurij Silc, Marjan Spegel, Jozef Stefan Institute, Ljubljana, Slovenia. The importance of circuit

Silc, Jurij

102

Parallel Simulation of Multicomponent Systems

Simulation of multicomponent systems poses many critical challenges in science and engineering. We overview some software and algorithmic issues in developing high-performance simulation tools for such systems, based on our experience in developing a large-scale, fully-coupled code for detailed simulation of solid propellant rockets. We briefly sketch some of our solutions to these issues, with focus on parallel and performance

Michael T. Heath; Xiangmin Jiao

2004-01-01

103

Parallelizing Timed Petri Net simulations

NASA Technical Reports Server (NTRS)

The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.
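
The simulation kernel for a Timed Petri Net can be sketched with an event list of pending transition completions: tokens enable transitions, and each firing completes after the transition's delay (a minimal illustrative model, not the tool developed under the grant):

```python
import heapq

# Minimal Timed Petri Net simulation: tokens enable transitions, each firing
# completes after the transition's delay and deposits tokens in its outputs.
marking = {'p1': 1, 'p2': 0}
transitions = {
    't1': {'in': ['p1'], 'out': ['p2'], 'delay': 2.0},
    't2': {'in': ['p2'], 'out': ['p1'], 'delay': 3.0},
}
clock, pending = 0.0, []        # pending firings: (completion_time, name)

def try_start(now):
    """Start every transition whose input places all hold a token."""
    for name, t in transitions.items():
        if all(marking[p] > 0 for p in t['in']):
            for p in t['in']:
                marking[p] -= 1               # consume enabling tokens
            heapq.heappush(pending, (now + t['delay'], name))

try_start(clock)
while pending and clock < 10.0:
    clock, name = heapq.heappop(pending)      # next completion event
    for p in transitions[name]['out']:
        marking[p] += 1                       # deposit output tokens
    try_start(clock)

print(clock)  # 10.0: the two transitions alternate, firing at t = 2, 5, 7, 10
```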

Nicol, David M.

1993-01-01

104

Parallel Event-Driven Global Magnetospheric Hybrid Simulations

NASA Astrophysics Data System (ADS)

Global MHD/Hall-MHD magnetospheric models are not able to capture the full diversity of scales and processes that control the Earth's magnetosphere. In order to significantly improve the predictive capabilities of global space weather models, new CPU-efficient algorithms are needed, which could properly account for ion kinetic effects in a large computational domain over long simulation times. To achieve this much expected breakthrough in hybrid (particle ions and fluid electrons) simulations we developed a novel asynchronous time integration technique known as Discrete-Event Simulation (DES). DES replaces conventional time stepping with event processing, which allows macro-particles and grid-based fields to be updated on their own timescales. This unique capability of DES removes the traditional CFL constraint on the global timestep and enables natural (event-driven) coupling of multi-physics components in a global application model. We report the first parallel 2D hybrid DES (HYPERS) runs and compare them with similar time-stepped simulations. We also discuss our ongoing efforts on developing efficient load-balancing strategies for future 3D HYPERS runs on petascale architectures.
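The event-driven integration idea described in this abstract can be sketched with a priority queue: each simulated entity carries its own next-update time and timestep, and updates are delivered in global time order without a shared CFL-limited step. The entity names and fixed per-entity timesteps below are illustrative assumptions, not taken from the HYPERS code.

```python
import heapq

# Minimal discrete-event scheduler: each entity advances on its own
# timescale; a priority queue keeps global time order.

def simulate(entities, t_end):
    """entities: dict name -> [next_time, own_dt] (mutated in place)."""
    queue = [(t, name) for name, (t, _) in entities.items()]
    heapq.heapify(queue)
    processed = []
    while queue:
        t, name = heapq.heappop(queue)
        if t > t_end:
            break
        processed.append((t, name))           # update only this entity
        dt = entities[name][1]
        entities[name][0] = t + dt            # reschedule on its own timescale
        heapq.heappush(queue, (t + dt, name))
    return processed

events = simulate({"ion": [0.0, 0.5], "field": [0.0, 2.0]}, t_end=4.0)
```

Note how the fast "ion" entity is updated far more often than the slow "field" entity, which is exactly what a single global timestep cannot do.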

Omelchenko, Y. A.; Karimabadi, H.; Saule, E.; Catalyurek, U. V.

2010-12-01

105

CAISSON: Interconnect Network Simulator

NASA Technical Reports Server (NTRS)

Cray response to HPCS initiative. Model future petaflop computer interconnect. Parallel discrete event simulation techniques for large scale network simulation. Built on WarpIV engine. Run on laptop and Altix 3000. Can be sized up to 1000 simulated nodes per host node. Good parallel scaling characteristics. Flexible: multiple injectors, arbitration strategies, queue iterators, network topologies.

Springer, Paul L.

2006-01-01

106

An assessment of the ModSim/TWOS parallel simulation environment

The Time Warp Operating System (TWOS) has been the focus of significant research in parallel, discrete-event simulation (PDES). A new language, ModSim, has been developed for use in conjunction with TWOS. The coupling of ModSim and TWOS is an attempt to address the development of large-scale, complex, discrete-event simulation models for parallel execution. The approach, simply stated, is to provide a high-level simulation language that embodies well-known software engineering principles combined with a high-performance parallel execution environment. The inherent difficulty with this approach is the mapping of the simulation application to the parallel run-time environment. To use TWOS, Time Warp applications are currently developed in C and must be tailored according to a set of constraints and conventions. C/TWOS applications are carefully developed using explicit calls to the Time Warp primitives; thus, the mapping of application to parallel run-time environment is done by the application developer. The disadvantage to this approach is the questionable scalability to larger software efforts; the obvious advantage is the degree of control over managing the efficient execution of the application. The ModSim/TWOS system provides an automatic mapping from a ModSim application to an equivalent C/TWOS application. The major flaw with the ModSim/TWOS system as it currently exists is that there is no compiler support for mapping a ModSim application into an efficient C/TWOS application. Moreover, the ModSim language as currently defined does not provide explicit hooks into the Time Warp Operating System and hence the developer is unable to tailor a ModSim application in the same fashion that a C application can be tailored. Without sufficient compiler support, there is a mismatch between ModSim's object-oriented, process-based execution model and the Time Warp execution model.

Rich, D.O.; Michelsen, R.E.

1991-01-01

107

Model Transformation with Hierarchical Discrete-Event Control

Model Transformation with Hierarchical Discrete-Event Control. Thomas Huining Feng, B.S. (Nanjing), University of California, Berkeley, Spring 2009.

108

Xyce parallel electronic simulator design.

This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.

Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

2010-09-01

109

Parallel network simulations with NEURON.

The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2,000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored. PMID:16732488
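The synchronization scheme described in this abstract can be sketched in a bulk-synchronous form: each subnet integrates independently for one interval equal to the minimum interprocessor connection delay, then spikes are exchanged; because a spike sent during an interval cannot arrive until at least that delay later, none can be missed. The `Subnet` class below is a toy serial stand-in for the per-process computation, not NEURON's API.

```python
# Toy model of delay-bounded synchronization: integrate each subnet for one
# min_delay interval, then exchange all spikes generated in that interval.

class Subnet:
    def __init__(self, name):
        self.name, self.inbox = name, []
    def advance(self, t0, t1):
        return [(self.name, t0)]        # pretend one spike fires per interval
    def deliver(self, spikes):
        self.inbox += spikes

def run(subnets, min_delay, t_stop):
    t = 0.0
    while t < t_stop:
        outgoing = []
        for net in subnets:             # would run in parallel, one per process
            outgoing += net.advance(t, t + min_delay)
        for net in subnets:             # one exchange per interval
            net.deliver(outgoing)
        t += min_delay

nets = [Subnet("a"), Subnet("b")]
run(nets, min_delay=1.0, t_stop=3.0)
```

The larger the minimum delay, the longer each subnet can integrate between exchanges, which is why communication overhead can stay below the cache benefit reported in the abstract.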

Migliore, M; Cannia, C; Lytton, W W; Markram, Henry; Hines, M L

2006-10-01

110

Discrete Event Supervisory Control Applied to Propulsion Systems

NASA Technical Reports Server (NTRS)

The theory of discrete event supervisory (DES) control was applied to the optimal control of a twin-engine aircraft propulsion system and demonstrated in a simulation. The supervisory control, which is implemented as a finite-state automaton, oversees the behavior of a system and manages it in such a way that it maximizes a performance criterion, similar to a traditional optimal control problem. DES controllers can be nested such that a high-level controller supervises multiple lower level controllers. This structure can be expanded to control huge, complex systems, providing optimal performance and increasing autonomy with each additional level. The DES control strategy for propulsion systems was validated using a distributed testbed consisting of multiple computers--each representing a module of the overall propulsion system--to simulate real-time hardware-in-the-loop testing. In the first experiment, DES control was applied to the operation of a nonlinear simulation of a turbofan engine (running in closed loop using its own feedback controller) to minimize engine structural damage caused by a combination of thermal and structural loads. This enables increased on-wing time for the engine through better management of the engine-component life usage. Thus, the engine-level DES acts as a life-extending controller through its interaction with and manipulation of the engine's operation.
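A finite-state supervisor of the kind described above can be sketched as a table mapping each state to its enabled controllable events and its transitions on uncontrollable plant events. The states and event names below are invented for illustration; they are not the engine controller's actual alphabet.

```python
# Toy DES supervisor: per state, (enabled controllable events, transitions on
# uncontrollable plant events). The supervisor never blocks an uncontrollable
# event; it only disables controllable ones.

SUPERVISOR = {
    "normal":  ({"boost", "cruise"}, {"overtemp": "protect"}),
    "protect": ({"cruise"},          {"cooled": "normal"}),
}

def step(state, event):
    enabled, transitions = SUPERVISOR[state]
    if event in transitions:               # uncontrollable: must be followed
        return transitions[event], True
    return state, event in enabled         # controllable: allow or disable

s1 = step("normal", "overtemp")    # plant overheats -> protect mode
s2 = step("protect", "boost")      # boost disabled while protecting
```

Nesting, as the abstract describes, amounts to a higher-level supervisor whose "events" are the observed state changes of the lower-level ones.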

Litt, Jonathan S.; Shah, Neerav

2005-01-01

111

Mutually Nonblocking Supervisory Control of Discrete Event Systems

Mutually Nonblocking Supervisory Control of Discrete Event Systems. M. Fabian. … to each individual specification. We call this the problem of mutually nonblocking supervision. We present a necessary and sufficient condition for the existence of a mutually nonblocking

Kumar, Ratnesh

112

Data parallel sequential circuit fault simulation

Sequential circuit fault simulation is a compute-intensive problem. Parallel simulation is one method to reduce fault simulation time. In this paper, we discuss a novel technique to partition the fault set for the fault parallel simulation of sequential circuits on multiple processors. When applied statically, the technique can scale well for up to thirty-two processors on an Ethernet. The

Minesh B. Amin; Bapiraju Vinnakota

1996-01-01

113

Parallel simulation of the Sharks World problem

The Sharks World problem has been suggested as a suitable application to evaluate the effectiveness of parallel simulation algorithms. This paper develops a simulation model in Maisie, a C-based simulation language. With minor modifications, a Maisie program may be executed using either sequential or parallel simulation algorithms. The paper presents the results of executing the Maisie model on a multicomputer

Rajive L. Bagrodia; Wen-Toh Liao

1990-01-01

114

The Maisie environment for parallel simulation

Maisie is among the few languages that separate the simulation program from the underlying algorithm (sequential or parallel) that is used to execute the program. It is thus possible to design a sequential simulation and, if needed, to subsequently port it to a parallel machine for execution with optimistic or conservative algorithms. We provide an overview of the Maisie simulation

Rajive L. Bagrodia; Vikas Jha; Jerry Waldorf

1994-01-01

115

Hierarchical Discrete Event Supervisory Control of Aircraft Propulsion Systems

NASA Technical Reports Server (NTRS)

This paper presents a hierarchical application of Discrete Event Supervisory (DES) control theory for intelligent decision and control of a twin-engine aircraft propulsion system. A dual layer hierarchical DES controller is designed to supervise and coordinate the operation of two engines of the propulsion system. The two engines are individually controlled to achieve enhanced performance and reliability, necessary for fulfilling the mission objectives. Each engine is operated under a continuously varying control system that maintains the specified performance and a local discrete-event supervisor for condition monitoring and life extending control. A global upper level DES controller is designed for load balancing and overall health management of the propulsion system.

Yasar, Murat; Tolani, Devendra; Ray, Asok; Shah, Neerav; Litt, Jonathan S.

2004-01-01

116

Maximally Permissive Hierarchical Control of Decentralized Discrete Event Systems

The subject of this paper is the synthesis of natural projections that serve as nonblocking and maximally permissive abstractions for the hierarchical and decentralized control of large-scale discrete event systems. To this end, existing concepts for nonblocking abstractions such as natural observers and marked string accepting (msa)-observers are extended by local control consistency (LCC) as a novel

Klaus Schmidt; Christian Breindl

2011-01-01

117

Synchronization and Linearity: an algebra for discrete event systems

This book proposes a unified mathematical treatment of a class of 'linear' discrete event systems, which contains important subclasses of Petri nets and queuing networks with synchronization constraints. The linearity has to be understood with respect to nonstandard algebraic structures, e.g. the 'max-plus algebra'. A calculus is developed based on such structures, which is followed by tools for computing the
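The 'max-plus algebra' mentioned above can be illustrated directly: with "addition" taken as max and "multiplication" as +, the recurrence x(k+1) = A ⊗ x(k) propagates event firing times through a synchronized system, since an event fires only once the slowest of its predecessors completes. The 2x2 delay matrix below is an invented example, not one from the book.

```python
# Max-plus matrix-vector product: "sum" is max, "product" is +. Entry A[i][j]
# is the delay from event j's firing to the enabling of event i.

NEG_INF = float("-inf")    # the max-plus zero element (no connection)

def maxplus_matvec(A, x):
    return [max(a + xi for a, xi in zip(row, x)) for row in A]

A = [[2, 5],
     [3, 3]]
x1 = maxplus_matvec(A, [0, 0])   # next firing times, starting from time zero
```

Iterating `maxplus_matvec` is "linear" in this algebra in exactly the sense the book exploits: timing analysis of Petri nets with synchronization reduces to max-plus linear algebra.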

F. Baccelli; G. Cohen; G. J. Olsder; J. P. Quadrat

1992-01-01

118

Parallel processing interactively simulates complex VLSI logic

To break the simulation bottleneck that has slowed designers of very large-scale integrated circuits, engineers at IBM developed the logic simulation machine, a hardware logic simulator. This dedicated, highly parallel computer simulates logic at a functional or gate level at speeds up to 100000 times faster than software simulators. In fact, it is so fast that designers can perform logic simulations interactively, experimenting with several designs and choosing the best one.

Howard, J.K.; Malm, R.L.; Warren, L.M.

1983-12-15

119

Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment*†

Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations. PMID:21822327
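The "speculative processing with in-order commitment" idea above can be reduced to a toy form: event effects are first computed optimistically (as if in parallel) against the initial state, then applied strictly in timestamp order, and any speculative result invalidated by an earlier commit is discarded and recomputed. This is an illustrative sketch, not the paper's scheduler.

```python
# Speculate out of order, commit in timestamp order. A speculative result
# computed against stale state is detected at commit time and redone.

def run(events, state):
    """events: list of (time, key, delta) applied to the shared `state` dict."""
    pending = sorted(events)
    speculative = {(t, k): state.get(k, 0) + d for t, k, d in pending}
    committed = []
    for t, key, delta in pending:               # in-order commitment
        actual = state.get(key, 0) + delta
        if speculative[(t, key)] != actual:     # conflict: redo this event
            speculative[(t, key)] = actual
        state[key] = speculative[(t, key)]
        committed.append((t, key, state[key]))
    return committed

log = run([(2, "x", 1), (1, "x", 2)], {"x": 0})
```

Correctness comes entirely from the commit order; speculation only exposes parallelism, mirroring the microarchitecture analogy the authors draw.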

Khan, Md. Ashfaquzzaman; Herbordt, Martin C.

2011-01-01

120

Parallel methods for the flight simulation model

The Advanced Computer Applications Center (ACAC) has been involved in evaluating advanced parallel architecture computers and the applicability of these machines to computer simulation models. The advanced systems investigated include parallel machines with shared-memory and distributed architectures consisting of an eight processor Alliant FX/8, a twenty-four processor Sequent Symmetry, Cray XMP, IBM RISC 6000 model 550, and the Intel Touchstone eight processor Gamma and 512 processor Delta machines. Since parallelizing a truly efficient application program for the parallel machine is a difficult task, the implementation for these machines in a realistic setting has been largely overlooked. The ACAC has developed considerable expertise in optimizing and parallelizing application models on a collection of advanced multiprocessor systems. One aspect of such an application model is the Flight Simulation Model, which used a set of differential equations to describe the flight characteristics of a launched missile by means of a trajectory. The Flight Simulation Model was written in the FORTRAN language with approximately 29,000 lines of source code. Depending on the number of trajectories, the computation can require several hours to a full day of CPU time on a DEC/VAX 8650 system. There is an impetus to reduce the execution time and utilize the advanced parallel architecture computing environment available. ACAC researchers developed a parallel method that allows the Flight Simulation Model to run in parallel on the multiprocessor system. For the benchmark data tested, the parallel Flight Simulation Model implemented on the Alliant FX/8 has achieved nearly linear speedup. In this paper, we describe a parallel method for the Flight Simulation Model. We believe the method presented in this paper provides a general concept for the design of parallel applications. This concept, in most cases, can be adapted to many other sequential application programs.

Xiong, Wei Zhong; Swietlik, C.

1994-06-01

121

Parallel quantum computer simulation on the GPU

Simulation of quantum computers using classical computers is a hard problem with high memory and computational requirements. Parallelization can alleviate this problem, allowing the simulation of more qubits at the same time or the same number of qubits to be simulated in less time. A promising approach is to exploit the high performance computing capabilities provided by the

Andrei Amariutei; Simona Caraiman

2011-01-01

122

Data Parallel Switch-Level Simulation

Carnegie Mellon University. Data parallel simulation involves simulating the behavior of a circuit over runs on a massively parallel SIMD machine, with each processor simulating the circuit behavior … parallelism in simulation utilize circuit parallelism. In this mode, the simulator extracts parallelism from

Bryant, Randal E.

123

Data parallel sorting for particle simulation

Sorting on a parallel architecture is a communications-intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O(N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimum performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
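The O(N) sequential baseline referred to above relies on the keys being small integers (cell indices), so a counting/bucket sort suffices. The parallel algorithm in the paper instead reduces sorting to merging presorted per-processor runs; the sketch below shows only the sequential integer-sort step, with an invented particle layout.

```python
# Counting/bucket sort of particles by integer cell index: one pass to
# bucket, one pass to concatenate, hence O(N + n_cells).

def counting_sort_by_cell(particles, n_cells):
    """particles: list of (cell_index, payload) pairs."""
    buckets = [[] for _ in range(n_cells)]
    for cell, payload in particles:
        buckets[cell].append((cell, payload))
    return [p for bucket in buckets for p in bucket]

out = counting_sort_by_cell([(2, "a"), (0, "b"), (2, "c"), (1, "d")], 3)
```

Because bucketing is stable, particles within a cell keep their original order, which is the property a merge-based parallel variant must also preserve.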

Dagum, L. (NASA Ames Research Center, Moffett Field, CA (United States))

1992-05-01

124

Xyce parallel electronic simulator : users' guide.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. 
As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

2011-05-01

125

Polynomial Synthesis of Supervisor for Partially Observed Discrete Event Systems by allowing

… of discrete event systems under partial observation using nondeterministic supervisors. We formally define a nondeterministic control policy and also a control & observation compatible nondeterministic state machine

Kumar, Ratnesh

126

Parallel Circuit Simulation Using Hierarchical Relaxation

This paper describes a class of parallel algorithms for circuit simulation based on hierarchical relaxation that has been implemented on the Cedar multiprocessor. The Cedar machine is a reconfigurable, general-purpose supercomputer that was designed and implemented at the University of Illinois. A hierarchical circuit simulation scheme was developed to exploit the hierarchical organization of Cedar. The new algorithm and a

Gih-guang Hung; Yen-cheng Wen; Kyle Gallivan; Resve A. Saleh

1990-01-01

127

Simulation based performance prediction by PEPSY

Parallel programs, generated by the supercompiler VFCS (Vienna Fortran Compilation System), are an application area for the tool PEPSY (PErformance Prediction SYstem), which we have developed recently. PEPSY automatically derives a performance model from an internal representation of the parallel program in the compiler, and performs performance analysis by discrete-event simulation. Several monitoring modes of PEPSY enable the analysis of a

Roman Blasko

1995-01-01

128

Parallel processing of a rotating shaft simulation

NASA Technical Reports Server (NTRS)

A FORTRAN program describing the vibration modes of a rotor-bearing system is analyzed for parallelism in this simulation using a Pascal-like structured language. Potential vector operations are also identified. A critical path through the simulation is identified and used in conjunction with somewhat fictitious processor characteristics to determine the time to calculate the problem on a parallel processing system having those characteristics. A parallel processing overhead time is included as a parameter for proper evaluation of the gain over serial calculation. The serial calculation time is determined for the same fictitious system. An improvement of up to 640 percent is possible depending on the value of the overhead time. Based on the analysis, certain conclusions are drawn pertaining to the development needs of parallel processing technology, and to the specification of parallel processing systems to meet computational needs.
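The evaluation described above amounts to a simple model: serial time divided by critical-path time plus a parallel-processing overhead parameter. The values below are illustrative stand-ins, not the report's measurements.

```python
# Speedup as a function of critical-path time and per-run parallel overhead:
# overhead of zero gives the ideal gain; nonzero overhead erodes it.

def speedup(t_serial, t_critical_path, t_overhead):
    return t_serial / (t_critical_path + t_overhead)

ideal = speedup(6.4, 1.0, 0.0)    # no overhead: full critical-path gain
real  = speedup(6.4, 1.0, 0.6)    # overhead reduces the achievable speedup
```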

Arpasi, Dale J.

1989-01-01

129

Parallel distributed-time logic simulation

The Chandy-Misra algorithm offers more parallelism than the standard event-driven algorithm for digital logic simulation. With suitable enhancements, the Chandy-Misra algorithm also offers significantly better parallel performance. The authors present methods to optimize the algorithm using information about the large number of global synchronization points, called deadlocks, that limit performance. They classify deadlocks and describe them in terms of circuit
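The deadlocks discussed above arise from the conservative safety rule at the heart of Chandy-Misra: a logical process may consume events only up to the minimum timestamp across all its input channels, and must block if any channel is empty; a cycle of blocked processes is a deadlock that must be detected or avoided (e.g. with null messages). The data layout below is an illustrative sketch, not the authors' implementation.

```python
# Conservative safety rule: events are safe to process only up to the
# minimum head timestamp over all input channels.

def safe_events(input_channels):
    """input_channels: dict src -> time-sorted list of (timestamp, event)."""
    if any(not ch for ch in input_channels.values()):
        return []                      # an empty channel forces blocking
    horizon = min(ch[0][0] for ch in input_channels.values())
    return [ev for ch in input_channels.values()
            for t, ev in ch if t <= horizon]

ready   = safe_events({"a": [(1, "e1"), (4, "e4")], "b": [(2, "e2")]})
blocked = safe_events({"a": [], "b": [(2, "e2")]})
```

The second call illustrates the blocking case: nothing is unsafe about "e2" per se, but the empty channel from "a" could still deliver an earlier event.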

L. Soule; A. Gupta

1989-01-01

130

Parallel logic simulation on general purpose machines

Three parallel algorithms for logic simulation have been developed and implemented on a general purpose shared-memory parallel machine. The first algorithm is a synchronous version of a traditional event-driven algorithm which achieves speed-ups of 6 to 9 with 15 processors. The second algorithm is a synchronous unit-delay compiled mode algorithm which achieves speed-ups of 10 to 13 with 15 processors.

Larry Soulé; Tom Blank

1988-01-01

131

NASA Technical Reports Server (NTRS)

Fast, efficient parallel algorithms are presented for discrete event simulations of dynamic channel assignment schemes for wireless cellular communication networks. The driving events are call arrivals and departures, in continuous time, to cells geographically distributed across the service area. A dynamic channel assignment scheme decides which call arrivals to accept, and which channels to allocate to the accepted calls, attempting to minimize call blocking while ensuring co-channel interference is tolerably low. Specifically, the scheme ensures that the same channel is used concurrently at different cells only if the pairwise distances between those cells are sufficiently large. Much of the complexity of the system comes from ensuring this separation. The network is modeled as a system of interacting continuous time automata, each corresponding to a cell. To simulate the model, conservative methods are used; i.e., methods in which no errors occur in the course of the simulation and so no rollback or relaxation is needed. Implemented on a 16K processor MasPar MP-1, an elegant and simple technique provides speedups of about 15 times over an optimized serial simulation running on a high speed workstation. A drawback of this technique, typical of conservative methods, is that processor utilization is rather low. To overcome this, new methods were developed that exploit slackness in event dependencies over short intervals of time, thereby raising the utilization to above 50 percent and the speedup over the optimized serial code to about 120 times.
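The channel-reuse constraint described above (same channel concurrently only in sufficiently distant cells) can be sketched directly. Grid coordinates, the distance threshold, and the function name are illustrative assumptions, not details from the paper.

```python
import math

# A channel may be granted in a cell only if every cell currently using it
# is at least min_dist away.

def can_assign(channel, cell, in_use, min_dist):
    """in_use: dict channel -> set of (x, y) cell coordinates using it."""
    return all(math.dist(cell, other) >= min_dist
               for other in in_use.get(channel, ()))

in_use = {7: {(0, 0), (4, 0)}}
ok_far  = can_assign(7, (0, 3), in_use, min_dist=2.0)  # far from both users
ok_near = can_assign(7, (1, 0), in_use, min_dist=2.0)  # too close to (0, 0)
```

Much of the simulation complexity the abstract mentions comes from evaluating exactly this check consistently across cells that are simulated on different processors.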

Greenberg, Albert G.; Lubachevsky, Boris D.; Nicol, David M.; Wright, Paul E.

1994-01-01

132

VALIDATION OF MASSIVELY PARALLEL SIMULATIONS OF DYNAMIC FRACTURE AND

VALIDATION OF MASSIVELY PARALLEL SIMULATIONS OF DYNAMIC FRACTURE AND FRAGMENTATION OF BRITTLE SOLIDS. Finite element simulations of dynamic fracture and fragmentation of brittle solids are presented. … the results of massively parallel numerical simulations of dynamic fracture and fragmentation in brittle

Barr, Al

133

Parallel Simulation of Carbon Nanotube Based Composites

Computational simulation plays a vital role in nanotechnology. Molecular dynamics (MD) is an important computational method to understand the fundamental behavior of nanoscale systems, and to transform that understanding into useful products. MD computations, however, are severely restricted by the spatial and temporal scales of simulations. This paper describes the methods used to achieve effective spatial parallelization of a MD

Jyoti Kolhe; Usha Chandra; Sirish Namilae; Ashok Srinivasan; Namas Chandra

2004-01-01

134

Parallel Simulated Annealing Algorithms in Global Optimization

Global optimization involves the difficult task of the identification of global extremities of mathematical functions. Such problems are often encountered in practice in various fields, e.g., molecular biology, physics, industrial chemistry. In this work, we develop five different parallel Simulated Annealing (SA) algorithms and compare them on an extensive test bed used previously for the assessment of various solution approaches

Esin Onbaşoğlu; Linet Özdamar

2001-01-01

135

A comparison of serial and parallel processing simulation models of pilot workload

This paper discusses and evaluates several options for modeling a pilot while he performs normal cockpit duties. Using a queueing analogy, with the pilot modeled as the server and the pilot's tasks as customers, two discrete-event simulation models were developed. The main issue examined was the relative accuracy of a model which processes tasks in series versus one which processes

T. F. Schuppe; Wright-Patterson AFB

1989-01-01

136

Parallel Simulation of Unsteady Turbulent Flames

NASA Technical Reports Server (NTRS)

Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, the high cost and their limited availability make practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable. This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used.
Recently, a new model for turbulent combustion was developed, in which the combustion is modeled within the subgrid (small scales) using a methodology that simulates the mixing, the molecular transport and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure, and this approach actually provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion and chemical kinetics and, therefore, within each grid cell, a significant amount of computation must be carried out before the large-scale (LES resolved) effects are incorporated. Therefore, this approach is uniquely suited for parallel processing and has been implemented on various systems such as the Intel Paragon, IBM SP-2, Cray T3D and SGI Power Challenge (PC) using the system-independent Message Passing Interface (MPI). In this paper, timing data on these machines is reported along with some characteristic results.

Menon, Suresh

1996-01-01

137

Continuum Representation for Simulating Discrete Events of Battery Operation

A kinetic expression and Ohm's law in the electrolyte were used to calculate initial guesses for the algebraic variables … performed using this technique using the solid-phase diffusion model. Nonlinear electrochemical differential algebraic equation (DAE) systems can be obtained using DAEIS. DAEIS is effective in handling a DAE

Panchagnula, Mahesh

138

Parallel algorithm strategies for circuit simulation.

Circuit simulation tools (e.g., SPICE) have become invaluable in the development and design of electronic circuits. However, they have been pushed to their performance limits in addressing circuit design challenges that come from the technology drivers of smaller feature scales and higher integration. Improving the performance of circuit simulation tools through exploiting new opportunities in widely-available multi-processor architectures is a logical next step. Unfortunately, not all traditional simulation applications are inherently parallel, and quickly adapting mature application codes (even codes designed as parallel applications) to new parallel paradigms can be prohibitively difficult. In general, performance is influenced by many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, the use of mini-applications (small, self-contained proxies for real applications) is an excellent approach for rapidly exploring the parameter space of all these choices. In this report we present a multi-core performance study of Xyce, a transistor-level circuit simulation tool, and describe the future development of a mini-application for circuit simulation.

Thornquist, Heidi K.; Schiek, Richard Louis; Keiter, Eric Richard

2010-01-01

139

Xyce parallel electronic simulator: reference guide.

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

2011-05-01

140

Parallel node placement method by bubble simulation

NASA Astrophysics Data System (ADS)

An efficient Parallel Node Placement method by Bubble Simulation (PNPBS), employing METIS-based domain decomposition (DD) for an arbitrary number of processors, is introduced. In accordance with the desired nodal density and Newton's Second Law of Motion, automatic generation of node sets by bubble simulation has been demonstrated in previous work. Since the interaction force between nodes is short-range, the positions and velocities of two distant nodes can be updated simultaneously and independently during the dynamic simulation. This inherent parallelism makes the method well suited to parallel computing. In the PNPBS method, the METIS-based DD scheme has been investigated for uniform and non-uniform node sets, and dynamic load balancing is obtained by evenly distributing work among the processors. For the nodes near the common interface of two neighboring subdomains, there is no need for special treatment after dynamic simulation. These nodes have good geometrical properties and a smooth density distribution, which is desirable in the numerical solution of partial differential equations (PDEs). The results of numerical examples show that quasi-linear speedup in the number of processors and high efficiency are achieved.
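The short-range interaction described in the abstract can be sketched as a simple force-based node update. This is a generic illustration, not the PNPBS code; the cutoff, stiffness, and damping constants are hypothetical.

```python
import numpy as np

def step(pos, vel, dt=0.05, cutoff=1.5, k=1.0, damp=0.9):
    """One explicit integration step of bubble-style node placement.

    Each node feels a short-range repulsion from neighbors closer than
    `cutoff` (zero beyond it), so an update depends only on nearby
    nodes -- the locality that makes domain-decomposed parallelism
    natural.  All constants here are illustrative.
    """
    force = np.zeros_like(pos)
    for i in range(len(pos)):
        d = pos - pos[i]                     # vectors from node i to all nodes
        r = np.linalg.norm(d, axis=1)
        near = (r > 0) & (r < cutoff)
        # Linear repulsion, strongest for the closest neighbors.
        force[i] = -np.sum(k * (cutoff - r[near])[:, None] * d[near]
                           / r[near][:, None], axis=0)
    vel = damp * (vel + dt * force)          # Newton's second law, unit mass
    return pos + dt * vel, vel

# Two nodes closer than the cutoff drift apart toward equilibrium spacing.
pos = np.array([[0.0, 0.0], [1.0, 0.0]])
vel = np.zeros_like(pos)
pos, vel = step(pos, vel)
```

Because the force is zero beyond the cutoff, nodes in different subdomains far from a partition interface can be stepped concurrently without communication.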

Nie, Yufeng; Zhang, Weiwei; Qi, Nan; Li, Yiqiang

2014-03-01

141

Communication Requirements in Parallel Crashworthiness Simulation

This paper deals with the design and implementation of communications strategies for the migration to distributed-memory, MIMD machines of an industrial crashworthiness simulation program, PAM-CRASH, using message-passing. A summary of the algorithmic features and parallelization approach is followed by a discussion of options to minimize overheads introduced by the need for global communication. Implementation issues will be specific to the

Guy Lonsdale; Jan Clinckemaillie; Stefanos Vlachoutsis; J. Dubois

1994-01-01

142

Parallel Logic Level Simulation of VLSI Circuits

In this paper, we study parallel logic simulation and evaluate the impact on parallel circuit simulation of different numbers of partitions. Few statistics have been published to exploit the parallelism and analyze performance in circuit

Cong, Jason "Jingsheng"

143

Fracture simulations via massively parallel molecular dynamics

NASA Astrophysics Data System (ADS)

Fracture simulations at the atomistic level have been carried out for relatively small systems of particles, typically 10,000 or less. In order to study anything approaching a macroscopic system, massively parallel molecular dynamics (MD) must be employed. In two spatial dimensions (2D), it is feasible to simulate a sample that is 0.1 microns on a side. Recent MD simulations of mode 1 crack extension under tensile loading at high strain rates are reported. The method of uniaxial, homogeneously expanding periodic boundary conditions was employed to represent tensile stress conditions near the crack tip. The effects of strain rate, temperature, material properties (equation of state and defect energies), and system size were examined. It was found that, in order to mimic a bulk sample, several tricks (in addition to expansion boundary conditions) need to be employed: (1) the sample must be pre-strained to nearly the condition at which the crack will spontaneously open; (2) to relieve the stresses at free surfaces, such as the initial notch, annealing by kinetic-energy quenching must be carried out to prevent unwanted rarefactions; (3) sound waves emitted as the crack tip opens and dislocations emitted from the crack tip during blunting must be absorbed by special reservoir regions. The tricks described briefly will be especially important to carrying out feasible massively parallel 3D simulations via MD.

Holian, B. L.; Abraham, F. F.; Ravelo, R.

144

Parallel Strategies for Crash and Impact Simulations

We describe a general strategy we have found effective for parallelizing solid mechanics simulations. Such simulations often have several computationally intensive parts, including finite element integration, detection of material contacts, and particle interaction if smoothed particle hydrodynamics is used to model highly deforming materials. The need to balance all of these computations simultaneously is a difficult challenge that has kept many commercial and government codes from being used effectively on parallel supercomputers with hundreds or thousands of processors. Our strategy is to load-balance each of the significant computations independently with whatever balancing technique is most appropriate. The chief benefit is that each computation can be scalably parallelized. The drawback is the data exchange between processors and extra coding that must be written to maintain multiple decompositions in a single code. We discuss these trade-offs and give performance results showing this strategy has led to a parallel implementation of a widely-used solid mechanics code that can now be run efficiently on thousands of processors of the Pentium-based Sandia/Intel TFLOPS machine. We illustrate with several examples the kinds of high-resolution, million-element models that can now be simulated routinely. We also look to the future and discuss what possibilities this new capability promises, as well as the new set of challenges it poses in material models, computational techniques, and computing infrastructure.

Attaway, S.; Brown, K.; Hendrickson, B.; Plimpton, S.

1998-12-07

145

Massively Parallel Direct Simulation of Multiphase Flow

The authors understanding of multiphase physics and the associated predictive capability for multi-phase systems are severely limited by current continuum modeling methods and experimental approaches. This research will deliver an unprecedented modeling capability to directly simulate three-dimensional multi-phase systems at the particle-scale. The model solves the fully coupled equations of motion governing the fluid phase and the individual particles comprising the solid phase using a newly discovered, highly efficient coupled numerical method based on the discrete-element method and the Lattice-Boltzmann method. A massively parallel implementation will enable the solution of large, physically realistic systems.

COOK,BENJAMIN K.; PREECE,DALE S.; WILLIAMS,J.R.

2000-08-10

146

State-Feedback Control of Fuzzy Discrete-Event Systems

In a 2002 paper, we combined fuzzy logic with discrete-event systems (DESs) and established an automaton model of fuzzy DESs (FDESs). The model can effectively represent deterministic uncertainties and vagueness, as well as human subjective observation and judgment inherent to many real-world problems, particularly those in biomedicine. We also investigated optimal control of FDESs and applied the results to optimize HIV/AIDS treatments for individual patients. Since then, other researchers have investigated supervisory control problems in FDESs, and several results have been obtained. These results are mostly derived by extending the traditional supervisory control of (crisp) DESs, which are string based. In this paper, we develop state-feedback control of FDESs that is different from the supervisory control extensions. We use state space to describe the system behaviors and use state feedback in control. Both disablement and enforcement are allowed. Furthermore, we study controllability based on the state space and prove that a controller exists if and only if the controlled system behavior is (state-based) controllable. We discuss various properties of the state-based controllability. Aside from novelty, the proposed new framework has the advantages of being able to address a wide range of practical problems that cannot be effectively dealt with by existing approaches. We use the diabetes treatment as an example to illustrate some key aspects of our theoretical results. PMID:19884087
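In the authors' earlier automaton model of FDESs, fuzzy states are vectors over crisp states and events are matrices, with transitions computed by max-min composition. A minimal sketch follows; the state labels and the numerical values are hypothetical, chosen only to echo the paper's treatment-planning setting.

```python
import numpy as np

def maxmin(state, event):
    """Fuzzy-automaton transition: new_state[j] = max_i min(state[i], event[i, j])."""
    return np.max(np.minimum(state[:, None], event), axis=0)

# Fuzzy state over {good, fair, poor} condition of a patient (hypothetical).
s = np.array([0.8, 0.4, 0.1])

# Fuzzy event matrix: degree to which a treatment moves crisp state i to j.
treat = np.array([[1.0, 0.2, 0.0],
                  [0.7, 0.5, 0.1],
                  [0.3, 0.6, 0.4]])

print(maxmin(s, treat))  # [0.8 0.4 0.1]
```

A state-feedback controller in this framework observes the fuzzy state vector directly and enables or disables (fuzzy) events based on it, rather than on event strings.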

Lin, Feng; Ying, Hao

2014-01-01

147

Improving the Teaching of Discrete-Event Control Systems Using a LEGO Manufacturing Prototype

ERIC Educational Resources Information Center

This paper discusses the usefulness of employing LEGO as a teaching-learning aid in a post-graduate-level first course on the control of discrete-event systems (DESs). The final assignment of the course is presented, which asks students to design and implement a modular hierarchical discrete-event supervisor for the coordination layer of a…

Sanchez, A.; Bucio, J.

2012-01-01

148

A comparative study of three model-based FDI approaches for Discrete Event Systems

In this paper, three model-based Fault Detection and Isolation (FDI) approaches for Discrete Event Systems (DES) are evaluated. FDI methods allow faults to be detected and isolated, improving productivity, which has positive effects on economic issues.

Paris-Sud XI, Université de

149

xSim: The extreme-scale simulator

Investigating parallel application performance at scale is an important part of high-performance computing (HPC) application development. The Extreme-scale Simulator (xSim) is a performance toolkit that permits running an application in a controlled environment at extreme scale without the need for a respective extreme-scale HPC system. Using a lightweight parallel discrete event simulation, xSim executes a parallel application with a virtual
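The discrete event core that a tool like xSim builds on can be illustrated by a minimal sequential event loop (a generic sketch, not xSim's API): events carry timestamps, are processed in time order, and their actions may schedule further events.

```python
import heapq

def run(initial_events, horizon):
    """Minimal discrete-event loop: pop the earliest event, run its
    action (which may schedule new events), repeat until the time
    horizon.  A parallel discrete event simulator distributes this
    loop across processes while preserving timestamp order."""
    queue = list(initial_events)
    heapq.heapify(queue)
    log = []
    while queue and queue[0][0] <= horizon:
        t, name, action = heapq.heappop(queue)
        log.append((t, name))
        for event in action(t):
            heapq.heappush(queue, event)
    return log

# A "ping" at t=0 schedules a "pong" one time unit later.
def ping(t):
    return [(t + 1.0, "pong", lambda _t: [])]

print(run([(0.0, "ping", ping)], horizon=10.0))  # [(0.0, 'ping'), (1.0, 'pong')]
```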

Swen Boehm; Christian Engelmann

2011-01-01

150

Parallel gate-level circuit simulation on shared memory architectures

This paper presents the results of an experimental study to evaluate the effectiveness of parallel simulation in reducing the execution time of gate-level models of VLSI circuits. Specific contributions of this paper include (i) the design of a gate-level parallel simulator that can be executed, without any changes on both distributed memory and shared memory parallel architectures, (ii) demonstrated speedups

Rajive Bagrodia; Yu-an Chen; Vikas Jha; Nicki Sonpar

1995-01-01

151

Simulating Concurrent Intrusions for Testing Intrusion Detection Systems: Parallelizing Intrusions

For testing Intrusion Detection Systems (IDS), it is essential that we be able to simulate intrusions in different forms (both sequential and parallelized) in order to comprehensively test and evaluate the detection capability of an IDS. This paper presents an algorithm for automatically transforming a sequential intrusive script into a set of parallel intrusive scripts (formed by a group of parallel threads) which simulate a concurrent intrusion.

1995-01-01

152

Parallel Finite Element Simulation of Tracer Injection in Oil Reservoirs

In this work, parallel finite element techniques for the simulation of tracer injection in oil reservoirs are presented. The pressure, velocity and concentration linear systems of equations are solved with parallel element-by-element

Coutinho, Alvaro L. G. A.

153

Parallel and Distributed Multi-Algorithm Circuit Simulation

Increased VLSI design complexity has made circuit simulation an ever-growing bottleneck, making parallel processing an appealing solution for addressing this challenge. In this thesis, we propose and develop a parallel and distributed multi...

Dai, Ruicheng

2012-10-19

154

Empirical study of parallel LRU simulation algorithms

NASA Technical Reports Server (NTRS)

This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. The other two algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The other two SIMD algorithms are more complex, but have costs that are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
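The serial baseline such parallel algorithms target can be sketched as a textbook stack-distance pass (not the paper's code): a reference hits in a fully associative LRU cache of size C exactly when its stack distance is below C, so one pass over a trace yields hit ratios for every cache size at once.

```python
def stack_distances(trace):
    """Compute the LRU stack distance of each reference in a trace.

    The stack distance of a reference is the number of distinct
    addresses touched since the previous reference to the same address
    (infinity for first-time references)."""
    stack = []          # most-recently-used address at the front
    distances = []
    for addr in trace:
        if addr in stack:
            d = stack.index(addr)   # depth in the LRU stack == stack distance
            stack.pop(d)
        else:
            d = float("inf")        # cold miss
        stack.insert(0, addr)       # addr becomes most recently used
        distances.append(d)
    return distances

print(stack_distances(["a", "b", "a", "c", "b", "a"]))
# [inf, inf, 1, inf, 2, 2]
```

This naive version is quadratic in the trace length; the algorithms in the paper parallelize equivalent computations across SIMD or MIMD processors.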

Carr, Eric; Nicol, David M.

1994-01-01

155

YetiSim: a C++ simulation library with execution graphs instead of coroutines

YetiSim is a new open source C++ discrete event simulation library developed using Intel's open source Threading Building Blocks library to provide for parallel processing of tasks. It was created to provide an alternative method of constructing simulations in C++ without coroutines. Execution graphs, directed graphs based on UML state charts, are introduced. These graphs are directly executed by YetiSim,

Adrien Guillon; Deborah Loach

2008-01-01

156

Parallel Proximity Detection for Computer Simulations

NASA Technical Reports Server (NTRS)

The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
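The grid check-in idea can be sketched as follows. This is a hypothetical illustration, not the patented system: the names and cell size are invented, and the fuzzy-resolution parameters and lookahead function are omitted.

```python
from collections import defaultdict

CELL = 10.0  # grid cell size (hypothetical)

def cell_of(pos):
    # Map a 2-D position to its coarse grid cell.
    return (int(pos[0] // CELL), int(pos[1] // CELL))

class Grid:
    """Movers and sensor coverages 'check in' to grid cells; a mover
    only consults the sensors registered in its own cell, never the
    full mover-by-sensor cross product."""

    def __init__(self):
        self.sensors = defaultdict(set)   # cell -> sensors covering it
        self.movers = defaultdict(set)    # cell -> movers checked in

    def sensor_covers(self, sensor, cells):
        # A sensor periodically informs the grid of its coverage.
        for c in cells:
            self.sensors[c].add(sensor)

    def check_in(self, mover, pos):
        c = cell_of(pos)
        self.movers[c].add(mover)
        return self.sensors[c]            # sensors that may detect this mover

g = Grid()
g.sensor_covers("radar-1", [(0, 0), (0, 1), (1, 0), (1, 1)])
print(g.check_in("aircraft-7", (12.0, 3.0)))  # {'radar-1'}
```

The fuzzy grids of the invention relax `cell_of` so that a mover re-registers only after moving a tolerance distance, avoiding exact grid-crossing computations.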

Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

1998-01-01

157

Parallel Proximity Detection for Computer Simulation

NASA Technical Reports Server (NTRS)

The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.

Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

1997-01-01

158

Parallel multiscale simulations of a brain aneurysm

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NekTar. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NekTar and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.
PMID:23734066

Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

2012-01-01

159

Parallel multiscale simulations of a brain aneurysm

NASA Astrophysics Data System (ADS)

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NekTar. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NekTar and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.

Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

2013-07-01

160

Parallel multiscale simulations of a brain aneurysm.

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NekTar. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NekTar and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.
PMID:23734066

Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em

2013-07-01

161

Parallel multiscale simulations of a brain aneurysm

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NekTar. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NekTar and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.

Grinberg, Leopold [Division of Applied Mathematics, Brown University, Providence, RI 02912 (United States)]; Fedosov, Dmitry A. [Institute of Complex Systems and Institute for Advanced Simulation, Forschungszentrum Jülich, Jülich 52425 (Germany)]; Karniadakis, George Em, E-mail: george_karniadakis@brown.edu [Division of Applied Mathematics, Brown University, Providence, RI 02912 (United States)]

2013-07-01

162

MAPS: multi-algorithm parallel circuit simulation

The emergence of multi-core and many-core processors has introduced new opportunities and challenges to EDA research and development. While the availability of increasing parallel computing power holds new promise to address many computing challenges in CAD, the leverage of hardware parallelism can only be possible with a new generation of parallel CAD applications. In this paper, we propose a novel

Xiaoji Ye; Wei Dong; Peng Li; Sani R. Nassif

2008-01-01

163

On the synthesis of safe control policies in decentralized control of discrete-event systems

State estimation and safe controller synthesis for a general form of decentralized control architecture for discrete-event systems are investigated. For this architecture, controllable events are assigned to be either

Kurt Rohloff; Stéphane Lafortune

2003-01-01

164

Comments on "Polynomial Time Verification of Decentralized Diagnosability of Discrete Event

These comments note that the testing automaton GV = GF,i || GN,i, defined in [1, Algorithm 1, Step 4], is in general nondeterministic.

Kumar, Ratnesh

165

Coarse grain parallel finite element simulations for incompressible flows

Parallel simulation of incompressible fluid flows is considered on networks of homogeneous workstations. Coarse-grain parallelization of a Taylor-Galerkin/pressure-correction finite element algorithm is discussed, taking into account network communication costs. The main issues include the parallelization of system assembly, and of iterative and direct solvers, which are of common interest to finite element and general numerical computation. The parallelization strategies are implemented

P. W. Grant; M. F. Webster; X. Zhang

1998-01-01

166

Parallel Algorithms for Time and Frequency Domain Circuit Simulation

[List-of-figures excerpt from the thesis: physical organization of parallel platforms; MPI- and thread-based parallelization mechanisms; basic flow of transient simulation; the parallel forward scheme; four-thread waveform pipelining, with and without revoking of forward pipelining; speedups of various configurations.]

Dong, Wei

2010-10-12

167

Improving the performance of parallel relaxation-based circuit simulators

Describes methods of increasing parallelism, thereby improving the performance, of waveform relaxation-based parallel circuit simulators. The key contribution is the use of parallel nonlinear relaxation and parallel model evaluation to solve large subcircuits that may lead to load balancing problems. These large subcircuits are further partitioned and solved on clusters of tightly-coupled multiprocessors. This paper describes a general hybrid/hierarchical approach

Gih-guang Hung; Yen-cheng Wen; Kyle A. Gallivan; Resve A. Saleh

1993-01-01

168

Parallelization of Rocket Engine Simulator Software (PRESS)

NASA Technical Reports Server (NTRS)

The Parallelization of Rocket Engine System Software (PRESS) project is part of a collaborative effort with Southern University at Baton Rouge (SUBR), University of West Florida (UWF), and Jackson State University (JSU). The second-year funding, which supports two graduate students enrolled in our new Master's program in Computer Science at Hampton University and the principal investigator, has been obtained for the period from October 19, 1996 through October 18, 1997. The key part of the interim report was new directions for the second year funding. This came about from discussions during the Rocket Engine Numeric Simulator (RENS) project meeting in Pensacola on January 17-18, 1997. At that time, a software agreement between Hampton University and NASA Lewis Research Center had already been concluded. That agreement concerns off-NASA-site experimentation with the PUMPDES/TURBDES software. Before this agreement, during the first year of the project, another large-scale FORTRAN-based software package, Two-Dimensional Kinetics (TDK), was being used for translation to an object-oriented language and parallelization experiments. However, that package proved to be too complex and lacking sufficient documentation for an effective translation effort to object-oriented C++ source code. The focus, this time with the better documented and more manageable PUMPDES/TURBDES package, was still on translation to C++ with design improvements. At the RENS meeting, however, the new impetus for the RENS projects in general, and PRESS in particular, shifted in two important ways. One was closer alignment with the work on the Numerical Propulsion System Simulator (NPSS) through cooperation and collaboration with the LERC ACLU organization. The other was to see whether and how NASA's various rocket design software can be run over local networks and intranets without any radical efforts for redesign and translation into object-oriented source code.
There were also suggestions that the FORTRAN-based code be encapsulated in C++ code, thereby facilitating reuse without undue development effort. The details are covered in the aforementioned section of the interim report filed on April 28, 1997.

Cezzar, Ruknet

1997-01-01

169

Parallelizing simulated annealing algorithms based on high-performance computer

We implemented five conversions of the simulated annealing (SA) algorithm from sequential to parallel forms on high-performance computers and applied them to a set of standard function optimization problems in order to test their performance. According to the experimental results, we found that the traditional approach to parallelizing simulated annealing, namely, parallelizing moves in sequential SA, had difficulty handling very difficult problem instances.
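The "parallelizing moves" strategy the authors compare can be sketched as follows. This is a generic illustration with a toy objective, not the paper's implementation; threads stand in for the worker processes of an HPC implementation (with a pure-Python cost function they mainly illustrate the structure).

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def cost(x):
    # Toy 1-D multimodal objective (illustrative only).
    return x * x + 10 * math.sin(3 * x)

def sa_parallel_moves(x0, temps, moves_per_temp=8, seed=1):
    """Simulated annealing in the 'parallelized moves' style: at each
    temperature a batch of candidate moves is evaluated concurrently,
    then the Metropolis test is applied to the best candidate."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best = (x, fx)
    with ThreadPoolExecutor() as pool:
        for T in temps:
            cands = [x + rng.gauss(0, 1) for _ in range(moves_per_temp)]
            costs = list(pool.map(cost, cands))   # parallel evaluations
            y, fy = min(zip(cands, costs), key=lambda p: p[1])
            # Metropolis acceptance: always take improvements, and take
            # worse candidates with probability exp(-(fy - fx) / T).
            if fy < fx or rng.random() < math.exp((fx - fy) / T):
                x, fx = y, fy
                if fy < best[1]:
                    best = (y, fy)
    return best

x_best, f_best = sa_parallel_moves(5.0, temps=[2.0 / 1.5 ** k for k in range(20)])
```

Because all candidates in a batch are generated from the same current state, this scheme preserves the sequential SA trajectory semantics at each temperature, which is exactly why it can struggle on the hardest instances the abstract mentions.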

Ding-Jun Chen; Chung-Yeol Lee; Cheol-Hoon Park; Pedro Mendes

2007-01-01
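
The "parallelizing moves" approach named in the abstract can be illustrated with a minimal sketch (Python, not the authors' code; all names and parameters here are illustrative): each iteration proposes several candidate moves, evaluates them concurrently, and applies the usual Metropolis acceptance test to the best one.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def sphere(x):
    # Illustrative objective: global minimum 0 at the origin.
    return sum(v * v for v in x)

def parallel_sa(f, x0, n_iters=2000, n_moves=4, t0=1.0, cooling=0.995,
                step=0.5, seed=0):
    """SA that evaluates n_moves candidate moves concurrently per iteration
    and applies the Metropolis acceptance test to the best candidate."""
    rng = random.Random(seed)
    x, fx, t = list(x0), f(x0), t0
    with ThreadPoolExecutor(max_workers=n_moves) as pool:
        for _ in range(n_iters):
            moves = [[v + rng.gauss(0.0, step) for v in x]
                     for _ in range(n_moves)]
            scores = list(pool.map(f, moves))        # parallel move evaluation
            best = min(range(n_moves), key=scores.__getitem__)
            d = scores[best] - fx
            if d < 0 or rng.random() < math.exp(-d / max(t, 1e-12)):
                x, fx = moves[best], scores[best]
            t *= cooling                             # geometric cooling schedule
    return x, fx

x, fx = parallel_sa(sphere, [3.0, -2.0])
```

For an objective this cheap, the overhead of farming moves out can swamp the gain; the pattern only pays off when one evaluation of f is expensive, which is consistent with the mixed results the abstract reports.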

170

PARASPICE: A Parallel Circuit Simulator for Shared-Memory Multiprocessors

This paper presents a general approach to parallelizing direct method circuit simulation. The approach extracts parallel tasks at the algorithmic level for each compute-intensive module and therefore is suitable for a wide range of shared-memory multiprocessors. The implementation of the approach in SPICE2 resulted in a portable parallel direct circuit simulator, PARASPICE. The superior performance of PARASPICE is demonstrated on

Gung-chung Yang

1990-01-01

171

Parssec: A Parallel Simulation Environment for Complex Systems

ulating large-scale systems. Widespread use of parallel simulation, however, has been significantly hindered by a lack of tools for integrating parallel model execution into the overall framework of system simulation. Although a number of algorithmic alternatives exist for parallel execution of discrete-event simulation models, performance analysts not expert in parallel simulation have relatively few tools giving them flexibility to experiment with multiple algorithmic or architectural...

Rajive Bagrodia; Richard A. Meyer; Mineo Takai; Yu-An Chen; Xiang Zeng; Jay Martin; Ha Yoon Song

1998-01-01

172

Partitioning strategies for parallel KIVA-4 engine simulations

Parallel KIVA-4 is described and simulated in four different engine geometries. The Message Passing Interface (MPI) was used to parallelize KIVA-4. Partitioning strategies are assessed in light of the fact that cells can become deactivated and activated during the course of an engine simulation, which affects the load balance between processors.

Torres, D J [Los Alamos National Laboratory; Kong, S C [IOWA STATE UNIV

2008-01-01

173

Parallel magnetic field perturbations in gyrokinetic simulations

At low beta it is common to neglect parallel magnetic field perturbations on the basis that they are of order beta^2. This is only true if effects of order beta are canceled by a term in the grad-B drift that is also of order beta [H. L. Berk and R. R. Dominguez, J. Plasma Phys. 18, 31 (1977)]. To our knowledge this has not been rigorously tested with modern gyrokinetic codes. In this work we use the gyrokinetic code GS2 [Kotschenreuther et al., Comput. Phys. Commun. 88, 128 (1995)] to investigate whether the compressional magnetic field perturbation B_|| is required for accurate gyrokinetic simulations at low beta for microinstabilities commonly found in tokamaks. The kinetic ballooning mode (KBM) demonstrates the principle described by Berk and Dominguez strongly, as does the trapped electron mode, in a less dramatic way. The ion and electron temperature gradient (ETG) driven modes do not typically exhibit this behavior; the effects of B_|| are found to depend on the pressure gradients. The terms which are seen to cancel at long wavelength in KBM calculations can be cumulative in the ion temperature gradient case and increase with eta_e. The effect of B_|| on the ETG instability is shown to depend on the normalized pressure gradient beta' at constant beta.

Joiner, N.; Hirose, A. [Department of Physics and Engineering Physics, University of Saskatchewan, Saskatoon, Saskatchewan S7N 5E2 (Canada); Dorland, W. [University of Maryland, College Park, Maryland 20742 (United States)

2010-07-15

174

Parallel methods for dynamic simulation of multiple manipulator systems

NASA Technical Reports Server (NTRS)

In this paper, efficient dynamic simulation algorithms for a system of m manipulators, cooperating to manipulate a large load, are developed; their performance, using two possible forms of parallelism on a general-purpose parallel computer, is investigated. One form, temporal parallelism, is obtained with the use of parallel numerical integration methods. A speedup of 3.78 on four processors of CRAY Y-MP8 was achieved with a parallel four-point block predictor-corrector method for the simulation of a four manipulator system. These multi-point methods suffer from reduced accuracy, and when comparing these runs with a serial integration method, the speedup can be as low as 1.83 for simulations with the same accuracy. To regain the performance lost due to accuracy problems, a second form of parallelism is employed. Spatial parallelism allows most of the dynamics of each manipulator chain to be computed simultaneously. Used exclusively in the four processor case, this form of parallelism in conjunction with a serial integration method results in a speedup of 3.1 on four processors over the best serial method. In cases where there are either more processors available or fewer chains in the system, the multi-point parallel integration methods are still advantageous despite the reduced accuracy because both forms of parallelism can then combine to generate more parallel tasks and achieve greater effective speedups. This paper also includes results for these cases.

Mcmillan, Scott; Sadayappan, P.; Orin, David E.

1993-01-01

175

A Parallel and Accelerated Circuit Simulator with Precise Accuracy

We have developed a highly parallel and accelerated circuit simulator which produces precise results for large-scale simulation. We incorporated multithreading in both the model and matrix calculations to achieve not only a factor-of-10 acceleration compared to the de facto standard circuit simulator used worldwide, but also to equal or exceed the performance of timing-based event-driven simulators with

Peter M. Lee; Shinji Ito; Takeaki Hashimoto; Tomomasa Touma; Junji Sato; Goichi Yokomizo; Ic

2002-01-01

176

Behavior coordination of mobile robotics using supervisory control of fuzzy discrete event systems.

In order to incorporate the uncertainty and impreciseness present in real-world event-driven asynchronous systems, fuzzy discrete event systems (DESs) (FDESs) have been proposed as an extension to crisp DESs. In this paper, first, we propose an extension to the supervisory control theory of FDES by redefining fuzzy controllable and uncontrollable events. The proposed supervisor is capable of enabling feasible uncontrollable and controllable events with different possibilities. Then, the extended supervisory control framework of FDES is employed to model and control several navigational tasks of a mobile robot using the behavior-based approach. The robot has limited sensory capabilities, and the navigations have been performed in several unmodeled environments. The reactive and deliberative behaviors of the mobile robotic system are weighted through fuzzy uncontrollable and controllable events, respectively. By employing the proposed supervisory controller, a command-fusion-type behavior coordination is achieved. The observability of fuzzy events is incorporated to represent the sensory imprecision. As a systematic analysis of the system, a fuzzy-state-based controllability measure is introduced. The approach is implemented in both simulation and real time. A performance evaluation is performed to quantitatively estimate the validity of the proposed approach over its counterparts. PMID:21421445

Jayasiri, Awantha; Mann, George K I; Gosine, Raymond G

2011-10-01

177

Multiprogrammed non-blocking checkpoints in support of optimistic simulation on myrinet clusters

CCL (checkpointing and communication library) is a software layer in support of optimistic parallel discrete event simulation (PDES) on Myrinet-based COTS clusters. Beyond classical low-latency message delivery functionalities, this library implements CPU-offloaded, non-blocking (asynchronous) checkpointing based on the data transfer capabilities provided by a programmable DMA engine on board the Myrinet network cards. These functionalities are unique since

Andrea Santoro; Francesco Quaglia

2007-01-01

178

Parallel transistor level circuit simulation using domain decomposition methods

This paper presents an efficient parallel transistor-level full-chip circuit simulation tool with SPICE accuracy. The new approach partitions the circuit into a linear domain and several nonlinear domains based on circuit nonlinearity and connectivity. The linear domain is solved by a parallel fast linear solver, while the nonlinear domains are distributed across processors and solved in parallel by a direct solver. Parallel domain

He Peng; Chung-kuan Cheng

2009-01-01

179

Parallel architecture for real-time simulation. Master's thesis

This thesis is concerned with the development of a very fast and highly efficient parallel computer architecture for real-time simulation of continuous systems. Currently, several parallel processing systems exist that may be capable of executing a complex simulation in real time. These systems are examined and the pros and cons of each are discussed. The thesis then introduces a custom-designed parallel architecture based upon The University of Alabama's OPERA architecture. Each component of this system is discussed and the rationale for its selection presented. The problem selected for the test and evaluation of the proposed architecture, real-time simulation of the Space Shuttle Main Engine, is explored, identifying the areas where parallelism can be exploited and parallel processing applied. Results from the test and evaluation phase are presented and compared with results for the same problem processed on a uniprocessor system.

Cockrell, C.D.

1989-01-01

180

DCCB and SCC Based Fast Circuit Partition Algorithm For Parallel SPICE Simulation

This paper presents an efficient circuit partition algorithm specially designed for VLSI circuit partition and parallel simulation. The algorithm

Wang, Yu

181

New Iterative Linear Solvers For Parallel Circuit Simulation

This thesis discusses iterative linear solvers for parallel transient analysis of large-scale logic circuits. The increasing importance of large-scale circuit simulation is the driving force behind research on efficient parallel circuit simulation. The most time-consuming part of circuit transient analysis is the model evaluation; the next is the linear solver, which takes about 1/5 of simulation time. Although

Reiji Suda

1996-01-01

182

Parallelizing Circuit Simulation - A Combined Algorithmic And Specialized Hardware Approach

Accurate performance estimation of high-density integrated circuits requires the kind of detailed numerical simulation performed in programs like ASTAP[1] and SPICE[2]. Because of the large computation time required by such programs when applied to large circuits, accelerating numerical simulation is an important problem. Parallel processing promises to be a viable approach to accelerating the simulation of large circuits. This paper

Jacob White; Nicholas Weiner

183

Development of a Massively-Parallel, Biological Circuit Simulator

Genetic expression and control pathways can be successfully modeled as electrical circuits. Given the vast quantity of genomic data, very large and complex genetic circuits can be constructed. To tackle such problems, the massively-parallel, electronic circuit simulator, Xyce™, is being adapted to address biological problems. Unique to this biocircuit simulator is the ability to simulate not just one or a

Richard L. Schiek; Elebeoba E. May

2003-01-01

184

Development of parallelism for circuit simulation by tearing

A hierarchical clustering with min-cut exchange method for parallel circuit simulation is presented. Partitioning into subcircuits is near optimum in terms of distribution of computational cost and does not sacrifice the sparsity of the entire matrix. In order to compute the arising dense interconnection matrix in parallel, multilevel and distributed row-based dissection algorithms are used. A processing speed-up of

H. Onozuka; M. Kanoh; C. Mizuta; T. Nakata; N. Tanabe

1993-01-01

185

Massively Parallel Simulations of Solar Flares and Plasma Turbulence

... in space- and astrophysical plasma systems include solar flares and hydro- or magnetohydrodynamic turbulence ... of solar flares and Lagrangian statistics of compressible and incompressible turbulent flows.

Grauer, Rainer

186

Parallel Monte Carlo Driver (PMCD)—a software package for Monte Carlo simulations in parallel

NASA Astrophysics Data System (ADS)

Thanks to the dramatic decrease in computer costs, the no less dramatic increase in those same computers' capabilities, and the availability of free software and libraries that allow small parallel computation installations to be set up, the scientific community is now in a position where parallel computation is within easy reach even of moderately budgeted research groups. The software package PMCD (Parallel Monte Carlo Driver) was developed to drive the Monte Carlo simulation of a wide range of user-supplied models in parallel computation environments. The typical Monte Carlo simulation involves using a software implementation of a function to repeatedly generate function values. Typically these software implementations were developed for sequential runs. Our driver was developed to enable the Monte Carlo simulation to run in parallel, with minimum changes to the original code that implements the function of interest to the researcher. In this communication we present the main goals and characteristics of our software, together with a simple study of its expected performance. Monte Carlo simulations are informally classified as "embarrassingly parallel", meaning that the gains in parallelizing a Monte Carlo run should be close to ideal, i.e. with speed-ups close to linear. In this paper our simple study shows that, without compromising ease of use and implementation, one can get performance very close to the ideal.

Mendes, B.; Pereira, A.

2003-03-01
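
The "embarrassingly parallel" pattern the abstract describes — independent Monte Carlo batches whose results are simply aggregated by a driver — can be sketched as follows (a toy Python driver, not PMCD itself; all names are illustrative):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def mc_batch(args):
    """One independent batch of trials of a user-supplied model.
    Here the model estimates pi by sampling the unit square."""
    seed, n_trials = args
    rng = random.Random(seed)            # per-worker stream, no shared state
    return sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
               for _ in range(n_trials))

def mc_driver(n_workers=4, trials_per_worker=50_000):
    """Driver: farm out batches, then merely aggregate the results."""
    jobs = [(seed, trials_per_worker) for seed in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        hits = sum(pool.map(mc_batch, jobs))
    return 4.0 * hits / (n_workers * trials_per_worker)

pi_est = mc_driver()
```

Because batches share no state, speed-up is limited mainly by the cost of spawning workers and combining results, which is why Monte Carlo runs scale nearly linearly.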

187

Efficient event-driven simulation of parallel processor architectures

In this paper we present a new approach for generating high-speed optimized event-driven instruction set level simulators for adaptive massively parallel processor architectures. The simulator generator is part of a methodology for the systematic mapping, evaluation, and exploration of massively parallel processor architectures that are designed for special purpose applications in the world of embedded computers. The generation of high-speed

Alexey Kupriyanov; Dmitrij Kissler; Frank Hannig; Jürgen Teich

2007-01-01

188

Xyce parallel electronic simulator : users' guide. Version 5.1.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state of the art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase — a message-passing parallel implementation — which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an "in-house" capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

2009-11-01

189

Xyce Parallel Electronic Simulator : users' guide, version 4.1.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. 
As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

2009-02-01

190

Activity regions for the specification of discrete event systems

The common view on modeling and simulation of dynamic systems is to focus on the specification of the state of the system and its transition function. Although some interesting challenges remain to efficiently and elegantly support this view, we consider in this paper that this problem is solved. Instead, we propose here to focus on a new point of view

Alexandre Muzy; Luc Touraille; Hans Vangheluwe; Olivier Michel; Mamadou Kaba Traoré; David R. C. Hill

2010-01-01

191

Traffic simulations on parallel computers using domain decomposition techniques

Large-scale simulations of Intelligent Transportation Systems (ITS) can only be achieved by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow traffic simulations to be performed with the standard simulation package TRAF-NETSIM on a 128-node IBM SPx parallel supercomputer as well as on a cluster of SUN workstations. Whilst this particular parallel implementation is based on NETSIM, a microscopic traffic simulation model, the presented strategy is applicable to a broad class of traffic simulations. An outer iteration loop must be introduced in order to converge to a global solution. A performance study that utilizes a scalable test network consisting of square grids is presented, which addresses the performance penalty introduced by the additional iteration loop.

Hanebutte, U.R.; Tentner, A.M.

1995-12-31
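
The outer iteration loop the abstract mentions — each subdomain is solved with boundary data from its neighbors, and the sweeps repeat until a global solution is reached — can be illustrated on a toy 1-D diffusion problem (a Python sketch under stated assumptions, not TRAF-NETSIM):

```python
# Toy problem: solve u'' = 0 on [0, 1] with u(0)=0, u(1)=1.
# Two overlapping subdomains are relaxed independently inside an outer
# iteration loop, mimicking domain decomposition across processors.

def solve_subdomain(u, lo, hi, sweeps=200):
    # Gauss-Seidel relaxation on interior points lo..hi, boundary data fixed.
    for _ in range(sweeps):
        for i in range(lo, hi + 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1])

n = 20
u = [0.0] * (n + 1)
u[n] = 1.0                                   # boundary conditions
for outer in range(50):                      # outer iteration loop
    solve_subdomain(u, 1, n // 2 + 1)        # subdomain 1 ("processor 1")
    solve_subdomain(u, n // 2 - 1, n - 1)    # overlapping subdomain 2 ("processor 2")

exact = [i / n for i in range(n + 1)]
err = max(abs(a - b) for a, b in zip(u, exact))
```

Without the outer loop each subdomain would only be consistent with stale neighbor data; repeating the sweeps drives the pieces toward the single global solution, at the performance cost the abstract notes.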

192

Performance of Parallel Logic Event Simulation on PC-Cluster

PC-cluster is becoming more and more popular in many scientific and engineering applications, but not in electronic design areas. One of the reasons that parallel simulations have not been popularized is due to the high cost of frequent communications of small messages. Several simulation techniques have been aggressively studied and developed in the past ten years. These studies mostly focused

Thuy T. Le; Jalel Rejeb

2004-01-01

193

Parallel and Adaptive Simulation of Fuel Cells

Parallel and adaptive simulation of fuel cells in 3D ... We focus on the simulation done in 3D using modern techniques like higher order ... and on the transport of species in the cathodic gas diffusion layer of the fuel cell. Therefore, from the detailed

Münster, Westfälische Wilhelms-Universität

194

Accelerating Quantum Computer Simulation via Parallel Eigenvector Computation

Quantum-dot cellular automata (QDCA) hold great potential to produce the next generation of computer hardware, but their development is hindered by computationally intensive simulations. Our research therefore focuses on rewriting one such simulation to run parallel calculations on a graphics processing unit (GPU). We have decreased execution time from 33 hours 11 minutes to 1 hour 39 minutes, but current

Karl Stathakis

2011-01-01

195

Clustering Algorithms for Parallel Car-Crash Simulation Analysis

Buckling and certain contact situations cause scatter in the results of numerical crash simulation: for a BMW model, differences of up to 10 cm in the position of a node between two simulation runs were observed, purely as a result of round-off differences in the case of parallel computing. An engineer has to measure this scatter to check whether important parts of

Liquan Meil; Clemens Thole

196

PROTEUS: A High-Performance Parallel-Architecture Simulator

PROTEUS is a high-performance parallel-architecture simulator that can be configured to simulate a wide range of architectures. Proteus provides a modular structure that simplifies customization and independent replacement of parts of the architecture. There are typically multiple implementations

Koppelman, David M.

197

This short paper outlines research that the Universities of Arizona and New Mexico have initiated in distributed robotics. A basic hypothesis of this research is that intrinsically efficient discrete event abstractions of soft computing paradigms, including neurons and fuzzy rule controllers, will enable cooperative multi-robot systems to overcome severe limitations on on-board processing and communications networking. Specifically, we seek

Bernard P. Zeigler; Mo Jamshidi; Hessam Sarjoughian

198

IEC 61131-3 Compliant Control Code Generation from Discrete Event Models

This paper describes a control logic implementation approach based on discrete event models in the form of finite state machines and Petri nets. Such models may be derived during supervisory control synthesis. The approach defines a transformation of the models into IEC 61131-3 compliant code that can be translated and downloaded into a standard industrial programmable logic controller.

Dejan Gradisar; Drago Matko

2005-01-01

199

Discrete event diagnosis using labeled Petri nets. An application to manufacturing systems

In this paper an approach to on-line diagnosis of discrete event systems based on labeled Petri nets is presented. The approach is based on the notions of basis markings and justifications, and it can be applied both to bounded and unbounded Petri nets whose unobservable subnet is acyclic. Moreover, it is shown that, in the case of bounded Petri nets,

M. P. Cabasino; A. Giua; M. Pocci; C. Seatzu

2011-01-01

200

Determination of Timed Transitions in Identified Discrete-Event Models for Fault Detection

Model-based fault detection compares observed and modeled behavior ... The method identifies a set of time guards leading to an advantageous trade-off between the fault detection

Boyer, Edmond

201

Signed real measure of regular languages for discrete-event automata

This paper presents the concept and formulation of a signed real measure of regular languages for analysis ... a partial ordering on a set of controlled sublanguages {Lk} of a regular plant language L, the signed real

Ray, Asok

202

Fault Detection and Isolation in Manufacturing Systems with an Identified Discrete Event Model

In this paper a generic method for fault detection and isolation (FDI) in manufacturing systems is considered, with the controller built on the basis of observed fault-free system behavior. An identification algorithm known from

Paris-Sud XI, Université de

203

An event-occurrence-rules-based compact modeling formalism for a class of discrete event systems

The analysis, failure diagnosis, and control of discrete event systems (DESs) require an accurate model of the system. In this paper we present a methodology which makes the task of modeling DESs considerably less cumbersome, less error prone, and more user-friendly than it usually is. In doing so we simplify a modeling formalism by eliminating

V. Chandra; R. Kumar

2002-01-01

204

Safety Control of Discrete Event Systems Using Finite State Machines with Parameters

... most notably automata or finite state machines [RW87, CL99] and Petri nets [Pet81, HKG97] ... systems modeled as finite state machines have been well developed over the years in addressing various fundamental

Lin, Feng

205

Nonblocking and Safe Control of Discrete Event Systems modeled as Extended Finite Automata

Extended Finite Automata (EFA), i.e., finite automata extended with variables, are a suitable modeling framework, obtained by augmenting a standard finite state automaton (FSA)

Kumar, Ratnesh

206

ENG SE/EC/ME733: Discrete Event and Hybrid Systems Description

... models such as Hybrid Automata. In the second part of the course, this material is used to introduce ... The third part of the course covers more advanced material on control and optimization of DES and HS. Prereq: EK500 or EC505 or consent

Goldberg, Bennett

207

Scheduling trains on a railway network using a discrete event model of railway traffic

Scheduling trains in a railway network is a fundamental operational problem in the railway industry. A local feedback-based travel advance strategy is developed using a discrete event model of train advances along lines of the railway. This approach can quickly handle perturbations in the schedule and is shown to perform well on three time-performance criteria while maintaining the local nature

M. J. Dorfman; J. Medanic

2004-01-01

208

Anticipating railway operation disruption events based on the analysis of discrete-event diagnostic ...

... increases operating costs and reduces patronage. In railway systems reliability is significantly influenced ... on reliability and availability; many advanced railway systems and components include monitoring and diagnostic

Paris-Sud XI, Université de

209

Parallel Computer Simulations of Heat Transfer in Biological Tissues

Parallel computer simulation of heat transfer in parts of the human body is described. Realistic geometric models and tissues with different thermodynamic properties are analyzed. The principal steps of the computer simulations, including mathematical and geometric modeling, domain discretization, numerical solution, validation of simulated results, and visualization, are described. An explicit finite difference method for the inhomogeneous computational domain has

Roman Trobec
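
An explicit finite-difference scheme of the kind the abstract mentions can be sketched in a few lines of Python for a 1-D inhomogeneous domain (a toy stand-in for the tissue models; the geometry, coefficients, and names are illustrative):

```python
# One forward-Euler step of u_t = a(x) u_xx on a 1-D inhomogeneous domain,
# with fixed (Dirichlet) boundary values.
def heat_step(u, alpha, dt, dx):
    # Stable when dt <= dx^2 / (2 * max(alpha)).
    return ([u[0]]
            + [u[i] + alpha[i] * dt / dx ** 2 * (u[i + 1] - 2 * u[i] + u[i - 1])
               for i in range(1, len(u) - 1)]
            + [u[-1]])

n = 50
dx = 1.0 / n
alpha = [0.5 if i < n // 2 else 1.0 for i in range(n + 1)]  # two "tissues"
dt = 0.9 * dx * dx / (2.0 * max(alpha))                     # respect the CFL limit
u = [1.0 if n // 3 < i < 2 * n // 3 else 0.0 for i in range(n + 1)]  # hot block
for _ in range(500):
    u = heat_step(u, alpha, dt, dx)
```

Because each grid point is updated independently from the previous time level, explicit schemes like this parallelize naturally by splitting the domain across processors, which is what motivates the parallel implementation described above.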

210

Efficient parallel simulation of CO2 geologic sequestration insaline aquifers

An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling single- and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, as well as salt precipitation or dissolution. The new parallel simulator uses MPI for parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations by multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that a linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2.

Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten

2007-01-01

211

Fault Diagnosis of Continuous Systems Using Discrete-Event Methods

Matthew Daigle, Xenofon Koutsoukos, Gautam Biswas

Fault diagnosis is crucial for ensuring the safe operation of complex engineering systems ... fault isolation in systems with complex continuous dynamics. This paper presents a novel discrete-event

Koutsoukos, Xenofon D.

212

A hybrid parallel framework for the cellular Potts model simulations

The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, and cannot be used for large-scale, complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on a shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving, and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for large-scale simulation (~10^8 sites) of the complex collective behavior of numerous cells (~10^6).

Jiang, Yi [Los Alamos National Laboratory]; He, Kejing [South China Univ.]; Dong, Shoubin [South China Univ.]

2009-01-01

213

Parallel Monte Carlo Simulation for control system design

NASA Technical Reports Server (NTRS)

The research during the 1993/94 academic year addressed the design of parallel algorithms for stochastic robustness synthesis (SRS). SRS uses Monte Carlo simulation to compute probabilities of system instability and other design-metric violations. The probabilities form a cost function which is used by a genetic algorithm (GA). The GA searches for the stochastic optimal controller. The existing sequential algorithm was analyzed and modified to execute in a distributed environment. For this, parallel approaches to Monte Carlo simulation and genetic algorithms were investigated. Initial empirical results are available for the KSR1.
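
The Monte Carlo core of SRS — estimating a probability of design-metric violation by sampling uncertain parameters — can be sketched as below. The parameter distribution and the instability criterion are invented for illustration; the actual metrics and plant models are those of the study.

```python
import random

def estimate_violation_probability(n_samples, seed=0):
    """Estimate P(instability) by sampling a hypothetical uncertain parameter."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(n_samples):
        damping = rng.uniform(-0.2, 1.0)  # hypothetical uncertain plant parameter
        if damping < 0.0:                 # hypothetical instability criterion
            violations += 1
    return violations / n_samples
```

Each sample is independent, which is what makes the cost function embarrassingly parallel and a natural fit for distributed execution under a genetic algorithm.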

Schubert, Wolfgang M.

1995-01-01

214

Parallel runway requirement analysis study. Volume 2: Simulation manual

NASA Technical Reports Server (NTRS)

This document is a user manual for operating the PLAND_BLUNDER (PLB) simulation program. This simulation is based on two aircraft approaching parallel runways independently and using parallel Instrument Landing System (ILS) equipment during Instrument Meteorological Conditions (IMC). If an aircraft should deviate from its assigned localizer course toward the opposite runway, this constitutes a blunder which could endanger the aircraft on the adjacent path. The worst case scenario would be if the blundering aircraft were unable to recover and continued toward the adjacent runway. PLAND_BLUNDER is a Monte Carlo-type simulation which models the events and aircraft positioning during such a blunder situation. The model simulates two aircraft performing parallel ILS approaches using Instrument Flight Rules (IFR) or visual procedures. PLB uses a simple movement model and control law in three dimensions (X, Y, Z). The parameters of the simulation inputs and outputs are defined in this document along with a sample of the statistical analysis. This document is the second volume of a two volume set. Volume 1 is a description of the application of the PLB to the analysis of close parallel runway operations.

Ebrahimi, Yaghoob S.; Chun, Ken S.

1993-01-01

215

Parallelization of Rocket Engine Simulator Software (PRESS)

NASA Technical Reports Server (NTRS)

We have outlined our work in the last half of the funding period. We have shown how a demo package for RESSAP using MPI can be done. However, we also mentioned the difficulties with the UNIX platform. We have reiterated some of the suggestions made during the presentation of the progress at the Fourth Annual HBCU Conference. Although we have discussed, in some detail, how the TURBDES/PUMPDES software can be run in parallel using MPI, at present we are unable to experiment any further with either MPI or PVM. Because X windows has not been implemented, we are also not able to experiment further with XPVM, which, it will be recalled, has a nice GUI interface. There are also some concerns, on our part, about MPI being an appropriate tool. The best thing about MPI is that it is public domain. Although plenty of documentation exists for the intricacies of using MPI, little information is available on its actual implementations. Other than very typical, somewhat contrived examples, such as the Jacobi algorithm for solving Laplace's equation, there are few examples which can readily be applied to real situations such as ours. In effect, the review of the literature on both MPI and PVM, and there is a lot of it, indicates something similar to the enormous effort which was spent on LISP and LISP-like languages as tools for artificial intelligence research. During the development of a book on programming languages [12], when we searched the literature for very simple examples like taking averages, reading and writing records, multiplying matrices, etc., we could hardly find any! Yet so much was said and done on that topic in academic circles. It appears that we faced the same problem with MPI, where, despite significant documentation, we could not find even a simple example which supports coarse-grain parallelism involving only a few processes.
From the foregoing, it appears that a new direction may be required for more productive research during the extension period (10/19/98 - 10/18/99). At the least, the research would need to be done on Windows 95/Windows NT based platforms. Moreover, with the acquisition of the Lahey Fortran package for the PC platform, and the existing Borland C++ 5.0, we can work on C++ wrapper issues. We have carefully studied the blueprint for the Space Transportation Propulsion Integrated Design Environment for the next 25 years [13] and found the inclusion of HBCUs in that effort encouraging. Especially over the long period for which a map is provided, there is no doubt that HBCUs will grow and become better equipped to do meaningful research. In the shorter period, as was suggested in our presentation at the HBCU conference, some key decisions regarding the aging Fortran-based software for rocket propellants will need to be made. One important issue is whether or not object-oriented languages such as C++ or Java should be used for distributed computing. Whether or not "distributed computing" is necessary for the existing software is yet another, larger question to be tackled.

Cezzar, Ruknet

1998-01-01

216

Parallelization of a Monte Carlo particle transport simulation code

NASA Astrophysics Data System (ADS)

We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared- and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures, including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors, and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for the study of higher particle energies with the use of more accurate physical models, and improved statistics, as more particle tracks can be simulated in a short response time.
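
Giving each parallel worker its own reproducible random stream is the role libraries like SPRNG and DCMT play in the work above. A stdlib-only analogue (not those libraries, and only a sketch of the idea) derives per-worker seeds from a single master seed:

```python
import random

def spawn_streams(master_seed, n_workers):
    """Derive one reproducible, distinct stream per worker from a master seed."""
    master = random.Random(master_seed)
    return [random.Random(master.getrandbits(64)) for _ in range(n_workers)]
```

Dedicated parallel-RNG libraries additionally guarantee statistical independence between streams, which simple re-seeding does not.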

Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

2010-05-01

217

Xyce Parallel Electronic Simulator : reference guide, version 4.1.

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list exhaustively, to the extent possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

2009-02-01

218

Xyce parallel electronic simulator reference guide, version 6.0.

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list exhaustively, to the extent possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1].

Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David G. [Raytheon, Albuquerque, NM]

2013-08-01

219

Parallel computer simulation of autotransformer-fed AC traction networks

Simulation of electrical conditions in power lines and rails in an on-line mode in AC and DC electrified railways is necessary for effective engineering design. A description is given of the use of a parallel computer to simulate voltages along an autotransformer-fed AC railway. The algorithm, based on the solution of algebraic equations for a single train, produces a faster-than-real-time

R. John Hill; I. H. Cevik

1990-01-01

220

Xyce Parallel Electronic Simulator : reference guide, version 2.0.

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list exhaustively, to the extent possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.

2004-06-01

221

Parallel electronic circuit simulation on the iPSC system

A parallel circuit simulator was implemented on the iPSC system. Concurrent model evaluation, hierarchical BBDF (bordered block diagonal form) reordering, and distributed multifrontal decomposition to solve the sparse matrix are used. A speedup of six times has been achieved on an eight-processor iPSC hypercube system

C.-P. Yuan; R. Lucas; P. Chan; R. Dutton

1988-01-01

222

Automatic Computation of Sensitivities for a Parallel Aerodynamic Simulation

Derivatives of functions given in the form of large-scale simulation codes are frequently used in computational science and engineering. Examples include design optimization, parameter estimation, solution of nonlinear systems, and inverse problems. In this note we address the computation of derivatives of a parallel computational fluid dynamics code by automatic differentiation. More precisely, we are interested in the derivatives
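
The core mechanism of forward-mode automatic differentiation can be sketched with dual numbers: each value carries its derivative, and arithmetic operators propagate both by the chain rule. This is an illustrative sketch, not the tool actually applied to the CFD code in the note.

```python
class Dual:
    """Forward-mode AD: propagate a value and its derivative together."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)
    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__
    def __mul__(self, other):
        other = self._lift(other)
        # Product rule: (fg)' = f'g + fg'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__
```

Seeding the input with derivative 1 and running the unmodified computation yields both f(x) and f'(x) in one pass.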

Arno Rasch; H. Martin Bücker; Christian H. Bischof

2007-01-01

223

Parallel Simulations for Analysing Portfolios of Catastrophic Event Risk

occurrence patterns and characteristics of catastrophe perils such as hurricanes, tornadoes, and severe winter storms ... Tail Value at Risk (TVaR) for a variety of types of complex property catastrophe insurance contracts

Rau-Chaplin, Andrew

224

Pseudorandom number generator for massively parallel molecular-dynamics simulations

A class of uniform pseudorandom number generators is proposed for modeling and simulations on massively parallel computers. The algorithm is simple, nonrecursive, and is easily transported to serial or vector computers. We have tested the procedure for uniformity, independence, and correlations by several methods. Related, less complex sequences passed some of these tests well enough; however, inadequacies were revealed by
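
Uniformity testing of the kind mentioned above is commonly done with a chi-square statistic over equal-width bins. The sketch below is a generic illustration of that test, not the authors' specific battery.

```python
def chi_square_uniformity(samples, n_bins=10):
    """Chi-square statistic for uniformity of samples drawn from [0, 1)."""
    counts = [0] * n_bins
    for x in samples:
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    expected = len(samples) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)
```

A statistic far above the chi-square critical value for n_bins - 1 degrees of freedom flags a non-uniform generator.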

Brad Holian; Ora Percus; Tony Warnock; Paula Whitlock

1994-01-01

225

Parallel Program Complex for 3D Unsteady Flow Simulation

A parallel program complex for 3D viscous gas flow simulation is presented. This complex is based on explicit finite difference schemes, which are constructed as an approximation of conservation laws (control volume method) and oriented toward the use of locally refined grids. A special algorithm and utility for nested grid partitioning were created. The principle of program construction permits the introduction of new

Eugene V. Shilnikov

2006-01-01

226

Parallelization Strategies for Large Particle Simulations in Astrophysics

NASA Astrophysics Data System (ADS)

The modeling of collisional N-body stellar systems is a topic of great current interest in several branches of astrophysics and cosmology. These systems are dominated by the physics of relaxation, the collective effect of many weak, random gravitational encounters between stars. They connect directly to our understanding of star clusters, and to the formation of exotic objects such as X-ray binaries, pulsars, and massive black holes. As a prototypical multi-physics, multi-scale problem, the numerical simulation of such systems is computationally intensive, and can only be achieved through high-performance computing. The goal of this thesis is to present parallelization and optimization strategies that can be used to develop efficient computational tools for simulating collisional N-body systems. This leads to major advances: 1) From an astrophysics perspective, these tools enable the study of new physical regimes out of reach by previous simulations. They also lead to much more complete parameter space exploration, allowing direct comparison of numerical results to observational data. 2) On the high-performance computing front, efficient parallelization of a multi-component application requires the meticulous redesign of the various components, as well as innovative parallelization techniques. Many of the challenges faced in this process lie at the very heart of high-performance computing research, including achieving optimal load balancing, maximizing utilization of computational resources, and making effective use of different parallel platforms. For modeling collisional N-body systems, a Monte Carlo approach provides ideal balance between speed and accuracy, as opposed to the more accurate but less scalable direct N-body method. We describe the development of a new version of the Cluster Monte Carlo (CMC) code capable of simulating systems with a realistic number of stars, while accounting for all important physical processes. 
This efficient and scalable parallel version of CMC runs on both GPUs and distributed-memory architectures. We introduce various parallelization and optimization strategies that include the use of best-suited data structures, adaptive data partitioning schemes, parallel random number generation, parallel I/O, and optimized parallel algorithms, resulting in a very desirable scalability of the run-time with the processor number.

Pattabiraman, Bharath

227

Parallel Performance Optimization of the Direct Simulation Monte Carlo Method

NASA Astrophysics Data System (ADS)

Although the direct simulation Monte Carlo (DSMC) particle method is more computationally intensive compared to continuum methods, it is accurate for conditions ranging from continuum to free-molecular, accurate in highly non-equilibrium flow regions, and holds potential for incorporating advanced molecular-based models for gas-phase and gas-surface interactions. As available computer resources continue their rapid growth, the DSMC method is continually being applied to increasingly complex flow problems. Although processor clock speed continues to increase, a trend of increasing multi-core-per-node parallel architectures is emerging. To effectively utilize such current and future parallel computing systems, a combined shared/distributed memory parallel implementation (using both Open Multi-Processing (OpenMP) and Message Passing Interface (MPI)) of the DSMC method is under development. The parallel implementation of a new state-of-the-art 3D DSMC code employing an embedded 3-level Cartesian mesh will be outlined. The presentation will focus on performance optimization strategies for DSMC, which includes, but is not limited to, modified algorithm designs, practical code-tuning techniques, and parallel performance optimization. Specifically, key issues important to the DSMC shared memory (OpenMP) parallel performance are identified as (1) granularity (2) load balancing (3) locality and (4) synchronization. Challenges and solutions associated with these issues as they pertain to the DSMC method will be discussed.
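
Of the shared-memory issues listed above, load balancing is the easiest to make concrete. A standard heuristic (illustrative here, not necessarily the one the DSMC code uses) is longest-processing-time-first: sort work units by cost and repeatedly assign to the least-loaded worker.

```python
import heapq

def greedy_balance(costs, n_workers):
    """LPT heuristic: give each task, largest first, to the least-loaded worker."""
    heap = [(0.0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    loads = [0.0] * n_workers
    for c in sorted(costs, reverse=True):
        load, w = heapq.heappop(heap)
        loads[w] = load + c
        heapq.heappush(heap, (loads[w], w))
    return loads
```

For DSMC, the "costs" would be per-cell particle counts, which change every time step — hence the emphasis on dynamic balancing.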

Gao, Da; Zhang, Chonglin; Schwartzentruber, Thomas

2009-11-01

228

A new approach for high-speed simulation is applied to the analysis of nuclear power system dynamics. The proposed approach is to first identify inherent parallelism and then to develop suitable parallel computation algorithms. The latter includes numerical integration and table lookup techniques that can be used for achieving high-speed simulation. A performance evaluation of the proposed methodology has been completed, which is based on benchmark simulation for pressurized water reactor plant dynamics. The multirate integration algorithm and an innovative table lookup technique running on a parallel processing computer system have proved to be the most advantageous in computational speed.
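
The table lookup technique mentioned above trades memory for speed: expensive functions are precomputed on a grid and evaluated by interpolation at run time. A minimal, generic sketch (not the plant-specific tables of the study):

```python
from bisect import bisect_right

def table_lookup(xs, ys, x):
    """Piecewise-linear interpolation in a precomputed table, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])
```

Because each lookup is O(log n) and branch-light, many lookups can proceed concurrently on a parallel machine.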

Yeh, H.C.; Kastenberg, W.E.; Karplus, W.J.

1989-01-01

229

Reusable Component Model Development Approach for Parallel and Distributed Simulation

Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have heterogeneous interfaces, are tightly coupled, and are closely bound to specific simulation platforms. As a result, they are difficult to reuse across different simulation platforms and applications. To address the problem, this paper first proposes a reusable component model framework. Based on this framework, our reusable model development approach is then elaborated, which contains two phases: (1) domain experts create simulation computational modules observing three principles to achieve their independence; (2) the model developer encapsulates these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that a model developed using our approach has good reusability and is easy to use in different simulation platforms and applications. PMID:24729751

Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng

2014-01-01

230

Real-time digital simulation of dynamic systems using parallelism

In recent years, there has been a growing need for the real-time digital simulation of dynamic systems. This technique can help engineers not only understand a complex physical system in order to support the development of sophisticated digital controls, but also validate the digital control hardware and software developed. For a complex system, more computing power is required. However, the consistent escalation of component speed that characterized earlier decades did not occur in the last decade, so we must seek greater computing speed through the parallel use of conventional components. Parallelism can be achieved through either the development of parallel integration algorithms or the decomposition of the dynamic model. There are three main results in this thesis. (1) New parallel algorithms which cut down the computation time per step are developed. (2) Their characteristics on the parameter plane are investigated with respect to accuracy and stability range. (3) An algorithm to distribute time-consuming function evaluations between processors in a distributed system is developed. Two aircraft models are studied to illustrate the idea. The hardware and software problems in the implementation of the system using microprocessors are also investigated. The simulation results and computation time savings strongly support further development in the real-time digital simulation area.

Yen, K.K.

1985-01-01

231

PRATHAM: Parallel Thermal Hydraulics Simulations using Advanced Mesoscopic Methods

At the Oak Ridge National Laboratory, efforts are under way to develop a 3D, parallel LBM code called PRATHAM (PaRAllel Thermal Hydraulic simulations using Advanced Mesoscopic Methods) to demonstrate the accuracy and scalability of LBM for turbulent flow simulations in nuclear applications. The code has been developed using FORTRAN-90 and parallelized using the Message Passing Interface (MPI) library. The Silo library is used to compact and write the data files, and the VisIt visualization software is used to post-process the simulation data in parallel. Both the single relaxation time (SRT) and multi relaxation time (MRT) LBM schemes have been implemented in PRATHAM. To capture turbulence without prohibitively increasing the grid resolution requirements, an LES approach [5] is adopted, allowing large-scale eddies to be numerically resolved while modeling the smaller (subgrid) eddies. In this work, a Smagorinsky model has been used, which augments the fluid viscosity by an additional eddy viscosity depending on the magnitude of the rate-of-strain tensor. In LBM, this is achieved by locally varying the relaxation time of the fluid.

Joshi, Abhijit S [ORNL]; Jain, Prashant K [ORNL]; Mudrich, Jaime A [ORNL]; Popov, Emilian L [ORNL]

2012-01-01

232

Parallelization of Program to Optimize Simulated Trajectories (POST3D)

NASA Technical Reports Server (NTRS)

This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process, dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) on a distributed memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.

Hammond, Dana P.; Korte, John J. (Technical Monitor)

2001-01-01

233

Potts-model grain growth simulations: Parallel algorithms and applications

Microstructural morphology and grain boundary properties often control the service properties of engineered materials. This report uses the Potts-model to simulate the development of microstructures in realistic materials. Three areas of microstructural morphology simulations were studied. They include the development of massively parallel algorithms for Potts-model grain growth simulations, modeling of mass transport via diffusion in these simulated microstructures, and the development of a gradient-dependent Hamiltonian to simulate columnar grain growth. Potts grain growth models for massively parallel supercomputers were developed for the conventional Potts-model in both two and three dimensions. Simulations using these parallel codes showed self-similar grain growth and no finite size effects for previously unapproachable large-scale problems. In addition, new enhancements to the conventional Metropolis algorithm used in the Potts-model were developed to accelerate the calculations. These techniques enable both the sequential and parallel algorithms to run faster and use an essentially infinite number of grain orientation values to avoid non-physical grain coalescence events. Mass transport phenomena in polycrystalline materials were studied in two dimensions using numerical diffusion techniques on microstructures generated using the Potts-model. The results of the mass transport modeling showed excellent quantitative agreement with one-dimensional diffusion problems; however, the results also suggest that transient multi-dimensional diffusion effects cannot be parameterized as the product of the grain boundary diffusion coefficient and the grain boundary width. Instead, both properties are required. Gradient-dependent grain growth mechanisms were included in the Potts-model by adding an extra term to the Hamiltonian. Under normal grain growth, the primary driving term is the curvature of the grain boundary, which is included in the standard Potts-model Hamiltonian.
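
The Metropolis update at the heart of such Potts simulations can be sketched compactly. This is a generic single-site attempt with periodic boundaries, not the accelerated or parallel variants the report develops.

```python
import math
import random

def metropolis_step(lattice, i, j, q, temperature, rng):
    """One attempted Potts spin flip at (i, j) on an n-by-n periodic lattice."""
    n = len(lattice)
    neighbors = [lattice[(i - 1) % n][j], lattice[(i + 1) % n][j],
                 lattice[i][(j - 1) % n], lattice[i][(j + 1) % n]]
    old, new = lattice[i][j], rng.randrange(q)
    # Energy is -1 per like-oriented neighbor, so dE = like(old) - like(new).
    d_e = neighbors.count(old) - neighbors.count(new)
    if d_e <= 0 or rng.random() < math.exp(-d_e / temperature):
        lattice[i][j] = new
```

Parallel versions must additionally ensure that simultaneously updated sites are not neighbors (e.g. via a checkerboard decomposition), since the acceptance rule reads neighboring spins.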

Wright, S.A.; Plimpton, S.J.; Swiler, T.P. [and others]

1997-08-01

234

Xyce Parallel Electronic Simulator Users Guide Version 6.2.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state of the art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message-passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory, and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks: The information herein is subject to change without notice. Copyright © 2002-2014 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59. All rights reserved.
Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5, developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts: Bug Reports (Sandia only): http://joseki.sandia.gov/bugzilla, http://charleston.sandia.gov/bugzilla. World Wide Web: http://xyce.sandia.gov, http://charleston.sandia.gov/xyce (Sandia only). Email: xyce@sandia.gov (outside Sandia), xyce-sandia@sandia.gov (Sandia only).

Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory

2014-09-01

235

Casting pearls ballistically: Efficient massively parallel simulation of particle deposition

We simulate ballistic particle deposition wherein a large number of spherical particles are "cast" vertically over a planar horizontal surface. Upon first contact (with the surface or with a previously deposited particle) each particle stops. This model helps materials scientists to study adsorption and sediment formation. The model is sequential, with particles deposited one by one. We have found an equivalent formulation using a continuous-time random process and we simulate the latter in parallel using a method similar to the one previously employed for simulating Ising spins. We augment the parallel algorithm for simulating Ising spins with several techniques aimed at increasing the efficiency of producing the particle configuration and statistics collection. Some of these techniques are similar to earlier ones. We implement the resulting algorithm on a 16K PE MasPar MP-1 and a 4K PE MasPar MP-2. The parallel code runs on MasPar computers nearly two orders of magnitude faster than an optimized sequential code runs on a fast workstation. 17 refs., 9 figs.
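
The sequential "stick on first contact" rule is easy to state on a lattice. The sketch below is the simplest lattice version of ballistic deposition (columns of unit cells rather than spheres), included only to make the deposition rule concrete; the paper's parallel continuous-time formulation is much more involved.

```python
def deposit(heights, col):
    """Drop one particle over column col; it sticks at first contact with the
    surface, the top of its own column, or the side of a taller neighbor."""
    left = heights[col - 1] if col > 0 else 0
    right = heights[col + 1] if col + 1 < len(heights) else 0
    heights[col] = max(left, heights[col] + 1, right)
```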

Lubachevsky, B.D. [AT&T Bell Laboratories, Murray Hill, NJ (United States)]; Privman, V. [Clarkson Univ., Potsdam, NY (United States)]; Roy, S.C. [College of William and Mary, Williamsburg, VA (United States)]

1996-06-01

236

Petascale turbulence simulation using a highly parallel fast multipole method

We present a 0.5 Petaflop/s calculation of homogeneous isotropic turbulence in a cube of 2048^3 particles, using a highly parallel fast multipole method (FMM) on 2048 GPUs of the TSUBAME 2.0 system. We compare this particle-based code with a spectral DNS code under the same calculation conditions on the same machine. The results of our particle-based turbulence simulation match quantitatively

R. Yokota; T. Narumi; L. A. Barba; K. Yasuoka

2011-01-01

237

Parallel algorithms for simulating continuous time Markov chains

NASA Technical Reports Server (NTRS)

We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
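
Uniformization replaces a CTMC with generator Q by a discrete-time chain with transition matrix P = I + Q/Λ, where Λ bounds the exit rates, driven by a rate-Λ Poisson process. A minimal sketch of one uniformized jump (illustrative only; the paper's synchronization machinery is built on top of this idea):

```python
import random

def uniformized_step(state, Q, Lam, rng):
    """One jump of the uniformized DTMC with transition matrix P = I + Q/Lam."""
    probs = [Q[state][j] / Lam for j in range(len(Q))]
    probs[state] += 1.0  # the diagonal of I + Q/Lam (includes self-loops)
    u, acc = rng.random(), 0.0
    for j, p in enumerate(probs):
        acc += p
        if u < acc:
            return j
    return len(Q) - 1
```

Because all processors share the same Poisson jump times, event times can be pre-sampled globally, which is what makes uniformization a useful synchronization basis for parallel simulation.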

Nicol, David M.; Heidelberger, Philip

1992-01-01

238

Time parallelization of advanced operation scenario simulations of ITER plasma

This work demonstrates that simulations of advanced burning plasma operation scenarios can be successfully parallelized in time using the parareal algorithm. CORSICA, an advanced operation scenario code for tokamak plasmas, is used as a test case. This is a unique application, since the parareal algorithm has so far been applied only to much simpler systems, except for the case of turbulence. In the present application, a computational gain of an order of magnitude has been achieved, which is extremely promising. A successful implementation of the parareal algorithm in codes like CORSICA ushers in the possibility of time-efficient simulations of ITER plasmas.
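
The parareal iteration itself is generic and can be sketched briefly. The solvers below (single vs. multi-substep Euler for dy/dt = -y) are toy stand-ins, not the CORSICA propagators; the correction formula is the standard one, U[k+1][n+1] = C(U[k+1][n]) + F(U[k][n]) - C(U[k][n]).

```python
def parareal(y0, dt, n_slices, n_iters, coarse, fine):
    """Parareal-in-time: cheap serial coarse sweeps correct expensive fine
    solves that can run concurrently, one per time slice."""
    U = [y0] * (n_slices + 1)
    for n in range(n_slices):                         # serial coarse prediction
        U[n + 1] = coarse(U[n], dt)
    for _ in range(n_iters):
        F = [fine(U[n], dt) for n in range(n_slices)]    # parallel in practice
        G = [coarse(U[n], dt) for n in range(n_slices)]
        new = [y0]
        for n in range(n_slices):                     # serial correction sweep
            new.append(coarse(new[n], dt) + F[n] - G[n])
        U = new
    return U

# Toy propagators for dy/dt = -y: one Euler step vs. ten Euler substeps.
coarse = lambda y, dt: y * (1.0 - dt)
fine = lambda y, dt: y * (1.0 - dt / 10.0) ** 10
```

After k iterations the first k slice boundaries match the fine serial solution exactly, so with n_iters = n_slices the whole trajectory is recovered; the speedup comes from doing far fewer than n_slices iterations.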

Samaddar, D. [ITER Organization, Saint Paul Lez Durance, France]; Casper, T. A. [Lawrence Livermore National Laboratory (LLNL)]; Kim, S. H. [ITER Organization, Saint Paul Lez Durance, France]; Berry, Lee A [ORNL]; Elwasif, Wael R [ORNL]; Batchelor, Donald B [ORNL]; Houlberg, Wayne A [ORNL]

2013-01-01

239

We review and develop techniques to determine associations between series of discrete events. The bootstrap, a nonparametric statistical method, allows the determination of the significance of associations with minimal assumptions about the underlying processes. We find the key requirement for this method: one of the series must be widely spaced in time to guarantee the theoretical applicability of the bootstrap. If this condition is met, the calculated significance passes a reasonableness test. We conclude with some potential future extensions and caveats on the applicability of these methods. The techniques presented have been implemented in a Python-based software toolkit.

Niehof, Jonathan T.; Morley, Steven K.

2012-01-01

240

Stochastic Event Counter for Discrete-Event Systems Under Unreliable Observations

This paper addresses the issue of counting the occurrence of special events in the framework of partially observed discrete-event dynamical systems (DEDS). First, we develop a novel recursive procedure that updates the active counter information state sequentially as observations become available. In general, the cardinality of the active counter information state is unbounded, which makes the exact recursion computationally infeasible. To overcome this difficulty, we develop an approximate recursive procedure that regulates and bounds the size of the active counter information state. Using the approximate active counter information state, we give an approximate minimum mean square error (MMSE) counter. The developed algorithms are then applied to count special routing events in a material flow system.

Tae-Sic Yoo; Humberto E. Garcia

2008-06-01

241

K-NN algorithm in Parallel VLSI Simulation School of Computer Science

level circuit simulation. A fundamental problem posed by a parallel environment is the decision of whether it is best to simulate a particular circuit sequentially or on a parallel platform. Furthermore, in the event that a circuit should be simulated on a parallel platform, it is necessary to decide how many

Tropper, Carl

242

An automated parallel simulation execution and analysis approach

NASA Astrophysics Data System (ADS)

State-of-the-art simulation computing requirements are continually approaching and then exceeding the performance capabilities of existing computers. This trend remains true even with huge yearly gains in processing power and general computing capabilities; simulation scope and fidelity often increase as well. Accordingly, simulation studies often expend days or weeks executing a single test case. Compounding the problem, stochastic models often require execution of each test case with multiple random number seeds to provide valid results. Many techniques have been developed to improve the performance of simulations without sacrificing model fidelity: optimistic simulation, distributed simulation, parallel multi-processing, and the use of supercomputers such as Beowulf clusters. An approach and prototype toolset have been developed that augment existing optimization techniques to improve multiple-execution timelines. This approach, similar in concept to the SETI@home experiment, makes maximum use of unused licenses and computers, which can be geographically distributed. Using a publish/subscribe architecture, simulation executions are dispatched to distributed machines for execution. Simulation results are then processed, collated, and transferred to a single site for analysis.

Dallaire, Joel D.; Green, David M.; Reaper, Jerome H.

2004-08-01

243

Xyce Parallel Electronic Simulator - Users' Guide Version 2.1.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state of the art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; device models specifically tailored to meet Sandia's needs, including many radiation-aware devices; and object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message-passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an "in-house" capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

Hutchinson, Scott A; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric; Pawlowski, Roger P.; Fixel, Deborah A; Schiek, Richard; Bogdan, Carolyn W.; Shirley, David N.; Campbell, Phillip M.; Keiter, Eric R.

2005-06-01

244

Simulation and parallel connection of step-down piezoelectric transformers

NASA Astrophysics Data System (ADS)

Piezoelectric transformers have been used widely in electronic circuits due to advantages such as high efficiency, miniaturization and no flammability; however, their output power has been limited. To overcome this drawback, some research has recently focused on connections between piezoelectric transformers. With such connections, the output power has been improved compared to single-transformer operation. Parallel operation of step-down piezoelectric transformers is presented in this paper. An important factor affecting the parallel operation of piezoelectric transformers is the resonance frequency, and a small difference in resonance frequencies was obtained with transformers having the same dimensions and fabrication processes. The piezoelectric transformers were found to operate in the first radial mode at a frequency of 68 kHz. An equivalent circuit was used to investigate parallel driving of piezoelectric transformers and to compare the result with experimental observations. The electrical characteristics, including the output voltage, output power and efficiency, were measured at a matching resistive load. Effects of frequency on the step-down ratio and of the input voltage on the power properties in the simulation were similar to the experimental results. The output power of the parallel operation was 35 W at a load of 50 Ω and an input voltage of 100 V; the temperature rise was 30 °C and the efficiency was 88%.

Thang, Vo Viet; Kim, Insung; Jeong, Soonjong; Kim, Minsoo; Song, Jaesung

2012-01-01

245

A massively parallel cellular automaton for the simulation of recrystallization

NASA Astrophysics Data System (ADS)

A new implementation of a cellular automaton for the simulation of primary recrystallization in 3D space is presented. In this new approach, a parallel computer architecture is utilized to partition the simulation domain into multiple computational subdomains that can be treated as coupled, gradually coupled or decoupled entities. This enabled us to identify the characteristic growth length associated with the space repartitioning during nucleus growth. In doing so, several communication strategies between the simulation domains were implemented and tested for accuracy and parallel performance. Specifically, the model was applied to investigate the effect of a gradual spatial decoupling on microstructure evolution during oriented growth of random texture components into a deformed Al single crystal. For a domain discretized into one billion cells, it was found that a particular decoupling strategy resulted in executions about two orders of magnitude faster and highly accurate simulations. Further partition of the domain into isolated entities systematically and negatively impacts microstructure evolution. We investigated this effect quantitatively by geometrical considerations.
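
The growth rule at the heart of such a cellular automaton can be sketched in a few lines. This toy 2D version is invented here, runs serially, and omits the paper's subdomain coupling strategies; it simply expands recrystallized cells into their von Neumann neighbourhoods on a periodic grid.

```python
def grow(cells, steps):
    """Minimal recrystallization-style CA: an unrecrystallized cell (0)
    switches to recrystallized (1) if any von Neumann neighbour already
    is. The paper's parallelization would split this grid across
    processes and exchange the boundary rows/columns each step."""
    h, w = len(cells), len(cells[0])
    for _ in range(steps):
        nxt = [row[:] for row in cells]
        for i in range(h):
            for j in range(w):
                if cells[i][j] == 0 and any(
                        cells[(i + di) % h][(j + dj) % w]
                        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))):
                    nxt[i][j] = 1
        cells = nxt
    return cells

g = [[0] * 5 for _ in range(5)]
g[2][2] = 1                      # a single nucleus in the centre
g = grow(g, 2)                   # grows into a Manhattan-distance-2 diamond
```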

Kühbach, M.; Barrales-Mora, L. A.; Gottstein, G.

2014-10-01

246

Long-range interactions and parallel scalability in molecular simulations

NASA Astrophysics Data System (ADS)

Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single-processor and parallel performance up to 8 nodes. We have also tested the scalability on four different networks: Infiniband, Gigabit Ethernet, Fast Ethernet, and a nearly uniform memory architecture in which communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems, representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.

Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko

2007-01-01

247

Mapping a battlefield simulation onto message-passing parallel architectures

NASA Technical Reports Server (NTRS)

Perhaps the most critical problem in distributed simulation is that of mapping: without an effective mapping of workload to processors the speedup potential of parallel processing cannot be realized. Mapping a simulation onto a message-passing architecture is especially difficult when the computational workload dynamically changes as a function of time and space; this is exactly the situation faced by battlefield simulations. This paper studies an approach where the simulated battlefield domain is first partitioned into many regions of equal size; typically there are more regions than processors. The regions are then assigned to processors; a processor is responsible for performing all simulation activity associated with the regions. The assignment algorithm is quite simple and attempts to balance load by exploiting locality of workload intensity. The performance of this technique is studied on a simple battlefield simulation implemented on the Flex/32 multiprocessor. Measurements show that the proposed method achieves reasonable processor efficiencies. Furthermore, the method shows promise for use in dynamic remapping of the simulation.

Nicol, David M.

1987-01-01

248

Conservative parallel simulation of priority class queueing networks

NASA Technical Reports Server (NTRS)

A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. This is because a low-priority job in service can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than those for lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol was analyzed and it was demonstrated that good performance can be expected on the simulation of large queueing networks.
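
The lookahead that this skewing enables can be sketched directly: once the higher-priority busy periods overlapping a low-priority job are known in advance, its preemptive-resume departure time is a deterministic function of them. The helper below is a hypothetical illustration, not the paper's protocol code.

```python
def departure_time(start, demand, hp_busy):
    """Exact departure time of a low-priority job under preemptive-resume
    priority, given the already-known higher-priority busy intervals
    `hp_busy` (sorted, disjoint (begin, end) pairs). Knowing these in
    advance is precisely what skewed event generation buys."""
    t, remaining = start, demand
    for b0, b1 in hp_busy:
        if b1 <= t:
            continue                     # burst already over
        gap = max(0.0, b0 - t)           # uninterrupted work before next preemption
        if remaining <= gap:
            return t + remaining
        remaining -= gap
        t = b1                           # resume after the high-priority burst
    return t + remaining

# Job starts at t=0 with 5 units of work; high-priority bursts occupy [2,4) and [6,7)
d = departure_time(0.0, 5.0, [(2.0, 4.0), (6.0, 7.0)])
```

Here the job works on [0,2) and [4,6), is preempted twice, and finishes its last unit on [7,8), so it departs at t=8.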

Nicol, David M.

1990-01-01

249

MRISIMUL: a GPU-based parallel approach to MRI simulations.

A new step-by-step comprehensive MR physics simulator (MRISIMUL) of the Bloch equations is presented. The aim was to develop a magnetic resonance imaging (MRI) simulator that makes no assumptions with respect to the underlying pulse sequence and also allows for complex large-scale analysis on a single computer without requiring simplifications of the MRI model. We hypothesized that such a simulation platform could be developed with parallel acceleration of the executable core within the graphic processing unit (GPU) environment. MRISIMUL integrates realistic aspects of the MRI experiment from signal generation to image formation and solves the entire complex problem for densely spaced isochromats and for a densely spaced time axis. The simulation platform was developed in MATLAB whereas the computationally demanding core services were developed in CUDA-C. The MRISIMUL simulator imaged three different computer models: a user-defined phantom, a human brain model and a human heart model. The high computational power of GPU-based simulations was compared against other computer configurations. A speedup of about 228 times was achieved when compared to serially executed C-code on the CPU whereas a speedup between 31 to 115 times was achieved when compared to the OpenMP parallel executed C-code on the CPU, depending on the number of threads used in multithreading (2-8 threads). The high performance of MRISIMUL allows its application in large-scale analysis and can bring the computational power of a supercomputer or a large computer cluster to a single GPU personal computer. PMID:24595337
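
The kernel such a simulator parallelizes per isochromat is a Bloch-equation update. A common rotate-then-relax operator splitting is sketched below; this is a generic illustration, not MRISIMUL's actual CUDA-C kernel, and the field and relaxation values are invented.

```python
import math

def bloch_step(m, b_eff_z, dt, t1, t2, m0=1.0, gamma=2 * math.pi * 42.58e6):
    """One explicit step of the Bloch equations for a single isochromat:
    precession about a z-directed effective field, then T1/T2 relaxation
    (a rotate-then-relax operator splitting, a common simulator kernel).
    Units: b_eff_z in tesla, dt/t1/t2 in seconds, gamma in rad/s/T."""
    mx, my, mz = m
    theta = gamma * b_eff_z * dt                 # precession angle about z
    c, s = math.cos(theta), math.sin(theta)
    mx, my = c * mx + s * my, -s * mx + c * my   # dM/dt = gamma * M x B
    e1, e2 = math.exp(-dt / t1), math.exp(-dt / t2)
    return (mx * e2, my * e2, m0 + (mz - m0) * e1)

# Tip magnetization into the transverse plane, then evolve 10 ms in 1000 steps
m = (1.0, 0.0, 0.0)
for _ in range(1000):
    m = bloch_step(m, b_eff_z=1e-6, dt=1e-5, t1=1.0, t2=0.1)
```

Each isochromat evolves independently under such a rule, which is why the problem maps so naturally onto thousands of GPU threads.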

Xanthis, Christos G; Venetis, Ioannis E; Chalkias, A V; Aletras, Anthony H

2014-03-01

250

Development of magnetron sputtering simulator with GPU parallel computing

NASA Astrophysics Data System (ADS)

Sputtering devices are widely used in the semiconductor and display panel manufacturing process. Currently, a number of surface treatment applications using magnetron sputtering techniques are being used to improve the efficiency of the sputtering process, through the installation of magnets outside the vacuum chamber. Within the internal space of the low pressure chamber, plasma generated from the combination of a rarefied gas and an electric field is influenced interactively. Since the quality of the sputtering and deposition rate on the substrate is strongly dependent on the multi-physical phenomena of the plasma regime, numerical simulations using PIC-MCC (Particle In Cell, Monte Carlo Collision) should be employed to develop an efficient sputtering device. In this paper, the development of a magnetron sputtering simulator based on the PIC-MCC method and the associated numerical techniques are discussed. To solve the electric field equations in the 2-D Cartesian domain, a Poisson equation solver based on the FDM (Finite Differencing Method) is developed and coupled with the Monte Carlo Collision method to simulate the motion of gas particles influenced by an electric field. The magnetic field created from the permanent magnet installed outside the vacuum chamber is also numerically calculated using the Biot-Savart law. All numerical methods employed in the present PIC code are validated by comparison with analytical and well-known commercial engineering software results, with all of the results showing good agreement. Finally, the developed PIC-MCC code is parallelized to be suitable for general-purpose computing on graphics processing units (GPGPU) acceleration, so as to reduce the large computation time which is generally required for particle simulations. The efficiency and accuracy of the GPGPU parallelized magnetron sputtering simulator are examined by comparison with the calculated results and computation times from the original serial code.
It is found that initially both simulations are in good agreement; however, differences develop over time due to statistical noise in the PIC-MCC GPGPU model.
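
The Biot-Savart computation for an external magnet can be sketched by summing the contributions of straight current segments. The loop geometry and parameter values below are invented for illustration, and the analytic on-axis formula for a circular loop serves as a check.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability [T*m/A]

def loop_bz_on_axis(I, R, z, segments=2000):
    """Axial B-field of a circular current loop by summing Biot-Savart
    contributions dB = mu0*I/(4*pi) * (dl x r) / |r|^3 over short
    straight segments approximating the loop."""
    bz = 0.0
    for k in range(segments):
        phi0 = 2 * math.pi * k / segments
        phi1 = 2 * math.pi * (k + 1) / segments
        # segment midpoint and chord vector dl
        xm = R * math.cos((phi0 + phi1) / 2)
        ym = R * math.sin((phi0 + phi1) / 2)
        dlx = R * (math.cos(phi1) - math.cos(phi0))
        dly = R * (math.sin(phi1) - math.sin(phi0))
        rx, ry, rz = -xm, -ym, z          # vector from segment to field point (0,0,z)
        r3 = (rx * rx + ry * ry + rz * rz) ** 1.5
        bz += MU0 * I / (4 * math.pi) * (dlx * ry - dly * rx) / r3  # z of dl x r
    return bz

b_num = loop_bz_on_axis(I=1.0, R=0.05, z=0.02)
b_ref = MU0 * 1.0 * 0.05**2 / (2 * (0.05**2 + 0.02**2) ** 1.5)  # analytic on-axis
```

Real magnet assemblies are handled the same way, segment by segment, which also parallelizes trivially.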

Sohn, Ilyoup; Kim, Jihun; Bae, Junkyeong; Lee, Jinpil

2014-12-01

251

CHOLLA: A New Massively Parallel Hydrodynamics Code for Astrophysical Simulation

NASA Astrophysics Data System (ADS)

We present Computational Hydrodynamics On ParaLLel Architectures (Cholla), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (~256^3) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.

Schneider, Evan E.; Robertson, Brant E.

2015-04-01

252

Numerical Simulation of Flow Field Within Parallel Plate Plastometer

NASA Technical Reports Server (NTRS)

The Parallel Plate Plastometer (PPP) is a device commonly used for measuring the viscosity of high polymers at low rates of shear, in the range 10^4 to 10^9 poise. This device is being validated for use in measuring the viscosity of liquid glasses at high temperatures having similar ranges for the viscosity values. The PPP instrument consists of two similar parallel plates, both in the range of 1 inch in diameter, with the upper plate being movable while the lower one is kept stationary. Load is applied to the upper plate by means of a beam connected to a shaft attached to the upper plate. The viscosity of the fluid is deduced from measuring the variation of the plate separation, h, as a function of time when a specified fixed load is applied on the beam. Operating plate speeds measured with the PPP are usually in the range of 10^-3 cm/s or lower. The flow field within the PPP can be simulated using the equations of motion of fluid flow for this configuration. With flow speeds in the range quoted above, the flow field between the two plates is certainly incompressible and laminar. Such flows can be easily simulated using numerical modeling with computational fluid dynamics (CFD) codes. We present below the mathematical model used to simulate this flow field and also the solutions obtained for the flow using a commercially available finite element CFD code.
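
For a Newtonian fluid, the classical Stefan squeeze-film relation links the measured plate-separation history to the viscosity: assuming disks of constant radius R under constant load F, 1/h(t)^2 - 1/h0^2 = 8*F*t / (3*pi*mu*R^4). This is a standard textbook assumption about how such an instrument is analyzed, not necessarily the PPP's exact working equation, and all parameter values below are invented; the sketch inverts the relation for mu.

```python
import math

def viscosity_from_squeeze(F, R, h0, h1, t):
    """Viscosity from the classical Stefan squeeze-film relation for a
    Newtonian fluid between parallel disks of radius R under load F:
        1/h1**2 - 1/h0**2 = 8*F*t / (3*pi*mu*R**4)
    h0, h1: plate separations at the start and after elapsed time t."""
    return 8.0 * F * t / (3.0 * math.pi * R**4 * (1.0 / h1**2 - 1.0 / h0**2))

# Round-trip check: synthesize h1 from a known viscosity, then recover it
mu_true, F, R, h0, t = 1.0e6, 10.0, 0.0127, 2.0e-3, 60.0   # SI units
inv_h1_sq = 1.0 / h0**2 + 8.0 * F * t / (3.0 * math.pi * mu_true * R**4)
h1 = inv_h1_sq ** -0.5
mu_est = viscosity_from_squeeze(F, R, h0, h1, t)
```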

Antar, Basil N.

2002-01-01

253

Parallel grid library for rapid and flexible simulation development

NASA Astrophysics Data System (ADS)

As the single CPU core performance is saturating while the number of cores in the fastest supercomputers increases exponentially, the parallel performance of simulations on distributed memory machines is crucial. At the same time, utilizing efficiently the large number of available cores presents a challenge, especially in simulations with run-time adaptive mesh refinement which can be the key to high performance. We have developed a generic grid library (dccrg) that is easy to use and scales well up to tens of thousands of cores. The grid has several attractive features: it (1) allows an arbitrary C++ class or structure to be used as cell data; (2) is easy to use and provides a simple interface for run-time adaptive mesh refinement; (3) transfers the data of neighboring cells between processes transparently and asynchronously; and (4) provides a simple interface to run-time load balancing, e.g. domain decomposition, through the Zoltan library. Dccrg is freely available from https://gitorious.org/dccrg for anyone to use, study and modify under the GNU Lesser General Public License version 3. We present an overview of the implementation of dccrg, its parallel scalability and several source code examples of its usage in different types of simulations.

Honkonen, Ilja; von Alfthan, Sebastian; Sandroos, Arto; Janhunen, Pekka; Palmroth, Minna

2013-04-01

254

Massively parallel algorithms for trace-driven cache simulations

NASA Technical Reports Server (NTRS)

Trace-driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t-th instant, reference x_t is hashed into a set of cache locations, the contents of which are then compared with x_t. If at the t-th instant x_t is not present in the cache, it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x_t present for the (t+1)-st instant. The problem of parallel simulation of a subtrace of N references directed to a C-line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which regardless of the set size C runs in time O(log N) using N processors on the exclusive-read, exclusive-write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference-based line replacement policies is considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that on any trace of length N directed to a C-line set runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
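
As a serial baseline for the miss statistics the parallel algorithms reproduce, LRU simulation of a single cache set takes only a few lines. This is the O(N) reference computation, not the paper's O(log N) EREW algorithm; the trace is an invented example.

```python
from collections import OrderedDict

def lru_misses(trace, capacity):
    """Count misses of a single LRU-managed set of `capacity` lines over
    a reference trace (the serial computation whose results a parallel
    trace-driven simulator must reproduce)."""
    cache = OrderedDict()
    misses = 0
    for x in trace:
        if x in cache:
            cache.move_to_end(x)           # refresh recency on a hit
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[x] = True
    return misses

# 'a' hits once; the other five references miss in a 3-line set
m = lru_misses(['a', 'b', 'c', 'a', 'd', 'b'], capacity=3)
```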

Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.

1991-01-01

255

Data analysis for parallel car-crash simulation results and model optimization

The paper discusses automotive crash simulation in a stochastic context, whereby uncertainties in numerical simulation results are generated by parallel computing. Since a crash is a non-repeatable phenomenon, qualification for crashworthiness based on a single test is not meaningful, and should be replaced by stochastic simulation. But the stochastic simulations may generate different results on parallel machines, if the same application

Liquan Mei; Clemens-august Thole

2008-01-01

256

A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

NASA Technical Reports Server (NTRS)

The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.

Rao, Hariprasad Nannapaneni

1989-01-01

257

Parallel Unsteady Turbopump Simulations for Liquid Rocket Engines

NASA Technical Reports Server (NTRS)

This paper reports the progress being made towards a complete turbopump simulation capability for liquid rocket engines. The Space Shuttle Main Engine (SSME) turbopump impeller is used as a test case for the performance evaluation of the MPI and hybrid MPI/OpenMP versions of the INS3D code. A computational model of a turbopump has then been developed for the shuttle upgrade program. Relative motion of the grid system for rotor-stator interaction was obtained by employing overset grid techniques. Time-accuracy of the scheme has been evaluated by using simple test cases. Unsteady computations for the SSME turbopump, which contains 136 zones with 35 million grid points, are currently underway on Origin 2000 systems at NASA Ames Research Center. Results from time-accurate simulations with moving boundary capability, and the performance of the parallel versions of the code, will be presented in the final paper.

Kiris, Cetin C.; Kwak, Dochan; Chan, William

2000-01-01

258

Parallelism and pipelining in high-speed digital simulators

NASA Technical Reports Server (NTRS)

The attainment of high computing speed as measured by the computational throughput is seen as one of the most challenging requirements. It is noted that high speed is cardinal in several distinct classes of applications. These classes are then discussed; they comprise (1) the real-time simulation of dynamic systems, (2) distributed parameter systems, and (3) mixed lumped and distributed systems. From the 1950s on, the quest for high speed in digital simulators concentrated on overcoming the limitations imposed by the so-called von Neumann bottleneck. Two major architectural approaches have made it possible to circumvent this bottleneck and attain high speeds. These are pipelining and parallelism. Supercomputers, peripheral array processors, and microcomputer networks are then discussed.

Karplus, W. J.

1983-01-01

259

Exception handling controllers: An application of pushdown systems to discrete event control

Recent work by the author has extended the Supervisory Control Theory to include the class of control languages defined by pushdown machines. A pushdown machine is a finite state machine extended by an infinite stack memory. In this paper, we define a specific type of deterministic pushdown machine that is particularly useful as a discrete event controller. Checking controllability of pushdown machines requires computing the complement of the controller machine. We show that Exception Handling Controllers have the property that algorithms for taking their complements and determining their prefix closures are nearly identical to the algorithms available for finite state machines. Further, they exhibit an important property that makes checking for controllability extremely simple. Hence, they maintain the simplicity of the finite state machine, while providing the extra power associated with a pushdown stack memory. We provide an example of a useful control specification that cannot be implemented using a finite state machine, but can be implemented using an Exception Handling Controller.
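
A classic illustration of why the stack matters: the specification "every end closes a matching begin", with unbounded nesting, is not regular and therefore beyond any finite state controller, yet a deterministic pushdown acceptor handles it trivially. The event names below are invented, and this sketch is not the paper's Exception Handling Controller construction.

```python
def accepts(events):
    """Deterministic pushdown acceptor for the non-regular specification
    'every `end` closes a matching open `begin`'. Tracking unbounded
    nesting depth is exactly what a finite-state controller cannot do."""
    stack = []
    for ev in events:
        if ev == "begin":
            stack.append(ev)           # push one stack symbol per open scope
        elif ev == "end":
            if not stack:
                return False           # an `end` with no open scope: forbidden
            stack.pop()
        # other events leave the stack untouched
    return not stack                   # accept only if every scope was closed

ok = accepts(["begin", "work", "begin", "end", "end"])
bad = accepts(["begin", "end", "end"])
```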

Griffin, Christopher H. [ORNL]

2008-01-01

260

Safety analysis of discrete event systems using a simplified Petri net controller.

This paper deals with the problem of forbidden states in discrete event systems based on Petri net models. A method is presented to prevent the system from entering these states by constructing a small number of generalized mutual exclusion constraints. This goal is achieved by solving three types of Integer Linear Programming problems, designed to verify the constraints: some relate to verifying authorized states, and the others to avoiding forbidden states. The obtained constraints can be enforced on the system using a small number of control places. Moreover, the number of arcs related to these places is small, and the resulting controller is maximally permissive. PMID:24074873

Zareiee, Meysam; Dideban, Abbas; Asghar Orouji, Ali

2014-01-01

261

Did a discrete event 200,000-100,000 years ago produce modern humans?

Scenarios for modern human origins are often predicated on the assumption that modern humans arose 200,000-100,000 years ago in Africa. This assumption implies that something 'special' happened at this point in time in Africa, such as the speciation that produced Homo sapiens, a severe bottleneck in human population size, or a combination of the two. The common thread is that after the divergence of the modern human and Neandertal evolutionary lineages ~400,000 years ago, there was another discrete event near in time to the Middle-Late Pleistocene boundary that produced modern humans. Alternatively, modern human origins could have been a lengthy process that lasted from the divergence of the modern human and Neandertal evolutionary lineages to the expansion of modern humans out of Africa, and nothing out of the ordinary happened 200,000-100,000 years ago in Africa. Three pieces of biological (fossil morphology and DNA sequences) evidence are typically cited in support of discrete event models. First, living human mitochondrial DNA haplotypes coalesce ~200,000 years ago. Second, fossil specimens that are usually classified as 'anatomically modern' seem to appear shortly afterward in the African fossil record. Third, it is argued that these anatomically modern fossils are morphologically quite different from the fossils that preceded them. Here I use theory from population and quantitative genetics to show that lengthy process models are also consistent with current biological evidence. That this class of models is a viable option has implications for how modern human origins is conceptualized. PMID:22658331

Weaver, Timothy D

2012-07-01

262

Dynamic process simulation of a distillation column on a shared memory parallel processor computer

In this study, results of an investigation into applying parallel computing on a shared memory multiprocessor computer to the dynamic process simulation of a distillation column with use of a sequential modular simulator are reported. Two DYFLO process simulation models of distillation columns were parallelized and ported to a BBN Butterfly Parallel Processor computer. Computations were performed with up to 14 concurrently operating processors. General performance aspects of simulation on parallel computers are discussed and speedup as a function of number of concurrently operating processors is reported for the two distillation column simulations.

Cera, G.D. (Central Research and Development Dept., E.I. du Pont de Nemours and Co., Wilmington, DE (US))

1988-01-01

263

Parallel continuous simulated tempering and its applications in large-scale molecular simulations

NASA Astrophysics Data System (ADS)

In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It inherits the continuous simulated tempering (CST) method from our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or the replica exchange method, by employing multiple copies with different temperature distributions. Unlike conventional PT methods, the PCST method requires very few copies of simulations, typically 2-3, even for a large stride of total temperature range, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method the size of the system does not dramatically affect the number of copies needed, because the exchange rate is independent of total potential energy; this provides an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on the two-dimensional Ising model, a Lennard-Jones liquid, and an all-atom folding simulation of the small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and is particularly effective in simulating systems with long relaxation or correlation times. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transitions and the dynamics of macromolecules in explicit solvent.
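The exchange step that PCST inherits from parallel tempering can be illustrated with the standard Metropolis swap criterion between two replicas. The sketch below is the generic parallel-tempering acceptance rule, not the PCST-specific variant (which the authors design so that the exchange rate is independent of total potential energy); function names and the replica layout are illustrative assumptions.

```python
import math
import random

def swap_probability(beta_i, beta_j, e_i, e_j):
    """Metropolis acceptance probability for swapping two replicas
    at inverse temperatures beta_i, beta_j with energies e_i, e_j."""
    return min(1.0, math.exp((beta_i - beta_j) * (e_i - e_j)))

def attempt_swap(replicas, i, j, rng=random):
    """Try to exchange the configurations of replicas i and j in place.

    Each replica is a dict with keys 'beta' and 'energy'; the temperatures
    stay fixed while the configurations (represented here just by their
    energies) swap. Returns True if the swap was accepted."""
    p = swap_probability(replicas[i]["beta"], replicas[j]["beta"],
                         replicas[i]["energy"], replicas[j]["energy"])
    if rng.random() < p:
        replicas[i]["energy"], replicas[j]["energy"] = (
            replicas[j]["energy"], replicas[i]["energy"])
        return True
    return False
```

A swap that moves the lower-energy configuration to the colder replica is always accepted; the reverse move is accepted with probability exp[(beta_i - beta_j)(E_i - E_j)], which is what preserves the Boltzmann distribution at every temperature.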

Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng

2014-07-01

264

Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator

The design of future parallel computers requires rapid simulation of target designs running realistic workloads. These simulations have been accelerated using two techniques: direct execution and the use of a parallel host. Historically, these techniques have been considered to have poor portability. This paper identifies and describes the implementation of four key operations necessary to make such simulation

Shubhendu S. Mukherjee; Steven K. Reinhardt; Babak Falsafi; Mike Litzkow; Steve Huss-Lederman; Mark D. Hill; James R. Larus; David A. Wood

1997-01-01

265

Parallel algorithm for spin and spin-lattice dynamics simulations

NASA Astrophysics Data System (ADS)

Controlling the numerical errors accumulated over tens of millions of time steps during the integration of a set of highly coupled equations of motion is not a trivial task. In this paper, we propose a parallel algorithm for spin dynamics and the newly developed spin-lattice dynamics simulation [P. W. Ma, Phys. Rev. B 78, 024434 (2008)]. The algorithm is successfully tested in both types of dynamic calculations involving a million spins, and shows good stability and numerical accuracy over millions of time steps (~1 ns). The scheme is based on the second-order Suzuki-Trotter decomposition (STD), whose use avoids numerical energy dissipation despite trajectory and machine errors. The mathematical basis of the symplecticity of properly decomposed evolution operators is presented. Due to the noncommutative nature of the spins in the present STD scheme, a unique parallel algorithm is needed; its efficiency and stability are tested. It attains a six- to seven-fold speedup when eight threads are used, and the run time per time step is linearly proportional to the system size.
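The second-order Suzuki-Trotter decomposition the abstract refers to splits the evolution operator symmetrically; in generic operator notation (the symbols here are generic, not the paper's specific spin and lattice operators):

```latex
e^{(\hat{A}+\hat{B})\,\Delta t}
  \;=\;
e^{\hat{A}\,\Delta t/2}\; e^{\hat{B}\,\Delta t}\; e^{\hat{A}\,\Delta t/2}
  \;+\; \mathcal{O}(\Delta t^{3})
```

The symmetric (palindromic) splitting makes each step time-reversible, and for suitable choices of the sub-operators each factor is exactly integrable, which is why energy shows no secular drift even over millions of steps.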

Ma, Pui-Wai; Woo, C. H.

2009-04-01

266

Petascale turbulence simulation using a highly parallel fast multipole method

We present a 0.5 Petaflop/s calculation of homogeneous isotropic turbulence in a cube of 2048^3 particles, using a highly parallel fast multipole method (FMM) on 2048 GPUs of the TSUBAME 2.0 system. We compare this particle-based code with a spectral DNS code under the same calculation conditions on the same machine. The results of our particle-based turbulence simulation match quantitatively with those of the spectral method. The calculation time for one time step is approximately 30 seconds for both methods; this result shows that the scalability of the FMM starts to become an advantage over FFT-based methods beyond 2000 GPUs.

Yokota, R; Barba, L A; Yasuoka, K

2011-01-01

267

Pseudorandom number generator for massively parallel molecular-dynamics simulations

A class of uniform pseudorandom number generators is proposed for modeling and simulations on massively parallel computers. The algorithm is simple, nonrecursive, and is easily transported to serial or vector computers. We have tested the procedure for uniformity, independence, and correlations by several methods. Related, less complex sequences passed some of these tests well enough; however, inadequacies were revealed by tests for correlations and in an interesting application, namely, annealing from an initial lattice that is mechanically unstable. In the latter case, initial velocities chosen by a random number generator that is not sufficiently random lead quickly to unphysical regularity in grain structure. The new class of generators passes this dynamical diagnostic for unwanted correlations.
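One of the simplest correlation diagnostics of the kind the abstract alludes to is a lag-k autocorrelation estimate over a stream of generator output: a value far from zero at small lags flags an inadequate generator. This is a minimal sketch (the function name is ours, and it is a far weaker check than the dynamical annealing diagnostic the authors describe):

```python
def lag_autocorr(xs, k):
    """Estimate the lag-k autocorrelation of the sequence xs."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + k] - mean)
              for i in range(n - k)) / (n - k)
    return cov / var

# A perfectly alternating "generator" is an extreme failure case:
# maximally anticorrelated at lag 1, perfectly correlated at lag 2.
alternating = [0.0, 1.0] * 50
```

For a good uniform generator the estimate should hover near zero (within roughly 1/sqrt(n)) at every small lag; systematic deviations like those of the alternating stream are the kind of hidden regularity that ruined the annealing runs described above.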

Holian, B.L. (Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545 (United States)); Percus, O.E. (Courant Institute of Mathematical Science, New York University, 251 Mercer Street, New York, New York 10012 (United States)); Warnock, T.T. (Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545 (United States)); Whitlock, P.A. (Computer and Information Sciences Department, Brooklyn College, 2900 Bedford Avenue, Brooklyn, New York 11210 (United States))

1994-08-01

268

Efficiency of Parallel Machine for Large-Scale Simulation in Computational Physics

In this paper, we report on the efficiency of parallelization for atomistic-level large-scale simulations. Tight-binding and ab-initio molecular dynamics simulations are carried out on a supercomputer HITAC S-3800/380 and on a parallel computer HITAC SR2201. We compare the efficiencies of the two different machines on large-scale simulations to investigate the advantages and disadvantages of the parallel architecture.

Hiroshi Mizuseki; Keivan Esfarjani; Zhi-qiang Li; Kaoru Ohno; Yoko Akiyama; Kyoko Ichinoseki; Yoshiyuki Kawazoe

1997-01-01

269

Optimistic Parallel Simulation of Reliable Multicast Protocols (Dan Rubenstein, Jim Kurose, Don Towsley; Department of Computer Science, University of Massachusetts, Amherst, MA 01003, USA)

Massachusetts at Amherst, University of

270

Strengths & Drawbacks of MILP, CP and Discrete-Event Simulation based

Seminar, Carnegie Mellon University. Scheduling plays an important role in most manufacturing and service industries (pulp & paper, oil & gas, food & beverages, pharmaceuticals). The types of decisions involved include defining production tasks from customer orders and assigning production tasks to resources.

Grossmann, Ignacio E.

271

through the rise of early cities in the Bronze Age [1]. The initial model focuses on the Jordan Valley from the middle of the Early Bronze Age I to the middle of the Early Bronze Age II - III (ca. 3350 B

272

In most decision-analytic models in health care, it is assumed that there is treatment without delay and availability of all required resources. Therefore, waiting times caused by limited resources and their impact on treatment effects and costs often remain unconsidered. Queuing theory enables mathematical analysis and the derivation of several performance measures of queuing systems. Nevertheless, an analytical approach with

Beate Jahn; Engelbert Theurl; Uwe Siebert; Karl-Peter Pfeiffer

2010-01-01

273

Using Discrete Event Simulation to Model Multi-Robot Multi-Operator Teamwork

With the increasing need for teams of operators in controlling multiple robots, it is important to understand how to construct the team and support team processes. While running experiments can be time consuming and ...

Gao, F.

274

Quantifying supply chain disruption risk using Monte Carlo and discrete-event simulation

We present a model constructed for a large consumer products company to assess their vulnerability to disruption risk and quantify its impact on customer service. Risk profiles for the locations and connections in the ...

Schmitt, Amanda J.

275

Using Discrete-Event Simulation to Model Situational Awareness of Unmanned-Vehicle Operators

Nehme, Jacob W. Crandall, and M. L. Cummings, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology. ... vehicles (Cummings et al., 2007; Olsen & Wood, 2003; Ruff et al., 2002). Earlier models of operator ... (Cummings & Mitchell, in press; Cummings et al., 2007). Although it is possible for human beings

Cummings, Mary "Missy"

276

Systems analysis and optimization through discrete event simulation at Amazon.com

The basis for this thesis involved a six and a half month LFM internship at the Amazon.com fulfillment center in the United Kingdom. The fulfillment center management sought insight into the substantial variation across ...

Price, Cameron S. (Cameron Stalker), 1972-

2004-01-01

277

NASA Technical Reports Server (NTRS)

The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

Hsieh, Shang-Hsien

1993-01-01

278

Sensor Configuration Selection for Discrete-Event Systems under Unreliable Observations

Algorithms for counting the occurrences of special events in the framework of partially-observed discrete event dynamical systems (DEDS) were developed in previous work. Their performance typically improves as the sensors providing the observations become more costly or increase in number. This paper addresses the problem of finding a sensor configuration that achieves an optimal balance between cost and the performance of the special event counting algorithm, while satisfying given observability requirements and constraints. Since this problem is generally computationally hard in the framework considered, a sensor optimization algorithm is developed using two greedy heuristics, one myopic and the other based on projected performances of candidate sensors. The two heuristics are executed sequentially in order to find the best sensor configurations. The developed algorithm is then applied to a sensor optimization problem for a multiunit-operation system. Results show that improved sensor configurations can be found that may significantly reduce the sensor configuration cost but still yield acceptable performance for counting the occurrences of special events.
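The myopic flavor of greedy selection described above can be caricatured as a benefit-per-cost ranking under a budget. The data layout, function name, and scoring are illustrative assumptions of ours, not the paper's actual algorithm (which also enforces observability constraints and uses projected counting performance as the benefit):

```python
def greedy_sensor_config(sensors, budget):
    """Pick sensors by descending benefit/cost ratio until the budget runs out.

    sensors: dict mapping sensor name -> (cost, benefit), both positive.
    Returns the list of chosen sensor names in the order selected."""
    ranked = sorted(sensors,
                    key=lambda s: sensors[s][1] / sensors[s][0],
                    reverse=True)
    chosen, spent = [], 0.0
    for name in ranked:
        cost, _benefit = sensors[name]
        if spent + cost <= budget:   # myopic: take it if it still fits
            chosen.append(name)
            spent += cost
    return chosen
```

A purely myopic pass like this can miss better combinations, which is presumably why the paper pairs it with a second, lookahead-style heuristic based on projected performance.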

Wen-Chiao Lin; Tae-Sic Yoo; Humberto E. Garcia

2010-08-01

279

A scalable parallel framework for analyzing terascale molecular dynamics simulation trajectories

As parallel algorithms and architectures drive the longest molecular dynamics (MD) simulations towards the millisecond scale, traditional sequential post-simulation data analysis methods are becoming increasingly untenable. Inspired by the programming interface of Google's MapReduce, we have built a new parallel analysis framework called HiMach, which allows users to write trajectory analysis programs sequentially, and carries out the parallel execution of

Tiankai Tu; Charles A. Rendleman; David W. Borhani; Ron O. Dror; Justin Gullingsrud; Morten Ø. Jensen; John L. Klepeis; Paul Maragakis; Patrick Miller; Kate A. Stafford; David E. Shaw

2008-01-01

280

Parallel Computation in Simulating Diffusion and Deformation in Human Brain

Ning Kang and Jun Zhang describe the use of parallel and high-performance computation in simulating the diffusion process in the human brain and in modeling the deformation of the human brain. Computational neuroscience is a branch of biomedical science

Zhang, Jun

281

Exploration of Cancellation Strategies for Parallel Simulation on Multi-core Beowulf Clusters

An exploration of cancellation strategies on multi-core Beowulf clusters with both shared memory and distributed memory, and of dynamic cancellation strategies for parallel simulation on multi-core Beowulf clusters in a multi

Wilsey, Philip A.

282

Parallel algorithm for mass transfer simulations of weakly-magnetic nanoparticles

We present a parallel algorithm for the simulation of mass transfer of weakly-magnetic nanoparticles in high gradient magnetic separation. The transport phenomena of weakly-magnetic nanoparticles in regions around a long ferromagnetic cylindrical wire in a static fluid are considered. The normalized continuity equations describing the dynamics of particle volume concentration are solved numerically using the finite difference method. Parallel simulation is
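The finite-difference treatment of a concentration continuity equation can be sketched with the standard explicit (FTCS) update for one-dimensional diffusion. This is a generic illustration of the discretization style, not the paper's normalized magnetophoretic transport equations; the function name is ours.

```python
def ftcs_step(c, D, dt, dx):
    """One explicit finite-difference step of dc/dt = D * d2c/dx2.

    c is a list of concentrations on a uniform grid; the endpoints are
    held fixed (Dirichlet boundaries). Stable for D*dt/dx**2 <= 0.5."""
    r = D * dt / dx ** 2
    new = list(c)
    for i in range(1, len(c) - 1):
        new[i] = c[i] + r * (c[i + 1] - 2.0 * c[i] + c[i - 1])
    return new
```

In a parallel version each process would own a slab of the grid and exchange one ghost cell per neighbor before every step, since each update only reads the two adjacent points.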

Kanok Hournkumnuard; Chantana Phongpensri

2009-01-01

283

High performance parallel computers offer the promise of sufficient computational power to enable the routine use of large scale simulations during the process of engineering design. With this in mind, and with particular reference to the aerospace industry, this paper describes developments that have been undertaken to provide parallel implementations of algorithms for simulation, mesh generation and visualization. Designers are

K. Morgan; N. P. Weatherill; O. Hassan; P. J. Brookes; R. Said; J. Jones

1999-01-01

284

Parallel-in-time implementation of transient stability simulations on a transputer network

The most time consuming computer simulation in power system studies is the transient stability analysis. Parallel processing has been applied for time domain simulations of power system transient behavior. In this paper, a parallel implementation of an algorithm based on Shifted-Picard dynamic iterations is presented. The main idea is that a set of nonlinear Differential Algebraic Equations (DAEs), which describes

M. La Scala; G. Sblendorio; R. Sbrizzai

1994-01-01

285

Particle/Continuum Hybrid Simulation in a Parallel Computing Environment

NASA Technical Reports Server (NTRS)

The objective of this study was to modify an existing parallel particle code based on the direct simulation Monte Carlo (DSMC) method to include a Navier-Stokes (NS) calculation so that a hybrid solution could be developed. In carrying out this work, it was determined that the following five issues had to be addressed before extensive program development of a three dimensional capability was pursued: (1) find a set of one-sided kinetic fluxes that are fully compatible with the DSMC method, (2) develop a finite volume scheme to make use of these one-sided kinetic fluxes, (3) make use of the one-sided kinetic fluxes together with DSMC type boundary conditions at a material surface so that velocity slip and temperature slip arise naturally for near-continuum conditions, (4) find a suitable sampling scheme so that the values of the one-sided fluxes predicted by the NS solution at an interface between the two domains can be converted into the correct distribution of particles to be introduced into the DSMC domain, (5) carry out a suitable number of tests to confirm that the developed concepts are valid, individually and in concert for a hybrid scheme.

Baganoff, Donald

1996-01-01

286

Contact-impact simulations on massively parallel SIMD supercomputers

The implementation of explicit finite element methods with contact-impact on massively parallel SIMD computers is described. The basic parallel finite element algorithm employs an exchange process which minimizes interprocessor communication at the expense of redundant computations and storage. The contact-impact algorithm is based on the pinball method in which compatibility is enforced by preventing interpenetration on spheres embedded in elements adjacent to surfaces. The enhancements to the pinball algorithm include a parallel assembled surface normal algorithm and a parallel detection of interpenetrating pairs. Some timings with and without contact-impact are given.
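The core geometric test of the pinball method, deciding whether the spheres embedded in surface elements interpenetrate, reduces to comparing center distance against the sum of radii. A serial brute-force sketch (the paper's contribution is the parallel detection of such pairs on SIMD hardware; names here are ours):

```python
import math

def interpenetrating_pairs(pinballs):
    """Return index pairs (i, j), i < j, of overlapping spheres.

    pinballs: list of ((x, y, z), radius) tuples, one sphere per
    surface element adjacent to a contact surface."""
    hits = []
    for i in range(len(pinballs)):
        for j in range(i + 1, len(pinballs)):
            (ci, ri), (cj, rj) = pinballs[i], pinballs[j]
            if math.dist(ci, cj) < ri + rj:   # centers closer than r_i + r_j
                hits.append((i, j))
    return hits
```

Production codes replace the O(n^2) double loop with spatial sorting or bucketing, but the per-pair test, and the fact that contact compatibility is then enforced only on the detected pairs, is the essence of the pinball approach.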

Plaskacz, E.J. (Argonne National Lab., IL (United States)); Belytschko, T.; Chiang, H.Y. (Northwestern Univ., Evanston, IL (United States))

1992-01-01

287

A sweep algorithm for massively parallel simulation of circuit-switched networks

NASA Technical Reports Server (NTRS)

A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16,384-processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel iPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.

Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.

1992-01-01

288

GloMoSim: A Library for Parallel Simulation of Large-Scale Wireless Networks

A number of library-based parallel and sequential network simulators have been designed. This paper describes a library, called GloMoSim (for Global Mobile system Simulator), for parallel simulation of wireless networks. GloMoSim has been designed to be extensible and composable: the communication protocol stack for wireless networks is divided into a set of layers, each with its own API. Models of

Xiang Zeng; Rajive Bagrodia; Mario Gerla

1998-01-01

289

Intelligent simulation environments: identification of the basics

A problem exists in efficiently combining a non-deterministic decision capability with a current discrete event simulation language for use by the simulationist (programmer and in the future the user). This paper explores this problem in the context of the discrete event simulation problem domain implemented in Siman. The purpose is (1) to provide an ontological definition of abstract ideas from

Jordan Snyder; Gerald T. Mackulack

1988-01-01

290

Intelligent simulation environments: Identification of the basics

A problem exists in efficiently combining a non-deterministic decision capability with a current discrete event simulation language for use by the simulationist (programmer and in the future the user). This paper explores this problem in the context of the discrete event simulation problem domain implemented in Siman. The purpose is (1) to provide an ontological definition of abstract ideas

Jordan Snyder; Gerald T. Mackulack

1988-01-01

291

Parallel computing in enterprise modeling.

This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language, which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.

Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

2008-08-01

292

Efficient parallelization of molecular dynamics simulations with short-ranged forces

NASA Astrophysics Data System (ADS)

Recently, an alternative strategy for the parallelization of molecular dynamics simulations with short-ranged forces has been proposed. In this work, this algorithm is tested on a variety of multi-core systems using three types of benchmark simulations. The results show that the new algorithm gives consistent speedups which, depending on the properties of the simulated system, are either comparable or superior to those obtained with spatial decomposition. Comparisons of the parallel speedup on different systems indicate that on multi-core machines the parallel efficiency of the method is mainly limited by memory access speed.

Meyer, Ralf

2014-10-01

293

Balanced Decomposition for Power System Simulation on Parallel Computers

... Rudnick, and Aldo Cipriano, Department of Electrical Engineering, Catholic University of Chile, P.O. Box 306. ... to fit the problem to coarse-grain parallel architectures. The blocks are distributed on the processors

Catholic University of Chile (Universidad CatÃ³lica de Chile)

294

Parallel climate model (PCM) control and transient simulations

The Department of Energy (DOE)-supported Parallel Climate Model (PCM) makes use of the NCAR Community Climate Model (CCM3) and Land Surface Model (LSM) for the atmospheric and land surface components, respectively, the DOE Los Alamos National Laboratory Parallel Ocean Program (POP) for the ocean component, and the Naval Postgraduate School sea-ice model. The PCM executes on several distributed and

W. M. Washington; J. W. Weatherly; G. A. Meehl; A. J. Semtner Jr.; T. W. Bettge; A. P. Craig; W. G. Strand Jr.; J. M. Arblaster; V. B. Wayland; R. James; Y. Zhang

2000-01-01

295

A NEW DIMENSION OF URBAN CLIMATE MODELLING WITH PARALLEL LARGE-EDDY SIMULATION

We introduce the topography version of the parallelized large-eddy simulation (LES) model PALM and describe its new features and methods and its performance on current supercomputers. Validation shows that PALM is in line with experimental and other LES results, i.e. superior to the conventional Reynolds-averaged models. State-of-the-art parallel computing and parallel, on-the-fly graphics processing make the LES technique quick and

Marcus Oliver Letzel; Manabu Kanda; Siegfried Raasch

296

Parallel I/O, Analysis, and Visualization of a Trillion Particle Simulation

Petascale plasma physics simulations have recently entered the regime of simulating trillions of particles

Southern California, University of

297

Implementation of unsteady sampling procedures for the parallel direct simulation Monte Carlo method

An unsteady sampling routine for a general parallel direct simulation Monte Carlo method called PDSC is introduced, allowing the simulation of time-dependent flow problems in the near continuum range. A post-processing procedure called DSMC rapid ensemble averaging method (DREAM) is developed to improve the statistical scatter in the results while minimising both memory and simulation time. This method builds an

H. M. Cave; K.-C. Tseng; J.-S. Wu; M. C. Jermy; J.-C. Huang; S. P. Krumdieck

2008-01-01

298

In this paper, we present an approach for the optimization of a racecar using vehicle dynamics simulation in a parallel-computing environment. The use of vehicle dynamics simulations in the automotive and auto racing industries is widespread. Complex vehicle simulations can include hundreds of parameters and be very computationally expensive to perform. This limits the number of design configurations that

Kurt Hacker; Edward M. Kasprzak; Kemper Lewis

299

This paper presents the performance evaluation, workload characterization and trace driven simulation of a hypercube multi-computer running realistic workloads. Six representative parallel applications were selected as benchmarks. Software monitoring techniques were then used to collect execution traces. Based on the measurement results, we investigated both the computation and communication behavior of these parallel programs, including CPU utilization, computation task granularity,

Jiun-Ming Hsu; Prithviraj Banerjee

1990-01-01

300

Parallel Event-Driven Neural Network Simulations Using the Hodgkin-Huxley Neuron Model

... called the Hodgkin-Huxley neuron [1]. We describe the conversion of this model into an event-driven form to determine the feasibility of this parallel event-driven Hodgkin-Huxley model and analyze its viability

Tropper, Carl

301

Computer simulation program for parallel SITAN [Sandia Inertial Terrain-Aided Navigation, in FORTRAN]

This computer program simulates the operation of parallel SITAN using digitized terrain data. An actual trajectory is modeled including the effects of inertial navigation errors and radar altimeter measurements.

Andreas, R.D.; Sheives, T.C.

1980-11-01

302

Parallelization of particle-in-cell simulation modeling Hall-effect thrusters

MIT's fully kinetic particle-in-cell Hall thruster simulation is adapted for use on parallel clusters of computers. Significant computational savings are thus realized, with a predicted linear speed-up efficiency for certain ...

Fox, Justin M., 1981-

2005-01-01

303

June 24, 2003. We conduct simulations for the 3D unsteady-state anisotropic diffusion equation using parallel preconditioning techniques. The solution of the general unsteady state

Zhang, Jun

304

iPRIDE: a parallel integrated circuit simulator using direct method

A parallel circuit simulator, iPRIDE, which uses a direct solution method and runs on a shared-memory multiprocessor is described. The simulator is based on a multilevel node tearing approach which produces a nested bordered-block-diagonal (BBD) form of the circuit equation matrix. The parallel solution of the nested BBD matrix is described. Its efficiency is shown to depend on how the

Mi-Chang Chang; I. N. Hajj

1988-01-01

305

... subjective judgment always plays a vital role. Randomized controlled clinical trials are a superior method ... applied to other medical or nonmedical problems. Index terms: artificial intelligence, discrete event

Lin, Feng

306

A Queue Simulation Tool for a High Performance Scientific Computing Center

NASA Technical Reports Server (NTRS)

The NASA Center for Computational Sciences (NCCS) at the Goddard Space Flight Center provides high-performance, highly parallel processors, mass storage, and supporting infrastructure to a community of computational Earth and space scientists. Long-running (days) and highly parallel (hundreds of CPUs) jobs are common in the workload. NCCS management structures batch queues and allocates resources to optimize system use and prioritize workloads. NCCS technical staff use a locally developed discrete event simulation tool to model the impacts of evolving workloads, potential system upgrades, and alternative queue structures and resource allocation policies.
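The kind of queue model such a tool embodies can be sketched as a first-come-first-served multi-server scheduler: a heap tracks when each processor group next becomes free, and each job starts at the later of its arrival time and the earliest free time. This is a toy sketch of the modeling idea, not NCCS's actual simulator; all names are ours.

```python
import heapq

def fcfs_completion_times(arrivals, service_times, num_servers):
    """Completion time of each job under FCFS with identical servers.

    arrivals must be sorted in nondecreasing order; service_times[i] is
    the run time of job i. Returns completion times in arrival order."""
    free_at = [0.0] * num_servers      # when each server next becomes idle
    heapq.heapify(free_at)
    done = []
    for arrive, service in zip(arrivals, service_times):
        earliest = heapq.heappop(free_at)  # soonest-available server
        start = max(arrive, earliest)      # wait in queue if all are busy
        finish = start + service
        done.append(finish)
        heapq.heappush(free_at, finish)
    return done
```

Swapping the FCFS pop for a priority rule, or varying `num_servers`, is exactly the sort of what-if question (alternative queue structures, system upgrades) a discrete event model of a batch system answers.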

Spear, Carrie; McGalliard, James

2007-01-01

307

We discuss selected aspects of a new parallel three-dimensional (3-D) computational tool for the unstructured mesh simulation of Los Alamos National Laboratory (LANL) casting processes. This tool, known as Telluride, draws upon robust, high resolution finite volume solutions of metal alloy mass, momentum, and enthalpy conservation equations to model the filling, cooling, and solidification of LANL castings. We briefly describe the current Telluride physical models and solution methods, then detail our parallelization strategy as implemented with Fortran 90 (F90). This strategy has yielded straightforward and efficient parallelization on distributed and shared memory architectures, aided in large part by the new parallel libraries JTpack90 for Krylov-subspace iterative solution methods and PGSLib for efficient gather/scatter operations. We illustrate our methodology and current capabilities with source code examples and parallel efficiency results for a LANL casting simulation.

Kothe, D.B.; Turner, J.A.; Mosso, S.J. [Los Alamos National Lab., NM (United States); Ferrell, R.C. [Cambridge Power Computer Assoc. (United States)

1997-03-01

308

Xyce parallel electronic simulator users' guide, Version 6.0.1.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David Gregory. [Raytheon, Albuquerque, NM

2014-01-01

309

Service-oriented modeling and simulation are active topics in the field of modeling and simulation, and service resources must be invoked while a simulation task workflow is running. How to optimize service resource allocation to ensure that tasks complete effectively is an important issue in this area. In the military modeling and simulation field, it is important to improve the probability of success and the timeliness of simulation task workflows. This paper therefore proposes an optimization algorithm for multipath parallel allocation of service resources, in which a multipath parallel allocation model is built and a quantum optimization algorithm with a multiple-chain coding scheme is used for optimization and solution. The multiple-chain coding scheme extends the parallel search space to improve search efficiency. Through simulation experiments, this paper investigates how different optimization algorithms, service allocation strategies, and path numbers affect the probability of success of the simulation task workflow; the results show that the proposed multipath parallel allocation algorithm is an effective method to improve the probability of success and the timeliness of simulation task workflows. PMID:24963506

Zhang, Hongjun; Zhang, Rui; Li, Yong; Zhang, Xuliang

2014-01-01

310

Data Parallel Three-Dimensional Cahn-Hilliard Field Equation Simulation on GPUs with CUDA (Computational Science Technical Note CSTN-073)

Hawick, Ken

311

Parallel Numerical Simulation of Boltzmann Transport in Single-Walled Carbon Nanotubes

NSDL National Science Digital Library

This module teaches the basic principles of semi-classical transport simulation based on the time-dependent Boltzmann transport equation (BTE) formalism. It covers performance considerations for parallel implementations of multi-dimensional transport simulation, as well as numerical methods for efficient and accurate solution of the BTE for both electronic and thermal transport using a simple finite difference discretization and the stable upwind method.
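The module's BTE solvers are not reproduced in this listing, but the "stable upwind method" it mentions can be illustrated on the simplest possible case: first-order upwind finite differences for 1-D linear advection with periodic boundaries. The grid size, Courant number, and square-pulse initial profile below are arbitrary illustrative choices, not the module's own example.

```python
# Illustrative sketch of a first-order upwind scheme for u_t + a*u_x = 0
# (a > 0) on a periodic 1-D grid; parameters are arbitrary choices.

def upwind_step(u, courant):
    """One explicit step; the upwind (left) neighbour is used since a > 0.
    u[i - 1] wraps around at i = 0, giving periodic boundaries."""
    return [u[i] - courant * (u[i] - u[i - 1]) for i in range(len(u))]

def simulate(n=100, cfl=0.8, steps=50):
    u = [1.0 if 20 <= i <= 40 else 0.0 for i in range(n)]  # square pulse
    for _ in range(steps):
        u = upwind_step(u, cfl)
    return u

u = simulate()
```

For Courant numbers at or below one, the scheme is monotone: the pulse advects and smears but never overshoots or undershoots, which is the stability property referred to above.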

Zlatan Aksamija

312

We explore the emerging application area of physics-based simulation for computer animation and visual special effects. In particular, we examine its parallelization potential and characterize its behavior on a chip multiprocessor (CMP). Applications in this domain model and simulate natural phenomena, and often direct visual components of motion pictures. We study a set of three workloads that exemplify the span

Christopher J. Hughes; Radek Grzeszczuk; Eftychios Sifakis; Daehyun Kim; Sanjeev Kumar; Andrew P. Selle; Jatin Chhugani; Matthew Holliman; Yen-Kuang Chen

2007-01-01

313

Distributing computation among multiple processors is one approach to reducing simulation time for large VLSI circuit designs. However, parallel simulation introduces the problem of how to partition the logic gates and system behaviors of the circuit among the available processors in order to obtain maximum speedup. A complicating factor that is often ignored is the effect of the time-synchronization protocol

Kevin L. Kapp; Thomas C. Hartrum; Tom S. Wailes

1995-01-01

314

A Multidimensional Study on the Feasibility of Parallel Switch-Level Circuit Simulation

This paper presents the results of an experimental study to evaluate the effectiveness of multiple synchronization protocols and partitioning algorithms in reducing the execution time of switch-level models of VLSI circuits. Specific contributions of this paper include: (i) parallelizing an existing switch-level simulator such that the model can be executed using conservative and optimistic simulation protocols with minor changes, (ii)

Yu-an Chen; Vikas Jha; Rajive Bagrodia

1997-01-01

315

Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.

Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor

2011-09-06

316

This paper presents results of tests with a parallel implementation of a power system dynamic simulation methodology for transient stability analysis in a parallel computer. The test system is a planned configuration of the interconnected Brazilian South-Southeastern power system with 616 buses, 995 lines, and 88 generators. The parallel machine used in the computer simulation is a distributed memory multiprocessor arranged in a hypercube topology architecture. The nodes are based on the Inmos T800 processor with 4 Mbytes of local memory. The simulation methodology is based on the interlaced alternating implicit integration scheme in which the network equations are re-ordered such that the network admittance matrix appears in the block bordered diagonal form and then is solved by a combined application of the LU factorization and the Conjugate Gradient Method. The results obtained show considerable reductions in the simulation time.
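The paper's network solution combines LU factorization with the Conjugate Gradient Method. As a sketch of the latter ingredient only, here is a textbook unpreconditioned conjugate gradient iteration for a dense symmetric positive-definite system in plain Python; the matrix and right-hand side are made-up toy data, not the power-system equations.

```python
# Textbook unpreconditioned conjugate gradient for A x = b, with A symmetric
# positive definite; dense list-of-lists storage, toy data.

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    n = len(b)
    x = [0.0] * n
    r = b[:]                       # residual b - A x for the initial guess x = 0
    p = r[:]                       # first search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new ** 0.5 < tol:
            break
        beta = rs_new / rs
        p = [r[i] + beta * p[i] for i in range(n)]
        rs = rs_new
    return x

x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

In exact arithmetic CG converges in at most n iterations for an n-by-n system, which is what makes it attractive for the large sparse network blocks described above.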

Decker, I.C.; Falcao, D.M.; Kaszkurewicz, E. (COPP-EE/Center for Parallel Computing, Federal Univ. of Rio de Janeiro, 21945 Rio de Janeiro (BR))

1992-02-01

317

Molecular Dynamic Simulations of Nanostructured Ceramic Materials on Parallel Computers

Large-scale molecular-dynamics (MD) simulations have been performed to gain insight into: (1) sintering, structure, and mechanical behavior of nanophase SiC and SiO2; (2) effects of dynamic charge transfers on the sintering of nanophase TiO2; (3) high-pressure structural transformation in bulk SiC and GaAs nanocrystals; (4) nanoindentation in Si3N4; and (5) lattice mismatched InAs/GaAs nanomesas. In addition, we have designed a multiscale simulation approach that seamlessly embeds MD and quantum-mechanical (QM) simulations in a continuum simulation. The above research activities have involved strong interactions with researchers at various universities, government laboratories, and industries. 33 papers have been published and 22 talks have been given based on the work described in this report.
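The report's large-scale MD codes are not shown here, but the time integrator at the heart of most MD simulations can be sketched in a few lines. The following is a generic velocity-Verlet step applied to a 1-D harmonic "bond"; the force model and parameters are invented for illustration.

```python
# Generic velocity-Verlet time stepper, the standard MD integrator, applied to
# a 1-D harmonic "bond" F = -k*x; force model and parameters are invented.

def velocity_verlet(x, v, force, mass, dt, steps):
    f = force(x)
    for _ in range(steps):
        x = x + v * dt + 0.5 * (f / mass) * dt * dt   # position update
        f_new = force(x)
        v = v + 0.5 * (f + f_new) / mass * dt         # velocity update
        f = f_new
    return x, v

k, mass = 1.0, 1.0
x, v = velocity_verlet(1.0, 0.0, lambda x: -k * x, mass, dt=0.01, steps=1000)
energy = 0.5 * k * x * x + 0.5 * mass * v * v         # should stay near 0.5
```

Velocity Verlet is time-reversible and symplectic, so the total energy oscillates around its initial value rather than drifting, a property production MD codes rely on for long runs.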

Vashishta, Priya; Kalia, Rajiv

2005-02-24

318

Parallelized modelling and solution scheme for hierarchically scaled simulations

NASA Technical Reports Server (NTRS)

This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The tree morphology is chosen to minimize memory, communications, and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV), or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to large reductions in memory, communications, and computational effort in a parallel computing environment, substantial reductions are generated in the sequential mode of application, and such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers is capable of solving are presented and discussed, among them several interpolative reduction methods. It was found that by combining several of these techniques, a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features and benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications, which demonstrate the potential of the HPT strategy.

Padovan, Joe

1995-01-01

319

A parallel FFT accelerated transient field-circuit simulator

A novel fast electromagnetic field-circuit simulator that permits the full-wave modeling of transients in nonlinear microwave circuits is proposed. This time-domain simulator is composed of two components: 1) a full-wave solver that models interactions of electromagnetic fields with conducting surfaces and finite dielectric volumes by solving time-domain surface and volume electric field integral equations, respectively, and 2) a circuit solver

Ali E. Yilmaz; Jian-Ming Jin; Eric Michielssen

2005-01-01

320

Parallel Quantum Computer Simulation on the CUDA Architecture

Due to their increasing computational power, modern graphics processing architectures are becoming more and more popular for general purpose applications with high performance demands. This is the case for quantum computer simulation, a problem with high computational requirements in both memory and processing power. When dealing with such simulations, multiprocessor architectures are an almost obligatory tool. In this paper we

Eladio Gutiérrez; Sergio Romero; María A. Trenas; Emilio L. Zapata

2008-01-01

321

Computational Efficiency of Parallel Unstructured Finite Element Simulations

In this paper we address various efficiency aspects of finite element (FE) simulations on vector computers. Especially for the numerical simulation of large scale Computational Fluid Dynamics (CFD) and Fluid-Structure Interaction (FSI) problems, efficiency and robustness of the algorithms are two key requirements.

In the first part of this paper a straightforward concept is described to increase the performance of

Malte Neumann; Ulrich Küttler; Sunil Reddy Tiyyagura; Wolfgang A. Wall; Ekkehard Ramm

322

Parallel Monte Carlo Electron and Photon Transport Simulation Code (PMCEPT code)

NASA Astrophysics Data System (ADS)

Simulations for customized cancer radiation treatment planning for each patient are very useful for both patient and doctor, since they can be used to find the most effective treatment with the least possible dose to the patient. Such a system, the so-called "Doctor by Information Technology", would be useful for providing high quality medical services everywhere. However, the large amount of computing time required by the well-known general purpose Monte Carlo (MC) codes has prevented their use for routine dose distribution calculations in customized radiation treatment planning. The optimal solution to provide "accurate" dose distributions within an "acceptable" time limit is to develop a parallel simulation algorithm on a Beowulf PC cluster, because it is the most accurate, efficient, and economical. I developed a parallel MC electron and photon transport simulation code based on the standard MPI message passing interface. This algorithm solved the main difficulty of parallel MC simulation (overlapped random number series in the different processors) by using multiple random number seeds. The parallel results agreed well with the serial ones, and the parallel efficiency approached 100%, as expected.
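The PMCEPT code itself is not reproduced in this listing, but the multiple-seed idea the abstract describes, giving each processor its own random number seed so the streams of different ranks do not overlap, can be sketched serially. The seed scheme and the pi-estimation kernel below are illustrative stand-ins, not the actual MPI transport code.

```python
# Serial sketch of the multiple-seed strategy: each (simulated) MPI rank draws
# from its own random stream so the series of different processors do not
# overlap.  The pi-estimation kernel stands in for the real transport physics.
import random

def mc_pi(samples, seed):
    rng = random.Random(seed)          # independent stream for this rank
    hits = sum(1 for _ in range(samples)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * hits / samples

# In the MPI code each rank would use a rank-specific seed and the partial
# results would be combined with a reduction; here we just loop and average.
n_ranks, samples = 8, 20000
estimates = [mc_pi(samples, seed=1000 + rank) for rank in range(n_ranks)]
pi_est = sum(estimates) / n_ranks
```

Because every worker's estimate is statistically independent, averaging the partial results behaves like one long serial run, which is why the parallel and serial results can agree so closely.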

Kum, Oyeon

2004-11-01

323

Shared Memory Implementation of a Parallel Switch-Level Circuit Simulator

Circuit simulation is a critical bottleneck in VLSI design. This paper describes the implementation of an existing parallel switch-level simulator called MIRSIM on a shared-memory multiprocessor architecture. The simulator uses a set of three different conservative protocols: the null message protocol, the conditional event protocol, and the accelerated null message protocol, a combination of the preceding two algorithms. The paper describes the implementation of these protocols to exploit...

Yu-an Chen; Rajive Bagrodia

1998-01-01

324

Simulations of Ion Thruster Plume–Spacecraft Interactions on Parallel Supercomputer

A parallel three-dimensional electrostatic Particle-In-Cell (PIC) code is developed for large-scale simulations of ion thruster plume-spacecraft interactions on parallel supercomputers. This code is based on a newly developed immersed finite-element (IFE) PIC. The IFE-PIC is designed to handle complex boundary conditions accurately while maintaining the computational speed of the standard PIC code. Domain decomposition is used in both field solve

Joseph Wang; Yong Cao; Raed Kafafy; Julien Pierru; Viktor K. Decyk

2006-01-01

325

Parallel-adaptive simulation with the multigrid-based software framework UG

In this paper we present design aspects and concepts of the unstructured grids (UG) software framework that are relevant for parallel-adaptive simulation of time-dependent, nonlinear partial differential equations. The architectural design is discussed on the system, subsystem, and component level for distributed mesh management and local adaptation capabilities. Parallelization is founded on top of the innovative programming model dynamic distributed data

Stefan Lang

2006-01-01

326

A parallel algorithm for transient solid dynamics simulations with contact detection

Solid dynamics simulations with Lagrangian finite elements are used to model a wide variety of problems, such as the calculation of impact damage to shipping containers for nuclear waste and the analysis of vehicular crashes. Using parallel computers for these simulations has been hindered by the difficulty of searching efficiently for material surface contacts in parallel. A new parallel algorithm for calculation of arbitrary material contacts in finite element simulations has been developed and implemented in the PRONTO3D transient solid dynamics code. This paper will explore some of the issues involved in developing efficient, portable, parallel finite element models for nonlinear transient solid dynamics simulations. The contact-detection problem poses interesting challenges for efficient implementation of a solid dynamics simulation on a parallel computer. The finite element mesh is typically partitioned so that each processor owns a localized region of the finite element mesh. This mesh partitioning is optimal for the finite element portion of the calculation since each processor must communicate only with the few connected neighboring processors that share boundaries with the decomposed mesh. However, contacts can occur between surfaces that may be owned by any two arbitrary processors. Hence, a global search across all processors is required at every time step to search for these contacts. Load-imbalance can become a problem since the finite element decomposition divides the volumetric mesh evenly across processors but typically leaves the surface elements unevenly distributed. In practice, these complications have been limiting factors in the performance and scalability of transient solid dynamics on massively parallel computers. In this paper the authors present a new parallel algorithm for contact detection that overcomes many of these limitations.
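PRONTO3D's parallel contact algorithm is not given in the abstract, but the basic building block of most contact searches, binning points into a uniform grid so only nearby pairs need distance tests, can be sketched as follows. The function name and flat 2-D point setup are hypothetical; a production code searches deformed surface facets and distributes the bins across processors.

```python
# Serial sketch of the candidate-search step of contact detection: hash points
# into a uniform grid with cell size equal to the contact radius so that only
# points in the same or neighbouring cells need pairwise distance tests.
from collections import defaultdict
from itertools import combinations

def find_contacts(points, radius):
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(x // radius), int(y // radius))].append(idx)
    pairs = set()
    for (cx, cy) in grid:
        # candidates: this cell plus its 8 neighbours
        cand = sorted({i for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                       for i in grid.get((cx + dx, cy + dy), [])})
        for i, j in combinations(cand, 2):
            (xi, yi), (xj, yj) = points[i], points[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 < radius ** 2:
                pairs.add((i, j))
    return pairs

contacts = find_contacts([(0.0, 0.0), (0.05, 0.0), (1.0, 1.0)], radius=0.1)
```

Binning reduces the naive all-pairs search to work proportional to the number of nearby pairs, which is exactly why the surface elements, not the volume mesh, dominate the load-balance problem described above.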

Attaway, S.; Hendrickson, B.; Plimpton, S.; Gardner, D.; Vaughan, C.; Heinstein, M.; Peery, J.

1996-06-01

327

Object-oriented particle simulation on parallel computers

A general purpose, object-oriented particle simulation (OOPS) library has been developed for use on a variety of system architectures with a uniform high-level interface. This includes the development of library implementations for the CM5, Intel Paragon, and CRI T3D. Codes written on any of these platforms can be ported to other platforms without modifications by utilizing the high-level library. The general character of the library allows application to such diverse areas as plasma physics, suspension flows, vortex simulations, porous media, and materials science.

Reynders, J.V.W.; Forslund, D.W.; Hinker, P.J.; Tholburn, M.; Kilman, D.G.; Humphrey, W.F.

1994-04-01

328

A General Simulation Framework for Supply Chain Modeling: State of the Art and Case Study

Nowadays there is a large availability of discrete event simulation software that can be easily used in different domains: from industry to supply chain, from healthcare to business management, from training to complex systems design. Simulation engines of commercial discrete event simulation software use specific rules and logics for simulation time and events management. Difficulties and limitations come up when commercial discrete event simulation software is used to model complex real-world systems (i.e., supply chains, industrial plants). The objective of this paper is twofold: first, a state of the art on commercial discrete event simulation software and an overview of discrete event simulation model development using general purpose programming languages are presented; then, a Supply Chain Order Performance Simulator (SCOPS, developed in C++) for investigating the inventory management problem along the supply chain under different supply chain scenarios is proposed to readers.
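For the "general purpose programming languages" route the paper surveys, the heart of any discrete event simulation engine is a future event list processed in timestamp order. A minimal sketch follows; the class and method names are our own, not the SCOPS API.

```python
# Minimal discrete event simulation core: a future event list kept in a heap
# and processed in non-decreasing timestamp order.
import heapq

class Simulator:
    def __init__(self):
        self.now = 0.0
        self._events = []
        self._seq = 0                  # tie-breaker for simultaneous events

    def schedule(self, delay, action):
        heapq.heappush(self._events, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self):
        while self._events:
            self.now, _, action = heapq.heappop(self._events)
            action(self)

# Toy supply-chain flavour: an order arrives at t = 1 and ships 2 units later.
log = []

def order_arrives(sim):
    log.append(("order", sim.now))
    sim.schedule(2.0, lambda s: log.append(("ship", s.now)))

sim = Simulator()
sim.schedule(1.0, order_arrives)
sim.run()
```

The sequence counter breaks ties between events scheduled for the same instant, giving the deterministic event ordering that commercial engines implement with their own (often hidden) rules.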

Cimino, Antonio; Mirabelli, Giovanni

2010-01-01

329

NASA Technical Reports Server (NTRS)

Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.

Sohn, Andrew; Biswas, Rupak

1996-01-01

330

Parallel Performance of a Combustion Chemistry Simulation

A key difficulty in the numerical simulation of reactive flow is stiffness. Stiff equations have one or more rapidly decaying solutions; the problem of stiffness in ordinary differential equations was identified in 1952. In reactive flow, stiffness arises from widely separated chemical time scales, and it also results where large temperature gradients occur. To overcome these numerical difficulties

Skinner, Gregg; Padua, David

331

Characterization of parallelism and deadlocks in distributed digital logic simulation

This paper explores the suitability of the Chandy-Misra algorithm for digital logic simulation. We use four realistic circuits as benchmarks for our analysis, with one of them being the vector-unit controller for the Titan supercomputer from Ardent. Our results show that the average number of logic elements available for concurrent execution ranges from 10 to 111 for the four circuits,
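The Chandy-Misra algorithm evaluated here lets each logic element (logical process) consume events only up to the timestamp promised on its input channels, with null messages carrying "I will send nothing before time t + lookahead" promises to avoid deadlock. A toy two-process ring illustrating the rule follows; the event times, lookahead, and topology are invented for the example.

```python
# Toy illustration of the Chandy-Misra conservative rule: a logical process
# may consume events with timestamps up to its input-channel clock, and when
# idle it sends a null message promising silence until (channel time +
# lookahead).  Two LPs in a ring; names and numbers are invented.

def run_ring(events_a, events_b, lookahead=1.0, horizon=10.0):
    pending = {"A": sorted(events_a), "B": sorted(events_b)}
    chan = {"A": 0.0, "B": 0.0}    # clock of each LP's single input channel
    processed = []
    while min(chan.values()) < horizon:
        for lp, other in (("A", "B"), ("B", "A")):
            # safe to process anything at or below the channel clock
            while pending[lp] and pending[lp][0] <= chan[lp]:
                processed.append((lp, pending[lp].pop(0)))
            # null message: a lower bound on future sends advances the neighbour
            chan[other] = max(chan[other], chan[lp] + lookahead)
    return processed

processed = run_ring([0.5, 2.5], [1.5])
```

Without the null messages both processes would wait on each other forever; the lookahead promise is what lets the channel clocks advance deadlock-free, at the cost of the extra message traffic the paper's measurements account for.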

Larry Soulé; Anoop Gupta

1989-01-01

332

Parallel Transient Dynamics Simulations: Algorithms for Contact Detection

February 5, 1998. Abstract: Transient dynamics simulations are commonly used to model phenomena such as car crashes, underwater explosions, and the response of shipping containers to high-speed impacts. Physical modeling of crashes and explosions can involve the interaction of fluids with complex structural

Plimpton, Steve

333

Dependability analysis of parallel systems using a simulation-based approach. M.S. Thesis

NASA Technical Reports Server (NTRS)

The analysis of dependability in large, complex, parallel systems executing real applications or workloads is examined in this thesis. To effectively demonstrate the wide range of dependability problems that can be analyzed through simulation, the analysis of three case studies is presented. For each case, the organization of the simulation model used is outlined, and the results from simulated fault injection experiments are explained, showing the usefulness of this method in dependability modeling of large parallel systems. The simulation models are constructed using DEPEND and C++. Where possible, methods to increase dependability are derived from the experimental results. Another interesting facet of all three cases is the presence of some kind of workload or application executing in the simulation while faults are injected, which provides a completely new dimension to this type of study, one not possible to model accurately with analytical approaches.

Sawyer, Darren Charles

1994-01-01

334

NASA Astrophysics Data System (ADS)

In this paper, features of the numerical simulation of the motion of large-scale artificial satellite systems by parallel computing are discussed, using as an example the program complex "Numerical model of the system artificial satellites motion" on the "Skiff Cyberia" cluster. It is shown that the use of parallel computing makes it possible to carry out high-precision numerical simulation of the motion of a large-scale artificial satellite system. This opens up comprehensive facilities for solving direct and inverse problems of the dynamics of satellite systems such as GLONASS and of objects of space debris.

Bordovitsyna, T. V.; Avdyushev, V. A.; Chuvashov, I. N.; Aleksandrova, A. G.; Tomilova, I. V.

2009-11-01

335

NASA Technical Reports Server (NTRS)

The application of predictor-corrector integration algorithms developed for the digital parallel processing environment is investigated. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented, and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real time performance, interprocessor communication, and algorithm startup are also discussed.
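The report's specific parallel predictor-corrector algorithms are not given in the abstract; as a minimal serial illustration of the predictor-corrector idea itself, here is Heun's method (explicit Euler predictor, trapezoidal corrector) applied to the linear test problem y' = -y, whose exact solution is exp(-t). The test problem and step size are our own choices.

```python
# Heun's method: explicit Euler predictor followed by a trapezoidal corrector,
# a minimal serial stand-in for a parallel predictor-corrector algorithm.
import math

def heun_step(f, t, y, dt):
    y_pred = y + dt * f(t, y)                             # predictor
    return y + 0.5 * dt * (f(t, y) + f(t + dt, y_pred))   # corrector

def integrate(f, y0, t_end, dt):
    t, y = 0.0, y0
    while t < t_end - 1e-12:
        y = heun_step(f, t, y, dt)
        t += dt
    return y

y = integrate(lambda t, y: -y, 1.0, t_end=1.0, dt=0.01)
err = abs(y - math.exp(-1.0))
```

The predictor and corrector evaluations are independent enough that variants of this scheme can be distributed across processors, which is the trade-off between step size, processor count, and accuracy the report measures.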

Krosel, S. M.; Milner, E. J.

1982-01-01

336

Characterization of parallel-hole collimator using Monte Carlo Simulation

Objective: Accuracy of in vivo activity quantification improves after the correction of penetrated and scattered photons. However, accurate assessment is not possible with physical experiments. We have used Monte Carlo simulation to accurately assess the contribution of penetrated and scattered photons in the photopeak window. Materials and Methods: Simulations were performed with the Simulation of Imaging Nuclear Detectors Monte Carlo code. The simulations were set up in such a way that they provide geometric, penetration, and scatter components after each simulation and write binary images to a data file. These components were analyzed graphically using Microsoft Excel (Microsoft Corporation, USA). Each binary image was imported into ImageJ software, and a logarithmic transformation was applied for visual assessment of image quality, plotting the profile across the center of the images and calculating the full width at half maximum (FWHM) in the horizontal and vertical directions. Results: The geometric, penetration, and scatter components at 140 keV for the low-energy general-purpose (LEGP) collimator were 93.20%, 4.13%, and 2.67%, respectively. Similarly, the geometric, penetration, and scatter components at 140 keV for the low-energy high-resolution (LEHR), medium-energy general-purpose (MEGP), and high-energy general-purpose (HEGP) collimators were (94.06%, 3.39%, 2.55%), (96.42%, 1.52%, 2.06%), and (96.70%, 1.45%, 1.85%), respectively. For the MEGP collimator at 245 keV and the HEGP collimator at 364 keV, the corresponding values were 89.10%, 7.08%, 3.82% and 67.78%, 18.63%, 13.59%, respectively. Conclusion: The LEGP and LEHR collimators are best for imaging 140 keV photons. The HEGP collimator can be used for 245 keV and 364 keV; however, corrections for penetration and scatter must be applied if one is interested in quantifying in vivo activity at 364 keV. Due to heavy penetration and scattering, 511 keV photons should not be imaged with the HEGP collimator. PMID:25829730
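The FWHM computation mentioned in the methods can be sketched directly: find the peak of the sampled profile and locate the two half-maximum crossings by linear interpolation. The Gaussian test profile below is illustrative, not the simulated collimator data.

```python
# FWHM of a sampled 1-D profile: find the peak and locate the two
# half-maximum crossings with linear interpolation between samples.
import math

def fwhm(profile):
    peak = max(profile)
    half = peak / 2.0
    ipk = profile.index(peak)
    i = ipk                          # walk left to the first sample <= half
    while profile[i] > half:
        i -= 1
    left = i + (half - profile[i]) / (profile[i + 1] - profile[i])
    j = ipk                          # walk right likewise
    while profile[j] > half:
        j += 1
    right = j - (half - profile[j]) / (profile[j - 1] - profile[j])
    return right - left

sigma = 5.0                          # Gaussian test profile, sigma = 5 samples
profile = [math.exp(-((x - 50) ** 2) / (2 * sigma ** 2)) for x in range(101)]
width = fwhm(profile)                # exact value: 2*sqrt(2*ln 2)*sigma
```

For a Gaussian the exact FWHM is 2*sqrt(2*ln 2)*sigma, about 2.355 sigma, which makes a convenient check on the interpolation.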

Pandey, Anil Kumar; Sharma, Sanjay Kumar; Karunanithi, Sellam; Kumar, Praveen; Bal, Chandrasekhar; Kumar, Rakesh

2015-01-01

337

Parallel simulation of tsunami inundation on a large-scale supercomputer

NASA Astrophysics Data System (ADS)

An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. 
In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the finite difference calculation, (2) communication between adjacent layers for the calculations to connect each layer, and (3) global communication to obtain the time step which satisfies the CFL condition in the whole domain. A preliminary test on the K computer showed the parallel efficiency on 1024 cores was 57% relative to 64 cores. We estimate that the parallel efficiency will be considerably improved by applying a 2-D domain decomposition instead of the present 1-D domain decomposition in future work. The present parallel tsunami model was applied to the 2011 Great Tohoku tsunami. The coarsest resolution layer covers a 758 km × 1155 km region with a 405 m grid spacing. A nesting of five layers was used with the resolution ratio of 1/3 between nested layers. The finest resolution region has 5 m resolution and covers most of the coastal region of Sendai city. To complete 2 hours of simulation time, the serial (non-parallel) computation took approximately 4 days on a workstation. To complete the same simulation on 1024 cores of the K computer, it took 45 minutes which is more than two times faster than real-time. This presentation discusses the updated parallel computational performance and the efficient use of the K computer when considering the characteristics of the tsunami inundation simulation model in relation to the characteristics and capabilities of the K computer.
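The third communication type above, the global time-step reduction, is a one-line collective in practice: each subdomain computes its largest stable step from its local maximum wave speed, and the global step is the minimum over all subdomains (in MPI, an MPI_Allreduce with MPI_MIN). A serial stand-in follows, using invented water depths and the 405 m coarse grid spacing quoted above.

```python
# Serial stand-in for the global CFL reduction: each subdomain computes its
# largest stable step from the local maximum shallow-water wave speed
# sqrt(g*h); the global step is the minimum over subdomains (MPI_Allreduce
# with MPI_MIN in the parallel code).  The depths are invented.
import math

def local_dt(depths, dx, cfl=0.9, g=9.81):
    cmax = max(math.sqrt(g * h) for h in depths)   # fastest local wave
    return cfl * dx / cmax

subdomains = [[10.0, 50.0], [200.0, 150.0], [5.0, 1.0]]   # water depths in m
dt = min(local_dt(d, dx=405.0) for d in subdomains)       # global reduction
```

The deepest water anywhere in the domain sets the global step, which is why this reduction must be global rather than neighbour-to-neighbour.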

Oishi, Y.; Imamura, F.; Sugawara, D.

2013-12-01

338

Transient dynamics simulations are commonly used to model phenomena such as car crashes, underwater explosions, and the response of shipping containers to high-speed impacts. Physical objects in such a simulation are typically represented by Lagrangian meshes because the meshes can move and deform with the objects as they undergo stress. Fluids (gasoline, water) or fluid-like materials (earth) in the simulation can be modeled using the techniques of smoothed particle hydrodynamics. Implementing a hybrid mesh/particle model on a massively parallel computer poses several difficult challenges. One challenge is to simultaneously parallelize and load-balance both the mesh and particle portions of the computation. A second challenge is to efficiently detect the contacts that occur within the deforming mesh and between mesh elements and particles as the simulation proceeds. These contacts impart forces to the mesh elements and particles which must be computed at each timestep to accurately capture the physics of interest. In this paper we describe new parallel algorithms for smoothed particle hydrodynamics and contact detection which turn out to have several key features in common. Additionally, we describe how to join the new algorithms with traditional parallel finite element techniques to create an integrated particle/mesh transient dynamics simulation. Our approach to this problem differs from previous work in that we use three different parallel decompositions, a static one for the finite element analysis and dynamic ones for particles and for contact detection. We have implemented our ideas in a parallel version of the transient dynamics code PRONTO-3D and present results for the code running on a large Intel Paragon.

Hendrickson, B.; Plimpton, S.; Attaway, S.; Swegle, J.; and others

1996-09-01

339

NASA Technical Reports Server (NTRS)

This is a real-time robotic controller and simulator (RRCS): a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers back to it. There is a plurality of SIMD microprocessors, each a SIMD parallel processor capable of exploiting fine-grain parallelism and also able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting the parallelism in each operation. A system bus connects the host processor to the plurality of SIMD microprocessors, and a common clock provides a continuous sequence of clock pulses. A ring structure interconnects the SIMD microprocessors and is connected to the clock, distributing the clock pulses and providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.

Fijany, Amir (inventor); Bejczy, Antal K. (inventor)

1993-01-01

340

Parallel FEM Simulation of Electromechanics in the Heart

NASA Astrophysics Data System (ADS)

Cardiovascular disease is the leading cause of death in America. Computer simulation of complicated dynamics of the heart could provide valuable quantitative guidance for diagnosis and treatment of heart problems. In this paper, we present an integrated numerical model which encompasses the interaction of cardiac electrophysiology, electromechanics, and mechanoelectrical feedback. The model is solved by finite element method on a Linux cluster and the Cray XT5 supercomputer, kraken. Dynamical influences between the effects of electromechanics coupling and mechanic-electric feedback are shown.

Xia, Henian; Wong, Kwai; Zhao, Xiaopeng

2011-11-01

341

A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

NASA Technical Reports Server (NTRS)

A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
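The hypercube mapping and tree broadcast of the thesis are beyond a short sketch, but the underlying simulated annealing loop with cell-exchange moves can be shown on a toy one-dimensional standard cell placement. The cost model (total span of each net), cooling schedule, and instance below are invented for illustration.

```python
# Uniprocessor skeleton of simulated annealing for standard cell placement
# with cell-exchange moves; cost model, cooling schedule, and toy instance
# are invented.  The hypercube mapping and tree broadcast are omitted.
import math
import random

def wirelength(order, nets):
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(max(pos[c] for c in net) - min(pos[c] for c in net)
               for net in nets)

def anneal(cells, nets, t0=10.0, cooling=0.995, iters=2000, seed=1):
    rng = random.Random(seed)
    order = list(cells)
    cost = wirelength(order, nets)
    t = t0
    for _ in range(iters):
        i, j = rng.sample(range(len(order)), 2)      # cell-exchange move
        order[i], order[j] = order[j], order[i]
        new_cost = wirelength(order, nets)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost                          # accept the move
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
        t *= cooling
    return order, cost

nets = [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]]      # a chain of 2-pin nets
order, cost = anneal([3, 0, 5, 1, 4, 2], nets)
```

Early in the schedule the Metropolis test accepts uphill moves, allowing escapes from local minima; as the temperature cools the loop degenerates into greedy descent, which is the behaviour the parallel version must reproduce across processors.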

Jones, Mark Howard

1987-01-01

342

A Solver for Massively Parallel Direct Numerical Simulation of Three-Dimensional Multiphase Flows

We present a new solver for massively parallel simulations of fully three-dimensional multiphase flows. The solver runs on a variety of computer architectures from laptops to supercomputers and on 65536 threads or more (limited only by the availability to us of more threads). The code is wholly written by the authors in Fortran 2003 and uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of the LCRM hybrid Front Tracking/Level Set method designed to handle highly deforming interfaces with complex topology changes. We discuss the implementation of this interface method and its particular suitability to distributed processing where all operations are carried out locally on distributed subdomains. We have developed parallel GMRES and Multigrid iterative solvers suited to the linear systems arising from the implicit solution of the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across flu...

Shin, S; Juric, D

2014-01-01

343

Wake Encounter Analysis for a Closely Spaced Parallel Runway Paired Approach Simulation

NASA Technical Reports Server (NTRS)

A Monte Carlo simulation of simultaneous approaches performed by two transport category aircraft from the final approach fix to a pair of closely spaced parallel runways was conducted to explore the aft boundary of the safe zone in which separation assurance and wake avoidance are provided. The simulation included variations in runway centerline separation, initial longitudinal spacing of the aircraft, crosswind speed, and aircraft speed during the approach. The data from the simulation showed that the majority of the wake encounters occurred near or over the runway and the aft boundaries of the safe zones were identified for all simulation conditions.

McKissick, Burnell T.; Rico-Cusi, Fernando J.; Murdoch, Jennifer; Oseguera-Lohr, Rosa M.; Stough, Harry P., III; O'Connor, Cornelius J.; Syed, Hazari I.

2009-01-01

344

Parallel Brownian dynamics simulations with the message-passing and PGAS programming models

NASA Astrophysics Data System (ADS)

The simulation of particle dynamics is among the most important mechanisms to study the behavior of molecules in a medium under specific conditions of temperature and density. Several models can be used to compute efficiently the forces that act on each particle, and also the interactions between them. This work presents the design and implementation of a parallel simulation code for the Brownian motion of particles in a fluid. Two different parallelization approaches have been followed: (1) using traditional distributed memory message-passing programming with MPI, and (2) using the Partitioned Global Address Space (PGAS) programming model, oriented towards hybrid shared/distributed memory systems, with the Unified Parallel C (UPC) language. Different techniques for domain decomposition and work distribution are analyzed in terms of efficiency and programmability, in order to select the most suitable strategy. Performance results on a supercomputer using up to 2048 cores are also presented for both MPI and UPC codes.
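The particle update underlying such a Brownian dynamics code can be sketched as a force-free Euler-Maruyama step. This is a serial sketch under stated assumptions: the MPI/UPC domain decomposition itself is not shown, and the function and parameter names are ours, not the paper's code.

```python
import numpy as np

def brownian_step(x, dt, gamma=1.0, kT=1.0, rng=None):
    """One Euler-Maruyama step of overdamped, force-free Brownian motion:
    dx = sqrt(2*kT*dt/gamma) * N(0, 1), with diffusion coefficient D = kT/gamma.
    Interparticle forces are omitted for brevity."""
    rng = rng or np.random.default_rng(0)
    return x + np.sqrt(2.0 * kT * dt / gamma) * rng.standard_normal(x.shape)
```

After many steps the mean-square displacement should approach 2*D*t in one dimension, which gives a quick sanity check on the update.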

Teijeiro, C.; Sutmann, G.; Taboada, G. L.; Touriño, J.

2013-04-01

345

GalaxSee HPC Module 1: The N-Body Problem, Serial and Parallel Simulation

NSDL National Science Digital Library

This module introduces the N-body problem, which seeks to account for the dynamics of systems of multiple interacting objects. Galaxy dynamics serves as the motivating example to introduce a variety of computational methods for simulating change and criteria that can be used to check for model accuracy. Finally, the basic issues and ideas that must be considered when developing a parallel implementation of the simulation are introduced.

David Joiner

346

Xyce parallel electronic simulator reference guide, Version 6.0.1.

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1].

Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David Gregory [Raytheon, Albuquerque, NM]

2014-01-01

347

A parallel multi-block Navier-Stokes solver with the k-? turbulence model is developed to simulate the 3-dimensional unsteady flow through an annular turbine cascade. Results at mid-span are compared with the experimental results of Standard Test Case 4. Comparisons are made between 3-dimensional and 2-dimensional, and inviscid and viscous simulations. The inclusion of a viscous flow

Ivan McBean; Feng Liu; Kerry Hourigan; Mark Thompson

2002-01-01

348

The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques for simulating reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs in parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of the GSSA are prohibitively expensive for computing parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8× to 120× performance gain over various state-of-the-art serial algorithms when simulating different types of models. PMID:23152751
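The exact direct-method GSSA that the paper accelerates can be sketched for the simplest possible system, the decay reaction A → B with propensity k·n_A. This is a minimal serial sketch; the warp-level GPU parallelization itself is not shown, and the function name is ours.

```python
import math
import random

def gillespie_decay(n_a, k=1.0, t_end=math.inf, seed=0):
    """Direct-method Gillespie SSA for the single reaction A -> B.

    At each step: compute the total propensity a0, draw an exponential
    waiting time with rate a0, then fire the (only) reaction.
    """
    rng = random.Random(seed)
    t, n_b, times = 0.0, 0, [0.0]
    while n_a > 0:
        a0 = k * n_a                     # total propensity
        t += rng.expovariate(a0)         # exponentially distributed waiting time
        if t > t_end:
            break
        n_a, n_b = n_a - 1, n_b + 1      # fire the reaction
        times.append(t)
    return n_a, n_b, times
```

Because every firing converts one A into one B, the run terminates with all molecules converted, and the recorded event times are strictly increasing.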

Komarov, Ivan; D'Souza, Roshan M.

2012-01-01

349

Accelerated, parallelized time and frequency domain simulators for complex high-speed microsystems

The paper presents methodologies for efficient simulation and design of electromagnetic effects in microelectronic circuits for digital, analog, mixed-signal, RF, and microwave applications in the time and frequency domains. Parallelization of the presented methodologies and their incorporation into a standard design flow are also discussed.

V. Jandhyala; S. Chakraborty; D. Gope; Chuanyi Yang; I. Choudhury; G. Ouyang

2006-01-01

350

Scalar and Parallel Optimized Implementation of the Direct Simulation Monte Carlo Method

This paper describes a new concept for the implementation of the direct simulation Monte Carlo (DSMC) method. It uses a localized data structure based on a computational cell to achieve high performance, especially on workstation processors, which can also be used in parallel. Since the data structure makes it possible to freely assign any cell to any processor, a domain
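The cell-localized bookkeeping that makes this DSMC layout parallelizable can be sketched as binning particles into a uniform grid so that each cell owns its own index list and can be assigned to any processor. A minimal sketch under stated assumptions: names and the 2D uniform grid are illustrative, not the paper's data structure.

```python
import numpy as np

def particles_by_cell(xy, lo, hi, n_cells):
    """Bin particle positions into a uniform 2D grid of cells.

    Returns a per-cell list of particle indices, so each cell's collision
    and movement work can be handled independently (and hence distributed).
    """
    nx, ny = n_cells
    # Normalized coordinates -> integer cell indices, clipped to the grid
    ij = np.floor((xy - lo) / (hi - lo) * [nx, ny]).astype(int)
    ij = np.clip(ij, 0, [nx - 1, ny - 1])
    flat = ij[:, 0] * ny + ij[:, 1]          # flatten (i, j) to a single cell id
    cells = [[] for _ in range(nx * ny)]
    for p, c in enumerate(flat):
        cells[c].append(p)
    return cells
```

Because the structure is keyed by cell rather than by particle, any cell (with its particle list) can be moved between processors without touching the rest of the domain.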

Stefan Dietrich; Iain D. Boyd

1996-01-01

351

Direct Simulation Based Model-Predictive Control of Flow Maldistribution in Parallel Microchannels

sinks and heat exchangers that employ parallel microchannels for heat transfer, to improve heat transfer effectiveness and cooling efficiency. In this work, direct numerical simulations of fluid flow

Apte, Sourabh V.

352

We explore the emerging application area of physics-based simulation for computer animation and visual special effects. In particular, we examine its parallelization potential and characterize its behavior on a chip multiprocessor (CMP). Applications in this domain model and simulate natural phenomena, and often direct visual components of motion pictures. We study a set of

Christopher J. Hughes; Radek Grzeszczuk; Eftychios Sifakis; Daehyun Kim; Sanjeev Kumar; Andrew Selle; Jatin Chhugani; Matthew J. Holliman; Yen-kuang Chen

2007-01-01

353

Mobile Agents Based Collective Communication: An Application to a Parallel Plasma Simulation

Collective communication operations are lacking in Mobile Agents systems, even if message passing is always supported to grant communication and to benefit the social ability and interactions of collaborative agents. Mobile Agents technology has been widely

Vlad, Gregorio

354

Dynamic modeling and simulation of transmotor based series-parallel HEV applied to Toyota Prius 2004

Pollution and limited fossil fuels are the critical issues that have led HEVs to emerge. Nowadays, new designs and topologies of HEVs are being suggested and developed. In this paper we dynamically model and simulate a new structure of series-parallel HEV based on a special machine named the Transmotor and apply it to the Toyota Prius 2004. In this structure, the generator and

Hiva Nasiri; Ahmad Radan; Abbas Ghayebloo; Kiarash Ahi

2011-01-01

355

This paper presents two new techniques for accelerating circuit simulation. The first technique is an improvement of the parallel Waveform Relaxation Newton (WRN) method. The computations of all the timepoints are executed concurrently. Static task partitioning is shown to be an efficient method to limit the scheduling overhead. The second technique combines in a dynamic way the efficiency of the

Patrick Odent; Luc J. M. Claesen; Hugo De Man

1990-01-01

356

On parallel random number generation for accelerating simulations of communication systems

NASA Astrophysics Data System (ADS)

Powerful compute clusters and multi-core systems have become widely available in research and industry nowadays. This boost in utilizable computational power tempts people to run compute-intensive tasks on those clusters, either for speed or accuracy reasons. Especially Monte Carlo simulations with their inherent parallelism promise very high speedups. Nevertheless, the quality of Monte Carlo simulations strongly depends on the quality of the employed random numbers. In this work we present a comprehensive analysis of state-of-the-art pseudo random number generators like the MT19937 or the WELL generator used for parallel stream generation in different settings. These random number generators can be realized in hardware as well as in software and help to accelerate the analysis (or simulation) of communications systems. We show that it is possible to generate high-quality parallel random number streams with both generators, as long as some configuration constraints are met. We furthermore depict that distributed simulations with those generator types are viable even to very high degrees of parallelism.
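The parallel-stream setup analyzed above can be illustrated in software with NumPy's SeedSequence spawning, which derives statistically independent per-worker streams from one root seed. This is an analogue of the configurations the paper studies, not the MT19937/WELL hardware setups themselves; the function name is ours.

```python
import numpy as np

def spawn_streams(root_seed, n_streams):
    """Create independent per-worker random streams from a single root seed.

    SeedSequence.spawn guarantees the children are well-separated, so each
    simulation worker can draw from its own stream without overlap.
    """
    root = np.random.SeedSequence(root_seed)
    return [np.random.default_rng(child) for child in root.spawn(n_streams)]
```

The spawn is deterministic: rebuilding the streams from the same root seed reproduces the same draws, which is what makes distributed Monte Carlo runs repeatable.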

Brugger, C.; Weithoffer, S.; de Schryver, C.; Wasenmüller, U.; Wehn, N.

2014-11-01

357

Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube

NASA Technical Reports Server (NTRS)

The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors, nearly ideal linear speedups are achieved with nonoptimized routines; slower-than-linear speedups are achieved with optimized (machine-dependent library) routines. This slower-than-linear speedup results because the fast Fourier transform (FFT) routine dominates the computational cost and itself exhibits less-than-ideal speedups. However, with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise, wall-normal, and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single-processor time to complete a comparable simulation; however, it is estimated that a subgrid-scale model, which reduces the required number of grid points and turns the approach into a large-eddy simulation (PSLES), would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.

Joslin, Ronald D.; Zubair, Mohammad

1993-01-01

358

Robust large-scale parallel nonlinear solvers for simulations.

This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. 
The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write and easily portable. However, the method usually takes twice as long to solve as Newton-GMRES on general problems because it solves two linear systems at each iteration. In this paper, we discuss modifications to Bouaricha's method for a practical implementation, including a special globalization technique and other modifications for greater efficiency. We present numerical results showing computational advantages over Newton-GMRES on some realistic problems. We further discuss a new approach for dealing with singular (or ill-conditioned) matrices. In particular, we modify an algorithm for identifying a turning point so that an increasingly ill-conditioned Jacobian does not prevent convergence.
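Broyden's method as characterized above, replacing the Jacobian with a secant approximation updated by rank-one corrections, can be sketched in its dense textbook form. This is not Sandia's limited-memory implementation; the function name and tolerances are illustrative.

```python
import numpy as np

def broyden(f, x0, tol=1e-10, max_iter=100):
    """Broyden's 'good' method for solving f(x) = 0 without a Jacobian.

    Starts from an identity Jacobian approximation B and applies the
    secant (rank-one) update after each quasi-Newton step.
    """
    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))                      # Jacobian approximation
    fx = f(x)
    for _ in range(max_iter):
        if np.linalg.norm(fx) < tol:
            break
        s = np.linalg.solve(B, -fx)         # quasi-Newton step: B s = -f(x)
        x_new = x + s
        fx_new = f(x_new)
        y = fx_new - fx
        B += np.outer(y - B @ s, s) / (s @ s)   # secant update: B_new s = y
        x, fx = x_new, fx_new
    return x
```

On a small smooth system with a starting point near the root, the iteration converges superlinearly even though no Jacobian is ever evaluated, which is exactly the property the report exploits for codes with unavailable or inaccurate Jacobians.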

Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson (Sandia National Laboratories, Livermore, CA)

2005-11-01

359

Relationship between parallel faults and stress field in rock mass based on numerical simulation

NASA Astrophysics Data System (ADS)

Parallel cracks and faults, caused by earthquakes and crustal deformations, are often observed at various scales, from regional to laboratory. However, the mechanism of formation of these parallel faults has not been quantitatively clarified yet. Since the stress field plays a key role in the nucleation of parallel faults, it is fundamental to investigate the failure and the extension of cracks in a large-scale rock mass (not in a laboratory-scale specimen) due to a mechanically loaded stress field. In this study, we developed a numerical simulation code for rock mass failure under different loading conditions, and conducted rock failure experiments using this code. We assumed a numerical rock mass consisting of basalt with a rectangular shape for the model. We also assumed the failure of the rock mass in accordance with the Mohr-Coulomb criterion, and the distribution of the initial tensile and compressive strength of rock elements to follow the Weibull model. We use the Hamiltonian Particle Method (HPM), one of the particle methods, to represent large deformation and the destruction of materials. Our simulation results suggest that the confining pressure has a dominant influence on the initiation of parallel faults and their conjugates under compressive conditions. We conclude that the shearing force provokes the propagation of parallel fractures along the shearing direction, but prevents that of fractures in the conjugate direction.
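The Mohr-Coulomb criterion used for element failure above can be sketched as a simple principal-stress check. A minimal sketch under stated assumptions: compression is taken positive, the stress state is given by principal stresses, and the function and variable names are ours, not the paper's code.

```python
import math

def mohr_coulomb_fails(sigma1, sigma3, cohesion, friction_angle_deg):
    """Mohr-Coulomb failure check for principal stresses sigma1 >= sigma3.

    Failure occurs when the Mohr circle touches the envelope
    tau = c + sigma_n * tan(phi), i.e. when
    (sigma1 - sigma3)/2 >= c*cos(phi) + (sigma1 + sigma3)/2 * sin(phi).
    """
    phi = math.radians(friction_angle_deg)
    lhs = (sigma1 - sigma3) / 2.0                                   # max shear
    rhs = cohesion * math.cos(phi) + (sigma1 + sigma3) / 2.0 * math.sin(phi)
    return lhs >= rhs
```

For c = 10 and phi = 30° the unconfined compressive strength works out to 2c·cos(phi)/(1 − sin(phi)) ≈ 34.6, so an unconfined element survives sigma1 = 30 but fails at sigma1 = 35.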

Imai, Y.; Mikada, H.; Goto, T.; Takekawa, J.

2012-12-01

360

Parallel Simulation Algorithms for the Three Dimensional Strong-Strong Beam-Beam Interaction

The strong-strong beam-beam effect is one of the most important effects limiting the luminosity of ring colliders. Little is known about it analytically, so most studies utilize numeric simulations. The two-dimensional realm is readily accessible to workstation-class computers (cf., e.g., [1, 2]), while three dimensions, which add effects such as phase averaging and the hourglass effect, require vastly higher amounts of CPU time. Thus, parallelization of three-dimensional simulation techniques is imperative; in the following we discuss parallelization strategies and describe the algorithms used in our simulation code, which reaches almost linear scaling of performance vs. number of CPUs for typical setups.

Kabel, A.C.; /SLAC

2008-03-17

361

Petascale turbulence simulation using a highly parallel fast multipole method on GPUs

NASA Astrophysics Data System (ADS)

This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as numerical engine, and match the current record in mesh size for this application, a cube of 4096³ computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as the numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method is able to achieve just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. The calculation time for one time step was 108 s for the vortex method and 154 s for the spectral method, under these conditions. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex-method calculations to date.

Yokota, Rio; Barba, L. A.; Narumi, Tetsu; Yasuoka, Kenji

2013-03-01

362

A parallel 3D Poisson solver for space charge simulation in cylindrical coordinates

NASA Astrophysics Data System (ADS)

This paper presents the development of a parallel three-dimensional Poisson solver in a cylindrical coordinate system for the electrostatic potential of a charged particle beam in a circular tube. The Poisson solver uses Fourier expansions in the longitudinal and azimuthal directions, and spectral element discretization in the radial direction. A Dirichlet boundary condition is used on the cylinder wall, a natural boundary condition is used on the cylinder axis, and a Dirichlet or periodic boundary condition is used in the longitudinal direction. A parallel 2D domain decomposition was implemented in the (r, θ) plane. This solver was incorporated into the parallel code PTRACK for beam dynamics simulations. Detailed benchmark results for the parallel solver and a beam dynamics simulation in a high-intensity proton LINAC are presented. When the transverse beam size is small relative to the aperture of the accelerator line, the Poisson solvers in Cartesian and cylindrical coordinate systems produce similar results. When the transverse beam size is large or the beam center is located off-axis, the result from the Poisson solver in the Cartesian coordinate system is not accurate because a different boundary condition is used. With the new solver, a circular boundary condition can be applied easily and accurately in beam dynamics simulations of accelerator devices.
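The Fourier treatment of the periodic (longitudinal) direction can be illustrated in one dimension: in Fourier space the Poisson equation decouples into one algebraic equation per mode. A minimal sketch under stated assumptions: a single periodic coordinate, zero-mean source, and our own naming; the solver's radial spectral-element part is not shown.

```python
import numpy as np

def poisson_periodic_1d(rho, L):
    """Solve u'' = -rho on a periodic interval of length L by Fourier expansion.

    In Fourier space -k^2 * u_hat = -rho_hat, so u_hat = rho_hat / k^2.
    The k = 0 mode is set to zero (rho is assumed zero-mean).
    """
    n = len(rho)
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)   # angular wavenumbers
    rho_hat = np.fft.fft(rho)
    u_hat = np.zeros_like(rho_hat)
    nonzero = k != 0
    u_hat[nonzero] = rho_hat[nonzero] / k[nonzero] ** 2
    return np.fft.ifft(u_hat).real
```

For a single-mode source rho = sin(x) on [0, 2π) the exact solution is u = sin(x), which the FFT-based solve reproduces to machine precision.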

Xu, J.; Ostroumov, P. N.; Nolen, J.

2008-02-01

363

Large-scale numerical simulation of laser propulsion by parallel computing

NASA Astrophysics Data System (ADS)

As one of the most significant methods to study laser-propelled rockets, the numerical simulation of laser propulsion has drawn ever-increasing attention. Nevertheless, the traditional serial simulation model cannot satisfy practical needs because of its excessive memory overhead and considerable computation time. To solve this problem, we study a general algorithm for laser propulsion design and parallelize it using a two-level hybrid parallel programming model. The total computing domain is decomposed into distributed data spaces, and each partition is assigned to an MPI process. A single step of computation operates at the inner loop level, where a compiler directive is used to split each MPI process into several OpenMP threads. Finally, the parallel efficiency of the hybrid program for two typical configurations on a China-made supercomputer with 4 to 256 cores is compared with a pure MPI program. The hybrid program exhibits better performance than the pure MPI program on the whole, roughly as expected. The result indicates that our hybrid parallel approach is effective and practical in large-scale numerical simulation of laser propulsion.

Zeng, Yaoyuan; Zhao, Wentao; Wang, Zhenghua

2013-05-01

364

NASA Technical Reports Server (NTRS)

This paper will describe the Entry, Descent and Landing simulation tradeoffs and techniques that were used to provide the Monte Carlo data required to approve entry during a critical period just before entry of the Genesis Sample Return Capsule. The same techniques will be used again when Stardust returns on January 15, 2006. Only one hour was available for the simulation, which propagated 2000 dispersed entry states to the ground. Creative simulation tradeoffs combined with parallel processing were needed to provide the landing footprint statistics that were an essential part of the Go/NoGo decision that authorized release of the Sample Return Capsule a few hours before entry.

Lyons, Daniel T.; Desai, Prasun N.

2005-01-01

365

Parallel Monte Carlo simulations on an ARC-enabled computing grid

NASA Astrophysics Data System (ADS)

Grid computing opens new possibilities for running heavy Monte Carlo simulations of physical systems in parallel. The presentation gives an overview of GaMPI, a system for running an MPI-based random walker simulation on grid resources. Integrating the ARC middleware and the new storage system Chelonia with the Ganga grid job submission and control system, we show that MPI jobs can be run on a world-wide computing grid with good performance and promising scaling properties. Results for relatively communication-heavy Monte Carlo simulations run on multiple heterogeneous, ARC-enabled computing clusters in several countries are presented.

Nilsen, Jon K.; Samset, Bjørn H.

2011-12-01

366

Parallel 3D Multi-Stage Simulation of a Turbofan Engine

NASA Technical Reports Server (NTRS)

A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. 20 processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scaleable with the number of blade rows. Enough flips are run (between 50 and 200) so the solution in the entire machine is not changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee the parallelization was done correctly. The domain decomposition is done only in the axial direction since the number of points axially is much larger than the other two directions. This code uses MPI for message passing. 
The parallel speed-up of the solver portion (no I/O or body force calculation) is reported for a grid with 227 points axially.
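The 4-stage explicit Runge-Kutta time marching mentioned above can be sketched in its common CFD form, where each stage re-evaluates the residual from the previous stage's state. A minimal sketch under stated assumptions: the stage coefficients are the textbook choice (1/4, 1/3, 1/2, 1), not APNASA's own, and variable time steps and residual smoothing are omitted.

```python
def rk4_stage_step(f, u, dt):
    """One 4-stage explicit Runge-Kutta update: u_k = u_n + alpha_k*dt*f(u_{k-1}).

    For a linear residual f this reproduces the classical 4th-order
    amplification factor 1 + z + z^2/2 + z^3/6 + z^4/24.
    """
    alphas = (0.25, 1.0 / 3.0, 0.5, 1.0)
    v = u
    for a in alphas:
        v = u + a * dt * f(v)   # each stage restarts from u_n
    return v
```

Marching the linear test equation du/dt = -u over 100 steps of dt = 0.01 recovers exp(-1) to roughly the scheme's truncation error, a standard sanity check for this class of time-marching kernels.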

Turner, Mark G.; Topp, David A.

1998-01-01

367

libMesh: a C++ library for parallel adaptive mesh refinement/coarsening simulations

In this paper we describe the libMesh (http://libmesh.sourceforge.net) framework for parallel adaptive finite element applications. libMesh is an open-source software library that has been developed to facilitate serial and parallel simulation of multiscale, multiphysics applications using adaptive mesh refinement and coarsening strategies. The main software development is being carried out in the CFDLab (http://cfdlab.ae.utexas.edu) at the University of Texas, but

Benjamin S. Kirk; John W. Peterson; Roy H. Stogner; Graham F. Carey

2006-01-01

368

Design of a real-time wind turbine simulator using a custom parallel architecture

NASA Technical Reports Server (NTRS)

The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CUs) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CUs are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an I/O operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CUs are interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CUs can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors, which usually have a throughput limit because of rigid bus architecture.

Hoffman, John A.; Gluck, R.; Sridhar, S.

1995-01-01

369

Parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada.

This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented into the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-1-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the 1-million-cell models produce better-resolved results and reveal some flow patterns that cannot be obtained using coarse-grid models. PMID:12714301

Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G S

2003-01-01

370

We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276

Lee, Anthony; Yau, Christopher; Giles, Michael B.; Doucet, Arnaud; Holmes, Christopher C.

2011-01-01

371

Parallel Solutions for Voxel-Based Simulations of Reaction-Diffusion Systems

There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This awareness has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena that take into account both spatial effects and noise. However, the high computational effort characterizing simulation approaches, coupled with the necessity of simulating the models several times to achieve statistically relevant information on the model behaviours, makes such algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we present how the characteristics of the algorithm can be exploited for an effective parallelization on present-day heterogeneous HPC architectures. PMID:25045716
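The tau-leaping idea underlying such simulators can be sketched generically: over a leap interval τ, each reaction channel fires a Poisson-distributed number of times with mean a_j(x)·τ. The sketch below is non-spatial and is not the STAUCC/Sτ-DPP implementation; the two-reaction system and rate constants are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def tau_leap_step(x, stoich, rates, tau):
    """One tau-leaping step: fire a Poisson number of each reaction over tau."""
    props = np.array([rate(x) for rate in rates])   # propensities a_j(x)
    k = rng.poisson(props * tau)                    # firings per reaction channel
    return np.maximum(x + stoich.T @ k, 0)          # clamp to avoid negative counts

# Hypothetical two-species system: A -> B (c = 1.0) and B -> A (c = 0.5)
stoich = np.array([[-1, 1],    # A -> B
                   [1, -1]])   # B -> A
rates = [lambda x: 1.0 * x[0], lambda x: 0.5 * x[1]]
x = np.array([1000, 0])
for _ in range(2000):
    x = tau_leap_step(x, stoich, rates, tau=0.01)
print(x.sum())                  # total population is conserved
```

Each leap replaces many single-reaction events of an exact stochastic simulation with one Poisson draw per channel, which is what makes the method fast and, per voxel, naturally parallelizable.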

D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan

2014-01-01

372

A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows

NASA Technical Reports Server (NTRS)

A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.

Bui, Trong T.

1999-01-01

373

Parallel electric fields in a simulation of magnetotail reconnection and plasmoid evolution

We investigate properties of the electric field component parallel to the magnetic field (E∥) in a three-dimensional MHD simulation of plasmoid formation and evolution in the magnetotail in the presence of a net dawn-dusk magnetic field component. We emphasize particularly the spatial location of E∥, the concept of a diffusion zone and the role of E∥ in accelerating electrons. We find a localization of the region of enhanced E∥ in all space directions with a strong concentration in the z direction. We identify this region as the diffusion zone, which plays a crucial role in reconnection theory through the local break-down of magnetic flux conservation. The presence of B_y implies a north-south asymmetry of the injection of accelerated particles into the near-earth region, if the net B_y field is strong enough to force particles to follow field lines through the diffusion region. We estimate that for a typical net B_y field this should affect the injection of electrons into the near-earth dawn region, so that precipitation into the northern (southern) hemisphere should dominate for duskward (dawnward) net B_y. In addition, we observe a spatial clottiness of the expected injection of adiabatic particles which could be related to the appearance of bright spots in auroras. 12 refs., 9 figs.

Hesse, M.; Birn, J.

1989-01-01

374

Relevance of the parallel nonlinearity in gyrokinetic simulations of tokamak plasmas

The influence of the parallel nonlinearity on transport in gyrokinetic simulations is assessed for values of ρ* which are typical of current experiments. Here, ρ* = ρ_s/a is the ratio of gyroradius, ρ_s, to plasma minor radius, a. The conclusion, derived from simulations with both GYRO [J. Candy and R. E. Waltz, J. Comput. Phys. 186, 585 (2003)] and GEM [Y. Chen and S. E. Parker, J. Comput. Phys. 189, 463 (2003)], is that no measurable effect of the parallel nonlinearity is apparent for ρ* < 0.012. This result is consistent with scaling arguments, which suggest that the parallel nonlinearity should be O(ρ*) smaller than the E×B nonlinearity. Indeed, for the plasma parameters under consideration, the magnitude of the parallel nonlinearity is a factor of 8ρ* smaller (for 0.00075 < ρ* < 0.012) than the other retained terms in the nonlinear gyrokinetic equation.

Candy, J.; Waltz, R. E.; Parker, S. E.; Chen, Y. [General Atomics, San Diego, California 92121 (United States); Center for Integrated Plasma Studies, University of Colorado at Boulder, Boulder, Colorado 80309 (United States)

2006-07-15

375

Parallel-in-time implementation of transient stability simulations on a transputer network

The most time-consuming computer simulation in power system studies is the transient stability analysis. In recent years, parallel processing has been applied for time domain simulations of power system transient behavior. In this paper, a parallel implementation of an algorithm based on Shifted-Picard dynamic iterations is presented. The main idea is that a set of nonlinear Differential Algebraic Equations (DAEs), which describes the system, can be solved by the iterative solution of a linear set of DAEs. The time behavior of the linear set of differential equations can be obtained by the evaluation of the convolution integral. In the parallel-in-time implementation of the proposed algorithm, each processor is devoted to the evaluation of the complete set of variables relative to each time step. The quadrature formula, adopted for the integral evaluation, can be easily parallelized by using a number of processors equal to the number of time steps. The algorithm, implemented on a transputer network with 32 Inmos T800/20 adopting a uni-directional ring topology, has been tested on standard power systems.
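The dynamic-iteration idea can be sketched for a scalar ODE: each Picard sweep evaluates the right-hand side on the previous iterate, so every time step can be updated concurrently, and the quadrature over the grid is likewise parallelizable. This is a serial NumPy emulation under simplifying assumptions (a scalar test problem, trapezoidal quadrature), not the Shifted-Picard transputer implementation.

```python
import numpy as np

def picard_parallel_in_time(f, y0, t_grid, iters=50):
    """Picard iteration over the whole time grid at once.

    Each sweep evaluates f on the *previous* iterate at every time step,
    so all grid points could be updated concurrently (one per processor);
    here the parallelism is emulated with vectorized NumPy.
    """
    y = np.full_like(t_grid, y0)
    h = t_grid[1] - t_grid[0]
    for _ in range(iters):
        fy = f(y)                                # independent per time step
        increments = 0.5 * h * (fy[1:] + fy[:-1])
        integral = np.concatenate(([0.0], np.cumsum(increments)))
        y = y0 + integral                        # trapezoidal quadrature
    return y

# Test problem: y' = -y, y(0) = 1, exact solution exp(-t)
t = np.linspace(0.0, 1.0, 101)
y = picard_parallel_in_time(lambda y: -y, 1.0, t)
print(abs(y[-1] - np.exp(-1.0)) < 1e-4)
```

The fixed point of this iteration is the trapezoidal-rule solution of the ODE; the iteration converges geometrically on a finite interval, which is what makes the time-parallel decomposition viable.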

La Scala, M.; Sblendorio, G.; Sbrizzai, R. (Politecnico di Bari (Italy). Dept. di Elettrotecnica ed Elettronica)

1994-05-01

376

Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P

In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).

Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; /SLAC; Syratchev, I.; /CERN

2009-06-19

377

OSIRIS 2.0: an integrated framework for parallel PIC simulations

NASA Astrophysics Data System (ADS)

We describe the OSIRIS 2.0 framework, an integrated framework for particle-in-cell (PIC) simulations. This framework is based on a three-dimensional, fully relativistic, massively parallel, object-oriented particle-in-cell code that has successfully been applied to a number of problems, ranging from laser-plasma interaction and inertial fusion to plasma shell collisions in astrophysical scenarios. The OSIRIS 2.0 framework is the new version of the OSIRIS code. Developed in Fortran 95, the code runs on multiple platforms and can be easily ported to new ones. Details on the capabilities of the framework are given, focusing on the new capabilities introduced, such as Bessel beams, binary collisions, tunnel (ADK) and impact ionization, new diagnostics, dynamic load balancing, and parallel I/O. The framework also includes a visualization and data-analysis infrastructure, tightly integrated into the framework, developed to post-process the scalar and vector results from our simulations.

Fonseca, Ricardo; Tsung, Frank; Deng, Suzhi; Ren, Chuang

2005-10-01

378

The Support Architecture for Large-Scale Subsurface Analysis (SALSSA) provides an extensible framework, sophisticated graphical user interface, and underlying data management system that simplifies the process of running subsurface models, tracking provenance information, and analyzing the model results. Initially, SALSSA supported two styles of job control: user directed execution and monitoring of individual jobs, and load balancing of jobs across multiple machines taking advantage of many available workstations. Recent efforts in subsurface modelling have been directed at advancing simulators to take advantage of leadership class supercomputers. We describe two approaches, current progress, and plans toward enabling efficient application of the subsurface simulator codes via the SALSSA framework: automating sensitivity analysis problems through task parallelism, and task parallel parameter estimation using the PEST framework.

Schuchardt, Karen L.; Agarwal, Khushbu; Chase, Jared M.; Rockhold, Mark L.; Freedman, Vicky L.; Elsethagen, Todd O.; Scheibe, Timothy D.; Chin, George; Sivaramakrishnan, Chandrika

2010-07-15

379

Adaptive finite element simulation of flow and transport applications on parallel computers

NASA Astrophysics Data System (ADS)

The subject of this work is the adaptive finite element simulation of problems arising in flow and transport applications on parallel computers. Of particular interest are new contributions to adaptive mesh refinement (AMR) in this parallel high-performance context, including novel work on data structures, treatment of constraints in a parallel setting, generality and extensibility via object-oriented programming, and the design/implementation of a flexible software framework. This technology and software capability then enables more robust, reliable treatment of multiscale--multiphysics problems and specific studies of fine scale interaction such as those in biological chemotaxis (Chapter 4) and high-speed shock physics for compressible flows (Chapter 5). The work begins by presenting an overview of key concepts and data structures employed in AMR simulations. Of particular interest is how these concepts are applied in the physics-independent software framework which is developed here and is the basis for all the numerical simulations performed in this work. This open-source software framework has been adopted by a number of researchers in the U.S. and abroad for use in a wide range of applications. The dynamic nature of adaptive simulations poses particular issues for efficient implementation on distributed-memory parallel architectures. Communication cost, computational load balance, and memory requirements must all be considered when developing adaptive software for this class of machines. Specific extensions to the adaptive data structures to enable implementation on parallel computers are therefore considered in detail. The libMesh framework for performing adaptive finite element simulations on parallel computers is developed to provide a concrete implementation of the above ideas. 
This physics-independent framework is applied to two distinct flow and transport applications classes in the subsequent application studies to illustrate the flexibility of the design and to demonstrate the capability for resolving complex multiscale processes efficiently and reliably. The first application considered is the simulation of chemotactic biological systems such as colonies of Escherichia coli. This work appears to be the first application of AMR to chemotactic processes. These systems exhibit transient, highly localized features and are important in many biological processes, which make them ideal for simulation with adaptive techniques. A nonlinear reaction-diffusion model for such systems is described and a finite element formulation is developed. The solution methodology is described in detail. Several phenomenological studies are conducted to study chemotactic processes and resulting biological patterns which use the parallel adaptive refinement capability developed in this work. The other application study is much more extensive and deals with fine scale interactions for important hypersonic flows arising in aerospace applications. These flows are characterized by highly nonlinear, convection-dominated flowfields with very localized features such as shock waves and boundary layers. These localized features are well-suited to simulation with adaptive techniques. A novel treatment of the inviscid flux terms arising in a streamline-upwind Petrov-Galerkin finite element formulation of the compressible Navier-Stokes equations is also presented and is found to be superior to the traditional approach. The parallel adaptive finite element formulation is then applied to several complex flow studies, culminating in fully three-dimensional viscous flows about complex geometries such as the Space Shuttle Orbiter. 
Physical phenomena such as viscous/inviscid interaction, shock wave/boundary layer interaction, shock/shock interaction, and unsteady acoustic-driven flowfield response are considered in detail. A computational investigation of a 25°/55° double cone configuration details the complex multiscale flow features and investigates a potential source of experimentally-observed unsteady flowfield response.

Kirk, Benjamin Shelton

380

NASA Astrophysics Data System (ADS)

The Stochastic Parallel Gradient Descent (SPGD) algorithm can optimize system performance directly, without a wavefront sensor, which simplifies the adaptive optics system. Based on the SPGD algorithm, a model with a 32-element deformable mirror was simulated. The capability to correct static aberrations and the convergence of the SPGD algorithm are analysed, the relationship between the gain coefficient and the stochastic perturbation amplitude is discussed, and an adaptive adjustment of the gain coefficient is proposed that improves the convergence rate effectively.
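A minimal two-sided SPGD loop can be sketched on a toy quadratic metric standing in for the real optical performance metric; the 32-element vector mirrors the deformable-mirror dimension, while the gain and perturbation amplitude are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def metric(u, target):
    # Toy performance metric: negative squared error to a target actuator vector
    return -np.sum((u - target) ** 2)

def spgd(u, target, gain=1.0, amplitude=0.1, iters=1000):
    """Two-sided SPGD: apply +/- a random perturbation, measure the metric
    twice, and step along the perturbation weighted by the difference."""
    for _ in range(iters):
        delta = amplitude * rng.choice([-1.0, 1.0], size=u.shape)
        j_plus = metric(u + delta, target)
        j_minus = metric(u - delta, target)
        u = u + gain * (j_plus - j_minus) * delta
    return u

target = rng.standard_normal(32)   # 32 actuators, mirroring the 32-element mirror
u = spgd(np.zeros(32), target)
print(np.max(np.abs(u - target)) < 0.05)
```

Only two metric evaluations per iteration are needed, regardless of the number of actuators, which is why SPGD suits sensorless adaptive optics; the gain-amplitude trade-off controls the convergence rate discussed in the abstract.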

Wang, Gang

2014-11-01

381

A highly scalable simulation code for turbulent flows which solves the fully compressible Navier-Stokes equations is presented. The code, which supports one-, two- and three-dimensional domain decompositions, is shown to scale well on up to 262,144 cores. Introducing multiple levels of parallelism based on distributed message passing and shared-memory paradigms results in a reduction of up to 33% of

Shriram Jagannathan; Diego A. Donzis

2012-01-01

382

Simulation of three-dimensional laminar flow and heat transfer in an array of parallel microchannels

SIMULATION OF THREE-DIMENSIONAL LAMINAR FLOW AND HEAT TRANSFER IN AN ARRAY OF PARALLEL MICROCHANNELS. A Thesis by JUSTIN DALE MLCAK, submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE. Approved by: Chair of Committee, N.K. Anand; Committee Members, J.C. Han...

Mlcak, Justin Dale

2009-05-15

383

NASA Technical Reports Server (NTRS)

An AFRL/NRL team has recently been selected to develop a scalable, parallel, reacting, multidimensional (SUPREM) Direct Simulation Monte Carlo (DSMC) code for the DoD user community under the High Performance Computing Modernization Office (HPCMO) Common High Performance Computing Software Support Initiative (CHSSI). This paper will introduce the JANNAF Exhaust Plume community to this three-year development effort and present the overall goals, schedule, and current status of this new code.

Campbell, David; Wysong, Ingrid; Kaplan, Carolyn; Mott, David; Wadsworth, Dean; VanGilder, Douglas

2000-01-01

384

Construction of a parallel processor for simulating manipulators and other mechanical systems

NASA Technical Reports Server (NTRS)

This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.

Hannauer, George

1991-01-01

385

PPM A highly efficient parallel particle mesh library for the simulation of continuum systems

NASA Astrophysics Data System (ADS)

This paper presents a highly efficient parallel particle-mesh (PPM) library, based on a unifying particle formulation for the simulation of continuous systems. In this formulation, the grid-free character of particle methods is relaxed by the introduction of a mesh for the reinitialization of the particles, the computation of the field equations, and the discretization of differential operators. The present utilization of the mesh does not detract from the adaptivity, the efficient handling of complex geometries, the minimal dissipation, and the good stability properties of particle methods. The coexistence of meshes and particles allows for the development of a consistent and adaptive numerical method, but it presents a set of challenging parallelization issues that have in the past hindered the broader use of particle methods. The present library solves the key parallelization issues involving particle-mesh interpolations and the balancing of processor particle loading, using a novel adaptive tree for mixed domain decompositions along with a coloring scheme for the particle-mesh interpolation. The high parallel efficiency of the library is demonstrated in a series of benchmark tests on distributed memory and on a shared-memory vector architecture. The modularity of the method is shown by a range of simulations, from compressible vortex rings using a novel formulation of smooth particle hydrodynamics, to simulations of diffusion in real biological cell organelles. The present library enables large scale simulations of diverse physical problems using adaptive particle methods and provides a computational tool that is a viable alternative to mesh-based methods.
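The particle-mesh interpolation at the core of such libraries can be sketched in one dimension with cloud-in-cell (linear) weights; this is a generic illustration of the interpolation step, not the PPM library's API, and the particle positions and charges below are invented.

```python
import numpy as np

def particles_to_mesh(x, q, n_cells, length):
    """Cloud-in-cell (linear) assignment of particle quantities to a periodic mesh."""
    h = length / n_cells
    s = x / h                          # position in cell units
    i0 = np.floor(s).astype(int)       # left-hand mesh node
    w1 = s - i0                        # linear weight to the right-hand node
    mesh = np.zeros(n_cells)
    # np.add.at performs unbuffered scatter-add, so particles sharing
    # a node accumulate correctly.
    np.add.at(mesh, i0 % n_cells, q * (1.0 - w1))
    np.add.at(mesh, (i0 + 1) % n_cells, q * w1)
    return mesh

x = np.array([0.25, 1.5, 3.9])         # particle positions
q = np.array([1.0, 2.0, 1.0])          # particle strengths
mesh = particles_to_mesh(x, q, n_cells=4, length=4.0)
print(mesh.sum())                      # linear weights conserve the total: 4.0
```

In a parallel setting, the scatter-add is exactly where write conflicts arise between particles owned by different processors, which is what the library's coloring scheme resolves.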

Sbalzarini, I. F.; Walther, J. H.; Bergdorf, M.; Hieber, S. E.; Kotsalis, E. M.; Koumoutsakos, P.

2006-07-01

386

NASA Astrophysics Data System (ADS)

Parallel molecular dynamics (MD) simulations are performed to investigate pressure-induced solid-to-solid structural phase transformations in cadmium selenide (CdSe) nanorods. The effects of the size and shape of nanorods on different aspects of structural phase transformations are studied. Simulations are based on interatomic potentials validated extensively by experiments. Simulations range from 10^5 to 10^6 atoms. These simulations are enabled by highly scalable algorithms executed on massively parallel Beowulf computing architectures. Pressure-induced structural transformations are studied using a hydrostatic pressure medium simulated by atoms interacting via Lennard-Jones potential. Four single-crystal CdSe nanorods, each 44 Å in diameter but varying in length, in the range between 44 Å and 600 Å, are studied independently in two sets of simulations. The first simulation is the downstroke simulation, where each rod is embedded in the pressure medium and subjected to increasing pressure during which it undergoes a forward transformation from a 4-fold coordinated wurtzite (WZ) crystal structure to a 6-fold coordinated rocksalt (RS) crystal structure. In the second so-called upstroke simulation, the pressure on the rods is decreased and a reverse transformation from 6-fold RS to a 4-fold coordinated phase is observed. The transformation pressure in the forward transformation depends on the nanorod size, with longer rods transforming at lower pressures close to the bulk transformation pressure. Spatially-resolved structural analyses, including pair-distributions, atomic-coordinations and bond-angle distributions, indicate nucleation begins at the surface of nanorods and spreads inward. The transformation results in a single RS domain, in agreement with experiments. The microscopic mechanism for transformation is observed to be the same as for bulk CdSe. 
A nanorod size dependency is also found in reverse structural transformations, with longer nanorods transforming more readily than smaller ones. Nucleation initiates at the center of the rod and grows outward.

Lee, Nicholas Jabari Ouma

387

Efficient parallelization of short-range molecular dynamics simulations on many-core systems

NASA Astrophysics Data System (ADS)

This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly inhomogeneous systems like nanodevices or nanostructured materials. In the proposed scheme the calculation of the forces and the generation of neighbor lists are divided into small tasks. The tasks are then executed by a thread pool according to a dependent task schedule. This schedule is constructed in such a way that a particle is never accessed by two threads at the same time. Benchmark simulations on a typical 12-core machine show that the described algorithm achieves excellent parallel efficiencies above 80% for different kinds of systems and all numbers of cores. For inhomogeneous systems the speedups are strongly superior to those obtained with spatial decomposition. Further benchmarks were performed on an Intel Xeon Phi coprocessor. These simulations demonstrate that the algorithm scales well to large numbers of cores.

Meyer, R.

2013-11-01

388

NASA Technical Reports Server (NTRS)

This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers" and "Develop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications." The discussion of Scalable High Performance Computing reports on three objectives: validate, assess scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulation (DNS) and Large Eddy Simulation (LES) problems; and investigate and develop a high-order Reynolds-averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; application of lessons learned from the high-order/spectral solution of swirling 3D jets to the electromagnetics project; transition of a high-order fluids code, FDL3DI, to be able to solve Maxwell's equations using compact differencing; development and demonstration of improved radiation-absorbing boundary conditions for high-order CEM; and extension of the high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.

Morgan, Philip E.

2004-01-01

389

The objective of this article is to report the parallel implementation of the 3D molecular dynamic simulation code for laser-cluster interactions. The benchmarking of the code has been done by comparing the simulation results with some of the experiments reported in the literature. Scaling laws for the computational time is established by varying the number of processor cores and number of macroparticles used. The capabilities of the code are highlighted by implementing various diagnostic tools. To study the dynamics of the laser-cluster interactions, the executable version of the code is available from the author.

Holkundkar, Amol R. [Department of Physics, Birla Institute of Technology and Science, Pilani-333 031 (India)] [Department of Physics, Birla Institute of Technology and Science, Pilani-333 031 (India)

2013-11-15

390

Massively Parallel Spectral Element Large Eddy Simulation of a Turbulent Channel Using Wall Models

MASSIVELY-PARALLEL SPECTRAL ELEMENT LARGE EDDY SIMULATION OF A TURBULENT CHANNEL USING WALL MODELS. A Thesis by JOSHUA IAN RABAU, submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements... Nomenclature: NS, Navier-Stokes; LES, Large Eddy Simulation; FEM, Finite Element Method; SEM, Spectral Element Method; SGS, Sub-Grid Scale; TLM, Two Layer Method; Re, Reynolds number; Re_τ, friction Reynolds number; U1, characteristic velocity; GLL, Gauss-Lobatto-Legendre; Cs, Smagorinsky coefficient...

Rabau, Joshua I

2013-05-01

391

NASA Astrophysics Data System (ADS)

I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring modification of existing code. This is an advantage for the development and testing of, e.g., geoscientific software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. An implementation of the generic simulation cell method presented here, generic simulation cell class (gensimcell), also includes support for parallel programming by allowing model developers to select which simulation variables of, e.g., a domain-decomposed model to transfer between processes via a Message Passing Interface (MPI) library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class requires a C++ compiler that supports a version of the language standardized in 2011 (C++11). The code is available at https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those who do are kindly requested to acknowledge and cite this work.

Honkonen, I.

2015-03-01

392

Testing parallel simulators for two-dimensional lattice-gas automata

This paper describes a test method for lattice-gas automata. The test method consists of inserting test patterns into the initial state of the automaton and using a graphics display to detect errors. The test patterns are carefully constructed limit cycles that are disrupted by errors occurring at any level of the simulator system. The patterns can be run independently to test the system for debugging purposes, or they can be run as sub-simulations embedded in a larger lattice-gas simulation to detect faults at runtime. The authors describe the use of this method on a prototype parallel machine for lattice-gas simulations, and discuss the range of systems that can make use of this type of test method. The test patterns detect all significant one-bit errors. The authors include experimental results indicating that multiple bit errors are unlikely to escape detection.
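The embedded-limit-cycle test idea can be illustrated with any deterministic cellular automaton. The sketch below uses Conway's Game of Life rather than a lattice-gas automaton: a known period-2 pattern (a blinker) is inserted, and after an even number of steps the state must match the initial pattern exactly; an error at any level of the simulator disrupts the cycle.

```python
import numpy as np

def life_step(grid):
    """One step of Conway's Game of Life on a periodic grid."""
    nbrs = sum(np.roll(np.roll(grid, di, 0), dj, 1)
               for di in (-1, 0, 1) for dj in (-1, 0, 1)
               if (di, dj) != (0, 0))
    return ((nbrs == 3) | ((grid == 1) & (nbrs == 2))).astype(int)

# Embed a known period-2 limit cycle (a "blinker") as the test pattern.
grid = np.zeros((16, 16), dtype=int)
grid[8, 7:10] = 1
pattern = grid.copy()

# Run an even number of steps; any simulator fault breaks the cycle.
g = grid
for _ in range(10):
    g = life_step(g)
print(np.array_equal(g, pattern))
```

The same check can run embedded in a corner of a larger simulation, giving a cheap runtime fault detector, which is the use the paper describes for lattice gases.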

Squier, R.; Steiglitz, K.

1990-01-01

393

Massively parallel Monte Carlo for many-particle simulations on GPUs

NASA Astrophysics Data System (ADS)

Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. We present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU.[1] We reproduce results of serial high-precision Monte Carlo runs to verify the method.[2] This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a GeForce GTX 680, our GPU implementation executes 95 times faster than on a single Intel Xeon E5540 CPU core, enabling 17 times better performance per dollar and cutting energy usage by a factor of 10. [1] J.A. Anderson, E. Jankowski, T. Grubb, M. Engel and S.C. Glotzer, arXiv:1211.1646. [2] J.A. Anderson, M. Engel, S.C. Glotzer, M. Isobe, E.P. Bernard and W. Krauth, arXiv:1211.1645.

Glotzer, Sharon; Anderson, Joshua; Jankowski, Eric; Grubb, Thomas; Engel, Michael

2013-03-01

394

Massively parallel Monte Carlo for many-particle simulations on GPUs

Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.
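The parallel update scheme that preserves detailed balance can be illustrated with a checkerboard decomposition, shown here for a 2D Ising lattice rather than hard disks: sites of one sublattice color do not interact with one another, so all of them can be trial-moved simultaneously. The lattice size, temperature, and sweep count are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def checkerboard_sweep(spins, beta):
    """One Metropolis sweep split into two half-sweeps over the sublattices.

    Same-color sites share no bonds, so updating them all at once
    cannot violate detailed balance -- the core idea behind massively
    parallel Monte Carlo on GPUs.
    """
    ii, jj = np.indices(spins.shape)
    for color in (0, 1):
        mask = (ii + jj) % 2 == color
        nbr = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0)
               + np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
        dE = 2.0 * spins * nbr                       # energy change of a flip
        accept = rng.random(spins.shape) < np.exp(-beta * dE)
        spins = np.where(mask & accept, -spins, spins)
    return spins

spins = np.ones((64, 64), dtype=int)
for _ in range(200):
    spins = checkerboard_sweep(spins, beta=1.0)      # well below T_c
m = abs(spins.mean())
print(m > 0.9)                                       # stays magnetized
```

For hard disks the "color" is a cell of a spatial grid chosen larger than the interaction range, but the correctness argument is the same: simultaneously updated regions must not interact.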

Anderson, Joshua A.; Jankowski, Eric [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States)] [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States); Grubb, Thomas L. [Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI 48109 (United States)] [Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI 48109 (United States); Engel, Michael [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States)] [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States); Glotzer, Sharon C., E-mail: sglotzer@umich.edu [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States); Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI 48109 (United States)

2013-12-01

395

In this paper, we present results of using high-performance parallel computers to simulate beam dynamics in an early design of the Spallation Neutron Source (SNS) linac. These are among the most detailed linac simulations ever performed. The simulations have been performed using up to 500 million macroparticles, which is close to the number of particles in the physical system. The

J. Qiang; Robert D. Ryne; Barbara Blind; James H. Billen; Tarlochan Bhatia; Robert W. Garnett; George Neuschaefer; Harunori Takeda

2001-01-01

396

A novel CAD approach to machine a 3D free form surface by using a milling machine and a 3 -UPU parallel manipulator is proposed. Based on the CAD variation geometry technique, first, a simulation mechanism of the 3- UPU parallel manipulator is created. Next, a 3D free form surface and a guiding plane of tool path are constituted, and are

Tatu Leinonen

2007-01-01

397

Switching to High Gear: Opportunities for Grand-scale Real-time Parallel Simulations

The recent emergence of dramatically large computational power, spanning desktops with multi-core processors and multiple graphics cards to supercomputers with 10^5 processor cores, has suddenly resulted in simulation-based solutions trailing behind in the ability to fully tap the new computational capacity. Here, we motivate the need for switching the parallel simulation research to a higher gear to exploit the new, immense levels of computational power. The potential for grand-scale real-time solutions is illustrated using preliminary results from prototypes in four example application areas: (a) state- or regional-scale vehicular mobility modeling, (b) very large-scale epidemic modeling, (c) modeling the propagation of wireless network signals in very large, cluttered terrains, and, (d) country- or world-scale social behavioral modeling. We believe the stage is perfectly poised for the parallel/distributed simulation community to envision and formulate similar grand-scale, real-time simulation-based solutions in many application areas.

Perumalla, Kalyan S [ORNL]

2009-01-01

398

Parallel, adaptive, multi-object trajectory integrator for space simulation applications

NASA Astrophysics Data System (ADS)

Computer simulation is a very helpful approach for improving results from space-borne experiments. Initial-value problems (IVPs) can be applied for modeling the dynamics of different objects: artificial Earth satellites, charged particles in magnetic and electric fields, charged or non-charged dust particles, and space debris. An integrator for systems of ordinary differential equations (ODEs), based on embedded Runge-Kutta-Fehlberg methods of different orders, is developed. These methods enable evaluation of the local error. Instead of step-size control based on local error evaluation, an optimal integration method is selected; integration then proceeds with constant-sized steps while meeting the required local error. This optimal scheme selection reduces the amount of calculation needed for solving the IVPs. In addition, for an implementation on a multi-core processor with thread-based parallelization, we describe how to solve multiple systems of IVPs efficiently in parallel. The proposed integrator allows the application of a different force model for every object in multi-satellite simulation models. Simultaneous application of the integrator to different kinds of problems within one combined simulation model is also possible. The basic application of the integrator is solving mechanical IVPs in the context of simulation models and their application in complex multi-satellite space missions, and as a design tool for experiments.

Atanassov, Atanas Marinov

2014-10-01

399

NASA Astrophysics Data System (ADS)

We use molecular dynamics simulations to study the structure, dynamics, and transport properties of nano-confined water between parallel graphite plates with separation distances (H) from 7 to 20 Å at different water densities with an emphasis on anisotropies generated by confinement. The behavior of the confined water phase is compared to non-confined bulk water under similar pressure and temperature conditions. Our simulations show anisotropic structure and dynamics of the confined water phase in directions parallel and perpendicular to the graphite plate. The magnitude of these anisotropies depends on the slit width H. Confined water shows "solid-like" structure and slow dynamics for the water layers near the plates. The mean square displacements (MSDs) and velocity autocorrelation functions (VACFs) for directions parallel and perpendicular to the graphite plates are calculated. By increasing the confinement distance from H = 7 Å to H = 20 Å, the MSD increases and the behavior of the VACF indicates that the confined water changes from solid-like to liquid-like dynamics. If the initial density of the water phase is set up using geometric criteria (i.e., distance between the graphite plates), large pressures (in the order of ˜10 katm), and large pressure anisotropies are established within the water. By decreasing the density of the water between the confined plates to about 0.9 g cm-3, bubble formation and restructuring of the water layers are observed.
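The mean square displacements and velocity autocorrelation functions quoted above are simple time averages over a trajectory. A 1D sketch (our own helper functions, not the authors' analysis code):

```python
def mean_square_displacement(traj, lag):
    """Time-averaged MSD at a given lag for a 1D list of positions."""
    n = len(traj) - lag
    return sum((traj[i + lag] - traj[i]) ** 2 for i in range(n)) / n

def velocity_autocorrelation(vels, lag):
    """Time-averaged VACF at a given lag for a 1D list of velocities."""
    n = len(vels) - lag
    return sum(vels[i] * vels[i + lag] for i in range(n)) / n
```

For confined systems, the same averages are taken separately over the in-plane and out-of-plane components to expose the anisotropy the abstract describes.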

Mosaddeghi, Hamid; Alavi, Saman; Kowsari, M. H.; Najafi, Bijan

2012-11-01

400

Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors

An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as the heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering over 100-fold improvements in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on a system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular Java simulator. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.

Aaby, Brandon G [ORNL]; Perumalla, Kalyan S [ORNL]; Seal, Sudip K [ORNL]

2010-01-01

401

Parallelization of Particle-Particle, Particle-Mesh Method within N-Body Simulation

NSDL National Science Digital Library

The N-Body problem has become an integral part of the computational sciences, and many methods have arisen to solve and approximate it. Direct solution potentially requires on the order of N^2 calculations each time step, so efficient performance of these N-Body algorithms is very significant [5]. This work describes the parallelization and optimization of the Particle-Particle, Particle-Mesh (P3M) algorithm within GalaxSeeHPC, an open-source N-Body simulation code. After profiling, MPI (Message Passing Interface) routines were implemented into the population of the density grid in the P3M method in GalaxSeeHPC. Each problem size produced different results, and for a problem set dealing with 10,000 celestial bodies, speedups up to 10x were achieved. However, in accordance with Amdahl's Law, maximum speedups for the code should have been closer to 16x. In order to achieve maximum optimization, additional research is needed, and parallelization of the Fourier transform routines could prove rewarding. In conclusion, the GalaxSeeHPC simulation was successfully parallelized and obtained very respectable results, while further optimization remains possible.
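The mesh stage of P3M begins by depositing particle mass onto a density grid, typically with cloud-in-cell weights; this grid-population loop is what the work above parallelized with MPI. A serial 1D sketch under assumed cloud-in-cell weighting (GalaxSeeHPC's actual assignment scheme may differ):

```python
def cic_deposit(positions, masses, ngrid, box):
    """Cloud-in-cell deposit of particle mass onto a periodic 1D grid,
    the first stage of a P3M mesh force calculation. Each particle's
    mass is split linearly between its two nearest grid cells."""
    rho = [0.0] * ngrid
    dx = box / ngrid
    for x, m in zip(positions, masses):
        s = (x % box) / dx            # position in grid units
        i = int(s)                    # index of the cell to the left
        f = s - i                     # fractional offset within the cell
        rho[i % ngrid] += m * (1.0 - f)
        rho[(i + 1) % ngrid] += m * f
    return rho
```

In a distributed version, each rank deposits its local particles and the partial grids are summed (e.g., with an MPI reduction), which is where communication costs enter.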

Nicholas Nocito

402

Parallelization of a dynamic Monte Carlo algorithm: A partially rejection-free conservative approach

The authors experiment with a massively parallel implementation of an algorithm for simulating the dynamics of metastable decay in kinetic Ising models. The parallel scheme is directly applicable to a wide range of stochastic cellular automata where the discrete events (updates) are Poisson arrivals. For high performance, they utilize a continuous-time, asynchronous parallel version of the n-fold way rejection-free algorithm. Each processing element carries an l × l block of spins, and they employ fast one-sided communication routines on a distributed-memory parallel architecture. Different processing elements have different local simulated times. To ensure causality, the algorithm handles the asynchrony in a conservative fashion. Despite relatively low utilization and an intricate relationship between the average time increment and the size of the spin blocks, they find that the algorithm is scalable and for sufficiently large l it outperforms its corresponding parallel Metropolis (non-rejection-free) counterpart. As a sample application, they present results for metastable decay in a model ferromagnetic or ferroelectric film, observed with a probe of area smaller than the total system.
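The n-fold way rejection-free algorithm picks an event with probability proportional to its rate and advances the clock by an exponentially distributed increment, so no trial is ever wasted. A minimal sketch of one such selection (our own serial simplification; the paper's version is continuous-time, asynchronous, and parallel):

```python
import math

def nfold_way_step(rates, u_event, u_time):
    """One rejection-free (n-fold way / BKL) event selection.
    rates: per-event rates; u_event, u_time: uniforms in (0, 1).
    Returns (chosen event index, time increment).
    The event is drawn from the cumulative rate table and the clock
    advances by an exponential waiting time with mean 1/total_rate."""
    total = sum(rates)
    target = u_event * total
    acc = 0.0
    for i, r in enumerate(rates):
        acc += r
        if target < acc:
            return i, -math.log(u_time) / total
    return len(rates) - 1, -math.log(u_time) / total
```

Because every step fires an event, the cost per unit simulated time stays low even when most single-spin flip rates are tiny, which is the regime of metastable decay.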

Korniss, G.; Novotny, M.A.; Rikvold, P.A. [Florida State Univ., Tallahassee, FL (United States)]

1999-08-10

403

Estimation of signal activity in digital circuits based on multiple abstraction levels and massive parallel simulation techniques Werner W. Bachmann, and Sorin A. Huss Department of Computer Science, Integrated Circuits and Systems Laboratory Darmstadt University of Technology, 64283 Darmstadt, Germany

Huss, Sorin A.

404

GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit

Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358

Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik

2013-01-01

405

De Novo Ultrascale Atomistic Simulations On High-End Parallel Supercomputers

We present a de novo hierarchical simulation framework for first-principles based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters. In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) simulations explore a wide solution space to discover microscopic mechanisms that govern macroscopic material properties, into which highly accurate quantum mechanical (QM) simulations are embedded to validate the discovered mechanisms and quantify the uncertainty of the solution. The framework includes an embedded divide-and-conquer (EDC) algorithmic framework for the design of linear-scaling simulation algorithms with minimal bandwidth complexity and tight error control. The EDC framework also enables adaptive hierarchical simulation with automated model transitioning assisted by graph-based event tracking. A tunable hierarchical cellular decomposition parallelization framework then maps the O(N) EDC algorithms onto Petaflops computers, while achieving performance tunability through a hierarchy of parameterized cell data/computation structures, as well as its implementation using hybrid Grid remote procedure call + message passing + threads programming. High-end computing platforms such as IBM BlueGene/L, SGI Altix 3000 and the NSF TeraGrid provide an excellent test grounds for the framework. On these platforms, we have achieved unprecedented scales of quantum-mechanically accurate and well validated, chemically reactive atomistic simulations--1.06 billion-atom fast reactive force-field MD and 11.8 million-atom (1.04 trillion grid points) quantum-mechanical MD in the framework of the EDC density functional theory on adaptive multigrids--in addition to 134 billion-atom non-reactive space-time multiresolution MD, with the parallel efficiency as high as 0.998 on 65,536 dual-processor BlueGene/L nodes. 
We have also achieved an automated execution of hierarchical QM/MD simulation on a Grid consisting of 6 supercomputer centers in the US and Japan (in total of 150 thousand processor-hours), in which the number of processors change dynamically on demand and resources are allocated and migrated dynamically in response to faults. Furthermore, performance portability has been demonstrated on a wide range of platforms such as BlueGene/L, Altix 3000, and AMD Opteron-based Linux clusters.

Nakano, A; Kalia, R K; Nomura, K; Sharma, A; Vashishta, P; Shimojo, F; van Duin, A; Goddard, III, W A; Biswas, R; Srivastava, D; Yang, L H

2006-09-04

406

CLUSTEREASY: A Program for Simulating Scalar Field Evolution on Parallel Computers

We describe a new, parallel programming version of the scalar field simulation program LATTICEEASY. The new C++ program, CLUSTEREASY, can simulate arbitrary scalar field models on distributed-memory clusters. The speed and memory requirements scale well with the number of processors. As with the serial version of LATTICEEASY, CLUSTEREASY can run simulations in one, two, or three dimensions, with or without expansion of the universe, with customizable parameters and output. The program and its full documentation are available on the LATTICEEASY website at http://www.science.smith.edu/departments/Physics/fstaff/gfelder/latticeeasy/. In this paper we provide a brief overview of what CLUSTEREASY does and the ways in which it does and doesn't differ from the serial version of LATTICEEASY.

Gary N Felder

2007-12-05

407

Xyce parallel electronic simulator design : mathematical formulation, version 2.0.

This document is intended to contain a detailed description of the mathematical formulation of Xyce, a massively parallel SPICE-style circuit simulator developed at Sandia National Laboratories. The target audience of this document is people in the role of 'service provider'. An example of such a person would be a linear solver expert who is spending a small fraction of his time developing solver algorithms for Xyce. Such a person probably is not an expert in circuit simulation, and would benefit from a description of the equations solved by Xyce. In this document, modified nodal analysis (MNA) is described in detail, with a number of examples. Issues that are unique to circuit simulation, such as voltage limiting, are also described in detail.

Hoekstra, Robert John; Waters, Lon J.; Hutchinson, Scott Alan; Keiter, Eric Richard; Russo, Thomas V.

2004-06-01

408

Sparse parallel factorization is among the most complicated and irregular algorithms to analyze and optimize. Performance depends both on system characteristics such as the floating point rate, the memory hierarchy, and the interconnect performance, as well as on input matrix characteristics such as the number and location of nonzeros. We present LUsim, a simulation framework for modeling the performance of sparse LU factorization. Our framework uses micro-benchmarks to calibrate the parameters of machine characteristics and additional tools to facilitate real-time performance modeling. We are using LUsim to analyze an existing parallel sparse LU factorization code, and to explore a latency-tolerant variant. We developed and validated a model of the factorization in SuperLU_DIST, then modeled and implemented a new variant, replacing a blocking collective communication phase with a non-blocking asynchronous point-to-point one. Our strategy realized a mean improvement of 11 percent over a suite of test matrices.

Univ. of California, San Diego; Li, Xiaoye Sherry; Cicotti, Pietro; Baden, Scott B.

2008-04-15

409

NASA Astrophysics Data System (ADS)

This technical paper presents an efficient and performance-oriented method to model reactive mass transport processes in environmental and geotechnical subsurface systems. The open source scientific software packages OpenGeoSys and IPhreeqc have been coupled to combine their individual strengths and features to simulate thermo-hydro-mechanical-chemical coupled processes in porous and fractured media with simultaneous consideration of aqueous geochemical reactions. Furthermore, a flexible parallelization scheme using MPI (Message Passing Interface) grouping techniques has been implemented, which allows an optimized allocation of computer resources for the node-wise calculation of chemical reactions on the one hand, and for underlying processes such as groundwater flow or solute transport on the other hand. The coupling interface and parallelization scheme have been tested and verified in terms of precision and performance.

He, W.; Beyer, C.; Fleckenstein, J. H.; Jang, E.; Kolditz, O.; Naumov, D.; Kalbacher, T.

2015-03-01

410

Superposition-Enhanced Estimation of Optimal Temperature Spacings for Parallel Tempering Simulations

Effective parallel tempering simulations rely crucially on a properly chosen sequence of temperatures. While it is desirable to achieve a uniform exchange acceptance rate across neighboring replicas, finding a set of temperatures that achieves this end is often a difficult task, in particular for systems undergoing phase transitions. Here we present a method for determination of optimal replica spacings, which is based upon knowledge of local minima in the potential energy landscape. Working within the harmonic superposition approximation, we derive an analytic expression for the parallel tempering acceptance rate as a function of the replica temperatures. For a particular system and a given database of minima, we show how this expression can be used to determine optimal temperatures that achieve a desired uniform acceptance rate. We test our strategy for two atomic clusters that exhibit broken ergodicity, demonstrating that our method achieves uniform acceptance as well as significant efficiency gains. PMID:25512744
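The replica-exchange acceptance rate being tuned above follows from the Metropolis criterion for swapping configurations between two temperatures. A sketch of the standard swap test (our own helper; the paper derives the expected rate analytically within the harmonic superposition approximation rather than measuring it this way):

```python
import math

def pt_swap_acceptance(beta1, beta2, e1, e2):
    """Metropolis acceptance probability for exchanging configurations
    between replicas at inverse temperatures beta1 and beta2 with
    instantaneous potential energies e1 and e2."""
    return min(1.0, math.exp((beta1 - beta2) * (e1 - e2)))
```

Uniform acceptance across the ladder requires neighboring temperatures close enough that the energy distributions overlap, which is exactly what a well-chosen temperature spacing provides.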

2014-01-01

411

Billion-atom synchronous parallel kinetic Monte Carlo simulations of critical 3D Ising systems

An extension of the synchronous parallel kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the parallel efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.

Martinez, E. [IMDEA-Materiales, Madrid 28040 (Spain); Monasterio, P.R. [Massachusetts Institute of Technology, Cambridge, MA 02139 (United States); Marian, J., E-mail: marian1@llnl.go [Lawrence Livermore National Laboratory, Livermore, CA 94551 (United States)

2011-02-20

412

A fast parallel Poisson solver on irregular domains applied to beam dynamics simulations

We discuss the scalable parallel solution of the Poisson equation within a Particle-In-Cell (PIC) code for the simulation of electron beams in particle accelerators of irregular shape. The problem is discretized by Finite Differences. Depending on the treatment of the Dirichlet boundary the resulting system of equations is symmetric or 'mildly' nonsymmetric positive definite. In all cases, the system is solved by the preconditioned conjugate gradient algorithm with smoothed aggregation (SA) based algebraic multigrid (AMG) preconditioning. We investigate variants of the implementation of SA-AMG that lead to considerable improvements in the execution times. We demonstrate good scalability of the solver on distributed memory parallel processor with up to 2048 processors. We also compare our iterative solver with an FFT-based solver that is more commonly used for applications in beam dynamics.
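The conjugate gradient method at the core of this solver needs only a matrix-vector product, which is what makes it attractive for mesh-based Poisson problems. An unpreconditioned sketch (our own; the paper adds smoothed-aggregation AMG preconditioning, which we omit here):

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=200):
    """Unpreconditioned conjugate gradient for a symmetric positive
    definite operator given only as a matrix-vector product, as in a
    finite-difference Poisson solve. Returns the approximate solution."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                      # residual r = b - A x, with x = 0
    p = r[:]                      # initial search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol * tol:
            break
        # update search direction, keeping A-conjugacy
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

In the parallel setting, the matrix-vector product and the two inner products are the only communication points, which is why CG scales well when paired with a good preconditioner.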

Adelmann, A. [Paul Scherrer Institut, CH-5234 Villigen (Switzerland)], E-mail: andreas.adelmann@psi.ch; Arbenz, P. [ETH Zuerich, Chair of Computational Science, Universitaetsstrasse 6, CH-8092 Zuerich (Switzerland)], E-mail: arbenz@inf.ethz.ch; Ineichen, Y. [Paul Scherrer Institut, CH-5234 Villigen (Switzerland); ETH Zuerich, Chair of Computational Science, Universitaetsstrasse 6, CH-8092 Zuerich (Switzerland)], E-mail: ineichen@inf.ethz.ch

2010-06-20

413

The use of a parallel virtual machine (PVM) for finite-difference wave simulations

NASA Astrophysics Data System (ADS)

Computer modelling is now applied routinely throughout the geosciences in an attempt to create synthetic data for comparison with real data. At present, in seismology, there is no analytical solution to the wave equation which allows wave simulations in "geologically realistic" (complex) media. Consequently, computationally expensive numerical solutions are required. Using a finite-difference solution to the wave equation provides a suitable means of modelling seismic waves in a heterogeneous medium. However, when applying this method the grid sizes and the number of time steps required (to ensure numerical stability and sufficiently long wave propagation distances) are limited because of their demand on computer time and memory. Supercomputers represent an obvious solution to these limitations. This paper presents an alternative which is inexpensive, convenient and portable. By clustering a set of processors, for example PCs or workstations, a parallel configuration can be obtained by using the processors available on each machine to perform sections of the calculations simultaneously. By using Parallel Virtual Machine (PVM) — a public domain software package which allows a programmer to create and access a concurrent computing system made from networks of loosely coupled processing elements (Geist and others, 1994) — we have reduced wall-clock times and increased array sizes for a finite-difference solution to the acoustic, elastic and viscoelastic wave equations. In this paper we present methods of parallelizing a serial code and load-balancing this parallelized code. A comparison of serial and parallel wall-clock times, a comparison of wall-clock times on a variety of clusters of machines and the role of communication in this application are presented for a finite-difference solution to the acoustic wave equation.
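A finite-difference solution to the acoustic wave equation of the kind parallelized here advances the field with a three-level leapfrog stencil, stable when the CFL number c·dt/dx is at most 1. A serial 1D sketch (our own illustration; the paper treats acoustic, elastic and viscoelastic cases on domain-decomposed grids under PVM):

```python
def acoustic_step(u_prev, u_curr, c, dx, dt):
    """One leapfrog time step of the 1D acoustic wave equation
    u_tt = c^2 u_xx with fixed (zero) boundary values.
    Stability requires the CFL number c*dt/dx <= 1."""
    n = len(u_curr)
    r2 = (c * dt / dx) ** 2
    u_next = [0.0] * n
    for i in range(1, n - 1):
        u_next[i] = (2 * u_curr[i] - u_prev[i]
                     + r2 * (u_curr[i + 1] - 2 * u_curr[i] + u_curr[i - 1]))
    return u_next
```

Domain decomposition for this stencil needs only a one-cell exchange at each subdomain edge per time step, which is the communication pattern whose cost the paper measures across clusters.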

Niccanna, Clodagh; Bean, Christopher J.

1997-08-01

414

Supporting the Development of Resilient Message Passing Applications using Simulation

An emerging aspect of high-performance computing (HPC) hardware/software co-design is investigating performance under failure. The work in this paper extends the Extreme-scale Simulator (xSim), which was designed for evaluating the performance of message passing interface (MPI) applications on future HPC architectures, with fault-tolerant MPI extensions proposed by the MPI Fault Tolerance Working Group. xSim permits running MPI applications with millions of concurrent MPI ranks, while observing application performance in a simulated extreme-scale system using a lightweight parallel discrete event simulation. The newly added features offer user-level failure mitigation (ULFM) extensions at the simulated MPI layer to support algorithm-based fault tolerance (ABFT). The presented solution permits investigating performance under failure and failure handling of ABFT solutions. The newly enhanced xSim is the very first performance tool that supports ULFM and ABFT.

Naughton, III, Thomas J [ORNL]; Engelmann, Christian [ORNL]; Vallee, Geoffroy R [ORNL]; Boehm, Swen [ORNL]

2014-01-01

415

Propagation of action potentials between parallel chains of cardiac muscle cells was simulated using the PSpice program. Excitation was transmitted from cell to cell along a strand of three or four cells not connected by low-resistance tunnels (gap-junction connexons) in parallel with one or two similar strands. Thus, two models were used: a 2 x 3 model (two parallel chains of three cells each) and a 3 x 4 model (three parallel chains of four cells each). The entire surface membrane of each cell fired nearly simultaneously, and nearly all the propagation time was spent at the cell junctions, thus giving a staircase-shaped propagation profile. The junctional delay time between contiguous cells in a chain was about 0.2-0.5 ms. A significant negative cleft potential develops in the narrow junctional clefts, whose magnitude depends on several factors, including the radial cleft resistance (Rjc). The cleft potential (Vjc) depolarizes the postjunctional membrane to threshold by a patch-clamp action. Therefore, one mechanism for the transfer of excitation from one cell to the next is by the electric field (EF) that is generated in the junctional cleft when the prejunctional membrane fires. Propagation velocity increased with elevation of Rjc. With electrical stimulation of the first cell of the first strand (cell A1), propagation rapidly spread down that chain and then jumped to the second strand (B chain), followed by jumping to the third strand (C chain) when present. The rapidity by which the parallel chains became activated depended on the longitudinal resistance of the narrow extracellular cleft between the parallel strands (Rol2). The higher the Rol2 resistance, the faster the propagation (lower propagation time) over the cardiac muscle sheet (2-dimensional). The transverse resistance of the cleft had no effect. 
When the first cell of the second strand (cell B1) was stimulated, propagation spread down the B chain and jumped to the other two strands (A and C) nearly simultaneously. When cell C1 was stimulated, propagation traveled down the C chain and jumped to the B chain, followed by excitation of the A chain. Thus, there was transverse propagation of excitation as longitudinal propagation was occurring. Therefore, transmission of excitation by the EF mechanism can occur between myocardial cells lying closely parallel to one another without the requirement of a specialized junction. PMID:12665257

Sperelakis, Nicholas

2003-01-01

416

NASA Astrophysics Data System (ADS)

Uninterrupted power supply has become indispensable during maintenance tasks on active electric power lines as a result of today's highly information-oriented society and the increasing demand on electric utilities. The maintenance task carries the risk of electric shock and the danger of falling from high places. It is therefore necessary to realize an autonomous robot system using an electro-hydraulic manipulator, because hydraulic manipulators have the advantage of electric insulation. Meanwhile, it is relatively difficult to realize autonomous assembly tasks, particularly when manipulating flexible objects such as electric lines. In this report, a discrete event control system is introduced for the automatic assembly of electric lines into sleeves, a typical task on active electric power lines. In the implementation of the discrete event control system, LVQNN (learning vector quantization neural network) is applied to the insertion of electric lines into sleeves. In order to apply the proposed control system to an unknown environment, virtual learning data for the LVQNN were generated by fuzzy inference. Experimental results with two types of electric lines and sleeves confirm that the proposed discrete event control and neural network learning algorithm are very effective for the insertion of electric lines into sleeves, a typical active electric power maintenance task.

Ahn, Kyoungkwan; Yokota, Shinichi

417

NASA Astrophysics Data System (ADS)

I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring any modification of existing code. This is an advantage for the development and testing of computational modeling software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. Support for parallel programming is also provided by allowing users to select which simulation variables to transfer between processes via a Message Passing Interface library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class presented here requires a C++ compiler that supports variadic templates which were standardized in 2011 (C++11). The code is available at: https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those that do are kindly requested to cite this work.

Honkonen, I.

2014-07-01

418

NASA Astrophysics Data System (ADS)

Flow within the healthy human vascular system is typically laminar, but diseased conditions can alter the geometry sufficiently to produce transitional/turbulent flows in regions focal to (and immediately downstream of) the diseased section. The mean unsteadiness (pulsatile or respiratory cycle) further complicates the situation, making traditional turbulence simulation techniques (e.g., Reynolds-averaged Navier-Stokes simulations (RANSS)) suspect. At the other extreme, direct numerical simulation (DNS), while fully appropriate, can lead to large computational expense, particularly when the simulations must be done quickly since they are intended to affect the outcome of a medical treatment (e.g., virtual surgical planning). To produce simulations in a clinically relevant time frame requires: 1) an adaptive meshing technique that closely matches the desired local mesh resolution in all three directions to the highly anisotropic physical length scales in the flow, 2) efficient solution algorithms, and 3) excellent scaling on massively parallel computers. In this presentation we will demonstrate results for a subject-specific simulation of an abdominal aortic aneurysm using a stabilized finite element method on anisotropically adapted meshes consisting of O(10^8) elements over O(10^4) processors.

Sahni, Onkar; Jansen, Kenneth; Shephard, Mark; Taylor, Charles

2007-11-01

419

We present three-dimensional hybrid simulations of collisionless shocks that propagate parallel to the background magnetic field to study the acceleration of protons that form a high-energy tail on the distribution. We focus on the initial acceleration of thermal protons and compare it with results from one-dimensional simulations. We find that for both one- and three-dimensional simulations, particles that end up in the high-energy tail of the distribution later in the simulation gained their initial energy right at the shock. This confirms previous results, and is the first demonstration using fully three-dimensional fields. The result is not consistent with the "thermal leakage" model. We also show that the gyrocenters of protons in the three-dimensional simulation can drift away from the magnetic field lines on which they started, due to the removal of the ignorable coordinates that exist in one- and two-dimensional simulations. Our study clarifies the injection problem for diffusive shock acceleration.

Guo Fan [Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545 (United States); Giacalone, Joe, E-mail: guofan.ustc@gmail.com [Department of Planetary Sciences and Lunar and Planetary Laboratory, University of Arizona, 1629 E. University Blvd., Tucson, AZ 85721 (United States)

2013-08-20

420

Accelerating groundwater flow simulation in MODFLOW using JASMIN-based parallel computing.

To accelerate the groundwater flow simulation process, this paper reports our work on developing an efficient parallel simulator by rebuilding the well-known software MODFLOW on JASMIN (J Adaptive Structured Meshes applications Infrastructure). The rebuilding is achieved by designing patch-based data structures and parallel algorithms and by making slight modifications to the computational flow and subroutines of MODFLOW. Both the memory requirements and the computing effort are distributed among all processors; to reduce communication cost, data transfers are batched and conveniently handled by adding ghost nodes to each patch. To further improve performance, constant-head/inactive cells are tagged and skipped during the linear solve, and an efficient load balancing strategy is presented. The accuracy and efficiency are demonstrated through three scenarios. The first application is a field flow problem at Yanming Lake in China, intended to help design a reasonable quantity of groundwater exploitation; desirable numerical accuracy and significant performance enhancement are obtained, and the tagged program with the load balancing strategy running on 40 cores is typically six times faster than the fastest MICCG-based MODFLOW program. The second test simulates flow in a highly heterogeneous aquifer; the AMG-based JASMIN program running on 40 cores is nine times faster than the GMG-based MODFLOW program. The third test is a simplified transient flow problem with on the order of tens of millions of cells, used to examine scalability: compared to 32 cores, parallel efficiencies of 77% and 68% are obtained on 512 and 1024 cores, respectively, which indicates impressive scalability. PMID:23600445

Cheng, Tangpei; Mo, Zeyao; Shao, Jingli

2014-01-01

421

Xyce parallel electronic simulator users' guide, version 6.0.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state of the art in the following areas: the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase (a message-passing parallel implementation), which allows it to run efficiently on a wide range of computing platforms, including serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David G. [Raytheon, Albuquerque, NM

2013-08-01

422

Understanding Performance of Parallel Scientific Simulation Codes using Open|SpeedShop

The conclusions of this presentation are: (1) Open|SpeedShop (OSS) is convenient to use for large, parallel, scientific simulation codes; (2) large codes benefit from uninstrumented execution; (3) many experiments can be run in a short time, though multiple runs may be needed (e.g., usertime for caller-callee data, hwcsamp for hardware counters); (4) a decent idea of a code's performance is easily obtained; (5) statistical sampling calls for a decent number of samples; and (6) hardware-counter (HWC) data is very useful for micro-analysis but can be tricky to analyze.

Ghosh, K K

2011-11-07

423

Real-time multibody system dynamic simulation. II - A parallel algorithm and numerical results

In designing a parallel algorithm, an essential requirement is to distribute tasks evenly among all processors. The velocity state recursive Newton-Euler formulation, however, has embedded recurrence relations that must be executed in forward and backward computational path sequences. Here, an algorithm is developed which reduces the critical path time by extracting some operations from the forward and backward computational paths and distributing them evenly among the processors. Numerical examples are presented to show that real-time simulation can be achieved for moderately complex mechanical systems using a shared memory multiprocessor.

Tsai, Fuh-Feng; Haug, E.J. (Iowa, University, Iowa City (United States))

1991-06-01

424

Simple LabVIEW DC Circuit Simulation With Parallel Resistors: Overview

NSDL National Science Digital Library

This is a downloadable simple DC circuit simulation with two resistors in parallel with a third resistor, useful for studying Ohm's Law. Users can adjust the voltage and the resistors while the current changes in real time, just like the real thing. Users are then asked whether the current increases or decreases as the resistance increases. It includes instructions on how to measure DC/AC current. This free program requires Windows 9x, NT, XP or later; note that it will NOT run on Mac OS.
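
The circuit the simulation models (two resistors in parallel, in series with a third) reduces to one Ohm's-law computation, sketched here with hypothetical component values:

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

def circuit_current(voltage, r1, r2, r3):
    # Two resistors in parallel, in series with a third, as in the simulation.
    return voltage / (parallel(r1, r2) + r3)

# 12 V supply: 100||100 = 50 ohm, plus 50 ohm in series -> 100 ohm total.
i_low = circuit_current(12.0, 100.0, 100.0, 50.0)
# Raising the resistance lowers the current, the behavior the quiz asks about.
i_high = circuit_current(12.0, 200.0, 200.0, 50.0)
assert i_high < i_low
```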

425

A parallel multigrid preconditioner for the simulation of large fracture networks

Computational modeling of fracture in disordered materials using discrete lattice models requires the solution of a linear system of equations every time a new lattice bond is broken. Solving these linear systems successively is the most expensive part of fracture simulations using large three-dimensional networks. In this paper, we present a parallel multigrid preconditioned conjugate gradient algorithm to solve these linear systems. Numerical experiments demonstrate that this algorithm performs significantly better than the algorithms previously used to solve this problem.
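
The core iteration is preconditioned conjugate gradients; a minimal serial sketch is below, with a simple Jacobi (diagonal) preconditioner standing in for the paper's parallel multigrid preconditioner, and a 1-D Laplacian standing in for a lattice system:

```python
import numpy as np

def pcg(A, b, M_inv_diag, tol=1e-10, maxiter=200):
    """Preconditioned conjugate gradients for SPD systems. The paper uses a
    parallel multigrid preconditioner; here a Jacobi (diagonal inverse)
    preconditioner keeps the sketch short."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv_diag * r          # apply the preconditioner
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# SPD test system: a 1-D Laplacian, the kind of stiffness-like matrix that
# arises from a lattice model (sizes here are illustrative only).
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = pcg(A, b, 1.0 / np.diag(A))
```

In the paper's setting the same solve is repeated after every bond break, which is why the preconditioner quality dominates total run time.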

Sampath, Rahul S [ORNL; Barai, Pallab [ORNL; Nukala, Phani K [ORNL

2010-01-01

426

NASA Astrophysics Data System (ADS)

The features of high-precision numerical simulation of Earth-satellite motion using parallel computing are discussed, taking as an example the implementation of the software complex "Numerical model of the motion of satellite systems" on the "Skiff Cyberia" cluster. It is shown that the use of a 128-bit word length allows one to account for weak perturbations from high-order harmonics in the expansion of the geopotential, as well as for the variations of the geopotential harmonics that arise from the tidal perturbations induced in the solid Earth and its oceans by the Moon and the Sun.

Chuvashov, I. N.

2010-12-01

427

3-D Hybrid Simulation of Quasi-Parallel Bow Shock and Its Effects on the Magnetosphere

A three-dimensional (3-D) global-scale hybrid simulation is carried out for the structure of the quasi-parallel bow shock, in particular the foreshock waves and pressure pulses. The wave evolution and interaction with the dayside magnetosphere are discussed. It is shown that diamagnetic cavities are generated in the turbulent foreshock due to the ion beam plasma interaction, and these compressional pulses lead to strong surface perturbations at the magnetopause and Alfven waves/field line resonance in the magnetosphere.

Lin, Y.; Wang, X.Y. [Physics Department, Auburn University, Auburn, AL 36849-5311 (United States)

2005-08-01

428

pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.

This work presents pWeb, a new language and compiler for the parallelization of client-side compute-intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled unprecedented applications on the web. The low performance of the web browser, however, remains the bottleneck for computationally intensive applications (including visualization of complex scenes, real-time physical simulation, and image processing) compared to native ones. The proposed language is built upon web workers for multithreaded programming in HTML5. It provides the fundamental functionality of parallel programming languages as well as the fork/join parallel model, which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions. PMID:24732497

Halic, Tansel; Ahn, Woojin; De, Suvranu

2014-01-01

429

Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN

To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates the performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted. PMID:25506097
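
The two scalability measures used here are standard and easy to state in code. The runtimes below are hypothetical, chosen only to mirror the shape of such an analysis (the MODFLOW/JASMIN entry above reports weak-scaling-style efficiencies of 77% and 68%):

```python
def strong_scaling(t_ref, p_ref, t_p, p):
    """Strong scaling: fixed total problem size. Returns (speedup,
    parallel efficiency) relative to a p_ref-core baseline run."""
    speedup = t_ref / t_p
    efficiency = speedup * p_ref / p
    return speedup, efficiency

def weak_scaling_efficiency(t_ref, t_p):
    # Weak scaling grows the problem with the core count, so the ideal
    # runtime is constant and efficiency is just the runtime ratio.
    return t_ref / t_p

# Hypothetical timings: baseline 100 s, 129.87 s after scaling up -> ~77%.
eff = weak_scaling_efficiency(100.0, 129.87)
# Hypothetical strong-scaling run: 100 s on 1 core, 12.5 s on 16 cores.
speedup, strong_eff = strong_scaling(100.0, 1, 12.5, 16)
```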

Hammond, G E; Lichtner, P C; Mills, R T

2014-01-01

430

Distributed Network Simulations Using the Dynamic Simulation Backplane

We present an approach for creating distributed, component-based simulations of communication networks by interconnecting models of sub-networks drawn from different network simulation packages. This approach supports rapid construction of simulations for large networks by reusing existing models and software, and fast execution using parallel discrete event simulation techniques. A dynamic simulation backplane is proposed that pro-

George F. Riley; Mostafa H. Ammar; Richard M. Fujimoto; Donghua Xu; Kalyan S. Perumalla

2001-01-01

431

Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers. PMID:24416069

Pesce, Lorenzo L.; Lee, Hyong C.; Stevens, Rick L.

2013-01-01

432

PCSIM: A Parallel Simulation Environment for Neural Circuits Fully Integrated with Python

The Parallel Circuit SIMulator (PCSIM) is a software package for simulation of neural circuits. It is primarily designed for distributed simulation of large scale networks of spiking point neurons. Although its computational core is written in C++, PCSIM's primary interface is implemented in the Python programming language, which is a powerful programming environment and allows the user to easily integrate the neural circuit simulator with data analysis and visualization tools to manage the full neural modeling life cycle. The main focus of this paper is to describe PCSIM's full integration into Python and the benefits thereof. In particular we will investigate how the automatically generated bidirectional interface and PCSIM's object-oriented modular framework enable the user to adopt a hybrid modeling approach: using and extending PCSIM's functionality either employing pure Python or C++ and thus combining the advantages of both worlds. Furthermore, we describe several supplementary PCSIM packages written in pure Python and tailored towards setting up and analyzing neural simulations. PMID:19543450

Pecevski, Dejan; Natschläger, Thomas; Schuch, Klaus

2008-01-01

433

Parallel computing simulation of electrical excitation and conduction in the 3D human heart.

A correctly beating heart is important to ensure adequate circulation of blood throughout the body. Normal heart rhythm is produced by the orchestrated conduction of electrical signals throughout the heart. Cardiac electrical activity results from a series of complex biochemical and mechanical reactions involving the transport and bio-distribution of ionic flows through a variety of biological ion channels. Cardiac arrhythmias are caused by the direct alteration of ion channel activity that results in changes in the action potential (AP) waveform. In this work, we developed a whole-heart simulation model using massively parallel computing with GPGPU and OpenGL. The simulation algorithm was implemented in several different versions for comparison, including one conventional CPU version and two GPU versions based on the Nvidia CUDA platform. OpenGL was utilized for the visualization/interaction platform because it is open source, lightweight and universally supported by various operating systems. The experimental results show that the GPU-based simulation outperforms the conventional CPU-based approach and significantly improves the speed of simulation. By adopting modern computer architecture, the present investigation enables real-time simulation and visualization of electrical excitation and conduction in the large and complicated 3D geometry of a real-world human heart. PMID:25570947

Di Yu; Dongping Du; Hui Yang; Yicheng Tu

2014-01-01

434

We present an algorithm for solving the radiative transfer problem on massively parallel computers using adaptive mesh refinement and domain decomposition. The solver is based on the method of characteristics which requires an adaptive raytracer that integrates the equation of radiative transfer. The radiation field is split into local and global components which are handled separately to overcome the non-locality problem. The solver is implemented in the framework of the magneto-hydrodynamics code FLASH and is coupled by an operator splitting step. The goal is the study of radiation in the context of star formation simulations with a focus on early disc formation and evolution. This requires a proper treatment of radiation physics that covers both the optically thin as well as the optically thick regimes and the transition region in particular. We successfully show the accuracy and feasibility of our method in a series of standard radiative transfer problems and two 3D collapse simulations resembling the ear...

Buntemeyer, Lars; Peters, Thomas; Klassen, Mikhail; Pudritz, Ralph E

2015-01-01

435

We pursue a level set approach to couple an Eulerian shock-capturing fluid solver with space-time refinement to an explicit solid dynamics solver for large deformations and fracture. The coupling algorithms, considering recursively finer fluid time steps as well as overlapping solver updates, are discussed in detail. Our ideas are implemented in the AMROC adaptive fluid solver framework and are used for effective fluid-structure coupling to the general-purpose solid dynamics code DYNA3D. Besides simulations verifying the coupled fluid-structure solver and assessing its parallel scalability, the detailed structural analysis of a reinforced concrete column under blast loading and the simulation of a prototypical blast explosion in a realistic multistory building are presented.

Deiterding, Ralf [ORNL; Wood, Stephen L [University of Tennessee, Knoxville (UTK)

2013-01-01

436

Simulation/Emulation Techniques: Compressing Schedules With Parallel (HW/SW) Development

NASA Technical Reports Server (NTRS)

NASA has always been in the business of balancing new technologies and techniques to achieve human space travel objectives. NASA's Kedalion engineering analysis lab has been validating and using many contemporary avionics HW/SW development and integration techniques, which represent new paradigms for NASA's heritage culture. Kedalion has validated many of the Orion HW/SW engineering techniques borrowed from the adjacent commercial aircraft avionics solution space, inserting new techniques and skills into the Multi-Purpose Crew Vehicle (MPCV) Orion program. Using contemporary agile techniques, commercial-off-the-shelf (COTS) products, early rapid prototyping, in-house expertise and tools, and extensive use of simulators and emulators, NASA has achieved cost-effective paradigms that are currently serving the Orion program effectively. Elements of long-lead custom hardware on the Orion program have necessitated early use of simulators and emulators, in advance of deliverable hardware, to achieve parallel design and development on a compressed schedule.

Mangieri, Mark L.; Hoang, June

2014-01-01

437

A PARALLEL MONTE CARLO CODE FOR SIMULATING COLLISIONAL N-BODY SYSTEMS

We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ~ 10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme, as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce, along with our choice of decomposition scheme, minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse, with the number of stars, N, spanning three orders of magnitude from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, to within about 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10^5, 128 for N = 10^6 and 256 for N = 10^7. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60x, 100x, and 220x, respectively.
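
The saturation behavior described above (near-linear at first, then a plateau) is the generic signature of a non-scaling component such as the parallel sort. Amdahl's law gives the simplest model of it; the serial fraction below is purely hypothetical and not taken from the paper:

```python
def amdahl_speedup(serial_fraction, p):
    """Amdahl's law: with serial fraction f, speedup on p processors is
    1 / (f + (1 - f) / p), so the speedup can never exceed 1/f no matter
    how many processors are added."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

f = 0.01                          # hypothetical non-scaling fraction
low_p = amdahl_speedup(f, 64)     # still close to linear scaling
high_p = amdahl_speedup(f, 4096)  # approaching the 1/f = 100 ceiling
```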

Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A., E-mail: bharath@u.northwestern.edu [Center for Interdisciplinary Exploration and Research in Astrophysics, Northwestern University, Evanston, IL (United States)

2013-02-15

438

MDSLB: A new static load balancing method for parallel molecular dynamics simulations

NASA Astrophysics Data System (ADS)

Large-scale parallelization of molecular dynamics simulations faces challenges that seriously affect simulation efficiency, among which the load imbalance problem is the most critical. In this paper, we propose a new molecular dynamics static load balancing method (MDSLB). By analyzing the characteristics of the short-range force in molecular dynamics programs running in parallel, we divide the short-range force into three kinds of force models, and then package the computations of each force model into many tiny computational units called “cell loads”, which provide the basic data structures for our load balancing method. In MDSLB, the spatial region is separated into sub-regions called “local domains”, and the cell loads of each local domain are allocated to every processor in turn. Compared with dynamic load balancing methods, MDSLB can guarantee load balance by executing the algorithm only once at program startup, without migrating loads dynamically. We implemented MDSLB in the OpenFOAM software and tested it on the TianHe-1A supercomputer with 16 to 512 processors. Experimental results show that MDSLB can save 34%-64% of the run time in load-imbalanced cases.
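
The essence of such static balancing is a one-time partition of weighted work units across processors. The sketch below uses a greedy longest-processing-time heuristic as a stand-in for MDSLB's in-turn allocation of cell loads (the costs and processor count are hypothetical):

```python
def assign_cell_loads(cell_loads, nproc):
    """Static assignment of per-cell work units ("cell loads") to processors,
    computed once at startup with no run-time migration, as in MDSLB. The
    paper allocates cell loads to processors in turn; here a greedy
    longest-processing-time heuristic stands in for that scheme."""
    buckets = [[] for _ in range(nproc)]
    totals = [0.0] * nproc
    for load in sorted(cell_loads, reverse=True):
        k = totals.index(min(totals))   # currently least-loaded processor
        buckets[k].append(load)
        totals[k] += load
    return buckets, totals

loads = [5, 1, 4, 2, 3, 3, 2, 4]        # hypothetical per-cell costs
buckets, totals = assign_cell_loads(loads, 4)
assert max(totals) - min(totals) == 0.0  # perfectly balanced for this input
```

Because the assignment is fixed at startup, it costs nothing during the run, which is the trade-off MDSLB makes against dynamic (migrating) balancers.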

Wu, Yun-Long; Xu, Xin-Hai; Yang, Xue-Jun; Zou, Shun; Ren, Xiao-Guang

2014-02-01

439

Parallel Simulation of HGMS of Weakly Magnetic Nanoparticles in Irrotational Flow of Inviscid Fluid

The process of high gradient magnetic separation (HGMS) using a microferromagnetic wire for capturing weakly magnetic nanoparticles in the irrotational flow of an inviscid fluid is simulated using a parallel algorithm developed with OpenMP. The two-dimensional problem of particle transport under the influence of magnetic force and fluid flow is considered in an annular domain surrounding the wire, with inner radius equal to that of the wire and outer radius equal to various multiples of the wire radius. The differential equations governing particle transport are solved numerically as an initial- and boundary-value problem using the finite-difference method. The concentration distribution of the particles around the wire is investigated and compared with some previously reported results, showing good agreement between them. The results show the feasibility of accumulating weakly magnetic nanoparticles in specific regions on the wire surface, which is useful for applications in biomedical and environmental work. The speedup of the parallel simulation ranges from 1.8 to 21 depending on the number of threads, the problem domain size, and the number of iterations. Given the nature of the computation and current multicore technology, it is observed that 4-8 threads are sufficient to obtain the optimized speedup. PMID:24955411

Hournkumnuard, Kanok

2014-01-01

440

Parallel 3D Simulation of Seismic Wave Propagation in the Structure of Nobi Plain, Central Japan

NASA Astrophysics Data System (ADS)

We performed large-scale parallel simulations of seismic wave propagation to understand the complex wave behavior in the 3D basin structure of the Nobi Plain, one of the most densely populated areas of central Japan. In this area many large earthquakes occurred in the past, such as the 1891 Nobi earthquake (M8.0), the 1944 Tonankai earthquake (M7.9) and the 1945 Mikawa earthquake (M6.8). In order to mitigate the potential disasters of future earthquakes, the 3D subsurface structure of the Nobi Plain has recently been investigated by local governments. We referred to this model, together with Bouguer anomaly data, to construct a detailed 3D basin structure model for the Nobi Plain, and conducted computer simulations of ground motions. We first evaluated the ground motions for two small earthquakes (M4~5): one occurred just beneath the basin edge to the west, and the other to the south. The ground motions from these earthquakes were well recorded by the strong motion networks K-net and KiK-net and by seismic intensity instruments operated by local governments. We compare the observed seismograms with simulations to validate the 3D model. For the 3D simulation we sliced the 3D model into a number of layers assigned to many processors for concurrent computing. The equations of motion are solved using a high-order (32nd) staggered-grid FDM in the horizontal directions, and a conventional (4th-order) FDM in the vertical direction, with MPI inter-processor communication between neighboring regions. The simulation model is 128 km by 128 km by 43 km, discretized at a variable grid size of 62.5-125 m in the horizontal directions and of 31.25-62.5 m in the vertical direction. We assigned a minimum shear-wave velocity of Vs = 0.4 km/s at the top of the sedimentary basin. The seismic sources for the small events are approximated by double-couple point sources, and we simulate seismic wave propagation up to a maximum frequency of 2 Hz.
We used the Earth Simulator (JAMSTEC, Yokohama Inst.) to conduct such a large simulation. The parallel simulation using 256 CPUs of the Earth Simulator took 260 GByte of computer memory and 8.3 hours of wall-clock time. The observed waveforms and the computed simulations for the two earthquakes agree well, indicating the effectiveness of the 3D model. We therefore conducted another simulation to estimate the pattern of strong ground motion during large earthquakes such as the 1945 Mikawa earthquake. We employ the fault rupture model of Kikuchi et al. (2003), which is derived from the inversion of regional records, and the pseudo-dynamic source time function of Nakamura and Miyatake (2000). The simulated wavefield from the Mikawa earthquake is dominated by large surface waves with amplitudes over 10 cm/s and a relatively long period of 6-8 s in the center of the Nobi Plain. We also find a directivity effect of the fault rupture from south to north in the PGV distribution and waveforms. This explains the major pattern of the seismic intensity distribution and the strong motion damage during the earthquake.
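
The staggered-grid FDM at the heart of such simulations can be shown in one dimension. This is a 2nd-order sketch (the study uses operators up to 32nd order in the horizontal directions); the medium parameters are hypothetical and not taken from the Nobi Plain model:

```python
import numpy as np

# Minimal 1-D staggered-grid finite-difference scheme for elastic (SH)
# wave propagation: velocity lives on integer grid points, stress on the
# half points between them, and the two are updated leapfrog-style.
nx, dx, dt, nt = 400, 62.5, 0.004, 500   # grid points, m, s, time steps
vs, rho = 2000.0, 2500.0                 # shear velocity (m/s), density
mu = rho * vs**2                         # shear modulus
v = np.zeros(nx)                         # particle velocity (integer points)
s = np.zeros(nx - 1)                     # shear stress (staggered points)
v[nx // 2] = 1.0                         # impulsive source in the middle
assert vs * dt / dx < 1.0                # CFL stability condition holds
for _ in range(nt):
    s += dt * mu * np.diff(v) / dx           # stress from velocity gradient
    v[1:-1] += dt * np.diff(s) / (rho * dx)  # velocity from stress gradient
```

Staggering the two fields by half a grid cell is what makes centered differences line up naturally, which is why the scheme generalizes cleanly to the high-order 3D operators used in the study.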

Kotani, A.; Furumura, T.; Hirahara, K.

2003-12-01

441

Large scale parallel computing simulations of wire array Z-pinches

NASA Astrophysics Data System (ADS)

Until recently, simulations of wire array Z-pinches have been undertaken in a piece-wise fashion, modelling either only part of the array volume, or modelling different aspects of the array behaviour separately. Recent simulations of a single wire in the array suggest that the short-wavelength modulations of the ablating plasma observed in experiments are the result of a modified m=0-like instability. In order to simulate the growth of magneto-Rayleigh-Taylor instabilities during the implosion phase, a separate calculation is usually performed in which estimates of the structure of the modulated ablation provide the initial seed perturbation for the implosion. Improvements to the parallel computing architecture of the Gorgon 3D resistive MHD code, however, mean that it is now possible to run with computational grids large enough to encompass the entire volume of the array whilst retaining sufficient resolution to model the spontaneous development of the modulated ablation structure from microscopic noise. Thus we can model the evolution of the wire array from the point of initial plasma formation, right through the implosion, without imposing any predetermined perturbation or structure. A detailed comparison of synthetic diagnostic images with data from MAGPIE experiments is used to test this method. Preliminary data from similar simulations of Z experiments are also presented.

Chittenden, Jeremy; Niasse, Nicolas; Ciardi, Andrea

2008-11-01

442

Building designers are increasingly relying on complex fenestration systems to reduce the energy consumed for lighting and HVAC in low-energy buildings. Radiance, a lighting simulation program, has been used to conduct daylighting simulations for complex fenestration systems. Depending on the configuration, a simulation can take hours or even days on a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight simulation by performing parallel computing on the heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in parallel using OpenCL. The speed of the new approach was evaluated using various daylighting simulation cases on a multicore central processing unit and a graphics processing unit. Based on measurements and analysis of the time usage of the Radiance daylighting simulation, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.

University of Miami; Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor S.

2013-04-30

443

There has been great concern about the origin of the parallel electric field in the frame of fluid equations in the auroral acceleration region. This paper proposes a new method to simulate magnetohydrodynamic (MHD) equations that include the electron convection term and shows its efficiency with simulation results in one dimension. We apply a third-order semi-discrete central scheme to investigate the characteristics of the electron convection term, including its nonlinearity. At a steady-state discontinuity, the sum of the ion and electron convection terms balances the ion pressure gradient. We find that the electron convection term works like the gradient of a negative pressure and reduces the ion sound speed, or amplifies the sound mode, when parallel current flows. The electron convection term enables us to describe a situation in which a parallel electric field and parallel electron acceleration coexist, which is impossible for ideal or resistive MHD.
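
For orientation, a standard one-dimensional, field-aligned form of the generalized Ohm's law (textbook form, not taken verbatim from the paper) shows how the electron pressure gradient and the electron inertia/convection terms can sustain a parallel electric field:

```latex
% Field-aligned electric field from the electron momentum equation (1-D,
% coordinate z along B); the m_e/e bracket contains the electron
% convection (inertia) terms that ideal and resistive MHD drop.
E_\parallel = -\frac{1}{e\,n}\frac{\partial p_e}{\partial z}
  - \frac{m_e}{e}\left(\frac{\partial u_e}{\partial t}
  + u_e\,\frac{\partial u_e}{\partial z}\right)
```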

Matsuda, K.; Terada, N.; Katoh, Y. [Space and Terrestrial Plasma Physics Laboratory, Department of Geophysics, Graduate School of Science, Tohoku University, Sendai, Miyagi 980-8578 (Japan)]; Misawa, H. [Planetary Plasma and Atmospheric Research Center, Graduate School of Science, Tohoku University, Sendai, Miyagi 980-8578 (Japan)]

2011-08-15

444

NASA Astrophysics Data System (ADS)

Molecular dynamics (MD) simulations of RDX are carried out using the ReaxFF force field supplied with the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS). ReaxFF is validated as a model of RDX by extracting (i) the crystal unit-cell parameters, (ii) the bulk modulus, and (iii) the thermal expansion coefficient, and comparing them with values reported from both experiments and simulations.
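The bulk modulus in step (ii) follows from the curvature of the energy-volume curve, B = V0 (d²E/dV²) evaluated at the equilibrium volume. A hypothetical Python sketch of that fit, with synthetic harmonic E(V) data standing in for the energies that a series of LAMMPS/ReaxFF minimizations at fixed cell volumes would produce:

```python
import numpy as np

# Synthetic energy-volume data (arbitrary units); in practice each (V, E)
# point would come from minimizing the RDX cell at a prescribed volume.
V0_true, E0, B_true = 100.0, -50.0, 2.0
V = np.linspace(90.0, 110.0, 21)
E = E0 + 0.5 * (B_true / V0_true) * (V - V0_true) ** 2  # harmonic E(V)

# Fit a quadratic E(V) = c2*V^2 + c1*V + c0, locate the minimum, and
# evaluate B = V0 * d^2E/dV^2 = V0 * 2*c2 at that minimum.
c = np.polyfit(V, E, 2)
V0 = -c[1] / (2 * c[0])   # equilibrium volume
B = V0 * 2 * c[0]         # bulk modulus from the curvature
print(round(B, 3))        # → 2.0
```

Production workflows typically replace the quadratic with a Birch-Murnaghan equation of state, which remains accurate over larger compressions; the quadratic fit shown here is only valid near the minimum.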

Warrier, M.; Pahari, P.; Chaturvedi, S.

2010-12-01

445

This paper presents a generalized, parallel implementation methodology for real-time simulation of ac machine transients in an FPGA-based real-time simulator. The proposed method adopts a nanosecond-range simulation time step and exploits the large response time of a rotating machine to: 1) eliminate the need for predictive-corrective action for the machine's electrical and mechanical variables, and 2) decouple the solution of the

Mahmoud Matar; Reza Iravani

2011-01-01

446

A Multi-Bunch, Three-Dimensional, Strong-Strong Beam-Beam Simulation Code for Parallel Computers

For simulating the strong-strong beam-beam effect, Particle-In-Cell (PIC) codes have become one of the methods of choice. While the two-dimensional problem is readily treatable on PC-class machines, the three-dimensional problem, i.e., one encompassing hourglass and phase-averaging effects, requires parallel processors. In this paper, we introduce a strong-strong code, NIMZOVICH, which was specifically designed for parallel processors and is optimally used for many bunches and parasitic crossings. We describe the parallelization scheme and give some benchmarking results.
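The first step of any PIC field solve is depositing macro-particle charge onto a grid. A minimal 2D cloud-in-cell deposition sketch in Python (illustrative only; the abstract does not describe NIMZOVICH's actual deposition scheme):

```python
import numpy as np

def deposit_cic(x, y, q, nx, ny):
    """Cloud-in-cell deposition: each macro-particle's charge is split
    bilinearly among the four surrounding grid points. Coordinates are
    in grid units; particles are assumed to lie inside the grid."""
    rho = np.zeros((nx, ny))
    i = np.floor(x).astype(int)
    j = np.floor(y).astype(int)
    fx, fy = x - i, y - j                       # fractional offsets in the cell
    # np.add.at handles repeated indices correctly (unbuffered accumulate).
    np.add.at(rho, (i,     j    ), q * (1 - fx) * (1 - fy))
    np.add.at(rho, (i + 1, j    ), q * fx * (1 - fy))
    np.add.at(rho, (i,     j + 1), q * (1 - fx) * fy)
    np.add.at(rho, (i + 1, j + 1), q * fx * fy)
    return rho

# One unit-charge particle at (2.25, 3.5) on an 8x8 grid.
rho = deposit_cic(np.array([2.25]), np.array([3.5]), np.array([1.0]), 8, 8)
```

Total charge is conserved by construction, and the bilinear weights make the deposited density vary smoothly as particles cross cell boundaries, which suppresses grid-heating noise in the subsequent field solve.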

Cai, Y.; Kabel, A.C.; /SLAC

2005-05-11

447

Accelerating Dust Storm Simulation by Balancing Task Allocation in Parallel Computing Environment

NASA Astrophysics Data System (ADS)

Dust storms have serious negative impacts on the environment, human health, and assets. Continuing global climate change has increased the frequency and intensity of dust storms over the past decades. To better understand and predict the distribution, intensity, and structure of dust storms, a series of dust storm models have been developed, such as the Dust Regional Atmospheric Model (DREAM), the NMM meteorological module (NMM-dust), and the Chinese Unified Atmospheric Chemistry Environment for Dust (CUACE/Dust). The development and application of these models have contributed significantly to both scientific research and daily life. However, dust storm simulation is a data- and computing-intensive process: a simulation of a single dust storm event may take hours or even days to run, which seriously impacts the timeliness of prediction and potential applications. To speed up the process, high-performance computing is widely adopted. By partitioning a large study area into small subdomains according to their geographic location and executing them on different computing nodes in parallel, computing performance can be significantly improved. Since spatiotemporal correlations exist in the geophysical process of dust storm simulation, each subdomain allocated to a node needs to communicate with geographically adjacent subdomains to exchange data. Inappropriate allocations may introduce imbalanced task loads and unnecessary communication among computing nodes. The task allocation method is therefore the key factor affecting the feasibility of the parallelization. The allocation algorithm needs to carefully balance the computing cost and communication cost of each computing node so as to minimize total execution time and reduce overall communication cost for the entire system. This presentation introduces two algorithms for such allocation and compares them with an evenly distributed allocation method.
Specifically: 1) To obtain optimized solutions, a quadratic-programming-based modeling method is proposed. This algorithm performs well with a small number of computing tasks; however, its efficiency decreases significantly as the number of subdomains and computing nodes increases. 2) To compensate for the performance decrease on large-scale tasks, a K-means-clustering-based algorithm is introduced. Instead of seeking optimal solutions, this method obtains relatively good feasible solutions within acceptable time; however, it may introduce imbalanced communication among nodes or node-isolated subdomains. This research shows that both algorithms have their own strengths and weaknesses for task allocation. A combination of the two algorithms is under study to obtain better performance. Keywords: Scheduling; Parallel Computing; Load Balance; Optimization; Cost Model
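The K-means-based allocation in 2) can be sketched as clustering subdomain centroids so that geographically adjacent subdomains, which exchange halo data, tend to land on the same node. A hypothetical Python sketch of plain k-means over subdomain centroids (the load-balancing refinements the presentation describes are omitted here):

```python
import numpy as np

def kmeans_allocate(centroids, n_nodes, iters=50, seed=0):
    """Assign subdomains to compute nodes by k-means clustering of their
    geographic centroids. Returns one node label per subdomain."""
    rng = np.random.default_rng(seed)
    centers = centroids[rng.choice(len(centroids), n_nodes, replace=False)]
    for _ in range(iters):
        # Distance of every subdomain centroid to every cluster center.
        d = np.linalg.norm(centroids[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_nodes):
            members = centroids[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)  # recenter on the cluster
    return labels

# 8x8 grid of subdomain centroids partitioned among 4 computing nodes.
xs, ys = np.meshgrid(np.arange(8), np.arange(8))
cells = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
labels = kmeans_allocate(cells, 4)
```

Because clusters are spatially compact, most halo exchanges stay inside a node; the weakness noted in the abstract is visible too, since nothing constrains the clusters to have equal size, so per-node load can be imbalanced.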

Gui, Z.; Yang, C.; XIA, J.; Huang, Q.; YU, M.

2013-12-01

448

NASA Astrophysics Data System (ADS)

The 1994 Northridge earthquake in Los Angeles, California, killed 57 people, injured over 8,700, and caused an estimated $20 billion in damage. Petascale simulations are needed in California and elsewhere to give society a better understanding of the rupture and wave dynamics of the largest earthquakes at the shaking frequencies required to engineer safe structures. As heterogeneous supercomputing infrastructures become more common, numerical developments in earthquake system research are particularly challenged by the dependence on accelerator elements to enable "the Big One" simulations at higher frequency and finer resolution. Reducing time to solution and power consumption are today the two primary focus areas for the enabling technology of fault rupture dynamics and seismic wave propagation in realistic 3D models of the crust's heterogeneous structure. This dissertation presents scalable parallel programming techniques for high-performance seismic simulation on petascale heterogeneous supercomputers. A real-world earthquake simulation code, AWP-ODC, one of the most advanced earthquake codes to date, was chosen as the base code for this research, and the testbed is Titan at Oak Ridge National Laboratory, the world's largest heterogeneous supercomputer. The research work primarily concerns architecture study, computational performance tuning, and software system scalability. An earthquake simulation workflow has also been developed to support efficient production sets of simulations. The highlights of the technical development are an aggressive performance optimization focused on data locality and a notable data communication model that hides data communication latency. This development yields optimal computational efficiency and throughput for the 13-point stencil code on heterogeneous systems, and it can be extended to general high-order stencil codes.
Developed from scratch, the hybrid CPU/GPU version of the AWP-ODC code is now ready for real-world petascale earthquake simulations. This GPU-based code has demonstrated excellent weak scaling up to the full Titan scale and achieved a sustained computational performance of 2.3 PetaFLOPS in single precision. The production simulation demonstrated the first 0-10 Hz deterministic rough-fault simulation. Using the accelerated AWP-ODC,
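The 13-point stencil mentioned above is the classic fourth-order finite-difference pattern in 3D: four off-center points per axis plus the shared center. A NumPy sketch (illustrative only, not the AWP-ODC kernel itself) that applies it as a fourth-order Laplacian and verifies it on a quadratic field, where the result should be exact:

```python
import numpy as np

def laplacian13(u, dx):
    """Fourth-order 3D Laplacian via the 13-point stencil: per axis,
    coefficients (-1/12, 4/3, -5/2, 4/3, -1/12)/dx^2, with the center
    point shared by all three axes. Returns the interior region only."""
    c0, c1, c2 = -2.5, 4.0 / 3.0, -1.0 / 12.0
    out = 3.0 * c0 * u[2:-2, 2:-2, 2:-2]       # shared center, once per axis
    for axis in range(3):
        for shift, c in ((1, c1), (-1, c1), (2, c2), (-2, c2)):
            out = out + c * np.roll(u, shift, axis=axis)[2:-2, 2:-2, 2:-2]
    return out / dx**2

# Check on u = x^2 + y^2 + z^2, whose Laplacian is exactly 6 everywhere.
n, dx = 16, 0.1
g = np.arange(n) * dx
x, y, z = np.meshgrid(g, g, g, indexing="ij")
lap = laplacian13(x**2 + y**2 + z**2, dx)
```

Each output point touches 13 inputs spread two cells in every direction, which is exactly why data locality and halo-exchange latency hiding dominate the performance of such kernels on GPUs: the wider the stencil, the larger the halo that must be communicated between subdomains each time step.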