
1

Discrete Event Simulation / Parallel Discrete-Event Simulation

exogenous: from an external source (starter, trace, ...); endogenous: from previous event processing. Event-based: schedule events at specific times. Outline: Discrete Event Simulation, Parallel Discrete-Event Simulation, Using TM for PDES, Conclusions & Future Work.

2

Parallel Discrete Event Simulation to Network Emulation

Applying Parallel Discrete Event Simulation to Network Emulation. Rob Simmonds, Russell Bradford. ... of a system that uses a parallel discrete event simulator to act as a high-speed network emulator. With this, real IP traffic generated by application programs can interact with modelled traffic in the emulator, thus providing a controlled test environment.

Bradford, Russell

3

Parallel Discrete Event Simulation of Lyme Disease

Parallel Discrete Event Simulation of Lyme Disease. Ewa Deelman, Thomas Caraco, and Boleslaw K. Szymanski. ... distribution of Lyme disease, currently the most frequently reported vector-borne disease of humans. Our goal is to understand patterns in the Lyme disease epidemic at the regional scale through studying ...

Varela, Carlos

4

Parallel discrete event simulation of Lyme disease.

Our research concerns the dynamic processes underlying the rapid increase in the geographic distribution of Lyme disease, currently the most frequently reported vector-borne disease of humans in the United States [10, 1]. More specifically, we ask how spatially localized ecological interactions drive the Lyme disease epidemic at extended spatial and temporal scales. We have developed a parallel discrete event simulation system in C++ for the IBM SP2. The simulation model discussed here models the mouse-tick interaction, an essential element of the epidemic's ecology. The main entities of the simulation are ticks in various stages of development (larval, nymphal, and adult) and mice. We track the behavior of mice and the spread of disease over the course of 180 days (late spring, summer, and early fall). Our goal is to understand patterns in the Lyme disease epidemic at the regional scale through studying the spread of the pathogen across a single white-footed mouse deme. PMID:9390232

Deelman, E; Caraco, T; Szymanski, B K

1996-01-01

5

An adaptive synchronization protocol for parallel discrete event simulation

Simulation, especially discrete event simulation (DES), is used in a variety of disciplines where numerical methods are difficult or impossible to apply. One problem with this method is that a sufficiently detailed simulation may take hours or days to execute, and multiple runs may be needed in order to generate the desired results. Parallel discrete event simulation (PDES) has been explored for many years as a method to decrease the time taken to execute a simulation. Many protocols have been developed which work well for particular types of simulations, but perform poorly when used for other types of simulations. Often it is difficult to know a priori whether a particular protocol is appropriate for a given problem. In this work, an adaptive synchronization method (ASM) is developed which works well on an entire spectrum of problems. The ASM determines, using an artificial neural network (ANN), the likelihood that a particular event is safe to process.

Bisset, K.R.

1998-12-01

6

Parallel discrete-event simulation of FCFS stochastic queueing networks

NASA Technical Reports Server (NTRS)

Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments), which has proven itself to be effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction on a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations, give performance data that demonstrates the method's effectiveness under moderate to heavy loads, and discuss performance tradeoffs between the quality of lookahead, and the cost of computing lookahead.

Nicol, David M.

1988-01-01
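The lookahead idea in this record can be made concrete with a small sketch. For a FCFS server whose service times have a known lower bound, the earliest possible departure of any future arrival is determined by the current clock and backlog, so the server can promise (an "appointment") to send nothing earlier than that time. This is an illustrative sketch, not the paper's implementation; the function name and the `min_service` parameter are assumptions.

```python
def fcfs_lookahead(clock, busy_until, min_service):
    """Earliest possible future departure from a FCFS server.

    A job arriving now cannot leave before the server drains its current
    backlog (busy_until) and then spends at least min_service on the new
    job, so the server can safely promise (an "appointment") to send no
    message earlier than this time.
    """
    return max(clock, busy_until) + min_service

# An idle server (busy_until <= clock) can still promise clock + min_service:
assert fcfs_lookahead(clock=10.0, busy_until=0.0, min_service=2.5) == 12.5
# A busy server pushes the promise past its backlog:
assert fcfs_lookahead(clock=10.0, busy_until=14.0, min_service=2.5) == 16.5
```

The tighter the bound on future behavior, the further downstream processors can safely advance, which is the quality-versus-cost tradeoff the abstract discusses.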

7

Parallel discrete event simulation: A shared memory approach

NASA Technical Reports Server (NTRS)

With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to insure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.

Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

1987-01-01
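A minimal sketch of the conservative discipline behind the Chandy-Misra algorithm, assuming timestamp-ordered input channels; the class and method names are invented for illustration. A logical process may only consume events no later than the minimum clock across its input channels; a null message advances a channel clock without carrying an event, which is what prevents deadlock.

```python
import collections

class LogicalProcess:
    """Conservative (Chandy-Misra style) LP: it may safely process any
    queued event with a timestamp no later than the minimum clock over
    its input channels."""

    def __init__(self, channels):
        self.queues = {c: collections.deque() for c in channels}
        self.chan_clock = {c: 0.0 for c in channels}
        self.processed = []

    def receive(self, channel, timestamp, payload=None):
        # Both a real message (payload) and a null message (payload=None)
        # promise that nothing earlier will ever arrive on this channel.
        self.chan_clock[channel] = timestamp
        if payload is not None:
            self.queues[channel].append((timestamp, payload))

    def step(self):
        safe_until = min(self.chan_clock.values())
        for q in self.queues.values():
            while q and q[0][0] <= safe_until:
                self.processed.append(q.popleft())
        return safe_until

# Usage: a real event on channel "a", then a null message on "b" unblocks it.
lp = LogicalProcess(["a", "b"])
lp.receive("a", 5.0, "x")
assert lp.step() == 0.0 and lp.processed == []  # blocked: "b" is still at t=0
lp.receive("b", 6.0)                            # null message: a promise only
assert lp.step() == 5.0
assert lp.processed == [(5.0, "x")]
```

The blocking seen above is the synchronization overhead the abstract measures: with poor lookahead, LPs spend most of their time waiting on channel clocks rather than processing events.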

8

Synchronous parallel system for emulation and discrete event simulation

NASA Technical Reports Server (NTRS)

A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to state variables of the simulation object attributable to the event object, and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring the events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.

Steinman, Jeffrey S. (inventor)

1992-01-01

9

Synchronous Parallel System for Emulation and Discrete Event Simulation

NASA Technical Reports Server (NTRS)

A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to the state variables of the simulation object attributable to the event object and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.

Steinman, Jeffrey S. (Inventor)

2001-01-01
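The local/global event horizon determination described in these two patent records can be sketched briefly; the data layout (a list, per node, of the timestamps of messages generated but not yet released) is an assumption for illustration.

```python
def global_event_horizon(pending_by_node):
    """Each node's local horizon is the earliest timestamp among the
    messages it has generated but not yet released; the global horizon
    is the minimum over nodes.  Every processed event earlier than this
    time can never be superseded, so its state changes and messages may
    be committed."""
    local_horizons = [min(ts) for ts in pending_by_node if ts]
    return min(local_horizons, default=float("inf"))

# Three nodes holding freshly generated message timestamps:
pending = [[12.0, 9.5], [8.0], [15.0, 11.0]]
assert global_event_horizon(pending) == 8.0  # nothing before t=8.0 can be undone
```

Holding messages back until the horizon is known is what lets the scheme discard superseded state changes without the antimessages of classic optimistic protocols.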

10

Parallel discrete-event simulation schemes with heterogeneous processing elements

NASA Astrophysics Data System (ADS)

To understand the effects of nonidentical processing elements (PEs) on parallel discrete-event simulation (PDES) schemes, two stochastic growth models, the restricted solid-on-solid (RSOS) model and the Family model, are investigated by simulations. The RSOS model is the model for the PDES scheme governed by the Kardar-Parisi-Zhang equation (KPZ scheme). The Family model is the model for the scheme governed by the Edwards-Wilkinson equation (EW scheme). Two kinds of distributions for nonidentical PEs are considered. In the first kind, computing capacities of PEs are not much different, whereas in the second kind the capacities are extremely widespread. The KPZ scheme on complex networks shows synchronizability and scalability regardless of the kind of PEs. The EW scheme never shows synchronizability for the random configuration of PEs of the first kind. However, by regularizing the arrangement of PEs of the first kind, the EW scheme can be made to show synchronizability. In contrast, the EW scheme never shows synchronizability for any configuration of PEs of the second kind.

Kim, Yup; Kwon, Ikhyun; Chae, Huiseung; Yook, Soon-Hyung

2014-07-01

11

An Adaptive Synchronization Protocol for Parallel Discrete Event Simulation Keith R. Bisset

... a message, the event that is contained in the message is scheduled for execution. The source LP ... An Adaptive Synchronization Protocol for Parallel Discrete Event Simulation. Keith R. Bisset. ... Simulation, especially discrete event simulation (DES), is used in a variety of disciplines where numerical methods ...

Marathe, Achla

12

Discrete event system simulation

This book provides a basic treatment of one of the most widely used operations research tools: discrete-event simulation. Prerequisites are calculus, probability theory, and elementary statistics. Contents, abridged: Introduction to discrete-event system simulation. Mathematical and statistical models. Random numbers. Analysis of simulation data. Index.

J. Banks; J. S. Carson

1984-01-01

13

SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation

NASA Technical Reports Server (NTRS)

Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.

Steinman, Jeff S.

1992-01-01

14

Application of Parallel Discrete Event Simulation to the Space Surveillance Network

NASA Astrophysics Data System (ADS)

In this paper we describe how and why we chose parallel discrete event simulation (PDES) as the paradigm for modeling the Space Surveillance Network (SSN) in our modeling framework, TESSA (Testbed Environment for Space Situational Awareness). DES is a simulation paradigm appropriate for systems dominated by discontinuous state changes at times that must be calculated dynamically. It is used primarily for complex man-made systems like telecommunications, vehicular traffic, computer networks, economic models etc., although it is also useful for natural systems that are not described by equations, such as particle systems, population dynamics, epidemics, and combat models. It is much less well known than simple time-stepped simulation methods, but has the great advantage of being time scale independent, so that one can freely mix processes that operate at time scales over many orders of magnitude with no runtime performance penalty. In simulating the SSN we model in some detail: (a) the orbital dynamics of up to 10^5 objects, (b) their reflective properties, (c) the ground- and space-based sensor systems in the SSN, (d) the recognition of orbiting objects and determination of their orbits, (e) the cueing and scheduling of sensor observations, (f) the 3-d structure of satellites, and (g) the generation of collision debris. TESSA is thus a mixed continuous-discrete model. But because many different types of discrete objects are involved with such a wide variation in time scale (milliseconds for collisions, hours for orbital periods) it is suitably described using discrete events. The PDES paradigm is surprising and unusual. In any instantaneous runtime snapshot some parts may be far ahead in simulation time while others lag behind, yet the required causal relationships are always maintained and synchronized correctly, exactly as if the simulation were executed sequentially.
The TESSA simulator is custom-built, conservatively synchronized, and designed to scale to thousands of nodes. There are many PDES platforms we might have used, but two requirements led us to build our own. First, the parallel components of our SSN simulation are coded and maintained by separate teams, so TESSA is designed to support transparent coupling and interoperation of separately compiled components written in any of six programming languages. Second, conventional PDES simulators are designed so that while the parallel components run concurrently, each of them is internally sequential, whereas for TESSA we needed to support MPI-based parallelism within each component. The TESSA simulator is still a work in progress and currently has some significant limitations. The paper describes those as well.

Jefferson, D.; Leek, J.

2010-09-01
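The time-scale independence claimed in this record comes from the basic DES mechanism: a timestamp-ordered pending-event queue whose clock jumps straight from one event to the next. A minimal sequential sketch of that core (the function name, handler signature, and event tuple layout are assumptions, not TESSA's API):

```python
import heapq

def run(events, handlers, t_end=float("inf")):
    """Bare-bones sequential DES core of the kind a PDES engine
    parallelizes: a priority queue ordered by timestamp.  Because the
    clock jumps to the next event, millisecond-scale collisions and
    hour-scale orbital passes coexist with no time-stepping penalty."""
    queue = list(events)                  # items are (time, kind, data)
    heapq.heapify(queue)
    log = []
    while queue and queue[0][0] <= t_end:
        t, kind, data = heapq.heappop(queue)
        log.append((t, kind))
        for new_event in handlers[kind](t, data):  # handlers may schedule more
            heapq.heappush(queue, new_event)
    return log

# Events 0.004 s and 7200 s apart cost exactly the same to process:
handlers = {"collision": lambda t, d: [], "orbit_pass": lambda t, d: []}
log = run([(0.004, "collision", None), (7200.0, "orbit_pass", None)], handlers)
assert log == [(0.004, "collision"), (7200.0, "orbit_pass")]
```

A PDES engine such as the one described here distributes this queue across nodes and adds synchronization so that each node's log is causally consistent with a sequential execution.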

15

Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms

With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware as is traditionally done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the bad scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature in the sense that any fairness-based VM scheduler implementation would exhibit this mismatch with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in the run time of the PDES-optimized scheduler relative to the regular VM scheduler, with over 20× reduction in run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.

Yoginath, Srikanth B. [ORNL]; Perumalla, Kalyan S. [ORNL]

2013-01-01

16

NASA Technical Reports Server (NTRS)

The present invention is embodied in a method of performing object-oriented simulation and a system having inter-connected processor nodes operating in parallel to simulate mutual interactions of a set of discrete simulation objects distributed among the nodes as a sequence of discrete events changing state variables of respective simulation objects so as to generate new event-defining messages addressed to respective ones of the nodes. The object-oriented simulation is performed at each one of the nodes by assigning passive self-contained simulation objects to each one of the nodes, responding to messages received at one node by generating corresponding active event objects having user-defined inherent capabilities and individual time stamps and corresponding to respective events affecting one of the passive self-contained simulation objects of the one node, restricting the respective passive self-contained simulation objects to only providing and receiving information from the respective active event objects, requesting information and changing variables within a passive self-contained simulation object by the active event object, and producing corresponding messages specifying events resulting therefrom by the active event objects.

Steinman, Jeffrey S. (Inventor)

1998-01-01

17

NASA Astrophysics Data System (ADS)

Non-equilibrium surface growth for competitive growth models in (1+1) dimensions, particularly mixtures of random deposition (RD) with a correlated growth process which occurs with probability p, are studied. The composite mixtures are found to be in the universality class of the correlated growth process, and a nonuniversal exponent delta is identified in the scaling in p. The only effects of the RD admixture are dilations of the time and height scales which result in a slowdown of the dynamics of building up the correlations. The bulk morphology is taken into account and is reflected in the surface roughening, as well as the scaling behavior. It is found that the continuum equations and scaling laws for RD added, in particular, to Kardar-Parisi-Zhang (KPZ) processes are partly determined from the underlying bulk structures. Nonequilibrium surface growth analysis is also applied to a study of the static and dynamic load balancing for a conservative update algorithm for Parallel Discrete Event Simulations (PDES). This load balancing is governed by the KPZ equation. For uneven load distributions in conservative PDES simulations, the simulated (virtual) time horizon (VTH) per Processing Element (PE) and the simulated time horizon per volume element N_v are used to study the PEs' progress in terms of utilization. The width of these time horizons relates to the desynchronization of the system of processors, and is related to the memory requirements of the PEs. The utilization increases when dynamic, rather than static, load balancing is performed.

Verma, Poonam Santosh

18

Performance bounds on parallel self-initiating discrete-event simulations

NASA Technical Reports Server (NTRS)

The use is considered of massively parallel architectures to execute discrete-event simulations of what is termed self-initiating models. A logical process in a self-initiating model schedules its own state re-evaluation times, independently of any other logical process, and sends its new state to other logical processes following the re-evaluation. The interest is in the effects of that communication on synchronization. The performance is considered of various synchronization protocols by deriving upper and lower bounds on optimal performance, upper bounds on Time Warp's performance, and lower bounds on the performance of a new conservative protocol. The analysis of Time Warp includes the overhead costs of state-saving and rollback. The analysis points out sufficient conditions for the conservative protocol to outperform Time Warp. The analysis also quantifies the sensitivity of performance to message fan-out, lookahead ability, and the probability distributions underlying the simulation.

Nicol, David M.

1990-01-01

19

Discrete-Event Simulation and the Event Horizon Part 2: Event List Management

The event horizon is a very important concept that applies to both parallel and sequential discrete-event simulations. By exploiting the event horizon, parallel simulations can process events optimistically in a risk-free manner (i.e., without requiring antimessages) using adaptable ...

Jeffrey S. Steinman

1996-01-01

20

Discrete event simulation in the artificial intelligence environment

Discrete Event Simulations performed in an Artificial Intelligence (AI) environment provide benefits in two major areas. The productivity provided by Object Oriented Programming, Rule Based Programming, and AI development environments allows simulations to be developed and maintained more efficiently than conventional environments allow. Secondly, the use of AI techniques allows direct simulation of human decision making processes and Command and Control aspects of a system under study. An introduction to AI techniques is presented. Two discrete event simulations produced in these environments are described. Finally, a software engineering methodology is discussed that allows simulations to be designed for use in these environments. 3 figs.

Egdorf, H.W.; Roberts, D.J.

1987-01-01

21

Flexible model for analyzing production systems with discrete event simulation

This paper presents the structure of a flexible discrete event simulation model for analyzing production systems. Based on BOM and routing information, a simulation model is generated to analyze a shop floor structure. Different modules are used for generating customer orders and production orders and handling the material flow until the customer is satisfied. The basic idea is that the ...

Alexander Hubl; Klaus Altendorfer; Herbert Jodlbauer; Margaretha Gansterer; Richard F. Hartl

2011-01-01

22

Optimization of Operations Resources via Discrete Event Simulation Modeling

NASA Technical Reports Server (NTRS)

The resource levels required for operation and support of reusable launch vehicles are typically defined through discrete event simulation modeling. Minimizing these resources constitutes an optimization problem involving discrete variables and simulation. Conventional approaches to solve such optimization problems involving integer valued decision variables are the pattern search and statistical methods. However, in a simulation environment that is characterized by search spaces of unknown topology and stochastic measures, these optimization approaches often prove inadequate. In this paper, we have explored the applicability of genetic algorithms to the simulation domain. Genetic algorithms provide a robust search strategy that does not require continuity and differentiability of the problem domain. The genetic algorithm successfully minimized the operation and support activities for a space vehicle, through a discrete event simulation model. The practical issues associated with simulation optimization, such as stochastic variables and constraints, were also taken into consideration.

Joshi, B.; Morris, D.; White, N.; Unal, R.

1996-01-01
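As a hedged illustration of the approach in this record, the sketch below runs a toy genetic algorithm over integer resource levels against a stand-in cost function. The cost model, bounds, and all names are invented; in the paper's setting the cost of a candidate would be evaluated by running the discrete event simulation, which is exactly why a derivative-free search like a GA fits.

```python
import random

def genetic_minimize(cost, bounds, pop=20, gens=40, seed=1):
    """Toy genetic algorithm over integer resource levels: no gradients
    or continuity assumed, just truncation selection, uniform crossover,
    and point mutation on integer vectors within `bounds`."""
    rng = random.Random(seed)
    rand_ind = lambda: [rng.randint(lo, hi) for lo, hi in bounds]
    population = [rand_ind() for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=cost)
        parents = population[: pop // 2]      # keep the cheaper half
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            child = [rng.choice(g) for g in zip(a, b)]  # uniform crossover
            i = rng.randrange(len(child))               # point mutation
            lo, hi = bounds[i]
            child[i] = rng.randint(lo, hi)
            children.append(child)
        population = parents + children
    return min(population, key=cost)

# Hypothetical cost: staffing cost plus a large penalty when the two
# resource types cannot cover a workload of 12 units.
cost = lambda x: 3 * x[0] + 5 * x[1] + (0 if x[0] + 2 * x[1] >= 12 else 100)
best = genetic_minimize(cost, bounds=[(0, 10), (0, 10)])
assert best[0] + 2 * best[1] >= 12   # the search settles on a feasible mix
```

Constraints are handled here by penalty, mirroring the paper's remark that stochastic variables and constraints need special treatment in simulation optimization.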

23

Synchronization of autonomous objects in discrete event simulation

NASA Technical Reports Server (NTRS)

Autonomous objects in event-driven discrete event simulation offer the potential to combine the freedom of unrestricted movement and positional accuracy through Euclidean space of time-driven models with the computational efficiency of event-driven simulation. The principal challenge to autonomous object implementation is object synchronization. The concept of a spatial blackboard is offered as a potential methodology for synchronization. The issues facing implementation of a spatial blackboard are outlined and discussed.

Rogers, Ralph V.

1990-01-01

24

Discrete-Event Simulation of Health Care Systems

Over the past forty years, health care organizations have faced ever-increasing pressures to deliver quality care while facing rising costs, lower reimbursements, and new regulatory demands. Discrete-event simulation has become a popular and effective decision-making tool for the optimal allocation of scarce health care resources to improve patient flow, while minimizing health care delivery costs and increasing patient satisfaction. The ...

Sheldon H. Jacobson; Shane N. Hall; James R. Swisher

25

A survey of recent advances in discrete input parameter discrete-event simulation optimization

Discrete-event simulation optimization is a problem of significant interest to practitioners interested in extracting useful information about an actual (or yet to be designed) system that can be modeled using discrete-event simulation. This paper presents a survey of the literature on discrete-event simulation optimization published in recent years (1988 to the present), with a particular focus on discrete input parameter

JAMES R. SWISHER; PAUL D. HYDEN; SHELDON H. JACOBSON; LEE W. SCHRUBEN

2004-01-01

26

A Parallel Discrete Event IP Network Emulator Russell Bradford y , Rob Simmonds and Brian Unger

A Parallel Discrete Event IP Network Emulator. Russell Bradford, Rob Simmonds, and Brian Unger. ... that can act as a real-time network emulator. Real Internet Protocol (IP) traffic generated by application programs running on user workstations can interact with modelled traffic in the emulator, thus providing ...

Bradford, Russell

27

Predicting Liver Transplant Capacity Using Discrete Event Simulation.

The number of liver transplants (LTs) performed in the US increased until 2006 but has since declined despite an ongoing increase in demand. This decline may be due in part to decreased donor liver quality and increasing discard of poor-quality livers. We constructed a discrete event simulation (DES) model informed by current donor characteristics to predict future LT trends through the year 2030. The data source for our model is the United Network for Organ Sharing database, which contains patient-level information on all organ transplants performed in the US. Previous analysis showed that liver discard is increasing and that discarded organs are more often from donors who are older, are obese, have diabetes, and donated after cardiac death. Given that the prevalence of these factors is increasing, the DES model quantifies the reduction in the number of LTs performed through 2030. In addition, the model estimates the total number of future donors needed to maintain the current volume of LTs and the effect of a hypothetical scenario of improved reperfusion technology. We also forecast the number of patients on the waiting list and compare this with the estimated number of LTs to illustrate the impact that decreased LTs will have on patients needing transplants. By altering assumptions about the future donor pool, this model can be used to develop policy interventions to prevent a further decline in this lifesaving therapy. To our knowledge, there are no similar predictive models of future LT use based on epidemiological trends. PMID: 25391681

Toro-Diaz, Hector; Mayorga, Maria E; Barritt, A Sidney; Orman, Eric S; Wheeler, Stephanie B

2014-11-12

28

Enhancing Complex System Performance Using Discrete-Event Simulation

In this paper, we utilize discrete-event simulation (DES) merged with human factors analysis to provide the venue within which the separation and deconfliction of the system/human operating principles can occur. A concrete example is presented to illustrate the performance enhancement gains for an aviation cargo flow and security inspection system achieved through the development and use of a process DES. The overall performance of the system is computed, analyzed, and optimized for the different system dynamics. Various performance measures are considered such as system capacity, residual capacity, and total number of pallets waiting for inspection in the queue. These metrics are performance indicators of the system's ability to service current needs and respond to additional requests. We studied and analyzed different scenarios by changing various model parameters such as the number of pieces per pallet ratio, number of inspectors and cargo handling personnel, number of forklifts, number and types of detection systems, inspection modality distribution, alarm rate, and cargo closeout time. The increased physical understanding resulting from execution of the queuing model utilizing these vetted performance measures identified effective ways to meet inspection requirements while maintaining or reducing overall operational cost and eliminating any shipping delays associated with any proposed changes in inspection requirements. With this understanding effective operational strategies can be developed to optimally use personnel while still maintaining plant efficiency, reducing process interruptions, and holding or reducing costs.

Allgood, Glenn O. [ORNL]; Olama, Mohammed M. [ORNL]; Lake, Joe E. [ORNL]

2010-01-01

29

Dessert, an Open-Source .NET Framework for Process-Based Discrete-Event Simulation

Dessert, an Open-Source .NET Framework for Process-Based Discrete-Event Simulation. Giovanni Lagorio, Alessio Parma (Genova, Italy). Abstract: We present Dessert, an open-source framework for process-based discrete-event simulation, designed to retain the simplicity and flexibility of Sim ...

Robbiano, Lorenzo

30

This paper reports on the progress made toward the emergence of standards to support the integration of heterogeneous discrete-event simulations (DESs) created in specialist support tools called commercial-off-the-shelf (COTS) discrete-event simulation packages (CSPs). The general standard for heterogeneous integration in this area has been developed from research in distributed simulation and is the IEEE 1516 standard. The ...

Simon J. E. Taylor; Xiaoguang Wang; Stephen John Turner; Malcolm Yoke-hean Low

2006-01-01

31

Discrete event simulation and production system design for Rockwell hardness test blocks

The research focuses on increasing production volume and decreasing costs at a hardness test block manufacturer. A discrete event simulation model is created to investigate potential system wide improvements. Using the ...

Scheinman, David Eliot

2009-01-01

32

Modeling and analyzing a physician clinic environment using discrete-event (visual) simulation

This paper examines the design and development of a discrete-event (visual) simulation model of a physician clinic environment within a physician network. Biological & Popular Culture, Inc. (Biopop) sought to partner with healthcare professionals to provide high-quality, cost-effective medical care within a physician network setting. Towards this end, a discrete-event (visual) simulation model that captures both the operations of a

James R. Swisher; Sheldon H. Jacobson; J. Brian Jun; Osman Balci

2001-01-01

33

A generic framework for real-time discrete event simulation (DES) modelling

This paper suggests a generic simulation platform that can be used for real-time discrete event simulation modeling. The architecture of the proposed system is based on a tested flexible input data architecture developed in LabVIEW, and a real-time inter-process communication module between the LabVIEW application and a discrete event simulation software (in this case Arena). Two example applications ...

Siamak Tavakoli; Alireza Mousavi; Alexander Komashie

2008-01-01

34

Evaluating the Design of a Family Practice Healthcare Clinic Using Discrete-Event Simulation

With increased pressures from governmental and insurance agencies, today's physician devotes less time to patient care and more time to administration. To assist physician clinics in evaluating potential operating procedures that improve operating efficiencies and better satisfy patients, an object-oriented discrete-event simulation model has been constructed using the Visual Simulation Environment (VSE). The research presented herein describes a methodology for

James R. Swisher; Sheldon H. Jacobson

2002-01-01

35

Discrete-event based simulation conceptual modeling of systems biology

Protein production from DNA to protein via RNA is a very complicated process, often referred to as the central dogma. In this paper, we used event-based simulation to model, simulate, analyze and specify the three main processes that are involved in protein production: replication, transcription, and translation. The whole control flow of event-based simulation is composed

Joe W. Yeol; Issac Barjis; Yeong S. Ryu; Joseph Barjis

2005-01-01

36

Discrete-event simulation on the World Wide Web using Java

This paper introduces Simkit, a small set of Java classes for creating discrete event simulation models. Simkit may be used to either implement stand-alone models or Web page applets. Exploiting network capabilities of Java, the lingua franca of the World Wide Web (WWW), Simkit models can easily be implemented as applets and executed in a Web browser. Java's graphical capabilities

Arnold H. Buss; Kirk A. Stork

1996-01-01

37

An important use for discrete-event simulation models lies in comparing and contrasting competing design alternatives without incurring any physical costs. This article presents a survey of the literature for two widely used classes of statistical methods for selecting the best design from among a finite set of k alternatives: ranking and selection (R&S) and multiple comparison procedures (MCPs). A comprehensive

James R. Swisher; Sheldon H. Jacobson; Enver Yücesan

2003-01-01

38

A survey of ranking, selection, and multiple comparison procedures for discrete-event simulation

Discrete event simulation models are often constructed so that an analyst may compare two or more competing design alternatives. The paper presents a survey of the literature for two widely used statistical methods for selecting the best design from among a finite set of k alternatives: ranking and selection (R&S) and multiple comparison procedures (MCPs). A comprehensive survey of each

James R. Swisher; Sheldon H. Jacobson

1999-01-01

39

Component-based Discrete Event Simulation Using the Fractal Component Model

Component-based Discrete Event Simulation Using the Fractal Component Model. Olivier Dalle, MASCOTTE, France. E-mail: Olivier.Dalle@sophia.inria.fr. Keywords: Component-based Software Engineering, Fractal Component model. Abstract: In this paper we show that Fractal, a generic

Bermond, Jean-Claude

40

Combining latin hypercube designs and discrete event simulation in a study of a surgical unit

Summary form only given: In this article experiments on a discrete event simulation model for an orthopedic surgery are considered. The model is developed as part of a larger project in co-operation with Copenhagen University Hospital in Gentofte. Experiments on the model are performed by using Latin hypercube designs. The parameter set consists of system settings such as use of preparation

C. Dehlendorff; M. Kulahci; K. K. Andersen

2007-01-01

41

DISCRETE EVENT SIMULATION OF OPTICAL SWITCH MATRIX PERFORMANCE IN COMPUTER NETWORKS

In this paper, we present application of a Discrete Event Simulator (DES) for performance modeling of optical switching devices in computer networks. Network simulators are valuable tools in situations where one cannot investigate the system directly. This situation may arise if the system under study does not exist yet or the cost of studying the system directly is prohibitive. Most available network simulators are based on the paradigm of discrete-event-based simulation. As computer networks become increasingly larger and more complex, sophisticated DES tool chains have become available for both commercial and academic research. Some well-known simulators are NS2, NS3, OPNET, and OMNEST. For this research, we have applied OMNEST for the purpose of simulating multi-wavelength performance of optical switch matrices in computer interconnection networks. Our results suggest that the application of DES to computer interconnection networks provides valuable insight in device performance and aids in topology and system optimization.

Imam, Neena [ORNL; Poole, Stephen W [ORNL

2013-01-01

42

This paper presents a literature survey on recent use of discrete-event simulation in real-world manufacturing logistics decision-making. The sample of the survey consists of 52 relevant application papers from recent Winter Simulation Conference proceedings. We investigated what decisions were supported by the applications, case company characteristics, some methodological issues, and the software tools used. We found that the majority of

Marco Semini; Hakon Fauske; Jan Ola Strandhagen

2006-01-01

43

Discrete event model-based simulation for train movement on a single-line railway

NASA Astrophysics Data System (ADS)

The aim of this paper is to present a discrete event model-based approach to simulate train movement with the considered energy-saving factor. We conduct extensive case studies to show the dynamic characteristics of the traffic flow and demonstrate the effectiveness of the proposed approach. The simulation results indicate that the proposed discrete event model-based simulation approach is suitable for characterizing the movements of a group of trains on a single railway line with less iterations and CPU time. Additionally, some other qualitative and quantitative characteristics are investigated. In particular, because of the cumulative influence from the previous trains, the following trains should be accelerated or braked frequently to control the headway distance, leading to more energy consumption.

Xu, Xiao-Ming; Li, Ke-Ping; Yang, Li-Xing

2014-08-01

44

A survey of ranking, selection, and multiple comparison procedures for discrete-event simulation

Discrete-event simulation models are often constructed so that an analyst may compare two or more competing design alternatives. This paper presents a survey of the literature for two widely-used statistical methods for selecting the best design from among a finite set of k alternatives: ranking and selection (R&S) and multiple comparison procedures (MCPs). A comprehensive survey of each topic is

James R. Swisher; Sheldon H. Jacobson

1999-01-01

45

This study, through the use of discrete event simulation and modeling, explores various prioritization disciplines for U.S. Air Force Military Family Housing maintenance, repair, and renovation projects. Actual data from the Military Family Housing...

Krukenberg, Harry J.

1996-01-01

46

In most decision-analytic models in health care, it is assumed that there is treatment without delay and availability of all required resources. Therefore, waiting times caused by limited resources and their impact on treatment effects and costs often remain unconsidered. Queuing theory enables mathematical analysis and the derivation of several performance measures of queuing systems. Nevertheless, an analytical approach with closed formulas is not always possible. Therefore, simulation techniques are used to evaluate systems that include queuing or waiting, for example, discrete event simulation. To include queuing in decision-analytic models requires a basic knowledge of queuing theory and of the underlying interrelationships. This tutorial introduces queuing theory. Analysts and decision-makers get an understanding of queue characteristics, modeling features, and its strength. Conceptual issues are covered, but the emphasis is on practical issues like modeling the arrival of patients. The treatment of coronary artery disease with percutaneous coronary intervention including stent placement serves as an illustrative queuing example. Discrete event simulation is applied to explicitly model resource capacities, to incorporate waiting lines and queues in the decision-analytic modeling example. PMID:20345550

Jahn, Beate; Theurl, Engelbert; Siebert, Uwe; Pfeiffer, Karl-Peter

2010-01-01
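The closed-form measures that queuing theory supplies, before one resorts to discrete event simulation, can be sketched for the simplest M/M/1 case; the rates below are illustrative values, not figures from the tutorial:

```python
# Standard M/M/1 queue formulas: Poisson arrivals at rate lambda_, a single
# exponential server at rate mu. Valid only for a stable queue (lambda_ < mu).
def mm1_measures(lambda_, mu):
    if lambda_ >= mu:
        raise ValueError("queue is unstable: need lambda_ < mu")
    rho = lambda_ / mu                          # server utilization
    mean_in_system = rho / (1 - rho)            # L (Little's law: L = lambda * W)
    mean_time_in_system = 1 / (mu - lambda_)    # W
    mean_wait_in_queue = rho / (mu - lambda_)   # Wq
    return {"utilization": rho,
            "mean_in_system": mean_in_system,
            "mean_time_in_system": mean_time_in_system,
            "mean_wait_in_queue": mean_wait_in_queue}

print(mm1_measures(0.5, 1.0))
```

When no such closed formula exists, which is the situation the tutorial addresses, discrete event simulation estimates the same measures empirically.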

47

DeMO: An Ontology for Discrete-event Modeling and Simulation

Several fields have created ontologies for their subdomains. For example, the biological sciences have developed extensive ontologies such as the Gene Ontology, which is considered a great success. Ontologies could provide similar advantages to the Modeling and Simulation community. They provide a way to establish common vocabularies and capture knowledge about a particular domain with community-wide agreement. Ontologies can support significantly improved (semantic) search and browsing, integration of heterogeneous information sources, and improved knowledge discovery capabilities. This paper discusses the design and development of an ontology for Modeling and Simulation called the Discrete-event Modeling Ontology (DeMO), and it presents prototype applications that demonstrate various uses and benefits that such an ontology may provide to the Modeling and Simulation community. PMID:22919114

Silver, Gregory A; Miller, John A; Hybinette, Maria; Baramidze, Gregory; York, William S

2011-01-01

48

Incorporating discrete event simulation into quality improvement efforts in health care systems.

Quality improvement (QI) efforts are an indispensable aspect of health care delivery, particularly in an environment of increasing financial and regulatory pressures. The ability to test predictions of proposed changes to flow, policy, staffing, and other process-level changes using discrete event simulation (DES) has shown significant promise and is well reported in the literature. This article describes how to incorporate DES into QI departments and programs in order to support QI efforts, develop high-fidelity simulation models, conduct experiments, make recommendations, and support adoption of results. The authors describe how DES-enabled QI teams can partner with clinical services and administration to plan, conduct, and sustain QI investigations. PMID:24324280

Rutberg, Matthew Harris; Wenczel, Sharon; Devaney, John; Goldlust, Eric Jonathan; Day, Theodore Eugene

2015-01-01

49

NASA Technical Reports Server (NTRS)

CONFIG is a modeling and simulation tool prototype for analyzing the normal and faulty qualitative behaviors of engineered systems. Qualitative modeling and discrete-event simulation have been adapted and integrated, to support early development, during system design, of software and procedures for management of failures, especially in diagnostic expert systems. Qualitative component models are defined in terms of normal and faulty modes and processes, which are defined by invocation statements and effect statements with time delays. System models are constructed graphically by using instances of components and relations from object-oriented hierarchical model libraries. Extension and reuse of CONFIG models and analysis capabilities in hybrid rule- and model-based expert fault-management support systems are discussed.

Malin, Jane T.; Basham, Bryan D.

1989-01-01

50

Statistical and Probabilistic Extensions to Ground Operations' Discrete Event Simulation Modeling

NASA Technical Reports Server (NTRS)

NASA's human exploration initiatives will invest in technologies, public/private partnerships, and infrastructure, paving the way for the expansion of human civilization into the solar system and beyond. As it is has been for the past half century, the Kennedy Space Center will be the embarkation point for humankind's journey into the cosmos. Functioning as a next generation space launch complex, Kennedy's launch pads, integration facilities, processing areas, launch and recovery ranges will bustle with the activities of the world's space transportation providers. In developing this complex, KSC teams work through the potential operational scenarios: conducting trade studies, planning and budgeting for expensive and limited resources, and simulating alternative operational schemes. Numerous tools, among them discrete event simulation (DES), were matured during the Constellation Program to conduct such analyses with the purpose of optimizing the launch complex for maximum efficiency, safety, and flexibility while minimizing life cycle costs. Discrete event simulation is a computer-based modeling technique for complex and dynamic systems where the state of the system changes at discrete points in time and whose inputs may include random variables. DES is used to assess timelines and throughput, and to support operability studies and contingency analyses. It is applicable to any space launch campaign and informs decision-makers of the effects of varying numbers of expensive resources and the impact of off nominal scenarios on measures of performance. In order to develop representative DES models, methods were adopted, exploited, or created to extend traditional uses of DES. The Delphi method was adopted and utilized for task duration estimation. DES software was exploited for probabilistic event variation. A roll-up process was used, which was developed to reuse models and model elements in other less - detailed models. 
The DES team continues to innovate and expand DES capabilities to address KSC's planning needs.

Trocine, Linda; Cummings, Nicholas H.; Bazzana, Ashley M.; Rychlik, Nathan; LeCroy, Kenneth L.; Cates, Grant R.

2010-01-01
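The definition of DES given in the abstract above (system state changes at discrete points in time, with random-variable inputs) can be sketched as a minimal future-event-list loop; the single-server structure and the rates are our illustrative choices, not the KSC ground-operations models:

```python
import heapq
import random

# Minimal discrete event simulation: a future-event list (min-heap keyed on
# timestamp) drives a single-server queue. State changes only at event times.
def simulate(horizon=100.0, seed=1):
    rng = random.Random(seed)
    queue_len, served = 0, 0
    fel = [(rng.expovariate(1.0), "arrival")]  # (time, event-kind)
    while fel:
        clock, kind = heapq.heappop(fel)
        if clock > horizon:
            break
        if kind == "arrival":
            queue_len += 1
            heapq.heappush(fel, (clock + rng.expovariate(1.0), "arrival"))
            if queue_len == 1:  # server was idle: start service now
                heapq.heappush(fel, (clock + rng.expovariate(1.25), "departure"))
        else:  # departure: finish one job, start the next if any is waiting
            queue_len -= 1
            served += 1
            if queue_len > 0:
                heapq.heappush(fel, (clock + rng.expovariate(1.25), "departure"))
    return served

print(simulate())
```

With a fixed seed the run is reproducible, which is what makes DES usable for the trade studies and contingency analyses the abstract describes.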

51

Sudden Cardiac Death (SCD) is responsible for at least 180,000 deaths a year and incurs an average cost of $286 billion annually in the United States alone. Herein, we present a novel discrete event simulation model of SCD, which quantifies the chains of events associated with the formation, growth, and rupture of atheroma plaques, and the subsequent formation of clots, thrombosis and on-set of arrhythmias within a population. The predictions generated by the model are in good agreement both with results obtained from pathological examinations on the frequencies of three major types of atheroma, and with epidemiological data on the prevalence and risk of SCD. These model predictions allow for identification of interventions and importantly for the optimal time of intervention leading to high potential impact on SCD risk reduction (up to 8-fold reduction in the number of SCDs in the population) as well as the increase in life expectancy. PMID:23648451

Andreev, Victor P.; Head, Trajen; Johnson, Neil; Deo, Sapna K.; Daunert, Sylvia; Goldschmidt-Clermont, Pascal J.

2013-01-01

52

Molecular dynamics simulation based on discrete event simulation (DMD) is emerging as an alternative to time-step driven molecular dynamics (MD). Although DMD improves performance by several orders of magnitude, it is still compute bound. In previous work, we found that FPGAs are extremely well suited to accelerating DMD, with substantial speed-ups being achieved. Large models, however, are

Martin C. Herbordt; Francois Kosie; Josh Model

2008-01-01

53

Evaluating the design of a family practice healthcare clinic using discrete-event simulation.

With increased pressures from governmental and insurance agencies, today's physician devotes less time to patient care and more time to administration. To assist physician clinics in evaluating potential operating procedures that improve operating efficiencies and better satisfy patients, an object-oriented discrete-event simulation model has been constructed using the Visual Simulation Environment (VSE). The research presented herein describes a methodology for determining appropriate staffing and physical resources in a clinical environment using this simulation model. This methodology takes advantage of several simulation-based statistical techniques, including batch means; fractional factorial design; and simultaneous ranking, selection, and multiple comparisons. A clinic effectiveness measure is introduced that captures several objectives within a health care clinic, including profitability and patient satisfaction. An explanation of the experimental design is provided and results of the experimentation are presented. Based upon the experimental results, conclusions are drawn and recommendations are made for an appropriate staffing and facility size for a two physician family practice healthcare clinic. PMID:11993750

Swisher, James R; Jacobson, Sheldon H

2002-04-01

54

In this paper, we present the first published healthcare application of discrete-event simulation embedded in an ant colony optimization model. We consider the problem of choosing optimal screening policies for retinopathy, a serious complication of diabetes. In order to minimize the screening cost per year of sight saved, compared with a situation with no screening, individuals aged between 30 and

Marion S. Rauner; Walter J. Gutjahr; Sally C. Brailsford; Wolfgang Zeppelzauer

55

Comparing Simulation Output Accuracy of Discrete Event and Agent Based Models: A Quantitative Approach

In our research we investigate the output accuracy of discrete event simulation models and agent based simulation models when studying human centric complex systems. In this paper we focus on human reactive behaviour, as it is possible in both modelling approaches to implement human reactive behaviour in the model by using standard methods. As a case study we have chosen the retail sector, and here in particular the operations of the fitting room in the womenswear department of a large UK department store. In our case study we looked at ways of determining the efficiency of implementing new management policies for the fitting room operation through modelling the reactive behaviour of staff and customers of the department. First, we carried out a validation experiment in which we compared the results from our models to the performance of the real system. This experiment also allowed us to establish differences in output accuracy between the two modelling methods. In a second step a multi-scenario experimen...

Majid, Mazlina Abdul; Siebers, Peer-Olaf

2010-01-01

56

Tutorial: Parallel Simulation on Supercomputers

This tutorial introduces typical hardware and software characteristics of extant and emerging supercomputing platforms, and presents issues and solutions in executing large-scale parallel discrete event simulation scenarios on such high performance computing systems. Covered topics include synchronization, model organization, example applications, and observed performance from illustrative large-scale runs.

Perumalla, Kalyan S [ORNL

2012-01-01

57

Combining Simulation and Animation of Queueing Scenarios in a Flash-Based Discrete Event Simulator

eLearning is an effective medium for delivering knowledge and skills to scattered learners. In spite of improvements in electronic delivery technologies, eLearning is still a long way away from offering anything close to efficient and effective learning environments. Much of the literature suggests that eLearning materials which embed simulation will improve the eLearning experience and promise many benefits for both teachers and

Ruzelan Khalid; Wolfgang Kreutzer; Tim Bell

2009-01-01

58

This report outlines a methodology to study the effects of disruptive events on nuclear waste material in stable geologic sites. The methodology is based upon developing a discrete events model that can be simulated on the computer. This methodology allows a natural development of simulation models that use computer resources in an efficient manner. Accurate modeling in this area depends in large part upon accurate modeling of ion transport behavior in the storage media. Unfortunately, developments in this area are not at a stage where there is any consensus on proper models for such transport. Consequently, our work is directed primarily towards showing how disruptive events can be properly incorporated in such a model, rather than as a predictive tool at this stage. When and if proper geologic parameters can be determined, then it would be possible to use this as a predictive model. Assumptions and their bases are discussed, and the mathematical and computer model are described.

Aggarwal, S.; Ryland, S.; Peck, R.

1980-06-19

59

Using machine learning techniques to interpret results from discrete event simulation

Using machine learning techniques to interpret results from discrete event simulation. Dunja Mladenić. The results of two simulators were processed as machine learning problems. Key words: discrete event simulation, machine learning, artificial intelligence

Mladenic, Dunja

60

Background Computer simulation studies of the emergency department (ED) are often patient driven and consider the physician as a human resource whose primary activity is interacting directly with the patient. In many EDs, physicians supervise delegates such as residents, physician assistants and nurse practitioners each with different skill sets and levels of independence. The purpose of this study is to present an alternative approach where physicians and their delegates in the ED are modeled as interacting pseudo-agents in a discrete event simulation (DES) and to compare it with the traditional approach ignoring such interactions. Methods The new approach models a hierarchy of heterogeneous interacting pseudo-agents in a DES, where pseudo-agents are entities with embedded decision logic. The pseudo-agents represent a physician and delegate, where the physician plays a senior role to the delegate (i.e. treats high acuity patients and acts as a consult for the delegate). A simple model without the complexity of the ED is first created in order to validate the building blocks (programming) used to create the pseudo-agents and their interaction (i.e. consultation). Following validation, the new approach is implemented in an ED model using data from an Ontario hospital. Outputs from this model are compared with outputs from the ED model without the interacting pseudo-agents. They are compared based on physician and delegate utilization, patient waiting time for treatment, and average length of stay. Additionally, we conduct sensitivity analyses on key parameters in the model. Results In the hospital ED model, comparisons between the approach with interaction and without showed physician utilization increase from 23% to 41% and delegate utilization increase from 56% to 71%. Results show statistically significant mean time differences for low acuity patients between models. 
Interaction time between physician and delegate results in increased ED length of stay and longer waits for beds. Conclusion This example shows the importance of accurately modeling physician relationships and the roles in which they treat patients. Neglecting these relationships could lead to inefficient resource allocation due to inaccurate estimates of physician and delegate time spent on patient related activities and length of stay. PMID:23692710

2013-01-01

61

Parallelized direct execution simulation of message-passing parallel programs

NASA Technical Reports Server (NTRS)

As massively parallel computers proliferate, there is growing interest in finding ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

1994-01-01

62

A methodology for fabrication of intelligent discrete-event simulation models

In this article a meta-specification for the software requirements and design of intelligent discrete next-event simulation models has been presented. The specification is consistent with established practices for software development as presented in the software engineering literature. The specification has been adapted to take into consideration the specialized needs of object-oriented programming resulting in the actor-centered taxonomy. The heart of the meta-specification is the methodology for requirements specification and design specification of the model. The software products developed by use of the methodology proposed herein are at the leading edge of technology in two very synergistic disciplines - expert systems and simulation. By incorporating simulation concepts into expert systems a deeper reasoning capability is obtained - one that is able to emulate the dynamics or behavior of the object system or process over time. By including expert systems concepts into simulation, the capability to emulate the reasoning functions of decision-makers involved with (and subsumed by) the object system is attained. In either case the robustness of the technology is greatly enhanced.

Morgeson, J.D.; Burns, J.R.

1987-01-01

63

This paper presents a literature survey on recent use of discrete-event simulation in real-world manufacturing logistics decision-making. The sample of the survey consists of 52 relevant application papers from recent Winter Simulation Conference proceedings. We investigated what decisions were supported by the applications, case company characteristics, some methodological issues, and the software tools used. We

Marco Semini; Hakon Fauske; Jan Ola Strandhagen

2006-01-01

64

Forest biomass supply logistics for a power plant using the discrete-event simulation approach

This study investigates the logistics of supplying forest biomass to a potential power plant. Due to the complexities in such a supply logistics system, a simulation model based on the framework of Integrated Biomass Supply Analysis and Logistics (IBSAL) is developed in this study to evaluate the cost of delivered forest biomass, the equilibrium moisture content, and carbon emissions from the logistics operations. The model is applied to a proposed case of 300 MW power plant in Quesnel, BC, Canada. The results show that the biomass demand of the power plant would not be met every year. The weighted average cost of delivered biomass to the gate of the power plant is about C$ 90 per dry tonne. Estimates of equilibrium moisture content of delivered biomass and CO2 emissions resulted from the processes are also provided.

Mobini, Mahdi [University of British Columbia, Vancouver; Sowlati, T. [University of British Columbia, Vancouver; Sokhansanj, Shahabaddine [ORNL

2011-04-01

65

Objective Develop and validate particular, concrete, and abstract yet plausible in silico mechanistic explanations for large intra- and interindividual variability observed for eleven bioequivalence study participants. Do so in the face of considerable uncertainty about mechanisms. Methods We constructed an object-oriented, discrete event model called subject (we use small caps to distinguish computational objects from their biological counterparts). It maps abstractly to a dissolution test system and study subject to whom product was administered orally. A subject comprises four interconnected grid spaces and event mechanisms that map to different physiological features and processes. Drugs move within and between spaces. We followed an established, Iterative Refinement Protocol. Individualized mechanisms were made sufficiently complicated to achieve prespecified Similarity Criteria, but no more so. Within subjects, the dissolution space is linked to both a product-subject Interaction Space and the GI tract. The GI tract and Interaction Space connect to plasma, from which drug is eliminated. Results We discovered parameterizations that enabled the eleven subject simulation results to achieve the most stringent Similarity Criteria. Simulated profiles closely resembled those with normal, odd, and double peaks. We observed important subject-by-formulation interactions within subjects. Conclusion We hypothesize that there were interactions within bioequivalence study participants corresponding to the subject-by-formulation interactions within subjects. Further progress requires methods to transition currently abstract subject mechanisms iteratively and parsimoniously to be more physiologically realistic. As that objective is achieved, the approach presented is expected to become beneficial to drug development (e.g., controlled release) and to a reduction in the number of subjects needed per study plus faster regulatory review. PMID:22938185

2012-01-01

66

Simulating Billion-Task Parallel Programs

In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.

Perumalla, Kalyan S [ORNL] [ORNL; Park, Alfred J [ORNL] [ORNL

2014-01-01

67

A Comparison of Two Methods for Advancing Time In Parallel Discrete Event Simulation

Galluscio, John T. Douglass, Brian A. Malloy, and A. Joe Turner (malloy@cs.clemson.edu), Department of Computer Science. We simulate a traffic flow network using two different approaches: event-driven and time-driven. We begin by designing an efficient event-driven approach to model the traffic network.

Malloy, Brian

68

M/G/C/C state dependent queuing networks consider service rates as a function of the number of residing entities (e.g., pedestrians, vehicles, and products). However, modeling such dynamic rates is not supported in modern Discrete Simulation System (DES) software. We designed an approach to address this limitation and used it to construct the M/G/C/C state-dependent queuing model in Arena software. Using the model, we have evaluated and analyzed the impacts of various arrival rates on the throughput, the blocking probability, the expected service time, and the expected number of entities in a complex network topology. Results indicated that there is a range of arrival rates for each network where the simulation results fluctuate drastically across replications, causing the simulation and analytical results to exhibit discrepancies. Detailed results showing how the simulation results tally with the analytical results, in both abstract and graphical forms, together with scientific justifications, have been documented and discussed. PMID:23560037

Khalid, Ruzelan; Nawawi, Mohd Kamal M; Kawsar, Luthful A; Ghani, Noraida A; Kamil, Anton A; Mustafa, Adli

2013-01-01
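The state-dependent service idea can be sketched as a small discrete-event simulation (an illustration, not the authors' Arena model): the service rate mu(n) degrades as the number n of entities grows, and arrivals that find the system at capacity C are blocked. One simplification in this sketch: each service time is sampled once, at entry, from the then-current rate.

```python
import heapq, random

def simulate(lam=1.0, capacity=10, horizon=10_000.0, seed=42):
    """Estimate the blocking probability of a state-dependent loss queue."""
    rng = random.Random(seed)
    # Assumed rate law: service slows linearly with congestion, floored at 0.05.
    mu = lambda n: max(1.5 * (1.0 - n / (capacity + 1)), 0.05)
    n, arrivals, blocked = 0, 0, 0
    events = [(rng.expovariate(lam), "arrival")]  # (time, kind) event list
    while events:
        t, kind = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "arrival":
            arrivals += 1
            if n < capacity:
                n += 1
                heapq.heappush(events, (t + rng.expovariate(mu(n)), "departure"))
            else:
                blocked += 1  # arrival finds the system full
            heapq.heappush(events, (t + rng.expovariate(lam), "arrival"))
        else:
            n -= 1
    return blocked / max(arrivals, 1)  # estimated blocking probability

print(simulate())
```

Sweeping `lam` over a range reproduces the kind of arrival-rate study the abstract describes, with replication-to-replication variability visible at the sensitive rates.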

70

Terminal Dynamics Approach to Discrete Event Systems

NASA Technical Reports Server (NTRS)

This paper presents and discusses a mathematical formalism for simulation of discrete event dynamics (DED), a special type of 'man-made' system that serves specific purposes of information processing. The main objective of this work is to demonstrate that the mathematical formalism for DED can be based upon a terminal model of Newtonian dynamics, which allows one to relax Lipschitz conditions at some discrete points.

Zak, Michail; Meyers, Ronald

1995-01-01

71

Reasoning about Discrete Event Sources

We investigate the modelling of workflows, plans, and other event-generating processes as discrete event sources and reason about the possibility of having event sequences ending in undesirable states. In previous research, the problem is shown to be NP-Complete even if the number of events to occur is fixed in advance. In this paper, we consider possible event sequences of

Shieu-hong Lin

2006-01-01

72

Background Osteoporotic fractures cause a large health burden and substantial costs. This study estimated the expected fracture numbers and costs for the remaining lifetime of postmenopausal women in Germany. Methods A discrete event simulation (DES) model which tracks changes in fracture risk due to osteoporosis, a previous fracture or institutionalization in a nursing home was developed. Expected lifetime fracture numbers and costs per capita were estimated for postmenopausal women (aged 50 and older) at average osteoporosis risk (AOR) and for those never suffering from osteoporosis. Direct and indirect costs were modeled. Deterministic univariate and probabilistic sensitivity analyses were conducted. Results The expected fracture numbers over the remaining lifetime of a 50 year old woman with AOR for each fracture type (% attributable to osteoporosis) were: hip 0.282 (57.9%), wrist 0.229 (18.2%), clinical vertebral 0.206 (39.2%), humerus 0.147 (43.5%), pelvis 0.105 (47.5%), and other femur 0.033 (52.1%). Expected discounted fracture lifetime costs (excess cost attributable to osteoporosis) per 50 year old woman with AOR amounted to €4,479 (€1,995). Most costs were accrued in the hospital €1,743 (€751) and long-term care sectors €1,210 (€620). Univariate sensitivity analysis resulted in percentage changes between -48.4% (if fracture rates decreased by 2% per year) and +83.5% (if fracture rates increased by 2% per year) compared to base case excess costs. Costs for women with osteoporosis were about 3.3 times of those never getting osteoporosis (€7,463 vs. €2,247), and were markedly increased for women with a previous fracture. Conclusion The results of this study indicate that osteoporosis causes a substantial share of fracture costs in postmenopausal women, which strongly increase with age and previous fractures. PMID:24981316

2014-01-01

73

NASA Technical Reports Server (NTRS)

This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

Nicol, David; Fujimoto, Richard

1992-01-01
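The sequential baseline that the surveyed parallel protocols generalize is a minimal event-list loop; this sketch is generic, not taken from the survey.

```python
import heapq

# A minimal sequential discrete-event simulator: events are processed
# strictly in timestamp order, and handlers may schedule further events.

class Simulator:
    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._seq = 0  # tie-breaker so equal-time events stay FIFO-ordered

    def schedule(self, delay, action):
        heapq.heappush(self._queue, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action(self)

log = []
sim = Simulator()
sim.schedule(2.0, lambda s: log.append(("b", s.now)))
sim.schedule(1.0, lambda s: (log.append(("a", s.now)),
                             s.schedule(5.0, lambda s2: log.append(("c", s2.now)))))
sim.run()
print(log)  # events fire in timestamp order: a@1.0, b@2.0, c@6.0
```

Parallel protocols (conservative or optimistic) distribute this loop across processors while preserving the same timestamp-order semantics.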

74

DISCRETE EVENT MODELING IN PTOLEMY II

Abstract: This report describes the discrete-event semantics and its implementation in the Ptolemy II software architecture. The discrete-event system representation is appropriate for time-oriented systems such as queueing systems, communication networks, and hardware systems. A key strength of our discrete-event implementation is that simultaneous events are handled systematically and deterministically. A formal and rigorous treatment of this property is given.

Lukito Muliadi

75

We review the design, selected applications and parallel performance of WiPPET, a general parallel simulation testbed for various types of wireless networks. WiPPET has been written in TeD/C++, a new object-oriented modeling framework that isolates network modeling from the underlying parallel discrete event simulator. We describe the techniques for modeling radio propagation (long and short-scale fading and interference) and protocols that promote scalability of parallel simulations at session

Owen Kelly; Jie Lai; Narayan B. Mandayam; Andrew T. Ogielski; Jignesh Panchal; Roy D. Yates

2000-01-01

76

Efficient Parallel Transaction Level Simulation by Exploiting Temporal Decoupling

NASA Astrophysics Data System (ADS)

In recent years, transaction level modeling (TLM) has enabled designers to simulate complex embedded systems and SoCs, orders of magnitude faster than simulation at the RTL. The increasing complexity of the systems on one hand, and availability of low cost parallel processing resources on the other hand have motivated the development of parallel simulation environments for TLMs. The existing simulation environments used for parallel simulation of TLMs are intended for general discrete event models and do not take advantage of the specific properties of TLMs. The fine-grain synchronization and communication between simulators in these environments can become a major impediment to the efficiency of the simulation environment. In this work, we exploit the properties of temporally decoupled TLMs to increase the efficiency of parallel simulation. Our approach does not require a special simulation kernel. We have implemented a parallel TLM simulation framework based on the publicly available OSCI SystemC simulator. The framework is based on the communication interfaces proposed in the recent OSCI TLM 2 standard. Our experimental results show the reduced synchronization overhead and improved simulation performance.

Salimi Khaligh, Rauf; Radetzki, Martin

77

We review the design, selected applications and performance of WiPPET (Wireless Propagation and Protocol Evaluation Testbed), a general parallel simulation testbed for various types of wireless networks. WiPPET has been written in TeD/C++, an object-oriented modeling framework that isolates network modeling from the underlying parallel discrete event simulator. We describe the techniques for modeling radio propagation (long and short-scale fading

O. E. Kelly; J. Lai; N. B. Mandayam; A. T. Ogielski; J. Panchal; R. D. Yates

2000-01-01

78

An algebra of discrete event processes

NASA Technical Reports Server (NTRS)

This report deals with an algebraic framework for modeling and control of discrete event processes. The report consists of two parts. The first part is introductory, and consists of a tutorial survey of the theory of concurrency in the spirit of Hoare's CSP, and an examination of the suitability of such an algebraic framework for dealing with various aspects of discrete event control. To this end a new concurrency operator is introduced and it is shown how the resulting framework can be applied. It is further shown that a suitable theory that deals with the new concurrency operator must be developed. In the second part of the report the formal algebra of discrete event control is developed. At the present time the second part of the report is still an incomplete and occasionally tentative working paper.

Heymann, Michael; Meyer, George

1991-01-01

79

Discrete event control of an unmanned aircraft

This paper describes the application of a limited-lookahead discrete event supervisory controller that handles the control aspects of the unmanned aerial vehicle (UAV) "sense and avoid" (SAA) problem. The controlled UAV and the approaching uncontrolled intruding aircraft that must be avoided are treated as a hybrid system. The UAV control decision making process is discrete, while the embedded flight model

Mehdi Fatemi; James Millan; Jonathan Stevenson; Tina Yu; Siu O'Young

2008-01-01

80

Opacity verification in stochastic discrete event systems

Motivated by security and privacy considerations in applications of discrete event systems, various notions of opacity have been introduced. Specifically, a system is said to be current-state opaque if the entrance of the system state to a set of secret states remains opaque (uncertain) to an intruder - at least until the system leaves the set of secret states. This

Anooshiravan Saboori; Christoforos N. Hadjicostis

2010-01-01

81

Objectives Transfusion of allogeneic blood is still common in orthopedic surgery. This analysis evaluates, from the perspective of a German hospital, the potential cost savings of Epoetin alfa (EPO) compared to predonated autologous blood transfusions or to a no blood conservation strategy (allogeneic blood transfusion strategy) during elective hip and knee replacement surgery. Methods Individual patients (N = 50,000) were simulated based on data from controlled trials, the German DRG institute (InEK) and various publications and entered into a stochastic model (Monte-Carlo) of three treatment arms: EPO, preoperative autologous donation and no blood conservation strategy. All three strategies lead to a different risk for an allogeneic blood transfusion. The model focused on the costs and events of the three different procedures. The costs were obtained from clinical trial databases, the German DRG system, patient records and medical publications: transfusion (allogeneic red blood cells: €320/unit and autologous red blood cells: €250/unit), pneumonia treatment (€5,000), and length of stay (€300/day). Probabilistic sensitivity analyses were performed to determine which factors had an influence on the model's clinical and cost outcomes. Results At acquisition costs of €200/40,000 IU, EPO is cost saving compared to autologous blood donation, and cost-effective compared to a no blood conservation strategy. The results were most sensitive to the cost of EPO, blood units and hospital days. Conclusions EPO might become an attractive blood conservation strategy for anemic patients at reasonable costs due to the reduction in allogeneic blood transfusions, in the modeled incidence of transfusion-associated pneumonia and the prolonged length of stay. PMID:24039829

Tomeczkowski, Jörg; Stern, Sean; Müller, Alfred; von Heymann, Christian

2013-01-01

82

Nonlinear Control and Discrete Event Systems

NASA Technical Reports Server (NTRS)

As the operation of large systems becomes ever more dependent on extensive automation, the need for an effective solution to the problem of design and validation of the underlying software becomes more critical. Large systems possesses much detailed structure, typically hierarchical, and they are hybrid. Information processing at the top of the hierarchy is by means of formal logic and sentences; on the bottom it is by means of simple scalar differential equations and functions of time; and in the middle it is by an interacting mix of nonlinear multi-axis differential equations and automata, and functions of time and discrete events. The lecture will address the overall problem as it relates to flight vehicle management, describe the middle level, and offer a design approach that is based on Differential Geometry and Discrete Event Dynamic Systems Theory.

Meyer, George; Null, Cynthia H. (Technical Monitor)

1995-01-01

83

Robust distributed control in discrete-event systems

In the context of discrete-event systems, control is defined as the enforcement of legal behavior in the propagation of and reaction to random discrete events. Conditions for the existence of supervisory control and its efficiency for decentralized discrete-event systems are presented. Monitoring state machines (MSMs) are introduced for robust and efficient supervisory control for a class of distributed discrete-event systems.

S. Chand

1990-01-01

84

ZAMBEZI: a parallel pattern parallel fault sequential circuit fault simulator

Sequential circuit fault simulators use the multiple bits in a computer data word to accelerate simulation. We introduce, and implement, a new sequential circuit fault simulator, a parallel pattern parallel fault simulator, ZAMBEZI, which simultaneously simulates multiple faults with multiple vectors in one data word. ZAMBEZI is developed by enhancing the control flow of existing parallel pattern algorithms. For a

Minesh B. Amin; Bapiraju Vinnakota

1996-01-01
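The word-level parallelism such simulators exploit can be sketched as follows; the circuit and fault below are invented for illustration, and real tools like ZAMBEZI additionally parallelize across faults and handle sequential state.

```python
# Pack one test pattern per bit of an integer, so a single bitwise
# operation evaluates a gate for many patterns at once.

WIDTH = 8  # simulate 8 patterns per word
MASK = (1 << WIDTH) - 1

def good_circuit(a, b, c):
    """out = (a AND b) OR (NOT c), evaluated for all packed patterns."""
    return ((a & b) | (~c & MASK)) & MASK

def faulty_circuit(a, b, c):
    """Same circuit with the a-input stuck-at-0 (the injected fault)."""
    return good_circuit(0, b, c)

# 8 patterns packed bitwise: pattern i occupies bit i of each input word.
a, b, c = 0b11110000, 0b11001100, 0b10101010
detects = good_circuit(a, b, c) ^ faulty_circuit(a, b, c)
print(f"{detects:08b}")  # a 1 bit marks a pattern that detects the fault
```

Here only the pattern with a=b=c=1 propagates the stuck-at-0 fault to the output, so exactly one detect bit is set.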

85

Planning and supervision of reactor defueling using discrete event techniques

New fuel handling and conditioning activities for the defueling of the Experimental Breeder Reactor II are being performed at Argonne National Laboratory. Research is being conducted to investigate the use of discrete event simulation, analysis, and optimization techniques to plan, supervise, and perform these activities in such a way that productivity can be improved. The central idea is to characterize this defueling operation as a collection of interconnected serving cells, and then apply operational research techniques to identify appropriate planning schedules for given scenarios. In addition, a supervisory system is being developed to provide personnel with on-line information on the progress of fueling tasks and to suggest courses of action to accommodate changing operational conditions. This paper provides an introduction to the research in progress at ANL. In particular, it briefly describes the fuel handling configuration for reactor defueling at ANL, presenting the flow of material from the reactor grid to the interim storage location, and the expected contributions of this work. As an example of the studies being conducted for planning and supervision of fuel handling activities at ANL, an application of discrete event simulation techniques to evaluate different fuel cask transfer strategies is given at the end of the paper.

Garcia, H.E.; Imel, G.R. [Argonne National Lab., IL (United States); Houshyar, A. [Western Michigan Univ., Kalamazoo, MI (United States). Dept. of Physics

1995-12-31

86

Parallel circuit simulation on supercomputers

Circuit simulation is a very time-consuming and numerically intensive application, especially when the problem size is large as in the case of VLSI circuits. To improve the performance of circuit simulators without sacrificing accuracy, a variety of parallel processing algorithms have been investigated due to the recent availability of a number of commercial multiprocessor machines. In this paper, research in

R. A. Saleh; K. A. Gallivan; M.-C. Chang; I. N. Hajj; T. N. Trick; D. Smart

1989-01-01

87

Parallel Event-Driven Global Magnetospheric Hybrid Simulations

NASA Astrophysics Data System (ADS)

Global MHD/Hall-MHD magnetospheric models are not able to capture the full diversity of scales and processes that control the Earth's magnetosphere. In order to significantly improve the predictive capabilities of global space weather models, new CPU-efficient algorithms are needed which could properly account for ion kinetic effects in a large computational domain over long simulation times. To achieve this much expected breakthrough in hybrid (particle ions and fluid electrons) simulations we developed a novel asynchronous time integration technique known as Discrete-Event Simulation (DES). DES replaces conventional time stepping with event processing, which allows macro-particles and grid-based fields to be updated on their own timescales. This unique capability of DES removes the traditional CFL constraint on the global timestep and enables natural (event-driven) coupling of multi-physics components in a global application model. We report first-ever parallel 2D hybrid DES (HYPERS) runs and compare them with similar time-stepped simulations. We also discuss our ongoing efforts to develop efficient load-balancing strategies for future 3D HYPERS runs on petascale architectures.

Omelchenko, Y. A.; Karimabadi, H.; Saule, E.; Catalyurek, U. V.

2010-12-01

88

An assessment of the ModSim/TWOS parallel simulation environment

The Time Warp Operating System (TWOS) has been the focus of significant research in parallel, discrete-event simulation (PDES). A new language, ModSim, has been developed for use in conjunction with TWOS. The coupling of ModSim and TWOS is an attempt to address the development of large-scale, complex, discrete-event simulation models for parallel execution. The approach, simply stated, is to provide a high-level simulation language that embodies well-known software engineering principles combined with a high-performance parallel execution environment. The inherent difficulty with this approach is the mapping of the simulation application to the parallel run-time environment. To use TWOS, Time Warp applications are currently developed in C and must be tailored according to a set of constraints and conventions. C/TWOS applications are carefully developed using explicit calls to the Time Warp primitives; thus, the mapping of application to parallel run-time environment is done by the application developer. The disadvantage of this approach is its questionable scalability to larger software efforts; the obvious advantage is the degree of control over managing the efficient execution of the application. The ModSim/TWOS system provides an automatic mapping from a ModSim application to an equivalent C/TWOS application. The major flaw with the ModSim/TWOS system as it currently exists is that there is no compiler support for mapping a ModSim application into an efficient C/TWOS application. Moreover, the ModSim language as currently defined does not provide explicit hooks into the Time Warp Operating System, and hence the developer is unable to tailor a ModSim application in the same fashion that a C application can be tailored. Without sufficient compiler support, there is a mismatch between ModSim's object-oriented, process-based execution model and the Time Warp execution model.

Rich, D.O.; Michelsen, R.E.

1991-01-01
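The Time Warp mechanics that TWOS implements can be illustrated with a toy (this is not the ModSim/TWOS code): an optimistic logical process saves state before each event so it can roll back when a "straggler" message arrives in its past, then re-executes the undone events.

```python
# Illustrative Time Warp rollback for a single logical process whose
# state is just a counter incremented by each timestamped message.

class LogicalProcess:
    def __init__(self):
        self.lvt = 0          # local virtual time
        self.state = 0
        self.saved = []       # (event_time, prior_lvt, prior_state, event)

    def _apply(self, timestamp, increment):
        # State saving: snapshot before processing, so the event can be undone.
        self.saved.append((timestamp, self.lvt, self.state, (timestamp, increment)))
        self.lvt = timestamp
        self.state += increment

    def receive(self, timestamp, increment):
        redo = []
        if timestamp < self.lvt:  # straggler: roll back events in its future
            while self.saved and self.saved[-1][0] > timestamp:
                _, self.lvt, self.state, event = self.saved.pop()
                redo.append(event)
        self._apply(timestamp, increment)
        for event in reversed(redo):  # re-execute the rolled-back events
            self._apply(*event)

lp = LogicalProcess()
lp.receive(10, 1)
lp.receive(20, 1)
lp.receive(15, 1)   # straggler: the t=20 event is undone, then redone
print(lp.lvt, lp.state)  # 20 3
```

A real Time Warp kernel would also send anti-messages to cancel outputs of rolled-back events and compute global virtual time to reclaim old snapshots.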

89

A deterministic optimal control theory for discrete event systems

In certain discrete event applications it may be desirable to find a particular controller, within the set of acceptable controllers, which extremises some quantitative performance measure. In this paper we propose a theory of optimal control to meet such design requirements for deterministic systems. The discrete event system (DES) is modelled by a regular language. Event and cost functions are

Raja Sengupta; Stéphane Lafortune

1993-01-01

90

Optimal Sensor and Actuator Choices for Discrete Event Systems

Stanley D. Young and Vijay K. Garg present algorithms to optimally choose sensors and actuators to control a given discrete event system so that the closed-loop behavior meets a given specification. The main results are the polynomial solution to the choice of actuators and the polynomial solution to the choice of sensors.

Garg, Vijay

91

CAISSON: Interconnect Network Simulator

NASA Technical Reports Server (NTRS)

Cray response to HPCS initiative. Model future petaflop computer interconnect. Parallel discrete event simulation techniques for large scale network simulation. Built on WarpIV engine. Run on laptop and Altix 3000. Can be sized up to 1000 simulated nodes per host node. Good parallel scaling characteristics. Flexible: multiple injectors, arbitration strategies, queue iterators, network topologies.

Springer, Paul L.

2006-01-01

92

Computational Issues in Intelligent Control: Discrete-Event and Hybrid Systems

Xenofon D. Koutsoukos and Panos J. Antsaklis, University of Notre Dame, Notre Dame, IN 46556; e-mail: xkoutsou, antsaklis.1@nd.edu. Intelligent control methodologies are being developed to address computational issues that are central in intelligent control. In particular, we discuss how the design, simulation

Koutsoukos, Xenofon D.

93

LAN attack detection using Discrete Event Systems.

Address Resolution Protocol (ARP) is used for determining the link layer or Medium Access Control (MAC) address of a network host, given its Internet Layer (IP) or Network Layer address. ARP is a stateless protocol and any IP-MAC pairing sent by a host is accepted without verification. This weakness in the ARP may be exploited by malicious hosts in a Local Area Network (LAN) by spoofing IP-MAC pairs. Several schemes have been proposed in the literature to circumvent these attacks; however, these techniques either make IP-MAC pairing static, modify the existing ARP, patch operating systems of all the hosts etc. In this paper we propose a Discrete Event System (DES) approach for Intrusion Detection System (IDS) for LAN specific attacks which do not require any extra constraint like static IP-MAC, changing the ARP etc. A DES model is built for the LAN under both a normal and compromised (i.e., spoofed request/response) situation based on the sequences of ARP related packets. Sequences of ARP events in normal and spoofed scenarios are similar thereby rendering the same DES models for both the cases. To create different ARP events under normal and spoofed conditions the proposed technique uses active ARP probing. However, this probing adds extra ARP traffic in the LAN. Following that a DES detector is built to determine from observed ARP related events, whether the LAN is operating under a normal or compromised situation. The scheme also minimizes extra ARP traffic by probing the source IP-MAC pair of only those ARP packets which are yet to be determined as genuine/spoofed by the detector. Also, spoofed IP-MAC pairs determined by the detector are stored in tables to detect other LAN attacks triggered by spoofing namely, man-in-the-middle (MiTM), denial of service etc. The scheme is successfully validated in a test bed. PMID:20804980

Hubballi, Neminath; Biswas, Santosh; Roopa, S; Ratti, Ritesh; Nandi, Sukumar

2011-01-01
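The core decision the paper's DES detector makes can be sketched as follows (the function and its inputs are assumptions for illustration; the actual scheme builds DES models over sequences of ARP events and uses active probing within them): probe the claimed IP-MAC binding and classify it by the consistency of the replies.

```python
# Toy classifier for an IP-MAC pair observed in ARP traffic.  In the
# paper's terms, conflicting or contradictory probe replies drive the
# detector's DES model into the "compromised" behavior.

def classify(ip, claimed_mac, probe_responses):
    """probe_responses: MACs seen in replies to an active ARP probe for ip."""
    macs = set(probe_responses)
    if len(macs) > 1:
        return "spoofed"      # conflicting replies: some host is lying
    if macs == {claimed_mac}:
        return "genuine"      # probe confirms the claimed binding
    return "spoofed"          # probe contradicts the claimed binding

print(classify("10.0.0.5", "aa:bb", ["aa:bb"]))           # genuine
print(classify("10.0.0.5", "aa:bb", ["aa:bb", "cc:dd"]))  # spoofed
```

Caching verified pairs, as the paper describes, avoids re-probing and keeps the extra ARP traffic small.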

94

Xyce parallel electronic simulator design.

This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.

Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

2010-09-01

95

Modelling machine ensembles with discrete event dynamical system theory

NASA Technical Reports Server (NTRS)

Discrete Event Dynamical System (DEDS) theory can be utilized as a control strategy for future complex machine ensembles that will be required for in-space construction. The control strategy involves orchestrating a set of interactive submachines to perform a set of tasks for a given set of constraints such as minimum time, minimum energy, or maximum machine utilization. Machine ensembles can be hierarchically modeled as a global model that combines the operations of the individual submachines. These submachines are represented in the global model as local models. Local models, from the perspective of DEDS theory, are described by the following: a set of system and transition states, an event alphabet that portrays actions that take a submachine from one state to another, an initial system state, a partial function that maps the current state and event alphabet to the next state, and the time required for the event to occur. Each submachine in the machine ensemble is represented by a unique local model. The global model combines the local models such that the local models can operate in parallel under the additional logistic and physical constraints due to submachine interactions. The global model is constructed from the states, events, event functions, and timing requirements of the local models. Supervisory control can be implemented in the global model by various methods such as task scheduling (open-loop control) or implementing a feedback DEDS controller (closed-loop control).

Hunter, Dan

1990-01-01
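The local-model ingredients the abstract above lists (a state set, an event alphabet, a partial transition function, an initial state) can be sketched as a small automaton; the pick-and-place submachine and its events are invented for illustration.

```python
# A minimal DEDS local model: firing an event that is not enabled in the
# current state is an error, mirroring the partial transition function.

class LocalModel:
    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = transitions  # (state, event) -> next state

    def fire(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"event {event!r} not enabled in state {self.state!r}")
        self.state = self.transitions[key]
        return self.state

# Hypothetical robot-arm submachine with alphabet {pick, place}.
arm = LocalModel("idle", {
    ("idle", "pick"): "holding",
    ("holding", "place"): "idle",
})
arm.fire("pick")
print(arm.state)  # holding
```

A global model would compose several such automata, synchronizing shared events, which is where the interaction constraints the abstract mentions enter.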

96

Discrete Event Supervisory Control Applied to Propulsion Systems

NASA Technical Reports Server (NTRS)

The theory of discrete event supervisory (DES) control was applied to the optimal control of a twin-engine aircraft propulsion system and demonstrated in a simulation. The supervisory control, which is implemented as a finite-state automaton, oversees the behavior of a system and manages it in such a way that it maximizes a performance criterion, similar to a traditional optimal control problem. DES controllers can be nested such that a high-level controller supervises multiple lower level controllers. This structure can be expanded to control huge, complex systems, providing optimal performance and increasing autonomy with each additional level. The DES control strategy for propulsion systems was validated using a distributed testbed consisting of multiple computers--each representing a module of the overall propulsion system--to simulate real-time hardware-in-the-loop testing. In the first experiment, DES control was applied to the operation of a nonlinear simulation of a turbofan engine (running in closed loop using its own feedback controller) to minimize engine structural damage caused by a combination of thermal and structural loads. This enables increased on-wing time for the engine through better management of the engine-component life usage. Thus, the engine-level DES acts as a life-extending controller through its interaction with and manipulation of the engine's operation.

Litt, Jonathan S.; Shah, Neerav

2005-01-01

97

Parallel sorting algorithms for optimizing particle simulations

Real world particle simulation codes have to handle a huge number of particles and their interactions. Thus, parallel implementations are required to get suitable production codes. Parallel sorting is often used to organize the set of particles or to redistribute data for locality and load balancing concerns. In this article, the use and design of parallel sorting algorithms for parallel

Michael Hofmann; G. Runger; P. Gibbon; R. Speck

2010-01-01

98

Data parallel sequential circuit fault simulation

Sequential circuit fault simulation is a compute-intensive problem. Parallel simulation is one method to reduce fault simulation time. In this paper, we discuss a novel technique to partition the fault set for the fault parallel simulation of sequential circuits on multiple processors. When applied statically, the technique can scale well for up to thirty two processors on an ethernet. The

Minesh B. Amin; Bapiraju Vinnakota

1996-01-01

99

Virtual machine (VM) technologies, especially those offered via Cloud platforms, present new dimensions with respect to performance and cost in executing parallel discrete event simulation (PDES) applications. Due to the introduction of overall cost as a metric, the choice of the highest-end computing configuration is no longer the most economical one. Moreover, runtime dynamics unique to VM platforms introduce new performance characteristics, and the variety of possible VM configurations gives rise to a range of choices for hosting a PDES run. Here, an empirical study of these issues is undertaken to guide an understanding of the dynamics, trends and trade-offs in executing PDES on VM/Cloud platforms. Performance results and cost measures are obtained from actual execution of a range of scenarios in two PDES benchmark applications on the Amazon Cloud offerings and on a high-end VM host machine. The data reveals interesting insights into the new VM-PDES dynamics that come into play and also leads to counter-intuitive guidelines with respect to choosing the best and second-best configurations when overall cost of execution is considered. In particular, it is found that choosing the highest-end VM configuration guarantees neither the best runtime nor the least cost. Interestingly, choosing a (suitably scaled) low-end VM configuration provides the least overall cost without adversely affecting the total runtime.

Yoginath, Srikanth B [ORNL; Perumalla, Kalyan S [ORNL

2013-01-01

100

Graphite : a parallel distributed simulator for multicores

This thesis describes Graphite, a parallel, distributed simulator for simulating large-scale multicore architectures, and focuses particularly on the functional aspects of simulating a single, unmodified multi-threaded ...

Kasture, Harshad

2010-01-01

101

The Computational Complexity of Decentralized Discrete-Event Control Problems

Applications include flexible manufacturing systems [LW90] and communication systems [CDFV88], [RW90], [RW92a]. These examples have served a pedagogical and mathematical purpose and have been highly simplified versions of physically realistic applications captured by the restricted decentralized discrete-event control formulation. In particular, communication protocol verification

102

Extracting Discrete Event System Models from Hybrid Control Systems

The hybrid control system consists of three parts; the modeling and interactions of these parts are described. Halfspaces are used to define a set of plant events, and a discrete event system model is generated which captures the behavior of the plant and interface of the hybrid control system.

Antsaklis, Panos

103

Diagnosis of asynchronous discrete event systems, a net unfolding approach

Albert Benveniste, Fellow, IEEE, Eric Fabre, Stefan Haar, and Claude Jard. In this paper we consider the diagnosis of asynchronous discrete event systems using a net unfolding approach. What the diagnoser collects is not a sequence of alarms, but rather a partially ordered set of alarms.

Paris-Sud XI, Université de

104

Diagnosis of Discrete-Event Systems Using Satisfiability Algorithms

The diagnosis of a discrete-event system is the problem of computing possible behaviors of the system given observations of the actual behavior, and testing whether the behaviors are normal or faulty. We show how the diagnosis problems can be translated into the propositional satisfiability problem (SAT) and solved by algorithms for SAT. Our experiments demonstrate that current SAT ...

Alban Grastien; Anbulagan; Jussi Rintanen; Elena Kelareva

2007-01-01
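
The decision problem behind this SAT translation can be made concrete on a toy automaton. The model, event names, and the brute-force enumeration below are our own illustrative stand-in for the authors' propositional encoding, which answers the same question symbolically.

```python
from itertools import product

# Toy partially observed DES: states s0..s3, an unobservable fault
# event "f", observable events "a", "b", "c". Everything here is an
# illustrative assumption, not one of the authors' benchmark systems.
TRANS = {
    ("s0", "a"): "s1",   # normal behaviour
    ("s1", "b"): "s1",
    ("s0", "f"): "s2",   # unobservable fault
    ("s2", "a"): "s3",   # faulty behaviour
    ("s3", "b"): "s3",
    ("s3", "c"): "s3",   # "c" is only possible after the fault
}
OBSERVABLE = {"a", "b", "c"}

def consistent_runs(obs, horizon=4):
    """All executable event sequences (length <= horizon) whose
    observable projection equals obs."""
    alphabet = sorted({e for (_, e) in TRANS})
    for n in range(horizon + 1):
        for seq in product(alphabet, repeat=n):
            state, ok = "s0", True
            for e in seq:
                state = TRANS.get((state, e))
                if state is None:
                    ok = False
                    break
            if ok and tuple(e for e in seq if e in OBSERVABLE) == tuple(obs):
                yield seq

def diagnose(obs):
    """Decide whether the runs explaining obs are all faulty, all
    normal, or mixed -- the question a SAT encoding answers without
    enumerating runs explicitly."""
    verdicts = {"f" in seq for seq in consistent_runs(obs)}
    if verdicts == {True}:
        return "faulty"
    if verdicts == {False}:
        return "normal"
    return "ambiguous"

print(diagnose(["a", "b"]), diagnose(["a", "c"]))  # ambiguous faulty
```

The observation ["a", "b"] is explained by both a normal and a faulty run, while "c" betrays the fault; a SAT solver replaces the exponential enumeration in `consistent_runs`.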

105

Optimal Sensor and Actuator Choices for Discrete Event Systems

Stanley D. Young; Vijay K. Garg (Garg@pine.ece.utexas.edu). January 6, 1994. Abstract: We present algorithms to optimally choose sensors and actuators to control ... The results of this paper are algorithms which demonstrate the polynomial solution to the choice of actuators ...

Garg, Vijay

106

Notions of security and opacity in discrete event systems

In this paper, we follow a state-based approach to extend the notion of opacity in computer security to discrete event systems. A system is (S, P)-opaque if the evolution of its true state through a set of secret states S remains opaque to an observer who is observing activity in the system through the projection map P. In other words,

Anooshiravan Saboori; Christoforos N. Hadjicostis

2007-01-01
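
The (S, P)-opacity notion described here can be checked exhaustively on a small model. The automaton, state names, and bounded-horizon enumeration below are illustrative assumptions, not the authors' verification method.

```python
# Toy nondeterministic system for checking (S, P)-opacity by
# exhaustive enumeration; states, events, and the horizon bound are
# illustrative assumptions only.
TRANS = {("q0", "u"): ["q1", "q2"],   # "u" is unobservable
         ("q1", "a"): ["q3"],
         ("q2", "a"): ["q4"]}
OBSERVABLE = {"a"}
SECRET = {"q1"}                        # the secret state set S

def trajectories(horizon=3):
    """All (event sequence, state sequence) pairs executable from q0."""
    alphabet = sorted({e for (_, e) in TRANS})
    found = []
    def step(seq, states):
        found.append((tuple(seq), tuple(states)))
        if len(seq) < horizon:
            for e in alphabet:
                for nxt in TRANS.get((states[-1], e), []):
                    step(seq + [e], states + [nxt])
    step([], ["q0"])
    return found

def project(seq):
    """The projection map P: keep only observable events."""
    return tuple(e for e in seq if e in OBSERVABLE)

def is_opaque(secret=frozenset(SECRET)):
    """(S, P)-opaque iff every run visiting S shares its projection
    with some run that avoids S entirely."""
    runs = trajectories()
    return all(any(project(s2) == project(seq) and not (secret & set(st2))
                   for s2, st2 in runs)
               for seq, states in runs if secret & set(states))

print(is_opaque())  # True: the q2 branch mimics the secret q1 branch
```

If the secret set were the initial state {"q0"}, no run could avoid it and opacity would fail, which is why the companion entry below treats initial-state opacity separately.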

107

On the Scalability and Dynamic Load-Balancing of Optimistic Gate Level Simulation

As predicted by Moore's law, the size of integrated circuits has grown geometrically, resulting in simulation becoming the major bottleneck in the circuit design process. Parallel simulation provides us with a way to cope with this growth. In this paper, we describe an optimistic (time warp) parallel discrete event simulator which can simulate all synthesizable Verilog circuits. We investigate its ...

Sina Meraji; Wei Zhang; Carl Tropper

2010-01-01

108

Parallel quantum computer simulation on the GPU

Simulation of quantum computers using classical computers is a hard problem with high memory and computational requirements. Parallelization can alleviate this problem, allowing the simulation of more qubits at the same time or the same number of qubits to be simulated in less time. A promising approach is to exploit the high performance computing capabilities provided by the ...

Andrei Amariutei; Simona Caraiman

2011-01-01
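
The memory pressure mentioned in this entry comes from the exponential size of the state vector: an n-qubit state needs 2**n complex amplitudes. The serial kernel below (our own textbook sketch with an illustrative Hadamard circuit, not the paper's GPU implementation) is the operation such simulators parallelize.

```python
import math

# Plain serial state-vector simulation. This textbook kernel is our
# illustration, not the paper's GPU code.
def apply_1q_gate(state, gate, target, n):
    """Apply a 2x2 gate to qubit `target` of a 2**n-entry state
    vector; qubit 0 is the most significant bit here (our choice)."""
    out = state[:]
    stride = 1 << (n - 1 - target)
    for i in range(len(state)):
        if i & stride == 0:               # i and i|stride form a pair
            a, b = state[i], state[i | stride]
            out[i] = gate[0][0] * a + gate[0][1] * b
            out[i | stride] = gate[1][0] * a + gate[1][1] * b
    return out

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]                     # Hadamard gate

n = 3
state = [0j] * (1 << n)
state[0] = 1 + 0j                         # start in |000>
for q in range(n):                        # H on every qubit
    state = apply_1q_gate(state, H, q, n)

# A Hadamard on each qubit yields the uniform superposition.
print(all(abs(abs(c) ** 2 - 1 / 2 ** n) < 1e-12 for c in state))  # True
```

Every pair update is independent, which is what makes the kernel amenable to the GPU parallelization the abstract describes.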

109

Parallelized simulation code for multiconjugate adaptive optics

Advances in adaptive optics (AO) systems are necessary to achieve optical performance that is suitable for future extremely large telescopes (ELTs). Accurate simulation of system performance during the design process is essential. We detail the current implementation and near-term development plans for a coarse-grain parallel code for simulations of multiconjugate adaptive optics (MCAO). Included is a summary of the simulation ...

Aron J. Ahmadia; Brent L. Ellerbroek

2003-01-01

110

Graphite: A Distributed Parallel Simulator for Multicores

This paper introduces the open-source Graphite distributed parallel multicore simulator infrastructure. Graphite is designed from the ground up for exploration of future multicore processors containing dozens, hundreds, ...

Beckmann, Nathan

2009-11-09

111

Data Parallel Switch-Level Simulation, Randal E. Bryant

Carnegie Mellon University. Abstract: Data parallel simulation involves simulating the behavior of a circuit over ... runs on a massively parallel SIMD machine, with each processor simulating the circuit behavior ... parallelism in simulation utilize circuit parallelism. In this mode, the simulator extracts parallelism from ...

Bryant, Randal E.

112

Xyce parallel electronic simulator : users' guide.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed.
As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

2011-05-01

113

Matrix-Based Discrete Event Control for Surveillance Mobile Robotics

This paper focuses on the control system for an autonomous robot for the surveillance of indoor environments. Our approach proposes a matrix-based formalism which allows us to merge in a single framework discrete-event supervisory control, conflict resolution and reactive control. As a consequence, the robot is able to autonomously handle high level tasks as well as low-level behaviors, solving control ...

Donato Di Paola; David Naso; Biagio Turchiano; Grazia Cicirelli; Arcangelo Distante

2009-01-01

114

Supervisory control of fuzzy discrete event systems: a formal approach

Fuzzy discrete event systems (DESs) were proposed recently by Lin and Ying [19], which may better cope with the real-world problems with fuzziness, impreciseness, and subjectivity such as those in biomedicine. As a continuation of [19], in this paper we further develop fuzzy DESs by dealing with supervisory control of fuzzy DESs. More specifically, (i) we reformulate the parallel ...

Daowen Qiu

2005-01-01

115

Diagnosis of Discrete-Event Systems using BDDs

We improve the efficiency of Sampath's diagnoser approach by exploiting compact symbolic representations of the system and diagnoser in terms of BDDs. We show promising results on test cases derived from a telecommunication application. Approaches to the diagnosis of discrete-event systems usually suffer from poor on-line performance or space explosion. At one extremity of the spectrum lie on-line ...

Anika Schumann; Yannick Pencole; Sylvie Thiebaux

116

Opacity-enforcing supervisory strategies for secure discrete event systems

Initial-state opacity emerges as a key property in numerous security applications of discrete event systems including key-stream generators for cryptographic protocols. Specifically, a system is initial-state opaque if the membership of its true initial state to a set of secret states remains uncertain (opaque) to an outside intruder who observes system activity through a given projection map. In this paper,

Anooshiravan Saboori; Christoforos N. Hadjicostis

2008-01-01

117

Parallel Circuit Simulation Using Hierarchical Relaxation

This paper describes a class of parallel algorithms for circuit simulation based on hierarchical relaxation that has been implemented on the Cedar multiprocessor. The Cedar machine is a reconfigurable, general-purpose supercomputer that was designed and implemented at the University of Illinois. A hierarchical circuit simulation scheme was developed to exploit the hierarchical organization of Cedar. The new algorithm and a

Gih-guang Hung; Yen-cheng Wen; Kyle Gallivan; Resve A. Saleh

1990-01-01

118

Parallel optimization for large eddy simulations

We developed a parallel Bayesian optimization algorithm for large eddy simulations. These simulations challenge optimization methods because they take hours or days to compute, and their objective function contains noise, as turbulent statistics are averaged over a finite time. Surrogate based optimization methods, including Bayesian optimization, have shown promise for noisy and expensive objective functions. Here we adapt Bayesian optimization to minimize drag in a turbulent channel flow and to design the trailing edge of a turbine blade to reduce turbulent heat transfer and pressure loss. Our optimization simultaneously runs several simulations, each parallelized to thousands of cores, in order to utilize additional concurrency offered by today's supercomputers.

Talnikar, Chaitanya; Bodart, Julien; Wang, Qiqi

2014-01-01

119

Simulating the scheduling of parallel supercomputer applications

An Event Driven Simulator for Evaluating Multiprocessing Scheduling (EDSEMS) disciplines is presented. The simulator is made up of three components: machine model; parallel workload characterization; and scheduling disciplines for mapping parallel applications (many processes cooperating on the same computation) onto processors. A detailed description of how the simulator is constructed, how to use it and how to interpret the output is also given. Initial results are presented from the simulation of parallel supercomputer workloads using "Dog-Eat-Dog", "Family" and "Gang" scheduling disciplines. These results indicate that Gang scheduling is far better at giving the number of processors that a job requests than Dog-Eat-Dog or Family scheduling. In addition, the system throughput and turnaround time are not adversely affected by this strategy. 10 refs., 8 figs., 1 tab.

Seager, M.K.; Stichnoth, J.M.

1989-09-19

120

Department of Electrical Engineering and Computer Science Discrete Event Systems Group

Department of Electrical Engineering and Computer Science, Discrete Event Systems Group, 2000. ... Requirements for Industrial ...

Tilbury, Dawn

121

Parallel logic simulation on general purpose machines

Three parallel algorithms for logic simulation have been developed and implemented on a general purpose shared-memory parallel machine. The first algorithm is a synchronous version of a traditional event-driven algorithm which achieves speed-ups of 6 to 9 with 15 processors. The second algorithm is a synchronous unit-delay compiled mode algorithm which achieves speed-ups of 10 to 13 with 15 processors.

Larry Soulé; Tom Blank

1988-01-01

122

Parallel distributed-time logic simulation

The Chandy-Misra algorithm offers more parallelism than the standard event-driven algorithm for digital logic simulation. With suitable enhancements, the Chandy-Misra algorithm also offers significantly better parallel performance. The authors present methods to optimize the algorithm using information about the large number of global synchronization points, called deadlocks, that limit performance. They classify deadlocks and describe them in terms of circuit

L. Soule; A. Gupta

1989-01-01

123

Xyce parallel electronic simulator release notes.

The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: hardware and software requirements; new features and enhancements; any defects fixed since the last release; and current known defects and defect workarounds. For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.

Keiter, Eric Richard; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

2010-05-01

124

Discrete-Event Execution Alternatives on General Purpose Graphical Processing Units

Graphics cards, traditionally designed as accelerators for computer graphics, have evolved to support more general-purpose computation. General Purpose Graphical Processing Units (GPGPUs) are now being used as highly efficient, cost-effective platforms for executing certain simulation applications. While most of these applications belong to the category of time-stepped simulations, little is known about the applicability of GPGPUs to discrete event simulation (DES). Here, we identify some of the issues & challenges that the GPGPU stream-based interface raises for DES, and present some possible approaches to moving DES to GPGPUs. Initial performance results on simulation of a diffusion process show that DES-style execution on GPGPU runs faster than DES on CPU and also significantly faster than time-stepped simulations on either CPU or GPGPU.

Perumalla, Kalyan S [ORNL]

2006-01-01
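
The contrast this entry draws between time-stepped and discrete-event execution rests on the future-event-list loop at the heart of DES. The sketch below is a plain CPU version of that loop; the toy hopping model and all names are ours, and the GPGPU stream-based mapping the abstract studies is not shown.

```python
import heapq
import random

# Minimal discrete-event kernel: a future-event list (FEL) kept as a
# heap ordered by timestamp -- the loop that distinguishes DES from
# time-stepped execution.
def run_des(initial_events, handler, t_end):
    """Process (time, payload) events in timestamp order until t_end;
    `handler` may schedule new events. Returns the number processed."""
    fel = list(initial_events)
    heapq.heapify(fel)
    processed = 0
    while fel:
        t, payload = heapq.heappop(fel)
        if t > t_end:
            break
        processed += 1
        for ev in handler(t, payload):
            heapq.heappush(fel, ev)
    return processed

# Toy diffusion-flavoured model (our own): one particle hops on a 1-D
# lattice, each hop scheduling the next after an exponential delay.
random.seed(0)
def hop(t, site):
    return [(t + random.expovariate(1.0), site + random.choice((-1, 1)))]

print(run_des([(0.0, 0)], hop, t_end=10.0))
```

A time-stepped simulator would instead advance every lattice site at every tick; the DES loop touches only sites with pending events, which is the advantage the paper measures.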

125

Validation of Massively Parallel Simulations of Dynamic Fracture and Fragmentation of Brittle Solids

Finite element simulations of dynamic fracture and fragmentation of brittle solids are presented. ... the results of massively parallel numerical simulations of dynamic fracture and fragmentation in brittle ...

Barr, Al

126

A parallel computational model for GATE simulations.

GATE/Geant4 Monte Carlo simulations are computationally demanding applications, requiring thousands of processor hours to produce realistic results. The classical strategy of distributing the simulation of individual events does not apply efficiently for Positron Emission Tomography (PET) experiments, because it requires a centralized coincidence processing and large communication overheads. We propose a parallel computational model for GATE that handles event generation and coincidence processing in a simple and efficient way by decentralizing event generation and processing but maintaining a centralized event and time coordinator. The model is implemented with the inclusion of a new set of factory classes that can run the same executable in sequential or parallel mode. A Mann-Whitney test shows that the output produced by this parallel model in terms of number of tallies is equivalent (but not equal) to its sequential counterpart. Computational performance evaluation shows that the software is scalable and well balanced. PMID:24070545

Rannou, F R; Vega-Acevedo, N; El Bitar, Z

2013-12-01

127

NWChem: Exploiting parallelism in molecular simulations

NASA Astrophysics Data System (ADS)

NWChem is the software package for computational chemistry on massively parallel computing systems developed by the High Performance Computational Chemistry group for the Environmental Molecular Sciences Laboratory. The software provides a variety of modules for quantum mechanical and classical mechanical simulation. This article describes the design of the molecular dynamics simulation module, which is based on a domain decomposition, and provides implementation details on the data and communication structure and how the code deals with the complexity of atom redistribution and load balancing.

Straatsma, T. P.; Philippopoulos, M.; McCammon, J. A.

2000-06-01

128

Parallel Simulation of Unsteady Turbulent Flames

NASA Technical Reports Server (NTRS)

Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, the high cost and their limited availability make practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable. This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used.
Recently, a new model for turbulent combustion was developed, in which the combustion is modeled, within the subgrid (small-scales) using a methodology that simulates the mixing and the molecular transport and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure and this approach actually provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion and chemical kinetics and, therefore, within each grid cell, a significant amount of computations must be carried out before the large-scale (LES resolved) effects are incorporated. Therefore, this approach is uniquely suited for parallel processing and has been implemented on various systems such as: Intel Paragon, IBM SP-2, Cray T3D and SGI Power Challenge (PC) using the system independent Message Passing Interface (MPI) compiler. In this paper, timing data on these machines is reported along with some characteristic results.

Menon, Suresh

1996-01-01

129

Improving ICU patient flow through discrete-event simulation

Massachusetts General Hospital (MGH), the largest hospital in New England and a national leader in care delivery, teaching, and research, operates ten Intensive Care Units (ICUs), including the 20-bed Ellison 4 Surgical ...

Christensen, Benjamin A. (Benjamin Arthur)

2012-01-01

130

Incremental Checkpointing with Application to Distributed Discrete Event Simulation

... and the following companies: Agilent, DGIST, General Motors, Hewlett Packard, Infineon, Microsoft, and Toyota. ... with which they can recover data from unintended destructive operations, storage failure, or program crash ... editing applications may attempt ...

131

Parallel algorithm strategies for circuit simulation.

Circuit simulation tools (e.g., SPICE) have become invaluable in the development and design of electronic circuits. However, they have been pushed to their performance limits in addressing circuit design challenges that come from the technology drivers of smaller feature scales and higher integration. Improving the performance of circuit simulation tools through exploiting new opportunities in widely-available multi-processor architectures is a logical next step. Unfortunately, not all traditional simulation applications are inherently parallel, and quickly adapting mature application codes (even codes designed to be parallel applications) to new parallel paradigms can be prohibitively difficult. In general, performance is influenced by many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, the use of mini-applications (small, self-contained proxies for real applications) is an excellent approach for rapidly exploring the parameter space of all these choices. In this report we present a multi-core performance study of Xyce, a transistor-level circuit simulation tool, and describe the future development of a mini-application for circuit simulation.

Thornquist, Heidi K.; Schiek, Richard Louis; Keiter, Eric Richard

2010-01-01

132

Xyce parallel electronic simulator : reference guide.

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) to exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

2011-05-01

133

Parallel node placement method by bubble simulation

NASA Astrophysics Data System (ADS)

An efficient Parallel Node Placement method by Bubble Simulation (PNPBS), employing METIS-based domain decomposition (DD) for an arbitrary number of processors is introduced. In accordance with the desired nodal density and Newton’s Second Law of Motion, automatic generation of node sets by bubble simulation has been demonstrated in previous work. Since the interaction force between nodes is short-range, the positions and velocities of two distant nodes can be updated simultaneously and independently during dynamic simulation; this inherent parallelism makes the method well suited to parallel computing. In this PNPBS method, the METIS-based DD scheme has been investigated for uniform and non-uniform node sets, and dynamic load balancing is obtained by evenly distributing work among the processors. For the nodes near the common interface of two neighboring subdomains, there is no need for special treatment after dynamic simulation. These nodes have good geometrical properties and a smooth density distribution which is desirable in the numerical solution of partial differential equations (PDEs). The results of numerical examples show that quasi linear speedup in the number of processors and high efficiency are achieved.

Nie, Yufeng; Zhang, Weiwei; Qi, Nan; Li, Yiqiang

2014-03-01

134

Massively Parallel Direct Simulation of Multiphase Flow

The authors' understanding of multiphase physics and the associated predictive capability for multi-phase systems are severely limited by current continuum modeling methods and experimental approaches. This research will deliver an unprecedented modeling capability to directly simulate three-dimensional multi-phase systems at the particle-scale. The model solves the fully coupled equations of motion governing the fluid phase and the individual particles comprising the solid phase using a newly discovered, highly efficient coupled numerical method based on the discrete-element method and the Lattice-Boltzmann method. A massively parallel implementation will enable the solution of large, physically realistic systems.

Cook, Benjamin K.; Preece, Dale S.; Williams, J. R.

2000-08-10

135

Decision Making in Fuzzy Discrete Event Systems.

The primary goal of the study presented in this paper is to develop a novel and comprehensive approach to decision making using fuzzy discrete event systems (FDES) and to apply such an approach to real-world problems. On the theoretical front, we develop a new control architecture of FDES as a way of decision making, which includes a FDES decision model, a fuzzy objective generator for generating optimal control objectives, and a control scheme using both disablement and enforcement. We develop an online approach to dealing with the optimal control problem efficiently. As an application, we apply the approach to HIV/AIDS treatment planning, a technical challenge since AIDS is one of the most complex diseases to treat. We build a FDES decision model for HIV/AIDS treatment based on expert knowledge, treatment guidelines, clinical trials, patient database statistics, and other available information. Our preliminary retrospective evaluation shows that the approach is capable of generating optimal control objectives for real patients in our AIDS clinic database and is able to apply our online approach to deciding an optimal treatment regimen for each patient. In the process, we have developed methods to resolve the following two new theoretical issues that have not been addressed in the literature: (1) the optimal control problem has a state-dependent performance index and hence is not monotonic; (2) the state space of a FDES is infinite. PMID:19562097

Lin, F; Ying, H; Macarthur, R D; Cohn, J A; Barth-Jones, D; Crane, L R

2007-09-15
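
In the fuzzy-DES formalism this work builds on, a state is a possibility vector and an event a fuzzy matrix, with transitions given by max-min composition. The two-state example below is an illustrative sketch of that transition rule only, not the paper's HIV/AIDS model.

```python
# Max-min composition, the transition rule of fuzzy DES in the
# Lin-Ying style: states are possibility vectors, events are fuzzy
# matrices. All numbers here are illustrative.
def maxmin(state, event):
    """Next fuzzy state: s'[j] = max_i min(s[i], E[i][j])."""
    cols = len(event[0])
    return [max(min(state[i], event[i][j]) for i in range(len(state)))
            for j in range(cols)]

state = [1.0, 0.0]            # fully in condition 0, not at all in 1
treat = [[0.2, 0.8],          # illustrative "treatment" event matrix
         [0.0, 1.0]]
print(maxmin(state, treat))   # [0.2, 0.8]
```

Unlike a crisp DES, applying the event leaves the system partially in both conditions, which is what lets the decision model grade treatment outcomes by degree.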

136

Expression-level Parallelism for Distributed Spice Circuit Simulation

Dylan Pfeifer. ... in parallel with this method, with a selectable tradeoff in speed versus accuracy. ... solution is achieved, it can be applied to conduct the multiple-simulator, parallel execution of a Spice ... Keywords: distributed Spice ...

Gerstlauer, Andreas

137

Parallel gate-level circuit simulation on shared memory architectures

This paper presents the results of an experimental study to evaluate the effectiveness of parallel simulation in reducing the execution time of gate-level models of VLSI circuits. Specific contributions of this paper include (i) the design of a gate-level parallel simulator that can be executed, without any changes on both distributed memory and shared memory parallel architectures, (ii) demonstrated speedups

Rajive Bagrodia; Yu-an Chen; Vikas Jha; Nicki Sonpar

1995-01-01

138

Parallel and Distributed Multi-Algorithm Circuit Simulation

Increased VLSI design complexity has made circuit simulation an ever growing bottleneck, making parallel processing an appealing solution for addressing this challenge. In this thesis, we propose and develop a parallel and distributed multi-algorithm...

Dai, Ruicheng

2012-10-19

139

Optimal Parametric Discrete Event Control: Problem and Solution

We present a novel optimization problem for discrete event control, similar in spirit to the optimal parametric control problem common in statistical process control. In our problem, we assume a known finite state machine plant model $G$ defined over an event alphabet $\Sigma$ so that the plant model language $L = \LanM(G)$ is prefix closed. We further assume the existence of a \textit{base control structure} $M_K$, which may be either a finite state machine or a deterministic pushdown machine. If $K = \LanM(M_K)$, we assume $K$ is prefix closed and that $K \subseteq L$. We associate each controllable transition of $M_K$ with a binary variable $X_1,\dots,X_n$ indicating whether the transition is enabled or not. This leads to a function $M_K(X_1,\dots,X_n)$, that returns a new control specification depending upon the values of $X_1,\dots,X_n$. We exhibit a branch-and-bound algorithm to solve the optimization problem $\min_{X_1,\dots,X_n}\max_{w \in K} C(w)$ such that $M_K(X_1,\dots,X_n) \models \Pi$ and $\LanM(M_K(X_1,\dots,X_n)) \in \Con(L)$. Here $\Pi$ is a set of logical assertions on the structure of $M_K(X_1,\dots,X_n)$, and $M_K(X_1,\dots,X_n) \models \Pi$ indicates that $M_K(X_1,\dots,X_n)$ satisfies the logical assertions; and, $\Con(L)$ is the set of controllable sublanguages of $L$.

Griffin, Christopher H [ORNL]

2008-01-01
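
The min-max problem in this abstract can be prototyped with a plain branch-and-bound over the enable/disable bits. The scenario weights, the feasibility predicate standing in for the constraints, and the monotone-cost pruning rule below are all illustrative assumptions, not the paper's algorithm.

```python
# Toy branch-and-bound: minimize over enable bits X the worst-case
# scenario cost, subject to a feasibility predicate. All data made up.
SCENARIOS = [[3, 1, 2, 0], [1, 2, 0, 3]]   # per-scenario costs

def worst_case(x):
    """Max over scenarios of the cost of the enabled transitions in x.
    On a partial x this is a valid lower bound, since weights >= 0."""
    return max(sum(w[i] * x[i] for i in range(len(x))) for w in SCENARIOS)

def feasible(x):
    return sum(x) >= 2       # keep at least two transitions enabled

def branch_and_bound(n=4):
    best_cost, best_x = float("inf"), None
    def expand(prefix):
        nonlocal best_cost, best_x
        if prefix and worst_case(prefix) >= best_cost:
            return           # prune: cost only grows down this branch
        if len(prefix) == n:
            if feasible(prefix):
                best_cost, best_x = worst_case(prefix), tuple(prefix)
            return
        for bit in (0, 1):
            expand(prefix + [bit])
    expand([])
    return best_x, best_cost

print(branch_and_bound())  # ((0, 0, 1, 1), 3)
```

The pruning step is the essence of branch-and-bound: once a partial assignment's lower bound meets the incumbent, the whole subtree of completions is skipped.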

140

Parallel Numerical Simulations of Water Reservoirs

NASA Astrophysics Data System (ADS)

The study of the water flow and scalar transport in water reservoirs is important for the determination of the water quality during the initial stages of the reservoir filling and during the life of the reservoir. For this purpose, a parallel 2D finite element code for solving the incompressible Navier-Stokes equations coupled with scalar transport was implemented using the message-passing programming model, in order to perform simulations of hydropower water reservoirs in a computer cluster environment. The spatial discretization is based on the MINI element that satisfies the Babuska-Brezzi (BB) condition, which provides sufficient conditions for a stable mixed formulation. All the distributed data structures needed in the different stages of the code, such as preprocessing, solving and post processing, were implemented using the PETSc library. The resulting linear systems for the velocity and the pressure fields were solved using the projection method, implemented by an approximate block LU factorization. In order to increase the parallel performance in the solution of the linear systems, we employ the static condensation method for solving the intermediate velocity at vertex and centroid nodes separately. We compare performance results of the static condensation method with the approach of solving the complete system. In our tests the static condensation method shows better performance for large problems, at the cost of an increased memory usage. Performance results for other intensive parts of the code in a computer cluster are also presented.

Torres, Pedro; Mangiavacchi, Norberto

2010-11-01

141

Empirical study of parallel LRU simulation algorithms

NASA Technical Reports Server (NTRS)

This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithms are more complex, but have costs that are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.

Carr, Eric; Nicol, David M.

1994-01-01

142

SPITFIRE: scalable parallel algorithms for test set partitioned fault simulation

We propose three synchronous parallel algorithms for scalable parallel test set partitioned fault simulation. The algorithms are based on a new two-stage approach to parallelizing fault simulation for sequential VLSI circuits in which the test set is partitioned among the available processors. The test set partitioning inherent in the algorithms overcomes the good circuit logic simulation bottleneck.

Dilip Krishnaswamyt; Elizabeth M. Rudnickt; Janak H. Patel; Prithviraj Banerjeet

1997-01-01

143

Parallel Finite Element Simulation of Tracer Injection in Oil Reservoirs

In this work, parallel finite element techniques for the simulation of tracer injection in oil reservoirs are presented. There has been a renewed interest in the utilization of finite element approximations in reservoir simulation.

Coutinho, Alvaro L. G. A.

144

A polymorphic reconfigurable emulator for parallel simulation

NASA Technical Reports Server (NTRS)

Microprocessor and arithmetic support chip technology was applied to the design of a reconfigurable emulator for real-time flight simulation. The system developed consists of a master control system, which performs all man-machine interactions and configures the hardware to emulate a given aircraft, and numerous slave compute modules (SCMs), which comprise the parallel computational units. It is shown that all parts of the state equations can be worked on simultaneously but that the algebraic equations cannot (unless they are slowly varying). Attempts to obtain algorithms that will allow parallel updates are reported. The word length and step size to be used in the SCMs are determined, and the architecture of the hardware and software is described.

Parrish, E. A., Jr.; Mcvey, E. S.; Cook, G.

1980-01-01

145

Parallel Proximity Detection for Computer Simulations

NASA Technical Reports Server (NTRS)

The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
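The grid-based scheme this patent describes can be sketched as follows (an illustrative toy, with hypothetical names, not the patented implementation): sensors check their coverage in to grid cells, and a mover checking in to a cell learns which sensors can detect it there, without computing exact geometric crossings.

```python
from collections import defaultdict

CELL = 10.0  # illustrative grid resolution

def cell_of(x, y):
    """Map a position to its integer grid cell."""
    return (int(x // CELL), int(y // CELL))

class Grid:
    def __init__(self):
        self.sensors_in_cell = defaultdict(set)

    def sensor_check_in(self, sensor_id, cells):
        # a sensor registers its coverage with every cell it overlaps
        for c in cells:
            self.sensors_in_cell[c].add(sensor_id)

    def mover_check_in(self, x, y):
        # the mover's detection list: sensors covering its current cell
        return set(self.sensors_in_cell[cell_of(x, y)])
```

The "fuzzy grid" refinement in the patent relaxes `cell_of` so that objects near a boundary need not check out and in on every small movement.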

Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

1998-01-01

146

Parallel Proximity Detection for Computer Simulation

NASA Technical Reports Server (NTRS)

The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.

Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

1997-01-01

147

Parallel multiscale simulations of a brain aneurysm

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NekTar. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NekTar and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.
PMID:23734066

Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

2012-01-01

148

Parallel multiscale simulations of a brain aneurysm

NASA Astrophysics Data System (ADS)

Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NekTar. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NekTar and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.

Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

2013-07-01

149

MAPS: multi-algorithm parallel circuit simulation

The emergence of multi-core and many-core processors has introduced new opportunities and challenges to EDA research and development. While the availability of increasing parallel computing power holds new promise to address many computing challenges in CAD, the leverage of hardware parallelism can only be possible with a new generation of parallel CAD applications. In this paper, we propose a novel

Xiaoji Ye; Wei Dong; Peng Li; Sani R. Nassif

2008-01-01

150

Parallel Monte Carlo simulation of multilattice thin film growth

This paper describes a new parallel algorithm for the multi-lattice Monte Carlo atomistic simulator for thin film deposition (ADEPT), implemented on a parallel computer using the PVM (Parallel Virtual Machine) message passing library. The parallel algorithm is based on domain decomposition with overlapping and asynchronous communication. Multiple lattices are represented by a single reference lattice through one-to-one mappings, with resulting computational

J. W. Shu; Qin Lu; Wai-on Wong; Han-chen Huang

2001-01-01

151

Improving the performance of parallel relaxation-based circuit simulators

Describes methods of increasing parallelism, thereby improving the performance, of waveform relaxation-based parallel circuit simulators. The key contribution is the use of parallel nonlinear relaxation and parallel model evaluation to solve large subcircuits that may lead to load balancing problems. These large subcircuits are further partitioned and solved on clusters of tightly-coupled multiprocessors. This paper describes a general hybrid/hierarchical approach

Gih-guang Hung; Yen-cheng Wen; Kyle A. Gallivan; Resve A. Saleh

1993-01-01

152

Parallelization of Rocket Engine Simulator Software (PRESS)

NASA Technical Reports Server (NTRS)

Parallelization of Rocket Engine System Software (PRESS) project is part of a collaborative effort with Southern University at Baton Rouge (SUBR), University of West Florida (UWF), and Jackson State University (JSU). The second-year funding, which supports two graduate students enrolled in our new Master's program in Computer Science at Hampton University and the principal investigator, has been obtained for the period from October 19, 1996 through October 18, 1997. The key part of the interim report was new directions for the second year funding. This came about from discussions during the Rocket Engine Numeric Simulator (RENS) project meeting in Pensacola on January 17-18, 1997. At that time, a software agreement between Hampton University and NASA Lewis Research Center had already been concluded. That agreement concerns off-NASA-site experimentation with PUMPDES/TURBDES software. Before this agreement, during the first year of the project, another large-scale FORTRAN-based software, Two-Dimensional Kinetics (TDK), was being used for translation to an object-oriented language and parallelization experiments. However, that package proved to be too complex and lacking sufficient documentation for an effective translation effort to object-oriented C++ source code. The focus, this time with the better documented and more manageable PUMPDES/TURBDES package, was still on translation to C++ with design improvements. At the RENS meeting, however, the new impetus for the RENS projects in general, and PRESS in particular, shifted in two important ways. One was closer alignment with the work on the Numerical Propulsion System Simulator (NPSS) through cooperation and collaboration with the LERC ACLU organization. The other was to see whether and how NASA's various rocket design software can be run over local and intranets without any radical efforts for redesign and translation into object-oriented source code.
There were also suggestions that the Fortran-based code be encapsulated in C++ code, thereby facilitating reuse without undue development effort. The details are covered in the aforementioned section of the interim report filed on April 28, 1997.

Cezzar, Ruknet

1997-01-01

153

On Decentralized and Distributed Control of Partially-Observed Discrete Event Systems

This paper surveys recent work of the author with several collaborators, principally Feng Lin, Weilin Wang, and Tae-Sic Yoo; they are kindly acknowledged. Decentralized control of discrete event systems, where local controllers cannot explicitly communicate in real-time, is considered in the first part of the paper. Then the problem of real-time communication among a set of local discrete-event controllers (or

Stéphane Lafortune

154

Parallel architecture for real-time simulation. Master's thesis

This thesis is concerned with the development of a very fast and highly efficient parallel computer architecture for real-time simulation of continuous systems. Currently, several parallel processing systems exist that may be capable of executing a complex simulation in real-time. These systems are examined and the pros and cons of each system discussed. The thesis then introduces a custom-designed parallel

Cockrell

1989-01-01

155

PARASPICE: A Parallel Circuit Simulator for Shared-Memory Multiprocessors

This paper presents a general approach to parallelizing direct method circuit simulation. The approach extracts parallel tasks at the algorithmic level for each compute-intensive module and therefore is suitable for a wide range of shared-memory multiprocessors. The implementation of the approach in SPICE2 resulted in a portable parallel direct circuit simulator, PARASPICE. The superior performance of PARASPICE is demonstrated on

Gung-chung Yang

1990-01-01

156

Partitioning strategies for parallel KIVA-4 engine simulations

Parallel KIVA-4 is described and simulated in four different engine geometries. The Message Passing Interface (MPI) was used to parallelize KIVA-4. Partitioning strategies are assessed in light of the fact that cells can become deactivated and activated during the course of an engine simulation, which affects the load balance between processors.

Torres, D. J. [Los Alamos National Laboratory]; Kong, S. C. [Iowa State University]

2008-01-01

157

Parallel Monte Carlo Ion Recombination Simulation in Orca

This report describes the implementation in Orca of a realistic Monte Carlo simulation of ion recombination. Keywords: parallel computing, Orca, Ethernet, Myrinet, Monte Carlo simulation, ion recombination

Seinstra, Frank J.

158

Parallel magnetic field perturbations in gyrokinetic simulations

At low beta it is common to neglect parallel magnetic field perturbations on the basis that they are of order beta^2. This is only true if effects of order beta are canceled by a term in the grad-B drift also of order beta [H. L. Berk and R. R. Dominguez, J. Plasma Phys. 18, 31 (1977)]. To our knowledge this has not been rigorously tested with modern gyrokinetic codes. In this work we use the gyrokinetic code GS2 [Kotschenreuther et al., Comput. Phys. Commun. 88, 128 (1995)] to investigate whether the compressional magnetic field perturbation B_parallel is required for accurate gyrokinetic simulations at low beta for microinstabilities commonly found in tokamaks. The kinetic ballooning mode (KBM) demonstrates the principle described by Berk and Dominguez strongly, as does the trapped electron mode, in a less dramatic way. The ion and electron temperature gradient (ETG) driven modes do not typically exhibit this behavior; the effects of B_parallel are found to depend on the pressure gradients. The terms which are seen to cancel at long wavelength in KBM calculations can be cumulative in the ion temperature gradient case and increase with eta_e. The effect of B_parallel on the ETG instability is shown to depend on the normalized pressure gradient beta' at constant beta.

Joiner, N.; Hirose, A. [Department of Physics and Engineering Physics, University of Saskatchewan, Saskatoon, Saskatchewan S7N 5E2 (Canada); Dorland, W. [University of Maryland, College Park, Maryland 20742 (United States)

2010-07-15

159

A Parallel and Accelerated Circuit Simulator with Precise Accuracy

We have developed a highly parallel and accelerated circuit simulator which produces precise results for large scale simulation. We incorporated multithreading in both the model and matrix calculations to achieve not only a factor of 10 acceleration compared to the de facto standard circuit simulator used worldwide, but also to equal or exceed the performance of timing-based event-driven simulators with

Peter M. Lee; Shinji Ito; Takeaki Hashimoto; Tomomasa Touma; Junji Sato; Goichi Yokomizo; Ic

2002-01-01

160

Hybrid particle-field molecular dynamics simulations: parallelization and benchmarks.

The parallel implementation of a recently developed hybrid scheme for molecular dynamics (MD) simulations (Milano and Kawakatsu, J Chem Phys 2009, 130, 214106) where self-consistent field theory (SCF) and particle models are combined is described. Because of the peculiar formulation of the hybrid method, considering single particles interacting with density fields, the most computationally expensive part of the hybrid particle-field MD simulation can be efficiently parallelized using a straightforward particle decomposition algorithm. Benchmarks of simulations, including comparisons of serial MD and MD-SCF program profiles, serial MD-SCF and parallel MD-SCF program profiles, and parallel benchmarks compared with efficient MD program GROMACS 4.5.4 are tested and reported. The results of benchmarks indicate that the proposed parallelization scheme is very efficient and opens the way to molecular simulations of large scale systems with reasonable computational costs. PMID:22278759

Zhao, Ying; De Nicola, Antonio; Kawakatsu, Toshihiro; Milano, Giuseppe

2012-03-30

161

An efficient parallel model for coastal transport process simulation

A three-dimensional (3D) parallel model for efficient simulation of sediment–water transport processes in coastal regions is introduced in this paper with a main focus on the parallel architecture of the model. The model’s parallel efficiency is maximized in two steps. First, a fully parallelizable hybrid operator splitting numerical technique is applied to discretize the governing equations of the model. Within

Onyx Wing-Hong Wai; Qimiao Lu

2000-01-01

162

Parallel transistor level circuit simulation using domain decomposition methods

This paper presents an efficient parallel transistor level full-chip circuit simulation tool with SPICE-accuracy. The new approach partitions the circuit into a linear domain and several non-linear domains based on circuit non-linearity and connectivity. The linear domain is solved by a parallel fast linear solver, while the nonlinear domains are distributed in parallel across different processors and solved by a direct solver. Parallel domain

He Peng; Chung-kuan Cheng

2009-01-01

163

Parallel architecture for real-time simulation. Master's thesis

This thesis is concerned with the development of a very fast and highly efficient parallel computer architecture for real-time simulation of continuous systems. Currently, several parallel processing systems exist that may be capable of executing a complex simulation in real-time. These systems are examined and the pros and cons of each system discussed. The thesis then introduces a custom-designed parallel architecture based upon The University of Alabama's OPERA architecture. Each component of this system is discussed and rationale presented for its selection. The problem selected, real-time simulation of the Space Shuttle Main Engine for the test and evaluation of the proposed architecture, is explored, identifying the areas where parallelism can be exploited and parallel processing applied. Results from the test and evaluation phase are presented and compared with the results of the same problem that has been processed on a uniprocessor system.

Cockrell, C.D.

1989-01-01

164

Parallel canonical Monte Carlo simulations through sequential updating of particles

In canonical Monte Carlo simulations, sequential updating of particles is equivalent to random updating due to particle indistinguishability. In contrast, in grand canonical Monte Carlo simulations, sequential implementation of the particle transfer steps in a dense grid of distinct points in space improves both the serial and the parallel efficiency of the simulation. The main advantage of sequential updating in
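The sequential updating this abstract refers to can be sketched with a minimal canonical (NVT) Metropolis sweep, assumed for illustration only: particles are trial-displaced in fixed index order rather than chosen at random, which is statistically equivalent for indistinguishable particles. The 1D harmonic potential and function names are stand-ins, not the paper's model.

```python
import numpy as np

def sequential_sweep(x, beta, step, rng):
    """One sweep: trial-displace each particle in index order (not at random).
    Illustrative 1D harmonic potential U(x) = x^2 / 2 per particle."""
    accepted = 0
    for i in range(len(x)):                   # sequential order over particles
        trial = x[i] + rng.uniform(-step, step)
        dE = 0.5 * (trial**2 - x[i]**2)       # energy change of the move
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            x[i] = trial                      # Metropolis acceptance
            accepted += 1
    return accepted
```

A deterministic visiting order is also what makes domain-decomposed parallelization straightforward: processors can sweep disjoint regions in lockstep without coordinating a global random particle choice.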

C. J. O'Keeffe; G. Orkoulas

2009-01-01

165

New Iterative Linear Solvers For Parallel Circuit Simulation

This thesis discusses iterative linear solvers for parallel transient analysis of large scale logic circuits. The increasing importance of large scale circuit simulation is the driving force behind research on efficient parallel circuit simulation. The most time consuming part of circuit transient analysis is the model evaluation, and the next is the linear solver, which takes about 1/5 of simulation time. Although

Reiji Suda

1996-01-01

166

Development of a Massively-Parallel, Biological Circuit Simulator

Genetic expression and control pathways can be successfully modeled as electrical circuits. Given the vast quantity of genomic data, very large and complex genetic circuits can be constructed. To tackle such problems, the massively-parallel, electronic circuit simulator, Xyce™, is being adapted to address biological problems. Unique to this biocircuit simulator is the ability to simulate not just one or a

Richard L. Schiek; Elebeoba E. May

2003-01-01

167

Parallelizing Circuit Simulation - A Combined Algorithmic And Specialized Hardware Approach

Accurate performance estimation of high-density integrated circuits requires the kind of detailed numerical simulation performed in programs like ASTAP [1] and SPICE [2]. Because of the large computation time required for such programs when applied to large circuits, accelerating numerical simulation is an important problem. Parallel processing promises to be a viable approach to accelerating the simulation of large circuits. This paper

Jacob White; Nicholas Weiner

168

DCCB and SCC Based Fast Circuit Partition Algorithm For Parallel SPICE Simulation

This paper presents an efficient circuit partition algorithm specially designed for VLSI circuit partition and parallel SPICE simulation. The algorithm

Wang, Yu

169

PARALLEL COMPUTER SIMULATION TECHNIQUES FOR THE STUDY OF MACROMOLECULES

In recent years two important developments in computing have occurred. At the high-cost end of the scale, supercomputers have become parallel computers. The ultra-fast (specialist) processors and the expensive vector-computers

Wilson, Mark R.

170

Development of parallelism for circuit simulation by tearing

A hierarchical clustering with min-cut exchange method for parallel circuit simulation is presented. Partitioning into subcircuits is near optimum in terms of distribution of computational cost and does not sacrifice the sparsity of the entire matrix. In order to compute the arising dense interconnection matrix in parallel, multilevel and distributed row-based dissection algorithms are used. A processing speed up of

H. Onozuka; M. Kanoh; C. Mizuta; T. Nakata; N. Tanabe

1993-01-01

171

Parallel Algorithms for Time and Frequency Domain Circuit Simulation

device model evaluation and matrix solutions. This dissertation also exploits the recently developed explicit telescopic projective integration method for efficient parallel transient circuit simulation by addressing the stability limitation of explicit...

Dong, Wei

2010-10-12

172

Xyce Parallel Electronic Simulator : users' guide, version 4.1.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. 
As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

2009-02-01

173

Xyce parallel electronic simulator : users' guide. Version 5.1.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. 
As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

2009-11-01

174

SPECIFICATION OF DISCRETE EVENT MODELS FOR FIRE SPREADING

The fire spreading phenomenon is highly complex, and existing mathematical models of fire are so complex themselves that any possibility of analytical solution is precluded. Instead, there has been some success when studying fire spread by means of simulation. However, precise and reliable mathematical models are still under development. They require extensive computing resources, being adequate to run in

Alexandre Muzy; Eric Innocenti; Antoine Aiello; Jean-François Santucci

175

Parallel Signal Processing and System Simulation using aCe

NASA Technical Reports Server (NTRS)

Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation, since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C-based parallel language (aCe C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures, providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we focus on some fundamental features of aCe C and present a signal processing application (FFT).

Dorband, John E.; Aburdene, Maurice F.

2003-01-01

176

Accelerating Quantum Computer Simulation via Parallel Eigenvector Computation

Quantum-dot cellular automata (QDCA) hold great potential to produce the next generation of computer hardware, but their development is hindered by computationally intensive simulations. Our research therefore focuses on rewriting one such simulation to run parallel calculations on a graphics processing unit (GPU). We have decreased execution time from 33 hours 11 minutes to 1 hour 39 minutes, but current

Karl Stathakis

2011-01-01

177

Parallel simulation of strong ground motions during recent and historical

Parallel simulation of strong ground motions during recent and historical damaging earthquakes has been enabled by resources such as the Earth Simulator supercomputer and the deployment of dense networks of strong ground motion instruments. Keywords: seismic wave; strong ground motion. The Tokyo metropolitan area is located in a very

Furumura, Takashi

178

The virtual marathon: parallel computing supports crowd simulations.

To be realistic, an urban model must include appropriate numbers of pedestrians, vehicles, and other dynamic entities. Using a parallel-computing architecture, researchers simulated a marathon with more than a million participants. To simulate participant behavior, they used fuzzy logic on a GPU to perform millions of inferences in real time. PMID:19798860
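The per-agent fuzzy inference this abstract mentions can be sketched with an illustrative (assumed) rule; the membership functions, variable names, and the rule itself are hypothetical, not taken from the paper. On a GPU, a scalar kernel like this would be evaluated over millions of agents in parallel.

```python
def tri(x, a, b, c):
    """Triangular membership function rising on [a, b] and falling on [b, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def slow_down_degree(fatigue, crowd_density):
    # illustrative Mamdani-style rule:
    # IF fatigue is high AND crowd density is high THEN slow down
    # (AND realized as min over the two membership degrees)
    return min(tri(fatigue, 0.5, 1.0, 1.5), tri(crowd_density, 0.5, 1.0, 1.5))
```

Because each agent's inference reads only its own state, the computation is embarrassingly parallel, which is what makes real-time inference for a million runners feasible.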

Yilmaz, Erdal; Isler, Veysi; Cetin, Yasemin Yardimci

2009-01-01

179

Exploiting model independence for parallel PCS network simulation

In this paper, we present a parallel simulator (SWiMNet) for PCS networks using a combination of optimistic and conservative paradigms. The proposed methodology exploits event precomputation permitted by model independence within the PCS components. The low percentage of blocked calls is exploited in the channel allocation simulation of precomputed events by means of an optimistic approach.

Azzedine Boukerche; Sajal K. Das; Alessandro Fabbri; Oktay Yildiz

1999-01-01

180

A survey of simulation optimization techniques and procedures

Discrete event simulation optimization is a problem of significant interest to practitioners interested in extracting useful information about an actual (or yet to be designed) system that can be modeled using discrete event simulation. This paper presents a brief survey of the literature on discrete event simulation optimization over the past decade (1988 to the present). Swisher et al. (2000)

James R. Swisher; Paul D. Hyden; S. H. Jacobson; L. W. Schruben

2000-01-01

181

Automated Control Synthesis for an Assembly Line using Discrete Event System Control Theory

Supervisory control theory (SCT) provides a formal approach to logic control synthesis. In order to demonstrate the usefulness of the supervisory control theory in manufacturing systems

Kumar, Ratnesh

182

MODULAR SUPERVISORY CONTROL OF A CLASS OF CONCURRENT DISCRETE EVENT SYSTEMS

In this paper, we are interested in the control of a particular class of Concurrent Discrete Event Systems defined by a collection of components that interact with each other. We investigate the computation of the supremal controllable language contained in that of the specification. We do not adopt the decentralized approach. Instead, we have chosen to perform the control

B. Gaudin; H. Marchand

2004-01-01

183

Fault Detection and Isolation in Manufacturing Systems with an Identified Discrete Event Model

In this paper a generic method for fault detection and isolation (FDI) in manufacturing systems is considered, with the controller built on the basis of observed fault-free system behavior. An identification algorithm known from

Paris-Sud XI, Université de

184

A large scale discrete event model of a distribution center is presented where critical parameters identified were utilized to create a real-time scheduling control policy to improve throughput and redistribute work in process (WIP), reducing bottleneck process and overall WIP. In implementing the control policy, it was found that the data was queried multiple times at the same time stamp

Francesca Schuler; Houshang Darabi

2010-01-01

185

[ATP]i in Limulus photoreceptors: no correlation with responsiveness or discrete event rate.

Firefly luciferin-luciferase was microinjected into single, intact photoreceptor cells of Limulus ventral eyes. In addition, total ATP content of whole end organs was measured. An attempt was made to correlate intracellular ATP levels with either the frequency of discrete events or responsiveness to light. Luciferin-luciferase luminescence emitted from a single photoreceptor cell increased after an injection of ATP or P[NH]P and decreased after injection of apyrase. We conclude that the luciferin-luciferase luminescence is monitoring increases as well as decreases in intracellular ATP concentration (ATPi). The ATPi was between 10(-3) and 10(-4) M. Bathing ventral eyes in vanadate or fluoride increased ATP content of whole end organs, increased ATPi in single photoreceptors, and increased the frequency of discrete events. However, apyrase decreased ATPi but did not affect the frequency of discrete events or responsiveness to light until ATPi was dramatically reduced. We conclude that changes in ATP levels are not strictly correlated with changes in responsiveness to light or to changes in the frequency of discrete events. PMID:2827510

Rubin, L J; Brown, J E

1988-01-01

186

Supervisor Localization: A Top-Down Approach to Distributed Control of Discrete-Event Systems

We study the design of distributed control for discrete-event systems (DES) in the framework of supervisory control theory. We view a DES as comprised of a group of agents, acting independently except for specifications on global (group) behavior. The central problem investigated is how to synthesize local controllers for individual agents such that the resultant controlled behavior is identical with

Kai Cai; W. M. Wonham

2010-01-01

187

All from One, One for All, Failure Diagnosis of Discrete Event Systems Using Representatives

Failure diagnosis in large and complex systems is a critical and challenging task. In the realm of model-based diagnosis on discrete event systems, computing a failure diagnosis means computing the set of system behaviours that could explain observations. Depending on the diagnosed system, such behaviours can be numerous, so that a problem of representing

Yannick Pencolé

2003-01-01

188

Diagnosis of asynchronous discrete-event systems: a net unfolding approach

In this paper, we consider the diagnosis of asynchronous discrete event systems. We follow a so-called true concurrency approach, in which no global state and no global time is available. Instead, we use only local states in combination with a partial order model of time. Our basic mathematical tool is that of net unfoldings originating from the Petri net research

ALBERT BENVENISTE; ERIC FABRE; Stefan Haar; Claude Jard

2003-01-01

189

Parallelization of sequential Gaussian, indicator and direct simulation algorithms

NASA Astrophysics Data System (ADS)

Improving the performance and robustness of algorithms on new high-performance parallel computing architectures is a key issue in efficiently performing 2D and 3D studies with large amounts of data. In geostatistics, sequential simulation algorithms are good candidates for parallelization. When compared with other computational applications in geosciences (such as fluid flow simulators), sequential simulation software is not extremely computationally intensive, but parallelization can make it more efficient and creates alternatives for its integration in inverse modelling approaches. This paper describes the implementation and benchmarking of a parallel version of the three classic sequential simulation algorithms: direct sequential simulation (DSS), sequential indicator simulation (SIS) and sequential Gaussian simulation (SGS). For this purpose, the source used was GSLIB, but the entire code was extensively modified to take into account the parallelization approach and was also rewritten in the C programming language. The paper also explains in detail the parallelization strategy and the main modifications. Regarding the integration of secondary information, the DSS algorithm is able to perform simple kriging with local means, kriging with an external drift and collocated cokriging with both local and global correlations. SIS includes a local correction of probabilities. Finally, a brief comparison is presented of simulation results using one, two and four processors. All performance tests were carried out on 2D soil data samples. The source code is completely open source and easy to read. It should be noted that the code is only fully compatible with Microsoft Visual C and should be adapted for other systems/compilers.

Nunes, Ruben; Almeida, José A.

2010-08-01

190

Broadband monitoring simulation with massively parallel processors

NASA Astrophysics Data System (ADS)

Modern efficient optimization techniques, namely needle optimization and gradual evolution, enable one to design optical coatings of any type. Even more, these techniques allow one to obtain multiple solutions with close spectral characteristics. It is important, therefore, to develop software tools that allow one to choose a practically optimal solution from a wide variety of possible theoretical designs. A practically optimal solution provides the highest production yield when the optical coating is manufactured. Computational manufacturing is a low-cost tool for choosing a practically optimal solution. The theory of probability predicts that reliable production yield estimations require many hundreds or even thousands of computational manufacturing experiments. As a result, reliable estimation of the production yield may require too much computational time. The most time-consuming operation is calculation of the discrepancy function used by a broadband monitoring algorithm. This function is formed by a sum of terms over a wavelength grid. These terms can be computed simultaneously in different threads of computation, which opens great opportunities for parallelization. Multi-core and multi-processor systems can provide accelerations of up to several times. Additional potential for further acceleration of computations is connected with the use of Graphics Processing Units (GPUs). A modern GPU consists of hundreds of massively parallel processors and is capable of performing floating-point operations efficiently.

Trubetskov, Mikhail; Amotchkina, Tatiana; Tikhonravov, Alexander

2011-09-01

191

Improved task scheduling for parallel simulations. Master's thesis

The objective of this investigation is to design, analyze, and validate the generation of optimal schedules for simulation systems. Improved performance in simulation execution times can greatly improve the return rate of information provided by such simulations, resulting in reduced development costs of future computer/electronic systems. Optimal schedule generation of precedence-constrained task systems, including iterative feedback systems such as VHDL or war gaming simulations, for execution on a parallel computer is known to be NP-hard. Efficiently parallelizing such problems takes full advantage of present computer technology to achieve a significant reduction in the search times required. Unfortunately, the extreme combinatorial 'explosion' of possible task assignments to processors creates an exponential search space prohibitive on any computer for search algorithms which maintain more than one branch of the search graph at any one time. This work develops various parallel modified backtracking (MBT) search algorithms for execution on an iPSC/2 hypercube that bound the space requirements and produce an optimal minimum-length schedule with linear speed-up. The parallel MBT search algorithm is validated using various feedback task simulation systems which are scheduled for execution on an iPSC/2 hypercube. The search time, size of the enumerated search space, and communications overhead required to ensure efficient utilization during the parallel search process are analyzed. The various applications indicated appreciable improvement in performance using this method.

McNear, A.E.

1991-12-01
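The branch-and-bound idea behind a modified backtracking (MBT) search can be sketched compactly. The toy scheduler below is a hedged illustration, not the thesis's algorithm: it fixes one topological order of the tasks (so it is optimal only over schedules consistent with that order) and prunes any branch whose partial makespan already meets the incumbent bound.

```python
def min_makespan(durations, deps, n_procs):
    """Backtracking search for a short non-preemptive schedule of
    precedence-constrained tasks on identical processors.
    durations: dict task -> time; deps: dict task -> set of predecessors."""
    # Fix one topological order (simplification: the search is optimal
    # only over schedules consistent with this order).
    order, placed = [], set()
    while len(order) < len(durations):
        for t in durations:
            if t not in placed and deps.get(t, set()) <= placed:
                order.append(t)
                placed.add(t)
                break
    best = [sum(durations.values())]        # serial schedule as upper bound
    finish = {}                             # finish time of scheduled tasks

    def search(i, proc_free):
        if i == len(order):
            best[0] = min(best[0], max(proc_free))
            return
        t = order[i]
        ready = max((finish[p] for p in deps.get(t, ())), default=0)
        for p in range(n_procs):
            start = max(proc_free[p], ready)
            end = start + durations[t]
            if end >= best[0]:              # bound: this branch cannot win
                continue
            finish[t] = end
            saved = proc_free[p]
            proc_free[p] = end
            search(i + 1, proc_free)        # backtracking step
            proc_free[p] = saved
        finish.pop(t, None)

    search(0, [0] * n_procs)
    return best[0]
```

In a parallel MBT search of the kind the thesis describes, disjoint subtrees of this recursion would be explored on different hypercube nodes, with the incumbent bound shared between them.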

192

Direct simulation Monte Carlo analysis on parallel processors

NASA Technical Reports Server (NTRS)

A method is presented for executing a direct simulation Monte Carlo (DSMC) analysis using parallel processing. The method is based on using domain decomposition to distribute the work load among multiple processors, and the DSMC analysis is performed completely in parallel. Message passing is used to transfer molecules between processors and to provide the synchronization necessary for the correct physical simulation. Benchmark problems are described for testing the method and results are presented which demonstrate the performance on two commercially available multicomputers. The results show that reasonable parallel speedup and efficiency can be obtained if the problem is properly sized to the number of processors. It is projected that with a massively parallel system, performance exceeding that of current supercomputers is possible.

Wilmoth, Richard G.

1989-01-01
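The domain-decomposition scheme described above (molecules migrating between processor subdomains via message passing) can be illustrated with a small emulation. This sketch is not the paper's DSMC code: it covers only the move-and-transfer phase on a 1D periodic domain, omits the collision phase entirely, and stands in for message passing with per-destination lists.

```python
def dsmc_step(subdomains, dt, n_procs, length=1.0):
    """One emulated transport step under domain decomposition.
    subdomains[r] holds the (position, velocity) molecules owned by
    'processor' r on a 1D periodic domain split into equal slabs."""
    w = length / n_procs
    outbox = [[] for _ in range(n_procs)]      # messages per destination rank
    for rank in range(n_procs):
        kept = []
        for x, v in subdomains[rank]:
            x = (x + v * dt) % length          # free-flight move, periodic
            dest = int(x / w)                  # subdomain that now owns it
            (kept if dest == rank else outbox[dest]).append((x, v))
        subdomains[rank] = kept
    for rank in range(n_procs):                # 'deliver' the messages
        subdomains[rank].extend(outbox[rank])
    return subdomains
```

In a real implementation each outbox would become an MPI-style message, and a synchronization point after delivery would keep the physical simulation correct, as the abstract notes.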

193

Scalable heterogeneous parallelism for atmospheric modeling and simulation

Heterogeneous multicore chipsets with many levels of parallelism are becoming increasingly common in high-performance computing systems. Effective use of parallelism in these new chipsets constitutes the challenge facing a new generation of large scale scientific computing applications. This study examines methods for improving the performance of two-dimensional and three-dimensional atmospheric constituent transport simulation on the Cell Broadband Engine Architecture (CBEA).

John C. Linford; Adrian Sandu

2011-01-01

194

GTC++: An Object-Oriented, Parallel, Gyrokinetic PIC Simulation

We describe GTC++, a global-cross-section, gyrokinetic, PIC code for simulating tokamak microinstabilities. In GTC++, we use objects to represent physical and numerical abstractions such as the magnetic geometry and the gyrokinetic particle-mesh interactions. The basic software infrastructure is the POOMA Framework (Parallel Object-Oriented Methods and Applications). This C++ class library provides high-level data-parallel programming interfaces for particles and for fields

Timothy J. Williams; James A. Crotinger; Julian C. Cummings; Zhihong Lin

1998-01-01

195

In this paper, we investigate new parallel algorithms for sequential circuit fault simulation using overlapping test set partitions. We propose six parallel algorithms for scalable parallel test set partitioned fault simulation (SPITFIRE). The test set partitioning inherent in the algorithms overcomes the good circuit logic simulation bottleneck that exists in traditional fault-partitioned approaches to parallel fault simulation. Since the test sequence is

Dilip Krishnaswamy; Elizabeth Rudnick; Prithviraj Banerjee; Janak H. Patel

1997-01-01

196

A simulation technique for very large-scale data parallel programs is proposed. In our simulation method, a data parallel program is divided into computation and communication sections. When the control flow of the parallel program does not depend on the contents of network messages, the computation time on each processor is calculated independently. An instrumentation tool called EXCIT is used to calculate the execution time on

Kazuto Kubota; Ken’ichi Itakura; Mitsuhisa Sato; Taisuke Boku

1998-01-01

197

Fault Diagnosis of Continuous Systems Using Discrete-Event Methods. Matthew Daigle, Xenofon Koutsoukos, Gautam Biswas (Vanderbilt University). Fault diagnosis is crucial for ensuring the safe operation of complex engineering systems ... fault isolation in systems with complex continuous dynamics. This paper presents a novel discrete-event

Koutsoukos, Xenofon D.

198

A hybrid parallel framework for the cellular Potts model simulations

The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which cannot be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation (~10^8 sites) of the complex collective behavior of numerous cells (~10^6).

Jiang, Yi [Los Alamos National Laboratory]; He, Kejing [South China University]; Dong, Shoubin [South China University]

2009-01-01

199

Rönngren, Michael Liljenstam, Johan Montagnat and Rassul Ayani, Ecole Normale Superieure de Cachan. ... state restoration in case of rollback. Furthermore, it is often a requirement that this mechanism ... in case of rollback. The implementation of the state saving and restoration mechanism is in many systems

Boyer, Edmond

200

Parallelization of Rocket Engine Simulator Software (PRESS)

NASA Technical Reports Server (NTRS)

We have outlined our work in the last half of the funding period. We have shown how a demo package for RESSAP using MPI can be done. However, we also mentioned the difficulties with the UNIX platform. We have reiterated some of the suggestions made during the presentation of the progress at the Fourth Annual HBCU Conference. Although we have discussed, in some detail, how the TURBDES/PUMPDES software can be run in parallel using MPI, at present we are unable to experiment any further with either MPI or PVM. Because X Windows is not implemented, we are also not able to experiment further with XPVM, which, it will be recalled, has a nice GUI interface. There are also some concerns, on our part, about MPI being an appropriate tool. The best thing about MPI is that it is public domain. Although plenty of documentation exists for the intricacies of using MPI, little information is available on its actual implementations. Other than very typical, somewhat contrived examples, such as the Jacobi algorithm for solving Laplace's equation, there are few examples which can readily be applied to real situations, such as in our case. In effect, the review of the literature on both MPI and PVM, and there is a lot of it, indicates something similar to the enormous effort which was spent on LISP and LISP-like languages as tools for artificial intelligence research. During the development of a book on programming languages [12], when we searched the literature for very simple examples like taking averages, reading and writing records, multiplying matrices, etc., we could hardly find any! Yet, so much was said and done on that topic in academic circles. It appears that we faced the same problem with MPI, where despite significant documentation we could not find even a simple example which supports coarse-grain parallelism involving only a few processes.
From the foregoing, it appears that a new direction may be required for more productive research during the extension period (10/19/98 - 10/18/99). At the least, the research would need to be done on Windows 95/Windows NT based platforms. Moreover, with the acquisition of the Lahey Fortran package for the PC platform, and the existing Borland C++ 5.0, we can work on C++ wrapper issues. We have carefully studied the blueprint for the Space Transportation Propulsion Integrated Design Environment for the next 25 years [13] and found the inclusion of HBCUs in that effort encouraging. Especially over the long period for which a map is provided, there is no doubt that HBCUs will grow and become better equipped to do meaningful research. In the shorter period, as was suggested in our presentation at the HBCU conference, some key decisions regarding the aging Fortran based software for rocket propellants will need to be made. One important issue is whether or not object oriented languages such as C++ or Java should be used for distributed computing. Whether or not "distributed computing" is necessary for the existing software is yet another, larger, question to be tackled.

Cezzar, Ruknet

1998-01-01

201

A parallel implementation of fault simulation on a cluster of workstations

A cluster of workstations may be employed to reduce fault simulation time greatly. Fault simulation can be parallelized by partitioning the fault list, the test vectors, or both. In this study, a parallel fault simulation algorithm called PAUSIM has been developed by parallelizing AUSUM, which consists of logic simulation and two steps of fault simulation for sequential logic circuits. Compared to other

Kyunghwan Han; Soo-young Lee

2008-01-01
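Fault-partitioned fault simulation, the baseline that cluster approaches like this parallelize, can be shown in miniature. The netlist format, gate set, and helper names below are invented for this sketch; each worker in a real parallel run would take a slice of the fault list.

```python
def eval_net(net, gates, inputs, fault=None, memo=None):
    """Evaluate one net of a combinational netlist (dict: net ->
    (op, source_nets)), optionally with a single stuck-at fault given
    as (net_name, stuck_value)."""
    if memo is None:
        memo = {}
    if fault is not None and net == fault[0]:
        return fault[1]                      # stuck-at overrides the net
    if net in inputs:
        return inputs[net]
    if net not in memo:
        op, srcs = gates[net]
        v = [eval_net(s, gates, inputs, fault, memo) for s in srcs]
        memo[net] = {'AND': lambda w: int(all(w)),
                     'OR':  lambda w: int(any(w)),
                     'NOT': lambda w: 1 - w[0]}[op](v)
    return memo[net]

def detected_faults(gates, out_net, vectors, faults):
    """Serial emulation of fault-partitioned simulation: the good-circuit
    response is computed once per vector, then each fault is re-simulated
    against it. In a parallel run each worker takes a slice of `faults`."""
    good = [eval_net(out_net, gates, vec) for vec in vectors]
    hit = set()
    for f in faults:
        for vec, g in zip(vectors, good):
            if eval_net(out_net, gates, vec, fault=f) != g:
                hit.add(f)                   # fault observable at the output
                break
    return hit
```

The shared good-circuit pass is the point of contention the SPITFIRE entry above refers to: when faults rather than vectors are partitioned, every worker repeats it.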

202

Xyce Parallel Electronic Simulator : reference guide, version 2.0.

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.

2004-06-01

203

Parallelism in interactive operations in finite-element simulation

The parallelizing of interactive operations that are part of the finite-element simulation of electromagnetic fields is examined. The total solution time in finite-element analysis is the time assigned to (1) preprocessing, (2) assembling and solving the matrix equation, and (3) postprocessing the solution. In the analysis the tasks of pre- and postprocessing are interactive, with the user sitting at a

S. R. H. Hoole; G. Mahinthakumar

1990-01-01

204

Efficient Parallel Simulations in Support of Medical Device Design

A parallel solver for incompressible fluid flow simulation, used in biomedical device design among other applications, is discussed. The major compute- and communication-intensive portions of the code are described. Using unsteady flow in a complex implantable axial blood pump as a model problem, scalability characteristics of the solver are briefly examined. The code that exhibited

Marek Behr; Mike Nicolai; Markus Probst

2007-01-01

205

Time parallelization of plasma simulations using the parareal algorithm

Simulations of fusion plasmas involve a broad range of timescales. In magnetically confined plasmas, such as in ITER, the timescales associated with the microturbulence responsible for transport and the confinement timescales vary by a factor of 10^6 to 10^9. Simulating this entire range of timescales is currently impossible, even on the most powerful supercomputers available. Space parallelization has so far been the most common approach to solving partial differential equations. Space parallelization alone has led to computational saturation for fluid codes, which means that the walltime for computation does not decrease linearly with the increasing number of processors used. The application of the parareal algorithm to simulations of fusion plasmas ushers in a new avenue of parallelization, namely temporal parallelization. The algorithm has been successfully applied to plasma turbulence simulations, having previously been applied to other, simpler problems. This work explores the extension of the applicability of the parareal algorithm to ITER-relevant problems, starting with a diffusion-convection model.

Samaddar, D. [ITER Organization, Saint Paul Lez Durance, France; Houlberg, Wayne A [ORNL; Berry, Lee A [ORNL; Elwasif, Wael R [ORNL; Huysmans, G [ITER Organization, Saint Paul Lez Durance, France; Batchelor, Donald B [ORNL

2011-01-01
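The parareal algorithm mentioned above is compact enough to sketch for a scalar ODE. This is a minimal illustration under stated simplifications (explicit Euler for both the coarse solver G and the fine solver F, with F evaluated in a loop rather than on separate processors), not the fusion code.

```python
def parareal(f, y0, t0, t1, n_slices, n_iters, fine_steps=100):
    """Parareal iteration for y' = f(t, y), scalar y.
    Time is split into n_slices; G takes one Euler step per slice,
    F takes fine_steps Euler sub-steps per slice."""
    dt = (t1 - t0) / n_slices

    def G(y, t):                       # coarse propagator over one slice
        return y + dt * f(t, y)

    def F(y, t):                       # fine propagator over one slice
        h = dt / fine_steps
        for k in range(fine_steps):
            y = y + h * f(t + k * h, y)
        return y

    # Initial guess: coarse solve over the whole interval.
    U = [y0]
    for n in range(n_slices):
        U.append(G(U[-1], t0 + n * dt))
    for _ in range(n_iters):
        # The fine solves are independent: this is the part that would
        # run in parallel, one slice per processor.
        Fu = [F(U[n], t0 + n * dt) for n in range(n_slices)]
        # Serial correction sweep: U_{n+1} = G(U_n) + F(U_n^old) - G(U_n^old).
        Unew = [y0]
        for n in range(n_slices):
            t = t0 + n * dt
            Unew.append(G(Unew[-1], t) + Fu[n] - G(U[n], t))
        U = Unew
    return U
```

Each iteration makes one more slice exact relative to the fine solver, so a few iterations with many slices can beat the serial fine solve in wall time when the fine solves run concurrently.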

206

Parallel electronic circuit simulation on the iPSC system

A parallel circuit simulator was implemented on the iPSC system. Concurrent model evaluation, hierarchical BBDF (bordered block diagonal form) reordering, and distributed multifrontal decomposition to solve the sparse matrix are used. A speedup of six times has been achieved on an eight-processor iPSC hypercube system

C.-P. Yuan; R. Lucas; P. Chan; R. Dutton

1988-01-01

207

Verifying Ptolemy II Discrete-Event Models Using Real-Time Maude

This paper shows how Ptolemy II discrete-event (DE) models can be formally analyzed using Real-Time Maude. We formalize in Real-Time Maude the semantics of a subset of hierarchical Ptolemy II DE models, and explain how the code generation infrastructure of Ptolemy II has been used to automatically synthesize a Real-Time Maude verification model from a Ptolemy II design model. This

Kyungmin Bae; Peter Csaba Ölveczky; Thomas Huining Feng; Stavros Tripakis

2009-01-01

208

Manufacturing Cell Supervisory Control - A Modular Timed Discrete-Event System Approach

An approach to the design of modular supervisory control strategies based on a framework for the modeling and supervisory control of discrete-event systems is presented. As for centralized supervision, the approach allows the consideration of logic-based specifications, control-enforcement-related constraints, and temporal and utility optimality behavioral specifications. It yields modular supervisory control strategies that are least restrictive within given specifications, and

B. A. Brandin; W. Murray Wonham; Beno Benhabib

1993-01-01

209

Reusable Component Model Development Approach for Parallel and Distributed Simulation

Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have diversiform interfaces, couple tightly, and bind closely with simulation platforms. As a result, they are difficult to reuse across different simulation platforms and applications. To address the problem, this paper first proposes a reusable component model framework. Based on this framework, our reusable model development approach is then elaborated, which contains two phases: (1) domain experts create simulation computational modules observing three principles to achieve their independence; (2) the model developer encapsulates these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that the model developed using our approach has good reusability and is easy to use in different simulation platforms and applications. PMID:24729751

Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng

2014-01-01

211

PRATHAM: Parallel Thermal Hydraulics Simulations using Advanced Mesoscopic Methods

At the Oak Ridge National Laboratory, efforts are under way to develop a 3D, parallel LBM code called PRATHAM (PaRAllel Thermal Hydraulic simulations using Advanced Mesoscopic Methods) to demonstrate the accuracy and scalability of LBM for turbulent flow simulations in nuclear applications. The code has been developed using FORTRAN-90, and parallelized using the Message Passing Interface (MPI) library. The Silo library is used to compact and write the data files, and the VisIt visualization software is used to post-process the simulation data in parallel. Both the single relaxation time (SRT) and multi relaxation time (MRT) LBM schemes have been implemented in PRATHAM. To capture turbulence without prohibitively increasing the grid resolution requirements, an LES approach [5] is adopted, allowing large-scale eddies to be numerically resolved while modeling the smaller (subgrid) eddies. In this work, a Smagorinsky model has been used, which modifies the fluid viscosity by an additional eddy viscosity depending on the magnitude of the rate-of-strain tensor. In LBM, this is achieved by locally varying the relaxation time of the fluid.

Joshi, Abhijit S [ORNL]; Jain, Prashant K [ORNL]; Mudrich, Jaime A [ORNL]; Popov, Emilian L [ORNL]

2012-01-01

212

Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows

NASA Technical Reports Server (NTRS)

A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.

Moitra, Stuti; Gatski, Thomas B.

1997-01-01
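The tridiagonal systems the abstract refers to are classically solved in serial with the Thomas algorithm; a minimal version (assuming a well-conditioned, diagonally dominant system, so no pivoting is needed) is sketched below for context. It is this O(n) but inherently sequential kernel that the paper's algorithm reorganizes to avoid a parallelized tridiagonal solver.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system Ax = d.
    a: sub-diagonal (a[0] unused), b: diagonal,
    c: super-diagonal (c[-1] unused). Runs in O(n)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                     # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):            # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```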

213

Potts-model grain growth simulations: Parallel algorithms and applications

Microstructural morphology and grain boundary properties often control the service properties of engineered materials. This report uses the Potts-model to simulate the development of microstructures in realistic materials. Three areas of microstructural morphology simulations were studied. They include the development of massively parallel algorithms for Potts-model grain growth simulations, modeling of mass transport via diffusion in these simulated microstructures, and the development of a gradient-dependent Hamiltonian to simulate columnar grain growth. Potts grain growth models for massively parallel supercomputers were developed for the conventional Potts-model in both two and three dimensions. Simulations using these parallel codes showed self similar grain growth and no finite size effects for previously unapproachable large scale problems. In addition, new enhancements to the conventional Metropolis algorithm used in the Potts-model were developed to accelerate the calculations. These techniques enable both the sequential and parallel algorithms to run faster and use essentially an infinite number of grain orientation values to avoid non-physical grain coalescence events. Mass transport phenomena in polycrystalline materials were studied in two dimensions using numerical diffusion techniques on microstructures generated using the Potts-model. The results of the mass transport modeling showed excellent quantitative agreement with one dimensional diffusion problems, however the results also suggest that transient multi-dimension diffusion effects cannot be parameterized as the product of the grain boundary diffusion coefficient and the grain boundary width. Instead, both properties are required. Gradient-dependent grain growth mechanisms were included in the Potts-model by adding an extra term to the Hamiltonian. Under normal grain growth, the primary driving term is the curvature of the grain boundary, which is included in the standard Potts-model Hamiltonian.

Wright, S.A.; Plimpton, S.J.; Swiler, T.P. [and others

1997-08-01
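The conventional Potts-model update the report builds on can be sketched as a serial Metropolis sweep. This is a generic textbook version, not Sandia's parallel code, using a common grain-growth variant in which a site proposes to adopt a neighbour's orientation; at T = 0 (normal grain growth) only energy-non-increasing moves are accepted.

```python
import math
import random

def boundary_energy(grid):
    """Potts Hamiltonian: number of unlike nearest-neighbour bonds,
    each bond counted once, periodic boundaries."""
    n = len(grid)
    return sum((grid[i][j] != grid[(i + 1) % n][j]) +
               (grid[i][j] != grid[i][(j + 1) % n])
               for i in range(n) for j in range(n))

def potts_sweep(grid, T=0.0):
    """One Metropolis sweep over an n x n lattice of grain orientations."""
    n = len(grid)
    for _ in range(n * n):
        i, j = random.randrange(n), random.randrange(n)
        old = grid[i][j]
        # 4-neighbour orientations, periodic boundaries
        nbrs = [grid[(i - 1) % n][j], grid[(i + 1) % n][j],
                grid[i][(j - 1) % n], grid[i][(j + 1) % n]]
        new = random.choice(nbrs)      # trial: adopt a neighbour's orientation
        if new == old:
            continue
        dE = sum(nb != new for nb in nbrs) - sum(nb != old for nb in nbrs)
        if dE <= 0 or (T > 0 and random.random() < math.exp(-dE / T)):
            grid[i][j] = new
    return grid
```

The parallel codes in the report decompose this lattice across processors; since only a site's four neighbours enter dE, sites on a checkerboard (or in interior blocks) can be updated concurrently.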

214

Xyce Parallel Electronic Simulator Users Guide Version 6.2.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been de- signed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel com- puting platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright c 2002-2014 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright c 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. 
Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts: Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce%40sandia.gov (outside Sandia) xyce-sandia%40sandia.gov (Sandia only)

Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory

2014-09-01

215

NASA Technical Reports Server (NTRS)

Discrete event-driven simulation makes it possible to model a computer system in detail. However, such simulation models can require a significant time to execute. This is especially true when modeling large parallel or distributed systems containing many processors and a complex communication network. One solution is to distribute the simulation over several processors. If enough parallelism is achieved, large simulation models can be efficiently executed. This study proposes a distributed simulator called DSIM which can run on various architectures. A simulated test environment is used to verify and characterize the performance of DSIM. The results of the experiments indicate that speedup is application-dependent and, in DSIM's case, is also dependent on how the simulation model is distributed among the processors. Furthermore, the experiments reveal that the communication overhead of Ethernet-based distributed systems makes it difficult to achieve reasonable speedup unless the simulation model is computation bound.

Goswami, Kumar K.; Iyer, Ravishankar K.

1990-01-01

216

Adaptive domain decomposition for Monte Carlo simulations on parallel processors

NASA Technical Reports Server (NTRS)

A method is described for performing direct simulation Monte Carlo (DSMC) calculations on parallel processors using adaptive domain decomposition to distribute the computational work load. The method has been implemented on a commercially available hypercube and benchmark results are presented which show the performance of the method relative to current supercomputers. The problems studied were simulations of equilibrium conditions in a closed, stationary box, a two-dimensional vortex flow, and the hypersonic, rarefied flow in a two-dimensional channel. For these problems, the parallel DSMC method ran 5 to 13 times faster than on a single processor of a Cray-2. The adaptive decomposition method worked well in uniformly distributing the computational work over an arbitrary number of processors and reduced the average computational time by over a factor of two in certain cases.
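The adaptive decomposition idea, recomputing subdomain boundaries from the current particle distribution so that each processor carries a near-equal share of the work, can be sketched in one dimension. This is an illustrative sketch, not the paper's code; the `adapt_boundaries` name and the quantile-based scheme are assumptions.

```python
# Hypothetical 1-D adaptive domain decomposition: cut positions are
# recomputed from the particle positions so each of `nprocs` subdomains
# holds a near-equal particle count, even for non-uniform densities.
import numpy as np

def adapt_boundaries(x, nprocs):
    """Interior cut positions that split the particles evenly among nprocs."""
    xs = np.sort(x)
    # boundary i sits at the position of the (i * n / nprocs)-th particle
    idx = (np.arange(1, nprocs) * len(xs)) // nprocs
    return xs[idx]

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=10_000)   # strongly non-uniform density
bounds = adapt_boundaries(x, 4)
edges = np.concatenate(([x.min()], bounds, [x.max()]))
counts, _ = np.histogram(x, bins=edges)       # per-subdomain work load
```

A uniform spatial split of the same exponential cloud would leave the first subdomain with most of the particles; the adaptive cuts equalize the counts by construction.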

Wilmoth, Richard G.

1990-01-01

217

Adaptive domain decomposition for Monte Carlo simulations on parallel processors

NASA Technical Reports Server (NTRS)

A method is described for performing direct simulation Monte Carlo (DSMC) calculations on parallel processors using adaptive domain decomposition to distribute the computational work load. The method has been implemented on a commercially available hypercube and benchmark results are presented which show the performance of the method relative to current supercomputers. The problems studied were simulations of equilibrium conditions in a closed, stationary box, a two-dimensional vortex flow, and the hypersonic, rarefied flow in a two-dimensional channel. For these problems, the parallel DSMC method ran 5 to 13 times faster than on a single processor of a Cray-2. The adaptive decomposition method worked well in uniformly distributing the computational work over an arbitrary number of processors and reduced the average computational time by over a factor of two in certain cases.

Wilmoth, Richard G.

1991-01-01

218

Simulation optimization: a survey of simulation optimization techniques and procedures

Discrete-event simulation optimization is a problem of significant interest to practitioners who wish to extract useful information about an actual (or yet to be designed) system that can be modeled using discrete-event simulation. This paper presents a brief survey of the literature on discrete-event simulation optimization over the past decade (1988 to the present). Swisher et al. (2000) provides a more

James R. Swisher; Paul D. Hyden; Sheldon H. Jacobson; Lee W. Schruben

2000-01-01

219

Parallel density matrix propagation in spin dynamics simulations.

Several methods for density matrix propagation in parallel computing environments are proposed and evaluated. It is demonstrated that the large communication overhead associated with each propagation step (two-sided multiplication of the density matrix by an exponential propagator and its conjugate) may be avoided and the simulation recast in a form that requires virtually no inter-thread communication. Good scaling is demonstrated on a 128-core (16 nodes, 8 cores each) cluster. PMID:22299862
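The communication saving rests on an algebraic recasting of the propagation step. One standard identity of this kind, shown here as a hedged serial NumPy sketch rather than the authors' exact formulation, replaces the two-sided update U rho U^H by a single matrix-vector product in Liouville space: vec(U rho U^H) = (conj(U) kron U) vec(rho).

```python
# Serial sketch (assumed sizes and names): the two-sided Hilbert-space
# propagation step equals a one-sided matrix-vector product on the
# column-stacked (vectorized) density matrix.
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (A + A.conj().T) / 2                       # Hermitian Hamiltonian
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = B @ B.conj().T
rho /= np.trace(rho).real                      # density-matrix-like, trace 1

dt = 0.01
w, V = np.linalg.eigh(H)
U = (V * np.exp(-1j * w * dt)) @ V.conj().T    # U = expm(-i H dt)

# two-sided Hilbert-space step (what each propagation step normally costs)
rho_h = U @ rho @ U.conj().T

# one-sided Liouville-space step on the column-stacked density matrix
P = np.kron(U.conj(), U)
rho_l = (P @ rho.reshape(-1, order="F")).reshape(n, n, order="F")
```

Once the step is a single matrix-vector product, rows of the (precomputed) propagator can be assigned to threads, which is the kind of restructuring that removes per-step synchronization.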

Edwards, Luke J; Kuprov, Ilya

2012-01-28

220

Parallel algorithms for simulating continuous time Markov chains

NASA Technical Reports Server (NTRS)

We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
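Uniformization itself is easy to state: a CTMC with generator Q is embedded in a DTMC with transition matrix P = I + Q/lam, for any lam at least the largest exit rate, with steps placed at the jumps of a rate-lam Poisson process. A minimal transient-solution sketch follows (function name and truncation scheme are assumptions, not the paper's code):

```python
# Hedged sketch of uniformization for CTMC transient analysis.
import numpy as np

def uniformized_transient(Q, p0, t, tol=1e-12):
    """Transient distribution p(t) = p0 expm(Q t) via uniformization."""
    lam = max(-Q.diagonal()) * 1.05          # any lam >= max exit rate works
    P = np.eye(len(Q)) + Q / lam             # DTMC embedded at Poisson jumps
    term = p0.astype(float)
    w = np.exp(-lam * t)                     # Poisson(lam*t) weight for k = 0
    acc = w * term
    wsum, k = w, 0
    while wsum < 1.0 - tol and k < 1000:     # truncate once the tail is tiny
        k += 1
        w *= lam * t / k
        term = term @ P
        acc += w * term
        wsum += w
    return acc

# two-state chain: 0 -> 1 at rate 1, 1 -> 0 at rate 2
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
p = uniformized_transient(Q, np.array([1.0, 0.0]), t=10.0)
```

For this chain the stationary distribution is (2/3, 1/3), which p approaches for large t; in the parallel setting the fixed-rate Poisson clock is what supplies the synchronization structure.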

Nicol, David M.; Heidelberger, Philip

1992-01-01

221

Scene projector and Beowulf parallel microprocessor for modeling and simulation

NASA Astrophysics Data System (ADS)

Embedded training enhances and maintains the skill proficiency of fleet/armor personnel in taking advantage of the capabilities built into or added onto operational systems, subsystems, or equipment. Physical Optics Corporation (POC) is developing a new scene projector system (a collimating out-the-window display system) for simulation applications, which can be fully integrated into tanks, automobiles, submarines, and other vehicles. This concept integrates advanced holographic technology with highly parallel Beowulf computer-cluster microprocessors.

Yu, Kevin H.; Kostrzewski, Andrew A.; Aye, Tin M.; Kupiec, Stephen A.; Jannson, Tomasz P.; Savant, Gajendra D.

2003-09-01

222

WIPPET, a virtual testbed for parallel simulations of wireless networks

We describe the TED/C++ implementation of WIPPET, a parallel simulation testbed for evaluating radio resource management algorithms and wireless transport protocols. Versions 0.3 and 0.4 of the testbed model radio propagation (long- and short-scale fading and interference) and protocols for integrated radio resource management in mobile wireless voice networks, including the standards-based AMPS, NA-TDMA and GSM protocols, and several

Jignesh Panchal; Owen Kelly; Jie Lai; Narayan Mandayam; Andrew T. Ogielski; Roy Yates

1998-01-01

223

NASA Technical Reports Server (NTRS)

The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent transition in three-dimensional boundary-layer flows are computed with the PSDNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.

Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad

1994-01-01

224

Sequential Window Diagnoser for Discrete-Event Systems Under Unreliable Observations

This paper addresses the issue of counting the occurrence of special events in the framework of partially observed discrete-event dynamical systems (DEDS). The developed diagnosers, referred to as sequential window diagnosers (SWDs), utilize the stochastic diagnoser probability transition matrices developed in [9] along with a resetting mechanism that allows on-line monitoring of special event occurrences. To illustrate their performance, the SWDs are applied to detect and count the occurrence of special events in a particular DEDS. Results show that SWDs are able to accurately track the number of times special events occur.

Wen-Chiao Lin; Humberto E. Garcia; David Thorsley; Tae-Sic Yoo

2009-09-01

225

Xyce Parallel Electronic Simulator - Users' Guide Version 2.1.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices; and object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase, a message-passing parallel implementation, which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an "in-house" capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed.
As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory. Acknowledgements: The authors would like to acknowledge the entire Sandia National Laboratories HPEMS (High Performance Electrical Modeling and Simulation) team, including Steve Wix, Carolyn Bogdan, Regina Schells, Ken Marx, Steve Brandon and Bill Ballard, for their support on this project. We also appreciate very much the work of Jim Emery, Becky Arnold and Mike Williamson for the help in reviewing this document. Lastly, a very special thanks to Hue Lai for typesetting this document with LaTeX. Trademarks: The information herein is subject to change without notice. Copyright © 2002-2003 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Silicon Graphics, the Silicon Graphics logo and IRIX are registered trademarks of Silicon Graphics, Inc. Microsoft, Windows and Windows 2000 are registered trademarks of Microsoft Corporation. Solaris and UltraSPARC are registered trademarks of Sun Microsystems Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. HP and Alpha are registered trademarks of Hewlett-Packard Company. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. All other trademarks are property of their respective owners. Contacts: Bug Reports http://tvrusso.sandia.gov/bugzilla Email xyce-support%40sandia.gov World Wide Web http://www.cs.sandia.gov/xyce

Hutchinson, Scott A; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric; Pawlowski, Roger P.; Fixel, Deborah A; Schiek, Richard; Bogdan, Carolyn W.; Shirley, David N.; Campbell, Phillip M.; Keiter, Eric R.

2005-06-01

226

Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

NASA Astrophysics Data System (ADS)

Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. 
Program summary: Program title: SWsolver. Catalogue identifier: AEGY_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: GPL v3. No. of lines in distributed program, including test data, etc.: 59 168. No. of bytes in distributed program, including test data, etc.: 453 409. Distribution format: tar.gz. Programming language: C, CUDA. Computer: parallel computing clusters; individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator. Operating system: Linux. Has the code been vectorised or parallelized?: Yes; tested on 1-128 x86 CPU cores, 1-32 Cell processors, and 1-32 NVIDIA GPUs. RAM: tested on problems requiring up to 4 GB per compute node. Classification: 12. External routines: MPI, CUDA, IBM Cell SDK. Nature of problem: MPI-parallel simulation of the shallow water equations using a high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell processor, and NVIDIA GPU using CUDA. Solution method: SWsolver provides three implementations of a high-resolution 2D shallow water equation solver on regular Cartesian grids, for CPU, Cell processor, and NVIDIA GPU; each implementation uses MPI to divide work across a parallel computing cluster. Additional comments: sub-program numdiff is used for the test run.
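The cluster-level pattern described, MPI dividing a regular Cartesian grid across nodes with each node applying the same explicit solver, can be illustrated serially. The sketch below emulates the halo exchange with array copies and uses a 1-D Lax-Friedrichs advection step as a stand-in for the shallow-water solver; all names and parameters are assumptions.

```python
# Serial emulation of domain decomposition with one-cell halo exchange.
import numpy as np

def step_subdomain(u, c):
    """Lax-Friedrichs update for u_t + a u_x = 0, with CFL number c = a*dt/dx.
    `u` carries one halo cell on each side."""
    return 0.5 * (u[2:] + u[:-2]) - 0.5 * c * (u[2:] - u[:-2])

nx, nsub, c = 120, 4, 0.5
u = np.sin(2 * np.pi * np.arange(nx) / nx)        # periodic initial profile
chunks = np.split(u.copy(), nsub)                  # one chunk per "rank"

for _ in range(50):
    new = []
    for i, ch in enumerate(chunks):
        left = chunks[(i - 1) % nsub][-1]          # halo from left neighbour
        right = chunks[(i + 1) % nsub][0]          # halo from right neighbour
        padded = np.concatenate(([left], ch, [right]))
        new.append(step_subdomain(padded, c))      # independent stencil work
    chunks = new                                   # "exchange" completed

u_par = np.concatenate(chunks)
```

Because every subdomain reads only old values before any update is committed, the decomposed run reproduces the single-domain result exactly; in the real code the two halo copies become MPI send/receive pairs.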

Rostrup, Scott; De Sterck, Hans

2010-12-01

227

A massively parallel cellular automaton for the simulation of recrystallization

NASA Astrophysics Data System (ADS)

A new implementation of a cellular automaton for the simulation of primary recrystallization in 3D space is presented. In this new approach, a parallel computer architecture is utilized to partition the simulation domain into multiple computational subdomains that can be treated as coupled, gradually coupled or decoupled entities. This enabled us to identify the characteristic growth length associated with the space repartitioning during nucleus growth. In doing so, several communication strategies between the simulation domains were implemented and tested for accuracy and parallel performance. Specifically, the model was applied to investigate the effect of a gradual spatial decoupling on microstructure evolution during oriented growth of random texture components into a deformed Al single crystal. For a domain discretized into one billion cells, it was found that a particular decoupling strategy resulted in faster executions of about two orders of magnitude and highly accurate simulations. Further partition of the domain into isolated entities systematically and negatively impacts microstructure evolution. We investigated this effect quantitatively by geometrical considerations.
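A minimal 2-D analogue of such a recrystallization automaton can be sketched directly; the von Neumann neighbourhood, periodic boundaries, and instantaneous growth rule are assumptions of this sketch, not the paper's 3-D model.

```python
# Toy cellular automaton for primary recrystallization: untransformed
# matrix cells (state 0) are consumed by neighbouring grains and adopt
# the grain id of the neighbour that reaches them.
import numpy as np

def ca_step(grid):
    """One synchronous CA step over the whole (periodic) domain."""
    new = grid.copy()
    for shift, axis in ((1, 0), (-1, 0), (1, 1), (-1, 1)):
        nb = np.roll(grid, shift, axis=axis)
        grow = (new == 0) & (nb > 0)
        new[grow] = nb[grow]
    return new

rng = np.random.default_rng(2)
grid = np.zeros((32, 32), dtype=int)
for gid in (1, 2, 3):                 # three nuclei with distinct grain ids
    i, j = rng.integers(0, 32, size=2)
    grid[i, j] = gid

steps = 0
while (grid == 0).any():              # grow until the matrix is consumed
    grid = ca_step(grid)
    steps += 1
```

Partitioning this grid into subdomains and limiting how often their borders are synchronized is the knob the paper studies; the characteristic growth length decides how much decoupling the microstructure tolerates.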

Kühbach, M.; Barrales-Mora, L. A.; Gottstein, G.

2014-10-01

228

Long-range interactions & parallel scalability in molecular simulations

Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modelling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single-processor and parallel performance up to 8 nodes. We have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e., communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.

Michael Patra; Marja T. Hyvonen; Emma Falck; Mohsen Sabouri-Ghomi; Ilpo Vattulainen; Mikko Karttunen

2004-10-08

229

LAMMPS (http://lammps.sandia.gov/index.html) stands for Large-scale Atomic/Molecular Massively Parallel Simulator and is a code that can be used to model atoms or, as the LAMMPS website says, to act as a parallel particle simulator at the atomic, meso, or continuum scale. This Sandia-based website provides a long list of animations from large simulations. These were created using different visualization packages to read LAMMPS output, and each one provides the name of the PI and a brief description of the work done or visualization package used. See also the static images produced from simulations at http://lammps.sandia.gov/pictures.html. The foundation paper for LAMMPS is: S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995), but the website also lists other papers describing contributions to LAMMPS over the years.

Plimpton, Steve; Thompson, Aidan; Crozier, Paul

230

Conservative parallel simulation of priority class queueing networks

NASA Technical Reports Server (NTRS)

A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. This is because a job in service having low priority can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol was analyzed and it was demonstrated that good performance can be expected on the simulation of large queueing networks.
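The key observation, that knowing all higher-priority work in advance makes a low-priority departure exactly predictable under preemptive-resume service, can be sketched directly. This is illustrative code with assumed names; higher-priority arrivals are taken to occur at or after the job's service start.

```python
def departure_time(start, demand, hp_jobs):
    """Exact departure instant of a low-priority job under preemptive-resume
    service, given every higher-priority (arrival, service) pair in advance."""
    t, left = start, demand
    for arr, svc in sorted(hp_jobs):
        if arr >= t + left:           # job completes before this arrival
            break
        if arr > t:                   # low-priority job works until preempted
            left -= arr - t
            t = arr
        t += svc                      # server serves the higher-priority job
    return t + left
```

For example, a job of demand 5 starting at time 0 with higher-priority work (2, 3) and (4, 1) runs over [0, 2), is preempted until 6, and departs at 9; this is exactly the lookahead the skewed event generation makes available.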

Nicol, David M.

1990-01-01

231

A fast ultrasonic simulation tool based on massively parallel implementations

NASA Astrophysics Data System (ADS)

This paper presents a CIVA optimized ultrasonic inspection simulation tool, which takes benefit of the power of massively parallel architectures: graphics processing units (GPU) and multi-core general purpose processors (GPP). This tool is based on the classical approach used in CIVA: the interaction model is based on Kirchhoff, and the ultrasonic field around the defect is computed by the pencil method. The model has been adapted and parallelized for both architectures. At this stage, the configurations addressed by the tool are: multi- and mono-element probes, planar specimens made of simple isotropic materials, and planar rectangular defects or side-drilled holes of small diameter. Validations of the model accuracy and performance measurements are presented.

Lambert, Jason; Rougeron, Gilles; Lacassagne, Lionel; Chatillon, Sylvain

2014-02-01

232

NASA Technical Reports Server (NTRS)

The development process for a large software development project is very complex and dependent on many variables that are dynamic and interrelated. Factors such as size, productivity and defect injection rates will have substantial impact on the project in terms of cost and schedule. These factors can be affected by the intricacies of the process itself as well as human behavior because the process is very labor intensive. The complex nature of the development process can be investigated with software development process models that utilize discrete event simulation to analyze the effects of process changes. The organizational environment and its effects on the workforce can be analyzed with system dynamics that utilizes continuous simulation. Each has unique strengths and the benefits of both types can be exploited by combining a system dynamics model and a discrete event process model. This paper will demonstrate how the two types of models can be combined to investigate the impacts of human resource interactions on productivity and ultimately on cost and schedule.

Mizell, Carolyn Barrett; Malone, Linda

2007-01-01

233

MRISIMUL: a GPU-based parallel approach to MRI simulations.

A new step-by-step comprehensive MR physics simulator (MRISIMUL) of the Bloch equations is presented. The aim was to develop a magnetic resonance imaging (MRI) simulator that makes no assumptions with respect to the underlying pulse sequence and also allows for complex large-scale analysis on a single computer without requiring simplifications of the MRI model. We hypothesized that such a simulation platform could be developed with parallel acceleration of the executable core within the graphics processing unit (GPU) environment. MRISIMUL integrates realistic aspects of the MRI experiment from signal generation to image formation and solves the entire complex problem for densely spaced isochromats and for a densely spaced time axis. The simulation platform was developed in MATLAB whereas the computationally demanding core services were developed in CUDA-C. The MRISIMUL simulator imaged three different computer models: a user-defined phantom, a human brain model and a human heart model. The high computational power of GPU-based simulations was compared against other computer configurations. A speedup of about 228 times was achieved when compared to serially executed C-code on the CPU whereas a speedup between 31 and 115 times was achieved when compared to the OpenMP parallel executed C-code on the CPU, depending on the number of threads used in multithreading (2-8 threads). The high performance of MRISIMUL allows its application in large-scale analysis and can bring the computational power of a supercomputer or a large computer cluster to a single GPU personal computer. PMID:24595337
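The data-parallel core of such a simulator, each isochromat's magnetization evolving independently under off-resonance precession and T1/T2 relaxation, can be sketched with vectorized NumPy standing in for the CUDA kernels. This is a hedged sketch with assumed parameters, not MRISIMUL's code.

```python
# Free precession of many independent isochromats: rotate the transverse
# magnetization by each spin's off-resonance phase, then apply relaxation.
import numpy as np

def bloch_step(M, dphi, dt, T1, T2, M0=1.0):
    """One free-precession step for magnetization M of shape (n, 3)."""
    c, s = np.cos(dphi), np.sin(dphi)
    Mx = c * M[:, 0] - s * M[:, 1]
    My = s * M[:, 0] + c * M[:, 1]
    E1, E2 = np.exp(-dt / T1), np.exp(-dt / T2)
    return np.stack([E2 * Mx, E2 * My, E1 * M[:, 2] + M0 * (1 - E1)], axis=1)

n = 1000
off = np.linspace(-100.0, 100.0, n)          # off-resonance (rad/s)
M = np.tile([1.0, 0.0, 0.0], (n, 1))         # state after a 90-degree pulse
dt = 1e-3
for _ in range(100):                          # 100 ms of free precession
    M = bloch_step(M, off * dt, dt, T1=1.0, T2=0.1)
```

Every row is updated by the same arithmetic with no cross-spin coupling, which is why the per-isochromat work maps so cleanly onto GPU threads; here the transverse magnitude decays to exp(-t/T2) for all spins.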

Xanthis, Christos G; Venetis, Ioannis E; Chalkias, A V; Aletras, Anthony H

2014-03-01

234

Development of magnetron sputtering simulator with GPU parallel computing

NASA Astrophysics Data System (ADS)

Sputtering devices are widely used in the semiconductor and display panel manufacturing process. Currently, a number of surface treatment applications using magnetron sputtering techniques are being used to improve the efficiency of the sputtering process, through the installation of magnets outside the vacuum chamber. Within the internal space of the low pressure chamber, plasma generated from the combination of a rarefied gas and an electric field is influenced interactively. Since the quality of the sputtering and deposition rate on the substrate is strongly dependent on the multi-physical phenomena of the plasma regime, numerical simulations using PIC-MCC (Particle In Cell, Monte Carlo Collision) should be employed to develop an efficient sputtering device. In this paper, the development of a magnetron sputtering simulator based on the PIC-MCC method and the associated numerical techniques are discussed. To solve the electric field equations in the 2-D Cartesian domain, a Poisson equation solver based on the FDM (Finite Differencing Method) is developed and coupled with the Monte Carlo Collision method to simulate the motion of gas particles influenced by an electric field. The magnetic field created from the permanent magnet installed outside the vacuum chamber is also numerically calculated using the Biot-Savart law. All numerical methods employed in the present PIC code are validated by comparison with analytical and well-known commercial engineering software results, with all of the results showing good agreement. Finally, the developed PIC-MCC code is parallelized to be suitable for general-purpose computing on graphics processing units (GPGPU), so as to reduce the large computation time which is generally required for particle simulations. The efficiency and accuracy of the GPGPU parallelized magnetron sputtering simulator are examined by comparison with the calculated results and computation times from the original serial code.
It is found that initially both simulations are in good agreement; however, differences develop over time due to statistical noise in the PIC-MCC GPGPU model.
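The PIC-MCC cycle the paper describes (charge deposition, FDM Poisson solve, field gather, particle push, Monte Carlo collisions) has a standard structure. The sketch below is a deliberately simplified 1-D electrostatic version with assumed, purely illustrative parameters (grid size, charge-to-mass ratio, time step, collision stub); it is not the developed simulator, but each stage mirrors a kernel the GPGPU version parallelizes over particles or grid nodes.

```python
# One PIC-MCC cycle in 1-D with Dirichlet walls (illustrative units).
import numpy as np

rng = np.random.default_rng(3)
nx, npart, dx, dt, q_m = 64, 5000, 1e-3, 1e-9, -1.76e11

x = rng.uniform(0, (nx - 1) * dx, npart)      # particle positions
v = rng.normal(0, 1e5, npart)                 # particle velocities

def deposit(x):
    """Cloud-in-cell assignment of unit weights to the nx grid nodes."""
    rho = np.zeros(nx)
    f = x / dx
    i = f.astype(int)
    w = f - i
    np.add.at(rho, i, 1 - w)
    np.add.at(rho, i + 1, w)
    return rho

def solve_poisson(rho):
    """FDM solve of phi'' = -rho with phi = 0 on both walls."""
    A = (np.diag(-2 * np.ones(nx - 2)) + np.diag(np.ones(nx - 3), 1)
         + np.diag(np.ones(nx - 3), -1)) / dx**2
    phi = np.zeros(nx)
    phi[1:-1] = np.linalg.solve(A, -rho[1:-1])
    return phi

rho = deposit(x)                              # 1. deposit
phi = solve_poisson(rho)                      # 2. field solve
E = -np.gradient(phi, dx)                     # 3. field at the nodes
Ep = np.interp(x, np.arange(nx) * dx, E)      # 4. gather to particles
v += q_m * Ep * dt                            # 5. push
x = np.clip(x + v * dt, 0, (nx - 1) * dx - 1e-12)
hit = rng.random(npart) < 0.01                # 6. Monte Carlo collision stub:
v[hit] *= rng.choice([-1.0, 1.0], hit.sum())  #    a random fraction scatters
```

The deposit and gather stages parallelize over particles (with atomic adds on the grid), while the Poisson solve parallelizes over nodes, which matches the kernel split such GPU ports typically use.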

Sohn, Ilyoup; Kim, Jihun; Bae, Junkyeong; Lee, Jinpil

2014-12-01

235

A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

NASA Technical Reports Server (NTRS)

The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
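The Virtual Time mechanism referred to, optimistic execution with state saving and rollback on straggler events, can be reduced to a toy logical process. This sketch makes strong simplifying assumptions (positive, distinct timestamps; a counter as state) and omits anti-messages and GVT computation entirely.

```python
class OptimisticLP:
    """A single logical process: events add `delta` to a counter state."""

    def __init__(self):
        self.lvt = 0.0                # local virtual time
        self.state = 0
        self.log = []                 # processed (timestamp, delta) events
        self.saved = [(0.0, 0)]       # state checkpoints (timestamp, state)

    def _exec(self, ts, delta):
        self.state += delta
        self.lvt = ts
        self.log.append((ts, delta))
        self.saved.append((ts, self.state))

    def process(self, ts, delta):
        if ts >= self.lvt:            # in-order event: execute optimistically
            self._exec(ts, delta)
            return
        # straggler: roll back every event in its future, then re-execute
        redo = sorted([(ts, delta)] + [e for e in self.log if e[0] > ts])
        self.log = [e for e in self.log if e[0] < ts]
        while self.saved[-1][0] >= ts:
            self.saved.pop()          # discard checkpoints past the straggler
        self.lvt, self.state = self.saved[-1]
        for e in redo:
            self._exec(*e)
```

After a rollback the LP ends in the same state as if all events had arrived in timestamp order, which is the correctness property that lets circuit partitions simulate ahead speculatively.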

Rao, Hariprasad Nannapaneni

1989-01-01

236

Parallel Unsteady Turbopump Simulations for Liquid Rocket Engines

NASA Technical Reports Server (NTRS)

This paper reports the progress being made towards complete turbopump simulation capability for liquid rocket engines. The Space Shuttle Main Engine (SSME) turbopump impeller is used as a test case for the performance evaluation of the MPI and hybrid MPI/OpenMP versions of the INS3D code. A computational model of a turbopump has then been developed for the shuttle upgrade program. Relative motion of the grid system for rotor-stator interaction was obtained by employing overset grid techniques. Time-accuracy of the scheme has been evaluated by using simple test cases. Unsteady computations for the SSME turbopump, which contains 136 zones with 35 million grid points, are currently underway on Origin 2000 systems at NASA Ames Research Center. Results from time-accurate simulations with moving boundary capability, and the performance of the parallel versions of the code, will be presented in the final paper.

Kiris, Cetin C.; Kwak, Dochan; Chan, William

2000-01-01

237

Parallel Multiscale Algorithms for Astrophysical Fluid Dynamics Simulations

NASA Technical Reports Server (NTRS)

Our goal is to develop software libraries and applications for astrophysical fluid dynamics simulations in multidimensions that will enable us to resolve the large spatial and temporal variations that inevitably arise due to gravity, fronts and microphysical phenomena. The software must run efficiently on parallel computers and be general enough to allow the incorporation of a wide variety of physics. Cosmological structure formation with realistic gas physics is the primary application driver in this work. Accurate simulations of e.g. galaxy formation require a spatial dynamic range (i.e., ratio of system scale to smallest resolved feature) of 10^4 or more in three dimensions in arbitrary topologies. We take this as our technical requirement. We have achieved, and in fact, surpassed these goals.

Norman, Michael L.

1997-01-01

238

µsik: A Micro-Kernel for Parallel/Distributed Simulation Systems. Kalyan S. Perumalla

µsik: A Micro-Kernel for Parallel/Distributed Simulation Systems. Kalyan S. Perumalla. Describes a micro-kernel approach to building parallel/distributed simulation systems and the realization of this interface in µsik, an efficient parallel/distributed realization of the micro-kernel architecture.

Tropper, Carl

239

Conjugate gradient methods for power system dynamic simulation on parallel computers

Parallel processing is a promising technology for the speedup of the dynamic simulations required in power system transient stability analysis. In this paper, three methods for dynamic simulation on parallel computers are described and compared. The methods are based on the concepts of spatial and/or time parallelization. In all of them, sets of linear algebraic equations are solved using different
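The basic conjugate gradient iteration underlying such methods can be sketched in a few lines. This sketch omits the preconditioning and the spatial/time decompositions the paper compares, and uses a symmetric positive definite stand-in system rather than an actual power-network matrix.

```python
# Plain conjugate gradient for A x = b with A symmetric positive definite.
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, maxit=500):
    x = np.zeros_like(b)
    r = b - A @ x                   # initial residual
    p = r.copy()                    # initial search direction
    rs = r @ r
    for _ in range(maxit):
        Ap = A @ p
        alpha = rs / (p @ Ap)       # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p   # A-conjugate direction update
        rs = rs_new
    return x

rng = np.random.default_rng(4)
M = rng.normal(size=(30, 30))
A = M @ M.T + 30 * np.eye(30)       # symmetric positive definite stand-in
b = rng.normal(size=30)
x = conjugate_gradient(A, b)
```

The iteration's only coupling between processors is the matrix-vector product and two inner products per step, which is what makes CG-family solvers attractive for the parallel decompositions the paper studies.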

I. C. Decker; D. M. Falcao; E. Kaszkurewicz

1996-01-01

240

The authors present results of tests with a parallel implementation of a power system dynamic simulation methodology for transient stability analysis in a parallel computer. The test system is a planned configuration of the interconnected Brazilian South-Southeastern power system with 616 buses, 995 lines, and 88 generators. The parallel machine used in the computer simulation is a distributed memory multiprocessor

I. C. Decker; D. M. Falcao; E. Kaszkurewicz

1992-01-01

241

Results of tests with a parallel implementation of a power system dynamic simulation methodology for transient stability analysis in a parallel computer are presented. The test system is a planned configuration of the interconnected Brazilian South-Southeastern power system with 616 buses, 995 lines, and 88 generators. The parallel machine used in the computer simulation is a distributed memory multiprocessor arranged

I. C. Decker; D. M. Falcao; E. Kaszkurewicz

1991-01-01

242

Safety analysis of discrete event systems using a simplified Petri net controller.

This paper deals with the problem of forbidden states in discrete event systems based on Petri net models. So, a method is presented to prevent the system from entering these states by constructing a small number of generalized mutual exclusion constraints. This goal is achieved by solving three types of Integer Linear Programming problems. The problems are designed to verify the constraints that some of them are related to verifying authorized states and the others are related to avoiding forbidden states. The obtained constraints can be enforced on the system using a small number of control places. Moreover, the number of arcs related to these places is small, and the controller after connecting them is maximally permissive. PMID:24074873

Zareiee, Meysam; Dideban, Abbas; Asghar Orouji, Ali

2014-01-01
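
The constraints described in this entry are generalized mutual exclusion constraints (GMECs), each bounding a weighted token count over a set of places. As a hedged illustration (the place names, weights, and bound below are hypothetical examples, not values from the paper), a GMEC l·m ≤ b can be checked against a marking, and the monitor (control) place enforcing it starts with b − l·m0 tokens:

```python
# Illustrative sketch of a generalized mutual exclusion constraint (GMEC):
# the weighted token count over a set of places must stay within a bound.
# Place names, weights, and bound are hypothetical examples.

def satisfies_gmec(marking, weights, bound):
    """True if the marking respects l . m <= b."""
    return sum(w * marking.get(p, 0) for p, w in weights.items()) <= bound

def monitor_initial_tokens(m0, weights, bound):
    """Initial marking of the control (monitor) place enforcing the GMEC."""
    return bound - sum(w * m0.get(p, 0) for p, w in weights.items())

weights, bound = {"p1": 1, "p2": 2}, 3  # GMEC: m(p1) + 2*m(p2) <= 3
print(satisfies_gmec({"p1": 1, "p2": 1}, weights, bound))         # True (1 + 2 <= 3)
print(satisfies_gmec({"p1": 2, "p2": 1}, weights, bound))         # False (2 + 2 > 3)
print(monitor_initial_tokens({"p1": 1, "p2": 0}, weights, bound)) # 2
```

In the paper's approach the weights and bounds are obtained by solving integer linear programming problems so that authorized states satisfy the constraints and forbidden states violate them; this sketch only shows how a single constraint is evaluated.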

243

Parallel continuous simulated tempering and its applications in large-scale molecular simulations

NASA Astrophysics Data System (ADS)

In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method from our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or the replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of the total temperature range, the PCST method requires very few copies of simulations, typically 2-3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed because the exchange rate is independent of the total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on the two-dimensional Ising model, a Lennard-Jones liquid, and an all-atom folding simulation of a small globular protein, trp-cage, in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and that it is particularly effective in simulating systems with long relaxation or correlation times. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transitions and the dynamics of macromolecules in explicit solvent.

Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng

2014-07-01
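
The exchange step that PCST inherits from parallel tempering can be sketched as follows. This is the generic replica-exchange acceptance rule, not the PCST-specific temperature-distribution scheme, and all numbers are illustrative:

```python
import math
import random

# Generic parallel-tempering swap: two replicas at inverse temperatures
# beta_i and beta_j with potential energies E_i and E_j exchange
# configurations with Metropolis probability
#   min(1, exp((beta_i - beta_j) * (E_i - E_j))).

def swap_accepted(beta_i, E_i, beta_j, E_j, rng=random.random):
    delta = (beta_i - beta_j) * (E_i - E_j)
    return delta >= 0 or rng() < math.exp(delta)

# A cold replica (large beta) holding a high-energy state always swaps
# with a hotter replica holding a lower-energy state.
print(swap_accepted(beta_i=1.0, E_i=5.0, beta_j=0.5, E_j=1.0))  # True
```

The PCST abstract's key claim, that the exchange rate is independent of the total potential energy, is what distinguishes it from this conventional rule, where delta grows with system size and exchange rates collapse unless many closely spaced temperatures are used.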

244

Parallel grid library for rapid and flexible simulation development

NASA Astrophysics Data System (ADS)

We present an easy to use and flexible grid library for developing highly scalable parallel simulations. The distributed cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously allowing one to overlap computation and communication. This enables excellent scalability at least up to 32 k cores in magnetohydrodynamic tests depending on the problem and hardware. In the version of dccrg presented here part of the mesh metadata is replicated between MPI processes reducing the scalability of adaptive mesh refinement (AMR) to between 200 and 600 processes. Dccrg is free software that anyone can use, study and modify and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg. Catalogue identifier: AEOM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: GNU Lesser General Public License version 3 No. of lines in distributed program, including test data, etc.: 54975 No. of bytes in distributed program, including test data, etc.: 974015 Distribution format: tar.gz Programming language: C++. Computer: PC, cluster, supercomputer. Operating system: POSIX. The code has been parallelized using MPI and tested with 1-32768 processes RAM: 10 MB-10 GB per process Classification: 4.12, 4.14, 6.5, 19.3, 19.10, 20. 
External routines: MPI-2 [1], boost [2], Zoltan [3], sfc++ [4]. Nature of problem: Grid library supporting arbitrary data in grid cells, parallel adaptive mesh refinement, transparent remote neighbor data updates and load balancing. Solution method: The simulation grid is represented by an adjacency list (graph) with vertices stored into a hash table and edges into contiguous arrays. The Message Passing Interface standard is used for parallelization. Cell data is given as a template parameter when instantiating the grid. Restrictions: Logically cartesian grid. Running time: Running time depends on the hardware, problem and the solution method. Small problems can be solved in under a minute and very large problems can take weeks. The examples and tests provided with the package take less than about one minute using default options. In the version of dccrg presented here the speed of adaptive mesh refinement is at most of the order of 10^6 total created cells per second. [1] http://www.mpi-forum.org/. [2] http://www.boost.org/. [3] K. Devine, E. Boman, R. Heaphy, B. Hendrickson, C. Vaughan, Zoltan data management services for parallel dynamic applications, Comput. Sci. Eng. 4 (2002) 90-97. http://dx.doi.org/10.1109/5992.988653. [4] https://gitorious.org/sfc++.

Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.

2013-04-01
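
The storage scheme described in the program summary, cells kept in a hash table with arbitrary per-cell data and neighbor lookup on a logically cartesian grid, can be illustrated with a toy sketch. This is not the dccrg API (dccrg is C++ and exchanges remote neighbor data over MPI); the class and values below are hypothetical:

```python
# Toy illustration (NOT the dccrg API) of hash-table cell storage on a
# logically cartesian 2D grid with arbitrary user data per cell.

class ToyGrid:
    def __init__(self, nx, ny):
        self.nx, self.ny = nx, ny
        # cell id -> arbitrary user data (dccrg: a C++ template parameter)
        self.cells = {j * nx + i: {"value": 0.0}
                      for j in range(ny) for i in range(nx)}

    def neighbors(self, cell_id):
        """Ids of the face neighbors of a cell, clipped at grid edges."""
        i, j = cell_id % self.nx, cell_id // self.nx
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
        return [(j + dj) * self.nx + (i + di)
                for di, dj in steps
                if 0 <= i + di < self.nx and 0 <= j + dj < self.ny]

grid = ToyGrid(4, 3)
print(sorted(grid.neighbors(0)))  # corner cell: [1, 4]
print(len(grid.neighbors(5)))     # interior cell: 4
```

In dccrg itself the neighbor data of cells owned by other MPI processes is updated transparently and asynchronously, which is what allows computation and communication to overlap; the sketch only mirrors the local indexing idea.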

245

Ion dynamics at supercritical quasi-parallel shocks: Hybrid simulations

By separating the incident ions into directly transmitted, downstream thermalized, and diffuse ions, we perform one-dimensional (1D) hybrid simulations to investigate ion dynamics at a supercritical quasi-parallel shock. In the simulations, the angle between the upstream magnetic field and the shock nominal direction is θ_Bn = 30°, and the Alfven Mach number is M_A ≈ 5.5. The shock exhibits a periodic reformation process. The ion reflection occurs at the beginning of the reformation cycle. Part of the reflected ions is trapped between the old and new shock fronts for an extended time period. These particles eventually form superthermal diffuse ions after they escape to the upstream of the new shock front at the end of the reformation cycle. The other reflected ions may return to the shock immediately or be trapped between the old and new shock fronts for a short time period. When the amplitude of the new shock front exceeds that of the old shock front and the reformation cycle is finished, these ions become thermalized ions in the downstream. No noticeable heating can be found in the directly transmitted ions. The relevance of our simulations to the satellite observations is also discussed in the paper.

Su Yanqing; Lu Quanming; Gao Xinliang; Huang Can; Wang Shui [CAS Key Laboratory of Basic Plasma Physics, Department of Geophysics and Planetary Science, University of Science and Technology of China, Hefei 230026 (China)

2012-09-15

246

HipGISAXS: A Massively Parallel Code for GISAXS Simulation

NASA Astrophysics Data System (ADS)

Grazing Incidence Small-Angle Scattering (GISAXS) is a valuable experimental technique in probing nanostructures of relevance to polymer science. New high-performance computing algorithms, codes, and software tools have been implemented to analyze GISAXS images generated at synchrotron light sources. We have developed flexible massively parallel GISAXS simulation software "HipGISAXS" based on the Distorted Wave Born Approximation (DWBA). The software computes the diffraction pattern for any given superposition of custom shapes or morphologies in a user-defined region of the reciprocal space for all possible grazing incidence angles and sample rotations. This flexibility allows a straightforward study of a wide variety of possible polymer topologies and assemblies whether embedded in a thin film or a multilayered structure. Hence, this code enables guided investigations of the morphological and dynamical properties of relevance in various applications. The current parallel code is capable of computing GISAXS images for highly complex structures and with high resolutions, attaining speedups of 200x on a single-node GPU compared to the sequential code. Moreover, the multi-GPU (CPU) code achieved an additional 900x (4000x) speedup on 930 GPU (6000 CPU) nodes.

Chourou, Slim; Sarje, Abhinav; Li, Xiaoye; Chan, Elaine; Hexemer, Alexander

2013-03-01

247

Thermodynamic Properties of Polypeptide Chains. Parallel Tempering Monte Carlo Simulations

NASA Astrophysics Data System (ADS)

A coarse-grained model of polypeptide chains was designed and studied. The chains consisted of united atoms located at the positions of the alpha carbons, and the coordinates of these atoms were restricted to a [310]-type lattice. Two kinds of amino acid residues were defined: hydrophilic and hydrophobic. The sequence of the residues was assumed to be characteristic of alpha-helical proteins (the helical septet). The force field used consisted of a long-range contact potential between residues and a local potential preferring conformational states characteristic of alpha-helices. In order to study the thermodynamics of our model we employed the multi-histogram method combined with the Parallel Tempering (Replica Exchange) Monte Carlo sampling scheme. The optimal set of temperatures for the Parallel Tempering simulations was found by an iterative procedure. The influence of the temperature and the force field on the properties of the coil-to-globule transition was studied. It was shown that this method can give more precise results when compared to the Metropolis and Replica Exchange methods.

Sikorski, A.; Gront, D.

2007-05-01

248

Wisconsin Wind Tunnel II: A Fast and Portable Parallel Architecture Simulator

The design of future parallel computers requires rapid simulation of target designs running realistic workloads. These simulations have been accelerated using two techniques: direct execution and the use of a parallel host. Historically, these techniques have been considered to have poor portability. This paper identifies and describes the implementation of four key operations necessary to make such simulation

Shubhendu S. Mukherjee; Steven K. Reinhardt; Babak Falsafi; Mike Litzkow; Steve Huss-Lederman; Mark D. Hill; James R. Larus; David A. Wood

1997-01-01

249

LARGE-SCALE MOLECULAR DYNAMICS SIMULATION USING VECTOR AND PARALLEL COMPUTERS

LARGE-SCALE MOLECULAR DYNAMICS SIMULATION USING VECTOR AND PARALLEL COMPUTERS, D.C. RAPAPORT ... on vector and parallel architectures for molecular dynamics simulation are described. For simulating systems ... 1.1. The molecular dynamics approach to the N-body problem; 1.2. Computational

Rapaport, Dennis C.

250

Efficiency of Parallel Machine for Large-Scale Simulation in Computational Physics

In this paper, we report on the efficiency of parallelization for atomistic-level large-scale simulations. Tight-binding and ab-initio molecular dynamics simulations are carried out on a supercomputer HITAC S-3800/380 and on a parallel computer HITAC SR2201. We compare the efficiencies of the two different machines based on large scale simulations to investigate advantages and disadvantages of parallel architecture.

Hiroshi Mizuseki; Keivan Esfarjani; Zhi-qiang Li; Kaoru Ohno; Yoko Akiyama; Kyoko Ichinoseki; Yoshiyuki Kawazoe

1997-01-01

251

HUMAN BEHAVIOUR MODELLING FOR DISCRETE EVENT AND AGENT BASED SIMULATION: A CASE STUDY

of both in tackling the human behaviour issues which relate to queuing time and customer satisfaction ... is to maximise customer satisfaction, for example by minimising waiting times for the different services

Aickelin, Uwe

252

Mediterranean landscape. ABSTRACT: The Mediterranean Landscape Dynamics (MEDLAND) project seeks to better ... The Mediterranean Landscape Dynamics project brings together researchers from diverse disciplines including ... agropastoral land use in the Mediterranean Basin from the beginnings of agriculture in the Neolithic

253

Closed-loop Load Balancing: Comparison of a Discrete Event Simulation with Experiments

and web services [8][9][10][11]. A queuing theory [12] approach is well-suited to the modeling ... an expected waiting time, normalizing to account for differences among CEs, and aggregating the behavior

254

A discrete-event simulation approach to predict power consumption in machining processes

Whereas in the past the sustainable use of resources and the reduction of waste have mainly been looked at from an ecological point of view, resource efficiency has recently become more and more an issue of cost saving as well. In manufacturing engineering, especially the reduction of the power consumption of machine tools and production facilities is in the focus of industry,

Roland Larek; Ekkard Brinksmeier; Daniel Meyer; Thorsten Pawletta; Olaf Hagendorf

255

Analysis of a hospital network transportation system with discrete event simulation

VA New England Healthcare System (VISN1) provides transportation to veterans between eight medical centers and over 35 Community Based Outpatient Clinics across New England. Due to high variation in its geographic area, ...

Kwon, Annie Y. (Annie Yean)

2011-01-01

256

Systems analysis and optimization through discrete event simulation at Amazon.com

The basis for this thesis involved a six and a half month LFM internship at the Amazon.com fulfillment center in the United Kingdom. The fulfillment center management sought insight into the substantial variation across ...

Price, Cameron S. (Cameron Stalker), 1972-

2004-01-01

257

AGENTS IN DISCRETE EVENT SIMULATION A.M. Uhrmacher B. Schattenberg

model design which supports variable structure models is complemented with a distributed, concurrent ... and a flexible compositional construction of experimental frames for multi-agent systems, JAMES, a Java Based ... problems in dynamic environments (Hanks et al., 1993). The adaptation of test beds to a concrete ap

Biundo, Susanne

258

AGENTS IN DISCRETE EVENT SIMULATION A.M. Uhrmacher B. Schattenberg

model design which supports variable structure models is complemented with a distributed, concurrent ... of experimental frames (Page, 1994). As does DEVS, JAMES distinguishes between ... and a flexible compositional construction of experimental frames for multi-agent systems, JAMES, a Java Based

Biundo, Susanne

259

Discrete-event simulation of fluid stochastic Petri nets Gianfranco Ciardo1

. Trivedi3. ciardo@cs.wm.edu, nicol@cs.dartmouth.edu, kst@egr.duke.edu. 1 Dept. of Computer Science, College of William and Mary, Williamsburg, VA 23187; 2 Dept. of Computer Science, Dartmouth College, Hanover, NH 03755 ... EEC-94-18765 ... the FSPN formalism we propose include: fluid impulses associated with both immediate

Ciardo, Gianfranco

260

Analyzing Skill-Based Routing Call Centers Using Discrete-Event Simulation and Design Experiment

Call center customer service representatives (CSRs) or agents tend to have different skills. Some CSRs can handle one type of call, while other CSRs can handle other types of calls. Advances in automatic call distributors (ACDs) have made it possible to have skill-based routing (SBR) which is the protocol for online routing of incoming calls to the appropriate CSRs. At

Thomas A. Mazzuchi; Rodney B. Wallace

2004-01-01

261

Simulation of Parallel Interacting Faults and Earthquake Predictability

NASA Astrophysics Data System (ADS)

Numerical shear experiments of a granular region using the lattice solid model often exhibit accelerating energy release in the lead-up to large events (Mora et al, 2000) and a growth in correlation lengths in the stress field (Mora and Place, 2002). While these results provide evidence for a Critical Point-like mechanism in elasto-dynamic systems and the possibility of earthquake forecasting, they do not prove such a mechanism occurs in the crust. Cellular automaton simulations exhibit accelerating energy release prior to large events, or unpredictable behaviour in which large events may occur at any time, depending on tuning parameters such as dissipation ratio and stress transfer ratio (Weatherley and Mora, 2003). The mean stress plots from the particle simulations are most similar to the CA mean stress plots near the boundary of the predictable and unpredictable regimes, suggesting that elasto-dynamic systems may be close to the borderline between predictable and unpredictable. To progress in resolving the question of whether more realistic fault system models exhibit predictable behaviour, and to determine whether they also have unpredictable and predictable regimes depending on tuning parameters, as seen in CA simulations, we developed a 2D elasto-dynamic model of parallel interacting faults. The friction is slip weakening until a critical slip distance. Henceforth, the friction is at the dynamic value until the slip rate drops below the value it attained when the critical slip distance was exceeded. As the slip rate continues to drop, the friction increases back to the static value as a function of slip rate. Numerical shear experiments are conducted in a model with 41 parallel interacting faults. Calculations of the inverse metric defined in Klein et al (2000) indicate that the system is non-ergodic.
Furthermore, by calculating the correlation between the stress fields at different times we determine that the system exhibits so called ``glassy'' behaviour. This implies that mean field theoretical analysis such as Klein et al, 2000 requires introduction of a memory kernel in order to properly account for the glassy behaviour of interacting fault system models. The elasto-dynamic parallel interacting fault model helps to provide a crucial link between CA maps of phase space and the behaviour of more realistic elasto-dynamic interacting fault system models, and thus, a means to improve understanding of the dynamics and predictability of real fault systems. REFERENCES W. Klein and M. Anghel and C.D. Ferguson and J.B. Rundle and J.S. Sá Martins (2000) Statistical Analysis of a Model for Earthquake Faults with Long-Range Stress Transfer, in: Geocomplexity and the Physics of Earthquakes (Geophysical Monograph series; no. 120), eds. J.B. Rundle and D. L. Turcotte and W. Klein, pp 43-71 (American Geophysical Union, Washington, DC). Mora, P., Place, D., Abe, S. and Jaumé, S. (2000) Lattice solid simulation of the physics of earthquakes: the model, results and directions, in: GeoComplexity and the Physics of Earthquakes (Geophysical Monograph series; no. 120), eds. Rundle, J.B., Turcotte, D.L. & Klein, W., pp 105-125 (American Geophys. Union, Washington, DC). Mora, P., and Place, D. (2002) Stress correlation function evolution in lattice solid elasto-dynamic models of shear and fracture zones and earthquake prediction, Pure Appl. Geophys, 159, 2413-2427. Weatherley, D. and Mora, P. (2003) Accelerating Precursory Activity within a Class of Earthquake Analog Automata, Pure Appl. Geophysics, submitted.

Mora, P.; Weatherley, D.; Klein, B.

2003-04-01

262

NASA Technical Reports Server (NTRS)

The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

Hsieh, Shang-Hsien

1993-01-01

263

Sensor Configuration Selection for Discrete-Event Systems under Unreliable Observations

Algorithms for counting the occurrences of special events in the framework of partially-observed discrete event dynamical systems (DEDS) were developed in previous work. Their performance typically improves as the sensors providing the observations become more costly or increase in number. This paper addresses the problem of finding a sensor configuration that achieves an optimal balance between cost and the performance of the special event counting algorithm, while satisfying given observability requirements and constraints. Since this problem is generally computationally hard in the framework considered, a sensor optimization algorithm is developed using two greedy heuristics, one myopic and the other based on projected performances of candidate sensors. The two heuristics are sequentially executed in order to find the best sensor configurations. The developed algorithm is then applied to a sensor optimization problem for a multiunit-operation system. Results show that improved sensor configurations can be found that may significantly reduce the sensor configuration cost but still yield acceptable performance for counting the occurrences of special events.

Wen-Chiao Lin; Tae-Sic Yoo; Humberto E. Garcia

2010-08-01
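
The myopic greedy heuristic mentioned above can be sketched generically: repeatedly add the affordable sensor with the best projected performance gain per unit cost until the budget is exhausted. The sensor names, gains, costs, and budget below are hypothetical stand-ins, not values from the paper:

```python
# Hedged sketch of a myopic greedy heuristic for sensor configuration
# selection. Each sensor has a projected performance gain and a cost;
# all numbers are illustrative.

def greedy_sensor_selection(sensors, budget):
    """Pick sensors by best gain-per-cost ratio within a budget."""
    chosen, spent = [], 0.0
    remaining = dict(sensors)  # name -> (gain, cost)
    while remaining:
        best = max(
            (n for n, (g, c) in remaining.items() if spent + c <= budget),
            key=lambda n: remaining[n][0] / remaining[n][1],
            default=None,
        )
        if best is None:  # nothing affordable is left
            break
        spent += remaining[best][1]
        chosen.append(best)
        del remaining[best]
    return chosen, spent

sensors = {"s1": (0.9, 3.0), "s2": (0.5, 1.0), "s3": (0.4, 2.0)}
print(greedy_sensor_selection(sensors, budget=4.0))  # (['s2', 's1'], 4.0)
```

The paper's second heuristic replaces the static gain with the projected performance of the counting algorithm under each candidate configuration; the loop structure stays the same.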

264

Physical Simulation for Animation and Visual Effects: Parallelization and Characterization for Chip

Physical Simulation for Animation and Visual Effects: Parallelization and Characterization for Chip ... In particular, we examine its parallelization potential and characterize its behavior on a chip multiprocessor ... and cloth simulation. They are computationally demanding, requiring from a few seconds to several minutes

Liblit, Ben

265

Particle/Continuum Hybrid Simulation in a Parallel Computing Environment

NASA Technical Reports Server (NTRS)

The objective of this study was to modify an existing parallel particle code based on the direct simulation Monte Carlo (DSMC) method to include a Navier-Stokes (NS) calculation so that a hybrid solution could be developed. In carrying out this work, it was determined that the following five issues had to be addressed before extensive program development of a three dimensional capability was pursued: (1) find a set of one-sided kinetic fluxes that are fully compatible with the DSMC method, (2) develop a finite volume scheme to make use of these one-sided kinetic fluxes, (3) make use of the one-sided kinetic fluxes together with DSMC type boundary conditions at a material surface so that velocity slip and temperature slip arise naturally for near-continuum conditions, (4) find a suitable sampling scheme so that the values of the one-sided fluxes predicted by the NS solution at an interface between the two domains can be converted into the correct distribution of particles to be introduced into the DSMC domain, (5) carry out a suitable number of tests to confirm that the developed concepts are valid, individually and in concert for a hybrid scheme.

Baganoff, Donald

1996-01-01

266

Parallel Markov Chain Monte Carlo Simulation by Pre-Fetching

burn-in is a serious problem, it is often desirable to use parallelization to speed up generation ... the majority of computers sold now come with built-in ethernet adaptors. Partly as a result of this, parallel ... the same speed (i.e., they are "balanced"), such an approach leads to an increase in speed by a factor
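
Pre-fetching exploits the fact that the next few states of a Metropolis chain form a small binary tree of accept/reject outcomes, so the expensive density evaluations along all branches can be computed in parallel before the decisions are made. A minimal sketch with one level of look-ahead (two Metropolis steps per parallel round, toy Gaussian target; all names and numbers are illustrative):

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def log_density(x):
    """Toy stand-in for an expensive posterior evaluation."""
    return -0.5 * x * x

def prefetch_two_steps(x, lx, rng, pool, step=0.5):
    """Advance the chain two Metropolis steps using one parallel round:
    the three speculative proposal densities are evaluated concurrently."""
    y1 = x + rng.uniform(-step, step)    # first proposal
    y2a = y1 + rng.uniform(-step, step)  # second proposal if y1 accepted
    y2b = x + rng.uniform(-step, step)   # second proposal if y1 rejected
    f1, f2a, f2b = [pool.submit(log_density, z) for z in (y1, y2a, y2b)]
    if math.log(rng.random()) < f1.result() - lx:
        x, lx = y1, f1.result()
        y2, l2 = y2a, f2a.result()
    else:
        y2, l2 = y2b, f2b.result()
    if math.log(rng.random()) < l2 - lx:
        x, lx = y2, l2
    return x, lx

rng = random.Random(1)
x, lx = 3.0, log_density(3.0)
with ThreadPoolExecutor(max_workers=3) as pool:
    for _ in range(200):
        x, lx = prefetch_two_steps(x, lx, rng, pool)
print(x, lx)
```

With h levels of look-ahead, 2^(h+1) − 1 speculative evaluations buy h + 1 steps per round, so the speedup grows only logarithmically in processor count, which is why the snippet above notes the method pays off mainly when burn-in on a single processor is the bottleneck.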

267

A sweep algorithm for massively parallel simulation of circuit-switched networks

NASA Technical Reports Server (NTRS)

A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.

Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.

1992-01-01

268

GloMoSim: A Library for Parallel Simulation of Large-Scale Wireless Networks

A number of library-based parallel and sequential network simulators have been designed. This paper describes a library, called GloMoSim (for Global Mobile system Simulator), for parallel simulation of wireless networks. GloMoSim has been designed to be extensible and composable: the communication protocol stack for wireless networks is divided into a set of layers, each with its own API. Models of

Xiang Zeng; Rajive Bagrodia; Mario Gerla

1998-01-01

269

We develop a novel deadlock control policy for modeling the concurrent execution of manufacturing assembly processes in FMS through a class of nets, called G-systems with limited shared resources, which is a large class of discrete event systems generalizing well-known models presented in the literature. A relevant property of the system behavior is being non-blocking, i.e., from any reachable state,

Zhiwu Li; Mi Zhao; Rongming Zhu

2005-01-01

270

Parallel computing in enterprise modeling.

This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language, which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.

Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

2008-08-01

271

Efficient parallelization of molecular dynamics simulations with short-ranged forces

NASA Astrophysics Data System (ADS)

Recently, an alternative strategy for the parallelization of molecular dynamics simulations with short-ranged forces has been proposed. In this work, this algorithm is tested on a variety of multi-core systems using three types of benchmark simulations. The results show that the new algorithm gives consistent speedups which, depending on the properties of the simulated system, are either comparable or superior to those obtained with spatial decomposition. Comparisons of the parallel speedup on different systems indicate that on multi-core machines the parallel efficiency of the method is mainly limited by memory access speed.

Meyer, Ralf

2014-10-01

272

Parallel Monte-Carlo Tree Search with Simulation Servers

Monte-Carlo tree search is a new best-first tree search algorithm that triggered a revolution in the computer Go world. Developing good parallel Monte-Carlo tree search algorithms is important because single-processor performance can no longer be expected to increase as it used to. A novel parallel Monte-Carlo tree search algorithm is proposed. A tree searcher runs on a client computer and multiple Monte-Carlo

Hideki Kato; Ikuo Takeuchi

2010-01-01

273

Parallel climate model (PCM) control and transient simulations

The Department of Energy (DOE) supported Parallel Climate Model (PCM) makes use of the NCAR Community Climate Model (CCM3) and Land Surface Model (LSM) for the atmospheric and land surface components, respectively, the DOE Los Alamos National Laboratory Parallel Ocean Program (POP) for the ocean component, and the Naval Postgraduate School sea-ice model. The PCM executes on several distributed and

W. M. Washington; J. W. Weatherly; G. A. Meehl; A. J. Semtner Jr.; T. W. Bettge; A. P. Craig; W. G. Strand Jr.; J. M. Arblaster; V. B. Wayland; R. James; Y. Zhang

2000-01-01

274

Simulation of Parallel Random Access Machines by Circuits

A relationship is established between (i) parallel random-access machines that allow many processors to concurrently read from or write into a common memory including simultaneous reading or writing into the same memory location (CRCW PRAM), and (ii) combinational logic circuits that contain AND's, OR's and NOT's, with no bound placed on the fan-in of AND-gates and OR-gates. Parallel time and

Larry J. Stockmeyer; Uzi Vishkin

1984-01-01

275

A fuzzy discrete event system approach to determining optimal HIV/AIDS treatment regimens.

Treatment decision-making is complex and involves many factors. A systematic decision-making and optimization technology capable of handling variations and uncertainties of patient characteristics and physician's subjectivity is currently unavailable. We recently developed a novel general-purpose fuzzy discrete event systems theory for optimal decision-making. We now apply it to develop an innovative system for medical treatment, specifically for the first round of highly active antiretroviral therapy of human immunodeficiency virus/acquired immunodeficiency syndrome (HIV/AIDS) patients involving three historically widely used regimens. The objective is to develop such a system whose regimen choice for any given patient will exactly match expert AIDS physician's selection to produce the (anticipated) optimal treatment outcome. Our regimen selection system consists of a treatment objectives classifier, fuzzy finite state machine models for treatment regimens, and a genetic-algorithm-based optimizer. The optimizer enables the system to either emulate an individual doctor's decision-making or generate a regimen that simultaneously satisfies diverse treatment preferences of multiple physicians to the maximum extent. We used the optimizer to automatically learn the values of 26 parameters of the models. The learning was based on the consensus of AIDS specialists A and B on this project, whose exact agreement was only 35%. The performance of the resulting models was first assessed. We then carried out a retrospective study of the entire system using all the qualifying patients treated in our institution's AIDS Clinical Center in 2001. A total of 35 patients were treated by 13 specialists using the regimens (four and eight patients were treated by specialists A and B, respectively). We compared the actually prescribed regimens with those selected by the system using the same available information. 
The overall exact agreement was 82.9% (29 out of 35), with the exact agreement with specialists A and B both at 100%. The exact agreement for the remaining 11 physicians not involved in the system training was 73.9% (17 out of 23), an impressive result given the fact that expert opinion can be quite divergent for treatment decisions of such complexity. Our specialists also carefully examined the six mismatched cases and deemed that the system actually chose a more appropriate regimen for four of them. In the other two cases, either would be reasonable choices. Our approach has the capabilities of generalizing, learning, and representing knowledge even in the face of weak consensus, and being readily upgradeable to new medical knowledge. These are practically important features to medical applications in general, and HIV/AIDS treatment in particular, as national HIV/AIDS treatment guidelines are modified several times per year. PMID:17044400
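The fuzzy finite state machine component described above can be illustrated with a max-min composition step, the standard state-update rule for fuzzy automata. This is a generic sketch; the paper's actual models, 26 learned parameters, and membership functions are not reproduced here.

```python
def maxmin_step(state, transition):
    """One update of a fuzzy finite state machine: the membership of next
    state j is max over i of min(state[i], transition[i][j]), i.e. the
    max-min composition of the state vector with the transition matrix."""
    n = len(transition[0])
    return [max(min(state[i], transition[i][j]) for i in range(len(state)))
            for j in range(n)]
```

For example, with state memberships [1.0, 0.2] and transition matrix [[0.5, 0.9], [0.3, 0.1]], one step yields [0.5, 0.9]; a genetic algorithm would tune the transition entries against physician-labeled targets.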

Ying, Hao; Lin, Feng; MacArthur, Rodger D; Cohn, Jonathan A; Barth-Jones, Daniel C; Ye, Hong; Crane, Lawrence R

2006-10-01

276

A self-learning fuzzy discrete event system for HIV/AIDS treatment regimen selection.

The U.S. Department of Health and Human Services Human Immunodeficiency Virus (HIV)/Acquired Immune Deficiency Syndrome (AIDS) treatment guidelines are modified several times per year to reflect the rapid evolution of the field (e.g., emergence of new antiretroviral drugs). As such, a treatment-decision support system that is capable of self-learning is highly desirable. Based on the fuzzy discrete event system (FDES) theory that we recently created, we have developed a self-learning HIV/AIDS regimen selection system for the initial round of combination antiretroviral therapy, one of the most complex therapies in medicine. The system consisted of a treatment objectives classifier, fuzzy finite state machine models for treatment regimens, and a genetic-algorithm-based optimizer. Supervised learning was achieved through automatically adjusting the parameters of the models by the optimizer. We focused on the four historically popular regimens with 32 associated treatment objectives involving the four most important clinical variables (potency, adherence, adverse effects, and future drug options). The learning targets for the objectives were produced by two expert AIDS physicians on the project, and their averaged overall agreement rate was 70.6%. The system's learning ability and new regimen suitability prediction capability were tested under various conditions of clinical importance. The prediction accuracy was found between 84.4% and 100%. Finally, we retrospectively evaluated the system using 23 patients treated by 11 experienced nonexpert faculty physicians and 12 patients treated by the two experts at our AIDS Clinical Center in 2001. The overall exact agreement between the 13 physicians' selections and the system's choices was 82.9% with the agreement for the two experts being both 100%. For the seven mismatched cases, the system actually chose more appropriate regimens in four cases and equivalent regimens in another two cases. It made a mistake in one case. 
These (preliminary) results show that 1) the system outperformed the nonexpert physicians and 2) it performed as well as the expert physicians did. This learning and prediction approach, as well as our original FDES theory, is general purpose and can be applied to other medical or nonmedical problems. PMID:17702293

Ying, Hao; Lin, Feng; MacArthur, Rodger D; Cohn, Jonathan A; Barth-Jones, Daniel C; Ye, Hong; Crane, Lawrence R

2007-08-01

277

In this paper we compare the efficiency of two versions of a parallel algorithm for finite element compressible fluid flow simulations on unstructured grids. The first version is based on the explicit model of parallel programming (with the message-passing paradigm), while the second incorporates the implicit model (in which data-parallel programming is used). Time discretization of the compressible Euler equations is organized

Joanna Plazek; Krzysztof Banas; Jacek Kitowski

1998-01-01

278

Adaptive finite element simulation of flow and transport applications on parallel computers

The subject of this work is the adaptive finite element simulation of problems arising in flow and transport applications on parallel computers. Of particular interest are new contributions to adaptive mesh refinement (AMR) in this parallel high-performance context, including novel work on data structures, treatment of constraints in a parallel setting, generality and extensibility via object-oriented programming, and the design/implementation

Benjamin Shelton Kirk

2007-01-01

279

A NEW DIMENSION OF URBAN CLIMATE MODELLING WITH PARALLEL LARGE-EDDY SIMULATION

We introduce the topography version of the parallelized large-eddy simulation (LES) model PALM and describe its new features and methods and its performance on current supercomputers. Validation shows that PALM is in line with experimental and other LES results, i.e. superior to the conventional Reynolds-averaged models. State-of-the-art parallel computing and parallel, on-the-fly graphics processing make the LES technique quick and

Marcus Oliver Letzel; Manabu Kanda; Siegfried Raasch

280

ACCELERATION OF RADIANCE FOR LIGHTING SIMULATION BY USING PARALLEL COMPUTING WITH OPENCL

We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications

281

Scalable Time-Parallelization of Molecular Dynamics Simulations in Nano Mechanics

Molecular Dynamics (MD) is an important atomistic simulation technique. Molecular dynamics is widely used to simulate the behavior of physical systems in such applications

Srinivasan, Ashok

282

MHD Simulations on Parallel Computers

R.F. Stein, Klaus Galsgaard, Åke Nordlund. Applied to the study of driven magnetic reconnection. Section 1, Data Parallel Model and the Code: for the HD & MHD calculations we lay out our arrays in an efficient manner. We believe that compiler efficiency

Stein, Robert

283

Parallel Sparse Matrix Solver on the GPU Applied to Simulation of Electrical Machines

One of the most efficient iterative methods available for solving the finite-element formulation of Maxwell's equations. CUDA (Compute Unified Device Architecture) [3] is a parallel programming model and software environment

Paris-Sud XI, UniversitÃ© de

284

Computationally-efficient and scalable parallel implementation of chemistry in LES/PDF simulations

Chemistry in parallel LES/PDF computations is implemented using in situ adaptive tabulation (ISAT) and x2f_mpi, a Fortran library. To assess the strategies, we perform LES/PDF computations of the Sandia Flame D with chemistry represented using (a) a 16

285

LUsim: A Framework for Simulation-Based Performance Modeling and Prediction of Parallel Sparse LU Factorization. Pietro Cicotti, Xiaoye S. Li, Scott B. Baden. April 15, 2008. The sparse parallel factorization framework uses micro-benchmarks to calibrate the parameters of machine characteristics and additional tools

Geddes, Cameron Guy Robinson

286

Modeling and Simulation of a Parallel Mechanical Elbow with 3 DOF

The modeling and simulation of a mechanical elbow of 3 degrees of freedom, is introduced by highlighting the main features of the mechanism related to the design criteria. The mechanical elbow is used as a transhumeral prosthetic part, and it has been built as a parallel topology consisting of electric linear actuators and universal joints. The parallel mechanism has 4

José Rafael Mendoza-Vázquez; Apolo Zeus Escudero-Uribe; Esteban Tlelo-Cuautle

2008-01-01

287

Parallel Object Oriented Implementation of a 2D Bounded Electrostatic Plasma PIC Simulation

… of Technology, Pasadena CA 91109, USA. … heat, pressure or electric discharges. Fusion … the electrostatic (Coulomb) interactions are inc

Bystroff, Chris

288

Parallelization of particle-in-cell simulation modeling Hall-effect thrusters

MIT's fully kinetic particle-in-cell Hall thruster simulation is adapted for use on parallel clusters of computers. Significant computational savings are thus realized with a predicted linear speed up efficiency for certain ...

Fox, Justin M., 1981-

2005-01-01

289

Computer simulation program for parallel SITAN. [Sandia Inertia Terrain-Aided Navigation, in FORTRAN

This computer program simulates the operation of parallel SITAN using digitized terrain data. An actual trajectory is modeled including the effects of inertial navigation errors and radar altimeter measurements.

Andreas, R.D.; Sheives, T.C.

1980-11-01

290

A novel parallel-rotation algorithm for atomistic Monte Carlo simulation of dense polymer systems

We develop and test a new elementary Monte Carlo move for use in the off-lattice simulation of polymer systems. This novel Parallel-Rotation algorithm (ParRot) permits moving very efficiently torsion angles that are deeply inside long chains in melts. The parallel-rotation move is extremely simple and is also demonstrated to be computationally efficient and appropriate for Monte Carlo simulation. The ParRot

S. Santos; U. W. Suter; M. Müller; J. Nievergelt

2001-01-01

291

iPRIDE: a parallel integrated circuit simulator using direct method

A parallel circuit simulator, iPRIDE, which uses a direct solution method and runs on a shared-memory multiprocessor is described. The simulator is based on a multilevel node tearing approach which produces a nested bordered-block-diagonal (BBD) form of the circuit equation matrix. The parallel solution of the nested BBD matrix is described. Its efficiency is shown to depend on how the

Mi-Chang Chang; I. N. Hajj

1988-01-01

292

Parallel Simulation of Ion Recombination in Nonpolar Liquids

Ethernet and the other using Myrinet. On Ethernet, the program suffers from a large communi- cation overhead. Using the Myrinet high-speed network in combination with a programming system (Orca) that is optimized for fast networks, however, the program obtains a high efficiency. Keywords: parallel Monte Carlo

Seinstra, Frank J.

293

NASA Astrophysics Data System (ADS)

In this dissertation, we present two parallelized 3D simulation techniques for three-dimensional acoustic and elastic wave propagation based on the finite integration technique. We demonstrate their usefulness in solving real-world problems with examples in the three very different areas of nondestructive evaluation, medical imaging, and security screening. More precisely, these include concealed weapons detection, periodontal ultrasography, and guided wave inspection of complex piping systems. We have employed these simulation methods to study complex wave phenomena and to develop and test a variety of signal processing and hardware configurations. Simulation results are compared to experimental measurements to confirm the accuracy of the parallel simulation methods.

Rudd, Kevin Edward

294

Cimlib: A Fully Parallel Application For Numerical Simulations Based On Components Assembly

NASA Astrophysics Data System (ADS)

This paper presents CIMLIB with its two main characteristics: an Object Oriented Program and a fully parallel code. CIMLIB aims at providing a set of components that can be organized to build numerical simulation of a certain process. We describe two components: one treats the complex task of parallel remeshing, the other puts the focus on the Finite Element modeling. In a second part, we present some parallel performances and an example of a very large simulation (over a mesh of 25 million nodes) that begins with the mesh generation and ends with writing the results files, all done using 88 processors.

Digonnet, Hugues; Silva, Luisa; Coupez, Thierry

2007-05-01

295

O(N) parallel tight binding molecular dynamics simulation of carbon nanotubes

NASA Astrophysics Data System (ADS)

We report an O(N) parallel tight binding molecular dynamics simulation study of (10×10) structured carbon nanotubes (CNT) at 300 K. We converted a sequential O(N³) TBMD simulation program into an O(N) parallel code, utilizing the concept of parallel virtual machines (PVM). The code is tested in a distributed memory system consisting of a cluster with 8 PCs that run under Linux (Slackware 2.2.13 kernel). Our results on the speed up, efficiency and system size are given.

Özdoğan, Cem; Dereli, Gülay; Çağın, Tahir

2002-10-01

296

We discuss selected aspects of a new parallel three-dimensional (3-D) computational tool for the unstructured mesh simulation of Los Alamos National Laboratory (LANL) casting processes. This tool, known as Telluride, draws upon robust, high resolution finite volume solutions of metal alloy mass, momentum, and enthalpy conservation equations to model the filling, cooling, and solidification of LANL castings. We briefly describe the current Telluride physical models and solution methods, then detail our parallelization strategy as implemented with Fortran 90 (F90). This strategy has yielded straightforward and efficient parallelization on distributed and shared memory architectures, aided in large part by the new parallel libraries JTpack90 for Krylov-subspace iterative solution methods and PGSLib for efficient gather/scatter operations. We illustrate our methodology and current capabilities with source code examples and parallel efficiency results for a LANL casting simulation.
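The gather/scatter operations that PGSLib supplies for unstructured-mesh parallelism can be sketched in serial form. This is a toy illustration of the pattern, not PGSLib's Fortran API:

```python
def gather(node_values, cell_nodes):
    """Gather: collect the values at each cell's nodes into a per-cell list."""
    return [[node_values[n] for n in cell] for cell in cell_nodes]

def scatter_add(node_values, cell_nodes, cell_contribs):
    """Scatter-add: accumulate per-cell contributions back onto shared nodes.
    In a parallel finite volume code this is where communication happens,
    since several cells (possibly on different processors) touch one node."""
    for cell, contribs in zip(cell_nodes, cell_contribs):
        for n, c in zip(cell, contribs):
            node_values[n] += c
    return node_values
```

With cells [[0, 1], [1, 2]] sharing node 1, gathering from node values [1.0, 2.0, 3.0] yields [[1.0, 2.0], [2.0, 3.0]], and scattering unit contributions back accumulates 2 at the shared node.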

Kothe, D.B.; Turner, J.A.; Mosso, S.J. [Los Alamos National Lab., NM (United States); Ferrell, R.C. [Cambridge Power Computer Assoc. (United States)

1997-03-01

297

Xyce parallel electronic simulator users' guide, Version 6.0.1.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David Gregory [Raytheon, Albuquerque, NM]

2014-01-01

298

Pelegant: A Parallel Accelerator Simulation Code for Electron Generation and Tracking

NASA Astrophysics Data System (ADS)

elegant is a general-purpose code for electron accelerator simulation that has a worldwide user base. Recently, many of the time-intensive elements were parallelized using MPI. Development has used modest Linux clusters and the BlueGene/L supercomputer at Argonne National Laboratory. This has provided very good performance for some practical simulations, such as multiparticle tracking with synchrotron radiation and emittance blow-up in the vertical rf kick scheme. The effort began with development of a concept that allowed for gradual parallelization of the code, using the existing beamline-element classification table in elegant. This was crucial as it allowed parallelization without major changes in code structure and without major conflicts with the ongoing evolution of elegant. Because of rounding error and finite machine precision, validating a parallel program against a uniprocessor program with the requirement of bitwise identical results is notoriously difficult. We will report validating simulation results of parallel elegant against those of serial elegant by applying Kahan's algorithm to improve accuracy dramatically for both versions. The quality of random numbers in a parallel implementation is very important for some simulations. Some practical experience with generating parallel random numbers by offsetting the seed of each random sequence according to the processor ID will be reported.
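The abstract above credits Kahan's algorithm with improving summation accuracy enough to validate parallel elegant against the serial version. Compensated (Kahan) summation can be sketched generically; this is the standard textbook form, not Pelegant's implementation:

```python
def kahan_sum(values):
    """Compensated (Kahan) summation: a second variable carries the
    low-order bits lost in each addition, so the accumulated rounding
    error stays O(1) instead of growing with the number of terms."""
    total = 0.0
    comp = 0.0                      # running compensation for lost bits
    for v in values:
        y = v - comp                # apply the correction to the next term
        t = total + y               # big + small: low bits of y may be lost
        comp = (t - total) - y      # recover exactly what was lost
        total = t
    return total
```

For instance, naively summing 1.0 followed by ten terms of 1e-16 returns exactly 1.0 (each tiny addend is rounded away), while the compensated sum retains their contribution.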

Wang, Y.; Borland, M.

2006-11-01

299

A Queue Simulation Tool for a High Performance Scientific Computing Center

NASA Technical Reports Server (NTRS)

The NASA Center for Computational Sciences (NCCS) at the Goddard Space Flight Center provides high performance highly parallel processors, mass storage, and supporting infrastructure to a community of computational Earth and space scientists. Long running (days) and highly parallel (hundreds of CPUs) jobs are common in the workload. NCCS management structures batch queues and allocates resources to optimize system use and prioritize workloads. NCCS technical staff use a locally developed discrete event simulation tool to model the impacts of evolving workloads, potential system upgrades, alternative queue structures and resource allocation policies.
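The locally developed NCCS tool is not public; a generic event-driven simulation loop of the kind such discrete event simulators are built on can be sketched as follows. The toy "arrive"/"finish" events and the 3-unit service time are illustrative only:

```python
import heapq

def simulate(events, horizon):
    """Minimal discrete-event loop: repeatedly pop the earliest pending
    event, process it, and possibly schedule new (endogenous) events,
    until the queue empties or simulated time passes the horizon."""
    queue = list(events)            # (time, name) tuples, exogenous events
    heapq.heapify(queue)
    log = []
    while queue:
        time, name = heapq.heappop(queue)
        if time > horizon:
            break
        log.append((time, name))
        if name == "arrive":        # endogenous: arrival schedules completion
            heapq.heappush(queue, (time + 3, "finish"))
    return log
```

Two arrivals at t=0 and t=1 produce completions at t=3 and t=4, processed in strict timestamp order regardless of insertion order.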

Spear, Carrie; McGalliard, James

2007-01-01

300

Parallelization of a Molecular Dynamics Simulation of AN Ion-Surface Collision System:

NASA Astrophysics Data System (ADS)

Parallel molecular dynamics simulation study of the ion-surface collision system is reported. A sequential molecular dynamics simulation program is converted into a parallel code utilizing the concept of parallel virtual machine (PVM). An effective and favorable algorithm is developed. Our parallelization of the algorithm shows that it is more efficient because of the optimal pair listing, linear scaling, and constant behavior of the internode communications. The code is tested in a distributed memory system consisting of a cluster of eight PCs that run under Linux (Debian 2.4.20 kernel). Our results on the collision system are discussed based on the speed up, efficiency and the system size. Furthermore, the code is used for a full simulation of the Ar-Ni(100) collision system and calculated physical quantities are presented.

Ati?, Murat; Özdo?an, Cem; Güvenç, Ziya B.

301

Time Warp Simulation on Clumps

Traditionally, parallel discrete-event simulators based on the Time Warp synchronization protocol have been implemented using either the shared memory programming model or the distributed memory, message passing programming model. This was because the preferred hardware platform was either a shared memory multiprocessor workstation or a network of uniprocessor workstations. However, with the advent of clumps (cluster of shared memory multiprocessors), a change in this dichotomous view becomes necessary. This paper explores the design and implementation issues involved in exploiting this new platform for Time Warp simulations. Specifically, this paper presents two generic strategies for implementing Time Warp simulators on clumps. In addition, we present our experiences in implementing these strategies on an extant distributed memory, message passing Time Warp simulator (warped). Preliminary performance results comparing the modified clump-specific simulation kernel to the unmodified d...
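The rollback mechanism at the heart of Time Warp can be sketched for a single logical process. This is a toy state (a running sum) with state saving and re-execution on a straggler; warped's actual kernel additionally handles anti-messages, GVT computation, and fossil collection:

```python
import bisect

class TimeWarpLP:
    """Toy Time Warp logical process: events are executed optimistically;
    a straggler (an event timestamped in the LP's past) triggers rollback
    to the saved state and re-execution in correct timestamp order."""
    def __init__(self):
        self.state = 0
        self.clock = 0
        self.history = []           # (ts, value, state_before), ts-ordered

    def handle(self, ts, value):
        if ts < self.clock:         # straggler: roll back
            i = bisect.bisect_right([h[0] for h in self.history], ts)
            undone = self.history[i:]
            self.history = self.history[:i]
            self.state = undone[0][2]   # state before first undone event
            self.clock = self.history[-1][0] if self.history else 0
            for t, v in sorted([(ts, value)] + [(t, v) for t, v, _ in undone]):
                self._execute(t, v)
        else:
            self._execute(ts, value)

    def _execute(self, ts, value):
        self.history.append((ts, value, self.state))  # save state first
        self.state += value                           # toy state transition
        self.clock = ts
```

Processing events at t=1 and t=5, then receiving a straggler at t=3, rolls back and yields the same final state as processing 1, 3, 5 in order.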

Girindra D. Sharma; Radharamanan Radhakrishnan; Umesh Kumar V. Rajasekaran; Umesh Kumar; V. Rajasekaran; Nael Abu-ghazaleh; Philip A. Wilsey

1999-01-01

302

amatos: Parallel adaptive mesh generator for atmospheric and oceanic simulation

NASA Astrophysics Data System (ADS)

The grid generator amatos has been developed for adaptive modeling of ocean and atmosphere circulation. It features adaptive control of planar, spherical, and volume grids with triangular or tetrahedral elements refined by bisection. The user interface (GRID API), a Fortran 90 module, shields the application programmer from the technical aspects of mesh adaptation like amatos' hierarchical data structure, the OpenMP parallelization, and the effective calculation of a domain decomposition by a space filling curve (SFC) approach. This article presents the basic structure and features of amatos, the powerful SFC ordering and decomposition of data, and two example applications, namely the modeling of tracer advection in the polar vortex and the development of the adaptive finite element atmosphere model PLASMA (parallel large scale model of the atmosphere).
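The space-filling-curve decomposition amatos performs can be illustrated with the simpler Z-order (Morton) curve; the curve amatos actually uses for its triangular hierarchies may differ, so treat this purely as a sketch of the idea:

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of integer grid coordinates (x, y) into a
    Z-order (Morton) key; sorting cells by this key gives a locality-
    preserving 1-D ordering of the 2-D grid."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def partition(cells, nparts):
    """Order cells along the curve, then cut the 1-D ordering into
    contiguous, roughly equal chunks: one partition per processor."""
    ordered = sorted(cells, key=lambda c: morton_key(*c))
    size = -(-len(ordered) // nparts)   # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

Because nearby cells get nearby keys, each contiguous chunk is spatially compact, which keeps inter-partition communication low.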

Behrens, Jörn; Rakowsky, Natalja; Hiller, Wolfgang; Handorf, Dörthe; Läuter, Matthias; Päpke, Jürgen; Dethloff, Klaus

303

Service-oriented modeling and simulation are hot issues in the field, and service resources must be invoked while a simulation task workflow is running; how to optimize service resource allocation so that the task completes effectively is an important problem in this area. In military modeling and simulation, it is important to improve the probability of success and the timeliness of simulation task workflows. Therefore, this paper proposes an optimization algorithm for multipath service resource parallel allocation, in which a multipath service resource parallel allocation model is built and a multiple-chains-coding quantum optimization algorithm is used for optimization and solution. The multiple chains coding scheme extends the parallel search space to improve search efficiency. Through simulation experiments, this paper investigates how different optimization algorithms, service allocation strategies, and path numbers affect the probability of success in the simulation task workflow, and the results show that the proposed algorithm is an effective method to improve the probability of success and timeliness of simulation task workflows. PMID:24963506

Zhang, Hongjun; Zhang, Rui; Li, Yong; Zhang, Xuliang

2014-01-01

304

Simulation of parallel random access machines by circuits

A relationship is established between (i) parallel random-access machines that allow many processors to concurrently read from or write into a common memory including simultaneous reading or writing into the same memory location (CRCW PRAM), and (ii) combinational logic circuits that contain AND's, OR's and NOT's, with no bound placed on the fan-in of AND-gates and OR-gates. Parallel time and number of processors for CRCW PRAM's are shown to correspond respectively (and simultaneously) to depth and size for circuits, where the time-depth correspondence is to within a constant factor and the processors-size correspondence is to within a polynomial. By applying a recent result of Furst, Saxe and Sipser, the authors obtain the corollary that parity, integer multiplication, graph transitive closure and integer sorting cannot be computed in constant time by a CRCW PRAM with a polynomial number of processors. This is the first nonconstant lower bound on the parallel time required to solve these problems by a CRCW PRAM with a polynomial number of processors. The authors also state and outline the proof of a similar result, due to W. L. Ruzzo and M. Tompa, that relates time and processor bounds for CRCW PRAM's to alternation and space bounds for alternating Turing machines.
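The PRAM-circuit correspondence can be made concrete with the classic example behind it: a common-write CRCW PRAM computes the OR of n bits in constant parallel time, mirroring a single unbounded fan-in OR gate. The sketch below runs the "processors" sequentially, but each iteration is independent of the others, so all writes could occur in one parallel step:

```python
def crcw_or(bits):
    """Constant-parallel-time OR on a common-CRCW PRAM: every processor i
    whose input bit is 1 concurrently writes 1 into the same shared cell.
    Writes never conflict in value (all writers write 1), which is exactly
    the 'common' concurrent-write rule."""
    memory = [0]                    # shared memory cell, initialised to 0
    for i, b in enumerate(bits):    # one iteration = one processor
        if b:
            memory[0] = 1           # concurrent common write
    return memory[0]
```

The Furst-Saxe-Sipser lower bound cited above says parity, unlike OR, admits no such constant-time trick with polynomially many processors.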

Stockmeyer, L.; Vishkin, U.

1984-05-01

305

A parallel finite element simulator for ion transport through three-dimensional ion channel systems.

A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain to get the surface and volume mesh for ion channel systems, which consists of a set of mesh generation tools. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG) developed by one of the authors, which provides the capability of doing large scale parallel computations with high parallel efficiency and the flexibility of choosing high order elements to achieve high order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I-V curve, with which both primitive and transformed PNP equations are studied and their numerical performances are compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage dependent anion channel (VDAC) and α-Hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite size effects can be included in PNP model now, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL.
It is shown that the size effects in SMPNP can effectively lead to reduced current in the channel, and the results are closer to BD simulation results. PMID:23740647

Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo

2013-09-15

306

Uncertainty Analysis for Parallel Car-crash Simulation Results

Small changes in parameters, load cases or model specifications for crash simulation may result in huge changes in the results, characterizing the crash behavior of an automotive design. For a BMW test case, differences between the position of a node in two simulation runs of up to 10 cm were observed, just as a result of round-off differences in the

Liquan Mei; C. A. Thole

307

We explore the emerging application area of physics-based simulation for computer animation and visual special effects. In particular, we examine its parallelization potential and characterize its behavior on a chip multiprocessor (CMP). Applications in this domain model and simulate natural phenomena, and often direct visual components of motion pictures. We study a set of three workloads that exemplify the span

Christopher J. Hughes; Radek Grzeszczuk; Eftychios Sifakis; Daehyun Kim; Sanjeev Kumar; Andrew P. Selle; Jatin Chhugani; Matthew Holliman; Yen-Kuang Chen

2007-01-01

308

Biophysically Accurate Brain Modeling and Simulation using Hybrid MPI/OpenMP Parallel Processing

the simulation efficiency on the targeted parallel platform. Using 32 processors, the proposed hybrid approach, on the other hand, is more efficient than the MPI implementation and is about 31X faster than a serial implementation of the simulator for a network...
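The figures quoted above (about 31X faster on 32 processors) imply near-ideal scaling. Speedup and parallel efficiency are computed with the standard definitions, sketched here; this is not code from the paper:

```python
def speedup_efficiency(t_serial, t_parallel, nprocs):
    """Speedup S = T_serial / T_parallel; parallel efficiency E = S / p,
    where p is the processor count. E = 1.0 means ideal linear scaling."""
    s = t_serial / t_parallel
    return s, s / nprocs
```

A 31X speedup on 32 processors corresponds to an efficiency of 31/32, i.e. roughly 97%.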

Hu, Jingzhen

2012-07-16

309

A Multidimensional Study on the Feasibility of Parallel Switch-Level Circuit Simulation

This paper presents the results of an experimental study to evaluate the effectiveness of multiple synchronization protocols and partitioning algorithms in reducing the execution time of switch-level models of VLSI circuits. Specific contributions of this paper include: (i) parallelizing an existing switch-level simulator such that the model can be executed using conservative and optimistic simulation protocols with minor changes, (ii)

Yu-an Chen; Vikas Jha; Rajive Bagrodia

1997-01-01

310

Distributing computation among multiple processors is one approach to reducing simulation time for large VLSI circuit designs. However, parallel simulation introduces the problem of how to partition the logic gates and system behaviors of the circuit among the available processors in order to obtain maximum speedup. A complicating factor that is often ignored is the effect of the time-synchronization protocol

Kevin L. Kapp; Thomas C. Hartrum; Tom S. Wailes

1995-01-01

311

Eliminating Race Conditions in System-Level Models by using Parallel Simulation Infrastructure

Revealed a number of dangerous race conditions in existing embedded multi-media application models. … often does not reveal such mistakes in the model, even if the simulation fails due to encountered race conditions …

Doemer, Rainer

312

Parallel Numerical Simulation of Boltzmann Transport in Single-Walled Carbon Nanotubes

NSDL National Science Digital Library

This module teaches the basic principles of semi-classical transport simulation based on the time-dependent Boltzmann transport equation (BTE) formalism with performance considerations for parallel implementations of multi-dimensional transport simulation and the numerical methods for efficient and accurate solution of the BTE for both electronic and thermal transport using the simple finite difference discretization and the stable upwind method.
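The stable upwind method the module covers can be sketched for the simplest relevant case, 1-D linear advection; this is a minimal illustration, not the module's multi-dimensional BTE solver:

```python
def upwind_step(u, c, dx, dt):
    """One explicit time step of 1-D linear advection u_t + c*u_x = 0 with
    the first-order upwind scheme, stable for 0 <= c*dt/dx <= 1.
    Assumes c > 0, so the stencil looks to the left (upwind) neighbor;
    Python's negative indexing gives a periodic boundary for free."""
    nu = c * dt / dx                # Courant number
    return [u[i] - nu * (u[i] - u[i - 1]) for i in range(len(u))]
```

At Courant number 1 the scheme shifts the profile exactly one cell per step; at smaller Courant numbers it advects the profile with some numerical diffusion, which is the price of its unconditional monotonicity.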

Aksamija, Zlatan

313

Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
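The matrix multiplications that dominate the annual calculation are ordinary dense products of a daylight-coefficient matrix (one row per sensor, one column per sky patch) with sky vectors. A serial sketch of that kernel, purely illustrative and not Radiance code, makes clear why it maps well to a GPU: every output element is independent.

```python
def matvec(matrix, vec):
    """Dense matrix-vector product: each row's dot product with the sky
    vector gives one sensor's illuminance. On a GPU (e.g. via OpenCL),
    all rows are computed in parallel."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]
```

An annual simulation batches thousands of sky vectors (one per timestep) into a matrix-matrix product, which is where the reported order-of-magnitude speedups come from.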

Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor

2011-09-06

314

Highly parallel Monte-Carlo simulations of the acousto-optic effect in heterogeneous turbid media.

The development of a highly parallel simulation of the acousto-optic effect is detailed. The simulation supports optically heterogeneous simulation domains under insonification by arbitrary monochromatic ultrasound fields. An adjoint method for acousto-optics is proposed to permit point-source/point-detector simulations. The flexibility and efficiency of this simulation code is demonstrated in the development of spatial absorption sensitivity maps which are in broad agreement with current experimental investigations. The simulation code has the potential to provide guidance in the feasibility and optimization of future studies of the acousto-optic technique, and its speed may permit its use as part of an iterative inversion model. PMID:22559676

Powell, Samuel; Leung, Terence S

2012-04-01

315

Parallel Unsteady Turbopump Flow Simulations for Reusable Launch Vehicles

NASA Technical Reports Server (NTRS)

An efficient solution procedure for time-accurate solutions of the incompressible Navier-Stokes equations is obtained. The artificial compressibility method requires a fast convergence scheme. The pressure projection method is efficient when a small time step is required. The number of sub-iterations is reduced significantly when a Poisson solver is employed with the continuity equation. Both computing time and memory usage are reduced (at least 3 times). Other work includes Multi Level Parallelism (MLP) of INS3D, overset connectivity for the validation case, experimental measurements, and a computational model for the boost pump.

Kiris, Cetin; Kwak, Dochan

2000-01-01

316

Parallelized modelling and solution scheme for hierarchically scaled simulations

NASA Technical Reports Server (NTRS)

This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers is capable of solving are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that by combining several of these techniques, a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.

Padovan, Joe

1995-01-01

317

Parallel Quantum Computer Simulation on the CUDA Architecture

Due to their increasing computational power, modern graphics processing architectures are becoming more and more popular for general purpose applications with high performance demands. This is the case of quantum computer simulation, a problem with high computational requirements both in memory and processing power. When dealing with such simulations, multiprocessor architectures are an almost obliged tool. In this paper we

Eladio Gutiérrez; Sergio Romero; María A. Trenas; Emilio L. Zapata

2008-01-01

318

A parallel FFT accelerated transient field-circuit simulator

A novel fast electromagnetic field-circuit simulator that permits the full-wave modeling of transients in nonlinear microwave circuits is proposed. This time-domain simulator is composed of two components: 1) a full-wave solver that models interactions of electromagnetic fields with conducting surfaces and finite dielectric volumes by solving time-domain surface and volume electric field integral equations, respectively, and 2) a circuit solver

Ali E. Yilmaz; Jian-Ming Jin; Eric Michielssen

2005-01-01

319

Fluid-structure interaction simulation of blast and explosions impacting on realistic building structures (verification and validation configurations; blast-driven and detonation-driven deformations). http://www.cacr.caltech.edu/asc Papers: [Deiterding, 2011; Deiterding et al., 2009]

Deiterding, Ralf

320

Parallel Monte Carlo Electron and Photon Transport Simulation Code (PMCEPT code)

NASA Astrophysics Data System (ADS)

Simulations for customized cancer radiation treatment planning for each patient are very useful for both patient and doctor. These simulations can be used to find the most effective treatment with the least possible dose to the patient. This typical system, so called ``Doctor by Information Technology'', will be useful to provide high quality medical services everywhere. However, the large amount of computing time required by the well-known general purpose Monte Carlo (MC) codes has prevented their use for routine dose distribution calculations in customized radiation treatment planning. The optimal solution to provide ``accurate'' dose distributions within an ``acceptable'' time limit is to develop a parallel simulation algorithm on a Beowulf PC cluster, because it is the most accurate, efficient, and economic. I developed a parallel MC electron and photon transport simulation code based on the standard MPI message passing interface. This algorithm solved the main difficulty of parallel MC simulation (overlapped random number series in the different processors) using multiple random number seeds. The parallel results agreed well with the serial ones. The parallel efficiency approached 100% as was expected.
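The seeding strategy described above can be sketched with NumPy's `SeedSequence.spawn`, which yields statistically independent streams for each worker and so avoids overlapping random number series; the toy pi estimate below stands in for the actual photon/electron histories:

```python
import numpy as np

def mc_partial(seed, n):
    """One processor's share of a Monte Carlo estimate (a toy
    pi estimate standing in for particle-transport histories)."""
    rng = np.random.default_rng(seed)
    xy = rng.random((n, 2))
    return np.count_nonzero((xy ** 2).sum(axis=1) <= 1.0)

# SeedSequence.spawn produces statistically independent child
# streams -- the "multiple random number seeds" fix for the
# overlapped-series problem mentioned in the abstract.
children = np.random.SeedSequence(42).spawn(8)   # 8 "processors"
hits = sum(mc_partial(s, 100_000) for s in children)
pi_est = 4.0 * hits / (8 * 100_000)
```

Each partial sum is independent, so the combined estimate matches a serial run with the same total sample count in distribution.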

Kum, Oyeon

2004-11-01

321

Shared Memory Implementation of a Parallel Switch-Level Circuit Simulator

Circuit simulation is a critical bottleneck in VLSI design. This paper describes the implementation of an existing parallel switch-level simulator called MIRSIM on a shared-memory multiprocessor architecture. The simulator uses a set of three different conservative protocols: the null message protocol, the conditional event protocol and the accelerated null message protocol, a combination of the preceding two algorithms. The paper describes the implementation of these protocols to exploit...

Yu-an Chen; Rajive Bagrodia

1998-01-01

322

Parallel Many-Body Simulations Without All-to-All Communication

Simulations of interacting particles are common in science and engineering, appearing in such diverse disciplines as astrophysics, fluid dynamics, molecular physics, and materials science. These simulations are often computationally intensive and so natural candidates for massively parallel computing. Many-body simulations that directly compute interactions between pairs of particles, be they short-range or long-range interactions, have been parallelized in several standard ways. The...

Bruce Hendrickson; Steve Plimpton

1993-01-01

323

NASA Technical Reports Server (NTRS)

Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
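A minimal serial sketch of simulated annealing on a random 3-SAT instance follows; the encoding and parameters are our own toy choices, not the paper's. The paper's Generalized Speculative Computation evaluates future flips speculatively on many processors while committing them in exactly this sequential order:

```python
import math, random

def sat_count(clauses, assign):
    """Satisfied-clause count; literal k>0 means variable k true,
    k<0 means variable |k| false (variables are 1-indexed)."""
    return sum(any((lit > 0) == assign[abs(lit)] for lit in c)
               for c in clauses)

def anneal(clauses, n_vars, t0=2.0, cooling=0.999, steps=20000, seed=1):
    rnd = random.Random(seed)
    assign = [False] + [rnd.random() < 0.5 for _ in range(n_vars)]
    cost = len(clauses) - sat_count(clauses, assign)
    t = t0
    for _ in range(steps):
        v = rnd.randrange(1, n_vars + 1)        # propose one flip
        assign[v] = not assign[v]
        new_cost = len(clauses) - sat_count(clauses, assign)
        # Metropolis acceptance rule
        if new_cost <= cost or rnd.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
        else:
            assign[v] = not assign[v]           # reject: undo the flip
        t *= cooling
    return assign, cost

# random 3-SAT, 20 variables, 85 clauses (ratio ~4.25)
rnd = random.Random(7)
clauses = [[rnd.choice([-1, 1]) * v for v in rnd.sample(range(1, 21), 3)]
           for _ in range(85)]
best, unsat = anneal(clauses, 20)
```

Because every accept/reject decision depends only on the seed and the proposal order, a speculative parallel version can reproduce this exact decision sequence, which is the guarantee the paper relies on.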

Sohn, Andrew; Biswas, Rupak

1996-01-01

324

Characterization of parallelism and deadlocks in distributed digital logic simulation

This paper explores the suitability of the Chandy-Misra algorithm for digital logic simulation. We use four realistic circuits as benchmarks for our analysis, with one of them being the vector-unit controller for the Titan supercomputer from Ardent. Our results show that the average number of logic elements available for concurrent execution ranges from 10 to 111 for the four circuits,
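The Chandy-Misra algorithm analyzed above is conservative: a logic process may only consume an event once every input channel's clock has passed its timestamp, and null messages carry pure time promises to break the resulting deadlocks. A minimal sketch of a two-input process (our own simplification, not the paper's simulator):

```python
import heapq

class LP:
    """A logic process with two input channels, Chandy-Misra style:
    an event is safe to process once every channel clock has
    advanced past its timestamp."""
    def __init__(self):
        self.inbox = []            # pending (timestamp, payload)
        self.chan_clock = [0, 0]   # last timestamp seen per channel
        self.processed = []

    def receive(self, chan, ts, payload=None):
        self.chan_clock[chan] = ts       # payload=None -> null message
        if payload is not None:
            heapq.heappush(self.inbox, (ts, payload))
        # process everything now provably safe
        horizon = min(self.chan_clock)
        while self.inbox and self.inbox[0][0] <= horizon:
            self.processed.append(heapq.heappop(self.inbox))

g = LP()
g.receive(0, 5, "rise")     # real event on channel 0 ...
assert g.processed == []    # ... blocked: channel 1 is still at t=0
g.receive(1, 10)            # null message: "nothing before t=10"
assert g.processed == [(5, "rise")]   # now safe -- deadlock avoided
```

The null message carries no event, only a lower bound on future timestamps; without it the process at `t=5` could wait forever on the silent channel.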

Larry Soulé; Anoop Gupta

1989-01-01

325

Long-range interactions & parallel scalability in molecular simulations

Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modelling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and

Michael Patra; Marja T. Hyvonen; Emma Falck; Mohsen Sabouri-Ghomi; Ilpo Vattulainen; Mikko Karttunen

2004-01-01

326

High-Resolution Simulations of Parallel Blade-Vortex Interactions

A major source of rotorcraft noise is generated by the rotor blades in parallel blade-vortex interaction. The noise produced by the interaction is seen to primarily radiate from the leading-edge section of the airfoil, with a weaker contribution coming from the trailing edge. The simulations are then extended to three-dimensional moving

Alonso, Juan J.

327

: A Scalable and Transparent System for Simulating MPI Programs

is a scalable, transparent system for experimenting with the execution of parallel programs on simulated computing platforms. The level of simulated detail can be varied for application behavior as well as for machine characteristics. Unique features of are repeatability of execution, scalability to millions of simulated (virtual) MPI ranks, scalability to hundreds of thousands of host (real) MPI ranks, portability of the system to a variety of host supercomputing platforms, and the ability to experiment with scientific applications whose source-code is available. The set of source-code interfaces supported by is being expanded to support a wider set of applications, and MPI-based scientific computing benchmarks are being ported. In proof-of-concept experiments, has been successfully exercised to spawn and sustain very large-scale executions of an MPI test program given in source code form. Low slowdowns are observed, due to its use of purely discrete event style of execution, and due to the scalability and efficiency of the underlying parallel discrete event simulation engine, sik. In the largest runs, has been executed on up to 216,000 cores of a Cray XT5 supercomputer, successfully simulating over 27 million virtual MPI ranks, each virtual rank containing its own thread context, and all ranks fully synchronized by virtual time.

Perumalla, Kalyan S. [ORNL]

2010-01-01

328

Dependability analysis of parallel systems using a simulation-based approach. M.S. Thesis

NASA Technical Reports Server (NTRS)

The analysis of dependability in large, complex, parallel systems executing real applications or workloads is examined in this thesis. To effectively demonstrate the wide range of dependability problems that can be analyzed through simulation, the analysis of three case studies is presented. For each case, the organization of the simulation model used is outlined, and the results from simulated fault injection experiments are explained, showing the usefulness of this method in dependability modeling of large parallel systems. The simulation models are constructed using DEPEND and C++. Where possible, methods to increase dependability are derived from the experimental results. Another interesting facet of all three cases is the presence of some kind of workload of application executing in the simulation while faults are injected. This provides a completely new dimension to this type of study, not possible to model accurately with analytical approaches.

Sawyer, Darren Charles

1994-01-01

329

NASA Astrophysics Data System (ADS)

In this paper, features of the numerical simulation of the motion of large-scale artificial satellite systems by parallel computing are discussed, using as an example the program complex "Numerical model of the system artificial satellites motion" on the "Skiff Cyberia" cluster. It is shown that the use of parallel computing allows simultaneous high-precision numerical simulation of the motion of a large-scale system of artificial satellites. It opens comprehensive facilities for solving direct and inverse problems of the dynamics of such satellite systems as GLONASS and of objects of space debris.

Bordovitsyna, T. V.; Avdyushev, V. A.; Chuvashov, I. N.; Aleksandrova, A. G.; Tomilova, I. V.

2009-11-01

330

A new parallel method for molecular dynamics simulation of macromolecular systems

Short-range molecular dynamics simulations of molecular systems are commonly parallelized by replicated-data methods, where each processor stores a copy of all atom positions. This enables computation of bonded 2-, 3-, and 4-body forces within the molecular topology to be partitioned among processors straightforwardly. A drawback to such methods is that the inter-processor communication scales as N, the number of atoms, independent of P, the number of processors. Thus, their parallel efficiency falls off rapidly when large numbers of processors are used. In this paper a new parallel method called force-decomposition for simulating macromolecular or small-molecule systems is presented. Its memory and communication costs scale as N/√P, allowing larger problems to be run faster on greater numbers of processors. Like replicated-data techniques, and in contrast to spatial-decomposition approaches, the new method can be simply load-balanced and performs well even for irregular simulation geometries. The implementation of the algorithm in a prototypical macromolecular simulation code ParBond is also discussed. On a 1024-processor Intel Paragon, ParBond runs a standard benchmark simulation of solvated myoglobin with a parallel efficiency of 61% and at 40 times the speed of a vectorized version of CHARMM running on a single Cray Y-MP processor.
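The N/√P scaling above comes from arranging the P processors in a √P x √P grid over the N x N force matrix, so each processor needs only one row strip and one column strip of atom positions. A hedged sketch of the block assignment (illustrative function, not ParBond's code):

```python
import math

def force_decomposition(n_atoms, n_procs):
    """Assign each processor one block F[row-strip][col-strip] of
    the N x N force matrix. A processor then only ever needs the
    positions of 2*N/sqrt(P) atoms (its row strip plus its column
    strip), which gives the N/sqrt(P) communication scaling."""
    side = int(math.isqrt(n_procs))
    assert side * side == n_procs, "P must be a perfect square here"
    strip = n_atoms // side
    blocks = {}
    for p in range(n_procs):
        r, c = divmod(p, side)
        blocks[p] = (range(r * strip, (r + 1) * strip),   # row atoms
                     range(c * strip, (c + 1) * strip))   # column atoms
    return blocks

blocks = force_decomposition(1024, 16)  # 16 procs over 1024 atoms
rows, cols = blocks[5]                  # proc 5 owns a 256 x 256 block
```

Contrast with replicated data, where every processor would hold all 1024 positions regardless of P.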

Plimpton, S.; Hendrickson, B.

1994-08-01

331

Parallel simulation of tsunami inundation on a large-scale supercomputer

NASA Astrophysics Data System (ADS)

An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. 
In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the finite difference calculation, (2) communication between adjacent layers for the calculations to connect each layer, and (3) global communication to obtain the time step which satisfies the CFL condition in the whole domain. A preliminary test on the K computer showed the parallel efficiency on 1024 cores was 57% relative to 64 cores. We estimate that the parallel efficiency will be considerably improved by applying a 2-D domain decomposition instead of the present 1-D domain decomposition in future work. The present parallel tsunami model was applied to the 2011 Great Tohoku tsunami. The coarsest resolution layer covers a 758 km × 1155 km region with a 405 m grid spacing. A nesting of five layers was used with the resolution ratio of 1/3 between nested layers. The finest resolution region has 5 m resolution and covers most of the coastal region of Sendai city. To complete 2 hours of simulation time, the serial (non-parallel) computation took approximately 4 days on a workstation. To complete the same simulation on 1024 cores of the K computer, it took 45 minutes which is more than two times faster than real-time. This presentation discusses the updated parallel computational performance and the efficient use of the K computer when considering the characteristics of the tsunami inundation simulation model in relation to the characteristics and capabilities of the K computer.
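Of the three communication types listed above, the neighbour exchange (1) and the global CFL reduction (3) can be sketched in a few lines. Below, Python lists stand in for MPI ranks, the halo copy stands in for neighbour messages, and `min` stands in for an `MPI_Allreduce` with the MIN operation; grid sizes are illustrative, not those of the Tohoku simulation:

```python
import numpy as np

def exchange_halos(subdomains):
    """1-D domain decomposition: each rank's edge interior cells
    are copied into its neighbours' ghost cells (communication
    type 1 in the abstract)."""
    for i, d in enumerate(subdomains):
        if i > 0:
            d[0] = subdomains[i - 1][-2]    # left ghost <- left nbr
        if i < len(subdomains) - 1:
            d[-1] = subdomains[i + 1][1]    # right ghost <- right nbr

def global_dt(subdomains, dx, g=9.81, cfl=0.5):
    """Communication type 3: a global reduction picks the time step
    satisfying the CFL condition over the whole domain."""
    local = [cfl * dx / np.sqrt(g * d[1:-1].max()) for d in subdomains]
    return min(local)    # stands in for MPI_Allreduce(MIN)

depth = np.linspace(10.0, 4000.0, 402)            # ocean depth (m)
parts = [depth[i:i + 102].copy() for i in (0, 100, 200, 300)]
# each part: 100 interior cells plus 2 ghost cells
exchange_halos(parts)
dt = global_dt(parts, dx=405.0)
```

A 2-D decomposition, which the authors expect to improve efficiency, would simply add a second pair of ghost layers and neighbours per rank.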

Oishi, Y.; Imamura, F.; Sugawara, D.

2013-12-01

332

NASA Technical Reports Server (NTRS)

This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.

Fijany, Amir (inventor); Bejczy, Antal K. (inventor)

1993-01-01

333

A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

NASA Technical Reports Server (NTRS)

A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
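The tree broadcasting strategy mentioned above can be sketched as a dimension-order hypercube broadcast: d rounds on a 2^d-node cube, with the set of informed nodes doubling each round. This is a generic sketch of the idea, not the paper's exact scheme:

```python
def hypercube_broadcast(dim, value):
    """Tree broadcast on a 2^dim-node hypercube: in round k, every
    node already holding the value forwards it to the neighbour
    obtained by flipping bit k. Takes dim rounds instead of
    2^dim - 1 sequential sends."""
    n = 1 << dim
    data = [None] * n
    data[0] = value                      # root holds the cell update
    rounds = []
    for k in range(dim):
        sends = []
        for node in range(n):
            if data[node] is not None and (node & (1 << k)) == 0:
                partner = node | (1 << k)
                if data[partner] is None:
                    sends.append((node, partner))
        for src, dst in sends:           # "messages" for this round
            data[dst] = data[src]
        rounds.append(sends)
    return data, rounds

data, rounds = hypercube_broadcast(4, "cell-update")
```

On a 16-node cube this takes 4 rounds of 1, 2, 4, and 8 concurrent sends, which is why broadcasting updated cell locations stays cheap as the hypercube grows.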

Jones, Mark Howard

1987-01-01

334

Wake Encounter Analysis for a Closely Spaced Parallel Runway Paired Approach Simulation

NASA Technical Reports Server (NTRS)

A Monte Carlo simulation of simultaneous approaches performed by two transport category aircraft from the final approach fix to a pair of closely spaced parallel runways was conducted to explore the aft boundary of the safe zone in which separation assurance and wake avoidance are provided. The simulation included variations in runway centerline separation, initial longitudinal spacing of the aircraft, crosswind speed, and aircraft speed during the approach. The data from the simulation showed that the majority of the wake encounters occurred near or over the runway and the aft boundaries of the safe zones were identified for all simulation conditions.

McKissick, Burnell T.; Rico-Cusi, Fernando J.; Murdoch, Jennifer; Oseguera-Lohr, Rosa M.; Stough, Harry P., III; O'Connor, Cornelius J.; Syed, Hazari I.

2009-01-01

335

IB is a Monte Carlo simulation tool for aiding neutron scattering instrument designs. It is written in C++ and implemented under Parallel Virtual Machine. The program has a few basic components, or modules, that can be used to build a virtual neutron scattering instrument. More complex components, such as neutron guides and multichannel beam benders, can be constructed using the grouping technique unique to IB. Users can specify a collection of modules as a group. For example, a neutron guide can be constructed by grouping four neutron mirrors together that make up the four sides of the guide. IB's simulation engine ensures that neutrons entering a group will be properly operated upon by all members of the group. For simulations that require higher computer speed, the program can be run in parallel mode under the PVM architecture. Although the program was initially written for designing instruments on pulsed neutron sources, it has since been used to simulate reactor-based instruments as well.

Zhao, Jinkui [ORNL]

2011-01-01

336

A parallel computational framework for integrated surface-subsurface flow and transport simulations

NASA Astrophysics Data System (ADS)

HydroGeoSphere is a 3D control-volume finite element hydrologic model describing fully-integrated surface and subsurface water flow and solute and thermal energy transport. Because the model solves tightly-coupled, highly-nonlinear partial differential equations, often applied at regional and continental scales (for example, to analyze the impact of climate change on water resources), high performance computing (HPC) is essential. The target parallelization includes the composition of the Jacobian matrix for the iterative linearization method and the sparse-matrix solver, a preconditioned Bi-CGSTAB. The matrix assembly is parallelized by using a coarse-grained scheme in that the local matrix compositions can be performed independently. The preconditioned Bi-CGSTAB algorithm performs a number of LU substitutions, matrix-vector multiplications, and inner products, where the parallelization of the LU substitution is not trivial. The parallelization of the solver is achieved by partitioning the domain into equal-size subdomains, with an efficient reordering scheme. The computational flow of the Bi-CGSTAB solver is also modified to reduce the parallelization overhead and to be suitable for parallel architectures. The parallelized model is tested on several benchmark simulations which include linear and nonlinear flow problems involving various domain sizes and degrees of hydrologic complexity. The performance is evaluated in terms of computational robustness and efficiency, using standard scaling performance measures. The results of simulation profiling indicate that the efficiency becomes higher with an increasing number of nodes/elements in the mesh, for increasingly nonlinear transient simulations, and with domains of irregular geometry. These characteristics are promising for the large-scale analysis of water resources problems involving integrated surface/subsurface flow regimes.
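For reference, the kernels being parallelized are exactly those of Bi-CGSTAB: two matrix-vector products, several inner products, and (with preconditioning) LU substitutions per iteration. A textbook unpreconditioned sketch in NumPy follows; the paper's solver adds the preconditioner and domain-partitioned parallelism on top of these same operations:

```python
import numpy as np

def bicgstab(A, b, tol=1e-10, max_iter=200):
    """Unpreconditioned Bi-CGSTAB. Each iteration does two
    mat-vecs (v = A p, t = A s) and a handful of inner products
    -- the operations distributed across subdomains in parallel
    implementations."""
    x = np.zeros_like(b)
    r = b - A @ x
    r_hat = r.copy()                      # fixed shadow residual
    rho = alpha = omega = 1.0
    v = p = np.zeros_like(b)
    for _ in range(max_iter):
        rho_new = r_hat @ r
        beta = (rho_new / rho) * (alpha / omega)
        rho = rho_new
        p = r + beta * (p - omega * v)
        v = A @ p                         # mat-vec 1
        alpha = rho / (r_hat @ v)
        s = r - alpha * v
        t = A @ s                         # mat-vec 2
        omega = (t @ s) / (t @ t)
        x = x + alpha * p + omega * s
        r = s - omega * t
        if np.linalg.norm(r) < tol:
            break
    return x

# nonsymmetric, diagonally dominant test system
rng = np.random.default_rng(3)
A = rng.random((50, 50)) + 50 * np.eye(50)
b = rng.random(50)
x = bicgstab(A, b)
```

The inner products are the global reductions; in a distributed run they, not the mat-vecs, usually dominate the synchronization cost.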

Park, Y.; Hwang, H.; Sudicky, E. A.

2010-12-01

337

REQUIREMENT AND USE FOR REMOTE TEACHING OF DISCRETE EVENTS SYSTEMS Pascale MARANGE1

maintenance, monitoring, supervision and follow-up of plants. In this article, we are interested in such work. In the field of remote use of real systems in feedback control, we note the work of (Lunt et al.); other papers concern D.E.S. teaching and the use of real or simulated control/command systems (control part

Boyer, Edmond

338

We developed code in MasPar Fortran (an extension of Fortran 90) to conduct molecular dynamics simulations on a MasPar MP-1 massively parallel computer. The code is portable to other Single-Instruction Multiple-Data (SIMD) platforms with minor modifications. We used a two-dimensional grid containing over 220,000 atoms to simulate high strain-rate fracture growth in Cu-Ni alloys. The atoms are

Willard C. Morrey

1996-01-01

339

Supernova emulators: connecting massively parallel SN Ia radiative transfer simulations to data

Backer, Don

340

GalaxSee HPC Module 1: The N-Body Problem, Serial and Parallel Simulation

NSDL National Science Digital Library

This module introduces the N-body problem, which seeks to account for the dynamics of systems of multiple interacting objects. Galaxy dynamics serves as the motivating example to introduce a variety of computational methods for simulating change and criteria that can be used to check for model accuracy. Finally, the basic issues and ideas that must be considered when developing a parallel implementation of the simulation are introduced.
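The serial baseline such a module starts from is the direct-sum O(N²) force calculation; a vectorized NumPy sketch with Plummer-style softening (parameter values are illustrative, not the module's):

```python
import numpy as np

def accelerations(pos, mass, g=1.0, soft=1e-3):
    """Direct-sum N-body accelerations: a_i = G * sum_j m_j
    (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^(3/2). O(N^2) pairwise
    work -- the part a parallel version distributes over ranks."""
    diff = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]   # r_j - r_i
    dist2 = (diff ** 2).sum(axis=-1) + soft ** 2
    inv_d3 = dist2 ** -1.5
    np.fill_diagonal(inv_d3, 0.0)                          # no self-force
    w = mass[np.newaxis, :, None] * inv_d3[:, :, None]
    return g * (diff * w).sum(axis=1)

rng = np.random.default_rng(0)
pos = rng.standard_normal((64, 3))
mass = rng.random(64)
acc = accelerations(pos, mass)
```

Because the softened pairwise forces are exactly antisymmetric, total momentum is conserved, which is one of the accuracy checks such a module can apply.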

Joiner, David

341

A hybrid simulation approach is developed to study chemical reactions coupled with long-range mechanical phenomena in materials. The finite-element method for continuum mechanics is coupled with the molecular dynamics method for an atomic system that embeds a cluster of atoms described quantum-mechanically with the electronic density-functional method based on real-space multigrids. The hybrid simulation approach is implemented on parallel computers

Shuji Ogata; Elefterios Lidorikis; Fuyuki Shimojo; Aiichiro Nakano; Priya Vashishta; Rajiv K. Kalia

2001-01-01

342

Scalar and Parallel Optimized Implementation of the Direct Simulation Monte Carlo Method

This paper describes a new concept for the implementation of the direct simulation Monte Carlo (DSMC) method. It uses a localized data structure based on a computational cell to achieve high performance, especially on workstation processors, which can also be used in parallel. Since the data structure makes it possible to freely assign any cell to any processor, a domain

Stefan Dietrich; Iain D. Boyd

1996-01-01

343

LIBOR MARKET MODEL SIMULATION ON AN FPGA PARALLEL MACHINE Xiang Tian and Khaled Benkrid

We simulate an interest rate derivative based on the LIBOR market model. We implemented this design on the Maxwell FPGA supercomputer. Among the most popular over-the-counter interest rate option products are bond options and interest rate caps

Arslan, Tughrul

344

Parallel Simulation of Wireless Networks with TED: Radio Propagation, Mobility and Protocols

We describe a parallel simulation testbed for mobile wireless networks. In this article we emphasize the techniques for modeling of radio propagation (long- and short-scale fading and interference) and protocols for integrated radio resource management in mobile wireless voice networks. The testbed includes the standards-based AMPS, NA-TDMA and GSM protocols, and several research-oriented protocol families.

Jignesh Panchal; Owen Kelly; Jie Lai; Narayan B. Mandayam; Andrew T. Ogielski; Roy D. Yates

1998-01-01

345

Three-Dimensional MHD on Cubed-Sphere Grids: Parallel Solution-Adaptive Simulation Framework

A parallel solution-adaptive cubed-sphere grid framework is described for simulation of magnetohydrodynamic (MHD) space-physics flows. Schwarz preconditioning and block-based data structures improve the convergence efficiency of the iterative method.

De Sterck, Hans

346

We explore the emerging application area of physics-based simulation for computer animation and visual special effects. In particular, we examine its parallelization potential and characterize its behavior on a chip multiprocessor (CMP). Applications in this domain model and simulate natural phenomena, and often direct visual components of motion pictures. We study a set of

Christopher J. Hughes; Radek Grzeszczuk; Eftychios Sifakis; Daehyun Kim; Sanjeev Kumar; Andrew Selle; Jatin Chhugani; Matthew J. Holliman; Yen-kuang Chen

2007-01-01

347

Acceleration of molecular mechanic simulation by parallelization and fast multipole techniques

Simulations of classical molecular dynamic (MD) systems can be sped up considerably by parallelizing the existing codes for distributed memory machines. In classical MD the CPU time is typically a function of the square of the number of atoms. The size of the molecular system which can be solved is therefore often limited by the CPU available. There are different

Horst Schwichtenberg; G. Winter; H. Wallmeier

1999-01-01

348

SUMMARY The numerical simulation of complex flows demands efficient algorithms and fast computer platforms. The use of adaptive techniques permits adjusting the discretisation according to the analysis requirements, but creates variable computational loads which are difficult to manage in a parallel/vector program. This paper describes the approach we have adopted to implement an adaptive finite element incompressible Navier-Stokes

Álvaro L. G. A. Coutinho

349

The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models. PMID:23152751
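For context, the serial baseline being accelerated is Gillespie's direct method: draw an exponential waiting time from the total propensity, choose a reaction proportionally to its propensity, and apply its stoichiometry. A minimal sketch on a toy decay model (our own example, not one of the paper's benchmark models):

```python
import math, random

def gillespie(x, rates, stoich, t_end, seed=0):
    """Gillespie direct method. `rates[j]` maps the state to the
    propensity of reaction j; `stoich[j]` is its state change.
    This single-trajectory loop is what the paper maps onto one
    GPU warp, with many warps generating trajectories in parallel."""
    rnd = random.Random(seed)
    t, x = 0.0, list(x)
    while t < t_end:
        props = [r(x) for r in rates]
        a0 = sum(props)
        if a0 == 0.0:
            break                                  # nothing can fire
        t += -math.log(rnd.random()) / a0          # exponential wait
        u, acc = rnd.random() * a0, 0.0
        for j, a in enumerate(props):              # categorical draw
            acc += a
            if u <= acc:
                x = [xi + s for xi, s in zip(x, stoich[j])]
                break
    return x

# toy first-order decay A -> 0 with propensity 0.5 * A
final = gillespie([1000], [lambda x: 0.5 * x[0]], [(-1,)], t_end=20.0)
```

The per-step propensity scan is the memory-traffic hotspot the paper's data structures target; here it is a plain list comprehension for clarity.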

Komarov, Ivan; D'Souza, Roshan M

2012-01-01

350

Direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations onto a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.

Carroll, C.C.; Owen, J.E.

1988-05-01

351

The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models. PMID:23152751

Komarov, Ivan; D'Souza, Roshan M.

2012-01-01

352

Multi-scale molecular simulations of biological systems: Parallelization of RAPTOR for Blue Gene/Q

Multi-scale molecular simulations of biological systems: parallelization of RAPTOR for Blue Gene/Q, with periodic boundary conditions and FFT. Target application: mapping the proton potential of mean force in CcO with MS-EVB (CcO = 159k atoms).

Kemner, Ken

353

This paper presents two new techniques for accelerating circuit simulation. The first technique is an improvement of the parallel Waveform Relaxation Newton (WRN) method. The computations of all the timepoints are executed concurrently. Static task partitioning is shown to be an efficient method to limit the scheduling overhead. The second technique combines in a dynamic way the efficiency of the

Patrick Odent; Luc J. M. Claesen; Hugo De Man

1990-01-01

354

Mobile Agents Based Collective Communication: An Application to a Parallel Plasma Simulation

Mobile Agents technology has been widely used; it can also benefit the social ability and interactions of collaborative agents. Here we present an application to a parallel plasma simulation.

Vlad, Gregorio

355

Parallel hp-Finite Element Simulations of 3D Resistivity Logging Instruments

Parallel hp-Finite Element Simulations of 3D Resistivity Logging Instruments. M. Paszyński et al. A goal-oriented hp-Finite Element Method (FEM) that delivers exponential convergence rates in terms of the quantity of interest at the receiver antenna with a minimal number of degrees of freedom in the mesh. The 3D hp finite element mesh

Torres-Verdín, Carlos

356

Robust large-scale parallel nonlinear solvers for simulations.

This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. 
The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write and easily portable. However, the method usually takes twice as long to solve as Newton-GMRES on general problems because it solves two linear systems at each iteration. In this paper, we discuss modifications to Bouaricha's method for a practical implementation, including a special globalization technique and other modifications for greater efficiency. We present numerical results showing computational advantages over Newton-GMRES on some realistic problems. We further discuss a new approach for dealing with singular (or ill-conditioned) matrices. In particular, we modify an algorithm for identifying a turning point so that an increasingly ill-conditioned Jacobian does not prevent convergence.

Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson (Sandia National Laboratories, Livermore, CA)

2005-11-01
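The Broyden approach described above can be illustrated with a minimal two-variable sketch of Broyden's "good" rank-one update, starting from the identity instead of an evaluated Jacobian. This is not Sandia's limited-memory implementation; the test system and starting point are invented for illustration.

```python
def solve2(B, b):
    """Solve the 2x2 linear system B s = b by Cramer's rule."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(b[0] * B[1][1] - b[1] * B[0][1]) / det,
            (B[0][0] * b[1] - B[1][0] * b[0]) / det]

def broyden2(f, x, tol=1e-10, max_iter=200):
    """Broyden's (good) method in 2 variables: keep an approximate Jacobian B
    and refresh it with secant information instead of recomputing derivatives."""
    B = [[1.0, 0.0], [0.0, 1.0]]            # B0 = identity: no Jacobian needed
    fx = f(x)
    for _ in range(max_iter):
        if max(abs(v) for v in fx) < tol:
            return x
        s = solve2(B, [-v for v in fx])     # quasi-Newton step: B s = -f(x)
        x_new = [xi + si for xi, si in zip(x, s)]
        f_new = f(x_new)
        y = [a - b for a, b in zip(f_new, fx)]
        Bs = [B[i][0] * s[0] + B[i][1] * s[1] for i in range(2)]
        sts = s[0] * s[0] + s[1] * s[1]
        for i in range(2):                  # rank-1 update: B += (y - Bs) s^T / (s^T s)
            for j in range(2):
                B[i][j] += (y[i] - Bs[i]) * s[j] / sts
        x, fx = x_new, f_new
    return x

# Illustrative mildly nonlinear system with root (1, 1)
f = lambda x: [x[0] + 0.25 * x[1] ** 2 - 1.25,
               x[1] + 0.25 * x[0] ** 2 - 1.25]
root = broyden2(f, [1.2, 0.8])
```

The secant update is why Broyden converges even when an exact Jacobian is unavailable or inaccurate, the situation the report highlights; limited-memory variants store the rank-one updates instead of the dense B.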

357

Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube

NASA Technical Reports Server (NTRS)

The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors, nearly ideal linear speedups are achieved with nonoptimized routines; slower-than-linear speedups are achieved with optimized (machine-dependent library) routines. This slower-than-linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and exhibits less-than-ideal speedups. However, with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise, wall-normal, and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single-processor time to complete a comparable simulation; however, it is estimated that a subgrid-scale model, which reduces the required number of grid points and turns the computation into a large-eddy simulation (PSLES), would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.

Joslin, Ronald D.; Zubair, Mohammad

1993-01-01
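The speedup behaviour the abstract describes (near-linear scaling except where one routine dominates) can be quantified with the usual definitions, plus Amdahl's law for the effect of a poorly scaling fraction. The numbers in the test are illustrative, not measurements from the paper.

```python
def speedup(t_serial, t_parallel):
    """Classic parallel speedup: how many times faster than the serial run."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: 1.0 would be ideal linear speedup."""
    return speedup(t_serial, t_parallel) / n_procs

def amdahl_speedup(serial_fraction, n_procs):
    """Amdahl's law: a non-parallelizable fraction of the work (here, a
    dominant routine that scales poorly) caps the achievable speedup."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)
```

Even a 10% poorly scaling fraction bounds the speedup below 10x no matter how many processors are added, which is consistent with the FFT-dominated behaviour reported above.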

358

Relationship between parallel faults and stress field in rock mass based on numerical simulation

NASA Astrophysics Data System (ADS)

Parallel cracks and faults, caused by earthquakes and crustal deformations, are often observed at various scales, from regional to laboratory. However, the mechanism of formation of these parallel faults has not yet been quantitatively clarified. Since the stress field plays a key role in the nucleation of parallel faults, it is fundamental to investigate the failure and the extension of cracks in a large-scale rock mass (not with a laboratory-scale specimen) due to a mechanically loaded stress field. In this study, we developed a numerical simulation code for rock mass failures under different loading conditions, and conducted rock failure experiments using this code. We assumed a numerical rock mass consisting of basalt with a rectangular shape for the model. We also assumed the failure of rock mass in accordance with the Mohr-Coulomb criterion, and the distribution of the initial tensile and compressive strength of rock elements to follow the Weibull model. In this study, we use the Hamiltonian Particle Method (HPM), one of the particle methods, to represent large deformation and the destruction of materials. Our simulation results suggest that the confining pressure would have a dominant influence on the initiation of parallel faults and their conjugates in compressive conditions. We conclude that the shearing force would provoke the propagation of parallel fractures along the shearing direction, but prevent that of fractures in the conjugate direction.

Imai, Y.; Mikada, H.; Goto, T.; Takekawa, J.

2012-12-01

359

This paper first discusses an object-oriented, control architecture and then applies the architecture to produce a real-time software emulator for the Rapid Acquisition of Manufactured Parts (RAMP) flexible manufacturing system (FMS). In specifying the control architecture, the coordinated object is first defined as the primary modeling element. These coordinated objects are then integrated into a Recursive, Object-Oriented Coordination Hierarchy. A new simulation methodology, the Hierarchical Object-Oriented Programmable Logic Simulator, is then employed to model the interactions among the coordinated objects. The final step in implementing the emulator is to distribute the models of the coordinated objects over a network of computers and to synchronize their operation to a real-time clock. The paper then introduces the Hierarchical Subsystem Controller as an intelligent controller for the coordinated object. The proposed approach to intelligent control is then compared to the concept of multiresolutional semiosis that has been developed by Dr. Alex Meystel. Finally, the plans for implementing an intelligent controller for the RAMP FMS are discussed.

Davis, W.J.; Macro, J.G.; Brook, A.L. [Univ. of Illinois, Urbana, IL (United States)] [and others]

1996-12-31

360

Petascale turbulence simulation using a highly parallel fast multipole method on GPUs

NASA Astrophysics Data System (ADS)

This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as numerical engine, and match the current record in mesh size for this application, a cube of 4096^3 computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as the numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method is able to achieve just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. The calculation time for one time step was 108 s for the vortex method and 154 s for the spectral method, under these conditions. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex-method calculations to date.

Yokota, Rio; Barba, L. A.; Narumi, Tetsu; Yasuoka, Kenji

2013-03-01

361

Application of parallel computing to seismic damage process simulation of an arch dam

NASA Astrophysics Data System (ADS)

The simulation of the damage process of a high arch dam subjected to strong earthquake shocks is significant to the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation requires rigorous computational capacity. Conventional serial computing falls short of that, and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams, utilizing a damage model that accounts for the heterogeneity of concrete. Developed with the programming language Fortran, the code uses a master/slave mode for programming, the domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication, and solvers from the AZTEC library for the solution of large-scale equations. A speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of an arch dam under construction on a 4-node PC cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of a shaking table test, indicating that the proposed procedure and the parallel code PDPAD have good potential for simulating the seismic damage mode of arch dams. With the rapidly growing need for massive computation emerging from engineering problems, parallel computing will find more and more applications in pertinent areas.

Zhong, Hong; Lin, Gao; Li, Jianbo

2010-06-01

362

Midpoint cell method for hybrid (MPI+OpenMP) parallelization of molecular dynamics simulations.

We have developed a new hybrid (MPI+OpenMP) parallelization scheme for molecular dynamics (MD) simulations by combining a cell-wise version of the midpoint method with pair-wise Verlet lists. In this scheme, which we call the midpoint cell method, simulation space is divided into subdomains, each of which is assigned to an MPI processor. Each subdomain is further divided into small cells. The interaction between two particles existing in different cells is computed in the subdomain containing the midpoint cell of the two cells where the particles reside. In each MPI processor, cell pairs are distributed over OpenMP threads for shared-memory parallelization. The midpoint cell method keeps the advantages of the original midpoint method, while filtering out unnecessary midpoint checks for all the particle pairs through a single midpoint cell determination prior to MD simulations. Distributing cell pairs over OpenMP threads allows for more efficient shared-memory parallelization compared with distributing atom indices over threads. Furthermore, cell grouping of particle data improves memory access, reducing the number of cache misses. The parallel performance of the midpoint cell method on the K computer showed scalability up to 512 and 32,768 cores for systems of 20,000 and 1 million atoms, respectively. One MD time step for long-range interactions could be calculated within 4.5 ms even for a 1 million atoms system with particle-mesh Ewald electrostatics. PMID:24659253

Jung, Jaewoon; Mori, Takaharu; Sugita, Yuji

2014-05-30
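The core geometric rule of the midpoint cell method described above is compact: a pair interaction is assigned to whichever subdomain owns the cell midway between the two particles' cells. The sketch below uses a simplified uniform-block cell-to-domain mapping, an assumption for illustration, not the K-computer decomposition.

```python
def midpoint_cell(cell_a, cell_b):
    """Index of the cell midway between two cells; the interaction between
    particles in cell_a and cell_b is computed where this cell lives."""
    return tuple((a + b) // 2 for a, b in zip(cell_a, cell_b))

def owner(cell, cells_per_domain):
    """Simplified map from a cell index to the subdomain (MPI-rank-like index)
    that owns it: uniform blocks of cells_per_domain cells per axis."""
    return tuple(c // cells_per_domain for c in cell)

# A pair in cells (0,0,0) and (4,2,0) is computed by the owner of cell (2,1,0)
pair_owner = owner(midpoint_cell((0, 0, 0), (4, 2, 0)), cells_per_domain=2)
```

Because the midpoint rule is symmetric in the two cells, every rank computing interactions for pairs whose midpoint cell it owns covers each pair exactly once, which is what makes the single up-front cell determination sufficient.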

363

NASA Technical Reports Server (NTRS)

This paper will describe the Entry, Descent and Landing simulation tradeoffs and techniques that were used to provide the Monte Carlo data required to approve entry during a critical period just before entry of the Genesis Sample Return Capsule. The same techniques will be used again when Stardust returns on January 15, 2006. Only one hour was available for the simulation, which propagated 2000 dispersed entry states to the ground. Creative simulation tradeoffs combined with parallel processing were needed to provide the landing footprint statistics that were an essential part of the Go/NoGo decision that authorized release of the Sample Return Capsule a few hours before entry.

Lyons, Daniel T.; Desai, Prasun N.

2005-01-01

364

Parallel 3D Multi-Stage Simulation of a Turbofan Engine

NASA Technical Reports Server (NTRS)

A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. 20 processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit k-ε turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scalable with the number of blade rows. Enough flips are run (between 50 and 200) so that the solution in the entire machine is no longer changing. The k-ε equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee that the parallelization was done correctly. The domain decomposition is done only in the axial direction, since the number of points axially is much larger than in the other two directions. This code uses MPI for message passing.
The parallel speed-up of the solver portion (excluding I/O and the body-force calculation) is reported for a grid with 227 points axially.

Turner, Mark G.; Topp, David A.

1998-01-01
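The axial-only domain decomposition mentioned above amounts to splitting one index range into near-equal contiguous chunks, one per MPI rank. A minimal sketch, with the even-chunking rule an assumption for illustration:

```python
def axial_partition(n_points, n_domains):
    """Split n_points grid indices into contiguous chunks whose sizes differ
    by at most one; returns half-open (start, end) ranges, one per domain."""
    base, extra = divmod(n_points, n_domains)
    ranges, start = [], 0
    for d in range(n_domains):
        size = base + (1 if d < extra else 0)  # spread the remainder evenly
        ranges.append((start, start + size))
        start += size
    return ranges

# e.g. a 227-point axial grid (as in the abstract) spread over 5 ranks
chunks = axial_partition(227, 5)
```

Keeping chunk sizes within one of each other is a simple static load balance; since the axial direction has far more points than the other two, this 1-D split already exposes enough parallelism per blade row.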

365

libMesh: a C++ library for parallel adaptive mesh refinement/coarsening simulations

In this paper we describe the libMesh (http://libmesh.sourceforge.net) framework for parallel adaptive finite element applications. libMesh is an open-source software library that has been developed to facilitate serial and parallel simulation of multiscale, multiphysics applications using adaptive mesh refinement and coarsening strategies. The main software development is being carried out in the CFDLab (http://cfdlab.ae.utexas.edu) at the University of Texas, but

Benjamin S. Kirk; John W. Peterson; Roy H. Stogner; Graham F. Carey

2006-01-01

366

Design of a real-time wind turbine simulator using a custom parallel architecture

NASA Technical Reports Server (NTRS)

The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named: the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is very much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an I/O operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's are interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors, which usually have a throughput limit because of rigid bus architecture.

Hoffman, John A.; Gluck, R.; Sridhar, S.

1995-01-01

367

A new parallel P3M code for very large-scale cosmological simulations

NASA Astrophysics Data System (ADS)

We have developed a parallel Particle-Particle, Particle-Mesh (P3M) simulation code for the Cray T3E parallel supercomputer that is well suited to studying the time evolution of systems of particles interacting via gravity and gas forces in cosmological contexts. The parallel code is based upon the public-domain serial Adaptive P3M-SPH (http://coho.astro.uwo.ca/pub/hydra/hydra.html) code of Couchman et al. (1995) [ApJ 452, 797]. The algorithm resolves gravitational forces into a long-range component computed by discretizing the mass distribution and solving Poisson's equation on a grid using an FFT convolution method, and a short-range component computed by direct force summation for sufficiently close particle pairs. The code consists primarily of a particle-particle computation parallelized by domain decomposition over blocks of neighbour-cells, a more regular mesh calculation distributed in planes along one dimension, and several transformations between the two distributions. The load balancing of the P3M code is static, since this greatly aids the ongoing implementation of parallel adaptive refinements of the particle and mesh systems. Great care was taken throughout to make optimal use of the available memory, so that a version of the current implementation has been used to simulate systems of up to 10^9 particles with a 1024^3 mesh for the long-range force computation. These are the largest cosmological N-body simulations of which we are aware. We discuss these memory optimizations as well as those motivated by computational performance. Performance results are very encouraging, and, even without refinements, the code has been used effectively for simulations in which the particle distribution becomes highly clustered as well as for other non-uniform systems of astrophysical interest.

MacFarland, Tom; Couchman, H. M. P.; Pearce, F. R.; Pichlmeier, Jakob

1998-12-01
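The long-range/short-range force split at the heart of P3M can be illustrated with the standard Ewald-style decomposition of the 1/r potential; the splitting parameter alpha below is arbitrary and for illustration only.

```python
import math

def short_range(r, alpha):
    """Short-range piece of 1/r: decays quickly with distance, so it is
    summed directly over sufficiently close particle pairs."""
    return math.erfc(alpha * r) / r

def long_range(r, alpha):
    """Long-range remainder: smooth everywhere, so it can be represented on
    a mesh and computed by solving Poisson's equation with an FFT."""
    return math.erf(alpha * r) / r
```

The two pieces sum exactly to 1/r, and the short-range part falls off rapidly beyond a few multiples of 1/alpha, which is what makes a finite pair-summation cutoff accurate.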

368

We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data-rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276

Lee, Anthony; Yau, Christopher; Giles, Michael B; Doucet, Arnaud; Holmes, Christopher C

2010-12-01
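The population-based parallelism described above rests on running many independent, separately seeded simulation streams at once. The sketch below uses a serial `map` as a stand-in for the parallel dispatch (GPU thread blocks, or `multiprocessing.Pool.map` on CPUs); the pi-estimation example is illustrative, not one of the paper's benchmarks.

```python
import random

def mc_pi_chain(seed, n_samples=20000):
    """One independent Monte Carlo 'trajectory': estimate pi by dart-throwing.
    Each chain gets its own seeded RNG stream, so runs are reproducible and
    independent -- the property that makes them trivially parallel."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * hits / n_samples

# Serial map stands in for parallel dispatch across GPU cores or processes.
estimates = list(map(mc_pi_chain, range(8)))
pi_hat = sum(estimates) / len(estimates)
```

Because chains never communicate, the speedup is limited mainly by how many hardware threads are available, which is why GPUs with thousands of cores deliver the 35-500 fold gains the abstract reports for suitable algorithms.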

369

A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows

NASA Technical Reports Server (NTRS)

A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.

Bui, Trong T.

1999-01-01

370

Parallel Solutions for Voxel-Based Simulations of Reaction-Diffusion Systems

There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This necessity has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena taking into account both spatial effects and noise. However, the high computational effort characterizing simulation approaches, coupled with the necessity to simulate the models several times to achieve statistically relevant information on the model behaviours, makes such algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we present how the characteristics of the algorithm can be exploited for an effective parallelization on the present heterogeneous HPC architectures. PMID:25045716

D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan

2014-01-01
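Tau-leaping, which the STAUCC simulator builds on, advances the system by a fixed interval tau and fires each reaction a Poisson-distributed number of times instead of simulating one event at a time. This is a simplified non-spatial sketch with a textbook Poisson sampler, not the Sτ-DPP/STAUCC algorithm itself; the decay model is illustrative.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's Poisson sampler (adequate for the modest means used here)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def tau_leap_step(x, reactions, propensity, tau, rng):
    """One tau-leap: reaction j fires Poisson(a_j * tau) times within tau,
    trading the exact SSA's one-event-per-step cost for larger time jumps."""
    fires = [poisson(propensity(x, j) * tau, rng) for j in range(len(reactions))]
    return [xi + sum(k * reactions[j][i] for j, k in enumerate(fires))
            for i, xi in enumerate(x)]

# Illustrative decay A -> 0 at rate 0.1 * A, leaped over tau = 0.1
rng = random.Random(1)
state = tau_leap_step([1000], [[-1]], lambda x, j: 0.1 * x[0], 0.1, rng)
```

The leaps are where the speed comes from, and they are also what parallelizes well across voxels: within a leap, each compartment's firings can be sampled independently.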

371

Parallel electric fields in a simulation of magnetotail reconnection and plasmoid evolution

We investigate properties of the electric field component parallel to the magnetic field (E∥) in a three-dimensional MHD simulation of plasmoid formation and evolution in the magnetotail in the presence of a net dawn-dusk magnetic field component. We emphasize particularly the spatial location of E∥, the concept of a diffusion zone, and the role of E∥ in accelerating electrons. We find a localization of the region of enhanced E∥ in all space directions, with a strong concentration in the z direction. We identify this region as the diffusion zone, which plays a crucial role in reconnection theory through the local break-down of magnetic flux conservation. The presence of B_y implies a north-south asymmetry of the injection of accelerated particles into the near-earth region, if the net B_y field is strong enough to force particles to follow field lines through the diffusion region. We estimate that for a typical net B_y field this should affect the injection of electrons into the near-earth dawn region, so that precipitation into the northern (southern) hemisphere should dominate for duskward (dawnward) net B_y. In addition, we observe a spatial clottiness of the expected injection of adiabatic particles which could be related to the appearance of bright spots in auroras. 12 refs., 9 figs.

Hesse, M.; Birn, J.

1989-01-01

372

NASA Astrophysics Data System (ADS)

We study the applicability of parallelized/vectorized Monte Carlo (MC) algorithms to the simulation of domain growth in two-dimensional lattice gas models undergoing an ordering process after a rapid quench below an order-disorder transition temperature. As examples we consider models with 2×1 and c(2×2) equilibrium superstructures on the square and rectangular lattices, respectively. We also study the case of phase separation ("1×1" islands) on the square lattice. A generalized parallel checkerboard algorithm for Kawasaki dynamics is shown to give rise to artificial spatial correlations in all three models. However, only if superstructure domains evolve do these correlations modify the kinetics by influencing the nucleation process and result in a reduced growth exponent compared to the value from the conventional heat bath algorithm with random single-site updates. In order to overcome these artificial modifications, two MC algorithms with a reduced degree of parallelism ("hybrid" and "mask" algorithms, respectively) are presented and applied. As the results indicate, these algorithms are suitable for the simulation of superstructure domain growth on parallel/vector computers.

Schleier, W.; Besold, G.; Heinz, K.

1992-02-01
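The checkerboard decomposition the abstract analyzes partitions the square lattice into two sublattices; no two sites of one colour are nearest neighbours, which is what permits conflict-free simultaneous single-site updates (and, as the abstract shows, what introduces artificial correlations for synchronous Kawasaki dynamics). A minimal sketch of the partition, with an even lattice size assumed:

```python
def sublattice(L, parity):
    """Sites of one checkerboard colour on an L x L lattice (L even): none of
    them are nearest neighbours of one another, so single-site updates on one
    colour can run concurrently without read/write conflicts."""
    return [(i, j) for i in range(L) for j in range(L) if (i + j) % 2 == parity]

def neighbours(i, j, L):
    """Nearest neighbours with periodic boundary conditions."""
    return [((i + 1) % L, j), ((i - 1) % L, j),
            (i, (j + 1) % L), (i, (j - 1) % L)]
```

A parallel sweep alternates: update all sites of one colour simultaneously, then the other. The abstract's "hybrid" and "mask" algorithms reduce exactly this degree of synchrony to recover unbiased domain-growth kinetics.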

373

The Support Architecture for Large-Scale Subsurface Analysis (SALSSA) provides an extensible framework, sophisticated graphical user interface, and underlying data management system that simplifies the process of running subsurface models, tracking provenance information, and analyzing the model results. Initially, SALSSA supported two styles of job control: user directed execution and monitoring of individual jobs, and load balancing of jobs across multiple machines taking advantage of many available workstations. Recent efforts in subsurface modelling have been directed at advancing simulators to take advantage of leadership class supercomputers. We describe two approaches, current progress, and plans toward enabling efficient application of the subsurface simulator codes via the SALSSA framework: automating sensitivity analysis problems through task parallelism, and task parallel parameter estimation using the PEST framework.

Schuchardt, Karen L.; Agarwal, Khushbu; Chase, Jared M.; Rockhold, Mark L.; Freedman, Vicky L.; Elsethagen, Todd O.; Scheibe, Timothy D.; Chin, George; Sivaramakrishnan, Chandrika

2010-07-15

374

Adaptive finite element simulation of flow and transport applications on parallel computers

NASA Astrophysics Data System (ADS)

The subject of this work is the adaptive finite element simulation of problems arising in flow and transport applications on parallel computers. Of particular interest are new contributions to adaptive mesh refinement (AMR) in this parallel high-performance context, including novel work on data structures, treatment of constraints in a parallel setting, generality and extensibility via object-oriented programming, and the design/implementation of a flexible software framework. This technology and software capability then enables more robust, reliable treatment of multiscale, multiphysics problems and specific studies of fine scale interaction such as those in biological chemotaxis (Chapter 4) and high-speed shock physics for compressible flows (Chapter 5). The work begins by presenting an overview of key concepts and data structures employed in AMR simulations. Of particular interest is how these concepts are applied in the physics-independent software framework which is developed here and is the basis for all the numerical simulations performed in this work. This open-source software framework has been adopted by a number of researchers in the U.S. and abroad for use in a wide range of applications. The dynamic nature of adaptive simulations poses particular issues for efficient implementation on distributed-memory parallel architectures. Communication cost, computational load balance, and memory requirements must all be considered when developing adaptive software for this class of machines. Specific extensions to the adaptive data structures to enable implementation on parallel computers are therefore considered in detail. The libMesh framework for performing adaptive finite element simulations on parallel computers is developed to provide a concrete implementation of the above ideas.
This physics-independent framework is applied to two distinct flow and transport applications classes in the subsequent application studies to illustrate the flexibility of the design and to demonstrate the capability for resolving complex multiscale processes efficiently and reliably. The first application considered is the simulation of chemotactic biological systems such as colonies of Escherichia coli. This work appears to be the first application of AMR to chemotactic processes. These systems exhibit transient, highly localized features and are important in many biological processes, which make them ideal for simulation with adaptive techniques. A nonlinear reaction-diffusion model for such systems is described and a finite element formulation is developed. The solution methodology is described in detail. Several phenomenological studies are conducted to study chemotactic processes and resulting biological patterns which use the parallel adaptive refinement capability developed in this work. The other application study is much more extensive and deals with fine scale interactions for important hypersonic flows arising in aerospace applications. These flows are characterized by highly nonlinear, convection-dominated flowfields with very localized features such as shock waves and boundary layers. These localized features are well-suited to simulation with adaptive techniques. A novel treatment of the inviscid flux terms arising in a streamline-upwind Petrov-Galerkin finite element formulation of the compressible Navier-Stokes equations is also presented and is found to be superior to the traditional approach. The parallel adaptive finite element formulation is then applied to several complex flow studies, culminating in fully three-dimensional viscous flows about complex geometries such as the Space Shuttle Orbiter. 
Physical phenomena such as viscous/inviscid interaction, shock wave/boundary layer interaction, shock/shock interaction, and unsteady acoustic-driven flowfield response are considered in detail. A computational investigation of a 25°/55° double cone configuration details the complex multiscale flow features and investigates a potential source of experimentally-observed unsteady flowfield response.

Kirk, Benjamin Shelton

375

Precise evaluation of the facet reflection is highly desirable in design and simulation of optoelectronic devices such as super-luminescent light emitting diodes (SLEDs) and semiconductor optical amplifiers (SOAs) where ultra low facet reflection must be achieved. In this paper, the three-dimensional (3D) finite-difference time-domain (FDTD) method has been implemented on a parallel computing algorithm for the calculation of facet reflection

D. Labukhin; Xun Li

2004-01-01

376

Precise evaluation of facet reflection is highly desirable in the design and simulation of optoelectronic devices such as super-luminescent light emitting diodes (SLEDs) and semiconductor optical amplifiers (SOAs). In this study, the Three-Dimensional (3D) Finite-Difference Time-Domain (FDTD) method was implemented on a parallel computing algorithm for the calculation of facet reflection in optical waveguides. The FDTD provides the versatility necessary

Dmitry Labukhin; Xun Li

2005-01-01

377

NASA Technical Reports Server (NTRS)

An AFRL/NRL team has recently been selected to develop a scalable, parallel, reacting, multidimensional (SUPREM) Direct Simulation Monte Carlo (DSMC) code for the DoD user community under the High Performance Computing Modernization Office (HPCMO) Common High Performance Computing Software Support Initiative (CHSSI). This paper will introduce the JANNAF Exhaust Plume community to this three-year development effort and present the overall goals, schedule, and current status of this new code.

Campbell, David; Wysong, Ingrid; Kaplan, Carolyn; Mott, David; Wadsworth, Dean; VanGilder, Douglas

2000-01-01

378

Massively Parallel Spectral Element Large Eddy Simulation of a Turbulent Channel Using Wall Models

MASSIVELY-PARALLEL SPECTRAL ELEMENT LARGE EDDY SIMULATION OF A TURBULENT CHANNEL USING WALL MODELS. A Thesis by JOSHUA IAN RABAU, submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE. Approved by: Chair of Committee, Andrew Duggleby; Committee Members, Je-Chin Han, Jean-Luc Guermond; Head of Department, Andreas A. Polycarpou. May 2013. Major Subject: Mechanical Engineering. Copyright 2013 Joshua Ian Rabau...

Rabau, Joshua I

2013-05-01

379

Parallel simulations of wireless networks with TED: radio propagation, mobility and protocols

We describe the TeD/C++ implementation of WiPPET, a parallel simulation testbed for mobile wireless networks. In this article we emphasize the techniques for modeling of radio propagation (long- and short-scale fading and interference) and protocols for integrated radio resource management in mobile wireless voice networks. The testbed includes the standards-based AMPS, NA-TDMA and GSM protocols, and several research-oriented protocol families.

Jignesh Panchal; Owen Kelly; Jie Lai; Narayan Mandayam; Andrew T. Ogielski; Roy Yates

1998-01-01

380

WiPPET, a Virtual Testbed for Parallel Simulations of Wireless Networks

We describe the TeD/C++ implementation of WiPPET, a parallel simulation testbed for evaluating radio resource management algorithms and wireless transport protocols. Versions 0.3 and 0.4 of the testbed model radio propagation (long- and short-scale fading and interference) and protocols for integrated radio resource management in mobile wireless voice networks, including the standards-based AMPS, NA-TDMA and GSM protocols, and several research-oriented protocol families. We provide...

Jignesh Panchal; Owen Kelly; Jie Lai; Narayan B. Mandayam; Andrew T. Ogielski; Roy D. Yates

1998-01-01

381

WiPPET, a virtual testbed for parallel simulations of wireless networks

We describe the TeD/C++ implementation of WiPPET, a parallel simulation testbed for evaluating radio resource management algorithms and wireless transport protocols. Versions 0.3 and 0.4 of the testbed model radio propagation (long- and short-scale fading and interference) and protocols for integrated radio resource management in mobile wireless voice networks, including the standards-based AMPS, NA-TDMA and GSM protocols, and several research-oriented protocol families. We

Jignesh Panchal; Owen Kelly; Jie Lai; Narayan Mandayam; Andrew T. Ogielski; Roy Yates

1998-01-01

382

A scalable parallel algorithm for large-scale reactive force-field molecular dynamics simulations

A scalable parallel algorithm has been designed to perform multimillion-atom molecular dynamics (MD) simulations, in which first-principles-based reactive force fields (ReaxFF) describe chemical reactions. Environment-dependent bond orders associated with atomic pairs and their derivatives are reused extensively with the aid of linked-list cells to minimize the computation associated with atomic n-tuple interactions (n ≤ 4 explicitly and 6 due

Ken-Ichi Nomura; Rajiv K. Kalia; Aiichiro Nakano; Priya Vashishta

2008-01-01

383

Study of fluctuations in advanced MOSFETs using a 3D finite element parallel simulator

Two important new sources of fluctuations in nanoscaled MOSFETs are the polysilicon gates and the introduction of high-κ gate dielectrics. Using a 3D parallel drift-diffusion device simulator, we study the influence of the polycrystal grains in polysilicon and in the high-κ dielectric on the device threshold for MOSFETs with gate lengths of 80 and 25 nm. We model the surface

M. Aldegunde; A. J. García-Loureiro; K. Kalna; A. Asenov

2006-01-01

384

Construction of a parallel processor for simulating manipulators and other mechanical systems

NASA Technical Reports Server (NTRS)

This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.

Hannauer, George

1991-01-01

385

Parallel Algorithm for Simulation of Circuit and One-Way Quantum Computation Models

In this paper we present the software to simulate circuits and one-way quantum computation models in parallel environments built from PC workstations connected by the standard Ethernet network. We describe the main vector state transformation and its application to the one- and multi-qubit gate application process. We also show the realisation of the measurement process in non-standard bases. We present a

Marek Sawerwain

2007-01-01

386

NASA Astrophysics Data System (ADS)

Parallel molecular dynamics (MD) simulations are performed to investigate pressure-induced solid-to-solid structural phase transformations in cadmium selenide (CdSe) nanorods. The effects of the size and shape of nanorods on different aspects of structural phase transformations are studied. Simulations are based on interatomic potentials validated extensively by experiments. Simulations range from 10^5 to 10^6 atoms. These simulations are enabled by highly scalable algorithms executed on massively parallel Beowulf computing architectures. Pressure-induced structural transformations are studied using a hydrostatic pressure medium simulated by atoms interacting via the Lennard-Jones potential. Four single-crystal CdSe nanorods, each 44 Å in diameter but varying in length, in the range between 44 Å and 600 Å, are studied independently in two sets of simulations. The first simulation is the downstroke simulation, where each rod is embedded in the pressure medium and subjected to increasing pressure, during which it undergoes a forward transformation from a 4-fold coordinated wurtzite (WZ) crystal structure to a 6-fold coordinated rocksalt (RS) crystal structure. In the second, so-called upstroke simulation, the pressure on the rods is decreased and a reverse transformation from 6-fold RS to a 4-fold coordinated phase is observed. The transformation pressure in the forward transformation depends on the nanorod size, with longer rods transforming at lower pressures close to the bulk transformation pressure. Spatially-resolved structural analyses, including pair distributions, atomic coordinations and bond-angle distributions, indicate nucleation begins at the surface of nanorods and spreads inward. The transformation results in a single RS domain, in agreement with experiments. The microscopic mechanism for transformation is observed to be the same as for bulk CdSe. 
A nanorod size dependency is also found in reverse structural transformations, with longer nanorods transforming more readily than smaller ones. Nucleation initiates at the center of the rod and grows outward.

Lee, Nicholas Jabari Ouma

387

Simulation Programming with Python

Chapter 4: Simulation Programming with Python. This chapter shows how simulations of some of the examples in Chap. 3 can be programmed using Python and the SimPy simulation library [1] ... a process-based discrete-event simulation library for Python. It is open source and released under the MIT License.

Nelson, Barry L.
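The event-scheduling core that discrete-event libraries such as SimPy build on can be sketched in a few lines of plain Python. This is an illustrative minimal event queue, not SimPy's actual API; the names `EventQueue`, `schedule`, and `failure` are invented for the sketch:

```python
import heapq
import itertools

class EventQueue:
    """Minimal discrete-event core: a time-ordered heap of (time, action) pairs.
    Handlers may schedule further events, which is how endogenous events arise."""

    def __init__(self):
        self._heap = []
        self._ids = itertools.count()  # tie-breaker so callables are never compared
        self.now = 0.0

    def schedule(self, delay, action):
        heapq.heappush(self._heap, (self.now + delay, next(self._ids), action))

    def run(self, until):
        while self._heap and self._heap[0][0] <= until:
            self.now, _, action = heapq.heappop(self._heap)
            action(self)

# Usage: a component that "fails" every 5 time units and logs each failure time.
log = []

def failure(q):
    log.append(q.now)
    q.schedule(5.0, failure)

sim = EventQueue()
sim.schedule(5.0, failure)
sim.run(until=20.0)
# log -> [5.0, 10.0, 15.0, 20.0]
```

SimPy wraps this mechanism in process-style generator functions, but the heap-ordered future-event list is the same underlying idea.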

388

NASA Technical Reports Server (NTRS)

This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers" and "Develop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications". The discussion of Scalable High Performance Computing reports on three objectives: validate, assess the scalability of, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulation (DNS) and Large Eddy Simulation (LES) problems; and investigate and develop a high-order Reynolds-averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics projects; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's equations using compact differencing; develop and demonstrate improved radiation-absorbing boundary conditions for high-order CEM; and extend the high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.

Morgan, Philip E.

2004-01-01

389

Efficient parallelization of short-range molecular dynamics simulations on many-core systems.

This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly inhomogeneous systems like nanodevices or nanostructured materials. In the proposed scheme the calculation of the forces and the generation of neighbor lists are divided into small tasks. The tasks are then executed by a thread pool according to a dependent task schedule. This schedule is constructed in such a way that a particle is never accessed by two threads at the same time. Benchmark simulations on a typical 12-core machine show that the described algorithm achieves excellent parallel efficiencies above 80% for different kinds of systems and all numbers of cores. For inhomogeneous systems the speedups are strongly superior to those obtained with spatial decomposition. Further benchmarks were performed on an Intel Xeon Phi coprocessor. These simulations demonstrate that the algorithm scales well to large numbers of cores. PMID:24329381

Meyer, R

2013-11-01
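The dependent task schedule described in this entry can be approximated at sketch level by grouping tasks into phases whose members touch disjoint particle ranges, with a join between phases ordering the tasks that would conflict. This toy Python version (all names hypothetical) shows only that structure, not the paper's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def run_colored_tasks(task_phases, pool):
    """Run force-computation tasks phase by phase. Tasks inside one phase
    touch disjoint particle ranges, so they may execute concurrently; the
    implicit join of pool.map at the end of each phase serializes tasks
    that would otherwise access the same particle from two threads."""
    for tasks in task_phases:
        list(pool.map(lambda task: task(), tasks))

# Usage with toy "force" tasks that each add a contribution to a shared array.
forces = [0.0] * 8

def make_task(lo, hi):
    def task():
        for i in range(lo, hi):
            forces[i] += 1.0  # stand-in for accumulating a force contribution
    return task

phases = [
    [make_task(0, 4), make_task(4, 8)],  # disjoint ranges: safe in parallel
    [make_task(2, 6)],                   # overlaps both, so it gets its own phase
]
with ThreadPoolExecutor(max_workers=2) as pool:
    run_colored_tasks(phases, pool)
# forces -> [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0]
```

The real scheme builds a finer-grained dependency graph rather than global phases, which is what lets it stay efficient for strongly inhomogeneous systems.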

390

The objective of this article is to report the parallel implementation of the 3D molecular dynamics simulation code for laser-cluster interactions. The benchmarking of the code has been done by comparing the simulation results with some of the experiments reported in the literature. Scaling laws for the computational time are established by varying the number of processor cores and the number of macroparticles used. The capabilities of the code are highlighted by implementing various diagnostic tools. To study the dynamics of laser-cluster interactions, the executable version of the code is available from the author.

Holkundkar, Amol R. [Department of Physics, Birla Institute of Technology and Science, Pilani-333 031 (India)]

2013-11-15

391

Modeling of Time in Discrete-Event Simulation of Systems-on-Chip Giovanni Funchal, Matthieu Moy

prototypes range from very high-level application simulators, such as that included in the iPhone SDK [1], ... hardware blocks (typically: CPUs, DMAs, memories, timers). The behavior is described in concurrent ... [Figure 1: software on CPU0 and CPU1, a DMA, memory, and a custom hardware block, joined by fast and slow interconnections.]

Paris-Sud XI, Université de

392

Heavy industries operate equipment having a long life to generate revenue or perform a mission. These industries must invest in the specialized service parts needed to maintain their equipment, because unlike in other ...

Bradley, Randolph L. (Randolph Lewis)

2012-01-01

393

Modular production systems are an excellent possibility to meet the increasing complexity of production tasks, which results from a rising number of simultaneously produced product variants. These systems consist very often of standard modules, which can be connected randomly via a fixed linked transportation system, so different requirements can be accomplished. This flexibility results in a high degree of complexity

K. Feldmann; M. Weber; W. Wolf; G. Meckl

2004-01-01

394

A Case Study of Web Server Benchmarking Using Parallel WAN Emulation

A Case Study of Web Server Benchmarking Using Parallel WAN Emulation Carey Williamson, Rob Simmonds, AB, Canada T2N1N4 Abstract This paper describes the use of a parallel discrete-event network emulator called the Internet Protocol Traffic and Network Emulator (IP-TNE) for Web server benchmarking

Williamson, Carey

395

Object Oriented Monte Carlo Simulations of Parallel Plate Capacitively Coupled Discharges

NASA Astrophysics Data System (ADS)

Object-oriented models programmed using Java are suited to the modeling and simulation of complex plasma processes, as complex physics can be incorporated into the code in a straightforward fashion and the coding structure lends itself to parallelization. This paper describes the details of the object-oriented implementation of the Monte Carlo simulation. In general, the discharge gap is broken down into uniform slabs. Each slab corresponds to a stand-alone Monte Carlo simulation of the plasma species in that slab. After a time-step, statistics are collected using a Legendre Polynomial Weighted Sampling object and the simulation continued by an object that does a B-Spline fit to those statistics. A boundary object deals with the particle transmission to walls or to other slab objects. This process will be described using both electronegative and electropositive gas discharges as examples.

Horie, I.; Ventzek, P. L. G.; Kitamori, K.

1998-10-01

396

NASA Astrophysics Data System (ADS)

The Geophysical Finite Element Simulation Tool (GeoFEST) can be used to simulate and produce synthetic observable time-dependent surface deformations over both short and long time scales. Such simulations aid in interpretation of GPS, InSar and other geodetic techniques that will require detailed analysis as increasingly large data volumes from NASA remote sensing programs are developed and deployed. The NASA Earth Science Technology Office Computational Technologies Program (ESTO/CT) has funded extensions to GeoFEST to support larger-scale simulations, adaptive methods, and scalability across a variety of parallel computing systems. The software and hardware technologies applied to make this transition, as well as additional near-term development plans for GeoFEST, will be described.

Norton, C. D.; Lyzenga, G. A.; Parker, J. W.; Tisdale, E. R.

2004-12-01

397

An efficient parallel iterative method for the finite element method has been developed for symmetric multiprocessor (SMP) cluster architectures with vector processors, such as the Earth Simulator. The method is based on a three-level hybrid parallel programming model, combining message passing for inter-SMP-node communication, loop directives by OpenMP for intra-SMP-node parallelization, and vectorization for each processing element (PE). Simple

Kengo Nakajima

2005-01-01

398

Parallel Verlet Neighbor List Algorithm for GPU-Optimized MD Simulations

NASA Astrophysics Data System (ADS)

How biomolecules fold and assemble into well-defined structures that correspond to cellular functions is a fundamental problem in biophysics. Molecular dynamics (MD) simulations provide a molecular-resolution physical description of the folding and assembly processes, but the computational demands of the algorithms restrict the size and the timescales one can simulate. In a recent study, we introduced a parallel neighbor list algorithm that was specifically optimized for MD simulations on GPUs. We now analyze the performance of our MD simulation code that incorporates the algorithm, and we observe that the force calculations and the evaluation of the neighbor list and pair lists constitute a majority of the overall execution time. The overall speedup of the GPU-optimized MD simulations as compared to the CPU-optimized version is N-dependent and ~30x for the full 70S ribosome (10,219 beads). The pair and neighbor list evaluations have performance speedups of ~25x and ~55x, respectively. We then make direct comparisons of the performance of our MD simulation code with that of the SOP model implemented in the simulation code of HOOMD, a leading general particle dynamics simulation package that is specifically optimized for GPUs.

Cho, Samuel

2013-03-01
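For orientation, the neighbor-list idea behind this entry can be shown with a serial, O(N^2) Python sketch; the GPU-optimized version in the study parallelizes the construction over particles (and a production code would use cell lists to avoid the all-pairs scan). All names here are invented for illustration:

```python
def build_verlet_list(positions, r_cut, skin):
    """All-pairs Verlet neighbor list with a skin distance: pairs within
    r_cut + skin are recorded, so the list remains valid until some
    particle has moved farther than skin / 2 since the last rebuild."""
    r_list_sq = (r_cut + skin) ** 2
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d_sq = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if d_sq < r_list_sq:
                pairs.append((i, j))
    return pairs

# Usage: three beads on a line; only the first pair is within the list radius.
beads = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(build_verlet_list(beads, r_cut=1.2, skin=0.3))  # -> [(0, 1)]
```

The force loop then iterates only over `pairs` until a rebuild is triggered, which is why the list evaluation dominates the profile the entry describes.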

399

Massively parallel Monte Carlo for many-particle simulations on GPUs

Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.

Anderson, Joshua A.; Jankowski, Eric [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States)]; Grubb, Thomas L. [Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI 48109 (United States)]; Engel, Michael [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States)]; Glotzer, Sharon C., E-mail: sglotzer@umich.edu [Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109 (United States); Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI 48109 (United States)]

2013-12-01
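A common way to parallelize such ensemble Monte Carlo while preserving detailed balance is a checkerboard domain decomposition; the entry does not spell out its scheme, so this serial Python sketch (all names hypothetical) is an illustration of the general idea, not the paper's implementation:

```python
import random

def checkerboard_sweep(cells, try_move, rng):
    """One sweep of a checkerboard-decomposed Monte Carlo scheme: cells of
    the same color share no boundary, so trial moves confined within them
    are independent and could run on separate GPU threads. Schemes of this
    kind also shift the grid randomly between sweeps so particles near cell
    edges are not permanently frozen; that offset is omitted here."""
    for color in (0, 1):
        active = [cell for cell in cells if sum(cell) % 2 == color]
        rng.shuffle(active)  # serial stand-in for a parallel kernel launch
        for cell in active:
            try_move(cell)

# Usage: record the visit order and check that same-color cells are grouped.
visited = []
grid = [(i, j) for i in range(2) for j in range(2)]
checkerboard_sweep(grid, visited.append, random.Random(0))
# visited holds the two even-parity cells first, then the two odd-parity cells
```

On a GPU, each same-color cell becomes one thread's work item, which is how over a billion trial moves per second become feasible.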

400

a Parallelized Genetic Algorithm. Joseph Cornish*, Robert Forder**, Ivan Erill*, Matthias K. Gobbert** (*Department of ...) ... a genetic algorithm in parallel using a server-client organization to simulate the evolution ... are not able to recognize correlation information in binding sites. We implement a genetic algorithm

Gobbert, Matthias K.

401

NASA Astrophysics Data System (ADS)

This presentation deals with a continuation of work on object-oriented models programmed using Java. The motivation is to develop a suite of Java-based physics models that lend themselves to parallelization (i.e., execution on distributed systems). Overall, the work is an object-oriented Monte Carlo simulation of a parallel plate discharge in 1D with mobile ions and electrons. The discharge gap is broken down into uniform slabs. Each slab corresponds to a stand-alone Monte Carlo simulation of the plasma species in that slab. After a time-step, statistics are collected using a Legendre Polynomial Weighted Sampling object and the simulation continued by an object that does a B-Spline fit to those statistics. Stable execution of this method in reasonable times depends on the sampling technique. Therefore, the presentation details the development of a three-dimensional sampling method that allows better capture of the role of secondary electrons in the discharge. Results illustrating program performance will be shown for rf discharges in silane and argon and compared with fluid simulation results.

Horie, I.; Ohmori, Y.; Ventzek, P. L. G.; Kitamori, K.

1999-10-01

402

A novel parallel-rotation algorithm for atomistic Monte Carlo simulation of dense polymer systems

NASA Astrophysics Data System (ADS)

We develop and test a new elementary Monte Carlo move for use in the off-lattice simulation of polymer systems. This novel Parallel-Rotation algorithm (ParRot) permits moving very efficiently torsion angles that are deeply inside long chains in melts. The parallel-rotation move is extremely simple and is also demonstrated to be computationally efficient and appropriate for Monte Carlo simulation. The ParRot move does not affect the orientation of those parts of the chain outside the moving unit. The move consists of a concerted rotation around four adjacent skeletal bonds. No assumption is made concerning the backbone geometry other than that bond lengths and bond angles are held constant during the elementary move. Properly weighted sampling techniques are needed for ensuring detailed balance because the new move involves a correlated change in four degrees of freedom along the chain backbone. The ParRot move is supplemented with the classical Metropolis Monte Carlo, the Continuum-Configurational-Bias, and Reptation techniques in an isothermal-isobaric Monte Carlo simulation of melts of short and long chains. Comparisons are made with the capabilities of other Monte Carlo techniques to move the torsion angles in the middle of the chains. We demonstrate that ParRot constitutes a highly promising Monte Carlo move for the treatment of long polymer chains in the off-lattice simulation of realistic models of dense polymer systems.

Santos, S.; Suter, U. W.; Müller, M.; Nievergelt, J.

2001-06-01

403

Parallel, adaptive, multi-object trajectory integrator for space simulation applications

NASA Astrophysics Data System (ADS)

Computer simulation is a very helpful approach for improving results from space-borne experiments. Initial-value problems (IVPs) can be applied for modeling the dynamics of different objects: artificial Earth satellites, charged particles in magnetic and electric fields, charged or non-charged dust particles, and space debris. An integrator for systems of ordinary differential equations (ODEs) based on embedded Runge-Kutta-Fehlberg methods of different orders is developed. These methods enable evaluation of the local error. Instead of step-size control based on local error evaluation, an optimal integration method is selected; integration that meets the required local error then proceeds with constant-sized steps. This optimal scheme selection reduces the amount of calculation needed for solving the IVPs. In addition, for an implementation on a multi-core processor with thread-based parallelization, we describe how to solve multiple systems of IVPs efficiently in parallel. The proposed integrator allows the application of a different force model for every object in multi-satellite simulation models. Simultaneous application of the integrator to different kinds of problems within one combined simulation model is possible too. The basic application of the integrator is solving mechanical IVPs in the context of simulation models and their application in complex multi-satellite space missions and as a design tool for experiments.

Atanassov, Atanas Marinov

2014-10-01

404

Switching to High Gear: Opportunities for Grand-scale Real-time Parallel Simulations

The recent emergence of dramatically large computational power, spanning desktops with multi-core processors and multiple graphics cards to supercomputers with 10^5 processor cores, has suddenly resulted in simulation-based solutions trailing behind in the ability to fully tap the new computational capacity. Here, we motivate the need for switching the parallel simulation research to a higher gear to exploit the new, immense levels of computational power. The potential for grand-scale real-time solutions is illustrated using preliminary results from prototypes in four example application areas: (a) state- or regional-scale vehicular mobility modeling, (b) very large-scale epidemic modeling, (c) modeling the propagation of wireless network signals in very large, cluttered terrains, and, (d) country- or world-scale social behavioral modeling. We believe the stage is perfectly poised for the parallel/distributed simulation community to envision and formulate similar grand-scale, real-time simulation-based solutions in many application areas.

Perumalla, Kalyan S [ORNL]

2009-01-01

405

NASA Astrophysics Data System (ADS)

We use molecular dynamics simulations to study the structure, dynamics, and transport properties of nano-confined water between parallel graphite plates with separation distances (H) from 7 to 20 Å at different water densities, with an emphasis on anisotropies generated by confinement. The behavior of the confined water phase is compared to non-confined bulk water under similar pressure and temperature conditions. Our simulations show anisotropic structure and dynamics of the confined water phase in directions parallel and perpendicular to the graphite plate. The magnitude of these anisotropies depends on the slit width H. Confined water shows "solid-like" structure and slow dynamics for the water layers near the plates. The mean square displacements (MSDs) and velocity autocorrelation functions (VACFs) for directions parallel and perpendicular to the graphite plates are calculated. By increasing the confinement distance from H = 7 Å to H = 20 Å, the MSD increases and the behavior of the VACF indicates that the confined water changes from solid-like to liquid-like dynamics. If the initial density of the water phase is set up using geometric criteria (i.e., the distance between the graphite plates), large pressures (on the order of ~10 katm) and large pressure anisotropies are established within the water. By decreasing the density of the water between the confined plates to about 0.9 g cm^-3, bubble formation and restructuring of the water layers are observed.

Mosaddeghi, Hamid; Alavi, Saman; Kowsari, M. H.; Najafi, Bijan

2012-11-01

406

Parallelization of Particle-Particle, Particle-Mesh Method within N-Body Simulation

NSDL National Science Digital Library

The N-Body problem has become an intricate part of the computational sciences, and many methods have arisen to solve and approximate it. The solution potentially requires on the order of N^2 calculations each time step, so efficient performance of these N-Body algorithms is very significant [5]. This work describes the parallelization and optimization of the Particle-Particle, Particle-Mesh (P3M) algorithm within GalaxSeeHPC, an open-source N-Body simulation code. After profiling, MPI (Message Passing Interface) routines were implemented in the population of the density grid in the P3M method in GalaxSeeHPC. Each problem size recorded different results, and for a problem set dealing with 10,000 celestial bodies, speedups of up to 10x were achieved. However, in accordance with Amdahl's Law, the maximum speedup for the code should have been closer to 16x. In order to achieve maximum optimization, additional research is needed, and parallelization of the Fourier transform routines could prove to be rewarding. In conclusion, the GalaxSeeHPC simulation was successfully parallelized and obtained very respectable results, while further optimization remains possible.

Nocito, Nicholas
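The density-grid population step that the entry parallelized with MPI can be illustrated by a serial, 1-D cloud-in-cell deposit. This is an assumption for illustration (the entry does not state which assignment scheme GalaxSeeHPC uses), and positions are assumed to lie in [0, length):

```python
def cic_deposit(positions, masses, n_cells, length):
    """1-D cloud-in-cell deposit, a standard way to populate the density
    grid in particle-mesh methods: each particle's mass is split between
    its two nearest grid points in proportion to proximity, with periodic
    wrap-around at the boundary. An MPI version would have each rank run
    this loop over its own particles and then reduce the grids."""
    dx = length / n_cells
    grid = [0.0] * n_cells
    for x, m in zip(positions, masses):
        s = x / dx
        i = int(s)
        frac = s - i
        grid[i % n_cells] += m * (1.0 - frac)
        grid[(i + 1) % n_cells] += m * frac
    return grid

# Usage: one unit mass halfway between the first two grid points.
print(cic_deposit([0.5], [1.0], n_cells=4, length=4.0))  # -> [0.5, 0.5, 0.0, 0.0]
```

The deposited grid is then Fourier-transformed to solve for the mesh potential, which is why the entry singles out the FFT routines as the next parallelization target.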

407

Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors

An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering as much as over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular simulator in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.

Aaby, Brandon G. [ORNL]; Perumalla, Kalyan S. [ORNL]; Seal, Sudip K. [ORNL]

2010-01-01

408

GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit

Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358

Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik

2013-01-01

409

Embedded Microclusters in Zeolites and Cluster Beam Sputtering -- Simulation on Parallel Computers

This report summarizes the research carried out under the DOE-supported Computer Science program (DOE/ER/45477) during the course of this project. Large-scale molecular-dynamics (MD) simulations were performed to investigate: (1) sintering of microporous and nanophase Si₃N₄; (2) crack-front propagation in amorphous silica; (3) phonons, structural correlations, and mechanical behavior including dynamic fracture in graphitic tubules; and (4) amorphization and fracture in nanowires. The simulations were carried out with highly efficient multiscale algorithms and dynamic load-balancing schemes for mapping irregular atomistic simulations onto distributed-memory parallel architectures. These research activities resulted in fifty-three publications and fifty-five invited presentations.

Greenwell, Donald L.; Kalia, Rajiv K.; Vashishta, Priya

1996-12-01

410

Xyce parallel electronic simulator design : mathematical formulation, version 2.0.

This document is intended to contain a detailed description of the mathematical formulation of Xyce, a massively parallel SPICE-style circuit simulator developed at Sandia National Laboratories. The target audience of this document is people in the role of 'service provider'. An example of such a person would be a linear solver expert who is spending a small fraction of his time developing solver algorithms for Xyce. Such a person probably is not an expert in circuit simulation, and would benefit from a description of the equations solved by Xyce. In this document, modified nodal analysis (MNA) is described in detail, with a number of examples. Issues that are unique to circuit simulation, such as voltage limiting, are also described in detail.

Hoekstra, Robert John; Waters, Lon J.; Hutchinson, Scott Alan; Keiter, Eric Richard; Russo, Thomas V.

2004-06-01

411

Steepening of parallel propagating hydromagnetic waves into magnetic pulsations - A simulation study

NASA Technical Reports Server (NTRS)

The steepening mechanism of parallel propagating low-frequency MHD-like waves observed upstream of the earth's quasi-parallel bow shock has been investigated by means of electromagnetic hybrid simulations. It is shown that an ion beam excites large-amplitude waves through the resonant electromagnetic ion/ion instability; these waves pitch-angle scatter, decelerate, and eventually magnetically trap beam ions in regions where the wave amplitudes are largest. As a result, the beam ions become bunched in both space and gyrophase. As these higher-density, nongyrotropic beam segments are formed, the hydromagnetic waves rapidly steepen, resulting in magnetic pulsations with properties generally in agreement with observations. This steepening process operates on the scale of the linear growth time of the resonant ion/ion instability. Many of the pulsations generated by this mechanism are left-hand polarized in the spacecraft frame.

Akimoto, K.; Winske, D.; Onsager, T. G.; Thomsen, M. F.; Gary, S. P.

1991-01-01

412

Superposition-Enhanced Estimation of Optimal Temperature Spacings for Parallel Tempering Simulations

Effective parallel tempering simulations rely crucially on a properly chosen sequence of temperatures. While it is desirable to achieve a uniform exchange acceptance rate across neighboring replicas, finding a set of temperatures that achieves this end is often a difficult task, in particular for systems undergoing phase transitions. Here we present a method for determination of optimal replica spacings, which is based upon knowledge of local minima in the potential energy landscape. Working within the harmonic superposition approximation, we derive an analytic expression for the parallel tempering acceptance rate as a function of the replica temperatures. For a particular system and a given database of minima, we show how this expression can be used to determine optimal temperatures that achieve a desired uniform acceptance rate. We test our strategy for two atomic clusters that exhibit broken ergodicity, demonstrating that our method achieves uniform acceptance as well as significant efficiency gains. PMID:25512744
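The spacing problem the abstract addresses can be made concrete with the standard baseline it improves upon: a geometric temperature ladder and the Metropolis swap-acceptance rule for neighboring replicas. This generic sketch is not the paper's superposition-based method; function names and values are illustrative.

```python
import math

def geometric_ladder(t_min, t_max, n):
    """Geometric temperature ladder -- a common default that yields roughly
    uniform swap acceptance only when the heat capacity is nearly constant;
    the paper instead derives spacings from a database of local minima."""
    r = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * r**k for k in range(n)]

def swap_acceptance(beta_i, beta_j, e_i, e_j):
    """Metropolis probability of exchanging configurations between replicas
    at inverse temperatures beta_i and beta_j with energies e_i and e_j."""
    return min(1.0, math.exp((beta_i - beta_j) * (e_i - e_j)))
```

Near a phase transition the energy distributions of neighboring replicas separate, the acceptance computed above collapses, and a geometric ladder fails — which is the regime the superposition-based estimate targets.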

2014-01-01

413

Sparse parallel factorization is among the most complicated and irregular algorithms to analyze and optimize. Performance depends both on system characteristics, such as the floating point rate, the memory hierarchy, and the interconnect performance, and on input matrix characteristics, such as the number and location of nonzeros. We present LUsim, a simulation framework for modeling the performance of sparse LU factorization. Our framework uses micro-benchmarks to calibrate the parameters of machine characteristics and additional tools to facilitate real-time performance modeling. We are using LUsim to analyze an existing parallel sparse LU factorization code, and to explore a latency-tolerant variant. We developed and validated a model of the factorization in SuperLU_DIST, then modeled and implemented a new variant, replacing a blocking collective communication phase with a non-blocking asynchronous point-to-point one. Our strategy realized a mean improvement of 11 percent over a suite of test matrices.

Univ. of California, San Diego; Li, Xiaoye Sherry; Cicotti, Pietro; Baden, Scott B.

2008-04-15

414

FLY. A parallel tree N-body code for cosmological simulations

NASA Astrophysics Data System (ADS)

FLY is a parallel treecode which makes heavy use of the one-sided communication paradigm to handle the management of the tree structure. In its public version the code implements the equations for cosmological evolution, and can be run for different cosmological models. This reference guide describes the actual implementation of the algorithms of the public version of FLY, and suggests how to modify them to implement other types of equations (for instance, the Newtonian ones). Program summary Title of program: FLY Catalogue identifier: ADSC Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADSC Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Computer for which the program is designed and others on which it has been tested: Cray T3E, Sgi Origin 3000, IBM SP Operating systems or monitors under which the program has been tested: Unicos 2.0.5.40, Irix 6.5.14, Aix 4.3.3 Programming language used: Fortran 90, C Memory required to execute with typical data: about 100 Mwords with 2 million-particles Number of bits in a word: 32 Number of processors used: parallel program. The user can select the number of processors >=1 Has the code been vectorized or parallelized?: parallelized Number of bytes in distributed program, including test data, etc.: 4615604 Distribution format: tar gzip file Keywords: Parallel tree N-body code for cosmological simulations Nature of physical problem: FLY is a parallel collisionless N-body code for the calculation of the gravitational force. Method of solution: It is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986). Restrictions on the complexity of the program: The program uses the leapfrog integrator schema, but could be changed by the user. Typical running time: 50 seconds for each time-step, running a 2-million-particles simulation on an Sgi Origin 3800 system with 8 processors having 512 Mbytes RAM for each processor. 
Unusual features of the program: FLY uses the one-sided communication libraries: the SHMEM library on the Cray T3E and Sgi Origin systems, and the LAPI library on the IBM SP system
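The leapfrog integrator scheme named in the program summary can be sketched in kick-drift-kick form for a single particle. This is a generic illustration, not FLY's Fortran 90 implementation; `accel` is an assumed callback returning the acceleration at a position.

```python
def leapfrog_step(x, v, accel, dt):
    """One kick-drift-kick leapfrog step: symplectic and time-reversible,
    which is why it is the default choice in cosmological N-body codes."""
    v_half = v + 0.5 * dt * accel(x)           # half kick
    x_new = x + dt * v_half                    # full drift
    v_new = v_half + 0.5 * dt * accel(x_new)   # half kick
    return x_new, v_new
```

Because the scheme only needs the acceleration at the step endpoints, swapping in a different force law (as the guide suggests for Newtonian equations) leaves the integrator untouched.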

Antonuccio-Delogu, V.; Becciani, U.; Ferro, D.

2003-10-01

415

Supporting the Development of Resilient Message Passing Applications using Simulation

An emerging aspect of high-performance computing (HPC) hardware/software co-design is investigating performance under failure. The work in this paper extends the Extreme-scale Simulator (xSim), which was designed for evaluating the performance of message passing interface (MPI) applications on future HPC architectures, with fault-tolerant MPI extensions proposed by the MPI Fault Tolerance Working Group. xSim permits running MPI applications with millions of concurrent MPI ranks, while observing application performance in a simulated extreme-scale system using a lightweight parallel discrete event simulation. The newly added features offer user-level failure mitigation (ULFM) extensions at the simulated MPI layer to support algorithm-based fault tolerance (ABFT). The presented solution permits investigating performance under failure and failure handling of ABFT solutions. The newly enhanced xSim is the very first performance tool that supports ULFM and ABFT.

Naughton, Thomas J., III [ORNL]; Engelmann, Christian [ORNL]; Vallee, Geoffroy R. [ORNL]; Boehm, Swen [ORNL]

2014-01-01

416

NASA Astrophysics Data System (ADS)

I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring any modification of existing code. This is an advantage for the development and testing of computational modeling software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. Support for parallel programming is also provided by allowing users to select which simulation variables to transfer between processes via a Message Passing Interface library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class presented here requires a C++ compiler that supports variadic templates which were standardized in 2011 (C++11). The code is available at: https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those that do are kindly requested to cite this work.

Honkonen, I.

2014-07-01

417

NASA Astrophysics Data System (ADS)

Flow within the healthy human vascular system is typically laminar, but diseased conditions can alter the geometry sufficiently to produce transitional/turbulent flows in regions focal to, and immediately downstream of, the diseased section. The mean unsteadiness (pulsatile or respiratory cycle) further complicates the situation, making traditional turbulence simulation techniques (e.g., Reynolds-averaged Navier-Stokes simulations (RANS)) suspect. At the other extreme, direct numerical simulation (DNS), while fully appropriate, can lead to large computational expense, particularly when the simulations must be done quickly since they are intended to affect the outcome of a medical treatment (e.g., virtual surgical planning). Producing simulations in a clinically relevant time frame requires: (1) an adaptive meshing technique that closely matches the desired local mesh resolution in all three directions to the highly anisotropic physical length scales in the flow, (2) efficient solution algorithms, and (3) excellent scaling on massively parallel computers. In this presentation we demonstrate results for a subject-specific simulation of an abdominal aortic aneurysm using a stabilized finite element method on anisotropically adapted meshes consisting of O(10^8) elements over O(10^4) processors.

Sahni, Onkar; Jansen, Kenneth; Shephard, Mark; Taylor, Charles

2007-11-01

418

We present three-dimensional hybrid simulations of collisionless shocks that propagate parallel to the background magnetic field to study the acceleration of protons that forms a high-energy tail on the distribution. We focus on the initial acceleration of thermal protons and compare it with results from one-dimensional simulations. We find that for both one- and three-dimensional simulations, particles that end up in the high-energy tail of the distribution later in the simulation gained their initial energy right at the shock. This confirms previous results but is the first to demonstrate this using fully three-dimensional fields. The result is not consistent with the ''thermal leakage'' model. We also show that the gyrocenters of protons in the three-dimensional simulation can drift away from the magnetic field lines on which they started due to the removal of ignorable coordinates that exist in one- and two-dimensional simulations. Our study clarifies the injection problem for diffusive shock acceleration.

Guo Fan [Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545 (United States)]; Giacalone, Joe, E-mail: guofan.ustc@gmail.com [Department of Planetary Sciences and Lunar and Planetary Laboratory, University of Arizona, 1629 E. University Blvd., Tucson, AZ 85721 (United States)]

2013-08-20

419

Xyce parallel electronic simulator users' guide, version 6.0.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David G. [Raytheon, Albuquerque, NM]

2013-08-01

420

NASA Astrophysics Data System (ADS)

The features of high-precision numerical simulation of Earth satellite motion using parallel computing are discussed using, as an example, the implementation of the software complex "Numerical model of the motion of satellite systems" on the "Skiff Cyberia" cluster. It is shown that the use of a 128-bit word length makes it possible to account for weak perturbations from the high-order harmonics in the expansion of the geopotential, and for variations of the geopotential harmonics arising from tidal perturbations of the solid Earth and its oceans due to the influence of the Moon and Sun.

Chuvashov, I. N.

2010-12-01

421

Understanding Performance of Parallel Scientific Simulation Codes using Open|SpeedShop

Conclusions of this presentation are: (1) Open|SpeedShop (OSS) is convenient to use for large, parallel scientific simulation codes; (2) large codes benefit from uninstrumented execution; (3) many experiments can be run in a short time, though multiple runs may be needed, e.g., usertime for caller-callee data and hwcsamp for HW counters; (4) a decent idea of a code's performance is easily obtained; (5) statistical sampling calls for a decent number of samples; and (6) HWC data is very useful for micro-analysis but can be tricky to analyze.

Ghosh, K K

2011-11-07

422

Simple LabVIEW DC Circuit Simulation With Parallel Resistors: Overview

NSDL National Science Digital Library

This is a downloadable simple DC circuit simulation with two resistors in parallel with a third resistor. It is useful for studying Ohm's Law. Users can adjust the voltage and the resistors while the current changes in real time, just like the real thing. Users are then asked whether the current increases or decreases as the resistance of the resistors increases. Includes instructions on how to measure DC/AC current. This free program requires Windows 9x, NT, XP or later. Note that this will NOT run on Mac OS.
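The physics behind the applet follows directly from Ohm's law and the parallel-resistor rule, sketched below. The helper names and resistor values are illustrative, not taken from the LabVIEW simulation.

```python
def parallel_resistance(resistances):
    """Equivalent resistance of resistors in parallel: 1/R = sum(1/R_i)."""
    return 1.0 / sum(1.0 / r for r in resistances)

def circuit_current(voltage, r_series, r_parallel_pair):
    """Current drawn by a circuit like the one simulated: two resistors in
    parallel, in series with a third.  Ohm's law gives I = V / R_total."""
    r_total = r_series + parallel_resistance(r_parallel_pair)
    return voltage / r_total
```

Increasing any resistance raises the total and, by Ohm's law, lowers the current — the relationship the simulation asks users to observe.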

423

A parallel multigrid preconditioner for the simulation of large fracture networks

Computational modeling of a fracture in disordered materials using discrete lattice models requires the solution of a linear system of equations every time a new lattice bond is broken. Solving these linear systems of equations successively is the most expensive part of fracture simulations using large three-dimensional networks. In this paper, we present a parallel multigrid preconditioned conjugate gradient algorithm to solve these linear systems. Numerical experiments demonstrate that this algorithm performs significantly better than the algorithms previously used to solve this problem.
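The solver the entry describes can be sketched as preconditioned conjugate gradients. In this illustration a simple diagonal (Jacobi) preconditioner stands in for the paper's parallel multigrid V-cycle, and the dense list-of-lists matrix is for readability only; all names are assumptions.

```python
def pcg(A, b, precond, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients for SPD systems.  `precond`
    applies M^{-1} to a residual; the paper uses a parallel multigrid
    cycle for this step, mimicked here by a Jacobi preconditioner."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                       # residual for x = 0
    z = precond(r)
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = precond(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        beta = rz_new / rz
        rz = rz_new
        p = [zi + beta * pi for zi, pi in zip(z, p)]
    return x

def jacobi_precond(A):
    """Diagonal stand-in for the multigrid preconditioner."""
    return lambda r: [ri / A[i][i] for i, ri in enumerate(r)]
```

Because each bond-breaking event only perturbs the system slightly, a preconditioner that keeps iteration counts nearly constant is what makes solving thousands of such systems in sequence tractable.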

Sampath, Rahul S [ORNL; Barai, Pallab [ORNL; Nukala, Phani K [ORNL

2010-01-01

424

pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.

This work presents pWeb, a new language and compiler for parallelization of client-side compute-intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. Low performance of the web browser, however, remains the bottleneck for computationally intensive applications, including visualization of complex scenes, real-time physical simulations and image processing, compared to native ones. The proposed language is built upon web workers for multithreaded programming in HTML5. The language provides fundamental functionalities of parallel programming languages as well as the fork/join parallel model, which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions. PMID:24732497

Halic, Tansel; Ahn, Woojin; De, Suvranu

2014-01-01

425

Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN

To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted. PMID:25506097

Hammond, G E; Lichtner, P C; Mills, R T

2014-01-01

426

Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers. PMID:24416069

Pesce, Lorenzo L; Lee, Hyong C; Hereld, Mark; Visser, Sid; Stevens, Rick L; Wildeman, Albert; van Drongelen, Wim

2013-01-01

428

Mechanical analysis of parallel manipulators with simulation, design, and control applications

NASA Astrophysics Data System (ADS)

The kinematics and dynamics for the purposes of analysis, control, simulation, and design of general platform-type parallel manipulators are discussed. Two new methods of direct kinematics for displacement analysis are proposed. Velocity and acceleration analyses for general kinematic architectures are fully studied. Furthermore, kinematic singularities are classified based on their nature. Architecture singularities and architecture conditioning are deeply studied and incorporated into design strategies, along with design examples. Moreover, formulation singularities are also given due attention with case studies. In dynamics modeling, the method of the natural orthogonal complement is applied such that the resulting models are structurally algorithmic, computationally efficient, and numerically robust - essential properties for the implementation of more sophisticated control strategies. Efficient inverse and direct dynamics algorithms are developed based on the dynamics models in both joint space and Cartesian space. The algorithms were implemented with a general software package that is available for dynamics control and motion simulation of platform manipulators. As practical applications, the dynamics modeling and simulation of some commercial flight simulators are included. Finally, the concept of dynamic isotropy is introduced, which allows one to evaluate the motion/force performance of a manipulator with respect to control and simulation. Application strategies of this concept to some robotics problems such as design, trajectory planning, and inverse kinematics are discussed along with examples.

Ma, Ou

1991-02-01

429

Simulation/Emulation Techniques: Compressing Schedules With Parallel (HW/SW) Development

NASA Technical Reports Server (NTRS)

NASA has always been in the business of balancing new technologies and techniques to achieve human space travel objectives. NASA's Kedalion engineering analysis lab has been validating and using many contemporary avionics HW/SW development and integration techniques, which represent new paradigms to NASA's heritage culture. Kedalion has validated many of the Orion HW/SW engineering techniques borrowed from the adjacent commercial aircraft avionics solution space, inserting new techniques and skills into the Multi-Purpose Crew Vehicle (MPCV) Orion program. Using contemporary agile techniques, commercial-off-the-shelf (COTS) products, early rapid prototyping, in-house expertise and tools, and extensive use of simulators and emulators, NASA has achieved cost-effective paradigms that are currently serving the Orion program effectively. Elements of long-lead custom hardware on the Orion program have necessitated early use of simulators and emulators in advance of deliverable hardware to achieve parallel design and development on a compressed schedule.

Mangieri, Mark L.; Hoang, June

2014-01-01

430

Monte Carlo Simulations of Nonlinear Particle Acceleration in Parallel Trans-relativistic Shocks

We present results from a Monte Carlo simulation of a parallel collisionless shock undergoing particle acceleration. Our simulation, which contains parameterized scattering and a particular thermal leakage injection model, calculates the feedback between accelerated particles ahead of the shock, which influence the shock precursor and "smooth" the shock, and thermal particle injection. We show that there is a transition between nonrelativistic shocks, where the acceleration efficiency can be extremely high and the nonlinear compression ratio can be substantially greater than the Rankine-Hugoniot value, and fully relativistic shocks, where diffusive shock acceleration is less efficient and the compression ratio remains at the Rankine-Hugoniot value. This transition occurs in the trans-relativistic regime and, for the particular parameters we use, occurs around a shock Lorentz factor γ = 1.5. We also find that nonlinear shock smoothing dramatically reduces the acceleration efficiency presumed to occur with large-...

Ellison, Donald C; Bykov, Andrei M

2013-01-01

431

We present an algorithm for solving the radiative transfer problem on massively parallel computers using adaptive mesh refinement and domain decomposition. The solver is based on the method of characteristics which requires an adaptive raytracer that integrates the equation of radiative transfer. The radiation field is split into local and global components which are handled separately to overcome the non-locality problem. The solver is implemented in the framework of the magneto-hydrodynamics code FLASH and is coupled by an operator splitting step. The goal is the study of radiation in the context of star formation simulations with a focus on early disc formation and evolution. This requires a proper treatment of radiation physics that covers both the optically thin as well as the optically thick regimes and the transition region in particular. We successfully show the accuracy and feasibility of our method in a series of standard radiative transfer problems and two 3D collapse simulations resembling the ear...

Buntemeyer, Lars; Peters, Thomas; Klassen, Mikhail; Pudritz, Ralph E

2015-01-01

432

We pursue a level set approach to couple an Eulerian shock-capturing fluid solver with space-time refinement to an explicit solid dynamics solver for large deformations and fracture. The coupling algorithms considering recursively finer fluid time steps as well as overlapping solver updates are discussed in detail. Our ideas are implemented in the AMROC adaptive fluid solver framework and are used for effective fluid-structure coupling to the general purpose solid dynamics code DYNA3D. Beside simulations verifying the coupled fluid-structure solver and assessing its parallel scalability, the detailed structural analysis of a reinforced concrete column under blast loading and the simulation of a prototypical blast explosion in a realistic multistory building are presented.

Deiterding, Ralf [ORNL]; Wood, Stephen L. [University of Tennessee, Knoxville (UTK)]

2013-01-01

433

A PARALLEL MONTE CARLO CODE FOR SIMULATING COLLISIONAL N-BODY SYSTEMS

We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ≈ 10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within ≲ 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10^5, 128 for N = 10^6 and 256 for N = 10^7. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60×, 100×, and 220×, respectively.

Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A., E-mail: bharath@u.northwestern.edu [Center for Interdisciplinary Exploration and Research in Astrophysics, Northwestern University, Evanston, IL (United States)

2013-02-15
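A key algorithmic point in the abstract above is per-processor parallel random number generation. The paper does not specify its scheme; a minimal Python sketch using NumPy's `SeedSequence.spawn` as a stand-in (hypothetical, not the authors' implementation) shows how each rank can get a statistically independent, reproducible stream without inter-rank communication:

```python
import numpy as np

def make_rank_generators(n_ranks, root_seed=12345):
    """Spawn one independent RNG stream per (hypothetical) MPI rank.

    Each rank receives a child SeedSequence, which guarantees
    non-overlapping streams and reproducibility from the root seed.
    """
    root = np.random.SeedSequence(root_seed)
    children = root.spawn(n_ranks)
    return [np.random.default_rng(child) for child in children]

# Four "ranks", each drawing from its own stream.
gens = make_rank_generators(4)
samples = [g.random(3) for g in gens]
```

In a real MPI code each rank would call `make_rank_generators` with the same root seed and keep only the generator matching its own rank index.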

434

MDSLB: A new static load balancing method for parallel molecular dynamics simulations

NASA Astrophysics Data System (ADS)

Large-scale parallelization of molecular dynamics simulations faces challenges that seriously affect simulation efficiency, among which the load-imbalance problem is the most critical. In this paper, we propose a new molecular dynamics static load balancing method (MDSLB). By analyzing the characteristics of the short-range force in molecular dynamics programs running in parallel, we divide the short-range force into three kinds of force models, and then package the computations of each force model into many tiny computational units called “cell loads”, which provide the basic data structures for our load balancing method. In MDSLB, the spatial region is separated into sub-regions called “local domains”, and the cell loads of each local domain are allocated to every processor in turn. Compared with dynamic load balancing methods, MDSLB can guarantee load balance by executing the algorithm only once at program startup, without migrating loads dynamically. We implement MDSLB in the OpenFOAM software and test it on the TianHe-1A supercomputer with 16 to 512 processors. Experimental results show that MDSLB saves 34%–64% of the runtime for load-imbalanced cases.

Wu, Yun-Long; Xu, Xin-Hai; Yang, Xue-Jun; Zou, Shun; Ren, Xiao-Guang

2014-02-01
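The core idea above — package work into weighted "cell loads" and assign them to processors once, at startup — can be sketched as a simple static assignment. The greedy longest-processing-time heuristic below is illustrative only; it is not MDSLB's actual allocate-in-turn rule:

```python
import heapq

def static_balance(cell_loads, n_procs):
    """Assign weighted cell loads to processors once, at program startup.

    Greedy heuristic: hand each load (largest first) to the currently
    lightest processor. Static in the MDSLB sense: no runtime migration.
    """
    heap = [(0.0, p, []) for p in range(n_procs)]  # (total, proc, cells)
    heapq.heapify(heap)
    for load in sorted(cell_loads, reverse=True):
        total, p, cells = heapq.heappop(heap)
        cells.append(load)
        heapq.heappush(heap, (total + load, p, cells))
    return sorted(heap, key=lambda t: t[1])

# Six cell loads distributed over two processors.
assign = static_balance([5.0, 3.0, 2.0, 2.0, 1.0, 1.0], n_procs=2)
```

Because the assignment is computed once, the scheme trades adaptivity for zero runtime balancing overhead, which matches the paper's motivation.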

435

Molecular-dynamics simulations of self-assembled monolayers (SAM) on parallel computers

NASA Astrophysics Data System (ADS)

The purpose of this dissertation is to investigate the properties of self-assembled monolayers, particularly alkanethiols and poly(ethylene glycol)-terminated alkanethiols. These simulations are based on realistic interatomic potentials and require scalable and portable multiresolution algorithms implemented on parallel computers. Large-scale molecular dynamics simulations of self-assembled alkanethiol monolayer systems have been carried out using an all-atom model involving a million atoms to investigate their structural properties as a function of temperature, lattice spacing and molecular chain length. Results show that the alkanethiol chains tilt from the surface normal by a collective angle of 25° along the next-nearest-neighbor direction at 300 K. At 350 K the system transforms to a disordered phase characterized by a small tilt angle, flexible tilt direction, and random distribution of backbone planes. With increasing lattice spacing, a, the tilt angle increases rapidly from a nearly zero value at a = 4.7 Å to as high as 34° at a = 5.3 Å at 300 K. We also studied the effect of end groups on the tilt structure of SAM films. We characterized the system with respect to temperature, the alkane chain length, lattice spacing, and the length of the end group. We found that gauche defects were predominant only in the tails, and that the gauche defects increased with temperature and the number of EG units. The effect of an electric field on the structure of a poly(ethylene glycol) (PEG)-terminated alkanethiol self-assembled monolayer (SAM) on gold has been studied using the parallel molecular dynamics method. An applied electric field triggers a conformational transition from all-trans to a mostly gauche conformation. The polarity of the electric field has a significant effect on the surface structure of PEG, leading to a profound effect on the hydrophilicity of the surface.
The electric field applied anti-parallel to the surface normal causes a reversible transition to an ordered state in which the oxygen atoms are exposed. On the other hand, an electric field applied in a direction parallel to the surface normal introduces considerable disorder in the system and the oxygen atoms are buried inside.

Vemparala, Satyavani

436

Application of a 3D, Adaptive, Parallel, MHD Code to Supernova Remnant Simulations

NASA Astrophysics Data System (ADS)

We at Michigan have a computational model, BATS-R-US, which incorporates several modern features that make it suitable for calculations of supernova remnant evolution. In particular, it is a three-dimensional MHD model, using a method called the Multiscale Adaptive Upwind Scheme for MagnetoHydroDynamics (MAUS-MHD). It incorporates a data structure that allows for adaptive refinement of the mesh, even in massively parallel calculations. Its advanced Godunov method, a solution-adaptive, upwind, high-resolution scheme, incorporates a new, flux-based approach to the Riemann solver with improved numerical properties. This code has been successfully applied to several problems, including the simulation of comets and of planetary magnetospheres, in the 3D context of the Heliosphere. The code was developed under a NASA computational grand challenge grant to run very rapidly on parallel platforms. It is also now being used to study time-dependent systems such as the transport of particles and energy from solar coronal mass ejections to the Earth. We are in the process of modifying this code so that it can accommodate the very strong shocks present in supernova remnants. Our test case simulates the explosion of a star of 1.4 solar masses with an energy of 1 foe, in a uniform background medium. We have performed runs of 250,000 to 1 million cells on 8 nodes of an Origin 2000. These relatively coarse grids do not allow fine details of instabilities to become visible. Nevertheless, the macroscopic evolution of the shock is simulated well, with the forward and reverse shocks visible in velocity profiles. We will show our work to date. This work was supported by NASA through its GSRP program.

Kominsky, P.; Drake, R. P.; Powell, K. G.

2001-05-01

437

Building designers are increasingly relying on complex fenestration systems to reduce the energy consumed for lighting and HVAC in low-energy buildings. Radiance, a lighting simulation program, has been used to conduct daylighting simulations for complex fenestration systems. Depending on the configuration, a simulation can take hours or even days on a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight simulation by conducting parallel computing on the heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in parallel using OpenCL. The speed of the new approach was evaluated using various daylighting simulation cases on a multicore central processing unit and a graphics processing unit. Based on measurements and analysis of the time usage of the Radiance daylighting simulation, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.

University of Miami; Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor S.

2013-04-30
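The matrix-multiplication portion mentioned above is, in the standard three-phase formulation, a chain of products i = V·T·D·s (view matrix, transmission matrix, daylight matrix, sky vector). The paper accelerates this with OpenCL; the plain-NumPy sketch below only illustrates the chain and why right-to-left association keeps every intermediate a vector (names are illustrative):

```python
import numpy as np

def three_phase(V, T, D, s):
    """Three-phase daylight result i = V @ T @ D @ s.

    Associating right-to-left means each intermediate is a vector,
    avoiding the cost of forming full matrix-matrix products first.
    """
    return V @ (T @ (D @ s))

# Small random matrices with compatible shapes, for illustration.
rng = np.random.default_rng(0)
V = rng.random((5, 4))
T = rng.random((4, 6))
D = rng.random((6, 3))
s = rng.random(3)
i = three_phase(V, T, D, s)
```

For time-series sky vectors, s becomes a matrix with one column per timestep, and the same association order still applies.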

438

NASA Astrophysics Data System (ADS)

Molecular dynamics (MD) simulations of RDX are carried out using the ReaxFF force field supplied with the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS). Validation of ReaxFF for modeling RDX is carried out by extracting (i) the crystal unit cell parameters, (ii) the bulk modulus, and (iii) the thermal expansion coefficient, and comparing them with values reported from both experiments and simulations.

Warrier, M.; Pahari, P.; Chaturvedi, S.

2010-12-01

439

Molecular dynamics (MD) simulations of RDX are carried out using the ReaxFF force field supplied with the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS). Validation of ReaxFF for modeling RDX is carried out by extracting (i) the crystal unit cell parameters, (ii) the bulk modulus, and (iii) the thermal expansion coefficient, and comparing them with values reported from both experiments and simulations.

M. Warrier; P. Pahari; S. Chaturvedi

2010-01-01

440

This paper presents a generalized, parallel implementation methodology for real-time simulation of ac machine transients in an FPGA-based real-time simulator. The proposed method adopts a nanosecond-range simulation time-step and exploits the large response time of a rotating machine to: 1) eliminate the need for predictive-corrective action for the machine electrical and mechanical variables, 2) decouple the solution of the

Mahmoud Matar; Reza Iravani

2011-01-01

441

Infrastructure for distributed enterprise simulation

Traditional discrete-event simulations employ an inherently sequential algorithm and are run on a single computer. However, the demands of many real-world problems exceed the capabilities of sequential simulation systems. Often the capacity of a computer's primary memory limits the size of the models that can be handled, and in some cases parallel execution on multiple processors could significantly reduce the simulation time. This paper describes the development of an Infrastructure for Distributed Enterprise Simulation (IDES) - a large-scale portable parallel simulation framework developed to support Sandia National Laboratories' mission in stockpile stewardship. IDES is based on the Breathing-Time-Buckets synchronization protocol, and maps a message-based model of distributed computing onto an object-oriented programming model. IDES is portable across heterogeneous computing architectures, including single-processor systems, networks of workstations and multi-processor computers with shared or distributed memory. The system provides a simple and sufficient application programming interface that can be used by scientists to quickly model large-scale, complex enterprise systems. In the background and without involving the user, IDES is capable of making dynamic use of idle processing power available throughout the enterprise network. 16 refs., 14 figs.

Johnson, M.M.; Yoshimura, A.S.; Goldsby, M.E. [and others

1998-01-01
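The Breathing-Time-Buckets protocol named above processes pending events optimistically up to the "event horizon" — the earliest timestamp among events generated in the current cycle — then commits that batch before starting the next cycle. A toy single-process sketch of the cycle (function and variable names are illustrative, not IDES's API; the real protocol runs this across distributed processors):

```python
import heapq

def breathing_time_buckets(initial_events, handler, t_end):
    """Toy sequential sketch of the Breathing-Time-Buckets cycle.

    Each cycle processes events up to the current event horizon, commits
    them, then inserts the newly scheduled events. This toy assumes the
    handler never schedules events in the past of already-processed ones.
    """
    pending = list(initial_events)          # heap of (time, event_id)
    heapq.heapify(pending)
    committed = []
    while pending and pending[0][0] < t_end:
        new_events = []
        horizon = float("inf")
        cycle = []
        # process events below the (shrinking) event horizon
        while pending and pending[0][0] < horizon:
            t, ev = heapq.heappop(pending)
            cycle.append((t, ev))
            for nt, nev in handler(t, ev):
                new_events.append((nt, nev))
                horizon = min(horizon, nt)  # horizon "breathes" inward
        committed.extend(cycle)             # commit the finished bucket
        for item in new_events:
            heapq.heappush(pending, item)
    return committed

# Toy usage: each event schedules one successor until id 3.
def step_handler(t, ev):
    return [(t + 1.0, ev + 1)] if ev < 3 else []

log = breathing_time_buckets([(0.0, 0)], step_handler, t_end=10.0)
```

The payoff in the distributed setting is that all events within a bucket can be processed in parallel without rollback, since none of them can affect the others.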

442

We develop a parallel Jacobi-Davidson approach for finding a partial set of eigenpairs of large sparse polynomial eigenvalue problems, with application in quantum dot simulation. A Jacobi-Davidson eigenvalue solver is implemented based on the Portable, Extensible Toolkit for Scientific Computation (PETSc). The eigensolver thus inherits PETSc's efficient and varied parallel operations, linear solvers, preconditioning schemes, and ease of use. The parallel eigenvalue solver is then used to solve higher-degree polynomial eigenvalue problems arising in numerical simulations of three-dimensional quantum dots governed by Schrödinger's equation. We find that the parallel restricted additive Schwarz preconditioner in conjunction with a parallel Krylov subspace method (e.g., GMRES) can solve the correction equations, the most costly step in the Jacobi-Davidson algorithm, very efficiently in parallel. The overall performance is also quite satisfactory. We have observed near-perfect superlinear speedup using up to 320 processors. The parallel eigensolver can find all target interior eigenpairs of a quintic polynomial eigenvalue problem with more than 32 million variables within 12 minutes using 272 Intel 3.0 GHz processors.

Hwang, F-N [Department of Mathematics, National Central University, Jhongli 320, Taiwan (China)], E-mail: hwangf@math.ncu.edu.tw; Wei, Z-H [Department of Mathematics, National Central University, Jhongli 320, Taiwan (China)], E-mail: socrates.wei@gmail.com; Huang, T-M [Department of Mathematics, National Taiwan Normal University, Taipei 116, Taiwan (China)], E-mail: min@math.ntnu.edu.tw; Wang Weichung [Department of Mathematics, National Taiwan University, Taipei 106, Taiwan (China)], E-mail: wwang@math.ntu.edu.tw

2010-04-20

443

NASA Astrophysics Data System (ADS)

The 1994 Northridge earthquake in Los Angeles, California, killed 57 people, injured over 8,700, and caused an estimated $20 billion in damage. Petascale simulations are needed in California and elsewhere to provide society with a better understanding of the rupture and wave dynamics of the largest earthquakes at the shaking frequencies required to engineer safe structures. As heterogeneous supercomputing infrastructures become more common, numerical developments in earthquake system research are particularly challenged by the dependence on accelerator elements to enable "the Big One" simulations with higher frequency and finer resolution. Reducing time to solution and power consumption are the two primary focus areas today for the enabling technology of fault rupture dynamics and seismic wave propagation in realistic 3D models of the crust's heterogeneous structure. This dissertation presents scalable parallel programming techniques for high-performance seismic simulation running on petascale heterogeneous supercomputers. A real-world earthquake simulation code, AWP-ODC, one of the most advanced earthquake codes to date, was chosen as the base code in this research, and the testbed is Titan at Oak Ridge National Laboratory, the world's largest heterogeneous supercomputer. The research work is primarily related to architecture study, computational performance tuning and software system scalability. An earthquake simulation workflow has also been developed to support efficient production sets of simulations. The highlights of the technical development are an aggressive performance optimization focusing on data locality and a notable data communication model that hides data communication latency. This development results in optimal computation efficiency and throughput for the 13-point stencil code on heterogeneous systems, and can be extended to general high-order stencil codes. 
Started from scratch, the hybrid CPU/GPU version of the AWP-ODC code is now ready for real-world petascale earthquake simulations. This GPU-based code has demonstrated excellent weak scaling up to the full Titan scale and achieved 2.3 PetaFLOPS sustained computation performance in single precision. The production simulation demonstrated the first 0-10 Hz deterministic rough-fault simulation. Using the accelerated AWP-ODC, the Southern California Earthquake Center (SCEC) has recently created the physics-based probabilistic seismic hazard analysis model of the Los Angeles region, CyberShake 14.2, as of the time of the dissertation writing. The tensor-valued wavefield code based on this GPU research has dramatically reduced time-to-solution, making a statewide hazard model a goal reachable with existing heterogeneous supercomputers.

Zhou, Jun
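The "13-point stencil" mentioned in the abstract is a fourth-order finite-difference operator: a center point plus two neighbors on each side in each of three dimensions (1 + 12 = 13 points). A minimal NumPy sketch of such a Laplacian on a periodic grid — illustrative only; AWP-ODC's actual kernel, GPU tiling, and latency-hiding halo exchange are far more involved:

```python
import numpy as np

def laplacian_13pt(u, h=1.0):
    """Fourth-order 13-point Laplacian on a periodic 3D grid.

    Per dimension, the 1D second-derivative stencil is
    (-1, 16, -30, 16, -1) / (12 h^2); summing over the three
    dimensions gives the 13-point operator.
    """
    c = np.array([-1.0, 16.0, -30.0, 16.0, -1.0]) / (12.0 * h * h)
    out = 3.0 * c[2] * u  # center coefficient, once per dimension
    for axis in range(3):
        for shift, coeff in ((2, c[0]), (1, c[1]), (-1, c[3]), (-2, c[4])):
            out = out + coeff * np.roll(u, shift, axis=axis)
    return out
```

On an accelerator, the data-locality optimization the dissertation emphasizes amounts to tiling this loop nest so each thread block reuses the two-deep halo it loads.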

444

NASA Astrophysics Data System (ADS)

As a result of continual space activity since the 1950s, there are now a large number of man-made Resident Space Objects (RSOs) orbiting the Earth. Because of the large number of items and their relative speeds, the possibility of destructive collisions involving important space assets is now of significant concern to users and operators of space-borne technologies. As a result, a growing number of international agencies are researching improved techniques for maintaining Space Situational Awareness (SSA). Computer simulation is a method commonly used by many countries to validate competing methodologies prior to full-scale adoption. The use of supercomputing and/or reduced-scale testing is often necessary to effectively simulate such a complex problem on today's computers. Recently the authors presented a simulation aimed at reducing the computational burden by selecting the minimum level of fidelity necessary for contrasting methodologies and by utilising multi-core CPU parallelism for increased computational efficiency. The resulting simulation runs on a single PC while maintaining the ability to effectively evaluate competing methodologies. Nonetheless, the ability to control the scale and expand upon the computational demands of the sensor management system is limited. In this paper, we examine the advantages of increasing the parallelism of the simulation by means of General Purpose computing on Graphics Processing Units (GPGPU). As many sub-processes pertaining to SSA management are independent, we demonstrate how parallelisation via GPGPU has the potential to significantly enhance not only research into techniques for maintaining SSA, but also the level of sophistication of existing space surveillance sensors and sensor management systems. Nonetheless, the use of GPGPU imposes certain limitations and adds to the implementation complexity, both of which require consideration to achieve an effective system. 
We discuss these challenges and how they can be overcome. We further describe an application of the parallelised system where visibility prediction is used to enhance sensor management. This facilitates significant improvement in maximum catalogue error when RSOs become temporarily unobservable. The objective is to demonstrate the enhanced scalability and increased computational capability of the system.

Hobson, T.; Clarkson, V.

2012-09-01

445

Three-dimensional parallel UNIPIC-3D code for simulations of high-power microwave devices

This paper introduces a self-developed, three-dimensional, parallel, fully electromagnetic particle simulation code, UNIPIC-3D. In this code, the electromagnetic fields are updated using the second-order finite-difference time-domain method, and the particles are moved using the relativistic Newton-Lorentz force equation. The electromagnetic field and particles are coupled through the current term in Maxwell's equations. Two numerical examples are used to verify the algorithms adopted in this code; the numerical results agree well with theoretical ones. This code can be used to simulate high-power microwave (HPM) devices, such as the relativistic backward wave oscillator, coaxial vircator, and magnetically insulated line oscillator. UNIPIC-3D is written in the object-oriented C++ language and can be run on a variety of platforms including WINDOWS, LINUX, and UNIX. Users can use the graphical user interface to create the complex geometric structures of the simulated HPM devices, which can be automatically meshed by the UNIPIC-3D code. This code has a powerful postprocessor which can display the electric field, magnetic field, current, voltage, power, spectrum, momentum of particles, etc. For the sake of comparison, results computed using the two-and-a-half-dimensional UNIPIC code are also provided for the same HPM device parameters; the numerical results from the two codes agree well with each other.

Wang Jianguo [Northwest Institute of Nuclear Technology, P.O. Box 69-1, Xi'an, Shaanxi 710024 (China); Key Laboratory of Physical Electronics and Devices of the Ministry of Education, Xi'an Jiaotong University, Xi'an, Shaanxi 710049 (China); Chen Zaigao; Wang Yue; Zhang Dianhui; Qiao Hailiang; Fu Meiyan; Yuan Yuan [Northwest Institute of Nuclear Technology, P.O. Box 69-1, Xi'an, Shaanxi 710024 (China); Liu Chunliang; Li Yongdong; Wang Hongguang [Key Laboratory of Physical Electronics and Devices of the Ministry of Education, Xi'an Jiaotong University, Xi'an, Shaanxi 710049 (China)

2010-07-15

446

Experiences with serial and parallel algorithms for channel routing using simulated annealing

NASA Technical Reports Server (NTRS)

Two algorithms for channel routing using simulated annealing are presented. Simulated annealing is an optimization methodology which allows the solution process to back up out of local minima that may be encountered by inappropriate selections. By properly controlling the annealing process, it is very likely that the optimal solution to an NP-complete problem such as channel routing may be found. The algorithm presented proposes very relaxed restrictions on the types of allowable transformations, including overlapping nets. By freeing that restriction and controlling overlap situations with an appropriate cost function, the algorithm becomes very flexible and can be applied to many extensions of channel routing. The selection of the transformation utilizes a number of heuristics, still retaining the pseudorandom nature of simulated annealing. The algorithm was implemented as a serial program for a workstation, and a parallel program designed for a hypercube computer. The details of the serial implementation are presented, including many of the heuristics used and some of the resulting solutions.

Brouwer, Randall Jay

1988-01-01
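The annealing mechanism the paper relies on — accepting a worse solution with probability exp(-Δcost/T) so the search can back out of local minima — can be sketched generically. The `neighbor`/`cost` API below is illustrative, not the paper's; the channel router's relaxed transformations and overlap-penalty cost function would plug in through those two callables:

```python
import math
import random

def anneal(initial, neighbor, cost, t0=10.0, cooling=0.95, steps=2000, seed=0):
    """Generic simulated-annealing loop.

    Improving moves are always accepted; worsening moves are accepted
    with probability exp(-delta / T), which decays as T cools. The best
    state ever seen is tracked and returned.
    """
    rng = random.Random(seed)
    state, t = initial, t0
    best, best_cost = state, cost(state)
    for _ in range(steps):
        cand = neighbor(state, rng)
        delta = cost(cand) - cost(state)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            state = cand
            if cost(state) < best_cost:
                best, best_cost = state, cost(state)
        t *= cooling  # geometric cooling schedule
    return best, best_cost

# Toy usage: minimize a bumpy 1-D function with many local minima.
best, best_cost = anneal(
    initial=8.0,
    neighbor=lambda x, rng: x + rng.uniform(-1.0, 1.0),
    cost=lambda x: (x - 2.0) ** 2 + math.sin(5.0 * x),
)
```

In the routing setting, `state` would be a channel assignment (overlaps allowed), `neighbor` one of the paper's transformations, and `cost` the penalty function that prices overlap situations.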

447

Simulation of Sub-Scale Ion Optics on a Parallel Cluster

NASA Astrophysics Data System (ADS)

Accurate prediction of the cross-over and perveance impingement limits of an ion optics system is very important to determine the feasible operational envelope of an ion thruster. Impingement of beamlet ions on the acceleration grid at either limit results in excessive sputter erosion of the acceleration grid, which is one of the major failure mechanisms. We developed the streamline hybrid-grid immersed-finite-element particle-in-cell (HG-IFE-PIC) model for three-dimensional simulations of plasma flow in ion optics. The model is designed to handle the boundary conditions at grid surfaces accurately while maintaining the computational speed of a standard PIC code. Direct three-dimensional simulations of quarter-subscale gridlets of seven apertures could be performed routinely even using powerful PCs. However, larger subscale gridlets having more apertures were out of reach of a single machine. In this paper, we parallelize the HG-IFE-PIC model using two-dimensional decomposition of the PIC and IFE physical domains to allow for the direct three-dimensional simulation of subscale gridlets including as many apertures as the number of available computing nodes permits. Results will be compared against experimental measurements taken for a set of subscale gridlets with 7, 19, and 37 apertures.

Kafafy, Raed; Wang, Joseph

2008-11-01

448

Effect of parallel currents on drift-interchange turbulence: Comparison of simulation and experiment

Two-dimensional (2D) turbulence simulations are reported in which the balancing of the parallel and perpendicular currents is modified by changing the axial boundary condition (BC) to vary the sheath conductivity. The simulations are carried out using the 2D scrape-off-layer turbulence (SOLT) code. The results are compared with recent experiments on the controlled shear de-correlation experiment (CSDX) in which the axial BC was modified by changing the composition of the end plate. Reasonable qualitative agreement is found between the simulations and the experiment. When an insulating axial BC is used, broadband turbulence is obtained and an inverse cascade occurs down to low frequencies and long spatial scales. Robust sheared flows are obtained. By contrast, employing a conducting BC at the plate resulted in coherent (drift wave) modes rather than broadband turbulence, with weaker inverse cascade, and smaller zonal flows. The dependence of the two instability mechanisms (rotationally driven interchange mode and drift waves) on the axial BC is also discussed.

D'Ippolito, D. A.; Russell, D. A.; Myra, J. R. [Lodestar Research Corporation, 2400 Central Avenue, Boulder, Colorado 80301 (United States); Thakur, S. C.; Tynan, G. R.; Holland, C. [Center for Momentum Transport and Flow Organization, University of California at San Diego, San Diego, California 92093 (United States)

2012-10-15

449

A Parallel Finite Set Statistical Simulator for Multi-Target Detection and Tracking

NASA Astrophysics Data System (ADS)

Finite Set Statistics (FISST) is a powerful Bayesian inference tool for the joint detection, classification and tracking of multi-target environments. FISST is capable of handling phenomena such as clutter, misdetections, and target birth and decay. Implicit within the approach are solutions to the data association and target label-tracking problems. Finally, FISST provides generalized information measures that can be used for sensor allocation across different types of tasks such as: searching for new targets, and classification and tracking of known targets. These FISST capabilities have been demonstrated on several small-scale illustrative examples. However, for implementation in a large-scale system as in the Space Situational Awareness problem, these capabilities require a lot of computational power. In this paper, we implement FISST in a parallel environment for the joint detection and tracking of multi-target systems. In this implementation, false alarms and misdetections will be modeled. Target birth and decay will not be modeled in the present paper. We will demonstrate the success of the method for as many targets as we possibly can in a desktop parallel environment. Performance measures will include: number of targets in the simulation, certainty of detected target tracks, computational time as a function of clutter returns and number of targets, among other factors.

Hussein, I.; MacMillan, R.

2014-09-01

450

NASA Technical Reports Server (NTRS)

A high-performance platform for development of real-time helicopter flight simulations based on a simulation development and analysis platform combining a parallel simulation development and analysis environment with a scalable multiprocessor computer system is described. Simulation functional decomposition is covered, including the sequencing and data dependency of simulation modules and simulation functional mapping to multiple processors. The multiprocessor-based implementation of a blade-element simulation of the UH-60 helicopter is presented, and a prototype developed for a TC2000 computer is generalized in order to arrive at a portable multiprocessor software architecture. It is pointed out that the proposed approach coupled with a pilot's station creates a setting in which simulation engineers, computer scientists, and pilots can work together in the design and evaluation of advanced real-time helicopter simulations.

Moxon, Bruce C.; Green, John A.

1990-01-01

451

NASA Astrophysics Data System (ADS)

The work is devoted to 3D and 2D parallel numerical computation of pressure and velocity fields around an elastically supported airfoil self-oscillating due to interaction with the airflow. Numerical solution is computed in the OpenFOAM package, an open-source software package based on finite volume method. Movement of airfoil is described by translation and rotation, identified from experimental data. A new boundary condition for the 2DOF motion of the airfoil was implemented. The results of numerical simulations (velocity) are compared with data measured in a wind tunnel, where a physical model of NACA0015 airfoil was mounted and tuned to exhibit the flutter instability. The experimental results were obtained previously in the Institute of Thermomechanics by interferographic measurements in a subsonic wind tunnel in Nový Knín.

Řidký, Václav; Šidlof, Petr

2014-03-01

452

Hybrid parallel strategy for the simulation of fast transient accidental situations at reactor scale

NASA Astrophysics Data System (ADS)

This contribution is dedicated to the latest methodological developments implemented in the fast transient dynamics software EUROPLEXUS (EPX) to simulate the mechanical response of fully coupled fluid-structure systems to accidental situations to be considered at reactor scale, among which the Loss of Coolant Accident, the Core Disruptive Accident and the Hydrogen Explosion. Time integration is explicit and the search for reference solutions within the safety framework prevents any simplification and approximations in the coupled algorithm: for instance, all kinematic constraints are dealt with using Lagrange Multipliers, yielding a complex flow chart when non-permanent constraints such as unilateral contact or immersed fluid-structure boundaries are considered. The parallel acceleration of the solution process is then achieved through a hybrid approach, based on a weighted domain decomposition for distributed memory computing and the use of the KAAPI library for self-balanced shared memory processing inside subdomains.

Faucher, V.; Galon, P.; Beccantini, A.; Crouzet, F.; Debaud, F.; Gautier, T.

2014-06-01

453

NASA Astrophysics Data System (ADS)

A new scheme of radiation pressure acceleration for generating high-quality protons by using two overlapping parallel laser pulses is proposed. Particle-in-cell simulation shows that the overlapping of two pulses with identical Gaussian profiles in space and trapezoidal profiles in the time domain can result in a composite light pulse with a spatial profile suitable for stable acceleration of protons to high energies. At an intensity of ≈2.46 × 10^21 W/cm^2 for the combined light pulse, a quasi-monoenergetic proton beam with peak energy ≈200 MeV/nucleon, energy spread <15%, and divergence angle <4° is obtained, which is appropriate for tumor therapy. The proton beam quality can be controlled by adjusting the incidence points of the two laser pulses.

Wang, Wei-Quan; Yin, Yan; Zou, De-Bin; Yu, Tong-Pu; Yang, Xiao-Hu; Xu, Han; Yu, Ming-Yang; Ma, Yan-Yun; Zhuo, Hong-Bin; Shao, Fu-Qiu

2014-11-01

454

NASA Astrophysics Data System (ADS)

Oxidation of a flat aluminum (111) s