Synchronization Of Parallel Discrete Event Simulations
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S.
1992-01-01
Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Running Parallel Discrete Event Simulators on Sierra
Barnes, P. D.; Jefferson, D. R.
2015-12-03
In this proposal we consider porting the ROSS/Charm++ simulator and the discrete event models that run under its control so that they run on the Sierra architecture and make efficient use of the Volta GPUs.
An adaptive synchronization protocol for parallel discrete event simulation
Bisset, K.R.
1998-12-01
Simulation, especially discrete event simulation (DES), is used in a variety of disciplines where numerical methods are difficult or impossible to apply. One problem with this method is that a sufficiently detailed simulation may take hours or days to execute, and multiple runs may be needed in order to generate the desired results. Parallel discrete event simulation (PDES) has been explored for many years as a method to decrease the time taken to execute a simulation. Many protocols have been developed which work well for particular types of simulations, but perform poorly when used for other types of simulations. Often it is difficult to know a priori whether a particular protocol is appropriate for a given problem. In this work, an adaptive synchronization method (ASM) is developed which works well on an entire spectrum of problems. The ASM determines, using an artificial neural network (ANN), the likelihood that a particular event is safe to process.
The effects of parallel processing architectures on discrete event simulation
NASA Astrophysics Data System (ADS)
Cave, William; Slatt, Edward; Wassmer, Robert E.
2005-05-01
As systems become more complex, particularly those containing embedded decision algorithms, mathematical modeling presents a rigid framework that often impedes representation to a sufficient level of detail. Using discrete event simulation, one can build models that more closely represent physical reality, with actual algorithms incorporated in the simulations. Higher levels of detail increase simulation run time. Hardware designers have succeeded in producing parallel and distributed processor computers with theoretical speeds well into the teraflop range. However, the practical use of these machines on all but some very special problems is extremely limited. The inability to use this power is due to great difficulties encountered when trying to translate real world problems into software that makes effective use of highly parallel machines. This paper addresses the application of parallel processing to simulations of real world systems of varying inherent parallelism. It provides a brief background in modeling and simulation validity and describes a parameter that can be used in discrete event simulation to vary opportunities for parallel processing at the expense of absolute time synchronization and is constrained by validity. It focuses on the effects of model architecture, run-time software architecture, and parallel processor architecture on speed, while providing an environment where modelers can achieve sufficient model accuracy to produce valid simulation results. It describes an approach to simulation development that captures subject area expert knowledge to leverage inherent parallelism in systems in the following ways: * Data structures are separated from instructions to track which instruction sets share what data. This is used to determine independence and thus the potential for concurrent processing at run-time. * Model connectivity (independence) can be inspected visually to determine if the inherent parallelism of a physical system is properly represented. Models need not be changed to move from a single processor to parallel processor hardware architectures. * Knowledge of the architectural parallelism is stored within the system and used during run time to allocate processors to processes in a maximally efficient way.
Parallel discrete-event simulation of FCFS stochastic queueing networks
NASA Technical Reports Server (NTRS)
Nicol, David M.
1988-01-01
Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments), which has proven itself to be effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction on a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations, give performance data that demonstrates the method's effectiveness under moderate to heavy loads, and discuss performance tradeoffs between the quality of lookahead, and the cost of computing lookahead.
Parallel discrete event simulation: A shared memory approach
NASA Technical Reports Server (NTRS)
Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.
1987-01-01
With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to insure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
The cost of conservative synchronization in parallel discrete event simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1990-01-01
The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approach the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.
Synchronous parallel system for emulation and discrete event simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor)
1992-01-01
A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to state variables of the simulation object attributable to the event object, and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring the events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.
Model for the evolution of the time profile in optimistic parallel discrete event simulations
NASA Astrophysics Data System (ADS)
Ziganurova, L.; Novotny, M. A.; Shchur, L. N.
2016-02-01
We investigate synchronisation aspects of an optimistic algorithm for parallel discrete event simulations (PDES). We present a model for the time evolution in optimistic PDES. This model evaluates the local virtual time profile of the processing elements. We argue that the evolution of the time profile is reminiscent of the surface profile in the directed percolation problem and in unrestricted surface growth. We present results of the simulation of the model and emphasise predictive features of our approach.
SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeff S.
1992-01-01
Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
Thulasidasan, Sunil; Kasiviswanathan, Shiva; Eidenbenz, Stephan; Romero, Philip
2010-01-01
We re-examine the problem of load balancing in conservatively synchronized parallel, discrete-event simulations executed on high-performance computing clusters, focusing on simulations where computational and messaging load tend to be spatially clustered. Such domains are frequently characterized by the presence of geographic 'hot-spots' - regions that generate significantly more simulation events than others. Examples of such domains include simulation of urban regions, transportation networks and networks where interaction between entities is often constrained by physical proximity. Noting that in conservatively synchronized parallel simulations, the speed of execution of the simulation is determined by the slowest (i.e most heavily loaded) simulation process, we study different partitioning strategies in achieving equitable processor-load distribution in domains with spatially clustered load. In particular, we study the effectiveness of partitioning via spatial scattering to achieve optimal load balance. In this partitioning technique, nearby entities are explicitly assigned to different processors, thereby scattering the load across the cluster. This is motivated by two observations, namely, (i) since load is spatially clustered, spatial scattering should, intuitively, spread the load across the compute cluster, and (ii) in parallel simulations, equitable distribution of CPU load is a greater determinant of execution speed than message passing overhead. Through large-scale simulation experiments - both of abstracted and real simulation models - we observe that scatter partitioning, even with its greatly increased messaging overhead, significantly outperforms more conventional spatial partitioning techniques that seek to reduce messaging overhead. Further, even if hot-spots change over the course of the simulation, if the underlying feature of spatial clustering is retained, load continues to be balanced with spatial scattering leading us to the observation that spatial scattering can often obviate the need for dynamic load balancing.
Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms
Yoginath, Srikanth B; Perumalla, Kalyan S
2013-01-01
With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware as is traditionally done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the bad scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature in the sense that any fairness-based VM scheduler implementation would exhibit this mismatch with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in the run time of the PDES-optimized scheduler relative to the regular VM scheduler, with over 20 reduction in run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor)
1998-01-01
The present invention is embodied in a method of performing object-oriented simulation and a system having inter-connected processor nodes operating in parallel to simulate mutual interactions of a set of discrete simulation objects distributed among the nodes as a sequence of discrete events changing state variables of respective simulation objects so as to generate new event-defining messages addressed to respective ones of the nodes. The object-oriented simulation is performed at each one of the nodes by assigning passive self-contained simulation objects to each one of the nodes, responding to messages received at one node by generating corresponding active event objects having user-defined inherent capabilities and individual time stamps and corresponding to respective events affecting one of the passive self-contained simulation objects of the one node, restricting the respective passive self-contained simulation objects to only providing and receiving information from die respective active event objects, requesting information and changing variables within a passive self-contained simulation object by the active event object, and producing corresponding messages specifying events resulting therefrom by the active event objects.
The IDES framework: A case study in development of a parallel discrete-event simulation system
Nicol, D.M.; Johnson, M.M.; Yoshimura, A.S.
1997-12-31
This tutorial describes considerations in the design and development of the IDES parallel simulation system. IDES is a Java-based parallel/distributed simulation system designed to support the study of complex large-scale enterprise systems. Using the IDES system as an example, the authors discuss how anticipated model and system constraints molded the design decisions with respect to modeling, synchronization, and communication strategies.
Distributed discrete event simulation. Final report
De Vries, R.C.
1988-02-01
The presentation given here is restricted to discrete event simulation. The complexity of and time required for many present and potential discrete simulations exceeds the reasonable capacity of most present serial computers. The desire, then, is to implement the simulations on a parallel machine. However, certain problems arise in an effort to program the simulation on a parallel machine. In one category of methods deadlock care arise and some method is required to either detect deadlock and recover from it or to avoid deadlock through information passing. In the second category of methods, potentially incorrect simulations are allowed to proceed. If the situation is later determined to be incorrect, recovery from the error must be initiated. In either case, computation and information passing are required which would not be required in a serial implementation. The net effect is that the parallel simulation may not be much better than a serial simulation. In an effort to determine alternate approaches, important papers in the area were reviewed. As a part of that review process, each of the papers was summarized. The summary of each paper is presented in this report in the hopes that those doing future work in the area will be able to gain insight that might not otherwise be available, and to aid in deciding which papers would be most beneficial to pursue in more detail. The papers are broken down into categories and then by author. Conclusions reached after examining the papers and other material, such as direct talks with an author, are presented in the last section. Also presented there are some ideas that surfaced late in the research effort. These promise to be of some benefit in limiting information which must be passed between processes and in better understanding the structure of a distributed simulation. Pursuit of these ideas seems appropriate.
Optimization of Operations Resources via Discrete Event Simulation Modeling
NASA Technical Reports Server (NTRS)
Joshi, B.; Morris, D.; White, N.; Unal, R.
1996-01-01
The resource levels required for operation and support of reusable launch vehicles are typically defined through discrete event simulation modeling. Minimizing these resources constitutes an optimization problem involving discrete variables and simulation. Conventional approaches to solve such optimization problems involving integer valued decision variables are the pattern search and statistical methods. However, in a simulation environment that is characterized by search spaces of unknown topology and stochastic measures, these optimization approaches often prove inadequate. In this paper, we have explored the applicability of genetic algorithms to the simulation domain. Genetic algorithms provide a robust search strategy that does not require continuity and differentiability of the problem domain. The genetic algorithm successfully minimized the operation and support activities for a space vehicle, through a discrete event simulation model. The practical issues associated with simulation optimization, such as stochastic variables and constraints, were also taken into consideration.
On constructing optimistic simulation algorithms for the discrete event system specification
Nutaro, James J
2008-01-01
This article describes a Time Warp simulation algorithm for discrete event models that are described in terms of the Discrete Event System Specification (DEVS). The article shows how the total state transition and total output function of a DEVS atomic model can be transformed into an event processing procedure for a logical process. A specific Time Warp algorithm is constructed around this logical process, and it is shown that the algorithm correctly simulates a DEVS coupled model that consists entirely of interacting atomic models. The simulation algorithm is presented abstractly; it is intended to provide a basis for implementing efficient and scalable parallel algorithms that correctly simulate DEVS models.
Synchronization of autonomous objects in discrete event simulation
NASA Technical Reports Server (NTRS)
Rogers, Ralph V.
1990-01-01
Autonomous objects in event-driven discrete event simulation offer the potential to combine the freedom of unrestricted movement and positional accuracy through Euclidean space of time-driven models with the computational efficiency of event-driven simulation. The principal challenge to autonomous object implementation is object synchronization. The concept of a spatial blackboard is offered as a potential methodology for synchronization. The issues facing implementation of a spatial blackboard are outlined and discussed.
Reversible Discrete Event Formulation and Optimistic Parallel Execution of Vehicular Traffic Models
Yoginath, Srikanth B; Perumalla, Kalyan S
2009-01-01
Vehicular traffic simulations are useful in applications such as emergency planning and traffic management. High speed of traffic simulations translates to speed of response and level of resilience in those applications. Discrete event formulation of traffic flow at the level of individual vehicles affords both the flexibility of simulating complex scenarios of vehicular flow behavior as well as rapid simulation time advances. However, efficient parallel/distributed execution of the models becomes challenging due to synchronization overheads. Here, a parallel traffic simulation approach is presented that is aimed at reducing the time for simulating emergency vehicular traffic scenarios. Our approach resolves the challenges that arise in parallel execution of microscopic, vehicular-level models of traffic. We apply a reverse computation-based optimistic execution approach to address the parallel synchronization problem. This is achieved by formulating a reversible version of a discrete event model of vehicular traffic, and by utilizing this reversible model in an optimistic execution setting. Three unique aspects of this effort are: (1) exploration of optimistic simulation applied to vehicular traffic simulation (2) addressing reverse computation challenges specific to optimistic vehicular traffic simulation (3) achieving absolute (as opposed to self-relative) speedup with a sequential speed close to that of a fast, de facto standard sequential simulator for emergency traffic. The design and development of the parallel simulation system is presented, along with a performance study that demonstrates excellent sequential performance as well as parallel performance. The benefits of optimistic execution are demonstrated, including a speed up of nearly 20 on 32 processors observed on a vehicular network of over 65,000 intersections and over 13 million vehicles.
Reversible Parallel Discrete-Event Execution of Large-scale Epidemic Outbreak Models
Perumalla, Kalyan S; Seal, Sudip K
2010-01-01
The spatial scale, runtime speed and behavioral detail of epidemic outbreak simulations together require the use of large-scale parallel processing. In this paper, an optimistic parallel discrete event execution of a reaction-diffusion simulation model of epidemic outbreaks is presented, with an implementation over the $\\mu$sik simulator. Rollback support is achieved with the development of a novel reversible model that combines reverse computation with a small amount of incremental state saving. Parallel speedup and other runtime performance metrics of the simulation are tested on a small (8,192-core) Blue Gene / P system, while scalability is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes (up to several hundred million individuals in the largest case) are exercised.
Quality Improvement With Discrete Event Simulation: A Primer for Radiologists.
Booker, Michael T; O'Connell, Ryan J; Desai, Bhushan; Duddalwar, Vinay A
2016-04-01
The application of simulation software in health care has transformed quality and process improvement. Specifically, software based on discrete-event simulation (DES) has shown the ability to improve radiology workflows and systems. Nevertheless, despite the successful application of DES in the medical literature, the power and value of simulation remains underutilized. For this reason, the basics of DES modeling are introduced, with specific attention to medical imaging. In an effort to provide readers with the tools necessary to begin their own DES analyses, the practical steps of choosing a software package and building a basic radiology model are discussed. In addition, three radiology system examples are presented, with accompanying DES models that assist in analysis and decision making. Through these simulations, we provide readers with an understanding of the theory, requirements, and benefits of implementing DES in their own radiology practices. PMID:26922594
Advances in Discrete-Event Simulation for MSL Command Validation
NASA Technical Reports Server (NTRS)
Patrikalakis, Alexander; O'Reilly, Taifun
2013-01-01
In the last five years, the discrete event simulator, SEQuence GENerator (SEQGEN), developed at the Jet Propulsion Laboratory to plan deep-space missions, has greatly increased uplink operations capacity to deal with increasingly complicated missions. In this paper, we describe how the Mars Science Laboratory (MSL) project makes full use of an interpreted environment to simulate change in more than fifty thousand flight software parameters and conditional command sequences to predict the result of executing a conditional branch in a command sequence, and enable the ability to warn users whenever one or more simulated spacecraft states change in an unexpected manner. Using these new SEQGEN features, operators plan more activities in one sol than ever before.
Discrete Event Modeling and Massively Parallel Execution of Epidemic Outbreak Phenomena
Perumalla, Kalyan S; Seal, Sudip K
2011-01-01
In complex phenomena such as epidemiological outbreaks, the intensity of inherent feedback effects and the significant role of transients in the dynamics make simulation the only effective method for proactive, reactive or post-facto analysis. The spatial scale, runtime speed, and behavioral detail needed in detailed simulations of epidemic outbreaks make it necessary to use large-scale parallel processing. Here, an optimistic parallel execution of a new discrete event formulation of a reaction-diffusion simulation model of epidemic propagation is presented to facilitate in dramatically increasing the fidelity and speed by which epidemiological simulations can be performed. Rollback support needed during optimistic parallel execution is achieved by combining reverse computation with a small amount of incremental state saving. Parallel speedup of over 5,500 and other runtime performance metrics of the system are observed with weak-scaling execution on a small (8,192-core) Blue Gene / P system, while scalability with a weak-scaling speedup of over 10,000 is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes exceeding several hundreds of millions of individuals in the largest cases are successfully exercised to verify model scalability.
Metrics for Availability Analysis Using a Discrete Event Simulation Method
Schryver, Jack C; Nutaro, James J; Haire, Marvin Jonathan
2012-01-01
The system performance metric 'availability' is a central concept with respect to the concerns of a plant's operators and owners, yet it can be abstract enough to resist explanation at system levels. Hence, there is a need for a system-level metric more closely aligned with a plant's (or, more generally, a system's) raison d'etre. Historically, availability of repairable systems - intrinsic, operational, or otherwise - has been defined as a ratio of times. This paper introduces a new concept of availability, called endogenous availability, defined in terms of a ratio of quantities of product yield. Endogenous availability can be evaluated using a discrete event simulation analysis methodology. A simulation example shows that endogenous availability reduces to conventional availability in a simple series system with different processing rates and without intermediate storage capacity, but diverges from conventional availability when storage capacity is progressively increased. It is shown that conventional availability tends to be conservative when a design includes features, such as in - process storage, that partially decouple the components of a larger system.
Enhancing Complex System Performance Using Discrete-Event Simulation
Allgood, Glenn O; Olama, Mohammed M; Lake, Joe E
2010-01-01
In this paper, we utilize discrete-event simulation (DES) merged with human factors analysis to provide the venue within which the separation and deconfliction of the system/human operating principles can occur. A concrete example is presented to illustrate the performance enhancement gains for an aviation cargo flow and security inspection system achieved through the development and use of a process DES. The overall performance of the system is computed, analyzed, and optimized for the different system dynamics. Various performance measures are considered such as system capacity, residual capacity, and total number of pallets waiting for inspection in the queue. These metrics are performance indicators of the system's ability to service current needs and respond to additional requests. We studied and analyzed different scenarios by changing various model parameters such as the number of pieces per pallet ratio, number of inspectors and cargo handling personnel, number of forklifts, number and types of detection systems, inspection modality distribution, alarm rate, and cargo closeout time. The increased physical understanding resulting from execution of the queuing model utilizing these vetted performance measures identified effective ways to meet inspection requirements while maintaining or reducing overall operational cost and eliminating any shipping delays associated with any proposed changes in inspection requirements. With this understanding effective operational strategies can be developed to optimally use personnel while still maintaining plant efficiency, reducing process interruptions, and holding or reducing costs.
Discrete event simulation of fuel transfer strategies for defueling a nuclear reactor.
Garcia, H. E.; Houshyar, A.; Engineering Division; Western Michigan Univ.
1998-02-01
Fuel handling and conditioning activities for the decommissioning of the Experimental Breeder Reactor Il are being performed at Argonne National Laboratory. Discrete event simulation and optimization techniques are being investigated to plan, supervise and perform these activities in such a way that productivity can be improved. The idea is to characterize this operation as a collection of interconnected serving cells, and then apply operational research techniques to identify appropriate planning schedules for the given scenarios. This paper describes an application of discrete event simulation to evaluate fuel transfer strategies and identify optimal solutions among distributed resources.
DISCRETE EVENT SIMULATION OF OPTICAL SWITCH MATRIX PERFORMANCE IN COMPUTER NETWORKS
Imam, Neena; Poole, Stephen W
2013-01-01
In this paper, we present application of a Discrete Event Simulator (DES) for performance modeling of optical switching devices in computer networks. Network simulators are valuable tools in situations where one cannot investigate the system directly. This situation may arise if the system under study does not exist yet or the cost of studying the system directly is prohibitive. Most available network simulators are based on the paradigm of discrete-event-based simulation. As computer networks become increasingly larger and more complex, sophisticated DES tool chains have become available for both commercial and academic research. Some well-known simulators are NS2, NS3, OPNET, and OMNEST. For this research, we have applied OMNEST for the purpose of simulating multi-wavelength performance of optical switch matrices in computer interconnection networks. Our results suggest that the application of DES to computer interconnection networks provides valuable insight in device performance and aids in topology and system optimization.
Ghany, Ahmad; Vassanji, Karim; Kuziemsky, Craig; Keshavjee, Karim
2013-01-01
Electronic prescribing (e-prescribing) is expected to bring many benefits to Canadian healthcare, such as a reduction in errors and adverse drug reactions. As there currently is no functioning e-prescribing system in Canada that is completely electronic, we are unable to evaluate the performance of a live system. An alternative approach is to use simulation modeling for evaluation. We developed two discrete-event simulation models, one of the current handwritten prescribing system and one of a proposed e-prescribing system, to compare the performance of these two systems. We were able to compare the number of processes in each model, workflow efficiency, and the distribution of patients or prescriptions. Although we were able to compare these models to each other, using discrete-event simulation software was challenging. We were limited in the number of variables we could measure. We discovered non-linear processes and feedback loops in both models that could not be adequately represented using discrete-event simulation software. Finally, interactions between entities in both models could not be modeled using this type of software. We have come to the conclusion that a more appropriate approach to modeling both the handwritten and electronic prescribing systems would be to use a complex adaptive systems approach using agent-based modeling or systems-based modeling. PMID:23388319
Discrete event simulation of the Defense Waste Processing Facility (DWPF) analytical laboratory
Shanahan, K.L.
1992-02-01
A discrete event simulation of the Savannah River Site (SRS) Defense Waste Processing Facility (DWPF) analytical laboratory has been constructed in the GPSS language. It was used to estimate laboratory analysis times at process analytical hold points and to study the effect of sample number on those times. Typical results are presented for three different simultaneous representing increasing levels of complexity, and for different sampling schemes. Example equipment utilization time plots are also included. SRS DWPF laboratory management and chemists found the simulations very useful for resource and schedule planning.
Using Discrete Event Simulation to predict KPI's at a Projected Emergency Room.
Concha, Pablo; Neriz, Liliana; Parada, Danilo; Ramis, Francisco
2015-01-01
Discrete Event Simulation (DES) is a powerful factor in the design of clinical facilities. DES enables facilities to be built or adapted to achieve the expected Key Performance Indicators (KPI's) such as average waiting times according to acuity, average stay times and others. Our computational model was built and validated using expert judgment and supporting statistical data. One scenario studied resulted in a 50% decrease in the average cycle time of patients compared to the original model, mainly by modifying the patient's attention model. PMID:26262262
A Framework for the Optimization of Discrete-Event Simulation Models
NASA Technical Reports Server (NTRS)
Joshi, B. D.; Unal, R.; White, N. H.; Morris, W. D.
1996-01-01
With the growing use of computer modeling and simulation, in all aspects of engineering, the scope of traditional optimization has to be extended to include simulation models. Some unique aspects have to be addressed while optimizing via stochastic simulation models. The optimization procedure has to explicitly account for the randomness inherent in the stochastic measures predicted by the model. This paper outlines a general purpose framework for optimization of terminating discrete-event simulation models. The methodology combines a chance constraint approach for problem formulation, together with standard statistical estimation and analyses techniques. The applicability of the optimization framework is illustrated by minimizing the operation and support resources of a launch vehicle, through a simulation model.
Discrete-event simulation for the design and evaluation of physical protection systems
Jordan, S.E.; Snell, M.K.; Madsen, M.M.; Smith, J.S.; Peters, B.A.
1998-08-01
This paper explores the use of discrete-event simulation for the design and control of physical protection systems for fixed-site facilities housing items of significant value. It begins by discussing several modeling and simulation activities currently performed in designing and analyzing these protection systems and then discusses capabilities that design/analysis tools should have. The remainder of the article then discusses in detail how some of these new capabilities have been implemented in software to achieve a prototype design and analysis tool. The simulation software technology provides a communications mechanism between a running simulation and one or more external programs. In the prototype security analysis tool, these capabilities are used to facilitate human-in-the-loop interaction and to support a real-time connection to a virtual reality (VR) model of the facility being analyzed. This simulation tool can be used for both training (in real-time mode) and facility analysis and design (in fast mode).
DeMO: An Ontology for Discrete-event Modeling and Simulation
Silver, Gregory A; Miller, John A; Hybinette, Maria; Baramidze, Gregory; York, William S
2011-01-01
Several fields have created ontologies for their subdomains. For example, the biological sciences have developed extensive ontologies such as the Gene Ontology, which is considered a great success. Ontologies could provide similar advantages to the Modeling and Simulation community. They provide a way to establish common vocabularies and capture knowledge about a particular domain with community-wide agreement. Ontologies can support significantly improved (semantic) search and browsing, integration of heterogeneous information sources, and improved knowledge discovery capabilities. This paper discusses the design and development of an ontology for Modeling and Simulation called the Discrete-event Modeling Ontology (DeMO), and it presents prototype applications that demonstrate various uses and benefits that such an ontology may provide to the Modeling and Simulation community. PMID:22919114
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Basham, Bryan D.
1989-01-01
CONFIG is a modeling and simulation tool prototype for analyzing the normal and faulty qualitative behaviors of engineered systems. Qualitative modeling and discrete-event simulation have been adapted and integrated, to support early development, during system design, of software and procedures for management of failures, especially in diagnostic expert systems. Qualitative component models are defined in terms of normal and faulty modes and processes, which are defined by invocation statements and effect statements with time delays. System models are constructed graphically by using instances of components and relations from object-oriented hierarchical model libraries. Extension and reuse of CONFIG models and analysis capabilities in hybrid rule- and model-based expert fault-management support systems are discussed.
A conceptual modeling framework for discrete event simulation using hierarchical control structures
Furian, N.; O’Sullivan, M.; Walker, C.; Vössner, S.; Neubacher, D.
2015-01-01
Conceptual Modeling (CM) is a fundamental step in a simulation project. Nevertheless, it is only recently that structured approaches towards the definition and formulation of conceptual models have gained importance in the Discrete Event Simulation (DES) community. As a consequence, frameworks and guidelines for applying CM to DES have emerged and discussion of CM for DES is increasing. However, both the organization of model-components and the identification of behavior and system control from standard CM approaches have shortcomings that limit CM’s applicability to DES. Therefore, we discuss the different aspects of previous CM frameworks and identify their limitations. Further, we present the Hierarchical Control Conceptual Modeling framework that pays more attention to the identification of a models’ system behavior, control policies and dispatching routines and their structured representation within a conceptual model. The framework guides the user step-by-step through the modeling process and is illustrated by a worked example. PMID:26778940
A generic discrete-event simulation model for outpatient clinics in a large public hospital.
Weerawat, Waressara; Pichitlamken, Juta; Subsombat, Peerapong
2013-01-01
The orthopedic outpatient department (OPD) ward in a large Thai public hospital is modeled using Discrete-Event Stochastic (DES) simulation. Key Performance Indicators (KPIs) are used to measure effects across various clinical operations during different shifts throughout the day. By considering various KPIs such as wait times to see doctors, percentage of patients who can see a doctor within a target time frame, and the time that the last patient completes their doctor consultation, bottlenecks are identified and resource-critical clinics can be prioritized. The simulation model quantifies the chronic, high patient congestion that is prevalent amongst Thai public hospitals with very high patient-to-doctor ratios. Our model can be applied across five different OPD wards by modifying the model parameters. Throughout this work, we show how DES models can be used as decision-support tools for hospital management. PMID:23778015
Developing Flexible Discrete Event Simulation Models in an Uncertain Policy Environment
NASA Technical Reports Server (NTRS)
Miranda, David J.; Fayez, Sam; Steele, Martin J.
2011-01-01
On February 1st, 2010 U.S. President Barack Obama submitted to Congress his proposed budget request for Fiscal Year 2011. This budget included significant changes to the National Aeronautics and Space Administration (NASA), including the proposed cancellation of the Constellation Program. This change proved to be controversial and Congressional approval of the program's official cancellation would take many months to complete. During this same period an end-to-end discrete event simulation (DES) model of Constellation operations was being built through the joint efforts of Productivity Apex Inc. (PAl) and Science Applications International Corporation (SAIC) teams under the guidance of NASA. The uncertainty in regards to the Constellation program presented a major challenge to the DES team, as to: continue the development of this program-of-record simulation, while at the same time remain prepared for possible changes to the program. This required the team to rethink how it would develop it's model and make it flexible enough to support possible future vehicles while at the same time be specific enough to support the program-of-record. This challenge was compounded by the fact that this model was being developed through the traditional DES process-orientation which lacked the flexibility of object-oriented approaches. The team met this challenge through significant pre-planning that led to the "modularization" of the model's structure by identifying what was generic, finding natural logic break points, and the standardization of interlogic numbering system. The outcome of this work resulted in a model that not only was ready to be easily modified to support any future rocket programs, but also a model that was extremely structured and organized in a way that facilitated rapid verification. This paper discusses in detail the process the team followed to build this model and the many advantages this method provides builders of traditional process-oriented discrete event simulations.
Statistical and Probabilistic Extensions to Ground Operations' Discrete Event Simulation Modeling
NASA Technical Reports Server (NTRS)
Trocine, Linda; Cummings, Nicholas H.; Bazzana, Ashley M.; Rychlik, Nathan; LeCroy, Kenneth L.; Cates, Grant R.
2010-01-01
NASA's human exploration initiatives will invest in technologies, public/private partnerships, and infrastructure, paving the way for the expansion of human civilization into the solar system and beyond. As it is has been for the past half century, the Kennedy Space Center will be the embarkation point for humankind's journey into the cosmos. Functioning as a next generation space launch complex, Kennedy's launch pads, integration facilities, processing areas, launch and recovery ranges will bustle with the activities of the world's space transportation providers. In developing this complex, KSC teams work through the potential operational scenarios: conducting trade studies, planning and budgeting for expensive and limited resources, and simulating alternative operational schemes. Numerous tools, among them discrete event simulation (DES), were matured during the Constellation Program to conduct such analyses with the purpose of optimizing the launch complex for maximum efficiency, safety, and flexibility while minimizing life cycle costs. Discrete event simulation is a computer-based modeling technique for complex and dynamic systems where the state of the system changes at discrete points in time and whose inputs may include random variables. DES is used to assess timelines and throughput, and to support operability studies and contingency analyses. It is applicable to any space launch campaign and informs decision-makers of the effects of varying numbers of expensive resources and the impact of off nominal scenarios on measures of performance. In order to develop representative DES models, methods were adopted, exploited, or created to extend traditional uses of DES. The Delphi method was adopted and utilized for task duration estimation. DES software was exploited for probabilistic event variation. A roll-up process was used, which was developed to reuse models and model elements in other less - detailed models. The DES team continues to innovate and expand DES capabilities to address KSC's planning needs.
NASA Astrophysics Data System (ADS)
Andreev, Victor P.; Head, Trajen; Johnson, Neil; Deo, Sapna K.; Daunert, Sylvia; Goldschmidt-Clermont, Pascal J.
2013-05-01
Sudden Cardiac Death (SCD) is responsible for at least 180,000 deaths a year and incurs an average cost of $286 billion annually in the United States alone. Herein, we present a novel discrete event simulation model of SCD, which quantifies the chains of events associated with the formation, growth, and rupture of atheroma plaques, and the subsequent formation of clots, thrombosis and on-set of arrhythmias within a population. The predictions generated by the model are in good agreement both with results obtained from pathological examinations on the frequencies of three major types of atheroma, and with epidemiological data on the prevalence and risk of SCD. These model predictions allow for identification of interventions and importantly for the optimal time of intervention leading to high potential impact on SCD risk reduction (up to 8-fold reduction in the number of SCDs in the population) as well as the increase in life expectancy.
The effects of indoor environmental exposures on pediatric asthma: a discrete event simulation model
2012-01-01
Background In the United States, asthma is the most common chronic disease of childhood across all socioeconomic classes and is the most frequent cause of hospitalization among children. Asthma exacerbations have been associated with exposure to residential indoor environmental stressors such as allergens and air pollutants as well as numerous additional factors. Simulation modeling is a valuable tool that can be used to evaluate interventions for complex multifactorial diseases such as asthma but in spite of its flexibility and applicability, modeling applications in either environmental exposures or asthma have been limited to date. Methods We designed a discrete event simulation model to study the effect of environmental factors on asthma exacerbations in school-age children living in low-income multi-family housing. Model outcomes include asthma symptoms, medication use, hospitalizations, and emergency room visits. Environmental factors were linked to percent predicted forced expiratory volume in 1 second (FEV1%), which in turn was linked to risk equations for each outcome. Exposures affecting FEV1% included indoor and outdoor sources of NO2 and PM2.5, cockroach allergen, and dampness as a proxy for mold. Results Model design parameters and equations are described in detail. We evaluated the model by simulating 50,000 children over 10 years and showed that pollutant concentrations and health outcome rates are comparable to values reported in the literature. In an application example, we simulated what would happen if the kitchen and bathroom exhaust fans were improved for the entire cohort, and showed reductions in pollutant concentrations and healthcare utilization rates. Conclusions We describe the design and evaluation of a discrete event simulation model of pediatric asthma for children living in low-income multi-family housing. Our model simulates the effect of environmental factors (combustion pollutants and allergens), medication compliance, seasonality, and medical history on asthma outcomes (symptom-days, medication use, hospitalizations, and emergency room visits). The model can be used to evaluate building interventions and green building construction practices on pollutant concentrations, energy savings, and asthma healthcare utilization costs, and demonstrates the value of a simulation approach for studying complex diseases such as asthma. PMID:22989068
Discrete event simulation tool for analysis of qualitative models of continuous processing systems
NASA Technical Reports Server (NTRS)
Malin, Jane T. (Inventor); Basham, Bryan D. (Inventor); Harris, Richard A. (Inventor)
1990-01-01
An artificial intelligence design and qualitative modeling tool is disclosed for creating computer models and simulating continuous activities, functions, and/or behavior using developed discrete event techniques. Conveniently, the tool is organized in four modules: library design module, model construction module, simulation module, and experimentation and analysis. The library design module supports the building of library knowledge including component classes and elements pertinent to a particular domain of continuous activities, functions, and behavior being modeled. The continuous behavior is defined discretely with respect to invocation statements, effect statements, and time delays. The functionality of the components is defined in terms of variable cluster instances, independent processes, and modes, further defined in terms of mode transition processes and mode dependent processes. Model construction utilizes the hierarchy of libraries and connects them with appropriate relations. The simulation executes a specialized initialization routine and executes events in a manner that includes selective inherency of characteristics through a time and event schema until the event queue in the simulator is emptied. The experimentation and analysis module supports analysis through the generation of appropriate log files and graphics developments and includes the ability of log file comparisons.
Towards High Performance Discrete-Event Simulations of Smart Electric Grids
Perumalla, Kalyan S; Nutaro, James J; Yoginath, Srikanth B
2011-01-01
Future electric grid technology is envisioned on the notion of a smart grid in which responsive end-user devices play an integral part of the transmission and distribution control systems. Detailed simulation is often the primary choice in analyzing small network designs, and the only choice in analyzing large-scale electric network designs. Here, we identify and articulate the high-performance computing needs underlying high-resolution discrete event simulation of smart electric grid operation large network scenarios such as the entire Eastern Interconnect. We focus on the simulator's most computationally intensive operation, namely, the dynamic numerical solution for the electric grid state, for both time-integration as well as event-detection. We explore solution approaches using general-purpose dense and sparse solvers, and propose a scalable solver specialized for the sparse structures of actual electric networks. Based on experiments with an implementation in the THYME simulator, we identify performance issues and possible solution approaches for smart grid experimentation in the large.
NASA Astrophysics Data System (ADS)
Rizvi, Syed S.; Shah, Dipali; Riasat, Aasia
The Time Wrap algorithm [3] offers a run time recovery mechanism that deals with the causality errors. These run time recovery mechanisms consists of rollback, anti-message, and Global Virtual Time (GVT) techniques. For rollback, there is a need to compute GVT which is used in discrete-event simulation to reclaim the memory, commit the output, detect the termination, and handle the errors. However, the computation of GVT requires dealing with transient message problem and the simultaneous reporting problem. These problems can be dealt in an efficient manner by the Samadi's algorithm [8] which works fine in the presence of causality errors. However, the performance of both Time Wrap and Samadi's algorithms depends on the latency involve in GVT computation. Both algorithms give poor latency for large simulation systems especially in the presence of causality errors. To improve the latency and reduce the processor ideal time, we implement tree and butterflies barriers with the optimistic algorithm. Our analysis shows that the use of synchronous barriers such as tree and butterfly with the optimistic algorithm not only minimizes the GVT latency but also minimizes the processor idle time.
StratBAM: A Discrete-Event Simulation Model to Support Strategic Hospital Bed Capacity Decisions.
Devapriya, Priyantha; Strömblad, Christopher T B; Bailey, Matthew D; Frazier, Seth; Bulger, John; Kemberling, Sharon T; Wood, Kenneth E
2015-10-01
The ability to accurately measure and assess current and potential health care system capacities is an issue of local and national significance. Recent joint statements by the Institute of Medicine and the Agency for Healthcare Research and Quality have emphasized the need to apply industrial and systems engineering principles to improving health care quality and patient safety outcomes. To address this need, a decision support tool was developed for planning and budgeting of current and future bed capacity, and evaluating potential process improvement efforts. The Strategic Bed Analysis Model (StratBAM) is a discrete-event simulation model created after a thorough analysis of patient flow and data from Geisinger Health System's (GHS) electronic health records. Key inputs include: timing, quantity and category of patient arrivals and discharges; unit-level length of care; patient paths; and projected patient volume and length of stay. Key outputs include: admission wait time by arrival source and receiving unit, and occupancy rates. Electronic health records were used to estimate parameters for probability distributions and to build empirical distributions for unit-level length of care and for patient paths. Validation of the simulation model against GHS operational data confirmed its ability to model real-world data consistently and accurately. StratBAM was successfully used to evaluate the system impact of forecasted patient volumes and length of stay in terms of patient wait times, occupancy rates, and cost. The model is generalizable and can be appropriately scaled for larger and smaller health care settings. PMID:26310949
Discrete event simulation for exploring strategies: an urban water management case.
Huang, Dong-Bin; Scholz, Roland W; Gujer, Willi; Chitwood, Derek E; Loukopoulos, Peter; Schertenleib, Roland; Siegrist, Hansruedi
2007-02-01
This paper presents a model structure aimed at offering an overview of the various elements of a strategy and exploring their multidimensional effects through time in an efficient way. It treats a strategy as a set of discrete events planned to achieve a certain strategic goal and develops a new form of causal networks as an interfacing component between decision makers and environment models, e.g., life cycle inventory and material flow models. The causal network receives a strategic plan as input in a discrete manner and then outputs the updated parameter sets to the subsequent environmental models. Accordingly, the potential dynamic evolution of environmental systems caused by various strategies can be stepwise simulated. It enables a way to incorporate discontinuous change in models for environmental strategy analysis, and enhances the interpretability and extendibility of a complex model by its cellular constructs. It is exemplified using an urban water management case in Kunming, a major city in Southwest China. By utilizing the presented method, the case study modeled the cross-scale interdependencies of the urban drainage system and regional water balance systems, and evaluated the effectiveness of various strategies for improving the situation of Dianchi Lake. PMID:17328203
Discrete event simulation for healthcare organizations: a tool for decision making.
Hamrock, Eric; Paige, Kerrie; Parks, Jennifer; Scheulen, James; Levin, Scott
2013-01-01
Healthcare organizations face challenges in efficiently accommodating increased patient demand with limited resources and capacity. The modern reimbursement environment prioritizes the maximization of operational efficiency and the reduction of unnecessary costs (i.e., waste) while maintaining or improving quality. As healthcare organizations adapt, significant pressures are placed on leaders to make difficult operational and budgetary decisions. In lieu of hard data, decision makers often base these decisions on subjective information. Discrete event simulation (DES), a computerized method of imitating the operation of a real-world system (e.g., healthcare delivery facility) over time, can provide decision makers with an evidence-based tool to develop and objectively vet operational solutions prior to implementation. DES in healthcare commonly focuses on (1) improving patient flow, (2) managing bed capacity, (3) scheduling staff, (4) managing patient admission and scheduling procedures, and (5) using ancillary resources (e.g., labs, pharmacies). This article describes applicable scenarios, outlines DES concepts, and describes the steps required for development. An original DES model developed to examine crowding and patient flow for staffing decision making at an urban academic emergency department serves as a practical example. PMID:23650696
Discrete-event simulation of a wide-area health care network.
McDaniel, J G
1995-01-01
OBJECTIVE: Predict the behavior and estimate the telecommunication cost of a wide-area message store-and-forward network for health care providers that uses the telephone system. DESIGN: A tool with which to perform large-scale discrete-event simulations was developed. Network models for star and mesh topologies were constructed to analyze the differences in performances and telecommunication costs. The distribution of nodes in the network models approximates the distribution of physicians, hospitals, medical labs, and insurers in the Province of Saskatchewan, Canada. Modeling parameters were based on measurements taken from a prototype telephone network and a survey conducted at two medical clinics. Simulation studies were conducted for both topologies. RESULTS: For either topology, the telecommunication cost of a network in Saskatchewan is projected to be less than $100 (Canadian) per month per node. The estimated telecommunication cost of the star topology is approximately half that of the mesh. Simulations predict that a mean end-to-end message delivery time of two hours or less is achievable at this cost. A doubling of the data volume results in an increase of less than 50% in the mean end-to-end message transfer time. CONCLUSION: The simulation models provided an estimate of network performance and telecommunication cost in a specific Canadian province. At the expected operating point, network performance appeared to be relatively insensitive to increases in data volume. Similar results might be anticipated in other rural states and provinces in North America where a telephone-based network is desired. PMID:7583646
Wilke, Jeremiah J; Kenny, Joseph P.
2015-02-01
Discrete event simulation provides a powerful mechanism for designing and testing new extreme- scale programming models for high-performance computing. Rather than debug, run, and wait for results on an actual system, design can first iterate through a simulator. This is particularly useful when test beds cannot be used, i.e. to explore hardware or scales that do not yet exist or are inaccessible. Here we detail the macroscale components of the structural simulation toolkit (SST). Instead of depending on trace replay or state machines, the simulator is architected to execute real code on real software stacks. Our particular user-space threading framework allows massive scales to be simulated even on small clusters. The link between the discrete event core and the threading framework allows interesting performance metrics like call graphs to be collected from a simulated run. Performance analysis via simulation can thus become an important phase in extreme-scale programming model and runtime system design via the SST macroscale components.
Using Discrete Event Computer Simulation to Improve Patient Flow in a Ghanaian Acute Care Hospital
Best, Allyson M.; Dixon, Cinnamon A.; Kelton, W. David; Lindsell, Christopher J.
2014-01-01
Objectives Crowding and limited resources have increased the strain on acute care facilities and emergency departments (EDs) worldwide. These problems are particularly prevalent in developing countries. Discrete event simulation (DES) is a computer-based tool that can be used to estimate how changes to complex healthcare delivery systems, such as EDs, will affect operational performance. Using this modality, our objective was to identify operational interventions that could potentially improve patient throughput of one acute care setting in a developing country. Methods We developed a simulation model of acute care at a district level hospital in Ghana to test the effects of resource-neutral (e.g. modified staff start times and roles) and resource-additional (e.g. increased staff) operational interventions on patient throughput. Previously captured, de-identified time-and-motion data from 487 acute care patients were used to develop and test the model. The primary outcome was the modeled effect of interventions on patient length of stay (LOS). Results The base-case (no change) scenario had a mean LOS of 292 minutes (95% CI 291, 293). In isolation, neither adding staffing, changing staff roles, nor varying shift times affected overall patient LOS. Specifically, adding two registration workers, history takers, and physicians resulted in a 23.8 (95% CI 22.3, 25.3) minute LOS decrease. However, when shift start-times were coordinated with patient arrival patterns, potential mean LOS was decreased by 96 minutes (95% CI 94, 98); and with the simultaneous combination of staff roles (Registration and History-taking) there was an overall mean LOS reduction of 152 minutes (95% CI 150, 154). Conclusions Resource-neutral interventions identified through DES modeling have the potential to improve acute care throughput in this Ghanaian municipal hospital. DES offers another approach to identifying potentially effective interventions to improve patient flow in emergency and acute care in resource-limited settings. PMID:24953788
NASA Technical Reports Server (NTRS)
Dubos, Gregory F.; Cornford, Steven
2012-01-01
While the ability to model the state of a space system over time is essential during spacecraft operations, the use of time-based simulations remains rare in preliminary design. The absence of the time dimension in most traditional early design tools can however become a hurdle when designing complex systems whose development and operations can be disrupted by various events, such as delays or failures. As the value delivered by a space system is highly affected by such events, exploring the trade space for designs that yield the maximum value calls for the explicit modeling of time.This paper discusses the use of discrete-event models to simulate spacecraft development schedule as well as operational scenarios and on-orbit resources in the presence of uncertainty. It illustrates how such simulations can be utilized to support trade studies, through the example of a tool developed for DARPA's F6 program to assist the design of "fractionated spacecraft".
Aggarwal, S.; Ryland, S.; Peck, R.
1980-06-19
This report outlines a methodology to study the effects of disruptive events on nuclear waste material in stable geologic sites. The methodology is based upon developing a discrete events model that can be simulated on the computer. This methodology allows a natural development of simulation models that use computer resources in an efficient manner. Accurate modeling in this area depends in large part upon accurate modeling of ion transport behavior in the storage media. Unfortunately, developments in this area are not at a stage where there is any consensus on proper models for such transport. Consequently, our work is directed primarily towards showing how disruptive events can be properly incorporated in such a model, rather than as a predictive tool at this stage. When and if proper geologic parameters can be determined, then it would be possible to use this as a predictive model. Assumptions and their bases are discussed, and the mathematical and computer model are described.
Muzy,; Jammalamadaka, Rajanikanth; Zeigler, Bernard P; Nutaro, James J
2011-01-01
From a modeling and simulation perspective, studying dynamic systems consists of focusing on changes in states. According to the precision of state changes, generic algorithms can be developed to track the activity of sub-systems. This paper aims at describing and applying this more natural and intuitive way to describe and implement dynamic systems. Activity is defined mathematically. A generic application case of diffusion is experimented with to compare the efficiency of quantized state methods using this new approach with traditional methods which do not focus computations on active areas. Our goal is to demonstrate that the concept of activity can estimate the computational effort required by a quantized state method. Specifically, when properly designed, a discrete-event simulator for such a method achieves a reduction in the number of state transitions that more than compensates for the overhead it imposes.
NASA Technical Reports Server (NTRS)
Leonard, Daniel; Parsons, Jeremy W.; Cates, Grant
2014-01-01
In May 2013, NASA's GSDO Program requested a study to develop a discrete event simulation (DES) model that analyzes the launch campaign process of the Space Launch System (SLS) from an integrated commodities perspective. The scope of the study includes launch countdown and scrub turnaround and focuses on four core launch commodities: hydrogen, oxygen, nitrogen, and helium. Previously, the commodities were only analyzed individually and deterministically for their launch support capability, but this study was the first to integrate them to examine the impact of their interactions on a launch campaign as well as the effects of process variability on commodity availability. The study produced a validated DES model with Rockwell Arena that showed that Kennedy Space Center's ground systems were capable of supporting a 48-hour scrub turnaround for the SLS. The model will be maintained and updated to provide commodity consumption analysis of future ground system and SLS configurations.
2013-01-01
Background Computer simulation studies of the emergency department (ED) are often patient driven and consider the physician as a human resource whose primary activity is interacting directly with the patient. In many EDs, physicians supervise delegates such as residents, physician assistants and nurse practitioners each with different skill sets and levels of independence. The purpose of this study is to present an alternative approach where physicians and their delegates in the ED are modeled as interacting pseudo-agents in a discrete event simulation (DES) and to compare it with the traditional approach ignoring such interactions. Methods The new approach models a hierarchy of heterogeneous interacting pseudo-agents in a DES, where pseudo-agents are entities with embedded decision logic. The pseudo-agents represent a physician and delegate, where the physician plays a senior role to the delegate (i.e. treats high acuity patients and acts as a consult for the delegate). A simple model without the complexity of the ED is first created in order to validate the building blocks (programming) used to create the pseudo-agents and their interaction (i.e. consultation). Following validation, the new approach is implemented in an ED model using data from an Ontario hospital. Outputs from this model are compared with outputs from the ED model without the interacting pseudo-agents. They are compared based on physician and delegate utilization, patient waiting time for treatment, and average length of stay. Additionally, we conduct sensitivity analyses on key parameters in the model. Results In the hospital ED model, comparisons between the approach with interaction and without showed physician utilization increase from 23% to 41% and delegate utilization increase from 56% to 71%. Results show statistically significant mean time differences for low acuity patients between models. Interaction time between physician and delegate results in increased ED length of stay and longer waits for beds. Conclusion This example shows the importance of accurately modeling physician relationships and the roles in which they treat patients. Neglecting these relationships could lead to inefficient resource allocation due to inaccurate estimates of physician and delegate time spent on patient related activities and length of stay. PMID:23692710
SimPackJ/S: a web-oriented toolkit for discrete event simulation
NASA Astrophysics Data System (ADS)
Park, Minho; Fishwick, Paul A.
2002-07-01
SimPackJ/S is the JavaScript and Java version of SimPack, which means SimPackJ/S is a collection of JavaScript and Java libraries and executable programs for computer simulations. The main purpose of creating SimPackJ/S is that we allow existing SimPack users to expand simulation areas and provide future users with a freeware simulation toolkit to simulate and model a system in web environments. One of the goals for this paper is to introduce SimPackJ/S. The other goal is to propose translation rules for converting C to JavaScript and Java. Most parts demonstrate the translation rules with examples. In addition, we discuss a 3D dynamic system model and overview an approach to 3D dynamic systems using SimPackJ/S. We explain an interface between SimPackJ/S and the 3D language--Virtual Reality Modeling Language (VRML). This paper documents how to translate C to JavaScript and Java and how to utilize SimPackJ/S within a 3D web environment.
Discrete event simulation of a proton therapy facility: a case study.
Corazza, Uliana; Filippini, Roberto; Setola, Roberto
2011-06-01
Proton therapy is a type of particle therapy which utilizes a beam of protons to irradiate diseased tissue. The main difference with respect to conventional radiotherapy (X-rays, γ-rays) is the capability to target tumors with extreme precision, which makes it possible to treat deep-seated tumors and tumors affecting noble tissues as brain, eyes, etc. However, proton therapy needs high-energy cyclotrons and this requires sophisticated control-supervision schema to guarantee, further than the prescribed performance, the safety of the patients and of the operators. In this paper we present the modeling and simulation of the irradiation process of the PROSCAN facility at the Paul Scherrer Institut. This is a challenging task because of the complexity of the operation scenario, which consists of deterministic and stochastic processes resulting from the coordination-interaction among diverse entities such as distributed automatic control systems, safety protection systems, and human operators. PMID:20675013
Forest biomass supply logistics for a power plant using the discrete-event simulation approach
Mobini, Mahdi; Sowlati, T.; Sokhansanj, Shahabaddine
2011-04-01
This study investigates the logistics of supplying forest biomass to a potential power plant. Due to the complexities in such a supply logistics system, a simulation model based on the framework of Integrated Biomass Supply Analysis and Logistics (IBSAL) is developed in this study to evaluate the cost of delivered forest biomass, the equilibrium moisture content, and carbon emissions from the logistics operations. The model is applied to a proposed case of 300 MW power plant in Quesnel, BC, Canada. The results show that the biomass demand of the power plant would not be met every year. The weighted average cost of delivered biomass to the gate of the power plant is about C$ 90 per dry tonne. Estimates of equilibrium moisture content of delivered biomass and CO2 emissions resulted from the processes are also provided.
2012-01-01
Background Previous cost-effectiveness studies of cholinesterase inhibitors have modeled Alzheimer's disease (AD) progression and treatment effects through single or global severity measures, or progression to "Full Time Care". This analysis evaluates the cost-effectiveness of donepezil versus memantine or no treatment in Germany by considering correlated changes in cognition, behavior and function. Methods Rates of change were modeled using trial and registry-based patient level data. A discrete event simulation projected outcomes for three identical patient groups: donepezil 10 mg, memantine 20 mg and no therapy. Patient mix, mortality and costs were developed using Germany-specific sources. Results Treatment of patients with mild to moderately severe AD with donepezil compared to no treatment was associated with 0.13 QALYs gained per patient, and 0.01 QALYs gained per caregiver and resulted in average savings of €7,007 and €9,893 per patient from the healthcare system and societal perspectives, respectively. In patients with moderate to moderately-severe AD, donepezil compared to memantine resulted in QALY gains averaging 0.01 per patient, and savings averaging €1,960 and €2,825 from the healthcare system and societal perspective, respectively. In probabilistic sensitivity analyses, donepezil dominated no treatment in most replications and memantine in over 70% of the replications. Donepezil leads to savings in 95% of replications versus memantine. Conclusions Donepezil is highly cost-effective in patients with AD in Germany, leading to improvements in health outcomes and substantial savings compared to no treatment. This holds across a variety of sensitivity analyses. PMID:22316501
Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Simulating Billion-Task Parallel Programs
Perumalla, Kalyan S; Park, Alfred J
2014-01-01
In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.
Analysis hierarchical model for discrete event systems
NASA Astrophysics Data System (ADS)
Ciortea, E. M.
2015-11-01
The This paper presents the hierarchical model based on discrete event network for robotic systems. Based on the hierarchical approach, Petri network is analysed as a network of the highest conceptual level and the lowest level of local control. For modelling and control of complex robotic systems using extended Petri nets. Such a system is structured, controlled and analysed in this paper by using Visual Object Net ++ package that is relatively simple and easy to use, and the results are shown as representations easy to interpret. The hierarchical structure of the robotic system is implemented on computers analysed using specialized programs. Implementation of hierarchical model discrete event systems, as a real-time operating system on a computer network connected via a serial bus is possible, where each computer is dedicated to local and Petri model of a subsystem global robotic system. Since Petri models are simplified to apply general computers, analysis, modelling, complex manufacturing systems control can be achieved using Petri nets. Discrete event systems is a pragmatic tool for modelling industrial systems. For system modelling using Petri nets because we have our system where discrete event. To highlight the auxiliary time Petri model using transport stream divided into hierarchical levels and sections are analysed successively. Proposed robotic system simulation using timed Petri, offers the opportunity to view the robotic time. Application of goods or robotic and transmission times obtained by measuring spot is obtained graphics showing the average time for transport activity, using the parameters sets of finished products. individually.
Nutaro, James J.; Kuruganti, Phani Teja; Protopopescu, Vladimir A.; Shankar, Mallikarjun
2012-02-08
The efficient and accurate management of time in simulations of hybrid models is an outstanding engineering problem. General a priori knowledge about the dynamic behavior of the hybrid system (i.e. essentially continuous, essentially discrete, or 'truly hybrid') facilitates this task. Indeed, for essentially discrete and essentially continuous systems, existing software packages can be conveniently used to perform quite sophisticated and satisfactory simulations. The situation is different for 'truly hybrid' systems, for which direct application of existing software packages results in a lengthy design process, cumbersome software assemblies, inaccurate results, or some combination of these independent of the designer's a priori knowledge about the system's structure and behavior. The main goal of this paper is to provide a methodology whereby simulation designers can use a priori knowledge about the hybrid model's structure to build a straightforward, efficient, and accurate simulator with existing software packages. The proposed methodology is based on a formal decomposition and re-articulation of the hybrid system; this is the main theoretical result of the paper. To set the result in the right perspective, we briefly review the essentially continuous and essentially discrete approaches, which are illustrated with typical examples. Then we present our new, split system approach, first in a general formal context, then in three more specific guises that reflect the viewpoints of three main communities of hybrid system researchers and practitioners. For each of these variants we indicate an implementation path. Our approach is illustrated with an archetypal problem of power grid control.
Erenay, Fatih Safa; Alagoz, Oguzhan; Banerjee, Ritesh; Cima, Robert R.
2011-01-01
Objectives Some aspects of the natural history of metachronous colorectal cancer (MCRC), such as the rate of progression from adenomatous polyp to MCRC, are unknown. The objective of this study is to estimate a set of parameters revealing some of these unknown characteristics of MCRC. Methods The authors developed a computer simulation model that mimics the progression of MCRC for a 5-year period following the treatment of primary colorectal cancer (CRC). They obtained the inputs of the simulation model using longitudinal data for 284 CRC patients from the Mayo Clinic, Rochester. Results Five-year MCRC incidence and all-cause mortality were 7.4% and 12.7% in the patient cohort, respectively. Statistical analysis showed that 5-year MCRC incidence was associated with gender (P = 0.05), whereas both all-cause and CRC-related mortalities were associated with age (P < 0.001 and P = 0.01). Estimated annual probabilities of progression from adenomatous polyp to MCRC and from MCRC to metastatic MCRC were 0.14 and 0.28, respectively. Annual probabilities of mortality after MCRC and metastatic MCRC treatments were estimated to be 0.06 and 0.26, respectively. The estimated annual probability of mortality due to undetected MCRC was 0.16. Conclusions The results imply that MCRC, especially in women, may be more common than suggested by previous studies. In addition, statistics derived from the clinical data and results of the simulation model indicate that gender and age affect the progression of MCRC. PMID:21212440
Comas, Mercè; Arrospide, Arantzazu; Mar, Javier; Sala, Maria; Vilaprinyó, Ester; Hernández, Cristina; Cots, Francesc; Martínez, Juan; Castells, Xavier
2014-01-01
Objective To assess the budgetary impact of switching from screen-film mammography to full-field digital mammography in a population-based breast cancer screening program. Methods A discrete-event simulation model was built to reproduce the breast cancer screening process (biennial mammographic screening of women aged 50 to 69 years) combined with the natural history of breast cancer. The simulation started with 100,000 women and, during a 20-year simulation horizon, new women were dynamically entered according to the aging of the Spanish population. Data on screening were obtained from Spanish breast cancer screening programs. Data on the natural history of breast cancer were based on US data adapted to our population. A budget impact analysis comparing digital with screen-film screening mammography was performed in a sample of 2,000 simulation runs. A sensitivity analysis was performed for crucial screening-related parameters. Distinct scenarios for recall and detection rates were compared. Results Statistically significant savings were found for overall costs, treatment costs and the costs of additional tests in the long term. The overall cost saving was 1,115,857€ (95%CI from 932,147 to 1,299,567) in the 10th year and 2,866,124€ (95%CI from 2,492,610 to 3,239,638) in the 20th year, representing 4.5% and 8.1% of the overall cost associated with screen-film mammography. The sensitivity analysis showed net savings in the long term. Conclusions Switching to digital mammography in a population-based breast cancer screening program saves long-term budget expense, in addition to providing technical advantages. Our results were consistent across distinct scenarios representing the different results obtained in European breast cancer screening programs. PMID:24832200
On extending parallelism to serial simulators
NASA Technical Reports Server (NTRS)
Nicol, David; Heidelberger, Philip
1994-01-01
This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops submodels using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permit automatically synchronized communication between submodels; however, the automation requires that any such communication must take a nonzero amount off simulation time. Within this modeling paradigm, a variety of conservative synchronization protocols can transparently support conservative execution of submodels on potentially different processors. A specific implementation of this approach, U.P.S. (Utilitarian Parallel Simulator), is described, along with performance results on the Intel Paragon.
Inflated speedups in parallel simulations via malloc()
NASA Technical Reports Server (NTRS)
Nicol, David M.
1990-01-01
Discrete-event simulation programs make heavy use of dynamic memory allocation in order to support simulation's very dynamic space requirements. When programming in C one is likely to use the malloc() routine. However, a parallel simulation which uses the standard Unix System V malloc() implementation may achieve an overly optimistic speedup, possibly superlinear. An alternate implementation provided on some (but not all systems) can avoid the speedup anomaly, but at the price of significantly reduced available free space. This is especially severe on most parallel architectures, which tend not to support virtual memory. It is shown how a simply implemented user-constructed interface to malloc() can both avoid artificially inflated speedups, and make efficient use of the dynamic memory space. The interface simply catches blocks on the basis of their size. The problem is demonstrated empirically, and the effectiveness of the solution is shown both empirically and analytically.
2014-01-01
Background Osteoporotic fractures cause a large health burden and substantial costs. This study estimated the expected fracture numbers and costs for the remaining lifetime of postmenopausal women in Germany. Methods A discrete event simulation (DES) model which tracks changes in fracture risk due to osteoporosis, a previous fracture or institutionalization in a nursing home was developed. Expected lifetime fracture numbers and costs per capita were estimated for postmenopausal women (aged 50 and older) at average osteoporosis risk (AOR) and for those never suffering from osteoporosis. Direct and indirect costs were modeled. Deterministic univariate and probabilistic sensitivity analyses were conducted. Results The expected fracture numbers over the remaining lifetime of a 50 year old woman with AOR for each fracture type (% attributable to osteoporosis) were: hip 0.282 (57.9%), wrist 0.229 (18.2%), clinical vertebral 0.206 (39.2%), humerus 0.147 (43.5%), pelvis 0.105 (47.5%), and other femur 0.033 (52.1%). Expected discounted fracture lifetime costs (excess cost attributable to osteoporosis) per 50 year old woman with AOR amounted to €4,479 (€1,995). Most costs were accrued in the hospital €1,743 (€751) and long-term care sectors €1,210 (€620). Univariate sensitivity analysis resulted in percentage changes between -48.4% (if fracture rates decreased by 2% per year) and +83.5% (if fracture rates increased by 2% per year) compared to base case excess costs. Costs for women with osteoporosis were about 3.3 times of those never getting osteoporosis (€7,463 vs. €2,247), and were markedly increased for women with a previous fracture. Conclusion The results of this study indicate that osteoporosis causes a substantial share of fracture costs in postmenopausal women, which strongly increase with age and previous fractures. PMID:24981316
NASA Technical Reports Server (NTRS)
Nicol, David; Fujimoto, Richard
1992-01-01
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.
Parallel Atomistic Simulations
HEFFELFINGER,GRANT S.
2000-01-18
Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.
Scaling Time Warp-based Discrete Event Execution to 10^{4} Processors on Blue Gene Supercomputer
Perumalla, Kalyan S
2007-01-01
Lately, important large-scale simulation applications, such as emergency/event planning and response, are emerging that are based on discrete event models. The applications are characterized by their scale (several millions of simulated entities), their fine-grained nature of computation (microseconds per event), and their highly dynamic inter-entity event interactions. The desired scale and speed together call for highly scalable parallel discrete event simulation (PDES) engines. However, few such parallel engines have been designed or tested on platforms with thousands of processors. Here an overview is given of a unique PDES engine that has been designed to support Time Warp-style optimistic parallel execution as well as a more generalized mixed, optimistic-conservative synchronization. The engine is designed to run on massively parallel architectures with minimal overheads. A performance study of the engine is presented, including the first results to date of PDES benchmarks demonstrating scalability to as many as 16,384 processors, on an IBM Blue Gene supercomputer. The results show, for the first time, the promise of effectively sustaining very large scale discrete event execution on up to 10^{4} processors.
Xyce parallel electronic simulator.
Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.
2010-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.
Parallel Dislocation Simulator
Energy Science and Technology Software Center (ESTSC)
2006-10-30
ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.
An algebra of discrete event processes
NASA Technical Reports Server (NTRS)
Heymann, Michael; Meyer, George
1991-01-01
This report deals with an algebraic framework for modeling and control of discrete event processes. The report consists of two parts. The first part is introductory, and consists of a tutorial survey of the theory of concurrency in the spirit of Hoare's CSP, and an examination of the suitability of such an algebraic framework for dealing with various aspects of discrete event control. To this end a new concurrency operator is introduced and it is shown how the resulting framework can be applied. It is further shown that a suitable theory that deals with the new concurrency operator must be developed. In the second part of the report the formal algebra of discrete event control is developed. At the present time the second part of the report is still an incomplete and occasionally tentative working paper.
Optimal Discrete Event Supervisory Control of Aircraft Gas Turbine Engines
NASA Technical Reports Server (NTRS)
Litt, Jonathan (Technical Monitor); Ray, Asok
2004-01-01
This report presents an application of the recently developed theory of optimal Discrete Event Supervisory (DES) control that is based on a signed real measure of regular languages. The DES control techniques are validated on an aircraft gas turbine engine simulation test bed. The test bed is implemented on a networked computer system in which two computers operate in the client-server mode. Several DES controllers have been tested for engine performance and reliability.
Discrete Events as Units of Perceived Time
ERIC Educational Resources Information Center
Liverence, Brandon M.; Scholl, Brian J.
2012-01-01
In visual images, we perceive both space (as a continuous visual medium) and objects (that inhabit space). Similarly, in dynamic visual experience, we perceive both continuous time and discrete events. What is the relationship between these units of experience? The most intuitive answer may be similar to the spatial case: time is perceived as an…
Discrete Events as Units of Perceived Time
ERIC Educational Resources Information Center
Liverence, Brandon M.; Scholl, Brian J.
2012-01-01
In visual images, we perceive both space (as a continuous visual medium) and objects (that inhabit space). Similarly, in dynamic visual experience, we perceive both continuous time and discrete events. What is the relationship between these units of experience? The most intuitive answer may be similar to the spatial case: time is perceived as an
Nonlinear Control and Discrete Event Systems
NASA Technical Reports Server (NTRS)
Meyer, George; Null, Cynthia H. (Technical Monitor)
1995-01-01
As the operation of large systems becomes ever more dependent on extensive automation, the need for an effective solution to the problem of design and validation of the underlying software becomes more critical. Large systems possesses much detailed structure, typically hierarchical, and they are hybrid. Information processing at the top of the hierarchy is by means of formal logic and sentences; on the bottom it is by means of simple scalar differential equations and functions of time; and in the middle it is by an interacting mix of nonlinear multi-axis differential equations and automata, and functions of time and discrete events. The lecture will address the overall problem as it relates to flight vehicle management, describe the middle level, and offer a design approach that is based on Differential Geometry and Discrete Event Dynamic Systems Theory.
Multiple Autonomous Discrete Event Controllers for Constellations
NASA Technical Reports Server (NTRS)
Esposito, Timothy C.
2003-01-01
The Multiple Autonomous Discrete Event Controllers for Constellations (MADECC) project is an effort within the National Aeronautics and Space Administration Goddard Space Flight Center's (NASA/GSFC) Information Systems Division to develop autonomous positioning and attitude control for constellation satellites. It will be accomplished using traditional control theory and advanced coordination algorithms developed by the Johns Hopkins University Applied Physics Laboratory (JHU/APL). This capability will be demonstrated in the discrete event control test-bed located at JHU/APL. This project will be modeled for the Leonardo constellation mission, but is intended to be adaptable to any constellation mission. To develop a common software architecture. the controllers will only model very high-level responses. For instance, after determining that a maneuver must be made. the MADECC system will output B (Delta)V (velocity change) value. Lower level systems must then decide which thrusters to fire and for how long to achieve that (Delta)V.
Parallel implementation of VHDL simulations on the Intel iPSC/2 hypercube. Master's thesis
Comeau, R.C.
1991-12-01
VHDL models are executed sequentially in current commercial simulators. As chip designs grow larger and more complex, simulations must run faster. One approach to increasing simulation speed is through parallel processors. This research transforms the behavioral and structural models created by Intermetrics' sequential VHDL simulator into models for parallel execution. The models are simulated on an Intel iPSC/2 hypercube with synchronization of the nodes being achieved by utilizing the Chandy Misra paradigm for discrete-event simulations. Three eight-bit adders, the ripple carry, the carry save, and the carry-lookahead, are each run through the parallel simulator. Simulation time is cut in at least half for all three test cases over the sequential Intermetrics model. Results with regard to speedup are given to show effects of different mappings, varying workloads per node, and overhead due to output messages.
Discrete Event Execution with One-Sided and Two-Sided GVT Algorithms on 216,000 Processor Cores
Perumalla, Kalyan S; Park, Alfred J; Tipparaju, Vinod
2014-01-01
Global virtual time (GVT) computation is a key determinant of the efficiency and runtime dynamics of parallel discrete event simulations (PDES), especially on large-scale parallel platforms. Here, three execution modes of a generalized GVT computation algorithm are studied on high-performance parallel computing systems: (1) a synchronous GVT algorithm that affords ease of implementation, (2) an asynchronous GVT algorithm that is more complex to implement but can relieve blocking latencies, and (3) a variant of the asynchronous GVT algorithm to exploit one-sided communication in extant supercomputing platforms. Performance results are presented of implementations of these algorithms on up to 216,000 cores of a Cray XT5 system, exercised on a range of parameters: optimistic and conservative synchronization, fine- to medium-grained event computation, synthetic and non-synthetic applications, and different lookahead values. Performance of up to 54 billion events executed per second is registered. Detailed PDES-specific runtime metrics are presented to further the understanding of tightly-coupled discrete event dynamics on massively parallel platforms.
Planning and supervision of reactor defueling using discrete event techniques
Garcia, H.E.; Imel, G.R.; Houshyar, A.
1995-12-31
New fuel handling and conditioning activities for the defueling of the Experimental Breeder Reactor II are being performed at Argonne National Laboratory. Research is being conducted to investigate the use of discrete event simulation, analysis, and optimization techniques to plan, supervise, and perform these activities in such a way that productivity can be improved. The central idea is to characterize this defueling operation as a collection of interconnected serving cells, and then apply operational research techniques to identify appropriate planning schedules for given scenarios. In addition, a supervisory system is being developed to provide personnel with on-line information on the progress of fueling tasks and to suggest courses of action to accommodate changing operational conditions. This paper provides an introduction to the research in progress at ANL. In particular, it briefly describes the fuel handling configuration for reactor defueling at ANL, presenting the flow of material from the reactor grid to the interim storage location, and the expected contributions of this work. As an example of the studies being conducted for planning and supervision of fuel handling activities at ANL, an application of discrete event simulation techniques to evaluate different fuel cask transfer strategies is given at the end of the paper.
Parallel Power Grid Simulation Toolkit
Energy Science and Technology Software Center (ESTSC)
2015-09-14
ParGrid is a 'wrapper' that integrates a coupled Power Grid Simulation toolkit consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGid, named FSKIT, is intended to support the coupling multiple continuous and discrete even parallel simulations. The code is designed using modern object oriented C++ methods utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.
Parallelizing Timed Petri Net simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1993-01-01
The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.
LAN attack detection using Discrete Event Systems.
Hubballi, Neminath; Biswas, Santosh; Roopa, S; Ratti, Ritesh; Nandi, Sukumar
2011-01-01
Address Resolution Protocol (ARP) is used for determining the link layer or Medium Access Control (MAC) address of a network host, given its Internet Layer (IP) or Network Layer address. ARP is a stateless protocol and any IP-MAC pairing sent by a host is accepted without verification. This weakness in the ARP may be exploited by malicious hosts in a Local Area Network (LAN) by spoofing IP-MAC pairs. Several schemes have been proposed in the literature to circumvent these attacks; however, these techniques either make IP-MAC pairing static, modify the existing ARP, patch operating systems of all the hosts etc. In this paper we propose a Discrete Event System (DES) approach for Intrusion Detection System (IDS) for LAN specific attacks which do not require any extra constraint like static IP-MAC, changing the ARP etc. A DES model is built for the LAN under both a normal and compromised (i.e., spoofed request/response) situation based on the sequences of ARP related packets. Sequences of ARP events in normal and spoofed scenarios are similar thereby rendering the same DES models for both the cases. To create different ARP events under normal and spoofed conditions the proposed technique uses active ARP probing. However, this probing adds extra ARP traffic in the LAN. Following that a DES detector is built to determine from observed ARP related events, whether the LAN is operating under a normal or compromised situation. The scheme also minimizes extra ARP traffic by probing the source IP-MAC pair of only those ARP packets which are yet to be determined as genuine/spoofed by the detector. Also, spoofed IP-MAC pairs determined by the detector are stored in tables to detect other LAN attacks triggered by spoofing namely, man-in-the-middle (MiTM), denial of service etc. The scheme is successfully validated in a test bed. PMID:20804980
Analytic Perturbation Analysis of Discrete Event Dynamic Systems
Uryasev, S.
1994-09-01
This paper considers a new Analytic Perturbation Analysis (APA) approach for Discrete Event Dynamic Systems (DEDS) with discontinuous sample-path functions with respect to control parameters. The performance functions for DEDS usually are formulated as mathematical expectations, which can be calculated only numerically. APA is based on new analytic formulas for the gradients of expectations of indicator functions; therefore, it is called an analytic perturbation analysis. The gradient of performance function may not coincide with the expectation of a gradient of sample-path function (i.e., the interchange formula for the gradient and expectation sign may not be valid). Estimates of gradients can be obtained with one simulation run of the models.
Parallel Event-Driven Global Magnetospheric Hybrid Simulations
NASA Astrophysics Data System (ADS)
Omelchenko, Y. A.; Karimabadi, H.; Saule, E.; Catalyurek, U. V.
2010-12-01
Global MHD/Hall-MHD magnetospheric models are not able to capture the full diversity of scales and processes that control the Earth's magnetosphere. In order to significantly improve the predictive capabilities of global space weather models, new CPU-efficient algorithms are needed, which could properly account for ion kinetic effects in a large computational domain over long simulation times. To achieve this much expected breakthrough in hybrid (particle ions and fluid electrons) simulations we developed a novel asynchronous time integration technique known as Discrete-Event Simulation (DES). DES replaces conventional time stepping with event processing, which allows to update macro-particles and grid-based fields on their own timescales. This unique capability of DES removes the traditional CFL constraint on the global timestep and enables natural (event-driven) coupling of multi-physics components in a global application model. We report first-ever parallel 2D hybrid DES (HYPERS) runs and compare them with similar time-stepped simulations. We also discuss our undergoing efforts on developing efficient load-balancing strategies for future 3D HYPERS runs on petascale architectures.
Xyce parallel electronic simulator design.
Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.
2010-09-01
This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to insure a high level of code quality and robustness are essential. Version control, issue tracking customer support, C++ style guildlines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathmaticians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.
Modelling machine ensembles with discrete event dynamical system theory
NASA Technical Reports Server (NTRS)
Hunter, Dan
1990-01-01
Discrete Event Dynamical System (DEDS) theory can be utilized as a control strategy for future complex machine ensembles that will be required for in-space construction. The control strategy involves orchestrating a set of interactive submachines to perform a set of tasks for a given set of constraints such as minimum time, minimum energy, or maximum machine utilization. Machine ensembles can be hierarchically modeled as a global model that combines the operations of the individual submachines. These submachines are represented in the global model as local models. Local models, from the perspective of DEDS theory , are described by the following: a set of system and transition states, an event alphabet that portrays actions that takes a submachine from one state to another, an initial system state, a partial function that maps the current state and event alphabet to the next state, and the time required for the event to occur. Each submachine in the machine ensemble is presented by a unique local model. The global model combines the local models such that the local models can operate in parallel under the additional logistic and physical constraints due to submachine interactions. The global model is constructed from the states, events, event functions, and timing requirements of the local models. Supervisory control can be implemented in the global model by various methods such as task scheduling (open-loop control) or implementing a feedback DEDS controller (closed-loop control).
Discrete Event Supervisory Control Applied to Propulsion Systems
NASA Technical Reports Server (NTRS)
Litt, Jonathan S.; Shah, Neerav
2005-01-01
The theory of discrete event supervisory (DES) control was applied to the optimal control of a twin-engine aircraft propulsion system and demonstrated in a simulation. The supervisory control, which is implemented as a finite-state automaton, oversees the behavior of a system and manages it in such a way that it maximizes a performance criterion, similar to a traditional optimal control problem. DES controllers can be nested such that a high-level controller supervises multiple lower level controllers. This structure can be expanded to control huge, complex systems, providing optimal performance and increasing autonomy with each additional level. The DES control strategy for propulsion systems was validated using a distributed testbed consisting of multiple computers--each representing a module of the overall propulsion system--to simulate real-time hardware-in-the-loop testing. In the first experiment, DES control was applied to the operation of a nonlinear simulation of a turbofan engine (running in closed loop using its own feedback controller) to minimize engine structural damage caused by a combination of thermal and structural loads. This enables increased on-wing time for the engine through better management of the engine-component life usage. Thus, the engine-level DES acts as a life-extending controller through its interaction with and manipulation of the engine s operation.
Yoginath, Srikanth B; Perumalla, Kalyan S
2013-01-01
Virtual machine (VM) technologies, especially those offered via Cloud platforms, present new dimensions with respect to performance and cost in executing parallel discrete event simulation (PDES) applications. Due to the introduction of overall cost as a metric, the choice of the highest-end computing configuration is no longer the most economical one. Moreover, runtime dynamics unique to VM platforms introduce new performance characteristics, and the variety of possible VM configurations give rise to a range of choices for hosting a PDES run. Here, an empirical study of these issues is undertaken to guide an understanding of the dynamics, trends and trade-offs in executing PDES on VM/Cloud platforms. Performance results and cost measures are obtained from actual execution of a range of scenarios in two PDES benchmark applications on the Amazon Cloud offerings and on a high-end VM host machine. The data reveals interesting insights into the new VM-PDES dynamics that come into play and also leads to counter-intuitive guidelines with respect to choosing the best and second-best configurations when overall cost of execution is considered. In particular, it is found that choosing the highest-end VM configuration guarantees neither the best runtime nor the least cost. Interestingly, choosing a (suitably scaled) low-end VM configuration provides the least overall cost without adversely affecting the total runtime.
Hierarchical Discrete Event Supervisory Control of Aircraft Propulsion Systems
NASA Technical Reports Server (NTRS)
Yasar, Murat; Tolani, Devendra; Ray, Asok; Shah, Neerav; Litt, Jonathan S.
2004-01-01
This paper presents a hierarchical application of Discrete Event Supervisory (DES) control theory for intelligent decision and control of a twin-engine aircraft propulsion system. A dual layer hierarchical DES controller is designed to supervise and coordinate the operation of two engines of the propulsion system. The two engines are individually controlled to achieve enhanced performance and reliability, necessary for fulfilling the mission objectives. Each engine is operated under a continuously varying control system that maintains the specified performance and a local discrete-event supervisor for condition monitoring and life extending control. A global upper level DES controller is designed for load balancing and overall health management of the propulsion system.
CAISSON: Interconnect Network Simulator
NASA Technical Reports Server (NTRS)
Springer, Paul L.
2006-01-01
Cray response to HPCS initiative. Model future petaflop computer interconnect. Parallel discrete event simulation techniques for large scale network simulation. Built on WarpIV engine. Run on laptop and Altix 3000. Can be sized up to 1000 simulated nodes per host node. Good parallel scaling characteristics. Flexible: multiple injectors, arbitration strategies, queue iterators, network topologies.
Data parallel sorting for particle simulation
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1992-01-01
Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O (N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimun performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
Xyce parallel electronic simulator : users' guide.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.
Hierarchical, modular discrete-event modelling in an object-oriented environment
Zeigler, B.P.
1987-11-01
Hierarchical, modular specification of discrete-event models offers a basis for reusable model bases and hence for enhanced simulation of truly varied design alternatives. The authors describe an environment which realizes the DEVS formalism developed for hierarchical, modular models. It is implemented in PC-Scheme, a powerful Lisp dialect for microcomputers containing an object-oriented programming subsystem. Since both the implementation and the underlying language are accessible to the user, the result is a capable medium for combining simulation modelling and artificial intelligence techniques.
Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment*†
Khan, Md. Ashfaquzzaman; Herbordt, Martin C.
2011-01-01
Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations. PMID:21822327
Stochastic Parallel PARticle Kinetic Simulator
Plimpton, Steve; Thompson, Aidan; Slepoy, Alex
2008-07-01
SPPARKS is a kinetic Monte Carlo simulator which implements kinetic and Metropolis Monte Carlo solvers in a general way so that they can be hooked to applications of various kinds. Specific applications are implemented in SPPARKS as physical models which generate events (e.g. a diffusive hop or chemical reaction) and execute them one-by-one. Applications can run in paralle so long as the simulation domain can be partitoned spatially so that multiple events can be invoked simultaneously. SPPARKS is used to model various kinds of mesoscale materials science scenarios such as grain growth, surface deposition and growth, and reaction kinetics. It can also be used to develop new Monte Carlo models that hook to the existing solver and paralle infrastructure provided by the code.
Stochastic Parallel PARticle Kinetic Simulator
Energy Science and Technology Software Center (ESTSC)
2008-07-01
SPPARKS is a kinetic Monte Carlo simulator which implements kinetic and Metropolis Monte Carlo solvers in a general way so that they can be hooked to applications of various kinds. Specific applications are implemented in SPPARKS as physical models which generate events (e.g. a diffusive hop or chemical reaction) and execute them one-by-one. Applications can run in paralle so long as the simulation domain can be partitoned spatially so that multiple events can be invokedmore » simultaneously. SPPARKS is used to model various kinds of mesoscale materials science scenarios such as grain growth, surface deposition and growth, and reaction kinetics. It can also be used to develop new Monte Carlo models that hook to the existing solver and paralle infrastructure provided by the code.« less
Simulating the scheduling of parallel supercomputer applications
Seager, M.K.; Stichnoth, J.M.
1989-09-19
An Event Driven Simulator for Evaluating Multiprocessing Scheduling (EDSEMS) disciplines is presented. The simulator is made up of three components: machine model; parallel workload characterization ; and scheduling disciplines for mapping parallel applications (many processes cooperating on the same computation) onto processors. A detailed description of how the simulator is constructed, how to use it and how to interpret the output is also given. Initial results are presented from the simulation of parallel supercomputer workloads using Dog-Eat-Dog,'' Family'' and Gang'' scheduling disciplines. These results indicate that Gang scheduling is far better at giving the number of processors that a job requests than Dog-Eat-Dog or Family scheduling. In addition, the system throughput and turnaround time are not adversely affected by this strategy. 10 refs., 8 figs., 1 tab.
Visualization and Tracking of Parallel CFD Simulations
NASA Technical Reports Server (NTRS)
Vaziri, Arsi; Kremenetsky, Mark
1995-01-01
We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS) runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, are handled by CM/AVS. Partitioning of the visualization task, between CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate (yields) store (yields) visualize' post-processing approach.
Parallel processing of a rotating shaft simulation
NASA Technical Reports Server (NTRS)
Arpasi, Dale J.
1989-01-01
A FORTRAN program describing the vibration modes of a rotor-bearing system is analyzed for parellelism in this simulation using a Pascal-like structured language. Potential vector operations are also identified. A critical path through the simulation is identified and used in conjunction with somewhat fictitious processor characteristics to determine the time to calculate the problem on a parallel processing system having those characteristics. A parallel processing overhead time is included as a parameter for proper evaluation of the gain over serial calculation. The serial calculation time is determined for the same fictitious system. An improvement of up to 640 percent is possible depending on the value of the overhead time. Based on the analysis, certain conclusions are drawn pertaining to the development needs of parallel processing technology, and to the specification of parallel processing systems to meet computational needs.
Particle simulations on massively parallel machines
Plimpton, S.
1993-06-01
A wide variety of physical phenomena can be modeled with particles. Such simulations pose interesting challenges for parallel machines since the computations are often difficult to load-balance and can require irregular communication. We discuss the size of problems that can be simulated today, obstacles to higher performance, and areas where algorithmic improvements are need. The relevant issues are illustrated with two prototypical simulations: a Monte Carlo model of low-density fluid flow and molecular dynamics.
The Xyce Parallel Electronic Simulator - An Overview
HUTCHINSON,SCOTT A.; KEITER,ERIC R.; HOEKSTRA,ROBERT J.; WATTS,HERMAN A.; WATERS,ARLON J.; SCHELLS,REGINA L.; WIX,STEVEN D.
2000-12-08
The Xyce{trademark} Parallel Electronic Simulator has been written to support the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on providing the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). In addition, they are providing improved performance for numerical kernels using state-of-the-art algorithms, support for modeling circuit phenomena at a variety of abstraction levels and using object-oriented and modern coding-practices that ensure the code will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows.
Parallel and Distributed System Simulation
NASA Technical Reports Server (NTRS)
Dongarra, Jack
1998-01-01
This exploratory study initiated our research into the software infrastructure necessary to support the modeling and simulation techniques that are most appropriate for the Information Power Grid. Such computational power grids will use high-performance networking to connect hardware, software, instruments, databases, and people into a seamless web that supports a new generation of computation-rich problem solving environments for scientists and engineers. In this context we looked at evaluating the NetSolve software environment for network computing that leverages the potential of such systems while addressing their complexities. NetSolve's main purpose is to enable the creation of complex applications that harness the immense power of the grid, yet are simple to use and easy to deploy. NetSolve uses a modular, client-agent-server architecture to create a system that is very easy to use. Moreover, it is designed to be highly composable in that it readily permits new resources to be added by anyone willing to do so. In these respects NetSolve is to the Grid what the World Wide Web is to the Internet. But like the Web, the design that makes these wonderful features possible can also impose significant limitations on the performance and robustness of a NetSolve system. This project explored the design innovations that push the performance and robustness of the NetSolve paradigm as far as possible without sacrificing the Web-like ease of use and composability that make it so powerful.
Xyce parallel electronic simulator release notes.
Keiter, Eric Richard; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.
2010-05-01
The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: Hardware and software requirements New features and enhancements Any defects fixed since the last release Current known defects and defect workarounds For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.
A parallel computational model for GATE simulations.
Rannou, F R; Vega-Acevedo, N; El Bitar, Z
2013-12-01
GATE/Geant4 Monte Carlo simulations are computationally demanding applications, requiring thousands of processor hours to produce realistic results. The classical strategy of distributing the simulation of individual events does not apply efficiently for Positron Emission Tomography (PET) experiments, because it requires a centralized coincidence processing and large communication overheads. We propose a parallel computational model for GATE that handles event generation and coincidence processing in a simple and efficient way by decentralizing event generation and processing but maintaining a centralized event and time coordinator. The model is implemented with the inclusion of a new set of factory classes that can run the same executable in sequential or parallel mode. A Mann-Whitney test shows that the output produced by this parallel model in terms of number of tallies is equivalent (but not equal) to its sequential counterpart. Computational performance evaluation shows that the software is scalable and well balanced. PMID:24070545
Parallel Performance of a Combustion Chemistry Simulation
Skinner, Gregg; Eigenmann, Rudolf
1995-01-01
We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.
Parallel software simulation using PS-nets
Markov, N.G.; Miroshnichenko, E.A.; Saraikin, A.V.
1995-09-01
The requirements of techniques for parallel software simulation are discussed. According to these requirements, techniques on the basis of PS-nets are suggested. The fundamentals of program system modeling by PS-nets are given. The tools developed for modeling are described.
Performance limitations in parallel processor simulations
NASA Technical Reports Server (NTRS)
O'Grady, E. Pearse; Wang, Chung-Hsien
1987-01-01
A jet-engine model is partitioned and simulated on a parallel processor system consisting of five 8086/8087 floating-point computers. The simulation uses Heun's integration method. A near-optimal parallel simulation (in the sense of minimum execution time) achieves speedup of only 2.13 and efficiency of 42.6 percent, in effect wasting 57.4 percent of the available processing power. A detailed analysis identifies and graphically demonstrates why the system fails to achieve ideal performance (viz., speedup of 5 and efficiency of 100 percent). Inherent characteristics of the problem equations and solution algorithm account for the loss of nearly half of the available processing power. Overheads associated with interprocessor communication and processor synchronization account for only a small fraction of the lost processing power. The effects of these and other factors which limit parallel processor performance are illustrated through real-time timing-analyzer tracers describing the run/idle status of the parallel processors during the simulation.
Safety Discrete Event Models for Holonic Cyclic Manufacturing Systems
NASA Astrophysics Data System (ADS)
Ciufudean, Calin; Filote, Constantin
In this paper the expression “holonic cyclic manufacturing systems” refers to complex assembly/disassembly systems or fork/join systems, kanban systems, and in general, to any discrete event system that transforms raw material and/or components into products. Such a system is said to be cyclic if it provides the same sequence of products indefinitely. This paper considers the scheduling of holonic cyclic manufacturing systems and describes a new approach using Petri nets formalism. We propose an approach to frame the optimum schedule of holonic cyclic manufacturing systems in order to maximize the throughput while minimize the work in process. We also propose an algorithm to verify the optimum schedule.
Parallel Simulation of Unsteady Turbulent Flames
NASA Technical Reports Server (NTRS)
Menon, Suresh
1996-01-01
Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, the high cost and their limited availability, makes practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable.This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. Recently, a new model for turbulent combustion was developed, in which the combustion is modeled, within the subgrid (small-scales) using a methodology that simulates the mixing and the molecular transport and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure and this approach actually provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion and chemical kinetics and, therefore, within each grid cell, a significant amount of computations must be carried out before the large-scale (LES resolved) effects are incorporated. Therefore, this approach is uniquely suited for parallel processing and has been implemented on various systems such as: Intel Paragon, IBM SP-2, Cray T3D and SGI Power Challenge (PC) using the system independent Message Passing Interface (MPI) compiler. In this paper, timing data on these machines is reported along with some characteristic results.
Parallel algorithm strategies for circuit simulation.
Thornquist, Heidi K.; Schiek, Richard Louis; Keiter, Eric Richard
2010-01-01
Circuit simulation tools (e.g., SPICE) have become invaluable in the development and design of electronic circuits. However, they have been pushed to their performance limits in addressing circuit design challenges that come from the technology drivers of smaller feature scales and higher integration. Improving the performance of circuit simulation tools through exploiting new opportunities in widely-available multi-processor architectures is a logical next step. Unfortunately, not all traditional simulation applications are inherently parallel, and quickly adapting mature application codes (even codes designed to parallel applications) to new parallel paradigms can be prohibitively difficult. In general, performance is influenced by many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, the use of mini-applications small self-contained proxies for real applications is an excellent approach for rapidly exploring the parameter space of all these choices. In this report we present a multi-core performance study of Xyce, a transistor-level circuit simulation tool, and describe the future development of a mini-application for circuit simulation.
Parallel node placement method by bubble simulation
NASA Astrophysics Data System (ADS)
Nie, Yufeng; Zhang, Weiwei; Qi, Nan; Li, Yiqiang
2014-03-01
An efficient Parallel Node Placement method by Bubble Simulation (PNPBS), employing METIS-based domain decomposition (DD) for an arbitrary number of processors is introduced. In accordance with the desired nodal density and Newton’s Second Law of Motion, automatic generation of node sets by bubble simulation has been demonstrated in previous work. Since the interaction force between nodes is short-range, for two distant nodes, their positions and velocities can be updated simultaneously and independently during dynamic simulation, which indicates the inherent property of parallelism, it is quite suitable for parallel computing. In this PNPBS method, the METIS-based DD scheme has been investigated for uniform and non-uniform node sets, and dynamic load balancing is obtained by evenly distributing work among the processors. For the nodes near the common interface of two neighboring subdomains, there is no need for special treatment after dynamic simulation. These nodes have good geometrical properties and a smooth density distribution which is desirable in the numerical solution of partial differential equations (PDEs). The results of numerical examples show that quasi linear speedup in the number of processors and high efficiency are achieved.
Xyce parallel electronic simulator : reference guide.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.
Parallel-distributed mobile robot simulator
NASA Astrophysics Data System (ADS)
Okada, Hiroyuki; Sekiguchi, Minoru; Watanabe, Nobuo
1996-06-01
The aim of this project is to achieve an autonomous learning and growth function based on active interaction with the real world. It should also be able to autonomically acquire knowledge about the context in which jobs take place, and how the jobs are executed. This article describes a parallel distributed movable robot system simulator with an autonomous learning and growth function. The autonomous learning and growth function which we are proposing is characterized by its ability to learn and grow through interaction with the real world. When the movable robot interacts with the real world, the system compares the virtual environment simulation with the interaction result in the real world. The system then improves the virtual environment to match the real-world result more closely. This the system learns and grows. It is very important that such a simulation is time- realistic. The parallel distributed movable robot simulator was developed to simulate the space of a movable robot system with an autonomous learning and growth function. The simulator constructs a virtual space faithful to the real world and also integrates the interfaces between the user, the actual movable robot and the virtual movable robot. Using an ultrafast CG (computer graphics) system (FUJITSU AG series), time-realistic 3D CG is displayed.
NASA Technical Reports Server (NTRS)
Greenberg, Albert G.; Lubachevsky, Boris D.; Nicol, David M.; Wright, Paul E.
1994-01-01
Fast, efficient parallel algorithms are presented for discrete event simulations of dynamic channel assignment schemes for wireless cellular communication networks. The driving events are call arrivals and departures, in continuous time, to cells geographically distributed across the service area. A dynamic channel assignment scheme decides which call arrivals to accept, and which channels to allocate to the accepted calls, attempting to minimize call blocking while ensuring co-channel interference is tolerably low. Specifically, the scheme ensures that the same channel is used concurrently at different cells only if the pairwise distances between those cells are sufficiently large. Much of the complexity of the system comes from ensuring this separation. The network is modeled as a system of interacting continuous time automata, each corresponding to a cell. To simulate the model, conservative methods are used; i.e., methods in which no errors occur in the course of the simulation and so no rollback or relaxation is needed. Implemented on a 16K processor MasPar MP-1, an elegant and simple technique provides speedups of about 15 times over an optimized serial simulation running on a high speed workstation. A drawback of this technique, typical of conservative methods, is that processor utilization is rather low. To overcome this, new methods were developed that exploit slackness in event dependencies over short intervals of time, thereby raising the utilization to above 50 percent and the speedup over the optimized serial code to about 120 times.
Xyce() Parallel Electronic Simulator
Energy Science and Technology Software Center (ESTSC)
2013-10-03
The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models including several age and radiation aware devices. It supports a variety of computing platforms (both serial and parallel) computers. Lastly, it uses a variety of modern solution algorithms dynamic parallel load-balancing and iterative solvers.! ! Xyce is primarily used to simulate the voltage and current behavior of a circuitmore » network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits.! ! Kirchoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.« less
Xyce() Parallel Electronic Simulator
2013-10-03
The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models including several age and radiation aware devices. It supports a variety of computing platforms (both serial and parallel) computers. Lastly, it uses a variety of modern solution algorithms dynamic parallel load-balancing and iterative solvers.! ! Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits.! ! Kirchoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.
Parallelism extraction and program restructuring for parallel simulation of digital systems
Vellandi, B.L.
1990-01-01
Two topics currently of interest to the computer aided design (CADF) for the very-large-scale integrated circuit (VLSI) community are using the VHSIC Hardware Description Language (VHDL) effectively and decreasing simulation times of VLSI designs through parallel execution of the simulator. The goal of this research is to increase the degree of parallelism obtainable in VHDL simulation, and consequently to decrease simulation times. The research targets simulation on massively parallel architectures. Experimentation and instrumentation were done on the SIMD Connection Machine. The author discusses her method used to extract parallelism and restructure a VHDL program, experimental results using this method, and requirements for a parallel architecture for fast simulation.
Fracture simulations via massively parallel molecular dynamics
Holian, B.L.; Abraham, F.F.; Ravelo, R.
1993-09-01
Fracture simulations at the atomistic level have heretofore been carried out for relatively small systems of particles, typically 10,000 or less. In order to study anything approaching a macroscopic system, massively parallel molecular dynamics (MD) must be employed. In two spatial dimensions (2D), it is feasible to simulate a sample that is 0.1 {mu}m on a side. We report on recent MD simulations of mode I crack extension under tensile loading at high strain rates. The method of uniaxial, homogeneously expanding periodic boundary conditions was employed to represent tensile stress conditions near the crack tip. The effects of strain rate, temperature, material properties (equation of state and defect energies), and system size were examined. We found that, in order to mimic a bulk sample, several tricks (in addition to expansion boundary conditions) need to be employed: (1) the sample must be pre-strained to nearly the condition at which the crack will spontaneously open; (2) to relieve the stresses at free surfaces, such as the initial notch, annealing by kinetic-energy quenching must be carried out to prevent unwanted rarefactions; (3) sound waves emitted as the crack tip opens and dislocations emitted from the crack tip during blunting must be absorbed by special reservoir regions. The tricks described briefly in this paper will be especially important to carrying out feasible massively parallel 3D simulations via MD.
Parallel Strategies for Crash and Impact Simulations
Attaway, S.; Brown, K.; Hendrickson, B.; Plimpton, S.
1998-12-07
We describe a general strategy we have found effective for parallelizing solid mechanics simula- tions. Such simulations often have several computationally intensive parts, including finite element integration, detection of material contacts, and particle interaction if smoothed particle hydrody- namics is used to model highly deforming materials. The need to balance all of these computations simultaneously is a difficult challenge that has kept many commercial and government codes from being used effectively on parallel supercomputers with hundreds or thousands of processors. Our strategy is to load-balance each of the significant computations independently with whatever bal- ancing technique is most appropriate. The chief benefit is that each computation can be scalably paraIlelized. The drawback is the data exchange between processors and extra coding that must be written to maintain multiple decompositions in a single code. We discuss these trade-offs and give performance results showing this strategy has led to a parallel implementation of a widely-used solid mechanics code that can now be run efficiently on thousands of processors of the Pentium-based Sandia/Intel TFLOPS machine. We illustrate with several examples the kinds of high-resolution, million-element models that can now be simulated routinely. We also look to the future and dis- cuss what possibilities this new capabUity promises, as well as the new set of challenges it poses in material models, computational techniques, and computing infrastructure.
Massively Parallel Direct Simulation of Multiphase Flow
COOK,BENJAMIN K.; PREECE,DALE S.; WILLIAMS,J.R.
2000-08-10
The authors understanding of multiphase physics and the associated predictive capability for multi-phase systems are severely limited by current continuum modeling methods and experimental approaches. This research will deliver an unprecedented modeling capability to directly simulate three-dimensional multi-phase systems at the particle-scale. The model solves the fully coupled equations of motion governing the fluid phase and the individual particles comprising the solid phase using a newly discovered, highly efficient coupled numerical method based on the discrete-element method and the Lattice-Boltzmann method. A massively parallel implementation will enable the solution of large, physically realistic systems.
Optimal Parametric Discrete Event Control: Problem and Solution
Griffin, Christopher H
2008-01-01
We present a novel optimization problem for discrete event control, similar in spirit to the optimal parametric control problem common in statistical process control. In our problem, we assume a known finite state machine plant model $G$ defined over an event alphabet $\\Sigma$ so that the plant model language $L = \\LanM(G)$ is prefix closed. We further assume the existence of a \\textit{base control structure} $M_K$, which may be either a finite state machine or a deterministic pushdown machine. If $K = \\LanM(M_K)$, we assume $K$ is prefix closed and that $K \\subseteq L$. We associate each controllable transition of $M_K$ with a binary variable $X_1,\\dots,X_n$ indicating whether the transition is enabled or not. This leads to a function $M_K(X_1,\\dots,X_n)$, that returns a new control specification depending upon the values of $X_1,\\dots,X_n$. We exhibit a branch-and-bound algorithm to solve the optimization problem $\\min_{X_1,\\dots,X_n}\\max_{w \\in K} C(w)$ such that $M_K(X_1,\\dots,X_n) \\models \\Pi$ and $\\LanM(M_K(X_1,\\dots,X_n)) \\in \\Con(L)$. Here $\\Pi$ is a set of logical assertions on the structure of $M_K(X_1,\\dots,X_n)$, and $M_K(X_1,\\dots,X_n) \\models \\Pi$ indicates that $M_K(X_1,\\dots,X_n)$ satisfies the logical assertions; and, $\\Con(L)$ is the set of controllable sublanguages of $L$.
Improving the Teaching of Discrete-Event Control Systems Using a LEGO Manufacturing Prototype
ERIC Educational Resources Information Center
Sanchez, A.; Bucio, J.
2012-01-01
This paper discusses the usefulness of employing LEGO as a teaching-learning aid in a post-graduate-level first course on the control of discrete-event systems (DESs). The final assignment of the course is presented, which asks students to design and implement a modular hierarchical discrete-event supervisor for the coordination layer of a…
Improving the Teaching of Discrete-Event Control Systems Using a LEGO Manufacturing Prototype
ERIC Educational Resources Information Center
Sanchez, A.; Bucio, J.
2012-01-01
This paper discusses the usefulness of employing LEGO as a teaching-learning aid in a post-graduate-level first course on the control of discrete-event systems (DESs). The final assignment of the course is presented, which asks students to design and implement a modular hierarchical discrete-event supervisor for the coordination layer of a
Parallel Numerical Simulations of Water Reservoirs
NASA Astrophysics Data System (ADS)
Torres, Pedro; Mangiavacchi, Norberto
2010-11-01
The study of the water flow and scalar transport in water reservoirs is important for the determination of the water quality during the initial stages of the reservoir filling and during the life of the reservoir. For this scope, a parallel 2D finite element code for solving the incompressible Navier-Stokes equations coupled with scalar transport was implemented using the message-passing programming model, in order to perform simulations of hidropower water reservoirs in a computer cluster environment. The spatial discretization is based on the MINI element that satisfies the Babuska-Brezzi (BB) condition, which provides sufficient conditions for a stable mixed formulation. All the distributed data structures needed in the different stages of the code, such as preprocessing, solving and post processing, were implemented using the PETSc library. The resulting linear systems for the velocity and the pressure fields were solved using the projection method, implemented by an approximate block LU factorization. In order to increase the parallel performance in the solution of the linear systems, we employ the static condensation method for solving the intermediate velocity at vertex and centroid nodes separately. We compare performance results of the static condensation method with the approach of solving the complete system. In our tests the static condensation method shows better performance for large problems, at the cost of an increased memory usage. Performance results for other intensive parts of the code in a computer cluster are also presented.
Empirical study of parallel LRU simulation algorithms
NASA Technical Reports Server (NTRS)
Carr, Eric; Nicol, David M.
1994-01-01
This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithm are more complex, but have costs that are independent on the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithm implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
A polymorphic reconfigurable emulator for parallel simulation
NASA Technical Reports Server (NTRS)
Parrish, E. A., Jr.; Mcvey, E. S.; Cook, G.
1980-01-01
Microprocessor and arithmetic support chip technology was applied to the design of a reconfigurable emulator for real time flight simulation. The system developed consists of master control system to perform all man machine interactions and to configure the hardware to emulate a given aircraft, and numerous slave compute modules (SCM) which comprise the parallel computational units. It is shown that all parts of the state equations can be worked on simultaneously but that the algebraic equations cannot (unless they are slowly varying). Attempts to obtain algorithms that will allow parellel updates are reported. The word length and step size to be used in the SCM's is determined and the architecture of the hardware and software is described.
Parallel Proximity Detection for Computer Simulations
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)
1998-01-01
The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
Parallel Proximity Detection for Computer Simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)
1997-01-01
The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are includes by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
Parallel multiscale simulations of a brain aneurysm.
Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver εκ αr . The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers ( εκ αr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work. PMID:23734066
Parallel multiscale simulations of a brain aneurysm
NASA Astrophysics Data System (ADS)
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NɛκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NɛκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.
Parallel multiscale simulations of a brain aneurysm
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.
A parallel algorithm for implicit depletant simulations
NASA Astrophysics Data System (ADS)
Glaser, Jens; Karas, Andrew S.; Glotzer, Sharon C.
2015-11-01
We present an algorithm to simulate the many-body depletion interaction between anisotropic colloids in an implicit way, integrating out the degrees of freedom of the depletants, which we treat as an ideal gas. Because the depletant particles are statistically independent and the depletion interaction is short-ranged, depletants are randomly inserted in parallel into the excluded volume surrounding a single translated and/or rotated colloid. A configurational bias scheme is used to enhance the acceptance rate. The method is validated and benchmarked both on multi-core processors and graphics processing units for the case of hard spheres, hemispheres, and discoids. With depletants, we report novel cluster phases in which hemispheres first assemble into spheres, which then form ordered hcp/fcc lattices. The method is significantly faster than any method without cluster moves and that tracks depletants explicitly, for systems of colloid packing fraction ϕc < 0.50, and additionally enables simulation of the fluid-solid transition.
Parallelization of Rocket Engine Simulator Software (PRESS)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet
1997-01-01
Parallelization of Rocket Engine System Software (PRESS) project is part of a collaborative effort with Southern University at Baton Rouge (SUBR), University of West Florida (UWF), and Jackson State University (JSU). The second-year funding, which supports two graduate students enrolled in our new Master's program in Computer Science at Hampton University and the principal investigator, have been obtained for the period from October 19, 1996 through October 18, 1997. The key part of the interim report was new directions for the second year funding. This came about from discussions during Rocket Engine Numeric Simulator (RENS) project meeting in Pensacola on January 17-18, 1997. At that time, a software agreement between Hampton University and NASA Lewis Research Center had already been concluded. That agreement concerns off-NASA-site experimentation with PUMPDES/TURBDES software. Before this agreement, during the first year of the project, another large-scale FORTRAN-based software, Two-Dimensional Kinetics (TDK), was being used for translation to an object-oriented language and parallelization experiments. However, that package proved to be too complex and lacking sufficient documentation for effective translation effort to the object-oriented C + + source code. The focus, this time with better documented and more manageable PUMPDES/TURBDES package, was still on translation to C + + with design improvements. At the RENS Meeting, however, the new impetus for the RENS projects in general, and PRESS in particular, has shifted in two important ways. One was closer alignment with the work on Numerical Propulsion System Simulator (NPSS) through cooperation and collaboration with LERC ACLU organization. The other was to see whether and how NASA's various rocket design software can be run over local and intra nets without any radical efforts for redesign and translation into object-oriented source code. There were also suggestions that the Fortran based code be encapsulated in C + + code thereby facilitating reuse without undue development effort. The details are covered in the aforementioned section of the interim report filed on April 28, 1997.
Improving the performance of molecular dynamics simulations on parallel clusters.
Borstnik, Urban; Hodoscek, Milan; Janezic, Dusanka
2004-01-01
In this article a procedure is derived to obtain a performance gain for molecular dynamics (MD) simulations on existing parallel clusters. Parallel clusters use a wide array of interconnection technologies to connect multiple processors together, often at different speeds, such as multiple processor computers and networking. It is demonstrated how to configure existing programs for MD simulations to efficiently handle collective communication on parallel clusters with processor interconnections of different speeds. PMID:15032512
Partitioning strategies for parallel KIVA-4 engine simulations
Torres, D J; Kong, S C
2008-01-01
Parallel KIVA-4 is described and simulated in four different engine geometries. The Message Passing-Interface (MPl) was used to parallelize KIVA-4. Par itioning strategies ar accesed in light of the fact that cells can become deactivated and activated during the course of an engine simulation which will affect the load balance between processors.
Discrete event command and control for networked teams with multiple missions
NASA Astrophysics Data System (ADS)
Lewis, Frank L.; Hudas, Greg R.; Pang, Chee Khiang; Middleton, Matthew B.; McMurrough, Christopher
2009-05-01
During mission execution in military applications, the TRADOC Pamphlet 525-66 Battle Command and Battle Space Awareness capabilities prescribe expectations that networked teams will perform in a reliable manner under changing mission requirements, varying resource availability and reliability, and resource faults. In this paper, a Command and Control (C2) structure is presented that allows for computer-aided execution of the networked team decision-making process, control of force resources, shared resource dispatching, and adaptability to change based on battlefield conditions. A mathematically justified networked computing environment is provided called the Discrete Event Control (DEC) Framework. DEC has the ability to provide the logical connectivity among all team participants including mission planners, field commanders, war-fighters, and robotic platforms. The proposed data management tools are developed and demonstrated on a simulation study and an implementation on a distributed wireless sensor network. The results show that the tasks of multiple missions are correctly sequenced in real-time, and that shared resources are suitably assigned to competing tasks under dynamically changing conditions without conflicts and bottlenecks.
Parallel methods for dynamic simulation of multiple manipulator systems
NASA Technical Reports Server (NTRS)
Mcmillan, Scott; Sadayappan, P.; Orin, David E.
1993-01-01
In this paper, efficient dynamic simulation algorithms for a system of m manipulators, cooperating to manipulate a large load, are developed; their performance, using two possible forms of parallelism on a general-purpose parallel computer, is investigated. One form, temporal parallelism, is obtained with the use of parallel numerical integration methods. A speedup of 3.78 on four processors of CRAY Y-MP8 was achieved with a parallel four-point block predictor-corrector method for the simulation of a four manipulator system. These multi-point methods suffer from reduced accuracy, and when comparing these runs with a serial integration method, the speedup can be as low as 1.83 for simulations with the same accuracy. To regain the performance lost due to accuracy problems, a second form of parallelism is employed. Spatial parallelism allows most of the dynamics of each manipulator chain to be computed simultaneously. Used exclusively in the four processor case, this form of parallelism in conjunction with a serial integration method results in a speedup of 3.1 on four processors over the best serial method. In cases where there are either more processors available or fewer chains in the system, the multi-point parallel integration methods are still advantageous despite the reduced accuracy because both forms of parallelism can then combine to generate more parallel tasks and achieve greater effective speedups. This paper also includes results for these cases.
Aerodynamic simulation on massively parallel systems
NASA Technical Reports Server (NTRS)
Haeuser, Jochem; Simon, Horst D.
1992-01-01
This paper briefly addresses the computational requirements for the analysis of complete configurations of aircraft and spacecraft currently under design to be used for advanced transportation in commercial applications as well as in space flight. The discussion clearly shows that massively parallel systems are the only alternative which is both cost effective and on the other hand can provide the necessary TeraFlops, needed to satisfy the narrow design margins of modern vehicles. It is assumed that the solution of the governing physical equations, i.e., the Navier-Stokes equations which may be complemented by chemistry and turbulence models, is done on multiblock grids. This technique is situated between the fully structured approach of classical boundary fitted grids and the fully unstructured tetrahedra grids. A fully structured grid best represents the flow physics, while the unstructured grid gives best geometrical flexibility. The multiblock grid employed is structured within a block, but completely unstructured on the block level. While a completely unstructured grid is not straightforward to parallelize, the above mentioned multiblock grid is inherently parallel, in particular for multiple instruction multiple datastream (MIMD) machines. In this paper guidelines are provided for setting up or modifying an existing sequential code so that a direct parallelization on a massively parallel system is possible. Results are presented for three parallel systems, namely the Intel hypercube, the Ncube hypercube, and the FPS 500 system. Some preliminary results for an 8K CM2 machine will also be mentioned. The code run is the two dimensional grid generation module of Grid, which is a general two dimensional and three dimensional grid generation code for complex geometries. A system of nonlinear Poisson equations is solved. This code is also a good testcase for complex fluid dynamics codes, since the same datastructures are used. All systems provided good speedups, but message passing MIMD systems seem to be best suited for large miltiblock applications.
Parallel-Processing Test Bed For Simulation Software
NASA Technical Reports Server (NTRS)
Blech, Richard; Cole, Gary; Townsend, Scott
1996-01-01
Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).
Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors
NASA Technical Reports Server (NTRS)
Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.
1990-01-01
Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.
Parallel architecture for real-time simulation. Master's thesis
Cockrell, C.D.
1989-01-01
This thesis is concerned with the development of a very fast and highly efficient parallel computer architecture for real-time simulation of continuous systems. Currently, several parallel processing systems exist that may be capable of executing a complex simulation in real-time. These systems are examined and the pros and cons of each system discussed. The thesis then introduced a custom-designed parallel architecture based upon The University of Alabama's OPERA architecture. Each component of this system is discussed and rationale presented for its selection. The problem selected, real-time simulation of the Space Shuttle Main Engine for the test and evaluation of the proposed architecture, is explored, identifying the areas where parallelism can be exploited and parallel processing applied. Results from the test and evaluation phase are presented and compared with the results of the same problem that has been processed on a uniprocessor system.
Xyce parallel electronic simulator : users' guide. Version 5.1.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-11-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.
Xyce Parallel Electronic Simulator : users' guide, version 4.1.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-02-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.
Large nonadiabatic quantum molecular dynamics simulations on parallel computers
NASA Astrophysics Data System (ADS)
Shimojo, Fuyuki; Ohmura, Satoshi; Mou, Weiwei; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya
2013-01-01
We have implemented a quantum molecular dynamics simulation incorporating nonadiabatic electronic transitions on massively parallel computers to study photoexcitation dynamics of electrons and ions. The nonadiabatic quantum molecular dynamics (NAQMD) simulation is based on Casida's linear response time-dependent density functional theory to describe electronic excited states and Tully's fewest-switches surface hopping approach to describe nonadiabatic electron-ion dynamics. To enable large NAQMD simulations, a series of techniques are employed for efficiently calculating long-range exact exchange correction and excited-state forces. The simulation program is parallelized using hybrid spatial and band decomposition, and is tested for various materials.
A conservative approach to parallelizing the Sharks World simulation
NASA Technical Reports Server (NTRS)
Nicol, David M.; Riffe, Scott E.
1990-01-01
Parallelizing a benchmark problem for parallel simulation, the Sharks World, is described. The described solution is conservative, in the sense that no state information is saved, and no 'rollbacks' occur. The used approach illustrates both the principal advantage and principal disadvantage of conservative parallel simulation. The advantage is that by exploiting lookahead an approach was found that dramatically improves the serial execution time, and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.
Traffic simulations on parallel computers using domain decomposition techniques
Hanebutte, U.R.; Tentner, A.M.
1995-12-31
Large scale simulations of Intelligent Transportation Systems (ITS) can only be achieved by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic simulations with the standard simulation package TRAF-NETSIM on a 128 nodes IBM SPx parallel supercomputer as well as on a cluster of SUN workstations. Whilst this particular parallel implementation is based on NETSIM, a microscopic traffic simulation model, the presented strategy is applicable to a broad class of traffic simulations. An outer iteration loop must be introduced in order to converge to a global solution. A performance study that utilizes a scalable test network that consist of square-grids is presented, which addresses the performance penalty introduced by the additional iteration loop.
Parallel Signal Processing and System Simulation using aCe
NASA Technical Reports Server (NTRS)
Dorband, John E.; Aburdene, Maurice F.
2003-01-01
Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C based parallel language (ace C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of ace C and present a signal processing application (FFT).
The virtual marathon: parallel computing supports crowd simulations.
Yilmaz, Erdal; Isler, Veysi; Cetin, Yasemin Yardimci
2009-01-01
To be realistic, an urban model must include appropriate numbers of pedestrians, vehicles, and other dynamic entities. Using a parallel-computing architecture, researchers simulated a marathon with more than a million participants. To simulate participant behavior, they used fuzzy logic on a GPU to perform millions of inferences in real time. PMID:19798860
Parallelization of Rocket Engine Simulator Software (PRESS)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet
1998-01-01
We have outlined our work in the last half of the funding period. We have shown how a demo package for RESSAP using MPI can be done. However, we also mentioned the difficulties with the UNIX platform. We have reiterated some of the suggestions made during the presentation of the progress of the at Fourth Annual HBCU Conference. Although we have discussed, in some detail, how TURBDES/PUMPDES software can be run in parallel using MPI, at present, we are unable to experiment any further with either MPI or PVM. Due to X windows not being implemented, we are also not able to experiment further with XPVM, which it will be recalled, has a nice GUI interface. There are also some concerns, on our part, about MPI being an appropriate tool. The best thing about MPr is that it is public domain. Although and plenty of documentation exists for the intricacies of using MPI, little information is available on its actual implementations. Other than very typical, somewhat contrived examples, such as Jacobi algorithm for solving Laplace's equation, there are few examples which can readily be applied to real situations, such as in our case. In effect, the review of literature on both MPI and PVM, and there is a lot, indicate something similar to the enormous effort which was spent on LISP and LISP-like languages as tools for artificial intelligence research. During the development of a book on programming languages [12], when we searched the literature for very simple examples like taking averages, reading and writing records, multiplying matrices, etc., we could hardly find a any! Yet, so much was said and done on that topic in academic circles. It appears that we faced the same problem with MPI, where despite significant documentation, we could not find even a simple example which supports course-grain parallelism involving only a few processes. From the foregoing, it appears that a new direction may be required for more productive research during the extension period (10/19/98 - 10/18/99). At the least, the research would need to be done on Windows 95/Windows NT based platforms. Moreover, with the acquisition of Lahey Fortran package for PC platform, and the existing Borland C + + 5. 0, we can do work on C + + wrapper issues. We have carefully studied the blueprint for Space Transportation Propulsion Integrated Design Environment for the next 25 years [13] and found the inclusion of HBCUs in that effort encouraging. Especially in the long period for which a map is provided, there is no doubt that HBCUs will grow and become better equipped to do meaningful research. In the shorter period, as was suggested in our presentation at the HBCU conference, some key decisions regarding the aging Fortran based software for rocket propellants will need to be made. One important issue is whether or not object oriented languages such as C + + or Java should be used for distributed computing. Whether or not "distributed computing" is necessary for the existing software is yet another, larger, question to be tackled with.
Parallel Optimization with Large Eddy Simulations
NASA Astrophysics Data System (ADS)
Talnikar, Chaitanya; Blonigan, Patrick; Bodart, Julien; Wang, Qiqi; Alex Gorodetsky Collaboration; Jasper Snoek Collaboration
2014-11-01
For design optimization results to be useful, the model used must be trustworthy. For turbulent flows, Large Eddy Simulations (LES) can capture separation and other phenomena that traditional models such as RANS struggle with. However, optimization with LES can be challenging because of noisy objective function evaluations. This noise is a consequence of the sampling error of turbulent statistics, or long time averaged quantities of interest, such as the drag of an airfoil or heat transfer to a turbine blade. The sampling error causes the objective function to vary noisily with respect to design parameters for finite time simulations. Furthermore, the noise decays very slowly as computational time increases. Therefore, robustness with noisy objective functions is a crucial prerequisite to optimization candidates for LES. One way of dealing with noisy objective functions is to filter the noise using a surrogate model. Bayesian optimization, which uses Gaussian processes as surrogates, has shown promise in optimizing expensive objective functions. The following talk presents a new approach for optimization with LES incorporating these ideas. Applications to flow control of a turbulent channel and the design of a turbine blade trailing edge are also discussed.
Improved task scheduling for parallel simulations. Master's thesis
McNear, A.E.
1991-12-01
The objective of this investigation is to design, analyze, and validate the generation of optimal schedules for simulation systems. Improved performance in simulation execution times can greatly improve the return rate of information provided by such simulations resulting in reduced development costs of future computer/electronic systems. Optimal schedule generation of precedence-constrained task systems including iterative feedback systems such as VHDL or war gaming simulations for execution on a parallel computer is known to be N P-hard. Efficiently parallelizing such problems takes full advantage of present computer technology to achieve a significant reduction in the search times required. Unfortunately, the extreme combinatoric 'explosion' of possible task assignments to processors creates an exponential search space prohibitive on any computer for search algorithms which maintain more than one branch of the search graph at any one time. This work develops various parallel modified backtracking (MBT) search algorithms for execution on an iPSC/2 hypercube that bound the space requirements and produce an optimally minimum schedule with linear speed-up. The parallel MBT search algorithm is validated using various feedback task simulation systems which are scheduled for execution on an iPSC/2 hypercube. The search time, size of the enumerated search space, and communications overhead required to ensure efficient utilization during the parallel search process are analyzed. The various applications indicated appreciable improvement in performance using this method.
Parallel kinetic Monte Carlo simulation of Coulomb glasses
NASA Astrophysics Data System (ADS)
Ferrero, E. E.; Kolton, A. B.; Palassini, M.
2014-08-01
We develop a parallel rejection algorithm to tackle the problem of low acceptance in Monte Carlo methods, and apply it to the simulation of the hopping conduction in Coulomb glasses using Graphics Processing Units, for which we also parallelize the update of local energies. In two dimensions, our parallel code achieves speedups of up to two orders of magnitude in computing time over an equivalent serial code. We find numerical evidence of a scaling relation for the relaxation of the conductivity at different temperatures.
Efficient parallel simulation of CO2 geologic sequestration insaline aquifers
Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten
2007-01-01
An efficient parallel simulator for large-scale, long-termCO2 geologic sequestration in saline aquifers has been developed. Theparallel simulator is a three-dimensional, fully implicit model thatsolves large, sparse linear systems arising from discretization of thepartial differential equations for mass and energy balance in porous andfractured media. The simulator is based on the ECO2N module of the TOUGH2code and inherits all the process capabilities of the single-CPU TOUGH2code, including a comprehensive description of the thermodynamics andthermophysical properties of H2O-NaCl- CO2 mixtures, modeling singleand/or two-phase isothermal or non-isothermal flow processes, two-phasemixtures, fluid phases appearing or disappearing, as well as saltprecipitation or dissolution. The new parallel simulator uses MPI forparallel implementation, the METIS software package for simulation domainpartitioning, and the iterative parallel linear solver package Aztec forsolving linear equations by multiple processors. In addition, theparallel simulator has been implemented with an efficient communicationscheme. Test examples show that a linear or super-linear speedup can beobtained on Linux clusters as well as on supercomputers. Because of thesignificant improvement in both simulation time and memory requirement,the new simulator provides a powerful tool for tackling larger scale andmore complex problems than can be solved by single-CPU codes. Ahigh-resolution simulation example is presented that models buoyantconvection, induced by a small increase in brine density caused bydissolution of CO2.
Xyce Parallel Electronic Simulator : users' guide, version 2.0.
Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.
2004-06-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability the current state-of-the-art in the following areas: {sm_bullet} Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. {sm_bullet} Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. {sm_bullet} Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. {sm_bullet} A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). {sm_bullet} Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing of computing platforms. These include serial, shared-memory and distributed-memory parallel implementation - which allows it to run efficiently on the widest possible number parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce These input formats include standard analytical models, behavioral models look-up Parallel Electronic Simulator is designed to support a variety of device model inputs. tables, and mesh-level PDE device models. Combined with this flexible interface is an architectural design that greatly simplifies the addition of circuit models. One of the most important feature of Xyce is in providing a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia now has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods) research and development can be performed. Ultimately, these capabilities are migrated to end users.
A hybrid parallel framework for the cellular Potts model simulations
Jiang, Yi; He, Kejing; Dong, Shoubin
2009-01-01
The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which can't be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming POE solving, cell division, and cell reaction operation are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the POE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied the avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation ({approx}10{sup 8} sites) of complex collective behavior of numerous cells ({approx}10{sup 6}).
On the hierarchical parallelization of ab initio simulations
NASA Astrophysics Data System (ADS)
Ruiz-Barragan, Sergi; Ishimura, Kazuya; Shiga, Motoyuki
2016-02-01
A hierarchical parallelization has been implemented in a new unified code PIMD-SMASH for ab initio simulation where the replicas and the Born-Oppenheimer forces are parallelized. It is demonstrated that ab initio path integral molecular dynamics simulations can be carried out very efficiently for systems up to a few tens of water molecules. The code was then used to study a Diels-Alder reaction of cyclopentadiene and butenone by ab initio string method. A reduction in the reaction energy barrier is found in the presence of hydrogen-bonded water, in accordance with experiment.
Parallel runway requirement analysis study. Volume 2: Simulation manual
NASA Technical Reports Server (NTRS)
Ebrahimi, Yaghoob S.; Chun, Ken S.
1993-01-01
This document is a user manual for operating the PLAND_BLUNDER (PLB) simulation program. This simulation is based on two aircraft approaching parallel runways independently and using parallel Instrument Landing System (ILS) equipment during Instrument Meteorological Conditions (IMC). If an aircraft should deviate from its assigned localizer course toward the opposite runway, this constitutes a blunder which could endanger the aircraft on the adjacent path. The worst case scenario would be if the blundering aircraft were unable to recover and continue toward the adjacent runway. PLAND_BLUNDER is a Monte Carlo-type simulation which employs the events and aircraft positioning during such a blunder situation. The model simulates two aircraft performing parallel ILS approaches using Instrument Flight Rules (IFR) or visual procedures. PLB uses a simple movement model and control law in three dimensions (X, Y, Z). The parameters of the simulation inputs and outputs are defined in this document along with a sample of the statistical analysis. This document is the second volume of a two volume set. Volume 1 is a description of the application of the PLB to the analysis of close parallel runway operations.
Parallelization of a Monte Carlo particle transport simulation code
NASA Astrophysics Data System (ADS)
Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.
2010-05-01
We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.
A network of discrete events for the representation and analysis of diffusion dynamics.
Pintus, Alberto M; Pazzona, Federico G; Demontis, Pierfranco; Suffritti, Giuseppe B
2015-11-14
We developed a coarse-grained description of the phenomenology of diffusive processes, in terms of a space of discrete events and its representation as a network. Once a proper classification of the discrete events underlying the diffusive process is carried out, their transition matrix is calculated on the basis of molecular dynamics data. This matrix can be represented as a directed, weighted network where nodes represent discrete events, and the weight of edges is given by the probability that one follows the other. The structure of this network reflects dynamical properties of the process of interest in such features as its modularity and the entropy rate of nodes. As an example of the applicability of this conceptual framework, we discuss here the physics of diffusion of small non-polar molecules in a microporous material, in terms of the structure of the corresponding network of events, and explain on this basis the diffusivity trends observed. A quantitative account of these trends is obtained by considering the contribution of the various events to the displacement autocorrelation function. PMID:26567654
A network of discrete events for the representation and analysis of diffusion dynamics
NASA Astrophysics Data System (ADS)
Pintus, Alberto M.; Pazzona, Federico G.; Demontis, Pierfranco; Suffritti, Giuseppe B.
2015-11-01
We developed a coarse-grained description of the phenomenology of diffusive processes, in terms of a space of discrete events and its representation as a network. Once a proper classification of the discrete events underlying the diffusive process is carried out, their transition matrix is calculated on the basis of molecular dynamics data. This matrix can be represented as a directed, weighted network where nodes represent discrete events, and the weight of edges is given by the probability that one follows the other. The structure of this network reflects dynamical properties of the process of interest in such features as its modularity and the entropy rate of nodes. As an example of the applicability of this conceptual framework, we discuss here the physics of diffusion of small non-polar molecules in a microporous material, in terms of the structure of the corresponding network of events, and explain on this basis the diffusivity trends observed. A quantitative account of these trends is obtained by considering the contribution of the various events to the displacement autocorrelation function.
Efficient parallel CFD-DEM simulations using OpenMP
NASA Astrophysics Data System (ADS)
Amritkar, Amit; Deb, Surya; Tafti, Danesh
2014-01-01
The paper describes parallelization strategies for the Discrete Element Method (DEM) used for simulating dense particulate systems coupled to Computational Fluid Dynamics (CFD). While the field equations of CFD are best parallelized by spatial domain decomposition techniques, the N-body particulate phase is best parallelized over the number of particles. When the two are coupled together, both modes are needed for efficient parallelization. It is shown that under these requirements, OpenMP thread based parallelization has advantages over MPI processes. Two representative examples, fairly typical of dense fluid-particulate systems are investigated, including the validation of the DEM-CFD and thermal-DEM implementation with experiments. Fluidized bed calculations are performed on beds with uniform particle loading, parallelized with MPI and OpenMP. It is shown that as the number of processing cores and the number of particles increase, the communication overhead of building ghost particle lists at processor boundaries dominates time to solution, and OpenMP which does not require this step is about twice as fast as MPI. In rotary kiln heat transfer calculations, which are characterized by spatially non-uniform particle distributions, the low overhead of switching the parallelization mode in OpenMP eliminates the load imbalances, but introduces increased overheads in fetching non-local data. In spite of this, it is shown that OpenMP is between 50-90% faster than MPI.
Dynamics and optimal number of replicas in parallel tempering simulations
NASA Astrophysics Data System (ADS)
Nadler, Walter; Hansmann, Ulrich H. E.
2007-12-01
We study the dynamics of parallel tempering simulations, also known as the replica exchange technique, which has become the method of choice for simulation of proteins and other complex systems. Recent results for the optimal choice of the control parameter discretization allow a treatment independent of the system in question. By analyzing mean first passage times across the control parameter space, we find an expression for the optimal number of replicas in simulations covering a given temperature range. Our results suggest a particular protocol to optimize the number of replicas in actual simulations.
Xyce Parallel Electronic Simulator : reference guide, version 4.1.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-02-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.
Xyce parallel electronic simulator reference guide, version 6.0.
Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David G.
2013-08-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1].
Xyce parallel electronic simulator reference guide, version 6.1
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-03-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1] .
Parallelization Strategies for Large Particle Simulations in Astrophysics
NASA Astrophysics Data System (ADS)
Pattabiraman, Bharath
The modeling of collisional N-body stellar systems is a topic of great current interest in several branches of astrophysics and cosmology. These systems are dominated by the physics of relaxation, the collective effect of many weak, random gravitational encounters between stars. They connect directly to our understanding of star clusters, and to the formation of exotic objects such as X-ray binaries, pulsars, and massive black holes. As a prototypical multi-physics, multi-scale problem, the numerical simulation of such systems is computationally intensive, and can only be achieved through high-performance computing. The goal of this thesis is to present parallelization and optimization strategies that can be used to develop efficient computational tools for simulating collisional N-body systems. This leads to major advances: 1) From an astrophysics perspective, these tools enable the study of new physical regimes out of reach by previous simulations. They also lead to much more complete parameter space exploration, allowing direct comparison of numerical results to observational data. 2) On the high-performance computing front, efficient parallelization of a multi-component application requires the meticulous redesign of the various components, as well as innovative parallelization techniques. Many of the challenges faced in this process lie at the very heart of high-performance computing research, including achieving optimal load balancing, maximizing utilization of computational resources, and making effective use of different parallel platforms. For modeling collisional N-body systems, a Monte Carlo approach provides ideal balance between speed and accuracy, as opposed to the more accurate but less scalable direct N-body method. We describe the development of a new version of the Cluster Monte Carlo (CMC) code capable of simulating systems with a realistic number of stars, while accounting for all important physical processes. This efficient and scalable parallel version of CMC runs on both GPUs and distributed-memory architectures. We introduce various parallelization and optimization strategies that include the use of best-suited data structures, adaptive data partitioning schemes, parallel random number generation, parallel I/O, and optimized parallel algorithms, resulting in a very desirable scalability of the run-time with the processor number.
The parallel subdomain-levelset deflation method in reservoir simulation
NASA Astrophysics Data System (ADS)
van der Linden, J. H.; Jönsthövel, T. B.; Lukyanov, A. A.; Vuik, C.
2016-01-01
Extreme and isolated eigenvalues are known to be harmful to the convergence of an iterative solver. These eigenvalues can be produced by strong heterogeneity in the underlying physics. We can improve the quality of the spectrum by 'deflating' the harmful eigenvalues. In this work, deflation is applied to linear systems in reservoir simulation. In particular, large, sudden differences in the permeability produce extreme eigenvalues. The number and magnitude of these eigenvalues is linked to the number and magnitude of the permeability jumps. Two deflation methods are discussed. Firstly, we state that harmonic Ritz eigenvector deflation, which computes the deflation vectors from the information produced by the linear solver, is unfeasible in modern reservoir simulation due to high costs and lack of parallelism. Secondly, we test a physics-based subdomain-levelset deflation algorithm that constructs the deflation vectors a priori. Numerical experiments show that both methods can improve the performance of the linear solver. We highlight the fact that subdomain-levelset deflation is particularly suitable for a parallel implementation. For cases with well-defined permeability jumps of a factor 104 or higher, parallel physics-based deflation has potential in commercial applications. In particular, the good scalability of parallel subdomain-levelset deflation combined with the robust parallel preconditioner for deflated system suggests the use of this method as an alternative for AMG.
Random number generators for massively parallel simulations on GPU
NASA Astrophysics Data System (ADS)
Manssen, M.; Weigel, M.; Hartmann, A. K.
2012-08-01
High-performance streams of (pseudo) random numbers are crucial for the efficient implementation of countless stochastic algorithms, most importantly, Monte Carlo simulations and molecular dynamics simulations with stochastic thermostats. A number of implementations of random number generators has been discussed for GPU platforms before and some generators are even included in the CUDA supporting libraries. Nevertheless, not all of these generators are well suited for highly parallel applications where each thread requires its own generator instance. For this specific situation encountered, for instance, in simulations of lattice models, most of the high-quality generators with large states such as Mersenne twister cannot be used efficiently without substantial changes. We provide a broad review of existing CUDA variants of random-number generators and present the CUDA implementation of a new massively parallel high-quality, high-performance generator with a small memory load overhead.
PRATHAM: Parallel Thermal Hydraulics Simulations using Advanced Mesoscopic Methods
Joshi, Abhijit S; Jain, Prashant K; Mudrich, Jaime A; Popov, Emilian L
2012-01-01
At the Oak Ridge National Laboratory, efforts are under way to develop a 3D, parallel LBM code called PRATHAM (PaRAllel Thermal Hydraulic simulations using Advanced Mesoscopic Methods) to demonstrate the accuracy and scalability of LBM for turbulent flow simulations in nuclear applications. The code has been developed using FORTRAN-90, and parallelized using the message passing interface MPI library. Silo library is used to compact and write the data files, and VisIt visualization software is used to post-process the simulation data in parallel. Both the single relaxation time (SRT) and multi relaxation time (MRT) LBM schemes have been implemented in PRATHAM. To capture turbulence without prohibitively increasing the grid resolution requirements, an LES approach [5] is adopted allowing large scale eddies to be numerically resolved while modeling the smaller (subgrid) eddies. In this work, a Smagorinsky model has been used, which modifies the fluid viscosity by an additional eddy viscosity depending on the magnitude of the rate-of-strain tensor. In LBM, this is achieved by locally varying the relaxation time of the fluid.
Parallelization of Program to Optimize Simulated Trajectories (POST3D)
NASA Technical Reports Server (NTRS)
Hammond, Dana P.; Korte, John J. (Technical Monitor)
2001-01-01
This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process, dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) on a distributed memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.
Reusable component model development approach for parallel and distributed simulation.
Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng
2014-01-01
Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have diversiform interfaces, couple tightly, and bind with simulation platforms closely. As a result, they are difficult to be reused across different simulation platforms and applications. To address the problem, this paper first proposed a reusable component model framework. Based on this framework, then our reusable model development approach is elaborated, which contains two phases: (1) domain experts create simulation computational modules observing three principles to achieve their independence; (2) model developer encapsulates these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that the model developed using our approach has good reusability and it is easy to be used in different simulation platforms and applications. PMID:24729751
Reusable Component Model Development Approach for Parallel and Distributed Simulation
Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng
2014-01-01
Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have diversiform interfaces, couple tightly, and bind with simulation platforms closely. As a result, they are difficult to be reused across different simulation platforms and applications. To address the problem, this paper first proposed a reusable component model framework. Based on this framework, then our reusable model development approach is elaborated, which contains two phases: (1) domain experts create simulation computational modules observing three principles to achieve their independence; (2) model developer encapsulates these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that the model developed using our approach has good reusability and it is easy to be used in different simulation platforms and applications. PMID:24729751
Potts-model grain growth simulations: Parallel algorithms and applications
Wright, S.A.; Plimpton, S.J.; Swiler, T.P.
1997-08-01
Microstructural morphology and grain boundary properties often control the service properties of engineered materials. This report uses the Potts-model to simulate the development of microstructures in realistic materials. Three areas of microstructural morphology simulations were studied. They include the development of massively parallel algorithms for Potts-model grain grow simulations, modeling of mass transport via diffusion in these simulated microstructures, and the development of a gradient-dependent Hamiltonian to simulate columnar grain growth. Potts grain growth models for massively parallel supercomputers were developed for the conventional Potts-model in both two and three dimensions. Simulations using these parallel codes showed self similar grain growth and no finite size effects for previously unapproachable large scale problems. In addition, new enhancements to the conventional Metropolis algorithm used in the Potts-model were developed to accelerate the calculations. These techniques enable both the sequential and parallel algorithms to run faster and use essentially an infinite number of grain orientation values to avoid non-physical grain coalescence events. Mass transport phenomena in polycrystalline materials were studied in two dimensions using numerical diffusion techniques on microstructures generated using the Potts-model. The results of the mass transport modeling showed excellent quantitative agreement with one dimensional diffusion problems, however the results also suggest that transient multi-dimension diffusion effects cannot be parameterized as the product of the grain boundary diffusion coefficient and the grain boundary width. Instead, both properties are required. Gradient-dependent grain growth mechanisms were included in the Potts-model by adding an extra term to the Hamiltonian. Under normal grain growth, the primary driving term is the curvature of the grain boundary, which is included in the standard Potts-model Hamiltonian.
Xyce Parallel Electronic Simulator Users Guide Version 6.2.
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory
2014-09-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been de- signed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel com- puting platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright c 2002-2014 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright c 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce 's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce%40sandia.gov (outside Sandia) xyce-sandia%40sandia.gov (Sandia only)
Noise simulation in cone beam CT imaging with parallel computing
Tu, Shu-Ju; Shaw, Chris C; Chen, Lingyun
2007-01-01
We developed a computer noise simulation model for cone beam computed tomography imaging using a general purpose PC cluster. This model uses a mono-energetic x-ray approximation and allows us to investigate three primary performance components, specifically quantum noise, detector blurring and additive system noise. A parallel random number generator based on the Weyl sequence was implemented in the noise simulation and a visualization technique was accordingly developed to validate the quality of the parallel random number generator. In our computer simulation model, three-dimensional (3D) phantoms were mathematically modelled and used to create 450 analytical projections, which were then sampled into digital image data. Quantum noise was simulated and added to the analytical projection image data, which were then filtered to incorporate flat panel detector blurring. Additive system noise was generated and added to form the final projection images. The Feldkamp algorithm was implemented and used to reconstruct the 3D images of the phantoms. A 24 dual-Xeon PC cluster was used to compute the projections and reconstructed images in parallel with each CPU processing 10 projection views for a total of 450 views. Based on this computer simulation system, simulated cone beam CT images were generated for various phantoms and technique settings. Noise power spectra for the flat panel x-ray detector and reconstructed images were then computed to characterize the noise properties. As an example among the potential applications of our noise simulation model, we showed that images of low contrast objects can be produced and used for image quality evaluation. PMID:16481694
Niehof, Jonathan T.; Morley, Steven K.
2012-01-01
We review and develop techniques to determine associations between series of discrete events. The bootstrap, a nonparametric statistical method, allows the determination of the significance of associations with minimal assumptions about the underlying processes. We find the key requirement for this method: one of the series must be widely spaced in time to guarantee the theoretical applicability of the bootstrap. If this condition is met, the calculated significance passes a reasonableness test. We conclude with some potential future extensions and caveats on the applicability of these methods. The techniques presented have been implemented in a Python-based software toolkit.
Sequential Window Diagnoser for Discrete-Event Systems Under Unreliable Observations
Wen-Chiao Lin; Humberto E. Garcia; David Thorsley; Tae-Sic Yoo
2009-09-01
This paper addresses the issue of counting the occurrence of special events in the framework of partiallyobserved discrete-event dynamical systems (DEDS). Developed diagnosers referred to as sequential window diagnosers (SWDs) utilize the stochastic diagnoser probability transition matrices developed in [9] along with a resetting mechanism that allows on-line monitoring of special event occurrences. To illustrate their performance, the SWDs are applied to detect and count the occurrence of special events in a particular DEDS. Results show that SWDs are able to accurately track the number of times special events occur.
Supervisor Localization: A Top-Down Approach to Distributed Control of Discrete-Event Systems
NASA Astrophysics Data System (ADS)
Cai, K.; Wonham, W. M.
2009-03-01
A purely distributed control paradigm is proposed for discrete-event systems (DES). In contrast to control by one or more external supervisors, distributed control aims to design built-in strategies for individual agents. First a distributed optimal nonblocking control problem is formulated. To solve it, a top-down localization procedure is developed which systematically decomposes an external supervisor into local controllers while preserving optimality and nonblockingness. An efficient localization algorithm is provided to carry out the computation, and an automated guided vehicles (AGV) example presented for illustration. Finally, the 'easiest' and 'hardest' boundary cases of localization are discussed.
NASA Technical Reports Server (NTRS)
Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad
1994-01-01
The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent in three-dimensional boundary-layer flows are computed with the PS-DNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.
Parallel algorithms for simulating continuous time Markov chains
NASA Technical Reports Server (NTRS)
Nicol, David M.; Heidelberger, Philip
1992-01-01
We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
Time parallelization of advanced operation scenario simulations of ITER plasma
Samaddar, D.; Casper, T. A.; Kim, S. H.; Berry, Lee A; Elwasif, Wael R; Batchelor, Donald B; Houlberg, Wayne A
2013-01-01
This work demonstrates that simulations of advanced burning plasma operation scenarios can be successfully parallelized in time using the parareal algorithm. CORSICA - an advanced operation scenario code for tokamak plasmas is used as a test case. This is a unique application since the parareal algorithm has so far been applied to relatively much simpler systems except for the case of turbulence. In the present application, a computational gain of an order of magnitude has been achieved which is extremely promising. A successful implementation of the Parareal algorithm to codes like CORSICA ushers in the possibility of time efficient simulations of ITER plasmas.
Xyce Parallel Electronic Simulator - Users' Guide Version 2.1.
Hutchinson, Scott A; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric; Pawlowski, Roger P.; Fixel, Deborah A; Schiek, Richard; Bogdan, Carolyn W.; Shirley, David N.; Campbell, Phillip M.; Keiter, Eric R.
2005-06-01
This manual describes the use of theXyceParallel Electronic Simulator.Xycehasbeen designed as a SPICE-compatible, high-performance analog circuit simulator, andhas been written to support the simulation needs of the Sandia National Laboratorieselectrical designers. This development has focused on improving capability over thecurrent state-of-the-art in the following areas:%04Capability to solve extremely large circuit problems by supporting large-scale par-allel computing platforms (up to thousands of processors). Note that this includessupport for most popular parallel and serial computers.%04Improved performance for all numerical kernels (e.g., time integrator, nonlinearand linear solvers) through state-of-the-art algorithms and novel techniques.%04Device models which are specifically tailored to meet Sandia's needs, includingmany radiation-aware devices.3 XyceTMUsers' Guide%04Object-oriented code design and implementation using modern coding practicesthat ensure that theXyceParallel Electronic Simulator will be maintainable andextensible far into the future.Xyceis a parallel code in the most general sense of the phrase - a message passingparallel implementation - which allows it to run efficiently on the widest possible numberof computing platforms. These include serial, shared-memory and distributed-memoryparallel as well as heterogeneous platforms. Careful attention has been paid to thespecific nature of circuit-simulation problems to ensure that optimal parallel efficiencyis achieved as the number of processors grows.The development ofXyceprovides a platform for computational research and de-velopment aimed specifically at the needs of the Laboratory. WithXyce, Sandia hasan %22in-house%22 capability with which both new electrical (e.g., device model develop-ment) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms)research and development can be performed. As a result,Xyceis a unique electricalsimulation capability, designed to meet the unique needs of the laboratory.4 XyceTMUsers' GuideAcknowledgementsThe authors would like to acknowledge the entire Sandia National Laboratories HPEMS(High Performance Electrical Modeling and Simulation) team, including Steve Wix, CarolynBogdan, Regina Schells, Ken Marx, Steve Brandon and Bill Ballard, for their support onthis project. We also appreciate very much the work of Jim Emery, Becky Arnold and MikeWilliamson for the help in reviewing this document.Lastly, a very special thanks to Hue Lai for typesetting this document with LATEX.TrademarksThe information herein is subject to change without notice.Copyrightc 2002-2003 Sandia Corporation. All rights reserved.XyceTMElectronic Simulator andXyceTMtrademarks of Sandia Corporation.Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence DesignSystems, Inc.Silicon Graphics, the Silicon Graphics logo and IRIX are registered trademarks of SiliconGraphics, Inc.Microsoft, Windows and Windows 2000 are registered trademark of Microsoft Corporation.Solaris and UltraSPARC are registered trademarks of Sun Microsystems Corporation.Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation.HP and Alpha are registered trademarks of Hewlett-Packard company.Amtec and TecPlot are trademarks of Amtec Engineering, Inc.Xyce's expression library is based on that inside Spice 3F5 developed by the EECS De-partment at the University of California.All other trademarks are property of their respective owners.ContactsBug Reportshttp://tvrusso.sandia.gov/bugzillaEmailxyce-support%40sandia.govWorld Wide Webhttp://www.cs.sandia.gov/xyce5 XyceTMUsers' GuideThis page is left intentionally blank6
Dynamic Load Balancing Strategies for Parallel Reacting Flow Simulations
NASA Astrophysics Data System (ADS)
Pisciuneri, Patrick; Meneses, Esteban; Givi, Peyman
2014-11-01
Load balancing in parallel computing aims at distributing the work as evenly as possible among the processors. This is a critical issue in the performance of parallel, time accurate, flow simulators. The constraint of time accuracy requires that all processes must be finished with their calculation for a given time step before any process can begin calculation of the next time step. Thus, an irregularly balanced compute load will result in idle time for many processes for each iteration and thus increased walltimes for calculations. Two existing, dynamic load balancing approaches are applied to the simplified case of a partially stirred reactor for methane combustion. The first is Zoltan, a parallel partitioning, load balancing, and data management library developed at the Sandia National Laboratories. The second is Charm++, which is its own machine independent parallel programming system developed at the University of Illinois at Urbana-Champaign. The performance of these two approaches is compared, and the prospects for their application to full 3D, reacting flow solvers is assessed.
Particle simulation of plasmas on the massively parallel processor
NASA Technical Reports Server (NTRS)
Gledhill, I. M. A.; Storey, L. R. O.
1987-01-01
Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.
A massively parallel cellular automaton for the simulation of recrystallization
NASA Astrophysics Data System (ADS)
Kühbach, M.; Barrales-Mora, L. A.; Gottstein, G.
2014-10-01
A new implementation of a cellular automaton for the simulation of primary recrystallization in 3D space is presented. In this new approach, a parallel computer architecture is utilized to partition the simulation domain into multiple computational subdomains that can be treated as coupled, gradually coupled or decoupled entities. This enabled us to identify the characteristic growth length associated with the space repartitioning during nucleus growth. In doing so, several communication strategies between the simulation domains were implemented and tested for accuracy and parallel performance. Specifically, the model was applied to investigate the effect of a gradual spatial decoupling on microstructure evolution during oriented growth of random texture components into a deformed Al single crystal. For a domain discretized into one billion cells, it was found that a particular decoupling strategy resulted in faster executions of about two orders of magnitude and highly accurate simulations. Further partition of the domain into isolated entities systematically and negatively impacts microstructure evolution. We investigated this effect quantitatively by geometrical considerations.
Massively Parallel Processing for Fast and Accurate Stamping Simulations
NASA Astrophysics Data System (ADS)
Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu
2005-08-01
The competitive automotive market drives automotive manufacturers to speed up the vehicle development cycles and reduce the lead-time. Fast tooling development is one of the key areas to support fast and short vehicle development programs (VDP). In the past ten years, the stamping simulation has become the most effective validation tool in predicting and resolving all potential formability and quality problems before the dies are physically made. The stamping simulation and formability analysis has become an critical business segment in GM math-based die engineering process. As the simulation becomes as one of the major production tools in engineering factory, the simulation speed and accuracy are the two of the most important measures for stamping simulation technology. The speed and time-in-system of forming analysis becomes an even more critical to support the fast VDP and tooling readiness. Since 1997, General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of simulation software for mass production analysis applications. By 2001, this technology was matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DM0P/MPP technology as well as performance benchmarks are discussed in this publication.
Long-range interactions and parallel scalability in molecular simulations
NASA Astrophysics Data System (ADS)
Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko
2007-01-01
Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single processor and parallel performance up to 8 nodes—we have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e. communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.
Mapping a battlefield simulation onto message-passing parallel architectures
NASA Technical Reports Server (NTRS)
Nicol, David M.
1987-01-01
Perhaps the most critical problem in distributed simulation is that of mapping: without an effective mapping of workload to processors the speedup potential of parallel processing cannot be realized. Mapping a simulation onto a message-passing architecture is especially difficult when the computational workload dynamically changes as a function of time and space; this is exactly the situation faced by battlefield simulations. This paper studies an approach where the simulated battlefield domain is first partitioned into many regions of equal size; typically there are more regions than processors. The regions are then assigned to processors; a processor is responsible for performing all simulation activity associated with the regions. The assignment algorithm is quite simple and attempts to balance load by exploiting locality of workload intensity. The performance of this technique is studied on a simple battlefield simulation implemented on the Flex/32 multiprocessor. Measurements show that the proposed method achieves reasonable processor efficiencies. Furthermore, the method shows promise for use in dynamic remapping of the simulation.
Conservative parallel simulation of priority class queueing networks
NASA Technical Reports Server (NTRS)
Nicol, David
1992-01-01
A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. For, a job in service having low priority can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol was analyzed and it was demonstrated that good performance can be expected on the simulation of large queueing networks.
Conservative parallel simulation of priority class queueing networks
NASA Technical Reports Server (NTRS)
Nicol, David M.
1990-01-01
A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. For, a job in service having low priority can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol was analyzed and it was demonstrated that good performance can be expected on the simulation of large queueing networks.
Xyce Parallel Electronic Simulator Users Guide Version 6.4
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory
2015-12-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been de- signed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel com- puting platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation- aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright c 2002-2015 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright c 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce 's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)
Development of magnetron sputtering simulator with GPU parallel computing
NASA Astrophysics Data System (ADS)
Sohn, Ilyoup; Kim, Jihun; Bae, Junkyeong; Lee, Jinpil
2014-12-01
Sputtering devices are widely used in the semiconductor and display panel manufacturing process. Currently, a number of surface treatment applications using magnetron sputtering techniques are being used to improve the efficiency of the sputtering process, through the installation of magnets outside the vacuum chamber. Within the internal space of the low pressure chamber, plasma generated from the combination of a rarefied gas and an electric field is influenced interactively. Since the quality of the sputtering and deposition rate on the substrate is strongly dependent on the multi-physical phenomena of the plasma regime, numerical simulations using PIC-MCC (Particle In Cell, Monte Carlo Collision) should be employed to develop an efficient sputtering device. In this paper, the development of a magnetron sputtering simulator based on the PIC-MCC method and the associated numerical techniques are discussed. To solve the electric field equations in the 2-D Cartesian domain, a Poisson equation solver based on the FDM (Finite Differencing Method) is developed and coupled with the Monte Carlo Collision method to simulate the motion of gas particles influenced by an electric field. The magnetic field created from the permanent magnet installed outside the vacuum chamber is also numerically calculated using Biot-Savart's Law. All numerical methods employed in the present PIC code are validated by comparison with analytical and well-known commercial engineering software results, with all of the results showing good agreement. Finally, the developed PIC-MCC code is parallelized to be suitable for general purpose computing on graphics processing unit (GPGPU) acceleration, so as to reduce the large computation time which is generally required for particle simulations. The efficiency and accuracy of the GPGPU parallelized magnetron sputtering simulator are examined by comparison with the calculated results and computation times from the original serial code. It is found that initially both simulations are in good agreement; however, differences develop over time due to statistical noise in the PIC-MCC GPGPU model.
Numerical Simulation of Flow Field Within Parallel Plate Plastometer
NASA Technical Reports Server (NTRS)
Antar, Basil N.
2002-01-01
Parallel Plate Plastometer (PPP) is a device commonly used for measuring the viscosity of high polymers at low rates of shear in the range 10(exp 4) to 10(exp 9) poises. This device is being validated for use in measuring the viscosity of liquid glasses at high temperatures having similar ranges for the viscosity values. PPP instrument consists of two similar parallel plates, both in the range of 1 inch in diameter with the upper plate being movable while the lower one is kept stationary. Load is applied to the upper plate by means of a beam connected to shaft attached to the upper plate. The viscosity of the fluid is deduced from measuring the variation of the plate separation, h, as a function of time when a specified fixed load is applied on the beam. Operating plate speeds measured with the PPP is usually in the range of 10.3 cm/s or lower. The flow field within the PPP can be simulated using the equations of motion of fluid flow for this configuration. With flow speeds in the range quoted above the flow field between the two plates is certainly incompressible and laminar. Such flows can be easily simulated using numerical modeling with computational fluid dynamics (CFD) codes. We present below the mathematical model used to simulate this flow field and also the solutions obtained for the flow using a commercially available finite element CFD code.
High Performance Parallel Methods for Space Weather Simulations
NASA Technical Reports Server (NTRS)
Hunter, Paul (Technical Monitor); Gombosi, Tamas I.
2003-01-01
This is the final report of our NASA AISRP grant entitled 'High Performance Parallel Methods for Space Weather Simulations'. The main thrust of the proposal was to achieve significant progress towards new high-performance methods which would greatly accelerate global MHD simulations and eventually make it possible to develop first-principles based space weather simulations which run much faster than real time. We are pleased to report that with the help of this award we made major progress in this direction and developed the first parallel implicit global MHD code with adaptive mesh refinement. The main limitation of all earlier global space physics MHD codes was the explicit time stepping algorithm. Explicit time steps are limited by the Courant-Friedrichs-Lewy (CFL) condition, which essentially ensures that no information travels more than a cell size during a time step. This condition represents a non-linear penalty for highly resolved calculations, since finer grid resolution (and consequently smaller computational cells) not only results in more computational cells, but also in smaller time steps.
CHOLLA: A New Massively Parallel Hydrodynamics Code for Astrophysical Simulation
NASA Astrophysics Data System (ADS)
Schneider, Evan E.; Robertson, Brant E.
2015-04-01
We present Computational Hydrodynamics On ParaLLel Architectures (Cholla ), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳2563) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.
Massively parallel algorithms for trace-driven cache simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.
1991-01-01
Trace driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t(exp th) instant, reference x sub t is hashed into a set of cache locations, the contents of which are then compared with x sub t. If at the t sup th instant x sub t is not present in the cache, then it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x sub t present for the (t+1) sup st instant. The problem of parallel simulation of a subtrace of N references directed to a C line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which regradless of the set size C runs in time O(log N) using N processors on the exclusive read, exclusive write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference based line replacement policies are considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that on any trace of length N directed to a C line set runs in the O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
Exception handling controllers: An application of pushdown systems to discrete event control
Griffin, Christopher H
2008-01-01
Recent work by the author has extended the Supervisory Control Theory to include the class of control languages defined by pushdown machines. A pushdown machine is a finite state machine extended by an infinite stack memory. In this paper, we define a specific type of deterministic pushdown machine that is particularly useful as a discrete event controller. Checking controllability of pushdown machines requires computing the complement of the controller machine. We show that Exception Handling Controllers have the property that algorithms for taking their complements and determining their prefix closures are nearly identical to the algorithms available for finite state machines. Further, they exhibit an important property that makes checking for controllability extremely simple. Hence, they maintain the simplicity of the finite state machine, while providing the extra power associated with a pushdown stack memory. We provide an example of a useful control specification that cannot be implemented using a finite state machine, but can be implemented using an Exception Handling Controller.
NASA Technical Reports Server (NTRS)
Mizell, Carolyn Barrett; Malone, Linda
2007-01-01
The development process for a large software development project is very complex and dependent on many variables that are dynamic and interrelated. Factors such as size, productivity and defect injection rates will have substantial impact on the project in terms of cost and schedule. These factors can be affected by the intricacies of the process itself as well as human behavior because the process is very labor intensive. The complex nature of the development process can be investigated with software development process models that utilize discrete event simulation to analyze the effects of process changes. The organizational environment and its effects on the workforce can be analyzed with system dynamics that utilizes continuous simulation. Each has unique strengths and the benefits of both types can be exploited by combining a system dynamics model and a discrete event process model. This paper will demonstrate how the two types of models can be combined to investigate the impacts of human resource interactions on productivity and ultimately on cost and schedule.
Did a discrete event 200,000-100,000 years ago produce modern humans?
Weaver, Timothy D
2012-07-01
Scenarios for modern human origins are often predicated on the assumption that modern humans arose 200,000-100,000 years ago in Africa. This assumption implies that something 'special' happened at this point in time in Africa, such as the speciation that produced Homo sapiens, a severe bottleneck in human population size, or a combination of the two. The common thread is that after the divergence of the modern human and Neandertal evolutionary lineages ∼400,000 years ago, there was another discrete event near in time to the Middle-Late Pleistocene boundary that produced modern humans. Alternatively, modern human origins could have been a lengthy process that lasted from the divergence of the modern human and Neandertal evolutionary lineages to the expansion of modern humans out of Africa, and nothing out of the ordinary happened 200,000-100,000 years ago in Africa. Three pieces of biological (fossil morphology and DNA sequences) evidence are typically cited in support of discrete event models. First, living human mitochondrial DNA haplotypes coalesce ∼200,000 years ago. Second, fossil specimens that are usually classified as 'anatomically modern' seem to appear shortly afterward in the African fossil record. Third, it is argued that these anatomically modern fossils are morphologically quite different from the fossils that preceded them. Here I use theory from population and quantitative genetics to show that lengthy process models are also consistent with current biological evidence. That this class of models is a viable option has implications for how modern human origins is conceptualized. PMID:22658331
Plimpton, Steve; Thompson, Aidan; Crozier, Paul
LAMMPS (http://lammps.sandia.gov/index.html) stands for Large-scale Atomic/Molecular Massively Parallel Simulator and is a code that can be used to model atoms or, as the LAMMPS website says, as a parallel particle simulator at the atomic, meso, or continuum scale. This Sandia-based website provides a long list of animations from large simulations. These were created using different visualization packages to read LAMMPS output, and each one provides the name of the PI and a brief description of the work done or visualization package used. See also the static images produced from simulations at http://lammps.sandia.gov/pictures.html The foundation paper for LAMMPS is: S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995), but the website also lists other papers describing contributions to LAMMPS over the years.
Financial simulations on a massively parallel Connection Machine
Hutchinson, J.M.; Zenios, S.A. )
1991-01-01
This paper reports on the valuation of complex financial instruments that appear in the banking and insurance industries which requires simulations of their cashflow behavior in a volatile interest rate environment. These simulations are complex and computationally intensive. Their use, thus far, has been limited to intra-day analysis and planning. Researchers at the Wharton School and Thinking Machines Corporation have developed model formulations for massively parallel architectures, like the Connection Machine CM-2. A library of financial modeling primitives has been designed and used to implement a model for the valuation of mortgage-backed securities. Analyzing a portfolio of these securities-which would require 2 days on a large mainframe-is carried out in 1 hour on a CM-2a.
Parallel Unsteady Turbopump Simulations for Liquid Rocket Engines
NASA Technical Reports Server (NTRS)
Kiris, Cetin C.; Kwak, Dochan; Chan, William
2000-01-01
This paper reports the progress being made towards complete turbo-pump simulation capability for liquid rocket engines. Space Shuttle Main Engine (SSME) turbo-pump impeller is used as a test case for the performance evaluation of the MPI and hybrid MPI/Open-MP versions of the INS3D code. Then, a computational model of a turbo-pump has been developed for the shuttle upgrade program. Relative motion of the grid system for rotor-stator interaction was obtained by employing overset grid techniques. Time-accuracy of the scheme has been evaluated by using simple test cases. Unsteady computations for SSME turbo-pump, which contains 136 zones with 35 Million grid points, are currently underway on Origin 2000 systems at NASA Ames Research Center. Results from time-accurate simulations with moving boundary capability, and the performance of the parallel versions of the code will be presented in the final paper.
Generating scenario trees: A parallel integrated simulation-optimization approach
NASA Astrophysics Data System (ADS)
Beraldi, Patrizia; de Simone, Francesco; Violi, Antonio
2010-03-01
A crucial issue for addressing decision-making problems under uncertainty is the approximate representation of multivariate stochastic processes in the form of scenario tree. This paper proposes a scenario generation approach based on the idea of integrating simulation and optimization techniques. In particular, simulation is used to generate outcomes associated with the nodes of the scenario tree which, in turn, provide the input parameters for an optimization model aimed at determining the scenarios' probabilities matching some prescribed targets. The approach relies on the moment-matching technique originally proposed in [K. Høyland, S.W. Wallace, Generating scenario trees for multistage decision problems, Manag. Sci. 47 (2001) 295-307] and further refined in [K. Høyland, M. Kaut, S.W. Wallace, A heuristic for moment-matching scenario generation, Comput. Optim. Appl. 24 (2003) 169-185]. By taking advantage of the iterative nature of our approach, a parallel implementation has been designed and extensively tested on financial data. Numerical results show the efficiency of the parallel algorithm and the improvement in accuracy and effectiveness.
Roadmap for efficient parallelization of breast anatomy simulation
NASA Astrophysics Data System (ADS)
Chui, Joseph H.; Pokrajac, David D.; Maidment, Andrew D. A.; Bakic, Predrag R.
2012-03-01
A roadmap has been proposed to optimize the simulation of breast anatomy by parallel implementation, in order to reduce the time needed to generate software breast phantoms. The rapid generation of high resolution phantoms is needed to support virtual clinical trials of breast imaging systems. We have recently developed an octree-based recursive partitioning algorithm for breast anatomy simulation. The algorithm has good asymptotic complexity; however, its current MATLAB implementation cannot provide optimal execution times. The proposed roadmap for efficient parallelization includes the following steps: (i) migrate the current code to a C/C++ platform and optimize it for single-threaded implementation; (ii) modify the code to allow for multi-threaded CPU implementation; (iii) identify and migrate the code to a platform designed for multithreaded GPU implementation. In this paper, we describe our results in optimizing the C/C++ code for single-threaded and multi-threaded CPU implementations. As the first step of the proposed roadmap we have identified a bottleneck component in the MATLAB implementation using MATLAB's profiling tool, and created a single threaded CPU implementation of the algorithm using C/C++'s overloaded operators and standard template library. The C/C++ implementation has been compared to the MATLAB version in terms of accuracy and simulation time. A 520-fold reduction of the execution time was observed in a test of phantoms with 50- 400 μm voxels. In addition, we have identified several places in the code which will be modified to allow for the next roadmap milestone of the multithreaded CPU implementation.
Massively Parallel Simulations of Diffusion in Dense Polymeric Structures
Faulon, Jean-Loup, Wilcox, R.T. , Hobbs, J.D. , Ford, D.M.
1997-11-01
An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases are studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon(1840 processors). Compared to the current state-of-the-art equilibration methods this new technique appears to be faster by some orders of magnitude.The main advantage of the technique is that one can circumvent the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in the center of a regular tetrahedron and that regular tetrahedrons can be used to mesh the three-dimensional space. Thus, the problem of polymer equilibration described by continuous equations in molecular dynamics is reduced to a discrete problem where solutions are approximated by simple algorithms. Practical modeling applications include the constructing of butyl rubber and ethylene-propylene-dimer-monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally on both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. In order to better understand the diffusion phenomena, 10, 000 atoms models were generated and equilibrated for butyl and EPDM. The models were submitted to a massively parallel molecular dynamics simulation to monitor the trajectories of the diffusing species.
NASA Astrophysics Data System (ADS)
Shimizu, Futoshi; Kimizuka, Hajime; Kaburaki, Hideo
2002-08-01
A new parallel computing environment, called as ``Parallel Molecular Dynamics Stencil'', has been developed to carry out a large-scale short-range molecular dynamics simulation of solids. The stencil is written in C language using MPI for parallelization and designed successfully to separate and conceal parts of the programs describing cutoff schemes and parallel algorithms for data communication. This has been made possible by introducing the concept of image atoms. Therefore, only a sequential programming of the force calculation routine is required for executing the stencil in parallel environment. Typical molecular dynamics routines, such as various ensembles, time integration methods, and empirical potentials, have been implemented in the stencil. In the presentation, the performance of the stencil on parallel computers of Hitachi, IBM, SGI, and PC-cluster using the models of Lennard-Jones and the EAM type potentials for fracture problem will be reported.
Parallel grid library for rapid and flexible simulation development
NASA Astrophysics Data System (ADS)
Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.
2013-04-01
We present an easy to use and flexible grid library for developing highly scalable parallel simulations. The distributed cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously allowing one to overlap computation and communication. This enables excellent scalability at least up to 32 k cores in magnetohydrodynamic tests depending on the problem and hardware. In the version of dccrg presented here part of the mesh metadata is replicated between MPI processes reducing the scalability of adaptive mesh refinement (AMR) to between 200 and 600 processes. Dccrg is free software that anyone can use, study and modify and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg. Catalogue identifier: AEOM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: GNU Lesser General Public License version 3 No. of lines in distributed program, including test data, etc.: 54975 No. of bytes in distributed program, including test data, etc.: 974015 Distribution format: tar.gz Programming language: C++. Computer: PC, cluster, supercomputer. Operating system: POSIX. The code has been parallelized using MPI and tested with 1-32768 processes RAM: 10 MB-10 GB per process Classification: 4.12, 4.14, 6.5, 19.3, 19.10, 20. External routines: MPI-2 [1], boost [2], Zoltan [3], sfc++ [4] Nature of problem: Grid library supporting arbitrary data in grid cells, parallel adaptive mesh refinement, transparent remote neighbor data updates and load balancing. Solution method: The simulation grid is represented by an adjacency list (graph) with vertices stored into a hash table and edges into contiguous arrays. Message Passing Interface standard is used for parallelization. Cell data is given as a template parameter when instantiating the grid. Restrictions: Logically cartesian grid. Running time: Running time depends on the hardware, problem and the solution method. Small problems can be solved in under a minute and very large problems can take weeks. The examples and tests provided with the package take less than about one minute using default options. In the version of dccrg presented here the speed of adaptive mesh refinement is at most of the order of 106 total created cells per second. http://www.mpi-forum.org/. http://www.boost.org/. K. Devine, E. Boman, R. Heaphy, B. Hendrickson, C. Vaughan, Zoltan data management services for parallel dynamic applications, Comput. Sci. Eng. 4 (2002) 90-97. http://dx.doi.org/10.1109/5992.988653. https://gitorious.org/sfc++.
Parallel finite element simulation of large ram-air parachutes
NASA Astrophysics Data System (ADS)
Kalro, V.; Aliabadi, S.; Garrard, W.; Tezduyar, T.; Mittal, S.; Stein, K.
1997-06-01
In the near future, large ram-air parachutes are expected to provide the capability of delivering 21 ton payloads from altitudes as high as 25,000 ft. In development and test and evaluation of these parachutes the size of the parachute needed and the deployment stages involved make high-performance computing (HPC) simulations a desirable alternative to costly airdrop tests. Although computational simulations based on realistic, 3D, time-dependent models will continue to be a major computational challenge, advanced finite element simulation techniques recently developed for this purpose and the execution of these techniques on HPC platforms are significant steps in the direction to meet this challenge. In this paper, two approaches for analysis of the inflation and gliding of ram-air parachutes are presented. In one of the approaches the point mass flight mechanics equations are solved with the time-varying drag and lift areas obtained from empirical data. This approach is limited to parachutes with similar configurations to those for which data are available. The other approach is 3D finite element computations based on the Navier-Stokes equations governing the airflow around the parachute canopy and Newtons law of motion governing the 3D dynamics of the canopy, with the forces acting on the canopy calculated from the simulated flow field. At the earlier stages of canopy inflation the parachute is modelled as an expanding box, whereas at the later stages, as it expands, the box transforms to a parafoil and glides. These finite element computations are carried out on the massively parallel supercomputers CRAY T3D and Thinking Machines CM-5, typically with millions of coupled, non-linear finite element equations solved simultaneously at every time step or pseudo-time step of the simulation.
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor
NASA Technical Reports Server (NTRS)
Rao, Hariprasad Nannapaneni
1989-01-01
The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
Sensor Configuration Selection for Discrete-Event Systems under Unreliable Observations
Wen-Chiao Lin; Tae-Sic Yoo; Humberto E. Garcia
2010-08-01
Algorithms for counting the occurrences of special events in the framework of partially-observed discrete event dynamical systems (DEDS) were developed in previous work. Their performances typically become better as the sensors providing the observations become more costly or increase in number. This paper addresses the problem of finding a sensor configuration that achieves an optimal balance between cost and the performance of the special event counting algorithm, while satisfying given observability requirements and constraints. Since this problem is generally computational hard in the framework considered, a sensor optimization algorithm is developed using two greedy heuristics, one myopic and the other based on projected performances of candidate sensors. The two heuristics are sequentially executed in order to find best sensor configurations. The developed algorithm is then applied to a sensor optimization problem for a multiunit- operation system. Results show that improved sensor configurations can be found that may significantly reduce the sensor configuration cost but still yield acceptable performance for counting the occurrences of special events.
Parallel continuous simulated tempering and its applications in large-scale molecular simulations
Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng
2014-07-28
In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopts the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in PCST method, the size of the system does not dramatically affect the number of copy needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested in two-dimensional Ising model, Lennard-Jones liquid and all-atom folding simulation of a small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transition and dynamics of macromolecules in explicit solvent.
Parallel continuous simulated tempering and its applications in large-scale molecular simulations
NASA Astrophysics Data System (ADS)
Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng
2014-07-01
In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopts the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2-3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in PCST method, the size of the system does not dramatically affect the number of copy needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested in two-dimensional Ising model, Lennard-Jones liquid and all-atom folding simulation of a small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transition and dynamics of macromolecules in explicit solvent.
Parallel continuous simulated tempering and its applications in large-scale molecular simulations
Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng
2014-01-01
In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys.141, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys.141, 244101 (2010)], while adopts the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in PCST method, the size of the system does not dramatically affect the number of copy needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested in two-dimensional Ising model, Lennard-Jones liquid and all-atom folding simulation of a small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transition and dynamics of macromolecules in explicit solvent. PMID:25084887
A massively parallel solution strategy for efficient thermal radiation simulation
NASA Astrophysics Data System (ADS)
Nguyen, P. D.; Moureau, V.; Vervisch, L.; Perret, N.
2012-06-01
A novel and efficient methodology to solve the Radiative Transfer Equations (RTE) in thermal radiation is discussed. The BiCGStab(2) iterative solution method, as designed for the non-symmetric linear equation systems, is used to solve the discretized RTE. The numerical upwind and central schemes are blended to provide a stable numerical scheme (MUCS) for interpolation of the cell facial radiation intensities in finite volume formulation. The combination of the BiCGStab(2) and MUCS methods proved to be very efficient when coupling with the DOM approach to solve the RTE. A cost-effective tabulation technique for the gaseous radiative property model SNB-FSCK using 7-point Gauss-Labatto quadrature scheme is also introduced. The whole methodology is implemented into a massively parallel unstructured CFD code where the radiative and fluid flow solutions share the same domain decomposition, which is the bottleneck in current radiative solvers. The dual mesh decomposition at the cell groups level and processors level is adopted to optimize the CFD code for massively parallel computing. The whole method is applied to simulate the radiation heat-transfer in a 3D rectangular enclosure containing non-isothermal CO2 and H2O mixtures. Two test cases are studied for homogeneous and inhomogeneous distributions of CO2 and H2O in the enclosure. The result is reported for the heat flux and radiation energy source and the comparison is also made between the present methodology BiCGStab(2)/MUCS/tabulated SNB-FSCK, the benchmark method SNB-CK (implemented at 25cm-1 narrow-band) and some other methods available in the literature. The present method (BiCGStab(2)/MUCS/tabulated SNB-FSCK) yields more accurate predictions particularly for the radiation source term. When comparing with the benchmark solution, the relative error of the radiation source term is remarkably reduced to less than 4% and the CPU time is drastically diminished.
Xyce Parallel Electronic Simulator Reference Guide Version 6.4
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory
2015-12-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce . This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1] . Trademarks The information herein is subject to change without notice. Copyright c 2002-2015 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright c 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce 's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)
Humans can integrate feedback of discrete events in their sensorimotor control of a robotic hand.
Cipriani, Christian; Segil, Jacob L; Clemente, Francesco; ff Weir, Richard F; Edin, Benoni
2014-11-01
Providing functionally effective sensory feedback to users of prosthetics is a largely unsolved challenge. Traditional solutions require high band-widths for providing feedback for the control of manipulation and yet have been largely unsuccessful. In this study, we have explored a strategy that relies on temporally discrete sensory feedback that is technically simple to provide. According to the Discrete Event-driven Sensory feedback Control (DESC) policy, motor tasks in humans are organized in phases delimited by means of sensory encoded discrete mechanical events. To explore the applicability of DESC for control, we designed a paradigm in which healthy humans operated an artificial robot hand to lift and replace an instrumented object, a task that can readily be learned and mastered under visual control. Assuming that the central nervous system of humans naturally organizes motor tasks based on a strategy akin to DESC, we delivered short-lasting vibrotactile feedback related to events that are known to forcefully affect progression of the grasp-lift-and-hold task. After training, we determined whether the artificial feedback had been integrated with the sensorimotor control by introducing short delays and we indeed observed that the participants significantly delayed subsequent phases of the task. This study thus gives support to the DESC policy hypothesis. Moreover, it demonstrates that humans can integrate temporally discrete sensory feedback while controlling an artificial hand and invites further studies in which inexpensive, noninvasive technology could be used in clever ways to provide physiologically appropriate sensory feedback in upper limb prosthetics with much lower band-width requirements than with traditional solutions. PMID:24992899
Particle/Continuum Hybrid Simulation in a Parallel Computing Environment
NASA Technical Reports Server (NTRS)
Baganoff, Donald
1996-01-01
The objective of this study was to modify an existing parallel particle code based on the direct simulation Monte Carlo (DSMC) method to include a Navier-Stokes (NS) calculation so that a hybrid solution could be developed. In carrying out this work, it was determined that the following five issues had to be addressed before extensive program development of a three dimensional capability was pursued: (1) find a set of one-sided kinetic fluxes that are fully compatible with the DSMC method, (2) develop a finite volume scheme to make use of these one-sided kinetic fluxes, (3) make use of the one-sided kinetic fluxes together with DSMC type boundary conditions at a material surface so that velocity slip and temperature slip arise naturally for near-continuum conditions, (4) find a suitable sampling scheme so that the values of the one-sided fluxes predicted by the NS solution at an interface between the two domains can be converted into the correct distribution of particles to be introduced into the DSMC domain, (5) carry out a suitable number of tests to confirm that the developed concepts are valid, individually and in concert for a hybrid scheme.
NASA Astrophysics Data System (ADS)
Zehner, Björn; Hellwig, Olaf; Linke, Maik; Görz, Ines; Buske, Stefan
2016-01-01
3D geological underground models are often presented by vector data, such as triangulated networks representing boundaries of geological bodies and geological structures. Since models are to be used for numerical simulations based on the finite difference method, they have to be converted into a representation discretizing the full volume of the model into hexahedral cells. Often the simulations require a high grid resolution and are done using parallel computing. The storage of such a high-resolution raster model would require a large amount of storage space and it is difficult to create such a model using the standard geomodelling packages. Since the raster representation is only required for the calculation, but not for the geometry description, we present an algorithm and concept for rasterizing geological models on the fly for the use in finite difference codes that are parallelized by domain decomposition. As a proof of concept we implemented a rasterizer library and integrated it into seismic simulation software that is run as parallel code on a UNIX cluster using the Message Passing Interface. We can thus run the simulation with realistic and complicated surface-based geological models that are created using 3D geomodelling software, instead of using a simplified representation of the geological subsurface using mathematical functions or geometric primitives. We tested this set-up using an example model that we provide along with the implemented library.
NASA Technical Reports Server (NTRS)
Hsieh, Shang-Hsien
1993-01-01
The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
A natural partitioning scheme for parallel simulation of multibody systems
NASA Technical Reports Server (NTRS)
Chiou, J. C.; Park, K. C.; Farhat, C.
1993-01-01
A parallel partitioning scheme based on physical-co-ordinate variables is presented to systematically eliminate system constraint forces and yield the equations of motion of multibody dynamics systems in terms of their independent coordinates. Key features of the present scheme include an explicit determination of the independent coordinates, a parallel construction of the null space matrix of the constraint Jacobian matrix, an easy incorporation of the previously developed two-stage staggered solution procedure and a Schur complement based parallel preconditioned conjugate gradient numerical algorithm.
Parallel climate model (PCM) control and transient simulations
NASA Astrophysics Data System (ADS)
Washington, W. M.; Weatherly, J. W.; Meehl, G. A.; Semtner, A. J., Jr.; Bettge, T. W.; Craig, A. P.; Strand, W. G., Jr.; Arblaster, J.; Wayland, V. B.; James, R.; Zhang, Y.
The Department of Energy (DOE) supported Parallel Climate Model (PCM) makes use of the NCAR Community Climate Model (CCM3) and Land Surface Model (LSM) for the atmospheric and land surface components, respectively, the DOE Los Alamos National Laboratory Parallel Ocean Program (POP) for the ocean component, and the Naval Postgraduate School sea-ice model. The PCM executes on several distributed and shared memory computer systems. The coupling method is similar to that used in the NCAR Climate System Model (CSM) in that a flux coupler ties the components together, with interpolations between the different grids of the component models. Flux adjustments are not used in the PCM. The ocean component has 2/3° average horizontal grid spacing with 32 vertical levels and a free surface that allows calculation of sea level changes. Near the equator, the grid spacing is approximately 1/2° in latitude to better capture the ocean equatorial dynamics. The North Pole is rotated over northern North America thus producing resolution smaller than 2/3° in the North Atlantic where the sinking part of the world conveyor circulation largely takes place. Because this ocean model component does not have a computational point at the North Pole, the Arctic Ocean circulation systems are more realistic and similar to the observed. The elastic viscous plastic sea ice model has a grid spacing of 27km to represent small-scale features such as ice transport through the Canadian Archipelago and the East Greenland current region. Results from a 300year present-day coupled climate control simulation are presented, as well as for a transient 1% per year compound CO2 increase experiment which shows a global warming of 1.27°C for a 10year average at the doubling point of CO2 and 2.89°C at the quadrupling point. There is a gradual warming beyond the doubling and quadrupling points with CO2 held constant. Globally averaged sea level rise at the time of CO2 doubling is approximately 7cm and at the time of quadrupling it is 23cm. Some of the regional sea level changes are larger and reflect the adjustments in the temperature, salinity, internal ocean dynamics, surface heat flux, and wind stress on the ocean. A 0.5% per year CO2 increase experiment also was performed showing a global warming of 1.5°C around the time of CO2 doubling and a similar warming pattern to the 1% CO2 per year increase experiment. El Niño and La Niña events in the tropical Pacific show approximately the observed frequency distribution and amplitude, which leads to near observed levels of variability on interannual time scales.
Parallel Vehicular Traffic Simulation using Reverse Computation-based Optimistic Execution
Yoginath, Srikanth B; Perumalla, Kalyan S
2008-01-01
Vehicular traffic simulations are useful in applications such as emergency management and homeland security planning tools. High speed of traffic simulations translates directly to speed of response and level of resilience in those applications. Here, a parallel traffic simulation approach is presented that is aimed at reducing the time for simulating emergency vehicular traffic scenarios. Three unique aspects of this effort are: (1) exploration of optimistic simulation applied to vehicular traffic simulation (2) addressing reverse computation challenges specific to optimistic vehicular traffic simulation (3) achieving absolute (as opposed to self-relative) speedup with a sequential speed equal to that of a fast, de facto standard sequential simulator for emergency traffic. The design and development of the parallel simulation system is presented, along with a performance study that demonstrates excellent sequential performance as well as parallel performance.
Large Scale Parallel 3D Simulation of Regional Wave Propagation Using the Earth Simulator
NASA Astrophysics Data System (ADS)
Furumura, T.
2003-12-01
This paper presents an efficient parallel code for seismic wave propagation in 3D heterogeneous structures developed for implementation on the Earth Simulator (5120 CPUs, 40 TFLOPS)_@at the JAMSTEC Yokohama Institute, a high-performance vector parallel system suitable for large-scale simulations. The equations of motion for the 3D wavefield are solved using a higher-order (8,16, 32 etc.) staggered-grid finite-difference method (FDM) in the horizontal (x,y) directions and a conventional fourth-order FDM in the vertical (z) direction. Compared to traditional Fourier pseudospectral method (PSM), the higher-order FDM achieves very good performance on vector processors as well as on the latest high-performance microprocessors (Intel Pentium 4, Itanium 2 etc.). The parallel computing is based on partition of the computational domain, with each subregion assigned to a node of the Earth Simulator. Message passing interface (MPI) inter-node communication is employed for data exchange between subregions. Small-scale heterogeneities such as low-velocity sedimentary basins are accounted for in large-scale models by adopting a multi-grid approach that combines a coarse mesh model with embedded finer mesh model. An accurate interpolation procedure, based on the fast Fourier transform (FFT), is used to combine the wavefield in the different grids. The results of application of the multi-grid, parallel FDM code on the Earth Simulator for modeling strong ground motions from recent large earthquakes are also presented. Events such as the 1993 Kushiro (Mj7.8) and 2000 Tottori-ken Seibu (Mj7.3) earthquakes were simulated using 3D structural models of northern and western Japan. The subsurface structures in Japan were derived by combining data from a number of reflection and refraction experiments, Bouguer anomaly data, and travel-time tomography studies of P and S waves. The scale of the 3D model is about 500 km by 1000 km by 350 km, which is divided into grid intervals of 0.5 to 1 km. The simulation required 128 to 364 Gb of computation memory, and computation took 1 to 2 h using 256 to 1408 processors of the Earth Simulator. Assuming a minimum shear wave velocity of Vs = 1.7 km/s, the modeling is capable of treating high-frequency seismic wave propagations of over 1 to 2 Hz. The volume rendering technique was employed to illuminate the 3D wavefield, and a set of snapshots was combined into a video sequence. The high-resolution 3D simulations for frequencies over 1 Hz provide a good representation of wave propagation in Japan during the large earthquakes. The computer simulation also matches the observations by the dense seismic array (K-Net and KiK-net, over 1700 stations) well, demonstrating the effectiveness of the simulation model. The combined studies of high-resolution computer simulation and dense seismic observation can therefore be expected to be highly valuable in understanding the complex seismic behavior associated with heterogeneities in the subsurface structure, and for predicting the pattern of ground motions expected for future earthquake scenarios.
A sweep algorithm for massively parallel simulation of circuit-switched networks
NASA Technical Reports Server (NTRS)
Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.
1992-01-01
A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.
ANNarchy: a code generation approach to neural simulations on parallel hardware
Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.
2015-01-01
Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into an efficient C++code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957
ANNarchy: a code generation approach to neural simulations on parallel hardware.
Vitay, Julien; Dinkelbach, Helge Ü; Hamker, Fred H
2015-01-01
Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into an efficient C++code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957
Comparison of serial and parallel simulations of a corridor fire using FDS
NASA Astrophysics Data System (ADS)
Valasek, L.
2015-09-01
Current fire simulators allow to model the course of fire in large areas and its impact on structure and equipment. This paper deals with a comparison of serial and parallel calculations of simulation of a corridor fire by the FDS (Fire Dynamics Simulator) system. In parallel case, the whole computational domain is divided into several computational meshes, the computation on each mesh is considered as a single MPI (Message Passing Interface) process realised on one computational core and communication between MPI processes is provided by MPI. The aim of this paper is to determine the size of error caused by parallelization of computation, which occurs at touches of computational meshes.
amatos: Parallel adaptive mesh generator for atmospheric and oceanic simulation
NASA Astrophysics Data System (ADS)
Behrens, Jörn; Rakowsky, Natalja; Hiller, Wolfgang; Handorf, Dörthe; Läuter, Matthias; Päpke, Jürgen; Dethloff, Klaus
The grid generator amatos has been developed for adaptive modeling of ocean and atmosphere circulation. It features adaptive control of planar, spherical, and volume grids with triangular or tetrahedral elements refined by bisection. The user interface (GRID API), a Fortran 90 module, shields the application programmer from the technical aspects of mesh adaptation like amatos' hierarchical data structure, the OpenMP parallelization, and the effective calculation of a domain decomposition by a space filling curve (SFC) approach. This article presents the basic structure and features of amatos, the powerful SFC ordering and decomposition of data, and two example applications, namely the modeling of tracer advection in the polar vortex and the development of the adaptive finite element atmosphere model PLASMA (parallel large scale model of the atmosphere).
Parallel computing in enterprise modeling.
Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.
2008-08-01
This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principal makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.
Modelling and simulation of parallel triangular triple quantum dots (TTQD) by using SIMON 2.0
NASA Astrophysics Data System (ADS)
Fathany, Maulana Yusuf; Fuada, Syifaul; Lawu, Braham Lawas; Sulthoni, Muhammad Amin
2016-04-01
This research presents analysis of modeling on Parallel Triple Quantum Dots (TQD) by using SIMON (SIMulation Of Nano-structures). Single Electron Transistor (SET) is used as the basic concept of modeling. We design the structure of Parallel TQD by metal material with triangular geometry model, it is called by Triangular Triple Quantum Dots (TTQD). We simulate it with several scenarios using different parameters; such as different value of capacitance, various gate voltage, and different thermal condition.
Parallel Unsteady Turbopump Flow Simulations for Reusable Launch Vehicles
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Kwak, Dochan
2000-01-01
An efficient solution procedure for time-accurate solutions of Incompressible Navier-Stokes equation is obtained. Artificial compressibility method requires a fast convergence scheme. Pressure projection method is efficient when small time-step is required. The number of sub-iteration is reduced significantly when Poisson solver employed with the continuity equation. Both computing time and memory usage are reduced (at least 3 times). Other work includes Multi Level Parallelism (MLP) of INS3D, overset connectivity for the validation case, experimental measurements, and computational model for boost pump.
Parallelized modelling and solution scheme for hierarchically scaled simulations
NASA Technical Reports Server (NTRS)
Padovan, Joe
1995-01-01
This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The Tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers are capable of solving, are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that by combining several of these techniques that a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.
NASA Astrophysics Data System (ADS)
Rudd, Kevin Edward
In this dissertation, we present two parallelized 3D simulation techniques for three-dimensional acoustic and elastic wave propagation based on the finite integration technique. We demonstrate their usefulness in solving real-world problems with examples in the three very different areas of nondestructive evaluation, medical imaging, and security screening. More precisely, these include concealed weapons detection, periodontal ultrasography, and guided wave inspection of complex piping systems. We have employed these simulation methods to study complex wave phenomena and to develop and test a variety of signal processing and hardware configurations. Simulation results are compared to experimental measurements to confirm the accuracy of the parallel simulation methods.
Molecular Dynamic Simulations of Nanostructured Ceramic Materials on Parallel Computers
Vashishta, Priya; Kalia, Rajiv
2005-02-24
Large-scale molecular-dynamics (MD) simulations have been performed to gain insight into: (1) sintering, structure, and mechanical behavior of nanophase SiC and SiO2; (2) effects of dynamic charge transfers on the sintering of nanophase TiO2; (3) high-pressure structural transformation in bulk SiC and GaAs nanocrystals; (4) nanoindentation in Si3N4; and (5) lattice mismatched InAs/GaAs nanomesas. In addition, we have designed a multiscale simulation approach that seamlessly embeds MD and quantum-mechanical (QM) simulations in a continuum simulation. The above research activities have involved strong interactions with researchers at various universities, government laboratories, and industries. 33 papers have been published and 22 talks have been given based on the work described in this report.
Kothe, D.B.; Turner, J.A.; Mosso, S.J.; Ferrell, R.C.
1997-03-01
We discuss selected aspects of a new parallel three-dimensional (3-D) computational tool for the unstructured mesh simulation of Los Alamos National Laboratory (LANL) casting processes. This tool, known as {bold Telluride}, draws upon on robust, high resolution finite volume solutions of metal alloy mass, momentum, and enthalpy conservation equations to model the filling, cooling, and solidification of LANL castings. We briefly describe the current {bold Telluride} physical models and solution methods, then detail our parallelization strategy as implemented with Fortran 90 (F90). This strategy has yielded straightforward and efficient parallelization on distributed and shared memory architectures, aided in large part by new parallel libraries {bold JTpack9O} for Krylov-subspace iterative solution methods and {bold PGSLib} for efficient gather/scatter operations. We illustrate our methodology and current capabilities with source code examples and parallel efficiency results for a LANL casting simulation.
Xyce parallel electronic simulator users guide, version 6.1
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-03-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas; Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiationaware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase-a message passing parallel implementation-which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users' guide, Version 6.0.1.
Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David Gregory.
2014-01-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Vector and parallel Monte Carlo radiative heat transfer simulation
Burns, P.J. . Dept. of Mechanical Engineering); Pryor, D.V. )
1989-01-01
A fully vectorized version of a Monte Carlo algorithm of radiative heat transfer in two-dimensional geometries is presented. This algorithm differs from previous applications in that its capabilities are more extensive, with arbitrary numbers of surfaces, arbitrary numbers of material properties, and surface characteristics that include transmission, specular reflection, and diffuse reflection (all of which may be functions of the angle of incidence). The algorithm is applied to an irregular, experimental geometry and implemented on a Cyber 205. A speedup factor of approximately 16, for this combination of geometry and material properties, is achieved for the vector version over the scalar code. Issues related to the details of vectorization, including heavy use of bit addressability, the maintaining of long vector lengths, and gather/scatter use, are discussed. The parallel application of this algorithm is straightforward and is discussed in light of architectural differences among several current supercomputers.
Parallel performance optimizations on unstructured mesh-based simulations
Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid
2015-06-01
This paper addresses two key parallelization challenges the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns, that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intranode data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
Toward parallel, adaptive mesh refinement for chemically reacting flow simulations
Devine, K.D.; Shadid, J.N.; Salinger, A.G. Hutchinson, S.A.; Hennigan, G.L.
1997-12-01
Adaptive numerical methods offer greater efficiency than traditional numerical methods by concentrating computational effort in regions of the problem domain where the solution is difficult to obtain. In this paper, the authors describe progress toward adding mesh refinement to MPSalsa, a computer program developed at Sandia National laboratories to solve coupled three-dimensional fluid flow and detailed reaction chemistry systems for modeling chemically reacting flow on large-scale parallel computers. Data structures that support refinement and dynamic load-balancing are discussed. Results using uniform refinement with mesh sequencing to improve convergence to steady-state solutions are also presented. Three examples are presented: a lid driven cavity, a thermal convection flow, and a tilted chemical vapor deposition reactor.
Virtual reality visualization of parallel molecular dynamics simulation
Disz, T.; Papka, M.; Stevens, R.; Pellegrino, M.; Taylor, V.
1995-12-31
When performing communications mapping experiments for massively parallel processors, it is important to be able to visualize the mappings and resulting communications. In a molecular dynamics model, visualization of the atom to atom interaction and the processor mappings provides insight into the effectiveness of the communications algorithms. The basic quantities available for visualization in a model of this type are the number of molecules per unit volume, the mass, and velocity of each molecule. The computational information available for visualization is the atom to atom interaction within each time step, the atom to processor mapping, and the energy resealing events. We use the CAVE (CAVE Automatic Virtual Environment) to provide interactive, immersive visualization experiences.
Event Based Simulator for Parallel Computing over the Wide Area Network for Real Time Visualization
NASA Astrophysics Data System (ADS)
Sundararajan, Elankovan; Harwood, Aaron; Kotagiri, Ramamohanarao; Satria Prabuwono, Anton
As the computational requirement of applications in computational science continues to grow tremendously, the use of computational resources distributed across the Wide Area Network (WAN) becomes advantageous. However, not all applications can be executed over the WAN due to communication overhead that can drastically slowdown the computation. In this paper, we introduce an event based simulator to investigate the performance of parallel algorithms executed over the WAN. The event based simulator known as SIMPAR (SIMulator for PARallel computation), simulates the actual computations and communications involved in parallel computation over the WAN using time stamps. Visualization of real time applications require steady stream of processed data flow for visualization purposes. Hence, SIMPAR may prove to be a valuable tool to investigate types of applications and computing resource requirements to provide uninterrupted flow of processed data for real time visualization purposes. The results obtained from the simulation show concurrence with the expected performance using the L-BSP model.
Modular high-temperature gas-cooled reactor simulation using parallel processors
Ball, S.J.; Conklin, J.C.
1989-01-01
The MHPP (Modular HTGR Parallel Processor) code has been developed to simulate modular high-temperature gas-cooled reactor (MHTGR) transients and accidents. MHPP incorporates a very detailed model for predicting the dynamics of the reactor core, vessel, and cooling systems over a wide variety of scenarios ranging from expected transients to very-low-probability severe accidents. The simulation routines, which had originally been developed entirely as serial code, were readily adapted to parallel processing Fortran. The resulting parallelized simulation speed was enhanced significantly. Workstation interfaces are being developed to provide for user (''operator'') interaction. The benefits realized by adapting previous MHTGR codes to run on a parallel processor are discussed, along with results of typical accident analyses. 3 refs., 3 figs.
Parallelization of a Molecular Dynamics Simulation of AN Ion-Surface Collision System:
NASA Astrophysics Data System (ADS)
Atiş, Murat; Özdoğan, Cem; Güvenç, Ziya B.
Parallel molecular dynamics simulation study of the ion-surface collision system is reported. A sequential molecular dynamics simulation program is converted into a parallel code utilizing the concept of parallel virtual machine (PVM). An effective and favorable algorithm is developed. Our parallelization of the algorithm shows that it is more efficient because of the optimal pair listing, linear scaling, and constant behavior of the internode communications. The code is tested in a distributed memory system consisting of a cluster of eight PCs that run under Linux (Debian 2.4.20 kernel). Our results on the collision system are discussed based on the speed up, efficiency and the system size. Furthermore, the code is used for a full simulation of the Ar-Ni(100) collision system and calculated physical quantities are presented.
Zhang, Hongjun; Zhang, Rui; Li, Yong; Zhang, Xuliang
2014-01-01
Service oriented modeling and simulation are hot issues in the field of modeling and simulation, and there is need to call service resources when simulation task workflow is running. How to optimize the service resource allocation to ensure that the task is complete effectively is an important issue in this area. In military modeling and simulation field, it is important to improve the probability of success and timeliness in simulation task workflow. Therefore, this paper proposes an optimization algorithm for multipath service resource parallel allocation, in which multipath service resource parallel allocation model is built and multiple chains coding scheme quantum optimization algorithm is used for optimization and solution. The multiple chains coding scheme quantum optimization algorithm is to extend parallel search space to improve search efficiency. Through the simulation experiment, this paper investigates the effect for the probability of success in simulation task workflow from different optimization algorithm, service allocation strategy, and path number, and the simulation result shows that the optimization algorithm for multipath service resource parallel allocation is an effective method to improve the probability of success and timeliness in simulation task workflow. PMID:24963506
Parallel kinetic Monte Carlo simulations of Ag(111) island coarsening using a large database.
Nandipati, Giridhar; Shim, Yunsic; Amar, Jacques G; Karim, Altaf; Kara, Abdelkader; Rahman, Talat S; Trushin, Oleg
2009-02-25
The results of parallel kinetic Monte Carlo (KMC) simulations of the room-temperature coarsening of Ag(111) islands carried out using a very large database obtained via self-learning KMC simulations are presented. Our results indicate that, while cluster diffusion and coalescence play an important role for small clusters and at very early times, at late time the coarsening proceeds via Ostwald ripening, i.e. large clusters grow while small clusters evaporate. In addition, an asymptotic analysis of our results for the average island size S(t) as a function of time t leads to a coarsening exponent n = 1/3 (where S(t)?t(2n)), in good agreement with theoretical predictions. However, by comparing with simulations without concerted (multi-atom) moves, we also find that the inclusion of such moves significantly increases the average island size. Somewhat surprisingly we also find that, while the average island size increases during coarsening, the scaled island-size distribution does not change significantly. Our simulations were carried out both as a test of, and as an application of, a variety of different algorithms for parallel kinetic Monte Carlo including the recently developed optimistic synchronous relaxation (OSR) algorithm as well as the semi-rigorous synchronous sublattice (SL) algorithm. A variation of the OSR algorithm corresponding to optimistic synchronous relaxation with pseudo-rollback (OSRPR) is also proposed along with a method for improving the parallel efficiency and reducing the number of boundary events via dynamic boundary allocation (DBA). A variety of other methods for enhancing the efficiency of our simulations are also discussed. We note that, because of the relatively high temperature of our simulations, as well as the large range of energy barriers (ranging from 0.05 to 0.8eV), developing an efficient algorithm for parallel KMC and/or SLKMC simulations is particularly challenging. However, by using DBA to minimize the number of boundary events, we have achieved significantly improved parallel efficiencies for the OSRPR and SL algorithms. Finally, we note that, among the three parallel algorithms which we have tested here, the semi-rigorous SL algorithm with DBA led to the highest parallel efficiencies. As a result, we have obtained reasonable parallel efficiencies in our simulations of room-temperature Ag(111) island coarsening for a small number of processors (e.g.N(p) = 2 and 4). Since the SL algorithm scales with system size for fixed processor size, we expect that comparable and/or even larger parallel efficiencies should be possible for parallel KMC and/or SLKMC simulations of larger systems with larger numbers of processors. PMID:21817366
A parallel finite element simulator for ion transport through three-dimensional ion channel systems.
Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo
2013-09-15
A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain to get the surface and volume mesh for ion channel systems, which consists of a set of mesh generation tools. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG) developed by one of the authors, which provides the capability of doing large scale parallel computations with high parallel efficiency and the flexibility of choosing high order elements to achieve high order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I - V curve, with which both primitive and transformed PNP equations are studied and their numerical performances are compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage dependent anion channel (VDAC) and ?-Hemolysin (?-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite size effects can be included in PNP model now, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and ?-HL. It is shown that the size effects in SMPNP can effectively lead to reduced current in the channel, and the results are closer to BD simulation results. PMID:23740647
Parallel performance optimizations on unstructured mesh-based simulations
Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid
2015-06-01
This paper addresses two key parallelization challenges the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns, that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intranode data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches.more » We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.« less
Parallel finite element simulation of mooring forces on floating objects
NASA Astrophysics Data System (ADS)
Aliabadi, S.; Abedi, J.; Zellars, B.
2003-03-01
The coupling between the equations governing the free-surface flows, the six degrees of freedom non-linear rigid body dynamics, the linear elasticity equations for mesh-moving and the cables has resulted in a fluid-structure interaction technology capable of simulating mooring forces on floating objects. The finite element solution strategy is based on a combination approach derived from fixed-mesh and moving-mesh techniques. Here, the free-surface flow simulations are based on the Navier-Stokes equations written for two incompressible fluids where the impact of one fluid on the other one is extremely small. An interface function with two distinct values is used to locate the position of the free-surface. The stabilized finite element formulations are written and integrated in an arbitrary Lagrangian-Eulerian domain. This allows us to handle the motion of the time dependent geometries. Forces and momentums exerted on the floating object by both water and hawsers are calculated and used to update the position of the floating object in time. In the mesh moving scheme, we assume that the computational domain is made of elastic materials. The linear elasticity equations are solved to obtain the displacements for each computational node. The non-linear rigid body dynamics equations are coupled with the governing equations of fluid flow and are solved simultaneously to update the position of the floating object. The numerical examples includes a 3D simulation of water waves impacting on a moored floating box and a model boat and simulation of floating object under water constrained with a cable.
Exploiting Quantum Parallelism to Simulate Quantum Random Many-Body Systems
Paredes, B.; Cirac, J.I.; Verstraete, F.
2005-09-30
We present an algorithm that exploits quantum parallelism to simulate randomness in a quantum system. In our scheme, all possible realizations of the random parameters are encoded quantum mechanically in a superposition state of an auxiliary system. We show how our algorithm allows for the efficient simulation of dynamics of quantum random spin chains with known numerical methods. We propose an experimental realization based on atoms in optical lattices in which disorder could be simulated in parallel and in a controlled way through the interaction with another atomic species.
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL
Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor
2011-09-06
We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
Parallel simulation of subsonic fluid dynamics on a cluster of workstations
NASA Astrophysics Data System (ADS)
Skordos, Panayotis A.
1994-11-01
An effective approach of simulating fluid dynamics on a cluster of non-dedicated workstations is presented. The approach uses local interaction algorithms, small communication capacity, and automatic migration of parallel processes from busy hosts to free hosts. The approach is well-suited for simulating subsonic flow problems which involve both hydrodynamics and acoustic waves, for example, the flow of air inside wind musical instruments. Typical simulations achieve 80% parallel efficiency (speedup/processors) using 20 HP-Apollo workstations. Detailed measurements of the parallel efficiency of 2D and 3D simulations are presented, and a theoretical model of efficiency is developed which fits closely the measurements. Two numerical methods of fluid dynamics are tested: explicit finite differences, and the lattice Boltzmann method.
Parallel FEM Simulation of Electromechanics in the Heart
NASA Astrophysics Data System (ADS)
Xia, Henian; Wong, Kwai; Zhao, Xiaopeng
2011-11-01
Cardiovascular disease is the leading cause of death in America. Computer simulation of complicated dynamics of the heart could provide valuable quantitative guidance for diagnosis and treatment of heart problems. In this paper, we present an integrated numerical model which encompasses the interaction of cardiac electrophysiology, electromechanics, and mechanoelectrical feedback. The model is solved by finite element method on a Linux cluster and the Cray XT5 supercomputer, kraken. Dynamical influences between the effects of electromechanics coupling and mechanic-electric feedback are shown.
Time-partitioning simulation models for calculation on parallel computers
NASA Technical Reports Server (NTRS)
Milner, Edward J.; Blech, Richard A.; Chima, Rodrick V.
1987-01-01
A technique allowing time-staggered solution of partial differential equations is presented in this report. Using this technique, called time-partitioning, simulation execution speedup is proportional to the number of processors used because all processors operate simultaneously, with each updating of the solution grid at a different time point. The technique is limited by neither the number of processors available nor by the dimension of the solution grid. Time-partitioning was used to obtain the flow pattern through a cascade of airfoils, modeled by the Euler partial differential equations. An execution speedup factor of 1.77 was achieved using a two processor Cray X-MP/24 computer.
Partitioning and packing mathematical simulation models for calculation on parallel computers
NASA Technical Reports Server (NTRS)
Arpasi, D. J.; Milner, E. J.
1986-01-01
The development of multiprocessor simulations from a serial set of ordinary differential equations describing a physical system is described. Degrees of parallelism (i.e., coupling between the equations) and their impact on parallel processing are discussed. The problem of identifying computational parallelism within sets of closely coupled equations that require the exchange of current values of variables is described. A technique is presented for identifying this parallelism and for partitioning the equations for parallel solution on a multiprocessor. An algorithm which packs the equations into a minimum number of processors is also described. The results of the packing algorithm when applied to a turbojet engine model are presented in terms of processor utilization.
LARGE-SCALE SIMULATION OF BEAM DYNAMICS IN HIGH INTENSITY ION LINACS USING PARALLEL SUPERCOMPUTERS
R. RYNE; J. QIANG
2000-08-01
In this paper we present results of using parallel supercomputers to simulate beam dynamics in next-generation high intensity ion linacs. Our approach uses a three-dimensional space charge calculation with six types of boundary conditions. The simulations use a hybrid approach involving transfer maps to treat externally applied fields (including rf cavities) and parallel particle-in-cell techniques to treat the space-charge fields. The large-scale simulation results presented here represent a three order of magnitude improvement in simulation capability, in terms of problem size and speed of execution, compared with typical two-dimensional serial simulations. Specific examples will be presented, including simulation of the spallation neutron source (SNS) linac and the Low Energy Demonstrator Accelerator (LEDA) beam halo experiment.
Simulation of optically encoded multiplexing for parallel multipoint sensing.
Babu Rao, C; Chelliah, Pandian; Sahoo, Trilochan
2015-06-20
Spectral emission/absorption-based sensors are commonly used to monitor explosives, narcotics, and other restricted materials in high-security zones such as airports. Monitoring a broad range of spectral wavelengths with high spectral resolution would increase the repertoire of chemicals that can be monitored. However, a portable unit will have limitations in meeting these requirements. Optical fibers can be employed for collecting and transmitting spectral signals from portable sensor heads (PSHs) to a sensitive central spectral analyzer. However, simultaneous detection by sensors in multiple PSHs needs to be differentiated for identifying individual PSHs. An optical encoding method is presented in this paper for use of a portable unit for highly sensitive measurement. The methodology is demonstrated through a simulation using MATLAB Simulink. PMID:26193007
A parallel algorithm for transient solid dynamics simulations with contact detection
Attaway, S.; Hendrickson, B.; Plimpton, S.; Gardner, D.; Vaughan, C.; Heinstein, M.; Peery, J.
1996-06-01
Solid dynamics simulations with Lagrangian finite elements are used to model a wide variety of problems, such as the calculation of impact damage to shipping containers for nuclear waste and the analysis of vehicular crashes. Using parallel computers for these simulations has been hindered by the difficulty of searching efficiently for material surface contacts in parallel. A new parallel algorithm for calculation of arbitrary material contacts in finite element simulations has been developed and implemented in the PRONTO3D transient solid dynamics code. This paper will explore some of the issues involved in developing efficient, portable, parallel finite element models for nonlinear transient solid dynamics simulations. The contact-detection problem poses interesting challenges for efficient implementation of a solid dynamics simulation on a parallel computer. The finite element mesh is typically partitioned so that each processor owns a localized region of the finite element mesh. This mesh partitioning is optimal for the finite element portion of the calculation since each processor must communicate only with the few connected neighboring processors that share boundaries with the decomposed mesh. However, contacts can occur between surfaces that may be owned by any two arbitrary processors. Hence, a global search across all processors is required at every time step to search for these contacts. Load-imbalance can become a problem since the finite element decomposition divides the volumetric mesh evenly across processors but typically leaves the surface elements unevenly distributed. In practice, these complications have been limiting factors in the performance and scalability of transient solid dynamics on massively parallel computers. In this paper the authors present a new parallel algorithm for contact detection that overcomes many of these limitations.
Robust large-scale parallel nonlinear solvers for simulations.
Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson
2005-11-01
This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write and easily portable. However, the method usually takes twice as long to solve as Newton-GMRES on general problems because it solves two linear systems at each iteration. In this paper, we discuss modifications to Bouaricha's method for a practical implementation, including a special globalization technique and other modifications for greater efficiency. We present numerical results showing computational advantages over Newton-GMRES on some realistic problems. We further discuss a new approach for dealing with singular (or ill-conditioned) matrices. In particular, we modify an algorithm for identifying a turning point so that an increasingly ill-conditioned Jacobian does not prevent convergence.
NASA Astrophysics Data System (ADS)
Vaezi, Payam; Holland, Christopher; Tynan, George; Chakraborty Thakur, Saikat; Brandt, Christian
2014-10-01
Previous 2D numerical simulations of collisional drift-wave turbulence in the linear Controlled Shear Decorrelation Experiment (CSDX) device were unable to reproduce experimental observations at magnetic fields above 1.4 kG at either the quantitative or qualitative level. Experimental observations suggest that dynamics of previously neglected ion parallel velocity and associated parallel shear-flow driven instabilities become important at the higher fields. In this poster, we present comparisons of new 3D simulations performed with the BOUT++ framework which include parallel ion velocity dynamics, as well as self-consistent electron temperature fluctuations, to the CSDX observations at multiple magnetic field strengths. We compare the simulated scalings of density and potential fluctuation spectra with magnetic field, as well as radial particle flux and Reynolds stress to 2D results and experimental observations. The comparisons are made using synthetic probe and fast camera diagnostics that incorporate both the electron density and temperature dynamics.
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak
1996-01-01
Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
Parallelization issues of a code for physically-based simulation of fabrics
NASA Astrophysics Data System (ADS)
Romero, Sergio; Gutiérrez, Eladio; Romero, Luis F.; Plata, Oscar; Zapata, Emilio L.
2004-10-01
The simulation of fabrics, clothes, and flexible materials is an essential topic in computer animation of realistic virtual humans and dynamic sceneries. New emerging technologies, as interactive digital TV and multimedia products, make necessary the development of powerful tools to perform real-time simulations. Parallelism is one of such tools. When analyzing computationally fabric simulations we found these codes belonging to the complex class of irregular applications. Frequently this kind of codes includes reduction operations in their core, so that an important fraction of the computational time is spent on such operations. In fabric simulators these operations appear when evaluating forces, giving rise to the equation system to be solved. For this reason, this paper discusses only this phase of the simulation. This paper analyzes and evaluates different irregular reduction parallelization techniques on ccNUMA shared memory machines, applied to a real, physically-based, fabric simulator we have developed. Several issues are taken into account in order to achieve high code performance, as exploitation of data access locality and parallelism, as well as careful use of memory resources (memory overhead). In this paper we use the concept of data affinity to develop various efficient algorithms for reduction parallelization exploiting data locality.
Dependability analysis of parallel systems using a simulation-based approach. M.S. Thesis
NASA Technical Reports Server (NTRS)
Sawyer, Darren Charles
1994-01-01
The analysis of dependability in large, complex, parallel systems executing real applications or workloads is examined in this thesis. To effectively demonstrate the wide range of dependability problems that can be analyzed through simulation, the analysis of three case studies is presented. For each case, the organization of the simulation model used is outlined, and the results from simulated fault injection experiments are explained, showing the usefulness of this method in dependability modeling of large parallel systems. The simulation models are constructed using DEPEND and C++. Where possible, methods to increase dependability are derived from the experimental results. Another interesting facet of all three cases is the presence of some kind of workload of application executing in the simulation while faults are injected. This provides a completely new dimension to this type of study, not possible to model accurately with analytical approaches.
Parallel computation for reservoir thermal simulation: An overlapping domain decomposition approach
NASA Astrophysics Data System (ADS)
Wang, Zhongxiao
2005-11-01
In this dissertation, we are involved in parallel computing for the thermal simulation of multicomponent, multiphase fluid flow in petroleum reservoirs. We report the development and applications of such a simulator. Unlike many efforts made to parallelize locally the solver of a linear equations system which affects the performance the most, this research takes a global parallelization strategy by decomposing the computational domain into smaller subdomains. This dissertation addresses the domain decomposition techniques and, based on the comparison, adopts an overlapping domain decomposition method. This global parallelization method hands over each subdomain to a single processor of the parallel computer to process. Communication is required when handling overlapping regions between subdomains. For this purpose, MPI (message passing interface) is used for data communication and communication control. A physical and mathematical model is introduced for the reservoir thermal simulation. Numerical tests on two sets of industrial data of practical oilfields indicate that this model and the parallel implementation match the history data accurately. Therefore, we expect to use both the model and the parallel code to predict oil production and guide the design, implementation and real-time fine tuning of new well operating schemes. A new adaptive mechanism to synchronize processes on different processors has been introduced, which not only ensures the computational accuracy but also improves the time performance. To accelerate the convergence rate of iterative solution of the large linear equations systems derived from the discretization of governing equations of our physical and mathematical model in space and time, we adopt the ORTHOMIN method in conjunction with an incomplete LU factorization preconditioning technique. Important improvements have been made in both ORTHOMIN method and incomplete LU factorization in order to enhance time performance without affecting the convergence rate of iterative solution. More importantly, the parallel implementation may serve as a working platform for any further research, for example, building and testing new physical and mathematical models, developing and testing new solver of pertinent linear equations system, etc.
A conflict-free, path-level parallelization approach for sequential simulation algorithms
NASA Astrophysics Data System (ADS)
Rasera, Luiz Gustavo; Machado, Péricles Lopes; Costa, João Felipe C. L.
2015-07-01
Pixel-based simulation algorithms are the most widely used geostatistical technique for characterizing the spatial distribution of natural resources. However, sequential simulation does not scale well for stochastic simulation on very large grids, which are now commonly found in many petroleum, mining, and environmental studies. With the availability of multiple-processor computers, there is an opportunity to develop parallelization schemes for these algorithms to increase their performance and efficiency. Here we present a conflict-free, path-level parallelization strategy for sequential simulation. The method consists of partitioning the simulation grid into a set of groups of nodes and delegating all available processors for simulation of multiple groups of nodes concurrently. An automated classification procedure determines which groups are simulated in parallel according to their spatial arrangement in the simulation grid. The major advantage of this approach is that it does not require conflict resolution operations, and thus allows exact reproduction of results. Besides offering a large performance gain when compared to the traditional serial implementation, the method provides efficient use of computational resources and is generic enough to be adapted to several sequential algorithms.
NASA Technical Reports Server (NTRS)
Krosel, S. M.; Milner, E. J.
1982-01-01
The application of Predictor corrector integration algorithms developed for the digital parallel processing environment are investigated. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real time performance, interprocessor communication, and algorithm startup are also discussed.
Parallel object oriented implementation of a 2D bounded electrostatic plasma PIC simulation
Norton, C.D.; Szymanski, B.K.; Decyk, V.K.
1995-12-01
We discuss the software development issues involved in designing parallel programs using object oriented techniques. Simulations involving 1D and 2D Particle In Cell plasma codes illustrate how C++ programs can effectively describe complex simulations while performing with reasonable efficiency when compared to the equivalent Fortran programs. The scalable object oriented modeling techniques closely match the physical view of the problem, thus supporting modifiability and portability of the code. Selection of a parallel programming paradigm must consider the important factors of efficiency of the computation and the programming implementation effort. C++ and Fortran implementation paradigms are compared and discussed from this point of view.
A parallel finite volume algorithm for large-eddy simulation of turbulent flows
NASA Astrophysics Data System (ADS)
Bui, Trong Tri
1998-11-01
A parallel unstructured finite volume algorithm is developed for large-eddy simulation of compressible turbulent flows. Major components of the algorithm include piecewise linear least-square reconstruction of the unknown variables, trilinear finite element interpolation for the spatial coordinates, Roe flux difference splitting, and second-order MacCormack explicit time marching. The computer code is designed from the start to take full advantage of the additional computational capability provided by the current parallel computer systems. Parallel implementation is done using the message passing programming model and message passing libraries such as the Parallel Virtual Machine (PVM) and Message Passing Interface (MPI). The development of the numerical algorithm is presented in detail. The parallel strategy and issues regarding the implementation of a flow simulation code on the current generation of parallel machines are discussed. The results from parallel performance studies show that the algorithm is well suited for parallel computer systems that use the message passing programming model. Nearly perfect parallel speedup is obtained on MPP systems such as the Cray T3D and IBM SP2. Performance comparison with the older supercomputer systems such as the Cray YMP show that the simulations done on the parallel systems are approximately 10 to 30 times faster. The results of the accuracy and performance studies for the current algorithm are reported. To validate the flow simulation code, a number of Euler and Navier-Stokes simulations are done for internal duct flows. Inviscid Euler simulation of a very small amplitude acoustic wave interacting with a shock wave in a quasi-1D convergent-divergent nozzle shows that the algorithm is capable of simultaneously tracking the very small disturbances of the acoustic wave and capturing the shock wave. Navier-Stokes simulations are made for fully developed laminar flow in a square duct, developing laminar flow in a rectangular duct, and developing laminar flow in a 90-degree square bend. The Navier-Stokes solutions show good agreements with available analytical solutions and experimental data. To validate the flow simulation code for turbulence simulation, LES of fully-developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. The accuracy of the above algorithm for turbulence simulations is evaluated by comparison with the DNS solution. The effects of grid resolution, upwind numerical dissipation, and subgrid scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux difference splitting dissipation adversely affect the accuracy of the turbulence simulation. This problem is unique to the turbulence simulation, since it does not occur in the Euler and laminar Navier-Stokes simulations using the same code. For accurate turbulence simulation, it is found that only three to five percent of the standard Roe flux difference splitting dissipation is needed.
Parallel simulation of tsunami inundation on a large-scale supercomputer
NASA Astrophysics Data System (ADS)
Oishi, Y.; Imamura, F.; Sugawara, D.
2013-12-01
An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the finite difference calculation, (2) communication between adjacent layers for the calculations to connect each layer, and (3) global communication to obtain the time step which satisfies the CFL condition in the whole domain. A preliminary test on the K computer showed the parallel efficiency on 1024 cores was 57% relative to 64 cores. We estimate that the parallel efficiency will be considerably improved by applying a 2-D domain decomposition instead of the present 1-D domain decomposition in future work. The present parallel tsunami model was applied to the 2011 Great Tohoku tsunami. The coarsest resolution layer covers a 758 km × 1155 km region with a 405 m grid spacing. A nesting of five layers was used with the resolution ratio of 1/3 between nested layers. The finest resolution region has 5 m resolution and covers most of the coastal region of Sendai city. To complete 2 hours of simulation time, the serial (non-parallel) computation took approximately 4 days on a workstation. To complete the same simulation on 1024 cores of the K computer, it took 45 minutes which is more than two times faster than real-time. This presentation discusses the updated parallel computational performance and the efficient use of the K computer when considering the characteristics of the tsunami inundation simulation model in relation to the characteristics and capabilities of the K computer.
NASA Technical Reports Server (NTRS)
Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)
1993-01-01
This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
NASA Astrophysics Data System (ADS)
Perumalla, K.; Fujimoto, R.; Pande, S.; Karimabadi, H.; Driscoll, J.; Omelchenko, Y.
2005-12-01
Large parallel/distributed scientific simulations are very complex, and their dynamic behavior is hard to predict. Efficient development of massively parallel codes remains a computational challenge. For example, almost none of the kinetic codes in use in space physics today have dynamic load balancing capability. Here we present a new infrastructure for design and prediction of parallel codes. Performance prediction is useful to analyze, understand and experiment with different partitioning schemes, multiple modeling alternatives and so on, without having to run the application on supercomputers. Instrumentation of the model (with least perturbance to performance) is useful to glean key metrics and understand application-level behavior. Unfortunately, traditional approaches to virtual execution and instrumentation are limited by either slow execution speed or low resolution or both. We present a new framework that provides a high-resolution framework that provides a virtual CPU abstraction (with a full thread context per CPU), yet scales to thousands of virtual CPUs. The tool, called PDES2, presents different levels of modeling interfaces, from general purpose parallel simulations to parallel grid-based particle-in-cell (PIC) codes. The tool itself runs on multiple processors in order to accommodate the high-resolution by distributing the virtual execution across processors. Validation experiments of PIC models in the framework using a 1-D hybrid shock application show close agreement of results from virtual executions with results from actual supercomputer runs. The utility of this tool is further illustrated through an application to a parallel global hybrid code.
Hendrickson, B.; Plimpton, S.; Attaway, S.; Swegle, J.
1996-09-01
Transient dynamics simulations are commonly used to model phenomena such as car crashes, underwater explosions, and the response of shipping containers to high-speed impacts. Physical objects in such a simulation are typically represented by Lagrangian meshes because the meshes can move and deform with the objects as they undergo stress. Fluids (gasoline, water) or fluid-like materials (earth) in the simulation can be modeled using the techniques of smoothed particle hydrodynamics. Implementing a hybrid mesh/particle model on a massively parallel computer poses several difficult challenges. One challenge is to simultaneously parallelize and load-balance both the mesh and particle portions of the computation. A second challenge is to efficiently detect the contacts that occur within the deforming mesh and between mesh elements and particles as the simulation proceeds. These contacts impart forces to the mesh elements and particles which must be computed at each timestep to accurately capture the physics of interest. In this paper we describe new parallel algorithms for smoothed particle hydrodynamics and contact detection which turn out to have several key features in common. Additionally, we describe how to join the new algorithms with traditional parallel finite element techniques to create an integrated particle/mesh transient dynamics simulation. Our approach to this problem differs from previous work in that we use three different parallel decompositions, a static one for the finite element analysis and dynamic ones for particles and for contact detection. We have implemented our ideas in a parallel version of the transient dynamics code PRONTO-3D and present results for the code running on a large Intel Paragon.
Parallelization of ALICE simulation - a jump through the looking-glass
NASA Astrophysics Data System (ADS)
Tadel, Matevž; Carminati, Federico
2010-04-01
HEP computing is approaching the end of an era when simulation parallelization could be performed simply by running one instance of full simulation per core. The increasing number of cores and appearance of hardware-thread support both pose a severe limitation on memory and memory-bandwidth available to each execution unit. Typical simulation and reconstruction jobs of AliROOT (offline framework of the ALICE experiment at LHC) do not differ significantly in memory usage - but the input/output rate of reconstruction is approximately three times higher. This makes simulation a more natural candidate for parallelization, especially since the simulation code is relatively stable while the reconstruction code is not expected to settle until the detector is fully calibrated with real data and understood under stable running conditions. We have chosen to use multi-threading solution with one primary particle and all its secondaries being tracked by a given thread. This model corresponds well to Pb-Pb ion collision simulation where 60,000 primary particles need to be transported. After the MC processing of a primary particle is completed, the same thread also performs output serialization. Modifications of ROOT, AliROOT and GEANT3 that were required to perform this task are discussed. Performance of the parallelized version of simulation under varying running conditions is presented.
Parallel implementation of molecular dynamics simulation for short-ranged interaction
NASA Astrophysics Data System (ADS)
Wu, Jong-Shinn; Hsu, Yu-Lin; Lee, Yun-Min
2005-08-01
A parallel molecular dynamics simulation method, designed for large-scale problems, employing dynamic spatial domain decomposition for short-ranged molecular interactions is proposed. In this parallel cellular molecular dynamics (PCMD) simulation method, the link-cell data structure is used to reduce the searching time required for forming the cut-off neighbor list as well as for domain decomposition, which utilizes the multi-level graph-partitioning technique. A simple threshold scheme (STS), in which workload imbalance is monitored and compared with some threshold value during the runtime, is proposed to decide the proper time for repartitioning the domain. The simulation code is implemented and tested on the memory-distributed parallel machine, e.g., PC-cluster system. Parallel performance is studied using approximately one million L-J atoms in the condensed, vaporized and supercritical states. Results show that fairly good parallel efficiency at 49 processors can be obtained for the condensed and supercritical states (˜60%), while it is comparably lower for the vaporized state (˜40%).
A parallel simulated annealing algorithm for standard cell placement on a hypercube computer
NASA Technical Reports Server (NTRS)
Jones, Mark Howard
1987-01-01
A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
Efficient parallel algorithm for statistical ion track simulations in crystalline materials
NASA Astrophysics Data System (ADS)
Jeon, Byoungseon; Grønbech-Jensen, Niels
2009-02-01
We present an efficient parallel algorithm for statistical Molecular Dynamics simulations of ion tracks in solids. The method is based on the Rare Event Enhanced Domain following Molecular Dynamics (REED-MD) algorithm, which has been successfully applied to studies of, e.g., ion implantation into crystalline semiconductor wafers. We discuss the strategies for parallelizing the method, and we settle on a host-client type polling scheme in which a multiple of asynchronous processors are continuously fed to the host, which, in turn, distributes the resulting feed-back information to the clients. This real-time feed-back consists of, e.g., cumulative damage information or statistics updates necessary for the cloning in the rare event algorithm. We finally demonstrate the algorithm for radiation effects in a nuclear oxide fuel, and we show the balanced parallel approach with high parallel efficiency in multiple processor configurations.
Massively parallel Monte Carlo for many-particle simulations on GPUs
NASA Astrophysics Data System (ADS)
Glotzer, Sharon; Anderson, Joshua; Jankowski, Eric; Grubb, Thomas; Engel, Michael
2013-03-01
Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. We present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a GeForce GTX 680, our GPU implementation executes 95 times faster than on a single Intel Xeon E5540 CPU core, enabling 17 times better performance per dollar and cutting energy usage by a factor of 10.
NASA Astrophysics Data System (ADS)
Aslani, Mohamad; Regele, Jonathan
2015-11-01
Numerical simulation of incompressible multiphase flows to describe fluid atomization is becoming more common. However, compressible multiphase flow simulations are mostly limited to shock-bubble interactions with only a few studies involving shock waves impacting liquid droplets. A methodology for simulating compressible multiphase flow is developed from existing approaches for the Parallel Adaptive Wavelet-Collocation Method. The method uses an interface capturing function with a steepening procedure for the fluid interface. Simulations of shock waves impacting liquid droplets illustrate the numerical capabilities.
NASA Astrophysics Data System (ADS)
Mizrah, E. A.; Tkachev, S. B.; Shtabel, N. V.
2015-10-01
Solar array simulators are nonlinear control systems designed to reproduce static and dynamic characteristics of solar array. Solar array characteristics depend on illumination, temperature, space environment and other causes. During on-earth testing of spacecraft power systems there is a problem reaching stable work of simulator with different impedance loads in wide range load regulation. In the article authors propose a research method for absolute process stability in solar array simulators and present results of absolute stability research for solar array simulator with continuous parallel type power amplifier.
Wake Encounter Analysis for a Closely Spaced Parallel Runway Paired Approach Simulation
NASA Technical Reports Server (NTRS)
Mckissick,Burnell T.; Rico-Cusi, Fernando J.; Murdoch, Jennifer; Oseguera-Lohr, Rosa M.; Stough, Harry P, III; O'Connor, Cornelius J.; Syed, Hazari I.
2009-01-01
A Monte Carlo simulation of simultaneous approaches performed by two transport category aircraft from the final approach fix to a pair of closely spaced parallel runways was conducted to explore the aft boundary of the safe zone in which separation assurance and wake avoidance are provided. The simulation included variations in runway centerline separation, initial longitudinal spacing of the aircraft, crosswind speed, and aircraft speed during the approach. The data from the simulation showed that the majority of the wake encounters occurred near or over the runway and the aft boundaries of the safe zones were identified for all simulation conditions.
Zhao, Jinkui
2011-01-01
IB is a Monte Carlo simulation tool for aiding neutron scattering instrument designs. It is written in C++ and implemented under Parallel Virtual Machine. The program has a few basic components, or modules, that can be used to build a virtual neutron scattering instrument. More complex components, such as neutron guides and multichannel beam benders, can be constructed using the grouping technique unique to IB. Users can specify a collection of modules as a group. For example, a neutron guide can be constructed by grouping four neutron mirrors together that make up the four sides of the guide. IB s simulation engine ensures that neutrons entering a group will be properly operated upon by all members of the group. For simulations that require higher computer speed, the program can be run in parallel mode under the PVM architecture. Initially, the program was written for designing instruments on pulsed neutron sources, it has since been used to simulate reactor based instruments as well.
NASA Astrophysics Data System (ADS)
Dror, Ron
2013-03-01
Strong scaling of scientific applications on parallel architectures is increasingly limited by communication latency. This talk will describe the techniques used to reduce latency and mitigate its effects on performance in Anton, a massively parallel special-purpose machine that accelerates molecular dynamics (MD) simulations by orders of magnitude compared with the previous state of the art. Achieving this speedup required both specialized hardware mechanisms and a restructuring of the application software to reduce network latency, sender and receiver overhead, and synchronization costs. Key elements of Anton's approach, in addition to tightly integrated communication hardware, include formulating data transfer in terms of counted remote writes and leveraging fine-grained communication. Anton delivers end-to-end inter-node latency significantly lower than any other large-scale parallel machine, and the total critical-path communication time for an Anton MD simulation is less than 3% that of the next-fastest MD platform.
Xyce parallel electronic simulator reference guide, Version 6.0.1.
Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David Gregory.
2014-01-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1] . The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1] .
A Plane-Parallel Wind Solution for Testing Numerical Simulations of Photoevaporation
NASA Astrophysics Data System (ADS)
Hutchison, Mark A.; Laibe, Guillaume
2016-04-01
Here, we derive a Parker-wind-like solution for a stratified, plane-parallel atmosphere undergoing photoionisation. The difference compared to the standard Parker solar wind is that the sonic point is crossed only at infinity. The simplicity of the analytic solution makes it a convenient test problem for numerical simulations of photoevaporation in protoplanetary discs.
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Owen, Jeffrey E.
1988-01-01
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digitial computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACLS constructs. The execution times for all ACLS constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
Massively parallel simulation of flow and transport in variably saturated porous and fractured media
Wu, Yu-Shu; Zhang, Keni; Pruess, Karsten
2002-01-15
This paper describes a massively parallel simulation method and its application for modeling multiphase flow and multicomponent transport in porous and fractured reservoirs. The parallel-computing method has been implemented into the TOUGH2 code and its numerical performance is tested on a Cray T3E-900 and IBM SP. The efficiency and robustness of the parallel-computing algorithm are demonstrated by completing two simulations with more than one million gridblocks, using site-specific data obtained from a site-characterization study. The first application involves the development of a three-dimensional numerical model for flow in the unsaturated zone of Yucca Mountain, Nevada. The second application is the study of tracer/radionuclide transport through fracture-matrix rocks for the same site. The parallel-computing technique enhances modeling capabilities by achieving several-orders-of-magnitude speedup for large-scale and high resolution modeling studies. The resulting modeling results provide many new insights into flow and transport processes that could not be obtained from simulations using the single-CPU simulator.
Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube
NASA Technical Reports Server (NTRS)
Joslin, Ronald D.; Zubair, Mohammad
1993-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors nearly ideal linear speedups are achieved with nonoptimized routines; slower than linear speedups are achieved with optimized (machine dependent library) routines. This slower than linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and because the routine indicates less than ideal speedups. However with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise wall-normal and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single processor time to complete a comparable simulation; however it is estimated that a subgrid-scale model which reduces the required number of grid points and becomes a large-eddy simulation (PSLES) would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.
Moldy: a portable molecular dynamics simulation program for serial and parallel computers
NASA Astrophysics Data System (ADS)
Refson, Keith
2000-04-01
Moldy is a highly portable C program for performing molecular-dynamics simulations of solids and liquids using periodic boundary conditions. It runs in serial mode on a conventional workstation or on a parallel system using an interface to a parallel communications library such as MPI or BSP. The "replicated data" parallelization strategy is used to achieve reasonable performance with a minimal difference between serial and parallel code. The code has been optimized for high performance in both serial and parallel cases. The model system is completely specified in a run-time input file and may contain atoms, molecules or ions in any mixture. Molecules or molecular ions are treated in the rigid-molecule approximation and their rotational motion is modeled using quaternion methods. The equations of motion are integrated using a modified form of the Beeman algorithm. Simulations may be performed in the usual NVE ensemble or in isobaric and/or isothermal ensembles. Potential functions of the Lennard-Jones, 6-exp and MCY forms are supported and the code is structured to give an straightforward interface to add a new functional form. The Ewald method is used to calculate long-ranged electrostatic forces.
Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems
BAER,THOMAS A.; SACKINGER,PHILIP A.; SUBIA,SAMUEL R.
1999-10-14
Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact fines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a ''pseudo-solid'' mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to equally distribute computational work among a SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel coquting environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speed ups for fixed problem size, a class of problems of immediate practical importance.
Borštnik, Urban; Miller, Benjamin T; Brooks, Bernard R; Janežič, Dušanka
2011-11-15
Parallelization is an effective way to reduce the computational time needed for molecular dynamics simulations. We describe a new parallelization method, the distributed-diagonal force decomposition method, with which we extend and improve the existing force decomposition methods. Our new method requires less data communication during molecular dynamics simulations than replicated data and current force decomposition methods, increasing the parallel efficiency. It also dynamically load-balances the processors' computational load throughout the simulation. The method is readily implemented in existing molecular dynamics codes and it has been incorporated into the CHARMM program, allowing its immediate use in conjunction with the many molecular dynamics simulation techniques that are already present in the program. We also present the design of the Force Decomposition Machine, a cluster of personal computers and networks that is tailored to running molecular dynamics simulations using the distributed diagonal force decomposition method. The design is expandable and provides various degrees of fault resilience. This approach is easily adaptable to computers with Graphics Processing Units because it is independent of the processor type being used. PMID:21793007
Petascale turbulence simulation using a highly parallel fast multipole method on GPUs
NASA Astrophysics Data System (ADS)
Yokota, Rio; Barba, L. A.; Narumi, Tetsu; Yasuoka, Kenji
2013-03-01
This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as numerical engine, and match the current record in mesh size for this application, a cube of 40963 computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as the numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method is able to achieve just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. The calculation time for one time step was 108 s for the vortex method and 154 s for the spectral method, under these conditions. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex-method calculations to date.
Parallel 3D Multi-Stage Simulation of a Turbofan Engine
NASA Technical Reports Server (NTRS)
Turner, Mark G.; Topp, David A.
1998-01-01
A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. 20 processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scaleable with the number of blade rows. Enough flips are run (between 50 and 200) so the solution in the entire machine is not changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee the parallelization was done correctly. The domain decomposition is done only in the axial direction since the number of points axially is much larger than the other two directions. This code uses MPI for message passing. The parallel speed up of the solver portion (no 1/0 or body force calculation) for a grid which has 227 points axially.
NASA Astrophysics Data System (ADS)
Abraham, Mark James; Murtola, Teemu; Schulz, Roland; Páll, Szilárd; Smith, Jeremy C.; Hess, Berk; Lindahl, Erik
2015-09-01
GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level; SIMD registers inside cores, multithreading, heterogeneous CPU-GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.
Parallel simulations of Grover's algorithm for closest match search in neutron monitor data
NASA Astrophysics Data System (ADS)
Kussainov, Arman; White, Yelena
We are studying the parallel implementations of Grover's closest match search algorithm for neutron monitor data analysis. This includes data formatting, and matching quantum parameters to a conventional structure of a chosen programming language and selected experimental data type. We have employed several workload distribution models based on acquired data and search parameters. As a result of these simulations, we have an understanding of potential problems that may arise during configuration of real quantum computational devices and the way they could run tasks in parallel. The work was supported by the Science Committee of the Ministry of Science and Education of the Republic of Kazakhstan Grant #2532/GF3.
Progress on the Multiphysics Capabilities of the Parallel Electromagnetic ACE3P Simulation Suite
Kononenko, Oleksiy
2015-03-26
ACE3P is a 3D parallel simulation suite that is being developed at SLAC National Accelerator Laboratory. Effectively utilizing supercomputer resources, ACE3P has become a key tool for the coupled electromagnetic, thermal and mechanical research and design of particle accelerators. Based on the existing finite-element infrastructure, a massively parallel eigensolver is developed for modal analysis of mechanical structures. It complements a set of the multiphysics tools in ACE3P and, in particular, can be used for the comprehensive study of microphonics in accelerating cavities ensuring the operational reliability of a particle accelerator.
A method for data handling numerical results in parallel OpenFOAM simulations
NASA Astrophysics Data System (ADS)
Anton, Alin; Muntean, Sebastian
2015-12-01
Parallel computational fluid dynamics simulations produce vast amount of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM toolkit®[1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large scale simulation results than the regular algorithms.
NASA Technical Reports Server (NTRS)
Lyons, Daniel T.; Desai, Prasun N.
2005-01-01
This paper will describe the Entry, Descent and Landing simulation tradeoffs and techniques that were used to provide the Monte Carlo data required to approve entry during a critical period just before entry of the Genesis Sample Return Capsule. The same techniques will be used again when Stardust returns on January 15, 2006. Only one hour was available for the simulation which propagated 2000 dispersed entry states to the ground. Creative simulation tradeoffs combined with parallel processing were needed to provide the landing footprint statistics that were an essential part of the Go/NoGo decision that authorized release of the Sample Return Capsule a few hours before entry.
Design of a real-time wind turbine simulator using a custom parallel architecture
NASA Technical Reports Server (NTRS)
Hoffman, John A.; Gluck, R.; Sridhar, S.
1995-01-01
The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named: the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is very much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an IO operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/ output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors which usually have a throughput limit because of rigid bus architecture.
Application of parallel computing techniques to a large-scale reservoir simulation
Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten
2001-02-01
Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from intensive computational requirement for detailed modeling investigations of real-world reservoirs. This paper presents the application of a massive parallel-computing version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of the future flow conditions at the site, aiding in the assessment of proposed repository performance.
A Queue Simulation Tool for a High Performance Scientific Computing Center
NASA Technical Reports Server (NTRS)
Spear, Carrie; McGalliard, James
2007-01-01
The NASA Center for Computational Sciences (NCCS) at the Goddard Space Flight Center provides high performance highly parallel processors, mass storage, and supporting infrastructure to a community of computational Earth and space scientists. Long running (days) and highly parallel (hundreds of CPUs) jobs are common in the workload. NCCS management structures batch queues and allocates resources to optimize system use and prioritize workloads. NCCS technical staff use a locally developed discrete event simulation tool to model the impacts of evolving workloads, potential system upgrades, alternative queue structures and resource allocation policies.
Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.
2001-08-31
This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented into the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-one-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce better resolution results and reveal some flow patterns that cannot be obtained using coarse-grid modeling models.
Parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada.
Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G S
2003-01-01
This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented into the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-1-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the 1-million-cell models produce better resolution results and reveal some flow patterns that cannot be obtained using coarse-grid modeling models. PMID:12714301
Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN
NASA Astrophysics Data System (ADS)
Hammond, G. E.; Lichtner, P. C.; Mills, R. T.
2014-01-01
To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted.
Long-time atomistic simulations with the Parallel Replica Dynamics method
NASA Astrophysics Data System (ADS)
Perez, Danny
Molecular Dynamics (MD) -- the numerical integration of atomistic equations of motion -- is a workhorse of computational materials science. Indeed, MD can in principle be used to obtain any thermodynamic or kinetic quantity, without introducing any approximation or assumptions beyond the adequacy of the interaction potential. It is therefore an extremely powerful and flexible tool to study materials with atomistic spatio-temporal resolution. These enviable qualities however come at a steep computational price, hence limiting the system sizes and simulation times that can be achieved in practice. While the size limitation can be efficiently addressed with massively parallel implementations of MD based on spatial decomposition strategies, allowing for the simulation of trillions of atoms, the same approach usually cannot extend the timescales much beyond microseconds. In this article, we discuss an alternative parallel-in-time approach, the Parallel Replica Dynamics (ParRep) method, that aims at addressing the timescale limitation of MD for systems that evolve through rare state-to-state transitions. We review the formal underpinnings of the method and demonstrate that it can provide arbitrarily accurate results for any definition of the states. When an adequate definition of the states is available, ParRep can simulate trajectories with a parallel speedup approaching the number of replicas used. We demonstrate the usefulness of ParRep by presenting different examples of materials simulations where access to long timescales was essential to access the physical regime of interest and discuss practical considerations that must be addressed to carry out these simulations. Work supported by the United States Department of Energy (U.S. DOE), Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division.
Parallel-vector algorithms for particle simulations on shared-memory multiprocessors
Nishiura, Daisuke; Sakaguchi, Hide
2011-03-01
Over the last few decades, the computational demands of massive particle-based simulations for both scientific and industrial purposes have been continuously increasing. Hence, considerable efforts are being made to develop parallel computing techniques on various platforms. In such simulations, particles freely move within a given space, and so on a distributed-memory system, load balancing, i.e., assigning an equal number of particles to each processor, is not guaranteed. However, shared-memory systems achieve better load balancing for particle models, but suffer from the intrinsic drawback of memory access competition, particularly during (1) paring of contact candidates from among neighboring particles and (2) force summation for each particle. Here, novel algorithms are proposed to overcome these two problems. For the first problem, the key is a pre-conditioning process during which particle labels are sorted by a cell label in the domain to which the particles belong. Then, a list of contact candidates is constructed by pairing the sorted particle labels. For the latter problem, a table comprising the list indexes of the contact candidate pairs is created and used to sum the contact forces acting on each particle for all contacts according to Newton's third law. With just these methods, memory access competition is avoided without additional redundant procedures. The parallel efficiency and compatibility of these two algorithms were evaluated in discrete element method (DEM) simulations on four types of shared-memory parallel computers: a multicore multiprocessor computer, scalar supercomputer, vector supercomputer, and graphics processing unit. The computational efficiency of a DEM code was found to be drastically improved with our algorithms on all but the scalar supercomputer. Thus, the developed parallel algorithms are useful on shared-memory parallel computers with sufficient memory bandwidth.
Parallel electric fields in a simulation of magnetotail reconnection and plasmoid evolution
Hesse, M.; Birn, J.
1989-01-01
We investigate properties of the electric field component parallel to the magnetic field (E/sub /parallel//) in a three-dimensional MHD simulation of plasmoid formation and evolution in the magnetotail in the presence of a net dawn-dusk magnetic field component. We emphasize particularly the spatial location of E/sub /parallel//, the concept of a diffusion zone and the role of E/sub /parallel// in accelerating electrons. We find a localization of the region of enhanced E/sub /parallel// in all space directions with a strong concentration in the z direction. We identify this region as the diffusion zone, which plays a crucial role in reconnection theory through the local break-down of magnetic flux conservation. The presence of B/sub y/ implies a north-south asymmetry of the injection of accelerated particles into the near-earth region, if the net B/sub y/ field is strong enough to force particles to follow field lines through the diffusion region. We estimate that for a typical net B/sub y/ field this should affect the injection of electrons into the near-earth dawn region, so that precipitation into the northern (southern) hemisphere should dominate for duskward (dawnward) net B/sub y/. In addition, we observe a spatial clottiness of the expected injection of adiabatic particles which could be related to the appearance bright spots in auroras. 12 refs., 9 figs.
Vortex-induced vibration of two parallel risers: Experimental test and numerical simulation
NASA Astrophysics Data System (ADS)
Huang, Weiping; Zhou, Yang; Chen, Haiming
2016-04-01
The vortex-induced vibration of two identical rigidly mounted risers in a parallel arrangement was studied using Ansys- CFX and model tests. The vortex shedding and force were recorded to determine the effect of spacing on the two-degree-of-freedom oscillation of the risers. CFX was used to study the single riser and two parallel risers in 2-8 D spacing considering the coupling effect. Because of the limited width of water channel, only three different riser spacings, 2 D, 3 D, and 4 D, were tested to validate the characteristics of the two parallel risers by comparing to the numerical simulation. The results indicate that the lift force changes significantly with the increase in spacing, and in the case of 3 D spacing, the lift force of the two parallel risers reaches the maximum. The vortex shedding of the risers in 3 D spacing shows that a variable velocity field with the same frequency as the vortex shedding is generated in the overlapped area, thus equalizing the period of drag force to that of lift force. It can be concluded that the interaction between the two parallel risers is significant when the risers are brought to a small distance between them because the trajectory of riser changes from oval to curve 8 as the spacing is increased. The phase difference of lift force between the two risers is also different as the spacing changes.
Parallel electric fields in a simulation of magnetotail reconnection and plasmoid evolution
NASA Technical Reports Server (NTRS)
Hesse, Michael; Birn, Joachim
1989-01-01
Properties of the electric field component parallel to the magnetic field (E sub parallel) in a three-dimensional MHD simulation of plasmoid formation and evolution in the magnetotail in the presence of a net dawn-dusk magnetic field component were observed. Particularly emphasized was the spatial location of E(sub parallel), the concept of a diffusion zone and the role of E(sub parallel) in accelerating electrons. A localization of the region of enhanced E(sub parallel) in all space directions with a strong concentration in the z direction was found. This region was identified as the diffusion zone, which plays a crucial role in reconnection theory through the local break-down of magnetic flux conservation. The presence of B(sub y) implies a north-south asymmetry of the injection of accelerated particles into the near-earth region, if the net B(sub y) field is strong enough to force particles to follow field lines through the diffusion region. It is estimated that for a typical net B(sub y) field this should affect the injection of electrons into the near-earth dawn region, so that precipitation into the Northern (Southern) Hemisphere should dominate for duskward (dawnward) net B(sub y). In addition, a spatial clottiness of the expected injection of adiabatic particles which could be related to the appearance bright spots in auroras was observed.
Relevance of the parallel nonlinearity in gyrokinetic simulations of tokamak plasmas
Candy, J.; Waltz, R. E.; Parker, S. E.; Chen, Y.
2006-07-15
The influence of the parallel nonlinearity on transport in gyrokinetic simulations is assessed for values of {rho}{sub *} which are typical of current experiments. Here, {rho}{sub *}={rho}{sub s}/a is the ratio of gyroradius, {rho}{sub s}, to plasma minor radius, a. The conclusion, derived from simulations with both GYRO [J. Candy and R. E. Waltz, J. Comput. Phys., 186, 585 (2003)] and GEM [Y. Chen and S. E. Parker J. Comput. Phys., 189, 463 (2003)] is that no measurable effect of the parallel nonlinearity is apparent for {rho}{sub *}<0.012. This result is consistent with scaling arguments, which suggest that the parallel nonlinearity should be O({rho}{sub *}) smaller than the ExB nonlinearity. Indeed, for the plasma parameters under consideration, the magnitude of the parallel nonlinearity is a factor of 8{rho}{sub *} smaller (for 0.000 75<{rho}{sub *}<0.012) than the other retained terms in the nonlinear gyrokinetic equation.
Parallel distributed, reciprocal Monte Carlo radiation in coupled, large eddy combustion simulations
NASA Astrophysics Data System (ADS)
Hunsaker, Isaac L.
Radiation is the dominant mode of heat transfer in high temperature combustion environments. Radiative heat transfer affects the gas and particle phases, including all the associated combustion chemistry. The radiative properties are in turn affected by the turbulent flow field. This bi-directional coupling of radiation turbulence interactions poses a major challenge in creating parallel-capable, high-fidelity combustion simulations. In this work, a new model was developed in which reciprocal monte carlo radiation was coupled with a turbulent, large-eddy simulation combustion model. A technique wherein domain patches are stitched together was implemented to allow for scalable parallelism. The combustion model runs in parallel on a decomposed domain. The radiation model runs in parallel on a recomposed domain. The recomposed domain is stored on each processor after information sharing of the decomposed domain is handled via the message passing interface. Verification and validation testing of the new radiation model were favorable. Strong scaling analyses were performed on the Ember cluster and the Titan cluster for the CPU-radiation model and GPU-radiation model, respectively. The model demonstrated strong scaling to over 1,700 and 16,000 processing cores on Ember and Titan, respectively.
A scalable parallel algorithm for large-scale reactive force-field molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Nomura, Ken-ichi; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya
2008-01-01
A scalable parallel algorithm has been designed to perform multimillion-atom molecular dynamics (MD) simulations, in which first principles-based reactive force fields (ReaxFF) describe chemical reactions. Environment-dependent bond orders associated with atomic pairs and their derivatives are reused extensively with the aid of linked-list cells to minimize the computation associated with atomic n-tuple interactions ( n⩽4 explicitly and ⩽6 due to chain-rule differentiation). These n-tuple computations are made modular, so that they can be reconfigured effectively with a multiple time-step integrator to further reduce the computation time. Atomic charges are updated dynamically with an electronegativity equalization method, by iteratively minimizing the electrostatic energy with the charge-neutrality constraint. The ReaxFF-MD simulation algorithm has been implemented on parallel computers based on a spatial decomposition scheme combined with distributed n-tuple data structures. The measured parallel efficiency of the parallel ReaxFF-MD algorithm is 0.998 on 131,072 IBM BlueGene/L processors for a 1.01 billion-atom RDX system.
Parallel Solutions for Voxel-Based Simulations of Reaction-Diffusion Systems
D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan
2014-01-01
There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This necessity has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena taking into account both spatial effects and noise. However, the high computational effort characterizing simulation approaches, coupled with the necessity to simulate the models several times to achieve statistically relevant information on the model behaviours, makes such kind of algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we present how the characteristics of the algorithm can be exploited for an effective parallelization on the present heterogeneous HPC architectures. PMID:25045716
A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows
NASA Technical Reports Server (NTRS)
Bui, Trong T.
1999-01-01
A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
Object-Oriented Parallel Particle-in-Cell Code for Beam Dynamics Simulation in Linear Accelerators
Qiang, J.; Ryne, R.D.; Habib, S.; Decky, V.
1999-11-13
In this paper, we present an object-oriented three-dimensional parallel particle-in-cell code for beam dynamics simulation in linear accelerators. A two-dimensional parallel domain decomposition approach is employed within a message passing programming paradigm along with a dynamic load balancing. Implementing object-oriented software design provides the code with better maintainability, reusability, and extensibility compared with conventional structure based code. This also helps to encapsulate the details of communications syntax. Performance tests on SGI/Cray T3E-900 and SGI Origin 2000 machines show good scalability of the object-oriented code. Some important features of this code also include employing symplectic integration with linear maps of external focusing elements and using z as the independent variable, typical in accelerators. A successful application was done to simulate beam transport through three superconducting sections in the APT linac design.
OSIRIS 2.0: an integrated framework for parallel PIC simulations
NASA Astrophysics Data System (ADS)
Fonseca, Ricardo; Tsung, Frank; Deng, Suzhi; Ren, Chuang
2005-10-01
We describe OSIRIS 2.0 framework, an integrated framework for particle-in-cell (PIC) simulations. This framework is based on a three-dimensional, fully relativistic, massively parallel, object oriented particle-in-cell code, that has successfully been applied to a number of problems, ranging from laser-plasma interaction and inertial fusion to plasma shell collisions in astrophysical scenarios. The OSIRIS 2.0 framework is the new version of the OSIRIS code. Developed in Fortran 95, the code runs on multiple platforms and can be easily ported to new ones. Details on the capabilities of the framework are given, focusing on the new capabilities introduced, such as bessel beams, binary collisions, tunnel (ADK) and impact ionization, and new diagnostics, and also dynamic load balancing and parallel I/O. This framework also includes a visualization and data-analysis infrastructure, tightly integrated into the framework, developed to post-process the scalar and vector results from our simulations.
A Large Scale Simulation of Ultrasonic Wave Propagation in Concrete Using Parallelized EFIT
NASA Astrophysics Data System (ADS)
Nakahata, Kazuyuki; Tokunaga, Jyunichi; Kimoto, Kazushi; Hirose, Sohichi
A time domain simulation tool for the ultrasonic propagation in concrete is developed using the elastodynamic finite integration technique (EFIT) and the image-based modeling. The EFIT is a grid-based time domain differential technique and easily treats the different boundary conditions in the inhomogeneous material such as concrete. Here, the geometry of concrete is determined by a scanned image of concrete and the processed color bitmap image is fed into the EFIT. Although the ultrasonic wave simulation in such a complex material requires much time to calculate, we here execute the EFIT by a parallel computing technique using a shared memory computer system. In this study, formulations of the EFIT and treatment of the different boundary conditions are briefly described and examples of shear horizontal wave propagations in reinforced concrete are demonstrated. The methodology and performance of parallelization for the EFIT are also discussed.
Adaptive finite element simulation of flow and transport applications on parallel computers
NASA Astrophysics Data System (ADS)
Kirk, Benjamin Shelton
The subject of this work is the adaptive finite element simulation of problems arising in flow and transport applications on parallel computers. Of particular interest are new contributions to adaptive mesh refinement (AMR) in this parallel high-performance context, including novel work on data structures, treatment of constraints in a parallel setting, generality and extensibility via object-oriented programming, and the design/implementation of a flexible software framework. This technology and software capability then enables more robust, reliable treatment of multiscale--multiphysics problems and specific studies of fine scale interaction such as those in biological chemotaxis (Chapter 4) and high-speed shock physics for compressible flows (Chapter 5). The work begins by presenting an overview of key concepts and data structures employed in AMR simulations. Of particular interest is how these concepts are applied in the physics-independent software framework which is developed here and is the basis for all the numerical simulations performed in this work. This open-source software framework has been adopted by a number of researchers in the U.S. and abroad for use in a wide range of applications. The dynamic nature of adaptive simulations pose particular issues for efficient implementation on distributed-memory parallel architectures. Communication cost, computational load balance, and memory requirements must all be considered when developing adaptive software for this class of machines. Specific extensions to the adaptive data structures to enable implementation on parallel computers is therefore considered in detail. The libMesh framework for performing adaptive finite element simulations on parallel computers is developed to provide a concrete implementation of the above ideas. This physics-independent framework is applied to two distinct flow and transport applications classes in the subsequent application studies to illustrate the flexibility of the design and to demonstrate the capability for resolving complex multiscale processes efficiently and reliably. The first application considered is the simulation of chemotactic biological systems such as colonies of Escherichia coli. This work appears to be the first application of AMR to chemotactic processes. These systems exhibit transient, highly localized features and are important in many biological processes, which make them ideal for simulation with adaptive techniques. A nonlinear reaction-diffusion model for such systems is described and a finite element formulation is developed. The solution methodology is described in detail. Several phenomenological studies are conducted to study chemotactic processes and resulting biological patterns which use the parallel adaptive refinement capability developed in this work. The other application study is much more extensive and deals with fine scale interactions for important hypersonic flows arising in aerospace applications. These flows are characterized by highly nonlinear, convection-dominated flowfields with very localized features such as shock waves and boundary layers. These localized features are well-suited to simulation with adaptive techniques. A novel treatment of the inviscid flux terms arising in a streamline-upwind Petrov-Galerkin finite element formulation of the compressible Navier-Stokes equations is also presented and is found to be superior to the traditional approach. The parallel adaptive finite element formulation is then applied to several complex flow studies, culminating in fully three-dimensional viscous flows about complex geometries such as the Space Shuttle Orbiter. Physical phenomena such as viscous/inviscid interaction, shock wave/boundary layer interaction, shock/shock interaction, and unsteady acoustic-driven flowfield response are considered in detail. A computational investigation of a 25°/55° double cone configuration details the complex multiscale flow features and investigates a potential source of experimentally-observed unsteady flowfield response.
Construction of a parallel processor for simulating manipulators and other mechanical systems
NASA Technical Reports Server (NTRS)
Hannauer, George
1991-01-01
This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.
NASA Technical Reports Server (NTRS)
Campbell, David; Wysong, Ingrid; Kaplan, Carolyn; Mott, David; Wadsworth, Dean; VanGilder, Douglas
2000-01-01
An AFRL/NRL team has recently been selected to develop a scalable, parallel, reacting, multidimensional (SUPREM) Direct Simulation Monte Carlo (DSMC) code for the DoD user community under the High Performance Computing Modernization Office (HPCMO) Common High Performance Computing Software Support Initiative (CHSSI). This paper will introduce the JANNAF Exhaust Plume community to this three-year development effort and present the overall goals, schedule, and current status of this new code.
NASA Technical Reports Server (NTRS)
Morgan, Philip E.
2004-01-01
This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow Simulations Using Massively Parallel Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
Efficient parallelization of short-range molecular dynamics simulations on many-core systems.
Meyer, R
2013-11-01
This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly inhomogeneous systems like nanodevices or nanostructured materials. In the proposed scheme the calculation of the forces and the generation of neighbor lists are divided into small tasks. The tasks are then executed by a thread pool according to a dependent task schedule. This schedule is constructed in such a way that a particle is never accessed by two threads at the same time. Benchmark simulations on a typical 12-core machine show that the described algorithm achieves excellent parallel efficiencies above 80% for different kinds of systems and all numbers of cores. For inhomogeneous systems the speedups are strongly superior to those obtained with spatial decomposition. Further benchmarks were performed on an Intel Xeon Phi coprocessor. These simulations demonstrate that the algorithm scales well to large numbers of cores. PMID:24329381
Efficient parallelization of short-range molecular dynamics simulations on many-core systems
NASA Astrophysics Data System (ADS)
Meyer, R.
2013-11-01
This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly inhomogeneous systems like nanodevices or nanostructured materials. In the proposed scheme the calculation of the forces and the generation of neighbor lists are divided into small tasks. The tasks are then executed by a thread pool according to a dependent task schedule. This schedule is constructed in such a way that a particle is never accessed by two threads at the same time. Benchmark simulations on a typical 12-core machine show that the described algorithm achieves excellent parallel efficiencies above 80% for different kinds of systems and all numbers of cores. For inhomogeneous systems the speedups are strongly superior to those obtained with spatial decomposition. Further benchmarks were performed on an Intel Xeon Phi coprocessor. These simulations demonstrate that the algorithm scales well to large numbers of cores.
Spontaneous Hot Flow Anomalies at Quasi-Parallel Shocks: 2. Hybrid Simulations
NASA Technical Reports Server (NTRS)
Omidi, N.; Zhang, H.; Sibeck, D.; Turner, D.
2013-01-01
Motivated by recent THEMIS observations, this paper uses 2.5-D electromagnetic hybrid simulations to investigate the formation of Spontaneous Hot Flow Anomalies (SHFA) upstream of quasi-parallel bow shocks during steady solar wind conditions and in the absence of discontinuities. The results show the formation of a large number of structures along and upstream of the quasi-parallel bow shock. Their outer edges exhibit density and magnetic field enhancements, while their cores exhibit drops in density, magnetic field, solar wind velocity and enhancements in ion temperature. Using virtual spacecraft in the simulation, we show that the signatures of these structures in the time series data are very similar to those of SHFAs seen in THEMIS data and conclude that they correspond to SHFAs. Examination of the simulation data shows that SHFAs form as the result of foreshock cavitons interacting with the bow shock. Foreshock cavitons in turn form due to the nonlinear evolution of ULF waves generated by the interaction of the solar wind with the backstreaming ions. Because foreshock cavitons are an inherent part of the shock dissipation process, the formation of SHFAs is also an inherent part of the dissipation process leading to a highly non-uniform plasma in the quasi-parallel magnetosheath including large scale density and magnetic field cavities.
Spontaneous hot flow anomalies at quasi-parallel shocks: 2. Hybrid simulations
NASA Astrophysics Data System (ADS)
Omidi, N.; Zhang, H.; Sibeck, D.; Turner, D.
2013-01-01