Parallel Discrete Event Simulation
NASA Astrophysics Data System (ADS)
Kunz, Georg
Ever since discrete event simulation has been adopted by a large research community, simulation developers have attempted to draw benefits from executing a simulation on multiple processing units in parallel. Hence, a wide range of research has been conducted on Parallel Discrete Event Simulation (PDES). In this chapter we give an overview of the challenges and approaches of parallel simulation. Furthermore, we present a survey of the parallelization capabilities of the network simulators OMNeT++, ns-2, DSIM and JiST.
Discrete Event Simulation Parallel Discrete-Event Simulation
Using TM for high-performance Discrete-Event Simulation on multi-core architectures. EuroTM 2013 14, 2013. Olivier Dalle. Outline: Discrete Event Simulation; Parallel Discrete-Event Simulation; Using TM for PDES; Conclusions & Future Works.
Synchronization Of Parallel Discrete Event Simulations
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S.
1992-01-01
Breathing Time Buckets is an adaptive synchronization algorithm for parallel discrete-event simulation, developed in the Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. The algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while the simulation is in progress. It combines the best of optimistic and conservative synchronization strategies while avoiding their major disadvantages. It is well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Parallel Discrete Event Simulation of Lyme Disease
Szymanski, Boleslaw K.
Parallel Discrete Event Simulation of Lyme Disease. Ewa Deelman, Thomas Caraco and Boleslaw K ... distribution of Lyme disease, currently the most frequently reported vector-borne disease of humans). Our goal is to understand patterns in the Lyme disease epidemic at the regional scale through studying ...
Program For Parallel Discrete-Event Simulation
NASA Technical Reports Server (NTRS)
Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.
1991-01-01
Time Warp Operating System (TWOS) computer program is a special-purpose operating system designed to support parallel discrete-event simulation. It is a complete implementation of the Time Warp mechanism, and the user does not have to add any special logic to aid in synchronization. It supports only simulations and other computations designed for virtual time. The Time Warp Simulator (TWSIM) subdirectory contains a sequential simulation engine that is interface-compatible with TWOS. TWOS and TWSIM are written in, and support simulations in, the C programming language.
Relation of Parallel Discrete Event Simulation algorithms with physical models
NASA Astrophysics Data System (ADS)
Shchur, L. N.; Shchur, L. V.
2015-09-01
We extend the concept of local simulation times in parallel discrete event simulation (PDES) in order to take into account the architecture of current hardware and software in high-performance computing. We briefly review previous research on the mapping of PDES onto physical problems, and emphasise how physical results may help to predict the behaviour of parallel algorithms.
Continuously Monitored Global Virtual Time in Parallel Discrete Event Simulation
Bystroff, Chris
Continuously Monitored Global Virtual Time in Parallel Discrete Event Simulation. Ewa Deelman ... information. In this paper we present a new algorithm, the Continuously Monitored Global Virtual Time (CMGVT ... as well as the incoming and outgoing messages. Therefore, the major drawback of the optimistic ...
Parallel discrete-event simulation of FCFS stochastic queueing networks
NASA Technical Reports Server (NTRS)
Nicol, David M.
1988-01-01
Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments), which has proven itself to be effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction of a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations, performance data are given that demonstrate the method's effectiveness under moderate to heavy loads, and performance tradeoffs between the quality of lookahead and the cost of computing it are discussed.
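As a rough illustration of the lookahead idea for an FCFS server, the following Python sketch assumes that service times are sampled in advance, so that the departure times of jobs already at the server are fixed. The function names and the `min_service_time` parameter are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def known_departures(in_service_departure, queued_service_times):
    """Departure times already determined for a busy FCFS server.

    in_service_departure: departure time of the job currently in service
    queued_service_times: pre-sampled service times of the waiting jobs,
                          in queue order (an assumption of this sketch)
    """
    # Under FCFS each waiting job departs one service time after its predecessor.
    return list(accumulate(queued_service_times, initial=in_service_departure))


def appointment(now, in_service_departure, min_service_time):
    """Lower bound on the time of the next message this server can send.

    in_service_departure is None when the server is idle.
    """
    if in_service_departure is None:
        # Idle: a job arriving right now still needs at least min_service_time.
        return now + min_service_time
    # Busy: no job can leave before the one in service.
    return in_service_departure


if __name__ == "__main__":
    print(known_departures(10.0, [2.0, 3.0]))
    print(appointment(4.0, None, 0.5))
```

A downstream processor that receives such a bound may safely simulate up to that time without waiting for further messages from this server.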
Parallel discrete event simulation: A shared memory approach
NASA Technical Reports Server (NTRS)
Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.
1987-01-01
With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
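The causality constraint mentioned above can be sketched as follows. This is a simplified Python illustration of the conservative rule, assuming each logical process tracks the timestamp of the last message received on every input channel; it omits null messages and deadlock handling, and all names are hypothetical.

```python
def safe_events(pending, channel_clocks):
    """Return the pending events that may be processed without risking
    a causality error, in timestamp order.

    pending: list of (timestamp, payload) tuples
    channel_clocks: dict mapping input channel name to the timestamp of
                    the last message received on that channel
    """
    # No future message can carry a timestamp below the slowest channel clock.
    horizon = min(channel_clocks.values())
    ready = [e for e in pending if e[0] <= horizon]
    return sorted(ready, key=lambda e: e[0])


if __name__ == "__main__":
    events = [(5, "a"), (2, "b"), (9, "c")]
    print(safe_events(events, {"x": 6, "y": 7}))
    # A single lagging channel blocks all progress:
    print(safe_events(events, {"x": 1, "y": 7}))
```

The second call shows why this kind of synchronization can limit speedup: one slow input channel leaves the process with nothing it is allowed to execute.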
Synchronous parallel system for emulation and discrete event simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (inventor)
1992-01-01
A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to state variables of the simulation object attributable to the event object, and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring the events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.
SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeff S.
1992-01-01
Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
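One simplified reading of the event-horizon idea behind Breathing Time Buckets is sketched below in Python. It assumes that each node processes its pending events optimistically, holds back the events it generates, and that only events earlier than the global minimum of the generated timestamps are committed. The `generate` hook, the function names, and the strict tie handling are assumptions of this sketch, not the SPEEDES implementation.

```python
import math


def local_cycle(pending, generate):
    """Optimistically process one node's pending events for one cycle.

    pending: event timestamps in increasing order
    generate: hypothetical model hook, t -> list of timestamps of the
              new events caused by processing the event at time t
    Returns (processed, held_back_events, local_event_horizon).
    """
    processed, held_back = [], []
    horizon = math.inf
    for t in pending:
        if t >= horizon:
            break  # a generated event would precede this one
        processed.append(t)
        out = generate(t)
        held_back.extend(out)          # messages are not sent yet
        horizon = min([horizon] + out)
    return processed, held_back, horizon


def commit(processed_per_node, local_horizons):
    """Commit events earlier than the global event horizon."""
    global_horizon = min(local_horizons)
    committed = [[t for t in p if t < global_horizon]
                 for p in processed_per_node]
    return global_horizon, committed


if __name__ == "__main__":
    gen = lambda t: [t + 3.0]
    a = local_cycle([1.0, 2.0, 5.0, 9.0], gen)
    b = local_cycle([3.0, 4.5], gen)
    print(commit([a[0], b[0]], [a[2], b[2]]))
```

In this toy run the second node processes the event at 4.5 optimistically, but that event lies beyond the global horizon and would have to be rolled back and retried in the next cycle.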
Thulasidasan, Sunil; Kasiviswanathan, Shiva; Eidenbenz, Stephan; Romero, Philip
2010-01-01
We re-examine the problem of load balancing in conservatively synchronized parallel, discrete-event simulations executed on high-performance computing clusters, focusing on simulations where computational and messaging load tend to be spatially clustered. Such domains are frequently characterized by the presence of geographic 'hot-spots' - regions that generate significantly more simulation events than others. Examples of such domains include simulation of urban regions, transportation networks and networks where interaction between entities is often constrained by physical proximity. Noting that in conservatively synchronized parallel simulations, the speed of execution of the simulation is determined by the slowest (i.e., most heavily loaded) simulation process, we study different partitioning strategies in achieving equitable processor-load distribution in domains with spatially clustered load. In particular, we study the effectiveness of partitioning via spatial scattering to achieve optimal load balance. In this partitioning technique, nearby entities are explicitly assigned to different processors, thereby scattering the load across the cluster. This is motivated by two observations, namely, (i) since load is spatially clustered, spatial scattering should, intuitively, spread the load across the compute cluster, and (ii) in parallel simulations, equitable distribution of CPU load is a greater determinant of execution speed than message passing overhead. Through large-scale simulation experiments - both of abstracted and real simulation models - we observe that scatter partitioning, even with its greatly increased messaging overhead, significantly outperforms more conventional spatial partitioning techniques that seek to reduce messaging overhead. Further, even if hot-spots change over the course of the simulation, if the underlying feature of spatial clustering is retained, load continues to be balanced with spatial scattering, leading us to the observation that spatial scattering can often obviate the need for dynamic load balancing.
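The contrast between block and scatter partitioning can be illustrated with a toy example. The Python sketch below assumes an 8 x 8 grid of entities with a single hot-spot and four processors; the grid size, load values, and function names are illustrative assumptions, and messaging overhead is ignored entirely.

```python
N = 8        # grid side length (assumed)
P_SIDE = 2   # processors are arranged as P_SIDE x P_SIDE

# Hypothetical load: a hot-spot in the top-left quadrant, light load elsewhere.
load = {(r, c): (10 if r < 4 and c < 4 else 1)
        for r in range(N) for c in range(N)}


def block(r, c):
    """Conventional spatial partitioning: one contiguous quadrant per processor."""
    side = N // P_SIDE
    return (r // side) * P_SIDE + (c // side)


def scatter(r, c):
    """Scatter partitioning: neighbouring cells go to different processors."""
    return (r % P_SIDE) * P_SIDE + (c % P_SIDE)


def max_load(load, assign, n_procs):
    """Load of the most heavily loaded processor, which bounds execution
    speed under conservative synchronization."""
    totals = [0] * n_procs
    for (r, c), w in load.items():
        totals[assign(r, c)] += w
    return max(totals)


if __name__ == "__main__":
    print("block  :", max_load(load, block, P_SIDE * P_SIDE))
    print("scatter:", max_load(load, scatter, P_SIDE * P_SIDE))
```

With block partitioning the processor that owns the hot-spot carries most of the total load, whereas scattering gives every processor an equal share in this toy case.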
Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms
Yoginath, Srikanth B; Perumalla, Kalyan S
2013-01-01
With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware as has traditionally been done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the bad scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature in the sense that any fairness-based VM scheduler implementation would exhibit this mismatch with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in the run time of the PDES-optimized scheduler relative to the regular VM scheduler, with over 20× reduction in run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor)
1998-01-01
The present invention is embodied in a method of performing object-oriented simulation and a system having inter-connected processor nodes operating in parallel to simulate mutual interactions of a set of discrete simulation objects distributed among the nodes as a sequence of discrete events changing state variables of respective simulation objects so as to generate new event-defining messages addressed to respective ones of the nodes. The object-oriented simulation is performed at each one of the nodes by assigning passive self-contained simulation objects to each one of the nodes, responding to messages received at one node by generating corresponding active event objects having user-defined inherent capabilities and individual time stamps and corresponding to respective events affecting one of the passive self-contained simulation objects of the one node, restricting the respective passive self-contained simulation objects to only providing and receiving information from the respective active event objects, requesting information and changing variables within a passive self-contained simulation object by the active event object, and producing corresponding messages specifying events resulting therefrom by the active event objects.
The IDES framework: A case study in development of a parallel discrete-event simulation system
Nicol, D.M.; Johnson, M.M.; Yoshimura, A.S.
1997-12-31
This tutorial describes considerations in the design and development of the IDES parallel simulation system. IDES is a Java-based parallel/distributed simulation system designed to support the study of complex large-scale enterprise systems. Using the IDES system as an example, the authors discuss how anticipated model and system constraints molded the design decisions with respect to modeling, synchronization, and communication strategies.
Distributed discrete event simulation. Final report
De Vries, R.C.
1988-02-01
The presentation given here is restricted to discrete event simulation. The complexity of and time required for many present and potential discrete simulations exceeds the reasonable capacity of most present serial computers. The desire, then, is to implement the simulations on a parallel machine. However, certain problems arise in an effort to program the simulation on a parallel machine. In one category of methods deadlock can arise and some method is required to either detect deadlock and recover from it or to avoid deadlock through information passing. In the second category of methods, potentially incorrect simulations are allowed to proceed. If the situation is later determined to be incorrect, recovery from the error must be initiated. In either case, computation and information passing are required which would not be required in a serial implementation. The net effect is that the parallel simulation may not be much better than a serial simulation. In an effort to determine alternate approaches, important papers in the area were reviewed. As a part of that review process, each of the papers was summarized. The summary of each paper is presented in this report in the hopes that those doing future work in the area will be able to gain insight that might not otherwise be available, and to aid in deciding which papers would be most beneficial to pursue in more detail. The papers are broken down into categories and then by author. Conclusions reached after examining the papers and other material, such as direct talks with an author, are presented in the last section. Also presented there are some ideas that surfaced late in the research effort. These promise to be of some benefit in limiting information which must be passed between processes and in better understanding the structure of a distributed simulation. Pursuit of these ideas seems appropriate.
Performance bounds on parallel self-initiating discrete-event
NASA Technical Reports Server (NTRS)
Nicol, David M.
1990-01-01
The use of massively parallel architectures to execute discrete-event simulations of what are termed self-initiating models is considered. A logical process in a self-initiating model schedules its own state re-evaluation times, independently of any other logical process, and sends its new state to other logical processes following the re-evaluation. The interest is in the effects of that communication on synchronization. The performance of various synchronization protocols is considered by deriving upper and lower bounds on optimal performance, upper bounds on Time Warp's performance, and lower bounds on the performance of a new conservative protocol. The analysis of Time Warp includes the overhead costs of state-saving and rollback. The analysis points out sufficient conditions for the conservative protocol to outperform Time Warp. The analysis also quantifies the sensitivity of performance to message fan-out, lookahead ability, and the probability distributions underlying the simulation.
Regenerative Steady-State Simulation of Discrete-Event Systems
Henderson, Shane
Regenerative Steady-State Simulation of Discrete-Event Systems. Shane G. Henderson, University of Michigan, and Peter W. Glynn, Stanford University. The regenerative method possesses certain asymptotic ... Therefore, applying the regenerative method to steady-state discrete-event system simulations is of great ...
Synchronization of autonomous objects in discrete event simulation
NASA Technical Reports Server (NTRS)
Rogers, Ralph V.
1990-01-01
Autonomous objects in event-driven discrete event simulation offer the potential to combine the freedom of unrestricted movement and positional accuracy through Euclidean space of time-driven models with the computational efficiency of event-driven simulation. The principal challenge to autonomous object implementation is object synchronization. The concept of a spatial blackboard is offered as a potential methodology for synchronization. The issues facing implementation of a spatial blackboard are outlined and discussed.
Reversible Parallel Discrete-Event Execution of Large-scale Epidemic Outbreak Models
Perumalla, Kalyan S; Seal, Sudip K
2010-01-01
The spatial scale, runtime speed and behavioral detail of epidemic outbreak simulations together require the use of large-scale parallel processing. In this paper, an optimistic parallel discrete event execution of a reaction-diffusion simulation model of epidemic outbreaks is presented, with an implementation over the μsik simulator. Rollback support is achieved with the development of a novel reversible model that combines reverse computation with a small amount of incremental state saving. Parallel speedup and other runtime performance metrics of the simulation are tested on a small (8,192-core) Blue Gene/P system, while scalability is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes (up to several hundred million individuals in the largest case) are exercised.
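The combination of reverse computation with incremental state saving can be sketched generically. The Python example below is not the model or the simulator API described in the abstract; the `Cell` fields and the event are hypothetical. Arithmetic updates are undone by applying the inverse operation, while the one destructive update (a running maximum) has its old value saved on a log.

```python
class Cell:
    """Hypothetical region state: susceptible and infected counts,
    plus the peak number of infected seen so far."""

    def __init__(self, s, i):
        self.s = s
        self.i = i
        self.peak = i


def forward(cell, k, log):
    """Process an event that infects k susceptible individuals."""
    cell.s -= k                      # invertible: undone by s += k
    cell.i += k                      # invertible: undone by i -= k
    log.append(cell.peak)            # max() destroys information, so save
    cell.peak = max(cell.peak, cell.i)


def reverse(cell, k, log):
    """Roll back the same event, in the opposite order of the updates."""
    cell.peak = log.pop()            # restore the saved value
    cell.i -= k                      # reverse computation
    cell.s += k


if __name__ == "__main__":
    c, log = Cell(100, 5), []
    forward(c, 7, log)
    forward(c, 3, log)
    reverse(c, 3, log)
    reverse(c, 7, log)
    print(c.s, c.i, c.peak, log)
```

Only one value per event is logged here; a full state copy per event would restore the same state at a higher memory cost.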
A Comparison of Discrete Event Simulation and System Dynamics for Modelling Healthcare Systems
A Comparison of Discrete Event Simulation and System Dynamics for Modelling Healthcare Systems ... of these techniques by considering two case studies, a simple discrete event simulation model of HIV/AIDS and a system ...). Discrete event simulation and system dynamics are two quite different approaches to simulation modelling ...
Reversible Parallel Discrete Event Formulation of a TLM-based Radio Signal Propagation Model
Seal, Sudip K; Perumalla, Kalyan S
2011-01-01
Radio signal strength estimation is essential in many applications, including the design of military radio communications and industrial wireless installations. For scenarios with large or richly-featured geographical volumes, parallel processing is required to meet the memory and computation time demands. Here, we present a scalable and efficient parallel execution of the sequential model for radio signal propagation recently developed by Nutaro et al. Starting with that model, we (a) provide a vector-based reformulation that has significantly lower computational overhead for event handling, (b) develop a parallel decomposition approach that is amenable to reversibility with minimal computational overheads, (c) present a framework for transparently mapping the conservative time-stepped model into an optimistic parallel discrete event execution, (d) present a new reversible method, along with its analysis and implementation, for inverting the vector-based event model to be executed in an optimistic parallel style of execution, and (e) present performance results from implementation on Cray XT platforms. We demonstrate scalability, with the largest runs tested on up to 127,500 cores of a Cray XT5, enabling simulation of larger scenarios and with faster execution than reported before on the radio propagation model. This also represents the first successful demonstration of the ability to efficiently map a conservative time-stepped model to an optimistic discrete-event execution.
Advances in Discrete-Event Simulation for MSL Command Validation
NASA Technical Reports Server (NTRS)
Patrikalakis, Alexander; O'Reilly, Taifun
2013-01-01
In the last five years, the discrete event simulator, SEQuence GENerator (SEQGEN), developed at the Jet Propulsion Laboratory to plan deep-space missions, has greatly increased uplink operations capacity to deal with increasingly complicated missions. In this paper, we describe how the Mars Science Laboratory (MSL) project makes full use of an interpreted environment to simulate change in more than fifty thousand flight software parameters and conditional command sequences to predict the result of executing a conditional branch in a command sequence, and enable the ability to warn users whenever one or more simulated spacecraft states change in an unexpected manner. Using these new SEQGEN features, operators plan more activities in one sol than ever before.
Discrete Event Modeling and Massively Parallel Execution of Epidemic Outbreak Phenomena
Perumalla, Kalyan S; Seal, Sudip K
2011-01-01
In complex phenomena such as epidemiological outbreaks, the intensity of inherent feedback effects and the significant role of transients in the dynamics make simulation the only effective method for proactive, reactive or post-facto analysis. The spatial scale, runtime speed, and behavioral detail needed in detailed simulations of epidemic outbreaks make it necessary to use large-scale parallel processing. Here, an optimistic parallel execution of a new discrete event formulation of a reaction-diffusion simulation model of epidemic propagation is presented to help dramatically increase the fidelity and speed with which epidemiological simulations can be performed. Rollback support needed during optimistic parallel execution is achieved by combining reverse computation with a small amount of incremental state saving. Parallel speedup of over 5,500 and other runtime performance metrics of the system are observed with weak-scaling execution on a small (8,192-core) Blue Gene/P system, while scalability with a weak-scaling speedup of over 10,000 is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes exceeding several hundreds of millions of individuals in the largest cases are successfully exercised to verify model scalability.
Suppressing Roughness of Virtual Times in Parallel Discrete-Event
Korniss, Gyorgy
... the values of the local state variables change at discrete instants, synchronously or asynchronously ... of synchronization to ensure causality. The instantaneous changes in the local configuration are also called ..., and changes of the orientation of the local magnetic moments, respectively. As the number of PEs on parallel ...
Enhancing Complex System Performance Using Discrete-Event Simulation
Allgood, Glenn O; Olama, Mohammed M; Lake, Joe E
2010-01-01
In this paper, we utilize discrete-event simulation (DES) merged with human factors analysis to provide the venue within which the separation and deconfliction of the system/human operating principles can occur. A concrete example is presented to illustrate the performance enhancement gains for an aviation cargo flow and security inspection system achieved through the development and use of a process DES. The overall performance of the system is computed, analyzed, and optimized for the different system dynamics. Various performance measures are considered such as system capacity, residual capacity, and total number of pallets waiting for inspection in the queue. These metrics are performance indicators of the system's ability to service current needs and respond to additional requests. We studied and analyzed different scenarios by changing various model parameters such as the number of pieces per pallet ratio, number of inspectors and cargo handling personnel, number of forklifts, number and types of detection systems, inspection modality distribution, alarm rate, and cargo closeout time. The increased physical understanding resulting from execution of the queuing model utilizing these vetted performance measures identified effective ways to meet inspection requirements while maintaining or reducing overall operational cost and eliminating any shipping delays associated with any proposed changes in inspection requirements. With this understanding effective operational strategies can be developed to optimally use personnel while still maintaining plant efficiency, reducing process interruptions, and holding or reducing costs.
Predicting Liver Transplant Capacity Using Discrete Event Simulation.
Toro-Diaz, Hector; Mayorga, Maria E; Barritt, A Sidney; Orman, Eric S; Wheeler, Stephanie B
2014-11-12
The number of liver transplants (LTs) performed in the US increased until 2006 but has since declined despite an ongoing increase in demand. This decline may be due in part to decreased donor liver quality and increasing discard of poor-quality livers. We constructed a discrete event simulation (DES) model informed by current donor characteristics to predict future LT trends through the year 2030. The data source for our model is the United Network for Organ Sharing database, which contains patient-level information on all organ transplants performed in the US. Previous analysis showed that liver discard is increasing and that discarded organs are more often from donors who are older, are obese, have diabetes, and donated after cardiac death. Given that the prevalence of these factors is increasing, the DES model quantifies the reduction in the number of LTs performed through 2030. In addition, the model estimates the total number of future donors needed to maintain the current volume of LTs and the effect of a hypothetical scenario of improved reperfusion technology. We also forecast the number of patients on the waiting list and compare this with the estimated number of LTs to illustrate the impact that decreased LTs will have on patients needing transplants. By altering assumptions about the future donor pool, this model can be used to develop policy interventions to prevent a further decline in this lifesaving therapy. To our knowledge, there are no similar predictive models of future LT use based on epidemiological trends. PMID:25391681
Parallel discrete event simulation with predictors
Gummadi, Vidya
1995-01-01
The motivation for this research has been its applicability to sequence checking in a spacecraft's control commands. Spacecraft are controlled by sequences of time-tagged control commands which are essentially onboard computer programs. 'The...
Interfaces to Enhance User-Directed Experimentation with Simulation Models of Discrete-Event Systems
Herrmann, Jeffrey W.
... to improve user-directed experimentation with simulation models of discrete event systems. In user ..., an analyst conducts simulation runs to estimate system performance and then modifies the simulation model ...
Integrating Discrete Event and Process-Level Simulation for Training in the I-X Framework
Wickler, G; Tate, Austin; Potter, S
The aim of this paper is to describe I-Sim, a simulation tool that is a fully integrated part of the underlying agent framework, I-X. I-Sim controls a discrete event simulator, based on the same activity model that is shared between all I-X...
Using Discrete-Event Simulation to Model Situational Awareness of Unmanned-Vehicle Operators
Cummings, Mary "Missy"
Using Discrete-Event Simulation to Model Situational Awareness of Unmanned-Vehicle Operators. Carl ... on situational awareness as the size of the unmanned vehicle team being supervised is varied. INTRODUCTION: In order to achieve the military's future goal of one operator controlling multiple unmanned vehicles (UVs ...
DISCRETE EVENT SIMULATION OF OPTICAL SWITCH MATRIX PERFORMANCE IN COMPUTER NETWORKS
Imam, Neena; Poole, Stephen W
2013-01-01
In this paper, we present application of a Discrete Event Simulator (DES) for performance modeling of optical switching devices in computer networks. Network simulators are valuable tools in situations where one cannot investigate the system directly. This situation may arise if the system under study does not exist yet or the cost of studying the system directly is prohibitive. Most available network simulators are based on the paradigm of discrete-event-based simulation. As computer networks become increasingly larger and more complex, sophisticated DES tool chains have become available for both commercial and academic research. Some well-known simulators are NS2, NS3, OPNET, and OMNEST. For this research, we have applied OMNEST for the purpose of simulating multi-wavelength performance of optical switch matrices in computer interconnection networks. Our results suggest that the application of DES to computer interconnection networks provides valuable insight in device performance and aids in topology and system optimization.
Fancher, Robert H.
1997-01-01
This study used discrete event simulation to model the personnel recruiting process for a U.S. Army recruiting company. Actual data from the company was collected and used to build the simulation model. The model is run ...
Discrete event simulation of the Defense Waste Processing Facility (DWPF) analytical laboratory
Shanahan, K.L.
1992-02-01
A discrete event simulation of the Savannah River Site (SRS) Defense Waste Processing Facility (DWPF) analytical laboratory has been constructed in the GPSS language. It was used to estimate laboratory analysis times at process analytical hold points and to study the effect of sample number on those times. Typical results are presented for three different simulations representing increasing levels of complexity, and for different sampling schemes. Example equipment utilization time plots are also included. SRS DWPF laboratory management and chemists found the simulations very useful for resource and schedule planning.
Using Discrete Event Simulation to predict KPI's at a Projected Emergency Room.
Concha, Pablo; Neriz, Liliana; Parada, Danilo; Ramis, Francisco
2015-01-01
Discrete Event Simulation (DES) is a powerful factor in the design of clinical facilities. DES enables facilities to be built or adapted to achieve the expected Key Performance Indicators (KPI's) such as average waiting times according to acuity, average stay times and others. Our computational model was built and validated using expert judgment and supporting statistical data. One scenario studied resulted in a 50% decrease in the average cycle time of patients compared to the original model, mainly by modifying the patient's attention model. PMID:26262262
Jahn, Beate; Theurl, Engelbert; Siebert, Uwe; Pfeiffer, Karl-Peter
2010-01-01
In most decision-analytic models in health care, it is assumed that there is treatment without delay and availability of all required resources. Therefore, waiting times caused by limited resources and their impact on treatment effects and costs often remain unconsidered. Queuing theory enables mathematical analysis and the derivation of several performance measures of queuing systems. Nevertheless, an analytical approach with closed formulas is not always possible. Therefore, simulation techniques are used to evaluate systems that include queuing or waiting, for example, discrete event simulation. To include queuing in decision-analytic models requires a basic knowledge of queuing theory and of the underlying interrelationships. This tutorial introduces queuing theory. Analysts and decision-makers get an understanding of queue characteristics, modeling features, and its strength. Conceptual issues are covered, but the emphasis is on practical issues like modeling the arrival of patients. The treatment of coronary artery disease with percutaneous coronary intervention including stent placement serves as an illustrative queuing example. Discrete event simulation is applied to explicitly model resource capacities, to incorporate waiting lines and queues in the decision-analytic modeling example. PMID:20345550
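The waiting-line idea in the tutorial above can be illustrated with a minimal sketch. It is not the authors' model: the function name `simulate_fifo_queue`, its parameters, and the first-come, first-served discipline are assumptions made for illustration. Each arriving patient is assigned to whichever server (for example, a catheterization lab) becomes free earliest.

```python
import heapq


def simulate_fifo_queue(arrival_times, service_times, num_servers=1):
    """Return each patient's waiting time under first-come, first-served.

    arrival_times must be sorted in ascending order (hypothetical input format).
    """
    # Min-heap holding the time at which each server next becomes free.
    free_at = [0.0] * num_servers
    heapq.heapify(free_at)

    waits = []
    for arrival, service in zip(arrival_times, service_times):
        earliest_free = heapq.heappop(free_at)
        # Treatment starts when both the patient and a server are available.
        start = max(arrival, earliest_free)
        waits.append(start - arrival)
        heapq.heappush(free_at, start + service)
    return waits
```

With arrivals at times 0, 1 and 2 and a fixed service time of 3, a single server yields waits of 0, 2 and 4, while two servers reduce them to 0, 0 and 1. Replacing the fixed inputs with sampled interarrival and service times turns the sketch into a stochastic experiment.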
Discrete-event simulation for the design and evaluation of physical protection systems
Jordan, S.E.; Snell, M.K.; Madsen, M.M.; Smith, J.S.; Peters, B.A.
1998-08-01
This paper explores the use of discrete-event simulation for the design and control of physical protection systems for fixed-site facilities housing items of significant value. It begins by discussing several modeling and simulation activities currently performed in designing and analyzing these protection systems and then discusses capabilities that design/analysis tools should have. The remainder of the article then discusses in detail how some of these new capabilities have been implemented in software to achieve a prototype design and analysis tool. The simulation software technology provides a communications mechanism between a running simulation and one or more external programs. In the prototype security analysis tool, these capabilities are used to facilitate human-in-the-loop interaction and to support a real-time connection to a virtual reality (VR) model of the facility being analyzed. This simulation tool can be used for both training (in real-time mode) and facility analysis and design (in fast mode).
DeMO: An Ontology for Discrete-event Modeling and Simulation
Silver, Gregory A; Miller, John A; Hybinette, Maria; Baramidze, Gregory; York, William S
2011-01-01
Several fields have created ontologies for their subdomains. For example, the biological sciences have developed extensive ontologies such as the Gene Ontology, which is considered a great success. Ontologies could provide similar advantages to the Modeling and Simulation community. They provide a way to establish common vocabularies and capture knowledge about a particular domain with community-wide agreement. Ontologies can support significantly improved (semantic) search and browsing, integration of heterogeneous information sources, and improved knowledge discovery capabilities. This paper discusses the design and development of an ontology for Modeling and Simulation called the Discrete-event Modeling Ontology (DeMO), and it presents prototype applications that demonstrate various uses and benefits that such an ontology may provide to the Modeling and Simulation community. PMID:22919114
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Basham, Bryan D.
1989-01-01
CONFIG is a modeling and simulation tool prototype for analyzing the normal and faulty qualitative behaviors of engineered systems. Qualitative modeling and discrete-event simulation have been adapted and integrated, to support early development, during system design, of software and procedures for management of failures, especially in diagnostic expert systems. Qualitative component models are defined in terms of normal and faulty modes and processes, which are defined by invocation statements and effect statements with time delays. System models are constructed graphically by using instances of components and relations from object-oriented hierarchical model libraries. Extension and reuse of CONFIG models and analysis capabilities in hybrid rule- and model-based expert fault-management support systems are discussed.
Developing Flexible Discrete Event Simulation Models in an Uncertain Policy Environment
NASA Technical Reports Server (NTRS)
Miranda, David J.; Fayez, Sam; Steele, Martin J.
2011-01-01
On February 1st, 2010 U.S. President Barack Obama submitted to Congress his proposed budget request for Fiscal Year 2011. This budget included significant changes to the National Aeronautics and Space Administration (NASA), including the proposed cancellation of the Constellation Program. This change proved to be controversial and Congressional approval of the program's official cancellation would take many months to complete. During this same period an end-to-end discrete event simulation (DES) model of Constellation operations was being built through the joint efforts of Productivity Apex Inc. (PAI) and Science Applications International Corporation (SAIC) teams under the guidance of NASA. The uncertainty regarding the Constellation program presented a major challenge to the DES team, which had to continue developing this program-of-record simulation while remaining prepared for possible changes to the program. This required the team to rethink how it would develop its model and make it flexible enough to support possible future vehicles while at the same time being specific enough to support the program-of-record. This challenge was compounded by the fact that this model was being developed through the traditional DES process-orientation, which lacked the flexibility of object-oriented approaches. The team met this challenge through significant pre-planning that led to the "modularization" of the model's structure by identifying what was generic, finding natural logic break points, and standardizing the interlogic numbering system. This work resulted in a model that not only was ready to be easily modified to support any future rocket programs, but also was extremely structured and organized in a way that facilitated rapid verification.
This paper discusses in detail the process the team followed to build this model and the many advantages this method provides builders of traditional process-oriented discrete event simulations.
Statistical and Probabilistic Extensions to Ground Operations' Discrete Event Simulation Modeling
NASA Technical Reports Server (NTRS)
Trocine, Linda; Cummings, Nicholas H.; Bazzana, Ashley M.; Rychlik, Nathan; LeCroy, Kenneth L.; Cates, Grant R.
2010-01-01
NASA's human exploration initiatives will invest in technologies, public/private partnerships, and infrastructure, paving the way for the expansion of human civilization into the solar system and beyond. As it has been for the past half century, the Kennedy Space Center will be the embarkation point for humankind's journey into the cosmos. Functioning as a next generation space launch complex, Kennedy's launch pads, integration facilities, processing areas, launch and recovery ranges will bustle with the activities of the world's space transportation providers. In developing this complex, KSC teams work through the potential operational scenarios: conducting trade studies, planning and budgeting for expensive and limited resources, and simulating alternative operational schemes. Numerous tools, among them discrete event simulation (DES), were matured during the Constellation Program to conduct such analyses with the purpose of optimizing the launch complex for maximum efficiency, safety, and flexibility while minimizing life cycle costs. Discrete event simulation is a computer-based modeling technique for complex and dynamic systems where the state of the system changes at discrete points in time and whose inputs may include random variables. DES is used to assess timelines and throughput, and to support operability studies and contingency analyses. It is applicable to any space launch campaign and informs decision-makers of the effects of varying numbers of expensive resources and the impact of off-nominal scenarios on measures of performance. In order to develop representative DES models, methods were adopted, exploited, or created to extend traditional uses of DES. The Delphi method was adopted and utilized for task duration estimation. DES software was exploited for probabilistic event variation. A roll-up process was developed to reuse models and model elements in other less-detailed models.
The DES team continues to innovate and expand DES capabilities to address KSC's planning needs.
The effects of indoor environmental exposures on pediatric asthma: a discrete event simulation model
2012-01-01
Background In the United States, asthma is the most common chronic disease of childhood across all socioeconomic classes and is the most frequent cause of hospitalization among children. Asthma exacerbations have been associated with exposure to residential indoor environmental stressors such as allergens and air pollutants as well as numerous additional factors. Simulation modeling is a valuable tool that can be used to evaluate interventions for complex multifactorial diseases such as asthma but in spite of its flexibility and applicability, modeling applications in either environmental exposures or asthma have been limited to date. Methods We designed a discrete event simulation model to study the effect of environmental factors on asthma exacerbations in school-age children living in low-income multi-family housing. Model outcomes include asthma symptoms, medication use, hospitalizations, and emergency room visits. Environmental factors were linked to percent predicted forced expiratory volume in 1 second (FEV1%), which in turn was linked to risk equations for each outcome. Exposures affecting FEV1% included indoor and outdoor sources of NO2 and PM2.5, cockroach allergen, and dampness as a proxy for mold. Results Model design parameters and equations are described in detail. We evaluated the model by simulating 50,000 children over 10 years and showed that pollutant concentrations and health outcome rates are comparable to values reported in the literature. In an application example, we simulated what would happen if the kitchen and bathroom exhaust fans were improved for the entire cohort, and showed reductions in pollutant concentrations and healthcare utilization rates. Conclusions We describe the design and evaluation of a discrete event simulation model of pediatric asthma for children living in low-income multi-family housing. 
Our model simulates the effect of environmental factors (combustion pollutants and allergens), medication compliance, seasonality, and medical history on asthma outcomes (symptom-days, medication use, hospitalizations, and emergency room visits). The model can be used to evaluate building interventions and green building construction practices on pollutant concentrations, energy savings, and asthma healthcare utilization costs, and demonstrates the value of a simulation approach for studying complex diseases such as asthma. PMID:22989068
Koala: A Discrete-Event Simulation Model of Infrastructure Clouds
Koala is a discrete, written in SLX, facilitates investigation of global behavior throughout a single IaaS cloud. Koala scales. Koala is based loosely on the Amazon Elastic Compute Cloud (EC2) and on Eucalyptus open
Discrete event simulation tool for analysis of qualitative models of continuous processing systems
NASA Technical Reports Server (NTRS)
Malin, Jane T. (inventor); Basham, Bryan D. (inventor); Harris, Richard A. (inventor)
1990-01-01
An artificial intelligence design and qualitative modeling tool is disclosed for creating computer models and simulating continuous activities, functions, and/or behavior using developed discrete event techniques. Conveniently, the tool is organized in four modules: library design module, model construction module, simulation module, and experimentation and analysis. The library design module supports the building of library knowledge including component classes and elements pertinent to a particular domain of continuous activities, functions, and behavior being modeled. The continuous behavior is defined discretely with respect to invocation statements, effect statements, and time delays. The functionality of the components is defined in terms of variable cluster instances, independent processes, and modes, further defined in terms of mode transition processes and mode dependent processes. Model construction utilizes the hierarchy of libraries and connects them with appropriate relations. The simulation executes a specialized initialization routine and executes events in a manner that includes selective inherency of characteristics through a time and event schema until the event queue in the simulator is emptied. The experimentation and analysis module supports analysis through the generation of appropriate log files and graphics developments and includes the ability of log file comparisons.
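The execution scheme described above, in which events carrying time delays are processed until the event queue is emptied, can be sketched as follows. This is a generic illustration and not the disclosed tool: the `Simulator` class, its methods, and the two-step valve example are all hypothetical names introduced here.

```python
import heapq
import itertools


class Simulator:
    """Minimal event-list scheduler (illustrative, not the patented design)."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        # Tie-breaker so events with equal timestamps run in scheduling order.
        self._counter = itertools.count()

    def schedule(self, delay, action):
        """Queue `action` to run `delay` time units after the current time."""
        heapq.heappush(self._queue, (self.now + delay, next(self._counter), action))

    def run(self):
        """Process events in timestamp order until the queue is empty."""
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action(self)


def demo():
    """A hypothetical component whose mode change takes effect after a delay."""
    sim = Simulator()
    log = []

    def start_opening(s):
        log.append((s.now, "valve: opening"))
        s.schedule(2.0, finish_opening)  # effect statement with a time delay

    def finish_opening(s):
        log.append((s.now, "valve: open"))

    sim.schedule(1.0, start_opening)
    sim.run()
    return log
```

Calling `demo()` records the mode change at time 1.0 and its delayed effect at time 3.0, after which the queue is empty and the run ends.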
Towards High Performance Discrete-Event Simulations of Smart Electric Grids
Perumalla, Kalyan S; Nutaro, James J; Yoginath, Srikanth B
2011-01-01
Future electric grid technology is envisioned on the notion of a smart grid in which responsive end-user devices play an integral part in the transmission and distribution control systems. Detailed simulation is often the primary choice in analyzing small network designs, and the only choice in analyzing large-scale electric network designs. Here, we identify and articulate the high-performance computing needs underlying high-resolution discrete event simulation of smart electric grid operation in large network scenarios such as the entire Eastern Interconnect. We focus on the simulator's most computationally intensive operation, namely, the dynamic numerical solution for the electric grid state, for both time-integration and event-detection. We explore solution approaches using general-purpose dense and sparse solvers, and propose a scalable solver specialized for the sparse structures of actual electric networks. Based on experiments with an implementation in the THYME simulator, we identify performance issues and possible solution approaches for smart grid experimentation in the large.
StratBAM: A Discrete-Event Simulation Model to Support Strategic Hospital Bed Capacity Decisions.
Devapriya, Priyantha; Strömblad, Christopher T B; Bailey, Matthew D; Frazier, Seth; Bulger, John; Kemberling, Sharon T; Wood, Kenneth E
2015-10-01
The ability to accurately measure and assess current and potential health care system capacities is an issue of local and national significance. Recent joint statements by the Institute of Medicine and the Agency for Healthcare Research and Quality have emphasized the need to apply industrial and systems engineering principles to improving health care quality and patient safety outcomes. To address this need, a decision support tool was developed for planning and budgeting of current and future bed capacity, and evaluating potential process improvement efforts. The Strategic Bed Analysis Model (StratBAM) is a discrete-event simulation model created after a thorough analysis of patient flow and data from Geisinger Health System's (GHS) electronic health records. Key inputs include: timing, quantity and category of patient arrivals and discharges; unit-level length of care; patient paths; and projected patient volume and length of stay. Key outputs include: admission wait time by arrival source and receiving unit, and occupancy rates. Electronic health records were used to estimate parameters for probability distributions and to build empirical distributions for unit-level length of care and for patient paths. Validation of the simulation model against GHS operational data confirmed its ability to model real-world data consistently and accurately. StratBAM was successfully used to evaluate the system impact of forecasted patient volumes and length of stay in terms of patient wait times, occupancy rates, and cost. The model is generalizable and can be appropriately scaled for larger and smaller health care settings. PMID:26310949
Wilke, Jeremiah J; Kenny, Joseph P.
2015-02-01
Discrete event simulation provides a powerful mechanism for designing and testing new extreme-scale programming models for high-performance computing. Rather than debug, run, and wait for results on an actual system, design can first iterate through a simulator. This is particularly useful when test beds cannot be used, i.e. to explore hardware or scales that do not yet exist or are inaccessible. Here we detail the macroscale components of the structural simulation toolkit (SST). Instead of depending on trace replay or state machines, the simulator is architected to execute real code on real software stacks. Our particular user-space threading framework allows massive scales to be simulated even on small clusters. The link between the discrete event core and the threading framework allows interesting performance metrics like call graphs to be collected from a simulated run. Performance analysis via simulation can thus become an important phase in extreme-scale programming model and runtime system design via the SST macroscale components.
NASA Technical Reports Server (NTRS)
Dubos, Gregory F.; Cornford, Steven
2012-01-01
While the ability to model the state of a space system over time is essential during spacecraft operations, the use of time-based simulations remains rare in preliminary design. The absence of the time dimension in most traditional early design tools can, however, become a hurdle when designing complex systems whose development and operations can be disrupted by various events, such as delays or failures. As the value delivered by a space system is highly affected by such events, exploring the trade space for designs that yield the maximum value calls for the explicit modeling of time. This paper discusses the use of discrete-event models to simulate spacecraft development schedules as well as operational scenarios and on-orbit resources in the presence of uncertainty. It illustrates how such simulations can be utilized to support trade studies, through the example of a tool developed for DARPA's F6 program to assist the design of "fractionated spacecraft".
Aggarwal, S.; Ryland, S.; Peck, R.
1980-06-19
This report outlines a methodology to study the effects of disruptive events on nuclear waste material in stable geologic sites. The methodology is based upon developing a discrete events model that can be simulated on the computer. This methodology allows a natural development of simulation models that use computer resources in an efficient manner. Accurate modeling in this area depends in large part upon accurate modeling of ion transport behavior in the storage media. Unfortunately, developments in this area are not at a stage where there is any consensus on proper models for such transport. Consequently, our work is directed primarily towards showing how disruptive events can be properly incorporated in such a model, rather than as a predictive tool at this stage. When and if proper geologic parameters can be determined, then it would be possible to use this as a predictive model. Assumptions and their bases are discussed, and the mathematical and computer model are described.
Using machine learning techniques to interpret results from discrete event
Mladenic, Dunja
Using machine learning techniques to interpret results from discrete event simulation Dunja Mladeni machine learning techniques. The results of two simulators were processed as machine learning problems discovered. Key words: discrete event simulation, machine learning, artificial intelligence 1 Introduction
Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in finding ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
- plained by climate change or human-caused degradation of the landscape. Finally, by placing model villages- ing specific time frames) as the core of its contents. The simulation models for environmental changes simulation model is being developed to meet the project goals. The human and landscape models serve as both
A methodology for fabrication of intelligent discrete-event simulation models
Morgeson, J.D.; Burns, J.R.
1987-01-01
In this article a meta-specification for the software requirements and design of intelligent discrete next-event simulation models has been presented. The specification is consistent with established practices for software development as presented in the software engineering literature. The specification has been adapted to take into consideration the specialized needs of object-oriented programming resulting in the actor-centered taxonomy. The heart of the meta-specification is the methodology for requirements specification and design specification of the model. The software products developed by use of the methodology proposed herein are at the leading edge of technology in two very synergistic disciplines - expert systems and simulation. By incorporating simulation concepts into expert systems a deeper reasoning capability is obtained - one that is able to emulate the dynamics or behavior of the object system or process over time. By including expert systems concepts into simulation, the capability to emulate the reasoning functions of decision-makers involved with (and subsumed by) the object system is attained. In either case the robustness of the technology is greatly enhanced.
Forest biomass supply logistics for a power plant using the discrete-event simulation approach
Mobini, Mahdi; Sowlati, T.; Sokhansanj, Shahabaddine
2011-04-01
This study investigates the logistics of supplying forest biomass to a potential power plant. Due to the complexities in such a supply logistics system, a simulation model based on the framework of Integrated Biomass Supply Analysis and Logistics (IBSAL) is developed in this study to evaluate the cost of delivered forest biomass, the equilibrium moisture content, and carbon emissions from the logistics operations. The model is applied to the proposed case of a 300 MW power plant in Quesnel, BC, Canada. The results show that the biomass demand of the power plant would not be met every year. The weighted average cost of delivered biomass to the gate of the power plant is about C$ 90 per dry tonne. Estimates of the equilibrium moisture content of delivered biomass and of the CO2 emissions resulting from the processes are also provided.
Knowledge acquisition for discrete event systems using machine learning
Mladenic, Dunja
Knowledge acquisition for discrete event systems using machine learning Dunja Mladenić and Ivan of discrete event simulation systems is a difficult task. Machine Learning has been investigated to help of discrete event simulation models and machine learning as tools for the intelligent analysis
2012-01-01
Background Previous cost-effectiveness studies of cholinesterase inhibitors have modeled Alzheimer's disease (AD) progression and treatment effects through single or global severity measures, or progression to "Full Time Care". This analysis evaluates the cost-effectiveness of donepezil versus memantine or no treatment in Germany by considering correlated changes in cognition, behavior and function. Methods Rates of change were modeled using trial and registry-based patient level data. A discrete event simulation projected outcomes for three identical patient groups: donepezil 10 mg, memantine 20 mg and no therapy. Patient mix, mortality and costs were developed using Germany-specific sources. Results Treatment of patients with mild to moderately severe AD with donepezil compared to no treatment was associated with 0.13 QALYs gained per patient, and 0.01 QALYs gained per caregiver and resulted in average savings of €7,007 and €9,893 per patient from the healthcare system and societal perspectives, respectively. In patients with moderate to moderately-severe AD, donepezil compared to memantine resulted in QALY gains averaging 0.01 per patient, and savings averaging €1,960 and €2,825 from the healthcare system and societal perspective, respectively. In probabilistic sensitivity analyses, donepezil dominated no treatment in most replications and memantine in over 70% of the replications. Donepezil leads to savings in 95% of replications versus memantine. Conclusions Donepezil is highly cost-effective in patients with AD in Germany, leading to improvements in health outcomes and substantial savings compared to no treatment. This holds across a variety of sensitivity analyses. PMID:22316501
Simulating Billion-Task Parallel Programs
Perumalla, Kalyan S; Park, Alfred J
2014-01-01
In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.
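The scale figure quoted above can be checked with a short calculation. The sketch assumes one real task per core, which the abstract does not state explicitly; the variable names are illustrative only.

```python
# Figures quoted in the abstract.
cores = 216_000               # cores of the Cray XT5 used in the run
multiplexing_ratio = 1024     # simulated tasks per real task

# Assumption: one real task runs on each core.
real_tasks = cores

# Total number of simulated (virtual) MPI processes.
virtual_processes = real_tasks * multiplexing_ratio
virtual_billions = virtual_processes / 1e9

print(f"{virtual_processes:,} virtual MPI processes")
print(f"about {virtual_billions:.2f} billion")
```

Under that assumption the product is 221,184,000 virtual processes, which matches the "over 0.22 billion" stated in the abstract.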
Inflated speedups in parallel simulations via malloc()
NASA Technical Reports Server (NTRS)
Nicol, David M.
1990-01-01
Discrete-event simulation programs make heavy use of dynamic memory allocation in order to support simulation's very dynamic space requirements. When programming in C one is likely to use the malloc() routine. However, a parallel simulation which uses the standard Unix System V malloc() implementation may achieve an overly optimistic speedup, possibly superlinear. An alternate implementation provided on some (but not all) systems can avoid the speedup anomaly, but at the price of significantly reduced available free space. This is especially severe on most parallel architectures, which tend not to support virtual memory. It is shown how a simply implemented user-constructed interface to malloc() can both avoid artificially inflated speedups and make efficient use of the dynamic memory space. The interface simply caches blocks on the basis of their size. The problem is demonstrated empirically, and the effectiveness of the solution is shown both empirically and analytically.
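The abstract does not show the interface itself, which would be written in C. The following is only a Python model of the general idea, reading the interface as one that keeps released blocks in per-size lists and reuses them before calling the underlying allocator. The class `SizeClassCache`, its method names, and the use of `bytearray` as a stand-in for malloc() are all assumptions made for illustration.

```python
from collections import defaultdict


class SizeClassCache:
    """Model of an allocator front end that reuses freed blocks by size."""

    def __init__(self, allocate):
        self._allocate = allocate        # stand-in for the underlying malloc()
        self._free = defaultdict(list)   # block size -> released blocks of that size
        self.underlying_calls = 0        # how often the real allocator was invoked

    def get(self, size):
        """Return a cached block of exactly `size` bytes, or allocate a new one."""
        cached = self._free[size]
        if cached:
            return cached.pop()
        self.underlying_calls += 1
        return self._allocate(size)

    def release(self, block):
        """Keep the block for reuse instead of returning it to the allocator."""
        self._free[len(block)].append(block)


# Usage: event records of the same size are recycled without a new allocation.
cache = SizeClassCache(bytearray)
first = cache.get(64)
cache.release(first)
second = cache.get(64)    # served from the cache, same underlying block
third = cache.get(32)     # different size, so the allocator is called again
```

Because a request is served in constant time whenever a block of the right size is cached, the cost of an allocation no longer depends on how many blocks the underlying allocator has to search, which is one plausible reading of how such an interface removes the anomaly.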
Analysis hierarchical model for discrete event systems
NASA Astrophysics Data System (ADS)
Ciortea, E. M.
2015-11-01
This paper presents a hierarchical model, based on discrete event networks, for robotic systems. In the hierarchical approach, the Petri net is analysed both as a network at the highest conceptual level and at the lowest level of local control. Extended Petri nets are used for modelling and control of complex robotic systems. Such a system is structured, controlled and analysed in this paper using the Visual Object Net ++ package, which is relatively simple and easy to use, and the results are shown as representations that are easy to interpret. The hierarchical structure of the robotic system is implemented on computers and analysed using specialized programs. A hierarchical discrete event system model can be implemented as a real-time operating system on a computer network connected via a serial bus, where each computer is dedicated to the local Petri model of one subsystem of the global robotic system. Since Petri models are simple enough to run on general-purpose computers, analysis, modelling and control of complex manufacturing systems can be achieved using Petri nets. Discrete event systems are a pragmatic tool for modelling industrial systems. Petri nets are used for system modelling because the system under study is a discrete event system. To highlight the auxiliary times, the Petri model of the transport stream is divided into hierarchical levels, and the sections are analysed successively. Simulating the proposed robotic system with timed Petri nets offers the opportunity to view the robotic timing. From the transport and transmission times obtained by spot measurement, graphs are obtained showing the average time for the transport activity, using sets of parameters for individual finished products.
A distributed discrete event simulation, or How to steal more CPU cycles than you could ever imagine
Gershwin, Stanley B.
. For example, if one is studying a manufacturing system where machine failures are important to the overall generator, and statistics are collected by executing the program. It may be necessary to run the simulation of statistical considerations, it is common to split a large simulation job into a batch of independent
Guo, Shien; Getsios, Denis; Hernandez, Luis; Cho, Kelly; Lawler, Elizabeth; Altincatal, Arman; Lanes, Stephan; Blankenburg, Michael
2012-01-01
The growing understanding of the use of biomarkers in Alzheimer's disease (AD) may enable physicians to make more accurate and timely diagnoses. Florbetaben, a beta-amyloid tracer used with positron emission tomography (PET), is one of these diagnostic biomarkers. This analysis was undertaken to explore the potential value of florbetaben PET in the diagnosis of AD among patients with suspected dementia and to identify key data that are needed to further substantiate its value. A discrete event simulation was developed to conduct exploratory analyses from both US payer and societal perspectives. The model simulates the lifetime course of disease progression for individuals, evaluating the impact of their patient management from initial diagnostic work-up to final diagnosis. Model inputs were obtained from specific analyses of a large longitudinal dataset from the New England Veterans Healthcare System and supplemented with data from public data sources and assumptions. The analyses indicate that florbetaben PET has the potential to improve patient outcomes and reduce costs under certain scenarios. Key data on the use of florbetaben PET, such as its influence on time to confirmation of final diagnosis, treatment uptake, and treatment persistency, are unavailable and would be required to confirm its value. PMID:23326754
Terminal Dynamics Approach to Discrete Event Systems
NASA Technical Reports Server (NTRS)
Zak, Michail; Meyers, Ronald
1995-01-01
This paper presents and discusses a mathematical formalism for simulation of discrete event dynamics (DED), a special type of 'man-made' system designed to serve specific purposes of information processing. The main objective of this work is to demonstrate that the mathematical formalism for DED can be based upon a terminal model of Newtonian dynamics which allows one to relax Lipschitz conditions at some discrete points.
NASA Technical Reports Server (NTRS)
Nicol, David; Fujimoto, Richard
1992-01-01
This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.
Comas, Mercè; Arrospide, Arantzazu; Mar, Javier; Sala, Maria; Vilaprinyó, Ester; Hernández, Cristina; Cots, Francesc; Martínez, Juan; Castells, Xavier
2014-01-01
Objective To assess the budgetary impact of switching from screen-film mammography to full-field digital mammography in a population-based breast cancer screening program. Methods A discrete-event simulation model was built to reproduce the breast cancer screening process (biennial mammographic screening of women aged 50 to 69 years) combined with the natural history of breast cancer. The simulation started with 100,000 women and, during a 20-year simulation horizon, new women were dynamically entered according to the aging of the Spanish population. Data on screening were obtained from Spanish breast cancer screening programs. Data on the natural history of breast cancer were based on US data adapted to our population. A budget impact analysis comparing digital with screen-film screening mammography was performed in a sample of 2,000 simulation runs. A sensitivity analysis was performed for crucial screening-related parameters. Distinct scenarios for recall and detection rates were compared. Results Statistically significant savings were found for overall costs, treatment costs and the costs of additional tests in the long term. The overall cost saving was 1,115,857€ (95%CI from 932,147 to 1,299,567) in the 10th year and 2,866,124€ (95%CI from 2,492,610 to 3,239,638) in the 20th year, representing 4.5% and 8.1% of the overall cost associated with screen-film mammography. The sensitivity analysis showed net savings in the long term. Conclusions Switching to digital mammography in a population-based breast cancer screening program saves long-term budget expense, in addition to providing technical advantages. Our results were consistent across distinct scenarios representing the different results obtained in European breast cancer screening programs. PMID:24832200
2014-01-01
Background Osteoporotic fractures cause a large health burden and substantial costs. This study estimated the expected fracture numbers and costs for the remaining lifetime of postmenopausal women in Germany. Methods A discrete event simulation (DES) model which tracks changes in fracture risk due to osteoporosis, a previous fracture or institutionalization in a nursing home was developed. Expected lifetime fracture numbers and costs per capita were estimated for postmenopausal women (aged 50 and older) at average osteoporosis risk (AOR) and for those never suffering from osteoporosis. Direct and indirect costs were modeled. Deterministic univariate and probabilistic sensitivity analyses were conducted. Results The expected fracture numbers over the remaining lifetime of a 50 year old woman with AOR for each fracture type (% attributable to osteoporosis) were: hip 0.282 (57.9%), wrist 0.229 (18.2%), clinical vertebral 0.206 (39.2%), humerus 0.147 (43.5%), pelvis 0.105 (47.5%), and other femur 0.033 (52.1%). Expected discounted fracture lifetime costs (excess cost attributable to osteoporosis) per 50 year old woman with AOR amounted to €4,479 (€1,995). Most costs were accrued in the hospital €1,743 (€751) and long-term care sectors €1,210 (€620). Univariate sensitivity analysis resulted in percentage changes between -48.4% (if fracture rates decreased by 2% per year) and +83.5% (if fracture rates increased by 2% per year) compared to base case excess costs. Costs for women with osteoporosis were about 3.3 times of those never getting osteoporosis (€7,463 vs. €2,247), and were markedly increased for women with a previous fracture. Conclusion The results of this study indicate that osteoporosis causes a substantial share of fracture costs in postmenopausal women, which strongly increase with age and previous fractures. PMID:24981316
DISCRETE EVENT MODELING IN PTOLEMY II
California at Berkeley, University of
DISCRETE EVENT MODELING IN PTOLEMY II Lukito Muliadi Department of Electrical Engineering in Ptolemy II i Abstract This report describes the discrete-event semantics and its implementation in the Ptolemy II software architecture. The discrete-event system representation is appropriate for time
Scaling Time Warp-based Discrete Event Execution to 10^{4} Processors on Blue Gene Supercomputer
Perumalla, Kalyan S
2007-01-01
Lately, important large-scale simulation applications, such as emergency/event planning and response, are emerging that are based on discrete event models. The applications are characterized by their scale (several millions of simulated entities), their fine-grained nature of computation (microseconds per event), and their highly dynamic inter-entity event interactions. The desired scale and speed together call for highly scalable parallel discrete event simulation (PDES) engines. However, few such parallel engines have been designed or tested on platforms with thousands of processors. Here an overview is given of a unique PDES engine that has been designed to support Time Warp-style optimistic parallel execution as well as a more generalized mixed, optimistic-conservative synchronization. The engine is designed to run on massively parallel architectures with minimal overheads. A performance study of the engine is presented, including the first results to date of PDES benchmarks demonstrating scalability to as many as 16,384 processors, on an IBM Blue Gene supercomputer. The results show, for the first time, the promise of effectively sustaining very large scale discrete event execution on up to 10^{4} processors.
Xyce parallel electronic simulator.
Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.
2010-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.
Parallel Dislocation Simulator
Energy Science and Technology Software Center (ESTSC)
2006-10-30
ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.
Non-Lipschitz Dynamics Approach to Discrete Event Systems
NASA Technical Reports Server (NTRS)
Zak, M.; Meyers, R.
1995-01-01
This paper presents and discusses a mathematical formalism for simulation of discrete event dynamics (DED) - a special type of 'man- made' system designed to aid specific areas of information processing. A main objective is to demonstrate that the mathematical formalism for DED can be based upon the terminal model of Newtonian dynamics which allows one to relax Lipschitz conditions at some discrete points.
Xyce Parallel Electronic Simulator
Energy Science and Technology Software Center (ESTSC)
2013-10-03
The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) models, including several age- and radiation-aware devices. It supports a variety of computing platforms, both serial and parallel. Lastly, it uses a variety of modern solution algorithms, including dynamic parallel load-balancing and iterative solvers. Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits. Kirchhoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.
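The nodal-analysis-plus-Newton idea in the abstract above can be illustrated on a one-node circuit. The following is only a minimal sketch, not Xyce code: it assumes a hypothetical voltage source, resistor and diode in series, with made-up component values, so the "linear system" at each Newton step reduces to a single scalar division.

```python
import math

# Hypothetical component values, chosen only for illustration.
V_SOURCE = 5.0       # source voltage (V)
R = 1000.0           # series resistance (ohm)
I_SAT = 1e-12        # diode saturation current (A)
V_THERMAL = 0.025875 # thermal voltage (V)


def residual(v: float) -> float:
    """Kirchhoff current law at the diode node: resistor current + diode current = 0."""
    return (v - V_SOURCE) / R + I_SAT * (math.exp(v / V_THERMAL) - 1.0)


def jacobian(v: float) -> float:
    """Derivative of the residual with respect to the node voltage."""
    return 1.0 / R + (I_SAT / V_THERMAL) * math.exp(v / V_THERMAL)


def newton_solve(v0: float, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Newton iteration; each step solves the (here scalar) linearized system."""
    v = v0
    for _ in range(max_iter):
        dv = -residual(v) / jacobian(v)
        v += dv
        if abs(dv) < tol:
            return v
    raise RuntimeError("Newton iteration did not converge")


node_voltage = newton_solve(0.6)
```

In a circuit with many nodes, `jacobian` becomes a sparse matrix and the division becomes the direct or iterative linear solve that the abstract mentions.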
Optimal Discrete Event Supervisory Control of Aircraft Gas Turbine Engines
NASA Technical Reports Server (NTRS)
Litt, Jonathan (Technical Monitor); Ray, Asok
2004-01-01
This report presents an application of the recently developed theory of optimal Discrete Event Supervisory (DES) control that is based on a signed real measure of regular languages. The DES control techniques are validated on an aircraft gas turbine engine simulation test bed. The test bed is implemented on a networked computer system in which two computers operate in the client-server mode. Several DES controllers have been tested for engine performance and reliability.
An algebra of discrete event processes
NASA Technical Reports Server (NTRS)
Heymann, Michael; Meyer, George
1991-01-01
This report deals with an algebraic framework for modeling and control of discrete event processes. The report consists of two parts. The first part is introductory, and consists of a tutorial survey of the theory of concurrency in the spirit of Hoare's CSP, and an examination of the suitability of such an algebraic framework for dealing with various aspects of discrete event control. To this end a new concurrency operator is introduced and it is shown how the resulting framework can be applied. It is further shown that a suitable theory that deals with the new concurrency operator must be developed. In the second part of the report the formal algebra of discrete event control is developed. At the present time the second part of the report is still an incomplete and occasionally tentative working paper.
Discrete Events as Units of Perceived Time
ERIC Educational Resources Information Center
Liverence, Brandon M.; Scholl, Brian J.
2012-01-01
In visual images, we perceive both space (as a continuous visual medium) and objects (that inhabit space). Similarly, in dynamic visual experience, we perceive both continuous time and discrete events. What is the relationship between these units of experience? The most intuitive answer may be similar to the spatial case: time is perceived as an…
Discrete Event Execution with One-Sided and Two-Sided GVT Algorithms on 216,000 Processor Cores
Perumalla, Kalyan S; Park, Alfred J; Tipparaju, Vinod
2014-01-01
Global virtual time (GVT) computation is a key determinant of the efficiency and runtime dynamics of parallel discrete event simulations (PDES), especially on large-scale parallel platforms. Here, three execution modes of a generalized GVT computation algorithm are studied on high-performance parallel computing systems: (1) a synchronous GVT algorithm that affords ease of implementation, (2) an asynchronous GVT algorithm that is more complex to implement but can relieve blocking latencies, and (3) a variant of the asynchronous GVT algorithm to exploit one-sided communication in extant supercomputing platforms. Performance results are presented of implementations of these algorithms on up to 216,000 cores of a Cray XT5 system, exercised on a range of parameters: optimistic and conservative synchronization, fine- to medium-grained event computation, synthetic and non-synthetic applications, and different lookahead values. Performance of up to 54 billion events executed per second is registered. Detailed PDES-specific runtime metrics are presented to further the understanding of tightly-coupled discrete event dynamics on massively parallel platforms.
Multiple Autonomous Discrete Event Controllers for Constellations
NASA Technical Reports Server (NTRS)
Esposito, Timothy C.
2003-01-01
The Multiple Autonomous Discrete Event Controllers for Constellations (MADECC) project is an effort within the National Aeronautics and Space Administration Goddard Space Flight Center's (NASA/GSFC) Information Systems Division to develop autonomous positioning and attitude control for constellation satellites. It will be accomplished using traditional control theory and advanced coordination algorithms developed by the Johns Hopkins University Applied Physics Laboratory (JHU/APL). This capability will be demonstrated in the discrete event control test-bed located at JHU/APL. This project will be modeled for the Leonardo constellation mission, but is intended to be adaptable to any constellation mission. To develop a common software architecture, the controllers will only model very high-level responses. For instance, after determining that a maneuver must be made, the MADECC system will output a ΔV (velocity change) value. Lower level systems must then decide which thrusters to fire and for how long to achieve that ΔV.
Nonlinear Control and Discrete Event Systems
NASA Technical Reports Server (NTRS)
Meyer, George; Null, Cynthia H. (Technical Monitor)
1995-01-01
As the operation of large systems becomes ever more dependent on extensive automation, the need for an effective solution to the problem of design and validation of the underlying software becomes more critical. Large systems possess much detailed structure, typically hierarchical, and they are hybrid. Information processing at the top of the hierarchy is by means of formal logic and sentences; on the bottom it is by means of simple scalar differential equations and functions of time; and in the middle it is by an interacting mix of nonlinear multi-axis differential equations and automata, and functions of time and discrete events. The lecture will address the overall problem as it relates to flight vehicle management, describe the middle level, and offer a design approach that is based on Differential Geometry and Discrete Event Dynamic Systems Theory.
GVT Algorithms and Discrete Event Dynamics on 128K+ Processor Cores
Perumalla, Kalyan S; Park, Alfred J; Tipparaju, Vinod
2011-01-01
Parallel discrete event simulation (PDES) represents a class of codes that are challenging to scale to large numbers of processors due to tight global timestamp-ordering and fine-grained event execution. One of the critical factors in scaling PDES is the efficiency of the underlying global virtual time (GVT) algorithm needed for correctness of parallel execution and speed of progress. Although many GVT algorithms have been proposed previously, few have been proposed for scalable asynchronous execution and none customized to exploit one-sided communication. Moreover, the detailed performance effects of actual GVT algorithm implementations on large platforms are unknown. Here, three major GVT algorithms intended for scalable execution on high-performance systems are studied: (1) a synchronous GVT algorithm that affords ease of implementation, (2) an asynchronous GVT algorithm that is more complex to implement but can relieve blocking latencies, and (3) a variant of the asynchronous GVT algorithm, proposed and studied for the first time here, to exploit one-sided communication in extant supercomputing platforms. Performance results are presented of implementations of these algorithms on over 64,000 cores of a Cray XT5 system, exercised on a range of parameters: optimistic and conservative synchronization, fine- to medium-grained event computation, synthetic and non-synthetic applications, and different lookahead values. Performance of tens of billions of events executed per second is registered, exceeding the speeds of any known PDES engine, and showing asynchronous GVT algorithms to outperform state-of-the-art synchronous GVT algorithms. Detailed PDES-specific runtime metrics are presented to further the understanding of tightly-coupled discrete event execution dynamics on massively parallel platforms.
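As a rough illustration of the quantity these algorithms compute, GVT can be taken as the minimum over all processors of the local virtual time and of the timestamps of messages still in transit. The sketch below shows only the simplest, synchronous case and is not the authors' implementation: the `LogicalProcess` record is hypothetical, and a consistent snapshot (all processors paused at a barrier) is assumed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogicalProcess:
    """Hypothetical per-processor state, used only for this illustration."""
    local_virtual_time: float  # timestamp of the next unprocessed local event
    in_transit: List[float] = field(default_factory=list)  # sent but not yet received


def synchronous_gvt(processes: List[LogicalProcess]) -> float:
    """Return a lower bound on the timestamp of any event that can still occur.

    Assumes every process is stopped at a barrier, so the snapshot is consistent.
    """
    candidates: List[float] = []
    for lp in processes:
        candidates.append(lp.local_virtual_time)
        candidates.extend(lp.in_transit)
    return min(candidates)


lps = [
    LogicalProcess(12.0, [9.5]),
    LogicalProcess(10.0),
    LogicalProcess(15.0, [11.0, 14.0]),
]
gvt = synchronous_gvt(lps)
```

Here the in-transit message stamped 9.5 bounds GVT even though every local clock is already past it, which is why in-flight messages have to be accounted for; state older than GVT can then be safely reclaimed in an optimistic simulator.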
Parallelizing Timed Petri Net simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1993-01-01
The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.
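For readers unfamiliar with what is being parallelized, a sequential Timed Petri Net simulation can be sketched in a few lines. This is only an illustrative baseline under simplifying assumptions (unit arc weights, distinct input places, at least one input place per transition, fixed firing delays); the net, place names and delays are hypothetical and do not come from the report.

```python
import heapq
from typing import Dict, List, Tuple

# (name, input places, output places, firing delay)
Transition = Tuple[str, List[str], List[str], float]


def simulate(initial: Dict[str, int], transitions: List[Transition], t_end: float):
    """Event-driven simulation: tokens are consumed when a firing starts
    and produced when it completes `delay` time units later."""
    marking = dict(initial)
    pending: list = []  # heap of (finish_time, sequence_number, name, outputs)
    log: List[Tuple[float, str]] = []
    now, seq = 0.0, 0
    while True:
        started = True
        while started:  # start every transition that is currently enabled
            started = False
            for name, inputs, outputs, delay in transitions:
                if all(marking.get(p, 0) > 0 for p in inputs):
                    for p in inputs:
                        marking[p] -= 1
                    heapq.heappush(pending, (now + delay, seq, name, outputs))
                    seq += 1
                    started = True
        if not pending or pending[0][0] > t_end:
            break
        now, _, name, outputs = heapq.heappop(pending)
        for p in outputs:
            marking[p] = marking.get(p, 0) + 1
        log.append((now, name))
    return log, marking


net = [
    ("load", ["raw", "machine_free"], ["loaded"], 1.0),
    ("process", ["loaded"], ["done", "machine_free"], 3.0),
]
log, final = simulate({"raw": 2, "machine_free": 1}, net, t_end=10.0)
```

The single time-ordered heap is the part a parallel simulator has to partition across processors while still respecting timestamp order.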
Planning and supervision of reactor defueling using discrete event techniques
Garcia, H.E.; Imel, G.R.; Houshyar, A.
1995-12-31
New fuel handling and conditioning activities for the defueling of the Experimental Breeder Reactor II are being performed at Argonne National Laboratory. Research is being conducted to investigate the use of discrete event simulation, analysis, and optimization techniques to plan, supervise, and perform these activities in such a way that productivity can be improved. The central idea is to characterize this defueling operation as a collection of interconnected serving cells, and then apply operational research techniques to identify appropriate planning schedules for given scenarios. In addition, a supervisory system is being developed to provide personnel with on-line information on the progress of fueling tasks and to suggest courses of action to accommodate changing operational conditions. This paper provides an introduction to the research in progress at ANL. In particular, it briefly describes the fuel handling configuration for reactor defueling at ANL, presenting the flow of material from the reactor grid to the interim storage location, and the expected contributions of this work. As an example of the studies being conducted for planning and supervision of fuel handling activities at ANL, an application of discrete event simulation techniques to evaluate different fuel cask transfer strategies is given at the end of the paper.
An assessment of the ModSim/TWOS parallel simulation environment
Rich, D.O.; Michelsen, R.E.
1991-01-01
The Time Warp Operating System (TWOS) has been the focus of significant research in parallel, discrete-event simulation (PDES). A new language, ModSim, has been developed for use in conjunction with TWOS. The coupling of ModSim and TWOS is an attempt to address the development of large-scale, complex, discrete-event simulation models for parallel execution. The approach, simply stated, is to provide a high-level simulation language that embodies well-known software engineering principles combined with a high-performance parallel execution environment. The inherent difficulty with this approach is the mapping of the simulation application to the parallel run-time environment. To use TWOS, Time Warp applications are currently developed in C and must be tailored according to a set of constraints and conventions. C/TWOS applications are carefully developed using explicit calls to the Time Warp primitives; thus, the mapping of application to parallel run-time environment is done by the application developer. The disadvantage to this approach is the questionable scalability to larger software efforts; the obvious advantage is the degree of control over managing the efficient execution of the application. The ModSim/TWOS system provides an automatic mapping from a ModSim application to an equivalent C/TWOS application. The major flaw with the ModSim/TWOS system as it currently exists is that there is no compiler support for mapping a ModSim application into an efficient C/TWOS application. Moreover, the ModSim language as currently defined does not provide explicit hooks into the Time Warp Operating System and hence the developer is unable to tailor a ModSim application in the same fashion that a C application can be tailored. Without sufficient compiler support, there is a mismatch between ModSim's object-oriented, process-based execution model and the Time Warp execution model.
CAISSON: Interconnect Network Simulator
NASA Technical Reports Server (NTRS)
Springer, Paul L.
2006-01-01
Cray response to HPCS initiative. Model future petaflop computer interconnect. Parallel discrete event simulation techniques for large scale network simulation. Built on WarpIV engine. Run on laptop and Altix 3000. Can be sized up to 1000 simulated nodes per host node. Good parallel scaling characteristics. Flexible: multiple injectors, arbitration strategies, queue iterators, network topologies.
Xyce parallel electronic simulator design.
Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.
2010-09-01
This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.
Parallel Network Simulations with NEURON
Migliore, M.; Cannia, C.; Lytton, W.W; Markram, Henry; Hines, M. L.
2009-01-01
The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored. PMID:16732488
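The synchronization rule described above (integrate independently for an interval equal to the minimum interprocessor connection delay, then exchange spikes) can be sketched as follows. This is not NEURON's API; the function names, the cell-to-processor map and the delay values are hypothetical and serve only to show how the interval is derived.

```python
import math
from typing import Dict, List, Tuple

Connection = Tuple[int, int]  # (presynaptic cell id, postsynaptic cell id)


def min_interprocessor_delay(delays_ms: Dict[Connection, float],
                             owner: Dict[int, int]) -> float:
    """Smallest delay among connections whose endpoints live on different processors."""
    cross = [d for (src, dst), d in delays_ms.items() if owner[src] != owner[dst]]
    if not cross:
        raise ValueError("no interprocessor connections: no exchange is needed")
    return min(cross)


def exchange_windows(t_stop_ms: float, window_ms: float) -> List[Tuple[float, float]]:
    """Intervals each processor can integrate without hearing from the others."""
    count = math.ceil(t_stop_ms / window_ms)
    return [(k * window_ms, min((k + 1) * window_ms, t_stop_ms)) for k in range(count)]


owner = {0: 0, 1: 0, 2: 1}                      # cell id -> processor rank
delays = {(0, 1): 0.5, (0, 2): 2.0, (2, 1): 3.0}  # connection delays in ms
window = min_interprocessor_delay(delays, owner)
windows = exchange_windows(5.0, window)
```

The 0.5 ms connection is ignored because both of its cells sit on the same processor; a spike generated inside one window cannot affect another processor before that window ends, so spikes only need to be exchanged at window boundaries.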
Analytic Perturbation Analysis of Discrete Event Dynamic Systems
Uryasev, S.
1994-09-01
This paper considers a new Analytic Perturbation Analysis (APA) approach for Discrete Event Dynamic Systems (DEDS) with discontinuous sample-path functions with respect to control parameters. The performance functions for DEDS usually are formulated as mathematical expectations, which can be calculated only numerically. APA is based on new analytic formulas for the gradients of expectations of indicator functions; therefore, it is called an analytic perturbation analysis. The gradient of performance function may not coincide with the expectation of a gradient of sample-path function (i.e., the interchange formula for the gradient and expectation sign may not be valid). Estimates of gradients can be obtained with one simulation run of the models.
Computational Issues in Intelligent Control: Discrete-Event and Hybrid Systems
Koutsoukos, Xenofon D.
that are central in intelligent control. In particular, we discuss how the design, simulationComputational Issues in Intelligent Control: Discrete-Event and Hybrid Systems Xenofon D, IN 46556 e-mail: xkoutsou,antsaklis.1@nd.edu Abstract Intelligent control methodologies are being developed
Graphite : a parallel distributed simulator for multicores
Kasture, Harshad
2010-01-01
This thesis describes Graphite, a parallel, distributed simulator for simulating large-scale multicore architectures, and focuses particularly on the functional aspects of simulating a single, unmodified multi-threaded ...
Modelling machine ensembles with discrete event dynamical system theory
NASA Technical Reports Server (NTRS)
Hunter, Dan
1990-01-01
Discrete Event Dynamical System (DEDS) theory can be utilized as a control strategy for future complex machine ensembles that will be required for in-space construction. The control strategy involves orchestrating a set of interactive submachines to perform a set of tasks for a given set of constraints such as minimum time, minimum energy, or maximum machine utilization. Machine ensembles can be hierarchically modeled as a global model that combines the operations of the individual submachines. These submachines are represented in the global model as local models. Local models, from the perspective of DEDS theory, are described by the following: a set of system and transition states, an event alphabet that portrays actions that take a submachine from one state to another, an initial system state, a partial function that maps the current state and event alphabet to the next state, and the time required for the event to occur. Each submachine in the machine ensemble is represented by a unique local model. The global model combines the local models such that the local models can operate in parallel under the additional logistic and physical constraints due to submachine interactions. The global model is constructed from the states, events, event functions, and timing requirements of the local models. Supervisory control can be implemented in the global model by various methods such as task scheduling (open-loop control) or implementing a feedback DEDS controller (closed-loop control).
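The ingredients of a local model listed above (states, event alphabet, initial state, partial transition function, event durations) map naturally onto a small data structure. The sketch below is one possible encoding, not the author's formulation; the `LocalModel` class and the robot-arm states and timings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class LocalModel:
    """One submachine: an initial state plus a timed partial transition function."""
    initial_state: str
    # (state, event) -> (next state, time required); a missing key means the
    # event is not defined in that state, which is what makes the function partial.
    transitions: Dict[Tuple[str, str], Tuple[str, float]]

    def run(self, events: Iterable[str]) -> Optional[Tuple[str, float]]:
        """Apply an open-loop event schedule; return (final state, total time),
        or None if some event is undefined in the state where it is attempted."""
        state, elapsed = self.initial_state, 0.0
        for event in events:
            step = self.transitions.get((state, event))
            if step is None:
                return None
            state, duration = step
            elapsed += duration
        return state, elapsed


arm = LocalModel(
    initial_state="idle",
    transitions={
        ("idle", "grasp"): ("holding", 2.0),
        ("holding", "move"): ("positioned", 5.0),
        ("positioned", "release"): ("idle", 1.5),
    },
)
result = arm.run(["grasp", "move", "release"])
blocked = arm.run(["move"])
```

The state set and event alphabet are implicit in the dictionary keys. Feeding a fixed event list to `run` corresponds to the open-loop task scheduling mentioned in the abstract; a closed-loop controller would instead choose each event after observing the current state.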
Discrete Event Supervisory Control Applied to Propulsion Systems
NASA Technical Reports Server (NTRS)
Litt, Jonathan S.; Shah, Neerav
2005-01-01
The theory of discrete event supervisory (DES) control was applied to the optimal control of a twin-engine aircraft propulsion system and demonstrated in a simulation. The supervisory control, which is implemented as a finite-state automaton, oversees the behavior of a system and manages it in such a way that it maximizes a performance criterion, similar to a traditional optimal control problem. DES controllers can be nested such that a high-level controller supervises multiple lower level controllers. This structure can be expanded to control huge, complex systems, providing optimal performance and increasing autonomy with each additional level. The DES control strategy for propulsion systems was validated using a distributed testbed consisting of multiple computers--each representing a module of the overall propulsion system--to simulate real-time hardware-in-the-loop testing. In the first experiment, DES control was applied to the operation of a nonlinear simulation of a turbofan engine (running in closed loop using its own feedback controller) to minimize engine structural damage caused by a combination of thermal and structural loads. This enables increased on-wing time for the engine through better management of the engine-component life usage. Thus, the engine-level DES acts as a life-extending controller through its interaction with and manipulation of the engine's operation.
Mutually Nonblocking Supervisory Control of Discrete Event Systems
Kumar, Ratnesh
1 Mutually Nonblocking Supervisory Control of Discrete Event Systems M. Fabian Department to each individual specification. We call this the problem of mutually nonblocking supervision, which. We present a necessary and sufficient condition for the existence of a mutually nonblocking
Yoginath, Srikanth B; Perumalla, Kalyan S
2013-01-01
Virtual machine (VM) technologies, especially those offered via Cloud platforms, present new dimensions with respect to performance and cost in executing parallel discrete event simulation (PDES) applications. Due to the introduction of overall cost as a metric, the choice of the highest-end computing configuration is no longer the most economical one. Moreover, runtime dynamics unique to VM platforms introduce new performance characteristics, and the variety of possible VM configurations gives rise to a range of choices for hosting a PDES run. Here, an empirical study of these issues is undertaken to guide an understanding of the dynamics, trends and trade-offs in executing PDES on VM/Cloud platforms. Performance results and cost measures are obtained from actual execution of a range of scenarios in two PDES benchmark applications on the Amazon Cloud offerings and on a high-end VM host machine. The data reveal interesting insights into the new VM-PDES dynamics that come into play and also lead to counter-intuitive guidelines with respect to choosing the best and second-best configurations when overall cost of execution is considered. In particular, it is found that choosing the highest-end VM configuration guarantees neither the best runtime nor the least cost. Interestingly, choosing a (suitably scaled) low-end VM configuration provides the least overall cost without adversely affecting the total runtime.
Parallel methods for the flight simulation model
Xiong, Wei Zhong; Swietlik, C.
1994-06-01
The Advanced Computer Applications Center (ACAC) has been involved in evaluating advanced parallel architecture computers and the applicability of these machines to computer simulation models. The advanced systems investigated include parallel machines with shared memory and distributed architectures consisting of an eight processor Alliant FX/8, a twenty-four processor Sequent Symmetry, Cray XMP, IBM RISC 6000 model 550, and the Intel Touchstone eight processor Gamma and 512 processor Delta machines. Since parallelizing a truly efficient application program for the parallel machine is a difficult task, the implementation for these machines in a realistic setting has been largely overlooked. The ACAC has developed considerable expertise in optimizing and parallelizing application models on a collection of advanced multiprocessor systems. One such application model is the Flight Simulation Model, which uses a set of differential equations to describe the flight characteristics of a launched missile by means of a trajectory. The Flight Simulation Model was written in the FORTRAN language with approximately 29,000 lines of source code. Depending on the number of trajectories, the computation can require several hours to a full day of CPU time on a DEC/VAX 8650 system. There is an impetus to reduce the execution time and utilize the advanced parallel architecture computing environment available. ACAC researchers developed a parallel method that allows the Flight Simulation Model to run in parallel on the multiprocessor system. For the benchmark data tested, the parallel Flight Simulation Model implemented on the Alliant FX/8 has achieved nearly linear speedup. In this paper, we describe a parallel method for the Flight Simulation Model. We believe the method presented in this paper provides a general concept for the design of parallel applications. This concept, in most cases, can be adapted to many other sequential application programs.
Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment*†
Khan, Md. Ashfaquzzaman; Herbordt, Martin C.
2011-01-01
Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations. PMID:21822327
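The speed-up figures quoted in this abstract can be turned into parallel efficiency (speed-up divided by core count). The short Python sketch below does that arithmetic; rounding "nearly 6×" to 6.0 and "over 9×" to 9.0 is an assumption made only for illustration, and the function name parallel_efficiency is hypothetical, not taken from the paper.

```python
def parallel_efficiency(speedup, cores):
    """Return speed-up divided by core count (1.0 would be ideal linear scaling)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Illustrative values only: the abstract says "nearly 6x" on 8 cores and
# "over 9x" on 12 cores; 6.0 and 9.0 are rounded assumptions.
for speedup, cores in [(6.0, 8), (9.0, 12)]:
    print(f"{cores} cores: efficiency = {parallel_efficiency(speedup, cores):.2f}")
```

Under these rounded values both configurations sit at roughly three quarters of ideal scaling, which is consistent with the abstract's remark that serialization had to be mitigated.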
Parallel numerical reservoir simulation: A feasibility study
Michielse, P.H.
1994-12-31
This paper discusses a feasibility study to implement a parallel reservoir simulator on parallel computers. The basis of this study is a reservoir simulator that models an injection-production mechanism. The simulator implements a multigrid solver for the elliptic part of the equations, and uses adaptive local grid refinement to track moving fronts in the reservoir. The parallelization method is based on a domain decomposition method, which assigns the subdomains to the processors. In order to obtain a correct solution, communication across the internal boundaries between the subdomains is required. The implementation of the multigrid method imposes restrictions on the domain decomposition. Furthermore, the adaptive local grid refinement may cause the work load distribution over the processors to be out of balance. Hence, some load balancing technique is required to ensure parallel efficiency. This parallel efficiency is illustrated by experiments on a Convex MetaSeries system.
Graphite: A Distributed Parallel Simulator for Multicores
Beckmann, Nathan
2009-11-09
This paper introduces the open-source Graphite distributed parallel multicore simulator infrastructure. Graphite is designed from the ground up for exploration of future multicore processors containing dozens, hundreds, ...
Xyce parallel electronic simulator : users' guide.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. 
As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.
Hierarchical Discrete Event Supervisory Control of Aircraft Propulsion Systems
NASA Technical Reports Server (NTRS)
Yasar, Murat; Tolani, Devendra; Ray, Asok; Shah, Neerav; Litt, Jonathan S.
2004-01-01
This paper presents a hierarchical application of Discrete Event Supervisory (DES) control theory for intelligent decision and control of a twin-engine aircraft propulsion system. A dual layer hierarchical DES controller is designed to supervise and coordinate the operation of two engines of the propulsion system. The two engines are individually controlled to achieve enhanced performance and reliability, necessary for fulfilling the mission objectives. Each engine is operated under a continuously varying control system that maintains the specified performance and a local discrete-event supervisor for condition monitoring and life extending control. A global upper level DES controller is designed for load balancing and overall health management of the propulsion system.
Parallel Implementation of Power System Dynamic Simulation
Jin, Shuangshuang; Huang, Zhenyu; Diao, Ruisheng; Wu, Di; Chen, Yousu
2013-07-21
Dynamic simulation of power system transient stability is important for planning, monitoring, operation, and control of electrical power systems. However, modeling the system dynamics and network involves the computationally intensive time-domain solution of numerous differential and algebraic equations (DAE). This results in a transient stability implementation that may not maintain the real-time constraints of an online security assessment. This paper presents a parallel implementation of the dynamic simulation on a high-performance computing (HPC) platform using parallel simulation algorithms and computation architectures. It enables the simulation to run even faster than real time, enabling the “look-ahead” capability of upcoming stability problems in the power grid.
Extracting Discrete Event System Models from Hybrid Control Systems
Antsaklis, Panos
consists of three parts. The modeling and interactions of these parts are now described. 2.1 Plant halfspaces are used to define a set of plant events and a discrete event system model is generated which captures the behavior of the plant and interface of the hybrid control sys- tem. 1 Introduction be used
Simulating the scheduling of parallel supercomputer applications
Seager, M.K.; Stichnoth, J.M.
1989-09-19
An Event Driven Simulator for Evaluating Multiprocessing Scheduling (EDSEMS) disciplines is presented. The simulator is made up of three components: machine model; parallel workload characterization; and scheduling disciplines for mapping parallel applications (many processes cooperating on the same computation) onto processors. A detailed description of how the simulator is constructed, how to use it and how to interpret the output is also given. Initial results are presented from the simulation of parallel supercomputer workloads using "Dog-Eat-Dog," "Family" and "Gang" scheduling disciplines. These results indicate that Gang scheduling is far better at giving the number of processors that a job requests than Dog-Eat-Dog or Family scheduling. In addition, the system throughput and turnaround time are not adversely affected by this strategy. 10 refs., 8 figs., 1 tab.
Time parallel gravitational collapse simulation
Andreas Kreienbuehl; Pietro Benedusi; Daniel Ruprecht; Rolf Krause
2015-09-04
This article demonstrates the applicability of the parallel-in-time method Parareal to the numerical solution of the Einstein gravity equations for the spherical collapse of a massless scalar field. To account for the shrinking of the spatial domain in time, a tailored load balancing scheme is proposed and compared to load balancing based on number of time steps alone. The performance of Parareal is studied for both the sub-critical and black hole case; our experiments show that Parareal generates substantial speedup and, in the super-critical regime, can also reproduce the black hole mass scaling law.
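For readers unfamiliar with Parareal, the following minimal Python sketch shows the iteration on a scalar test equation y' = λy. It is not the article's Einstein-equation solver: the test problem, the use of forward Euler for both the coarse and fine propagators, and the names euler and parareal are all assumptions chosen for brevity.

```python
def euler(y, t0, t1, lam, steps):
    """Integrate y' = lam * y from t0 to t1 with `steps` forward-Euler steps."""
    h = (t1 - t0) / steps
    for _ in range(steps):
        y = y + h * lam * y
    return y


def parareal(y0, t_end, n_slices, n_iter, lam=-1.0, fine_steps=100):
    """Return the values at the time-slice boundaries after n_iter Parareal iterations."""
    ts = [t_end * i / n_slices for i in range(n_slices + 1)]

    def coarse(y, a, b):  # cheap propagator G: a single Euler step per slice
        return euler(y, a, b, lam, 1)

    def fine(y, a, b):  # expensive propagator F: many Euler steps per slice
        return euler(y, a, b, lam, fine_steps)

    # Initial guess: one serial sweep with the coarse propagator.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n], ts[n], ts[n + 1]))

    for _ in range(n_iter):
        # The fine solves depend only on the previous iterate, so each slice
        # could run on its own processor; they are evaluated serially here.
        f_old = [fine(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        g_old = [coarse(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        new = [y0]
        for n in range(n_slices):
            # Serial correction sweep: new coarse value plus (fine - coarse) of old iterate.
            new.append(coarse(new[n], ts[n], ts[n + 1]) + f_old[n] - g_old[n])
        u = new
    return u
```

With as many iterations as time slices, the result matches a serial fine integration up to rounding, so any speedup comes from converging in fewer iterations than slices.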
Visualization and Tracking of Parallel CFD Simulations
NASA Technical Reports Server (NTRS)
Vaziri, Arsi; Kremenetsky, Mark
1995-01-01
We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, is handled by CM/AVS. Partitioning of the visualization task, between CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate → store → visualize' post-processing approach.
Xyce parallel electronic simulator release notes.
Keiter, Eric Richard; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.
2010-05-01
The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: hardware and software requirements; new features and enhancements; any defects fixed since the last release; and current known defects and defect workarounds. For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.
Stochastic Parallel PARticle Kinetic Simulator
Energy Science and Technology Software Center (ESTSC)
2008-07-01
SPPARKS is a kinetic Monte Carlo simulator which implements kinetic and Metropolis Monte Carlo solvers in a general way so that they can be hooked to applications of various kinds. Specific applications are implemented in SPPARKS as physical models which generate events (e.g. a diffusive hop or chemical reaction) and execute them one-by-one. Applications can run in parallel so long as the simulation domain can be partitioned spatially so that multiple events can be invoked simultaneously. SPPARKS is used to model various kinds of mesoscale materials science scenarios such as grain growth, surface deposition and growth, and reaction kinetics. It can also be used to develop new Monte Carlo models that hook to the existing solver and parallel infrastructure provided by the code.
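As a rough illustration of what "generate events and execute them one-by-one" means in a kinetic Monte Carlo solver, here is a minimal serial, rejection-free sketch in Python. It is not SPPARKS code and does not use the SPPARKS API; the function name kmc_run, the fixed event rates, and the event labels are hypothetical.

```python
import math
import random


def kmc_run(rates, n_steps, seed=0):
    """Run n_steps of rejection-free kinetic Monte Carlo with fixed event rates.

    rates maps an event name to its rate. Returns (elapsed_time, counts).
    """
    rng = random.Random(seed)
    names = list(rates)
    total = sum(rates.values())
    t = 0.0
    counts = {name: 0 for name in names}
    for _ in range(n_steps):
        # Select one event with probability proportional to its rate.
        r = rng.random() * total
        acc = 0.0
        chosen = names[-1]  # fallback guards against rounding at the upper end
        for name in names:
            acc += rates[name]
            if r < acc:
                chosen = name
                break
        counts[chosen] += 1
        # Advance the clock by an exponentially distributed waiting time.
        t += -math.log(1.0 - rng.random()) / total
    return t, counts


# Hypothetical rates: a diffusive hop three times as likely as a reaction.
elapsed, counts = kmc_run({"hop": 3.0, "react": 1.0}, n_steps=1000)
print(elapsed, counts)
```

A real application would recompute the rates after every executed event; keeping them fixed here only keeps the sketch short.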
Parallel Algorithms for Time and Frequency Domain Circuit Simulation
Dong, Wei
2010-10-12
and proposes new techniques for both time-domain and frequency-domain parallel circuit simulations. For time-domain simulation, this dissertation presents a parallel transient simulation methodology. This new approach, called WavePipe, exploits coarse...
NASA Technical Reports Server (NTRS)
Greenberg, Albert G.; Lubachevsky, Boris D.; Nicol, David M.; Wright, Paul E.
1994-01-01
Fast, efficient parallel algorithms are presented for discrete event simulations of dynamic channel assignment schemes for wireless cellular communication networks. The driving events are call arrivals and departures, in continuous time, to cells geographically distributed across the service area. A dynamic channel assignment scheme decides which call arrivals to accept, and which channels to allocate to the accepted calls, attempting to minimize call blocking while ensuring co-channel interference is tolerably low. Specifically, the scheme ensures that the same channel is used concurrently at different cells only if the pairwise distances between those cells are sufficiently large. Much of the complexity of the system comes from ensuring this separation. The network is modeled as a system of interacting continuous time automata, each corresponding to a cell. To simulate the model, conservative methods are used; i.e., methods in which no errors occur in the course of the simulation and so no rollback or relaxation is needed. Implemented on a 16K processor MasPar MP-1, an elegant and simple technique provides speedups of about 15 times over an optimized serial simulation running on a high speed workstation. A drawback of this technique, typical of conservative methods, is that processor utilization is rather low. To overcome this, new methods were developed that exploit slackness in event dependencies over short intervals of time, thereby raising the utilization to above 50 percent and the speedup over the optimized serial code to about 120 times.
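The channel-reuse constraint described in this abstract (a channel may be used concurrently in two cells only if they are far enough apart) can be expressed as a small check. The Python sketch below is an illustration only: the data layout, the Euclidean distance measure, and the names can_assign, usage, positions and min_distance are assumptions, not the authors' implementation.

```python
import math


def can_assign(channel, cell, usage, positions, min_distance):
    """Return True if `channel` may be used in `cell` without violating reuse distance.

    usage maps a cell name to the set of channels currently in use there;
    positions maps a cell name to its (x, y) coordinates.
    """
    for other, channels in usage.items():
        if other == cell or channel not in channels:
            continue
        # Another cell already uses this channel: it must be far enough away.
        if math.dist(positions[cell], positions[other]) < min_distance:
            return False
    return True


# Hypothetical three-cell layout along a line.
positions = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 0.0)}
usage = {"B": {1}, "C": {2}}
print(can_assign(1, "A", usage, positions, min_distance=3.0))  # channel 1 is busy in nearby cell B
print(can_assign(2, "A", usage, positions, min_distance=3.0))  # channel 2 is busy only in distant cell C
```

Because every arrival has to run this check against neighbouring cells, event dependencies are spatially local, which is the kind of structure a conservative parallel method can exploit.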
Parallel Performance of a Combustion Chemistry Simulation
Skinner, Gregg; Eigenmann, Rudolf
1995-01-01
We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.
Parallel Simulation of Unsteady Turbulent Flames
NASA Technical Reports Server (NTRS)
Menon, Suresh
1996-01-01
Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, their high cost and limited availability make practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable. This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. 
Recently, a new model for turbulent combustion was developed, in which the combustion is modeled, within the subgrid (small-scales) using a methodology that simulates the mixing and the molecular transport and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure and this approach actually provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion and chemical kinetics and, therefore, within each grid cell, a significant amount of computations must be carried out before the large-scale (LES resolved) effects are incorporated. Therefore, this approach is uniquely suited for parallel processing and has been implemented on various systems such as: Intel Paragon, IBM SP-2, Cray T3D and SGI Power Challenge (PC) using the system independent Message Passing Interface (MPI) compiler. In this paper, timing data on these machines is reported along with some characteristic results.
Parallel algorithm strategies for circuit simulation.
Thornquist, Heidi K.; Schiek, Richard Louis; Keiter, Eric Richard
2010-01-01
Circuit simulation tools (e.g., SPICE) have become invaluable in the development and design of electronic circuits. However, they have been pushed to their performance limits in addressing circuit design challenges that come from the technology drivers of smaller feature scales and higher integration. Improving the performance of circuit simulation tools through exploiting new opportunities in widely-available multi-processor architectures is a logical next step. Unfortunately, not all traditional simulation applications are inherently parallel, and quickly adapting mature application codes (even codes designed as parallel applications) to new parallel paradigms can be prohibitively difficult. In general, performance is influenced by many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, the use of mini-applications, small self-contained proxies for real applications, is an excellent approach for rapidly exploring the parameter space of all these choices. In this report we present a multi-core performance study of Xyce, a transistor-level circuit simulation tool, and describe the future development of a mini-application for circuit simulation.
Parallel node placement method by bubble simulation
NASA Astrophysics Data System (ADS)
Nie, Yufeng; Zhang, Weiwei; Qi, Nan; Li, Yiqiang
2014-03-01
An efficient Parallel Node Placement method by Bubble Simulation (PNPBS), employing METIS-based domain decomposition (DD) for an arbitrary number of processors, is introduced. In accordance with the desired nodal density and Newton’s Second Law of Motion, automatic generation of node sets by bubble simulation has been demonstrated in previous work. Since the interaction force between nodes is short-range, the positions and velocities of two distant nodes can be updated simultaneously and independently during dynamic simulation. This inherent parallelism makes the method well suited to parallel computing. In this PNPBS method, the METIS-based DD scheme has been investigated for uniform and non-uniform node sets, and dynamic load balancing is obtained by evenly distributing work among the processors. For the nodes near the common interface of two neighboring subdomains, there is no need for special treatment after dynamic simulation. These nodes have good geometrical properties and a smooth density distribution, which is desirable in the numerical solution of partial differential equations (PDEs). The results of numerical examples show that quasi-linear speedup in the number of processors and high efficiency are achieved.
Xyce parallel electronic simulator : reference guide.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.
Fracture simulations via massively parallel molecular dynamics
Holian, B.L.; Abraham, F.F.; Ravelo, R.
1993-09-01
Fracture simulations at the atomistic level have heretofore been carried out for relatively small systems of particles, typically 10,000 or less. In order to study anything approaching a macroscopic system, massively parallel molecular dynamics (MD) must be employed. In two spatial dimensions (2D), it is feasible to simulate a sample that is 0.1 μm on a side. We report on recent MD simulations of mode I crack extension under tensile loading at high strain rates. The method of uniaxial, homogeneously expanding periodic boundary conditions was employed to represent tensile stress conditions near the crack tip. The effects of strain rate, temperature, material properties (equation of state and defect energies), and system size were examined. We found that, in order to mimic a bulk sample, several tricks (in addition to expansion boundary conditions) need to be employed: (1) the sample must be pre-strained to nearly the condition at which the crack will spontaneously open; (2) to relieve the stresses at free surfaces, such as the initial notch, annealing by kinetic-energy quenching must be carried out to prevent unwanted rarefactions; (3) sound waves emitted as the crack tip opens and dislocations emitted from the crack tip during blunting must be absorbed by special reservoir regions. The tricks described briefly in this paper will be especially important to carrying out feasible massively parallel 3D simulations via MD.
Parallel Strategies for Crash and Impact Simulations
Attaway, S.; Brown, K.; Hendrickson, B.; Plimpton, S.
1998-12-07
We describe a general strategy we have found effective for parallelizing solid mechanics simulations. Such simulations often have several computationally intensive parts, including finite element integration, detection of material contacts, and particle interaction if smoothed particle hydrodynamics is used to model highly deforming materials. The need to balance all of these computations simultaneously is a difficult challenge that has kept many commercial and government codes from being used effectively on parallel supercomputers with hundreds or thousands of processors. Our strategy is to load-balance each of the significant computations independently with whatever balancing technique is most appropriate. The chief benefit is that each computation can be scalably parallelized. The drawback is the data exchange between processors and extra coding that must be written to maintain multiple decompositions in a single code. We discuss these trade-offs and give performance results showing this strategy has led to a parallel implementation of a widely-used solid mechanics code that can now be run efficiently on thousands of processors of the Pentium-based Sandia/Intel TFLOPS machine. We illustrate with several examples the kinds of high-resolution, million-element models that can now be simulated routinely. We also look to the future and discuss what possibilities this new capability promises, as well as the new set of challenges it poses in material models, computational techniques, and computing infrastructure.
Improving ICU patient flow through discrete-event simulation
Christensen, Benjamin A. (Benjamin Arthur)
2012-01-01
Massachusetts General Hospital (MGH), the largest hospital in New England and a national leader in care delivery, teaching, and research, operates ten Intensive Care Units (ICUs), including the 20-bed Ellison 4 Surgical ...
A parallel algorithm for implicit depletant simulations
NASA Astrophysics Data System (ADS)
Glaser, Jens; Karas, Andrew S.; Glotzer, Sharon C.
2015-11-01
We present an algorithm to simulate the many-body depletion interaction between anisotropic colloids in an implicit way, integrating out the degrees of freedom of the depletants, which we treat as an ideal gas. Because the depletant particles are statistically independent and the depletion interaction is short-ranged, depletants are randomly inserted in parallel into the excluded volume surrounding a single translated and/or rotated colloid. A configurational bias scheme is used to enhance the acceptance rate. The method is validated and benchmarked both on multi-core processors and graphics processing units for the case of hard spheres, hemispheres, and discoids. With depletants, we report novel cluster phases in which hemispheres first assemble into spheres, which then form ordered hcp/fcc lattices. The method is significantly faster than any method without cluster moves and that tracks depletants explicitly, for systems of colloid packing fraction φ_c < 0.50, and additionally enables simulation of the fluid-solid transition.
Parallel solvers for reservoir simulation on MIMD computers
Piault, E.; Willien, F.; Roux, F.X.
1995-12-01
We have investigated parallel solvers for reservoir simulation. We compare different solvers and preconditioners using T3D and SP1 parallel computers. We use block diagonal domain decomposition preconditioner with non-overlapping sub-domains.
Parallel and Distributed Multi-Algorithm Circuit Simulation
Dai, Ruicheng
2012-10-19
With the proliferation of parallel computing, parallel computer-aided design (CAD) has received significant research interests. Transient transistor-level circuit simulation plays an important role in digital/analog circuit design and verification...
Parallel Proximity Detection for Computer Simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)
1997-01-01
The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
Parallel Proximity Detection for Computer Simulations
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)
1998-01-01
The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
Parallel multiscale simulations of a brain aneurysm
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.
Improving the Teaching of Discrete-Event Control Systems Using a LEGO Manufacturing Prototype
ERIC Educational Resources Information Center
Sanchez, A.; Bucio, J.
2012-01-01
This paper discusses the usefulness of employing LEGO as a teaching-learning aid in a post-graduate-level first course on the control of discrete-event systems (DESs). The final assignment of the course is presented, which asks students to design and implement a modular hierarchical discrete-event supervisor for the coordination layer of a…
Decision Making in Fuzzy Discrete Event Systems
Lin, F; Ying, H; Macarthur, R D; Cohn, J A; Barth-Jones, D; Crane, L R
2007-09-15
The primary goal of the study presented in this paper is to develop a novel and comprehensive approach to decision making using fuzzy discrete event systems (FDES) and to apply such an approach to real-world problems. At the theoretical front, we develop a new control architecture of FDES as a way of decision making, which includes a FDES decision model, a fuzzy objective generator for generating optimal control objectives, and a control scheme using both disablement and enforcement. We develop an online approach to dealing with the optimal control problem efficiently. As an application, we apply the approach to HIV/AIDS treatment planning, a technical challenge since AIDS is one of the most complex diseases to treat. We build a FDES decision model for HIV/AIDS treatment based on expert knowledge, treatment guidelines, clinical trials, patient database statistics, and other available information. Our preliminary retrospective evaluation shows that the approach is capable of generating optimal control objectives for real patients in our AIDS clinic database and is able to apply our online approach to deciding an optimal treatment regimen for each patient. In the process, we have developed methods to resolve the following two new theoretical issues that have not been addressed in the literature: (1) the optimal control problem has a state-dependent performance index and hence it is not monotonic, (2) the state space of a FDES is infinite. PMID:19562097
Parallel Performance of a Combustion Chemistry Simulation
Padua, David
combustion-generated pollutants, reducing knocking in internal combustion engines, studying Parallel Performance of a Combustion Chemistry Simulation Gregg Skinner Rudolf Eigenmann Center used a description of a combustion simulation's mathematical and computational methods to develop
Parallelization of Rocket Engine Simulator Software (PRESS)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet
1997-01-01
The Parallelization of Rocket Engine System Software (PRESS) project is part of a collaborative effort with Southern University at Baton Rouge (SUBR), University of West Florida (UWF), and Jackson State University (JSU). The second-year funding, which supports two graduate students enrolled in our new Master's program in Computer Science at Hampton University and the principal investigator, has been obtained for the period from October 19, 1996 through October 18, 1997. The key part of the interim report was new directions for the second-year funding. This came about from discussions during the Rocket Engine Numeric Simulator (RENS) project meeting in Pensacola on January 17-18, 1997. At that time, a software agreement between Hampton University and NASA Lewis Research Center had already been concluded. That agreement concerns off-NASA-site experimentation with PUMPDES/TURBDES software. Before this agreement, during the first year of the project, another large-scale FORTRAN-based software package, Two-Dimensional Kinetics (TDK), was being used for translation to an object-oriented language and parallelization experiments. However, that package proved to be too complex and to lack sufficient documentation for an effective translation effort to object-oriented C++ source code. The focus, this time with the better documented and more manageable PUMPDES/TURBDES package, was still on translation to C++ with design improvements. At the RENS Meeting, however, the impetus for the RENS projects in general, and PRESS in particular, shifted in two important ways. One was closer alignment with the work on the Numerical Propulsion System Simulator (NPSS) through cooperation and collaboration with LERC ACLU organization. The other was to see whether and how NASA's various rocket design software can be run over local networks and intranets without any radical efforts for redesign and translation into object-oriented source code.
There were also suggestions that the Fortran-based code be encapsulated in C++ code, thereby facilitating reuse without undue development effort. The details are covered in the aforementioned section of the interim report filed on April 28, 1997.
Modeling and Solution Issues in Discrete Event Simulation
Grossmann, Ignacio E.
, Global warming) Operator training, model validation (computational pilot plant) ... in the state of the system. Resources represent anything with restricted capacity. Global variables ... is performed from the calendar. Statistics collectors are used to evaluate system performance
Parallel Monte Carlo simulation of multilattice thin film growth
NASA Astrophysics Data System (ADS)
Shu, J. W.; Lu, Qin; Wong, Wai-on; Huang, Han-chen
2001-07-01
This paper describes a new parallel algorithm for the multi-lattice Monte Carlo atomistic simulator for thin film deposition (ADEPT), implemented on a parallel computer using the PVM (Parallel Virtual Machine) message passing library. This parallel algorithm is based on domain decomposition with overlapping and asynchronous communication. Multiple lattices are represented by a single reference lattice through one-to-one mappings, with resulting computational demands being comparable to those in the single-lattice Monte Carlo model. Asynchronous communication and domain overlapping techniques are used to reduce the waiting time and communication time among parallel processors. Results show that the algorithm is highly efficient with a large number of processors. The algorithm was implemented on a parallel machine with 50 processors, and it is suitable for parallel Monte Carlo simulation of thin film growth with either a distributed memory parallel computer or a shared memory machine with message passing libraries. In this paper, the significant communication time in parallel MC simulation of thin film growth is effectively reduced by adopting domain decomposition with overlapping between sub-domains and asynchronous communication among processors. The communication overhead does not increase appreciably, and the speedup rises as the number of processors increases. A near-linear increase in computing speed was achieved as the number of processors increased, and there is no theoretical limit on the number of processors to be used. The techniques developed in this work are also suitable for the implementation of the Monte Carlo code on other parallel systems.
Parallel PIC plasma simulation through particle
Vlad, Gregorio
for the confinement degradation of the plasma. www.elsevier.com/locate/parco Parallel Computing 27 (2001) 295, each of them representing a cloud of non-mutually interacting physical particles. The mutual
Parallel architecture for real-time simulation. Master's thesis
Cockrell, C.D.
1989-01-01
This thesis is concerned with the development of a very fast and highly efficient parallel computer architecture for real-time simulation of continuous systems. Currently, several parallel processing systems exist that may be capable of executing a complex simulation in real time. These systems are examined and the pros and cons of each system are discussed. The thesis then introduces a custom-designed parallel architecture based upon The University of Alabama's OPERA architecture. Each component of this system is discussed and the rationale for its selection is presented. The problem selected for the test and evaluation of the proposed architecture, real-time simulation of the Space Shuttle Main Engine, is explored, identifying the areas where parallelism can be exploited and parallel processing applied. Results from the test and evaluation phase are presented and compared with the results of the same problem processed on a uniprocessor system.
DCCB and SCC Based Fast Circuit Partition Algorithm For Parallel SPICE Simulation
Wang, Yu
DCCB and SCC Based Fast Circuit Partition Algorithm For Parallel SPICE Simulation Xiaowei Zhou, Yu facing VLSI circuits for parallel simulation. This paper presents an efficient circuit partition algorithm specially designed for VLSI circuit partition and parallel simulation. The algorithm
PARALLEL COMPUTER SIMULATION TECHNIQUES FOR THE STUDY OF MACROMOLECULES
Wilson, Mark R.
PARALLEL COMPUTER SIMULATION TECHNIQUES FOR THE STUDY OF MACROMOLECULES Mark R. Wilson and Jaroslav years two important developments in computing have occurred. At the high-cost end of the scale, supercomputers have become parallel computers. The ultra-fast (specialist) processors and the expensive vector-computers
Parallel Simulation of Electron-Solid Interactions Electron Microscopy Modeling
Plimpton, Steve
Parallel Simulation of Electron-Solid Interactions for Electron Microscopy Modeling S. J, Monte Carlo, electron, microscopy, random number generation Abstract A parallel implementation Introduction Analytical electron microscopy (AEM) is a tool for characterizing the spatial distribution of ele
Parallel FEM Simulation of Crack Propagation --Challenges, Status, and Perspectives
Stodghill, Paul
Parallel FEM Simulation of Crack Propagation -- Challenges, Status, and Perspectives List and accurate computer simulation of crack propagation in realistic 3D structures would be a valuable tool generation crack propagation simulation software that aims to make this potential a reality. Within the scope
HPC Infrastructure for Solid Earth Simulation on Parallel Computers
NASA Astrophysics Data System (ADS)
Nakajima, K.; Chen, L.; Okuda, H.
2004-12-01
Recently, various types of parallel computers with various types of architectures and processing elements (PE) have emerged, which include PC clusters and the Earth Simulator. Moreover, users can easily access these computer resources over the network in a Grid environment. It is well known that thorough tuning is required for programmers to achieve excellent performance on each computer. The method for tuning strongly depends on the type of PE and architecture. Optimization by tuning is very hard work, especially for developers of applications. Moreover, parallel programming using a message-passing library such as MPI is another big task for application programmers. In the GeoFEM project (http://gefeom.tokyo.rist.or.jp), the authors have developed a parallel FEM platform for solid earth simulation on the Earth Simulator, which supports parallel I/O, parallel linear solvers and parallel visualization. This platform can efficiently hide complicated procedures for parallel programming and optimization on vector processors from application programmers. This type of infrastructure is very useful. Source code developed on a single-processor PC is easily optimized for a massively parallel computer by linking the source code to the parallel platform installed on the target computer. This parallel platform, called HPC Infrastructure, will provide dramatic efficiency, portability and reliability in the development of scientific simulation codes. For example, the source code is expected to be fewer than 10,000 lines, and porting legacy codes to a parallel computer takes 2 or 3 weeks. The original GeoFEM platform supports only I/O, linear solvers and visualization. In the present work, further development for adaptive mesh refinement (AMR) and dynamic load-balancing (DLB) has been carried out. In this presentation, examples of large-scale solid earth simulation using the Earth Simulator will be demonstrated.
Moreover, recent results of a parallel computational steering tool using an MxN communication model will be shown. In an MxN communication model, the large-scale computation modules run on M PE's and high performance parallel visualization modules run on N PE's, concurrently. This can allow computation and visualization to select suitable parallel hardware environments respectively. Meanwhile, real-time steering can be achieved during computation so that the users can check and adjust the computation process in real time. Furthermore, different numbers of PE's can achieve better configuration between computation and visualization under Grid environment.
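As an illustration of the MxN communication model described in this abstract, the following minimal sketch computes which index ranges each of M computation PEs would send to each of N visualization PEs. It assumes a 1D array that is block-partitioned on both sides; the function names `block_ranges` and `transfer_schedule` are hypothetical and are not taken from GeoFEM or the steering tool.

```python
def block_ranges(length, parts):
    """Split range(length) into `parts` contiguous blocks, as evenly as possible."""
    base, extra = divmod(length, parts)
    ranges, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def transfer_schedule(length, m, n):
    """List (sender, receiver, start, stop) for every overlapping index range.

    Assumption: both the M computation PEs and the N visualization PEs hold
    a plain 1D block distribution of the same array.
    """
    schedule = []
    for s, (s0, s1) in enumerate(block_ranges(length, m)):
        for r, (r0, r1) in enumerate(block_ranges(length, n)):
            lo, hi = max(s0, r0), min(s1, r1)
            if lo < hi:  # the two blocks share at least one index
                schedule.append((s, r, lo, hi))
    return schedule


# 12 values computed on M=4 PEs, visualized on N=3 PEs
schedule = transfer_schedule(length=12, m=4, n=3)
for sender, receiver, lo, hi in schedule:
    print(f"compute PE {sender} -> vis PE {receiver}: indices [{lo}, {hi})")
```

Every index appears in exactly one transfer, so M and N can be chosen independently for the two sides.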
Xyce Parallel Electronic Simulator : users' guide, version 4.1.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-02-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. 
As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.
Xyce parallel electronic simulator : users' guide. Version 5.1.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-11-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. 
As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.
A conservative approach to parallelizing the Sharks World simulation
NASA Technical Reports Server (NTRS)
Nicol, David M.; Riffe, Scott E.
1990-01-01
Parallelizing a benchmark problem for parallel simulation, the Sharks World, is described. The described solution is conservative, in the sense that no state information is saved, and no 'rollbacks' occur. The approach used illustrates both the principal advantage and principal disadvantage of conservative parallel simulation. The advantage is that by exploiting lookahead an approach was found that dramatically improves the serial execution time, and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.
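The role of lookahead in conservative synchronization can be sketched as follows. This is a generic illustration, not the Sharks World solution itself: it assumes each logical process knows its neighbors' current simulation times and that every message a neighbor sends is stamped at least `lookahead` time units into that neighbor's future. The names `safe_bound` and `process_safe_events`, and the event labels, are hypothetical.

```python
import heapq


def safe_bound(neighbor_clocks, lookahead):
    """Earliest timestamp any neighbor could still send us (assumed model)."""
    return min(clock + lookahead for clock in neighbor_clocks.values())


def process_safe_events(pending, neighbor_clocks, lookahead):
    """Pop every pending event strictly below the safe bound.

    Such events can never be preceded by a later-arriving message,
    so no state saving and no rollback is needed.
    """
    bound = safe_bound(neighbor_clocks, lookahead)
    done = []
    while pending and pending[0][0] < bound:
        done.append(heapq.heappop(pending))
    return done, bound


# Local event list of one logical process: (timestamp, label)
pending = [(1.0, "swim"), (2.5, "attack"), (4.0, "swim")]
heapq.heapify(pending)
clocks = {"west": 1.5, "east": 2.0}

processed, bound = process_safe_events(pending, clocks, lookahead=1.5)
print(processed, bound)
```

With zero lookahead the bound collapses to the slowest neighbor's clock and little can be processed safely, which is the fragility the abstract points out.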
Parallel Simulation for VLSI Power Grid
Zhang, Le
2015-07-23
Due to the increasing complexity of VLSI circuits, power grid simulation has become more and more time-consuming. Hence, there is a need for fast and accurate power grid simulator. In order to perform power grid simulation in a timely manner...
Traffic simulations on parallel computers using domain decomposition techniques
Hanebutte, U.R.; Tentner, A.M.
1995-12-31
Large-scale simulations of Intelligent Transportation Systems (ITS) can only be achieved by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic simulations with the standard simulation package TRAF-NETSIM on a 128-node IBM SPx parallel supercomputer as well as on a cluster of SUN workstations. Whilst this particular parallel implementation is based on NETSIM, a microscopic traffic simulation model, the presented strategy is applicable to a broad class of traffic simulations. An outer iteration loop must be introduced in order to converge to a global solution. A performance study that utilizes a scalable test network consisting of square grids is presented, which addresses the performance penalty introduced by the additional iteration loop.
Parallel Signal Processing and System Simulation using aCe
NASA Technical Reports Server (NTRS)
Dorband, John E.; Aburdene, Maurice F.
2003-01-01
Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C based parallel language (ace C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of ace C and present a signal processing application (FFT).
Deiterding, Ralf
-structure interaction simulation of blast and explosions impacting on realistic building structures with a block simulation with a block-structured AMR method Introduction Parallel SAMR Fluid-structure interaction Verification and validation configurations Blast-driven deformation Detonation-driven deformations Conclusions
Parallel simulation of strong ground motions during recent and historical
Furumura, Takashi
Parallel simulation of strong ground motions during recent and historical damaging earthquakes in Tokyo, Japan T. Furumura a,*, L. Chen b a Earthquake Research Institute, University of Tokyo, 1 such as the Earth Simulator supercomputer and the deployment of dense networks of strong ground motion instruments
PARALLEL LOGIC SIMULATION OF MILLION-GATE VLSI CIRCUITS
Varela, Carlos
PARALLEL LOGIC SIMULATION OF MILLION-GATE VLSI CIRCUITS By Lijuan Zhu A Thesis Submitted. Introduction and background; 1.1 VLSI Circuit simulation; 1.1.1 FPGA/ASIC Design Flow; 1.1.2 Four groups of circuit
An approach to real-time simulation using parallel processing
NASA Technical Reports Server (NTRS)
Blech, R. A.; Arpasi, D. J.
1981-01-01
A preliminary simulator design that uses a parallel computer organization to provide accuracy, portability, and low cost is presented. The hardware and software for this prototype simulator are discussed. A detailed discussion of the inter-computer data transfer mechanism is also presented.
CONSERVATIVE PARALLEL SIMULATION OF A MESSAGE-PASSING NETWORK: A PERFORMANCE STUDY
Miguel-Alonso, José
). After evaluating the behaviour of the simulators the next step was to work with a realistic model. We simulation, performance evaluation, parallel computers ABSTRACT In this paper we evaluate the behaviour an interconnection network proposed as the communication mechanism of a multicomputer. After a set of experiments, we
SPINET: A Parallel Computing Approach to Spine Simulations
Schneider, Jean-Guy
SPINET: A Parallel Computing Approach to Spine Simulations Peter G. Kropf 1 , Edgar F.A. Lederer 2, and symbolic and modern functional programming. The target application is the human spine. Simulations of the spine help to investigate and better understand the mechanisms of back pain and spinal injury. Two
A Parallel Visualization Pipeline for Terascale Earthquake Simulations
Ma, Kwan-Liu
A Parallel Visualization Pipeline for Terascale Earthquake Simulations Hongfeng Yu Kwan-Liu Ma at the Pittsburgh Supercomputing Center (PSC) for studying the largest earthquake simulation ever performed visualization, vol- ume rendering 1. INTRODUCTION Large-scale computer modeling of the earthquake-induced ground
Efficient parallel simulation of CO2 geologic sequestration in saline aquifers
Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten
2007-01-01
An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling single and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, as well as salt precipitation or dissolution. The new parallel simulator uses MPI for parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations by multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that a linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2.
A hybrid parallel framework for the cellular Potts model simulations
Jiang, Yi; He, Kejing; Dong, Shoubin
2009-01-01
The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which can't be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operation are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied the avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation (~10^8 sites) of complex collective behavior of numerous cells (~10^6).
Parallel Monte Carlo Simulation for control system design
NASA Technical Reports Server (NTRS)
Schubert, Wolfgang M.
1995-01-01
The research during the 1993/94 academic year addressed the design of parallel algorithms for stochastic robustness synthesis (SRS). SRS uses Monte Carlo simulation to compute probabilities of system instability and other design-metric violations. The probabilities form a cost function which is used by a genetic algorithm (GA). The GA searches for the stochastic optimal controller. The existing sequential algorithm was analyzed and modified to execute in a distributed environment. For this, parallel approaches to Monte Carlo simulation and genetic algorithms were investigated. Initial empirical results are available for the KSR1.
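The Monte Carlo step of stochastic robustness synthesis can be sketched as below. This is a toy illustration under stated assumptions, not the algorithm from the report: the plant is a hypothetical second-order system with characteristic polynomial s^2 + a*s + (b + gain), the parameter ranges are invented, and the batches are evaluated with the built-in `map`. Because each batch has its own seed and shares no state, the same call could be handed to a parallel map such as `concurrent.futures.ProcessPoolExecutor.map`.

```python
import random
from functools import partial


def is_unstable(gain, a, b):
    """s^2 + a*s + (b + gain) is stable iff both coefficients are positive."""
    return not (a > 0 and (b + gain) > 0)


def instability_count(gain, n_samples, seed):
    """Count unstable draws in one independent batch of plant parameters."""
    rng = random.Random(seed)  # per-batch generator, no shared state
    count = 0
    for _ in range(n_samples):
        a = rng.uniform(-0.5, 2.0)  # uncertain damping term (assumed range)
        b = rng.uniform(-1.0, 1.0)  # uncertain stiffness term (assumed range)
        count += is_unstable(gain, a, b)
    return count


def instability_probability(gain, batches=4, n_per_batch=5000):
    """Estimate P(instability); each batch could run on its own processor."""
    worker = partial(instability_count, gain, n_per_batch)
    counts = map(worker, range(batches))  # seeds 0..batches-1
    return sum(counts) / (batches * n_per_batch)


# The estimate would serve as (part of) the cost a genetic algorithm minimizes.
p_low = instability_probability(gain=0.0)
p_high = instability_probability(gain=2.0)
print(f"gain=0.0: {p_low:.3f}   gain=2.0: {p_high:.3f}")
```

For these assumed ranges the exact probabilities are 0.6 and 0.2, so the estimates should land close to those values.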
NASA Technical Reports Server (NTRS)
Zeigler, Bernard P.
1989-01-01
It is shown how systems can be advantageously represented as discrete-event models by using DEVS (discrete-event system specification), a set-theoretic formalism. Such DEVS models provide a basis for the design of event-based logic control. In this control paradigm, the controller expects to receive confirming sensor responses to its control commands within definite time windows determined by its DEVS model of the system under control. The event-based control paradigm is applied in advanced robotic and intelligent automation, showing how classical process control can be readily interfaced with rule-based symbolic reasoning systems.
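The time-window check at the heart of event-based control can be sketched in a few lines. This is a minimal illustration, not the DEVS formalism itself: it assumes the controller's model supplies, for each command, a window (t_min, t_max) measured from the command time, and the names `check_response` and `VALVE_WINDOW` are hypothetical.

```python
def check_response(command_time, window, response_time):
    """Classify a sensor response against the model's expected time window.

    `window` is (t_min, t_max), measured from the time the command was issued.
    `response_time` is None if no response arrived at all.
    """
    t_min, t_max = window
    if response_time is None or response_time > command_time + t_max:
        return "timeout"    # confirmation missing or too late
    if response_time < command_time + t_min:
        return "too_early"  # faster than the model allows, e.g. a faulty sensor
    return "confirmed"


# Assumed model: a valve takes between 2 and 5 time units to report "open".
VALVE_WINDOW = (2.0, 5.0)

results = [
    check_response(10.0, VALVE_WINDOW, 13.0),
    check_response(10.0, VALVE_WINDOW, 11.0),
    check_response(10.0, VALVE_WINDOW, None),
]
print(results)
```

Anything other than "confirmed" would be passed on to the rule-based layer as a diagnostic event rather than treated as normal feedback.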
Parallel runway requirement analysis study. Volume 2: Simulation manual
NASA Technical Reports Server (NTRS)
Ebrahimi, Yaghoob S.; Chun, Ken S.
1993-01-01
This document is a user manual for operating the PLAND_BLUNDER (PLB) simulation program. This simulation is based on two aircraft approaching parallel runways independently and using parallel Instrument Landing System (ILS) equipment during Instrument Meteorological Conditions (IMC). If an aircraft should deviate from its assigned localizer course toward the opposite runway, this constitutes a blunder which could endanger the aircraft on the adjacent path. The worst case scenario would be if the blundering aircraft were unable to recover and continue toward the adjacent runway. PLAND_BLUNDER is a Monte Carlo-type simulation which employs the events and aircraft positioning during such a blunder situation. The model simulates two aircraft performing parallel ILS approaches using Instrument Flight Rules (IFR) or visual procedures. PLB uses a simple movement model and control law in three dimensions (X, Y, Z). The parameters of the simulation inputs and outputs are defined in this document along with a sample of the statistical analysis. This document is the second volume of a two volume set. Volume 1 is a description of the application of the PLB to the analysis of close parallel runway operations.
Parallelization of Rocket Engine Simulator Software (PRESS)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet
1998-01-01
We have outlined our work in the last half of the funding period. We have shown how a demo package for RESSAP using MPI can be done. However, we also mentioned the difficulties with the UNIX platform. We have reiterated some of the suggestions made during the presentation of the progress at the Fourth Annual HBCU Conference. Although we have discussed, in some detail, how TURBDES/PUMPDES software can be run in parallel using MPI, at present, we are unable to experiment any further with either MPI or PVM. Due to X windows not being implemented, we are also not able to experiment further with XPVM, which, it will be recalled, has a nice GUI. There are also some concerns, on our part, about MPI being an appropriate tool. The best thing about MPI is that it is public domain. Although plenty of documentation exists for the intricacies of using MPI, little information is available on its actual implementations. Other than very typical, somewhat contrived examples, such as the Jacobi algorithm for solving Laplace's equation, there are few examples which can readily be applied to real situations, such as in our case. In effect, the review of literature on both MPI and PVM, and there is a lot, indicates something similar to the enormous effort which was spent on LISP and LISP-like languages as tools for artificial intelligence research. During the development of a book on programming languages [12], when we searched the literature for very simple examples like taking averages, reading and writing records, multiplying matrices, etc., we could hardly find any! Yet, so much was said and done on that topic in academic circles. It appears that we faced the same problem with MPI, where despite significant documentation, we could not find even a simple example which supports coarse-grain parallelism involving only a few processes.
From the foregoing, it appears that a new direction may be required for more productive research during the extension period (10/19/98 - 10/18/99). At the least, the research would need to be done on Windows 95/Windows NT based platforms. Moreover, with the acquisition of the Lahey Fortran package for the PC platform, and the existing Borland C++ 5.0, we can do work on C++ wrapper issues. We have carefully studied the blueprint for Space Transportation Propulsion Integrated Design Environment for the next 25 years [13] and found the inclusion of HBCUs in that effort encouraging. Especially in the long period for which a map is provided, there is no doubt that HBCUs will grow and become better equipped to do meaningful research. In the shorter period, as was suggested in our presentation at the HBCU conference, some key decisions regarding the aging Fortran based software for rocket propellants will need to be made. One important issue is whether or not object oriented languages such as C++ or Java should be used for distributed computing. Whether or not "distributed computing" is necessary for the existing software is yet another, larger, question to be tackled.
Efficient parallel CFD-DEM simulations using OpenMP
NASA Astrophysics Data System (ADS)
Amritkar, Amit; Deb, Surya; Tafti, Danesh
2014-01-01
The paper describes parallelization strategies for the Discrete Element Method (DEM) used for simulating dense particulate systems coupled to Computational Fluid Dynamics (CFD). While the field equations of CFD are best parallelized by spatial domain decomposition techniques, the N-body particulate phase is best parallelized over the number of particles. When the two are coupled together, both modes are needed for efficient parallelization. It is shown that under these requirements, OpenMP thread based parallelization has advantages over MPI processes. Two representative examples, fairly typical of dense fluid-particulate systems are investigated, including the validation of the DEM-CFD and thermal-DEM implementation with experiments. Fluidized bed calculations are performed on beds with uniform particle loading, parallelized with MPI and OpenMP. It is shown that as the number of processing cores and the number of particles increase, the communication overhead of building ghost particle lists at processor boundaries dominates time to solution, and OpenMP which does not require this step is about twice as fast as MPI. In rotary kiln heat transfer calculations, which are characterized by spatially non-uniform particle distributions, the low overhead of switching the parallelization mode in OpenMP eliminates the load imbalances, but introduces increased overheads in fetching non-local data. In spite of this, it is shown that OpenMP is between 50-90% faster than MPI.
Xyce parallel electronic simulator reference guide, version 6.0.
Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David G.
2013-08-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1].
Xyce parallel electronic simulator reference guide, version 6.1
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-03-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].
Xyce Parallel Electronic Simulator : reference guide, version 4.1.
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-02-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.
Parallel Finite Element Simulations of Czochralski Melt Flows
Adjerid, Slimane
Parallel Finite Element Simulations of Czochralski Melt Flows S. Adjerid, J.E. Flaherty, K. Jansen through the temperature. We study melt flows associated with a Czochralski crystal growth process crystal with increasing Grashof number. The popular Czochralski CZ process 2 of bulk crystal growth fea
Merging Parallel Simulation Programs Abhishek Agarwal and Maria Hybinette
Hybinette, Maria
Merging Parallel Simulation Programs Abhishek Agarwal and Maria Hybinette Computer Science, as we merge logical processes that have been previously cloned and we show that this can further be duplicated. We discuss our implementation of merging, and illustrate its effectiveness in several example
Parallelization Strategies for Large Particle Simulations in Astrophysics
NASA Astrophysics Data System (ADS)
Pattabiraman, Bharath
The modeling of collisional N-body stellar systems is a topic of great current interest in several branches of astrophysics and cosmology. These systems are dominated by the physics of relaxation, the collective effect of many weak, random gravitational encounters between stars. They connect directly to our understanding of star clusters, and to the formation of exotic objects such as X-ray binaries, pulsars, and massive black holes. As a prototypical multi-physics, multi-scale problem, the numerical simulation of such systems is computationally intensive, and can only be achieved through high-performance computing. The goal of this thesis is to present parallelization and optimization strategies that can be used to develop efficient computational tools for simulating collisional N-body systems. This leads to major advances: 1) From an astrophysics perspective, these tools enable the study of new physical regimes out of reach by previous simulations. They also lead to much more complete parameter space exploration, allowing direct comparison of numerical results to observational data. 2) On the high-performance computing front, efficient parallelization of a multi-component application requires the meticulous redesign of the various components, as well as innovative parallelization techniques. Many of the challenges faced in this process lie at the very heart of high-performance computing research, including achieving optimal load balancing, maximizing utilization of computational resources, and making effective use of different parallel platforms. For modeling collisional N-body systems, a Monte Carlo approach provides ideal balance between speed and accuracy, as opposed to the more accurate but less scalable direct N-body method. We describe the development of a new version of the Cluster Monte Carlo (CMC) code capable of simulating systems with a realistic number of stars, while accounting for all important physical processes. 
This efficient and scalable parallel version of CMC runs on both GPUs and distributed-memory architectures. We introduce various parallelization and optimization strategies that include the use of best-suited data structures, adaptive data partitioning schemes, parallel random number generation, parallel I/O, and optimized parallel algorithms, resulting in a very desirable scalability of the run-time with the processor number.
On-Time Diagnosis of Discrete Event Systems Aditya Mahajan and Demosthenis Teneketzis
Mahajan, Aditya
is stopped when no fault had occurred, a false alarm penalty is incurred; on the other hand if a fault had a monitoring rule which minimizes the worst case cost along all traces of the language describing the discrete event system. An optimal diagnosis rule is determined using a dynamic programming algorithm. An ex
A language measure for performance evaluation of discrete-event supervisory control systems
Ray, Asok
A language measure for performance evaluation of discrete-event supervisory control systems Xi a signed real measure of sublanguages of a regular language based on the principles of automata theory of a regular language for quantitative evaluation of the controlled behavior of a deterministic finite
Safety Control of Discrete Event Systems Using Finite State Machines with Parameters
Lin, Feng
in a relatively short time period is that we adapted a simple model of finite state machines. Because of this, we Safety Control of Discrete Event Systems Using Finite State Machines with Parameters Yi-Liang Chen modeled as finite state machines have been well developed over the years in addressing various fundamental
Department of Electrical Engineering and Computer Science Discrete Event Systems Group
Tilbury, Dawn
Diagnostics in the Industrial World · The Three C's: Cost, Computation, and Customer Satisfaction - Downtime of Electrical Engineering and Computer Science 3 Discrete Event Systems Group Requirements for Industrial Event Systems Group Example 1: Heating, Ventilation, and Air Conditioning Systems Components hard
Survival Curve Estimation for Informatively Coarsened Discrete Event-Time Data
Scharfstein, Daniel
at random (CAR) described previously [24, 8]. There has been some work utilizing auxiliary data on additional information. Others [9] proposed testing CAR using auxiliary data, and Finkelstein, Goggins Survival Curve Estimation for Informatively Coarsened Discrete Event-Time Data Michelle Shardell1
The parallel subdomain-levelset deflation method in reservoir simulation
NASA Astrophysics Data System (ADS)
van der Linden, J. H.; Jönsthövel, T. B.; Lukyanov, A. A.; Vuik, C.
2016-01-01
Extreme and isolated eigenvalues are known to be harmful to the convergence of an iterative solver. These eigenvalues can be produced by strong heterogeneity in the underlying physics. We can improve the quality of the spectrum by 'deflating' the harmful eigenvalues. In this work, deflation is applied to linear systems in reservoir simulation. In particular, large, sudden differences in the permeability produce extreme eigenvalues. The number and magnitude of these eigenvalues are linked to the number and magnitude of the permeability jumps. Two deflation methods are discussed. Firstly, we state that harmonic Ritz eigenvector deflation, which computes the deflation vectors from the information produced by the linear solver, is unfeasible in modern reservoir simulation due to high costs and lack of parallelism. Secondly, we test a physics-based subdomain-levelset deflation algorithm that constructs the deflation vectors a priori. Numerical experiments show that both methods can improve the performance of the linear solver. We highlight the fact that subdomain-levelset deflation is particularly suitable for a parallel implementation. For cases with well-defined permeability jumps of a factor of 10^4 or higher, parallel physics-based deflation has potential in commercial applications. In particular, the good scalability of parallel subdomain-levelset deflation combined with the robust parallel preconditioner for the deflated system suggests the use of this method as an alternative for AMG.
Scalable parallel solution coupling for multiphysics reactor simulation
NASA Astrophysics Data System (ADS)
Tautges, Timothy J.; Caceres, Alvaro
2009-07-01
Reactor simulation depends on the coupled solution of various physics types, including neutronics, thermal/hydraulics, and structural mechanics. This paper describes the formulation and implementation of a parallel solution coupling capability being developed for reactor simulation. The coupling process consists of mesh and coupler initialization, point location, field interpolation, and field normalization. We report here our test of this capability on an example problem, namely, a reflector assembly from an advanced burner test reactor. Performance of this coupler in parallel is reasonable for the chosen problem size and range of processor counts. The runtime is dominated by startup costs, which amortize over the entire coupled simulation. Future efforts will include adding more sophisticated interpolation and normalization methods, to accommodate different numerical solvers used in various physics modules and to obtain better conservation properties for certain field types.
Reusable Component Model Development Approach for Parallel and Distributed Simulation
Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng
2014-01-01
Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have diverse interfaces, are tightly coupled, and are closely bound to their simulation platforms. As a result, they are difficult to reuse across different simulation platforms and applications. To address the problem, this paper first proposes a reusable component model framework. Based on this framework, our reusable model development approach is then elaborated, which contains two phases: (1) domain experts create simulation computational modules observing three principles to achieve their independence; (2) the model developer encapsulates these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that the model developed using our approach has good reusability and is easy to use in different simulation platforms and applications. PMID:24729751
Parallelization of Program to Optimize Simulated Trajectories (POST3D)
NASA Technical Reports Server (NTRS)
Hammond, Dana P.; Korte, John J. (Technical Monitor)
2001-01-01
This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) approach on a distributed memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.
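The idea of distributing gradient evaluations can be sketched in a few lines. The abstract does not say how POST3D forms its gradients, so this sketch assumes central finite differences, with one independent objective evaluation pair per design variable. The names `partial_derivative` and `parallel_gradient` are hypothetical, and a thread pool stands in for the SPMD processes of the original work (in CPython, threads illustrate the structure rather than deliver a real speed-up for CPU-bound objectives).

```python
from concurrent.futures import ThreadPoolExecutor


def partial_derivative(f, x, i, h=1e-6):
    """Central-difference estimate of df/dx_i at the point x (assumed scheme)."""
    x_plus = list(x)
    x_minus = list(x)
    x_plus[i] += h
    x_minus[i] -= h
    return (f(x_plus) - f(x_minus)) / (2.0 * h)


def parallel_gradient(f, x, workers=4, h=1e-6):
    """Evaluate every gradient component as an independent task.

    Each component needs only f and its own perturbed copies of x, so the
    tasks share no mutable state and can run on separate workers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: partial_derivative(f, x, i, h),
                             range(len(x))))


if __name__ == "__main__":
    objective = lambda v: v[0] ** 2 + 3.0 * v[1]
    print(parallel_gradient(objective, [1.0, 2.0]))
```

Because the components are independent, the work divides evenly whenever each objective evaluation costs about the same, which is the situation in which a gradient-dominated optimizer benefits most from this kind of decomposition.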
Xyce Parallel Electronic Simulator Users Guide Version 6.2.
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory
2014-09-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Fault Diagnosis of Continuous Systems Using Discrete-Event Methods
Daigle, Matthew
Fault Diagnosis of Continuous Systems Using Discrete-Event Methods Matthew Daigle, Xenofon.j.daigle,xenofon.koutsoukos,gautam.biswas@vanderbilt.edu Abstract-- Fault diagnosis is crucial for ensuring the safe operation of complex engineering systems fault isolation in systems with complex continuous dynamics. This paper presents a novel discrete-event
The Poisson Simulation Approach to Combined Simulation Leif Gustafsson
Aggregation, Modelling, Combined simulation, Continuous System Simulation, Discrete Event Simulation, Hybrid The Poisson Simulation Approach to Combined Simulation Leif Gustafsson Signals and Systems, Dept the foundations of combined Discrete Event Simulation (DES) and Continuous Systems Simulation (CSS) by extending
Numerical simulation of supersonic wake flow with parallel computers
Wong, C.C.; Soetrisno, M.
1995-07-01
Simulating a supersonic wake flow field behind a conical body is a computing intensive task. It requires a large number of computational cells to capture the dominant flow physics and a robust numerical algorithm to obtain a reliable solution. High performance parallel computers with unique distributed processing and data storage capability can provide this need. They have larger computational memory and faster computing time than conventional vector computers. We apply the PINCA Navier-Stokes code to simulate a wind-tunnel supersonic wake experiment on Intel Gamma, Intel Paragon, and IBM SP2 parallel computers. These simulations are performed to study the mean flow in the near wake region of a sharp, 7-degree half-angle, adiabatic cone at Mach number 4.3 and freestream Reynolds number of 40,600. Overall the numerical solutions capture the general features of the hypersonic laminar wake flow and compare favorably with the wind tunnel data. With a refined and clustering grid distribution in the recirculation zone, the calculated location of the rear stagnation point is consistent with the 2D axisymmetric and 3D experiments. In this study, we also demonstrate the importance of having a large local memory capacity within a computer node and the effective utilization of the number of computer nodes to achieve good parallel performance when simulating a complex, large-scale wake flow problem.
Casting pearls ballistically: Efficient massively parallel simulation of particle deposition
Lubachevsky, B.D.; Privman, V.; Roy, S.C.
1996-06-01
We simulate ballistic particle deposition wherein a large number of spherical particles are "cast" vertically over a planar horizontal surface. Upon first contact (with the surface or with a previously deposited particle) each particle stops. This model helps material scientists to study the adsorption and sediment formation. The model is sequential, with particles deposited one by one. We have found an equivalent formulation using a continuous time random process and we simulate the latter in parallel using a method similar to the one previously employed for simulating Ising spins. We augment the parallel algorithm for simulating Ising spins with several techniques aimed at the increase of efficiency of producing the particle configuration and statistics collection. Some of these techniques are similar to earlier ones. We implement the resulting algorithm on a 16K PE MasPar MP-1 and a 4K PE MasPar MP-2. The parallel code runs on MasPar computers nearly two orders of magnitude faster than an optimized sequential code runs on a fast workstation. 17 refs., 9 figs.
Modularized Parallel Neutron Instrument Simulation on the TeraGrid
Chen, Meili; Cobb, John W; Hagen, Mark E; Miller, Stephen D; Lynch, Vickie E
2007-01-01
In order to build a bridge between the TeraGrid (TG), a national scale cyberinfrastructure resource, and neutron science, the Neutron Science TeraGrid Gateway (NSTG) is focused on introducing productive HPC usage to the neutron science community, primarily the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL). Monte Carlo simulations are used as a powerful tool for instrument design and optimization at SNS. One of the successful efforts of a collaboration team composed of NSTG HPC experts and SNS instrument scientists is the development of a software facility named PSoNI, Parallelizing Simulations of Neutron Instruments. Parallelizing the traditional serial instrument simulation on TeraGrid resources, PSoNI quickly computes full instrument simulation at sufficient statistical levels in instrument design. Upon SNS successful commissioning, to the end of 2007, three out of five commissioned instruments in SNS target station will be available for initial users. Advanced instrument study, proposal feasibility evaluation, and experiment planning are on the immediate schedule of SNS, which pose further requirements such as flexibility and high runtime efficiency on fast instrument simulation. PSoNI has been redesigned to meet the new challenges and a preliminary version is developed on TeraGrid. This paper explores the motivation and goals of the new design, and the improved software structure. Further, it describes the realized new features seen from MPI parallelized McStas running high resolution design simulations of the SEQUOIA and BSS instruments at SNS. A discussion regarding future work, which is targeted to do fast simulation for automated experiment adjustment and comparing models to data in analysis, is also presented.
Adaptive domain decomposition for Monte Carlo simulations on parallel processors
NASA Technical Reports Server (NTRS)
Wilmoth, Richard G.
1990-01-01
A method is described for performing direct simulation Monte Carlo (DSMC) calculations on parallel processors using adaptive domain decomposition to distribute the computational work load. The method has been implemented on a commercially available hypercube and benchmark results are presented which show the performance of the method relative to current supercomputers. The problems studied were simulations of equilibrium conditions in a closed, stationary box, a two-dimensional vortex flow, and the hypersonic, rarefied flow in a two-dimensional channel. For these problems, the parallel DSMC method ran 5 to 13 times faster than on a single processor of a Cray-2. The adaptive decomposition method worked well in uniformly distributing the computational work over an arbitrary number of processors and reduced the average computational time by over a factor of two in certain cases.
Adaptive domain decomposition for Monte Carlo simulations on parallel processors
NASA Technical Reports Server (NTRS)
Wilmoth, Richard G.
1991-01-01
A method is described for performing direct simulation Monte Carlo (DSMC) calculations on parallel processors using adaptive domain decomposition to distribute the computational work load. The method has been implemented on a commercially available hypercube and benchmark results are presented which show the performance of the method relative to current supercomputers. The problems studied were simulations of equilibrium conditions in a closed, stationary box, a two-dimensional vortex flow, and the hypersonic, rarefied flow in a two-dimensional channel. For these problems, the parallel DSMC method ran 5 to 13 times faster than on a single processor of a Cray-2. The adaptive decomposition method worked well in uniformly distributing the computational work over an arbitrary number of processors and reduced the average computational time by over a factor of two in certain cases.
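The load-balancing idea behind adaptive domain decomposition can be illustrated with a small sketch. This is not the method of the report: it assumes a one-dimensional row of cells, takes the particle count per cell as the work estimate, and cuts the row into contiguous subdomains whose counts are as even as a greedy prefix-sum split allows. The function names `balanced_partition` and `subdomain_loads` are hypothetical.

```python
def balanced_partition(cell_counts, num_procs):
    """Split cells into num_procs contiguous ranges of roughly equal work.

    cell_counts[i] is the number of simulated particles in cell i.
    Returns a list of half-open (start, end) cell index ranges.
    """
    total = sum(cell_counts)
    n = len(cell_counts)
    bounds = [0]
    running = 0
    cell = 0
    for p in range(1, num_procs):
        # The p-th cut goes where the cumulative work reaches p/num_procs.
        target = total * p / num_procs
        while cell < n and running + cell_counts[cell] <= target:
            running += cell_counts[cell]
            cell += 1
        bounds.append(cell)
    bounds.append(n)
    return [(bounds[i], bounds[i + 1]) for i in range(num_procs)]


def subdomain_loads(cell_counts, ranges):
    """Total particle count assigned to each subdomain."""
    return [sum(cell_counts[a:b]) for a, b in ranges]


if __name__ == "__main__":
    counts = [8, 1, 1, 1, 1, 1, 1, 2]  # dense region on the left
    ranges = balanced_partition(counts, 2)
    print(ranges, subdomain_loads(counts, ranges))
```

Calling `balanced_partition` again after the particles have moved gives the "adaptive" behaviour in this simplified setting: the cuts follow the particles, so a dense region is given fewer cells per processor.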
Parallel algorithms for simulating continuous time Markov chains
NASA Technical Reports Server (NTRS)
Nicol, David M.; Heidelberger, Philip
1992-01-01
We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
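For readers unfamiliar with uniformization, a minimal serial sketch follows. It is not one of the five methods compared in the paper. It assumes a finite chain given by a generator matrix `Q` (rows sum to zero), and the function name `uniformized_path` is hypothetical. Candidate event times come from a single Poisson process whose rate is the largest exit rate, and each candidate either moves the chain or is a self-loop "pseudo-event".

```python
import random


def uniformized_path(Q, state, t_end, rng):
    """Simulate a CTMC with generator matrix Q on [0, t_end] by uniformization.

    Returns a list of (time, state) pairs, one per real state change,
    starting with (0.0, initial state).
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # uniformization rate
    path = [(0.0, state)]
    if lam <= 0.0:
        return path  # every state is absorbing
    t = 0.0
    while True:
        # Candidate event times do not depend on the current state.
        t += rng.expovariate(lam)
        if t > t_end:
            return path
        # Jump with P[i][j] = Q[i][j]/lam (j != i), P[i][i] = 1 + Q[i][i]/lam.
        u = rng.random()
        acc = 0.0
        nxt = state
        for j in range(n):
            acc += Q[state][j] / lam + (1.0 if j == state else 0.0)
            if u < acc:
                nxt = j
                break
        if nxt != state:  # otherwise a pseudo-event: the state is unchanged
            state = nxt
            path.append((t, state))


if __name__ == "__main__":
    Q = [[-1.0, 1.0], [2.0, -2.0]]
    for time, s in uniformized_path(Q, 0, 5.0, random.Random(1)):
        print(round(time, 3), s)
```

The property that matters for synchronization is visible in the loop: the candidate times are drawn without looking at the state, so in a parallel setting they could be generated ahead of the state updates.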
K-NN algorithm in Parallel VLSI Simulation School of Computer Science
Tropper, Carl
level circuit simulation. A fundamental problem posed by a parallel environment is the decision of whether it is best to simulate a particular circuit sequentially or on a parallel platform. Furthermore, in the event that a circuit should be simulated on a parallel platform, it is necessary to decide how many
At the Biological Modeling and Simulation Frontier
2009-01-01
modeling and simulation (M&S) of biological systems. Wesystem, observed from particular, The Modeling and Simulationmodeling and simulation: integrating discrete event and continuous complex dynamic systems.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
NASA Astrophysics Data System (ADS)
Rostrup, Scott; De Sterck, Hans
2010-12-01
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. 
Program summary
Program title: SWsolver
Catalogue identifier: AEGY_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GPL v3
No. of lines in distributed program, including test data, etc.: 59 168
No. of bytes in distributed program, including test data, etc.: 453 409
Distribution format: tar.gz
Programming language: C, CUDA
Computer: Parallel Computing Clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator.
Operating system: Linux
Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell Processors, and 1-32 NVIDIA GPUs.
RAM: Tested on Problems requiring up to 4 GB per compute node.
Classification: 12
External routines: MPI, CUDA, IBM Cell SDK
Nature of problem: MPI-parallel simulation of Shallow Water equations using high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell Processor, and NVIDIA GPU using CUDA.
Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell Processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster.
Additional comments: Sub-program numdiff is used for the test run.
Xyce Parallel Electronic Simulator - Users' Guide Version 2.1.
Hutchinson, Scott A; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric; Pawlowski, Roger P.; Fixel, Deborah A; Schiek, Richard; Bogdan, Carolyn W.; Shirley, David N.; Campbell, Phillip M.; Keiter, Eric R.
2005-06-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an "in-house" capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory. Acknowledgements: The authors would like to acknowledge the entire Sandia National Laboratories HPEMS (High Performance Electrical Modeling and Simulation) team, including Steve Wix, Carolyn Bogdan, Regina Schells, Ken Marx, Steve Brandon and Bill Ballard, for their support on this project. We also appreciate very much the work of Jim Emery, Becky Arnold and Mike Williamson for the help in reviewing this document. Lastly, a very special thanks to Hue Lai for typesetting this document with LaTeX.
Parallel conjugate gradient algorithms for manipulator dynamic simulation
NASA Technical Reports Server (NTRS)
Fijany, Amir; Scheld, Robert E.
1989-01-01
Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithm is guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n^2) on a serial processor. Conjugate gradient algorithms are presented that provide greater efficiency by using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log_2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves a computational time of O(log_2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
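The diagonal-preconditioner variant described above can be illustrated with a minimal serial sketch. It is not the authors' parallel algorithm: the function name `jacobi_pcg`, the dense list-of-lists matrix and the tolerance are assumptions made for illustration, and the dense matrix-vector product here costs O(n^2) per iteration rather than the O(n) quoted for the manipulator case.

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A using conjugate
    gradients with a diagonal (Jacobi) preconditioner M = diag(A)."""
    n = len(b)
    max_iter = max_iter or 10 * n
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # M^{-1}, trivially invertible
    x = [0.0] * n
    r = list(b)                                    # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]     # preconditioned residual
    p = list(z)                                    # first search direction
    rz = sum(r[i] * z[i] for i in range(n))
    for iteration in range(max_iter):
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            return x, iteration
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter
```

The inner products and vector updates in each iteration are the operations that a parallel implementation would distribute across processors.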
A network of discrete events for the representation and analysis of diffusion dynamics.
Pintus, Alberto M; Pazzona, Federico G; Demontis, Pierfranco; Suffritti, Giuseppe B
2015-11-14
We developed a coarse-grained description of the phenomenology of diffusive processes, in terms of a space of discrete events and its representation as a network. Once a proper classification of the discrete events underlying the diffusive process is carried out, their transition matrix is calculated on the basis of molecular dynamics data. This matrix can be represented as a directed, weighted network where nodes represent discrete events, and the weight of edges is given by the probability that one follows the other. The structure of this network reflects dynamical properties of the process of interest in such features as its modularity and the entropy rate of nodes. As an example of the applicability of this conceptual framework, we discuss here the physics of diffusion of small non-polar molecules in a microporous material, in terms of the structure of the corresponding network of events, and explain on this basis the diffusivity trends observed. A quantitative account of these trends is obtained by considering the contribution of the various events to the displacement autocorrelation function. PMID:26567654
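As a rough illustration of the construction described above, the following sketch estimates a transition matrix from a classified event sequence and computes a per-node entropy. The event labels, the function names and the use of the Shannon entropy of each node's outgoing probabilities are assumptions for illustration; the abstract does not give the authors' exact definitions.

```python
from collections import Counter, defaultdict
from math import log

def transition_matrix(events):
    """Estimate P[a][b], the probability that event b directly follows event a."""
    counts = defaultdict(Counter)
    for current, following in zip(events, events[1:]):
        counts[current][following] += 1
    return {
        src: {dst: c / sum(row.values()) for dst, c in row.items()}
        for src, row in counts.items()
    }

def node_entropy(row):
    """Shannon entropy (in nats) of the outgoing edge weights of one node."""
    return -sum(p * log(p) for p in row.values() if p > 0)

# Hypothetical event sequence; in the paper the events would be
# classified from molecular dynamics trajectories.
events = ["hop", "rattle", "hop", "hop", "rattle", "rattle", "hop"]
P = transition_matrix(events)
```

Each row of `P` is the set of weighted outgoing edges of one node in the directed network.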
A network of discrete events for the representation and analysis of diffusion dynamics
NASA Astrophysics Data System (ADS)
Pintus, Alberto M.; Pazzona, Federico G.; Demontis, Pierfranco; Suffritti, Giuseppe B.
2015-11-01
We developed a coarse-grained description of the phenomenology of diffusive processes, in terms of a space of discrete events and its representation as a network. Once a proper classification of the discrete events underlying the diffusive process is carried out, their transition matrix is calculated on the basis of molecular dynamics data. This matrix can be represented as a directed, weighted network where nodes represent discrete events, and the weight of edges is given by the probability that one follows the other. The structure of this network reflects dynamical properties of the process of interest in such features as its modularity and the entropy rate of nodes. As an example of the applicability of this conceptual framework, we discuss here the physics of diffusion of small non-polar molecules in a microporous material, in terms of the structure of the corresponding network of events, and explain on this basis the diffusivity trends observed. A quantitative account of these trends is obtained by considering the contribution of the various events to the displacement autocorrelation function.
MapReduce Parallel Cuckoo Hashing and Oblivious RAM Simulations
Goodrich, Michael T
2010-01-01
We present an efficient algorithm for performing cuckoo hashing in the MapReduce parallel model of computation and we show how this result in turn leads to improved methods for performing data-oblivious RAM simulations. Our contributions involve a number of seemingly unrelated new results, including: (i) a parallel MapReduce cuckoo hashing algorithm that runs in O(log n) time and uses O(n) total work, with very high probability; (ii) a reduction of data-oblivious simulation of sparse-streaming MapReduce algorithms to oblivious sorting; (iii) an external-memory data-oblivious sorting algorithm using O((N/B) log^2_(M/B) (N/B)) I/Os; (iv) constant-memory data-oblivious RAM simulation with O(log^2 n) amortized time overhead, with very high probability, or with expected O(log^2 n) amortized time overhead and better constant factors; (v) sublinear-memory data-oblivious RAM simulation with O(n^nu) private memory and O(log n) amortized time overhead, with very high probability, for constant nu > 0. This last result is, in fact, the main result o...
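For readers unfamiliar with the underlying data structure, here is a minimal sequential sketch of two-table cuckoo hashing. It is not the MapReduce algorithm of the paper and is not data-oblivious; the class name, the seeds, the kick limit and the rehash policy are all assumptions chosen for illustration.

```python
class CuckooHashTable:
    """Two-table cuckoo hash set: every key lives in one of two candidate slots."""

    def __init__(self, capacity=11, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.seeds = (0x9E3779B1, 0x85EBCA77)   # arbitrary illustrative seeds
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, which, key):
        return hash((self.seeds[which], key)) % self.capacity

    def lookup(self, key):
        # A lookup probes at most two slots.
        return any(self.tables[w][self._slot(w, key)] == key for w in (0, 1))

    def insert(self, key):
        if self.lookup(key):
            return
        which = 0
        for _ in range(self.max_kicks):
            slot = self._slot(which, key)
            # Place the key and pick up whatever was stored there before.
            key, self.tables[which][slot] = self.tables[which][slot], key
            if key is None:
                return
            which = 1 - which                    # evicted key tries its other table
        self._rehash(key)                        # too many evictions: rebuild

    def _rehash(self, pending):
        keys = [k for t in self.tables for k in t if k is not None] + [pending]
        self.capacity = 2 * self.capacity + 1
        self.seeds = (self.seeds[0] + 1, self.seeds[1] + 1)
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k in keys:
            self.insert(k)
```

The eviction loop is the sequential bottleneck that a parallel formulation has to restructure.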
A massively parallel cellular automaton for the simulation of recrystallization
NASA Astrophysics Data System (ADS)
Kühbach, M.; Barrales-Mora, L. A.; Gottstein, G.
2014-10-01
A new implementation of a cellular automaton for the simulation of primary recrystallization in 3D space is presented. In this new approach, a parallel computer architecture is utilized to partition the simulation domain into multiple computational subdomains that can be treated as coupled, gradually coupled or decoupled entities. This enabled us to identify the characteristic growth length associated with the space repartitioning during nucleus growth. In doing so, several communication strategies between the simulation domains were implemented and tested for accuracy and parallel performance. Specifically, the model was applied to investigate the effect of a gradual spatial decoupling on microstructure evolution during oriented growth of random texture components into a deformed Al single crystal. For a domain discretized into one billion cells, it was found that a particular decoupling strategy resulted in executions about two orders of magnitude faster while retaining highly accurate simulations. Further partitioning of the domain into isolated entities systematically and negatively impacts microstructure evolution. We investigated this effect quantitatively by geometrical considerations.
Long-range interactions & parallel scalability in molecular simulations
Michael Patra; Marja T. Hyvonen; Emma Falck; Mohsen Sabouri-Ghomi; Ilpo Vattulainen; Mikko Karttunen
2006-06-21
Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modelling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single processor and parallel performance up to 8 nodes - we have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e., communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.
Long-range interactions and parallel scalability in molecular simulations
NASA Astrophysics Data System (ADS)
Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko
2007-01-01
Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single processor and parallel performance up to 8 nodes—we have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e. communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.
Plimpton, Steve; Thompson, Aidan; Crozier, Paul
LAMMPS (http://lammps.sandia.gov/index.html) stands for Large-scale Atomic/Molecular Massively Parallel Simulator and is a code that can be used to model atoms or, as the LAMMPS website says, as a parallel particle simulator at the atomic, meso, or continuum scale. This Sandia-based website provides a long list of animations from large simulations. These were created using different visualization packages to read LAMMPS output, and each one provides the name of the PI and a brief description of the work done or visualization package used. See also the static images produced from simulations at http://lammps.sandia.gov/pictures.html The foundation paper for LAMMPS is: S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995), but the website also lists other papers describing contributions to LAMMPS over the years.
Mapping a battlefield simulation onto message-passing parallel architectures
NASA Technical Reports Server (NTRS)
Nicol, David M.
1987-01-01
Perhaps the most critical problem in distributed simulation is that of mapping: without an effective mapping of workload to processors the speedup potential of parallel processing cannot be realized. Mapping a simulation onto a message-passing architecture is especially difficult when the computational workload dynamically changes as a function of time and space; this is exactly the situation faced by battlefield simulations. This paper studies an approach where the simulated battlefield domain is first partitioned into many regions of equal size; typically there are more regions than processors. The regions are then assigned to processors; a processor is responsible for performing all simulation activity associated with the regions. The assignment algorithm is quite simple and attempts to balance load by exploiting locality of workload intensity. The performance of this technique is studied on a simple battlefield simulation implemented on the Flex/32 multiprocessor. Measurements show that the proposed method achieves reasonable processor efficiencies. Furthermore, the method shows promise for use in dynamic remapping of the simulation.
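The effect of the region-to-processor assignment on load balance can be illustrated with a small sketch. The abstract does not specify the assignment algorithm, so the two mappings below (contiguous blocks versus a cyclic "scatter" that places neighbouring regions on different processors) and the hot-spot workload are assumptions chosen only to show why exploiting locality of workload intensity helps.

```python
def block_assignment(rows, cols, procs):
    """Contiguous strips of regions per processor, in row-major order."""
    total = rows * cols
    return {(r, c): (r * cols + c) * procs // total
            for r in range(rows) for c in range(cols)}

def scatter_assignment(rows, cols, procs):
    """Cyclic assignment: adjacent regions land on different processors."""
    return {(r, c): (r + c) % procs
            for r in range(rows) for c in range(cols)}

def processor_loads(assignment, workload, procs):
    loads = [0.0] * procs
    for region, proc in assignment.items():
        loads[proc] += workload[region]
    return loads

def efficiency(loads):
    """Average load divided by the load of the busiest processor."""
    return sum(loads) / (len(loads) * max(loads))

# Hypothetical 8x8 battlefield with intense activity in one 4x4 corner.
workload = {(r, c): 10.0 if r < 4 and c < 4 else 1.0
            for r in range(8) for c in range(8)}
block_loads = processor_loads(block_assignment(8, 8, 4), workload, 4)
scatter_loads = processor_loads(scatter_assignment(8, 8, 4), workload, 4)
```

Because nearby regions tend to have similar workload, spreading neighbours over different processors shares a hot spot among all of them, whereas the block mapping concentrates it on a few.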
Development of magnetron sputtering simulator with GPU parallel computing
NASA Astrophysics Data System (ADS)
Sohn, Ilyoup; Kim, Jihun; Bae, Junkyeong; Lee, Jinpil
2014-12-01
Sputtering devices are widely used in the semiconductor and display panel manufacturing process. Currently, a number of surface treatment applications using magnetron sputtering techniques are being used to improve the efficiency of the sputtering process, through the installation of magnets outside the vacuum chamber. Within the internal space of the low pressure chamber, plasma generated from the combination of a rarefied gas and an electric field is influenced interactively. Since the quality of the sputtering and deposition rate on the substrate is strongly dependent on the multi-physical phenomena of the plasma regime, numerical simulations using PIC-MCC (Particle In Cell, Monte Carlo Collision) should be employed to develop an efficient sputtering device. In this paper, the development of a magnetron sputtering simulator based on the PIC-MCC method and the associated numerical techniques are discussed. To solve the electric field equations in the 2-D Cartesian domain, a Poisson equation solver based on the FDM (Finite Differencing Method) is developed and coupled with the Monte Carlo Collision method to simulate the motion of gas particles influenced by an electric field. The magnetic field created from the permanent magnet installed outside the vacuum chamber is also numerically calculated using Biot-Savart's Law. All numerical methods employed in the present PIC code are validated by comparison with analytical and well-known commercial engineering software results, with all of the results showing good agreement. Finally, the developed PIC-MCC code is parallelized to be suitable for general purpose computing on graphics processing unit (GPGPU) acceleration, so as to reduce the large computation time which is generally required for particle simulations. The efficiency and accuracy of the GPGPU parallelized magnetron sputtering simulator are examined by comparison with the calculated results and computation times from the original serial code. 
It is found that initially both simulations are in good agreement; however, differences develop over time due to statistical noise in the PIC-MCC GPGPU model.
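The field-solve step of such a PIC code can be sketched as a finite-difference Poisson solver on a 2-D Cartesian grid. This is a minimal illustration, not the simulator's solver: the Gauss-Seidel iteration, the zero-potential (Dirichlet) boundary, the function name and the unit permittivity default are all assumptions.

```python
def solve_poisson_2d(rho, h, eps0=1.0, sweeps=2000, tol=1e-8):
    """Solve laplacian(phi) = -rho / eps0 on a uniform grid of spacing h
    with phi = 0 on the boundary, using Gauss-Seidel sweeps of the
    five-point finite-difference stencil."""
    ny, nx = len(rho), len(rho[0])
    phi = [[0.0] * nx for _ in range(ny)]
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                new = 0.25 * (phi[j][i - 1] + phi[j][i + 1]
                              + phi[j - 1][i] + phi[j + 1][i]
                              + h * h * rho[j][i] / eps0)
                max_change = max(max_change, abs(new - phi[j][i]))
                phi[j][i] = new
        if max_change < tol:       # stop once the potential has settled
            break
    return phi

# Hypothetical test charge in the middle of a 9x9 grid.
rho = [[0.0] * 9 for _ in range(9)]
rho[4][4] = 1.0
phi = solve_poisson_2d(rho, h=1.0)
```

In a PIC-MCC cycle the charge density `rho` would be deposited from the particles each step, and the electric field obtained by differencing `phi`.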
Xyce Parallel Electronic Simulator Users Guide Version 6.4
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory
2015-12-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (i) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (ii) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (iii) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (iv) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Trademarks: The information herein is subject to change without notice. Copyright © 2002-2015 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59. All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts: Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla; World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only); Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)
CHOLLA: A New Massively Parallel Hydrodynamics Code for Astrophysical Simulation
NASA Astrophysics Data System (ADS)
Schneider, Evan E.; Robertson, Brant E.
2015-04-01
We present Computational Hydrodynamics On ParaLLel Architectures (Cholla), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳256^3) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.
Numerical Simulation of Flow Field Within Parallel Plate Plastometer
NASA Technical Reports Server (NTRS)
Antar, Basil N.
2002-01-01
The Parallel Plate Plastometer (PPP) is a device commonly used for measuring the viscosity of high polymers at low rates of shear in the range 10^4 to 10^9 poise. This device is being validated for use in measuring the viscosity of liquid glasses at high temperatures having similar ranges for the viscosity values. The PPP instrument consists of two similar parallel plates, both in the range of 1 inch in diameter, with the upper plate being movable while the lower one is kept stationary. Load is applied to the upper plate by means of a beam connected to a shaft attached to the upper plate. The viscosity of the fluid is deduced from measuring the variation of the plate separation, h, as a function of time when a specified fixed load is applied on the beam. Operating plate speeds measured with the PPP are usually in the range of 10.3 cm/s or lower. The flow field within the PPP can be simulated using the equations of motion of fluid flow for this configuration. With flow speeds in the range quoted above the flow field between the two plates is certainly incompressible and laminar. Such flows can be easily simulated using numerical modeling with computational fluid dynamics (CFD) codes. We present below the mathematical model used to simulate this flow field and also the solutions obtained for the flow using a commercially available finite element CFD code.
High Performance Parallel Methods for Space Weather Simulations
NASA Technical Reports Server (NTRS)
Hunter, Paul (Technical Monitor); Gombosi, Tamas I.
2003-01-01
This is the final report of our NASA AISRP grant entitled 'High Performance Parallel Methods for Space Weather Simulations'. The main thrust of the proposal was to achieve significant progress towards new high-performance methods which would greatly accelerate global MHD simulations and eventually make it possible to develop first-principles based space weather simulations which run much faster than real time. We are pleased to report that with the help of this award we made major progress in this direction and developed the first parallel implicit global MHD code with adaptive mesh refinement. The main limitation of all earlier global space physics MHD codes was the explicit time stepping algorithm. Explicit time steps are limited by the Courant-Friedrichs-Lewy (CFL) condition, which essentially ensures that no information travels more than a cell size during a time step. This condition represents a non-linear penalty for highly resolved calculations, since finer grid resolution (and consequently smaller computational cells) not only results in more computational cells, but also in smaller time steps.
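The non-linear penalty of the CFL condition can be made concrete with a short worked example. The function names and the Courant number of 0.8 are assumptions for illustration; the only facts used are that the explicit time step scales with the cell size and that the cell count scales with the inverse cell size raised to the number of dimensions.

```python
def cfl_time_step(dx, max_signal_speed, courant=0.8):
    """Largest explicit time step for which information travels less than
    a fraction `courant` of one cell of width dx."""
    return courant * dx / max_signal_speed

def relative_cost(refinement, dims=3):
    """Cost multiplier of an explicit run when every cell edge is divided
    by `refinement`: refinement**dims more cells, times `refinement`
    more time steps to cover the same simulated interval."""
    return refinement ** dims * refinement

# Halving the cell size in 3-D: 8x the cells and 2x the steps, 16x the work.
cost_factor = relative_cost(2)
```

An implicit scheme removes the stability limit on the step size, which is why it pays off most on highly refined grids.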
Simulation of hypervelocity impact on massively parallel supercomputer
Fang, H.E.
1994-12-31
Hypervelocity impact studies are important for debris shield and armor/anti-armor research and development. Numerical simulations are frequently performed to complement experimental studies, and to evaluate code accuracy. Parametric computational studies involving material properties, geometry and impact velocity can be used to understand hypervelocity impact processes. These impact simulations normally need to address shock wave physics phenomena, material deformation and failure, and motion of debris particles. Detailed, three-dimensional calculations of such events have large memory and processing time requirements. At Sandia National Laboratories, many impact problems of interest require tens of millions of computational cells. Furthermore, even the inadequately resolved problems often require tens or hundreds of Cray CPU hours to complete. Recent numerical studies done by Grady and Kipp at Sandia using the Eulerian shock wave physics code CTH demonstrated very good agreement with many features of a copper sphere-on-steel plate oblique impact experiment, fully utilizing the compute power and memory of Sandia's Cray supercomputer. To satisfy requirements for more finely resolved simulations in order to obtain a better understanding of the crater formation process and impact ejecta motion, the numerical work has been moved from the shared-memory Cray to a large, distributed-memory, massively parallel supercomputing system using PCTH, a parallel version of CTH. The current work is a continuation of the studies, but done on Sandia's Intel 1840-processor Paragon X/PS parallel computer. With the great compute power and large memory provided by the Paragon, a highly detailed PCTH calculation has been completed for the copper sphere impacting steel plate experiment. Although the PCTH calculation used a mesh which is 4.5 times bigger than the original Cray setup, it finished in much less CPU time.
Massively parallel algorithms for trace-driven cache simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.
1991-01-01
Trace-driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t-th instant, reference x_t is hashed into a set of cache locations, the contents of which are then compared with x_t. If at the t-th instant x_t is not present in the cache, then it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x_t present for the (t+1)-st instant. The problem of parallel simulation of a subtrace of N references directed to a C-line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which regardless of the set size C runs in time O(log N) using N processors on the exclusive read, exclusive write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference-based line replacement policies is considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that on any trace of length N directed to a C-line set runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
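The problem statement above can be made concrete with a sequential baseline that marks which references of a trace miss in a single C-line LRU set. This is only the straightforward serial simulation, not the parallel algorithms of the paper; the function name and the sample trace are assumptions for illustration.

```python
from collections import OrderedDict

def lru_misses(trace, set_size):
    """Return a list of booleans, True where the reference misses in a
    cache set of `set_size` lines managed with Least Recently Used."""
    cache = OrderedDict()          # keys ordered from least to most recently used
    misses = []
    for ref in trace:
        if ref in cache:
            cache.move_to_end(ref)             # hit: becomes most recently used
            misses.append(False)
        else:
            if len(cache) == set_size:
                cache.popitem(last=False)      # evict the least recently used line
            cache[ref] = True
            misses.append(True)
    return misses

# Hypothetical trace of memory lines directed to one 3-line set.
result = lru_misses(list("abcabdab"), set_size=3)
```

This loop takes time proportional to N because each step depends on the cache state left by the previous one, which is the dependency the parallel methods are designed to break.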
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor
NASA Technical Reports Server (NTRS)
Rao, Hariprasad Nannapaneni
1989-01-01
The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
Parallel Unsteady Turbopump Simulations for Liquid Rocket Engines
NASA Technical Reports Server (NTRS)
Kiris, Cetin C.; Kwak, Dochan; Chan, William
2000-01-01
This paper reports the progress being made towards complete turbo-pump simulation capability for liquid rocket engines. The Space Shuttle Main Engine (SSME) turbo-pump impeller is used as a test case for the performance evaluation of the MPI and hybrid MPI/OpenMP versions of the INS3D code. A computational model of a turbo-pump has then been developed for the shuttle upgrade program. Relative motion of the grid system for rotor-stator interaction was obtained by employing overset grid techniques. Time-accuracy of the scheme has been evaluated by using simple test cases. Unsteady computations for the SSME turbo-pump, which contains 136 zones with 35 million grid points, are currently underway on Origin 2000 systems at NASA Ames Research Center. Results from time-accurate simulations with moving boundary capability, and the performance of the parallel versions of the code will be presented in the final paper.
Niehof, Jonathan T.; Morley, Steven K.
2012-01-01
We review and develop techniques to determine associations between series of discrete events. The bootstrap, a nonparametric statistical method, allows the determination of the significance of associations with minimal assumptions about the underlying processes. We find the key requirement for this method: one of the series must be widely spaced in time to guarantee the theoretical applicability of the bootstrap. If this condition is met, the calculated significance passes a reasonableness test. We conclude with some potential future extensions and caveats on the applicability of these methods. The techniques presented have been implemented in a Python-based software toolkit.
Supervisor Localization: A Top-Down Approach to Distributed Control of Discrete-Event Systems
NASA Astrophysics Data System (ADS)
Cai, K.; Wonham, W. M.
2009-03-01
A purely distributed control paradigm is proposed for discrete-event systems (DES). In contrast to control by one or more external supervisors, distributed control aims to design built-in strategies for individual agents. First a distributed optimal nonblocking control problem is formulated. To solve it, a top-down localization procedure is developed which systematically decomposes an external supervisor into local controllers while preserving optimality and nonblockingness. An efficient localization algorithm is provided to carry out the computation, and an automated guided vehicles (AGV) example presented for illustration. Finally, the 'easiest' and 'hardest' boundary cases of localization are discussed.
Supervisor Localization: A Top-Down Approach to Distributed Control of Discrete-Event Systems
Cai, K.; Wonham, W. M.
2009-03-05
A purely distributed control paradigm is proposed for discrete-event systems (DES). In contrast to control by one or more external supervisors, distributed control aims to design built-in strategies for individual agents. First a distributed optimal nonblocking control problem is formulated. To solve it, a top-down localization procedure is developed which systematically decomposes an external supervisor into local controllers while preserving optimality and nonblockingness. An efficient localization algorithm is provided to carry out the computation, and an automated guided vehicles (AGV) example presented for illustration. Finally, the 'easiest' and 'hardest' boundary cases of localization are discussed.
NASA Technical Reports Server (NTRS)
Mizell, Carolyn Barrett; Malone, Linda
2007-01-01
The development process for a large software development project is very complex and dependent on many variables that are dynamic and interrelated. Factors such as size, productivity and defect injection rates will have substantial impact on the project in terms of cost and schedule. These factors can be affected by the intricacies of the process itself as well as human behavior because the process is very labor intensive. The complex nature of the development process can be investigated with software development process models that utilize discrete event simulation to analyze the effects of process changes. The organizational environment and its effects on the workforce can be analyzed with system dynamics that utilizes continuous simulation. Each has unique strengths and the benefits of both types can be exploited by combining a system dynamics model and a discrete event process model. This paper will demonstrate how the two types of models can be combined to investigate the impacts of human resource interactions on productivity and ultimately on cost and schedule.
Parallel continuous simulated tempering and its applications in large-scale molecular simulations
Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng
2014-01-01
In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested in a two-dimensional Ising model, a Lennard-Jones liquid and an all-atom folding simulation of a small globular protein, trp-cage, in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transition and dynamics of macromolecules in explicit solvent. PMID:25084887
Parallel continuous simulated tempering and its applications in large-scale molecular simulations
NASA Astrophysics Data System (ADS)
Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng
2014-07-01
In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2-3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested in a two-dimensional Ising model, a Lennard-Jones liquid and an all-atom folding simulation of a small globular protein, trp-cage, in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transition and dynamics of macromolecules in explicit solvent.
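For context on the contrast drawn in this abstract, the sketch below shows the textbook Metropolis acceptance rule for swapping two replicas in conventional parallel tempering. It is not the PCST exchange rule, which the abstract does not spell out, and the function name and argument names are assumptions for this example.

```python
import math


def exchange_probability(beta_i, beta_j, energy_i, energy_j):
    """Acceptance probability for swapping replicas i and j in conventional PT.

    beta_* are inverse temperatures and energy_* the current potential
    energies of the configurations held by each replica.
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # min(1, exp(delta)), written so that exp never overflows.
    return 1.0 if delta >= 0 else math.exp(delta)
```

Because the energy difference between neighboring replicas grows with system size, this probability shrinks for large systems unless the temperatures are placed closer together, which is the dependence on total potential energy that the abstract says PCST avoids.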
Parallel grid library for rapid and flexible simulation development
NASA Astrophysics Data System (ADS)
Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.
2013-04-01
We present an easy to use and flexible grid library for developing highly scalable parallel simulations. The distributed cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously allowing one to overlap computation and communication. This enables excellent scalability at least up to 32 k cores in magnetohydrodynamic tests depending on the problem and hardware. In the version of dccrg presented here part of the mesh metadata is replicated between MPI processes reducing the scalability of adaptive mesh refinement (AMR) to between 200 and 600 processes. Dccrg is free software that anyone can use, study and modify and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg. Catalogue identifier: AEOM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: GNU Lesser General Public License version 3 No. of lines in distributed program, including test data, etc.: 54975 No. of bytes in distributed program, including test data, etc.: 974015 Distribution format: tar.gz Programming language: C++. Computer: PC, cluster, supercomputer. Operating system: POSIX. The code has been parallelized using MPI and tested with 1-32768 processes RAM: 10 MB-10 GB per process Classification: 4.12, 4.14, 6.5, 19.3, 19.10, 20. 
External routines: MPI-2 [1], boost [2], Zoltan [3], sfc++ [4]. Nature of problem: Grid library supporting arbitrary data in grid cells, parallel adaptive mesh refinement, transparent remote neighbor data updates and load balancing. Solution method: The simulation grid is represented by an adjacency list (graph) with vertices stored into a hash table and edges into contiguous arrays. Message Passing Interface standard is used for parallelization. Cell data is given as a template parameter when instantiating the grid. Restrictions: Logically cartesian grid. Running time: Running time depends on the hardware, problem and the solution method. Small problems can be solved in under a minute and very large problems can take weeks. The examples and tests provided with the package take less than about one minute using default options. In the version of dccrg presented here the speed of adaptive mesh refinement is at most of the order of 10^6 total created cells per second. References: [1] http://www.mpi-forum.org/. [2] http://www.boost.org/. [3] K. Devine, E. Boman, R. Heaphy, B. Hendrickson, C. Vaughan, Zoltan data management services for parallel dynamic applications, Comput. Sci. Eng. 4 (2002) 90-97. http://dx.doi.org/10.1109/5992.988653. [4] https://gitorious.org/sfc++.
Simulation of multidimensional gaseous detonations with a parallel adaptive method
NASA Astrophysics Data System (ADS)
Deiterding, Ralf
2008-11-01
A detonation wave is a self-sustained, violent form of shock-induced combustion that is characterized by a subtle energetic interplay between leading hydrodynamic shock wave and following chemical reaction. Multidimensional gaseous detonations never remain planar and instead exhibit transverse shocks that form triple points with transient Mach reflection patterns. Their accurate numerical simulation requires a very high resolution around shock and reaction zone. A parallel adaptive finite volume method for the chemically reactive Euler equations for mixtures of thermally perfect gases has been developed for this purpose. Its key components are a high-resolution shock-capturing scheme of Roe-type, block-structured Cartesian mesh adaptation, and operator splitting to handle stiff, detailed kinetics. Besides simple verification examples to quantify the savings in wall time from mesh adaptation and parallelization, large-scale computations of Chapman-Jouguet detonations in low-pressure hydrogen-oxygen-argon mixtures will be discussed. These computations allowed the detailed analysis of triple point structures under transient conditions and a comparison between two and three space dimensions.
Parallel algorithm for multiscale atomistic/continuum simulations using LAMMPS
NASA Astrophysics Data System (ADS)
Pavia, F.; Curtin, W. A.
2015-07-01
Deformation and fracture processes in engineering materials often require simultaneous descriptions over a range of length and time scales, with each scale using a different computational technique. Here we present a high-performance parallel 3D computing framework for executing large multiscale studies that couple an atomic domain, modeled using molecular dynamics, and a continuum domain, modeled using explicit finite elements. We use the robust Coupled Atomistic/Discrete-Dislocation (CADD) displacement-coupling method, but without the transfer of dislocations between atoms and continuum. The main purpose of the work is to provide a multiscale implementation within an existing large-scale parallel molecular dynamics code (LAMMPS) that enables use of all the tools associated with this popular open-source code, while extending CADD-type coupling to 3D. Validation of the implementation includes the demonstration of (i) stability in finite-temperature dynamics using Langevin dynamics, (ii) elimination of wave reflections due to large dynamic events occurring in the MD region and (iii) the absence of spurious forces acting on dislocations due to the MD/FE coupling, for dislocations further than 10 Å from the coupling boundary. A first non-trivial example application of dislocation glide and bowing around obstacles is shown, for dislocation lengths of ~50 nm using fewer than 1 000 000 atoms but reproducing results of extremely large atomistic simulations at much lower computational cost.
Parallel hp-Finite Element Simulations of 3D Resistivity Logging Instruments
Torres-Verdín, Carlos
Methods in Metallurgy AGH University of Science and Technology Abstract We simulate electromagnetic (EM-adaptivity, Finite Element Method, Parallel algorithms, Compu- tational electromagnetics Acknowledgment The second
NASA Astrophysics Data System (ADS)
Jabbarzadeh, A.; Atkinson, J. D.; Tanner, R. I.
1997-12-01
A parallel algorithm has been developed for the simulation of Couette shear flow between structured walls. The algorithm is designed to simulate the shear flow of atomic and molecular fluids. The parallel link-cells model is used for parallelization with some modifications for accommodating the nonperiodic boundaries in the wall direction. Some techniques are also introduced for handling the non-homogeneous nature of the flow in the proximity of the physical walls in order to achieve a balanced workload between processors. PVM (Parallel Virtual Machine) is employed as the message passing paradigm for communication between the processors. The algorithm has been tested for a number of benchmarks with different sizes for simulating the shear flow of n-hexadecane. The maximum number of processors used was 28 DEC Alpha 500/256 workstations which were connected by a 100 Mbits/s Ethernet. A maximum speedup of 11 was obtained with 28 processors. The efficiency ranged from 92% to 40% depending on the number of processors and the system size.
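The speedup and efficiency figures quoted in this abstract are related by the usual definition, efficiency = speedup / number of processors. The helper below is only a worked check of that arithmetic; its name is an assumption for this example.

```python
def parallel_efficiency(speedup, processors):
    """Parallel efficiency as the fraction of ideal linear speedup achieved."""
    return speedup / processors


# Reported case: a speedup of 11 on 28 processors.
efficiency = parallel_efficiency(11, 28)
```

For the reported case this gives roughly 0.39, in line with the lower end of the 92% to 40% range stated in the abstract.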
Parallelizing N-Body Simulations on a Heterogeneous Cluster
NASA Astrophysics Data System (ADS)
Stenborg, T. N.
2009-10-01
This thesis evaluates quantitatively the effectiveness of a new technique for parallelising direct gravitational N-body simulations on a heterogeneous computing cluster. In addition to being an investigation into how a specific computational physics task can be optimally load balanced across the heterogeneity factors of a distributed computing cluster, it is also, more generally, a case study in effective heterogeneous parallelisation of an all-pairs programming task. If high-performance computing clusters are not designed to be heterogeneous initially, they tend to become so over time as new nodes are added, or existing nodes are replaced or upgraded. As a result, effective techniques for application parallelisation on heterogeneous clusters are needed if maximum cluster utilisation is to be achieved, and developing them is an active area of research. A custom C/MPI parallel particle-particle N-body simulator was developed, validated and deployed for this evaluation. Simulation communication proceeds over cluster nodes arranged in a logical ring and employs nonblocking message passing to encourage overlap of communication with computation. Redundant calculations arising from force symmetry given by Newton's third law are removed by combining chordal data transfer of accumulated forces with ring passing data transfer. Heterogeneity in node computation speed is addressed by decomposing system data across nodes in proportion to node computation speed, in conjunction with use of evenly sized communication buffers. This scheme is shown experimentally to have some potential in improving simulation performance in comparison with an even decomposition of data across nodes. Techniques for further heterogeneous cluster load balancing are discussed and remain an opportunity for further work.
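The speed-proportional decomposition described in this abstract can be sketched as follows. This is a minimal illustration under assumed inputs (a particle count and a list of relative node speeds); the function name and the rule for handing out leftover particles are assumptions, not details taken from the thesis, whose simulator is written in C/MPI.

```python
def proportional_decomposition(n_particles, node_speeds):
    """Split n_particles across nodes in proportion to their relative speeds."""
    total_speed = sum(node_speeds)
    # Start from the rounded-down proportional share for each node.
    shares = [int(n_particles * speed / total_speed) for speed in node_speeds]
    leftover = n_particles - sum(shares)
    # Give any leftover particles to the fastest nodes, one each.
    fastest_first = sorted(range(len(node_speeds)),
                           key=lambda k: node_speeds[k], reverse=True)
    for k in fastest_first[:leftover]:
        shares[k] += 1
    return shares
```

With speeds of 1, 2 and 3, ten particles would be split so that the fastest node holds the most, whereas an even decomposition would leave it idle while the slowest node finishes.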
A Scalable Parallel Monte Carlo Method for Free Energy Simulations of Molecular Systems
Chan, Derek Y C
A Scalable Parallel Monte Carlo Method for Free Energy Simulations of Molecular Systems MALEK O to the system Hamiltonian. This external potential is related to the free energy. In the parallel implementation77, 2005 Key words: parallel computing; high performance computing; Monte Carlo; free energy; molecular
NASA Technical Reports Server (NTRS)
Hsieh, Shang-Hsien
1993-01-01
The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
NASA Astrophysics Data System (ADS)
Zehner, Björn; Hellwig, Olaf; Linke, Maik; Görz, Ines; Buske, Stefan
2016-01-01
3D geological underground models are often presented by vector data, such as triangulated networks representing boundaries of geological bodies and geological structures. Since models are to be used for numerical simulations based on the finite difference method, they have to be converted into a representation discretizing the full volume of the model into hexahedral cells. Often the simulations require a high grid resolution and are done using parallel computing. The storage of such a high-resolution raster model would require a large amount of storage space and it is difficult to create such a model using the standard geomodelling packages. Since the raster representation is only required for the calculation, but not for the geometry description, we present an algorithm and concept for rasterizing geological models on the fly for the use in finite difference codes that are parallelized by domain decomposition. As a proof of concept we implemented a rasterizer library and integrated it into seismic simulation software that is run as parallel code on a UNIX cluster using the Message Passing Interface. We can thus run the simulation with realistic and complicated surface-based geological models that are created using 3D geomodelling software, instead of using a simplified representation of the geological subsurface using mathematical functions or geometric primitives. We tested this set-up using an example model that we provide along with the implemented library.
A massively parallel solution strategy for efficient thermal radiation simulation
NASA Astrophysics Data System (ADS)
Nguyen, P. D.; Moureau, V.; Vervisch, L.; Perret, N.
2012-06-01
A novel and efficient methodology to solve the Radiative Transfer Equations (RTE) in thermal radiation is discussed. The BiCGStab(2) iterative solution method, as designed for the non-symmetric linear equation systems, is used to solve the discretized RTE. The numerical upwind and central schemes are blended to provide a stable numerical scheme (MUCS) for interpolation of the cell facial radiation intensities in finite volume formulation. The combination of the BiCGStab(2) and MUCS methods proved to be very efficient when coupled with the DOM approach to solve the RTE. A cost-effective tabulation technique for the gaseous radiative property model SNB-FSCK using a 7-point Gauss-Lobatto quadrature scheme is also introduced. The whole methodology is implemented into a massively parallel unstructured CFD code where the radiative and fluid flow solutions share the same domain decomposition, which is the bottleneck in current radiative solvers. The dual mesh decomposition at the cell groups level and processors level is adopted to optimize the CFD code for massively parallel computing. The whole method is applied to simulate the radiation heat-transfer in a 3D rectangular enclosure containing non-isothermal CO2 and H2O mixtures. Two test cases are studied for homogeneous and inhomogeneous distributions of CO2 and H2O in the enclosure. The result is reported for the heat flux and radiation energy source and the comparison is also made between the present methodology BiCGStab(2)/MUCS/tabulated SNB-FSCK, the benchmark method SNB-CK (implemented at 25 cm^-1 narrow-band) and some other methods available in the literature. The present method (BiCGStab(2)/MUCS/tabulated SNB-FSCK) yields more accurate predictions particularly for the radiation source term. When comparing with the benchmark solution, the relative error of the radiation source term is remarkably reduced to less than 4% and the CPU time is drastically diminished.
Xyce Parallel Electronic Simulator Reference Guide Version 6.4
Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory
2015-12-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1]. Trademarks: The information herein is subject to change without notice. Copyright © 2002-2015 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts: Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)
Exception handling controllers: An application of pushdown systems to discrete event control
Griffin, Christopher H
2008-01-01
Recent work by the author has extended the Supervisory Control Theory to include the class of control languages defined by pushdown machines. A pushdown machine is a finite state machine extended by an infinite stack memory. In this paper, we define a specific type of deterministic pushdown machine that is particularly useful as a discrete event controller. Checking controllability of pushdown machines requires computing the complement of the controller machine. We show that Exception Handling Controllers have the property that algorithms for taking their complements and determining their prefix closures are nearly identical to the algorithms available for finite state machines. Further, they exhibit an important property that makes checking for controllability extremely simple. Hence, they maintain the simplicity of the finite state machine, while providing the extra power associated with a pushdown stack memory. We provide an example of a useful control specification that cannot be implemented using a finite state machine, but can be implemented using an Exception Handling Controller.
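To make the "finite state machine plus stack" idea in this abstract concrete, the sketch below checks a specification that no finite state machine can enforce for unbounded nesting: every release event must match an earlier, still-open acquire event. It is a generic pushdown-style recognizer written for illustration; the event names are hypothetical and this is not the Exception Handling Controller construction defined in the paper.

```python
def accepts_balanced(events, push_event="acquire", pop_event="release"):
    """Return True if every pop_event matches an earlier, unmatched push_event."""
    stack = []
    for event in events:
        if event == push_event:
            stack.append(event)          # remember an open acquire
        elif event == pop_event:
            if not stack:
                return False             # release with nothing to match
            stack.pop()
        else:
            return False                 # event outside the alphabet
    return not stack                     # all acquires must be released
```

A finite state machine would need one state per nesting depth, so it can only enforce this up to a fixed bound; the stack removes that bound.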
Parallel Computation in Simulating Diffusion and Deformation in Human Brain
Zhang, Jun
Parallel Computation in Simulating Diffusion and Deformation in Human Brain Ning Kang Jun of parallel and high performance computation in simulating the diffusion process in the human brain and in modeling the deformation of the human brain. Computational neuroscience is a branch of biomedical science
Particle/Continuum Hybrid Simulation in a Parallel Computing Environment
NASA Technical Reports Server (NTRS)
Baganoff, Donald
1996-01-01
The objective of this study was to modify an existing parallel particle code based on the direct simulation Monte Carlo (DSMC) method to include a Navier-Stokes (NS) calculation so that a hybrid solution could be developed. In carrying out this work, it was determined that the following five issues had to be addressed before extensive program development of a three dimensional capability was pursued: (1) find a set of one-sided kinetic fluxes that are fully compatible with the DSMC method, (2) develop a finite volume scheme to make use of these one-sided kinetic fluxes, (3) make use of the one-sided kinetic fluxes together with DSMC type boundary conditions at a material surface so that velocity slip and temperature slip arise naturally for near-continuum conditions, (4) find a suitable sampling scheme so that the values of the one-sided fluxes predicted by the NS solution at an interface between the two domains can be converted into the correct distribution of particles to be introduced into the DSMC domain, (5) carry out a suitable number of tests to confirm that the developed concepts are valid, individually and in concert for a hybrid scheme.
Using Discrete Event Simulation to Model Multi-Robot Multi-Operator Teamwork
Gao, F.
With the increasing need for teams of operators in controlling multiple robots, it is important to understand how to construct the team and support team processes. While running experiments can be time consuming and ...
Analysis of a hospital network transportation system with discrete event simulation
Kwon, Annie Y. (Annie Yean)
2011-01-01
VA New England Healthcare System (VISN1) provides transportation to veterans between eight medical centers and over 35 Community Based Outpatient Clinics across New England. Due to high variation in its geographic area, ...
Using Discrete Event Simulation to Model Multi-Robot Multi-Operator Teamwork
Cummings, Mary "Missy"
. The teamwork of multi-robot and multi-operator teams can be modeled based on queuing theory as well. However queuing theory to model the operator utilization of air traffic controllers. Divita et al. (2004) modeled a team performing supervisory control of an air defense warfare system using queuing theory. However
DISCRETE EVENT MODELING, SIMULATION AND CONTROL WITH APPLICATION TO SENSOR BASED
ROBOTICS BY Shahab Sheikh-Bahaei B.S., Electrical Engineering, Isfahan University of Technology, 1999 and support of my advisor Professor Mo Jamshidi, Director of Center for Autonomous Control Engineering (ACE for Autonomous Control Engineering (ACE)" for financial support of this work1 . And finally, and the most
A discrete event simulation model for unstructured supervisory control of unmanned vehicles
McDonald, Anthony D. (Anthony Douglas)
2010-01-01
Most current Unmanned Vehicle (UV) systems consist of teams of operators controlling a single UV. Technological advances will likely lead to the inversion of this ratio, and automation of low level tasking. These advances ...
Quantifying supply chain disruption risk using Monte Carlo and discrete-event simulation
Schmitt, Amanda J.
We present a model constructed for a large consumer products company to assess their vulnerability to disruption risk and quantify its impact on customer service. Risk profiles for the locations and connections in the ...
A sweep algorithm for massively parallel simulation of circuit-switched networks
NASA Technical Reports Server (NTRS)
Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.
1992-01-01
A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.
ANNarchy: a code generation approach to neural simulations on parallel hardware
Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.
2015-01-01
Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which makes it easy to define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957
Comparison of serial and parallel simulations of a corridor fire using FDS
NASA Astrophysics Data System (ADS)
Valasek, L.
2015-09-01
Current fire simulators make it possible to model the course of a fire in large areas and its impact on structures and equipment. This paper compares serial and parallel calculations of a corridor fire simulation using the FDS (Fire Dynamics Simulator) system. In the parallel case, the whole computational domain is divided into several computational meshes; the computation on each mesh is treated as a single MPI (Message Passing Interface) process running on one computational core, and the processes communicate via MPI. The aim of this paper is to determine the size of the error caused by parallelization of the computation, which arises where computational meshes touch.
NASA Technical Reports Server (NTRS)
Hsieh, Shang-Hsien; Abel, J. F.
1993-01-01
The principal objective of this research is to investigate, develop and demonstrate coarse-grained, parallel-processing strategies for nonlinear dynamic simulations for rotating bladed-disk assemblies. The parallel-processing strategies addressed include numerical algorithms for parallel nonlinear solutions and techniques to effect load balancing among processors. The parallel environment employed is a distributed-memory, coarse-grained one consisting of networked workstations. A parallel explicit time integration method has been implemented for transient nonlinear solutions of rotating bladed-disk assemblies. Automatic domain partitioning techniques have been investigated for load balancing among processors. Advanced computing environments, data structures and interactive computer graphics all contribute to an integrated parallel finite element analysis system to facilitate more efficient and powerful dynamic simulations.
L'Ecuyer, Pierre
Discrete Event Dynamic Systems: Theory and Applications, *, 1--24 (Draft: April 18, 1996) c fl and experimental design approaches, in order to evaluate the expected performance of a discrete event system parameters of the underlying probability law (for the system's evolution) and parameters of the sample
Parallel computing in enterprise modeling.
Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.
2008-08-01
This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what distinguishes them from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language, which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.
Sensor Configuration Selection for Discrete-Event Systems under Unreliable Observations
Wen-Chiao Lin; Tae-Sic Yoo; Humberto E. Garcia
2010-08-01
Algorithms for counting the occurrences of special events in the framework of partially-observed discrete event dynamical systems (DEDS) were developed in previous work. Their performance typically improves as the sensors providing the observations become more costly or increase in number. This paper addresses the problem of finding a sensor configuration that achieves an optimal balance between cost and the performance of the special event counting algorithm, while satisfying given observability requirements and constraints. Since this problem is generally computationally hard in the framework considered, a sensor optimization algorithm is developed using two greedy heuristics, one myopic and the other based on projected performances of candidate sensors. The two heuristics are executed sequentially in order to find the best sensor configurations. The developed algorithm is then applied to a sensor optimization problem for a multiunit-operation system. Results show that improved sensor configurations can be found that may significantly reduce the sensor configuration cost but still yield acceptable performance for counting the occurrences of special events.
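As a rough illustration of what a myopic greedy heuristic looks like in this kind of cost-versus-performance selection problem, the sketch below repeatedly picks the sensor with the best immediate gain per unit cost. It is not the algorithm from the paper: the set-coverage "performance" measure, the function name `greedy_sensor_selection`, and the sensor and event names are all assumptions made for the example, and costs are assumed to be positive.

```python
def greedy_sensor_selection(sensors, required_events):
    """Myopic greedy selection (illustrative stand-in, not the paper's method).

    sensors: dict mapping a hypothetical sensor name to (cost, set of events it observes).
    required_events: set of events that must be observable by the chosen sensors.
    Returns (list of chosen sensor names in pick order, total cost).
    """
    remaining = dict(sensors)
    covered = set()
    chosen = []
    total_cost = 0.0
    while not required_events <= covered:
        best_name, best_ratio = None, 0.0
        for name, (cost, events) in remaining.items():
            # Immediate (myopic) gain: newly covered required events per unit cost.
            gain = len((events & required_events) - covered)
            if gain > 0 and gain / cost > best_ratio:
                best_name, best_ratio = name, gain / cost
        if best_name is None:
            raise ValueError("requirements cannot be met with the given sensors")
        cost, events = remaining.pop(best_name)
        chosen.append(best_name)
        covered |= events & required_events
        total_cost += cost
    return chosen, total_cost


if __name__ == "__main__":
    demo = {
        "a": (4.0, {"e1", "e2", "e3"}),
        "b": (1.0, {"e1"}),
        "c": (1.5, {"e2", "e3"}),
    }
    print(greedy_sensor_selection(demo, {"e1", "e2", "e3"}))
```

A myopic rule of this kind only looks at the next pick, which is why the abstract pairs it with a second heuristic based on projected performance.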
Electrical Simulations of Series and Parallel PV Arc-Faults Jack Flicker and Jay Johnson
Electrical Simulations of Series and Parallel PV Arc-Faults Jack Flicker and Jay Johnson Sandia this danger by requiring arc-fault circuit interrupters (AFCI). Currently, the requirement is only for series arc-faults, but to fully protect PV installations from arc-fault-generated fires, parallel arc
Simulation of Earthquake Liquefaction Response on Parallel Computers , K. H. Law3
Stanford University
1 Simulation of Earthquake Liquefaction Response on Parallel Computers J. Peng1 , J. Lu2 , K. H. Introduction Large-scale finite element simulations of earthquake ground response including liquefaction of current operating systems (e.g. Linux, MS Windows, and so forth), large-scale earthquake simulations may
A Study of Load Imbalance for Parallel Reservoir Simulation with Multiple Partitioning Strategies
Guo, Xuyang
2015-07-27
High performance computing is an option to increase reservoir simulation efficiency. However, a highly scalable and efficient parallel application is not always easy to obtain from case to case. Load imbalance caused by mesh ...
A simulator for adaptive parallel applications Basile Schaeli, Sebastian Gerlach, Roger D. Hersch
Hersch, Roger D.
A simulator for adaptive parallel applications Basile Schaeli, Sebastian Gerlach, Roger D. Hersch Sciences {basile.schaeli, sebastian.gerlach, rd.hersch}@epfl.ch Abstract Dynamically allocating computing
NASA Astrophysics Data System (ADS)
Rudd, Kevin Edward
In this dissertation, we present two parallelized simulation techniques for three-dimensional acoustic and elastic wave propagation based on the finite integration technique. We demonstrate their usefulness in solving real-world problems with examples in the three very different areas of nondestructive evaluation, medical imaging, and security screening. More precisely, these include concealed weapons detection, periodontal ultrasonography, and guided wave inspection of complex piping systems. We have employed these simulation methods to study complex wave phenomena and to develop and test a variety of signal processing and hardware configurations. Simulation results are compared to experimental measurements to confirm the accuracy of the parallel simulation methods.
O(N) parallel tight binding molecular dynamics simulation of carbon nanotubes
NASA Astrophysics Data System (ADS)
Özdoğan, Cem; Dereli, Gülay; Çağın, Tahir
2002-10-01
We report an O(N) parallel tight binding molecular dynamics simulation study of (10×10) structured carbon nanotubes (CNT) at 300 K. We converted a sequential O(N^3) TBMD simulation program into an O(N) parallel code, utilizing the concept of parallel virtual machines (PVM). The code is tested in a distributed memory system consisting of a cluster of 8 PCs that run under Linux (Slackware 2.2.13 kernel). Our results on the speed up, efficiency and system size are given.
NASA Astrophysics Data System (ADS)
Wu, Di M.; Zhao, S. S.; Lu, Jun Q.; Hu, Xin-Hua
2000-06-01
In Monte Carlo simulations of light propagating in biological tissues, photons propagating in the media are described as classical particles being scattered and absorbed randomly in the media, and their paths are tracked individually. To obtain statistically significant results, however, a large number of photons is needed in the simulations, and the calculations are time consuming and sometimes impossible with existing computing resources, especially when considering inhomogeneous boundary conditions. To overcome this difficulty, we have implemented a parallel computing technique in our Monte Carlo simulations. This approach is well justified by the nature of the Monte Carlo simulation. Utilizing PVM (Parallel Virtual Machine, a parallel computing software package), parallel codes in both C and Fortran have been developed on the massively parallel Cray T3E computer and a local PC network running Unix/Sun Solaris. Our results show that parallel computing can significantly reduce the running time and make efficient use of low-cost personal computers. In this report, we present a numerical study of light propagation in a slab phantom of skin tissue using the parallel computing technique.
Xyce parallel electronic simulator users' guide, Version 6.0.1.
Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David Gregory.
2014-01-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users guide, version 6.1
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory
2014-03-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Humans can integrate feedback of discrete events in their sensorimotor control of a robotic hand
Segil, Jacob L.; Clemente, Francesco; Weir, Richard F. ff; Edin, Benoni
2015-01-01
Providing functionally effective sensory feedback to users of prosthetics is a largely unsolved challenge. Traditional solutions require high band-widths for providing feedback for the control of manipulation and yet have been largely unsuccessful. In this study, we have explored a strategy that relies on temporally discrete sensory feedback that is technically simple to provide. According to the Discrete Event-driven Sensory feedback Control (DESC) policy, motor tasks in humans are organized in phases delimited by means of sensory encoded discrete mechanical events. To explore the applicability of DESC for control, we designed a paradigm in which healthy humans operated an artificial robot hand to lift and replace an instrumented object, a task that can readily be learned and mastered under visual control. Assuming that the central nervous system of humans naturally organizes motor tasks based on a strategy akin to DESC, we delivered short-lasting vibrotactile feedback related to events that are known to forcefully affect progression of the grasp-lift-and-hold task. After training, we determined whether the artificial feedback had been integrated with the sensorimotor control by introducing short delays and we indeed observed that the participants significantly delayed subsequent phases of the task. This study thus gives support to the DESC policy hypothesis. Moreover, it demonstrates that humans can integrate temporally discrete sensory feedback while controlling an artificial hand and invites further studies in which inexpensive, noninvasive technology could be used in clever ways to provide physiologically appropriate sensory feedback in upper limb prosthetics with much lower band-width requirements than with traditional solutions. PMID:24992899
Humans can integrate feedback of discrete events in their sensorimotor control of a robotic hand.
Cipriani, Christian; Segil, Jacob L; Clemente, Francesco; ff Weir, Richard F; Edin, Benoni
2014-11-01
Parallelization of a Molecular Dynamics Simulation of an Ion-Surface Collision System:
NASA Astrophysics Data System (ADS)
Atiş, Murat; Özdoğan, Cem; Güvenç, Ziya B.
A parallel molecular dynamics simulation study of the ion-surface collision system is reported. A sequential molecular dynamics simulation program is converted into a parallel code utilizing the concept of parallel virtual machine (PVM). An effective and favorable algorithm is developed. Our parallelization of the algorithm shows that it is more efficient because of the optimal pair listing, linear scaling, and constant behavior of the internode communications. The code is tested in a distributed memory system consisting of a cluster of eight PCs that run under Linux (Debian 2.4.20 kernel). Our results on the collision system are discussed based on the speed up, efficiency and the system size. Furthermore, the code is used for a full simulation of the Ar-Ni(100) collision system, and calculated physical quantities are presented.
A Queue Simulation Tool for a High Performance Scientific Computing Center
NASA Technical Reports Server (NTRS)
Spear, Carrie; McGalliard, James
2007-01-01
The NASA Center for Computational Sciences (NCCS) at the Goddard Space Flight Center provides high performance highly parallel processors, mass storage, and supporting infrastructure to a community of computational Earth and space scientists. Long running (days) and highly parallel (hundreds of CPUs) jobs are common in the workload. NCCS management structures batch queues and allocates resources to optimize system use and prioritize workloads. NCCS technical staff use a locally developed discrete event simulation tool to model the impacts of evolving workloads, potential system upgrades, alternative queue structures and resource allocation policies.
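To make the idea of a discrete event simulation of batch queues concrete, here is a minimal sketch of an event-driven model of a single first-come-first-served queue on a fixed pool of CPUs. It is not the NCCS tool: the function name `simulate_fcfs`, the job tuple layout, and the strict no-backfill policy are assumptions chosen to keep the example small, and every job is assumed to request no more CPUs than the machine has.

```python
import heapq
from collections import deque

FINISH, SUBMIT = 0, 1  # finish events sort first at equal times, so CPUs free up before new arrivals


def simulate_fcfs(jobs, total_cpus):
    """Event-driven model of one strict FCFS batch queue (hypothetical policy).

    jobs: list of (submit_time, cpus, runtime) tuples.
    Returns a dict mapping job index -> simulated start time.
    """
    events = [(submit, SUBMIT, i) for i, (submit, _, _) in enumerate(jobs)]
    heapq.heapify(events)          # future event list ordered by simulated time
    waiting = deque()              # FIFO queue of job indices
    free = total_cpus
    start = {}
    while events:
        now, kind, job = heapq.heappop(events)
        if kind == SUBMIT:
            waiting.append(job)
        else:
            free += jobs[job][1]   # a finishing job returns its CPUs
        # Strict FCFS: only the job at the head of the queue may start.
        while waiting and jobs[waiting[0]][1] <= free:
            nxt = waiting.popleft()
            free -= jobs[nxt][1]
            start[nxt] = now
            heapq.heappush(events, (now + jobs[nxt][2], FINISH, nxt))
    return start


if __name__ == "__main__":
    workload = [(0, 4, 10), (1, 4, 5), (2, 2, 3)]
    for cpus in (6, 8):
        starts = simulate_fcfs(workload, cpus)
        waits = [starts[i] - workload[i][0] for i in range(len(workload))]
        print(cpus, "CPUs -> start times", starts, "mean wait", sum(waits) / len(waits))
```

Running the same workload against different CPU counts or queue policies is the kind of what-if comparison the abstract describes; a realistic model would add multiple queues, priorities, and backfilling.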
Tropper, Carl
Parallel Logic Simulation of Million-Gate VLSI Circuits Lijuan Zhu, Gilbert Chen, and Boleslaw K]. Consequently, parallel simulation has become a necessity for such circuits. 1.1. A Viterbi decoder design As our benchmark, we selected the circuits implementing a state-parallel RE Viterbi decoder
A parallel finite element simulator for ion transport through three-dimensional ion channel systems.
Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo
2013-09-15
A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain to get the surface and volume mesh for ion channel systems, which consists of a set of mesh generation tools. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG) developed by one of the authors, which provides the capability of doing large scale parallel computations with high parallel efficiency and the flexibility of choosing high order elements to achieve high order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I-V curve, with which both primitive and transformed PNP equations are studied and their numerical performances are compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage dependent anion channel (VDAC) and α-Hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite size effects can now be included in the PNP model, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL. It is shown that the size effects in SMPNP can effectively lead to reduced current in the channel, and the results are closer to BD simulation results. PMID:23740647
At the Biological Modeling and Simulation Frontier
2009-01-01
modeling and simulation (M&S) of biological systems. Wemodeling and simulation: integrating discrete event and continuous complex dynamic systems.system, observed from particular, The Modeling and Simulation
Parallel FEM Simulation of Crack Propagation on the AC 3 Velocity Cluster #
Stodghill, Paul
Parallel FEM Simulation of Crack Propagation on the AC 3 Velocity Cluster # George Coulouris 3: This paper describes our experiences porting the Crack Propagation on Teraflop Computers (CPTC) testbed simulation of crack propagation in realistic 3D structures would be a valuable tool for engineers
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL
Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor
2011-09-06
We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
Antsaklis, Panos
K. M. Passino and P. J. Antsaklis, "Solutions to Optimal Control Problems for Discrete Event 1990.
Parallel Unsteady Turbopump Flow Simulations for Reusable Launch Vehicles
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Kwak, Dochan
2000-01-01
An efficient solution procedure for time-accurate solutions of the incompressible Navier-Stokes equations is obtained. The artificial compressibility method requires a fast convergence scheme. The pressure projection method is efficient when a small time step is required. The number of sub-iterations is reduced significantly when a Poisson solver is employed with the continuity equation. Both computing time and memory usage are reduced (at least 3 times). Other work includes Multi Level Parallelism (MLP) of INS3D, overset connectivity for the validation case, experimental measurements, and a computational model for the boost pump.
A fuzzy discrete event system approach to determining optimal HIV/AIDS treatment regimens.
Ying, Hao; Lin, Feng; MacArthur, Rodger D; Cohn, Jonathan A; Barth-Jones, Daniel C; Ye, Hong; Crane, Lawrence R
2006-10-01
Treatment decision-making is complex and involves many factors. A systematic decision-making and optimization technology capable of handling variations and uncertainties of patient characteristics and physician's subjectivity is currently unavailable. We recently developed a novel general-purpose fuzzy discrete event systems theory for optimal decision-making. We now apply it to develop an innovative system for medical treatment, specifically for the first round of highly active antiretroviral therapy of human immunodeficiency virus/acquired immunodeficiency syndrome (HIV/AIDS) patients involving three historically widely used regimens. The objective is to develop such a system whose regimen choice for any given patient will exactly match expert AIDS physician's selection to produce the (anticipated) optimal treatment outcome. Our regimen selection system consists of a treatment objectives classifier, fuzzy finite state machine models for treatment regimens, and a genetic-algorithm-based optimizer. The optimizer enables the system to either emulate an individual doctor's decision-making or generate a regimen that simultaneously satisfies diverse treatment preferences of multiple physicians to the maximum extent. We used the optimizer to automatically learn the values of 26 parameters of the models. The learning was based on the consensus of AIDS specialists A and B on this project, whose exact agreement was only 35%. The performance of the resulting models was first assessed. We then carried out a retrospective study of the entire system using all the qualifying patients treated in our institution's AIDS Clinical Center in 2001. A total of 35 patients were treated by 13 specialists using the regimens (four and eight patients were treated by specialists A and B, respectively). We compared the actually prescribed regimens with those selected by the system using the same available information. 
The overall exact agreement was 82.9% (29 out of 35), with the exact agreement with specialists A and B both at 100%. The exact agreement for the remaining 11 physicians not involved in the system training was 73.9% (17 out of 23), an impressive result given the fact that expert opinion can be quite divergent for treatment decisions of such complexity. Our specialists also carefully examined the six mismatched cases and deemed that the system actually chose a more appropriate regimen for four of them. In the other two cases, either would be reasonable choices. Our approach has the capabilities of generalizing, learning, and representing knowledge even in the face of weak consensus, and being readily upgradeable to new medical knowledge. These are practically important features to medical applications in general, and HIV/AIDS treatment in particular, as national HIV/AIDS treatment guidelines are modified several times per year. PMID:17044400
Tavakoli, Ruhollah
2010-01-01
A simple method for improving cache efficiency of serial and parallel explicit finite element procedures with application to casting solidification simulation over three-dimensional complex geometries is presented. The method is based on division of the global data into smaller blocks and treating each block independently from the others at each time step. A novel parallel finite element algorithm for a non-overlapped element-based decomposed domain is presented for implementation of serial and parallel versions of the presented method. The effect of mesh reordering on the efficiency is also investigated. A simple algorithm is presented for high quality decomposition of the decoupled global mesh. Our results show a 10-20% performance improvement from mesh reordering and a 1.2-2.2 speedup with application of the presented cache-efficient algorithm (for serial and parallel versions). Also, the presented parallel solver (without the cache-efficient feature) shows nearly linear speedup on a traditional Ethernet networked Linux cluster.
Parallelized modelling and solution scheme for hierarchically scaled simulations
NASA Technical Reports Server (NTRS)
Padovan, Joe
1995-01-01
This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers is capable of solving are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that, by combining several of these techniques, a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.
LARGE-SCALE SIMULATION OF BEAM DYNAMICS IN HIGH INTENSITY ION LINACS USING PARALLEL SUPERCOMPUTERS
R. RYNE; J. QIANG
2000-08-01
In this paper we present results of using parallel supercomputers to simulate beam dynamics in next-generation high intensity ion linacs. Our approach uses a three-dimensional space charge calculation with six types of boundary conditions. The simulations use a hybrid approach involving transfer maps to treat externally applied fields (including rf cavities) and parallel particle-in-cell techniques to treat the space-charge fields. The large-scale simulation results presented here represent a three-orders-of-magnitude improvement in simulation capability, in terms of problem size and speed of execution, compared with typical two-dimensional serial simulations. Specific examples will be presented, including simulation of the spallation neutron source (SNS) linac and the Low Energy Demonstrator Accelerator (LEDA) beam halo experiment.
Virtual reality visualization of parallel molecular dynamics simulation
Disz, T.; Papka, M.; Stevens, R.; Pellegrino, M.; Taylor, V.
1995-12-31
When performing communications mapping experiments for massively parallel processors, it is important to be able to visualize the mappings and resulting communications. In a molecular dynamics model, visualization of the atom to atom interaction and the processor mappings provides insight into the effectiveness of the communications algorithms. The basic quantities available for visualization in a model of this type are the number of molecules per unit volume, the mass, and velocity of each molecule. The computational information available for visualization is the atom to atom interaction within each time step, the atom to processor mapping, and the energy rescaling events. We use the CAVE (CAVE Automatic Virtual Environment) to provide interactive, immersive visualization experiences.
Parallel performance optimizations on unstructured mesh-based simulations
Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid
2015-06-01
This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
Monte Carlo simulations of converging laser beam propagating in turbid media with parallel computing
NASA Astrophysics Data System (ADS)
Wu, Di; Lu, Jun Q.; Hu, Xin H.; Zhao, S. S.
1999-11-01
Due to its flexibility and simplicity, the Monte Carlo method is often used to study light propagation in turbid media, where the photons are treated as classical particles being scattered and absorbed randomly based on radiative transfer theory. However, because a large number of photons is needed to produce statistically significant results, this type of calculation requires large computing resources. To overcome this difficulty, we implemented a parallel computing technique in our Monte Carlo simulations. The algorithm is based on the fact that the classical particles are uncorrelated, so the trajectories of multiple photons can be tracked simultaneously. When a beam of focused light is incident on the medium, the incident photons are divided into groups according to the available processes on a parallel machine and the calculations are carried out in parallel. Utilizing PVM (Parallel Virtual Machine, a parallel computing software package), the parallel programs in both C and FORTRAN were developed on the massively parallel Cray T3E computer at the North Carolina Supercomputer Center and a local PC-cluster network running UNIX/Sun Solaris. The parallel performance of our codes has been excellent on both the Cray T3E and the PC clusters. In this paper, we present results on a focused laser beam propagating through a highly scattering and diluted solution of intralipid. The dependence of the spatial distribution of light near the focal point on the concentration of intralipid solution is studied and its significance is discussed.
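The decomposition described in this abstract (uncorrelated photons split into groups, one group per process, results summed at the end) can be sketched as follows. This is only an illustration under stated assumptions: the one-dimensional slab model, the parameters `thickness`, `mu_t` and `albedo`, and the function names are invented for the example, and a Python thread pool stands in for the PVM processes used by the authors, so it shows the structure of the computation rather than any real speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def split_photons(total, n_groups):
    """Divide `total` photons into n_groups sizes that differ by at most one."""
    base, extra = divmod(total, n_groups)
    return [base + (1 if i < extra else 0) for i in range(n_groups)]


def track_group(n_photons, seed, thickness=1.0, mu_t=10.0, albedo=0.9):
    """Track one group of photons through a 1-D slab (toy model, assumed parameters)."""
    rng = random.Random(seed)  # independent random stream per group
    counts = {"reflected": 0, "transmitted": 0, "absorbed": 0}
    for _ in range(n_photons):
        z, cos_theta = 0.0, 1.0  # photon enters at the surface, heading into the slab
        while True:
            # Sample a free path length from an exponential distribution.
            z += cos_theta * (-math.log(1.0 - rng.random()) / mu_t)
            if z < 0.0:
                counts["reflected"] += 1
                break
            if z > thickness:
                counts["transmitted"] += 1
                break
            if rng.random() > albedo:
                counts["absorbed"] += 1
                break
            cos_theta = 2.0 * rng.random() - 1.0  # isotropic scattering
    return counts


def run_parallel(total, n_workers, base_seed=1234):
    """Run one photon group per worker and sum the per-group tallies."""
    sizes = split_photons(total, n_workers)
    seeds = [base_seed + i for i in range(n_workers)]
    totals = {"reflected": 0, "transmitted": 0, "absorbed": 0}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for counts in pool.map(track_group, sizes, seeds):
            for key, value in counts.items():
                totals[key] += value
    return totals


if __name__ == "__main__":
    print(run_parallel(20000, 4))
```

Because the groups share no state, the only communication is the final summation of tallies, which is why this class of Monte Carlo calculation parallelizes so readily.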
Parallel Molecular Dynamics simulation: implementation of PVM for a lipid membrane
NASA Astrophysics Data System (ADS)
Fang, Zhiwu; Haymet, A. D. J.; Shinoda, Wataru; Okazaki, Susumu
1999-02-01
This paper describes a parallel algorithm for Molecular Dynamics simulation of a lipid membrane using the isothermal-isobaric ensemble. A message-passing paradigm is adopted for interprocessor communications using PVM3 (Parallel Virtual Machine). A data decomposition technique is employed for the parallelization of the calculation of intermolecular forces. The algorithm has been tested both on distributed memory architecture (DEC Alpha 500 workstation clusters) and shared memory architecture (SGI Powerchallenge with 20 R10000 processors) for a dipalmitoylphosphatidylcholine (DPPC) lipid bilayer consisting of 32 DPPC molecules and 928 water molecules. For each architecture, we measure the execution time with average work load, and the optimal number of processors for the current simulation. Some dynamical quantities are presented for a 2 ns simulation obtained with 5 processors on DEC Alpha 500 workstations. Our results show that the code is extremely efficient on 5-8 processors, and a useful addition to other major computational resources.
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak
1996-01-01
Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
Parallelization issues of a code for physically-based simulation of fabrics
NASA Astrophysics Data System (ADS)
Romero, Sergio; Gutiérrez, Eladio; Romero, Luis F.; Plata, Oscar; Zapata, Emilio L.
2004-10-01
The simulation of fabrics, clothes, and flexible materials is an essential topic in computer animation of realistic virtual humans and dynamic sceneries. New emerging technologies, such as interactive digital TV and multimedia products, make it necessary to develop powerful tools to perform real-time simulations. Parallelism is one such tool. When analyzing fabric simulations computationally, we found that these codes belong to the complex class of irregular applications. Frequently, codes of this kind include reduction operations in their core, so that an important fraction of the computational time is spent on such operations. In fabric simulators these operations appear when evaluating forces, giving rise to the equation system to be solved. For this reason, this paper discusses only this phase of the simulation. This paper analyzes and evaluates different irregular reduction parallelization techniques on ccNUMA shared memory machines, applied to a real, physically-based fabric simulator we have developed. Several issues are taken into account in order to achieve high code performance, such as exploitation of data access locality and parallelism, as well as careful use of memory resources (memory overhead). In this paper we use the concept of data affinity to develop various efficient algorithms for reduction parallelization exploiting data locality.
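To make the notion of an irregular reduction concrete, the sketch below accumulates spring forces onto nodes through an index list, first serially and then with one common generic strategy, array privatization, in which every thread reduces into its own copy of the force array. It is an illustration only: the abstract does not say which techniques the paper evaluates, the threads are emulated by a plain loop, and the names `serial_forces` and `privatized_forces` and the spring data are hypothetical.

```python
def serial_forces(n_nodes, springs):
    """Reference reduction: each spring (i, j, f) adds f to node i and -f to node j."""
    forces = [0.0] * n_nodes
    for i, j, f in springs:
        forces[i] += f      # indirect (irregular) write through the index list
        forces[j] -= f
    return forces

def privatized_forces(n_nodes, springs, n_threads):
    """Same reduction with one private force array per (emulated) thread."""
    private = [[0.0] * n_nodes for _ in range(n_threads)]
    chunk = (len(springs) + n_threads - 1) // n_threads
    for t in range(n_threads):                       # each iteration stands in for one thread
        for i, j, f in springs[t * chunk:(t + 1) * chunk]:
            private[t][i] += f                       # no write conflicts: the array is private
            private[t][j] -= f
    # Combine step: sum the private copies element by element.
    return [sum(private[t][k] for t in range(n_threads)) for k in range(n_nodes)]

# Hypothetical spring list: (node i, node j, force magnitude).
springs = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 0.5), (0, 3, 4.0), (1, 3, 0.25)]
reference = serial_forces(4, springs)
parallel = privatized_forces(4, springs, n_threads=2)
```

Privatization removes write conflicts at the cost of `n_threads` copies of the reduction array, which is the kind of memory overhead the abstract lists among its concerns.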
Massively parallel simulation of cardiac electrical wave propagation ...
2007-10-30
cardiac electrical activity presents several challenges: the heart has a .... the distribution of these cell types on global wave propagation to be studied. ... order of 10 cm, so directly simulating wave propagation in the human ventricle, a primary goal of .... tion: a model of rabbit ventricular anatomy, Progress Biophys. Mol. Bio.
Massively Parallel Reactive and Quantum Molecular Dynamics Simulations
NASA Astrophysics Data System (ADS)
Vashishta, Priya
2015-03-01
In this talk I will discuss two simulations. Cavitation bubbles readily occur in fluids subjected to rapid changes in pressure. We use billion-atom reactive molecular dynamics simulations on a 163,840-processor BlueGene/P supercomputer to investigate chemical and mechanical damage caused by shock-induced collapse of nanobubbles in water near a silica surface. Collapse of an empty nanobubble generates a high-speed nanojet, resulting in the formation of a pit on the surface. The gas-filled bubbles undergo partial collapse and consequently the damage on the silica surface is mitigated. Quantum molecular dynamics (QMD) simulations are performed on a 786,432-processor Blue Gene/Q to study on-demand production of hydrogen gas from water using Al nanoclusters. QMD simulations reveal rapid hydrogen production from water by an Al nanocluster. We find a low activation-barrier mechanism, in which a pair of Lewis acid and base sites on the Al_n surface preferentially catalyzes hydrogen production. I will also discuss on-demand production of hydrogen gas from water using LiAl alloy particles. Research reported in this lecture was carried out in collaboration with Rajiv Kalia, Aiichiro Nakano and Ken-ichi Nomura from the University of Southern California, and Fuyuki Shimojo and Kohei Shimamura from Kumamoto University, Japan.
Expression-level Parallelism for Distributed Spice Circuit Simulation
Gerstlauer, Andreas
values and enforce event causality, such that the local causality constraint (LCC) is observed. The LCC requires that concurrent simulators process external events in time step order [2]. Techniques addressing the Spice kernel [7] or host environment, at the cost of a selectable tradeoff in execution speed versus
Dependability analysis of parallel systems using a simulation-based approach. M.S. Thesis
NASA Technical Reports Server (NTRS)
Sawyer, Darren Charles
1994-01-01
The analysis of dependability in large, complex, parallel systems executing real applications or workloads is examined in this thesis. To effectively demonstrate the wide range of dependability problems that can be analyzed through simulation, the analysis of three case studies is presented. For each case, the organization of the simulation model used is outlined, and the results from simulated fault injection experiments are explained, showing the usefulness of this method in dependability modeling of large parallel systems. The simulation models are constructed using DEPEND and C++. Where possible, methods to increase dependability are derived from the experimental results. Another interesting facet of all three cases is the presence of some kind of workload or application executing in the simulation while faults are injected. This provides a completely new dimension to this type of study, not possible to model accurately with analytical approaches.
A conflict-free, path-level parallelization approach for sequential simulation algorithms
NASA Astrophysics Data System (ADS)
Rasera, Luiz Gustavo; Machado, Péricles Lopes; Costa, João Felipe C. L.
2015-07-01
Pixel-based simulation algorithms are the most widely used geostatistical technique for characterizing the spatial distribution of natural resources. However, sequential simulation does not scale well for stochastic simulation on very large grids, which are now commonly found in many petroleum, mining, and environmental studies. With the availability of multiple-processor computers, there is an opportunity to develop parallelization schemes for these algorithms to increase their performance and efficiency. Here we present a conflict-free, path-level parallelization strategy for sequential simulation. The method consists of partitioning the simulation grid into a set of groups of nodes and delegating all available processors for simulation of multiple groups of nodes concurrently. An automated classification procedure determines which groups are simulated in parallel according to their spatial arrangement in the simulation grid. The major advantage of this approach is that it does not require conflict resolution operations, and thus allows exact reproduction of results. Besides offering a large performance gain when compared to the traditional serial implementation, the method provides efficient use of computational resources and is generic enough to be adapted to several sequential algorithms.
NASA Technical Reports Server (NTRS)
Krosel, S. M.; Milner, E. J.
1982-01-01
The application of predictor-corrector integration algorithms developed for the digital parallel processing environment is investigated. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real-time performance, interprocessor communication, and algorithm startup are also discussed.
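For readers unfamiliar with predictor-corrector integration, the sketch below shows the simplest member of the family, Heun's method (an explicit Euler predictor followed by a trapezoidal corrector), applied to a scalar linear model. The abstract does not state which predictor-corrector formulas the report uses, so this is an assumed stand-in, and the test equation dx/dt = -2x is a hypothetical placeholder for the turbofan engine model. It also illustrates the dependence of accuracy on step size that the report examines.

```python
import math

def heun_step(f, t, x, h):
    """One predictor-corrector step: Euler predictor, trapezoidal corrector."""
    slope_now = f(t, x)
    predicted = x + h * slope_now                     # predictor
    return x + 0.5 * h * (slope_now + f(t + h, predicted))  # corrector

def integrate(f, x0, t_end, h):
    """Integrate dx/dt = f(t, x) from t = 0 to t_end with fixed step h."""
    steps = round(t_end / h)
    t, x = 0.0, x0
    for _ in range(steps):
        x = heun_step(f, t, x, h)
        t += h
    return x

def decay(t, x):
    """Hypothetical linear test model dx/dt = -2x with exact solution exp(-2t)."""
    return -2.0 * x

exact = math.exp(-2.0)
errors = {h: abs(integrate(decay, 1.0, 1.0, h) - exact) for h in (0.1, 0.05, 0.025)}
for h, err in errors.items():
    print(f"h = {h:<6} error = {err:.2e}")
```

Halving the step size should reduce the error by roughly a factor of four, as expected for a second-order method.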
Krosel, S.M.; Milner, E.J.
1982-01-01
Illustrates the application of predictor-corrector integration algorithms developed for the digital parallel processing environment. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real-time performance, inter-processor communication and algorithm startup are also discussed. 10 references.
NASA Astrophysics Data System (ADS)
Bordovitsyna, T. V.; Avdyushev, V. A.; Chuvashov, I. N.; Aleksandrova, A. G.; Tomilova, I. V.
2009-11-01
This paper discusses features of the numerical simulation of the motion of large systems of artificial satellites by parallel computing, using as an example the implementation of the software package "Numerical model of the system artificial satellites motion" on the "Skiff Cyberia" cluster. It is shown that parallel computing makes it possible to carry out high-precision numerical simulation of the motion of a large system of artificial satellites simultaneously. This opens up broad possibilities for solving direct and inverse problems of the dynamics of satellite systems such as GLONASS and of space debris objects.
Wilsey, Philip A.
Explorations of State Savings and Optimistic Fossil Collection for Parallel Simulation on Multi and Optimistic Fossil Collection (OFC). These methods have been previously developed and studied in shared
A parallel finite volume algorithm for large-eddy simulation of turbulent flows
NASA Astrophysics Data System (ADS)
Bui, Trong Tri
1998-11-01
A parallel unstructured finite volume algorithm is developed for large-eddy simulation of compressible turbulent flows. Major components of the algorithm include piecewise linear least-square reconstruction of the unknown variables, trilinear finite element interpolation for the spatial coordinates, Roe flux difference splitting, and second-order MacCormack explicit time marching. The computer code is designed from the start to take full advantage of the additional computational capability provided by the current parallel computer systems. Parallel implementation is done using the message passing programming model and message passing libraries such as the Parallel Virtual Machine (PVM) and Message Passing Interface (MPI). The development of the numerical algorithm is presented in detail. The parallel strategy and issues regarding the implementation of a flow simulation code on the current generation of parallel machines are discussed. The results from parallel performance studies show that the algorithm is well suited for parallel computer systems that use the message passing programming model. Nearly perfect parallel speedup is obtained on MPP systems such as the Cray T3D and IBM SP2. Performance comparison with older supercomputer systems such as the Cray YMP shows that the simulations done on the parallel systems are approximately 10 to 30 times faster. The results of the accuracy and performance studies for the current algorithm are reported. To validate the flow simulation code, a number of Euler and Navier-Stokes simulations are done for internal duct flows. Inviscid Euler simulation of a very small amplitude acoustic wave interacting with a shock wave in a quasi-1D convergent-divergent nozzle shows that the algorithm is capable of simultaneously tracking the very small disturbances of the acoustic wave and capturing the shock wave.
Navier-Stokes simulations are made for fully developed laminar flow in a square duct, developing laminar flow in a rectangular duct, and developing laminar flow in a 90-degree square bend. The Navier-Stokes solutions show good agreement with available analytical solutions and experimental data. To validate the flow simulation code for turbulence simulation, LES of fully-developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. The accuracy of the above algorithm for turbulence simulations is evaluated by comparison with the DNS solution. The effects of grid resolution, upwind numerical dissipation, and subgrid scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux difference splitting dissipation adversely affects the accuracy of the turbulence simulation. This problem is unique to the turbulence simulation, since it does not occur in the Euler and laminar Navier-Stokes simulations using the same code. For accurate turbulence simulation, it is found that only three to five percent of the standard Roe flux difference splitting dissipation is needed.
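The second-order MacCormack explicit time marching named among the algorithm's components can be illustrated on a far simpler problem than the one in the thesis. The sketch below applies the classical two-stage MacCormack scheme to 1-D linear advection on a periodic grid; the 1-D finite-difference setting, the function name `maccormack_step`, and the Courant number `nu` are assumptions chosen for brevity and are not taken from the 3-D unstructured finite volume code described above.

```python
def maccormack_step(u, nu):
    """Advance u by one time step of u_t + c u_x = 0 on a periodic grid.

    nu = c * dt / dx is the Courant number (stable for |nu| <= 1).
    """
    n = len(u)
    # Predictor: forward difference on the current values.
    pred = [u[i] - nu * (u[(i + 1) % n] - u[i]) for i in range(n)]
    # Corrector: backward difference on the predicted values, averaged with the old state.
    # pred[i - 1] wraps around to the last cell when i == 0 (periodic boundary).
    return [0.5 * (u[i] + pred[i] - nu * (pred[i] - pred[i - 1])) for i in range(n)]

# With nu = 1 the scheme shifts the profile by exactly one cell per step.
pulse = [0.0, 1.0, 0.0, 0.0]
shifted = maccormack_step(pulse, nu=1.0)

# With nu < 1 the profile is smeared, but the total is still conserved.
profile = [0.0, 0.2, 1.0, 0.2, 0.0, 0.0, 0.0, 0.0]
advanced = profile
for _ in range(10):
    advanced = maccormack_step(advanced, nu=0.5)
```

The update of each cell only reads its immediate neighbours, which is why explicit schemes of this kind map well onto message-passing machines: a partitioned grid needs to exchange only a thin layer of boundary cells per stage.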
Parallel finite element simulation of mooring forces on floating objects
NASA Astrophysics Data System (ADS)
Aliabadi, S.; Abedi, J.; Zellars, B.
2003-03-01
The coupling between the equations governing the free-surface flows, the six degrees of freedom non-linear rigid body dynamics, the linear elasticity equations for mesh-moving and the cables has resulted in a fluid-structure interaction technology capable of simulating mooring forces on floating objects. The finite element solution strategy is based on a combination approach derived from fixed-mesh and moving-mesh techniques. Here, the free-surface flow simulations are based on the Navier-Stokes equations written for two incompressible fluids where the impact of one fluid on the other one is extremely small. An interface function with two distinct values is used to locate the position of the free-surface. The stabilized finite element formulations are written and integrated in an arbitrary Lagrangian-Eulerian domain. This allows us to handle the motion of the time dependent geometries. Forces and moments exerted on the floating object by both water and hawsers are calculated and used to update the position of the floating object in time. In the mesh moving scheme, we assume that the computational domain is made of elastic materials. The linear elasticity equations are solved to obtain the displacements for each computational node. The non-linear rigid body dynamics equations are coupled with the governing equations of fluid flow and are solved simultaneously to update the position of the floating object. The numerical examples include a 3D simulation of water waves impacting on a moored floating box and a model boat, and a simulation of a floating object under water constrained with a cable.
Strong-strong beam-beam simulation on parallel computer
Qiang, Ji
2004-08-02
The beam-beam interaction puts a strong limit on the luminosity of the high energy storage ring colliders. At the interaction points, the electromagnetic fields generated by one beam focus or defocus the opposite beam. This can cause beam blowup and a reduction of luminosity. An accurate simulation of the beam-beam interaction is needed to help optimize the luminosity in high energy colliders.
Characterization of parallel-hole collimator using Monte Carlo Simulation
Pandey, Anil Kumar; Sharma, Sanjay Kumar; Karunanithi, Sellam; Kumar, Praveen; Bal, Chandrasekhar; Kumar, Rakesh
2015-01-01
Objective: Accuracy of in vivo activity quantification improves after the correction of penetrated and scattered photons. However, accurate assessment is not possible with physical experiment. We have used Monte Carlo Simulation to accurately assess the contribution of penetrated and scattered photons in the photopeak window. Materials and Methods: Simulations were performed with Simulation of Imaging Nuclear Detectors Monte Carlo Code. The simulations were set up in such a way that each simulation provides geometric, penetration, and scatter components and writes binary images to a data file. These components were analyzed graphically using Microsoft Excel (Microsoft Corporation, USA). Each binary image was imported into software (ImageJ) and a logarithmic transformation was applied for visual assessment of image quality, plotting a profile across the center of the images, and calculating full width at half maximum (FWHM) in horizontal and vertical directions. Results: The geometric, penetration, and scatter components at 140 keV for the low-energy general-purpose collimator were 93.20%, 4.13%, and 2.67%, respectively. Similarly, the geometric, penetration, and scatter components at 140 keV for the low-energy high-resolution (LEHR), medium-energy general-purpose (MEGP), and high-energy general-purpose (HEGP) collimators were (94.06%, 3.39%, 2.55%), (96.42%, 1.52%, 2.06%), and (96.70%, 1.45%, 1.85%), respectively. For the MEGP collimator at 245 keV and the HEGP collimator at 364 keV, the components were 89.10%, 7.08%, 3.82% and 67.78%, 18.63%, 13.59%, respectively. Conclusion: Low-energy general-purpose and LEHR collimators are best suited to imaging 140 keV photons. HEGP can be used for 245 keV and 364 keV; however, correction for penetration and scatter must be applied if one wants to quantify the in vivo activity at 364 keV. Due to heavy penetration and scattering, 511 keV photons should not be imaged with the HEGP collimator. PMID:25829730
Parallel simulation of tsunami inundation on a large-scale supercomputer
NASA Astrophysics Data System (ADS)
Oishi, Y.; Imamura, F.; Sugawara, D.
2013-12-01
An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. 
In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the finite difference calculation, (2) communication between adjacent layers for the calculations to connect each layer, and (3) global communication to obtain the time step which satisfies the CFL condition in the whole domain. A preliminary test on the K computer showed the parallel efficiency on 1024 cores was 57% relative to 64 cores. We estimate that the parallel efficiency will be considerably improved by applying a 2-D domain decomposition instead of the present 1-D domain decomposition in future work. The present parallel tsunami model was applied to the 2011 Great Tohoku tsunami. The coarsest resolution layer covers a 758 km × 1155 km region with a 405 m grid spacing. A nesting of five layers was used with the resolution ratio of 1/3 between nested layers. The finest resolution region has 5 m resolution and covers most of the coastal region of Sendai city. To complete 2 hours of simulation time, the serial (non-parallel) computation took approximately 4 days on a workstation. To complete the same simulation on 1024 cores of the K computer, it took 45 minutes which is more than two times faster than real-time. This presentation discusses the updated parallel computational performance and the efficient use of the K computer when considering the characteristics of the tsunami inundation simulation model in relation to the characteristics and capabilities of the K computer.
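The load-balancing rule and the performance figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below allocates CPUs to nested layers in proportion to their grid-point counts using a largest-remainder rule, then evaluates the real-time factor (2 hours simulated in 45 minutes) and the speedup implied by 57% efficiency on 1024 cores relative to 64 cores. The function name `allocate_cpus`, the rounding rule, and the example grid-point counts are assumptions; only the 2 h, 45 min, 57%, 64 and 1024 figures come from the abstract.

```python
def allocate_cpus(grid_points, total_cpus):
    """Split total_cpus among layers in proportion to their grid-point counts.

    Uses largest-remainder rounding so the allocation always sums to total_cpus.
    A very small layer can receive zero CPUs; a real code would need a minimum.
    """
    total_points = sum(grid_points)
    exact = [total_cpus * g / total_points for g in grid_points]
    alloc = [int(e) for e in exact]                    # round down first
    leftover = total_cpus - sum(alloc)
    by_remainder = sorted(range(len(exact)), key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:                  # hand out the remaining CPUs
        alloc[k] += 1
    return alloc

# Hypothetical grid-point counts for three nested layers.
example_alloc = allocate_cpus([100, 300, 600], 10)

# Figures quoted in the abstract.
simulated_minutes = 2 * 60
wall_clock_minutes = 45
real_time_factor = simulated_minutes / wall_clock_minutes    # about 2.67x faster than real time

efficiency = 0.57                  # on 1024 cores, relative to 64 cores
relative_speedup = efficiency * 1024 / 64                     # versus an ideal value of 16
```

A relative speedup of about 9 against an ideal 16 is consistent with the authors' expectation that a 2-D domain decomposition would improve efficiency.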
NASA Technical Reports Server (NTRS)
Fijany, Amir (inventor); Bejczy, Antal K. (inventor)
1993-01-01
This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
Parallel FEM Simulation of Electromechanics in the Heart
NASA Astrophysics Data System (ADS)
Xia, Henian; Wong, Kwai; Zhao, Xiaopeng
2011-11-01
Cardiovascular disease is the leading cause of death in America. Computer simulation of the complicated dynamics of the heart could provide valuable quantitative guidance for diagnosis and treatment of heart problems. In this paper, we present an integrated numerical model which encompasses the interaction of cardiac electrophysiology, electromechanics, and mechanoelectrical feedback. The model is solved by the finite element method on a Linux cluster and the Cray XT5 supercomputer, Kraken. Dynamical influences between the effects of electromechanical coupling and mechanoelectrical feedback are shown.
Parallel implementation of molecular dynamics simulation for short-ranged interaction
NASA Astrophysics Data System (ADS)
Wu, Jong-Shinn; Hsu, Yu-Lin; Lee, Yun-Min
2005-08-01
A parallel molecular dynamics simulation method, designed for large-scale problems, employing dynamic spatial domain decomposition for short-ranged molecular interactions is proposed. In this parallel cellular molecular dynamics (PCMD) simulation method, the link-cell data structure is used to reduce the searching time required for forming the cut-off neighbor list as well as for domain decomposition, which utilizes the multi-level graph-partitioning technique. A simple threshold scheme (STS), in which workload imbalance is monitored and compared with some threshold value during the runtime, is proposed to decide the proper time for repartitioning the domain. The simulation code is implemented and tested on a distributed-memory parallel machine (a PC-cluster system). Parallel performance is studied using approximately one million L-J atoms in the condensed, vaporized and supercritical states. Results show that fairly good parallel efficiency at 49 processors can be obtained for the condensed and supercritical states (~60%), while it is comparatively lower for the vaporized state (~40%).
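The simple threshold scheme (STS) in this abstract amounts to a runtime check of a workload-imbalance measure against a threshold. A minimal sketch follows; the abstract does not define the imbalance measure or give a threshold value, so the definition used here (maximum load over mean load, minus one), the default of 0.2, and the function names are all assumptions.

```python
def load_imbalance(work):
    """Imbalance measure (assumed): max load over mean load, minus one. Zero means perfect balance."""
    mean = sum(work) / len(work)
    return max(work) / mean - 1.0

def should_repartition(work, threshold=0.2):
    """Trigger a domain repartition when the imbalance exceeds the threshold."""
    return load_imbalance(work) > threshold

# Hypothetical per-processor workloads (e.g. atoms per subdomain) at two points in a run.
balanced = [100, 100, 100, 100]
drifted = [100, 100, 100, 180]

decisions = [should_repartition(w) for w in (balanced, drifted)]
```

The threshold trades repartitioning cost against time lost to idle processors: a low value repartitions often, a high value tolerates more imbalance between repartitions.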
A parallel simulated annealing algorithm for standard cell placement on a hypercube computer
NASA Technical Reports Server (NTRS)
Jones, Mark Howard
1987-01-01
A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
NASA Astrophysics Data System (ADS)
Mizrah, E. A.; Tkachev, S. B.; Shtabel, N. V.
2015-10-01
Solar array simulators are nonlinear control systems designed to reproduce the static and dynamic characteristics of a solar array. Solar array characteristics depend on illumination, temperature, space environment and other factors. During ground testing of spacecraft power systems, it is difficult to achieve stable operation of the simulator with loads of different impedance over a wide range of load regulation. In this article the authors propose a method for investigating the absolute stability of processes in solar array simulators and present results of an absolute stability study for a solar array simulator with a continuous parallel-type power amplifier.
Zhao, Jinkui
2011-01-01
IB is a Monte Carlo simulation tool for aiding neutron scattering instrument designs. It is written in C++ and implemented under Parallel Virtual Machine. The program has a few basic components, or modules, that can be used to build a virtual neutron scattering instrument. More complex components, such as neutron guides and multichannel beam benders, can be constructed using the grouping technique unique to IB. Users can specify a collection of modules as a group. For example, a neutron guide can be constructed by grouping four neutron mirrors together that make up the four sides of the guide. IB's simulation engine ensures that neutrons entering a group will be properly operated upon by all members of the group. For simulations that require higher computer speed, the program can be run in parallel mode under the PVM architecture. Initially, the program was written for designing instruments on pulsed neutron sources; it has since been used to simulate reactor-based instruments as well.
Xyce parallel electronic simulator reference guide, Version 6.0.1.
Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David Gregory.
2014-01-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1].
On parallel random number generation for accelerating simulations of communication systems
NASA Astrophysics Data System (ADS)
Brugger, C.; Weithoffer, S.; de Schryver, C.; Wasenmüller, U.; Wehn, N.
2014-11-01
Powerful compute clusters and multi-core systems have become widely available in research and industry. This boost in usable computational power tempts people to run compute-intensive tasks on those clusters, either for speed or accuracy reasons. Monte Carlo simulations in particular, with their inherent parallelism, promise very high speedups. Nevertheless, the quality of Monte Carlo simulations strongly depends on the quality of the employed random numbers. In this work we present a comprehensive analysis of state-of-the-art pseudo random number generators like the MT19937 or the WELL generator used for parallel stream generation in different settings. These random number generators can be realized in hardware as well as in software and help to accelerate the analysis (or simulation) of communications systems. We show that it is possible to generate high-quality parallel random number streams with both generators, as long as some configuration constraints are met. We furthermore show that distributed simulations with those generator types are viable even at very high degrees of parallelism.
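As a rough illustration of parallel stream generation with MT19937, the sketch below gives each worker its own generator from Python's standard `random` module, whose core generator is the Mersenne Twister. The seeding recipe (hashing a base seed together with the worker index) and the helper name `worker_rng` are assumptions for the example; the abstract does not spell out the configuration constraints the authors found necessary, and distinct seeds alone do not prove that the streams are statistically independent.

```python
import hashlib
import random

def worker_rng(base_seed, worker_id):
    """Return a Mersenne Twister generator with a seed derived from (base_seed, worker_id)."""
    digest = hashlib.sha256(f"{base_seed}:{worker_id}".encode()).digest()
    return random.Random(int.from_bytes(digest, "big"))

# One generator per (emulated) worker; each draws from its own stream.
n_workers = 4
rngs = [worker_rng(2014, w) for w in range(n_workers)]
samples = [[rng.random() for _ in range(5)] for rng in rngs]

# Re-creating a worker's generator reproduces its stream, which keeps runs repeatable.
replayed = worker_rng(2014, 0).random()
```

Deriving every seed from one base seed keeps a whole distributed run reproducible from a single number, regardless of how many workers are started.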
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Owen, Jeffrey E.
1988-01-01
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
The midpoint method for parallelization of particle simulations Kevin J. Bowers and Ron O. Dror
Shaw, David E.
workload in MD is associated with the evaluation of electrostatic and van der Waals forces between allThe midpoint method for parallelization of particle simulations Kevin J. Bowers and Ron O. Dror D. Shawa D. E. Shaw Research, LLC, 39th Floor, Tower 45, 120 West 45th Street, New York, New York 10036
Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube
NASA Technical Reports Server (NTRS)
Joslin, Ronald D.; Zubair, Mohammad
1993-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. As the number of processors increases, nearly ideal linear speedups are achieved with nonoptimized routines; slower than linear speedups are achieved with optimized (machine-dependent library) routines. This slower than linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and because the routine itself exhibits less than ideal speedups. However, with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise, wall-normal, and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single processor time to complete a comparable simulation; however, it is estimated that a subgrid-scale model, which reduces the required number of grid points and turns the approach into a large-eddy simulation (PSLES), would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.
Robust large-scale parallel nonlinear solvers for simulations.
Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson
2005-11-01
This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. 
The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write and easily portable. However, the method usually takes twice as long to solve as Newton-GMRES on general problems because it solves two linear systems at each iteration. In this paper, we discuss modifications to Bouaricha's method for a practical implementation, including a special globalization technique and other modifications for greater efficiency. We present numerical results showing computational advantages over Newton-GMRES on some realistic problems. We further discuss a new approach for dealing with singular (or ill-conditioned) matrices. In particular, we modify an algorithm for identifying a turning point so that an increasingly ill-conditioned Jacobian does not prevent convergence.
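The Broyden update described in the abstract above, in which the Jacobian is replaced by an approximation that is corrected after every step, can be sketched in a few lines. This is a dense, small-scale version of the classical ("good") Broyden method written in plain Python; it is not the limited-memory large-scale variant the report develops. The function names, the one-time finite-difference initialization of the approximation, and the two-equation test problem are assumptions chosen for illustration.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ZeroDivisionError("Jacobian approximation is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fd_jacobian(f: Callable[[Vector], Vector], x: Vector, h: float = 1e-7) -> Matrix:
    """Forward-difference Jacobian, used once to initialize the approximation."""
    n = len(x)
    f0 = f(x)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = x[:]
        xp[j] += h
        fp = f(xp)
        for i in range(n):
            jac[i][j] = (fp[i] - f0[i]) / h
    return jac


def broyden(f: Callable[[Vector], Vector], x0: Vector,
            tol: float = 1e-10, max_iter: int = 50) -> Vector:
    """Find a root of f using Broyden's rank-one Jacobian updates."""
    x = list(x0)
    fx = f(x)
    b = fd_jacobian(f, x)
    n = len(x)
    for _ in range(max_iter):
        if max(abs(v) for v in fx) < tol:
            break
        step = solve_linear(b, [-v for v in fx])
        x_new = [xi + si for xi, si in zip(x, step)]
        f_new = f(x_new)
        y = [fn - fo for fn, fo in zip(f_new, fx)]
        bs = [sum(bij * sj for bij, sj in zip(row, step)) for row in b]
        ss = sum(s * s for s in step)
        # Rank-one secant update: B += (y - B s) s^T / (s^T s)
        for i in range(n):
            for j in range(n):
                b[i][j] += (y[i] - bs[i]) * step[j] / ss
        x, fx = x_new, f_new
    return x


def circle_line(v: Vector) -> Vector:
    """Hypothetical test system: a circle of radius 2 intersected with y = x."""
    x, y = v
    return [x * x + y * y - 4.0, x - y]


if __name__ == "__main__":
    print(broyden(circle_line, [2.0, 1.0]))
```

After initialization the loop never evaluates a true Jacobian, which is the property that makes the approach attractive when the Jacobian is unavailable or inaccurate. A limited-memory variant would store the rank-one corrections instead of the dense matrix.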
NASA Astrophysics Data System (ADS)
Wang, Shyh-Wei; Guo, Shuang-Fa
1998-07-01
A stepwise Boltzmann transport equation (BTE) simulation using a non-uniform energy grid momentum matrix and exact nuclear scattering cross-section is successfully parallelized to simulate the ion implantation of multi-component targets. Assuming that the interactions of ions with different target atoms are independent, the scattering of ions with different components can be calculated concurrently by different processors. It was developed on a CONVEX SPP-1000 in the parallel virtual machine (PVM) software environment with a master-slave paradigm. A speedup of 3.3 has been obtained for the simulation of As ions implanted into AZ1350 (C6.2H6O1N0.15S0.06), which is composed of five components. In addition, our new scheme gives better agreement with the experimental results for heavy ion implantation than the conventional method using a uniform energy grid and approximated scattering function.
Re-forming supercritical quasi-parallel shocks. I - One- and two-dimensional simulations
NASA Technical Reports Server (NTRS)
Thomas, V. A.; Winske, D.; Omidi, N.
1990-01-01
The process of reforming supercritical quasi-parallel shocks is investigated using one-dimensional and two-dimensional hybrid (particle ion, massless fluid electron) simulations both of shocks and of simpler two-stream interactions. It is found that the supercritical quasi-parallel shock is not steady. Instead of a well-defined shock ramp between upstream and downstream states that remains at a fixed position in the flow, the ramp periodically steepens, broadens, and then reforms upstream of its former position. It is concluded that the wave generation process is localized at the shock ramp and that the reformation process proceeds in the absence of upstream perturbations intersecting the shock.
NASA Astrophysics Data System (ADS)
Abraham, Mark James; Murtola, Teemu; Schulz, Roland; Páll, Szilárd; Smith, Jeremy C.; Hess, Berk; Lindahl, Erik
2015-09-01
GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU-GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.
NASA Technical Reports Server (NTRS)
Lyons, Daniel T.; Desai, Prasun N.
2005-01-01
This paper will describe the Entry, Descent and Landing simulation tradeoffs and techniques that were used to provide the Monte Carlo data required to approve entry during a critical period just before entry of the Genesis Sample Return Capsule. The same techniques will be used again when Stardust returns on January 15, 2006. Only one hour was available for the simulation which propagated 2000 dispersed entry states to the ground. Creative simulation tradeoffs combined with parallel processing were needed to provide the landing footprint statistics that were an essential part of the Go/NoGo decision that authorized release of the Sample Return Capsule a few hours before entry.
Design of a real-time wind turbine simulator using a custom parallel architecture
NASA Technical Reports Server (NTRS)
Hoffman, John A.; Gluck, R.; Sridhar, S.
1995-01-01
The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU allows several tasks to be done in each cycle, including an IO operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's are interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors, which usually have a throughput limit because of rigid bus architecture.
Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.
2001-08-31
This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented in the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-one-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce better-resolved results and reveal some flow patterns that cannot be obtained using coarse-grid models.
Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN
NASA Astrophysics Data System (ADS)
Hammond, G. E.; Lichtner, P. C.; Mills, R. T.
2014-01-01
To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted.
Discrete Event Dyn Syst manuscript No. (will be inserted by the editor)
Egerstedt, Magnus
. The paper concludes with a simulation of agents performing a drumline-inspired dance using decentralized the drumline-inspired multi-agent dance shown in Figure 1. One way to go about it would be to have agents
Improving the Performance of the Extreme-scale Simulator
Engelmann, Christian; Naughton, III, Thomas J
2014-01-01
Investigating the performance of parallel applications at scale on future high-performance computing (HPC) architectures and the performance impact of different architecture choices is an important component of HPC hardware/software co-design. The Extreme-scale Simulator (xSim) is a simulation-based toolkit for investigating the performance of parallel applications at scale. xSim scales to millions of simulated Message Passing Interface (MPI) processes. The overhead introduced by a simulation tool is an important performance and productivity aspect. This paper documents two improvements to xSim: (1) a new deadlock resolution protocol to reduce the parallel discrete event simulation management overhead and (2) a new simulated MPI message matching algorithm to reduce the oversubscription management overhead. The results clearly show a significant performance improvement, such as reducing the simulation overhead for running the NAS Parallel Benchmark suite inside the simulator from 1,020% to 238% for the conjugate gradient (CG) benchmark and from 102% to 0% for the embarrassingly parallel (EP) benchmark, as well as from 37,511% to 13,808% for CG and from 3,332% to 204% for EP with accurate process failure simulation.
Parallel-vector algorithms for particle simulations on shared-memory multiprocessors
Nishiura, Daisuke; Sakaguchi, Hide
2011-03-01
Over the last few decades, the computational demands of massive particle-based simulations for both scientific and industrial purposes have been continuously increasing. Hence, considerable efforts are being made to develop parallel computing techniques on various platforms. In such simulations, particles freely move within a given space, and so on a distributed-memory system, load balancing, i.e., assigning an equal number of particles to each processor, is not guaranteed. Shared-memory systems, by contrast, achieve better load balancing for particle models, but suffer from the intrinsic drawback of memory access competition, particularly during (1) pairing of contact candidates from among neighboring particles and (2) force summation for each particle. Here, novel algorithms are proposed to overcome these two problems. For the first problem, the key is a pre-conditioning process during which particle labels are sorted by a cell label in the domain to which the particles belong. Then, a list of contact candidates is constructed by pairing the sorted particle labels. For the latter problem, a table comprising the list indexes of the contact candidate pairs is created and used to sum the contact forces acting on each particle for all contacts according to Newton's third law. With just these methods, memory access competition is avoided without additional redundant procedures. The parallel efficiency and compatibility of these two algorithms were evaluated in discrete element method (DEM) simulations on four types of shared-memory parallel computers: a multicore multiprocessor computer, scalar supercomputer, vector supercomputer, and graphics processing unit. The computational efficiency of a DEM code was found to be drastically improved with our algorithms on all but the scalar supercomputer. Thus, the developed parallel algorithms are useful on shared-memory parallel computers with sufficient memory bandwidth.
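The two steps named in this abstract, pairing contact candidates after sorting particle labels by cell and summing forces through a per-particle table of pair indexes, can be illustrated with a small serial sketch. It is only loosely modeled on the description above and is not the authors' algorithm: the function names, the 2-D setting, the linear-spring contact force and the `stiffness` parameter are all assumptions. The point it shows is that each pair force is computed once and each particle then only reads (gathers) the forces of its own pairs, so no two particles would ever write to the same memory location.

```python
import math
from collections import defaultdict
from typing import List, Tuple

Point = Tuple[float, float]


def build_pairs(pos: List[Point], cell: float) -> List[Tuple[int, int]]:
    """List candidate pairs (i, j), i < j, from the same or adjacent cells."""
    def cell_of(p: Point) -> Tuple[int, int]:
        return (int(p[0] // cell), int(p[1] // cell))

    # Step 1: sort particle labels by the label of the cell they occupy.
    order = sorted(range(len(pos)), key=lambda i: cell_of(pos[i]))
    cells = defaultdict(list)
    for i in order:
        cells[cell_of(pos[i])].append(i)

    # Step 2: pair the sorted labels with those in neighboring cells.
    pairs = []
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j:
                            pairs.append((i, j))
    return sorted(pairs)


def contact_forces(pos: List[Point], radius: float,
                   stiffness: float = 1.0) -> List[Point]:
    """Sum repulsive contact forces per particle via a pair-index table."""
    pairs = build_pairs(pos, cell=2.0 * radius)

    # One force per pair: the force acting on j due to i (zero if no overlap).
    pair_force = []
    for i, j in pairs:
        dx, dy = pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]
        dist = math.hypot(dx, dy)
        overlap = 2.0 * radius - dist
        if overlap > 0.0 and dist > 0.0:
            scale = stiffness * overlap / dist
            pair_force.append((scale * dx, scale * dy))
        else:
            pair_force.append((0.0, 0.0))

    # Table of (pair index, sign) per particle; Newton's third law gives
    # particle i the opposite of the force that acts on particle j.
    table = [[] for _ in pos]
    for k, (i, j) in enumerate(pairs):
        table[i].append((k, -1.0))
        table[j].append((k, 1.0))

    # Gather step: each particle reads only its own table entries.
    return [(sum(s * pair_force[k][0] for k, s in entries),
             sum(s * pair_force[k][1] for k, s in entries))
            for entries in table]


if __name__ == "__main__":
    print(contact_forces([(0.0, 0.0), (1.5, 0.0), (5.0, 5.0)], radius=1.0))
```

In a shared-memory implementation the pair-force loop and the gather loop could each be split across threads without locks, since every iteration writes to a distinct output slot.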
Parallel electric fields in a simulation of magnetotail reconnection and plasmoid evolution
Hesse, M.; Birn, J.
1989-01-01
We investigate properties of the electric field component parallel to the magnetic field ($E_\parallel$) in a three-dimensional MHD simulation of plasmoid formation and evolution in the magnetotail in the presence of a net dawn-dusk magnetic field component. We emphasize particularly the spatial location of $E_\parallel$, the concept of a diffusion zone and the role of $E_\parallel$ in accelerating electrons. We find a localization of the region of enhanced $E_\parallel$ in all space directions with a strong concentration in the z direction. We identify this region as the diffusion zone, which plays a crucial role in reconnection theory through the local breakdown of magnetic flux conservation. The presence of $B_y$ implies a north-south asymmetry of the injection of accelerated particles into the near-earth region, if the net $B_y$ field is strong enough to force particles to follow field lines through the diffusion region. We estimate that for a typical net $B_y$ field this should affect the injection of electrons into the near-earth dawn region, so that precipitation into the northern (southern) hemisphere should dominate for duskward (dawnward) net $B_y$. In addition, we observe a spatial clottiness of the expected injection of adiabatic particles which could be related to the appearance of bright spots in auroras. 12 refs., 9 figs.
Relevance of the parallel nonlinearity in gyrokinetic simulations of tokamak plasmas
Candy, J.; Waltz, R. E.; Parker, S. E.; Chen, Y.
2006-07-15
The influence of the parallel nonlinearity on transport in gyrokinetic simulations is assessed for values of $\rho_*$ which are typical of current experiments. Here, $\rho_* = \rho_s/a$ is the ratio of gyroradius, $\rho_s$, to plasma minor radius, $a$. The conclusion, derived from simulations with both GYRO [J. Candy and R. E. Waltz, J. Comput. Phys., 186, 585 (2003)] and GEM [Y. Chen and S. E. Parker J. Comput. Phys., 189, 463 (2003)] is that no measurable effect of the parallel nonlinearity is apparent for $\rho_* < 0.012$. This result is consistent with scaling arguments, which suggest that the parallel nonlinearity should be $O(\rho_*)$ smaller than the $E \times B$ nonlinearity. Indeed, for the plasma parameters under consideration, the magnitude of the parallel nonlinearity is a factor of $8\rho_*$ smaller (for $0.00075 < \rho_* < 0.012$) than the other retained terms in the nonlinear gyrokinetic equation.
A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows
NASA Technical Reports Server (NTRS)
Bui, Trong T.
1999-01-01
A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
Parallel Solutions for Voxel-Based Simulations of Reaction-Diffusion Systems
D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan
2014-01-01
There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This awareness has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena taking into account both spatial effects and noise. However, the high computational effort characterizing simulation approaches, coupled with the necessity to simulate the models several times to achieve statistically relevant information on the model behaviours, makes such algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we present how the characteristics of the algorithm can be exploited for an effective parallelization on the present heterogeneous HPC architectures. PMID:25045716
Schuchardt, Karen L.; Agarwal, Khushbu; Chase, Jared M.; Rockhold, Mark L.; Freedman, Vicky L.; Elsethagen, Todd O.; Scheibe, Timothy D.; Chin, George; Sivaramakrishnan, Chandrika
2010-07-15
The Support Architecture for Large-Scale Subsurface Analysis (SALSSA) provides an extensible framework, sophisticated graphical user interface, and underlying data management system that simplifies the process of running subsurface models, tracking provenance information, and analyzing the model results. Initially, SALSSA supported two styles of job control: user directed execution and monitoring of individual jobs, and load balancing of jobs across multiple machines taking advantage of many available workstations. Recent efforts in subsurface modelling have been directed at advancing simulators to take advantage of leadership class supercomputers. We describe two approaches, current progress, and plans toward enabling efficient application of the subsurface simulator codes via the SALSSA framework: automating sensitivity analysis problems through task parallelism, and task parallel parameter estimation using the PEST framework.
Object-Oriented Parallel Particle-in-Cell Code for Beam Dynamics Simulation in Linear Accelerators
Qiang, J.; Ryne, R.D.; Habib, S.; Decky, V.
1999-11-13
In this paper, we present an object-oriented three-dimensional parallel particle-in-cell code for beam dynamics simulation in linear accelerators. A two-dimensional parallel domain decomposition approach is employed within a message passing programming paradigm along with a dynamic load balancing. Implementing object-oriented software design provides the code with better maintainability, reusability, and extensibility compared with conventional structure based code. This also helps to encapsulate the details of communications syntax. Performance tests on SGI/Cray T3E-900 and SGI Origin 2000 machines show good scalability of the object-oriented code. Some important features of this code also include employing symplectic integration with linear maps of external focusing elements and using z as the independent variable, typical in accelerators. A successful application was done to simulate beam transport through three superconducting sections in the APT linac design.
Massively parallel molecular dynamics simulations of two-dimensional materials at high strain rates
Wagner, N.J. (Dept. of Chemical Engineering); Holian, B.L.
1992-01-01
Large-scale molecular dynamics simulations on a massively parallel computer are performed to investigate the mechanical behavior of 2-dimensional materials. A pair potential and a model embedded atom many-body potential are examined, corresponding to "brittle" and "ductile" materials, respectively. A parallel MD algorithm is developed to exploit the architecture of the Connection Machine, enabling simulations of more than $10^6$ atoms. A model spallation experiment is performed on a 2-D triangular crystal with a well-defined nanocrystalline defect on the spall plane. The process of spallation is modelled as a uniform adiabatic expansion. The spall strength is shown to be proportional to the logarithm of the applied strain rate and a dislocation dynamics model is used to explain the results. Good predictions for the onset of spallation in the computer experiments are found from the simple model. The nanocrystal defect affects the propagation of the shock front and failure is enhanced along the grain boundary.
Adaptive finite element simulation of flow and transport applications on parallel computers
NASA Astrophysics Data System (ADS)
Kirk, Benjamin Shelton
The subject of this work is the adaptive finite element simulation of problems arising in flow and transport applications on parallel computers. Of particular interest are new contributions to adaptive mesh refinement (AMR) in this parallel high-performance context, including novel work on data structures, treatment of constraints in a parallel setting, generality and extensibility via object-oriented programming, and the design/implementation of a flexible software framework. This technology and software capability then enables more robust, reliable treatment of multiscale-multiphysics problems and specific studies of fine scale interaction such as those in biological chemotaxis (Chapter 4) and high-speed shock physics for compressible flows (Chapter 5). The work begins by presenting an overview of key concepts and data structures employed in AMR simulations. Of particular interest is how these concepts are applied in the physics-independent software framework which is developed here and is the basis for all the numerical simulations performed in this work. This open-source software framework has been adopted by a number of researchers in the U.S. and abroad for use in a wide range of applications. The dynamic nature of adaptive simulations poses particular issues for efficient implementation on distributed-memory parallel architectures. Communication cost, computational load balance, and memory requirements must all be considered when developing adaptive software for this class of machines. Specific extensions to the adaptive data structures to enable implementation on parallel computers are therefore considered in detail. The libMesh framework for performing adaptive finite element simulations on parallel computers is developed to provide a concrete implementation of the above ideas. 
This physics-independent framework is applied to two distinct flow and transport applications classes in the subsequent application studies to illustrate the flexibility of the design and to demonstrate the capability for resolving complex multiscale processes efficiently and reliably. The first application considered is the simulation of chemotactic biological systems such as colonies of Escherichia coli. This work appears to be the first application of AMR to chemotactic processes. These systems exhibit transient, highly localized features and are important in many biological processes, which make them ideal for simulation with adaptive techniques. A nonlinear reaction-diffusion model for such systems is described and a finite element formulation is developed. The solution methodology is described in detail. Several phenomenological studies are conducted to study chemotactic processes and resulting biological patterns which use the parallel adaptive refinement capability developed in this work. The other application study is much more extensive and deals with fine scale interactions for important hypersonic flows arising in aerospace applications. These flows are characterized by highly nonlinear, convection-dominated flowfields with very localized features such as shock waves and boundary layers. These localized features are well-suited to simulation with adaptive techniques. A novel treatment of the inviscid flux terms arising in a streamline-upwind Petrov-Galerkin finite element formulation of the compressible Navier-Stokes equations is also presented and is found to be superior to the traditional approach. The parallel adaptive finite element formulation is then applied to several complex flow studies, culminating in fully three-dimensional viscous flows about complex geometries such as the Space Shuttle Orbiter. 
Physical phenomena such as viscous/inviscid interaction, shock wave/boundary layer interaction, shock/shock interaction, and unsteady acoustic-driven flowfield response are considered in detail. A computational investigation of a 25°/55° double cone configuration details the complex multiscale flow features and investigates a potential source of experimentally-observed unsteady flowfield response.
NASA Technical Reports Server (NTRS)
Campbell, David; Wysong, Ingrid; Kaplan, Carolyn; Mott, David; Wadsworth, Dean; VanGilder, Douglas
2000-01-01
An AFRL/NRL team has recently been selected to develop a scalable, parallel, reacting, multidimensional (SUPREM) Direct Simulation Monte Carlo (DSMC) code for the DoD user community under the High Performance Computing Modernization Office (HPCMO) Common High Performance Computing Software Support Initiative (CHSSI). This paper will introduce the JANNAF Exhaust Plume community to this three-year development effort and present the overall goals, schedule, and current status of this new code.
Construction of a parallel processor for simulating manipulators and other mechanical systems
NASA Technical Reports Server (NTRS)
Hannauer, George
1991-01-01
This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.
Furumura, Takashi
Simulator for seismic waves from the Kobe earthquake provided a very good reproduction of strong ground earthquake has occurred for the past 80 years. Simulation results for a hypothetical earthquake in TokyoJournal of the Earth Simulator, Volume 3, September 2005, 2938 29 Large-scale parallel simulation
Spontaneous Hot Flow Anomalies at Quasi-Parallel Shocks: 2. Hybrid Simulations
NASA Technical Reports Server (NTRS)
Omidi, N.; Zhang, H.; Sibeck, D.; Turner, D.
2013-01-01
Motivated by recent THEMIS observations, this paper uses 2.5-D electromagnetic hybrid simulations to investigate the formation of Spontaneous Hot Flow Anomalies (SHFA) upstream of quasi-parallel bow shocks during steady solar wind conditions and in the absence of discontinuities. The results show the formation of a large number of structures along and upstream of the quasi-parallel bow shock. Their outer edges exhibit density and magnetic field enhancements, while their cores exhibit drops in density, magnetic field, solar wind velocity and enhancements in ion temperature. Using virtual spacecraft in the simulation, we show that the signatures of these structures in the time series data are very similar to those of SHFAs seen in THEMIS data and conclude that they correspond to SHFAs. Examination of the simulation data shows that SHFAs form as the result of foreshock cavitons interacting with the bow shock. Foreshock cavitons in turn form due to the nonlinear evolution of ULF waves generated by the interaction of the solar wind with the backstreaming ions. Because foreshock cavitons are an inherent part of the shock dissipation process, the formation of SHFAs is also an inherent part of the dissipation process leading to a highly non-uniform plasma in the quasi-parallel magnetosheath including large scale density and magnetic field cavities.
Holkundkar, Amol R.
2013-11-15
The objective of this article is to report the parallel implementation of a 3D molecular dynamics simulation code for laser-cluster interactions. The code has been benchmarked by comparing the simulation results with some of the experiments reported in the literature. Scaling laws for the computational time are established by varying the number of processor cores and the number of macroparticles used. The capabilities of the code are highlighted by implementing various diagnostic tools. To study the dynamics of the laser-cluster interactions, the executable version of the code is available from the author.
NASA Astrophysics Data System (ADS)
Honkonen, I.
2015-03-01
I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring modification of existing code. This is an advantage for the development and testing of, e.g., geoscientific software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. An implementation of the generic simulation cell method presented here, generic simulation cell class (gensimcell), also includes support for parallel programming by allowing model developers to select which simulation variables of, e.g., a domain-decomposed model to transfer between processes via a Message Passing Interface (MPI) library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class requires a C++ compiler that supports a version of the language standardized in 2011 (C++11). The code is available at https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those who do are kindly requested to acknowledge and cite this work.
Embedded microclusters in zeolites and cluster beam sputtering: Simulation on parallel computers
NASA Astrophysics Data System (ADS)
Vashishta, P.; Kalia, R. K.; Greenwell, D. L.
1994-09-01
We have designed a time-space multiresolution approach for large-scale molecular-dynamics (MD) simulations involving long-range Coulomb forces and three-body interactions. This approach has been implemented on various parallel architectures including the 512-node Intel Touchstone Delta at Caltech and the 128-processor IBM SP1 at Argonne National Laboratory. Parallel MD simulations involving 1.12-million particles have been performed to investigate the pore interface growth and the roughness of fracture surfaces in porous silica. When the mass density is reduced to a critical value, pores grow catastrophically to cause fracture. The roughness exponent for internally fractured surfaces, (alpha) = 0.87 +/- 0.02, supports experimental claims about the universality of (alpha). A reliable interatomic potential has been developed for MD simulations of Si3N4. The nature of phonon densities-of-states due to low-energy floppy modes in crystalline and glassy states has been investigated. Floppy modes appear continuously in the glass as the connectivity of the system is reduced. In the crystal, they appear suddenly at 30% volume expansion. The density-of-states due to floppy modes varies linearly with energy, and the specific heat is significantly enhanced by these modes. Thermal conductivities of ceramic materials are calculated with a nonequilibrium MD method and the Kubo-Greenwood formula using a parallel eigensolver and the parallel MD approach. The calculations for amorphous silica agree well with experiments over a very wide range of temperatures above the plateau region. Currently, we are investigating thermal transport mechanisms in technologically important materials - porous glasses, nanophase ceramics, and zeolites.
Massively parallel Monte Carlo for many-particle simulations on GPUs
Anderson, Joshua A.; Jankowski, Eric; Grubb, Thomas L.; Engel, Michael; Glotzer, Sharon C.
2013-12-01
Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.
Parallelization of the Nanoscale Device Simulator nanoMOS2.0 Using a 100 Nodes Linux Cluster
Butt, Ali R.
MOSFETs [2]. Parallelizing a Matlab code does not produce the most efficient simulations the energy grid over several processors. A 88% speed-up is achieved using the Parallel Matlab Interface . I written in Matlab, and it has been a very easy way to investigate the physics of nanoscale double gate
Gobbert, Matthias K.
a Parallelized Genetic Algorithm Joseph Cornish*, Robert Forder**, Ivan Erill*, Matthias K. Gobbert** *Department a genetic algorithm in parallel using a server-client organization to simulate the evolution are not able to recognize correlation information in binding sites. We implement a genetic algorithm
Stanford University
Simulation of Earthquake Liquefaction Response on Parallel Computers Jun Peng, Jinchi Lu, Kincho H. Law and Ahmed Elgamal INTRODUCTION Simulations of earthquake responses and liquefaction effects models. Large-scale earthquake simulations are not feasible on most current single processor computers
Parallel, adaptive, multi-object trajectory integrator for space simulation applications
NASA Astrophysics Data System (ADS)
Atanassov, Atanas Marinov
2014-10-01
Computer simulation is a very helpful approach for improving results from space-borne experiments. Initial-value problems (IVPs) can be applied for modeling the dynamics of different objects: artificial Earth satellites, charged particles in magnetic and electric fields, charged or non-charged dust particles, and space debris. An integrator for systems of ordinary differential equations (ODESs), based on embedded Runge-Kutta-Fehlberg methods of different orders, is developed. These methods enable evaluation of the local error. Instead of step-size control based on local error evaluation, an optimal integration method is selected. Integration while meeting the required local error proceeds with constant-sized steps. This optimal scheme selection reduces the amount of calculation needed for solving the IVPs. In addition, for an implementation on a multi-core processor with thread-based parallelization, we describe how to solve multiple systems of IVPs efficiently in parallel. The proposed integrator allows the application of a different force model for every object in multi-satellite simulation models. Simultaneous application of the integrator to different kinds of problems within one combined simulation model is possible too. The basic application of the integrator is solving mechanical IVPs in the context of simulation models, for use in complex multi-satellite space missions and as a design tool for experiments.
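The idea of choosing an integration scheme at a fixed step size, rather than shrinking the step, can be sketched as follows. This is a minimal illustration for a scalar ODE and not the author's integrator: the pairing of a Heun-Euler 2(1) scheme with the classic Runge-Kutta-Fehlberg 4(5) scheme, the function names, and the "cheapest scheme that meets the tolerance" rule are all assumptions made for the example.

```python
def heun_euler_step(f, t, y, h):
    """Embedded 2(1) pair: second-order result plus a local error estimate."""
    k1 = f(t, y)
    k2 = f(t + h, y + h * k1)
    y_low = y + h * k1                  # Euler, order 1
    y_high = y + 0.5 * h * (k1 + k2)    # Heun, order 2
    return y_high, abs(y_high - y_low)


def rkf45_step(f, t, y, h):
    """Classic Runge-Kutta-Fehlberg 4(5) pair for a scalar ODE."""
    k1 = f(t, y)
    k2 = f(t + h / 4, y + h * k1 / 4)
    k3 = f(t + 3 * h / 8, y + h * (3 * k1 + 9 * k2) / 32)
    k4 = f(t + 12 * h / 13,
           y + h * (1932 * k1 - 7200 * k2 + 7296 * k3) / 2197)
    k5 = f(t + h,
           y + h * (439 * k1 / 216 - 8 * k2 + 3680 * k3 / 513
                    - 845 * k4 / 4104))
    k6 = f(t + h / 2,
           y + h * (-8 * k1 / 27 + 2 * k2 - 3544 * k3 / 2565
                    + 1859 * k4 / 4104 - 11 * k5 / 40))
    y4 = y + h * (25 * k1 / 216 + 1408 * k3 / 2565
                  + 2197 * k4 / 4104 - k5 / 5)
    y5 = y + h * (16 * k1 / 135 + 6656 * k3 / 12825
                  + 28561 * k4 / 56430 - 9 * k5 / 50 + 2 * k6 / 55)
    # The difference between the two orders estimates the local error.
    return y5, abs(y5 - y4)


# Hypothetical scheme table, ordered from cheapest to most expensive.
SCHEMES = [("heun-euler", heun_euler_step), ("rkf45", rkf45_step)]


def fixed_step_with_cheapest_scheme(f, t, y, h, tol):
    """Advance one constant-size step with the cheapest scheme meeting tol.

    If no scheme meets the tolerance, the result of the last (highest-order)
    scheme is returned together with its error estimate.
    """
    for name, step in SCHEMES:
        y_new, err = step(f, t, y, h)
        if err <= tol:
            break
    return name, y_new, err
```

With a loose tolerance the cheap low-order pair is accepted; tightening the tolerance at the same step size switches to the higher-order pair instead of reducing `h`.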
Switching to High Gear: Opportunities for Grand-scale Real-time Parallel Simulations
Perumalla, Kalyan S
2009-01-01
The recent emergence of dramatically large computational power, spanning desktops with multi-core processors and multiple graphics cards to supercomputers with 10^5 processor cores, has suddenly resulted in simulation-based solutions trailing behind in the ability to fully tap the new computational capacity. Here, we motivate the need for switching the parallel simulation research to a higher gear to exploit the new, immense levels of computational power. The potential for grand-scale real-time solutions is illustrated using preliminary results from prototypes in four example application areas: (a) state- or regional-scale vehicular mobility modeling, (b) very large-scale epidemic modeling, (c) modeling the propagation of wireless network signals in very large, cluttered terrains, and (d) country- or world-scale social behavioral modeling. We believe the stage is perfectly poised for the parallel/distributed simulation community to envision and formulate similar grand-scale, real-time simulation-based solutions in many application areas.
Use of Parallel Micro-Platform for the Simulation the Space Exploration
NASA Astrophysics Data System (ADS)
Velasco Herrera, Victor Manuel; Velasco Herrera, Graciela; Rosano, Felipe Lara; Rodriguez Lozano, Salvador; Lucero Roldan Serrato, Karen
The purpose of this work is to create a parallel micro-platform that simulates the virtual movements of a space exploration in 3D. One of the innovations presented in this design is the application of a lever mechanism for the transmission of movement. The development of such a robot is a challenging task, very different from that of industrial manipulators because of a totally different set of requirements. This work presents the computer-aided study and simulation of the movement of this parallel manipulator. The model was developed using the computer-aided design platform Unigraphics, in which we carried out the geometric modeling of each component and of the final assembly (CAD), the generation of files for the computer-aided manufacture (CAM) of each piece, and the kinematic simulation of the system, evaluating different driving schemes. We used the MATLAB aerospace toolbox and created an adaptive control module to simulate the system.
NASA Astrophysics Data System (ADS)
Mosaddeghi, Hamid; Alavi, Saman; Kowsari, M. H.; Najafi, Bijan
2012-11-01
We use molecular dynamics simulations to study the structure, dynamics, and transport properties of nano-confined water between parallel graphite plates with separation distances (H) from 7 to 20 Å at different water densities with an emphasis on anisotropies generated by confinement. The behavior of the confined water phase is compared to non-confined bulk water under similar pressure and temperature conditions. Our simulations show anisotropic structure and dynamics of the confined water phase in directions parallel and perpendicular to the graphite plate. The magnitude of these anisotropies depends on the slit width H. Confined water shows "solid-like" structure and slow dynamics for the water layers near the plates. The mean square displacements (MSDs) and velocity autocorrelation functions (VACFs) for directions parallel and perpendicular to the graphite plates are calculated. By increasing the confinement distance from H = 7 Å to H = 20 Å, the MSD increases and the behavior of the VACF indicates that the confined water changes from solid-like to liquid-like dynamics. If the initial density of the water phase is set up using geometric criteria (i.e., distance between the graphite plates), large pressures (on the order of ~10 katm) and large pressure anisotropies are established within the water. By decreasing the density of the water between the confined plates to about 0.9 g cm^-3, bubble formation and restructuring of the water layers are observed.
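A direction-resolved MSD of the kind described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis code: the function name and trajectory layout are hypothetical, the plates are assumed to be normal to the z axis, and coordinates are assumed to be unwrapped (no periodic-boundary jumps).

```python
def directional_msd(trajectory, lag):
    """Mean square displacement split into in-plane and normal components.

    trajectory[frame][particle] is an (x, y, z) tuple. The graphite plates
    are assumed to lie in the xy plane, so xy is "parallel" and z is
    "perpendicular". Averages run over all particles and time origins.
    """
    n_frames = len(trajectory)
    if not 0 < lag < n_frames:
        raise ValueError("lag must be between 1 and n_frames - 1")
    parallel_sum = 0.0
    perpendicular_sum = 0.0
    count = 0
    for t0 in range(n_frames - lag):
        start, end = trajectory[t0], trajectory[t0 + lag]
        for (x0, y0, z0), (x1, y1, z1) in zip(start, end):
            parallel_sum += (x1 - x0) ** 2 + (y1 - y0) ** 2
            perpendicular_sum += (z1 - z0) ** 2
            count += 1
    return parallel_sum / count, perpendicular_sum / count
```

Comparing the two returned values as a function of `lag` is one simple way to quantify the anisotropy that confinement introduces.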
Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors
Aaby, Brandon G; Perumalla, Kalyan S; Seal, Sudip K
2010-01-01
An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering more than a 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular simulator in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.
Huss, Sorin A.
Estimation of signal activity in digital circuits based on multiple abstraction levels and massive parallel simulation techniques Werner W. Bachmann, and Sorin A. Huss Department of Computer Science, Integrated Circuits and Systems Laboratory Darmstadt University of Technology, 64283 Darmstadt, Germany
De Novo Ultrascale Atomistic Simulations On High-End Parallel Supercomputers
Nakano, A; Kalia, R K; Nomura, K; Sharma, A; Vashishta, P; Shimojo, F; van Duin, A; Goddard, III, W A; Biswas, R; Srivastava, D; Yang, L H
2006-09-04
We present a de novo hierarchical simulation framework for first-principles based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters. In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) simulations explore a wide solution space to discover microscopic mechanisms that govern macroscopic material properties, into which highly accurate quantum mechanical (QM) simulations are embedded to validate the discovered mechanisms and quantify the uncertainty of the solution. The framework includes an embedded divide-and-conquer (EDC) algorithmic framework for the design of linear-scaling simulation algorithms with minimal bandwidth complexity and tight error control. The EDC framework also enables adaptive hierarchical simulation with automated model transitioning assisted by graph-based event tracking. A tunable hierarchical cellular decomposition parallelization framework then maps the O(N) EDC algorithms onto Petaflops computers, while achieving performance tunability through a hierarchy of parameterized cell data/computation structures, as well as its implementation using hybrid Grid remote procedure call + message passing + threads programming. High-end computing platforms such as IBM BlueGene/L, SGI Altix 3000 and the NSF TeraGrid provide excellent test grounds for the framework. On these platforms, we have achieved unprecedented scales of quantum-mechanically accurate and well validated, chemically reactive atomistic simulations--1.06 billion-atom fast reactive force-field MD and 11.8 million-atom (1.04 trillion grid points) quantum-mechanical MD in the framework of the EDC density functional theory on adaptive multigrids--in addition to 134 billion-atom non-reactive space-time multiresolution MD, with the parallel efficiency as high as 0.998 on 65,536 dual-processor BlueGene/L nodes.
We have also achieved an automated execution of hierarchical QM/MD simulation on a Grid consisting of 6 supercomputer centers in the US and Japan (a total of 150 thousand processor-hours), in which the number of processors changes dynamically on demand and resources are allocated and migrated dynamically in response to faults. Furthermore, performance portability has been demonstrated on a wide range of platforms such as BlueGene/L, Altix 3000, and AMD Opteron-based Linux clusters.
Billion-atom synchronous parallel kinetic Monte Carlo simulations of critical 3D Ising systems
Martinez, E.; Monasterio, P.R.; Marian, J.
2011-02-20
An extension of the synchronous parallel kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the parallel efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.
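The role of null events in keeping all processors' clocks synchronized can be sketched as follows. This is a simplified reading of the synchronous scheme, not the authors' implementation: it assumes a single global time increment drawn from the largest per-domain total rate, with each domain executing a real event with probability equal to its rate divided by that maximum and a null event otherwise. The function name and arguments are hypothetical.

```python
import math
import random


def synchronous_kmc_step(domain_rates, rng):
    """One synchronous step over all domains.

    domain_rates: total event rate of each processor's domain.
    Returns the shared time increment and, per domain, whether a real
    event (True) or a null event (False) is executed in this step.
    """
    r_max = max(domain_rates)
    if r_max <= 0.0:
        raise ValueError("at least one domain must have a positive rate")
    # Every domain advances by the same dt, so clocks never drift apart.
    dt = -math.log(1.0 - rng.random()) / r_max
    # Domains slower than r_max pad their rate with null events.
    executes = [rng.random() < rate / r_max for rate in domain_rates]
    return dt, executes
```

The domain holding the maximum rate always executes a real event, while a domain with zero rate only ever performs null events; both still advance by the same `dt`.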
Univ. of California, San Diego; Li, Xiaoye Sherry; Cicotti, Pietro; Baden, Scott B.
2008-04-15
Sparse parallel factorization is among the most complicated and irregular algorithms to analyze and optimize. Performance depends on system characteristics such as the floating point rate, the memory hierarchy, and the interconnect performance, as well as on input matrix characteristics such as the number and location of nonzeros. We present LUsim, a simulation framework for modeling the performance of sparse LU factorization. Our framework uses micro-benchmarks to calibrate the parameters of machine characteristics and additional tools to facilitate real-time performance modeling. We are using LUsim to analyze an existing parallel sparse LU factorization code, and to explore a latency tolerant variant. We developed and validated a model of the factorization in SuperLU_DIST, then we modeled and implemented a new variant of slud, replacing a blocking collective communication phase with a non-blocking asynchronous point-to-point one. Our strategy realized a mean improvement of 11 percent over a suite of test matrices.
Parallel 3D Finite Element Particle-in-Cell Simulations with Pic3P
Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; Ben-Zvi, I.; Kewisch, J.; /Brookhaven
2009-06-19
SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic Particle-In-Cell code Pic3P. Designed for simulations of beam-cavity interactions dominated by space charge effects, Pic3P solves the complete set of Maxwell-Lorentz equations self-consistently and includes space-charge, retardation and boundary effects from first principles. Higher-order Finite Element methods with adaptive refinement on conformal unstructured meshes lead to highly efficient use of computational resources. Massively parallel processing with dynamic load balancing enables large-scale modeling of photoinjectors with unprecedented accuracy, aiding the design and operation of next-generation accelerator facilities. Applications include the LCLS RF gun and the BNL polarized SRF gun.
An AnyLogic Simulation Model for Power and Performance Analysis of Data Centres
Al Hanbali, Ahmad
-offs for data centres that save energy via power management. The models are cooperating discrete-event and agent, simulation, discrete-event models, agent-based models, power management, performance analysis, power, workload models, (heterogeneous) servers and power management strategies. The capabilities of our modelling
CLUSTEREASY: A Program for Simulating Scalar Field Evolution on Parallel Computers
Gary N Felder
2007-12-05
We describe a new, parallel programming version of the scalar field simulation program LATTICEEASY. The new C++ program, CLUSTEREASY, can simulate arbitrary scalar field models on distributed-memory clusters. The speed and memory requirements scale well with the number of processors. As with the serial version of LATTICEEASY, CLUSTEREASY can run simulations in one, two, or three dimensions, with or without expansion of the universe, with customizable parameters and output. The program and its full documentation are available on the LATTICEEASY website at http://www.science.smith.edu/departments/Physics/fstaff/gfelder/latticeeasy/. In this paper we provide a brief overview of what CLUSTEREASY does and the ways in which it does and doesn't differ from the serial version of LATTICEEASY.
Massively Parallel Phase-Field Simulations for Ternary Eutectic Directional Solidification
Bauer, Martin; Steinmetz, Philipp; Jainta, Marcus; Berghoff, Marco; Schornbaum, Florian; Godenschwager, Christian; Köstler, Harald; Nestler, Britta; Rüde, Ulrich
2015-01-01
Microstructures forming during ternary eutectic directional solidification processes have significant influence on the macroscopic mechanical properties of metal alloys. For a realistic simulation, we use the well established thermodynamically consistent phase-field method and improve it with a new grand potential formulation to couple the concentration evolution. This extension is very compute intensive due to a temperature dependent diffusive concentration. We significantly extend previous simulations that have used simpler phase-field models or were performed on smaller domain sizes. The new method has been implemented within the massively parallel HPC framework waLBerla that is designed to exploit current supercomputers efficiently. We apply various optimization techniques, including buffering techniques, explicit SIMD kernel vectorization, and communication hiding. Simulations utilizing up to 262,144 cores have been run on three different supercomputing architectures and weak scalability results are show...
NASA Astrophysics Data System (ADS)
Grimm, Guido; Fey, Dietmar; Degenkolb, Marko; Erhart, Werner
1998-09-01
We present a simulation environment for parallel optoelectronic data-processing systems, and we especially consider the fusion of optoelectronic integrated circuits and optical interconnection modules. hadlop, which stands for hardware description language for optical processing, is a simulator that works at the digital design level. So far, hadlop has allowed algorithm and architecture studies for smart-pixel systems. We have just begun to extend the capabilities of hadlop toward an automatic synthesis tool for three-dimensional optoelectronic VLSI circuits. A hadlop architecture will then be the basis for the automatic generation of detailed construction plans that consider the interaction between optical interconnection modules and optoelectronic integrated circuits. The simulation system is freeware and is available through the Internet at http://www2.informatik.uni-jena.de/pope/HADLOP/hadlop.html.
Xyce parallel electronic simulator design : mathematical formulation, version 2.0.
Hoekstra, Robert John; Waters, Lon J.; Hutchinson, Scott Alan; Keiter, Eric Richard; Russo, Thomas V.
2004-06-01
This document is intended to contain a detailed description of the mathematical formulation of Xyce, a massively parallel SPICE-style circuit simulator developed at Sandia National Laboratories. The target audience of this document is people in the role of 'service provider'. An example of such a person would be a linear solver expert who is spending a small fraction of his time developing solver algorithms for Xyce. Such a person probably is not an expert in circuit simulation, and would benefit from a description of the equations solved by Xyce. In this document, modified nodal analysis (MNA) is described in detail, with a number of examples. Issues that are unique to circuit simulation, such as voltage limiting, are also described in detail.
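As a small worked illustration of modified nodal analysis (not taken from the Xyce document), consider a resistive voltage divider: an ideal source V between node 1 and ground, R1 between nodes 1 and 2, and R2 between node 2 and ground. MNA keeps the node voltages as unknowns and adds the source branch current as an extra unknown with its own constraint row. The circuit, function names, and the tiny dense solver below are assumptions chosen for the sketch.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def solve_divider(v_source, r1, r2):
    """MNA system for the divider; unknowns are [v1, v2, i_source]."""
    g1, g2 = 1.0 / r1, 1.0 / r2
    a = [
        [g1, -g1, 1.0],       # KCL at node 1, including the source current
        [-g1, g1 + g2, 0.0],  # KCL at node 2
        [1.0, 0.0, 0.0],      # source constraint: v1 = v_source
    ]
    b = [0.0, 0.0, v_source]
    return solve_linear(a, b)
```

For a 10 V source and two 1 kΩ resistors this gives v1 = 10 V, v2 = 5 V, and a source branch current of -5 mA under the sign convention used in the first row.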
Embedded Microclusters in Zeolites and Cluster Beam Sputtering -- Simulation on Parallel Computers
Greenwell, Donald L.; Kalia, Rajiv K.; Vashishta, Priya
1996-12-01
This report summarizes the research carried out under DOE supported program (DOE/ER/45477) Computer Science--during the course of this project. Large-scale molecular-dynamics (MD) simulations were performed to investigate: (1) sintering of microporous and nanophase Si3N4; (2) crack-front propagation in amorphous silica; (3) phonons in highly efficient multiscale algorithms and dynamic load-balancing schemes for mapping process, structural correlations, and mechanical behavior including dynamic fracture in graphitic tubules; and (4) amorphization and fracture in nanowires. The simulations were carried out with irregular atomistic simulations on distributed-memory parallel architectures. These research activities resulted in fifty-three publications and fifty-five invited presentations.
The use of a parallel virtual machine (PVM) for finite-difference wave simulations
NASA Astrophysics Data System (ADS)
Niccanna, Clodagh; Bean, Christopher J.
1997-08-01
Computer modelling is now applied routinely throughout the geosciences in an attempt to create synthetic data for comparison with real data. At present, in seismology, there is no analytical solution to the wave equation which allows wave simulations in "geologically realistic" (complex) media. Consequently, computationally expensive numerical solutions are required. Using a finite-difference solution to the wave equation provides a suitable means of modelling seismic waves in a heterogeneous medium. However, when applying this method the grid sizes and the number of time steps required (to ensure numerical stability and sufficiently long wave propagation distances) are limited because of their demand on computer time and memory. Supercomputers represent an obvious solution to these limitations. This paper presents an alternative which is inexpensive, convenient and portable. By clustering a set of processors, for example PCs or workstations, a parallel configuration can be obtained by using the processors available on each machine to perform sections of the calculations simultaneously. By using Parallel Virtual Machine (PVM) — a public domain software package which allows a programmer to create and access a concurrent computing system made from networks of loosely coupled processing elements (Geist and others, 1994) — we have reduced wall-clock times and increased array sizes for a finite-difference solution to the acoustic, elastic and viscoelastic wave equations. In this paper we present methods of parallelizing a serial code and load-balancing this parallelized code. A comparison of serial and parallel wall-clock times, a comparison of wall-clock times on a variety of clusters of machines and the role of communication in this application are presented for a finite-difference solution to the acoustic wave equation.
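The kind of parallelization described above can be sketched for a 1D acoustic wave equation. This is a minimal serial emulation and not the authors' PVM code: the grid is split into contiguous blocks, each block receives one ghost cell from each neighbour, and every block applies the same second-order update to the points it owns. The function names and the block-splitting rule are assumptions.

```python
def fd_step(u_prev, u_curr, courant2):
    """One leapfrog step of the 1D wave equation with fixed (zero) ends.

    courant2 is (c * dt / dx) ** 2 and must not exceed 1 for stability.
    """
    n = len(u_curr)
    u_next = [0.0] * n
    for i in range(1, n - 1):
        laplacian = u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + courant2 * laplacian
    return u_next


def fd_step_decomposed(u_prev, u_curr, courant2, n_workers):
    """Same step, computed block by block as separate workers would.

    Each block is extended by one ghost cell per side, standing in for
    the halo values a worker would receive from its neighbours.
    """
    n = len(u_curr)
    bounds = [n * w // n_workers for w in range(n_workers + 1)]
    u_next = [0.0] * n
    for w in range(n_workers):
        lo, hi = bounds[w], bounds[w + 1]
        g_lo, g_hi = max(lo - 1, 0), min(hi + 1, n)  # block plus ghosts
        local_next = fd_step(u_prev[g_lo:g_hi], u_curr[g_lo:g_hi], courant2)
        # Copy back only the points this worker owns.
        for i in range(max(lo, 1), min(hi, n - 1)):
            u_next[i] = local_next[i - g_lo]
    return u_next
```

Because only one ghost cell per side is needed per step, the communication volume stays constant while the computation per worker shrinks with the block size, which is why communication cost comes to dominate on small grids.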
Pei, Yidong; Pei, Baoqing; Li, Hui; Fan, Yubo
2013-01-01
Because simulation platforms for the road transportation of medical equipment are scarce, we put forward a road transportation simulation method based on 6-DOF parallel robots. A 3D road spectrum model was built by improving the harmonic superposition method. The simulation model was then compared with the standard model to verify its performance. Taking the road spectrum as the excitation, we obtained the robot motion data used to control the parallel robot through S-shaped linear interpolation of the absolute position. The method can simulate the movement of vehicles at different speeds under various road conditions efficiently and accurately. PMID:23668043
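For orientation, the basic (unimproved, one-dimensional) harmonic superposition method can be sketched as below; the authors' improved 3D variant is not reproduced here. The sketch assumes a power-law displacement spectrum G(n) = G(n0) (n / n0)^-2 over spatial frequency n, sums sinusoids with amplitudes sqrt(2 G(n_k) Δn), and draws uniform random phases. The function name and all default parameter values are illustrative assumptions.

```python
import math
import random


def road_profile(xs, g_n0=64e-6, n0=0.1, n_min=0.011, n_max=2.83,
                 bands=200, seed=0):
    """Road elevation at positions xs (m) by summing random-phase harmonics.

    g_n0: assumed spectral density (m^3) at reference spatial frequency n0
    (cycles/m); n_min..n_max: assumed band of spatial frequencies.
    Returns the profile and the (amplitude, frequency, phase) components.
    """
    rng = random.Random(seed)
    dn = (n_max - n_min) / bands
    components = []
    for k in range(bands):
        n_k = n_min + (k + 0.5) * dn            # band centre frequency
        psd = g_n0 * (n_k / n0) ** -2           # power-law spectrum
        amplitude = math.sqrt(2.0 * psd * dn)   # harmonic amplitude
        phase = rng.uniform(0.0, 2.0 * math.pi)
        components.append((amplitude, n_k, phase))
    profile = [
        sum(a * math.sin(2.0 * math.pi * n * x + p) for a, n, p in components)
        for x in xs
    ]
    return profile, components
```

Sampling `xs` at positions x = v·t for a vehicle speed v turns the spatial profile into a time-domain excitation signal.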
NASA Astrophysics Data System (ADS)
Honkonen, I.
2014-07-01
I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring any modification of existing code. This is an advantage for the development and testing of computational modeling software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. Support for parallel programming is also provided by allowing users to select which simulation variables to transfer between processes via a Message Passing Interface library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class presented here requires a C++ compiler that supports variadic templates which were standardized in 2011 (C++11). The code is available at: https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those that do are kindly requested to cite this work.
Antsaklis, Panos
J. A. Stiver and P. J. Antsaklis, "A Novel Discrete Event System Approach to Modeling and Analysis, University of Notre Dame, June 1991. J. A. Stiver and P. J. Antsaklis, "A Novel Discrete Event System. of Electrical Engineering, University of Notre Dame, June 1991. J. A. Stiver and P. J. Antsaklis, "A Novel
Xyce parallel electronic simulator users’ guide, version 6.0.
Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David G.
2013-08-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Guo Fan; Giacalone, Joe
2013-08-20
We present three-dimensional hybrid simulations of collisionless shocks that propagate parallel to the background magnetic field to study the acceleration of protons that forms a high-energy tail on the distribution. We focus on the initial acceleration of thermal protons and compare it with results from one-dimensional simulations. We find that for both one- and three-dimensional simulations, particles that end up in the high-energy tail of the distribution later in the simulation gained their initial energy right at the shock. This confirms previous results but is the first to demonstrate this using fully three-dimensional fields. The result is not consistent with the "thermal leakage" model. We also show that the gyrocenters of protons in the three-dimensional simulation can drift away from the magnetic field lines on which they started due to the removal of ignorable coordinates that exist in one- and two-dimensional simulations. Our study clarifies the injection problem for diffusive shock acceleration.
NASA Astrophysics Data System (ADS)
Chuvashov, I. N.
2010-12-01
The features of high-precision numerical simulation of Earth satellite motion using parallel computing are discussed, taking as an example the implementation of the software complex "Numerical model of the motion of system satellites" on the cluster "Skiff Cyberia". It is shown that the use of a 128-bit word length makes it possible to account for weak perturbations from the high-order harmonics in the expansion of the geopotential and for the effect of strain geopotential harmonics arising from the combination of tidal perturbations associated with the action of the Moon and Sun on the solid Earth and its oceans.
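The role of word length in resolving weak perturbations can be illustrated with a minimal sketch. It assumes Python's `decimal` module at 34 significant digits as a stand-in for 128-bit arithmetic; the abstract does not say which 128-bit format the software uses, and the function names and the perturbation size are hypothetical.

```python
from decimal import Context, Decimal


def survives_float64(base: float, perturbation: float) -> bool:
    """True if adding the perturbation changes a 64-bit float."""
    return (base + perturbation) != base


def survives_extended(base: str, perturbation: str, digits: int = 34) -> bool:
    """True if the perturbation changes the sum at `digits` significant digits.

    34 digits is roughly the precision of a 128-bit floating-point word
    (an assumption made for illustration only).
    """
    ctx = Context(prec=digits)
    return ctx.add(Decimal(base), Decimal(perturbation)) != Decimal(base)


# A relative perturbation of 1e-17 is below float64 resolution (about 2.2e-16)
# and is silently lost, while the wider word keeps it.
print(survives_float64(1.0, 1e-17))
print(survives_extended("1", "1e-17"))
```

In a long orbit integration such lost terms accumulate over many steps, which is one plausible reading of why the wider word length matters here.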
Childers, J T; LeCompte, T J; Papka, M E; Benjamin, D P
2015-01-01
As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. This paper details the process by which Alpgen was adapted from a single-processor serial-application to a large-scale parallel-application and the performance that was achieved.
Visualization of parallel molecular dynamics simulation on a remote visualization platform
Lee, T.Y.; Raghavendra, C.S.; Nicholas, J.B.
1994-09-01
Visualization requires high performance computers. In order to use these shared high performance computers located at national centers, the authors need an environment for remote visualization. Remote visualization is a special process that uses computing resources and data that are physically distributed over long distances. In their experimental environment, a parallel raytracer is designed for the rendering task. It allows one to efficiently visualize molecular dynamics simulations represented by three dimensional ball-and-stick models. Different issues encountered in creating their platform are discussed, such as I/O, load balancing, and data distribution.
Understanding Performance of Parallel Scientific Simulation Codes using Open|SpeedShop
Ghosh, K K
2011-11-07
Conclusions of this presentation are: (1) Open|SpeedShop (OSS) is convenient to use for large, parallel, scientific simulation codes; (2) large codes benefit from uninstrumented execution; (3) many experiments can be run in a short time, although multiple runs might be needed, e.g. usertime for caller-callee and hwcsamp for HW counters; (4) a decent idea of a code's performance is easily obtained; (5) statistical sampling calls for a decent number of samples; and (6) HWC data is very useful for micro-analysis but can be tricky to analyze.
pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.
Halic, Tansel; Ahn, Woojin; De, Suvranu
2014-01-01
This work presents pWeb, a new language and compiler for parallelization of client-side compute-intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. Low performance of the web browser, however, remains the bottleneck of computationally intensive applications, including visualization of complex scenes, real-time physical simulations and image processing, compared to native ones. The newly proposed language is built upon web workers for multithreaded programming in HTML5. The language provides fundamental functionalities of parallel programming languages as well as the fork/join parallel model, which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions. PMID:24732497
Supporting the Development of Resilient Message Passing Applications using Simulation
Naughton, III, Thomas J; Engelmann, Christian; Vallee, Geoffroy R; Boehm, Swen
2014-01-01
An emerging aspect of high-performance computing (HPC) hardware/software co-design is investigating performance under failure. The work in this paper extends the Extreme-scale Simulator (xSim), which was designed for evaluating the performance of message passing interface (MPI) applications on future HPC architectures, with fault-tolerant MPI extensions proposed by the MPI Fault Tolerance Working Group. xSim permits running MPI applications with millions of concurrent MPI ranks, while observing application performance in a simulated extreme-scale system using a lightweight parallel discrete event simulation. The newly added features offer user-level failure mitigation (ULFM) extensions at the simulated MPI layer to support algorithm-based fault tolerance (ABFT). The presented solution permits investigating performance under failure and failure handling of ABFT solutions. The newly enhanced xSim is the very first performance tool that supports ULFM and ABFT.
Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN
Hammond, G E; Lichtner, P C; Mills, R T
2014-01-01
To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted. PMID:25506097
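For readers unfamiliar with the two scalability measures named in this abstract, the commonly used textbook definitions are sketched below. The symbols are assumptions introduced here, not notation from the paper: $T_p$ is the wall-clock time on $p$ processes, and $T_1$ is the time on a single process (in practice, often the smallest run that fits in memory).

```latex
% Strong scaling: the total problem size is fixed while p grows.
\[
  S(p) = \frac{T_1}{T_p}, \qquad
  E_{\text{strong}}(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p}
\]
% Weak scaling: the problem size per process is fixed, so the total
% problem grows in proportion to p; ideal behaviour is T_p = T_1.
\[
  E_{\text{weak}}(p) = \frac{T_1}{T_p}
\]
```

Under these definitions an efficiency of 1 is ideal in both cases. In the weak-scaling case the amount of work (for example, solver iterations) may itself change with problem size, which is one general reason such results can be hard to interpret.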
Spencer, VN
2001-08-29
An investigation has been conducted regarding the ability of clustered personal computers to improve the performance of executing software simulations for solving engineering problems. The power and utility of personal computers continue to grow exponentially through advances in computing capabilities such as newer microprocessors, advances in microchip technologies, electronic packaging, and cost effective gigabyte-size hard drive capacity. Many engineering problems require significant computing power. Therefore, the computation has to be done by high-performance computer systems that cost millions of dollars and need gigabytes of memory to complete the task. Alternately, it is feasible to provide adequate computing in the form of clustered personal computers. This method cuts the cost and size by linking (clustering) personal computers together across a network. Clusters also have the advantage that they can be used as stand-alone computers when they are not operating as a parallel computer. Parallel computing software to exploit clusters is available for computer operating systems like Unix, Windows NT, or Linux. This project concentrates on the use of Windows NT, and the Parallel Virtual Machine (PVM) system to solve an engineering dynamics problem in Fortran.
Bradley, Randolph L. (Randolph Lewis)
2012-01-01
Heavy industries operate equipment having a long life to generate revenue or perform a mission. These industries must invest in the specialized service parts needed to maintain their equipment, because unlike in other ...
Parallel computing simulation of electrical excitation and conduction in the 3D human heart.
Di Yu; Dongping Du; Hui Yang; Yicheng Tu
2014-01-01
A correctly beating heart is important to ensure adequate circulation of blood throughout the body. Normal heart rhythm is produced by the orchestrated conduction of electrical signals throughout the heart. Cardiac electrical activity is the result of a series of complex biochemical-mechanical reactions, which involve transportation and bio-distribution of ionic flows through a variety of biological ion channels. Cardiac arrhythmias are caused by the direct alteration of ion channel activity that results in changes in the action potential (AP) waveform. In this work, we developed a whole-heart simulation model with the use of massively parallel computing with GPGPU and OpenGL. The simulation algorithm was implemented in several different versions for the purpose of comparison, including one conventional CPU version and two GPU versions based on the Nvidia CUDA platform. OpenGL was utilized for the visualization / interaction platform because it is open source, lightweight and universally supported by various operating systems. The experimental results show that the GPU-based simulation outperforms the conventional CPU-based approach and significantly improves the speed of simulation. By adopting modern computer architecture, the present investigation enables real-time simulation and visualization of electrical excitation and conduction in the large and complicated 3D geometry of a real-world human heart. PMID:25570947
L-PICOLA: A parallel code for fast dark matter simulation
NASA Astrophysics Data System (ADS)
Howlett, C.; Manera, M.; Percival, W. J.
2015-09-01
Robust measurements based on current large-scale structure surveys require precise knowledge of statistical and systematic errors. This can be obtained from large numbers of realistic mock galaxy catalogues that mimic the observed distribution of galaxies within the survey volume. To this end we present a fast, distributed-memory, planar-parallel code, L-PICOLA, which can be used to generate and evolve a set of initial conditions into a dark matter field much faster than a full non-linear N-Body simulation. Additionally, L-PICOLA has the ability to include primordial non-Gaussianity in the simulation and simulate the past lightcone at run-time, with optional replication of the simulation volume. Through comparisons to fully non-linear N-Body simulations we find that our code can reproduce the z = 0 power spectrum and reduced bispectrum of dark matter to within 2% and 5% respectively on all scales of interest to measurements of Baryon Acoustic Oscillations and Redshift Space Distortions, but 3 orders of magnitude faster. The accuracy, speed and scalability of this code, alongside the additional features we have implemented, make it extremely useful for both current and next generation large-scale structure surveys. L-PICOLA is publicly available at https://cullanhowlett.github.io/l-picola.
A PARALLEL MONTE CARLO CODE FOR SIMULATING COLLISIONAL N-BODY SYSTEMS
Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A.
2013-02-15
We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to $N \approx 10^{7}$ particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from $10^{5}$ to $10^{7}$. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within $\lesssim 0.04\%$ throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for $N = 10^{5}$, 128 for $N = 10^{6}$ and 256 for $N = 10^{7}$. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately $60\times$, $100\times$, and $220\times$, respectively.
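The quoted speedups can be turned into parallel efficiencies with a few lines of arithmetic. The sketch below assumes that each maximum speedup was obtained at the processor count given for the same N (64, 128 and 256), which is how the abstract reads but is not stated explicitly; the function and variable names are hypothetical.

```python
def parallel_efficiency(speedup: float, n_procs: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / n_procs


# number of stars -> (processors, approximate maximum speedup) from the abstract
reported = {
    1e5: (64, 60.0),
    1e6: (128, 100.0),
    1e7: (256, 220.0),
}

for n_stars, (n_procs, speedup) in reported.items():
    eff = parallel_efficiency(speedup, n_procs)
    print(f"N = {n_stars:.0e}: {speedup:.0f}x on {n_procs} processors, "
          f"efficiency = {eff:.2f}")
```

Because the speedups are described as approximate, the resulting efficiencies are indicative only.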
Simulating massively parallel electron beam inspection for sub-20 nm defects
NASA Astrophysics Data System (ADS)
Bunday, Benjamin D.; Mukhtar, Maseeh; Quoi, Kathy; Thiel, Brad; Malloy, Matt
2015-03-01
SEMATECH has initiated a program to develop massively-parallel electron beam defect inspection (MPEBI). Here we use JMONSEL simulations to generate expected imaging responses of chosen test cases of patterns and defects with ability to vary parameters for beam energy, spot size, pixel size, and/or defect material and form factor. The patterns are representative of the design rules for an aggressively-scaled FinFET-type design. With these simulated images and resulting shot noise, a signal-to-noise framework is developed, which relates to defect detection probabilities. Additionally, with this infrastructure the effect of detection chain noise and frequency dependent system response can be made, allowing for targeting of best recipe parameters for MPEBI validation experiments, ultimately leading to insights into how such parameters will impact MPEBI tool design, including necessary doses for defect detection and estimations of scanning speeds for achieving high throughput for HVM.
Deiterding, Ralf; Wood, Stephen L
2013-01-01
We pursue a level set approach to couple an Eulerian shock-capturing fluid solver with space-time refinement to an explicit solid dynamics solver for large deformations and fracture. The coupling algorithms considering recursively finer fluid time steps as well as overlapping solver updates are discussed in detail. Our ideas are implemented in the AMROC adaptive fluid solver framework and are used for effective fluid-structure coupling to the general purpose solid dynamics code DYNA3D. Beside simulations verifying the coupled fluid-structure solver and assessing its parallel scalability, the detailed structural analysis of a reinforced concrete column under blast loading and the simulation of a prototypical blast explosion in a realistic multistory building are presented.
Monte Carlo Simulations of Nonlinear Particle Acceleration in Parallel Trans-relativistic Shocks
Ellison, Donald C; Bykov, Andrei M
2013-01-01
We present results from a Monte Carlo simulation of a parallel collisionless shock undergoing particle acceleration. Our simulation, which contains parameterized scattering and a particular thermal leakage injection model, calculates the feedback between accelerated particles ahead of the shock, which influence the shock precursor and "smooth" the shock, and thermal particle injection. We show that there is a transition between nonrelativistic shocks, where the acceleration efficiency can be extremely high and the nonlinear compression ratio can be substantially greater than the Rankine-Hugoniot value, and fully relativistic shocks, where diffusive shock acceleration is less efficient and the compression ratio remains at the Rankine-Hugoniot value. This transition occurs in the trans-relativistic regime and, for the particular parameters we use, occurs around a shock Lorentz factor of 1.5. We also find that nonlinear shock smoothing dramatically reduces the acceleration efficiency presumed to occur with large-...
Simulation/Emulation Techniques: Compressing Schedules With Parallel (HW/SW) Development
NASA Technical Reports Server (NTRS)
Mangieri, Mark L.; Hoang, June
2014-01-01
NASA has always been in the business of balancing new technologies and techniques to achieve human space travel objectives. NASA's Kedalion engineering analysis lab has been validating and using many contemporary avionics HW/SW development and integration techniques, which represent new paradigms to NASA's heritage culture. Kedalion has validated many of the Orion HW/SW engineering techniques borrowed from the adjacent commercial aircraft avionics solution space, inserting new techniques and skills into the Multi-Purpose Crew Vehicle (MPCV) Orion program. Using contemporary agile techniques, commercial-off-the-shelf (COTS) products, early rapid prototyping, in-house expertise and tools, and extensive use of simulators and emulators, NASA has achieved cost effective paradigms that are currently serving the Orion program effectively. Elements of long lead custom hardware on the Orion program have necessitated early use of simulators and emulators in advance of deliverable hardware to achieve parallel design and development on a compressed schedule.
A Many-Task Parallel Approach for Multiscale Simulations of Subsurface Flow and Reactive Transport
Scheibe, Timothy D.; Yang, Xiaofan; Schuchardt, Karen L.; Agarwal, Khushbu; Chase, Jared M.; Palmer, Bruce J.; Tartakovsky, Alexandre M.
2014-12-16
Continuum-scale models have long been used to study subsurface flow, transport, and reactions but lack the ability to resolve processes that are governed by pore-scale mixing. Recently, pore-scale models, which explicitly resolve individual pores and soil grains, have been developed to more accurately model pore-scale phenomena, particularly reaction processes that are controlled by local mixing. However, pore-scale models are prohibitively expensive for modeling application-scale domains. This motivates the use of a hybrid multiscale approach in which continuum- and pore-scale codes are coupled either hierarchically or concurrently within an overall simulation domain (time and space). This approach is naturally suited to an adaptive, loosely-coupled many-task methodology with three potential levels of concurrency. Each individual code (pore- and continuum-scale) can be implemented in parallel; multiple semi-independent instances of the pore-scale code are required at each time step providing a second level of concurrency; and Monte Carlo simulations of the overall system to represent uncertainty in material property distributions provide a third level of concurrency. We have developed a hybrid multiscale model of a mixing-controlled reaction in a porous medium wherein the reaction occurs only over a limited portion of the domain. Loose, minimally-invasive coupling of pre-existing parallel continuum- and pore-scale codes has been accomplished by an adaptive script-based workflow implemented in the Swift workflow system. We describe here the methods used to create the model system, adaptively control multiple coupled instances of pore- and continuum-scale simulations, and maximize the scalability of the overall system. We present results of numerical experiments conducted on NERSC supercomputing systems; our results demonstrate that loose many-task coupling provides a scalable solution for multiscale subsurface simulations with minimal overhead.
Parallel numerical simulation of the ultrasonic waves in a prestressed formation.
Chen, Hao; Wang, Xiuming; Lin, Weijun
2006-12-22
Formation stress prediction plays an important role in petroleum production. Understanding ultrasonic wave propagation in a stress-induced anisotropic formation will help us to find an efficient method to correctly predict formation stress or formation pore pressure. In this work, a parallel 3D finite-difference time domain (FDTD) method is developed to simulate elastic wave propagation in pre-stressed formations. A perfectly matched layer (PML) is used as an absorbing boundary condition. The acceleration ratio of total CPU computation time and the elapsed run time of the program are tested on the ShenTeng 6800 supercomputer in the Super Computation Center of the Chinese Academy of Sciences (CAS). The results show that the acceleration factor of the parallel FDTD program is considerably high even if the domain is divided in only one direction. When the total computation model size is fixed, the acceleration factors for 8 and 64 CPUs are 3.0 and 13.8, respectively. The velocities under various static stresses are obtained by processing the array data calculated with the FDTD using Prony's method. The linear relation between velocity and the applied pre-stress is in agreement with that predicted by the acoustoelasticity theory. Results from the numerical simulation confirm the reciprocity principle and the superposition principle. PMID:16806377
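Dividing the domain in a single direction can be illustrated with a small sketch. It is not the authors' elastic FDTD scheme: a generic 5-point explicit stencil on a 2D grid stands in for the wave update, the slabs are processed in a loop rather than on separate processors, and all names are hypothetical. The point is only that each slab needs one extra "halo" row from each neighbour to reproduce the serial result.

```python
def stencil_step(grid):
    """One explicit 5-point update; outermost rows and columns are held fixed."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = grid[i][j] + 0.1 * (
                grid[i - 1][j] + grid[i + 1][j]
                + grid[i][j - 1] + grid[i][j + 1]
                - 4.0 * grid[i][j]
            )
    return new


def slab_step(grid, n_slabs):
    """Same update, computed slab by slab along the row direction only."""
    rows = len(grid)
    bounds = [rows * k // n_slabs for k in range(n_slabs + 1)]
    out = []
    for lo, hi in zip(bounds, bounds[1:]):
        # Extend the slab by one halo row on each side where a neighbour exists.
        h_lo, h_hi = max(lo - 1, 0), min(hi + 1, rows)
        local = stencil_step(grid[h_lo:h_hi])
        # Keep only the rows this slab owns; halo rows are discarded.
        out.extend(local[lo - h_lo: lo - h_lo + (hi - lo)])
    return out


grid = [[float((3 * i + 5 * j) % 7) for j in range(8)] for i in range(12)]
print(slab_step(grid, 4) == stencil_step(grid))
```

In a message-passing implementation each slab would live on its own process and the halo rows would be exchanged with the two neighbouring processes before every time step.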
SDA 7: A modular and parallel implementation of the simulation of diffusional association software.
Martinez, Michael; Bruce, Neil J; Romanowska, Julia; Kokh, Daria B; Ozboyaci, Musa; Yu, Xiaofeng; Öztürk, Mehmet Ali; Richter, Stefan; Wade, Rebecca C
2015-08-01
The simulation of diffusional association (SDA) Brownian dynamics software package has been widely used in the study of biomacromolecular association. Initially developed to calculate bimolecular protein-protein association rate constants, it has since been extended to study electron transfer rates, to predict the structures of biomacromolecular complexes, to investigate the adsorption of proteins to inorganic surfaces, and to simulate the dynamics of large systems containing many biomacromolecular solutes, allowing the study of concentration-dependent effects. These extensions have led to a number of divergent versions of the software. In this article, we report the development of the latest version of the software (SDA 7). This release was developed to consolidate the existing codes into a single framework, while improving the parallelization of the code to better exploit modern multicore shared memory computer architectures. It is built using a modular object-oriented programming scheme, to allow for easy maintenance and extension of the software, and includes new features, such as adding flexible solute representations. We discuss a number of application examples, which describe some of the methods available in the release, and provide benchmarking data to demonstrate the parallel performance. PMID:26123630
MDSLB: A new static load balancing method for parallel molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Wu, Yun-Long; Xu, Xin-Hai; Yang, Xue-Jun; Zou, Shun; Ren, Xiao-Guang
2014-02-01
Large-scale parallelization of molecular dynamics simulations is facing challenges which seriously affect the simulation efficiency, among which the load imbalance problem is the most critical. In this paper, we propose a new molecular dynamics static load balancing method (MDSLB). By analyzing the characteristics of the short-range force of molecular dynamics programs running in parallel, we divide the short-range force into three kinds of force models, and then package the computations of each force model into many tiny computational units called “cell loads”, which provide the basic data structures for our load balancing method. In MDSLB, the spatial region is separated into sub-regions called “local domains”, and the cell loads of each local domain are allocated to every processor in turn. Compared with the dynamic load balancing method, MDSLB can guarantee load balance by executing the algorithm only once at program startup without migrating the loads dynamically. We implement MDSLB in OpenFOAM software and test it on the TianHe-1A supercomputer with 16 to 512 processors. Experimental results show that MDSLB can save 34%-64% of the time for the load-imbalanced cases.
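The idea of allocating the cell loads of each local domain to every processor in turn can be sketched as follows. This is one possible reading of the abstract, not the published MDSLB algorithm: the per-cell costs are invented, the dealing restarts at processor 0 for every local domain, and all names are hypothetical.

```python
def block_loads(domain_costs):
    """Baseline: one whole local domain per processor."""
    return [sum(costs) for costs in domain_costs]


def round_robin_loads(domain_costs, n_procs):
    """Deal the cell loads of every local domain to the processors in turn."""
    loads = [0.0] * n_procs
    for costs in domain_costs:
        for i, cost in enumerate(costs):
            loads[i % n_procs] += cost
    return loads


# Four local domains of four cells each; the first domain is dense (costly).
domain_costs = [[8.0] * 4, [1.0] * 4, [1.0] * 4, [1.0] * 4]

print("block:      ", block_loads(domain_costs))
print("round robin:", round_robin_loads(domain_costs, 4))
```

With the block assignment one processor carries the whole dense domain, whereas dealing the cells in turn gives every processor a share of each domain, so the imbalance disappears without any run-time migration.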
Zonal methods for the parallel execution of range-limited N-body simulations
Bowers, Kevin J.; Dror, Ron O.; Shaw, David E. (E-mail: david@deshaw.com)
2007-01-20
Particle simulations in fields ranging from biochemistry to astrophysics require the evaluation of interactions between all pairs of particles separated by less than some fixed interaction radius. The applicability of such simulations is often limited by the time required for calculation, but the use of massive parallelism to accelerate these computations is typically limited by inter-processor communication requirements. Recently, Snir [M. Snir, A note on N-body computations with cutoffs, Theor. Comput. Syst. 37 (2004) 295-318] and Shaw [D.E. Shaw, A fast, scalable method for the parallel evaluation of distance-limited pairwise particle interactions, J. Comput. Chem. 26 (2005) 1318-1328] independently introduced two distinct methods that offer asymptotic reductions in the amount of data transferred between processors. In the present paper, we show that these schemes represent special cases of a more general class of methods, and introduce several new algorithms in this class that offer practical advantages over all previously described methods for a wide range of problem parameters. We also show that several of these algorithms approach an approximate lower bound on inter-processor data transfer.
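The underlying task, finding all pairs of particles closer than a fixed interaction radius, can be sketched on a single processor with a standard cell list. This is not one of the zonal methods of the paper, which concern how such work and data are divided between processors; it only illustrates the range-limited problem itself. A non-periodic box is assumed and all names are hypothetical.

```python
import itertools
import math
import random


def pairs_brute_force(positions, r_cut):
    """Reference answer: test every pair of particles."""
    n = len(positions)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(positions[i], positions[j]) < r_cut}


def pairs_cell_list(positions, r_cut):
    """Bin particles into cubic cells of edge r_cut; test only adjacent cells."""
    cells = {}
    for idx, p in enumerate(positions):
        key = tuple(math.floor(c / r_cut) for c in p)
        cells.setdefault(key, []).append(idx)

    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    found = set()
    for key, members in cells.items():
        for off in offsets:
            neighbour = tuple(k + o for k, o in zip(key, off))
            for j in cells.get(neighbour, ()):
                for i in members:
                    # i < j records each unordered pair exactly once.
                    if i < j and math.dist(positions[i], positions[j]) < r_cut:
                        found.add((i, j))
    return found


random.seed(0)
points = [tuple(random.uniform(0.0, 5.0) for _ in range(3)) for _ in range(150)]
print(pairs_cell_list(points, 1.0) == pairs_brute_force(points, 1.0))
```

In a parallel setting the cells owned by one processor need particle data from cells owned by its neighbours, and that import volume is the inter-processor communication cost the abstract refers to.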
ERIC Educational Resources Information Center
Lord, Robert E.; And Others
The purpose of this study was to evaluate the parallel processing impact of multiple-instruction multiple-data path (MIMD) computers on flight simulation software. Basic mathematical functions and arithmetic expressions from typical flight simulation software were selected and run on an MIMD computer to evaluate the improvement in execution time…
Li, Xiaoye Sherry
Massively Parallel X-ray Scattering Simulations Abhinav Sarje, Xiaoye S. Li, Slim Chourou, Elaine R--Although present X-ray scattering techniques can provide tremendous information on the nano-structural properties, and scalable Grazing Incidence Small Angle X-ray Scattering simulation algorithm and codes that we have
NASA Technical Reports Server (NTRS)
Corliss, Lloyd; Du Val, Ronald W.; Gillman, Herbert, III; Huynh, Loc C.
1990-01-01
In recent efforts by NASA, the Army, and Advanced Rotorcraft Technology, Inc. (ART), the application of parallel processing techniques to real-time simulation have been studied. Traditionally, real-time helicopter simulations have omitted the modeling of high-frequency phenomena in order to achieve real-time operation on affordable computers. Parallel processing technology can now provide the means for significantly improving the fidelity of real-time simulation, and one specific area for improvement is the modeling of rotor dynamics. This paper focuses on the results of a piloted simulation in which a traditional rotor-map mathematical model was compared with a more sophisticated blade-element mathematical model that had been implemented using parallel processing hardware and software technology.
Schulz, M; Trinitis, C
2007-07-09
In today's world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Remarkable progress in CPU architecture (multi- and many-core, SMT, transactional memory, virtualization support, etc.), system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges for both system architects and software developers. These trends are paralleled by progress in parallel algorithms, simulation techniques, and software integration from multiple disciplines. In its 6th year ParSim continues to build a bridge between computer science and the application disciplines and to help with fostering cooperations between the different fields. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a shorter turn-around time. This offers the unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in parallel computation, serves as an ideal surrounding for ParSim. This combination enables the participants to present and discuss their work within the scope of both the session and the host conference. This year, ten papers with authors in ten countries were submitted to ParSim, and after a quick turn-around, yet thorough review process we decided to accept three of them for publication and presentation during the ParSim session. These three papers show the use of simulation in a range of different application fields including earthquake and turbulence simulation. At the same time, they also address computer science aspects and discuss different parallelization strategies, programming models and environments, as well as scalability. 
We are confident that this provides an attractive program and that ParSim will yet again be an informal setting for lively discussions and for fostering new collaborations. Several people contributed to this event. Thanks go to Jack Dongarra, the EuroPVM/MPI general chair, and to Thomas Herault and Franck Cappello, the PC chairs, for their support to continue the ParSim series at EuroPVM/MPI 2007. We would also like to thank the numerous reviewers, who provided us with their reviews in such a short amount of time (in most cases in just a few days) and thereby helped us to maintain the tight schedule. Last, but certainly not least, we would like to thank all those who took the time to submit papers and hence made this event possible in the first place. We are confident that this session will fulfill its purpose to provide new insights from both the engineering and the computer science side and encourage interdisciplinary exchange of ideas and cooperations. We hope that this will continue ParSim's tradition at EuroPVM/MPI.
Matsuda, K.; Terada, N.; Katoh, Y.; Misawa, H.
2011-08-15
There has been a great concern about the origin of the parallel electric field in the frame of fluid equations in the auroral acceleration region. This paper proposes a new method to simulate magnetohydrodynamic (MHD) equations that include the electron convection term and shows its efficiency with simulation results in one dimension. We apply a third-order semi-discrete central scheme to investigate the characteristics of the electron convection term including its nonlinearity. At a steady state discontinuity, the sum of the ion and electron convection terms balances with the ion pressure gradient. We find that the electron convection term works like the gradient of the negative pressure and reduces the ion sound speed or amplifies the sound mode when parallel current flows. The electron convection term enables us to describe a situation in which a parallel electric field and parallel electron acceleration coexist, which is impossible for ideal or resistive MHD.
University of Miami; Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor S.
2013-04-30
Building designers are increasingly relying on complex fenestration systems to reduce energy consumed for lighting and HVAC in low energy buildings. Radiance, a lighting simulation program, has been used to conduct daylighting simulations for complex fenestration systems. Depending on the configurations, the simulation can take hours or even days using a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight simulation by conducting parallel computing on heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in parallel using OpenCL. The speed of the new approach was evaluated using various daylighting simulation cases on a multicore central processing unit and a graphics processing unit. Based on the measurements and analysis of the time usage for the Radiance daylighting simulation, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.
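The matrix multiplication at the heart of a three-phase daylight calculation is commonly written as a chain of a view matrix V, a transmission matrix T, a daylight matrix D and a sky matrix S. The sketch below is a plain-Python illustration of that chain and of how the order of evaluation changes the operation count. It is not the paper's OpenCL implementation, and the matrix sizes (145 window directions, 146 sky patches, 100 sensors, 8760 hourly steps) are typical values assumed here for illustration.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as nested lists (row-major)."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]


def left_to_right_cost(shapes):
    """Multiply-add count for evaluating ((V T) D) S."""
    rows = shapes[0][0]
    return sum(rows * r * c for r, c in shapes[1:])


def right_to_left_cost(shapes):
    """Multiply-add count for evaluating V (T (D S))."""
    cols = shapes[-1][1]
    return sum(r * c * cols for r, c in shapes[:-1])


def shapes_for(n_sensors, n_steps):
    """Assumed shapes of V, T, D and S for a given sensor and time-step count."""
    return [(n_sensors, 145), (145, 145), (145, 146), (146, n_steps)]


for n_steps in (1, 8760):
    shapes = shapes_for(100, n_steps)
    print(n_steps, left_to_right_cost(shapes), right_to_left_cost(shapes))

# Both orders give the same result on a tiny random example.
random.seed(0)
V, T, D, S = ([[random.random() for _ in range(c)] for _ in range(r)]
              for r, c in [(2, 3), (3, 3), (3, 4), (4, 2)])
lhs = matmul(matmul(matmul(V, T), D), S)
rhs = matmul(V, matmul(T, matmul(D, S)))
print(all(abs(x - y) < 1e-9 for rl, rr in zip(lhs, rhs) for x, y in zip(rl, rr)))
```

Under these assumed sizes, right-to-left evaluation is far cheaper for a single sky vector, while left-to-right becomes the cheaper order for a full year of hourly skies with few sensors, so the best order depends on the case being simulated.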
Application of a 3D, Adaptive, Parallel, MHD Code to Supernova Remnant Simulations
NASA Astrophysics Data System (ADS)
Kominsky, P.; Drake, R. P.; Powell, K. G.
2001-05-01
We at Michigan have a computational model, BATS-R-US, which incorporates several modern features that make it suitable for calculations of supernova remnant evolution. In particular, it is a three-dimensional MHD model, using a method called the Multiscale Adaptive Upwind Scheme for MagnetoHydroDynamics (MAUS-MHD). It incorporates a data structure that allows for adaptive refinement of the mesh, even in massively parallel calculations. Its advanced Godunov method, a solution-adaptive, upwind, high-resolution scheme, incorporates a new, flux-based approach to the Riemann solver with improved numerical properties. This code has been successfully applied to several problems, including the simulation of comets and of planetary magnetospheres, in the 3D context of the Heliosphere. The code was developed under a NASA computational grand challenge grant to run very rapidly on parallel platforms. It is also now being used to study time-dependent systems such as the transport of particles and energy from solar coronal mass ejections to the Earth. We are in the process of modifying this code so that it can accommodate the very strong shocks present in supernova remnants. Our test case simulates the explosion of a star of 1.4 solar masses with an energy of 1 foe, in a uniform background medium. We have performed runs of 250,000 to 1 million cells on 8 nodes of an Origin 2000. These relatively coarse grids do not allow fine details of instabilities to become visible. Nevertheless, the macroscopic evolution of the shock is simulated well, with the forward and reverse shocks visible in velocity profiles. We will show our work to date. This work was supported by NASA through its GSRP program.
Trinitis, C; Bader, M; Schulz, M
2009-06-09
In today's world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Significant progress in CPU architecture (multi- and many-core CPUs, SMT, transactional memory, virtualization support, shared caches, etc.), system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges, for both system architects and software developers. These trends are paralleled by progress in algorithms, simulation techniques, and software integration from multiple disciplines. In its 8th year, ParSim continues to build a bridge between application disciplines and computer science and to help foster closer cooperation between these fields. Since its successful introduction in 2002, ParSim has established itself as an integral part of the EuroPVM/MPI conference series. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a short turn-around time. We believe that this offers a unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in parallel computation, serves as an ideal surrounding for ParSim. This combination enables participants to present and discuss their work within the scope of both the session and the host conference. This year, five papers from authors in five countries were submitted to ParSim, and we selected three of them. They cover a range of different application fields including mechanical engineering, material science, and structural engineering simulations. We are confident that this resulted in an attractive special session and that this will be an informal setting for lively discussions as well as for fostering new collaborations. Several people contributed to this event.
Thanks go to Jack Dongarra, the EuroPVM/MPI general chair, and to Jan Westerholm, Juha Fagerholm and Jussi Heikonen, the PC chairs, for their encouragement and support to continue the ParSim series at EuroPVM/MPI 2009. We would also like to thank the numerous reviewers, who provided us with their reviews in such a short amount of time (in most cases in just a few days) and thereby helped us to maintain the tight schedule. Last, but certainly not least, we would like to thank all those who took the time to submit papers and hence made this event possible in the first place. We are confident that this session will fulfill its purpose of providing new insights from both the engineering and the computer science side and of encouraging interdisciplinary exchange of ideas and cooperation, and that this will continue ParSim's tradition at EuroPVM/MPI.
Accelerating Dust Storm Simulation by Balancing Task Allocation in Parallel Computing Environment
NASA Astrophysics Data System (ADS)
Gui, Z.; Yang, C.; XIA, J.; Huang, Q.; YU, M.
2013-12-01
Dust storms have serious negative impacts on the environment, human health, and assets. Continuing global climate change has increased the frequency and intensity of dust storms in the past decades. To better understand and predict the distribution, intensity and structure of dust storms, a series of dust storm models have been developed, such as the Dust Regional Atmospheric Model (DREAM), the NMM meteorological module (NMM-dust) and the Chinese Unified Atmospheric Chemistry Environment for Dust (CUACE/Dust). The development and application of these models have contributed significantly to both scientific research and our daily life. However, dust storm simulation is a data- and computing-intensive process. Normally, a simulation of a single dust storm event may take several hours or days to run, which seriously impacts the timeliness of prediction and potential applications. To speed up the process, high performance computing is widely adopted. By partitioning a large study area into small subdomains according to their geographic location and executing them on different computing nodes in parallel, the computing performance can be significantly improved. Since spatiotemporal correlations exist in the geophysical process of dust storm simulation, each subdomain allocated to a node needs to communicate with geographically adjacent subdomains to exchange data. Inappropriate allocations may introduce imbalanced task loads and unnecessary communication among computing nodes. Therefore, the task allocation method is a key factor that may impact the feasibility of the parallelization. The allocation algorithm needs to carefully weigh the computing cost against the communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire system. This presentation introduces two algorithms for such allocation and compares them with an evenly distributed allocation method.
Specifically, 1) to obtain optimized solutions, a quadratic-programming-based modeling method is proposed. This algorithm performs well with a small number of computing tasks. However, its efficiency decreases significantly as the number of subdomains and computing nodes increases. 2) To compensate for this performance decrease on large-scale tasks, a K-Means-clustering-based algorithm is introduced. Instead of seeking optimized solutions, this method obtains relatively good feasible solutions within acceptable time. However, it may introduce imbalanced communication among nodes or node-isolated subdomains. This research shows that both algorithms have their own strengths and weaknesses for task allocation. A combination of the two algorithms is under study to obtain better performance. Keywords: Scheduling; Parallel Computing; Load Balance; Optimization; Cost Model
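The clustering idea in item 2) can be illustrated with a small sketch. It is not the authors' algorithm: it assumes subdomains are cells of a regular grid identified by their (x, y) centres, uses plain Lloyd-style K-Means with one cluster per computing node, and measures communication only as the number of adjacent cell pairs that land on different nodes. The function names `kmeans_allocate` and `cut_edges` and the 8 x 8 grid are hypothetical.

```python
import random

def kmeans_allocate(subdomains, n_nodes, iters=50, seed=0):
    """Group subdomain centres (x, y) into n_nodes spatial clusters.

    Returns a list giving the node index assigned to each subdomain.
    """
    rng = random.Random(seed)
    centres = rng.sample(subdomains, n_nodes)  # initial centres: random subdomains
    labels = [0] * len(subdomains)
    for _ in range(iters):
        # Assignment step: each subdomain goes to its nearest centre.
        new_labels = [
            min(range(n_nodes),
                key=lambda k: (x - centres[k][0]) ** 2 + (y - centres[k][1]) ** 2)
            for x, y in subdomains
        ]
        # Update step: move each centre to the mean of its members.
        for k in range(n_nodes):
            members = [p for p, lab in zip(subdomains, new_labels) if lab == k]
            if members:  # keep the old centre if a cluster went empty
                centres[k] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels

def cut_edges(subdomains, labels):
    """Count adjacent grid cells that were placed on different nodes."""
    index = {p: i for i, p in enumerate(subdomains)}
    cuts = 0
    for (x, y), i in index.items():
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index.get(neighbour)
            if j is not None and labels[i] != labels[j]:
                cuts += 1
    return cuts

# Hypothetical 8 x 8 grid of subdomains allocated to 4 nodes.
grid = [(x, y) for x in range(8) for y in range(8)]
labels = kmeans_allocate(grid, n_nodes=4)
loads = [labels.count(k) for k in range(4)]        # subdomains per node
round_robin = [i % 4 for i in range(len(grid))]    # evenly distributed baseline
```

Comparing `cut_edges(grid, labels)` with `cut_edges(grid, round_robin)` shows why spatial clustering reduces neighbour communication, while `loads` shows that nothing in plain K-Means forces equal task counts per node, which is consistent with the imbalance the abstract mentions.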
NASA Astrophysics Data System (ADS)
Zhou, Jun
The 1994 Northridge earthquake in Los Angeles, California, killed 57 people, injured over 8,700 and caused an estimated $20 billion in damage. Petascale simulations are needed in California and elsewhere to provide society with a better understanding of the rupture and wave dynamics of the largest earthquakes at shaking frequencies required to engineer safe structures. As heterogeneous supercomputing infrastructures become more common, numerical developments in earthquake system research are particularly challenged by the dependence on accelerator elements to enable "the Big One" simulations with higher frequency and finer resolution. Reducing time to solution and power consumption are the two primary focus areas today for the enabling technology of fault rupture dynamics and seismic wave propagation in realistic 3D models of the crust's heterogeneous structure. This dissertation presents scalable parallel programming techniques for high performance seismic simulation running on petascale heterogeneous supercomputers. A real world earthquake simulation code, AWP-ODC, one of the most advanced earthquake codes to date, was chosen as the base code in this research, and the testbed is based on Titan at Oak Ridge National Laboratory, the world's largest heterogeneous supercomputer. The research work is primarily related to architecture study, computation performance tuning and software system scalability. An earthquake simulation workflow has also been developed to support efficient production sets of simulations. The highlights of the technical development are an aggressive performance optimization focusing on data locality and a notable data communication model that hides the data communication latency. This development results in optimal computation efficiency and throughput for the 13-point stencil code on heterogeneous systems, which can be extended to general high-order stencil codes.
Developed from scratch, the hybrid CPU/GPU version of the AWP-ODC code is now ready for real world petascale earthquake simulations. This GPU-based code has demonstrated excellent weak scaling up to the full Titan scale and achieved 2.3 PetaFLOPs sustained computation performance in single precision. The production simulation demonstrated the first 0-10 Hz deterministic rough fault simulation. Using the accelerated AWP-ODC, the Southern California Earthquake Center (SCEC) has recently created the physics-based probabilistic seismic hazard analysis model of the Los Angeles region, CyberShake 14.2, as of the time of the dissertation writing. The tensor-valued wavefield code based on this GPU research has dramatically reduced time-to-solution, making a statewide hazard model a goal reachable with existing heterogeneous supercomputers.
Weaver, R. P.; Gittings, M. L.
2004-01-01
The Los Alamos Crestone Project is part of the Department of Energy's (DOE) Accelerated Strategic Computing Initiative, or ASCI Program. The main goal of this software development project is to investigate the use of continuous adaptive mesh refinement (CAMR) techniques for application to problems of interest to the Laboratory. There are many code development efforts in the Crestone Project, both unclassified and classified codes. In this overview I will discuss the unclassified SAGE and the RAGE codes. The SAGE (SAIC adaptive grid Eulerian) code is a one-, two-, and three-dimensional multimaterial Eulerian massively parallel hydrodynamics code for use in solving a variety of high-deformation flow problems. The RAGE CAMR code is built from the SAGE code by adding various radiation packages, improved setup utilities and graphics packages and is used for problems in which radiation transport of energy is important. The goal of these massively-parallel versions of the codes is to run extremely large problems in a reasonable amount of calendar time. Our target is scalable performance to approximately 10,000 processors on a 1 billion CAMR computational cell problem that requires hundreds of variables per cell, multiple physics packages (e.g. radiation and hydrodynamics), and implicit matrix solves for each cycle. A general description of the RAGE code has been published in [1], [2], [3] and [4]. Currently, the largest simulations we do are three-dimensional, using around 500 million computation cells and running for literally months of calendar time using approximately 2000 processors. Current ASCI platforms range from several 3-teraOPS supercomputers to one 12-teraOPS machine at Lawrence Livermore National Laboratory, the White machine, and one 20-teraOPS machine installed at Los Alamos, the Q machine. Each machine is a system comprised of many component parts that must perform in unity for the successful run of these simulations.
Key features of any massively parallel system include the processors, the disks, the interconnection between processors, the operating system, libraries for message passing and parallel I/O and other fundamental units of the system. We will give an overview of the current status of the Crestone Project codes SAGE and RAGE. These codes are intended for general applications without tuning of algorithms or parameters. We have run a wide variety of physical applications from millimeter-scale laboratory laser experiments to the multikilometer-scale asteroid impacts into the Pacific Ocean to parsec-scale galaxy formation. Examples of these simulations will be shown. The goal of our effort is to avoid ad hoc models and attempt to rely on first-principles physics. In addition to the large effort on developing parallel code physics packages, a substantial effort in the project is devoted to improving the computer science and software quality engineering (SQE) of the Project codes as well as a sizable effort on the verification and validation (V&V) of the resulting codes. Examples of these efforts for our project will be discussed.
Infrastructure for distributed enterprise simulation
Johnson, M.M.; Yoshimura, A.S.; Goldsby, M.E.
1998-01-01
Traditional discrete-event simulations employ an inherently sequential algorithm and are run on a single computer. However, the demands of many real-world problems exceed the capabilities of sequential simulation systems. Often the capacity of a computer's primary memory limits the size of the models that can be handled, and in some cases parallel execution on multiple processors could significantly reduce the simulation time. This paper describes the development of an Infrastructure for Distributed Enterprise Simulation (IDES) - a large-scale portable parallel simulation framework developed to support Sandia National Laboratories' mission in stockpile stewardship. IDES is based on the Breathing-Time-Buckets synchronization protocol, and maps a message-based model of distributed computing onto an object-oriented programming model. IDES is portable across heterogeneous computing architectures, including single-processor systems, networks of workstations and multi-processor computers with shared or distributed memory. The system provides a simple and sufficient application programming interface that can be used by scientists to quickly model large-scale, complex enterprise systems. In the background and without involving the user, IDES is capable of making dynamic use of idle processing power available throughout the enterprise network. 16 refs., 14 figs.
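For readers unfamiliar with the "inherently sequential algorithm" this abstract refers to, the following is a minimal sketch of a conventional event loop, not of IDES or of the Breathing-Time-Buckets protocol. It assumes a single time-ordered queue and handlers that return follow-up events as (delay, kind, data) tuples; the function `run_sequential` and the two-step order model are hypothetical.

```python
import heapq

def run_sequential(initial_events, handlers, end_time):
    """Process events strictly in timestamp order from one central queue."""
    queue = []
    seq = 0  # tie-breaker so equal timestamps pop in insertion order
    for time, kind, data in initial_events:
        heapq.heappush(queue, (time, seq, kind, data))
        seq += 1
    log = []
    while queue:
        time, _, kind, data = heapq.heappop(queue)
        if time > end_time:
            break
        log.append((time, kind, data))
        # A handler may schedule new events, but never in the past.
        for delay, new_kind, new_data in handlers[kind](time, data):
            assert delay >= 0
            heapq.heappush(queue, (time + delay, seq, new_kind, new_data))
            seq += 1
    return log

# Hypothetical model: an order arrives, then ships 2.5 time units later.
def on_arrival(time, order_id):
    return [(2.5, "ship", order_id)]

def on_ship(time, order_id):
    return []

handlers = {"arrival": on_arrival, "ship": on_ship}
log = run_sequential([(0.0, "arrival", 1), (1.0, "arrival", 2)],
                     handlers, end_time=10.0)
```

Because every event is popped from the same queue and may insert new events ahead of those still waiting, the loop cannot be split across processors without a synchronization protocol that preserves this timestamp order.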
NASA Astrophysics Data System (ADS)
Wang, Wenlong; Machta, Jonathan; Katzgraber, Helmut G.
2015-07-01
Population annealing is a Monte Carlo algorithm that marries features from simulated-annealing and parallel-tempering Monte Carlo. As such, it is well suited to overcoming large energy barriers in the free-energy landscape while minimizing a Hamiltonian. Thus, population-annealing Monte Carlo can be used as a heuristic to solve combinatorial optimization problems. We illustrate the capabilities of population-annealing Monte Carlo by computing ground states of the three-dimensional Ising spin glass with Gaussian disorder, while comparing to simulated-annealing and parallel-tempering Monte Carlo. Our results suggest that population-annealing Monte Carlo is significantly more efficient than simulated annealing but comparable to parallel-tempering Monte Carlo for finding spin-glass ground states.
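The structure of the algorithm can be sketched on a toy problem. This is an illustration under several assumptions, not the authors' implementation: it uses an open one-dimensional Ising chain with Gaussian couplings instead of the three-dimensional spin glass (so the exact ground-state energy, minus the sum of |J_i|, is known), a fixed population size with multinomial resampling, and a hypothetical linear schedule in inverse temperature.

```python
import math
import random

def energy(spins, couplings):
    """Open Ising chain: E = -sum_i J_i * s_i * s_{i+1}."""
    return -sum(j * spins[i] * spins[i + 1] for i, j in enumerate(couplings))

def metropolis_sweep(spins, couplings, beta, rng):
    """One Metropolis sweep: propose flipping every spin once, in order."""
    n = len(spins)
    for i in range(n):
        # Local field from the (up to two) neighbours of spin i.
        field = 0.0
        if i > 0:
            field += couplings[i - 1] * spins[i - 1]
        if i < n - 1:
            field += couplings[i] * spins[i + 1]
        delta = 2.0 * spins[i] * field  # energy change if spin i flips
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            spins[i] = -spins[i]

def population_annealing(couplings, n_replicas, betas, sweeps, rng):
    n = len(couplings) + 1
    # Start at beta = 0: independent random configurations.
    population = [[rng.choice((-1, 1)) for _ in range(n)]
                  for _ in range(n_replicas)]
    beta_old = 0.0
    for beta in betas:
        energies = [energy(s, couplings) for s in population]
        e_min = min(energies)  # shift energies for numerical stability
        weights = [math.exp(-(beta - beta_old) * (e - e_min)) for e in energies]
        # Resampling step (multinomial variant, fixed population size).
        chosen = rng.choices(range(n_replicas), weights=weights, k=n_replicas)
        population = [list(population[i]) for i in chosen]
        # Equilibrate every replica at the new temperature.
        for spins in population:
            for _ in range(sweeps):
                metropolis_sweep(spins, couplings, beta, rng)
        beta_old = beta
    return population

rng = random.Random(1)
couplings = [rng.gauss(0.0, 1.0) for _ in range(11)]   # 12 spins
betas = [0.25 * k for k in range(1, 21)]               # beta from 0.25 to 5.0
population = population_annealing(couplings, n_replicas=100, betas=betas,
                                  sweeps=2, rng=rng)
best = min(energy(s, couplings) for s in population)
exact = -sum(abs(j) for j in couplings)                # known ground-state energy
```

The resampling step is what distinguishes the method from running many independent simulated-annealing copies: low-energy replicas are duplicated and high-energy ones are dropped at each temperature step. The replicas are otherwise independent between resampling steps, which is what makes the method attractive for parallel hardware.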
MPI parallelization of Vlasov codes for the simulation of nonlinear laser-plasma interactions
NASA Astrophysics Data System (ADS)
Savchenko, V.; Won, K.; Afeyan, B.; Decyk, V.; Albrecht-Marc, M.; Ghizzo, A.; Bertrand, P.
2003-10-01
The simulation of optical mixing driven KEEN waves [1] and electron plasma waves [1] in laser-produced plasmas requires nonlinear kinetic models and massive parallelization. We use Message Passing Interface (MPI) libraries and Appleseed [2] to solve the Vlasov-Poisson system of equations on an 8-node dual-processor Mac G4 cluster. We use the semi-Lagrangian time splitting method [3]. It requires only row-column exchanges in the global data redistribution, minimizing the total number of communications between processors. Recurrent communication patterns for 2D FFTs involve global transposition. In the Vlasov-Maxwell case, we use splitting into two 1D spatial advections and a 2D momentum advection [4]. Discretized momentum advection equations have a double loop structure with the outer index being assigned to different processors. We adhere to a code structure with separate routines for calculations and data management for parallel computations. [1] B. Afeyan et al., IFSA 2003 Conference Proceedings, Monterey, CA [2] V. K. Decyk, Computers in Physics, 7, 418 (1993) [3] Sonnendrucker et al., JCP 149, 201 (1998) [4] Begue et al., JCP 151, 458 (1999)
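The remark that 2D FFTs involve global transposition can be made concrete with a single-process sketch. It assumes nothing about the authors' code: a naive DFT stands in for the FFT, and an in-memory `transpose` stands in for what would be an all-to-all exchange between processes that each own a block of rows. All function names are illustrative.

```python
import cmath

def dft(row):
    """Naive 1D discrete Fourier transform of a sequence."""
    n = len(row)
    return [sum(row[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def transpose(matrix):
    """Global transposition: rows become columns."""
    return [list(col) for col in zip(*matrix)]

def dft2_by_transposition(matrix):
    """2D DFT using only row transforms plus two transpositions."""
    step1 = [dft(row) for row in matrix]   # transform along rows (local work)
    step2 = transpose(step1)               # global data redistribution
    step3 = [dft(row) for row in step2]    # former columns are now rows
    return transpose(step3)                # restore the original layout

def dft2_direct(matrix):
    """Reference: the 2D DFT evaluated straight from its definition."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [[sum(matrix[r][c]
                 * cmath.exp(-2j * cmath.pi * (u * r / n_rows + v * c / n_cols))
                 for r in range(n_rows) for c in range(n_cols))
             for v in range(n_cols)]
            for u in range(n_rows)]

sample = [[1, 2, 3], [4, 5, 6]]
via_transpose = dft2_by_transposition(sample)
reference = dft2_direct(sample)
```

Each process only ever transforms rows it holds locally, so all inter-process traffic is concentrated in the transposition steps.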
A Parallel Finite Set Statistical Simulator for Multi-Target Detection and Tracking
NASA Astrophysics Data System (ADS)
Hussein, I.; MacMillan, R.
2014-09-01
Finite Set Statistics (FISST) is a powerful Bayesian inference tool for the joint detection, classification and tracking of multi-target environments. FISST is capable of handling phenomena such as clutter, misdetections, and target birth and decay. Implicit within the approach are solutions to the data association and target label-tracking problems. Finally, FISST provides generalized information measures that can be used for sensor allocation across different types of tasks such as: searching for new targets, and classification and tracking of known targets. These FISST capabilities have been demonstrated on several small-scale illustrative examples. However, for implementation in a large-scale system as in the Space Situational Awareness problem, these capabilities require a lot of computational power. In this paper, we implement FISST in a parallel environment for the joint detection and tracking of multi-target systems. In this implementation, false alarms and misdetections will be modeled. Target birth and decay will not be modeled in the present paper. We will demonstrate the success of the method for as many targets as we possibly can in a desktop parallel environment. Performance measures will include: number of targets in the simulation, certainty of detected target tracks, computational time as a function of clutter returns and number of targets, among other factors.
NASA Astrophysics Data System (ADS)
Powell, Melissa; Mason, Grant; Spencer, Ross
2007-10-01
A Malmberg-Penning trap is a cylindrical apparatus which confines non-neutral plasma (electrons only) with an axial magnetic field and negative electric potentials on both ends. It is a simple system for studying basic plasma behavior, so simple that theory and experiment ought to agree. Theory predicts that a hollow plasma density profile is unstable, and experiments agree. However, the experimental growth rate of the m = 1 diocotron mode of the instability is much larger than the theoretical growth rate, by a factor of around 2-4. We are collaborating with Travis Mitchell's experimental research group at the University of Delaware to find the cause for this discrepancy by recreating experimental conditions in our simulation. The growth rates of our simulation test cases have remained less than half the growth rates of Mitchell's experiments. I will report the results of parallelizing the simulation to increase the number of particles to 2 billion. We also optimize the code by converting the field solver from a two-grid to a three-grid multigrid solver in order to increase the number of grid points.
Three-dimensional parallel UNIPIC-3D code for simulations of high-power microwave devices
Wang Jianguo; Chen Zaigao; Wang Yue; Zhang Dianhui; Qiao Hailiang; Fu Meiyan; Yuan Yuan; Liu Chunliang; Li Yongdong; Wang Hongguang
2010-07-15
This paper introduces a self-developed, three-dimensional parallel fully electromagnetic particle simulation code, UNIPIC-3D. In this code, the electromagnetic fields are updated using the second-order, finite-difference time-domain method, and the particles are moved using the relativistic Newton-Lorentz force equation. The electromagnetic field and particles are coupled through the current term in Maxwell's equations. Two numerical examples are used to verify the algorithms adopted in this code, and the numerical results agree well with theoretical ones. This code can be used to simulate high-power microwave (HPM) devices, such as the relativistic backward wave oscillator, coaxial vircator, and magnetically insulated line oscillator. UNIPIC-3D is written in the object-oriented C++ language and can be run on a variety of platforms including WINDOWS, LINUX, and UNIX. Users can use the graphical user interface to create the complex geometric structures of the simulated HPM devices, which can be automatically meshed by the UNIPIC-3D code. This code has a powerful postprocessor which can display the electric field, magnetic field, current, voltage, power, spectrum, momentum of particles, etc. For the sake of comparison, the results computed by using the two-and-a-half-dimensional UNIPIC code are also provided for the same parameters of HPM devices; the numerical results computed from these two codes agree well with each other.
NASA Technical Reports Server (NTRS)
Moxon, Bruce C.; Green, John A.
1990-01-01
A high-performance platform for development of real-time helicopter flight simulations based on a simulation development and analysis platform combining a parallel simulation development and analysis environment with a scalable multiprocessor computer system is described. Simulation functional decomposition is covered, including the sequencing and data dependency of simulation modules and simulation functional mapping to multiple processors. The multiprocessor-based implementation of a blade-element simulation of the UH-60 helicopter is presented, and a prototype developed for a TC2000 computer is generalized in order to arrive at a portable multiprocessor software architecture. It is pointed out that the proposed approach coupled with a pilot's station creates a setting in which simulation engineers, computer scientists, and pilots can work together in the design and evaluation of advanced real-time helicopter simulations.
NASA Astrophysics Data System (ADS)
Řidký, Václav; Šidlof, Petr
2014-03-01
The work is devoted to 3D and 2D parallel numerical computation of pressure and velocity fields around an elastically supported airfoil self-oscillating due to interaction with the airflow. Numerical solution is computed in the OpenFOAM package, an open-source software package based on finite volume method. Movement of airfoil is described by translation and rotation, identified from experimental data. A new boundary condition for the 2DOF motion of the airfoil was implemented. The results of numerical simulations (velocity) are compared with data measured in a wind tunnel, where a physical model of NACA0015 airfoil was mounted and tuned to exhibit the flutter instability. The experimental results were obtained previously in the Institute of Thermomechanics by interferographic measurements in a subsonic wind tunnel in Nový Knín.
NASA Astrophysics Data System (ADS)
Wang, Wei-Quan; Yin, Yan; Zou, De-Bin; Yu, Tong-Pu; Yang, Xiao-Hu; Xu, Han; Yu, Ming-Yang; Ma, Yan-Yun; Zhuo, Hong-Bin; Shao, Fu-Qiu
2014-11-01
A new scheme of radiation pressure acceleration for generating high-quality protons by using two overlapping-parallel laser pulses is proposed. Particle-in-cell simulation shows that the overlapping of two pulses with identical Gaussian profiles in space and trapezoidal profiles in the time domain can result in a composite light pulse with a spatial profile suitable for stable acceleration of protons to high energies. At ~2.46 × 10^21 W/cm^2 intensity of the combined light pulse, a quasi-monoenergetic proton beam with peak energy ~200 MeV/nucleon, energy spread <15%, and divergence angle <4° is obtained, which is appropriate for tumor therapy. The proton beam quality can be controlled by adjusting the incidence points of the two laser pulses.
Three-dimensional parallel simulation of cornstarch-oxygen two-phase detonation
NASA Astrophysics Data System (ADS)
Tsuboi, N.; Hayashi, A. K.; Matsumoto, Y.
Numerical simulations were performed with a parallel computer to solve for the behavior of a three-dimensional gas-solid two-phase detonation. The numerical method is a second-order modified Harten-Yee TVD upwind scheme, and time integration uses first-order Euler integration. A two-step chemical reaction model represents the reaction of cornstarch particles and oxygen. The numerical results show that a periodic two-headed detonation appears with a three-dimensional propagation mechanism before and after triple-point collisions. A comparison between the numerical and experimental results reveals that the computed detonation velocity agrees well with the experimental one.
A package of Linux scripts for the parallelization of Monte Carlo simulations
NASA Astrophysics Data System (ADS)
Badal, Andreu; Sempau, Josep
2006-09-01
Despite the fact that fast computers are nowadays available at low cost, there are many situations where obtaining a reasonably low statistical uncertainty in a Monte Carlo (MC) simulation involves a prohibitively large amount of time. This limitation can be overcome by having recourse to parallel computing. Most tools designed to facilitate this approach require modification of the source code and the installation of additional software, which may be inconvenient for some users. We present a set of tools, named clonEasy, that implement a parallelization scheme of a MC simulation that is free from these drawbacks. In clonEasy, which is designed to run under Linux, a set of "clone" CPUs is governed by a "master" computer by taking advantage of the capabilities of the Secure Shell (ssh) protocol. Any Linux computer on the Internet that can be ssh-accessed by the user can be used as a clone. A key ingredient for the parallel calculation to be reliable is the availability of an independent string of random numbers for each CPU. Many generators—such as RANLUX, RANECU or the Mersenne Twister—can readily produce these strings by initializing them appropriately and, hence, they are suitable to be used with clonEasy. This work was primarily motivated by the need to find a straightforward way to parallelize PENELOPE, a code for MC simulation of radiation transport that (in its current 2005 version) employs the generator RANECU, which uses a combination of two multiplicative linear congruential generators (MLCGs). Thus, this paper is focused on this class of generators and, in particular, we briefly present an extension of RANECU that increases its period up to ˜5×10 and we introduce seedsMLCG, a tool that provides the information necessary to initialize disjoint sequences of an MLCG to feed different CPUs. 
This program, in combination with clonEasy, allows PENELOPE to be run in parallel easily, without requiring specific libraries or significant alterations of the sequential code.
Program summary 1
Title of program: clonEasy
Catalogue identifier: ADYD_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYD_v1_0
Program obtainable from: CPC Program Library, Queen's University of Belfast, Northern Ireland
Computer for which the program is designed and others in which it is operable: Any computer with a Unix style shell (bash), support for the Secure Shell protocol and a FORTRAN compiler
Operating systems under which the program has been tested: Linux (RedHat 8.0, SuSe 8.1, Debian Woody 3.1)
Compilers: GNU FORTRAN g77 (Linux); g95 (Linux); Intel Fortran Compiler 7.1 (Linux)
Programming language used: Linux shell (bash) script, FORTRAN 77
No. of bits in a word: 32
No. of lines in distributed program, including test data, etc.: 1916
No. of bytes in distributed program, including test data, etc.: 18 202
Distribution format: tar.gz
Nature of the physical problem: There are many situations where a Monte Carlo simulation involves a huge amount of CPU time. The parallelization of such calculations is a simple way of obtaining a relatively low statistical uncertainty using a reasonable amount of time.
Method of solution: The presented collection of Linux scripts and auxiliary FORTRAN programs implement Secure Shell-based communication between a "master" computer and a set of "clones". The aim of this communication is to execute a code that performs a Monte Carlo simulation on all the clones simultaneously. The code is unique, but each clone is fed with a different set of random seeds. Hence, clonEasy effectively permits the parallelization of the calculation.
Restrictions on the complexity of the program: clonEasy can only be used with programs that produce statistically independent results using the same code, but with a different sequence of random numbers. Users must choose the initialization values for the random number generator on each computer and combine the output from the different executions. A FORTRAN program to combine the final results is also provided.
Typical running time: The execution time
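The idea of initializing disjoint sequences of an MLCG can be sketched as follows. This is not the seedsMLCG program: it only shows the standard jump-ahead identity x_{n+k} = (a^k mod m) · x_n mod m for a single generator. The constants are those of the first MLCG in L'Ecuyer's combined generator and are assumed here to match RANECU; the function names, seed and block length are hypothetical.

```python
def mlcg_next(x, a, m):
    """One step of a multiplicative LCG: x_{n+1} = a * x_n mod m."""
    return (a * x) % m

def mlcg_jump(x, a, m, k):
    """Jump k steps ahead in one go: x_{n+k} = (a^k mod m) * x_n mod m."""
    return (pow(a, k, m) * x) % m

def disjoint_seeds(seed, a, m, n_streams, stream_length):
    """Starting values for n_streams consecutive, non-overlapping blocks."""
    seeds = [seed]
    for _ in range(n_streams - 1):
        seeds.append(mlcg_jump(seeds[-1], a, m, stream_length))
    return seeds

# Multiplier and modulus assumed to match the first MLCG used by RANECU;
# check them against the actual source code before relying on them.
A1, M1 = 40014, 2147483563
seeds = disjoint_seeds(seed=12345, a=A1, m=M1, n_streams=4, stream_length=10**8)

# Consistency check on a short jump: stepping 1000 times equals one jump.
stepped = 12345
for _ in range(1000):
    stepped = mlcg_next(stepped, A1, M1)
jumped = mlcg_jump(12345, A1, M1, 1000)
```

The blocks stay disjoint only while `n_streams * stream_length` is below the period of the generator. For a combined generator, each component MLCG would be advanced by the same number of steps.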
Implementation of a parallel algorithm for thermo-chemical nonequilibrium flow simulations
Wong, C.C.; Blottner, F.G.; Payne, J.L.; Soetrisno, M.
1995-01-01
Massively parallel (MP) computing is considered to be the future direction of high performance computing. When engineers apply this new MP computing technology to solve large-scale problems, one major interest is the maximum problem size that an MP computer can handle. To determine the maximum size, it is important to address the code scalability issue. Scalability describes whether the code can provide an increase in performance proportional to an increase in problem size. If the size of the problem increases and more computer nodes are used, the ideal elapsed time to simulate the problem should not increase much. Hence one important task in the development of the MP computing technology is to ensure scalability. A scalable code is an efficient code. In order to obtain good scaled performance, it is necessary to first have the code optimized for single-node performance before proceeding to a large-scale simulation with a large number of computer nodes. This paper will discuss the implementation of a massively parallel computing strategy and the process of optimization to improve the scaled performance. Specifically, we will look at domain decomposition, resource management in the code, communication overhead, and problem mapping. By incorporating these improvements and adopting an efficient MP computing strategy, efficiencies of about 85% and 96% have been achieved using 64 nodes on MP computers for perfect gas and chemically reactive gas problems, respectively. A comparison of the performance between MP computers and a vectorized computer, such as the Cray-YMP, will also be presented.
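The abstract does not state which efficiency definition was used, so the following sketch only spells out the two common ones as an aid to reading the figures. The timings are hypothetical and are not taken from the paper.

```python
def scaled_efficiency(time_one_node, time_p_nodes):
    """Weak-scaling efficiency: the problem grows with the node count,
    so ideally the elapsed time stays constant and the ratio is 1.0."""
    return time_one_node / time_p_nodes

def fixed_size_efficiency(time_one_node, time_p_nodes, n_nodes):
    """Strong-scaling efficiency for a fixed problem size: speedup / nodes."""
    return time_one_node / (n_nodes * time_p_nodes)

# Hypothetical timings in seconds.
# Scaled case: 64 nodes solve a 64 times larger problem in 125 s instead of 100 s.
weak = scaled_efficiency(100.0, 125.0)
# Fixed-size case: the same problem takes 6400 s on 1 node and 125 s on 64 nodes.
strong = fixed_size_efficiency(6400.0, 125.0, 64)
```

Under the scaled definition, an efficiency below 1.0 measures how much the elapsed time grows when problem size and node count increase together, which matches the abstract's description of ideal behaviour.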
NASA Astrophysics Data System (ADS)
Nielsen, Jens; d'Avezac, Mayeul; Hetherington, James; Stamatakis, Michail
2013-12-01
Ab initio kinetic Monte Carlo (KMC) simulations have been successfully applied for over two decades to elucidate the underlying physico-chemical phenomena on the surfaces of heterogeneous catalysts. These simulations necessitate detailed knowledge of the kinetics of elementary reactions constituting the reaction mechanism, and the energetics of the species participating in the chemistry. The information about the energetics is encoded in the formation energies of gas and surface-bound species, and the lateral interactions between adsorbates on the catalytic surface, which can be modeled at different levels of detail. The majority of previous works accounted for only pairwise-additive first nearest-neighbor interactions. More recently, cluster-expansion Hamiltonians incorporating long-range interactions and many-body terms have been used for detailed estimations of catalytic rate [C. Wu, D. J. Schmidt, C. Wolverton, and W. F. Schneider, J. Catal. 286, 88 (2012)]. In view of the increasing interest in accurate predictions of catalytic performance, there is a need for general-purpose KMC approaches incorporating detailed cluster expansion models for the adlayer energetics. We have addressed this need by building on the previously introduced graph-theoretical KMC framework, and we have developed Zacros, a FORTRAN2003 KMC package for simulating catalytic chemistries. To tackle the high computational cost in the presence of long-range interactions we introduce parallelization with OpenMP. We further benchmark our framework by simulating a KMC analogue of the NO oxidation system established by Schneider and co-workers [J. Catal. 286, 88 (2012)]. We show that taking into account only first nearest-neighbor interactions may lead to large errors in the prediction of the catalytic rate, whereas for accurate estimates thereof, one needs to include long-range terms in the cluster expansion.
Nielsen, Jens; d'Avezac, Mayeul; Hetherington, James; Stamatakis, Michail
2013-12-14
Ab initio kinetic Monte Carlo (KMC) simulations have been successfully applied for over two decades to elucidate the underlying physico-chemical phenomena on the surfaces of heterogeneous catalysts. These simulations necessitate detailed knowledge of the kinetics of elementary reactions constituting the reaction mechanism, and the energetics of the species participating in the chemistry. The information about the energetics is encoded in the formation energies of gas and surface-bound species, and the lateral interactions between adsorbates on the catalytic surface, which can be modeled at different levels of detail. The majority of previous works accounted for only pairwise-additive first nearest-neighbor interactions. More recently, cluster-expansion Hamiltonians incorporating long-range interactions and many-body terms have been used for detailed estimations of catalytic rate [C. Wu, D. J. Schmidt, C. Wolverton, and W. F. Schneider, J. Catal. 286, 88 (2012)]. In view of the increasing interest in accurate predictions of catalytic performance, there is a need for general-purpose KMC approaches incorporating detailed cluster expansion models for the adlayer energetics. We have addressed this need by building on the previously introduced graph-theoretical KMC framework, and we have developed Zacros, a FORTRAN2003 KMC package for simulating catalytic chemistries. To tackle the high computational cost in the presence of long-range interactions we introduce parallelization with OpenMP. We further benchmark our framework by simulating a KMC analogue of the NO oxidation system established by Schneider and co-workers [J. Catal. 286, 88 (2012)]. We show that taking into account only first nearest-neighbor interactions may lead to large errors in the prediction of the catalytic rate, whereas for accurate estimates thereof, one needs to include long-range terms in the cluster expansion. PMID:24329081
NASA Astrophysics Data System (ADS)
Vu, H. X.
1998-08-01
A massively parallel three-dimensional hybrid particle-in-cell (PIC) code, implemented on the CRAY-T3D, is presented. The code is based on a physical model described in a previous report where the electrons are modeled as an adiabatic fluid with an arbitrary ratio of specific heats γ and the electromagnetic field model is based on a temporal Wentzel-Kramers-Brillouin (WKB) approximation. On a CRAY-T3D with 512 processors, the code requires about 0.6 μs/particle/time step. The largest test problem performed with this code consists of a computational mesh of 4096 × 64 × 64 (16 million) cells, a total of 256 million particles, and corresponds to a plasma volume of 50 μm × 20 μm × 20 μm (approximately 150 λ × 60 λ × 60 λ, where λ is the laser's vacuum wavelength). We believe this code is the first PIC computational tool capable of simulating low-frequency ion-driven parametric instabilities in a large, three-dimensional plasma volume and offers a unique opportunity for examining issues that are potentially vital to inertial confinement fusion (ICF), e.g., nonlinear ion kinetic effects and their role in nonlinear saturation mechanisms in three dimensions. Test simulations of the self-focusing (SF) instability and of the self-focusing-induced deflection of a laser beam are presented.
Computational aeroacoustic simulation of flow-induced cavity noise using parallel computers
NASA Astrophysics Data System (ADS)
Shieh, Chingwei M.; Morris, Philip J.
1998-11-01
A parallel, multiblock, high-order accurate code has been developed for cavity noise prediction. An implicit, second-order time accurate, dual time-stepping algorithm that has shown promise in viscous aeroacoustic simulations has been implemented for the long time integration. The accuracy of the solution obtained with this method is comparable to that of typical explicit computational aeroacoustic algorithms, but the method eliminates the stringent time-step requirement that numerical stability imposes on such schemes. Inner fictitious subiterations are performed with a four-stage Runge-Kutta method, with the implementation of multigrid and implicit residual smoothing to accelerate convergence. To account for the turbulent nature of the flow, the one-equation turbulence model has been used in the analysis. Far-field acoustic data are extrapolated from the near-field flow solution with the use of the Ffowcs-Williams and Hawkings equation. Sound generation has been simulated from two-dimensional cavities of various length-to-depth ratios in the subsonic flow regime. The mechanisms for cavity noise generation are discussed.
MONTE CARLO SIMULATIONS OF NONLINEAR PARTICLE ACCELERATION IN PARALLEL TRANS-RELATIVISTIC SHOCKS
Ellison, Donald C.; Warren, Donald C.; Bykov, Andrei M.
2013-10-10
We present results from a Monte Carlo simulation of a parallel collisionless shock undergoing particle acceleration. Our simulation, which contains parameterized scattering and a particular thermal leakage injection model, calculates the feedback between accelerated particles ahead of the shock, which influence the shock precursor and 'smooth' the shock, and thermal particle injection. We show that there is a transition between nonrelativistic shocks, where the acceleration efficiency can be extremely high and the nonlinear compression ratio can be substantially greater than the Rankine-Hugoniot value, and fully relativistic shocks, where diffusive shock acceleration is less efficient and the compression ratio remains at the Rankine-Hugoniot value. This transition occurs in the trans-relativistic regime and, for the particular parameters we use, occurs around a shock Lorentz factor γ_0 = 1.5. We also find that nonlinear shock smoothing dramatically reduces the acceleration efficiency presumed to occur with large-angle scattering in ultra-relativistic shocks. Our ability to seamlessly treat the transition from ultra-relativistic to trans-relativistic to nonrelativistic shocks may be important for evolving relativistic systems, such as gamma-ray bursts and Type Ibc supernovae. We expect a substantial evolution of shock accelerated spectra during this transition from soft early on to much harder when the blast-wave shock becomes nonrelativistic.
Simulated Wake Characteristics Data for Closely Spaced Parallel Runway Operations Analysis
NASA Technical Reports Server (NTRS)
Guerreiro, Nelson M.; Neitzke, Kurt W.
2012-01-01
A simulation experiment was performed to generate and compile wake characteristics data relevant to the evaluation and feasibility analysis of closely spaced parallel runway (CSPR) operational concepts. While the experiment in this work is not tailored to any particular operational concept, the generated data applies to the broader class of CSPR concepts, where a trailing aircraft on a CSPR approach is required to stay ahead of the wake vortices generated by a lead aircraft on an adjacent CSPR. Data for wake age, circulation strength, and wake altitude change, at various lateral offset distances from the wake-generating lead aircraft approach path were compiled for a set of nine aircraft spanning the full range of FAA and ICAO wake classifications. A total of 54 scenarios were simulated to generate data related to key parameters that determine wake behavior. Of particular interest are wake age characteristics that can be used to evaluate both time- and distance-based in-trail separation concepts for all aircraft wake-class combinations. A simple first-order difference model was developed to enable the computation of wake parameter estimates for aircraft models having weight, wingspan and speed characteristics similar to those of the nine aircraft modeled in this work.
High Fidelity Simulations of Large-Scale Wireless Networks
Onunkwo, Uzoma; Benz, Zachary
2015-11-01
The worldwide proliferation of wireless connected devices continues to accelerate. There are tens of billions of wireless links across the planet with an additional explosion of new wireless usage anticipated as the Internet of Things develops. Wireless technologies not only provide convenience for mobile applications but are also extremely cost-effective to deploy. Thus, this trend towards wireless connectivity will only continue and Sandia must develop the necessary simulation technology to proactively analyze the associated emerging vulnerabilities. Wireless networks are marked by mobility and proximity-based connectivity. The de facto standard for exploratory studies of wireless networks is discrete event simulation (DES). However, the simulation of large-scale wireless networks is extremely difficult due to prohibitively large turnaround time. A path forward is to expedite simulations with parallel discrete event simulation (PDES) techniques. The mobility and distance-based connectivity associated with wireless simulations, however, typically doom PDES approaches, which then fail to scale (e.g., OPNET and ns-3 simulators). We propose a PDES-based tool aimed at reducing the communication overhead between processors. The proposed solution will use light-weight processes to dynamically distribute computation workload while mitigating communication overhead associated with synchronizations. This work is vital to the analytics and validation capabilities of simulation and emulation at Sandia. We have years of experience in Sandia’s simulation and emulation projects (e.g., MINIMEGA and FIREWHEEL). Sandia’s current highly-regarded capabilities in large-scale emulations have focused on wired networks, where two assumptions prevent scalable wireless studies: (a) the connections between objects are mostly static and (b) the nodes have fixed locations.
Numerical field simulation for parallel transmission in MRI at 7 tesla
Bernier, Jessica A. (Jessica Ashley)
2011-01-01
Parallel transmission (pTx) is a promising improvement to coil design that has been demonstrated to mitigate B1+ inhomogeneity, manifest as center brightening, for high-field magnetic resonance imaging (MRI). Parallel ...
Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2
NASA Technical Reports Server (NTRS)
Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad
1995-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers is documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that the PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs to match the actual costs relative to changes in the number of grid points. By increasing the number of processors, slower than linear speedups are achieved with optimized (machine-dependent library) routines. This slower than linear speedup results because the computational cost is dominated by the FFT routine, which yields less than ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 Mflops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32 nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32-node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: Spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.
Automated integration of genomic physical mapping data via parallel simulated annealing
Slezak, T.
1994-06-01
The Human Genome Center at the Lawrence Livermore National Laboratory (LLNL) is nearing closure on a high-resolution physical map of human chromosome 19. We have built automated tools to assemble 15,000 fingerprinted cosmid clones into 800 contigs with minimal spanning paths identified. These islands are being ordered, oriented, and spanned by a variety of other techniques including: Fluorescence In Situ Hybridization (FISH) at 3 levels of resolution, ECO restriction fragment mapping across all contigs, and a multitude of different hybridization and PCR techniques to link cosmid, YAC, BAC, PAC, and P1 clones. The FISH data provide us with partial order and distance data as well as orientation. We made the observation that map builders need a much rougher presentation of data than do map readers; the former wish to see raw data since these can expose errors or interesting biology. We further noted that by ignoring our length and distance data we could simplify our problem into one that could be readily attacked with optimization techniques. The data integration problem could then be seen as an M x N ordering of our N cosmid clones which "intersect" M larger objects by defining "intersection" to mean either contig/map membership or hybridization results. Clearly, the goal of making an integrated map is now to rearrange the N cosmid clone "columns" such that the number of gaps on the object "rows" is minimized. Our FISH partially-ordered cosmid clones provide us with a set of constraints that cannot be violated by the rearrangement process. We solved the optimization problem via simulated annealing performed on a network of 40+ Unix machines in parallel, using a server/client model built on explicit socket calls. For current maps we can create a map in about 4 hours on the parallel net versus 4+ days on a single workstation. Our biologists are now using this software on a daily basis to guide their efforts toward final closure.
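The column-ordering formulation in the abstract above can be illustrated with a small, single-process sketch. Everything here is an assumption made for illustration rather than the LLNL software: the function names (`count_gaps`, `respects`, `anneal`), the swap move, the geometric cooling schedule, and the toy matrix are all hypothetical, and the 40+ machine server/client parallelization is not reproduced. A "gap" is counted as an extra run of 1s in a row, and ordering constraints are given as pairs (a, b) meaning column a must appear before column b.

```python
import math
import random


def count_gaps(matrix, order):
    """Total gaps over all rows: (number of runs of 1s) - 1 per non-empty row."""
    gaps = 0
    for row in matrix:
        runs = 0
        prev = 0
        for col in order:
            if row[col] and not prev:
                runs += 1
            prev = row[col]
        gaps += max(runs - 1, 0)
    return gaps


def respects(order, constraints):
    """True if every (a, b) pair has column a placed before column b."""
    pos = {col: i for i, col in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in constraints)


def anneal(matrix, constraints=(), steps=20000, t0=2.0, cooling=0.9995, seed=0):
    """Simulated annealing over column orders; returns (best_order, best_cost)."""
    rng = random.Random(seed)
    n = len(matrix[0])
    order = list(range(n))
    if not respects(order, constraints):
        raise ValueError("initial order must satisfy the constraints")
    cost = count_gaps(matrix, order)
    best, best_cost = order[:], cost
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        order[i], order[j] = order[j], order[i]  # propose a column swap
        accepted = False
        if respects(order, constraints):
            new_cost = count_gaps(matrix, order)
            # Metropolis rule: always accept improvements, sometimes accept worse
            if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
                cost = new_cost
                accepted = True
                if cost < best_cost:
                    best, best_cost = order[:], cost
        if not accepted:
            order[i], order[j] = order[j], order[i]  # undo the swap
        temp *= cooling
    return best, best_cost


# Toy instance: each row is contiguous under a hidden "true" column order.
true_order = [3, 0, 5, 1, 4, 2]
matrix = []
for lo, hi in [(0, 3), (2, 5), (1, 4), (3, 6)]:
    row = [0] * 6
    for col in true_order[lo:hi]:
        row[col] = 1
    matrix.append(row)

best, best_cost = anneal(matrix, constraints=[(0, 2)])
print("best order:", best, "gaps:", best_cost)
```

Rejecting constraint-violating swaps outright is only one way to honor the partial order; a penalty term in the cost would be another, and the abstract does not say which approach was used.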
Cai, Xiao-Chuan
A parallel two-level method for simulating blood flows in branching arteries with the resistive modeling of blood flows in the arteries is an important and very challenging problem. In order to understand, computationally, the sophisticated hemodynamics in the arteries, it is essential to couple
Cai, Wei
Massively-Parallel Dislocation Dynamics Simulations Wei Cai, Vasily V. Bulatov, Tim G. Pierce based on the collective dynamics of dislocations has been a challenge for computational materials science for a number of years. The difficulty lies in the inability of the existing dislocation dynamics
2004-01-01
Mathematics and Computers in Simulation 65 (2004) 557-577 Parallel runs of a large air pollution 20 January 2004; accepted 21 January 2004 Abstract Large-scale air pollution models can successfully. The mathematical description of a large-scale air pollution model will be discussed in this paper. The principles
Direct numerical simulation of instabilities in parallel flow with spherical roughness elements
NASA Technical Reports Server (NTRS)
Deanna, R. G.
1992-01-01
Results from a direct numerical simulation of laminar flow over a flat surface with spherical roughness elements using a spectral-element method are given. The numerical simulation approximates roughness as a cellular pattern of identical spheres protruding from a smooth wall. Periodic boundary conditions on the domain's horizontal faces simulate an infinite array of roughness elements extending in the streamwise and spanwise directions, which implies the parallel-flow assumption, and results in a closed domain. A body force, designed to yield the horizontal Blasius velocity in the absence of roughness, sustains the flow. Instabilities above a critical Reynolds number reveal negligible oscillations in the recirculation regions behind each sphere and in the free stream, high-amplitude oscillations in the layer directly above the spheres, and a mean profile with an inflection point near the sphere's crest. The inflection point yields an unstable layer above the roughness (where U''(y) is less than 0) and a stable region within the roughness (where U''(y) is greater than 0). Evidently, the instability begins when the low-momentum or wake region behind an element, being the region most affected by disturbances (purely numerical in this case), goes unstable and moves. In compressible flow with periodic boundaries, this motion sends disturbances to all regions of the domain. In the unstable layer just above the inflection point, the disturbances grow while being carried downstream with a propagation speed equal to the local mean velocity; they do not grow amid the low energy region near the roughness patch. The most amplified disturbance eventually arrives at the next roughness element downstream, perturbing its wake and inducing a global response at a frequency governed by the streamwise spacing between spheres and the mean velocity of the most amplified layer.
Direct numerical simulation of instabilities in parallel flow with spherical roughness elements
NASA Astrophysics Data System (ADS)
Deanna, R. G.
1992-08-01
Results from a direct numerical simulation of laminar flow over a flat surface with spherical roughness elements using a spectral-element method are given. The numerical simulation approximates roughness as a cellular pattern of identical spheres protruding from a smooth wall. Periodic boundary conditions on the domain's horizontal faces simulate an infinite array of roughness elements extending in the streamwise and spanwise directions, which implies the parallel-flow assumption, and results in a closed domain. A body force, designed to yield the horizontal Blasius velocity in the absence of roughness, sustains the flow. Instabilities above a critical Reynolds number reveal negligible oscillations in the recirculation regions behind each sphere and in the free stream, high-amplitude oscillations in the layer directly above the spheres, and a mean profile with an inflection point near the sphere's crest. The inflection point yields an unstable layer above the roughness (where U''(y) is less than 0) and a stable region within the roughness (where U''(y) is greater than 0). Evidently, the instability begins when the low-momentum or wake region behind an element, being the region most affected by disturbances (purely numerical in this case), goes unstable and moves. In compressible flow with periodic boundaries, this motion sends disturbances to all regions of the domain. In the unstable layer just above the inflection point, the disturbances grow while being carried downstream with a propagation speed equal to the local mean velocity; they do not grow amid the low energy region near the roughness patch. The most amplified disturbance eventually arrives at the next roughness element downstream, perturbing its wake and inducing a global response at a frequency governed by the streamwise spacing between spheres and the mean velocity of the most amplified layer.
Kadoya, Y.; Abe, H.
1988-04-01
A two- and one-half-dimensional electromagnetic particle code (PS2M) (H. Abe and S. Nakajima, J. Phys. Soc. Jpn. 53, xxx (1987)) is used to study how an electric field applied parallel to the magnetic field affects the radio frequency stabilization of flute modes in a tandem mirror plasma. The parallel electric field E_∥ perturbs the electron velocity v_∥ parallel to the magnetic field and also induces a perpendicular magnetic field perturbation B_⊥. The unstable growth of the flute mode in the absence of such a radio frequency electric field is first studied as a basis for comparison. The ponderomotive force originating from the time-averaged product
Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i − f(x_{i−1})]_{i=1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0.
For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
NASA Astrophysics Data System (ADS)
Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.
2013-08-01
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i − f(x_{i−1})]_{i=1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3.
The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
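The root-finding view of a trajectory, F(X) = [x_i − f(x_{i−1})] = 0 for i = 1..M, can be made concrete with a toy sketch. The example below is an assumption-laden illustration, not the authors' method: it uses a 1D harmonic oscillator with a velocity-Verlet step as f, and a plain fixed-point sweep in place of the quasi-Newton and preconditioned quasi-Newton schemes named in the abstract. All function names and parameter values are hypothetical. The point it shows is that within one sweep every entry x_i is updated from the previous iterate only, so the M updates are independent and could each be given to a separate processor.

```python
def verlet_step(state, dt=0.1, k=1.0, m=1.0):
    """One velocity-Verlet step for a 1D harmonic oscillator (force = -k * r)."""
    r, v = state
    a = -k * r / m
    r_new = r + v * dt + 0.5 * a * dt * dt
    a_new = -k * r_new / m
    v_new = v + 0.5 * (a + a_new) * dt
    return (r_new, v_new)


def serial_trajectory(x0, steps):
    """Reference: propagate step by step, x_{i+1} = f(x_i)."""
    traj = [x0]
    for _ in range(steps):
        traj.append(verlet_step(traj[-1]))
    return traj


def residual(traj):
    """Max-norm of F(X) = [x_i - f(x_{i-1})] over i = 1..M."""
    worst = 0.0
    for i in range(1, len(traj)):
        fr, fv = verlet_step(traj[i - 1])
        worst = max(worst, abs(traj[i][0] - fr), abs(traj[i][1] - fv))
    return worst


def fixed_point_sweeps(x0, steps, tol=1e-12):
    """Solve F(X) = 0 by sweeps X_i <- f(X_{i-1}) using the previous iterate."""
    traj = [x0] * (steps + 1)  # crude initial guess: the system never moves
    sweeps = 0
    while residual(traj) > tol and sweeps < steps:
        # Each entry depends only on the old iterate, so these evaluations of f
        # are independent of one another (one per processor in a parallel run).
        traj = [x0] + [verlet_step(traj[i - 1]) for i in range(1, steps + 1)]
        sweeps += 1
    return traj, sweeps


x0 = (1.0, 0.0)
traj, sweeps = fixed_point_sweeps(x0, 20)
print("sweeps used:", sweeps, "final residual:", residual(traj))
```

This naive sweep can need as many iterations as there are time steps, so on its own it offers no speedup; it only exposes the structure that better root finders, such as those described in the abstract, would exploit to converge in fewer parallel iterations.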
NASA Technical Reports Server (NTRS)
Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke
1989-01-01
Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, generally, the same calculations are repeated every time step. However, we cannot apply the Do-all or Do-across techniques for parallel processing of the simulation since there exist data dependencies from the end of an iteration to the beginning of the next iteration and furthermore data-input and data-output are required every sampling time period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near fine grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and assigned to processors by using optimal static scheduling at compile time in order to reduce large run time overhead caused by the use of near fine grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR) which has been developed to extract advantageous features of static scheduling algorithms to the maximum extent.
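The idea of assigning small dependent tasks to processors before run time can be sketched with a generic list-scheduling heuristic. This is not OSCAR's scheduling algorithm, which the abstract does not specify; the critical-path priority rule, the function names, and the six-task graph with its costs are all assumptions chosen for illustration, and communication costs between processors are ignored.

```python
def critical_path_levels(cost, succ):
    """Longest path (in cost) from each task to an exit task, own cost included."""
    levels = {}

    def level(task):
        if task not in levels:
            tail = max((level(s) for s in succ.get(task, ())), default=0.0)
            levels[task] = cost[task] + tail
        return levels[task]

    for task in cost:
        level(task)
    return levels


def list_schedule(cost, succ, n_procs):
    """Static schedule: returns ([(task, proc, start, finish)], makespan)."""
    preds = {t: set() for t in cost}
    for t, children in succ.items():
        for child in children:
            preds[child].add(t)
    levels = critical_path_levels(cost, succ)
    finish = {}
    proc_free = [0.0] * n_procs
    schedule = []
    remaining = set(cost)
    while remaining:
        # Ready tasks are those whose predecessors are all already placed.
        ready = sorted(t for t in remaining if all(p in finish for p in preds[t]))
        task = max(ready, key=lambda t: levels[t])  # longest remaining path first
        data_ready = max((finish[p] for p in preds[task]), default=0.0)
        proc = min(range(n_procs), key=lambda p: max(proc_free[p], data_ready))
        start = max(proc_free[proc], data_ready)
        finish[task] = start + cost[task]
        proc_free[proc] = finish[task]
        schedule.append((task, proc, start, finish[task]))
        remaining.remove(task)
    return schedule, max(finish.values())


# Hypothetical task graph for one simulation time step (costs in arbitrary units).
cost = {"a": 2, "b": 3, "c": 1, "d": 4, "e": 2, "f": 1}
succ = {"a": ["c", "d"], "b": ["d", "e"], "c": ["f"], "d": ["f"], "e": ["f"]}

schedule, makespan = list_schedule(cost, succ, n_procs=2)
for task, proc, start, end in schedule:
    print(f"task {task} on processor {proc}: {start} to {end}")
print("makespan:", makespan)
```

Because the whole table of (task, processor, start time) entries is computed ahead of execution, nothing has to be decided while the time step runs, which is the run-time overhead the abstract says static scheduling is meant to reduce.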
Mesoscale Simulations of Particulate Flows with Parallel Distributed Lagrange Multiplier Technique
Kanarska, Y
2010-03-24
Fluid particulate flows are common phenomena in nature and industry. Modeling of such flows at micro and macro levels as well as establishing relationships between these approaches are needed to understand properties of the particulate matter. We propose a computational technique based on the direct numerical simulation of the particulate flows. The numerical method is based on the distributed Lagrange multiplier technique following the ideas of Glowinski et al. (1999). Each particle is explicitly resolved on an Eulerian grid as a separate domain, using solid volume fractions. The fluid equations are solved through the entire computational domain; however, Lagrange multiplier constraints are applied inside the particle domain such that the fluid within any volume associated with a solid particle moves as an incompressible rigid body. Mutual forces for the fluid-particle interactions are internal to the system. Particles interact with the fluid via fluid dynamic equations, resulting in implicit fluid-rigid-body coupling relations that produce realistic fluid flow around the particles (i.e., no-slip boundary conditions). The particle-particle interactions are implemented using explicit force-displacement interactions for frictional inelastic particles similar to the DEM method of Cundall et al. (1979) with some modifications using a volume of an overlapping region as an input to the contact forces. The method is flexible enough to handle arbitrary particle shapes and size distributions. A parallel implementation of the method is based on the SAMRAI (Structured Adaptive Mesh Refinement Application Infrastructure) library, which allows handling of large numbers of rigid particles and enables local grid refinement. Accuracy and convergence of the presented method have been tested against known solutions for a falling sphere as well as by examining fluid flows through stationary particle beds (periodic and cubic packing).
To evaluate code performance and validate the particle contact physics algorithm, we performed simulations of a representative experiment conducted at the University of California at Berkeley for pebble flow through a narrow opening.
A Three Dimensional Parallel Time Accurate Turbopump Simulation Procedure Using Overset Grid Systems
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Chan, William; Kwak, Dochan
2001-01-01
The objective of the current effort is to provide a computational framework for design and analysis of the entire fuel supply system of a liquid rocket engine, including high-fidelity unsteady turbopump flow analysis. This capability is needed to support the design of pump sub-systems for advanced space transportation vehicles that are likely to involve liquid propulsion systems. To date, computational tools for design/analysis of turbopump flows are based on relatively lower fidelity methods. An unsteady, three-dimensional viscous flow analysis tool involving stationary and rotational components for the entire turbopump assembly has not been available for real-world engineering applications. The present effort provides developers with information such as transient flow phenomena at start up, and non-uniform inflows, and will eventually impact on system vibration and structures. In the proposed paper, the progress toward the capability of complete simulation of the turbo-pump for a liquid rocket engine is reported. The Space Shuttle Main Engine (SSME) turbo-pump is used as a test case for evaluation of the hybrid MPI/Open-MP and MLP versions of the INS3D code. CAD to solution auto-scripting capability is being developed for turbopump applications. The relative motion of the grid systems for the rotor-stator interaction was obtained using overset grid techniques. Unsteady computations for the SSME turbo-pump, which contains 114 zones with 34.5 million grid points, are carried out on Origin 3000 systems at NASA Ames Research Center. Results from these time-accurate simulations with moving boundary capability will be presented along with the performance of parallel versions of the code.
A Three-Dimensional Parallel Time-Accurate Turbopump Simulation Procedure Using Overset Grid System
NASA Technical Reports Server (NTRS)
Kiris, Cetin; Chan, William; Kwak, Dochan
2002-01-01
The objective of the current effort is to provide a computational framework for design and analysis of the entire fuel supply system of a liquid rocket engine, including high-fidelity unsteady turbopump flow analysis. This capability is needed to support the design of pump sub-systems for advanced space transportation vehicles that are likely to involve liquid propulsion systems. To date, computational tools for design/analysis of turbopump flows are based on relatively lower fidelity methods. An unsteady, three-dimensional viscous flow analysis tool involving stationary and rotational components for the entire turbopump assembly has not been available for real-world engineering applications. The present effort provides developers with information such as transient flow phenomena at start up, and nonuniform inflows, and will eventually impact on system vibration and structures. In the proposed paper, the progress toward the capability of complete simulation of the turbo-pump for a liquid rocket engine is reported. The Space Shuttle Main Engine (SSME) turbo-pump is used as a test case for evaluation of the hybrid MPI/Open-MP and MLP versions of the INS3D code. CAD to solution auto-scripting capability is being developed for turbopump applications. The relative motion of the grid systems for the rotor-stator interaction was obtained using overset grid techniques. Unsteady computations for the SSME turbo-pump, which contains 114 zones with 34.5 million grid points, are carried out on Origin 3000 systems at NASA Ames Research Center. Results from these time-accurate simulations with moving boundary capability are presented along with the performance of parallel versions of the code.
NASA Astrophysics Data System (ADS)
Lugovsky, A. Yu.; Popov, Yu. P.
2015-08-01
The Roe-Einfeldt-Osher scheme is considered, which has the third order of accuracy. Its advantages over the first-order accurate Roe scheme are demonstrated, and its choice for the simulation of accretion disk flows is justified. The Roe-Einfeldt-Osher scheme is shown to be efficient as applied to the simulation of real-world problems on parallel computers. Results of simulation of flows in accretion disks in two and three dimensions are presented. Limited capabilities of two-dimensional disk models are noted.
Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Limborg, C.; Ng, C.; Prudencio, E.; Schussman, G.; Uplenchwar, R.; Ko, K.; /SLAC
2009-06-19
Over the past years, SLAC's Advanced Computations Department (ACD), under SciDAC sponsorship, has developed a suite of 3D (2D) parallel higher-order finite element (FE) codes, T3P (T2P) and Pic3P (Pic2P), aimed at accurate, large-scale simulation of wakefields and particle-field interactions in radio-frequency (RF) cavities of complex shape. The codes are built on the FE infrastructure that supports SLAC's frequency domain codes, Omega3P and S3P, to utilize conformal tetrahedral (triangular) meshes, higher-order basis functions and quadratic geometry approximation. For time integration, they adopt an unconditionally stable implicit scheme. Pic3P (Pic2P) extends T3P (T2P) to treat charged-particle dynamics self-consistently using the PIC (particle-in-cell) approach, the first such implementation on a conformal, unstructured grid using Whitney basis functions. Examples from applications to the International Linear Collider (ILC), Positron Electron Project-II (PEP-II), Linac Coherent Light Source (LCLS) and other accelerators will be presented to compare the accuracy and computational efficiency of these codes versus their counterparts using structured grids.
Singhal, R.P.; Bhardwaj, A.
1991-09-01
A Monte Carlo simulation of photoelectron energization and energy degradation in H₂ gas in the presence of parallel electric fields has been carried out. Numerical yield spectra which contain information about the electron energy degradation process and can be used to calculate the yield for any inelastic event are obtained. The variation of yield spectra with incident electron energy, electric field, pitch angle, and cutoff limit has been studied. The yield function is employed to determine the photoelectron fluxes. H₂ Lyman and Werner band excitation rates and integrated column intensity are computed for three different electric field profiles taking various low-energy cutoff limits. It is found that an electric field profile with peak value of 4 mV/m at neutral number density of 3 × 10¹⁰ cm⁻³ produces enhanced volume emission rates of H₂ bands (λ < 1100 Å) explaining about 20% of the observed electroglow emission on Uranus. The effect of solar zenith angle and solar cycle variation on peak excitation rate is discussed.
Carter, Jonathan; Oliker, Leonid
2006-01-09
The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end computing (HEC) platforms, primarily because of their generality, scalability, and cost effectiveness. However, the growing gap between sustained and peak performance for full-scale scientific applications on such platforms has become a major concern in high performance computing. The latest generation of custom-built parallel vector systems has the potential to address this concern for numerical algorithms with sufficient regularity in their computational structure. In this work, we explore two- and three-dimensional implementations of a lattice-Boltzmann magnetohydrodynamics (MHD) physics application on some of today's most powerful supercomputing platforms. Results compare performance between the vector-based Cray X1, Earth Simulator, and newly released NEC SX-8, and the commodity-based superscalar platforms of the IBM Power3, Intel Itanium2, and AMD Opteron. Overall results show that the SX-8 attains unprecedented aggregate performance across our evaluated applications.
Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji
2015-01-01
GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310–323. doi: 10.1002/wcms.1220
Discrete Event Dyn Syst (2011) 21:547–576 DOI 10.1007/s10626-011-0105-z
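The temperature replica-exchange (T-REMD) step mentioned above can be sketched with the textbook Metropolis exchange criterion. This is a minimal illustration, not code from GENESIS: the function name `swap_probability`, the energy units (kcal/mol), and the example values are assumptions made for this sketch.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); unit choice is an assumption


def swap_probability(e_i, e_j, t_i, t_j):
    """Textbook T-REMD acceptance probability for exchanging replicas i and j.

    e_i, e_j: potential energies of the two replicas (kcal/mol)
    t_i, t_j: their temperatures (K)
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    # Metropolis rule: always accept if delta >= 0, otherwise accept with exp(delta)
    return 1.0 if delta >= 0.0 else math.exp(delta)


# Hypothetical example: the colder replica holds the higher energy, so the swap is always accepted
p_accept = swap_probability(e_i=-100.0, e_j=-120.0, t_i=300.0, t_j=320.0)
```

Swapping the two energies in the example gives a probability strictly between 0 and 1, which is the case where a random number decides the exchange.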
Cassandras, Christos G.
2011-01-01
resource whose service capacity is to be optimally shared by N competing users. In a queueing The authors. Kebarighotbi (B) · C. G. Cassandras Division of Systems Engineering, Center for Information and Systems as a system of N parallel queues, each with its own arrival process, connected to a single server. The server
Bylaska, Eric J; Weare, Jonathan Q; Weare, John H
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, $f$ (e.g., Verlet algorithm), is available to propagate the system from time $t_i$ (trajectory positions and velocities $x_i = (r_i, v_i)$) to time $t_{i+1}$ ($x_{i+1}$) by $x_{i+1} = f_i(x_i)$, the dynamics problem spanning an interval from $t_0 \ldots t_M$ can be transformed into a root finding problem, $F(X) = [x_i - f(x_{i-1})]_{i=1,M} = 0$, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of $F(X)$. The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3.
The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step. PMID:23968079
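The root-finding formulation in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes a one-dimensional harmonic oscillator, a velocity-Verlet propagator, and a plain fixed-point sweep in place of the quasi-Newton schemes the abstract describes. The names `verlet_step`, `residual`, and `fixed_point_trajectory` are hypothetical.

```python
def verlet_step(state, dt=0.1, omega=1.0):
    """Propagator f: one velocity-Verlet step for a harmonic oscillator (assumed model)."""
    r, v = state
    a = -omega ** 2 * r
    r_new = r + v * dt + 0.5 * a * dt ** 2
    a_new = -omega ** 2 * r_new
    v_new = v + 0.5 * (a + a_new) * dt
    return (r_new, v_new)


def serial_trajectory(x0, n_steps):
    """Reference: apply f sequentially, one time step after another."""
    traj = [x0]
    for _ in range(n_steps):
        traj.append(verlet_step(traj[-1]))
    return traj


def residual(traj):
    """Largest absolute component of F(X) = [x_i - f(x_{i-1})], i = 1..M."""
    worst = 0.0
    for i in range(1, len(traj)):
        pred = verlet_step(traj[i - 1])
        worst = max(worst, abs(traj[i][0] - pred[0]), abs(traj[i][1] - pred[1]))
    return worst


def fixed_point_trajectory(x0, n_steps, tol=1e-12):
    """Solve F(X) = 0 by repeated sweeps x_i <- f(x_{i-1}) over the whole trajectory."""
    traj = [x0] * (n_steps + 1)  # initial guess: system frozen at x0
    sweeps = 0
    while residual(traj) > tol and sweeps < n_steps:
        # Every entry uses only values from the previous sweep, so the M
        # propagator calls are independent and could run on M processors.
        traj = [x0] + [verlet_step(traj[i - 1]) for i in range(1, n_steps + 1)]
        sweeps += 1
    return traj, sweeps


trajectory, n_sweeps = fixed_point_trajectory((1.0, 0.0), 8)
```

This plain sweep can need as many iterations as there are time steps, so by itself it gives no speedup over serial propagation; reducing the iteration count is what the quasi-Newton and preconditioned schemes in the abstract are for.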
Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.
2013-08-21
Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, $f$ (e.g. Verlet algorithm), is available to propagate the system from time $t_i$ (trajectory positions and velocities $x_i = (r_i, v_i)$) to time $t_{i+1}$ ($x_{i+1}$) by $x_{i+1} = f_i(x_i)$, the dynamics problem spanning an interval from $t_0 \ldots t_M$ can be transformed into a root finding problem, $F(X) = [x_i - f(x_{i-1})]_{i=1,M} = 0$, for the trajectory variables. The root finding problem is solved using a variety of optimization techniques, including quasi-Newton and preconditioned quasi-Newton optimization schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of $F(X)$. The relation of this approach to other recently proposed parallel in time methods is discussed and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl+4H2O AIMD simulation at the MP2 level. The maximum speedup obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow TCP/IP networks.
Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl+4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. By using these algorithms we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 seconds per time step to 6.9 seconds per time step.
A heterogeneous and parallel computing framework for high-resolution hydrodynamic simulations
NASA Astrophysics Data System (ADS)
Smith, Luke; Liang, Qiuhua
2015-04-01
Shock-capturing hydrodynamic models are now widely applied in the context of flood risk assessment and forecasting, accurately capturing the behaviour of surface water over ground and within rivers. Such models are generally explicit in their numerical basis, and can be computationally expensive; this has prohibited full use of high-resolution topographic data for complex urban environments, now easily obtainable through airborne altimetric surveys (LiDAR). As processor clock speed advances have stagnated in recent years, further computational performance gains are largely dependent on the use of parallel processing. Heterogeneous computing architectures (e.g. graphics processing units or compute accelerator cards) provide a cost-effective means of achieving high throughput in cases where the same calculation is performed with a large input dataset. In recent years this technique has been applied successfully for flood risk mapping, such as within the national surface water flood risk assessment for the United Kingdom. We present a flexible software framework for hydrodynamic simulations across multiple processors of different architectures, within multiple computer systems, enabled using OpenCL and Message Passing Interface (MPI) libraries. A finite-volume Godunov-type scheme is implemented using the HLLC approach to solving the Riemann problem, with optional extension to second-order accuracy in space and time using the MUSCL-Hancock approach. The framework is successfully applied on personal computers and a small cluster to provide considerable improvements in performance. The most significant performance gains were achieved across two servers, each containing four NVIDIA GPUs, with a mix of K20, M2075 and C2050 devices. Advantages are found with respect to decreased parametric sensitivity, and thus in reducing uncertainty, for a major fluvial flood within a large catchment during 2005 in Carlisle, England. 
Simulations for the three-day event could be performed on a 2 m grid within a few hours. In the context of a rapid pluvial flood event in Newcastle upon Tyne during 2012, the technique allows simulation of inundation for 31 km² of the city centre in less than an hour on a 2 m grid; however, further grid refinement is required to fully capture important smaller flow pathways. Good agreement between the model and observed inundation is achieved for a variety of dam failure, slow fluvial inundation, rapid pluvial inundation, and defence breach scenarios in the UK.
Furumura, Takashi
for future earthquake scenarios. keyword: Earth Simulator, Earthquake, Parallel Com- puting, Seismic Waves the process of strong motion generation during damaging earthquakes, high-resolutioncomputer simulations Simulation and Visualization of 3D Seismic Wavefield Using the Earth Simulator T. Furumura1 and L. Chen2
Sreepathi, Sarat; Sripathi, Vamsi; Mills, Richard T; Hammond, Glenn; Mahinthakumar, Kumar
2013-01-01
Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with a single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to the rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library, SCORPIO (SCalable block-ORiented Parallel I/O), that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5), and implements optimized I/O access patterns that can scale to larger numbers of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.
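The two-phase read described above (one root per sub-communicator reads from disk, then distributes to its group) can be mimicked in plain Python. This is a sketch under assumptions, not the SCORPIO API: it simulates ranks in a single process rather than calling MPI, and `assign_io_groups`, `two_phase_read`, and the `read_block` callback are hypothetical names.

```python
def assign_io_groups(n_ranks, group_size):
    """Map each rank to (group colour, is_root), imitating a communicator split."""
    return {
        rank: (rank // group_size, rank % group_size == 0)
        for rank in range(n_ranks)
    }


def two_phase_read(n_ranks, group_size, read_block):
    """Simulate a two-phase read.

    read_block(colour, n_members) stands in for the disk I/O phase and must
    return one data item per group member; only the group root calls it.
    """
    layout = assign_io_groups(n_ranks, group_size)
    n_groups = (n_ranks + group_size - 1) // group_size
    received = {}
    disk_reads = 0
    for colour in range(n_groups):
        members = [r for r, (c, _) in layout.items() if c == colour]
        block = read_block(colour, len(members))  # disk I/O phase (root only)
        disk_reads += 1
        for offset, rank in enumerate(members):   # communication phase
            received[rank] = block[offset]
    return received, disk_reads


# Hypothetical example: 8 ranks in groups of 4, so only 2 ranks touch the disk
data, reads = two_phase_read(8, 4, lambda colour, n: [colour * 4 + k for k in range(n)])
```

The point of the design is visible in `disk_reads`: the number of processes hitting the file system scales with the number of groups, not with the number of ranks.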
NASA Astrophysics Data System (ADS)
Pfund, R. E. W.; Lichters, R.; Meyer-ter-Vehn, J.
1998-02-01
We report on a recently developed electromagnetic relativistic 1D3V (one spatial, three velocity dimensions) Particle-In-Cell code for simulating laser-plasma interaction at normal and oblique incidence. The code is written in C++ and easy to extend. The data structure is characterized by the use of chained lists for the grid cells as well as particles belonging to one cell. The parallel version of the code is based on PVM. It splits the grid into several spatial domains each belonging to one processor. Since particles can cross boundaries of cells as well as domains, the processor loads will generally change in time. This is counteracted by adjusting the domain sizes dynamically, for which the use of chained lists has proven to be very convenient. Moreover, an option for restarting the simulation from intermediate stages of the time evolution has been implemented even in the parallel version. The code will be published and distributed freely.
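The dynamic adjustment of domain sizes can be sketched as a one-dimensional repartitioning by particle count. The abstract does not give the code's actual rule, so the following is an assumed greedy scheme; `rebalance_domains` and the example counts are hypothetical.

```python
def rebalance_domains(particles_per_cell, n_procs):
    """Return cell-index boundaries so each contiguous domain holds a similar particle count.

    Domain p covers cells boundaries[p] .. boundaries[p + 1] - 1.
    """
    total = sum(particles_per_cell)
    boundaries = [0]
    running = 0
    for cell, count in enumerate(particles_per_cell):
        running += count
        # Close the current domain once the cumulative count reaches its share
        # (integer arithmetic avoids floating-point comparisons).
        while len(boundaries) < n_procs and running * n_procs >= total * len(boundaries):
            boundaries.append(cell + 1)
    boundaries.append(len(particles_per_cell))
    return boundaries


# Hypothetical load: particles piled up in the first four cells of a 20-cell grid
counts = [4, 4, 4, 4] + [1] * 16
bounds = rebalance_domains(counts, 4)
loads = [sum(counts[bounds[p]:bounds[p + 1]]) for p in range(4)]
```

In the example the dense region is split into two narrow domains and the sparse region into two wide ones, so every processor ends up with the same number of particles.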
NASA Astrophysics Data System (ADS)
Blake, Douglas Clifton
A new methodology is presented for conducting numerical simulations of electromagnetic scattering and wave-propagation phenomena on massively parallel computing platforms. A process is constructed which is rooted in the Finite-Volume Time-Domain (FVTD) technique to create a simulation capability that is both versatile and practical. In terms of versatility, the method is platform independent, is easily modifiable, and is capable of solving a large number of problems with no alterations. In terms of practicality, the method is sophisticated enough to solve problems of engineering significance and is not limited to mere academic exercises. In order to achieve this capability, techniques are integrated from several scientific disciplines including computational fluid dynamics, computational electromagnetics, and parallel computing. The end result is the first FVTD solver capable of utilizing the highly flexible overset-gridding process in a distributed-memory computing environment. In the process of creating this capability, work is accomplished to conduct the first study designed to quantify the effects of domain-decomposition dimensionality on the parallel performance of hyperbolic partial differential equations solvers; to develop a new method of partitioning a computational domain comprised of overset grids; and to provide the first detailed assessment of the applicability of overset grids to the field of computational electromagnetics. Using these new methods and capabilities, results from a large number of wave propagation and scattering simulations are presented. The overset-grid FVTD algorithm is demonstrated to produce results of comparable accuracy to single-grid simulations while simultaneously shortening the grid-generation process and increasing the flexibility and utility of the FVTD technique. 
Furthermore, the new domain-decomposition approaches developed for overset grids are shown to be capable of producing partitions that are better load balanced and require less interprocessor communication than did previously used overset-grid decomposition methods. This results in parallel efficiencies routinely in excess of 90 percent, even for relatively small problems and large numbers of processors.
NASA Astrophysics Data System (ADS)
Schroeder, Matthias; Jankowski, Cedric; Hammitzsch, Martin; Wächter, Joachim
2014-05-01
Thousands of numerical tsunami simulations allow the computation of inundation and run-up along the coast for vulnerable areas over time. A so-called Matching Scenario Database (MSDB) [1] contains this large number of simulations in text file format. In order to visualize these wave propagations, the scenarios have to be reprocessed automatically. In the TRIDEC project, funded by the Seventh Framework Programme of the European Union, a Virtual Scenario Database (VSDB) and a Matching Scenario Database (MSDB) were established, amongst others by the working group of the University of Bologna (UniBo) [1]. One part of TRIDEC was the development of a new generation of Decision Support System (DSS) for Tsunami Early Warning Systems (TEWS) [2]. A working group of the GFZ German Research Centre for Geosciences was responsible for developing the Command and Control User Interface (CCUI) as the central software application, which supports operator activities, incident management and message dissemination. For integration and visualization in the CCUI, the numerical tsunami simulations from the MSDB must be converted into the shapefile format. The use of shapefiles enables a much easier integration into standard Geographic Information Systems (GIS). The CCUI is also based on two widely used open source products (GeoTools library and uDig), which provide shapefile integration a priori. In this case, several thousand tsunami variations were processed for an example area around the Western Iberian margin. Due to the mass of data, only a program-controlled process was conceivable. In order to optimize the computing effort and operating time, an existing GFZ High Performance Computing (HPC) cluster was chosen. Thus, geospatial software capable of parallel processing was sought.
The FOSS tool Geospatial Data Abstraction Library (GDAL/OGR) was used to match the coordinates with the wave heights and to generate the different shapefiles for certain time steps. The resulting shapefiles contain lines for visualizing the isochrones of the wave propagation and, moreover, data about the maximum wave height and the Estimated Time of Arrival (ETA) at the coast. Our contribution shows the entire workflow and the visualization results of the processing for the example region, the Western Iberian ocean margin. [1] Armigliato A., Pagnoni G., Zaniboni F, Tinti S. (2013), Database of tsunami scenario simulations for Western Iberia: a tool for the TRIDEC Project Decision Support System for tsunami early warning, Vol. 15, EGU2013-5567, EGU General Assembly 2013, Vienna (Austria). [2] Löwe, P., Wächter, J., Hammitzsch, M., Lendholt, M., Häner, R. (2013): The Evolution of Service-oriented Disaster Early Warning Systems in the TRIDEC Project, 23rd International Ocean and Polar Engineering Conference - ISOPE-2013, Anchorage (USA).
Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik
2015-06-01
Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers, particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on a conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus. PMID:26575558
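The idea that explicitly stated dependencies are enough to derive a maximally parallel execution order can be shown with Python's standard `graphlib` module (Python 3.9 or later). This sketch does not use or imitate the Copernicus API; `parallel_waves` and the task names are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_waves(dependencies):
    """Group tasks into waves; all tasks in one wave can run concurrently.

    dependencies maps each task name to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose inputs are all available
        waves.append(ready)
        sorter.done(*ready)                 # mark them finished to unlock dependants
    return waves


# Hypothetical workflow: three independent simulations feed one combined analysis
workflow = {
    "analyze": {"sim_a", "sim_b", "sim_c"},
    "sim_a": {"setup"},
    "sim_b": {"setup"},
    "sim_c": {"setup"},
}
waves = parallel_waves(workflow)
```

Only the dependency table is written by hand; the scheduling (here, that the three simulations form a single concurrent wave) follows from it.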
A Framework for Parallel Unstructured Grid Generation for Complex Aerodynamic Simulations
NASA Technical Reports Server (NTRS)
Zagaris, George; Pirzadeh, Shahyar Z.; Chrisochoides, Nikos
2009-01-01
A framework for parallel unstructured grid generation targeting both shared memory multi-processors and distributed memory architectures is presented. The two fundamental building blocks of the framework are: (1) the Advancing-Partition (AP) method used for domain decomposition and (2) the Advancing Front (AF) method used for mesh generation. Starting from the surface mesh of the computational domain, the AP method is applied recursively to generate a set of sub-domains. Next, the sub-domains are meshed in parallel using the AF method. The recursive nature of domain decomposition naturally maps to a divide-and-conquer algorithm which exhibits inherent parallelism. For the parallel implementation, the Master/Worker pattern is employed to dynamically balance the varying workloads of each task on the set of available CPUs. Performance results for this approach are presented and discussed in detail, as well as future work and improvements.
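The Master/Worker pattern mentioned above can be sketched with Python's standard thread pool: the master hands out sub-domains, and each worker takes the next one as soon as it finishes, which balances uneven workloads dynamically. This is an illustration only; `mesh_subdomain` is a hypothetical stand-in for the Advancing Front mesher, and threads are used for brevity even though CPython threads do not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def mesh_subdomain(n_cells):
    """Stand-in for meshing one sub-domain; the cost grows with n_cells."""
    return sum(i * i for i in range(n_cells))


def master(subdomain_sizes, n_workers=4):
    """Distribute sub-domains to workers and collect results in input order."""
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Submit every task up front; idle workers pull the next one automatically.
        futures = {
            pool.submit(mesh_subdomain, size): index
            for index, size in enumerate(subdomain_sizes)
        }
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return [results[i] for i in range(len(subdomain_sizes))]


# Hypothetical sub-domain sizes with very different costs
meshes = master([10, 1000, 5, 200], n_workers=2)
```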
García-Grajales, Julián A.; Rucabado, Gabriel; García-Dopico, Antonio; Peña, José-María; Jérusalem, Antoine
2015-01-01
With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computationally expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite—explicit and implicit—were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics.
This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon, a segmented dendritic tree, and a damaged axon. The capabilities of the program to deal with large scale scenarios, segmented neuronal structures, and functional deficits under mechanical loading are specifically highlighted. PMID:25680098
Jolliet, S.; McMillan, B. F.; Vernay, T.; Villard, L.; Hatzky, R.; Bottino, A.; Angelino, P.
2009-07-15
In this paper, the influence of the parallel nonlinearity on zonal flows and heat transport in global particle-in-cell ion-temperature-gradient simulations is studied. Although this term is in theory orders of magnitude smaller than the others, several authors [L. Villard, P. Angelino, A. Bottino et al., Plasma Phys. Contr. Fusion 46, B51 (2004); L. Villard, S. J. Allfrey, A. Bottino et al., Nucl. Fusion 44, 172 (2004); J. C. Kniep, J. N. G. Leboeuf, and V. C. Decyck, Comput. Phys. Commun. 164, 98 (2004); J. Candy, R. E. Waltz, S. E. Parker et al., Phys. Plasmas 13, 074501 (2006)] found different results on its role. The study is performed using the global gyrokinetic particle-in-cell codes TORB (theta-pinch) [R. Hatzky, T. M. Tran, A. Koenies et al., Phys. Plasmas 9, 898 (2002)] and ORB5 (tokamak geometry) [S. Jolliet, A. Bottino, P. Angelino et al., Comput. Phys. Commun. 177, 409 (2007)]. In particular, it is demonstrated that the parallel nonlinearity, while important for energy conservation, affects the zonal electric field only if the simulation is noise dominated. When a proper convergence is reached, the influence of parallel nonlinearity on the zonal electric field, if any, is shown to be small for both the cases of decaying and driven turbulence.
Reddy, A.V.; Kothe, D.B.; Lam, K.L.
1997-06-01
The Los Alamos National Laboratory (LANL) is currently developing a new casting simulation tool (known as Telluride) that employs robust, high-resolution finite volume algorithms for incompressible fluid flow, volume tracking of interfaces, and solidification physics on three-dimensional (3-D) unstructured meshes. Their finite volume algorithms are based on colocated cell-centered schemes that are formally second order in time and space. The flow algorithm is a 3-D extension of recent work on projection method solutions of the Navier-Stokes (NS) equations. Their volume tracking algorithm can accurately track topologically complex interfaces by approximating the interface geometry as piecewise planar. Coupled to their fluid flow algorithm is a comprehensive binary alloy solidification model that incorporates macroscopic descriptions of heat transfer, solute redistribution, and melt convection as well as a microscopic description of segregation. The finite volume algorithms, which are efficient, parallel, and robust, can yield high-fidelity solutions on a variety of meshes, ranging from those that are structured orthogonal to fully unstructured (finite element). The authors discuss key computer science issues that have enabled them to efficiently parallelize their unstructured mesh algorithms on both distributed and shared memory computing platforms. These include their functionally object-oriented use of Fortran 90 and new parallel libraries for gather/scatter functions (PGSLib) and solutions of linear systems of equations (JTpack90). Examples of their current capabilities are illustrated with simulations of mold filling and solidification of complex 3-D components currently being poured in LANL foundries.
Large-eddy simulation of the Rayleigh-Taylor instability on a massively parallel computer
Amala, P.A.K.
1995-03-01
A computational model for the solution of the three-dimensional Navier-Stokes equations is developed. This model includes a turbulence model: a modified Smagorinsky eddy-viscosity with a stochastic backscatter extension. The resultant equations are solved using finite difference techniques: the second-order explicit Lax-Wendroff schemes. This computational model is implemented on a massively parallel computer. Programming models on massively parallel computers are next studied. It is desired to determine the best programming model for the developed computational model. To this end, three different codes are tested on a current massively parallel computer: the CM-5 at Los Alamos. Each code uses a different programming model: one is a data parallel code; the other two are message passing codes. Timing studies are done to determine which method is the fastest. The data parallel approach turns out to be the fastest method on the CM-5 by at least an order of magnitude. The resultant code is then used to study a current problem of interest to the computational fluid dynamics community. This is the Rayleigh-Taylor instability. The Lax-Wendroff methods handle shocks and sharp interfaces poorly. To this end, the Rayleigh-Taylor linear analysis is modified to include a smoothed interface. The linear growth rate problem is then investigated. Finally, the problem of the randomly perturbed interface is examined. Stochastic backscatter breaks the symmetry of the stationary unstable interface and generates a mixing layer growing at the experimentally observed rate. 115 refs., 51 figs., 19 tabs.
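As a rough illustration of the second-order explicit Lax-Wendroff update named above, the following sketch applies it to the simplest possible case: one-dimensional linear advection on a periodic grid. This is not the author's three-dimensional Navier-Stokes code; the function name `lax_wendroff_step` and the Courant-number parameter are assumptions made for the example.

```python
def lax_wendroff_step(u, courant):
    """Advance u_t + a*u_x = 0 by one time step on a periodic grid.

    courant is the Courant number a*dt/dx (hypothetical parameter name).
    """
    n = len(u)
    new = []
    for i in range(n):
        left = u[i - 1]            # index -1 wraps around: periodic boundary
        right = u[(i + 1) % n]
        new.append(
            u[i]
            - 0.5 * courant * (right - left)                      # centered advection term
            + 0.5 * courant ** 2 * (right - 2.0 * u[i] + left)    # second-order correction
        )
    return new


if __name__ == "__main__":
    pulse = [0.0, 1.0, 0.0, 0.0]
    # With a Courant number of 1 the scheme shifts the profile by exactly one cell.
    print(lax_wendroff_step(pulse, 1.0))
```

At Courant numbers below one, the scheme produces oscillations near sharp jumps (the result can go negative next to the pulse), which is consistent with the abstract's remark that Lax-Wendroff methods handle sharp interfaces poorly.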
Coupled models and parallel simulations for three-dimensional full-Stokes ice sheet modeling
Zhang, Huai; Ju, Lili; Gunzburger, Max; Ringler, Todd; Price, Stephen
2011-01-01
A three-dimensional full-Stokes computational model is considered for determining the dynamics, temperature, and thickness of ice sheets. The governing thermomechanical equations consist of the three-dimensional full-Stokes system with nonlinear rheology for the momentum, an advective-diffusion energy equation for temperature evolution, and a mass conservation equation for ice thickness changes. Here, we discuss the variable resolution meshes, the finite element discretizations, and the parallel algorithms employed by the model components. The solvers are integrated through a well-designed coupler for the exchange of parametric data between components. The discretization utilizes high-quality, variable-resolution centroidal Voronoi Delaunay triangulation meshing and existing parallel solvers. We demonstrate the gridding technology, discretization schemes, and the efficiency and scalability of the parallel solvers through computational experiments using both simplified geometries arising from benchmark test problems and a realistic Greenland ice sheet geometry.
Heffelfinger, G.S.; Lewitt, M.E.
1994-05-01
We present a new massively parallel decomposition for grand canonical Monte Carlo computer simulation (GCMC) suitable for short ranged fluids. Our spatial algorithm relies on the fact that for short-ranged fluids, molecules separated by a greater distance than the reach of the potential act independently, thus different processors can work concurrently in regions of the same system which are sufficiently far apart. Several parallelization issues unique to GCMC are addressed such as the handling of the three different types of Monte Carlo move used in GCMC: the displacement of a molecule, the creation of a molecule, and the destruction of a molecule. The decomposition is shown to scale with system size, making it especially useful for systems where the physical problem dictates the system size, for example, fluid behavior in mesopores.
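The three GCMC move types listed above each have their own acceptance rule. The sketch below uses the standard textbook Metropolis forms written in terms of an activity z = exp(beta*mu)/Lambda^3; it is an assumption that the paper uses equivalent expressions, and all function and parameter names are hypothetical. `delta_u` is the potential-energy change (new minus old) caused by the trial move.

```python
import math


def _metropolis(log_ratio):
    """Return min(1, exp(log_ratio)) without overflowing for large log_ratio."""
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def displacement_acceptance(beta, delta_u):
    """Acceptance probability for moving one molecule."""
    return _metropolis(-beta * delta_u)


def creation_acceptance(beta, delta_u, activity, volume, n_molecules):
    """Acceptance probability for inserting a molecule into N = n_molecules."""
    prefactor = activity * volume / (n_molecules + 1)
    return _metropolis(math.log(prefactor) - beta * delta_u)


def destruction_acceptance(beta, delta_u, activity, volume, n_molecules):
    """Acceptance probability for removing one of N = n_molecules molecules."""
    if n_molecules == 0:
        return 0.0  # nothing to remove
    prefactor = n_molecules / (activity * volume)
    return _metropolis(math.log(prefactor) - beta * delta_u)


if __name__ == "__main__":
    print(displacement_acceptance(1.0, 0.5))
    print(creation_acceptance(1.0, 1.0, 0.5, 20.0, 9))
    print(destruction_acceptance(1.0, 1.0, 0.5, 20.0, 10))
```

In a spatial decomposition of the kind described, each processor would evaluate `delta_u` using only molecules within the potential's reach of the trial position, which is what allows well-separated regions to be updated concurrently.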
ROSI—an object-oriented and parallel-computing Monte Carlo simulation for X-ray imaging
NASA Astrophysics Data System (ADS)
Giersch, Jürgen; Weidemann, Andreas; Anton, Gisela
2003-08-01
In the field of X-ray imaging, Monte Carlo simulation is an important tool. It gives the possibility of understanding experimental results and it allows the construction of virtual imaging setups with predictions of their quality. For these reasons, we developed the Roentgen Simulation (ROSI) which is based on the object-oriented C++ class library GISMO. The interaction algorithms are based on the established EGS4-code and its current LSCAT-extension. ROSI introduces random variables for modelling physical parameters by a given random distribution, e.g. the source position or the direction and energy of the photons to be emitted. It is possible to run ROSI in parallel on a local computer network (Beowulf cluster) to obtain simulation data in shorter time. Finally, it has an easy-to-use interface. We will present the concept of ROSI and demonstrate its flexibility by an example.
Parallelized Multi-Worm Algorithm for Large Scale Quantum Monte-Carlo simulations
NASA Astrophysics Data System (ADS)
Suzuki, Takafumi; Masaki-Kato, Akiko; Harada, Kenji; Todo, Synge; Kawashima, Naoki
2014-03-01
The quantum Monte Carlo (QMC) calculation is a powerful and accurate method for quantum many body interacting systems. In this study, we present a new algorithm for the worldline Monte Carlo method based on the Feynman path integral. While the worm algorithm (WA) has been used widely because of its broader range of applicability, the parallelization of WA is not straightforward. We present a general QMC algorithm based on the directed-loop algorithm with the domain decomposition. This new algorithm is referred to as Parallelized Multi-Worm Algorithm (PMWA). In PMWA, a large number of worms are introduced by controlling a fictitious transverse field. For a benchmark, we applied the PMWA to the hardcore Bose-Hubbard model on the square lattice, and computed the system-size dependence of the Bose-condensation order parameter up to L^2 = 10240^2 by using 3200 processors. The benchmark results showed high parallelization efficiency. This indicates that the PMWA is suitable for parallelizing on a distributed-memory computer.
On Simulation and Design of Parallel-Systems Schedulers: Are We Doing the Right Thing?
Feitelson, Dror
that are conjectured to be correlated with user satisfaction, with the premise that this will result in a higher CREASY that exploits knowledge on user behavior to directly improve user satisfaction, and compare its behavior, feedback. I. INTRODUCTION AN important goal of any parallel-system scheduler is to promote
Milind Deo; Chung-Kan Huang; Huabing Wang
2008-08-31
Black-oil, compositional and thermal simulators have been developed to address different physical processes in reservoir simulation. A number of different types of discretization methods have also been proposed to address issues related to representing the complex reservoir geometry. These methods are more significant for fractured reservoirs where the geometry can be particularly challenging. In this project, a general modular framework for reservoir simulation was developed, wherein the physical models were efficiently decoupled from the discretization methods. This made it possible to couple any discretization method with different physical models. Oil characterization methods are becoming increasingly sophisticated, and it is possible to construct geologically constrained models of faulted/fractured reservoirs. Discrete Fracture Network (DFN) simulation provides the option of performing multiphase calculations on spatially explicit, geologically feasible fracture sets. Multiphase DFN simulations of, and sensitivity studies on, a wide variety of fracture networks created using fracture creation/simulation programs were undertaken in the first part of this project. This involved creating interfaces to seamlessly convert the fracture characterization information into simulator input, grid the complex geometry, perform the simulations, and analyze and visualize results. Benchmarking and comparison with conventional simulators was also a component of this work. After demonstration of the fact that multiphase simulations can be carried out on complex fracture networks, quantitative effects of the heterogeneity of fracture properties were evaluated. Reservoirs are populated with fractures of several different scales and properties. A multiscale fracture modeling study was undertaken and the effects of heterogeneity and storage on water displacement dynamics in fractured basements were investigated.
In gravity-dominated systems, more oil could be recovered at a given pore volume of injection at lower rates. However, if oil production can be continued at high water cuts, the discounted cumulative production usually favors higher production rates. The workflow developed during the project was also used to perform multiphase simulations in heterogeneous, fracture-matrix systems. Compositional and thermal-compositional simulators were developed for fractured reservoirs using the generalized framework. The thermal-compositional simulator was based on a novel 'equation-alignment' approach that helped choose the correct variables to solve depending on the number of phases present and the prescribed component partitioning. The simulators were used in steamflooding and in in situ combustion applications. The framework was constructed to be inherently parallel. The partitioning routines employed in the framework allowed generalized partitioning on highly complex fractured reservoirs and in instances when wells (incorporated in these models as line sources) were divided between two or more processors.
A Computer Simulation of the System-Wide Effects of Parallel-Offset Route Maneuvers
NASA Technical Reports Server (NTRS)
Lauderdale, Todd A.; Santiago, Confesor; Pankok, Carl
2010-01-01
Most aircraft managed by air-traffic controllers in the National Airspace System are capable of flying parallel-offset routes. This paper presents the results of two related studies on the effects of increased use of offset routes as a conflict resolution maneuver. The first study analyzes offset routes in the context of all standard resolution types which air-traffic controllers currently use. This study shows that by utilizing parallel-offset route maneuvers, significant system-wide savings in delay due to conflict resolution of up to 30% are possible. It also shows that most offset resolutions replace horizontal-vectoring resolutions. The second study builds on the results of the first and directly compares offset resolutions and standard horizontal-vectoring maneuvers to determine that in-trail conflicts are often more efficiently resolved by offset maneuvers.
NASA Technical Reports Server (NTRS)
Stupl, Jan; Faber, Nicolas; Foster, Cyrus; Yang, Fan Yang; Nelson, Bron; Aziz, Jonathan; Nuttall, Andrew; Henze, Chris; Levit, Creon
2014-01-01
This paper provides an updated efficiency analysis of the LightForce space debris collision avoidance scheme. LightForce aims to prevent collisions on warning by utilizing photon pressure from ground based, commercial off the shelf lasers. Past research has shown that a few ground-based systems consisting of 10 kilowatt class lasers directed by 1.5 meter telescopes with adaptive optics could lower the expected number of collisions in Low Earth Orbit (LEO) by an order of magnitude. Our simulation approach utilizes the entire Two Line Element (TLE) catalogue in LEO for a given day as initial input. Least-squares fitting of a TLE time series is used for an improved orbit estimate. We then calculate the probability of collision for all LEO objects in the catalogue for a time step of the simulation. The conjunctions that exceed a threshold probability of collision are then engaged by a simulated network of laser ground stations. After those engagements, the perturbed orbits are used to re-assess the probability of collision and evaluate the efficiency of the system. This paper describes new simulations with three updated aspects: 1) By utilizing a highly parallel simulation approach employing hundreds of processors, we have extended our analysis to a much broader dataset. The simulation time is extended to one year. 2) We analyze not only the efficiency of LightForce on conjunctions that naturally occur, but also take into account conjunctions caused by orbit perturbations due to LightForce engagements. 3) We use a new simulation approach that is regularly updating the LightForce engagement strategy, as it would be during actual operations. In this paper we present our simulation approach to parallelize the efficiency analysis, its computational performance and the resulting expected efficiency of the LightForce collision avoidance system. 
Results indicate that utilizing a network of four LightForce stations with 20 kilowatt lasers, 85% of all conjunctions with a probability of collision Pc > 10^-6 can be mitigated.
NASA Astrophysics Data System (ADS)
Reuter, K.; Jenko, F.; Forest, C. B.; Bayliss, R. A.
2008-08-01
A parallel implementation of a nonlinear pseudo-spectral MHD code for the simulation of turbulent dynamos in spherical geometry is reported. It employs a dual domain decomposition technique in both real and spectral space. It is shown that this method shows nearly ideal scaling going up to 128 CPUs on Beowulf-type clusters with fast interconnect. Furthermore, the potential of exploiting single precision arithmetic on standard x86 processors is examined. It is pointed out that the MHD code thereby achieves a maximum speedup of 1.7, whereas the validity of the computations is still granted. The combination of both measures will allow for the direct numerical simulation of highly turbulent cases ( 1500
Voelz, Vincent A; Luttmann, Edgar; Bowman, Gregory R; Pande, Vijay S
2009-03-01
Recently a temperature-jump FTIR study of a designed three-stranded sheet showing a fast relaxation time of approximately 140 +/- 20 ns was published. We performed massively parallel molecular dynamics simulations in explicit solvent to probe the structural events involved in this relaxation. While our simulations produce similar relaxation rates, the structural ensemble is broad. We observe the formation of turn structure, but only very weak interaction in the strand regions, which is consistent with the lack of strong backbone-backbone NOEs in previous structural NMR studies. These results suggest that either (D)P(D)P-II folds at time scales longer than 240 ns, or that (D)P(D)P-II is not a well-defined three-stranded beta-sheet. This work also provides an opportunity to compare the performance of several popular forcefield models against one another. PMID:19399235
Method for distributed agent-based non-expert simulation of manufacturing process behavior
Ivezic, Nenad; Potok, Thomas E.
2004-11-30
A method for distributed agent-based non-expert simulation of manufacturing process behavior on a single-processor computer comprises the steps of: object modeling a manufacturing technique having a plurality of processes; associating a distributed agent with each process; and programming each agent to respond to discrete events corresponding to the manufacturing technique, wherein each discrete event triggers a programmed response. The method can further comprise the step of transmitting the discrete events to each agent in a message loop. In addition, the programming step comprises the step of conditioning each agent to respond to a discrete event selected from the group consisting of a clock tick message, a resources received message, and a request for output production message.
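The message-loop idea in this abstract can be sketched in a few lines. The sketch below is only an illustration of agents reacting to the three named message types on a single processor; the class `ProcessAgent`, its counters, and the rule that one output consumes a fixed number of resource units are all assumptions, not details taken from the patent.

```python
from collections import deque

# The three discrete-event message types named in the abstract.
CLOCK_TICK = "clock_tick"
RESOURCES_RECEIVED = "resources_received"
OUTPUT_REQUESTED = "request_for_output_production"


class ProcessAgent:
    """Hypothetical agent standing in for one manufacturing process."""

    def __init__(self, name, units_per_output=1):
        self.name = name
        self.units_per_output = units_per_output
        self.stock = 0
        self.produced = 0
        self.ticks = 0

    def handle(self, event, payload=None):
        # Each discrete event triggers one programmed response.
        if event == CLOCK_TICK:
            self.ticks += 1
        elif event == RESOURCES_RECEIVED:
            self.stock += payload
        elif event == OUTPUT_REQUESTED:
            if self.stock >= self.units_per_output:
                self.stock -= self.units_per_output
                self.produced += 1


def run_message_loop(agents, messages):
    """Deliver every queued (event, payload) message to every agent, in order."""
    queue = deque(messages)
    while queue:
        event, payload = queue.popleft()
        for agent in agents:
            agent.handle(event, payload)


if __name__ == "__main__":
    agents = [ProcessAgent("cutting"), ProcessAgent("welding", units_per_output=2)]
    run_message_loop(agents, [
        (CLOCK_TICK, None),
        (RESOURCES_RECEIVED, 2),
        (OUTPUT_REQUESTED, None),
        (OUTPUT_REQUESTED, None),
    ])
    for agent in agents:
        print(agent.name, agent.produced, agent.stock, agent.ticks)
```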
NASA Astrophysics Data System (ADS)
Debolt, Stephen Edward
Solvent effects were studied and described via molecular dynamics (MD) and free energy perturbation (FEP) simulations using the molecular mechanics program AMBER. The following specific topics were explored: Polar solvents cause a blue shift of the n → π* transition band of simple alkyl carbonyl compounds. The ground- versus excited-state solvation effects responsible for the observed solvatochromism are described in terms of the molecular level details of solute-solvent interactions in several modeled solvents spanning the range from polar to nonpolar, including water, methanol, and carbon tetrachloride. The structure and dynamics of octanol media were studied to explore the question: "why is octanol/water media such a good biophase analog?". The formation of linear and cyclic polymers of hydrogen-bonded solvent molecules, micelle-like clusters, and the effects of saturating waters are described. Two small drug-sized molecules, benzene and phenol, were solvated in water-saturated octanol. The solute-solvent structure and dynamics were analysed. The difference in their partitioning free energies was calculated. MD and FEP calculations were adapted for parallel computation, increasing their "speed" or the time span accessible by a simulation. The non-cyclic polyether ionophore salinomycin was studied in methanol solvent via parallel FEP. The path of binding and release for a potassium ion was investigated by calculating the potential of mean force along the "exit vector".
Byers, J.A.; Williams, T.J.; Cohen, B.I.; Dimits, A.M.
1994-04-27
One of the programs of the Magnetic Fusion Energy (MFE) Theory and Computations Program is studying the anomalous transport of thermal energy across the field lines in the core of a tokamak. We use the method of gyrokinetic particle-in-cell simulation in this study. For this LDRD project we employed massively parallel processing, new algorithms, and new formal techniques to improve this research. Specifically, we sought to take steps toward: researching experimentally relevant parameters in our simulations, learning parallel computing to have as a resource for our group, and achieving a 100× speedup over our starting-point Cray2 simulation code's performance.
πBUSS: a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios
2014-01-01
Background: Simulated nucleotide or amino acid sequences are frequently used to assess the performance of phylogenetic reconstruction methods. BEAST, a Bayesian statistical framework that focuses on reconstructing time-calibrated molecular evolutionary processes, supports a wide array of evolutionary models, but lacked matching machinery for simulation of character evolution along phylogenies. Results: We present a flexible Monte Carlo simulation tool, called πBUSS, that employs the BEAGLE high performance library for phylogenetic computations to rapidly generate large sequence alignments under complex evolutionary models. πBUSS sports a user-friendly graphical user interface (GUI) that allows combining a rich array of models across an arbitrary number of partitions. A command-line interface mirrors the options available through the GUI and facilitates scripting in large-scale simulation studies. πBUSS may serve as an easy-to-use, standard sequence simulation tool, but the available models and data types are particularly useful to assess the performance of complex BEAST inferences. The connection with BEAST is further strengthened through the use of a common extensible markup language (XML), which also allows more advanced evolutionary models to be specified. To support simulation under the latter, as well as to support simulation and analysis in a single run, we also add the πBUSS core simulation routine to the list of BEAST XML parsers. Conclusions: πBUSS offers a unique combination of flexibility and ease-of-use for sequence simulation under realistic evolutionary scenarios. Through different interfaces, πBUSS supports simulation studies ranging from modest endeavors for illustrative purposes to complex and large-scale assessments of evolutionary inference procedures. Applications are not restricted to the BEAST framework, or even time-measured evolutionary histories, and πBUSS can be connected to various other programs using standard input and output format. PMID:24885610
Event-based simulation of quantum physics experiments K. Michielsen
Event-based simulation of quantum physics experiments K. Michielsen Institute for Advanced techniques; discrete event simulation; quantum theory. PACS Nos.: 02.70.-c, 03.65.-w, 03.65.Ud. 1 by constructing an event-based simulation model that reproduces the statistical distributions of quantum (and
Prinz, Jan-Hendrik; Chodera, John D; Pande, Vijay S; Swope, William C; Smith, Jeremy C; Noe, F
2011-01-01
Parallel tempering (PT) molecular dynamics simulations have been extensively investigated as a means of efficient sampling of the configurations of biomolecular systems. Recent work has demonstrated how the short physical trajectories generated in PT simulations of biomolecules can be used to construct the Markov models describing biomolecular dynamics at each simulated temperature. While this approach describes the temperature-dependent kinetics, it does not make optimal use of all available PT data, instead estimating the rates at a given temperature using only data from that temperature. This can be problematic, as some relevant transitions or states may not be sufficiently sampled at the temperature of interest, but might be readily sampled at nearby temperatures. Further, the comparison of temperature-dependent properties can suffer from the false assumption that data collected from different temperatures are uncorrelated. We propose here a strategy in which, by a simple modification of the PT protocol, the harvested trajectories can be reweighted, permitting data from all temperatures to contribute to the estimated kinetic model. The method reduces the statistical uncertainty in the kinetic model relative to the single temperature approach and provides estimates of transition probabilities even for transitions not observed at the temperature of interest. Further, the method allows the kinetics to be estimated at temperatures other than those at which simulations were run. We illustrate this method by applying it to the generation of a Markov model of the conformational dynamics of the solvated terminally blocked alanine peptide.
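For readers unfamiliar with the underlying PT protocol, the sketch below shows the standard Metropolis criterion for exchanging configurations between two replicas at inverse temperatures beta_i and beta_j. It illustrates only the conventional exchange step, not the modified protocol or the reweighting estimator proposed in the abstract; the function names are hypothetical.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Standard PT exchange probability: min(1, exp[(beta_i - beta_j) * (E_i - E_j)])."""
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


def attempt_swap(beta_i, beta_j, energy_i, energy_j, rng=random):
    """Return True if the two replicas should exchange configurations."""
    return rng.random() < swap_probability(beta_i, beta_j, energy_i, energy_j)


if __name__ == "__main__":
    # The colder replica (larger beta) currently holds the higher energy,
    # so the exchange is always accepted.
    print(swap_probability(1.0, 0.5, 10.0, 8.0))
    # The reverse situation is accepted only with probability exp(-1).
    print(swap_probability(1.0, 0.5, 8.0, 10.0))
```

Because exchanges move configurations between temperatures, trajectories recorded at different temperatures are not statistically independent, which is the correlation issue the abstract raises.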
Implementing system simulation of C3 systems using autonomous objects
NASA Technical Reports Server (NTRS)
Rogers, Ralph V.
1987-01-01
The basis of all conflict recognition in simulation is a common frame of reference. Synchronous discrete-event simulation relies on the fixed points in time as the basic frame of reference. Asynchronous discrete-event simulation relies on fixed-points in the model space as the basic frame of reference. Neither approach provides sufficient support for autonomous objects. The use of a spatial template as a frame of reference is proposed to address these insufficiencies. The concept of a spatial template is defined and an implementation approach offered. Discussed are the uses of this approach to analyze the integration of sensor data associated with Command, Control, and Communication systems.
Romano, Paul K. (Paul Kollath)
2013-01-01
Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there ...
Massively-Parallel Spectral Element Large Eddy Simulation of a Ring-Type Gas Turbine Combustor
Camp, Joshua Lane
2012-07-16
c Work CFD Computational Fluid Dynamics DNS Direct Numerical Simulation DOF Degree of Freedom FEM Finite Element Method GLL Gauss-Lobatto-Legendre GT Gas Turbine LES Large Eddy Simulation PVC Precessing Vortex Core RANS Reynolds... that causes the rotor to spin. A similar example of this type of power generation machine is a hydroelectric dam. In this case, the potential energy from the height of the water is extracted. These machines are great concepts because of the fact...
One-dimensional Vlasov simulation of parallel electric fields in two-electron population plasma
Saharia, K.; Goswami, K. S.
2007-09-15
One-dimensional Vlasov simulation in electron current carrying multicomponent plasma seeded with a density depression is presented. Considering two electron populations [one is sufficiently hot (~keV) and the other is cold along with cold background ions], the formation of weak double layers is investigated. Simulation results show that in this numerical setting, formation of such double layers needs the majority of the hot electrons.
NASA Technical Reports Server (NTRS)
Wendel, Deirdre E.; Olson, D. K.; Hesse, Michael; Aunai, N.; Kuznetsova, M.; Karimabadi, H.; Daughton, W.; Adrian, M. L.
2013-01-01
We investigate the distribution of parallel electric fields and their relationship to the location and rate of magnetic reconnection in a large particle-in-cell simulation of 3D turbulent magnetic reconnection with open boundary conditions. The simulation's guide field geometry inhibits the formation of simple topological features such as null points. Therefore, we derive the location of potential changes in magnetic connectivity by finding the field lines that experience a large relative change between their endpoints, i.e., the quasi-separatrix layer. We find a good correspondence between the locus of changes in magnetic connectivity or the quasi-separatrix layer and the map of large gradients in the integrated parallel electric field (or quasi-potential). Furthermore, we investigate the distribution of the parallel electric field along the reconnecting field lines. We find the reconnection rate is controlled by only the low-amplitude, zeroth and first-order trends in the parallel electric field while the contribution from fluctuations of the parallel electric field, such as electron holes, is negligible. The results impact the determination of reconnection sites and reconnection rates in models and in situ spacecraft observations of 3D turbulent reconnection. It is difficult through direct observation to isolate the loci of the reconnection parallel electric field amidst the large amplitude fluctuations. However, we demonstrate that a positive slope of the running sum of the parallel electric field along the field line as a function of field line length indicates where reconnection is occurring along the field line.
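The running-sum diagnostic mentioned at the end of this abstract can be sketched as follows, assuming the parallel electric field has already been sampled at uniform spacing `ds` along one field line. The sample values, the window length used to smooth out fluctuations, and the function names are assumptions for illustration; the abstract does not specify how the slope is evaluated.

```python
from itertools import accumulate


def running_sum(e_parallel, ds):
    """Cumulative sum of E_parallel * ds along the field line."""
    return list(accumulate(e * ds for e in e_parallel))


def windowed_slopes(total, ds, window):
    """Slope of the running sum over `window` samples, starting at each index."""
    return [
        (total[i + window] - total[i]) / (window * ds)
        for i in range(len(total) - window)
    ]


def positive_slope_indices(e_parallel, ds, window):
    """Indices where the running sum rises over the following window."""
    slopes = windowed_slopes(running_sum(e_parallel, ds), ds, window)
    return [i for i, slope in enumerate(slopes) if slope > 0.0]


if __name__ == "__main__":
    samples = [1.0, 1.0, -1.0, -1.0]   # hypothetical E_parallel values
    print(running_sum(samples, 1.0))
    print(positive_slope_indices(samples, 1.0, 1))
```

A longer window suppresses short, large-amplitude fluctuations, in line with the abstract's finding that only the low-order trends in the parallel electric field set the reconnection rate.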
Scalability of the parallel CFD simulations of flow past a fluttering airfoil in OpenFOAM
NASA Astrophysics Data System (ADS)
Šidlof, Petr; Řidký, Václav
2015-05-01
The paper investigates unsteady subsonic airflow past an elastically supported airfoil during onset of the flutter instability. Based on the geometry, boundary conditions and airfoil motion data identified from wind-tunnel measurements, a 3D CFD model has been set up in OpenFOAM. The model is based on the incompressible Navier-Stokes equations. Turbulence is modelled by Menter's k-omega shear stress transport turbulence model. The computational mesh was generated in GridPro, a mesh generator capable of producing highly orthogonal structured C-type meshes. The mesh totals 3.1 million elements. Parallel scalability was measured on a small shared-memory SGI Altix UV 100 supercomputer.
NASA Technical Reports Server (NTRS)
Nishikawa, K.-I.; Ganguli, G.; Lee, Y. C.; Palmadesso, P. J.
1989-01-01
A spatially two-dimensional electrostatic PIC simulation code was used to study the stability of a plasma equilibrium characterized by a localized transverse dc electric field and a field-aligned drift for L is much less than Lx, where Lx is the simulation length in the x direction and L is the scale length associated with the dc electric field. It is found that the dc electric field and the field-aligned current can together play a synergistic role to enable the excitation of electrostatic waves even when the threshold values of the field aligned drift and the E x B drift are individually subcritical. The simulation results show that the growing ion waves are associated with small vortices in the linear stage, which evolve to the nonlinear stage dominated by larger vortices with lower frequencies.
2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation
Warren, Michael S.
2014-01-01
We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k (2^18) processors. We present error analysis and scientific application results from a series of more than ten 69 billion (4096^3) particle cosmological simulations, accounting for 4×10^20 floating point operations. These results include the first simulations using the new constraints on the standard model of cosmology from the Planck satellite. Our simulations set a new standard for accuracy and scientific throughput, while meeting or exceeding the computational efficiency of the latest generation of hybrid TreePM N-body methods.
NASA Technical Reports Server (NTRS)
Lake, George; Quinn, Thomas; Richardson, Derek C.; Stadel, Joachim
1999-01-01
"The orbit of any one planet depends on the combined motion of all the planets, not to mention the actions of all these on each other. To consider simultaneously all these causes of motion and to define these motions by exact laws allowing of convenient calculation exceeds, unless I am mistaken, the forces of the entire human intellect" -Isaac Newton 1687. Epochal surveys are throwing down the gauntlet for cosmological simulation. We describe three keys to meeting the challenge of N-body simulation: adaptive potential solvers, adaptive integrators and volume renormalization. With these techniques and a dedicated Teraflop facility, simulation can stay even with observation of the Universe. We also describe some problems in the formation and stability of planetary systems. Here, the challenge is to perform accurate integrations that retain Hamiltonian properties for 10^13 timesteps.
Parallel adaptive Cartesian upwind methods for shock-driven multiphysics simulation
Deiterding, Ralf
2011-01-01
The multiphysics fluid-structure interaction simulation of shock-loaded thin-walled structures requires the dynamic coupling of a shock-capturing flow solver to a solid mechanics solver for large deformations. By combining a Cartesian embedded boundary approach with dynamic mesh adaptation a generic software framework for such flow solvers has been constructed that allows easy exchange of the specific hydrodynamic finite volume upwind scheme and coupling to various explicit finite element solid dynamics solvers. The paper gives an overview of the computational approach and presents first simulations that couple the software to the general purpose solid dynamics code DYNA3D.
Parallel code NSBC: Simulations of relativistic nuclei scattering by a bent crystal
NASA Astrophysics Data System (ADS)
Babaev, A. A.
2014-01-01
The presented program was designed to simulate the passage of relativistic nuclei through a bent crystal. The input data describe a beam of nuclei. The nuclei move into the crystal under planar channeling and quasichanneling conditions. The program implements a numerical algorithm to evaluate the trajectory of a nucleus in the bent crystal. The program output consists of the projectile motion data, including the angular distribution of nuclei behind the crystal. The program could be useful for simulating particle tracking at accelerator facilities that use crystal collimation systems. The code is written in C++ and designed for multiprocessor systems (clusters).
NASA Astrophysics Data System (ADS)
Gedney, Stephen D.
1987-09-01
The Electromagnetic Pulse (EMP) produced by a high-altitude nuclear blast presents a severe threat to electronic systems due to its extreme characteristics. To test the vulnerability of large systems, such as airplanes, missiles, or satellites, they must be subjected to a simulated EMP environment. One type of simulator that has been used to approximate the EMP environment is the Large Parallel-Plate Bounded-Wave Simulator. It is a guided wave simulator which has the properties of a transmission line and supports a single TEM mode at sufficiently low frequencies. This type of simulator consists of finite-width parallel-plate waveguides, which are excited by a wave launcher and terminated by a wave receptor. This study addresses the field distribution within a finite-width parallel-plate waveguide that is matched to a conical tapered waveguide at either end. Characteristics of a parallel-plate bounded-wave EMP simulator were developed using scattering theory, thin-wire mesh approximation of the conducting surfaces, and the Numerical Electromagnetics Code (NEC). Background is provided for readers to use the NEC as a tool in solving thin wire scattering problems.
Melbourne, University of
and Rajkumar Buyya Cloud Computing and Distributed Systems (CLOUDS) Laboratory Department of Computer Science--As interest in adopting Cloud computing for various applications is rapidly growing, it is important important in the evaluation of the Cloud computing model. Simulation tools allow researchers to rapidly
NASA Astrophysics Data System (ADS)
Branicio, Paulo S.; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya; Shimojo, Fuyuki; Rino, Jose P.
Atomistic mechanisms of damage initiation during hypervelocity (15 km/s) impact on an AlN coating are investigated using parallel molecular-dynamics simulations involving 209 million atoms. On impact a strong shock wave is generated, which then splits into an elastic precursor and a structural phase transformation (SPT) wave, the latter driving a wurtzite to rocksalt structural transition. During its development, the SPT wave induces plastic processes in the intact wurtzite material, which in turn facilitate the nucleation and growth of brittle cracks. Specifically, the interface between the transformed (rocksalt) and untransformed (wurtzite) regions acts as a source of nanocavities and kink bands. They further interact with stress release waves reflected from the back surface and create cracks in mode I, from the nanocavities, and in mode II, from the kink band superdislocation boundary. Stresses are evaluated using a stoichiometry-preserving formula for virial local averages on inhomogeneous binary systems. Defects are analyzed using shortest-path ring statistics.
Massively Parallel Simulation of Uranium Migration at the Hanford 300 Area
NASA Astrophysics Data System (ADS)
Hammond, G. E.; Lichtner, P. C.
2009-12-01
Effectively utilized, high-performance computing can have a significant impact on subsurface science by enabling researchers to employ models with ever increasing sophistication and complexity that provide a more accurate and mechanistic representation of subsurface processes. As part of the U.S. Department of Energy’s SciDAC-2 program, the petascale subsurface reactive multiphase flow and transport code PFLOTRAN has been developed and is currently being employed to simulate uranium migration at the Hanford 300 Area. PFLOTRAN has been run on subsurface problems composed of up to two billion degrees of freedom and utilizing up to 131,072 processor cores on the world’s largest open science supercomputer Jaguar. This presentation focuses on the application of PFLOTRAN to simulate geochemical transport of uranium at Hanford using the Jaguar supercomputer. The Hanford 300 Area presents many challenges with regard to simulating radionuclide transport. Aside from the many conceptual uncertainties in the problem such as the choice of initial conditions, rapid fluctuations in the Columbia River stage, which occur on an hourly basis with several meter variations, can have a dramatic impact on the size of the uranium plume, its migration direction, and the rate at which it migrates to the river. Due to the immense size of the physical domain needed to include the transient river boundary condition, the grid resolution required to preserve accuracy, and the number of chemical components simulated, 3D simulation of the Hanford 300 Area would be unsustainable on a single workstation, and thus high-performance computing is essential.
Parallelization of Rocket Engine Simulator Software (P.R.E.S.S.)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet
1999-01-01
The Parallelization of Rocket Engine System Software (PRESS) project is part of a collaborative effort with Southern University at Baton Rouge (SUBR), University of West Florida (UWF), and Jackson State University (JSU). The project started on October 19, 1995, and, after a three-year period corresponding to project phases and fiscal-year funding by NASA Lewis Research Center (now Glenn Research Center), ended on October 18, 1998. A one-year no-cost extension period was granted on June 7, 1998, until October 19, 1999. The aim of this one-year no-cost extension period was to carry out further research to complete the work and lay the groundwork for subsequent research in the area of aerospace engine design optimization software tools. The previous progress of the research has been reported in great detail in the respective interim and final research progress reports, seven of them in all. While the purpose of this report is to be a final summary and an evaluative view of the entire work since the first year of funding, the following is a quick recap of the most important sections of the interim report dated April 30, 1999.
Dynamic temperature selection for parallel tempering in Markov chain Monte Carlo simulations
NASA Astrophysics Data System (ADS)
Vousden, W. D.; Farr, W. M.; Mandel, I.
2016-01-01
Modern problems in astronomical Bayesian inference require efficient methods for sampling from complex, high-dimensional, often multimodal probability distributions. Most popular methods, such as MCMC sampling, perform poorly on strongly multimodal probability distributions, rarely jumping between modes or settling on just one mode without finding others. Parallel tempering addresses this problem by sampling simultaneously with separate Markov chains from tempered versions of the target distribution with reduced contrast levels. Gaps between modes can be traversed at higher temperatures, while individual modes can be efficiently explored at lower temperatures. In this paper, we investigate how one might choose the ladder of temperatures to achieve more efficient sampling, as measured by the autocorrelation time of the sampler. In particular, we present a simple, easily implemented algorithm for dynamically adapting the temperature configuration of a sampler while sampling. This algorithm dynamically adjusts the temperature spacing to achieve a uniform rate of exchanges between chains at neighbouring temperatures. We compare the algorithm to conventional geometric temperature configurations on a number of test distributions and on an astrophysical inference problem, reporting efficiency gains by a factor of 1.2-2.5 over a well-chosen geometric temperature configuration and by a factor of 1.5-5 over a poorly chosen configuration. On all of these problems, a sampler using the dynamical adaptations to achieve uniform acceptance ratios between neighbouring chains outperforms one that does not.
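The two ingredients the abstract compares against, a geometric temperature ladder and exchanges between chains at neighbouring temperatures, can be sketched as follows. This is a minimal illustration, not the authors' adaptive algorithm: the function names are hypothetical, and the swap rule is the standard Metropolis exchange criterion under the assumption that only the likelihood is tempered (raised to the power 1/T).

```python
import math

def geometric_ladder(n_temps, t_max):
    """Temperatures T_0 = 1 < ... < T_{n-1} = t_max with a constant ratio."""
    if n_temps < 2:
        return [1.0]
    ratio = t_max ** (1.0 / (n_temps - 1))
    return [ratio ** i for i in range(n_temps)]

def swap_probability(log_like_i, log_like_j, temp_i, temp_j):
    """Metropolis probability of exchanging the states of chains i and j.

    Assumes chain k samples likelihood**(1/T_k) times the prior, so the
    prior cancels and only the log-likelihoods enter the ratio.
    """
    beta_i, beta_j = 1.0 / temp_i, 1.0 / temp_j
    log_ratio = (beta_i - beta_j) * (log_like_j - log_like_i)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)

if __name__ == "__main__":
    ladder = geometric_ladder(4, 8.0)
    print(ladder)
    # Colder chain (T=1) holds the worse state: the swap is always accepted.
    print(swap_probability(-10.0, -5.0, ladder[0], ladder[1]))
```

An adaptive scheme of the kind the abstract describes would adjust the spacing of the ladder during sampling so that the observed swap rates between neighbours become uniform; the fixed geometric ladder here is only the baseline.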
Modeling and simulation of a Stewart platform type parallel structure robot
NASA Technical Reports Server (NTRS)
Lim, Gee Kwang; Freeman, Robert A.; Tesar, Delbert
1989-01-01
The kinematics and dynamics of a Stewart Platform type parallel structure robot (NASA's Dynamic Docking Test System) were modeled using the method of kinematic influence coefficients (KIC) and isomorphic transformations of system dependence from one set of generalized coordinates to another. By specifying the end-effector (platform) time trajectory, the required generalized input forces which would theoretically yield the desired motion were determined. It was found that the relationship between the platform motion and the actuator motion was nonlinear. In addition, the contribution of the acceleration related terms to the total generalized forces required at the actuators was found to be more significant than that of the velocity related terms. Hence, the curve representing the total required actuator force generally resembled the curve for the acceleration related force. Another observation revealed that the acceleration related effective inertia matrix I sub dd had the tendency to decouple, with the elements on the main diagonal of I sub dd being larger than the off-diagonal elements, while the velocity related inertia power array P sub ddd did not show such a tendency. This tendency results in the acceleration related force curve of a given actuator resembling the acceleration profile of that particular actuator. Furthermore, it was indicated that the effective inertia matrix for the legs is more decoupled than that for the platform. These observations provide essential information for further research to develop an effective control strategy for real-time control of the Dynamic Docking Test System.
Final Report for "ParSEC-Parallel Simulation of Electron Cooling"
David L Bruhwiler
2005-09-16
The Department of Energy has plans, during the next two or three years, to design an electron cooling section for the collider ring at RHIC (Relativistic Heavy Ion Collider) [1]. Located at Brookhaven National Laboratory (BNL), RHIC is the premier nuclear physics facility. The new cooling section would be part of a proposed luminosity upgrade [2] for RHIC. This electron cooling section will be different from previous electron cooling facilities in three fundamental ways. First, the electron energy will be 50 MeV, as opposed to hundreds of keV (or 4 MeV for the electron cooling system now operating at Fermilab [3]). Second, both the electron beam and the ion beam will be bunched, rather than being essentially continuous. Third, the cooling will take place in a collider rather than in a storage ring. Analytical work, in combination with the use and further development of the semi-analytical codes BETACOOL [4,5] and SimCool [6,7], is being pursued at BNL [8] and at other laboratories around the world. However, there is a growing consensus in the field that high-fidelity 3-D particle simulations are required to fully understand the critical cooling physics issues in this new regime. Simulations of the friction coefficient, using the VORPAL code [9], for single gold ions passing once through the interaction region, have been compared with theoretical calculations [10,11], and the results have been presented in conference proceedings papers [8,12,13,14] and presentations [15,16,17]. Charged particles are advanced using a fourth-order Hermite predictor-corrector algorithm [18]. The fields in the beam frame are obtained from direct calculation of Coulomb's law, which is more efficient than multipole-type algorithms for fewer than approximately 10^6 particles.
Because the interaction time is so short, it is necessary to suppress the diffusive aspect of the ion dynamics through the careful use of positrons in the simulations, and to run hundreds of simulations with the same physical parameters but with different "seeds" for the particle loading. VORPAL can now be used to simulate other electron cooling facilities around the world, and it is also suitable for other accelerator modeling applications of direct interest to the Department of Energy. For example: (a) the Boersch effect in transport of strongly-magnetized electron beams for electron cooling sections, (b) the intrabeam scattering (IBS) effect in heavy ion accelerators, (c) the formation of crystalline beams and (d) target physics for heavy-ion fusion (HIF).
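The direct Coulomb summation mentioned in the report can be illustrated with a short sketch. It assumes a purely electrostatic, non-relativistic picture in the beam frame and SI units; the function name and data layout are hypothetical and are not taken from VORPAL. The double loop makes the O(N^2) cost explicit, which is the cost that multipole-type methods trade against their own overhead.

```python
import math

COULOMB_K = 8.9875517923e9  # 1 / (4 pi eps0) in N m^2 / C^2

def direct_fields(positions, charges):
    """Electric field at each particle due to all others, by direct summation.

    positions: list of (x, y, z) tuples in metres; charges: list in coulombs.
    Cost is O(N^2) in the number of particles.
    """
    n = len(positions)
    fields = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a particle exerts no field on itself
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                fields[i][k] += COULOMB_K * charges[j] * d[k] * inv_r3
    return fields

if __name__ == "__main__":
    # Two like charges one metre apart push each other away along x.
    print(direct_fields([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0]))
```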
Research in parallel computing
NASA Technical Reports Server (NTRS)
Ortega, James M.; Henderson, Charles
1994-01-01
This report summarizes work on parallel computations for NASA Grant NAG-1-1529 for the period 1 Jan. - 30 June 1994. Short summaries on highly parallel preconditioners, target-specific parallel reductions, and simulation of delta-cache protocols are provided.
NASA Technical Reports Server (NTRS)
Bruno, John
1984-01-01
The results of an investigation into the feasibility of using the MPP for direct and large eddy simulations of the Navier-Stokes equations are presented. A major part of this study was devoted to the implementation of two of the standard numerical algorithms for CFD. These implementations were not run on the Massively Parallel Processor (MPP) since the machine delivered to NASA Goddard does not have sufficient capacity. Instead, a detailed implementation plan was designed and from this were derived estimates of the time and space requirements of the algorithms on a suitably configured MPP. In addition, other issues related to the practical implementation of these algorithms on an MPP-like architecture were considered; namely, adaptive grid generation, zonal boundary conditions, the table lookup problem, and the software interface. Performance estimates show that the architectural components of the MPP, the Staging Memory and the Array Unit, appear to be well suited to the numerical algorithms of CFD. This, combined with the prospect of building a faster and larger MPP-like machine, holds the promise of achieving the sustained gigaflop rates that are required for the numerical simulations in CFD.
DSMC Simulations Assessing the ES-BGK Kinetic Model for Gas-Phase Transport between Parallel Walls
NASA Astrophysics Data System (ADS)
Gallis, M. A.; Torczynski, J. R.
2011-11-01
Bird's Direct Simulation Monte Carlo (DSMC) method is used to simulate gas-phase diffusive transport at near-continuum conditions. The molecules collide using either the Boltzmann collision term or the ellipsoidal-statistical Bhatnagar-Gross-Krook (ES-BGK) kinetic model. Momentum, heat, and mass transport between parallel walls (i.e., Couette, Fourier, and Fickian flows) are investigated. The ES-BGK model produces values of the viscosity and the thermal conductivity outside the Knudsen layers that agree closely with the corresponding values from the Boltzmann collision term (also implemented in DSMC). However, the ES-BGK model produces less accurate values for the mass self-diffusivity, with a modest difference for the Maxwell interaction but a large difference for the hard-sphere interaction. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
Rodgers, A; Matzel, E; Pasyanos, M; Petersson, A; Sjogreen, B; Bono, C; Vorobiev, O; Antoun, T; Walter, W; Myers, S; Lomov, I
2008-07-07
The development of accurate numerical methods to simulate wave propagation in three-dimensional (3D) earth models and advances in computational power offer exciting possibilities for modeling the motions excited by underground nuclear explosions. This presentation will describe recent work to use new numerical techniques and parallel computing to model earthquakes and underground explosions to improve understanding of the wave excitation at the source and path-propagation effects. Firstly, we are using the spectral element method (SEM, SPECFEM3D code of Komatitsch and Tromp, 2002) to model earthquakes and explosions at regional distances using available 3D models. SPECFEM3D simulates anelastic wave propagation in fully 3D earth models in spherical geometry with the ability to account for free surface topography, anisotropy, ellipticity, rotation and gravity. Results show in many cases that 3D models are able to reproduce features of the observed seismograms that arise from path-propagation effects (e.g. enhanced surface wave dispersion, refraction, amplitude variations from focusing and defocusing, tangential component energy from isotropic sources). We are currently investigating the ability of different 3D models to predict path-specific seismograms as a function of frequency. A number of models developed using a variety of methodologies are available for testing. These include the WENA/Unified model of Eurasia (e.g. Pasyanos et al 2004), the global CUB 2.0 model (Shapiro and Ritzwoller, 2002), the partitioned waveform model for the Mediterranean (van der Lee et al., 2007) and stochastic models of the Yellow Sea Korean Peninsula region (Pasyanos et al., 2006). Secondly, we are extending our Cartesian anelastic finite difference code (WPP of Nilsson et al., 2007) to model the effects of free-surface topography. WPP models anelastic wave propagation in fully 3D earth models using mesh refinement to increase computational speed and improve memory efficiency. 
Thirdly, we are modeling non-linear near-source shock wave propagation with GEODYN, an Eulerian Godunov finite-difference code (Antoun et al., 2001). This code accounts for shock wave propagation and a variety of effects including cavity formation, rock fracture and plastic deformation. We are exploring the coupling of GEODYN to WPP to propagate motions from the near-source (non-linear) region to the (linear anelastic) region where seismic observations are made at local, regional and teleseismic distances. This effort has just begun and we show preliminary results in this paper (with more to follow in our poster). These simulation tools are supported by massively parallel computers operated by Livermore Computing.
PERSPECTIVES ON THE EVOLUTION OF SIMULATION RICHARD E. NANCE
. A third objective is acquisition and system acceptance, where the simulation model is intended to answer on discrete event modeling for systems analysis is dominant as it has been during the evolution, sponsored primarily by the U.S. Air Force over some 25 years, focused on modeling and simulation. REN wrote
Fialho, Andre S.
Background: Recent reforms in Portugal aimed at strengthening the role of the primary care system, in order to improve the quality of the health care system. Since 2006 new policies aiming to change the organization, ...
Final Report for "Simulation Tools for Parallel Microwave Particle in Cell Modeling"
Peter H Stoltz
2008-09-25
Transport of high-power rf fields and the subsequent deposition of rf power into plasma is an important component of developing tokamak fusion energy. Two limitations on rf heating are: (i) breakdown of the metallic structures used to deliver rf power to the plasma, and (ii) a detailed understanding of how rf power couples into a plasma. Computer simulation is a main tool for helping solve both of these problems, but one of the premier tools, VORPAL, is traditionally too difficult to use for non-experts. During this Phase II project, we developed the VorpalView user interface tool. This tool allows Department of Energy researchers a fully graphical interface for analyzing VORPAL output to more easily model rf power delivery and deposition in plasmas.
A comparison between parallelization approaches in molecular dynamics simulations on GPUs.
Rovigatti, Lorenzo; Sulc, Petr; Reguly, István Z; Romano, Flavio
2015-01-01
We test the relative performances of two different approaches to the computation of forces for molecular dynamics simulations on graphics processing units. A "vertex-based" approach, where a computing thread is started per particle, is compared to an "edge-based" approach, where a thread is started per each potentially non-zero interaction. We find that the former is more efficient for systems with many simple interactions per particle while the latter is more efficient if the system has more complicated interactions or fewer of them. By comparing computation times on more and less recent graphics processing unit technology, we predict that, if the current trend of increasing the number of processing cores--as opposed to their computing power--remains, the "edge-based" approach will gradually become the most efficient choice in an increasing number of cases. PMID:25355527
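The difference between the two work decompositions can be sketched sequentially. In the sketch below each loop iteration stands in for one GPU thread; the spring-like `pair_force`, the 1D positions, and the function names are hypothetical choices for illustration and are not taken from the paper.

```python
def pair_force(xi, xj):
    """Hypothetical pair force on particle i due to j (unit-stiffness spring, 1D)."""
    return xj - xi

def forces_vertex_based(x, neighbors):
    """One unit of work per particle: it loops over its own neighbour list."""
    forces = [0.0] * len(x)
    for i, nbrs in enumerate(neighbors):  # on a GPU: one thread per particle i
        total = 0.0
        for j in nbrs:
            total += pair_force(x[i], x[j])
        forces[i] = total                 # single write per thread, no conflicts
    return forces

def forces_edge_based(x, edges):
    """One unit of work per interacting pair (i, j), each listed once."""
    forces = [0.0] * len(x)
    for i, j in edges:                    # on a GPU: one thread per pair
        f = pair_force(x[i], x[j])
        forces[i] += f                    # concurrent threads would need atomic
        forces[j] -= f                    # updates or a separate reduction here
    return forces

if __name__ == "__main__":
    x = [0.0, 1.0, 3.0]
    print(forces_vertex_based(x, [[1], [0, 2], [1]]))
    print(forces_edge_based(x, [(0, 1), (1, 2)]))
```

Both functions return the same forces. The vertex-based version evaluates every interaction twice but writes each result once, while the edge-based version evaluates each interaction once at the price of conflicting writes, which is one way to read the trade-off the abstract reports.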
A Moving Window Technique in Parallel Finite Element Time Domain Electromagnetic Simulation
Lee, Lie-Quan; Candel, Arno; Ng, Cho; Ko, Kwok
2010-06-07
A moving window technique for the finite element time domain (FETD) method is developed to simulate the propagation of electromagnetic waves induced by the transit of a charged particle beam inside large and long structures. The window moving along with the beam in the computational domain adopts high-order finite-element basis functions through p refinement and/or a high-resolution mesh through h refinement so that a sufficient accuracy is attained with substantially reduced computational costs. Algorithms to transfer discretized fields from one mesh to another, which are the key to implementing a moving window in a finite-element unstructured mesh, are presented. Numerical experiments are carried out using the moving window technique to compute short-range wakefields in long accelerator structures. The results are compared with those obtained from the normal FETD method and the advantages of using the moving window technique are discussed.
Bisetti, Fabrizio; Attili, Antonio; Pitsch, Heinz
2014-01-01
Combustion of fossil fuels is likely to continue for the near future due to the growing trends in energy consumption worldwide. The increase in efficiency and the reduction of pollutant emissions from combustion devices are pivotal to achieving meaningful levels of carbon abatement as part of the ongoing climate change efforts. Computational fluid dynamics featuring adequate combustion models will play an increasingly important role in the design of more efficient and cleaner industrial burners, internal combustion engines, and combustors for stationary power generation and aircraft propulsion. Today, turbulent combustion modelling is hindered severely by the lack of data that are accurate and sufficiently complete to assess and remedy model deficiencies effectively. In particular, the formation of pollutants is a complex, nonlinear and multi-scale process characterized by the interaction of molecular and turbulent mixing with a multitude of chemical reactions with disparate time scales. The use of direct numerical simulation (DNS) featuring a state of the art description of the underlying chemistry and physical processes has contributed greatly to combustion model development in recent years. In this paper, the analysis of the intricate evolution of soot formation in turbulent flames demonstrates how DNS databases are used to illuminate relevant physico-chemical mechanisms and to identify modelling needs. PMID:25024412
MASON: A Java Multi-Agent Simulation Library Sean Luke, Gabriel Catalin Balan, and Liviu Panait
Luke, Sean
MASON: A Java Multi-Agent Simulation Library Sean Luke, Gabriel Catalin Balan, and Liviu Panait George Mason University http://www.cs.gmu.edu/eclab We present MASON, a new multiagent simulation library written for Java. MASON is a general-purpose, single-process, discrete-event simulation library intended
A New Paradigm for Optimizing Hybrid Simulations of Rare Event Modeling for Complex Systems
Cook, Jeanine
A New Paradigm for Optimizing Hybrid Simulations of Rare Event Modeling for Complex Systems Paola, these systems have been modeled and simulated using either purely phenomenological models or discrete occurring discrete events. Traditionally, these systems have been modeled and simulated using either purely
NASA Astrophysics Data System (ADS)
English, Niall J.
2014-12-01
Ice crystallisation and melting were studied via massively parallel molecular dynamics under periodic boundary conditions, using approximately spherical ice nano-particles (both "isolated" and as a series of heterogeneous "seeds") of varying size, surrounded by liquid water and at a variety of temperatures. These studies were performed for a series of systems ranging in size from ~1 × 10^6 to 8.6 × 10^6 molecules, in order to establish system-size effects upon the nano-clusters' crystallisation and dissociation kinetics. Both "traditional" four-site and "single-site" water models were used, with and without formal point charges, dipoles, and electrostatics, respectively. Simulations were carried out in the microcanonical and isothermal-isobaric ensembles, to assess the influence of "artificial" thermo- and baro-statting, and important disparities were observed, which declined upon using larger systems. It was found that there was a dependence upon system size for both ice growth and dissociation, in that larger systems favoured slower growth and more rapid melting, given the lower extent of "communication" of ice nano-crystallites with their periodic replicae in neighbouring boxes. Although the single-site model exhibited less variation with system size vis-à-vis the multiple-site representation with explicit electrostatics, its crystallisation-dissociation kinetics was artificially fast.
A Scalable O(N) Algorithm for Large-Scale Parallel First-Principles Molecular Dynamics Simulations
Osei-Kuffuor, Daniel; Fattebert, Jean-Luc
2014-01-01
Traditional algorithms for first-principles molecular dynamics (FPMD) simulations only gain a modest capability increase from current petascale computers, due to their O(N^{3}) complexity and their heavy use of global communications. To address this issue, we are developing a truly scalable O(N) complexity FPMD algorithm, based on density functional theory (DFT), which avoids global communications. The computational model uses a general nonorthogonal orbital formulation for the DFT energy functional, which requires knowledge of selected elements of the inverse of the associated overlap matrix. We present a scalable algorithm for approximately computing selected entries of the inverse of the overlap matrix, based on an approximate inverse technique, by inverting local blocks corresponding to principal submatrices of the global overlap matrix. The new FPMD algorithm exploits sparsity and uses nearest neighbor communication to provide a computational scheme capable of extreme scalability. Accuracy is controlled by the mesh spacing of the finite difference discretization, the size of the localization regions in which the electronic orbitals are confined, and a cutoff beyond which the entries of the overlap matrix can be omitted when computing selected entries of its inverse. We demonstrate the algorithm's excellent parallel scaling for up to O(100K) atoms on O(100K) processors, with a wall-clock time of O(1) minute per molecular dynamics time step.
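The idea of approximating selected entries of the inverse overlap matrix from local principal submatrices can be shown on a toy problem. The sketch below is not the authors' implementation: it uses small dense lists, a hypothetical 1D index neighbourhood of width `radius` as the "local block", and a plain Gaussian elimination helper. It relies only on the assumption that entries of the inverse decay away from the diagonal, so a block around index i captures column i well.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def local_inverse_column(s, i, radius):
    """Approximate column i of inverse(s) using only the principal submatrix
    of s on indices within `radius` of i; entries outside are taken as zero."""
    n = len(s)
    idx = list(range(max(0, i - radius), min(n, i + radius + 1)))
    block = [[s[p][q] for q in idx] for p in idx]
    rhs = [1.0 if p == i else 0.0 for p in idx]
    return dict(zip(idx, solve(block, rhs)))

if __name__ == "__main__":
    n = 12  # toy overlap matrix: unit diagonal, nearest-neighbour overlap 0.2
    s = [[1.0 if p == q else (0.2 if abs(p - q) == 1 else 0.0)
          for q in range(n)] for p in range(n)]
    exact = solve(s, [1.0 if p == 6 else 0.0 for p in range(n)])
    for radius in (0, 2, 4):
        approx = local_inverse_column(s, 6, radius)
        print(radius, abs(approx[6] - exact[6]))
```

Because each column only needs data from its own neighbourhood, the work per column is independent of the global matrix size and needs only nearest-neighbour data, which is the property the abstract credits for the O(N) scaling.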
Parallel rendering techniques for massively parallel visualization
Hansen, C.; Krogh, M.; Painter, J.
1995-07-01
As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications, rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP's abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper will describe recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygons, spheres, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume renderers use a MIMD approach. Implementations for these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.
Xiong, Yi; Fakcharoenphol, Perapon; Wang, Shihao; Winterfeld, Philip H.; Zhang, Keni; Wu, Yu-Shu
2013-12-01
TOUGH2-EGS-MP is a parallel numerical simulation program coupling geomechanics with fluid and heat flow in fractured and porous media, and is applicable for simulation of enhanced geothermal systems (EGS). TOUGH2-EGS-MP is based on the TOUGH2-MP code, the massively parallel version of TOUGH2. In TOUGH2-EGS-MP, the fully-coupled flow-geomechanics model is developed from linear elastic theory for thermo-poro-elastic systems and is formulated in terms of mean normal stress as well as pore pressure and temperature. Reservoir rock properties such as porosity and permeability depend on rock deformation, and the relationships between these two, obtained from poro-elasticity theories and empirical correlations, are incorporated into the simulation. This report provides the user with detailed information on the TOUGH2-EGS-MP mathematical model and instructions for using it for Thermal-Hydrological-Mechanical (THM) simulations. The mathematical model includes the fluid and heat flow equations, geomechanical equation, and discretization of those equations. In addition, the parallel aspects of the code, such as domain partitioning and communication between processors, are also included. Although TOUGH2-EGS-MP has the capability for simulating fluid and heat flows coupled with geomechanical effects, it is up to the user to select the specific coupling process, such as THM or only TH, in a simulation. There are several example problems illustrating applications of this program. These example problems are described in detail and their input data are presented. Their results demonstrate that this program can be used for field-scale geothermal reservoir simulation in porous and fractured media with fluid and heat flow coupled with geomechanical effects.
NASA Astrophysics Data System (ADS)
Maronga, B.; Gryschka, M.; Heinze, R.; Hoffmann, F.; Kanani-Sühring, F.; Keck, M.; Ketelsen, K.; Letzel, M. O.; Sühring, M.; Raasch, S.
2015-08-01
In this paper we present the current version of the Parallelized Large-Eddy Simulation Model (PALM) whose core has been developed at the Institute of Meteorology and Climatology at Leibniz Universität Hannover (Germany). PALM is a Fortran 95-based code with some Fortran 2003 extensions and has been applied for the simulation of a variety of atmospheric and oceanic boundary layers for more than 15 years. PALM is optimized for use on massively parallel computer architectures and was recently ported to general-purpose graphics processing units. In the present paper we give a detailed description of the current version of the model and its features, such as an embedded Lagrangian cloud model and the possibility to use Cartesian topography. Moreover, we discuss recent model developments and future perspectives for LES applications.
NASA Astrophysics Data System (ADS)
Maronga, B.; Gryschka, M.; Heinze, R.; Hoffmann, F.; Kanani-Sühring, F.; Keck, M.; Ketelsen, K.; Letzel, M. O.; Sühring, M.; Raasch, S.
2015-02-01
In this paper we present the current version of the Parallelized Large-Eddy Simulation Model (PALM) whose core has been developed at the Institute of Meteorology and Climatology at Leibniz Universität Hannover (Germany). PALM is a Fortran 95-based code with some Fortran 2003 extensions and has been applied for the simulation of a variety of atmospheric and oceanic boundary layers for more than 15 years. PALM is optimized for use on massively parallel computer architectures and was recently ported to general-purpose graphics processing units. In the present paper we give a detailed description of the current version of the model and its features, such as an embedded Lagrangian cloud model and the possibility to use Cartesian topography. Moreover, we discuss recent model developments and future perspectives for LES applications.
Yokohama, Noriya
2013-07-01
This report was aimed at structuring the design of architectures and studying performance measurement of a parallel computing environment using a Monte Carlo simulation for particle therapy using a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed a speedup of approximately 28 times over a single-thread architecture, combined with improved stability. A study of methods of optimizing the system operations also indicated lower cost. PMID:23877155
Tinkertoy Parallel Programming: Complicated Applications
Plimpton, Steve
parallel algorithms for particle modeling, crash simulations and transferring data between two independent of activity within the parallel computing community. A key \\Lambda This work was funded by the Applied
NASA Astrophysics Data System (ADS)
Qiang, J.; Leitner, D.; Todd, D. S.; Ryne, R. D.
2005-03-01
The superconducting ECR ion source VENUS serves as the prototype injector ion source for the Rare Isotope Accelerator (RIA) driver linac. The RIA driver linac requires a great variety of high charge state ion beams with up to an order of magnitude higher intensity than currently achievable with conventional ECR ion sources. In order to design the beam line optics of the low energy beam line for the RIA front end for the wide parameter range required for the RIA driver accelerator, reliable simulations of the ion beam extraction from the ECR ion source through the ion mass analyzing system are essential. The RIA low energy beam transport line must be able to transport intense beams (up to 10 mA) of light and heavy ions at 30 keV. For this purpose, LBNL is developing the parallel 3D particle-in-cell code IMPACT to simulate the ion beam transport from the ECR extraction aperture through the analyzing section of the low energy transport system. IMPACT, a parallel, particle-in-cell code, is currently used to model the superconducting RF linac section of RIA and is being modified in order to simulate DC beams from the ECR ion source extraction. By using the high performance of parallel supercomputing we will be able to account consistently for the changing space charge in the extraction region and the analyzing section. A progress report and early results in the modeling of the VENUS source will be presented.
Qiang, J.; Leitner, D.; Todd, D.S.; Ryne, R.D.
2005-03-15
The superconducting ECR ion source VENUS serves as the prototype injector ion source for the Rare Isotope Accelerator (RIA) driver linac. The RIA driver linac requires a great variety of high charge state ion beams with up to an order of magnitude higher intensity than currently achievable with conventional ECR ion sources. In order to design the beam line optics of the low energy beam line for the RIA front end for the wide parameter range required for the RIA driver accelerator, reliable simulations of the ion beam extraction from the ECR ion source through the ion mass analyzing system are essential. The RIA low energy beam transport line must be able to transport intense beams (up to 10 mA) of light and heavy ions at 30 keV. For this purpose, LBNL is developing the parallel 3D particle-in-cell code IMPACT to simulate the ion beam transport from the ECR extraction aperture through the analyzing section of the low energy transport system. IMPACT, a parallel, particle-in-cell code, is currently used to model the superconducting RF linac section of RIA and is being modified in order to simulate DC beams from the ECR ion source extraction. By using the high performance of parallel supercomputing we will be able to account consistently for the changing space charge in the extraction region and the analyzing section. A progress report and early results in the modeling of the VENUS source will be presented.
GENERAL TECHNICAL REPORT PSW-GTR-245 Simulation Analysis of a Wildfire
GENERAL TECHNICAL REPORT PSW-GTR-245 Simulation Analysis of a Wildfire Suppression System are unusually high in the Portuguese wildfire management system, representing a high burden on suppression a discrete-event simulation model of a wildfire suppression system, designed to analyze the joint impact
DEVS-FIRE: Towards an Integrated Simulation Environment for Surface Wildfire Spread and
Ntaimo, Lewis
DEVS-FIRE: Towards an Integrated Simulation Environment for Surface Wildfire Spread and Containment to the complexity of fire behavior. In this paper, the authors present an integrated simulation environment for surface wildfire spread and containment called DEVS-FIRE. DEVS-FIRE is based on the discrete event system
NASA Astrophysics Data System (ADS)
Hao, Yufei; Lu, Quanming; Lembege, Bertrand; Huang, Can; Wu, Mingyu; Guo, Fan; Shan, Lican; Zheng, Jian; Wang, Shui
2015-04-01
Experimental observations from space missions (including Cluster more recently) have clearly revealed the existence of high speed jets (HSJ) in the downstream region of the quasi-parallel terrestrial bow shock. Presently, two-dimensional (2-D) hybrid simulations are performed to reproduce and investigate the formation of such HSJ through a rippled quasi-parallel shock front. The simulation results show (i) that such shock fronts are strongly nonstationary (self reformation) along the shock normal, and (ii) that ripples are evidenced along the shock front as the upstream ULF waves (excited by interaction between incoming and reflected ions) are convected back to the front by the solar wind and contribute to the rippling formation. Then, these ripples are inherent structures of a quasi-parallel shock and the self reformation of the shock is not synchronous along the surface of the shock front. As a consequence, new incoming solar wind ions interact differently at different locations along the shock surface, and some can be only deflected (instead of being decelerated) at locations where ripples are large enough to play the role of local "secondary" shock. Therefore, the ion bulk velocity is also different locally after ions are transmitted downstream, and local high-speed jet patterns are formed somewhere downstream. After a short reminder of main quasi-parallel shock features, this presentation will focus (i) on experimental observations of HSJ, (ii) on our preliminary simulation results obtained on HSJ, (iii) on their relationship with local bursty patterns of (turbulent) magnetic field evidenced at the front, and (iv) on the spatial and time scales of HSJ to be compared later on with experimental observations. Such downstream HSJ are shown to be generated by the nonstationary shock front itself and do not require any upstream perturbations (such as tangential/rotational discontinuity, HFA, etc.) to be convected by the solar wind and to interact with the shock front before penetrating downstream.
NASA Astrophysics Data System (ADS)
Lee, Young-Jin; Kim, Dae-Hong; Kim, Hee-Joung
2014-04-01
In nuclear medicine, the use of a pixelated semiconductor detector with cadmium telluride (CdTe) or cadmium zinc telluride (CdZnTe) is of growing interest for new devices. Especially, the spatial resolution can be improved by using a pixelated parallel-hole collimator with equal holes and pixel sizes based on the above-mentioned detector. High-absorption and high-stopping-power pixelated parallel-hole collimator materials are often chosen because of their good spatial resolution. Capturing more gamma rays, however, may result in decreased sensitivity with the same collimator geometric designs. Therefore, a trade-off between spatial resolution and sensitivity is very important in nuclear medicine imaging. The purpose of this study was to compare spatial resolutions using a pixelated semiconductor single photon emission computed tomography (SPECT) system with lead, tungsten, gold, and depleted uranium pixelated parallel-hole collimators at equal sensitivity. We performed a simulation study of the PID 350 (Ajat Oy Ltd., Finland) CdTe pixelated semiconductor detector (pixel size: 0.35 × 0.35 mm2) by using a Geant4 Application for Tomographic Emission (GATE) simulation. Spatial resolutions were measured with different collimator materials at equivalent sensitivities. Additionally, hot-rod phantom images were acquired for each source-to-collimator distance by using a GATE simulation. At equivalent sensitivities, measured averages of the full width at half maximum (FWHM) using lead, tungsten, and gold were 4.32, 2.93, and 2.23% higher than that of depleted uranium, respectively. Furthermore, for the full width at tenth maximum (FWTM), measured averages when using lead, tungsten, and gold were 6.29, 4.10, and 2.65% higher than that of depleted uranium, respectively. 
Although the spatial resolution showed little difference among the different pixelated parallel-hole collimator materials, lower absorption and stopping power materials such as lead and tungsten had relatively better characteristics at specific sensitivities.
The Parallel System for Integrating Impact Models and Sectors (pSIMS)
a framework for massively parallel simulations of climate impact models in agriculture and forestry and Subject Descriptors I.6.8 [Simulation and Modeling]: Parallel parallel simulation of climate Climate change Impacts, Adaptation, and Vulnerabilities (VIA); Parallel Computing; Data processing
NASA Astrophysics Data System (ADS)
Lee, Young-Jin; Kim, Dae-Hong; Rhee, Yong-Chun; Kim, Hee-Joung
2014-06-01
Recently, many studies have investigated the use of a pixelated semiconductor detector to improve spatial resolution. The purpose of this study was to evaluate novel parallel-hole collimator geometric designs with a CdTe pixelated semiconductor single-photon-emission computed tomography (SPECT) system. The pixelated semiconductor detector was modeled as a PID 350 detector (Ajat Oy Ltd., Finland) with small pixels (0.35 × 0.35 mm2) by using Geant4 Application for Tomographic Emission (GATE) software. We designed a novel parallel-hole collimator consisting of two overlapping parallel-hole collimators. Each hole size was four times that of the pixelated parallel-hole collimator. The overlap ratios of these collimators were 1:1, 1:2, 2:1, 1:3, 3:1, 1:4, and 4:1. To evaluate the performance of this system, we evaluated the sensitivity and the spatial resolution. The results for our new parallel-hole collimator indicated that the evaluated sensitivity averages using overlap ratios of 1:1, 1:2, 2:1, 1:3, 3:1, 1:4, and 4:1 were 4.45, 7.56, 7.51, 12.76, 12.65, 20.01, and 19.90 times higher, respectively, than those of the pixelated parallel-hole collimator. The evaluated averages of the spatial resolution varied depending on the source-to-collimator distances. In conclusion, we successfully designed a novel parallel-hole collimator with various overlap ratios of the collimator septal heights with a CdTe pixelated semiconductor SPECT system. Based on our results, we recommend using this collimator with a CdTe pixelated semiconductor SPECT system.
Zhang, Keni; Yamamoto, Hajime; Pruess, Karsten
2008-02-15
TMVOC-MP is a massively parallel version of the TMVOC code (Pruess and Battistelli, 2002), a numerical simulator for three-phase non-isothermal flow of water, gas, and a multicomponent mixture of volatile organic chemicals (VOCs) in multidimensional heterogeneous porous/fractured media. TMVOC-MP was developed by introducing massively parallel computing techniques into TMVOC. It retains the physical process model of TMVOC, designed for applications to contamination problems that involve hydrocarbon fuels or organic solvents in saturated and unsaturated zones. TMVOC-MP can model contaminant behavior under 'natural' environmental conditions, as well as for engineered systems, such as soil vapor extraction, groundwater pumping, or steam-assisted source remediation. With its sophisticated parallel computing techniques, TMVOC-MP can handle much larger problems than TMVOC, and can be much more computationally efficient. TMVOC-MP models multiphase fluid systems containing variable proportions of water, non-condensible gases (NCGs), and water-soluble volatile organic chemicals (VOCs). The user can specify the number and nature of NCGs and VOCs. There are no intrinsic limitations to the number of NCGs or VOCs, although the arrays for fluid components are currently dimensioned as 20, accommodating water plus 19 components that may be either NCGs or VOCs. Among them, NCG arrays are dimensioned as 10. The user may select NCGs from a data bank provided in the software. The currently available choices include O2, N2, CO2, CH4, ethane, ethylene, acetylene, and air (a pseudo-component treated with properties averaged from N2 and O2). Thermophysical property data of VOCs can be selected from a chemical data bank, included with TMVOC-MP, that provides parameters for 26 commonly encountered chemicals. Users also can input their own data for other fluids.
The fluid components may partition (volatilize and/or dissolve) among gas, aqueous, and NAPL phases. Any combination of the three phases may be present, and phases may appear and disappear in the course of a simulation. In addition, VOCs may be adsorbed by the porous medium, and may biodegrade according to a simple half-life model. Detailed discussion of physical processes, assumptions, and fluid properties used in TMVOC-MP can be found in the TMVOC user's guide (Pruess and Battistelli, 2002). TMVOC-MP was developed based on the parallel framework of the TOUGH2-MP code (Zhang et al. 2001, Wu et al. 2002). It uses MPI (Message Passing Forum, 1994) for parallel implementation. A domain decomposition approach is adopted for the parallelization. The code partitions a simulation domain, defined by an unstructured grid, using a partitioning algorithm from the METIS software package (Karypis and Kumar, 1998). In parallel simulation, each processor is in charge of one part of the simulation domain for assembling mass and energy balance equations, solving linear equation systems, updating thermophysical properties, and performing other local computations. The local linear-equation systems are solved in parallel by multiple processors with the Aztec linear solver package (Tuminaro et al., 1999). Although each processor solves the linearized equations of subdomains independently, the entire linear equation system is solved together by all processors collaboratively via communication between neighboring processors during each iteration. Detailed discussion of the prototype of the data-exchange scheme can be found in Elmroth et al. (2001). In addition, FORTRAN 90 features are introduced to TMVOC-MP, such as dynamic memory allocation, array operation, matrix manipulation, and replacing 'common blocks' (used in the original TMVOC) with modules. All new subroutines are written in FORTRAN 90. Program units imported from the original TMVOC remain in standard FORTRAN 77.
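The pattern described above (each processor updates its own subdomain and exchanges boundary values with its neighbors in every iteration) can be illustrated with a toy sketch. This is not TMVOC-MP's scheme: it assumes a 1D structured grid, a plain Jacobi sweep instead of the Aztec solvers, contiguous blocks instead of a METIS partition, and it imitates the "processors" inside a single Python process rather than using MPI. The function names are hypothetical.

```python
def jacobi_serial(u, sweeps):
    """Reference: Jacobi sweeps over the whole 1D grid (the two end values stay fixed)."""
    u = list(u)
    for _ in range(sweeps):
        new = u[:]
        for i in range(1, len(u) - 1):
            new[i] = 0.5 * (u[i - 1] + u[i + 1])
        u = new
    return u


def jacobi_decomposed(u, sweeps, nparts):
    """Same sweeps, with interior cells split into nparts contiguous blocks.

    Assumes nparts <= len(u) - 2 so that every block owns at least one cell.
    """
    n = len(u)
    interior = n - 2
    # Block p owns grid indices bounds[p] .. bounds[p + 1] - 1.
    bounds = [1 + (interior * p) // nparts for p in range(nparts + 1)]
    # Each local array carries one ghost cell on either side of its owned cells.
    local = [u[bounds[p] - 1: bounds[p + 1] + 1] for p in range(nparts)]
    for _ in range(sweeps):
        # Communication step: neighbouring blocks copy edge values into ghost cells.
        for p in range(nparts - 1):
            local[p][-1] = local[p + 1][1]
            local[p + 1][0] = local[p][-2]
        # Computation step: every block updates only the cells it owns.
        for p in range(nparts):
            old = local[p]
            inner = [0.5 * (old[i - 1] + old[i + 1]) for i in range(1, len(old) - 1)]
            local[p] = [old[0]] + inner + [old[-1]]
    # Gather the owned cells back into one global array.
    result = [u[0]]
    for block in local:
        result.extend(block[1:-1])
    result.append(u[-1])
    return result


u0 = [0.0] * 11 + [1.0]          # fixed values 0 and 1 at the two ends
print(jacobi_decomposed(u0, 50, 3) == jacobi_serial(u0, 50))
```

Because a Jacobi sweep reads only values from the previous iteration, the decomposed run reproduces the serial result as long as the ghost cells are refreshed before each sweep; in a real MPI code that refresh is the neighbor-to-neighbor message exchange.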
This report provides a quick starting guide for using the TMVOC-MP program. We suppose that the users have basic knowledge of using the original TMVOC code. The users can find the detailed technical descrip
NASA Astrophysics Data System (ADS)
Guo, L.; Huang, H.; Gaston, D.; Redden, G. D.
2009-12-01
One approach for immobilizing subsurface metal contaminants involves stimulating the in situ production of mineral phases that sequester or isolate contaminants. One example is using calcium carbonate to immobilize strontium. The success of such approaches depends on understanding how various processes of flow, transport, reaction and resulting porosity-permeability change couple in subsurface systems. Reactive transport models are often used for such purpose. Current subsurface reactive transport simulators typically involve a de-coupled solution approach, such as operator-splitting, that solves the transport equations for components and batch chemistry sequentially, which has limited applicability for many biogeochemical processes with fast kinetics and strong medium property-reaction interactions. A massively parallel, fully coupled, fully implicit reactive transport simulator has been developed based on a parallel multi-physics object oriented software environment computing framework (MOOSE) developed at the Idaho National Laboratory. Within this simulator, the system of transport and reaction equations is solved simultaneously in a fully coupled manner using the Jacobian Free Newton-Krylov (JFNK) method with preconditioning. The simulator was applied to model reactive transport in a one-dimensional column where conditions that favor calcium carbonate precipitation are generated by urea hydrolysis that is catalyzed by urease enzyme. Simulation results are compared to both laboratory column experiments and those obtained using the reactive transport simulator STOMP in terms of: the spatial and temporal distributions of precipitates and reaction rates and other major species in the reaction system; the changes in porosity and permeability; and the computing efficiency based on wall clock simulation time.
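The "Jacobian-free" part of JFNK refers to the fact that a Krylov solver only needs products of the Jacobian with a vector, and those can be approximated from residual evaluations alone. The sketch below shows only that product, not a full Newton-Krylov solver and not the MOOSE implementation; the two-unknown residual, the function names, and the step size `eps` are assumptions chosen for illustration.

```python
import math


def residual(u):
    """Hypothetical nonlinear system F(u) = 0 with two unknowns (illustration only)."""
    x, y = u
    return [x * x + y - 3.0, x + math.exp(y) - 2.0]


def jacobian_times(F, u, v, eps=1e-7):
    """Approximate J(u) v by a forward difference of F, without ever forming J."""
    base = F(u)
    shifted = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(s - b) / eps for s, b in zip(shifted, base)]


def exact_jacobian_times(u, v):
    """Hand-derived J(u) v for the residual above, used only as a check."""
    x, y = u
    return [2.0 * x * v[0] + v[1], v[0] + math.exp(y) * v[1]]


u = [1.0, 0.5]
v = [0.3, -0.2]
print(jacobian_times(residual, u, v))
print(exact_jacobian_times(u, v))
```

In a complete solver this product would be called once per Krylov iteration inside each Newton step, so the cost of a linear solve is measured in residual evaluations rather than in Jacobian assembly.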
NASA Astrophysics Data System (ADS)
Sloan, Gregory James
The direct numerical simulation (DNS) offers the most accurate approach to modeling the behavior of a physical system, but carries an enormous computation cost. There exists a need for an accurate DNS to model the coupled solid-fluid system seen in targeted drug delivery (TDD), nanofluid thermal energy storage (TES), as well as other fields where experiments are necessary, but experiment design may be costly. A parallel DNS can greatly reduce the large computation times required, while providing the same results and functionality of the serial counterpart. A D2Q9 lattice Boltzmann method approach was implemented to solve the fluid phase. The use of domain decomposition with message passing interface (MPI) parallelism resulted in an algorithm that exhibits super-linear scaling in testing, which may be attributed to the caching effect. Decreased performance on a per-node basis for a fixed number of processes confirms this observation. A multiscale approach was implemented to model the behavior of nanoparticles submerged in a viscous fluid, and used to examine the mechanisms that promote or inhibit clustering. Parallelization of this model using a master-worker algorithm with MPI gives less-than-linear speedup for a fixed number of particles and varying number of processes. This is due to the inherent inefficiency of the master-worker approach. Lastly, these separate simulations are combined, and two-way coupling is implemented between the solid and fluid.
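For readers unfamiliar with the D2Q9 lattice, the sketch below lists its nine discrete velocities and weights and evaluates the usual second-order equilibrium distribution. It is the standard textbook form in lattice units (sound speed squared of 1/3), which is an assumption here: the abstract does not say which collision model or equilibrium was used, and the function names are hypothetical.

```python
# D2Q9: one rest velocity, four axis-aligned velocities, four diagonal velocities.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]
WEIGHTS = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations for density rho and velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(VELOCITIES, WEIGHTS):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def moments(f):
    """Recover density and velocity from a set of nine populations."""
    rho = sum(f)
    ux = sum(fi * cx for fi, (cx, _) in zip(f, VELOCITIES)) / rho
    uy = sum(fi * cy for fi, (_, cy) in zip(f, VELOCITIES)) / rho
    return rho, ux, uy


f = equilibrium(1.2, 0.05, -0.03)
print(moments(f))
```

The moments of the equilibrium populations return the density and velocity they were built from, which is the property a collision step relies on to conserve mass and momentum.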
NASA Astrophysics Data System (ADS)
Lee, Y.-J.; Park, S.-J.; Lee, S.-W.; Kim, D.-H.; Kim, Y.-S.; Jo, B.-D.; Kim, H.-J.
2013-03-01
It is recommended that a pixelated parallel-hole collimator in which the hole and pixel sizes are equal be used to improve the sensitivity and spatial resolution when using a small pixel size and a single-photon emission computed tomography (SPECT) system with pixelated semiconductor detector materials (e.g., CdTe and CZT). However, some significant problems arise in the manufacturing of a pixelated parallel-hole collimator. Therefore, we sought to simulate a pixelated semiconductor SPECT system with various collimator geometric designs. The purpose of this study was to compare the quality of images generated with a pixelated semiconductor SPECT system simulated with pixelated parallel-hole collimators of various geometric designs. The sensitivity and spatial resolution of the various collimator geometric designs with varying septal heights and hole sizes were measured. Moreover, to evaluate the overall performance of the imaging system, a hot-rod phantom was designed using a Monte Carlo simulation. According to the results, the average sensitivity using a 15 mm septal height was 1.80, 2.87, and 4.16 times higher than that obtained with septal heights of 20, 25, and 30 mm, respectively. Also, the average spatial resolution using the 30 mm septal height was 44.33, 22.08, and 9.26% better than that attained with 15, 20, and 25 mm septal heights, respectively. When the results acquired with 0.3 and 0.6 mm hole sizes were compared, the average sensitivity with the 0.6 mm hole size was 3.97 times higher than that obtained with the 0.3 mm hole size, and the average spatial resolution with the 0.3 mm hole size was 45.76% better than that with the 0.6 mm hole size. We have presented the pixelated parallel-hole collimators of various collimator geometric designs and evaluations. 
Our results showed that the effect of various collimator geometric designs can be investigated by Monte Carlo simulation so as to evaluate the feasibility of a high resolution parallel-hole collimator with a CdTe pixelated semiconductor SPECT system.
Cai, Y.; Navon, I.M.
1995-11-01
In this paper, the authors report their work on applying Krylov iterative methods, accelerated by parallelizable domain-decomposed (DD) preconditioners, to the solution of nonsymmetric linear algebraic equations arising from implicit time discretization of a finite element model of the shallow water equations on a limited-area domain. Two types of previously proposed DD preconditioners are employed and a novel one is advocated to accelerate, with post-preconditioning, the convergence of three popular and competitive Krylov iterative linear solvers. Performance sensitivities of these preconditioners to inexact subdomain solvers are also reported. Autotasking, the parallel processing capability representing the third phase of multitasking libraries on CRAY Y-MP, has been exploited and successfully applied to both loop and subroutine level parallelization. Satisfactory speedup results were obtained. On the other hand, automatic loop-level parallelization, made possible by the autotasking preprocessor, attained only a speedup smaller than a factor of two. 39 refs., 2 figs., 6 tabs.
Parallel processor engine model program
NASA Technical Reports Server (NTRS)
Mclaughlin, P.
1984-01-01
The Parallel Processor Engine Model Program is a generalized engineering tool intended to aid in the design of parallel processing real-time simulations of turbofan engines. It is written in the FORTRAN programming language and executes as a subset of the SOAPP simulation system. Input/output and execution control are provided by SOAPP; however, the analysis, emulation and simulation functions are completely self-contained. A framework in which a wide variety of parallel processing architectures could be evaluated and tools with which the parallel implementation of a real-time simulation technique could be assessed are provided.
NASA Astrophysics Data System (ADS)
He, W.; Beyer, C.; Fleckenstein, J. H.; Jang, E.; Kolditz, O.; Naumov, D.; Kalbacher, T.
2015-10-01
The open-source scientific software packages OpenGeoSys and IPhreeqc have been coupled to set up and simulate thermo-hydro-mechanical-chemical coupled processes with simultaneous consideration of aqueous geochemical reactions faster and easier on high-performance computers. In combination with the elaborated and extendable chemical database of IPhreeqc, it will be possible to set up a wide range of multiphysics problems with numerous chemical reactions that are known to influence water quality in porous and fractured media. A flexible parallelization scheme using MPI (Message Passing Interface) grouping techniques has been implemented, which allows an optimized allocation of computer resources for the node-wise calculation of chemical reactions on the one hand and the underlying processes such as for groundwater flow or solute transport on the other. This technical paper presents the implementation, verification, and parallelization scheme of the coupling interface, and discusses its performance and precision.
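The idea of grouping MPI ranks so that one group handles flow and transport while another handles chemistry can be sketched without MPI itself. The example below only imitates the bookkeeping of a communicator split in the spirit of `MPI_Comm_split` (same color means same group, ranks inside a group ordered by key and then by world rank). Whether the coupling uses that particular call is not stated in the abstract, and the 2-versus-6 split, the group labels, and the function name are hypothetical.

```python
def split_ranks(world_size, color_of, key_of=None):
    """Map each world rank to (color, rank inside its group)."""
    groups = {}
    for rank in range(world_size):
        groups.setdefault(color_of(rank), []).append(rank)
    placement = {}
    for color, members in groups.items():
        # Order by the optional key first, then by world rank to break ties.
        members.sort(key=lambda r: (key_of(r) if key_of else 0, r))
        for new_rank, world_rank in enumerate(members):
            placement[world_rank] = (color, new_rank)
    return placement


# Hypothetical layout: 8 ranks, the first 2 for flow/transport, the other 6 for chemistry.
placement = split_ranks(8, lambda r: "flow" if r < 2 else "chem")
for world_rank in sorted(placement):
    print(world_rank, placement[world_rank])
```

Changing the color function is all it takes to try a different resource allocation, which is the kind of flexibility the abstract attributes to the grouping scheme.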
NASA Astrophysics Data System (ADS)
Hur, Min Young; Lee, Ho-Jun; Lee, Hae June; Choe, Won Ho; Seon, Jong Ho
2013-09-01
Oscillations of the plasma potential have been observed in many Hall thruster experiments. It has been suggested that the oscillations are triggered by the interaction between the plasma and the dielectric materials, such as secondary electron emission, but the detailed mechanism has not been proven. In this paper, the effects of the interaction between the plasma and the dielectric material are simulated with a two-dimensional particle-in-cell (PIC) code for the acceleration channel of the Hall thruster. The simulation code is parallelized using graphics processing units (GPUs). To analyze the effect, the simulation results are examined as two parameters are varied: the magnetic flux density and the secondary electron emission coefficient (SEEC). The particle trajectory is presented with the variation of the SEEC and magnetic flux density as well as its curvature. This research is supported by "Core technology development of high Isp electric propulsion system for space exploration" from the National Space Lab, sponsored by the National Research Foundation of Korea (NRF).