Science.gov

Sample records for parallel discrete-event simulation

  1. Running Parallel Discrete Event Simulators on Sierra

    SciTech Connect

    Barnes, P. D.; Jefferson, D. R.

    2015-12-03

    In this proposal we consider porting the ROSS/Charm++ simulator and the discrete event models that run under its control so that they run on the Sierra architecture and make efficient use of the Volta GPUs.

  2. An adaptive synchronization protocol for parallel discrete event simulation

    SciTech Connect

    Bisset, K.R.

    1998-12-01

    Simulation, especially discrete event simulation (DES), is used in a variety of disciplines where numerical methods are difficult or impossible to apply. One problem with this method is that a sufficiently detailed simulation may take hours or days to execute, and multiple runs may be needed in order to generate the desired results. Parallel discrete event simulation (PDES) has been explored for many years as a method to decrease the time taken to execute a simulation. Many protocols have been developed which work well for particular types of simulations, but perform poorly when used for other types of simulations. Often it is difficult to know a priori whether a particular protocol is appropriate for a given problem. In this work, an adaptive synchronization method (ASM) is developed which works well on an entire spectrum of problems. The ASM determines, using an artificial neural network (ANN), the likelihood that a particular event is safe to process.

  3. Parallel discrete-event simulation of FCFS stochastic queueing networks

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1988-01-01

    Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments), which has proven itself to be effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction of a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations, performance data are given that demonstrate the method's effectiveness under moderate to heavy loads, and performance tradeoffs between the quality of lookahead and the cost of computing it are discussed.
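    As a rough illustration of what lookahead can mean for a single FCFS server, the sketch below computes departure times from the standard FCFS recurrence and uses the earliest one as a lower bound on when the server can next send a job downstream. It is not the appointments protocol from the record above. The function names and the assumption that service times are sampled before jobs arrive (`next_service`) are hypothetical choices made for this example.

```python
from typing import List


def fcfs_departure_times(arrivals: List[float], services: List[float],
                         server_free_at: float = 0.0) -> List[float]:
    """Departure times for jobs served first-come-first-served by one server.

    A job starts service once it has arrived and the previous job has left.
    """
    departures = []
    free_at = server_free_at
    for arrival, service in zip(arrivals, services):
        start = max(arrival, free_at)
        free_at = start + service
        departures.append(free_at)
    return departures


def lookahead_bound(now: float, arrivals: List[float], services: List[float],
                    next_service: float, server_free_at: float = 0.0) -> float:
    """Lower bound on the time of the next departure from this server.

    Assumption (hypothetical): the service time of the next job to arrive,
    `next_service`, has already been sampled, so a bound exists even when
    the queue is empty.
    """
    departures = fcfs_departure_times(arrivals, services, server_free_at)
    if departures:
        return departures[0]
    # Empty queue: a future job arrives no earlier than `now`.
    return max(now, server_free_at) + next_service


queued = lookahead_bound(1.5, [1.0, 2.0, 2.5], [2.0, 1.0, 0.5], next_service=0.5)
idle = lookahead_bound(5.0, [], [], next_service=0.5)
```

    A neighboring process could safely simulate up to the returned bound without waiting for further messages from this server, which is the sense in which better lookahead reduces blocking.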

  4. Parallel discrete event simulation: A shared memory approach

    NASA Technical Reports Server (NTRS)

    Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

    1987-01-01

    With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.

  5. The cost of conservative synchronization in parallel discrete event simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1990-01-01

    The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approaches the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.

  6. Synchronous parallel system for emulation and discrete event simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor)

    1992-01-01

    A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to state variables of the simulation object attributable to the event object, and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring the events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.
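    The two horizons named in the record above can be sketched as plain minimum computations. This is only an illustration of the definitions as the abstract states them, not the patented system. The function names and the dictionary layout are hypothetical, and what a protocol does with the events on either side of the horizon is deliberately left open.

```python
from typing import Dict, List, Tuple

INFINITY = float("inf")


def local_event_horizon(new_event_times: List[float]) -> float:
    """Earliest time stamp among the events one node has newly generated."""
    return min(new_event_times, default=INFINITY)


def global_event_horizon(new_events_by_node: Dict[str, List[float]]) -> float:
    """Earliest local event horizon over all nodes."""
    horizons = (local_event_horizon(times) for times in new_events_by_node.values())
    return min(horizons, default=INFINITY)


def split_at_horizon(event_times: List[float],
                     horizon: float) -> Tuple[List[float], List[float]]:
    """Separate time stamps strictly below the horizon from the rest."""
    before = sorted(t for t in event_times if t < horizon)
    at_or_after = sorted(t for t in event_times if t >= horizon)
    return before, at_or_after


# Hypothetical time stamps of newly generated events on three nodes.
new_events = {"node0": [7.0, 5.5], "node1": [4.2, 9.0], "node2": []}
geh = global_event_horizon(new_events)
before, at_or_after = split_at_horizon([1.0, 4.2, 3.0, 6.0], geh)
```

    A node that generated no new events contributes an infinite local horizon, so it never constrains the global value.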

  7. Model for the evolution of the time profile in optimistic parallel discrete event simulations

    NASA Astrophysics Data System (ADS)

    Ziganurova, L.; Novotny, M. A.; Shchur, L. N.

    2016-02-01

    We investigate synchronisation aspects of an optimistic algorithm for parallel discrete event simulations (PDES). We present a model for the time evolution in optimistic PDES. This model evaluates the local virtual time profile of the processing elements. We argue that the evolution of the time profile is reminiscent of the surface profile in the directed percolation problem and in unrestricted surface growth. We present results of the simulation of the model and emphasise predictive features of our approach.

  8. SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeff S.

    1992-01-01

    Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.

  9. Small-World Synchronized Computing Networks for Scalable Parallel Discrete-Event Simulations

    NASA Astrophysics Data System (ADS)

    Guclu, Hasan; Korniss, Gyorgy; Toroczkai, Zoltan; Novotny, Mark A.

    We study the scalability of parallel discrete-event simulations for arbitrary short-range interacting systems with asynchronous dynamics. When the synchronization topology mimics that of the short-range interacting underlying system, the virtual time horizon (corresponding to the progress of the processing elements) exhibits Kardar-Parisi-Zhang-like kinetic roughening. Although the virtual times, on average, progress at a nonzero rate, their statistical spread diverges with the number of processing elements, hindering efficient data collection. We show that when the synchronization topology is extended to include quenched random communication links between the processing elements, they make a close-to-uniform progress with a nonzero rate, without global synchronization. We discuss in detail a coarse-grained description for the small-world synchronized virtual time horizon and compare the findings to those obtained by simulating the simulations based on the exact algorithmic rules.

  10. Application of Parallel Discrete Event Simulation to the Space Surveillance Network

    NASA Astrophysics Data System (ADS)

    Jefferson, D.; Leek, J.

    2010-09-01

    In this paper we describe how and why we chose parallel discrete event simulation (PDES) as the paradigm for modeling the Space Surveillance Network (SSN) in our modeling framework, TESSA (Testbed Environment for Space Situational Awareness). DES is a simulation paradigm appropriate for systems dominated by discontinuous state changes at times that must be calculated dynamically. It is used primarily for complex man-made systems like telecommunications, vehicular traffic, computer networks, economic models etc., although it is also useful for natural systems that are not described by equations, such as particle systems, population dynamics, epidemics, and combat models. It is much less well known than simple time-stepped simulation methods, but has the great advantage of being time scale independent, so that one can freely mix processes that operate at time scales over many orders of magnitude with no runtime performance penalty. In simulating the SSN we model in some detail: (a) the orbital dynamics of up to 10^5 objects, (b) their reflective properties, (c) the ground- and space-based sensor systems in the SSN, (d) the recognition of orbiting objects and determination of their orbits, (e) the cueing and scheduling of sensor observations, (f) the 3-d structure of satellites, and (g) the generation of collision debris. TESSA is thus a mixed continuous-discrete model. But because many different types of discrete objects are involved with such a wide variation in time scale (milliseconds for collisions, hours for orbital periods) it is suitably described using discrete events. The PDES paradigm is surprising and unusual. In any instantaneous runtime snapshot some parts may be far ahead in simulation time while others lag behind, yet the required causal relationships are always maintained and synchronized correctly, exactly as if the simulation were executed sequentially. The TESSA simulator is custom-built, conservatively synchronized, and designed to scale to

  11. Explicit spatial scattering for load balancing in conservatively synchronized parallel discrete-event simulations

    SciTech Connect

    Thulasidasan, Sunil; Kasiviswanathan, Shiva; Eidenbenz, Stephan; Romero, Philip

    2010-01-01

    We re-examine the problem of load balancing in conservatively synchronized parallel, discrete-event simulations executed on high-performance computing clusters, focusing on simulations where computational and messaging load tend to be spatially clustered. Such domains are frequently characterized by the presence of geographic 'hot-spots' - regions that generate significantly more simulation events than others. Examples of such domains include simulation of urban regions, transportation networks and networks where interaction between entities is often constrained by physical proximity. Noting that in conservatively synchronized parallel simulations, the speed of execution of the simulation is determined by the slowest (i.e., most heavily loaded) simulation process, we study different partitioning strategies in achieving equitable processor-load distribution in domains with spatially clustered load. In particular, we study the effectiveness of partitioning via spatial scattering to achieve optimal load balance. In this partitioning technique, nearby entities are explicitly assigned to different processors, thereby scattering the load across the cluster. This is motivated by two observations, namely, (i) since load is spatially clustered, spatial scattering should, intuitively, spread the load across the compute cluster, and (ii) in parallel simulations, equitable distribution of CPU load is a greater determinant of execution speed than message passing overhead. Through large-scale simulation experiments - both of abstracted and real simulation models - we observe that scatter partitioning, even with its greatly increased messaging overhead, significantly outperforms more conventional spatial partitioning techniques that seek to reduce messaging overhead. Further, even if hot-spots change over the course of the simulation, if the underlying feature of spatial clustering is retained, load continues to be balanced with spatial scattering leading us to the observation that
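    The contrast between conventional spatial partitioning and scatter partitioning described in the record above can be seen in a toy one-dimensional example. This is a minimal sketch under stated assumptions, not the authors' partitioner: the cell layout, the load values, and the function names are all hypothetical, and messaging cost is ignored entirely.

```python
from typing import Callable, List


def block_owner(cell: int, n_cells: int, n_procs: int) -> int:
    """Conventional spatial partitioning: contiguous runs of cells per processor."""
    return cell * n_procs // n_cells


def scatter_owner(cell: int, n_cells: int, n_procs: int) -> int:
    """Scatter partitioning: neighboring cells land on different processors."""
    return cell % n_procs


def processor_loads(cell_loads: List[float], n_procs: int,
                    owner: Callable[[int, int, int], int]) -> List[float]:
    """Total load assigned to each processor under the given ownership rule."""
    loads = [0.0] * n_procs
    for cell, load in enumerate(cell_loads):
        loads[owner(cell, len(cell_loads), n_procs)] += load
    return loads


# Hypothetical domain: 16 cells, with a hot-spot covering cells 4..7.
cell_loads = [1.0] * 16
for hot_cell in range(4, 8):
    cell_loads[hot_cell] = 10.0

block = processor_loads(cell_loads, 4, block_owner)
scatter = processor_loads(cell_loads, 4, scatter_owner)
```

    In this toy case the block layout puts the whole hot-spot on one processor, while the scatter layout gives every processor one hot cell. Since a conservatively synchronized run proceeds at the pace of its most heavily loaded process, the maximum per-processor load is the quantity to compare.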

  12. Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms

    SciTech Connect

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2013-01-01

    With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware, as has traditionally been done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the bad scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature in the sense that any fairness-based VM scheduler implementation would exhibit this mismatch with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in the run time of the PDES-optimized scheduler relative to the regular VM scheduler, with over 20× reduction in run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.

  13. Synchronous Parallel Emulation and Discrete Event Simulation System with Self-Contained Simulation Objects and Active Event Objects

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor)

    1998-01-01

    The present invention is embodied in a method of performing object-oriented simulation and a system having inter-connected processor nodes operating in parallel to simulate mutual interactions of a set of discrete simulation objects distributed among the nodes as a sequence of discrete events changing state variables of respective simulation objects so as to generate new event-defining messages addressed to respective ones of the nodes. The object-oriented simulation is performed at each one of the nodes by assigning passive self-contained simulation objects to each one of the nodes, responding to messages received at one node by generating corresponding active event objects having user-defined inherent capabilities and individual time stamps and corresponding to respective events affecting one of the passive self-contained simulation objects of the one node, restricting the respective passive self-contained simulation objects to only providing and receiving information from the respective active event objects, requesting information and changing variables within a passive self-contained simulation object by the active event object, and producing corresponding messages specifying events resulting therefrom by the active event objects.

  14. The IDES framework: A case study in development of a parallel discrete-event simulation system

    SciTech Connect

    Nicol, D.M.; Johnson, M.M.; Yoshimura, A.S.

    1997-12-31

    This tutorial describes considerations in the design and development of the IDES parallel simulation system. IDES is a Java-based parallel/distributed simulation system designed to support the study of complex large-scale enterprise systems. Using the IDES system as an example, the authors discuss how anticipated model and system constraints molded the design decisions with respect to modeling, synchronization, and communication strategies.

  15. A discrete event method for wave simulation

    SciTech Connect

    Nutaro, James J

    2006-01-01

    This article describes a discrete event interpretation of the finite difference time domain (FDTD) and digital wave guide network (DWN) wave simulation schemes. The discrete event method is formalized using the discrete event system specification (DEVS). The scheme is shown to have errors that are proportional to the resolution of the spatial grid. A numerical example demonstrates the relative efficiency of the scheme with respect to FDTD and DWN schemes. The potential for the discrete event scheme to reduce numerical dispersion and attenuation errors is discussed.

  16. Distributed discrete event simulation. Final report

    SciTech Connect

    De Vries, R.C.

    1988-02-01

    The presentation given here is restricted to discrete event simulation. The complexity of and time required for many present and potential discrete simulations exceeds the reasonable capacity of most present serial computers. The desire, then, is to implement the simulations on a parallel machine. However, certain problems arise in an effort to program the simulation on a parallel machine. In one category of methods deadlock can arise and some method is required to either detect deadlock and recover from it or to avoid deadlock through information passing. In the second category of methods, potentially incorrect simulations are allowed to proceed. If the situation is later determined to be incorrect, recovery from the error must be initiated. In either case, computation and information passing are required which would not be required in a serial implementation. The net effect is that the parallel simulation may not be much better than a serial simulation. In an effort to determine alternate approaches, important papers in the area were reviewed. As a part of that review process, each of the papers was summarized. The summary of each paper is presented in this report in the hopes that those doing future work in the area will be able to gain insight that might not otherwise be available, and to aid in deciding which papers would be most beneficial to pursue in more detail. The papers are broken down into categories and then by author. Conclusions reached after examining the papers and other material, such as direct talks with an author, are presented in the last section. Also presented there are some ideas that surfaced late in the research effort. These promise to be of some benefit in limiting information which must be passed between processes and in better understanding the structure of a distributed simulation. Pursuit of these ideas seems appropriate.

  17. Performance bounds on parallel self-initiating discrete-event

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1990-01-01

    The use of massively parallel architectures to execute discrete-event simulations of what are termed self-initiating models is considered. A logical process in a self-initiating model schedules its own state re-evaluation times, independently of any other logical process, and sends its new state to other logical processes following the re-evaluation. The interest is in the effects of that communication on synchronization. The performance of various synchronization protocols is considered by deriving upper and lower bounds on optimal performance, upper bounds on Time Warp's performance, and lower bounds on the performance of a new conservative protocol. The analysis of Time Warp includes the overhead costs of state-saving and rollback. The analysis points out sufficient conditions for the conservative protocol to outperform Time Warp. The analysis also quantifies the sensitivity of performance to message fan-out, lookahead ability, and the probability distributions underlying the simulation.

  18. Discrete-Event Simulation in Chemical Engineering.

    ERIC Educational Resources Information Center

    Schultheisz, Daniel; Sommerfeld, Jude T.

    1988-01-01

    Gives examples, descriptions, and uses for various types of simulation systems, including the Flowtran, Process, Aspen Plus, Design II, GPSS, Simula, and Simscript. Explains similarities in simulators, terminology, and a batch chemical process. Tables and diagrams are included. (RT)

  19. Optimization of Operations Resources via Discrete Event Simulation Modeling

    NASA Technical Reports Server (NTRS)

    Joshi, B.; Morris, D.; White, N.; Unal, R.

    1996-01-01

    The resource levels required for operation and support of reusable launch vehicles are typically defined through discrete event simulation modeling. Minimizing these resources constitutes an optimization problem involving discrete variables and simulation. Conventional approaches to solve such optimization problems involving integer valued decision variables are the pattern search and statistical methods. However, in a simulation environment that is characterized by search spaces of unknown topology and stochastic measures, these optimization approaches often prove inadequate. In this paper, we have explored the applicability of genetic algorithms to the simulation domain. Genetic algorithms provide a robust search strategy that does not require continuity and differentiability of the problem domain. The genetic algorithm successfully minimized the operation and support activities for a space vehicle, through a discrete event simulation model. The practical issues associated with simulation optimization, such as stochastic variables and constraints, were also taken into consideration.

  20. On constructing optimistic simulation algorithms for the discrete event system specification

    SciTech Connect

    Nutaro, James J

    2008-01-01

    This article describes a Time Warp simulation algorithm for discrete event models that are described in terms of the Discrete Event System Specification (DEVS). The article shows how the total state transition and total output function of a DEVS atomic model can be transformed into an event processing procedure for a logical process. A specific Time Warp algorithm is constructed around this logical process, and it is shown that the algorithm correctly simulates a DEVS coupled model that consists entirely of interacting atomic models. The simulation algorithm is presented abstractly; it is intended to provide a basis for implementing efficient and scalable parallel algorithms that correctly simulate DEVS models.

  1. Reversible Discrete Event Formulation and Optimistic Parallel Execution of Vehicular Traffic Models

    SciTech Connect

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2009-01-01

    Vehicular traffic simulations are useful in applications such as emergency planning and traffic management. High speed of traffic simulations translates to speed of response and level of resilience in those applications. Discrete event formulation of traffic flow at the level of individual vehicles affords both the flexibility of simulating complex scenarios of vehicular flow behavior as well as rapid simulation time advances. However, efficient parallel/distributed execution of the models becomes challenging due to synchronization overheads. Here, a parallel traffic simulation approach is presented that is aimed at reducing the time for simulating emergency vehicular traffic scenarios. Our approach resolves the challenges that arise in parallel execution of microscopic, vehicular-level models of traffic. We apply a reverse computation-based optimistic execution approach to address the parallel synchronization problem. This is achieved by formulating a reversible version of a discrete event model of vehicular traffic, and by utilizing this reversible model in an optimistic execution setting. Three unique aspects of this effort are: (1) exploration of optimistic simulation applied to vehicular traffic simulation, (2) addressing reverse computation challenges specific to optimistic vehicular traffic simulation, and (3) achieving absolute (as opposed to self-relative) speedup with a sequential speed close to that of a fast, de facto standard sequential simulator for emergency traffic. The design and development of the parallel simulation system is presented, along with a performance study that demonstrates excellent sequential performance as well as parallel performance. The benefits of optimistic execution are demonstrated, including a speedup of nearly 20 on 32 processors observed on a vehicular network of over 65,000 intersections and over 13 million vehicles.

  2. Reversible Parallel Discrete-Event Execution of Large-scale Epidemic Outbreak Models

    SciTech Connect

    Perumalla, Kalyan S; Seal, Sudip K

    2010-01-01

    The spatial scale, runtime speed and behavioral detail of epidemic outbreak simulations together require the use of large-scale parallel processing. In this paper, an optimistic parallel discrete event execution of a reaction-diffusion simulation model of epidemic outbreaks is presented, with an implementation over the μsik simulator. Rollback support is achieved with the development of a novel reversible model that combines reverse computation with a small amount of incremental state saving. Parallel speedup and other runtime performance metrics of the simulation are tested on a small (8,192-core) Blue Gene/P system, while scalability is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes (up to several hundred million individuals in the largest case) are exercised.
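    The general idea of combining reverse computation with a small amount of incremental state saving can be sketched with one forward event handler and its inverse. This is an illustrative toy, not the model or the simulator API from the record above: `CellState`, `forward`, `reverse`, and the fields are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CellState:
    infected: int = 0
    last_event_time: float = 0.0
    # Incremental state-saving log: holds only values that were overwritten.
    saved: List[float] = field(default_factory=list)


def forward(state: CellState, event_time: float, new_cases: int) -> None:
    """Process one event optimistically."""
    # An increment loses no information, so it can be undone by subtraction.
    state.infected += new_cases
    # An overwrite destroys the old value, so that value is saved first.
    state.saved.append(state.last_event_time)
    state.last_event_time = event_time


def reverse(state: CellState, event_time: float, new_cases: int) -> None:
    """Undo one event during rollback; call in reverse order of processing."""
    state.last_event_time = state.saved.pop()
    state.infected -= new_cases


state = CellState(infected=3, last_event_time=1.0)
forward(state, 2.5, 4)
forward(state, 3.0, 1)
after_forward = (state.infected, state.last_event_time)
reverse(state, 3.0, 1)
reverse(state, 2.5, 4)
```

    Only the overwritten time stamp is logged; the counter is restored by arithmetic alone, which is why this style tends to need less memory than saving a full copy of the state before every event.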

  3. Reversible Parallel Discrete Event Formulation of a TLM-based Radio Signal Propagation Model

    SciTech Connect

    Seal, Sudip K; Perumalla, Kalyan S

    2011-01-01

    Radio signal strength estimation is essential in many applications, including the design of military radio communications and industrial wireless installations. For scenarios with large or richly-featured geographical volumes, parallel processing is required to meet the memory and computation time demands. Here, we present a scalable and efficient parallel execution of the sequential model for radio signal propagation recently developed by Nutaro et al. Starting with that model, we (a) provide a vector-based reformulation that has significantly lower computational overhead for event handling, (b) develop a parallel decomposition approach that is amenable to reversibility with minimal computational overheads, (c) present a framework for transparently mapping the conservative time-stepped model into an optimistic parallel discrete event execution, (d) present a new reversible method, along with its analysis and implementation, for inverting the vector-based event model to be executed in an optimistic parallel style of execution, and (e) present performance results from implementation on Cray XT platforms. We demonstrate scalability, with the largest runs tested on up to 127,500 cores of a Cray XT5, enabling simulation of larger scenarios and with faster execution than reported before on the radio propagation model. This also represents the first successful demonstration of the ability to efficiently map a conservative time-stepped model to an optimistic discrete-event execution.

  4. Discrete-Event Simulation Models of Plasmodium falciparum Malaria

    PubMed Central

    McKenzie, F. Ellis; Wong, Roger C.; Bossert, William H.

    2008-01-01

    We develop discrete-event simulation models using a single “timeline” variable to represent the Plasmodium falciparum lifecycle in individual hosts and vectors within interacting host and vector populations. Where they are comparable, our conclusions regarding the relative importance of vector mortality and the durations of host immunity and parasite development are congruent with those of classic differential-equation models of malaria epidemiology. However, our results also imply that in regions with intense perennial transmission, the influence of mosquito mortality on malaria prevalence in humans may be rivaled by that of the duration of host infectivity. PMID:18668185

  5. Quality Improvement With Discrete Event Simulation: A Primer for Radiologists.

    PubMed

    Booker, Michael T; O'Connell, Ryan J; Desai, Bhushan; Duddalwar, Vinay A

    2016-04-01

    The application of simulation software in health care has transformed quality and process improvement. Specifically, software based on discrete-event simulation (DES) has shown the ability to improve radiology workflows and systems. Nevertheless, despite the successful application of DES in the medical literature, the power and value of simulation remain underutilized. For this reason, the basics of DES modeling are introduced, with specific attention to medical imaging. In an effort to provide readers with the tools necessary to begin their own DES analyses, the practical steps of choosing a software package and building a basic radiology model are discussed. In addition, three radiology system examples are presented, with accompanying DES models that assist in analysis and decision making. Through these simulations, we provide readers with an understanding of the theory, requirements, and benefits of implementing DES in their own radiology practices. PMID:26922594

  6. Performance Analysis of Cloud Computing Architectures Using Discrete Event Simulation

    NASA Technical Reports Server (NTRS)

    Stocker, John C.; Golomb, Andrew M.

    2011-01-01

    Cloud computing offers the economic benefit of on-demand resource allocation to meet changing enterprise computing needs. However, the flexibility of cloud computing comes at a disadvantage relative to traditional hosting in providing predictable application and service performance. Cloud computing relies on resource scheduling in a virtualized network-centric server environment, which makes static performance analysis infeasible. We developed a discrete event simulation model to evaluate the overall effectiveness of organizations in executing their workflow in traditional and cloud computing architectures. The two-part model framework characterizes both the demand, using a probability distribution for each type of service request, and the enterprise computing resource constraints. Our simulations provide quantitative analysis to design and provision computing architectures that maximize overall mission effectiveness. We share our analysis of key resource constraints in cloud computing architectures and findings on the appropriateness of cloud computing in various applications.

  7. Enhancing Complex System Performance Using Discrete-Event Simulation

    SciTech Connect

    Allgood, Glenn O; Olama, Mohammed M; Lake, Joe E

    2010-01-01

    In this paper, we utilize discrete-event simulation (DES) merged with human factors analysis to provide the venue within which the separation and deconfliction of the system/human operating principles can occur. A concrete example is presented to illustrate the performance enhancement gains for an aviation cargo flow and security inspection system achieved through the development and use of a process DES. The overall performance of the system is computed, analyzed, and optimized for the different system dynamics. Various performance measures are considered such as system capacity, residual capacity, and total number of pallets waiting for inspection in the queue. These metrics are performance indicators of the system's ability to service current needs and respond to additional requests. We studied and analyzed different scenarios by changing various model parameters such as the number of pieces per pallet ratio, number of inspectors and cargo handling personnel, number of forklifts, number and types of detection systems, inspection modality distribution, alarm rate, and cargo closeout time. The increased physical understanding resulting from execution of the queuing model utilizing these vetted performance measures identified effective ways to meet inspection requirements while maintaining or reducing overall operational cost and eliminating any shipping delays associated with any proposed changes in inspection requirements. With this understanding effective operational strategies can be developed to optimally use personnel while still maintaining plant efficiency, reducing process interruptions, and holding or reducing costs.

  8. Desktop Modeling and Simulation: Parsimonious, yet Effective Discrete-Event Simulation Analysis

    NASA Technical Reports Server (NTRS)

    Bradley, James R.

    2012-01-01

    This paper evaluates how quickly students can be trained to construct useful discrete-event simulation models using Excel. The typical supply chain used by many large national retailers is described, and an Excel-based simulation model of it is constructed. The set of programming and simulation skills required for development of that model is then determined; we conclude that six hours of training are required to teach the skills to MBA students. The simulation presented here contains all fundamental functionality of a simulation model, and so our result holds for any discrete-event simulation model. We argue therefore that industry workers with the same technical skill set as students having completed one year in an MBA program can be quickly trained to construct simulation models. This result gives credence to the efficacy of Desktop Modeling and Simulation, whereby simulation analyses can be quickly developed, run, and analyzed with widely available software, namely Excel.

  9. DISCRETE EVENT SIMULATION OF OPTICAL SWITCH MATRIX PERFORMANCE IN COMPUTER NETWORKS

    SciTech Connect

    Imam, Neena; Poole, Stephen W

    2013-01-01

    In this paper, we present an application of a Discrete Event Simulator (DES) for performance modeling of optical switching devices in computer networks. Network simulators are valuable tools in situations where one cannot investigate the system directly. This situation may arise if the system under study does not exist yet or the cost of studying the system directly is prohibitive. Most available network simulators are based on the paradigm of discrete-event-based simulation. As computer networks become increasingly larger and more complex, sophisticated DES tool chains have become available for both commercial and academic research. Some well-known simulators are NS2, NS3, OPNET, and OMNEST. For this research, we have applied OMNEST for the purpose of simulating multi-wavelength performance of optical switch matrices in computer interconnection networks. Our results suggest that the application of DES to computer interconnection networks provides valuable insight into device performance and aids in topology and system optimization.

  10. An extension of the OpenModelica compiler for using Modelica models in a discrete event simulation

    DOE PAGESBeta

    Nutaro, James

    2014-11-03

    In this article, a new back-end and run-time system is described for the OpenModelica compiler. This new back-end transforms a Modelica model into a module for the adevs discrete event simulation package, thereby extending adevs to encompass complex, hybrid dynamical systems. The new run-time system that has been built within the adevs simulation package supports models with state-events and time-events and that comprise differential-algebraic systems with high index. Finally, although the procedure for effecting this transformation is based on adevs and the Discrete Event System Specification, it can be adapted to any discrete event simulation package.

  11. Using Discrete Event Simulation to predict KPI's at a Projected Emergency Room.

    PubMed

    Concha, Pablo; Neriz, Liliana; Parada, Danilo; Ramis, Francisco

    2015-01-01

    Discrete Event Simulation (DES) is a powerful factor in the design of clinical facilities. DES enables facilities to be built or adapted to achieve the expected Key Performance Indicators (KPI's) such as average waiting times according to acuity, average stay times and others. Our computational model was built and validated using expert judgment and supporting statistical data. One scenario studied resulted in a 50% decrease in the average cycle time of patients compared to the original model, mainly by modifying the patient's attention model. PMID:26262262

  12. Tutorial in medical decision modeling incorporating waiting lines and queues using discrete event simulation.

    PubMed

    Jahn, Beate; Theurl, Engelbert; Siebert, Uwe; Pfeiffer, Karl-Peter

    2010-01-01

    In most decision-analytic models in health care, it is assumed that there is treatment without delay and availability of all required resources. Therefore, waiting times caused by limited resources and their impact on treatment effects and costs often remain unconsidered. Queuing theory enables mathematical analysis and the derivation of several performance measures of queuing systems. Nevertheless, an analytical approach with closed formulas is not always possible. Therefore, simulation techniques are used to evaluate systems that include queuing or waiting, for example, discrete event simulation. To include queuing in decision-analytic models requires a basic knowledge of queuing theory and of the underlying interrelationships. This tutorial introduces queuing theory. Analysts and decision-makers get an understanding of queue characteristics, modeling features, and its strength. Conceptual issues are covered, but the emphasis is on practical issues like modeling the arrival of patients. The treatment of coronary artery disease with percutaneous coronary intervention including stent placement serves as an illustrative queuing example. Discrete event simulation is applied to explicitly model resource capacities, to incorporate waiting lines and queues in the decision-analytic modeling example. PMID:20345550
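
    The effect of limited resource capacity on waiting time, as described in the tutorial above, can be illustrated with a minimal sketch. This is not the authors' model: the function name `simulate_waits`, its parameters, and the first-come-first-served rule with identical servers are assumptions made for illustration only.

```python
import heapq


def simulate_waits(arrival_times, service_times, capacity):
    """Waiting time of each patient in a first-come-first-served queue.

    arrival_times must be sorted in increasing order; capacity is the
    number of identical servers (an assumption of this sketch).
    """
    # Min-heap holding the time at which each server next becomes free.
    free_at = [0.0] * capacity
    heapq.heapify(free_at)

    waits = []
    for arrive, service in zip(arrival_times, service_times):
        earliest = heapq.heappop(free_at)   # server that frees up first
        start = max(arrive, earliest)       # wait if no server is idle
        waits.append(start - arrive)
        heapq.heappush(free_at, start + service)
    return waits
```

    With arrivals at times 0, 1, and 2 and a service time of 5 each, the third patient waits 8 time units when `capacity` is 1 and 3 time units when it is 2, which is the kind of capacity-dependent delay that models assuming immediate treatment leave out.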

  13. A Framework for the Optimization of Discrete-Event Simulation Models

    NASA Technical Reports Server (NTRS)

    Joshi, B. D.; Unal, R.; White, N. H.; Morris, W. D.

    1996-01-01

    With the growing use of computer modeling and simulation in all aspects of engineering, the scope of traditional optimization has to be extended to include simulation models. Some unique aspects have to be addressed while optimizing via stochastic simulation models. The optimization procedure has to explicitly account for the randomness inherent in the stochastic measures predicted by the model. This paper outlines a general purpose framework for optimization of terminating discrete-event simulation models. The methodology combines a chance constraint approach for problem formulation, together with standard statistical estimation and analysis techniques. The applicability of the optimization framework is illustrated by minimizing the operation and support resources of a launch vehicle, through a simulation model.
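
    For readers unfamiliar with the term, a chance-constrained problem generally takes a form like the one below. The symbols are assumptions chosen for illustration and are not taken from the paper: $x$ is the vector of resource levels, $c$ their unit costs, $Y(x)$ a stochastic performance measure predicted by the simulation, $y_0$ a target value, and $\alpha$ the accepted risk of missing it.

```latex
\min_{x} \; c^{\top} x
\quad \text{subject to} \quad
\Pr\{\, Y(x) \le y_{0} \,\} \ge 1 - \alpha
```

    Because $Y(x)$ is only observable through simulation runs, the probability would typically be estimated from $n$ independent replications, for example as the fraction of replications in which $Y(x) \le y_0$.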

  14. DeMO: An Ontology for Discrete-event Modeling and Simulation

    PubMed Central

    Silver, Gregory A; Miller, John A; Hybinette, Maria; Baramidze, Gregory; York, William S

    2011-01-01

    Several fields have created ontologies for their subdomains. For example, the biological sciences have developed extensive ontologies such as the Gene Ontology, which is considered a great success. Ontologies could provide similar advantages to the Modeling and Simulation community. They provide a way to establish common vocabularies and capture knowledge about a particular domain with community-wide agreement. Ontologies can support significantly improved (semantic) search and browsing, integration of heterogeneous information sources, and improved knowledge discovery capabilities. This paper discusses the design and development of an ontology for Modeling and Simulation called the Discrete-event Modeling Ontology (DeMO), and it presents prototype applications that demonstrate various uses and benefits that such an ontology may provide to the Modeling and Simulation community. PMID:22919114

  15. CONFIG - Adapting qualitative modeling and discrete event simulation for design of fault management systems

    NASA Technical Reports Server (NTRS)

    Malin, Jane T.; Basham, Bryan D.

    1989-01-01

    CONFIG is a modeling and simulation tool prototype for analyzing the normal and faulty qualitative behaviors of engineered systems. Qualitative modeling and discrete-event simulation have been adapted and integrated, to support early development, during system design, of software and procedures for management of failures, especially in diagnostic expert systems. Qualitative component models are defined in terms of normal and faulty modes and processes, which are defined by invocation statements and effect statements with time delays. System models are constructed graphically by using instances of components and relations from object-oriented hierarchical model libraries. Extension and reuse of CONFIG models and analysis capabilities in hybrid rule- and model-based expert fault-management support systems are discussed.

  16. A conceptual modeling framework for discrete event simulation using hierarchical control structures

    PubMed Central

    Furian, N.; O’Sullivan, M.; Walker, C.; Vössner, S.; Neubacher, D.

    2015-01-01

    Conceptual Modeling (CM) is a fundamental step in a simulation project. Nevertheless, it is only recently that structured approaches towards the definition and formulation of conceptual models have gained importance in the Discrete Event Simulation (DES) community. As a consequence, frameworks and guidelines for applying CM to DES have emerged and discussion of CM for DES is increasing. However, both the organization of model-components and the identification of behavior and system control from standard CM approaches have shortcomings that limit CM’s applicability to DES. Therefore, we discuss the different aspects of previous CM frameworks and identify their limitations. Further, we present the Hierarchical Control Conceptual Modeling framework that pays more attention to the identification of a model’s system behavior, control policies and dispatching routines and their structured representation within a conceptual model. The framework guides the user step-by-step through the modeling process and is illustrated by a worked example. PMID:26778940

  17. Developing Flexible Discrete Event Simulation Models in an Uncertain Policy Environment

    NASA Technical Reports Server (NTRS)

    Miranda, David J.; Fayez, Sam; Steele, Martin J.

    2011-01-01

    On February 1st, 2010 U.S. President Barack Obama submitted to Congress his proposed budget request for Fiscal Year 2011. This budget included significant changes to the National Aeronautics and Space Administration (NASA), including the proposed cancellation of the Constellation Program. This change proved to be controversial and Congressional approval of the program's official cancellation would take many months to complete. During this same period an end-to-end discrete event simulation (DES) model of Constellation operations was being built through the joint efforts of Productivity Apex Inc. (PAI) and Science Applications International Corporation (SAIC) teams under the guidance of NASA. The uncertainty regarding the Constellation program presented a major challenge to the DES team: to continue the development of this program-of-record simulation while at the same time remaining prepared for possible changes to the program. This required the team to rethink how it would develop its model and make it flexible enough to support possible future vehicles while at the same time being specific enough to support the program-of-record. This challenge was compounded by the fact that this model was being developed through the traditional DES process-orientation, which lacked the flexibility of object-oriented approaches. The team met this challenge through significant pre-planning that led to the "modularization" of the model's structure by identifying what was generic, finding natural logic break points, and standardizing the interlogic numbering system. The outcome of this work was a model that not only was ready to be easily modified to support any future rocket programs, but also was extremely structured and organized in a way that facilitated rapid verification. This paper discusses in detail the process the team followed to build this model and the many advantages this method provides builders of traditional process-oriented discrete

  18. Statistical and Probabilistic Extensions to Ground Operations' Discrete Event Simulation Modeling

    NASA Technical Reports Server (NTRS)

    Trocine, Linda; Cummings, Nicholas H.; Bazzana, Ashley M.; Rychlik, Nathan; LeCroy, Kenneth L.; Cates, Grant R.

    2010-01-01

    NASA's human exploration initiatives will invest in technologies, public/private partnerships, and infrastructure, paving the way for the expansion of human civilization into the solar system and beyond. As it has been for the past half century, the Kennedy Space Center will be the embarkation point for humankind's journey into the cosmos. Functioning as a next generation space launch complex, Kennedy's launch pads, integration facilities, processing areas, launch and recovery ranges will bustle with the activities of the world's space transportation providers. In developing this complex, KSC teams work through the potential operational scenarios: conducting trade studies, planning and budgeting for expensive and limited resources, and simulating alternative operational schemes. Numerous tools, among them discrete event simulation (DES), were matured during the Constellation Program to conduct such analyses with the purpose of optimizing the launch complex for maximum efficiency, safety, and flexibility while minimizing life cycle costs. Discrete event simulation is a computer-based modeling technique for complex and dynamic systems where the state of the system changes at discrete points in time and whose inputs may include random variables. DES is used to assess timelines and throughput, and to support operability studies and contingency analyses. It is applicable to any space launch campaign and informs decision-makers of the effects of varying numbers of expensive resources and the impact of off-nominal scenarios on measures of performance. In order to develop representative DES models, methods were adopted, exploited, or created to extend traditional uses of DES. The Delphi method was adopted and utilized for task duration estimation. DES software was exploited for probabilistic event variation. A roll-up process was developed and used to reuse models and model elements in other less-detailed models. The DES team continues to innovate and expand

  19. The effects of indoor environmental exposures on pediatric asthma: a discrete event simulation model

    PubMed Central

    2012-01-01

    Background: In the United States, asthma is the most common chronic disease of childhood across all socioeconomic classes and is the most frequent cause of hospitalization among children. Asthma exacerbations have been associated with exposure to residential indoor environmental stressors such as allergens and air pollutants as well as numerous additional factors. Simulation modeling is a valuable tool that can be used to evaluate interventions for complex multifactorial diseases such as asthma but in spite of its flexibility and applicability, modeling applications in either environmental exposures or asthma have been limited to date. Methods: We designed a discrete event simulation model to study the effect of environmental factors on asthma exacerbations in school-age children living in low-income multi-family housing. Model outcomes include asthma symptoms, medication use, hospitalizations, and emergency room visits. Environmental factors were linked to percent predicted forced expiratory volume in 1 second (FEV1%), which in turn was linked to risk equations for each outcome. Exposures affecting FEV1% included indoor and outdoor sources of NO2 and PM2.5, cockroach allergen, and dampness as a proxy for mold. Results: Model design parameters and equations are described in detail. We evaluated the model by simulating 50,000 children over 10 years and showed that pollutant concentrations and health outcome rates are comparable to values reported in the literature. In an application example, we simulated what would happen if the kitchen and bathroom exhaust fans were improved for the entire cohort, and showed reductions in pollutant concentrations and healthcare utilization rates. Conclusions: We describe the design and evaluation of a discrete event simulation model of pediatric asthma for children living in low-income multi-family housing. Our model simulates the effect of environmental factors (combustion pollutants and allergens), medication compliance, seasonality

  20. Examining Passenger Flow Choke Points at Airports Using Discrete Event Simulation

    NASA Technical Reports Server (NTRS)

    Brown, Jeremy R.; Madhavan, Poornima

    2011-01-01

    The movement of passengers through an airport quickly, safely, and efficiently is the main function of the various checkpoints (check-in, security, etc.) found in airports. Human error combined with other breakdowns in the complex system of the airport can disrupt passenger flow through the airport, leading to lengthy waiting times, missing luggage and missed flights. In this paper we present a model of passenger flow through an airport using discrete event simulation that will provide a closer look into the possible reasons for breakdowns and their implications for passenger flow. The simulation is based on data collected at Norfolk International Airport (ORF). The primary goal of this simulation is to present ways to optimize the work force to keep passenger flow smooth even during peak travel times and for emergency preparedness at ORF in case of adverse events. In this simulation we ran three different scenarios: real world, increased check-in stations, and multiple waiting lines. Increased check-in stations increased waiting time and instantaneous utilization, while the multiple waiting lines decreased both the waiting time and instantaneous utilization. This simulation was able to show how different changes affected the passenger flow through the airport.

  1. Towards High Performance Discrete-Event Simulations of Smart Electric Grids

    SciTech Connect

    Perumalla, Kalyan S; Nutaro, James J; Yoginath, Srikanth B

    2011-01-01

    Future electric grid technology is envisioned on the notion of a smart grid in which responsive end-user devices play an integral part of the transmission and distribution control systems. Detailed simulation is often the primary choice in analyzing small network designs, and the only choice in analyzing large-scale electric network designs. Here, we identify and articulate the high-performance computing needs underlying high-resolution discrete event simulation of smart electric grid operation in large network scenarios such as the entire Eastern Interconnect. We focus on the simulator's most computationally intensive operation, namely, the dynamic numerical solution for the electric grid state, for both time-integration and event-detection. We explore solution approaches using general-purpose dense and sparse solvers, and propose a scalable solver specialized for the sparse structures of actual electric networks. Based on experiments with an implementation in the THYME simulator, we identify performance issues and possible solution approaches for smart grid experimentation in the large.

  2. Discrete event simulation tool for analysis of qualitative models of continuous processing systems

    NASA Technical Reports Server (NTRS)

    Malin, Jane T. (Inventor); Basham, Bryan D. (Inventor); Harris, Richard A. (Inventor)

    1990-01-01

    An artificial intelligence design and qualitative modeling tool is disclosed for creating computer models and simulating continuous activities, functions, and/or behavior using developed discrete event techniques. Conveniently, the tool is organized in four modules: library design module, model construction module, simulation module, and experimentation and analysis. The library design module supports the building of library knowledge including component classes and elements pertinent to a particular domain of continuous activities, functions, and behavior being modeled. The continuous behavior is defined discretely with respect to invocation statements, effect statements, and time delays. The functionality of the components is defined in terms of variable cluster instances, independent processes, and modes, further defined in terms of mode transition processes and mode dependent processes. Model construction utilizes the hierarchy of libraries and connects them with appropriate relations. The simulation executes a specialized initialization routine and executes events in a manner that includes selective inherency of characteristics through a time and event schema until the event queue in the simulator is emptied. The experimentation and analysis module supports analysis through the generation of appropriate log files and graphics developments and includes the ability of log file comparisons.

  3. Discrete Event Simulation Models for CT Examination Queuing in West China Hospital

    PubMed Central

    Luo, Li; Tang, Shijun; Shi, Yingkang; Guo, Huili

    2016-01-01

    In CT examination, the emergency patients (EPs) have the highest priority in the queuing system and thus the general patients (GPs) have to wait for a long time. This leads to a low degree of satisfaction among patients overall. The aim of this study is to improve the patients' satisfaction by designing new queuing strategies for CT examination. We divide the EPs into urgent type and emergency type and then design two queuing strategies: one is that the urgent patients (UPs) wedge into the GPs' queue at a fixed interval (fixed priority model) and the other is that the patients have dynamic priorities for queuing (dynamic priority model). Based on the data from the Radiology Information Database (RID) of West China Hospital (WCH), we develop some discrete event simulation models for CT examination according to the designed strategies. We compare the performance of different strategies on the basis of the simulation results. The strategy in which patients have dynamic priorities for queuing makes the waiting time of GPs decrease by 13 minutes and the degree of satisfaction increase by 40.6%. We design a more reasonable CT examination queuing strategy to decrease patients' waiting time and increase their satisfaction degrees. PMID:27547237

  4. StratBAM: A Discrete-Event Simulation Model to Support Strategic Hospital Bed Capacity Decisions.

    PubMed

    Devapriya, Priyantha; Strömblad, Christopher T B; Bailey, Matthew D; Frazier, Seth; Bulger, John; Kemberling, Sharon T; Wood, Kenneth E

    2015-10-01

    The ability to accurately measure and assess current and potential health care system capacities is an issue of local and national significance. Recent joint statements by the Institute of Medicine and the Agency for Healthcare Research and Quality have emphasized the need to apply industrial and systems engineering principles to improving health care quality and patient safety outcomes. To address this need, a decision support tool was developed for planning and budgeting of current and future bed capacity, and evaluating potential process improvement efforts. The Strategic Bed Analysis Model (StratBAM) is a discrete-event simulation model created after a thorough analysis of patient flow and data from Geisinger Health System's (GHS) electronic health records. Key inputs include: timing, quantity and category of patient arrivals and discharges; unit-level length of care; patient paths; and projected patient volume and length of stay. Key outputs include: admission wait time by arrival source and receiving unit, and occupancy rates. Electronic health records were used to estimate parameters for probability distributions and to build empirical distributions for unit-level length of care and for patient paths. Validation of the simulation model against GHS operational data confirmed its ability to model real-world data consistently and accurately. StratBAM was successfully used to evaluate the system impact of forecasted patient volumes and length of stay in terms of patient wait times, occupancy rates, and cost. The model is generalizable and can be appropriately scaled for larger and smaller health care settings. PMID:26310949

  5. Discrete Event Simulation Models for CT Examination Queuing in West China Hospital.

    PubMed

    Luo, Li; Liu, Hangjiang; Liao, Huchang; Tang, Shijun; Shi, Yingkang; Guo, Huili

    2016-01-01

    In CT examination, the emergency patients (EPs) have the highest priority in the queuing system and thus the general patients (GPs) have to wait for a long time. This leads to a low degree of satisfaction among patients overall. The aim of this study is to improve the patients' satisfaction by designing new queuing strategies for CT examination. We divide the EPs into urgent type and emergency type and then design two queuing strategies: one is that the urgent patients (UPs) wedge into the GPs' queue at a fixed interval (fixed priority model) and the other is that the patients have dynamic priorities for queuing (dynamic priority model). Based on the data from the Radiology Information Database (RID) of West China Hospital (WCH), we develop some discrete event simulation models for CT examination according to the designed strategies. We compare the performance of different strategies on the basis of the simulation results. The strategy in which patients have dynamic priorities for queuing makes the waiting time of GPs decrease by 13 minutes and the degree of satisfaction increase by 40.6%. We design a more reasonable CT examination queuing strategy to decrease patients' waiting time and increase their satisfaction degrees. PMID:27547237

  6. Discrete event simulation for healthcare organizations: a tool for decision making.

    PubMed

    Hamrock, Eric; Paige, Kerrie; Parks, Jennifer; Scheulen, James; Levin, Scott

    2013-01-01

    Healthcare organizations face challenges in efficiently accommodating increased patient demand with limited resources and capacity. The modern reimbursement environment prioritizes the maximization of operational efficiency and the reduction of unnecessary costs (i.e., waste) while maintaining or improving quality. As healthcare organizations adapt, significant pressures are placed on leaders to make difficult operational and budgetary decisions. In lieu of hard data, decision makers often base these decisions on subjective information. Discrete event simulation (DES), a computerized method of imitating the operation of a real-world system (e.g., healthcare delivery facility) over time, can provide decision makers with an evidence-based tool to develop and objectively vet operational solutions prior to implementation. DES in healthcare commonly focuses on (1) improving patient flow, (2) managing bed capacity, (3) scheduling staff, (4) managing patient admission and scheduling procedures, and (5) using ancillary resources (e.g., labs, pharmacies). This article describes applicable scenarios, outlines DES concepts, and describes the steps required for development. An original DES model developed to examine crowding and patient flow for staffing decision making at an urban academic emergency department serves as a practical example. PMID:23650696

  7. Improving outpatient phlebotomy service efficiency and patient experience using discrete-event simulation.

    PubMed

    Yip, Kenneth; Pang, Suk-King; Chan, Kui-Tim; Chan, Chi-Kuen; Lee, Tsz-Leung

    2016-08-01

    Purpose - The purpose of this paper is to present a simulation modeling application to reconfigure the outpatient phlebotomy service of an acute regional and teaching hospital in Hong Kong, with an aim to improve service efficiency, shorten patient queuing time and enhance workforce utilization. Design/methodology/approach - The system was modeled as an inhomogeneous Poisson process and a discrete-event simulation model was developed to simulate the current setting, and to evaluate how various performance metrics would change if switched from a decentralized to a centralized model. Variations were then made to the model to test different workforce arrangements for the centralized service, so that managers could decide on the service's final configuration via an evidence-based and data-driven approach. Findings - This paper provides empirical insights about the relationship between staffing arrangement and system performance via a detailed scenario analysis. One particular staffing scenario was chosen by managers as it was considered to strike the best balance between performance and workforce scheduled. The resulting centralized phlebotomy service was successfully commissioned. Practical implications - This paper demonstrates how analytics could be used for operational planning at the hospital level. The authors show that a transparent and evidence-based scenario analysis, made available through analytics and simulation, greatly helps management and clinical stakeholders arrive at the ideal service configuration. Originality/value - The authors provide a robust method for evaluating the relationship between workforce investment, queuing reduction and workforce utilization, which is crucial for managers when deciding the delivery model for any outpatient-related service. PMID:27477930
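
    The abstract does not say how arrivals from the inhomogeneous Poisson process were generated. One common technique is thinning (Lewis and Shedler), sketched below under that assumption. The function name `sample_arrivals`, its parameters, and any rate profile passed to it are hypothetical and not taken from the paper.

```python
import random


def sample_arrivals(rate_fn, rate_max, horizon, rng):
    """Arrival times on [0, horizon) for a time-varying rate, by thinning.

    rate_fn(t) is the arrival rate at time t and must never exceed
    rate_max. rng is a random.Random instance.
    """
    t = 0.0
    arrivals = []
    while True:
        # Candidate arrival from a homogeneous process at the peak rate.
        t += rng.expovariate(rate_max)
        if t >= horizon:
            return arrivals
        # Keep the candidate with probability rate_fn(t) / rate_max.
        if rng.random() * rate_max < rate_fn(t):
            arrivals.append(t)
```

    A call such as `sample_arrivals(lambda t: 10 if t < 4 else 2, 10, 8, random.Random(0))` would produce a busy first half and a quiet second half of an eight-hour session; the rates here are made-up values, not hospital data.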

  8. Discrete-event simulation of a wide-area health care network.

    PubMed Central

    McDaniel, J G

    1995-01-01

    OBJECTIVE: Predict the behavior and estimate the telecommunication cost of a wide-area message store-and-forward network for health care providers that uses the telephone system. DESIGN: A tool with which to perform large-scale discrete-event simulations was developed. Network models for star and mesh topologies were constructed to analyze the differences in performances and telecommunication costs. The distribution of nodes in the network models approximates the distribution of physicians, hospitals, medical labs, and insurers in the Province of Saskatchewan, Canada. Modeling parameters were based on measurements taken from a prototype telephone network and a survey conducted at two medical clinics. Simulation studies were conducted for both topologies. RESULTS: For either topology, the telecommunication cost of a network in Saskatchewan is projected to be less than $100 (Canadian) per month per node. The estimated telecommunication cost of the star topology is approximately half that of the mesh. Simulations predict that a mean end-to-end message delivery time of two hours or less is achievable at this cost. A doubling of the data volume results in an increase of less than 50% in the mean end-to-end message transfer time. CONCLUSION: The simulation models provided an estimate of network performance and telecommunication cost in a specific Canadian province. At the expected operating point, network performance appeared to be relatively insensitive to increases in data volume. Similar results might be anticipated in other rural states and provinces in North America where a telephone-based network is desired. PMID:7583646

  9. Random vs. Combinatorial Methods for Discrete Event Simulation of a Grid Computer Network

    NASA Technical Reports Server (NTRS)

    Kuhn, D. Richard; Kacker, Raghu; Lei, Yu

    2010-01-01

    This study compared random and t-way combinatorial inputs of a network simulator, to determine if these two approaches produce significantly different deadlock detection for varying network configurations. Modeling deadlock detection is important for analyzing configuration changes that could inadvertently degrade network operations, or to determine modifications that could be made by attackers to deliberately induce deadlock. Discrete event simulation of a network may be conducted using random generation of inputs. In this study, we compare random with combinatorial generation of inputs. Combinatorial (or t-way) testing requires every combination of any t parameter values to be covered by at least one test. Combinatorial methods can be highly effective because empirical data suggest that nearly all failures involve the interaction of a small number of parameters (1 to 6). Thus, for example, if all deadlocks involve at most 5-way interactions between n parameters, then exhaustive testing of all n-way interactions adds no additional information that would not be obtained by testing all 5-way interactions. While the maximum degree of interaction between parameters involved in the deadlocks clearly cannot be known in advance, covering all t-way interactions may be more efficient than using random generation of inputs. In this study we tested this hypothesis for t = 2, 3, and 4 for deadlock detection in a network simulation. Achieving the same degree of coverage provided by 4-way tests would have required approximately 3.2 times as many random tests; thus combinatorial methods were more efficient for detecting deadlocks involving a higher degree of interactions. The paper reviews explanations for these results and implications for modeling and simulation.
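
    The t-way coverage criterion described above can be made concrete with a short sketch that measures which fraction of all t-way value combinations a test suite hits. The three binary parameters and the four-test suite are hypothetical and are not taken from the study's network simulator.

```python
from itertools import combinations, product


def t_way_coverage(tests, domains, t):
    """Fraction of all t-way parameter-value combinations hit by at least one test."""
    covered = 0
    total = 0
    for cols in combinations(range(len(domains)), t):
        # Every value tuple that these t parameters could jointly take.
        wanted = set(product(*(domains[c] for c in cols)))
        # The value tuples that the test suite actually exercises.
        seen = {tuple(test[c] for c in cols) for test in tests}
        covered += len(wanted & seen)
        total += len(wanted)
    return covered / total


# Three hypothetical binary configuration parameters.
domains = [(0, 1), (0, 1), (0, 1)]
pairwise_suite = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(t_way_coverage(pairwise_suite, domains, 2))  # 1.0: all 12 pairs covered
print(t_way_coverage(pairwise_suite, domains, 3))  # 0.5: 4 of 8 triples covered
```

    Four tests suffice for full 2-way coverage here, while full 3-way coverage would need all eight. The same function applied to a randomly generated suite gives the kind of coverage comparison the abstract reports.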

  10. Using Discrete Event Simulation for Programming Model Exploration at Extreme-Scale: Macroscale Components for the Structural Simulation Toolkit (SST).

    SciTech Connect

    Wilke, Jeremiah J; Kenny, Joseph P.

    2015-02-01

    Discrete event simulation provides a powerful mechanism for designing and testing new extreme-scale programming models for high-performance computing. Rather than debug, run, and wait for results on an actual system, design can first iterate through a simulator. This is particularly useful when test beds cannot be used, i.e. to explore hardware or scales that do not yet exist or are inaccessible. Here we detail the macroscale components of the structural simulation toolkit (SST). Instead of depending on trace replay or state machines, the simulator is architected to execute real code on real software stacks. Our particular user-space threading framework allows massive scales to be simulated even on small clusters. The link between the discrete event core and the threading framework allows interesting performance metrics like call graphs to be collected from a simulated run. Performance analysis via simulation can thus become an important phase in extreme-scale programming model and runtime system design via the SST macroscale components.

  11. Using Discrete Event Computer Simulation to Improve Patient Flow in a Ghanaian Acute Care Hospital

    PubMed Central

    Best, Allyson M.; Dixon, Cinnamon A.; Kelton, W. David; Lindsell, Christopher J.

    2014-01-01

    Objectives Crowding and limited resources have increased the strain on acute care facilities and emergency departments (EDs) worldwide. These problems are particularly prevalent in developing countries. Discrete event simulation (DES) is a computer-based tool that can be used to estimate how changes to complex healthcare delivery systems, such as EDs, will affect operational performance. Using this modality, our objective was to identify operational interventions that could potentially improve patient throughput of one acute care setting in a developing country. Methods We developed a simulation model of acute care at a district level hospital in Ghana to test the effects of resource-neutral (e.g. modified staff start times and roles) and resource-additional (e.g. increased staff) operational interventions on patient throughput. Previously captured, de-identified time-and-motion data from 487 acute care patients were used to develop and test the model. The primary outcome was the modeled effect of interventions on patient length of stay (LOS). Results The base-case (no change) scenario had a mean LOS of 292 minutes (95% CI 291, 293). In isolation, neither adding staffing, changing staff roles, nor varying shift times affected overall patient LOS. Specifically, adding two registration workers, history takers, and physicians resulted in a 23.8 (95% CI 22.3, 25.3) minute LOS decrease. However, when shift start-times were coordinated with patient arrival patterns, potential mean LOS was decreased by 96 minutes (95% CI 94, 98); and with the simultaneous combination of staff roles (Registration and History-taking) there was an overall mean LOS reduction of 152 minutes (95% CI 150, 154). Conclusions Resource-neutral interventions identified through DES modeling have the potential to improve acute care throughput in this Ghanaian municipal hospital. 
    DES offers another approach to identifying potentially effective interventions to improve patient flow in emergency and acute

  12. Modeling Temporal Processes in Early Spacecraft Design: Application of Discrete-Event Simulations for DARPA's F6 Program

    NASA Technical Reports Server (NTRS)

    Dubos, Gregory F.; Cornford, Steven

    2012-01-01

    While the ability to model the state of a space system over time is essential during spacecraft operations, the use of time-based simulations remains rare in preliminary design. The absence of the time dimension in most traditional early design tools can however become a hurdle when designing complex systems whose development and operations can be disrupted by various events, such as delays or failures. As the value delivered by a space system is highly affected by such events, exploring the trade space for designs that yield the maximum value calls for the explicit modeling of time. This paper discusses the use of discrete-event models to simulate spacecraft development schedule as well as operational scenarios and on-orbit resources in the presence of uncertainty. It illustrates how such simulations can be utilized to support trade studies, through the example of a tool developed for DARPA's F6 program to assist the design of "fractionated spacecraft".

  13. Discrete-event simulation of nuclear-waste transport in geologic sites subject to disruptive events. Final report

    SciTech Connect

    Aggarwal, S.; Ryland, S.; Peck, R.

    1980-06-19

    This report outlines a methodology to study the effects of disruptive events on nuclear waste material in stable geologic sites. The methodology is based upon developing a discrete-event model that can be simulated on the computer. This methodology allows a natural development of simulation models that use computer resources in an efficient manner. Accurate modeling in this area depends in large part upon accurate modeling of ion transport behavior in the storage media. Unfortunately, developments in this area are not at a stage where there is any consensus on proper models for such transport. Consequently, our work is directed primarily towards showing how disruptive events can be properly incorporated in such a model, rather than as a predictive tool at this stage. When and if proper geologic parameters can be determined, then it would be possible to use this as a predictive model. Assumptions and their bases are discussed, and the mathematical and computer models are described.

  14. Using Discrete Event Simulation to Model Integrated Commodities Consumption for a Launch Campaign of the Space Launch System

    NASA Technical Reports Server (NTRS)

    Leonard, Daniel; Parsons, Jeremy W.; Cates, Grant

    2014-01-01

    In May 2013, NASA's GSDO Program requested a study to develop a discrete event simulation (DES) model that analyzes the launch campaign process of the Space Launch System (SLS) from an integrated commodities perspective. The scope of the study includes launch countdown and scrub turnaround and focuses on four core launch commodities: hydrogen, oxygen, nitrogen, and helium. Previously, the commodities were only analyzed individually and deterministically for their launch support capability, but this study was the first to integrate them to examine the impact of their interactions on a launch campaign as well as the effects of process variability on commodity availability. The study produced a validated DES model with Rockwell Arena that showed that Kennedy Space Center's ground systems were capable of supporting a 48-hour scrub turnaround for the SLS. The model will be maintained and updated to provide commodity consumption analysis of future ground system and SLS configurations.

  15. Using the Integration of Discrete Event and Agent-Based Simulation to Enhance Outpatient Service Quality in an Orthopedic Department.

    PubMed

    Kittipittayakorn, Cholada; Ying, Kuo-Ching

    2016-01-01

    Many hospitals are currently paying more attention to patient satisfaction since it is an important service quality index. Many Asian countries' healthcare systems have a mixed-type registration, accepting both walk-in patients and scheduled patients. This complex registration system causes a long patient waiting time in outpatient clinics. Different approaches have been proposed to reduce the waiting time. This study uses the integration of discrete event simulation (DES) and agent-based simulation (ABS) to improve patient waiting time and is the first attempt to apply this approach to solve this key problem faced by orthopedic departments. From the data collected, patient behaviors are modeled and incorporated into a massive agent-based simulation. The proposed approach is an aid for analyzing and modifying orthopedic department processes, allows us to consider far more details, and provides more reliable results. After applying the proposed approach, the total waiting time of the orthopedic department fell from 1246.39 minutes to 847.21 minutes. Thus, using the correct simulation model significantly reduces patient waiting time in an orthopedic department. PMID:27195606

  16. Improving Customer Waiting Time at a DMV Center Using Discrete-Event Simulation

    NASA Technical Reports Server (NTRS)

    Arnaout, Georges M.; Bowling, Shannon

    2010-01-01

    Virginia's Department of Motor Vehicles (DMV) serves a customer base of approximately 5.6 million licensed drivers and ID card holders and 7 million registered vehicle owners. DMV has more daily face-to-face contact with Virginia's citizens than any other state agency [1]. The DMV faces a major difficulty in keeping up with the excessively large customer arrival rate. The consequences are queues building up, stretching out to the entrance doors (and sometimes even outside) and customers complaining. While the DMV state employees are trying to serve at their fastest pace, the remarkably large queues indicate that there is a serious problem that the DMV faces in its services, which must be dealt with rapidly. Simulation is considered one of the best tools for evaluating and improving complex systems. In this paper, we use it to model one of the DMV centers located in Norfolk, VA. The simulation model is built in Arena 10.0 from Rockwell systems. The data used is collected from experts of the DMV Virginia headquarters located in Richmond. The model created was verified and validated. The intent of this study is to identify key problems causing the delays at the DMV centers and suggest possible solutions to minimize the customers' waiting time. In addition, two tentative hypotheses aiming to improve the model's design are tested and validated.
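
    The study's model was built in Arena, but the core of a waiting-time simulation for a service center can be sketched in a few lines. The sketch below assumes a single first-come-first-served queue feeding identical counters, which is a simplification and not the structure of the authors' model. The arrival and service times are made-up values.

```python
import heapq


def simulate_fcfs(arrivals, service_times, servers):
    """Return each customer's wait in a FCFS queue served by identical counters.

    `arrivals` must be sorted in increasing order.
    """
    # Min-heap of the times at which each counter next becomes free.
    free_at = [0.0] * servers
    heapq.heapify(free_at)
    waits = []
    for arrive, service in zip(arrivals, service_times):
        earliest = heapq.heappop(free_at)
        # Service starts when both the customer and a counter are available.
        start = max(arrive, earliest)
        waits.append(start - arrive)
        heapq.heappush(free_at, start + service)
    return waits


arrivals = [0.0, 1.0, 2.0, 3.0]
service_times = [5.0, 5.0, 5.0, 5.0]
print(simulate_fcfs(arrivals, service_times, servers=2))  # [0.0, 0.0, 3.0, 3.0]
print(simulate_fcfs(arrivals, service_times, servers=3))  # [0.0, 0.0, 0.0, 2.0]
```

    Comparing the two runs shows the kind of what-if question such a model answers: adding a third counter removes most of the waiting in this toy case.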

  17. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in finding ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  18. Simulating Billion-Task Parallel Programs

    SciTech Connect

    Perumalla, Kalyan S; Park, Alfred J

    2014-01-01

    In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.
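
    The reported scale is consistent with the stated multiplexing ratio. As a quick check, assuming one real task per core (an assumption, since the abstract does not state the mapping):

```latex
\[
  216{,}000 \ \text{real tasks} \times 1024 \ \frac{\text{simulated tasks}}{\text{real task}}
  = 221{,}184{,}000 \ \text{simulated tasks} \approx 0.22 \times 10^{9}
\]
```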

  19. Analysis hierarchical model for discrete event systems

    NASA Astrophysics Data System (ADS)

    Ciortea, E. M.

    2015-11-01

    This paper presents a hierarchical model based on discrete event networks for robotic systems. Following the hierarchical approach, the Petri net is analysed as a network spanning the highest conceptual level down to the lowest level of local control. Extended Petri nets are used for modelling and control of complex robotic systems. Such a system is structured, controlled and analysed in this paper using the Visual Object Net ++ package, which is relatively simple and easy to use, and the results are shown as representations that are easy to interpret. The hierarchical structure of the robotic system is implemented on computers and analysed using specialized programs. A hierarchical discrete event system model can be implemented as a real-time operating system on a computer network connected via a serial bus, where each computer is dedicated to the local control and Petri model of one subsystem of the global robotic system. Since Petri models are simple enough to run on general-purpose computers, analysis, modelling and control of complex manufacturing systems can be achieved using Petri nets. Discrete event systems are a pragmatic tool for modelling industrial systems. Petri nets are used for system modelling because the system considered is a discrete event system. To highlight auxiliary times, the Petri model of the transport stream is divided into hierarchical levels, and its sections are analysed successively. The proposed simulation of the robotic system using timed Petri nets offers the opportunity to view the robotic timing. From the handling and transmission times obtained by spot measurements, graphics are obtained showing the average time for the transport activity, using the parameter sets of the finished products individually.

  20. A Discrete Event Simulation Model for Evaluating the Performances of an M/G/C/C State Dependent Queuing System

    PubMed Central

    Khalid, Ruzelan; M. Nawawi, Mohd Kamal; Kawsar, Luthful A.; Ghani, Noraida A.; Kamil, Anton A.; Mustafa, Adli

    2013-01-01

    M/G/C/C state dependent queuing networks consider service rates as a function of the number of residing entities (e.g., pedestrians, vehicles, and products). However, modeling such dynamic rates is not supported in modern Discrete Simulation System (DES) software. We designed an approach to address this limitation and used it to construct the M/G/C/C state-dependent queuing model in Arena software. Using the model, we have evaluated and analyzed the impacts of various arrival rates on the throughput, the blocking probability, the expected service time and the expected number of entities in a complex network topology. Results indicated that there is a range of arrival rates for each network where the simulation results fluctuate drastically across replications, which causes the simulation results and analytical results to exhibit discrepancies. Detailed results that show how the simulation results tally with the analytical results, in both abstract and graphical forms, and some scientific justifications for these have been documented and discussed. PMID:23560037

  1. Terminal Dynamics Approach to Discrete Event Systems

    NASA Technical Reports Server (NTRS)

    Zak, Michail; Meyers, Ronald

    1995-01-01

    This paper presents and discusses a mathematical formalism for simulation of discrete event dynamic (DED) - a special type of 'man-made' systems to serve specific purposes of information processing. The main objective of this work is to demonstrate that the mathematical formalism for DED can be based upon a terminal model of Newtonian dynamics which allows one to relax Lipschitz conditions at some discrete points.

  2. Budget Impact Analysis of Switching to Digital Mammography in a Population-Based Breast Cancer Screening Program: A Discrete Event Simulation Model

    PubMed Central

    Comas, Mercè; Arrospide, Arantzazu; Mar, Javier; Sala, Maria; Vilaprinyó, Ester; Hernández, Cristina; Cots, Francesc; Martínez, Juan; Castells, Xavier

    2014-01-01

    Objective To assess the budgetary impact of switching from screen-film mammography to full-field digital mammography in a population-based breast cancer screening program. Methods A discrete-event simulation model was built to reproduce the breast cancer screening process (biennial mammographic screening of women aged 50 to 69 years) combined with the natural history of breast cancer. The simulation started with 100,000 women and, during a 20-year simulation horizon, new women were dynamically entered according to the aging of the Spanish population. Data on screening were obtained from Spanish breast cancer screening programs. Data on the natural history of breast cancer were based on US data adapted to our population. A budget impact analysis comparing digital with screen-film screening mammography was performed in a sample of 2,000 simulation runs. A sensitivity analysis was performed for crucial screening-related parameters. Distinct scenarios for recall and detection rates were compared. Results Statistically significant savings were found for overall costs, treatment costs and the costs of additional tests in the long term. The overall cost saving was 1,115,857€ (95%CI from 932,147 to 1,299,567) in the 10th year and 2,866,124€ (95%CI from 2,492,610 to 3,239,638) in the 20th year, representing 4.5% and 8.1% of the overall cost associated with screen-film mammography. The sensitivity analysis showed net savings in the long term. Conclusions Switching to digital mammography in a population-based breast cancer screening program saves long-term budget expense, in addition to providing technical advantages. Our results were consistent across distinct scenarios representing the different results obtained in European breast cancer screening programs. PMID:24832200

  3. On extending parallelism to serial simulators

    NASA Technical Reports Server (NTRS)

    Nicol, David; Heidelberger, Philip

    1994-01-01

    This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops submodels using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permit automatically synchronized communication between submodels; however, the automation requires that any such communication must take a nonzero amount of simulation time. Within this modeling paradigm, a variety of conservative synchronization protocols can transparently support conservative execution of submodels on potentially different processors. A specific implementation of this approach, U.P.S. (Utilitarian Parallel Simulator), is described, along with performance results on the Intel Paragon.
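
    The requirement that inter-submodel communication take nonzero simulation time is what gives a conservative protocol its lookahead. The sketch below shows the generic safe-processing rule that follows from it. It is an illustration of the principle only, not the U.P.S. protocol, and the function name and numbers are hypothetical.

```python
def safe_events(pending, neighbor_clocks, lookahead):
    """Split pending event timestamps into those safe to process now and those held.

    A neighbor whose clock reads c cannot send a message stamped earlier than
    c + lookahead, so anything strictly below the smallest such bound is safe.
    """
    bound = min(neighbor_clocks) + lookahead
    ready = sorted(ts for ts in pending if ts < bound)
    held = sorted(ts for ts in pending if ts >= bound)
    return bound, ready, held


pending = [3.0, 7.5, 12.0]
print(safe_events(pending, neighbor_clocks=[5.0, 6.0], lookahead=2.0))
# (7.0, [3.0], [7.5, 12.0])
```

    With a lookahead of zero the bound collapses to the slowest neighbor's clock, which is why protocols of this kind depend on a strictly positive communication delay to make progress.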

  4. Expected lifetime numbers and costs of fractures in postmenopausal women with and without osteoporosis in Germany: a discrete event simulation model

    PubMed Central

    2014-01-01

    Background Osteoporotic fractures cause a large health burden and substantial costs. This study estimated the expected fracture numbers and costs for the remaining lifetime of postmenopausal women in Germany. Methods A discrete event simulation (DES) model which tracks changes in fracture risk due to osteoporosis, a previous fracture or institutionalization in a nursing home was developed. Expected lifetime fracture numbers and costs per capita were estimated for postmenopausal women (aged 50 and older) at average osteoporosis risk (AOR) and for those never suffering from osteoporosis. Direct and indirect costs were modeled. Deterministic univariate and probabilistic sensitivity analyses were conducted. Results The expected fracture numbers over the remaining lifetime of a 50 year old woman with AOR for each fracture type (% attributable to osteoporosis) were: hip 0.282 (57.9%), wrist 0.229 (18.2%), clinical vertebral 0.206 (39.2%), humerus 0.147 (43.5%), pelvis 0.105 (47.5%), and other femur 0.033 (52.1%). Expected discounted fracture lifetime costs (excess cost attributable to osteoporosis) per 50 year old woman with AOR amounted to €4,479 (€1,995). Most costs were accrued in the hospital €1,743 (€751) and long-term care sectors €1,210 (€620). Univariate sensitivity analysis resulted in percentage changes between -48.4% (if fracture rates decreased by 2% per year) and +83.5% (if fracture rates increased by 2% per year) compared to base case excess costs. Costs for women with osteoporosis were about 3.3 times of those never getting osteoporosis (€7,463 vs. €2,247), and were markedly increased for women with a previous fracture. Conclusion The results of this study indicate that osteoporosis causes a substantial share of fracture costs in postmenopausal women, which strongly increase with age and previous fractures. PMID:24981316

  5. Inflated speedups in parallel simulations via malloc()

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1990-01-01

    Discrete-event simulation programs make heavy use of dynamic memory allocation in order to support simulation's very dynamic space requirements. When programming in C one is likely to use the malloc() routine. However, a parallel simulation which uses the standard Unix System V malloc() implementation may achieve an overly optimistic speedup, possibly superlinear. An alternate implementation provided on some (but not all) systems can avoid the speedup anomaly, but at the price of significantly reduced available free space. This is especially severe on most parallel architectures, which tend not to support virtual memory. It is shown how a simply implemented user-constructed interface to malloc() can both avoid artificially inflated speedups, and make efficient use of the dynamic memory space. The interface simply caches blocks on the basis of their size. The problem is demonstrated empirically, and the effectiveness of the solution is shown both empirically and analytically.
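
    The idea of keeping freed blocks in per-size lists can be sketched as follows. The paper's interface wraps malloc() in C. This Python version only illustrates the bookkeeping, and the class and method names are hypothetical.

```python
from collections import defaultdict


class BlockCache:
    """Keep freed buffers in per-size free lists and reuse them before allocating anew."""

    def __init__(self):
        self._free = defaultdict(list)  # size -> stack of idle buffers
        self.fresh_allocations = 0

    def alloc(self, size):
        if self._free[size]:
            # Reuse costs the same regardless of how fragmented the heap is.
            return self._free[size].pop()
        self.fresh_allocations += 1
        return bytearray(size)  # stands in for a call to the system allocator

    def free(self, block):
        self._free[len(block)].append(block)


cache = BlockCache()
event = cache.alloc(64)
cache.free(event)
reused = cache.alloc(64)
print(reused is event, cache.fresh_allocations)  # True 1
```

    Because a simulation tends to allocate and release the same few event sizes repeatedly, most requests are served from these lists once the run has warmed up.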

  6. Parallel simulation today

    NASA Technical Reports Server (NTRS)

    Nicol, David; Fujimoto, Richard

    1992-01-01

    This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

  7. Parallel Atomistic Simulations

    SciTech Connect

    Heffelfinger, Grant S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.

  8. Non-Lipschitz Dynamics Approach to Discrete Event Systems

    NASA Technical Reports Server (NTRS)

    Zak, M.; Meyers, R.

    1995-01-01

    This paper presents and discusses a mathematical formalism for simulation of discrete event dynamics (DED) - a special type of 'man-made' system designed to aid specific areas of information processing. A main objective is to demonstrate that the mathematical formalism for DED can be based upon the terminal model of Newtonian dynamics which allows one to relax Lipschitz conditions at some discrete points.

  9. Scaling Time Warp-based Discrete Event Execution to 10^4 Processors on Blue Gene Supercomputer

    SciTech Connect

    Perumalla, Kalyan S

    2007-01-01

    Lately, important large-scale simulation applications, such as emergency/event planning and response, are emerging that are based on discrete event models. The applications are characterized by their scale (several millions of simulated entities), their fine-grained nature of computation (microseconds per event), and their highly dynamic inter-entity event interactions. The desired scale and speed together call for highly scalable parallel discrete event simulation (PDES) engines. However, few such parallel engines have been designed or tested on platforms with thousands of processors. Here an overview is given of a unique PDES engine that has been designed to support Time Warp-style optimistic parallel execution as well as a more generalized mixed, optimistic-conservative synchronization. The engine is designed to run on massively parallel architectures with minimal overheads. A performance study of the engine is presented, including the first results to date of PDES benchmarks demonstrating scalability to as many as 16,384 processors, on an IBM Blue Gene supercomputer. The results show, for the first time, the promise of effectively sustaining very large scale discrete event execution on up to 10^4 processors.
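
    Time Warp-style optimistic execution lets a logical process handle events as they arrive and undo work when a late ("straggler") event shows up. The sketch below shows only that rollback-and-replay step for one process. It omits anti-messages and global virtual time, and it is not the engine described in the abstract. The class name and the state update rule are hypothetical.

```python
class OptimisticLP:
    """One logical process that saves state per event and rolls back on stragglers."""

    def __init__(self):
        self.state = 0
        self.processed = []  # (timestamp, value, state_before_event)
        self.rollbacks = 0

    def receive(self, ts, value):
        # Undo every already-processed event that is later than the new one.
        redo = []
        while self.processed and self.processed[-1][0] > ts:
            old_ts, old_value, before = self.processed.pop()
            self.state = before
            redo.append((old_ts, old_value))
        if redo:
            self.rollbacks += 1
        # Process the new event, then replay the undone ones in timestamp order.
        for ev_ts, ev_value in [(ts, value)] + redo[::-1]:
            self.processed.append((ev_ts, ev_value, self.state))
            self.state = self.state * 2 + ev_value  # order-sensitive update


lp = OptimisticLP()
for ts, value in [(1, 1), (3, 1), (2, 0)]:  # the event at time 2 arrives late
    lp.receive(ts, value)
print(lp.state, lp.rollbacks)  # 5 1
```

    The final state equals what strictly timestamp-ordered processing would give, at the cost of one rollback. Keeping that cost small at fine event granularity is the kind of overhead the abstract refers to.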

  10. Parallel system simulation

    SciTech Connect

    Tai, H.M.; Saeks, R.

    1984-03-01

    A relaxation algorithm for solving large-scale system simulation problems in parallel is proposed. The algorithm, which is composed of both a time-step parallel algorithm and a component-wise parallel algorithm, is described. The interconnected nature of the system, which is characterized by the component connection model, is fully exploited by this approach. A technique for finding an optimal number of the time steps is also described. Finally, this algorithm is illustrated via several examples in which the possible trade-offs between the speed-up ratio, efficiency, and waiting time are analyzed.

  11. Xyce parallel electronic simulator.

    SciTech Connect

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list exhaustively (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

  12. Parallel Dislocation Simulator

    Energy Science and Technology Software Center (ESTSC)

    2006-10-30

    ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.

  13. An algebra of discrete event processes

    NASA Technical Reports Server (NTRS)

    Heymann, Michael; Meyer, George

    1991-01-01

    This report deals with an algebraic framework for modeling and control of discrete event processes. The report consists of two parts. The first part is introductory, and consists of a tutorial survey of the theory of concurrency in the spirit of Hoare's CSP, and an examination of the suitability of such an algebraic framework for dealing with various aspects of discrete event control. To this end a new concurrency operator is introduced and it is shown how the resulting framework can be applied. It is further shown that a suitable theory that deals with the new concurrency operator must be developed. In the second part of the report the formal algebra of discrete event control is developed. At the present time the second part of the report is still an incomplete and occasionally tentative working paper.

  14. Optimal Discrete Event Supervisory Control of Aircraft Gas Turbine Engines

    NASA Technical Reports Server (NTRS)

    Litt, Jonathan (Technical Monitor); Ray, Asok

    2004-01-01

    This report presents an application of the recently developed theory of optimal Discrete Event Supervisory (DES) control that is based on a signed real measure of regular languages. The DES control techniques are validated on an aircraft gas turbine engine simulation test bed. The test bed is implemented on a networked computer system in which two computers operate in the client-server mode. Several DES controllers have been tested for engine performance and reliability.

  15. Discrete Events as Units of Perceived Time

    ERIC Educational Resources Information Center

    Liverence, Brandon M.; Scholl, Brian J.

    2012-01-01

    In visual images, we perceive both space (as a continuous visual medium) and objects (that inhabit space). Similarly, in dynamic visual experience, we perceive both continuous time and discrete events. What is the relationship between these units of experience? The most intuitive answer may be similar to the spatial case: time is perceived as an…

  16. Multiple Autonomous Discrete Event Controllers for Constellations

    NASA Technical Reports Server (NTRS)

    Esposito, Timothy C.

    2003-01-01

    The Multiple Autonomous Discrete Event Controllers for Constellations (MADECC) project is an effort within the National Aeronautics and Space Administration Goddard Space Flight Center's (NASA/GSFC) Information Systems Division to develop autonomous positioning and attitude control for constellation satellites. It will be accomplished using traditional control theory and advanced coordination algorithms developed by the Johns Hopkins University Applied Physics Laboratory (JHU/APL). This capability will be demonstrated in the discrete event control test-bed located at JHU/APL. This project will be modeled for the Leonardo constellation mission, but is intended to be adaptable to any constellation mission. To develop a common software architecture, the controllers will only model very high-level responses. For instance, after determining that a maneuver must be made, the MADECC system will output a (Delta)V (velocity change) value. Lower level systems must then decide which thrusters to fire and for how long to achieve that (Delta)V.

  17. Parallel implementation of VHDL simulations on the Intel iPSC/2 hypercube. Master's thesis

    SciTech Connect

    Comeau, R.C.

    1991-12-01

    VHDL models are executed sequentially in current commercial simulators. As chip designs grow larger and more complex, simulations must run faster. One approach to increasing simulation speed is through parallel processors. This research transforms the behavioral and structural models created by Intermetrics' sequential VHDL simulator into models for parallel execution. The models are simulated on an Intel iPSC/2 hypercube with synchronization of the nodes being achieved by utilizing the Chandy-Misra paradigm for discrete-event simulations. Three eight-bit adders, the ripple carry, the carry save, and the carry-lookahead, are each run through the parallel simulator. Simulation time is cut at least in half for all three test cases over the sequential Intermetrics model. Results with regard to speedup are given to show effects of different mappings, varying workloads per node, and overhead due to output messages.

  18. Discrete Event Execution with One-Sided and Two-Sided GVT Algorithms on 216,000 Processor Cores

    SciTech Connect

    Perumalla, Kalyan S; Park, Alfred J; Tipparaju, Vinod

    2014-01-01

    Global virtual time (GVT) computation is a key determinant of the efficiency and runtime dynamics of parallel discrete event simulations (PDES), especially on large-scale parallel platforms. Here, three execution modes of a generalized GVT computation algorithm are studied on high-performance parallel computing systems: (1) a synchronous GVT algorithm that affords ease of implementation, (2) an asynchronous GVT algorithm that is more complex to implement but can relieve blocking latencies, and (3) a variant of the asynchronous GVT algorithm to exploit one-sided communication in extant supercomputing platforms. Performance results are presented of implementations of these algorithms on up to 216,000 cores of a Cray XT5 system, exercised on a range of parameters: optimistic and conservative synchronization, fine- to medium-grained event computation, synthetic and non-synthetic applications, and different lookahead values. Performance of up to 54 billion events executed per second is registered. Detailed PDES-specific runtime metrics are presented to further the understanding of tightly-coupled discrete event dynamics on massively parallel platforms.
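
    As an illustration of the quantity being computed, the sketch below shows the textbook definition of GVT as a global minimum over unprocessed event timestamps and in-transit message timestamps, evaluated in the synchronous style (all processors report at a barrier). It is not the generalized algorithm studied in this record; the Processor class and its fields are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Processor:
    """Hypothetical per-core simulation state, for illustration only."""
    pending: List[float] = field(default_factory=list)     # timestamps of unprocessed local events
    in_transit: List[float] = field(default_factory=list)  # timestamps of sent but undelivered messages

    def local_min(self) -> float:
        # Earliest timestamp this processor could still process or cause elsewhere.
        return min(self.pending + self.in_transit, default=float("inf"))


def synchronous_gvt(processors: List[Processor]) -> float:
    """Assumes every processor has stopped at a barrier; GVT is a min-reduction."""
    return min((p.local_min() for p in processors), default=float("inf"))


procs = [
    Processor(pending=[12.0, 15.5], in_transit=[9.0]),
    Processor(pending=[10.0]),
    Processor(),  # idle processor contributes +infinity
]
gvt = synchronous_gvt(procs)
print(gvt)
```

    In an optimistic simulator, state older than the returned value could be reclaimed, since no rollback can reach earlier than GVT. An asynchronous variant would need extra bookkeeping for messages sent while the reduction is in progress, which this sketch omits.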

  19. Planning and supervision of reactor defueling using discrete event techniques

    SciTech Connect

    Garcia, H.E.; Imel, G.R.; Houshyar, A.

    1995-12-31

    New fuel handling and conditioning activities for the defueling of the Experimental Breeder Reactor II are being performed at Argonne National Laboratory. Research is being conducted to investigate the use of discrete event simulation, analysis, and optimization techniques to plan, supervise, and perform these activities in such a way that productivity can be improved. The central idea is to characterize this defueling operation as a collection of interconnected serving cells, and then apply operational research techniques to identify appropriate planning schedules for given scenarios. In addition, a supervisory system is being developed to provide personnel with on-line information on the progress of fueling tasks and to suggest courses of action to accommodate changing operational conditions. This paper provides an introduction to the research in progress at ANL. In particular, it briefly describes the fuel handling configuration for reactor defueling at ANL, presenting the flow of material from the reactor grid to the interim storage location, and the expected contributions of this work. As an example of the studies being conducted for planning and supervision of fuel handling activities at ANL, an application of discrete event simulation techniques to evaluate different fuel cask transfer strategies is given at the end of the paper.

  20. Parallel Power Grid Simulation Toolkit

    SciTech Connect

    Smith, Steve; Kelley, Brian; Banks, Lawrence; Top, Philip; Woodward, Carol

    2015-09-14

    ParGrid is a 'wrapper' that integrates a coupled Power Grid Simulation toolkit consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGrid, named FSKIT, is intended to support the coupling of multiple continuous and discrete event parallel simulations. The code is designed using modern object oriented C++ methods utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.

  1. Parallelizing Timed Petri Net simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1993-01-01

    The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

  2. Massively parallel quantum computer simulator

    NASA Astrophysics Data System (ADS)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.

  3. A new parallel simulation technique

    NASA Astrophysics Data System (ADS)

    Blanco-Pillado, Jose J.; Olum, Ken D.; Shlaer, Benjamin

    2012-01-01

    We develop a "semi-parallel" simulation technique suggested by Pretorius and Lehner, in which the simulation spacetime volume is divided into a large number of small 4-volumes that have only initial and final surfaces. Thus there is no two-way communication between processors, and the 4-volumes can be simulated independently and potentially at different times. This technique allows us to simulate much larger volumes than we otherwise could, because we are not limited by total memory size. No processor time is lost waiting for other processors. We compare a cosmic string simulation we developed using the semi-parallel technique with our previous MPI-based code for several test cases and find a factor of 2.6 improvement in the total amount of processor time required to accomplish the same job for strings evolving in the matter-dominated era.

  4. LAN attack detection using Discrete Event Systems.

    PubMed

    Hubballi, Neminath; Biswas, Santosh; Roopa, S; Ratti, Ritesh; Nandi, Sukumar

    2011-01-01

    Address Resolution Protocol (ARP) is used for determining the link layer or Medium Access Control (MAC) address of a network host, given its Internet Layer (IP) or Network Layer address. ARP is a stateless protocol and any IP-MAC pairing sent by a host is accepted without verification. This weakness in the ARP may be exploited by malicious hosts in a Local Area Network (LAN) by spoofing IP-MAC pairs. Several schemes have been proposed in the literature to circumvent these attacks; however, these techniques either make IP-MAC pairing static, modify the existing ARP, patch operating systems of all the hosts etc. In this paper we propose a Discrete Event System (DES) approach for Intrusion Detection System (IDS) for LAN specific attacks which do not require any extra constraint like static IP-MAC, changing the ARP etc. A DES model is built for the LAN under both a normal and compromised (i.e., spoofed request/response) situation based on the sequences of ARP related packets. Sequences of ARP events in normal and spoofed scenarios are similar thereby rendering the same DES models for both the cases. To create different ARP events under normal and spoofed conditions the proposed technique uses active ARP probing. However, this probing adds extra ARP traffic in the LAN. Following that a DES detector is built to determine from observed ARP related events, whether the LAN is operating under a normal or compromised situation. The scheme also minimizes extra ARP traffic by probing the source IP-MAC pair of only those ARP packets which are yet to be determined as genuine/spoofed by the detector. Also, spoofed IP-MAC pairs determined by the detector are stored in tables to detect other LAN attacks triggered by spoofing namely, man-in-the-middle (MiTM), denial of service etc. The scheme is successfully validated in a test bed. PMID:20804980

  5. Modelling machine ensembles with discrete event dynamical system theory

    NASA Technical Reports Server (NTRS)

    Hunter, Dan

    1990-01-01

    Discrete Event Dynamical System (DEDS) theory can be utilized as a control strategy for future complex machine ensembles that will be required for in-space construction. The control strategy involves orchestrating a set of interactive submachines to perform a set of tasks for a given set of constraints such as minimum time, minimum energy, or maximum machine utilization. Machine ensembles can be hierarchically modeled as a global model that combines the operations of the individual submachines. These submachines are represented in the global model as local models. Local models, from the perspective of DEDS theory, are described by the following: a set of system and transition states, an event alphabet that portrays actions that take a submachine from one state to another, an initial system state, a partial function that maps the current state and event alphabet to the next state, and the time required for the event to occur. Each submachine in the machine ensemble is represented by a unique local model. The global model combines the local models such that the local models can operate in parallel under the additional logistic and physical constraints due to submachine interactions. The global model is constructed from the states, events, event functions, and timing requirements of the local models. Supervisory control can be implemented in the global model by various methods such as task scheduling (open-loop control) or implementing a feedback DEDS controller (closed-loop control).
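
    The five ingredients of a local model listed above (states, event alphabet, initial state, partial transition function, event durations) map naturally onto a small data structure. The following is a minimal sketch under that reading; the LocalModel class, the robot-arm states, and the durations are hypothetical and do not come from the report.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple


@dataclass
class LocalModel:
    states: FrozenSet[str]
    alphabet: FrozenSet[str]
    initial: str
    # Partial function: only the (state, event) pairs that are defined appear as keys.
    transitions: Dict[Tuple[str, str], str]
    # Time required for each event to occur.
    duration: Dict[str, float]

    def run(self, events: Iterable[str]) -> Tuple[str, float]:
        """Apply events in order; return the final state and total elapsed time."""
        state, elapsed = self.initial, 0.0
        for ev in events:
            key = (state, ev)
            if key not in self.transitions:
                raise ValueError(f"event {ev!r} is not enabled in state {state!r}")
            state = self.transitions[key]
            elapsed += self.duration[ev]
        return state, elapsed


# Hypothetical submachine: a gripper that is either idle or holding a part.
arm = LocalModel(
    states=frozenset({"idle", "holding"}),
    alphabet=frozenset({"grasp", "release"}),
    initial="idle",
    transitions={("idle", "grasp"): "holding", ("holding", "release"): "idle"},
    duration={"grasp": 2.0, "release": 1.0},
)
final_state, total_time = arm.run(["grasp", "release", "grasp"])
print(final_state, total_time)
```

    A global model in the sense of the abstract would combine several such local models and add the interaction constraints between them; that composition is not attempted here.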

  6. Discrete Event Supervisory Control Applied to Propulsion Systems

    NASA Technical Reports Server (NTRS)

    Litt, Jonathan S.; Shah, Neerav

    2005-01-01

    The theory of discrete event supervisory (DES) control was applied to the optimal control of a twin-engine aircraft propulsion system and demonstrated in a simulation. The supervisory control, which is implemented as a finite-state automaton, oversees the behavior of a system and manages it in such a way that it maximizes a performance criterion, similar to a traditional optimal control problem. DES controllers can be nested such that a high-level controller supervises multiple lower level controllers. This structure can be expanded to control huge, complex systems, providing optimal performance and increasing autonomy with each additional level. The DES control strategy for propulsion systems was validated using a distributed testbed consisting of multiple computers--each representing a module of the overall propulsion system--to simulate real-time hardware-in-the-loop testing. In the first experiment, DES control was applied to the operation of a nonlinear simulation of a turbofan engine (running in closed loop using its own feedback controller) to minimize engine structural damage caused by a combination of thermal and structural loads. This enables increased on-wing time for the engine through better management of the engine-component life usage. Thus, the engine-level DES acts as a life-extending controller through its interaction with and manipulation of the engine's operation.

  7. Xyce parallel electronic simulator design.

    SciTech Connect

    Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

    2010-09-01

    This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.

  8. Parallel Network Simulations with NEURON

    PubMed Central

    Migliore, M.; Cannia, C.; Lytton, W.W; Markram, Henry; Hines, M. L.

    2009-01-01

    The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored. PMID:16732488
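
    The synchronization rule described above can be sketched without reference to NEURON's own API: a spike generated inside an integration window cannot arrive on another processor before the window ends if the window length equals the smallest interprocessor delay, so spikes only need to be exchanged at window boundaries. The connection table and both function names below are hypothetical and serve only to make the arithmetic concrete.

```python
from typing import List, Tuple

# (source_rank, target_rank, delay_ms): hypothetical connection table
Connection = Tuple[int, int, float]


def min_interprocessor_delay(connections: List[Connection]) -> float:
    """Smallest delay among connections that cross a processor boundary."""
    return min((d for src, dst, d in connections if src != dst), default=float("inf"))


def exchange_times(t_stop: float, window: float) -> List[float]:
    """Times at which ranks would pause integration and exchange spikes.

    Assumes window > 0. Boundaries are computed as k * window rather than by
    repeated addition, to avoid accumulating floating-point error.
    """
    times, k = [], 1
    while k * window < t_stop:
        times.append(k * window)
        k += 1
    times.append(t_stop)
    return times


connections = [(0, 0, 0.5), (0, 1, 2.0), (1, 0, 1.5)]
window = min_interprocessor_delay(connections)
print(window, exchange_times(5.0, window))
```

    The 0.5 ms connection is ignored because it stays on rank 0. Only delays that cross ranks limit how far each processor may integrate independently.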

  9. Empirical Evaluation of Conservative and Optimistic Discrete Event Execution on Cloud and VM Platforms

    SciTech Connect

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2013-01-01

    Virtual machine (VM) technologies, especially those offered via Cloud platforms, present new dimensions with respect to performance and cost in executing parallel discrete event simulation (PDES) applications. Due to the introduction of overall cost as a metric, the choice of the highest-end computing configuration is no longer the most economical one. Moreover, runtime dynamics unique to VM platforms introduce new performance characteristics, and the variety of possible VM configurations gives rise to a range of choices for hosting a PDES run. Here, an empirical study of these issues is undertaken to guide an understanding of the dynamics, trends and trade-offs in executing PDES on VM/Cloud platforms. Performance results and cost measures are obtained from actual execution of a range of scenarios in two PDES benchmark applications on the Amazon Cloud offerings and on a high-end VM host machine. The data reveals interesting insights into the new VM-PDES dynamics that come into play and also leads to counter-intuitive guidelines with respect to choosing the best and second-best configurations when overall cost of execution is considered. In particular, it is found that choosing the highest-end VM configuration guarantees neither the best runtime nor the least cost. Interestingly, choosing a (suitably scaled) low-end VM configuration provides the least overall cost without adversely affecting the total runtime.

  10. Hierarchical Discrete Event Supervisory Control of Aircraft Propulsion Systems

    NASA Technical Reports Server (NTRS)

    Yasar, Murat; Tolani, Devendra; Ray, Asok; Shah, Neerav; Litt, Jonathan S.

    2004-01-01

    This paper presents a hierarchical application of Discrete Event Supervisory (DES) control theory for intelligent decision and control of a twin-engine aircraft propulsion system. A dual layer hierarchical DES controller is designed to supervise and coordinate the operation of two engines of the propulsion system. The two engines are individually controlled to achieve enhanced performance and reliability, necessary for fulfilling the mission objectives. Each engine is operated under a continuously varying control system that maintains the specified performance and a local discrete-event supervisor for condition monitoring and life extending control. A global upper level DES controller is designed for load balancing and overall health management of the propulsion system.

  11. CAISSON: Interconnect Network Simulator

    NASA Technical Reports Server (NTRS)

    Springer, Paul L.

    2006-01-01

    Cray response to HPCS initiative. Model future petaflop computer interconnect. Parallel discrete event simulation techniques for large scale network simulation. Built on WarpIV engine. Run on laptop and Altix 3000. Can be sized up to 1000 simulated nodes per host node. Good parallel scaling characteristics. Flexible: multiple injectors, arbitration strategies, queue iterators, network topologies.

  12. Parallel execution and scriptability in micromagnetic simulations

    NASA Astrophysics Data System (ADS)

    Fischbacher, Thomas; Franchin, Matteo; Bordignon, Giuliano; Knittel, Andreas; Fangohr, Hans

    2009-04-01

    We demonstrate the feasibility of an "encapsulated parallelism" approach toward micromagnetic simulations that combines offering a high degree of flexibility to the user with the efficient utilization of parallel computing resources. While parallelization is obviously desirable to address the high numerical effort required for realistic micromagnetic simulations through utilizing now widely available multiprocessor systems (including desktop multicore CPUs and computing clusters), conventional approaches toward parallelization impose strong restrictions on the structure of programs: numerical operations have to be executed across all processors in a synchronized fashion. This means that from the user's perspective, either the structure of the entire simulation is rigidly defined from the beginning and cannot be adjusted easily, or making modifications to the computation sequence requires advanced knowledge in parallel programming. We explain how this dilemma is resolved in the NMAG simulation package in such a way that the user can utilize without any additional effort on his side both the computational power of multiple CPUs and the flexibility to tailor execution sequences for specific problems: simulation scripts written for single-processor machines can just as well be executed on parallel machines and behave in precisely the same way, up to increased speed. We provide a simple instructive magnetic resonance simulation example that demonstrates utilizing both custom execution sequences and parallelism at the same time. Furthermore, we show that this strategy of encapsulating parallelism even allows to benefit from speed gains through parallel execution in simulations controlled by interactive commands given at a command line interface.

  13. Structured building model reduction toward parallel simulation

    SciTech Connect

    Dobbs, Justin R.; Hencey, Brondon M.

    2013-08-26

    Building energy model reduction exchanges accuracy for improved simulation speed by reducing the number of dynamical equations. Parallel computing aims to improve simulation times without loss of accuracy but is poorly utilized by contemporary simulators and is inherently limited by inter-processor communication. This paper bridges these disparate techniques to implement efficient parallel building thermal simulation. We begin with a survey of three structured reduction approaches that compares their performance to a leading unstructured method. We then use structured model reduction to find thermal clusters in the building energy model and allocate processing resources. Experimental results demonstrate faster simulation and low error without any interprocessor communication.

  14. Data parallel sorting for particle simulation

    NASA Technical Reports Server (NTRS)

    Dagum, Leonardo

    1992-01-01

    Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O(N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimum performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
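
    To make the sequential O(N) baseline concrete, the sketch below orders particles by integer cell index with a counting sort. This illustrates only the sequential integer-sorting case mentioned in the abstract, not the paper's data parallel merging algorithm; the function name and the assumption that each particle carries a cell index in the range 0 to num_cells - 1 are introduced for this example.

```python
from typing import List


def sort_particles_by_cell(cells: List[int], num_cells: int) -> List[int]:
    """Return particle indices ordered by cell index (stable).

    Runs in O(N + num_cells), which is O(N) when the number of cells
    does not grow faster than the number of particles.
    """
    counts = [0] * (num_cells + 1)
    for c in cells:
        counts[c + 1] += 1                 # histogram, shifted by one slot
    for i in range(num_cells):
        counts[i + 1] += counts[i]         # counts[c] is now the first output slot for cell c
    order = [0] * len(cells)
    for particle, c in enumerate(cells):
        order[counts[c]] = particle        # place particle, then advance the slot for its cell
        counts[c] += 1
    return order


cells = [2, 0, 1, 2, 0]                    # cell index of particles 0..4
order = sort_particles_by_cell(cells, num_cells=3)
print(order)
```

    No comparisons between keys are made, which is why the bound is linear rather than O(N log N).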

  15. Acoustic simulation in architecture with parallel algorithm

    NASA Astrophysics Data System (ADS)

    Li, Xiaohong; Zhang, Xinrong; Li, Dan

    2004-03-01

    To address the complexity of architectural environments and the need for real-time simulation of architectural acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in a scene is solved with this method. The impulse responses between sources and receivers in each frequency segment, which are calculated by multiple processes, are then combined into the whole frequency response. The numerical experiment shows that the parallel algorithm can improve the efficiency of acoustic simulation for complex scenes.

  16. Xyce parallel electronic simulator : users' guide.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique

  17. Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment

    PubMed Central

    Khan, Md. Ashfaquzzaman; Herbordt, Martin C.

    2011-01-01

    Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations. PMID:21822327

  18. Control of discrete event systems modeled as hierarchical state machines

    NASA Technical Reports Server (NTRS)

    Brave, Y.; Heymann, M.

    1991-01-01

    The authors examine a class of discrete event systems (DESs) modeled as asynchronous hierarchical state machines (AHSMs). For this class of DESs, they provide an efficient method for testing reachability, which is an essential step in many control synthesis procedures. This method utilizes the asynchronous nature and hierarchical structure of AHSMs, thereby illustrating the advantage of the AHSM representation as compared with its equivalent (flat) state machine representation. An application of the method is presented where an online minimally restrictive solution is proposed for the problem of maintaining a controlled AHSM within prescribed legal bounds.

  19. State-space supervision of reconfigurable discrete event systems

    SciTech Connect

    Garcia, H.E.; Ray, A.

    1995-12-31

    The Discrete Event Systems (DES) theory of supervisory and state feedback control offers many advantages for implementing supervisory systems. Algorithmic concepts have been introduced to assure that the supervising algorithms are correct and meet the specifications. It is often assumed that the supervisory specifications are invariant, at least until a given supervisory task is completed. However, there are many practical applications where the supervisory specifications are updated in real time. For example, in a Reconfigurable Discrete Event System (RDES) architecture, a bank of supervisors is defined to accommodate each identified operational condition or different supervisory specifications. This adaptive supervisory control system changes the supervisory configuration to accept coordinating commands or to adjust for changes in the controlled process. This paper addresses reconfiguration at the supervisory level of hybrid systems along with an underlying RDES architecture. It reviews the state-based supervisory control theory and extends it to the RDES paradigm in view of process control applications. The paper addresses theoretical issues with a limited number of practical examples. This control approach is particularly suitable for hierarchical reconfigurable hybrid implementations.

  20. Stochastic Parallel PARticle Kinetic Simulator

    Energy Science and Technology Software Center (ESTSC)

    2008-07-01

    SPPARKS is a kinetic Monte Carlo simulator which implements kinetic and Metropolis Monte Carlo solvers in a general way so that they can be hooked to applications of various kinds. Specific applications are implemented in SPPARKS as physical models which generate events (e.g. a diffusive hop or chemical reaction) and execute them one-by-one. Applications can run in parallel so long as the simulation domain can be partitioned spatially so that multiple events can be invoked simultaneously. SPPARKS is used to model various kinds of mesoscale materials science scenarios such as grain growth, surface deposition and growth, and reaction kinetics. It can also be used to develop new Monte Carlo models that hook to the existing solver and parallel infrastructure provided by the code.
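
    The "generate events and execute them one-by-one" loop described above can be illustrated with a generic rejection-free kinetic Monte Carlo step. This is a minimal serial sketch and not SPPARKS code; the function name `kmc_run`, the event names, and the rates are assumptions made for illustration.

```python
import math
import random

def kmc_run(rates, n_steps, seed=0):
    """Rejection-free kinetic Monte Carlo over a fixed set of event types.

    rates: dict mapping a (hypothetical) event name to its rate in 1/time.
    Returns (elapsed_time, counts) after executing n_steps events one by one.
    """
    rng = random.Random(seed)
    names = list(rates)
    total = sum(rates[name] for name in names)
    counts = {name: 0 for name in names}
    t = 0.0
    for _ in range(n_steps):
        # Select one event with probability proportional to its rate.
        r = rng.random() * total
        acc = 0.0
        chosen = names[-1]
        for name in names:
            acc += rates[name]
            if r < acc:
                chosen = name
                break
        counts[chosen] += 1
        # Advance the clock by an exponentially distributed waiting time.
        t += -math.log(1.0 - rng.random()) / total
    return t, counts

elapsed, counts = kmc_run({"hop": 3.0, "react": 1.0}, n_steps=1000)
```

    In a real lattice model the rate table changes after every event; it is held fixed here to keep the sketch short.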

  1. Visualization and Tracking of Parallel CFD Simulations

    NASA Technical Reports Server (NTRS)

    Vaziri, Arsi; Kremenetsky, Mark

    1995-01-01

    We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, is handled by CM/AVS. Partitioning of the visualization task between the CM-5 and the workstation can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate → store → visualize' post-processing approach.

  2. Parallel processing of a rotating shaft simulation

    NASA Technical Reports Server (NTRS)

    Arpasi, Dale J.

    1989-01-01

    A FORTRAN program describing the vibration modes of a rotor-bearing system is analyzed for parallelism in this simulation using a Pascal-like structured language. Potential vector operations are also identified. A critical path through the simulation is identified and used in conjunction with somewhat fictitious processor characteristics to determine the time to calculate the problem on a parallel processing system having those characteristics. A parallel processing overhead time is included as a parameter for proper evaluation of the gain over serial calculation. The serial calculation time is determined for the same fictitious system. An improvement of up to 640 percent is possible depending on the value of the overhead time. Based on the analysis, certain conclusions are drawn pertaining to the development needs of parallel processing technology, and to the specification of parallel processing systems to meet computational needs.
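
    The kind of estimate described above, a critical path time compared against the serial time with overhead as a parameter, can be sketched as follows. The task graph, the durations, and the choice to charge the overhead once per task are assumptions for illustration, as is the idealization of unlimited processors; none of these values come from the report.

```python
from functools import lru_cache

def critical_path_time(durations, deps, overhead=0.0):
    """Longest dependency chain, charging `overhead` once per task.

    durations: dict task -> compute time; deps: dict task -> prerequisite tasks.
    Assumes enough processors that only dependencies limit the schedule.
    """
    @lru_cache(maxsize=None)
    def finish(task):
        start = max((finish(d) for d in deps.get(task, ())), default=0.0)
        return start + durations[task] + overhead
    return max(finish(task) for task in durations)

def speedup(durations, deps, overhead=0.0):
    """Serial time divided by the overhead-adjusted critical path time."""
    serial = sum(durations.values())
    return serial / critical_path_time(durations, deps, overhead)

# Hypothetical task graph: d needs b and c; a is independent.
durations = {"a": 2.0, "b": 3.0, "c": 4.0, "d": 1.0}
deps = {"d": ("b", "c")}
```

    With zero overhead this example gives a speedup of 2.0; raising the overhead lengthens the critical path and lowers the ratio, which is the sensitivity the abstract refers to.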

  3. The Xyce Parallel Electronic Simulator - An Overview

    SciTech Connect

    HUTCHINSON,SCOTT A.; KEITER,ERIC R.; HOEKSTRA,ROBERT J.; WATTS,HERMAN A.; WATERS,ARLON J.; SCHELLS,REGINA L.; WIX,STEVEN D.

    2000-12-08

    The Xyce(™) Parallel Electronic Simulator has been written to support the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on providing the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). In addition, they are providing improved performance for numerical kernels using state-of-the-art algorithms, support for modeling circuit phenomena at a variety of abstraction levels, and object-oriented and modern coding practices that ensure the code will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase (a message passing parallel implementation), which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows.

  4. Parallel and Distributed System Simulation

    NASA Technical Reports Server (NTRS)

    Dongarra, Jack

    1998-01-01

    This exploratory study initiated our research into the software infrastructure necessary to support the modeling and simulation techniques that are most appropriate for the Information Power Grid. Such computational power grids will use high-performance networking to connect hardware, software, instruments, databases, and people into a seamless web that supports a new generation of computation-rich problem solving environments for scientists and engineers. In this context we looked at evaluating the NetSolve software environment for network computing that leverages the potential of such systems while addressing their complexities. NetSolve's main purpose is to enable the creation of complex applications that harness the immense power of the grid, yet are simple to use and easy to deploy. NetSolve uses a modular, client-agent-server architecture to create a system that is very easy to use. Moreover, it is designed to be highly composable in that it readily permits new resources to be added by anyone willing to do so. In these respects NetSolve is to the Grid what the World Wide Web is to the Internet. But like the Web, the design that makes these wonderful features possible can also impose significant limitations on the performance and robustness of a NetSolve system. This project explored the design innovations that push the performance and robustness of the NetSolve paradigm as far as possible without sacrificing the Web-like ease of use and composability that make it so powerful.

  5. Xyce parallel electronic simulator release notes.

    SciTech Connect

    Keiter, Eric Richard; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

    2010-05-01

    The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: hardware and software requirements; new features and enhancements; any defects fixed since the last release; and current known defects and defect workarounds. For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.

  6. Safety Discrete Event Models for Holonic Cyclic Manufacturing Systems

    NASA Astrophysics Data System (ADS)

    Ciufudean, Calin; Filote, Constantin

    In this paper the expression “holonic cyclic manufacturing systems” refers to complex assembly/disassembly systems or fork/join systems, kanban systems, and in general, to any discrete event system that transforms raw material and/or components into products. Such a system is said to be cyclic if it provides the same sequence of products indefinitely. This paper considers the scheduling of holonic cyclic manufacturing systems and describes a new approach using the Petri net formalism. We propose an approach to frame the optimum schedule of holonic cyclic manufacturing systems in order to maximize the throughput while minimizing the work in process. We also propose an algorithm to verify the optimum schedule.

  7. Parallel Performance of a Combustion Chemistry Simulation

    DOE PAGESBeta

    Skinner, Gregg; Eigenmann, Rudolf

    1995-01-01

    We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.

  8. Parallel Simulation of Unsteady Turbulent Flames

    NASA Technical Reports Server (NTRS)

    Menon, Suresh

    1996-01-01

    Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, their high cost and limited availability make practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable. This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the momentum/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. Recently, a new model for turbulent combustion was developed, in which the combustion is modeled within the subgrid (small scales) using a methodology that simulates the mixing and the molecular transport and the chemical kinetics

  9. Parallel algorithm strategies for circuit simulation.

    SciTech Connect

    Thornquist, Heidi K.; Schiek, Richard Louis; Keiter, Eric Richard

    2010-01-01

    Circuit simulation tools (e.g., SPICE) have become invaluable in the development and design of electronic circuits. However, they have been pushed to their performance limits in addressing circuit design challenges that come from the technology drivers of smaller feature scales and higher integration. Improving the performance of circuit simulation tools through exploiting new opportunities in widely-available multi-processor architectures is a logical next step. Unfortunately, not all traditional simulation applications are inherently parallel, and quickly adapting mature application codes (even codes designed as parallel applications) to new parallel paradigms can be prohibitively difficult. In general, performance is influenced by many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, the use of mini-applications, small self-contained proxies for real applications, is an excellent approach for rapidly exploring the parameter space of all these choices. In this report we present a multi-core performance study of Xyce, a transistor-level circuit simulation tool, and describe the future development of a mini-application for circuit simulation.

  10. Efficient massively parallel simulation of dynamic channel assignment schemes for wireless cellular communications

    NASA Technical Reports Server (NTRS)

    Greenberg, Albert G.; Lubachevsky, Boris D.; Nicol, David M.; Wright, Paul E.

    1994-01-01

    Fast, efficient parallel algorithms are presented for discrete event simulations of dynamic channel assignment schemes for wireless cellular communication networks. The driving events are call arrivals and departures, in continuous time, to cells geographically distributed across the service area. A dynamic channel assignment scheme decides which call arrivals to accept, and which channels to allocate to the accepted calls, attempting to minimize call blocking while ensuring co-channel interference is tolerably low. Specifically, the scheme ensures that the same channel is used concurrently at different cells only if the pairwise distances between those cells are sufficiently large. Much of the complexity of the system comes from ensuring this separation. The network is modeled as a system of interacting continuous time automata, each corresponding to a cell. To simulate the model, conservative methods are used; i.e., methods in which no errors occur in the course of the simulation and so no rollback or relaxation is needed. Implemented on a 16K processor MasPar MP-1, an elegant and simple technique provides speedups of about 15 times over an optimized serial simulation running on a high speed workstation. A drawback of this technique, typical of conservative methods, is that processor utilization is rather low. To overcome this, new methods were developed that exploit slackness in event dependencies over short intervals of time, thereby raising the utilization to above 50 percent and the speedup over the optimized serial code to about 120 times.
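
    The co-channel separation rule stated above (a channel may be reused only at cells that are far enough apart) can be written as a small check. This is a serial sketch under stated assumptions: the first-fit channel choice, the planar cell coordinates, and the names `can_assign` and `accept_call` are illustrative and are not taken from the paper.

```python
import math

def can_assign(channel, cell, in_use, positions, reuse_distance):
    """True if `channel` is not in use at any other cell closer than reuse_distance."""
    x0, y0 = positions[cell]
    for other, channels in in_use.items():
        if other == cell or channel not in channels:
            continue
        x1, y1 = positions[other]
        if math.hypot(x1 - x0, y1 - y0) < reuse_distance:
            return False
    return True

def accept_call(cell, n_channels, in_use, positions, reuse_distance):
    """First-fit assignment (an assumed policy); returns None if the call is blocked."""
    busy = in_use.setdefault(cell, set())
    for ch in range(n_channels):
        if ch not in busy and can_assign(ch, cell, in_use, positions, reuse_distance):
            busy.add(ch)
            return ch
    return None

# Hypothetical layout: A and B are neighbours, C is far from both.
positions = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 0.0)}
in_use = {}
```

    Because each arrival must inspect the channel state of nearby cells, this check is also where the inter-cell event dependencies that a parallel simulation has to respect come from.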

  11. Xyce parallel electronic simulator : reference guide.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list exhaustively (to the extent possible) the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.

  12. Parallel Implicit Kinetic Simulation with PARSEK

    NASA Astrophysics Data System (ADS)

    Stefano, Markidis; Giovanni, Lapenta

    2004-11-01

    Kinetic plasma simulation is the ultimate tool for plasma analysis. One of the prime tools for kinetic simulation is the particle in cell (PIC) method. The explicit or semi-implicit (i.e. implicit only on the fields) PIC method requires exceedingly small time steps and grid spacing, limited by the necessity to resolve the electron plasma frequency, the Debye length and the speed of light (for fully explicit schemes). A different approach is to consider fully implicit PIC methods where both particles and fields are discretized implicitly. This approach allows radically larger time steps and grid spacing, reducing the cost of a simulation by orders of magnitude while keeping the full kinetic treatment. In our previous work, simulations impossible for the explicit PIC method even on massively parallel computers have been made possible on a single processor machine using the implicit PIC code CELESTE3D [1]. We propose here another quantum leap: PARSEK, a parallel cousin of CELESTE3D, based on the same approach but sporting a radically redesigned software architecture (object oriented C++, where CELESTE3D was structured and written in FORTRAN77/90) and fully parallelized using MPI for both particle and grid communication. [1] G. Lapenta, J.U. Brackbill, W.S. Daughton, Phys. Plasmas, 10, 1577 (2003).

  13. Parallel node placement method by bubble simulation

    NASA Astrophysics Data System (ADS)

    Nie, Yufeng; Zhang, Weiwei; Qi, Nan; Li, Yiqiang

    2014-03-01

    An efficient Parallel Node Placement method by Bubble Simulation (PNPBS), employing METIS-based domain decomposition (DD) for an arbitrary number of processors, is introduced. In accordance with the desired nodal density and Newton’s Second Law of Motion, automatic generation of node sets by bubble simulation has been demonstrated in previous work. Since the interaction force between nodes is short-range, the positions and velocities of two distant nodes can be updated simultaneously and independently during dynamic simulation. This inherent parallelism makes the method well suited to parallel computing. In this PNPBS method, the METIS-based DD scheme has been investigated for uniform and non-uniform node sets, and dynamic load balancing is obtained by evenly distributing work among the processors. For the nodes near the common interface of two neighboring subdomains, there is no need for special treatment after dynamic simulation. These nodes have good geometrical properties and a smooth density distribution, which is desirable in the numerical solution of partial differential equations (PDEs). The results of numerical examples show that quasi-linear speedup in the number of processors and high efficiency are achieved.

  14. Parallel-distributed mobile robot simulator

    NASA Astrophysics Data System (ADS)

    Okada, Hiroyuki; Sekiguchi, Minoru; Watanabe, Nobuo

    1996-06-01

    The aim of this project is to achieve an autonomous learning and growth function based on active interaction with the real world. It should also be able to autonomously acquire knowledge about the context in which jobs take place, and how the jobs are executed. This article describes a parallel distributed movable robot system simulator with an autonomous learning and growth function. The autonomous learning and growth function which we are proposing is characterized by its ability to learn and grow through interaction with the real world. When the movable robot interacts with the real world, the system compares the virtual environment simulation with the interaction result in the real world. The system then improves the virtual environment to match the real-world result more closely. Thus the system learns and grows. It is very important that such a simulation is time-realistic. The parallel distributed movable robot simulator was developed to simulate the space of a movable robot system with an autonomous learning and growth function. The simulator constructs a virtual space faithful to the real world and also integrates the interfaces between the user, the actual movable robot and the virtual movable robot. Using an ultrafast CG (computer graphics) system (FUJITSU AG series), time-realistic 3D CG is displayed.

  15. Xyce(™) Parallel Electronic Simulator

    Energy Science and Technology Software Center (ESTSC)

    2013-10-03

    The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models including several age and radiation aware devices. It supports a variety of computing platforms (both serial and parallel computers). Lastly, it uses a variety of modern solution algorithms, dynamic parallel load-balancing and iterative solvers. Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits. Kirchhoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.

  16. Xyce(™) Parallel Electronic Simulator

    SciTech Connect

    2013-10-03

    The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models including several age and radiation aware devices. It supports a variety of computing platforms (both serial and parallel computers). Lastly, it uses a variety of modern solution algorithms, dynamic parallel load-balancing and iterative solvers. Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits. Kirchhoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.

  17. Parallelism extraction and program restructuring for parallel simulation of digital systems

    SciTech Connect

    Vellandi, B.L.

    1990-01-01

    Two topics currently of interest to the computer aided design (CAD) community for very-large-scale integrated (VLSI) circuits are using the VHSIC Hardware Description Language (VHDL) effectively and decreasing simulation times of VLSI designs through parallel execution of the simulator. The goal of this research is to increase the degree of parallelism obtainable in VHDL simulation, and consequently to decrease simulation times. The research targets simulation on massively parallel architectures. Experimentation and instrumentation were done on the SIMD Connection Machine. The author discusses her method used to extract parallelism and restructure a VHDL program, experimental results using this method, and requirements for a parallel architecture for fast simulation.

  18. Parallel Strategies for Crash and Impact Simulations

    SciTech Connect

    Attaway, S.; Brown, K.; Hendrickson, B.; Plimpton, S.

    1998-12-07

    We describe a general strategy we have found effective for parallelizing solid mechanics simulations. Such simulations often have several computationally intensive parts, including finite element integration, detection of material contacts, and particle interaction if smoothed particle hydrodynamics is used to model highly deforming materials. The need to balance all of these computations simultaneously is a difficult challenge that has kept many commercial and government codes from being used effectively on parallel supercomputers with hundreds or thousands of processors. Our strategy is to load-balance each of the significant computations independently with whatever balancing technique is most appropriate. The chief benefit is that each computation can be scalably parallelized. The drawback is the data exchange between processors and extra coding that must be written to maintain multiple decompositions in a single code. We discuss these trade-offs and give performance results showing this strategy has led to a parallel implementation of a widely-used solid mechanics code that can now be run efficiently on thousands of processors of the Pentium-based Sandia/Intel TFLOPS machine. We illustrate with several examples the kinds of high-resolution, million-element models that can now be simulated routinely. We also look to the future and discuss what possibilities this new capability promises, as well as the new set of challenges it poses in material models, computational techniques, and computing infrastructure.

  19. Optimal Parametric Discrete Event Control: Problem and Solution

    SciTech Connect

    Griffin, Christopher H

    2008-01-01

    We present a novel optimization problem for discrete event control, similar in spirit to the optimal parametric control problem common in statistical process control. In our problem, we assume a known finite state machine plant model $G$ defined over an event alphabet $\Sigma$ so that the plant model language $L = \LanM(G)$ is prefix closed. We further assume the existence of a \textit{base control structure} $M_K$, which may be either a finite state machine or a deterministic pushdown machine. If $K = \LanM(M_K)$, we assume $K$ is prefix closed and that $K \subseteq L$. We associate each controllable transition of $M_K$ with a binary variable $X_1,\dots,X_n$ indicating whether the transition is enabled or not. This leads to a function $M_K(X_1,\dots,X_n)$, that returns a new control specification depending upon the values of $X_1,\dots,X_n$. We exhibit a branch-and-bound algorithm to solve the optimization problem $\min_{X_1,\dots,X_n}\max_{w \in K} C(w)$ such that $M_K(X_1,\dots,X_n) \models \Pi$ and $\LanM(M_K(X_1,\dots,X_n)) \in \Con(L)$. Here $\Pi$ is a set of logical assertions on the structure of $M_K(X_1,\dots,X_n)$, and $M_K(X_1,\dots,X_n) \models \Pi$ indicates that $M_K(X_1,\dots,X_n)$ satisfies the logical assertions; and, $\Con(L)$ is the set of controllable sublanguages of $L$.

  20. Improving the Teaching of Discrete-Event Control Systems Using a LEGO Manufacturing Prototype

    ERIC Educational Resources Information Center

    Sanchez, A.; Bucio, J.

    2012-01-01

    This paper discusses the usefulness of employing LEGO as a teaching-learning aid in a post-graduate-level first course on the control of discrete-event systems (DESs). The final assignment of the course is presented, which asks students to design and implement a modular hierarchical discrete-event supervisor for the coordination layer of a…

  1. Massively Parallel Direct Simulation of Multiphase Flow

    SciTech Connect

    COOK,BENJAMIN K.; PREECE,DALE S.; WILLIAMS,J.R.

    2000-08-10

    The authors' understanding of multiphase physics and the associated predictive capability for multi-phase systems are severely limited by current continuum modeling methods and experimental approaches. This research will deliver an unprecedented modeling capability to directly simulate three-dimensional multi-phase systems at the particle-scale. The model solves the fully coupled equations of motion governing the fluid phase and the individual particles comprising the solid phase using a newly discovered, highly efficient coupled numerical method based on the discrete-element method and the Lattice-Boltzmann method. A massively parallel implementation will enable the solution of large, physically realistic systems.

  2. Empirical study of parallel LRU simulation algorithms

    NASA Technical Reports Server (NTRS)

    Carr, Eric; Nicol, David M.

    1994-01-01

    This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithms are more complex, but have costs that are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
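
    The "stack distances" mentioned above can be illustrated with a short serial computation: the distance of a reference is its depth in the LRU stack, and a fully associative cache of size C misses exactly when the distance exceeds C or the tag has never been seen. The sketch below is a plain list-based version written for clarity; it is not one of the five parallel algorithms in the paper, and the function names are illustrative.

```python
def stack_distances(trace):
    """LRU stack distance of each reference; None marks a first touch."""
    stack = []  # most recently used tag sits at index 0
    out = []
    for tag in trace:
        if tag in stack:
            depth = stack.index(tag) + 1  # 1-based depth in the LRU stack
            stack.remove(tag)
        else:
            depth = None
        out.append(depth)
        stack.insert(0, tag)
    return out

def misses(distances, cache_size):
    """Misses of a fully associative LRU cache holding `cache_size` lines."""
    return sum(1 for d in distances if d is None or d > cache_size)

trace = ["a", "b", "c", "a", "b", "b"]
dists = stack_distances(trace)  # [None, None, None, 3, 3, 1]
```

    One pass over the trace yields the miss count for every cache size at once, which is why computing all stack distances is the goal of the more general algorithms.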

  3. Parallel Proximity Detection for Computer Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1998-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.

  4. Parallel Proximity Detection for Computer Simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1997-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.

  5. A polymorphic reconfigurable emulator for parallel simulation

    NASA Technical Reports Server (NTRS)

    Parrish, E. A., Jr.; Mcvey, E. S.; Cook, G.

    1980-01-01

    Microprocessor and arithmetic support chip technology was applied to the design of a reconfigurable emulator for real-time flight simulation. The system developed consists of a master control system to perform all man-machine interactions and to configure the hardware to emulate a given aircraft, and numerous slave compute modules (SCMs) which comprise the parallel computational units. It is shown that all parts of the state equations can be worked on simultaneously but that the algebraic equations cannot (unless they are slowly varying). Attempts to obtain algorithms that will allow parallel updates are reported. The word length and step size to be used in the SCMs are determined and the architecture of the hardware and software is described.

  6. Parallel multiscale simulations of a brain aneurysm

    SciTech Connect

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in

  7. Parallel multiscale simulations of a brain aneurysm

    NASA Astrophysics Data System (ADS)

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NɛκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NɛκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future

  8. Parallel multiscale simulations of a brain aneurysm.

    PubMed

    Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future

  9. A parallel algorithm for implicit depletant simulations

    NASA Astrophysics Data System (ADS)

    Glaser, Jens; Karas, Andrew S.; Glotzer, Sharon C.

    2015-11-01

    We present an algorithm to simulate the many-body depletion interaction between anisotropic colloids in an implicit way, integrating out the degrees of freedom of the depletants, which we treat as an ideal gas. Because the depletant particles are statistically independent and the depletion interaction is short-ranged, depletants are randomly inserted in parallel into the excluded volume surrounding a single translated and/or rotated colloid. A configurational bias scheme is used to enhance the acceptance rate. The method is validated and benchmarked both on multi-core processors and graphics processing units for the case of hard spheres, hemispheres, and discoids. With depletants, we report novel cluster phases in which hemispheres first assemble into spheres, which then form ordered hcp/fcc lattices. The method is significantly faster than any method without cluster moves and that tracks depletants explicitly, for systems of colloid packing fraction ϕc < 0.50, and additionally enables simulation of the fluid-solid transition.

  10. A scalable parallel black oil simulator on distributed memory parallel computers

    NASA Astrophysics Data System (ADS)

    Wang, Kun; Liu, Hui; Chen, Zhangxin

    2015-11-01

    This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.

  11. Parallelization of Rocket Engine Simulator Software (PRESS)

    NASA Technical Reports Server (NTRS)

    Cezzar, Ruknet

    1997-01-01

    The Parallelization of Rocket Engine System Software (PRESS) project is part of a collaborative effort with Southern University at Baton Rouge (SUBR), University of West Florida (UWF), and Jackson State University (JSU). The second-year funding, which supports two graduate students enrolled in our new Master's program in Computer Science at Hampton University and the principal investigator, has been obtained for the period from October 19, 1996 through October 18, 1997. The key part of the interim report was new directions for the second-year funding. This came about from discussions during the Rocket Engine Numeric Simulator (RENS) project meeting in Pensacola on January 17-18, 1997. At that time, a software agreement between Hampton University and NASA Lewis Research Center had already been concluded. That agreement concerns off-NASA-site experimentation with PUMPDES/TURBDES software. Before this agreement, during the first year of the project, another large-scale FORTRAN-based software package, Two-Dimensional Kinetics (TDK), was being used for translation to an object-oriented language and parallelization experiments. However, that package proved to be too complex and to lack sufficient documentation for an effective translation effort to object-oriented C++ source code. The focus, this time with the better documented and more manageable PUMPDES/TURBDES package, was still on translation to C++ with design improvements. At the RENS Meeting, however, the new impetus for the RENS projects in general, and PRESS in particular, has shifted in two important ways. One was closer alignment with the work on Numerical Propulsion System Simulator (NPSS) through cooperation and collaboration with LERC ACLU organization. The other was to see whether and how NASA's various rocket design software can be run over local and intranets without any radical efforts for redesign and translation into object-oriented source code. There were also suggestions that the Fortran-based code be

  12. Parallel magnetic field perturbations in gyrokinetic simulations

    SciTech Connect

    Joiner, N.; Hirose, A.; Dorland, W.

    2010-07-15

    At low β it is common to neglect parallel magnetic field perturbations on the basis that they are of order β². This is only true if effects of order β are canceled by a term in the ∇B drift also of order β [H. L. Berk and R. R. Dominguez, J. Plasma Phys. 18, 31 (1977)]. To our knowledge this has not been rigorously tested with modern gyrokinetic codes. In this work we use the gyrokinetic code GS2 [Kotschenreuther et al., Comput. Phys. Commun. 88, 128 (1995)] to investigate whether the compressional magnetic field perturbation B_∥ is required for accurate gyrokinetic simulations at low β for microinstabilities commonly found in tokamaks. The kinetic ballooning mode (KBM) demonstrates the principle described by Berk and Dominguez strongly, as does the trapped electron mode, in a less dramatic way. The ion and electron temperature gradient (ETG) driven modes do not typically exhibit this behavior; the effects of B_∥ are found to depend on the pressure gradients. The terms which are seen to cancel at long wavelength in KBM calculations can be cumulative in the ion temperature gradient case and increase with η_e. The effect of B_∥ on the ETG instability is shown to depend on the normalized pressure gradient β′ at constant β.

  13. Parallel Simulation of Explosion in an Unlimited Atmosphere

    NASA Astrophysics Data System (ADS)

    Ma, Tianbao; Wang, Cheng; Fei, Guanglei; Ning, Jianguo

    In this paper, a parallel Eulerian hydrocode for the simulation of large-scale complicated explosion and impact problems is developed. The data dependency in the parallel algorithm is studied in particular. As a test, the three-dimensional numerical simulation of the explosion field in an unlimited atmosphere is performed. The numerical results are in good agreement with the empirical results, indicating that the proposed parallel algorithm is valid. Finally, the parallel speedup and parallel efficiency under different dividing domain areas are analyzed.

  14. Behavior coordination of mobile robotics using supervisory control of fuzzy discrete event systems.

    PubMed

    Jayasiri, Awantha; Mann, George K I; Gosine, Raymond G

    2011-10-01

    In order to incorporate the uncertainty and impreciseness present in real-world event-driven asynchronous systems, fuzzy discrete event systems (DESs) (FDESs) have been proposed as an extension to crisp DESs. In this paper, first, we propose an extension to the supervisory control theory of FDES by redefining fuzzy controllable and uncontrollable events. The proposed supervisor is capable of enabling feasible uncontrollable and controllable events with different possibilities. Then, the extended supervisory control framework of FDES is employed to model and control several navigational tasks of a mobile robot using the behavior-based approach. The robot has limited sensory capabilities, and the navigations have been performed in several unmodeled environments. The reactive and deliberative behaviors of the mobile robotic system are weighted through fuzzy uncontrollable and controllable events, respectively. By employing the proposed supervisory controller, a command-fusion-type behavior coordination is achieved. The observability of fuzzy events is incorporated to represent the sensory imprecision. As a systematic analysis of the system, a fuzzy-state-based controllability measure is introduced. The approach is implemented in both simulation and real time. A performance evaluation is performed to quantitatively estimate the validity of the proposed approach over its counterparts. PMID:21421445

  15. Discrete event command and control for networked teams with multiple missions

    NASA Astrophysics Data System (ADS)

    Lewis, Frank L.; Hudas, Greg R.; Pang, Chee Khiang; Middleton, Matthew B.; McMurrough, Christopher

    2009-05-01

    During mission execution in military applications, the TRADOC Pamphlet 525-66 Battle Command and Battle Space Awareness capabilities prescribe expectations that networked teams will perform in a reliable manner under changing mission requirements, varying resource availability and reliability, and resource faults. In this paper, a Command and Control (C2) structure is presented that allows for computer-aided execution of the networked team decision-making process, control of force resources, shared resource dispatching, and adaptability to change based on battlefield conditions. A mathematically justified networked computing environment is provided called the Discrete Event Control (DEC) Framework. DEC has the ability to provide the logical connectivity among all team participants including mission planners, field commanders, war-fighters, and robotic platforms. The proposed data management tools are developed and demonstrated on a simulation study and an implementation on a distributed wireless sensor network. The results show that the tasks of multiple missions are correctly sequenced in real-time, and that shared resources are suitably assigned to competing tasks under dynamically changing conditions without conflicts and bottlenecks.

  16. Improving the performance of molecular dynamics simulations on parallel clusters.

    PubMed

    Borstnik, Urban; Hodoscek, Milan; Janezic, Dusanka

    2004-01-01

    In this article a procedure is derived to obtain a performance gain for molecular dynamics (MD) simulations on existing parallel clusters. Parallel clusters use a wide array of interconnection technologies to connect multiple processors together, often at different speeds, such as multiple processor computers and networking. It is demonstrated how to configure existing programs for MD simulations to efficiently handle collective communication on parallel clusters with processor interconnections of different speeds. PMID:15032512

  17. Parallelization and automatic data distribution for nuclear reactor simulations

    SciTech Connect

    Liebrock, L.M.

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  18. Parallel methods for dynamic simulation of multiple manipulator systems

    NASA Technical Reports Server (NTRS)

    Mcmillan, Scott; Sadayappan, P.; Orin, David E.

    1993-01-01

    In this paper, efficient dynamic simulation algorithms for a system of m manipulators, cooperating to manipulate a large load, are developed; their performance, using two possible forms of parallelism on a general-purpose parallel computer, is investigated. One form, temporal parallelism, is obtained with the use of parallel numerical integration methods. A speedup of 3.78 on four processors of CRAY Y-MP8 was achieved with a parallel four-point block predictor-corrector method for the simulation of a four manipulator system. These multi-point methods suffer from reduced accuracy, and when comparing these runs with a serial integration method, the speedup can be as low as 1.83 for simulations with the same accuracy. To regain the performance lost due to accuracy problems, a second form of parallelism is employed. Spatial parallelism allows most of the dynamics of each manipulator chain to be computed simultaneously. Used exclusively in the four processor case, this form of parallelism in conjunction with a serial integration method results in a speedup of 3.1 on four processors over the best serial method. In cases where there are either more processors available or fewer chains in the system, the multi-point parallel integration methods are still advantageous despite the reduced accuracy because both forms of parallelism can then combine to generate more parallel tasks and achieve greater effective speedups. This paper also includes results for these cases.
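
The speedups quoted in this abstract can be read as parallel efficiencies, that is, speedup divided by processor count. The short sketch below only does that arithmetic on the three reported figures; the helper name `efficiency` and the dictionary labels are ours, not the paper's.

```python
def efficiency(speedup, processors):
    """Parallel efficiency: fraction of ideal linear speedup actually achieved."""
    return speedup / processors

# Speedups reported in the abstract, all on four processors of the CRAY Y-MP8.
reported = {
    "temporal (block predictor-corrector)": 3.78,
    "temporal, equal-accuracy comparison": 1.83,
    "spatial (serial integrator)": 3.1,
}

efficiencies = {name: efficiency(s, 4) for name, s in reported.items()}

for name, value in efficiencies.items():
    print(f"{name}: {value:.3f}")
```

Seen this way, the equal-accuracy comparison uses less than half of the four processors' ideal capacity, which is the loss the spatial form of parallelism is meant to recover.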

  19. Theory and simulation of collisionless parallel shocks

    NASA Technical Reports Server (NTRS)

    Quest, K. B.

    1988-01-01

    This paper presents a self-consistent theoretical model for collisionless parallel shock structure, based on the hypothesis that shock dissipation and heating can be provided by electromagnetic ion beam-driven instabilities. It is shown that shock formation and plasma heating can result from parallel propagating electromagnetic ion beam-driven instabilities for a wide range of Mach numbers and upstream plasma conditions. The theoretical predictions are compared with recently published observations of quasi-parallel interplanetary shocks. It was found that low Mach number interplanetary shock observations were consistent with the explanation that group-standing waves are providing the dissipation; two high Mach number observations confirmed the theoretically predicted rapid thermalization across the shock.

  20. Parallel-Processing Test Bed For Simulation Software

    NASA Technical Reports Server (NTRS)

    Blech, Richard; Cole, Gary; Townsend, Scott

    1996-01-01

    Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).

  1. Xyce parallel electronic simulator : users' guide. Version 5.1.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-11-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical

  2. Xyce Parallel Electronic Simulator : users' guide, version 4.1.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-02-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical

  3. A conservative approach to parallelizing the Sharks World simulation

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Riffe, Scott E.

    1990-01-01

    Parallelizing a benchmark problem for parallel simulation, the Sharks World, is described. The described solution is conservative, in the sense that no state information is saved, and no 'rollbacks' occur. The used approach illustrates both the principal advantage and principal disadvantage of conservative parallel simulation. The advantage is that by exploiting lookahead an approach was found that dramatically improves the serial execution time, and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.
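
The role lookahead plays in a conservative simulation can be sketched in a few lines. This is a generic illustration, not the authors' Sharks World solution: it assumes every neighboring process promises a fixed minimum delay (`lookahead`) on anything it sends, and the function name, event labels and clock values are hypothetical.

```python
import heapq

def safe_events(pending, neighbor_clocks, lookahead):
    """Pop and return, in timestamp order, the events that are safe to process.

    pending         -- heap of (timestamp, label) tuples for this logical process
    neighbor_clocks -- current simulation time of each neighbor that can send to us
    lookahead       -- minimum delay a neighbor adds to any event it sends (assumed fixed)
    """
    # No neighbor can deliver an event earlier than its own clock plus the lookahead.
    bound = min(clock + lookahead for clock in neighbor_clocks)
    safe = []
    while pending and pending[0][0] < bound:
        safe.append(heapq.heappop(pending))
    return safe

# Hypothetical pending events for one logical process.
pending = [(12.0, "shark moves"), (3.0, "fish moves"), (7.5, "shark eats")]
heapq.heapify(pending)

ready = safe_events(pending, neighbor_clocks=[4.0, 6.0], lookahead=5.0)
print(ready)
```

Because only provably safe events are processed, no state saving or rollback is needed. If the lookahead shrinks toward zero, the bound collapses to the slowest neighbor's clock and far fewer events qualify, which is the fragility the abstract describes.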

  4. Iterative Schemes for Time Parallelization with Application to Reservoir Simulation

    SciTech Connect

    Garrido, I; Fladmark, G E; Espedal, M S; Lee, B

    2005-04-18

    Parallel methods are usually not applied to the time domain because of the inherent sequentialness of time evolution. But for many evolutionary problems, computer simulation can benefit substantially from time parallelization methods. In this paper, the authors present several such algorithms that actually exploit the sequential nature of time evolution through a predictor-corrector procedure. This sequentialness ensures convergence of a parallel predictor-corrector scheme within a fixed number of iterations. The performance of these novel algorithms, which are derived from the classical alternating Schwarz method, is illustrated through several numerical examples using the reservoir simulator Athena.
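
A time-parallel predictor-corrector iteration can be illustrated with Parareal, a widely known scheme of this general type. It is not claimed to be the authors' algorithm (theirs are derived from the alternating Schwarz method), and the test equation y' = -y, the step counts and all function names below are assumptions chosen for illustration. The sketch shows the property mentioned in the abstract: after as many iterations as there are time slices, the iterate matches the sequential fine solution up to rounding.

```python
def euler(y, t0, t1, steps, lam=-1.0):
    """Integrate y' = lam * y from t0 to t1 with explicit Euler."""
    h = (t1 - t0) / steps
    for _ in range(steps):
        y = y + h * lam * y
    return y

def coarse(y, t0, t1):
    """Cheap propagator: a single Euler step per time slice."""
    return euler(y, t0, t1, steps=1)

def fine(y, t0, t1):
    """Accurate propagator: 100 Euler steps per time slice."""
    return euler(y, t0, t1, steps=100)

def parareal(y0, times, iterations):
    n = len(times) - 1
    # Predictor: one cheap sequential sweep with the coarse propagator.
    u = [y0]
    for i in range(n):
        u.append(coarse(u[i], times[i], times[i + 1]))
    for _ in range(iterations):
        # The fine solves are independent per slice and could run in parallel.
        f = [fine(u[i], times[i], times[i + 1]) for i in range(n)]
        g_old = [coarse(u[i], times[i], times[i + 1]) for i in range(n)]
        # Corrector: sequential coarse sweep plus the fine-minus-coarse defect.
        new = [y0]
        for i in range(n):
            new.append(coarse(new[i], times[i], times[i + 1]) + f[i] - g_old[i])
        u = new
    return u

times = [0.25 * i for i in range(5)]  # four time slices on [0, 1]

# Reference: the fine propagator applied slice after slice, fully sequentially.
serial = [1.0]
for i in range(4):
    serial.append(fine(serial[i], times[i], times[i + 1]))

result = parareal(1.0, times, iterations=4)
print(max(abs(a - b) for a, b in zip(result, serial)))
```

In practice the scheme only pays off when far fewer iterations than time slices are needed, since each iteration costs one parallel fine solve plus a sequential coarse sweep.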

  5. n-body simulations using message passing parallel computers.

    NASA Astrophysics Data System (ADS)

    Grama, A. Y.; Kumar, V.; Sameh, A.

    The authors present new parallel formulations of the Barnes-Hut method for n-body simulations on message passing computers. These parallel formulations partition the domain efficiently incurring minimal communication overhead. This is in contrast to existing schemes that are based on sorting a large number of keys or on the use of global data structures. The new formulations are augmented by alternate communication strategies which serve to minimize communication overhead. The impact of these communication strategies is experimentally studied. The authors report on experimental results obtained from an astrophysical simulation on an nCUBE2 parallel computer.

  6. Traffic simulations on parallel computers using domain decomposition techniques

    SciTech Connect

    Hanebutte, U.R.; Tentner, A.M.

    1995-12-31

    Large scale simulations of Intelligent Transportation Systems (ITS) can only be achieved by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic simulations with the standard simulation package TRAF-NETSIM on a 128-node IBM SPx parallel supercomputer as well as on a cluster of SUN workstations. Whilst this particular parallel implementation is based on NETSIM, a microscopic traffic simulation model, the presented strategy is applicable to a broad class of traffic simulations. An outer iteration loop must be introduced in order to converge to a global solution. A performance study that utilizes a scalable test network that consists of square grids is presented, which addresses the performance penalty introduced by the additional iteration loop.

  7. Parallel Signal Processing and System Simulation using aCe

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2003-01-01

    Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C based parallel language (ace C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of ace C and present a signal processing application (FFT).

  8. A CUDA based parallel multi-phase oil reservoir simulator

    NASA Astrophysics Data System (ADS)

    Zaza, Ayham; Awotunde, Abeeb A.; Fairag, Faisal A.; Al-Mouhamed, Mayez A.

    2016-09-01

    Forward Reservoir Simulation (FRS) is a challenging process that models fluid flow and mass transfer in porous media to draw conclusions about the behavior of certain flow variables and well responses. Besides the operational cost associated with matrix assembly, FRS repeatedly solves huge and computationally expensive sparse, ill-conditioned and unsymmetrical linear systems. Moreover, as the computation for practical reservoir dimensions lasts for long times, speeding up the process by taking advantage of parallel platforms is indispensable. By considering the state-of-the-art advances in massively parallel computing and the accompanying parallel architecture, this work aims primarily at developing a CUDA-based parallel simulator for oil reservoirs. In addition to the initially reported 33-fold speed gain compared to the serial version, experiments showed that BiCGSTAB is a stable and fast solver which could be incorporated in such simulations instead of the more expensive, storage-demanding and usually utilized GMRES.

  9. Parallelization of Rocket Engine Simulator Software (PRESS)

    NASA Technical Reports Server (NTRS)

    Cezzar, Ruknet

    1998-01-01

    We have outlined our work in the last half of the funding period. We have shown how a demo package for RESSAP using MPI can be done. However, we also mentioned the difficulties with the UNIX platform. We have reiterated some of the suggestions made during the presentation of the progress at the Fourth Annual HBCU Conference. Although we have discussed, in some detail, how TURBDES/PUMPDES software can be run in parallel using MPI, at present, we are unable to experiment any further with either MPI or PVM. Due to X windows not being implemented, we are also not able to experiment further with XPVM, which, it will be recalled, has a nice GUI. There are also some concerns, on our part, about MPI being an appropriate tool. The best thing about MPI is that it is public domain. Although plenty of documentation exists for the intricacies of using MPI, little information is available on its actual implementations. Other than very typical, somewhat contrived examples, such as the Jacobi algorithm for solving Laplace's equation, there are few examples which can readily be applied to real situations, such as in our case. In effect, the review of literature on both MPI and PVM, and there is a lot, indicates something similar to the enormous effort which was spent on LISP and LISP-like languages as tools for artificial intelligence research. During the development of a book on programming languages [12], when we searched the literature for very simple examples like taking averages, reading and writing records, multiplying matrices, etc., we could hardly find any! Yet, so much was said and done on that topic in academic circles. It appears that we faced the same problem with MPI, where despite significant documentation, we could not find even a simple example which supports coarse-grain parallelism involving only a few processes. From the foregoing, it appears that a new direction may be required for more productive research during the extension period (10/19/98 - 10

  10. Improved task scheduling for parallel simulations. Master's thesis

    SciTech Connect

    McNear, A.E.

    1991-12-01

    The objective of this investigation is to design, analyze, and validate the generation of optimal schedules for simulation systems. Improved performance in simulation execution times can greatly improve the return rate of information provided by such simulations resulting in reduced development costs of future computer/electronic systems. Optimal schedule generation of precedence-constrained task systems including iterative feedback systems such as VHDL or war gaming simulations for execution on a parallel computer is known to be NP-hard. Efficiently parallelizing such problems takes full advantage of present computer technology to achieve a significant reduction in the search times required. Unfortunately, the extreme combinatoric 'explosion' of possible task assignments to processors creates an exponential search space prohibitive on any computer for search algorithms which maintain more than one branch of the search graph at any one time. This work develops various parallel modified backtracking (MBT) search algorithms for execution on an iPSC/2 hypercube that bound the space requirements and produce an optimally minimum schedule with linear speed-up. The parallel MBT search algorithm is validated using various feedback task simulation systems which are scheduled for execution on an iPSC/2 hypercube. The search time, size of the enumerated search space, and communications overhead required to ensure efficient utilization during the parallel search process are analyzed. The various applications indicated appreciable improvement in performance using this method.
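
    The space argument above (keeping only one branch of the search graph alive at a time) can be illustrated with a sequential depth-first sketch. This is not the thesis's MBT algorithm: it assumes independent tasks with no precedence constraints, and the function name and task durations are hypothetical.

```python
def best_makespan(durations, n_procs):
    """Depth-first backtracking search for the minimum makespan.

    Only the current branch (one load per processor) is stored, so memory
    stays linear in the number of tasks even though the search space is
    exponential. Simplifying assumption: tasks are independent.
    """
    order = sorted(durations, reverse=True)   # place long tasks first
    loads = [0] * n_procs                     # current partial assignment
    best = [sum(durations)]                   # trivial bound: one processor

    def dfs(i):
        if max(loads) >= best[0]:
            return                            # prune: cannot beat the best
        if i == len(order):
            best[0] = max(loads)              # complete, strictly better
            return
        seen = set()
        for p in range(n_procs):
            if loads[p] in seen:
                continue                      # skip symmetric placements
            seen.add(loads[p])
            loads[p] += order[i]              # assign task i to processor p
            dfs(i + 1)
            loads[p] -= order[i]              # backtrack

    dfs(0)
    return best[0]


# Hypothetical task durations scheduled on two processors.
example = best_makespan([3, 3, 2, 2, 2], 2)
```

    A parallel variant would hand different subtrees of this search to different processors; how the thesis does that is not described in the abstract.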

  11. Xyce Parallel Electronic Simulator - User's Guide, Version 1.0

    SciTech Connect

    HUTCHINSON, SCOTT A; KEITER, ERIC R.; HOEKSTRA, ROBERT J.; WATERS, LON J.; RUSSO, THOMAS V.; RANKIN, ERIC LAMONT; WIX, STEVEN D.

    2002-11-01

    This manual describes the use of the Xyce Parallel Electronic Simulator code for simulating electrical circuits at a variety of abstraction levels. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on improving the capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows. Another feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce Parallel Electronic Simulator is designed to support a variety of device model inputs. These input formats include standard analytical models, behavioral models

  12. Parallel Optimization with Large Eddy Simulations

    NASA Astrophysics Data System (ADS)

    Talnikar, Chaitanya; Blonigan, Patrick; Bodart, Julien; Wang, Qiqi; Alex Gorodetsky Collaboration; Jasper Snoek Collaboration

    2014-11-01

    For design optimization results to be useful, the model used must be trustworthy. For turbulent flows, Large Eddy Simulations (LES) can capture separation and other phenomena that traditional models such as RANS struggle with. However, optimization with LES can be challenging because of noisy objective function evaluations. This noise is a consequence of the sampling error of turbulent statistics, or long time averaged quantities of interest, such as the drag of an airfoil or heat transfer to a turbine blade. The sampling error causes the objective function to vary noisily with respect to design parameters for finite time simulations. Furthermore, the noise decays very slowly as computational time increases. Therefore, robustness with noisy objective functions is a crucial prerequisite to optimization candidates for LES. One way of dealing with noisy objective functions is to filter the noise using a surrogate model. Bayesian optimization, which uses Gaussian processes as surrogates, has shown promise in optimizing expensive objective functions. The following talk presents a new approach for optimization with LES incorporating these ideas. Applications to flow control of a turbulent channel and the design of a turbine blade trailing edge are also discussed.

  13. Applying Parallel Processing Techniques to Tether Dynamics Simulation

    NASA Technical Reports Server (NTRS)

    Wells, B. Earl

    1996-01-01

    The focus of this research has been to determine the effectiveness of applying parallel processing techniques to a sizable real-world problem, the simulation of the dynamics associated with a tether which connects two objects in low earth orbit, and to explore the degree to which the parallelization process can be automated through the creation of new software tools. The goal has been to utilize this specific application problem as a base to develop more generally applicable techniques.

  14. Max-plus Algebraic Tools for Discrete Event Systems, Static Analysis, and Zero-Sum Games

    NASA Astrophysics Data System (ADS)

    Gaubert, Stéphane

    The max-plus algebraic approach of timed discrete event systems emerged in the eighties, after the discovery that synchronization phenomena can be modeled in a linear way in the max-plus setting. This led to a number of results, like the determination of long term characteristics (throughput, stationary regime) by spectral theory methods or the representation of the input-output behavior by rational series.
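
    To make the "linear in the max-plus setting" idea concrete, the sketch below assumes a hypothetical two-station timed system whose k-th event times obey x(k+1) = A (x) x(k), where the max-plus product replaces addition with max and multiplication with +. The matrix A and the function names are illustrative assumptions, not taken from the cited work.

```python
def maxplus_matvec(A, x):
    """Max-plus matrix-vector product: result[i] = max_j (A[i][j] + x[j])."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]


def simulate(A, x0, steps):
    """Iterate x(k+1) = A (x) x(k) and return the list of event-time vectors."""
    xs = [x0]
    for _ in range(steps):
        xs.append(maxplus_matvec(A, xs[-1]))
    return xs


# Hypothetical delays: A[i][j] is the time station i must wait after
# station j has fired before station i may fire again.
A = [[3, 7],
     [2, 4]]
trajectory = simulate(A, [0, 0], 6)

# After a transient the event times grow by a fixed amount every two steps;
# the average growth per step estimates the max-plus eigenvalue of A, which
# for this matrix is the maximum cycle mean (7 + 2) / 2.
cycle_time = (trajectory[-1][0] - trajectory[-3][0]) / 2
```

    The reciprocal of cycle_time is the long-term throughput, which is the kind of quantity the spectral theory mentioned above determines without simulation.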

  15. Parallel Simulation of Underdense Plasma Photocathode Experiments

    NASA Astrophysics Data System (ADS)

    Bruhwiler, David; Hidding, Bernhard; Xi, Yunfeng; Andonian, Gerard; Rosenzweig, James; Cormier-Michel, Estelle

    2013-10-01

    The underdense plasma photocathode concept (aka Trojan horse) is a promising approach to achieving fs-scale electron bunches with pC-scale charge and transverse normalized emittance below 0.01 mm-mrad, yielding peak currents of order 100 A and beam brightness as high as 10^19 A/m^2/rad^2, for a wide range of achievable beam energies up to 10 GeV. A proof-of-principle experiment will be conducted at the FACET user facility in early 2014. We present 2D and 3D simulations with physical parameters relevant to the planned experiment. Work supported by DOE under Contract Nos. DE-SC0009533, DE-FG02-07ER46272 and DEFG03-92ER40693, and by ONR under Contract No. N00014-06-1-0925. NERSC computing resources are supported by DOE.

  16. Efficient parallel simulation of CO2 geologic sequestration in saline aquifers

    SciTech Connect

    Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten

    2007-01-01

    An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling single- and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, as well as salt precipitation or dissolution. The new parallel simulator uses MPI for parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations by multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that a linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2.

  17. A network of discrete events for the representation and analysis of diffusion dynamics

    NASA Astrophysics Data System (ADS)

    Pintus, Alberto M.; Pazzona, Federico G.; Demontis, Pierfranco; Suffritti, Giuseppe B.

    2015-11-01

    We developed a coarse-grained description of the phenomenology of diffusive processes, in terms of a space of discrete events and its representation as a network. Once a proper classification of the discrete events underlying the diffusive process is carried out, their transition matrix is calculated on the basis of molecular dynamics data. This matrix can be represented as a directed, weighted network where nodes represent discrete events, and the weight of edges is given by the probability that one follows the other. The structure of this network reflects dynamical properties of the process of interest in such features as its modularity and the entropy rate of nodes. As an example of the applicability of this conceptual framework, we discuss here the physics of diffusion of small non-polar molecules in a microporous material, in terms of the structure of the corresponding network of events, and explain on this basis the diffusivity trends observed. A quantitative account of these trends is obtained by considering the contribution of the various events to the displacement autocorrelation function.
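
    The construction described above (classify events, estimate a transition matrix, read it as a weighted directed network) can be sketched as follows. The event labels and sequence are hypothetical stand-ins for a classified molecular dynamics trajectory, and the per-node Shannon entropy of outgoing probabilities is used here only as a simple proxy; the paper's exact entropy-rate definition may differ.

```python
from collections import Counter, defaultdict
from math import log2


def transition_network(events):
    """Estimate edge weights P(next = b | current = a) from consecutive pairs."""
    counts = defaultdict(Counter)
    for a, b in zip(events, events[1:]):
        counts[a][b] += 1
    return {
        a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
        for a, nxt in counts.items()
    }


def node_entropy(out_probs):
    """Shannon entropy (in bits) of a node's outgoing transition probabilities."""
    return -sum(p * log2(p) for p in out_probs.values() if p > 0)


# Hypothetical, already-classified sequence of discrete events.
events = ["hop", "rattle", "hop", "hop", "rattle", "rattle", "hop", "escape"]
network = transition_network(events)
entropies = {node: node_entropy(out) for node, out in network.items()}
```

    Each key of network is a node, and each inner dictionary lists its outgoing edges with their weights; a node with low entropy is followed by an almost deterministic next event.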

  18. Fault Diagnosis in Discrete-Event Systems with Incomplete Models: Learnability and Diagnosability.

    PubMed

    Kwong, Raymond H; Yonge-Mallo, David L

    2015-07-01

    Most model-based approaches to fault diagnosis of discrete-event systems require a complete and accurate model of the system to be diagnosed. However, the discrete-event model may have arisen from abstraction and simplification of a continuous time system, or through model building from input-output data. As such, it may not capture the dynamic behavior of the system completely. In a previous paper, we addressed the problem of diagnosing faults given an incomplete model of the discrete-event system. We presented the learning diagnoser which not only diagnoses faults, but also attempts to learn missing model information through parsimonious hypothesis generation. In this paper, we study the properties of learnability and diagnosability. Learnability deals with the issue of whether the missing model information can be learned, while diagnosability corresponds to the ability to detect and isolate a fault after it has occurred. We provide conditions under which the learning diagnoser can learn missing model information. We define the notions of weak and strong diagnosability and also give conditions under which they hold. PMID:25204002

  19. Massively parallel switch-level simulation: A feasibility study

    SciTech Connect

    Kravitz, S.A.

    1989-01-01

    This thesis addresses the feasibility of mapping the COSMOS switch-level simulator onto computers with thousands of simple processors. COSMOS preprocesses transistor networks into equivalent Boolean behavioral models, capturing the switch-level behavior of a circuit in a set of Boolean formulas. The author shows that thousand-fold parallelism exists in the formulas derived by COSMOS for some actual circuits. He exposes this parallelism by eliminating the event list from the simulator, and he demonstrates that this represents an attractive tradeoff given sufficient parallelism in the circuit model. To investigate the feasibility of this approach, he has developed a prototype implementation of the COSMOS simulator on a 32k processor Connection Machine.

  20. Xyce Parallel Electronic Simulator : users' guide, version 2.0.

    SciTech Connect

    Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.

    2004-06-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. (4) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (5) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce

  1. A hybrid parallel framework for the cellular Potts model simulations

    SciTech Connect

    Jiang, Yi; He, Kejing; Dong, Shoubin

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, and so cannot be used for large-scale complex 3D simulations. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large-scale simulation (~10^8 sites) of complex collective behavior of numerous cells (~10^6).

  2. Parallel runway requirement analysis study. Volume 2: Simulation manual

    NASA Technical Reports Server (NTRS)

    Ebrahimi, Yaghoob S.; Chun, Ken S.

    1993-01-01

    This document is a user manual for operating the PLAND_BLUNDER (PLB) simulation program. This simulation is based on two aircraft approaching parallel runways independently and using parallel Instrument Landing System (ILS) equipment during Instrument Meteorological Conditions (IMC). If an aircraft should deviate from its assigned localizer course toward the opposite runway, this constitutes a blunder which could endanger the aircraft on the adjacent path. The worst case scenario would be if the blundering aircraft were unable to recover and continue toward the adjacent runway. PLAND_BLUNDER is a Monte Carlo-type simulation which employs the events and aircraft positioning during such a blunder situation. The model simulates two aircraft performing parallel ILS approaches using Instrument Flight Rules (IFR) or visual procedures. PLB uses a simple movement model and control law in three dimensions (X, Y, Z). The parameters of the simulation inputs and outputs are defined in this document along with a sample of the statistical analysis. This document is the second volume of a two volume set. Volume 1 is a description of the application of the PLB to the analysis of close parallel runway operations.

  3. Parallelization of a Monte Carlo particle transport simulation code

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow the study of higher particle energies with the use of more accurate physical models, and improve statistics as more particle tracks can be simulated in low response time.

  4. Efficient parallel CFD-DEM simulations using OpenMP

    NASA Astrophysics Data System (ADS)

    Amritkar, Amit; Deb, Surya; Tafti, Danesh

    2014-01-01

    The paper describes parallelization strategies for the Discrete Element Method (DEM) used for simulating dense particulate systems coupled to Computational Fluid Dynamics (CFD). While the field equations of CFD are best parallelized by spatial domain decomposition techniques, the N-body particulate phase is best parallelized over the number of particles. When the two are coupled together, both modes are needed for efficient parallelization. It is shown that under these requirements, OpenMP thread based parallelization has advantages over MPI processes. Two representative examples, fairly typical of dense fluid-particulate systems, are investigated, including the validation of the DEM-CFD and thermal-DEM implementation with experiments. Fluidized bed calculations are performed on beds with uniform particle loading, parallelized with MPI and OpenMP. It is shown that as the number of processing cores and the number of particles increase, the communication overhead of building ghost particle lists at processor boundaries dominates time to solution, and OpenMP, which does not require this step, is about twice as fast as MPI. In rotary kiln heat transfer calculations, which are characterized by spatially non-uniform particle distributions, the low overhead of switching the parallelization mode in OpenMP eliminates the load imbalances, but introduces increased overheads in fetching non-local data. In spite of this, it is shown that OpenMP is 50-90% faster than MPI.

  5. Parallel PDE-Based Simulations Using the Common Component Architecture

    SciTech Connect

    McInnes, Lois C.; Allan, Benjamin A.; Armstrong, Robert; Benson, Steven J.; Bernholdt, David E.; Dahlgren, Tamara L.; Diachin, Lori; Krishnan, Manoj Kumar; Kohl, James A.; Larson, J. Walter; Lefantzi, Sophia; Nieplocha, Jarek; Norris, Boyana; Parker, Steven G.; Ray, Jaideep; Zhou, Shujia

    2006-03-05

    The complexity of parallel PDE-based simulations continues to increase as multimodel, multiphysics, and multi-institutional projects become widespread. A goal of component-based software engineering in such large-scale simulations is to help manage this complexity by enabling better interoperability among various codes that have been independently developed by different groups. The Common Component Architecture (CCA) Forum is defining a component architecture specification to address the challenges of high-performance scientific computing. In addition, several execution frameworks, supporting infrastructure, and general-purpose components are being developed. Furthermore, this group is collaborating with others in the high-performance computing community to design suites of domain-specific component interface specifications and underlying implementations. This chapter discusses recent work on leveraging these CCA efforts in parallel PDE-based simulations involving accelerator design, climate modeling, combustion, and accidental fires and explosions. We explain how component technology helps to address the different challenges posed by each of these applications, and we highlight how component interfaces built on existing parallel toolkits facilitate the reuse of software for parallel mesh manipulation, discretization, linear algebra, integration, optimization, and parallel data redistribution. We also present performance data to demonstrate the suitability of this approach, and we discuss strategies for applying component technologies to both new and existing applications.

  6. Xyce Parallel Electronic Simulator : reference guide, version 4.1.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-02-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list exhaustively (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.

  7. Xyce parallel electronic simulator reference guide, version 6.1

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-03-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list exhaustively (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].

  8. Xyce Parallel Electronic Simulator : reference guide, version 2.0.

    SciTech Connect

    Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.

    2004-06-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list exhaustively (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

  9. Xyce parallel electronic simulator reference guide, version 6.0.

    SciTech Connect

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David G.

    2013-08-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list exhaustively (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1].

  10. Parallel Computing Environments and Methods for Power Distribution System Simulation

    SciTech Connect

    Lu, Ning; Taylor, Zachary T.; Chassin, David P.; Guttromson, Ross T.; Studham, Scott S.

    2005-11-10

    The development of cost-effective high-performance parallel computing on multi-processor supercomputers makes it attractive to port excessively time-consuming simulation software from personal computers (PCs) to supercomputers. The power distribution system simulator (PDSS) takes a bottom-up approach and simulates load at appliance level, where detailed thermal models for appliances are used. This approach works well for a small power distribution system consisting of a few thousand appliances. When the number of appliances increases, the simulation uses up the PC memory and its run time increases to a point where the approach is no longer feasible for modeling a practical large power distribution system. This paper presents an effort made to port a PC-based power distribution system simulator (PDSS) to a 128-processor shared-memory supercomputer. The paper offers an overview of the parallel computing environment and a description of the modifications made to the PDSS model. The performances of the PDSS running on a standalone PC and on the supercomputer are compared. Future research directions for utilizing parallel computing in power distribution system simulation are also addressed.

  11. Parallel Performance Optimization of the Direct Simulation Monte Carlo Method

    NASA Astrophysics Data System (ADS)

    Gao, Da; Zhang, Chonglin; Schwartzentruber, Thomas

    2009-11-01

    Although the direct simulation Monte Carlo (DSMC) particle method is more computationally intensive compared to continuum methods, it is accurate for conditions ranging from continuum to free-molecular, accurate in highly non-equilibrium flow regions, and holds potential for incorporating advanced molecular-based models for gas-phase and gas-surface interactions. As available computer resources continue their rapid growth, the DSMC method is continually being applied to increasingly complex flow problems. Although processor clock speed continues to increase, a trend of increasing multi-core-per-node parallel architectures is emerging. To effectively utilize such current and future parallel computing systems, a combined shared/distributed memory parallel implementation (using both Open Multi-Processing (OpenMP) and Message Passing Interface (MPI)) of the DSMC method is under development. The parallel implementation of a new state-of-the-art 3D DSMC code employing an embedded 3-level Cartesian mesh will be outlined. The presentation will focus on performance optimization strategies for DSMC, which includes, but is not limited to, modified algorithm designs, practical code-tuning techniques, and parallel performance optimization. Specifically, key issues important to the DSMC shared memory (OpenMP) parallel performance are identified as (1) granularity (2) load balancing (3) locality and (4) synchronization. Challenges and solutions associated with these issues as they pertain to the DSMC method will be discussed.

  12. Molecular simulation of rheological properties using massively parallel supercomputers

    SciTech Connect

    Bhupathiraju, R.K.; Cui, S.T.; Gupta, S.A.; Cummings, P.T.; Cochran, H.D.

    1996-11-01

    Advances in parallel supercomputing now make possible molecular-based engineering and science calculations that will soon revolutionize many technologies, such as those involving polymers and those involving aqueous electrolytes. We have developed a suite of message-passing codes for classical molecular simulation of such complex fluids and amorphous materials and have completed a number of demonstration calculations of problems of scientific and technological importance with each. In this paper, we will focus on the molecular simulation of rheological properties, particularly viscosity, of simple and complex fluids using parallel implementations of non-equilibrium molecular dynamics. Such calculations represent significant challenges computationally because, in order to reduce the thermal noise in the calculated properties within acceptable limits, large systems and/or long simulated times are required.

  13. PRATHAM: Parallel Thermal Hydraulics Simulations using Advanced Mesoscopic Methods

    SciTech Connect

    Joshi, Abhijit S; Jain, Prashant K; Mudrich, Jaime A; Popov, Emilian L

    2012-01-01

    At the Oak Ridge National Laboratory, efforts are under way to develop a 3D, parallel LBM code called PRATHAM (PaRAllel Thermal Hydraulic simulations using Advanced Mesoscopic Methods) to demonstrate the accuracy and scalability of LBM for turbulent flow simulations in nuclear applications. The code has been developed using FORTRAN-90, and parallelized using the message passing interface MPI library. Silo library is used to compact and write the data files, and VisIt visualization software is used to post-process the simulation data in parallel. Both the single relaxation time (SRT) and multi relaxation time (MRT) LBM schemes have been implemented in PRATHAM. To capture turbulence without prohibitively increasing the grid resolution requirements, an LES approach [5] is adopted allowing large scale eddies to be numerically resolved while modeling the smaller (subgrid) eddies. In this work, a Smagorinsky model has been used, which modifies the fluid viscosity by an additional eddy viscosity depending on the magnitude of the rate-of-strain tensor. In LBM, this is achieved by locally varying the relaxation time of the fluid.
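
    The link between eddy viscosity and relaxation time described in this abstract can be illustrated with a minimal sketch. It assumes the standard lattice-unit relation nu = c_s^2 (tau - 1/2) with c_s^2 = 1/3 and a unit time step, and it takes the rate-of-strain tensor as a given input. The function names, the Smagorinsky constant of 0.1, and the filter width are illustrative assumptions, not values taken from PRATHAM.

```python
import math

CS2 = 1.0 / 3.0  # lattice speed of sound squared (assumed standard lattice units)


def strain_rate_magnitude(S):
    """Return |S| = sqrt(2 * S_ij * S_ij) for a 3x3 rate-of-strain tensor."""
    total = sum(S[i][j] * S[i][j] for i in range(3) for j in range(3))
    return math.sqrt(2.0 * total)


def effective_relaxation_time(nu0, S, c_smag=0.1, delta=1.0):
    """Local relaxation time for molecular viscosity nu0 plus a Smagorinsky eddy viscosity.

    c_smag (Smagorinsky constant) and delta (filter width) are hypothetical defaults.
    """
    nu_t = (c_smag * delta) ** 2 * strain_rate_magnitude(S)  # eddy viscosity
    return (nu0 + nu_t) / CS2 + 0.5  # tau from nu = c_s^2 * (tau - 1/2)


if __name__ == "__main__":
    shear = [[0.0, 0.05, 0.0], [0.05, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print(effective_relaxation_time(0.01, shear))
```

    In a production LBM code the strain rate is usually obtained from the non-equilibrium part of the distribution functions rather than passed in explicitly; the sketch only shows how a larger strain rate raises the local relaxation time.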

  14. Parallelization of Program to Optimize Simulated Trajectories (POST3D)

    NASA Technical Reports Server (NTRS)

    Hammond, Dana P.; Korte, John J. (Technical Monitor)

    2001-01-01

    This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) approach on a distributed memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.

  15. Numerical simulation of polymer flows: A parallel computing approach

    SciTech Connect

    Aggarwal, R.; Keunings, R.; Roux, F.X.

    1993-12-31

    We present a parallel algorithm for the numerical simulation of viscoelastic fluids on distributed memory computers. The algorithm has been implemented within a general-purpose commercial finite element package used in polymer processing applications. Results obtained on the Intel iPSC/860 computer demonstrate high parallel efficiency in complex flow problems. However, since the computational load is unknown a priori, load balancing is a challenging issue. We have developed an adaptive allocation strategy which dynamically reallocates the work load to the processors based upon the history of the computational procedure. We compare the results obtained with the adaptive and static scheduling schemes.

  16. Reusable Component Model Development Approach for Parallel and Distributed Simulation

    PubMed Central

    Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng

    2014-01-01

    Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have diverse interfaces, are tightly coupled, and are closely bound to simulation platforms. As a result, they are difficult to reuse across different simulation platforms and applications. To address the problem, this paper first proposes a reusable component model framework. Based on this framework, our reusable model development approach is then elaborated, which contains two phases: (1) domain experts create simulation computational modules observing three principles to achieve their independence; (2) the model developer encapsulates these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that the model developed using our approach has good reusability and is easy to use in different simulation platforms and applications. PMID:24729751

  17. Determining the significance of associations between two series of discrete events: bootstrap methods

    SciTech Connect

    Niehof, Jonathan T.; Morley, Steven K.

    2012-01-01

    We review and develop techniques to determine associations between series of discrete events. The bootstrap, a nonparametric statistical method, allows the determination of the significance of associations with minimal assumptions about the underlying processes. We find the key requirement for this method: one of the series must be widely spaced in time to guarantee the theoretical applicability of the bootstrap. If this condition is met, the calculated significance passes a reasonableness test. We conclude with some potential future extensions and caveats on the applicability of these methods. The techniques presented have been implemented in a Python-based software toolkit.

  18. Stochastic Event Counter for Discrete-Event Systems Under Unreliable Observations

    SciTech Connect

    Tae-Sic Yoo; Humberto E. Garcia

    2008-06-01

    This paper addresses the issues of counting the occurrence of special events in the framework of partially observed discrete-event dynamical systems (DEDS). First, we develop a novel recursive procedure that updates active counter information state sequentially with available observations. In general, the cardinality of active counter information state is unbounded, which makes the exact recursion infeasible computationally. To overcome this difficulty, we develop an approximated recursive procedure that regulates and bounds the size of active counter information state. Using the approximated active counting information state, we give an approximated minimum mean square error (MMSE) counter. The developed algorithms are then applied to count special routing events in a material flow system.

  19. Supervisor Localization: A Top-Down Approach to Distributed Control of Discrete-Event Systems

    NASA Astrophysics Data System (ADS)

    Cai, K.; Wonham, W. M.

    2009-03-01

    A purely distributed control paradigm is proposed for discrete-event systems (DES). In contrast to control by one or more external supervisors, distributed control aims to design built-in strategies for individual agents. First a distributed optimal nonblocking control problem is formulated. To solve it, a top-down localization procedure is developed which systematically decomposes an external supervisor into local controllers while preserving optimality and nonblockingness. An efficient localization algorithm is provided to carry out the computation, and an automated guided vehicles (AGV) example presented for illustration. Finally, the 'easiest' and 'hardest' boundary cases of localization are discussed.

  20. Sequential Window Diagnoser for Discrete-Event Systems Under Unreliable Observations

    SciTech Connect

    Wen-Chiao Lin; Humberto E. Garcia; David Thorsley; Tae-Sic Yoo

    2009-09-01

    This paper addresses the issue of counting the occurrence of special events in the framework of partially observed discrete-event dynamical systems (DEDS). Developed diagnosers referred to as sequential window diagnosers (SWDs) utilize the stochastic diagnoser probability transition matrices developed in [9] along with a resetting mechanism that allows on-line monitoring of special event occurrences. To illustrate their performance, the SWDs are applied to detect and count the occurrence of special events in a particular DEDS. Results show that SWDs are able to accurately track the number of times special events occur.

  1. Xyce Parallel Electronic Simulator Users Guide Version 6.2.

    SciTech Connect

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-09-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks: The information herein is subject to change without notice. Copyright (c) 2002-2014 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright (c) 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are

  2. Parallelizing a DNA simulation code for the Cray MTA-2.

    PubMed

    Bokhari, Shahid H; Glaser, Matthew A; Jordan, Harry F; Lansac, Yves; Sauer, Jon R; Van Zeghbroeck, Bart

    2002-01-01

    The Cray MTA-2 (Multithreaded Architecture) is an unusual parallel supercomputer that promises ease of use and high performance. We describe our experience on the MTA-2 with a molecular dynamics code, SIMU-MD, that we are using to simulate the translocation of DNA through a nanopore in a silicon based ultrafast sequencer. Our sequencer is constructed using standard VLSI technology and consists of a nanopore surrounded by Field Effect Transistors (FETs). We propose to use the FETs to sense variations in charge as a DNA molecule translocates through the pore and thus differentiate between the four building block nucleotides of DNA. We were able to port SIMU-MD, a serial C code, to the MTA with only a modest effort and with good performance. Our porting process needed neither a parallelism support platform nor attention to the intimate details of parallel programming and interprocessor communication, as would have been the case with more conventional supercomputers. PMID:15838145

  3. Casting Pearls Ballistically: Efficient Massively Parallel Simulation of Particle Deposition

    NASA Astrophysics Data System (ADS)

    Lubachevsky, Boris D.; Privman, Vladimir; Roy, Subhas C.

    1996-06-01

    We simulate ballistic particle deposition wherein a large number of spherical particles are "cast" vertically over a planar horizontal surface. Upon first contact (with the surface or with a previously deposited particle) each particle stops. This model helps material scientists to study the adsorption and sediment formation. The model is sequential, with particles deposited one by one. We have found an equivalent formulation using a continuous time random process and we simulate the latter in parallel using a method similar to the one previously employed for simulating Ising spins. We augment the parallel algorithm for simulating Ising spins with several techniques aimed at the increase of efficiency of producing the particle configuration and statistics collection. Some of these techniques are similar to earlier ones. We implement the resulting algorithm on a 16K PE MasPar MP-1 and a 4K PE MasPar MP-2. The parallel code runs on MasPar computers nearly two orders of magnitude faster than an optimized sequential code runs on a fast workstation.

  4. Casting pearls ballistically: Efficient massively parallel simulation of particle deposition

    SciTech Connect

    Lubachevsky, B.D.; Privman, V.; Roy, S.C.

    1996-06-01

    We simulate ballistic particle deposition wherein a large number of spherical particles are "cast" vertically over a planar horizontal surface. Upon first contact (with the surface or with a previously deposited particle) each particle stops. This model helps material scientists to study the adsorption and sediment formation. The model is sequential, with particles deposited one by one. We have found an equivalent formulation using a continuous time random process and we simulate the latter in parallel using a method similar to the one previously employed for simulating Ising spins. We augment the parallel algorithm for simulating Ising spins with several techniques aimed at the increase of efficiency of producing the particle configuration and statistics collection. Some of these techniques are similar to earlier ones. We implement the resulting algorithm on a 16K PE MasPar MP-1 and a 4K PE MasPar MP-2. The parallel code runs on MasPar computers nearly two orders of magnitude faster than an optimized sequential code runs on a fast workstation. 17 refs., 9 figs.

  5. Numerical simulation of supersonic wake flow with parallel computers

    SciTech Connect

    Wong, C.C.; Soetrisno, M.

    1995-07-01

    Simulating a supersonic wake flow field behind a conical body is a computing-intensive task. It requires a large number of computational cells to capture the dominant flow physics and a robust numerical algorithm to obtain a reliable solution. High performance parallel computers with unique distributed processing and data storage capability can meet this need. They have larger computational memory and faster computing time than conventional vector computers. We apply the PINCA Navier-Stokes code to simulate a wind-tunnel supersonic wake experiment on Intel Gamma, Intel Paragon, and IBM SP2 parallel computers. These simulations are performed to study the mean flow in the near wake region of a sharp, 7-degree half-angle, adiabatic cone at Mach number 4.3 and freestream Reynolds number of 40,600. Overall the numerical solutions capture the general features of the hypersonic laminar wake flow and compare favorably with the wind tunnel data. With a refined and clustering grid distribution in the recirculation zone, the calculated location of the rear stagnation point is consistent with the 2D axisymmetric and 3D experiments. In this study, we also demonstrate the importance of having a large local memory capacity within a computer node and the effective utilization of the number of computer nodes to achieve good parallel performance when simulating a complex, large-scale wake flow problem.

  6. Modularized Parallel Neutron Instrument Simulation on the TeraGrid

    SciTech Connect

    Chen, Meili; Cobb, John W; Hagen, Mark E; Miller, Stephen D; Lynch, Vickie E

    2007-01-01

    In order to build a bridge between the TeraGrid (TG), a national scale cyberinfrastructure resource, and neutron science, the Neutron Science TeraGrid Gateway (NSTG) is focused on introducing productive HPC usage to the neutron science community, primarily the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL). Monte Carlo simulations are used as a powerful tool for instrument design and optimization at SNS. One of the successful efforts of a collaboration team composed of NSTG HPC experts and SNS instrument scientists is the development of a software facility named PSoNI, Parallelizing Simulations of Neutron Instruments. Parallelizing the traditional serial instrument simulation on TeraGrid resources, PSoNI quickly computes full instrument simulation at sufficient statistical levels in instrument design. Upon SNS successful commissioning, to the end of 2007, three out of five commissioned instruments in SNS target station will be available for initial users. Advanced instrument study, proposal feasibility evaluation, and experiment planning are on the immediate schedule of SNS, which pose further requirements such as flexibility and high runtime efficiency on fast instrument simulation. PSoNI has been redesigned to meet the new challenges and a preliminary version is developed on TeraGrid. This paper explores the motivation and goals of the new design, and the improved software structure. Further, it describes the realized new features seen from MPI parallelized McStas running high resolution design simulations of the SEQUOIA and BSS instruments at SNS. A discussion regarding future work, which is targeted to do fast simulation for automated experiment adjustment and comparing models to data in analysis, is also presented.

  7. Scalability study of parallel spatial direct numerical simulation code on IBM SP1 parallel supercomputer

    NASA Technical Reports Server (NTRS)

    Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad

    1994-01-01

    The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent transition in three-dimensional boundary-layer flows are computed with the PSDNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.

  8. Parallel algorithms for simulating continuous time Markov chains

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Heidelberger, Philip

    1992-01-01

    We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
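
    For readers unfamiliar with uniformization, the basic idea can be sketched serially: candidate event times are drawn from a Poisson process whose rate bounds every exit rate of the chain, and at each candidate time the chain either jumps or stays put (a pseudo-event). Because the candidate times do not depend on the state, they can be generated in advance, which is the property that makes the technique useful as a basis for synchronization. The sketch below is a generic serial illustration; the function name and the example generator matrix are assumptions and are not taken from the paper.

```python
import random


def simulate_ctmc_uniformized(Q, state, t_end, rng):
    """Simulate a continuous-time Markov chain with generator Q by uniformization.

    Returns a list of (time, state) pairs, starting with (0.0, initial state),
    with one entry per real state change up to time t_end.
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # uniformization rate >= every exit rate
    path = [(0.0, state)]
    if lam <= 0.0:
        return path  # every state is absorbing
    t = 0.0
    while True:
        t += rng.expovariate(lam)  # next candidate event of the rate-lam Poisson process
        if t > t_end:
            break
        # Sample the next state from row `state` of P = I + Q / lam.
        u = rng.random()
        acc = 0.0
        nxt = state
        for j in range(n):
            acc += Q[state][j] / lam + (1.0 if j == state else 0.0)
            if u < acc:
                nxt = j
                break
        if nxt != state:  # otherwise this was a pseudo-event (self-loop)
            state = nxt
            path.append((t, state))
    return path


if __name__ == "__main__":
    Q = [[-2.0, 2.0], [1.0, -1.0]]  # hypothetical two-state generator
    for time, s in simulate_ctmc_uniformized(Q, 0, 5.0, random.Random(1)):
        print(round(time, 3), s)
```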

  9. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    NASA Astrophysics Data System (ADS)

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summary. Program title: SWsolver. Catalogue identifier: AEGY_v1_0. Program summary URL

  10. Xyce Parallel Electronic Simulator - Users' Guide Version 2.1.

    SciTech Connect

    Hutchinson, Scott A; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric; Pawlowski, Roger P.; Fixel, Deborah A; Schiek, Richard; Bogdan, Carolyn W.; Shirley, David N.; Campbell, Phillip M.; Keiter, Eric R.

    2005-06-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an "in-house" capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to

  11. Parallel conjugate gradient algorithms for manipulator dynamic simulation

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Scheld, Robert E.

    1989-01-01

    Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n^2) on a serial processor. Conjugate gradient algorithms are presented that provide greater efficiency by using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log_2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log_2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
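
    The diagonal-preconditioner variant mentioned in this abstract can be illustrated with a minimal serial sketch in pure Python. It is the textbook preconditioned conjugate gradient iteration with the inverse of the matrix diagonal as preconditioner; the function names, tolerance, iteration cap, and example matrix are illustrative assumptions. The paper's parallel formulation and its tridiagonal preconditioner are not reproduced here.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def pcg_diagonal(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A (dense list of lists).

    Uses conjugate gradients preconditioned with M = diag(A).
    Returns (x, number of iterations performed).
    """
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n  # hypothetical safety cap
    m_inv = [1.0 / A[i][i] for i in range(n)]  # inverse of the diagonal preconditioner
    x = [0.0] * n
    r = list(b)  # residual b - A x for the zero initial guess
    z = [m_inv[i] * r[i] for i in range(n)]
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [m_inv[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [1.0, 2.0, 3.0]
    print(pcg_diagonal(A, b))
```

    Each iteration is dominated by one matrix-vector product and a few inner products, which are the operations a parallel implementation would distribute across processors.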

  12. Particle simulation of plasmas on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Gledhill, I. M. A.; Storey, L. R. O.

    1987-01-01

    Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.

  13. Massively Parallel Processing for Fast and Accurate Stamping Simulations

    NASA Astrophysics Data System (ADS)

    Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu

    2005-08-01

    The competitive automotive market drives automotive manufacturers to speed up vehicle development cycles and reduce lead time. Fast tooling development is one of the key areas supporting fast and short vehicle development programs (VDP). In the past ten years, stamping simulation has become the most effective validation tool for predicting and resolving all potential formability and quality problems before the dies are physically made. Stamping simulation and formability analysis have become a critical business segment in the GM math-based die engineering process. As simulation becomes one of the major production tools in the engineering factory, simulation speed and accuracy are two of the most important measures for stamping simulation technology. The speed and time-in-system of forming analysis become even more critical to support the fast VDP and tooling readiness. Since 1997, General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of simulation software for mass production analysis applications. By 2001, this technology had matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DMP/MPP technology, as well as performance benchmarks, are discussed in this publication.

  14. Mapping a battlefield simulation onto message-passing parallel architectures

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1987-01-01

    Perhaps the most critical problem in distributed simulation is that of mapping: without an effective mapping of workload to processors the speedup potential of parallel processing cannot be realized. Mapping a simulation onto a message-passing architecture is especially difficult when the computational workload dynamically changes as a function of time and space; this is exactly the situation faced by battlefield simulations. This paper studies an approach where the simulated battlefield domain is first partitioned into many regions of equal size; typically there are more regions than processors. The regions are then assigned to processors; a processor is responsible for performing all simulation activity associated with the regions. The assignment algorithm is quite simple and attempts to balance load by exploiting locality of workload intensity. The performance of this technique is studied on a simple battlefield simulation implemented on the Flex/32 multiprocessor. Measurements show that the proposed method achieves reasonable processor efficiencies. Furthermore, the method shows promise for use in dynamic remapping of the simulation.
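
    The partition-then-assign idea in this abstract can be illustrated with a small sketch. The abstract does not specify the assignment algorithm, so the sketch substitutes a simple scattered (cyclic) mapping, which sends neighbouring regions to different processors on the assumption that neighbouring regions carry similar workload. The function names, the grid size, and the workload values are all hypothetical.

```python
def scatter_assign(n_rows, n_cols, n_procs):
    """Map each region (row, col) to a processor so that adjacent regions differ.

    This cyclic rule is an illustrative stand-in, not the paper's algorithm.
    """
    return {(r, c): (r + c) % n_procs for r in range(n_rows) for c in range(n_cols)}


def processor_loads(workload, assignment, n_procs):
    """Sum the workload of all regions assigned to each processor."""
    loads = [0.0] * n_procs
    for region, work in workload.items():
        loads[assignment[region]] += work
    return loads


def efficiency(loads):
    """Processor efficiency: mean load divided by the load of the busiest processor."""
    peak = max(loads)
    return sum(loads) / (len(loads) * peak) if peak > 0 else 1.0


if __name__ == "__main__":
    # Hypothetical 4x4 battlefield with a hot spot of activity in one corner.
    workload = {(r, c): (10.0 if r < 2 and c < 2 else 1.0)
                for r in range(4) for c in range(4)}
    loads = processor_loads(workload, scatter_assign(4, 4, 4), 4)
    print(loads, efficiency(loads))
```

    Because the mapping is a plain dictionary, recomputing it as the workload shifts would correspond to the dynamic remapping that the abstract mentions as a possible extension.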

  15. State feedback control of real-time discrete event systems with infinite states

    NASA Astrophysics Data System (ADS)

    Park, Seong-Jin; Cho, Kwang-Hyun

    2015-05-01

    In this paper, we study a state feedback supervisory control of timed discrete event systems (TDESs) with infinite number of states modelled as timed automata. To this end, we represent a timed automaton with infinite number of untimed states (called locations) by a finite set of conditional assignment statements. Predicates and predicate transformers are employed to finitely represent the behaviour and specification of a TDES with infinite number of locations. In addition, the notion of clock regions in timed automata is used to identify the reachable states of a TDES with an infinite time space. For a real-time specification described as a predicate, we present the controllability condition for the existence of a state feedback supervisor that restricts the behaviour of the controlled TDES within the specification.

  16. Decidability for a temporal logic used in discrete-event system analysis

    NASA Technical Reports Server (NTRS)

    Knight, J. F.; Passino, K. M.

    1990-01-01

    The type of plant considered is one that can be modeled by a nondeterministic finite-state machine P. The regulator is a deterministic finite state machine R. The closed-loop system is formed by connecting P and R in a regulator configuration. Formulas in a propositional temporal language are used to describe the behavior of the closed-loop system. It is shown that there is a mechanical procedure which, for a given P and R, and a temporal formula Psi, will determine in a finite number of steps whether or not Psi must be true. This 'decidability' result could be proven using other known results on temporal logic. The proof given here shows that the behavior of the closed-loop system may safely be assumed to be ultimately periodic. The results are illustrated on two discrete-event system examples.

  17. Fault detection and isolation in manufacturing systems with an identified discrete event model

    NASA Astrophysics Data System (ADS)

    Roth, Matthias; Schneider, Stefan; Lesage, Jean-Jacques; Litz, Lothar

    2012-10-01

    In this article a generic method for fault detection and isolation (FDI) in manufacturing systems considered as discrete event systems (DES) is presented. The method uses an identified model of the closed-loop of plant and controller built on the basis of observed fault-free system behaviour. An identification algorithm known from literature is used to determine the fault detection model in form of a non-deterministic automaton. New results of how to parameterise this algorithm are reported. To assess the fault detection capability of an identified automaton, probabilistic measures are proposed. For fault isolation, the concept of residuals adapted for DES is used by defining appropriate set operations representing generic fault symptoms. The method is applied to a case study system.

  18. Exception handling controllers: An application of pushdown systems to discrete event control

    SciTech Connect

    Griffin, Christopher H

    2008-01-01

    Recent work by the author has extended the Supervisory Control Theory to include the class of control languages defined by pushdown machines. A pushdown machine is a finite state machine extended by an infinite stack memory. In this paper, we define a specific type of deterministic pushdown machine that is particularly useful as a discrete event controller. Checking controllability of pushdown machines requires computing the complement of the controller machine. We show that Exception Handling Controllers have the property that algorithms for taking their complements and determining their prefix closures are nearly identical to the algorithms available for finite state machines. Further, they exhibit an important property that makes checking for controllability extremely simple. Hence, they maintain the simplicity of the finite state machine, while providing the extra power associated with a pushdown stack memory. We provide an example of a useful control specification that cannot be implemented using a finite state machine, but can be implemented using an Exception Handling Controller.

  19. A Decision Tool that Combines Discrete Event Software Process Models with System Dynamics Pieces for Software Development Cost Estimation and Analysis

    NASA Technical Reports Server (NTRS)

    Mizell, Carolyn Barrett; Malone, Linda

    2007-01-01

    The development process for a large software development project is very complex and dependent on many variables that are dynamic and interrelated. Factors such as size, productivity and defect injection rates will have substantial impact on the project in terms of cost and schedule. These factors can be affected by the intricacies of the process itself as well as human behavior because the process is very labor intensive. The complex nature of the development process can be investigated with software development process models that utilize discrete event simulation to analyze the effects of process changes. The organizational environment and its effects on the workforce can be analyzed with system dynamics that utilizes continuous simulation. Each has unique strengths and the benefits of both types can be exploited by combining a system dynamics model and a discrete event process model. This paper will demonstrate how the two types of models can be combined to investigate the impacts of human resource interactions on productivity and ultimately on cost and schedule.

  20. Conservative parallel simulation of priority class queueing networks

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1990-01-01

    A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. For, a job in service having low priority can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol was analyzed and it was demonstrated that good performance can be expected on the simulation of large queueing networks.

  1. Conservative parallel simulation of priority class queueing networks

    NASA Technical Reports Server (NTRS)

    Nicol, David

    1992-01-01

    A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. For, a job in service having low priority can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol was analyzed and it was demonstrated that good performance can be expected on the simulation of large queueing networks.

  2. Xyce Parallel Electronic Simulator Users Guide Version 6.4

    SciTech Connect

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory

    2015-12-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (i) capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; (ii) a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; (iii) device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); (iv) object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  3. Numerical Simulation of Flow Field Within Parallel Plate Plastometer

    NASA Technical Reports Server (NTRS)

    Antar, Basil N.

    2002-01-01

    The Parallel Plate Plastometer (PPP) is a device commonly used for measuring the viscosity of high polymers at low rates of shear in the range 10(exp 4) to 10(exp 9) poises. This device is being validated for use in measuring the viscosity of liquid glasses at high temperatures having similar ranges for the viscosity values. The PPP instrument consists of two similar parallel plates, both in the range of 1 inch in diameter, with the upper plate being movable while the lower one is kept stationary. Load is applied to the upper plate by means of a beam connected to a shaft attached to the upper plate. The viscosity of the fluid is deduced from measuring the variation of the plate separation, h, as a function of time when a specified fixed load is applied on the beam. Operating plate speeds measured with the PPP are usually in the range of 10.3 cm/s or lower. The flow field within the PPP can be simulated using the equations of motion of fluid flow for this configuration. With flow speeds in the range quoted above, the flow field between the two plates is certainly incompressible and laminar. Such flows can be easily simulated using numerical modeling with computational fluid dynamics (CFD) codes. We present below the mathematical model used to simulate this flow field and also the solutions obtained for the flow using a commercially available finite element CFD code.

  4. Massively parallel simulations of multiphase flows using Lattice Boltzmann methods

    NASA Astrophysics Data System (ADS)

    Ahrenholz, Benjamin

    2010-03-01

    In the last two decades the lattice Boltzmann method (LBM) has matured as an alternative and efficient numerical scheme for the simulation of fluid flows and transport problems. Unlike conventional numerical schemes based on discretizations of macroscopic continuum equations, the LBM is based on microscopic models and mesoscopic kinetic equations. The fundamental idea of the LBM is to construct simplified kinetic models that incorporate the essential physics of microscopic or mesoscopic processes so that the macroscopic averaged properties obey the desired macroscopic equations. Especially applications involving interfacial dynamics, complex and/or changing boundaries and complicated constitutive relationships which can be derived from a microscopic picture are suitable for the LBM. In this talk a modified and optimized version of a Gunstensen color model is presented to describe the dynamics of the fluid/fluid interface where the flow field is based on a multi-relaxation-time model. Based on that modeling approach validation studies of contact line motion are shown. Due to the fact that the LB method generally needs only nearest neighbor information, the algorithm is an ideal candidate for parallelization. Hence, it is possible to perform efficient simulations in complex geometries at a large scale by massively parallel computations. Here, the results of drainage and imbibition (Degree of Freedom > 2E11) in natural porous media gained from microtomography methods are presented. Those fully resolved pore scale simulations are essential for a better understanding of the physical processes in porous media and therefore important for the determination of constitutive relationships.

  5. High Performance Parallel Methods for Space Weather Simulations

    NASA Technical Reports Server (NTRS)

    Hunter, Paul (Technical Monitor); Gombosi, Tamas I.

    2003-01-01

    This is the final report of our NASA AISRP grant entitled 'High Performance Parallel Methods for Space Weather Simulations'. The main thrust of the proposal was to achieve significant progress towards new high-performance methods which would greatly accelerate global MHD simulations and eventually make it possible to develop first-principles based space weather simulations which run much faster than real time. We are pleased to report that with the help of this award we made major progress in this direction and developed the first parallel implicit global MHD code with adaptive mesh refinement. The main limitation of all earlier global space physics MHD codes was the explicit time stepping algorithm. Explicit time steps are limited by the Courant-Friedrichs-Lewy (CFL) condition, which essentially ensures that no information travels more than a cell size during a time step. This condition represents a non-linear penalty for highly resolved calculations, since finer grid resolution (and consequently smaller computational cells) not only results in more computational cells, but also in smaller time steps.
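
    The non-linear penalty described above can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the report: the function names, the uniform grid, and the single maximum signal speed are assumptions chosen to keep the arithmetic simple.

```python
def cfl_time_step(dx, max_speed, courant=1.0):
    """Largest explicit time step allowed by the CFL condition.

    Information moving at max_speed must not cross more than
    `courant` cell widths (dx) in one step.
    """
    return courant * dx / max_speed


def explicit_cost(domain_length, dx, t_end, max_speed, dims=3):
    """Relative cost of an explicit run: number of cells times number of steps."""
    cells = (domain_length / dx) ** dims
    steps = t_end / cfl_time_step(dx, max_speed)
    return cells * steps


if __name__ == "__main__":
    coarse = explicit_cost(1.0, 1.0, 1.0, 1.0)
    fine = explicit_cost(1.0, 0.5, 1.0, 1.0)
    # Halving the cell size in 3D gives 8x the cells and 2x the steps.
    print(fine / coarse)
```

    In three dimensions the cost therefore grows with the fourth power of the refinement factor, which is the motivation the abstract gives for moving away from explicit time stepping.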

  6. Massively parallel algorithms for trace-driven cache simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.

    1991-01-01

    Trace driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t-th instant, reference x_t is hashed into a set of cache locations, the contents of which are then compared with x_t. If at the t-th instant x_t is not present in the cache, then it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x_t present for the (t+1)-st instant. The problem of parallel simulation of a subtrace of N references directed to a C line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which regardless of the set size C runs in time O(log N) using N processors on the exclusive read, exclusive write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference based line replacement policies is considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that on any trace of length N directed to a C line set runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
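
    For orientation, the problem being parallelized can be stated as a short sequential program. The sketch below is a plain serial baseline for a single cache set under LRU, not the parallel EREW algorithm of the paper; the function name `lru_misses` is hypothetical.

```python
from collections import OrderedDict


def lru_misses(trace, set_size):
    """Return a list of booleans: True where the reference is a miss.

    trace    -- sequence of memory-line identifiers directed to one cache set
    set_size -- number of lines C that the set can hold
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = []
    for line in trace:
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
            misses.append(False)
        else:
            misses.append(True)
            if len(cache) >= set_size:
                cache.popitem(last=False)  # evict the least recently used line
            cache[line] = True
    return misses


if __name__ == "__main__":
    print(lru_misses([1, 2, 1, 3, 2, 4, 1], set_size=2))
```

    This serial loop takes time proportional to N; the abstract's contribution is determining the same miss pattern in O(log N) or O(C log N) parallel time.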

  7. Molecular Dynamics Simulations from SNL's Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS)

    DOE Data Explorer

    Plimpton, Steve; Thompson, Aidan; Crozier, Paul

    LAMMPS (http://lammps.sandia.gov/index.html) stands for Large-scale Atomic/Molecular Massively Parallel Simulator and is a code that can be used to model atoms or, as the LAMMPS website says, as a parallel particle simulator at the atomic, meso, or continuum scale. This Sandia-based website provides a long list of animations from large simulations. These were created using different visualization packages to read LAMMPS output, and each one provides the name of the PI and a brief description of the work done or visualization package used. See also the static images produced from simulations at http://lammps.sandia.gov/pictures.html The foundation paper for LAMMPS is: S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995), but the website also lists other papers describing contributions to LAMMPS over the years.

  8. Parallel Unsteady Turbopump Simulations for Liquid Rocket Engines

    NASA Technical Reports Server (NTRS)

    Kiris, Cetin C.; Kwak, Dochan; Chan, William

    2000-01-01

    This paper reports the progress being made towards complete turbo-pump simulation capability for liquid rocket engines. Space Shuttle Main Engine (SSME) turbo-pump impeller is used as a test case for the performance evaluation of the MPI and hybrid MPI/Open-MP versions of the INS3D code. Then, a computational model of a turbo-pump has been developed for the shuttle upgrade program. Relative motion of the grid system for rotor-stator interaction was obtained by employing overset grid techniques. Time-accuracy of the scheme has been evaluated by using simple test cases. Unsteady computations for SSME turbo-pump, which contains 136 zones with 35 Million grid points, are currently underway on Origin 2000 systems at NASA Ames Research Center. Results from time-accurate simulations with moving boundary capability, and the performance of the parallel versions of the code will be presented in the final paper.

  9. A Generic Scheduling Simulator for High Performance Parallel Computers

    SciTech Connect

    Yoo, B S; Choi, G S; Jette, M A

    2001-08-01

    It is well known that efficient job scheduling plays a crucial role in achieving high system utilization in large-scale high performance computing environments. A good scheduling algorithm should schedule jobs to achieve high system utilization while satisfying various user demands in an equitable fashion. Designing such a scheduling algorithm is a non-trivial task even in a static environment. In practice, the computing environment and workload are constantly changing. There are several reasons for this. First, the computing platforms constantly evolve as the technology advances. For example, the availability of relatively powerful commodity off-the-shelf (COTS) components at steadily diminishing prices has made it feasible to construct ever larger massively parallel computers in recent years [1, 4]. Second, the workload imposed on the system also changes constantly. The rapidly increasing compute resources have provided many applications developers with the opportunity to radically alter program characteristics and take advantage of these additional resources. New developments in software technology may also trigger changes in user applications. Finally, political climate change may alter user priorities or the mission of the organization. System designers in such dynamic environments must be able to accurately forecast the effect of changes in the hardware, software, and/or policies under consideration. If the environmental changes are significant, one must also reassess scheduling algorithms. Simulation has frequently been relied upon for this analysis, because other methods such as analytical modeling or actual measurements are usually too difficult or costly. A drawback of the simulation approach, however, is that developing a simulator is a time-consuming process. Furthermore, an existing simulator cannot be easily adapted to a new environment. In this research, we attempt to develop a generic job-scheduling simulator, which facilitates the evaluation of

  10. Massively Parallel Simulations of Diffusion in Dense Polymeric Structures

    SciTech Connect

    Faulon, Jean-Loup; Wilcox, R.T.; Hobbs, J.D.; Ford, D.M.

    1997-11-01

    An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases is studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon (1840 processors). Compared to the current state-of-the-art equilibration methods, this new technique appears to be faster by some orders of magnitude. The main advantage of the technique is that one can circumvent the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in the center of a regular tetrahedron and that regular tetrahedrons can be used to mesh the three-dimensional space. Thus, the problem of polymer equilibration described by continuous equations in molecular dynamics is reduced to a discrete problem where solutions are approximated by simple algorithms. Practical modeling applications include the construction of butyl rubber and ethylene-propylene-diene-monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally on both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. In order to better understand the diffusion phenomena, 10,000-atom models were generated and equilibrated for butyl and EPDM. The models were submitted to a massively parallel molecular dynamics simulation to monitor the trajectories of the diffusing species.

  11. Roadmap for efficient parallelization of breast anatomy simulation

    NASA Astrophysics Data System (ADS)

    Chui, Joseph H.; Pokrajac, David D.; Maidment, Andrew D. A.; Bakic, Predrag R.

    2012-03-01

    A roadmap has been proposed to optimize the simulation of breast anatomy by parallel implementation, in order to reduce the time needed to generate software breast phantoms. The rapid generation of high resolution phantoms is needed to support virtual clinical trials of breast imaging systems. We have recently developed an octree-based recursive partitioning algorithm for breast anatomy simulation. The algorithm has good asymptotic complexity; however, its current MATLAB implementation cannot provide optimal execution times. The proposed roadmap for efficient parallelization includes the following steps: (i) migrate the current code to a C/C++ platform and optimize it for single-threaded implementation; (ii) modify the code to allow for multi-threaded CPU implementation; (iii) identify and migrate the code to a platform designed for multithreaded GPU implementation. In this paper, we describe our results in optimizing the C/C++ code for single-threaded and multi-threaded CPU implementations. As the first step of the proposed roadmap we have identified a bottleneck component in the MATLAB implementation using MATLAB's profiling tool, and created a single threaded CPU implementation of the algorithm using C/C++'s overloaded operators and standard template library. The C/C++ implementation has been compared to the MATLAB version in terms of accuracy and simulation time. A 520-fold reduction of the execution time was observed in a test of phantoms with 50-400 μm voxels. In addition, we have identified several places in the code which will be modified to allow for the next roadmap milestone of the multithreaded CPU implementation.

  12. Parallel Molecular Dynamics Stencil : a new parallel computing environment for a large-scale molecular dynamics simulation of solids

    NASA Astrophysics Data System (ADS)

    Shimizu, Futoshi; Kimizuka, Hajime; Kaburaki, Hideo

    2002-08-01

    A new parallel computing environment, called "Parallel Molecular Dynamics Stencil", has been developed to carry out large-scale short-range molecular dynamics simulations of solids. The stencil is written in C using MPI for parallelization and is designed to separate and conceal the parts of the programs describing cutoff schemes and parallel algorithms for data communication. This has been made possible by introducing the concept of image atoms. Therefore, only sequential programming of the force calculation routine is required for executing the stencil in a parallel environment. Typical molecular dynamics routines, such as various ensembles, time integration methods, and empirical potentials, have been implemented in the stencil. In the presentation, the performance of the stencil on parallel computers from Hitachi, IBM, and SGI and on a PC cluster, using Lennard-Jones and EAM type potential models for a fracture problem, will be reported.

  13. Parallel grid library for rapid and flexible simulation development

    NASA Astrophysics Data System (ADS)

    Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.

    2013-04-01

    We present an easy to use and flexible grid library for developing highly scalable parallel simulations. The distributed cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously allowing one to overlap computation and communication. This enables excellent scalability at least up to 32 k cores in magnetohydrodynamic tests depending on the problem and hardware. In the version of dccrg presented here part of the mesh metadata is replicated between MPI processes reducing the scalability of adaptive mesh refinement (AMR) to between 200 and 600 processes. Dccrg is free software that anyone can use, study and modify and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg. Catalogue identifier: AEOM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: GNU Lesser General Public License version 3 No. of lines in distributed program, including test data, etc.: 54975 No. of bytes in distributed program, including test data, etc.: 974015 Distribution format: tar.gz Programming language: C++. Computer: PC, cluster, supercomputer. Operating system: POSIX. The code has been parallelized using MPI and tested with 1-32768 processes RAM: 10 MB-10 GB per process Classification: 4.12, 4.14, 6.5, 19.3, 19.10, 20. External routines: MPI-2 [1], boost [2], Zoltan [3], sfc++ [4] Nature of problem: Grid library supporting arbitrary data in grid cells, parallel adaptive mesh refinement, transparent remote neighbor data updates and

  14. A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Rao, Hariprasad Nannapaneni

    1989-01-01

    The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
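
    The rollback idea behind Virtual Time can be illustrated with a toy logical process. The sketch below is a simplified illustration under stated assumptions, not the RSIM-based algorithm of the thesis: the class `LogicalProcess`, its integer state, and the order-dependent update rule are hypothetical, and anti-messages for cancelling already-sent output are omitted.

```python
import bisect


class LogicalProcess:
    """Toy optimistic simulator: process events eagerly, roll back on stragglers."""

    def __init__(self):
        self.lvt = 0.0        # local virtual time
        self.state = 0        # simulated state (order-dependent on events)
        self.processed = []   # (timestamp, value) pairs in timestamp order
        self.snapshots = []   # state saved just before each processed event
        self.rollbacks = 0

    def _apply(self, ts, value):
        self.snapshots.append(self.state)      # state saving
        self.state = self.state * 2 + value    # hypothetical event handler
        self.processed.append((ts, value))
        self.lvt = ts

    def receive(self, ts, value):
        if ts >= self.lvt:
            self._apply(ts, value)              # in order: process optimistically
            return
        # Straggler: restore the state saved before the first event that is
        # later than the straggler, then re-execute in timestamp order.
        self.rollbacks += 1
        k = bisect.bisect_right([t for t, _ in self.processed], ts)
        undone = self.processed[k:]
        self.state = self.snapshots[k]
        del self.processed[k:]
        del self.snapshots[k:]
        self._apply(ts, value)
        for t, v in undone:
            self._apply(t, v)


if __name__ == "__main__":
    lp = LogicalProcess()
    for ts, value in [(1.0, 1), (3.0, 1), (2.0, 0)]:  # last event arrives late
        lp.receive(ts, value)
    print(lp.state, lp.rollbacks)
```

    After the late event is absorbed, the final state matches what a strictly timestamp-ordered execution would produce, which is the correctness property rollback is meant to preserve.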

  15. Sensor Configuration Selection for Discrete-Event Systems under Unreliable Observations

    SciTech Connect

    Wen-Chiao Lin; Tae-Sic Yoo; Humberto E. Garcia

    2010-08-01

    Algorithms for counting the occurrences of special events in the framework of partially-observed discrete event dynamical systems (DEDS) were developed in previous work. Their performance typically improves as the sensors providing the observations become more costly or increase in number. This paper addresses the problem of finding a sensor configuration that achieves an optimal balance between cost and the performance of the special event counting algorithm, while satisfying given observability requirements and constraints. Since this problem is generally computationally hard in the framework considered, a sensor optimization algorithm is developed using two greedy heuristics, one myopic and the other based on projected performances of candidate sensors. The two heuristics are executed sequentially in order to find the best sensor configurations. The developed algorithm is then applied to a sensor optimization problem for a multiunit-operation system. Results show that improved sensor configurations can be found that may significantly reduce the sensor configuration cost but still yield acceptable performance for counting the occurrences of special events.

  16. Parallel finite element simulation of large ram-air parachutes

    NASA Astrophysics Data System (ADS)

    Kalro, V.; Aliabadi, S.; Garrard, W.; Tezduyar, T.; Mittal, S.; Stein, K.

    1997-06-01

    In the near future, large ram-air parachutes are expected to provide the capability of delivering 21 ton payloads from altitudes as high as 25,000 ft. In development and test and evaluation of these parachutes the size of the parachute needed and the deployment stages involved make high-performance computing (HPC) simulations a desirable alternative to costly airdrop tests. Although computational simulations based on realistic, 3D, time-dependent models will continue to be a major computational challenge, advanced finite element simulation techniques recently developed for this purpose and the execution of these techniques on HPC platforms are significant steps in the direction to meet this challenge. In this paper, two approaches for analysis of the inflation and gliding of ram-air parachutes are presented. In one of the approaches the point mass flight mechanics equations are solved with the time-varying drag and lift areas obtained from empirical data. This approach is limited to parachutes with similar configurations to those for which data are available. The other approach is 3D finite element computations based on the Navier-Stokes equations governing the airflow around the parachute canopy and Newton's law of motion governing the 3D dynamics of the canopy, with the forces acting on the canopy calculated from the simulated flow field. At the earlier stages of canopy inflation the parachute is modelled as an expanding box, whereas at the later stages, as it expands, the box transforms to a parafoil and glides. These finite element computations are carried out on the massively parallel supercomputers CRAY T3D and Thinking Machines CM-5, typically with millions of coupled, non-linear finite element equations solved simultaneously at every time step or pseudo-time step of the simulation.

  17. MPSim: A Massively Parallel General Simulation Program for Materials

    NASA Astrophysics Data System (ADS)

    Iotov, Mihail; Gao, Guanghua; Vaidehi, Nagarajan; Cagin, Tahir; Goddard, William A., III

    1997-08-01

    In this talk, we describe a general purpose Massively Parallel Simulation (MPSim) program used for computational materials science and life sciences. We will also present scaling aspects of the program along with several case studies. The program incorporates the highly efficient CMM method to accurately calculate the interactions. For studying bulk materials, the program uses the Reduced CMM to account for infinite range sums. The software embodies various advanced molecular dynamics algorithms and energy and structure optimization techniques, with a set of analysis tools suitable for large scale structures. The applications using the program range over amorphous polymers, liquid-polymer interfaces, large viruses, million atom clusters, surfaces, and gas diffusion in polymers. The program was originally developed on KSR in an object oriented fashion and was ported to SGI-PC and HP-Exemplar. The message passing version was originally implemented on Intel Paragon using NX, then MPI, and later tested on Cray T3D and IBM SP2 platforms.

  18. Parallelizing N-Body Simulations on a Heterogeneous Cluster

    NASA Astrophysics Data System (ADS)

    Stenborg, T. N.

    2009-10-01

    This thesis evaluates quantitatively the effectiveness of a new technique for parallelising direct gravitational N-body simulations on a heterogeneous computing cluster. In addition to being an investigation into how a specific computational physics task can be optimally load balanced across the heterogeneity factors of a distributed computing cluster, it is also, more generally, a case study in effective heterogeneous parallelisation of an all-pairs programming task. If high-performance computing clusters are not designed to be heterogeneous initially, they tend to become so over time as new nodes are added, or existing nodes are replaced or upgraded. As a result, effective techniques for application parallelisation on heterogeneous clusters are needed if maximum cluster utilisation is to be achieved, and developing them is an active area of research. A custom C/MPI parallel particle-particle N-body simulator was developed, validated and deployed for this evaluation. Simulation communication proceeds over cluster nodes arranged in a logical ring and employs nonblocking message passing to encourage overlap of communication with computation. Redundant calculations arising from force symmetry given by Newton's third law are removed by combining chordal data transfer of accumulated forces with ring passing data transfer. Heterogeneity in node computation speed is addressed by decomposing system data across nodes in proportion to node computation speed, in conjunction with use of evenly sized communication buffers. This scheme is shown experimentally to have some potential in improving simulation performance in comparison with an even decomposition of data across nodes. Techniques for further heterogeneous cluster load balancing are discussed and remain an opportunity for further work.
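
    The speed-proportional decomposition described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's C/MPI code: the function name, the relative-speed inputs, and the largest-remainder rounding rule are all assumptions made for illustration.

```python
def proportional_partition(n_particles, node_speeds):
    """Split n_particles across nodes in proportion to relative node speed.

    Hypothetical helper: the rounding rule (largest fractional part first)
    is an assumption, not something stated in the abstract.
    """
    total_speed = sum(node_speeds)
    # Ideal, generally fractional, share of particles for each node.
    ideal = [n_particles * s / total_speed for s in node_speeds]
    counts = [int(x) for x in ideal]
    # Give the particles lost to truncation to the nodes whose ideal
    # share had the largest fractional part.
    remainder = n_particles - sum(counts)
    order = sorted(range(len(ideal)),
                   key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # A node twice as fast as its neighbours gets twice the particles.
    print(proportional_partition(1000, [1.0, 2.0, 1.0]))
```

    An even decomposition is the special case in which all entries of node_speeds are equal.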

  19. Parallel continuous simulated tempering and its applications in large-scale molecular simulations

    SciTech Connect

    Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng

    2014-07-28

    In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on a two-dimensional Ising model, a Lennard-Jones liquid, and an all-atom folding simulation of a small globular protein, trp-cage, in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transitions and dynamics of macromolecules in explicit solvent.
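
    For contrast with the claim that the PCST exchange rate is independent of total potential energy, the sketch below shows the standard Metropolis swap criterion of conventional parallel tempering, where the energy difference enters the exponent directly. This is the textbook PT rule, not the PCST exchange rule, which the abstract does not give. The function name and the numbers in the usage lines are illustrative assumptions.

```python
import math


def pt_swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Acceptance probability for swapping two replicas in conventional PT.

    Replica i currently sits at inverse temperature beta_i with potential
    energy energy_i; likewise for replica j. Standard Metropolis rule:
    p = min(1, exp((beta_i - beta_j) * (energy_i - energy_j))).
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


if __name__ == "__main__":
    # Same temperature pair, but the energy gap grows with system size
    # (illustrative numbers): the swap probability drops sharply.
    for gap in (2.0, 20.0, 200.0):
        print(gap, pt_swap_probability(1.0, 0.9, -gap, 0.0))
```

    Because typical energy gaps between neighbouring replicas grow with system size, conventional PT needs more closely spaced replicas for larger systems, which is the cost the abstract says PCST avoids.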

  20. Parallel continuous simulated tempering and its applications in large-scale molecular simulations

    PubMed Central

    Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng

    2014-01-01

    In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on a two-dimensional Ising model, a Lennard-Jones liquid, and an all-atom folding simulation of a small globular protein, trp-cage, in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transitions and dynamics of macromolecules in explicit solvent. PMID:25084887

  1. Parallel continuous simulated tempering and its applications in large-scale molecular simulations

    NASA Astrophysics Data System (ADS)

    Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng

    2014-07-01

    In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2-3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on a two-dimensional Ising model, a Lennard-Jones liquid, and an all-atom folding simulation of a small globular protein, trp-cage, in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transitions and dynamics of macromolecules in explicit solvent.

  2. Investigation of reflective notching with massively parallel simulation

    NASA Astrophysics Data System (ADS)

    Tadros, Karim H.; Neureuther, Andrew R.; Gamelin, John K.; Guerrieri, Roberto

    1990-06-01

    A massively parallel simulation program, TEMPEST, is used to investigate the role of topography in generating reflective notching and to study the possibility of reducing effects through the introduction of special properties of resists and antireflection coating materials. The emphasis is on examining physical scattering mechanisms such as focused specular reflections, resist thickness interference effects, reflections from substrate grains, and focusing of incident light by the resist curvature. Specular reflection from topography can focus incident radiation, causing a 10-fold increase in effective exposure. Further complications such as dimples in the surface of positive resist features can result from a second reflection of focused energy by the resist/air interface. Variations in line-edge exposure due to substrate grain structure are primarily specular in nature and can become significant for grains larger than λ_resist. Local exposure variations due to vertical standing waves and changes in energy coupling due to changes in resist thickness are displaced laterally and are significant effects even though they are slightly less severe than vertical wave propagation theory suggests. Focusing effects due to refraction by the curved surface of the resist produce only minor changes in exposure. Increased resist contrast and resist absorption offer some improvement in reducing notching effects, though minimizing substrate reflectivity is more effective. CPU time using 32 virtual nodes to simulate a 4 μm by 2 μm isolated domain with 13 bleaching steps was 30 minutes.

  3. Humans can integrate feedback of discrete events in their sensorimotor control of a robotic hand

    PubMed Central

    Segil, Jacob L.; Clemente, Francesco; Weir, Richard F. ff; Edin, Benoni

    2015-01-01

    Providing functionally effective sensory feedback to users of prosthetics is a largely unsolved challenge. Traditional solutions require high band-widths for providing feedback for the control of manipulation and yet have been largely unsuccessful. In this study, we have explored a strategy that relies on temporally discrete sensory feedback that is technically simple to provide. According to the Discrete Event-driven Sensory feedback Control (DESC) policy, motor tasks in humans are organized in phases delimited by means of sensory encoded discrete mechanical events. To explore the applicability of DESC for control, we designed a paradigm in which healthy humans operated an artificial robot hand to lift and replace an instrumented object, a task that can readily be learned and mastered under visual control. Assuming that the central nervous system of humans naturally organizes motor tasks based on a strategy akin to DESC, we delivered short-lasting vibrotactile feedback related to events that are known to forcefully affect progression of the grasp-lift-and-hold task. After training, we determined whether the artificial feedback had been integrated with the sensorimotor control by introducing short delays and we indeed observed that the participants significantly delayed subsequent phases of the task. This study thus gives support to the DESC policy hypothesis. Moreover, it demonstrates that humans can integrate temporally discrete sensory feedback while controlling an artificial hand and invites further studies in which inexpensive, noninvasive technology could be used in clever ways to provide physiologically appropriate sensory feedback in upper limb prosthetics with much lower band-width requirements than with traditional solutions. PMID:24992899

  4. Humans can integrate feedback of discrete events in their sensorimotor control of a robotic hand.

    PubMed

    Cipriani, Christian; Segil, Jacob L; Clemente, Francesco; ff Weir, Richard F; Edin, Benoni

    2014-11-01

    Providing functionally effective sensory feedback to users of prosthetics is a largely unsolved challenge. Traditional solutions require high band-widths for providing feedback for the control of manipulation and yet have been largely unsuccessful. In this study, we have explored a strategy that relies on temporally discrete sensory feedback that is technically simple to provide. According to the Discrete Event-driven Sensory feedback Control (DESC) policy, motor tasks in humans are organized in phases delimited by means of sensory encoded discrete mechanical events. To explore the applicability of DESC for control, we designed a paradigm in which healthy humans operated an artificial robot hand to lift and replace an instrumented object, a task that can readily be learned and mastered under visual control. Assuming that the central nervous system of humans naturally organizes motor tasks based on a strategy akin to DESC, we delivered short-lasting vibrotactile feedback related to events that are known to forcefully affect progression of the grasp-lift-and-hold task. After training, we determined whether the artificial feedback had been integrated with the sensorimotor control by introducing short delays and we indeed observed that the participants significantly delayed subsequent phases of the task. This study thus gives support to the DESC policy hypothesis. Moreover, it demonstrates that humans can integrate temporally discrete sensory feedback while controlling an artificial hand and invites further studies in which inexpensive, noninvasive technology could be used in clever ways to provide physiologically appropriate sensory feedback in upper limb prosthetics with much lower band-width requirements than with traditional solutions. PMID:24992899

  5. Xyce Parallel Electronic Simulator Reference Guide Version 6.4

    SciTech Connect

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory

    2015-12-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1]. Trademarks: The information herein is subject to change without notice. Copyright © 2002-2015 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59. All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts: Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)

  6. Parallel implementation of the particle simulation method with dynamic load balancing: Toward realistic geodynamical simulation

    NASA Astrophysics Data System (ADS)

    Furuichi, M.; Nishiura, D.

    2015-12-01

    Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and the Discrete Element Method (DEM) have been widely used to solve continuum and particle motions in the computational geodynamics field. These mesh-free methods are suitable for problems with complex geometries and boundaries. In addition, their Lagrangian nature allows non-diffusive advection, which is useful for tracking history-dependent properties (e.g. rheology) of the material. These potential advantages over mesh-based methods offer effective numerical applications to geophysical flows and tectonic processes, for example tsunamis with free surfaces and floating bodies, magma intrusion with rock fracture, and shear zone pattern generation in granular deformation. To investigate such geodynamical problems with particle-based methods, millions to billions of particles are required for realistic simulation. Parallel computing is therefore important for handling such a huge computational cost. An efficient parallel implementation of the SPH and DEM methods is, however, known to be difficult, especially on distributed-memory architectures. Lagrangian methods inherently suffer from workload imbalance when parallelized with a fixed domain in space, because particles move around and workloads change during the simulation. Dynamic load balancing is therefore a key technique for performing large-scale SPH and DEM simulations. In this work, we present a parallel implementation technique for the SPH and DEM methods that uses dynamic load balancing algorithms, aimed at high-resolution simulation over large domains on massively parallel supercomputer systems. Our method uses the imbalance in the execution time of each MPI process as the nonlinear term of the parallel domain decomposition and minimizes it with a Newton-like iteration method. To perform flexible domain decomposition in space, the slice-grid algorithm is used. Numerical tests show that our

  7. Rasterizing geological models for parallel finite difference simulation using seismic simulation as an example

    NASA Astrophysics Data System (ADS)

    Zehner, Björn; Hellwig, Olaf; Linke, Maik; Görz, Ines; Buske, Stefan

    2016-01-01

    3D geological underground models are often presented by vector data, such as triangulated networks representing boundaries of geological bodies and geological structures. Since models are to be used for numerical simulations based on the finite difference method, they have to be converted into a representation discretizing the full volume of the model into hexahedral cells. Often the simulations require a high grid resolution and are done using parallel computing. The storage of such a high-resolution raster model would require a large amount of storage space and it is difficult to create such a model using the standard geomodelling packages. Since the raster representation is only required for the calculation, but not for the geometry description, we present an algorithm and concept for rasterizing geological models on the fly for the use in finite difference codes that are parallelized by domain decomposition. As a proof of concept we implemented a rasterizer library and integrated it into seismic simulation software that is run as parallel code on a UNIX cluster using the Message Passing Interface. We can thus run the simulation with realistic and complicated surface-based geological models that are created using 3D geomodelling software, instead of using a simplified representation of the geological subsurface using mathematical functions or geometric primitives. We tested this set-up using an example model that we provide along with the implemented library.

  8. Particle/Continuum Hybrid Simulation in a Parallel Computing Environment

    NASA Technical Reports Server (NTRS)

    Baganoff, Donald

    1996-01-01

    The objective of this study was to modify an existing parallel particle code based on the direct simulation Monte Carlo (DSMC) method to include a Navier-Stokes (NS) calculation so that a hybrid solution could be developed. In carrying out this work, it was determined that the following five issues had to be addressed before extensive program development of a three dimensional capability was pursued: (1) find a set of one-sided kinetic fluxes that are fully compatible with the DSMC method, (2) develop a finite volume scheme to make use of these one-sided kinetic fluxes, (3) make use of the one-sided kinetic fluxes together with DSMC type boundary conditions at a material surface so that velocity slip and temperature slip arise naturally for near-continuum conditions, (4) find a suitable sampling scheme so that the values of the one-sided fluxes predicted by the NS solution at an interface between the two domains can be converted into the correct distribution of particles to be introduced into the DSMC domain, (5) carry out a suitable number of tests to confirm that the developed concepts are valid, individually and in concert for a hybrid scheme.

  9. Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    NASA Technical Reports Server (NTRS)

    Hsieh, Shang-Hsien

    1993-01-01

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

  10. A natural partitioning scheme for parallel simulation of multibody systems

    NASA Technical Reports Server (NTRS)

    Chiou, J. C.; Park, K. C.; Farhat, C.

    1993-01-01

    A parallel partitioning scheme based on physical-coordinate variables is presented to systematically eliminate system constraint forces and yield the equations of motion of multibody dynamics systems in terms of their independent coordinates. Key features of the present scheme include an explicit determination of the independent coordinates, a parallel construction of the null space matrix of the constraint Jacobian matrix, an easy incorporation of the previously developed two-stage staggered solution procedure and a Schur complement based parallel preconditioned conjugate gradient numerical algorithm.

  11. A natural partitioning scheme for parallel simulation of multibody systems

    NASA Technical Reports Server (NTRS)

    Chiou, J. C.; Park, K. C.; Farhat, C.

    1991-01-01

    A parallel partitioning scheme based on physical-coordinate variables is presented to systematically eliminate system constraint forces and yield the equations of motion of multibody dynamics systems in terms of their independent coordinates. Key features of the present scheme include an explicit determination of the independent coordinates, a parallel construction of the null space matrix of the constraint Jacobian matrix, an easy incorporation of the previously developed two-stage staggered solution procedure, and a Schur complement based parallel preconditioned conjugate gradient numerical algorithm.

  12. Scalable Parallel Formulations of the Barnes-Hut Method for n-Body Simulations

    NASA Astrophysics Data System (ADS)

    Grama, Ananth Y.; Kumar, Vipin; Sameh, Ahmed

    In this paper, we present two new parallel formulations of the Barnes-Hut method. These parallel formulations are especially suited for simulations with irregular particle densities. We first present a parallel formulation that uses a static partitioning of the domain and assignment of subdomains to processors. We demonstrate that this scheme delivers acceptable load balance, and coupled with two collective communication operations, it yields good performance. We present a second parallel formulation which combines static decomposition of the domain with an assignment of subdomains to processors based on Morton ordering. This alleviates the load imbalance inherent in the first scheme. The second parallel formulation is inspired by the two best currently known parallel algorithms for the Barnes-Hut method. We present an experimental evaluation of these schemes on a 256-processor nCUBE2 parallel computer for an astrophysical simulation.
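
    The Morton-ordering assignment mentioned in this abstract can be sketched in a few lines of Python. The sketch below interleaves the bits of 2D subdomain indices to get a Morton (Z-order) key, then gives each processor a contiguous run of subdomains in that order. The 2D setting, the function names, and the equal-count split are assumptions for illustration. The abstract does not say how the paper weights subdomains by load.

```python
def morton2d(ix, iy, bits=16):
    """Interleave the bits of (ix, iy) into a single Morton (Z-order) key."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)      # x bits go to even positions
        code |= ((iy >> b) & 1) << (2 * b + 1)  # y bits go to odd positions
    return code


def assign_by_morton(cells, n_procs):
    """Give each processor a contiguous run of cells in Morton order.

    Hypothetical equal-count split; a load-aware variant would cut the
    ordered list by accumulated work instead of by cell count.
    """
    ordered = sorted(cells, key=lambda c: morton2d(c[0], c[1]))
    chunk = -(-len(ordered) // n_procs)  # ceiling division
    return [ordered[p * chunk:(p + 1) * chunk] for p in range(n_procs)]


if __name__ == "__main__":
    grid = [(x, y) for x in range(4) for y in range(4)]
    for rank, owned in enumerate(assign_by_morton(grid, 4)):
        print(rank, owned)
```

    Cells that are adjacent in Morton order tend to be close in space, so each processor receives a fairly compact group of subdomains.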

  13. Parallel climate model (PCM) control and transient simulations

    NASA Astrophysics Data System (ADS)

    Washington, W. M.; Weatherly, J. W.; Meehl, G. A.; Semtner, A. J., Jr.; Bettge, T. W.; Craig, A. P.; Strand, W. G., Jr.; Arblaster, J.; Wayland, V. B.; James, R.; Zhang, Y.

    The Department of Energy (DOE) supported Parallel Climate Model (PCM) makes use of the NCAR Community Climate Model (CCM3) and Land Surface Model (LSM) for the atmospheric and land surface components, respectively, the DOE Los Alamos National Laboratory Parallel Ocean Program (POP) for the ocean component, and the Naval Postgraduate School sea-ice model. The PCM executes on several distributed and shared memory computer systems. The coupling method is similar to that used in the NCAR Climate System Model (CSM) in that a flux coupler ties the components together, with interpolations between the different grids of the component models. Flux adjustments are not used in the PCM. The ocean component has 2/3° average horizontal grid spacing with 32 vertical levels and a free surface that allows calculation of sea level changes. Near the equator, the grid spacing is approximately 1/2° in latitude to better capture the ocean equatorial dynamics. The North Pole is rotated over northern North America, thus producing resolution smaller than 2/3° in the North Atlantic where the sinking part of the world conveyor circulation largely takes place. Because this ocean model component does not have a computational point at the North Pole, the Arctic Ocean circulation systems are more realistic and similar to the observed. The elastic viscous plastic sea ice model has a grid spacing of 27 km to represent small-scale features such as ice transport through the Canadian Archipelago and the East Greenland current region. Results from a 300-year present-day coupled climate control simulation are presented, as well as for a transient 1% per year compound CO2 increase experiment which shows a global warming of 1.27°C for a 10-year average at the doubling point of CO2 and 2.89°C at the quadrupling point. There is a gradual warming beyond the doubling and quadrupling points with CO2 held constant. Globally averaged sea level rise at the time of CO2 doubling is approximately 7 cm and at the
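
    As a worked aside on the transient experiment: with a 1% per year compound increase, the time for CO2 to reach a given multiple follows directly from compound growth. The short Python sketch below does that arithmetic. The function name is hypothetical and the calculation is ours, not a figure from the abstract.

```python
import math


def years_to_multiply(factor, annual_rate=0.01):
    """Years of compound growth at annual_rate needed to multiply by factor.

    Solves (1 + annual_rate) ** t == factor for t.
    """
    return math.log(factor) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    print("doubling:    %.1f years" % years_to_multiply(2))
    print("quadrupling: %.1f years" % years_to_multiply(4))
```

    Under this assumption the doubling point falls at roughly year 70 of the transient run and the quadrupling point at roughly year 139.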

  14. Parallel Vehicular Traffic Simulation using Reverse Computation-based Optimistic Execution

    SciTech Connect

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2008-01-01

    Vehicular traffic simulations are useful in applications such as emergency management and homeland security planning tools. High speed of traffic simulations translates directly to speed of response and level of resilience in those applications. Here, a parallel traffic simulation approach is presented that is aimed at reducing the time for simulating emergency vehicular traffic scenarios. Three unique aspects of this effort are: (1) exploration of optimistic simulation applied to vehicular traffic simulation; (2) addressing reverse computation challenges specific to optimistic vehicular traffic simulation; and (3) achieving absolute (as opposed to self-relative) speedup with a sequential speed equal to that of a fast, de facto standard sequential simulator for emergency traffic. The design and development of the parallel simulation system is presented, along with a performance study that demonstrates excellent sequential performance as well as parallel performance.
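
    Reverse computation, as used in optimistic simulation, pairs each forward event handler with a handler that undoes it, so a rollback does not need a full state snapshot. The Python sketch below illustrates the idea on a toy road segment. The class, its fields, and the capacity rule are hypothetical and are not taken from the simulator described in this record.

```python
class RoadSegment:
    """Toy logical process: a road segment with limited capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.vehicles = 0
        self.rejected = 0

    def forward_arrive(self, event):
        # Record which branch was taken: one bit is all the reverse
        # handler needs to undo this event.
        event["accepted"] = self.vehicles < self.capacity
        if event["accepted"]:
            self.vehicles += 1
        else:
            self.rejected += 1

    def reverse_arrive(self, event):
        # Undo the forward handler by applying the inverse operation
        # on the same branch.
        if event["accepted"]:
            self.vehicles -= 1
        else:
            self.rejected -= 1


if __name__ == "__main__":
    seg = RoadSegment(capacity=1)
    first, second = {}, {}
    seg.forward_arrive(first)
    seg.forward_arrive(second)
    # Rollback: reverse events in the opposite order they were processed.
    seg.reverse_arrive(second)
    seg.reverse_arrive(first)
    print(seg.vehicles, seg.rejected)
```

    Events must be reversed in the opposite order to the one in which they were processed, as the usage lines do.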

  15. Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

    SciTech Connect

    Jin, Shuangshuang; Chen, Yousu; Wu, Di; Diao, Ruisheng; Huang, Zhenyu

    2015-12-09

    Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.

  16. Efficient solid state NMR powder simulations using SMP and MPP parallel computation.

    PubMed

    Kristensen, Jørgen Holm; Farnan, Ian

    2003-04-01

    Methods for parallel simulation of solid state NMR powder spectra are presented for both shared and distributed memory parallel supercomputers. For shared memory architectures the performance of simulation programs implementing the OpenMP application programming interface is evaluated. It is demonstrated that the design of correct and efficient shared memory parallel programs is difficult as the performance depends on data locality and cache memory effects. The distributed memory parallel programming model is examined for simulation programs using the MPI message passing interface. The results reveal that both shared and distributed memory parallel computation are very efficient with an almost perfect application speedup and may be applied to the most advanced powder simulations. PMID:12713968

  17. Empirical development of parallelization guidelines for time-driven simulation. Master's thesis

    SciTech Connect

    Huson, M.L.

    1989-12-01

    Distributed simulation is an area of research which offers great promise for speeding up simulations. Program parallelization is usually an iterative process requiring several attempts to produce an efficient parallel implementation of a sequential program. This is due to the lack of any standards or guidelines for program parallelization. In this research effort a Ballistic Missile Defense (BMD) time-driven simulation program, developed by DESE Research and Engineering, was used as a test vehicle for investigating parallelization options for distributed and shared memory architectures. Implementations were developed to address issues of functional versus data program decomposition, computation versus communications overhead, and shared versus distributed memory architectures. Performance data collected from each implementation was used to develop guidelines for implementing parallel versions of sequential time-driven simulations. These guidelines were based on the relative performance of the various implementations and on general observations made during the course of the research.

  18. Parallel direct numerical simulation of three-dimensional spray formation

    NASA Astrophysics Data System (ADS)

    Chergui, Jalel; Juric, Damir; Shin, Seungwon; Kahouadji, Lyes; Matar, Omar

    2015-11-01

    We present numerical results for the breakup mechanism of a liquid jet surrounded by a fast coaxial flow of air with density ratio (water/air) ~ 1000 and kinematic viscosity ratio ~ 60. We use code BLUE, a three-dimensional, two-phase, high performance, parallel numerical code based on a hybrid Front-Tracking/Level Set algorithm for Lagrangian tracking of arbitrarily deformable phase interfaces and a precise treatment of surface tension forces. The parallelization of the code is based on the technique of domain decomposition where the velocity field is solved by a parallel GMRes method for the viscous terms and the pressure by a parallel multigrid/GMRes method. Communication is handled by MPI message passing procedures. The interface method is also parallelized and defines the interface both by a discontinuous density field as well as by a triangular Lagrangian mesh and allows the interface to undergo large deformations including the rupture and/or coalescence of interfaces. EPSRC Programme Grant, MEMPHIS, EP/K0039761/1.

  19. A sweep algorithm for massively parallel simulation of circuit-switched networks

    NASA Technical Reports Server (NTRS)

    Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.

    1992-01-01

    A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.

  20. ANNarchy: a code generation approach to neural simulations on parallel hardware

    PubMed Central

    Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which makes it easy to define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  1. ANNarchy: a code generation approach to neural simulations on parallel hardware.

    PubMed

    Vitay, Julien; Dinkelbach, Helge Ü; Hamker, Fred H

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which makes it easy to define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  2. Towards parallel I/O in finite element simulations

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel; Pramono, Eddy; Felippa, Carlos

    1989-01-01

    I/O issues in finite element analysis on parallel processors are addressed. Viable solutions for both local and shared memory multiprocessors are presented. The approach is simple but limited by currently available hardware and software systems. Implementation is carried out on a CRAY-2 system. Performance results are reported.

  3. Comparison of serial and parallel simulations of a corridor fire using FDS

    NASA Astrophysics Data System (ADS)

    Valasek, L.

    2015-09-01

    Current fire simulators make it possible to model the course of a fire in large areas and its impact on structures and equipment. This paper compares serial and parallel calculations of a corridor fire simulation using the FDS (Fire Dynamics Simulator) system. In the parallel case, the whole computational domain is divided into several computational meshes; the computation on each mesh is treated as a single MPI (Message Passing Interface) process running on one computational core, and communication between the processes is provided by MPI. The aim of this paper is to determine the size of the error caused by parallelization of the computation, which arises where computational meshes touch.

  4. Parallel computing in enterprise modeling.

    SciTech Connect

    Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

    2008-08-01

    This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language, which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.

  5. Modelling and simulation of parallel triangular triple quantum dots (TTQD) by using SIMON 2.0

    NASA Astrophysics Data System (ADS)

    Fathany, Maulana Yusuf; Fuada, Syifaul; Lawu, Braham Lawas; Sulthoni, Muhammad Amin

    2016-04-01

    This research presents an analysis of the modeling of Parallel Triple Quantum Dots (TQD) using SIMON (SIMulation Of Nano-structures). The Single Electron Transistor (SET) is used as the basic concept of the modeling. We design the Parallel TQD structure from metal material with a triangular geometry model, called Triangular Triple Quantum Dots (TTQD). We simulate it in several scenarios using different parameters, such as different capacitance values, various gate voltages, and different thermal conditions.

  6. Parallel Adaptive Multi-Mechanics Simulations using Diablo

    SciTech Connect

    Parsons, D; Solberg, J

    2004-12-03

    Coupled multi-mechanics simulations (such as thermal-stress and fluid-structure interaction problems) are of substantial interest to engineering analysts. In addition, adaptive mesh refinement techniques present an attractive alternative to current mesh generation procedures and provide quantitative error bounds that can be used for model verification. This paper discusses spatially adaptive multi-mechanics implicit simulations using the Diablo computer code. (U)

  7. A high resolution finite volume method for efficient parallel simulation of casting processes on unstructured meshes

    SciTech Connect

    Kothe, D.B.; Turner, J.A.; Mosso, S.J.; Ferrell, R.C.

    1997-03-01

    We discuss selected aspects of a new parallel three-dimensional (3-D) computational tool for the unstructured mesh simulation of Los Alamos National Laboratory (LANL) casting processes. This tool, known as Telluride, draws upon robust, high resolution finite volume solutions of metal alloy mass, momentum, and enthalpy conservation equations to model the filling, cooling, and solidification of LANL castings. We briefly describe the current Telluride physical models and solution methods, then detail our parallelization strategy as implemented with Fortran 90 (F90). This strategy has yielded straightforward and efficient parallelization on distributed and shared memory architectures, aided in large part by new parallel libraries JTpack90 for Krylov-subspace iterative solution methods and PGSLib for efficient gather/scatter operations. We illustrate our methodology and current capabilities with source code examples and parallel efficiency results for a LANL casting simulation.

  8. Large Eddy simulation of parallel blade-vortex interaction

    NASA Astrophysics Data System (ADS)

    Felten, Frederic; Lund, Thomas

    2002-11-01

    Helicopter Blade-Vortex Interaction (BVI) generally occurs under certain conditions of powered descent or during extreme maneuvering. The vibration and acoustic problems associated with the interaction of rotor tip vortices and the following blades are a major aerodynamic concern for the helicopter community. Numerous experimental and computational studies have been done over the last two decades in order to gain a better understanding of the physical mechanisms involved in BVI. The most severe interaction, in terms of generated noise, happens when the vortex filament is parallel to the blade, thus affecting a great portion of it. The majority of the previous numerical studies of parallel BVI fall within a potential flow framework. Some Navier-Stokes approaches using dissipative numerical methods and RANS-type turbulence models have also been attempted, but with limited success. The current investigation makes use of an incompressible, non-dissipative, kinetic energy conserving collocated mesh scheme in conjunction with a dynamic subgrid-scale model. The concentrated tip vortex is not attenuated as it is convected downstream and over a NACA-0012 airfoil. The lift, drag, moment and pressure coefficients induced by the passage of the vortex are monitored in time and compared with experimental data.

  9. Parallelized modelling and solution scheme for hierarchically scaled simulations

    NASA Technical Reports Server (NTRS)

    Padovan, Joe

    1995-01-01

    This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers is capable of solving are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that, by combining several of these techniques, a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.

  10. Xyce parallel electronic simulator users' guide, Version 6.0.1.

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.

    2014-01-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase (a message-passing parallel implementation), which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  11. Xyce parallel electronic simulator users guide, version 6.1

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-03-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase (a message-passing parallel implementation), which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  12. Pelegant : a parallel accelerator simulation code for electron generation and tracking.

    SciTech Connect

    Wang, Y.; Borland, M. D.; Accelerator Systems Division

    2006-01-01

    elegant is a general-purpose code for electron accelerator simulation that has a worldwide user base. Recently, many of the time-intensive elements were parallelized using MPI. Development has used modest Linux clusters and the BlueGene/L supercomputer at Argonne National Laboratory. This has provided very good performance for some practical simulations, such as multiparticle tracking with synchrotron radiation and emittance blow-up in the vertical rf kick scheme. The effort began with development of a concept that allowed for gradual parallelization of the code, using the existing beamline-element classification table in elegant. This was crucial as it allowed parallelization without major changes in code structure and without major conflicts with the ongoing evolution of elegant. Because of rounding error and finite machine precision, validating a parallel program against a uniprocessor program with the requirement of bitwise identical results is notoriously difficult. We will report validating simulation results of parallel elegant against those of serial elegant by applying Kahan's algorithm to improve accuracy dramatically for both versions. The quality of random numbers in a parallel implementation is very important for some simulations. Some practical experience with generating parallel random numbers by offsetting the seed of each random sequence according to the processor ID will be reported.
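
    The abstract above mentions applying Kahan's algorithm so that serial and parallel results can be compared despite rounding error. As an illustration only, a minimal Python sketch of Kahan compensated summation follows. It is not taken from elegant; the function names and the `chunked_sum` helper (which imitates a parallel reduction over partial sums) are assumptions made for this example.

```python
def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan compensated summation: carry the rounding error of each add."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - comp            # re-inject the previously lost bits
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost (algebraically zero)
        total = t
    return total


def chunked_sum(values, n_chunks, summer):
    """Hypothetical helper: sum each chunk, then sum the partial results,
    imitating the order of operations in a parallel reduction."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    partials = [summer(values[i:i + size]) for i in range(0, len(values), size)]
    return summer(partials)


if __name__ == "__main__":
    data = [1.0] + [1e-16] * 10
    print("naive:", naive_sum(data))
    print("kahan:", kahan_sum(data))
    print("kahan, 4 chunks:", chunked_sum([0.1] * 1000, 4, kahan_sum))
```

    With plain accumulation the small terms in `data` are rounded away one at a time, while the compensated version retains their combined contribution. Compensation reduces the dependence of the result on summation order but does not by itself guarantee bitwise-identical serial and parallel results.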

  13. Molecular Dynamic Simulations of Nanostructured Ceramic Materials on Parallel Computers

    SciTech Connect

    Vashishta, Priya; Kalia, Rajiv

    2005-02-24

    Large-scale molecular-dynamics (MD) simulations have been performed to gain insight into: (1) sintering, structure, and mechanical behavior of nanophase SiC and SiO2; (2) effects of dynamic charge transfers on the sintering of nanophase TiO2; (3) high-pressure structural transformation in bulk SiC and GaAs nanocrystals; (4) nanoindentation in Si3N4; and (5) lattice mismatched InAs/GaAs nanomesas. In addition, we have designed a multiscale simulation approach that seamlessly embeds MD and quantum-mechanical (QM) simulations in a continuum simulation. The above research activities have involved strong interactions with researchers at various universities, government laboratories, and industries. 33 papers have been published and 22 talks have been given based on the work described in this report.

  14. A parallel implementation of the Cellular Potts Model for simulation of cell-based morphogenesis

    PubMed Central

    Chen, Nan; Glazier, James A.; Izaguirre, Jesús A.; Alber, Mark S.

    2007-01-01

    The Cellular Potts Model (CPM) has been used in a wide variety of biological simulations. However, most current CPM implementations use a sequential modified Metropolis algorithm which restricts the size of simulations. In this paper we present a parallel CPM algorithm for simulations of morphogenesis, which includes cell–cell adhesion, a cell volume constraint, and cell haptotaxis. The algorithm uses appropriate data structures and checkerboard subgrids for parallelization. Communication and updating algorithms synchronize properties of cells simulated on different processor nodes. Tests show that the parallel algorithm has good scalability, permitting large-scale simulations of cell morphogenesis (10^7 or more cells) and broadening the scope of CPM applications. The new algorithm satisfies the balance condition, which is sufficient for convergence of the underlying Markov chain. PMID:18084624

  15. Virtual reality visualization of parallel molecular dynamics simulation

    SciTech Connect

    Disz, T.; Papka, M.; Stevens, R.; Pellegrino, M.; Taylor, V.

    1995-12-31

    When performing communications mapping experiments for massively parallel processors, it is important to be able to visualize the mappings and resulting communications. In a molecular dynamics model, visualization of the atom to atom interaction and the processor mappings provides insight into the effectiveness of the communications algorithms. The basic quantities available for visualization in a model of this type are the number of molecules per unit volume, the mass, and velocity of each molecule. The computational information available for visualization is the atom to atom interaction within each time step, the atom to processor mapping, and the energy rescaling events. We use the CAVE (CAVE Automatic Virtual Environment) to provide interactive, immersive visualization experiences.

  16. Toward parallel, adaptive mesh refinement for chemically reacting flow simulations

    SciTech Connect

    Devine, K.D.; Shadid, J.N.; Salinger, A.G.; Hutchinson, S.A.; Hennigan, G.L.

    1997-12-01

    Adaptive numerical methods offer greater efficiency than traditional numerical methods by concentrating computational effort in regions of the problem domain where the solution is difficult to obtain. In this paper, the authors describe progress toward adding mesh refinement to MPSalsa, a computer program developed at Sandia National Laboratories to solve coupled three-dimensional fluid flow and detailed reaction chemistry systems for modeling chemically reacting flow on large-scale parallel computers. Data structures that support refinement and dynamic load-balancing are discussed. Results using uniform refinement with mesh sequencing to improve convergence to steady-state solutions are also presented. Three examples are presented: a lid driven cavity, a thermal convection flow, and a tilted chemical vapor deposition reactor.

  17. Parallel performance optimizations on unstructured mesh-based simulations

    SciTech Connect

    Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid

    2015-06-01

    This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.

  18. An Optimization Algorithm for Multipath Parallel Allocation for Service Resource in the Simulation Task Workflow

    PubMed Central

    Zhang, Hongjun; Zhang, Rui; Li, Yong; Zhang, Xuliang

    2014-01-01

    Service oriented modeling and simulation are hot issues in the field of modeling and simulation, and there is need to call service resources when simulation task workflow is running. How to optimize the service resource allocation to ensure that the task is complete effectively is an important issue in this area. In military modeling and simulation field, it is important to improve the probability of success and timeliness in simulation task workflow. Therefore, this paper proposes an optimization algorithm for multipath service resource parallel allocation, in which multipath service resource parallel allocation model is built and multiple chains coding scheme quantum optimization algorithm is used for optimization and solution. The multiple chains coding scheme quantum optimization algorithm is to extend parallel search space to improve search efficiency. Through the simulation experiment, this paper investigates the effect for the probability of success in simulation task workflow from different optimization algorithm, service allocation strategy, and path number, and the simulation result shows that the optimization algorithm for multipath service resource parallel allocation is an effective method to improve the probability of success and timeliness in simulation task workflow. PMID:24963506

  19. Massively Parallel Reactive and Quantum Molecular Dynamics Simulations

    NASA Astrophysics Data System (ADS)

    Vashishta, Priya

    2015-03-01

    In this talk I will discuss two simulations. Cavitation bubbles readily occur in fluids subjected to rapid changes in pressure. We use billion-atom reactive molecular dynamics simulations on a 163,840-processor BlueGene/P supercomputer to investigate chemical and mechanical damage caused by shock-induced collapse of nanobubbles in water near a silica surface. Collapse of an empty nanobubble generates a high-speed nanojet, resulting in the formation of a pit on the surface. The gas-filled bubbles undergo partial collapse and consequently the damage on the silica surface is mitigated. Quantum molecular dynamics (QMD) simulations are performed on a 786,432-processor Blue Gene/Q to study on-demand production of hydrogen gas from water using Al nanoclusters. QMD simulations reveal rapid hydrogen production from water by an Al nanocluster. We find a low activation-barrier mechanism, in which a pair of Lewis acid and base sites on the Al_n surface preferentially catalyzes hydrogen production. I will also discuss on-demand production of hydrogen gas from water using LiAl alloy particles. Research reported in this lecture was carried out in collaboration with Rajiv Kalia, Aiichiro Nakano and Ken-ichi Nomura from the University of Southern California, and Fuyuki Shimojo and Kohei Shimamura from Kumamoto University, Japan.

  20. Parallel performance optimizations on unstructured mesh-based simulations

    DOE PAGESBeta

    Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid

    2015-06-01

    This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.

  1. Exploiting quantum parallelism to simulate quantum random many-body systems.

    PubMed

    Paredes, B; Verstraete, F; Cirac, J I

    2005-09-30

    We present an algorithm that exploits quantum parallelism to simulate randomness in a quantum system. In our scheme, all possible realizations of the random parameters are encoded quantum mechanically in a superposition state of an auxiliary system. We show how our algorithm allows for the efficient simulation of dynamics of quantum random spin chains with known numerical methods. We propose an experimental realization based on atoms in optical lattices in which disorder could be simulated in parallel and in a controlled way through the interaction with another atomic species. PMID:16241634

  2. Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

    SciTech Connect

    Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor

    2011-09-06

    We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.

  3. Parallel simulation of subsonic fluid dynamics on a cluster of workstations

    NASA Astrophysics Data System (ADS)

    Skordos, Panayotis A.

    1994-11-01

    An effective approach to simulating fluid dynamics on a cluster of non-dedicated workstations is presented. The approach uses local interaction algorithms, small communication capacity, and automatic migration of parallel processes from busy hosts to free hosts. The approach is well-suited for simulating subsonic flow problems which involve both hydrodynamics and acoustic waves, for example, the flow of air inside wind musical instruments. Typical simulations achieve 80% parallel efficiency (speedup/processors) using 20 HP-Apollo workstations. Detailed measurements of the parallel efficiency of 2D and 3D simulations are presented, and a theoretical model of efficiency is developed which closely fits the measurements. Two numerical methods of fluid dynamics are tested: explicit finite differences, and the lattice Boltzmann method.
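
    The abstract defines parallel efficiency as speedup divided by the number of processors. The short Python sketch below works through that definition; the run times are hypothetical values chosen so that the result matches the 80% on 20 workstations quoted above, and are not measurements from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency as defined in the abstract: speedup / processors."""
    return speedup(t_serial, t_parallel) / n_procs


if __name__ == "__main__":
    # Hypothetical timings (seconds), not taken from the paper.
    t1, t20, procs = 160.0, 10.0, 20
    print("speedup:", speedup(t1, t20))
    print("efficiency:", parallel_efficiency(t1, t20, procs))
```

    Read the other way, 80% efficiency on 20 workstations corresponds to a speedup of 0.8 × 20 = 16 over a single workstation.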

  4. Characterization of parallel-hole collimator using Monte Carlo Simulation

    PubMed Central

    Pandey, Anil Kumar; Sharma, Sanjay Kumar; Karunanithi, Sellam; Kumar, Praveen; Bal, Chandrasekhar; Kumar, Rakesh

    2015-01-01

    Objective: Accuracy of in vivo activity quantification improves after the correction of penetrated and scattered photons. However, accurate assessment is not possible with physical experiment. We have used Monte Carlo Simulation to accurately assess the contribution of penetrated and scattered photons in the photopeak window. Materials and Methods: Simulations were performed with Simulation of Imaging Nuclear Detectors Monte Carlo Code. The simulations were set up so that each one provides geometric, penetration, and scatter components and writes binary images to a data file. These components were analyzed graphically using Microsoft Excel (Microsoft Corporation, USA). Each binary image was imported into software (ImageJ), and a logarithmic transformation was applied for visual assessment of image quality, plotting the profile across the center of the images, and calculating full width at half maximum (FWHM) in horizontal and vertical directions. Results: The geometric, penetration, and scatter components at 140 keV for the low-energy general-purpose collimator were 93.20%, 4.13%, and 2.67%, respectively. Similarly, the geometric, penetration, and scatter components at 140 keV for the low-energy high-resolution (LEHR), medium-energy general-purpose (MEGP), and high-energy general-purpose (HEGP) collimators were (94.06%, 3.39%, 2.55%), (96.42%, 1.52%, 2.06%), and (96.70%, 1.45%, 1.85%), respectively. For the MEGP collimator at 245 keV and the HEGP collimator at 364 keV, the components were (89.10%, 7.08%, 3.82%) and (67.78%, 18.63%, 13.59%), respectively. Conclusion: Low-energy general-purpose and LEHR collimators are best for imaging 140 keV photons. HEGP can be used for 245 keV and 364 keV; however, correction for penetration and scatter must be applied if one wishes to quantify the in vivo activity at 364 keV. Due to heavy penetration and scattering, 511 keV photons should not be imaged with the HEGP collimator. PMID:25829730

  5. Partitioning and packing mathematical simulation models for calculation on parallel computers

    NASA Technical Reports Server (NTRS)

    Arpasi, D. J.; Milner, E. J.

    1986-01-01

    The development of multiprocessor simulations from a serial set of ordinary differential equations describing a physical system is described. Degrees of parallelism (i.e., coupling between the equations) and their impact on parallel processing are discussed. The problem of identifying computational parallelism within sets of closely coupled equations that require the exchange of current values of variables is described. A technique is presented for identifying this parallelism and for partitioning the equations for parallel solution on a multiprocessor. An algorithm which packs the equations into a minimum number of processors is also described. The results of the packing algorithm when applied to a turbojet engine model are presented in terms of processor utilization.
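
    The packing step can be sketched as a bin-packing problem. The abstract does not give the authors' algorithm, so the first-fit-decreasing heuristic below is a stand-in, and it does not guarantee the true minimum number of processors. The equation names, integer cost units, and `capacity` (the compute budget per processor per update) are hypothetical.

```python
def pack_first_fit_decreasing(costs, capacity):
    """Pack equations onto processors without exceeding `capacity` on any of them.

    costs: dict mapping an equation name to its compute cost (hypothetical units).
    Returns a list of {"load": total_cost, "tasks": [names]} entries, one per processor.
    """
    if any(cost > capacity for cost in costs.values()):
        raise ValueError("an equation exceeds the per-processor capacity")
    processors = []
    # Place the most expensive equations first; each goes to the first processor with room.
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        for proc in processors:
            if proc["load"] + cost <= capacity:
                proc["load"] += cost
                proc["tasks"].append(name)
                break
        else:
            processors.append({"load": cost, "tasks": [name]})
    return processors


def utilization(processors, capacity):
    """Fraction of the capacity used on each processor."""
    return [proc["load"] / capacity for proc in processors]
```

    Reporting `utilization` per processor mirrors the way the abstract presents its results, although the numbers from this sketch say nothing about the turbojet model itself.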

  6. Developing a real-time emulation of multiresolutional control architectures for complex, discrete-event systems

    SciTech Connect

    Davis, W.J.; Macro, J.G.; Brook, A.L.

    1996-12-31

    This paper first discusses an object-oriented, control architecture and then applies the architecture to produce a real-time software emulator for the Rapid Acquisition of Manufactured Parts (RAMP) flexible manufacturing system (FMS). In specifying the control architecture, the coordinated object is first defined as the primary modeling element. These coordinated objects are then integrated into a Recursive, Object-Oriented Coordination Hierarchy. A new simulation methodology, the Hierarchical Object-Oriented Programmable Logic Simulator, is then employed to model the interactions among the coordinated objects. The final step in implementing the emulator is to distribute the models of the coordinated objects over a network of computers and to synchronize their operation to a real-time clock. The paper then introduces the Hierarchical Subsystem Controller as an intelligent controller for the coordinated object. The proposed approach to intelligent control is then compared to the concept of multiresolutional semiosis that has been developed by Dr. Alex Meystel. Finally, the plans for implementing an intelligent controller for the RAMP FMS are discussed.

  7. Parallel FEM Simulation of Electromechanics in the Heart

    NASA Astrophysics Data System (ADS)

    Xia, Henian; Wong, Kwai; Zhao, Xiaopeng

    2011-11-01

    Cardiovascular disease is the leading cause of death in America. Computer simulation of complicated dynamics of the heart could provide valuable quantitative guidance for diagnosis and treatment of heart problems. In this paper, we present an integrated numerical model which encompasses the interaction of cardiac electrophysiology, electromechanics, and mechanoelectrical feedback. The model is solved by finite element method on a Linux cluster and the Cray XT5 supercomputer, kraken. Dynamical influences between the effects of electromechanics coupling and mechanic-electric feedback are shown.

  8. LARGE-SCALE SIMULATION OF BEAM DYNAMICS IN HIGH INTENSITY ION LINACS USING PARALLEL SUPERCOMPUTERS

    SciTech Connect

    R. RYNE; J. QIANG

    2000-08-01

    In this paper we present results of using parallel supercomputers to simulate beam dynamics in next-generation high intensity ion linacs. Our approach uses a three-dimensional space charge calculation with six types of boundary conditions. The simulations use a hybrid approach involving transfer maps to treat externally applied fields (including rf cavities) and parallel particle-in-cell techniques to treat the space-charge fields. The large-scale simulation results presented here represent a three-order-of-magnitude improvement in simulation capability, in terms of problem size and speed of execution, compared with typical two-dimensional serial simulations. Specific examples will be presented, including simulation of the spallation neutron source (SNS) linac and the Low Energy Demonstrator Accelerator (LEDA) beam halo experiment.

  9. A parallel algorithm for transient solid dynamics simulations with contact detection

    SciTech Connect

    Attaway, S.; Hendrickson, B.; Plimpton, S.; Gardner, D.; Vaughan, C.; Heinstein, M.; Peery, J.

    1996-06-01

    Solid dynamics simulations with Lagrangian finite elements are used to model a wide variety of problems, such as the calculation of impact damage to shipping containers for nuclear waste and the analysis of vehicular crashes. Using parallel computers for these simulations has been hindered by the difficulty of searching efficiently for material surface contacts in parallel. A new parallel algorithm for calculation of arbitrary material contacts in finite element simulations has been developed and implemented in the PRONTO3D transient solid dynamics code. This paper will explore some of the issues involved in developing efficient, portable, parallel finite element models for nonlinear transient solid dynamics simulations. The contact-detection problem poses interesting challenges for efficient implementation of a solid dynamics simulation on a parallel computer. The finite element mesh is typically partitioned so that each processor owns a localized region of the finite element mesh. This mesh partitioning is optimal for the finite element portion of the calculation since each processor must communicate only with the few connected neighboring processors that share boundaries with the decomposed mesh. However, contacts can occur between surfaces that may be owned by any two arbitrary processors. Hence, a global search across all processors is required at every time step to search for these contacts. Load-imbalance can become a problem since the finite element decomposition divides the volumetric mesh evenly across processors but typically leaves the surface elements unevenly distributed. In practice, these complications have been limiting factors in the performance and scalability of transient solid dynamics on massively parallel computers. In this paper the authors present a new parallel algorithm for contact detection that overcomes many of these limitations.

  10. Monte Carlo simulations of converging laser beam propagating in turbid media with parallel computing

    NASA Astrophysics Data System (ADS)

    Wu, Di; Lu, Jun Q.; Hu, Xin H.; Zhao, S. S.

    1999-11-01

    Due to its flexibility and simplicity, the Monte Carlo method is often used to study light propagation in turbid media, where the photons are treated like classical particles being scattered and absorbed randomly based on radiative transfer theory. However, because a large number of photons is needed to produce statistically significant results, this type of calculation requires large computing resources. To overcome this difficulty, we implemented parallel computing techniques in our Monte Carlo simulations. The algorithm is based on the fact that the classical particles are uncorrelated, so the trajectories of multiple photons can be tracked simultaneously. When a beam of focused light is incident on the medium, the incident photons are divided into groups according to the available processes on a parallel machine and the calculations are carried out in parallel. Using PVM (Parallel Virtual Machine, a parallel computing software package), parallel programs in both C and FORTRAN were developed on the massively parallel Cray T3E computer at the North Carolina Supercomputer Center and on a local PC-cluster network running UNIX/Sun Solaris. The parallel performance of our codes has been excellent on both the Cray T3E and the PC clusters. In this paper, we present results on a focused laser beam propagating through a highly scattering, diluted solution of intralipid. The dependence of the spatial distribution of light near the focal point on the concentration of the intralipid solution is studied and its significance is discussed.
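
    The grouping of uncorrelated photons can be sketched as follows. This is not the authors' PVM code: the toy photon model (scatter with probability `albedo`, otherwise absorb), the function names, and the per-group seeding are all assumptions. A thread pool is used only to show the structure; in CPython it gives little real speed-up, and a process pool or message-passing library would be the practical choice.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def split_photons(n_photons, n_workers):
    """Divide photons into groups whose sizes differ by at most one."""
    base, extra = divmod(n_photons, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


def simulate_group(n_photons, seed, albedo=0.9, max_steps=1000):
    """Toy model: count scattering events until each photon is absorbed."""
    rng = random.Random(seed)  # each group gets its own random stream
    total_scatters = 0
    for _ in range(n_photons):
        for _ in range(max_steps):
            if rng.random() > albedo:  # photon absorbed, trajectory ends
                break
            total_scatters += 1
    return total_scatters


def run_parallel(n_photons, n_workers, base_seed=0):
    """Track the groups concurrently and return the mean scatters per photon."""
    groups = split_photons(n_photons, n_workers)
    seeds = [base_seed + i for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(simulate_group, groups, seeds))
    return sum(partials) / n_photons
```

    Because the groups share no state, the only communication is the final sum, which is why this class of calculation tends to parallelize well.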

  11. Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations

    NASA Astrophysics Data System (ADS)

    Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.

    2016-07-01

    Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.

  12. Scalable simulations for directed self-assembly patterning with the use of GPU parallel computing

    NASA Astrophysics Data System (ADS)

    Yoshimoto, Kenji; Peters, Brandon L.; Khaira, Gurdaman S.; de Pablo, Juan J.

    2012-03-01

    Directed self-assembly (DSA) patterning has been increasingly investigated as an alternative lithographic process for future technology nodes. One of the critical specs for DSA patterning is the defects generated through the annealing process or by roughness of the pre-patterned structure. Due to their high sensitivity to the process and wafer conditions, however, characterization of those defects still remains challenging. DSA simulations can be a powerful tool to predict the formation of the DSA defects. In this work, we propose a new method to perform parallel computing of DSA Monte Carlo (MC) simulations. A consumer graphics card was used to access its hundreds of processing units for parallel computing. By partitioning the simulation system into non-interacting domains, we were able to run MC trial moves in parallel on multiple graphics-processing units (GPUs). Our results show a significant improvement in computational performance.

  13. Satisfiability Test with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak

    1996-01-01

    Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.

  14. Robust large-scale parallel nonlinear solvers for simulations.

    SciTech Connect

    Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson

    2005-11-01

    This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. 
The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write
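
    The idea of replacing the Jacobian with an approximation that is updated from step to step can be sketched with the classical dense Broyden update. This is not the limited-memory variant developed in the report; the function names, the finite-difference starting Jacobian, and the tolerances are assumptions for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def finite_difference_jacobian(f, x, h=1e-6):
    """Forward-difference Jacobian, used only to start the iteration."""
    fx = f(x)
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = list(x)
        xp[j] += h
        fxp = f(xp)
        for i in range(n):
            J[i][j] = (fxp[i] - fx[i]) / h
    return J


def broyden(f, x0, tol=1e-10, max_iter=50):
    """Find x with f(x) = 0 using rank-one updates of an approximate Jacobian B."""
    x = list(x0)
    n = len(x)
    B = finite_difference_jacobian(f, x)
    fx = f(x)
    for _ in range(max_iter):
        if max(abs(v) for v in fx) < tol:
            return x
        s = solve_linear(B, [-v for v in fx])  # quasi-Newton step
        x = [xi + si for xi, si in zip(x, s)]
        f_new = f(x)
        y = [a - b for a, b in zip(f_new, fx)]
        Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
        ss = sum(si * si for si in s)
        # Broyden update: B += (y - B s) s^T / (s^T s); no new Jacobian evaluation.
        for i in range(n):
            for j in range(n):
                B[i][j] += (y[i] - Bs[i]) * s[j] / ss
        fx = f_new
    raise RuntimeError("Broyden iteration did not converge")
```

    For example, `broyden(lambda v: [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]], [1.5, 1.5])` approaches the root where both components equal the square root of 2. After the first step only function values are needed, which is what makes the approach usable when a Jacobian is unavailable or inaccurate.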

  15. Simulation of reflooding on two parallel heated channel by TRACE

    NASA Astrophysics Data System (ADS)

    Zakir, Md. Ghulam

    2016-07-01

    In the case of a loss-of-coolant accident (LOCA) in a boiling water reactor (BWR), heat generated in the nuclear fuel is not adequately removed because the coolant mass flow rate in the reactor core decreases. This leads to an increase in fuel temperature that can damage the core and cause leakage of radioactive fission products. To reflood the core and halt the temperature increase, an Emergency Core Cooling System (ECCS) delivers water under these conditions. This study investigates how the power distribution between two channels affects the reflooding process when the emergency water is injected from the top of the channels. The peak cladding temperature (PCT) during the LOCA transient is also determined at different axial levels. The thermal-hydraulic system code TRACE was used. A TRACE model of the two heated channels was developed, and three hypothetical cases with different power distributions were studied. Finally, simulated and experimental data are compared.

  16. Parallel computation for reservoir thermal simulation: An overlapping domain decomposition approach

    NASA Astrophysics Data System (ADS)

    Wang, Zhongxiao

    2005-11-01

    This dissertation concerns parallel computing for the thermal simulation of multicomponent, multiphase fluid flow in petroleum reservoirs. We report the development and applications of such a simulator. Unlike the many efforts that locally parallelize the linear equation solver, which affects performance the most, this research takes a global parallelization strategy by decomposing the computational domain into smaller subdomains. This dissertation compares domain decomposition techniques and, based on the comparison, adopts an overlapping domain decomposition method. This global parallelization method assigns each subdomain to a single processor of the parallel computer. Communication is required when handling overlapping regions between subdomains. For this purpose, MPI (message passing interface) is used for data communication and communication control. A physical and mathematical model is introduced for the reservoir thermal simulation. Numerical tests on two sets of industrial data from practical oilfields indicate that this model and the parallel implementation match the history data accurately. Therefore, we expect to use both the model and the parallel code to predict oil production and guide the design, implementation and real-time fine tuning of new well operating schemes. A new adaptive mechanism to synchronize processes on different processors has been introduced, which not only ensures computational accuracy but also improves time performance. To accelerate the convergence of the iterative solution of the large linear systems derived from the discretization in space and time of the governing equations of our physical and mathematical model, we adopt the ORTHOMIN method in conjunction with an incomplete LU factorization preconditioning technique. 
Important improvements have been made in both ORTHOMIN method and incomplete LU factorization in order to enhance time performance without affecting

  17. Dependability analysis of parallel systems using a simulation-based approach. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Sawyer, Darren Charles

    1994-01-01

    The analysis of dependability in large, complex, parallel systems executing real applications or workloads is examined in this thesis. To effectively demonstrate the wide range of dependability problems that can be analyzed through simulation, the analysis of three case studies is presented. For each case, the organization of the simulation model used is outlined, and the results from simulated fault injection experiments are explained, showing the usefulness of this method in dependability modeling of large parallel systems. The simulation models are constructed using DEPEND and C++. Where possible, methods to increase dependability are derived from the experimental results. Another interesting facet of all three cases is the presence of some kind of workload or application executing in the simulation while faults are injected. This provides a completely new dimension to this type of study, not possible to model accurately with analytical approaches.

  18. Application of integration algorithms in a parallel processing environment for the simulation of jet engines

    NASA Technical Reports Server (NTRS)

    Krosel, S. M.; Milner, E. J.

    1982-01-01

    The application of predictor-corrector integration algorithms developed for the digital parallel processing environment is investigated. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real time performance, interprocessor communication, and algorithm startup are also discussed.
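
    A predictor-corrector step and the effect of step size on accuracy can be illustrated with Heun's method (explicit Euler predictor, trapezoidal corrector). This serial sketch is not one of the parallel algorithms studied in the report, and the scalar linear test equation is an assumption chosen because its exact solution is known.

```python
import math


def heun_step(f, t, x, h):
    """One predictor-corrector step for dx/dt = f(t, x)."""
    slope_start = f(t, x)
    predicted = x + h * slope_start          # predictor: explicit Euler
    slope_end = f(t + h, predicted)
    return x + 0.5 * h * (slope_start + slope_end)  # corrector: trapezoidal rule


def integrate(f, x0, t0, t1, n_steps):
    """Integrate from t0 to t1 using n_steps equal predictor-corrector steps."""
    h = (t1 - t0) / n_steps
    x = x0
    for i in range(n_steps):
        x = heun_step(f, t0 + i * h, x, h)
    return x


def decay(t, x):
    """Linear test model dx/dt = -2 x, with exact solution x0 * exp(-2 t)."""
    return -2.0 * x


exact = math.exp(-2.0)
error_coarse = abs(integrate(decay, 1.0, 0.0, 1.0, 10) - exact)
error_fine = abs(integrate(decay, 1.0, 0.0, 1.0, 100) - exact)
```

    Here `error_fine` is much smaller than `error_coarse`, consistent with a second-order method: reducing the step size by a factor of 10 reduces the error by roughly a factor of 100.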

  19. A conflict-free, path-level parallelization approach for sequential simulation algorithms

    NASA Astrophysics Data System (ADS)

    Rasera, Luiz Gustavo; Machado, Péricles Lopes; Costa, João Felipe C. L.

    2015-07-01

    Pixel-based simulation algorithms are the most widely used geostatistical technique for characterizing the spatial distribution of natural resources. However, sequential simulation does not scale well for stochastic simulation on very large grids, which are now commonly found in many petroleum, mining, and environmental studies. With the availability of multiple-processor computers, there is an opportunity to develop parallelization schemes for these algorithms to increase their performance and efficiency. Here we present a conflict-free, path-level parallelization strategy for sequential simulation. The method consists of partitioning the simulation grid into a set of groups of nodes and delegating all available processors for simulation of multiple groups of nodes concurrently. An automated classification procedure determines which groups are simulated in parallel according to their spatial arrangement in the simulation grid. The major advantage of this approach is that it does not require conflict resolution operations, and thus allows exact reproduction of results. Besides offering a large performance gain when compared to the traditional serial implementation, the method provides efficient use of computational resources and is generic enough to be adapted to several sequential algorithms.

  20. Special purpose parallel computer architecture for real-time control and simulation in robotic applications

    NASA Technical Reports Server (NTRS)

    Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

    1993-01-01

    This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.

  1. Virtual Simulator: An infrastructure for design and performance-prediction of massively parallel codes

    NASA Astrophysics Data System (ADS)

    Perumalla, K.; Fujimoto, R.; Pande, S.; Karimabadi, H.; Driscoll, J.; Omelchenko, Y.

    2005-12-01

    Large parallel/distributed scientific simulations are very complex, and their dynamic behavior is hard to predict. Efficient development of massively parallel codes remains a computational challenge. For example, almost none of the kinetic codes in use in space physics today have dynamic load balancing capability. Here we present a new infrastructure for design and performance prediction of parallel codes. Performance prediction is useful to analyze, understand and experiment with different partitioning schemes, multiple modeling alternatives and so on, without having to run the application on supercomputers. Instrumentation of the model (with minimal perturbation of performance) is useful to glean key metrics and understand application-level behavior. Unfortunately, traditional approaches to virtual execution and instrumentation are limited by either slow execution speed or low resolution or both. We present a new high-resolution framework that provides a virtual CPU abstraction (with a full thread context per CPU), yet scales to thousands of virtual CPUs. The tool, called PDES2, presents different levels of modeling interfaces, from general purpose parallel simulations to parallel grid-based particle-in-cell (PIC) codes. The tool itself runs on multiple processors in order to accommodate the high resolution by distributing the virtual execution across processors. Validation experiments of PIC models in the framework using a 1-D hybrid shock application show close agreement of results from virtual executions with results from actual supercomputer runs. The utility of this tool is further illustrated through an application to a parallel global hybrid code.

  2. Parallel simulation of tsunami inundation on a large-scale supercomputer

    NASA Astrophysics Data System (ADS)

    Oishi, Y.; Imamura, F.; Sugawara, D.

    2013-12-01

    An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation. The computational power of recent massively parallel supercomputers can help enable faster-than-real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on the TUNAMI-N2 model of Tohoku University, which uses a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. 
In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the
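
    The two load-balancing steps described above (CPUs shared among nested layers in proportion to grid points, then a 1-D decomposition within each layer) can be sketched as follows. The greedy rule, the function names, and the example sizes are assumptions; the abstract does not say how rounding to whole CPUs is handled.

```python
def allocate_cpus(grid_points, n_cpus):
    """Share CPUs among nested layers roughly in proportion to their grid points.

    Every layer starts with one CPU; each remaining CPU goes to the layer that
    currently has the most grid points per CPU (hypothetical rounding rule).
    """
    if n_cpus < len(grid_points):
        raise ValueError("need at least one CPU per layer")
    alloc = [1] * len(grid_points)
    for _ in range(n_cpus - len(grid_points)):
        busiest = max(range(len(grid_points)), key=lambda i: grid_points[i] / alloc[i])
        alloc[busiest] += 1
    return alloc


def split_rows(n_rows, n_parts):
    """1-D domain decomposition: contiguous (start, stop) row ranges of near-equal size."""
    base, extra = divmod(n_rows, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges
```

    With layers of 100, 300, and 600 grid points and 10 CPUs, `allocate_cpus` returns `[1, 3, 6]`, so every CPU handles 100 points; `split_rows` then assigns each CPU of a layer its strip of rows.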

  3. Transient dynamics simulations: Parallel algorithms for contact detection and smoothed particle hydrodynamics

    SciTech Connect

    Hendrickson, B.; Plimpton, S.; Attaway, S.; Swegle, J.

    1996-09-01

    Transient dynamics simulations are commonly used to model phenomena such as car crashes, underwater explosions, and the response of shipping containers to high-speed impacts. Physical objects in such a simulation are typically represented by Lagrangian meshes because the meshes can move and deform with the objects as they undergo stress. Fluids (gasoline, water) or fluid-like materials (earth) in the simulation can be modeled using the techniques of smoothed particle hydrodynamics. Implementing a hybrid mesh/particle model on a massively parallel computer poses several difficult challenges. One challenge is to simultaneously parallelize and load-balance both the mesh and particle portions of the computation. A second challenge is to efficiently detect the contacts that occur within the deforming mesh and between mesh elements and particles as the simulation proceeds. These contacts impart forces to the mesh elements and particles which must be computed at each timestep to accurately capture the physics of interest. In this paper we describe new parallel algorithms for smoothed particle hydrodynamics and contact detection which turn out to have several key features in common. Additionally, we describe how to join the new algorithms with traditional parallel finite element techniques to create an integrated particle/mesh transient dynamics simulation. Our approach to this problem differs from previous work in that we use three different parallel decompositions, a static one for the finite element analysis and dynamic ones for particles and for contact detection. We have implemented our ideas in a parallel version of the transient dynamics code PRONTO-3D and present results for the code running on a large Intel Paragon.

  4. A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

    NASA Technical Reports Server (NTRS)

    Jones, Mark Howard

    1987-01-01

    A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.

  5. Object-Oriented NeuroSys: Parallel Programs for Simulating Large Networks of Biologically Accurate Neurons

    SciTech Connect

    Pacheco, P; Miller, P; Kim, J; Leese, T; Zabiyaka, Y

    2003-05-07

    Object-oriented NeuroSys (ooNeuroSys) is a collection of programs for simulating very large networks of biologically accurate neurons on distributed memory parallel computers. It includes two principal programs: ooNeuroSys, a parallel program for solving the large systems of ordinary differential equations arising from the interconnected neurons, and Neurondiz, a parallel program for visualizing the results of ooNeuroSys. Both programs are designed to be run on clusters and use the MPI library to obtain parallelism. ooNeuroSys also includes an easy-to-use Python interface. This interface allows neuroscientists to quickly develop and test complex neuron models. Both ooNeuroSys and Neurondiz have a design that allows for both high performance and relative ease of maintenance.

  6. Efficient parallel algorithm for statistical ion track simulations in crystalline materials

    NASA Astrophysics Data System (ADS)

    Jeon, Byoungseon; Grønbech-Jensen, Niels

    2009-02-01

    We present an efficient parallel algorithm for statistical Molecular Dynamics simulations of ion tracks in solids. The method is based on the Rare Event Enhanced Domain following Molecular Dynamics (REED-MD) algorithm, which has been successfully applied to studies of, e.g., ion implantation into crystalline semiconductor wafers. We discuss the strategies for parallelizing the method, and we settle on a host-client type polling scheme in which a multiple of asynchronous processors are continuously fed to the host, which, in turn, distributes the resulting feed-back information to the clients. This real-time feed-back consists of, e.g., cumulative damage information or statistics updates necessary for the cloning in the rare event algorithm. We finally demonstrate the algorithm for radiation effects in a nuclear oxide fuel, and we show the balanced parallel approach with high parallel efficiency in multiple processor configurations.

  7. Wake Encounter Analysis for a Closely Spaced Parallel Runway Paired Approach Simulation

    NASA Technical Reports Server (NTRS)

    Mckissick, Burnell T.; Rico-Cusi, Fernando J.; Murdoch, Jennifer; Oseguera-Lohr, Rosa M.; Stough, Harry P, III; O'Connor, Cornelius J.; Syed, Hazari I.

    2009-01-01

    A Monte Carlo simulation of simultaneous approaches performed by two transport category aircraft from the final approach fix to a pair of closely spaced parallel runways was conducted to explore the aft boundary of the safe zone in which separation assurance and wake avoidance are provided. The simulation included variations in runway centerline separation, initial longitudinal spacing of the aircraft, crosswind speed, and aircraft speed during the approach. The data from the simulation showed that the majority of the wake encounters occurred near or over the runway and the aft boundaries of the safe zones were identified for all simulation conditions.

  8. A parallel computational framework for integrated surface-subsurface flow and transport simulations

    NASA Astrophysics Data System (ADS)

    Park, Y.; Hwang, H.; Sudicky, E. A.

    2010-12-01

    HydroGeoSphere is a 3D control-volume finite element hydrologic model describing fully-integrated surface and subsurface water flow and solute and thermal energy transport. Because the model solves tightly coupled, highly nonlinear partial differential equations, often applied at regional and continental scales (for example, to analyze the impact of climate change on water resources), high performance computing (HPC) is essential. The target parallelization includes the composition of the Jacobian matrix for the iterative linearization method and the sparse-matrix solver, a preconditioned Bi-CGSTAB. The matrix assembly is parallelized by using a coarse-grained scheme in that the local matrix compositions can be performed independently. The preconditioned Bi-CGSTAB algorithm performs a number of LU substitutions, matrix-vector multiplications, and inner products, where the parallelization of the LU substitution is not trivial. The parallelization of the solver is achieved by partitioning the domain into equal-size subdomains, with an efficient reordering scheme. The computational flow of the Bi-CGSTAB solver is also modified to reduce the parallelization overhead and to be suitable for parallel architectures. The parallelized model is tested on several benchmark simulations which include linear and nonlinear flow problems involving various domain sizes and degrees of hydrologic complexity. The performance is evaluated in terms of computational robustness and efficiency, using standard scaling performance measures. The results of simulation profiling indicate that the efficiency becomes higher with an increasing number of nodes/elements in the mesh, for increasingly nonlinear transient simulations, and with domains of irregular geometry. These characteristics are promising for the large-scale analysis of water resources problems involving integrated surface/subsurface flow regimes.

  9. IB: a Monte Carlo Simulation Tool for Neutron Scattering Instrument Design under Parallel Virtual Machine

    SciTech Connect

    Zhao, Jinkui

    2011-01-01

    IB is a Monte Carlo simulation tool for aiding neutron scattering instrument designs. It is written in C++ and implemented under Parallel Virtual Machine. The program has a few basic components, or modules, that can be used to build a virtual neutron scattering instrument. More complex components, such as neutron guides and multichannel beam benders, can be constructed using the grouping technique unique to IB. Users can specify a collection of modules as a group. For example, a neutron guide can be constructed by grouping four neutron mirrors together that make up the four sides of the guide. IB's simulation engine ensures that neutrons entering a group will be properly operated upon by all members of the group. For simulations that require higher computer speed, the program can be run in parallel mode under the PVM architecture. Initially, the program was written for designing instruments on pulsed neutron sources; it has since been used to simulate reactor based instruments as well.

  10. Parallel Brownian dynamics simulations with the message-passing and PGAS programming models

    NASA Astrophysics Data System (ADS)

    Teijeiro, C.; Sutmann, G.; Taboada, G. L.; Touriño, J.

    2013-04-01

    The simulation of particle dynamics is among the most important mechanisms to study the behavior of molecules in a medium under specific conditions of temperature and density. Several models can be used to compute efficiently the forces that act on each particle, and also the interactions between them. This work presents the design and implementation of a parallel simulation code for the Brownian motion of particles in a fluid. Two different parallelization approaches have been followed: (1) using traditional distributed memory message-passing programming with MPI, and (2) using the Partitioned Global Address Space (PGAS) programming model, oriented towards hybrid shared/distributed memory systems, with the Unified Parallel C (UPC) language. Different techniques for domain decomposition and work distribution are analyzed in terms of efficiency and programmability, in order to select the most suitable strategy. Performance results on a supercomputer using up to 2048 cores are also presented for both MPI and UPC codes.

  11. Xyce parallel electronic simulator reference guide, Version 6.0.1.

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.

    2014-01-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].

  12. Study of Fracture in SiC by Parallel Molecular Dynamics Simulations

    NASA Astrophysics Data System (ADS)

    Chatterjee, A.; Omeltchenko, A.; Kalia, R. K.; Vashishta, P.

    1997-03-01

    Large scale molecular-dynamics simulations are performed on parallel architectures to investigate dynamic fracture in SiC. The simulations are based on an empirical bond-order potential proposed by Tersoff (J. Tersoff, Phys. Rev. B 39, 5566 (1989); M. Tang and S. Yip, Phys. Rev. B 52, 15150 (1995)). Results will be presented for crack-front morphology, crack-tip speed, and the effect of strain rate on dynamic fracture.

  13. Massively parallel simulation of flow and transport in variably saturated porous and fractured media

    SciTech Connect

    Wu, Yu-Shu; Zhang, Keni; Pruess, Karsten

    2002-01-15

    This paper describes a massively parallel simulation method and its application for modeling multiphase flow and multicomponent transport in porous and fractured reservoirs. The parallel-computing method has been implemented into the TOUGH2 code and its numerical performance is tested on a Cray T3E-900 and IBM SP. The efficiency and robustness of the parallel-computing algorithm are demonstrated by completing two simulations with more than one million gridblocks, using site-specific data obtained from a site-characterization study. The first application involves the development of a three-dimensional numerical model for flow in the unsaturated zone of Yucca Mountain, Nevada. The second application is the study of tracer/radionuclide transport through fracture-matrix rocks for the same site. The parallel-computing technique enhances modeling capabilities by achieving several-orders-of-magnitude speedup for large-scale and high resolution modeling studies. The resulting modeling results provide many new insights into flow and transport processes that could not be obtained from simulations using the single-CPU simulator.

  14. A Plane-Parallel Wind Solution for Testing Numerical Simulations of Photoevaporation

    NASA Astrophysics Data System (ADS)

    Hutchison, Mark A.; Laibe, Guillaume

    2016-04-01

    Here, we derive a Parker-wind-like solution for a stratified, plane-parallel atmosphere undergoing photoionisation. The difference compared to the standard Parker solar wind is that the sonic point is crossed only at infinity. The simplicity of the analytic solution makes it a convenient test problem for numerical simulations of photoevaporation in protoplanetary discs.

  15. A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

    NASA Technical Reports Server (NTRS)

    Carroll, Chester C.; Owen, Jeffrey E.

    1988-01-01

    A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.

  16. Accelerating Markov chain Monte Carlo simulation through sequential updating and parallel computing

    NASA Astrophysics Data System (ADS)

    Ren, Ruichao

    Monte Carlo simulation is a statistical sampling method used in studies of physical systems with properties that cannot be easily obtained analytically. The phase behavior of the Restricted Primitive Model of electrolyte solutions on the simple cubic lattice is studied using grand canonical Monte Carlo simulations and finite-size scaling techniques. The transition between disordered and ordered, NaCl-like structures is continuous, second-order at high temperatures and discrete, first-order at low temperatures. The line of continuous transitions meets the line of first-order transitions at a tricritical point. A new algorithm, Random Skipping Sequential (RSS) Monte Carlo, is proposed, justified and shown analytically to have better mobility over the phase space than the conventional Metropolis algorithm satisfying strict detailed balance. The new algorithm employs sequential updating, and yields greatly enhanced sampling statistics compared with the Metropolis algorithm with random updating. A parallel version of Markov chain theory is introduced and applied in accelerating Monte Carlo simulation via cluster computing. It is shown that sequential updating is the key to reducing the inter-processor communication or synchronization which slows down parallel simulation with increasing number of processors. Parallel simulation results for the two-dimensional lattice gas model show substantial reduction of simulation time by the new method for systems of large and moderate sizes.
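
    The contrast between sequential and random site updating mentioned in this abstract can be sketched for a small two-dimensional lattice gas. This is a minimal illustration only: it is plain Metropolis with two site-selection orders, not the RSS algorithm of the record, and the Hamiltonian (nearest-neighbour attraction `eps`, chemical potential `mu`), the function name `sweep`, and all parameter values are assumptions made for the example.

```python
import math
import random


def sweep(occ, beta, eps, mu, rng, sequential=True):
    """One Metropolis sweep over an L x L periodic lattice gas.

    occ[i][j] is 0 (empty) or 1 (occupied). The assumed energy is
    E = -eps * sum_<ij> n_i n_j - mu * sum_i n_i.
    Returns the number of accepted insert/remove moves.
    """
    size = len(occ)
    accepted = 0
    for k in range(size * size):
        if sequential:
            # Visit sites in a fixed raster order (sequential updating).
            i, j = divmod(k, size)
        else:
            # Pick a site uniformly at random (random updating).
            i, j = rng.randrange(size), rng.randrange(size)
        # Occupied nearest neighbours, with periodic boundaries.
        nn = (occ[(i + 1) % size][j] + occ[(i - 1) % size][j]
              + occ[i][(j + 1) % size] + occ[i][(j - 1) % size])
        dn = 1 - 2 * occ[i][j]           # +1 for insertion, -1 for removal
        d_e = -(eps * nn + mu) * dn      # energy change of the trial move
        if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
            occ[i][j] += dn
            accepted += 1
    return accepted


if __name__ == "__main__":
    rng = random.Random(1)
    lattice = [[0] * 8 for _ in range(8)]
    for _ in range(100):
        sweep(lattice, beta=1.0, eps=1.0, mu=-2.0, rng=rng, sequential=True)
    print("density:", sum(map(sum, lattice)) / 64)
```

    In the sequential branch the visiting order is known in advance, which is the property a parallel code can exploit when deciding which sites each processor may update without waiting on its neighbours.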

  17. Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube

    NASA Technical Reports Server (NTRS)

    Joslin, Ronald D.; Zubair, Mohammad

    1993-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors, nearly ideal linear speedups are achieved with nonoptimized routines; slower than linear speedups are achieved with optimized (machine-dependent library) routines. This slower than linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and because the routine exhibits less than ideal speedups. However, with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise, wall-normal, and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single processor time to complete a comparable simulation; however, it is estimated that a subgrid-scale model which reduces the required number of grid points and becomes a large-eddy simulation (PSLES) would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.

  18. A parallel finite element simulator for ion transport through three-dimensional ion channel systems.

    PubMed

    Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo

    2013-09-15

    A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain to get the surface and volume mesh for ion channel systems, which consists of a set of mesh generation tools. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG) developed by one of the authors, which provides the capability of doing large scale parallel computations with high parallel efficiency and the flexibility of choosing high order elements to achieve high order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I - V curve, with which both primitive and transformed PNP equations are studied and their numerical performances are compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage dependent anion channel (VDAC) and α-Hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite size effects can be included in PNP model now, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL. It is shown that the size effects in SMPNP can

  19. A Queue Simulation Tool for a High Performance Scientific Computing Center

    NASA Technical Reports Server (NTRS)

    Spear, Carrie; McGalliard, James

    2007-01-01

    The NASA Center for Computational Sciences (NCCS) at the Goddard Space Flight Center provides high performance highly parallel processors, mass storage, and supporting infrastructure to a community of computational Earth and space scientists. Long running (days) and highly parallel (hundreds of CPUs) jobs are common in the workload. NCCS management structures batch queues and allocates resources to optimize system use and prioritize workloads. NCCS technical staff use a locally developed discrete event simulation tool to model the impacts of evolving workloads, potential system upgrades, alternative queue structures and resource allocation policies.
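
    The kind of batch-queue model described in this abstract can be illustrated with a very small discrete event simulation. The sketch below is not the NCCS tool; it assumes a single strict first-come-first-served queue with no backfilling or priorities, and the function name `simulate_fcfs` and the job tuples are hypothetical.

```python
import heapq


def simulate_fcfs(jobs, total_cpus):
    """Simulate a strict FCFS batch queue on a machine with `total_cpus` CPUs.

    jobs: iterable of (submit_time, cpus, runtime) tuples.
    Returns a list of (start_time, finish_time), ordered by submit time.
    """
    free = total_cpus
    running = []      # min-heap of (finish_time, cpus) completion events
    schedule = []
    clock = 0.0
    for submit, cpus, runtime in sorted(jobs):
        if cpus > total_cpus:
            raise ValueError("job requests more CPUs than the machine has")
        # The head of the queue cannot start before it is submitted.
        clock = max(clock, submit)
        # Process completion events that have already occurred.
        while running and running[0][0] <= clock:
            free += heapq.heappop(running)[1]
        # Not enough CPUs: jump the clock to successive completion events.
        while free < cpus:
            finish, released = heapq.heappop(running)
            clock = max(clock, finish)
            free += released
        free -= cpus
        heapq.heappush(running, (clock + runtime, cpus))
        schedule.append((clock, clock + runtime))
    return schedule


if __name__ == "__main__":
    workload = [(0, 4, 10), (1, 4, 5), (2, 2, 3)]
    for (submit, _, _), (start, finish) in zip(sorted(workload),
                                               simulate_fcfs(workload, 6)):
        print(f"submitted {submit}, waited {start - submit}, finished {finish}")
```

    Changing `total_cpus` or the workload list and comparing the resulting wait times is the simplest form of the "what if" questions (system upgrades, changed workloads) that the abstract mentions; alternative queue structures would need a richer scheduling rule than the one assumed here.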

  20. Parallel peridynamics-SPH simulation of explosion induced soil fragmentation by using OpenMP

    NASA Astrophysics Data System (ADS)

    Fan, Houfu; Li, Shaofan

    2016-06-01

    In this work, we use the OpenMP-based shared-memory parallel programming to implement the recently developed coupling method of state-based peridynamics and smoothed particle hydrodynamics (PD-SPH), and we then employ the program to simulate dynamic soil fragmentation induced by the explosion of the buried explosives. The paper offers a detailed technical description and discussion of the PD-SPH coupling algorithm and how to use the OpenMP shared-memory programming to implement such large-scale computation in a desktop environment, with an example to illustrate the basic computing principle and the parallel algorithm structure. Specifically, the paper provides a complete OpenMP parallel algorithm for the PD-SPH scheme with the programming and parallelization details. Numerical examples of soil fragmentation caused by the buried explosives are also presented. Results show that the simulation carried out by the OpenMP parallel code is much faster than that by the corresponding serial computer code.

  1. Relationship between parallel faults and stress field in rock mass based on numerical simulation

    NASA Astrophysics Data System (ADS)

    Imai, Y.; Mikada, H.; Goto, T.; Takekawa, J.

    2012-12-01

    Parallel cracks and faults, caused by earthquakes and crustal deformations, are often observed at various scales, from regional to laboratory scale. However, the mechanism of formation of these parallel faults has not yet been quantitatively clarified. Since the stress field plays a key role in the nucleation of parallel faults, it is fundamental to investigate the failure and the extension of cracks in a large-scale rock mass (not a laboratory-scale specimen) under a mechanically loaded stress field. In this study, we developed a numerical simulation code for rock mass failures under different loading conditions, and conducted rock failure experiments using this code. We assumed a numerical rock mass consisting of basalt with a rectangular shape for the model. We also assumed that the rock mass fails in accordance with the Mohr-Coulomb criterion, and that the initial tensile and compressive strengths of the rock elements follow a Weibull distribution. In this study, we use the Hamiltonian Particle Method (HPM), one of the particle methods, to represent large deformation and the destruction of materials. Our simulation results suggest that the confining pressure would have a dominant influence on the initiation of parallel faults and their conjugates in compressive conditions. We conclude that the shearing force would provoke the propagation of parallel fractures along the shearing direction, but prevent that of fractures in the conjugate direction.

  2. The distributed diagonal force decomposition method for parallelizing molecular dynamics simulations.

    PubMed

    Borštnik, Urban; Miller, Benjamin T; Brooks, Bernard R; Janežič, Dušanka

    2011-11-15

    Parallelization is an effective way to reduce the computational time needed for molecular dynamics simulations. We describe a new parallelization method, the distributed-diagonal force decomposition method, with which we extend and improve the existing force decomposition methods. Our new method requires less data communication during molecular dynamics simulations than replicated data and current force decomposition methods, increasing the parallel efficiency. It also dynamically load-balances the processors' computational load throughout the simulation. The method is readily implemented in existing molecular dynamics codes and it has been incorporated into the CHARMM program, allowing its immediate use in conjunction with the many molecular dynamics simulation techniques that are already present in the program. We also present the design of the Force Decomposition Machine, a cluster of personal computers and networks that is tailored to running molecular dynamics simulations using the distributed diagonal force decomposition method. The design is expandable and provides various degrees of fault resilience. This approach is easily adaptable to computers with Graphics Processing Units because it is independent of the processor type being used. PMID:21793007

  3. Parallel Grand Canonical Monte Carlo (ParaGrandMC) Simulation Code

    NASA Technical Reports Server (NTRS)

    Yamakov, Vesselin I.

    2016-01-01

    This report provides an overview of the Parallel Grand Canonical Monte Carlo (ParaGrandMC) simulation code. This is a highly scalable parallel FORTRAN code for simulating the thermodynamic evolution of metal alloy systems at the atomic level, and predicting the thermodynamic state, phase diagram, chemical composition and mechanical properties. The code is designed to simulate multi-component alloy systems, predict solid-state phase transformations such as austenite-martensite transformations, precipitate formation, recrystallization, capillary effects at interfaces, surface absorption, etc., which can aid the design of novel metallic alloys. While the software is mainly tailored for modeling metal alloys, it can also be used for other types of solid-state systems, and to some degree for liquid or gaseous systems, including multiphase systems forming solid-liquid-gas interfaces.

  4. Parallel Simulation Algorithms for the Three Dimensional Strong-Strong Beam-Beam Interaction

    SciTech Connect

    Kabel, A.C.; /SLAC

    2008-03-17

    The strong-strong beam-beam effect is one of the most important effects limiting the luminosity of ring colliders. Little is known about it analytically, so most studies utilize numeric simulations. The two-dimensional realm is readily accessible to workstation-class computers (cf., e.g., [1, 2]), while three dimensions, which add effects such as phase averaging and the hourglass effect, require vastly higher amounts of CPU time. Thus, parallelization of three-dimensional simulation techniques is imperative; in the following we discuss parallelization strategies and describe the algorithms used in our simulation code, which will reach almost linear scaling of performance vs. number of CPUs for typical setups.

  5. Parallel 3D Multi-Stage Simulation of a Turbofan Engine

    NASA Technical Reports Server (NTRS)

    Turner, Mark G.; Topp, David A.

    1998-01-01

    A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. 20 processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scalable with the number of blade rows. Enough flips are run (between 50 and 200) so the solution in the entire machine is not changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee the parallelization was done correctly. The domain decomposition is done only in the axial direction since the number of points axially is much larger than the other two directions. This code uses MPI for message passing. The parallel speed up of the solver portion (no I/O or body force

  6. Application of parallel computing to seismic damage process simulation of an arch dam

    NASA Astrophysics Data System (ADS)

    Zhong, Hong; Lin, Gao; Li, Jianbo

    2010-06-01

    The simulation of the damage process of a high arch dam subjected to strong earthquake shocks is significant to the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation demands substantial computational capacity. Conventional serial computing falls short of that, and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams utilizing the damage model with heterogeneity of concrete considered. Developed in the Fortran programming language, the code uses a master/slave mode for programming, the domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication and solvers from the AZTEC library for solution of large-scale equations. A speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of an arch dam under construction on a 4-node PC cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of a shaking table test, indicating that the proposed procedure and parallel code PDPAD have good potential in simulating the seismic damage mode of arch dams. With the rapidly growing need for massive computation emerging from engineering problems, parallel computing will find more and more applications in pertinent areas.

  7. Large-scale numerical simulation of laser propulsion by parallel computing

    NASA Astrophysics Data System (ADS)

    Zeng, Yaoyuan; Zhao, Wentao; Wang, Zhenghua

    2013-05-01

    As one of the most significant methods to study laser propelled rockets, the numerical simulation of laser propulsion has drawn ever increasing attention. Nevertheless, the traditional serial simulation model cannot satisfy practical needs because of insatiable memory overhead and considerable computation time. In order to solve this problem, we study a general algorithm for laser propulsion design, and parallelize it by using a two-level hybrid parallel programming model. The total computing domain is decomposed into distributed data spaces, and each partition is assigned to an MPI process. A single step of computation operates in the inter loop level, where a compiler directive is used to split each MPI process into several OpenMP threads. Finally, the parallel efficiency of the hybrid program for two typical configurations on a China-made supercomputer with 4 to 256 cores is compared with that of a pure MPI program. The hybrid program exhibits better performance than the pure MPI program on the whole, roughly as expected. The result indicates that our hybrid parallel approach is effective and practical in large-scale numerical simulation of laser propulsion.
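
    The two-level decomposition described in this abstract (partitions assigned to processes, each partition then split among threads) comes down to simple index arithmetic, sketched below in plain Python. The sketch does not use MPI or OpenMP and is not the authors' code; the one-dimensional contiguous block layout and the names `block_range` and `two_level_partition` are assumptions for illustration.

```python
def block_range(n_items, n_parts, part):
    """Half-open index range [start, stop) owned by `part` out of `n_parts`.

    Items are split into contiguous blocks whose sizes differ by at most one.
    """
    base, extra = divmod(n_items, n_parts)
    start = part * base + min(part, extra)
    stop = start + base + (1 if part < extra else 0)
    return start, stop


def two_level_partition(n_cells, n_ranks, n_threads):
    """Split cells among ranks, then split each rank's block among threads.

    Returns layout[rank][thread] = (start, stop) in global cell indices.
    """
    layout = []
    for rank in range(n_ranks):
        lo, hi = block_range(n_cells, n_ranks, rank)
        chunks = []
        for thread in range(n_threads):
            a, b = block_range(hi - lo, n_threads, thread)
            chunks.append((lo + a, lo + b))
        layout.append(chunks)
    return layout


if __name__ == "__main__":
    for rank, chunks in enumerate(two_level_partition(10, 2, 2)):
        print("rank", rank, "thread ranges", chunks)
```

    In a real hybrid code the outer split would correspond to the data each process owns and the inner split to loop iterations shared among that process's threads; only the outer level requires explicit message passing at block boundaries.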

  8. Re-forming supercritical quasi-parallel shocks. I - One- and two-dimensional simulations

    NASA Technical Reports Server (NTRS)

    Thomas, V. A.; Winske, D.; Omidi, N.

    1990-01-01

    The process of reforming supercritical quasi-parallel shocks is investigated using one-dimensional and two-dimensional hybrid (particle ion, massless fluid electron) simulations both of shocks and of simpler two-stream interactions. It is found that the supercritical quasi-parallel shock is not steady. Instead of a well-defined shock ramp between upstream and downstream states that remains at a fixed position in the flow, the ramp periodically steepens, broadens, and then reforms upstream of its former position. It is concluded that the wave generation process is localized at the shock ramp and that the reformation process proceeds in the absence of upstream perturbations intersecting the shock.

  9. Progress on the Multiphysics Capabilities of the Parallel Electromagnetic ACE3P Simulation Suite

    SciTech Connect

    Kononenko, Oleksiy

    2015-03-26

    ACE3P is a 3D parallel simulation suite that is being developed at SLAC National Accelerator Laboratory. Effectively utilizing supercomputer resources, ACE3P has become a key tool for the coupled electromagnetic, thermal and mechanical research and design of particle accelerators. Based on the existing finite-element infrastructure, a massively parallel eigensolver is developed for modal analysis of mechanical structures. It complements a set of the multiphysics tools in ACE3P and, in particular, can be used for the comprehensive study of microphonics in accelerating cavities ensuring the operational reliability of a particle accelerator.

  10. Parallel simulations of Grover's algorithm for closest match search in neutron monitor data

    NASA Astrophysics Data System (ADS)

    Kussainov, Arman; White, Yelena

    We are studying the parallel implementations of Grover's closest match search algorithm for neutron monitor data analysis. This includes data formatting, and matching quantum parameters to a conventional structure of a chosen programming language and selected experimental data type. We have employed several workload distribution models based on acquired data and search parameters. As a result of these simulations, we have an understanding of potential problems that may arise during configuration of real quantum computational devices and the way they could run tasks in parallel. The work was supported by the Science Committee of the Ministry of Science and Education of the Republic of Kazakhstan Grant #2532/GF3.

  11. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

    NASA Astrophysics Data System (ADS)

    Abraham, Mark James; Murtola, Teemu; Schulz, Roland; Páll, Szilárd; Smith, Jeremy C.; Hess, Berk; Lindahl, Erik

    2015-09-01

    GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level; SIMD registers inside cores, multithreading, heterogeneous CPU-GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.

  12. Application of parallel computing techniques to a large-scale reservoir simulation

    SciTech Connect

    Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten

    2001-02-01

    Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from the intensive computational requirements of detailed modeling investigations of real-world reservoirs. This paper presents the application of a massively parallel version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of the future flow conditions at the site, aiding in the assessment of proposed repository performance.

  13. Design of a real-time wind turbine simulator using a custom parallel architecture

    NASA Technical Reports Server (NTRS)

    Hoffman, John A.; Gluck, R.; Sridhar, S.

    1995-01-01

    The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is far more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an IO operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's are interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from many other parallel processors, which usually have a throughput limit because of rigid bus architecture.

  14. A method for data handling numerical results in parallel OpenFOAM simulations

    NASA Astrophysics Data System (ADS)

    Anton, Alin; Muntean, Sebastian

    2015-12-01

    Parallel computational fluid dynamics simulations produce vast amounts of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM toolkit®[1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large scale simulation results than the regular algorithms.

  15. Adventures in Parallel Processing: Entry, Descent and Landing Simulation for the Genesis and Stardust Missions

    NASA Technical Reports Server (NTRS)

    Lyons, Daniel T.; Desai, Prasun N.

    2005-01-01

    This paper will describe the Entry, Descent and Landing simulation tradeoffs and techniques that were used to provide the Monte Carlo data required to approve entry during a critical period just before entry of the Genesis Sample Return Capsule. The same techniques will be used again when Stardust returns on January 15, 2006. Only one hour was available for the simulation which propagated 2000 dispersed entry states to the ground. Creative simulation tradeoffs combined with parallel processing were needed to provide the landing footprint statistics that were an essential part of the Go/NoGo decision that authorized release of the Sample Return Capsule a few hours before entry.

  16. A method for data handling numerical results in parallel OpenFOAM simulations

    SciTech Connect

    Anton, Alin; Muntean, Sebastian

    2015-12-31

    Parallel computational fluid dynamics simulations produce vast amounts of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM toolkit®[1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large scale simulation results than the regular algorithms.

  17. A new parallel P3M code for very large-scale cosmological simulations

    NASA Astrophysics Data System (ADS)

    MacFarland, Tom; Couchman, H. M. P.; Pearce, F. R.; Pichlmeier, Jakob

    1998-12-01

    We have developed a parallel Particle-Particle, Particle-Mesh (P3M) simulation code for the Cray T3E parallel supercomputer that is well suited to studying the time evolution of systems of particles interacting via gravity and gas forces in cosmological contexts. The parallel code is based upon the public-domain serial Adaptive P3M-SPH (http://coho.astro.uwo.ca/pub/hydra/hydra.html) code of Couchman et al. (1995)[ApJ, 452, 797]. The algorithm resolves gravitational forces into a long-range component computed by discretizing the mass distribution and solving Poisson's equation on a grid using an FFT convolution method, and a short-range component computed by direct force summation for sufficiently close particle pairs. The code consists primarily of a particle-particle computation parallelized by domain decomposition over blocks of neighbour-cells, a more regular mesh calculation distributed in planes along one dimension, and several transformations between the two distributions. The load balancing of the P3M code is static, since this greatly aids the ongoing implementation of parallel adaptive refinements of the particle and mesh systems. Great care was taken throughout to make optimal use of the available memory, so that a version of the current implementation has been used to simulate systems of up to 10^9 particles with a 1024^3 mesh for the long-range force computation. These are the largest Cosmological N-body simulations of which we are aware. We discuss these memory optimizations as well as those motivated by computational performance. Performance results are very encouraging, and, even without refinements, the code has been used effectively for simulations in which the particle distribution becomes highly clustered as well as for other non-uniform systems of astrophysical interest.
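
    The split of the pair force into a smooth long-range part and a short-range correction can be illustrated with a small sketch. It is not taken from the code described above: it assumes a Gaussian (error-function) splitting kernel of the kind used in some tree/mesh gravity codes, and the function names and the `r_split` parameter are hypothetical. The point is only that the two fractions always add up to the full inverse-square force, and that the short-range part dies off quickly, which is why direct summation can be limited to close pairs.

```python
import math


def newtonian_force(r, g=1.0, m1=1.0, m2=1.0):
    """Magnitude of the full inverse-square force between two point masses."""
    return g * m1 * m2 / r**2


def long_range_fraction(r, r_split):
    """Fraction of the pair force assigned to the smooth (mesh) component.

    Assumes a Gaussian splitting kernel with scale r_split (an illustrative
    choice, not necessarily the kernel used by any particular P3M code).
    """
    x = r / (2.0 * r_split)
    return math.erf(x) - (2.0 * x / math.sqrt(math.pi)) * math.exp(-x * x)


def short_range_fraction(r, r_split):
    """Fraction of the pair force left for direct particle-particle summation."""
    x = r / (2.0 * r_split)
    return math.erfc(x) + (2.0 * x / math.sqrt(math.pi)) * math.exp(-x * x)


if __name__ == "__main__":
    r_split = 1.0
    for r in (0.1, 1.0, 5.0, 10.0):
        total = newtonian_force(r)
        short = total * short_range_fraction(r, r_split)
        long_ = total * long_range_fraction(r, r_split)
        print(f"r={r:5.1f}  short={short:.3e}  long={long_:.3e}  sum={short + long_:.3e}")
```

    In this sketch the short-range fraction is negligible beyond a few multiples of `r_split`, so a neighbour-cell search with a finite cutoff is sufficient for the direct sum, while the mesh handles everything else.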

  18. Parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada.

    PubMed

    Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G S

    2003-01-01

    This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented in the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-1-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the 1-million-cell models produce higher-resolution results and reveal some flow patterns that cannot be obtained using coarse-grid models. PMID:12714301

  19. Massively parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada

    SciTech Connect

    Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.

    2001-08-31

    This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented in the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-one-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce higher-resolution results and reveal some flow patterns that cannot be obtained using coarse-grid models.

  20. Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN

    NASA Astrophysics Data System (ADS)

    Hammond, G. E.; Lichtner, P. C.; Mills, R. T.

    2014-01-01

    To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted.
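
    For readers unfamiliar with the two scalability measures named above, the following sketch shows how they are commonly computed from wall-clock timings. The definitions are the standard textbook ones, not necessarily the exact metrics used in the study, and the timings in the usage section are made-up placeholders, not PFLOTRAN results.

```python
def strong_scaling_efficiency(t_ref, p_ref, t_p, p):
    """Efficiency for a fixed total problem size.

    t_ref is the run time on p_ref processes (the baseline) and t_p the run
    time on p processes. A value of 1.0 means ideal speedup.
    """
    return (t_ref * p_ref) / (t_p * p)


def weak_scaling_efficiency(t_ref, t_p):
    """Efficiency when the problem size grows in proportion to process count.

    Ideal weak scaling keeps the run time constant, giving a value of 1.0.
    """
    return t_ref / t_p


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    print(strong_scaling_efficiency(t_ref=100.0, p_ref=64, t_p=30.0, p=256))
    print(weak_scaling_efficiency(t_ref=10.0, t_p=12.5))
```

    One reason weak-scaling results can be hard to interpret is that the run time of an iterative solver may change with problem size for algorithmic reasons, so a drop in this ratio does not by itself isolate parallel overhead.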

  1. A 3D parallel simulator for crystal growth and solidification in complex alloy systems

    NASA Astrophysics Data System (ADS)

    Nestler, Britta

    2005-02-01

    A 3D parallel simulator is developed to numerically solve the evolution equations of a new non-isothermal phase-field model for crystal growth and solidification in complex alloy systems. The new model and the simulator are capable of simultaneously describing the diffusion processes of multiple components, the phase transitions between multiple phases and the development of the temperature field. Weak and facetted formulations of both surface energy and kinetic anisotropies are incorporated in the phase-field model. Multicomponent bulk diffusion effects including interdiffusion coefficients as well as diffusion in the interfacial region of phase or grain boundaries are considered. We introduce our parallel simulator that is based on a finite difference discretization including effective adaptive strategies and multigrid methods to reduce computation time and memory usage. The parallelization is realized for distributed as well as shared memory computer architectures using MPI libraries and OpenMP concepts. Applying the new computer model, we present a variety of simulated crystal structures such as dendrites, grains, binary and ternary eutectics in 2D and 3D. The influence of anisotropy on the microstructure evolution shows the formation of facets in preferred crystallographic directions. Phase transformations and solidification processes in a real multi-component alloy can be described by incorporating the physical data (e.g. surface tensions, kinetic coefficients, specific heat, heat and mass diffusion coefficients) and the specific phase diagram (in particular latent heats and melting temperatures) into the diffuse interface model via the free energies.

  2. Long-time atomistic simulations with the Parallel Replica Dynamics method

    NASA Astrophysics Data System (ADS)

    Perez, Danny

    Molecular Dynamics (MD) -- the numerical integration of atomistic equations of motion -- is a workhorse of computational materials science. Indeed, MD can in principle be used to obtain any thermodynamic or kinetic quantity, without introducing any approximation or assumptions beyond the adequacy of the interaction potential. It is therefore an extremely powerful and flexible tool to study materials with atomistic spatio-temporal resolution. These enviable qualities however come at a steep computational price, hence limiting the system sizes and simulation times that can be achieved in practice. While the size limitation can be efficiently addressed with massively parallel implementations of MD based on spatial decomposition strategies, allowing for the simulation of trillions of atoms, the same approach usually cannot extend the timescales much beyond microseconds. In this article, we discuss an alternative parallel-in-time approach, the Parallel Replica Dynamics (ParRep) method, that aims at addressing the timescale limitation of MD for systems that evolve through rare state-to-state transitions. We review the formal underpinnings of the method and demonstrate that it can provide arbitrarily accurate results for any definition of the states. When an adequate definition of the states is available, ParRep can simulate trajectories with a parallel speedup approaching the number of replicas used. We demonstrate the usefulness of ParRep by presenting different examples of materials simulations where access to long timescales was essential to access the physical regime of interest and discuss practical considerations that must be addressed to carry out these simulations. Work supported by the United States Department of Energy (U.S. DOE), Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division.
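
    The claim that the speedup can approach the number of replicas follows from a simple property of exponentially distributed escape times, which the sketch below checks numerically. It is an idealized illustration only: it assumes first-order (exponential) escape kinetics from a single state and ignores the dephasing and decorrelation stages and any communication cost. The function names, the rate, and the replica count are hypothetical.

```python
import random


def parrep_first_escape(rate, n_replicas, rng):
    """One idealized parallel-replica cycle for a state with escape rate `rate`.

    Every replica draws an independent exponential escape time. The cycle
    stops when the first replica escapes, so the wall-clock time is the
    minimum draw, and the simulation time credited to the trajectory is the
    sum of the time run on all replicas up to that moment.
    """
    escape_times = [rng.expovariate(rate) for _ in range(n_replicas)]
    wall_clock = min(escape_times)
    accumulated = n_replicas * wall_clock
    return wall_clock, accumulated


def mean_times(rate, n_replicas, n_cycles=20000, seed=0):
    """Average wall-clock and accumulated times over many independent cycles."""
    rng = random.Random(seed)
    total_wall = 0.0
    total_accumulated = 0.0
    for _ in range(n_cycles):
        wall, accumulated = parrep_first_escape(rate, n_replicas, rng)
        total_wall += wall
        total_accumulated += accumulated
    return total_wall / n_cycles, total_accumulated / n_cycles


if __name__ == "__main__":
    rate, n_replicas = 1.0, 8
    wall, accumulated = mean_times(rate, n_replicas)
    print("mean wall-clock time per escape:", wall)
    print("mean accumulated time per escape:", accumulated)
    print("speedup over one serial trajectory:", (1.0 / rate) / wall)
```

    The accumulated time has the same mean as the serial escape time, while the wall-clock time is shorter by roughly the number of replicas. In practice the overhead stages reduce this, which is why the speedup only approaches the replica count.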

  3. Vortex-induced vibration of two parallel risers: Experimental test and numerical simulation

    NASA Astrophysics Data System (ADS)

    Huang, Weiping; Zhou, Yang; Chen, Haiming

    2016-04-01

    The vortex-induced vibration of two identical rigidly mounted risers in a parallel arrangement was studied using Ansys-CFX and model tests. The vortex shedding and force were recorded to determine the effect of spacing on the two-degree-of-freedom oscillation of the risers. CFX was used to study the single riser and two parallel risers in 2-8 D spacing considering the coupling effect. Because of the limited width of the water channel, only three different riser spacings, 2 D, 3 D, and 4 D, were tested to validate the characteristics of the two parallel risers by comparison with the numerical simulation. The results indicate that the lift force changes significantly with the increase in spacing, and in the case of 3 D spacing, the lift force of the two parallel risers reaches the maximum. The vortex shedding of the risers in 3 D spacing shows that a variable velocity field with the same frequency as the vortex shedding is generated in the overlapped area, thus equalizing the period of drag force to that of lift force. It can be concluded that the interaction between the two parallel risers is significant when the risers are brought to a small distance between them because the trajectory of the riser changes from oval to curve 8 as the spacing is increased. The phase difference of lift force between the two risers is also different as the spacing changes.

  4. Relevance of the parallel nonlinearity in gyrokinetic simulations of tokamak plasmas

    SciTech Connect

    Candy, J.; Waltz, R. E.; Parker, S. E.; Chen, Y.

    2006-07-15

    The influence of the parallel nonlinearity on transport in gyrokinetic simulations is assessed for values of ρ_* which are typical of current experiments. Here, ρ_* = ρ_s/a is the ratio of gyroradius, ρ_s, to plasma minor radius, a. The conclusion, derived from simulations with both GYRO [J. Candy and R. E. Waltz, J. Comput. Phys., 186, 585 (2003)] and GEM [Y. Chen and S. E. Parker J. Comput. Phys., 189, 463 (2003)] is that no measurable effect of the parallel nonlinearity is apparent for ρ_* < 0.012. This result is consistent with scaling arguments, which suggest that the parallel nonlinearity should be O(ρ_*) smaller than the E×B nonlinearity. Indeed, for the plasma parameters under consideration, the magnitude of the parallel nonlinearity is a factor of 8ρ_* smaller (for 0.00075 < ρ_* < 0.012) than the other retained terms in the nonlinear gyrokinetic equation.

  5. Parallel Solutions for Voxel-Based Simulations of Reaction-Diffusion Systems

    PubMed Central

    D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan

    2014-01-01

    There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This necessity has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena taking into account both spatial effects and noise. However, the high computational effort characterizing simulation approaches, coupled with the necessity to simulate the models several times to achieve statistically relevant information on the model behaviours, makes these algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we present how the characteristics of the algorithm can be exploited for an effective parallelization on the present heterogeneous HPC architectures. PMID:25045716

  6. A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows

    NASA Technical Reports Server (NTRS)

    Bui, Trong T.

    1999-01-01

    A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
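
    The idea of keeping only a small fraction of the upwind dissipation can be shown on a much simpler problem than the one in the abstract. The sketch below uses linear scalar advection, for which a Roe-type flux reduces to a central flux minus an upwind dissipation term, and adds a scaling factor on that term. It is a scalar analogue chosen for illustration, not the compressible-flow scheme of the paper, and the function name and the `dissipation_scale` parameter are hypothetical.

```python
def scaled_upwind_flux(u_left, u_right, wave_speed, dissipation_scale=1.0):
    """Interface flux for linear advection with adjustable upwind dissipation.

    dissipation_scale = 1.0 gives the fully upwind flux,
    dissipation_scale = 0.0 gives the purely central (non-dissipative) flux,
    and values such as 0.03 to 0.05 keep only a few percent of the
    upwind dissipation.
    """
    central = 0.5 * wave_speed * (u_left + u_right)
    dissipation = 0.5 * abs(wave_speed) * (u_right - u_left)
    return central - dissipation_scale * dissipation


if __name__ == "__main__":
    u_left, u_right, a = 1.0, 3.0, 2.0
    for scale in (1.0, 0.05, 0.0):
        print(scale, scaled_upwind_flux(u_left, u_right, a, scale))
```

    With the full dissipation the flux equals the wave speed times the upwind state. Reducing the scale moves the flux toward the central average, which damps resolved fluctuations less but offers less protection against numerical oscillations.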

  7. Parallel solutions for voxel-based simulations of reaction-diffusion systems.

    PubMed

    D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan

    2014-01-01

    There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This necessity has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena taking into account both spatial effects and noise. However, the high computational effort characterizing simulation approaches, coupled with the necessity to simulate the models several times to achieve statistically relevant information on the model behaviours, makes these algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we present how the characteristics of the algorithm can be exploited for an effective parallelization on the present heterogeneous HPC architectures. PMID:25045716

  8. Adaptive finite element simulation of flow and transport applications on parallel computers

    NASA Astrophysics Data System (ADS)

    Kirk, Benjamin Shelton

    The subject of this work is the adaptive finite element simulation of problems arising in flow and transport applications on parallel computers. Of particular interest are new contributions to adaptive mesh refinement (AMR) in this parallel high-performance context, including novel work on data structures, treatment of constraints in a parallel setting, generality and extensibility via object-oriented programming, and the design/implementation of a flexible software framework. This technology and software capability then enables more robust, reliable treatment of multiscale--multiphysics problems and specific studies of fine scale interaction such as those in biological chemotaxis (Chapter 4) and high-speed shock physics for compressible flows (Chapter 5). The work begins by presenting an overview of key concepts and data structures employed in AMR simulations. Of particular interest is how these concepts are applied in the physics-independent software framework which is developed here and is the basis for all the numerical simulations performed in this work. This open-source software framework has been adopted by a number of researchers in the U.S. and abroad for use in a wide range of applications. The dynamic nature of adaptive simulations poses particular issues for efficient implementation on distributed-memory parallel architectures. Communication cost, computational load balance, and memory requirements must all be considered when developing adaptive software for this class of machines. Specific extensions to the adaptive data structures to enable implementation on parallel computers are therefore considered in detail. The libMesh framework for performing adaptive finite element simulations on parallel computers is developed to provide a concrete implementation of the above ideas. This physics-independent framework is applied to two distinct flow and transport application classes in the subsequent application studies to illustrate the flexibility of the

  9. Object-Oriented Parallel Particle-in-Cell Code for Beam Dynamics Simulation in Linear Accelerators

    SciTech Connect

    Qiang, J.; Ryne, R.D.; Habib, S.; Decky, V.

    1999-11-13

    In this paper, we present an object-oriented three-dimensional parallel particle-in-cell code for beam dynamics simulation in linear accelerators. A two-dimensional parallel domain decomposition approach is employed within a message passing programming paradigm along with a dynamic load balancing. Implementing object-oriented software design provides the code with better maintainability, reusability, and extensibility compared with conventional structure based code. This also helps to encapsulate the details of communications syntax. Performance tests on SGI/Cray T3E-900 and SGI Origin 2000 machines show good scalability of the object-oriented code. Some important features of this code also include employing symplectic integration with linear maps of external focusing elements and using z as the independent variable, typical in accelerators. A successful application was done to simulate beam transport through three superconducting sections in the APT linac design.

  10. Accurate reaction-diffusion operator splitting on tetrahedral meshes for parallel stochastic molecular simulations.

    PubMed

    Hepburn, I; Chen, W; De Schutter, E

    2016-08-01

    Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which has led to the development of parallel methods that can take advantage of the power of modern supercomputers in recent years. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification. PMID:27497550

  11. Task parallel sensitivity analysis and parameter estimation of groundwater simulations through the SALSSA framework

    SciTech Connect

    Schuchardt, Karen L.; Agarwal, Khushbu; Chase, Jared M.; Rockhold, Mark L.; Freedman, Vicky L.; Elsethagen, Todd O.; Scheibe, Timothy D.; Chin, George; Sivaramakrishnan, Chandrika

    2010-07-15

    The Support Architecture for Large-Scale Subsurface Analysis (SALSSA) provides an extensible framework, sophisticated graphical user interface, and underlying data management system that simplifies the process of running subsurface models, tracking provenance information, and analyzing the model results. Initially, SALSSA supported two styles of job control: user directed execution and monitoring of individual jobs, and load balancing of jobs across multiple machines taking advantage of many available workstations. Recent efforts in subsurface modelling have been directed at advancing simulators to take advantage of leadership class supercomputers. We describe two approaches, current progress, and plans toward enabling efficient application of the subsurface simulator codes via the SALSSA framework: automating sensitivity analysis problems through task parallelism, and task parallel parameter estimation using the PEST framework.

  12. Accurate reaction-diffusion operator splitting on tetrahedral meshes for parallel stochastic molecular simulations

    NASA Astrophysics Data System (ADS)

    Hepburn, I.; Chen, W.; De Schutter, E.

    2016-08-01

    Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which has led to the development of parallel methods that can take advantage of the power of modern supercomputers in recent years. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification.

  13. A Large Scale Simulation of Ultrasonic Wave Propagation in Concrete Using Parallelized EFIT

    NASA Astrophysics Data System (ADS)

    Nakahata, Kazuyuki; Tokunaga, Jyunichi; Kimoto, Kazushi; Hirose, Sohichi

    A time domain simulation tool for ultrasonic wave propagation in concrete is developed using the elastodynamic finite integration technique (EFIT) and image-based modeling. The EFIT is a grid-based time domain differential technique and easily treats the different boundary conditions in an inhomogeneous material such as concrete. Here, the geometry of concrete is determined by a scanned image of concrete and the processed color bitmap image is fed into the EFIT. Although the ultrasonic wave simulation in such a complex material requires much time to calculate, we here execute the EFIT by a parallel computing technique using a shared memory computer system. In this study, formulations of the EFIT and treatment of the different boundary conditions are briefly described and examples of shear horizontal wave propagation in reinforced concrete are demonstrated. The methodology and performance of parallelization for the EFIT are also discussed.

  14. Construction of a parallel processor for simulating manipulators and other mechanical systems

    NASA Technical Reports Server (NTRS)

    Hannauer, George

    1991-01-01

    This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.

  15. Spontaneous Hot Flow Anomalies at Quasi-Parallel Shocks: 2. Hybrid Simulations

    NASA Technical Reports Server (NTRS)

    Omidi, N.; Zhang, H.; Sibeck, D.; Turner, D.

    2013-01-01

    Motivated by recent THEMIS observations, this paper uses 2.5-D electromagnetic hybrid simulations to investigate the formation of Spontaneous Hot Flow Anomalies (SHFA) upstream of quasi-parallel bow shocks during steady solar wind conditions and in the absence of discontinuities. The results show the formation of a large number of structures along and upstream of the quasi-parallel bow shock. Their outer edges exhibit density and magnetic field enhancements, while their cores exhibit drops in density, magnetic field, solar wind velocity and enhancements in ion temperature. Using virtual spacecraft in the simulation, we show that the signatures of these structures in the time series data are very similar to those of SHFAs seen in THEMIS data and conclude that they correspond to SHFAs. Examination of the simulation data shows that SHFAs form as the result of foreshock cavitons interacting with the bow shock. Foreshock cavitons in turn form due to the nonlinear evolution of ULF waves generated by the interaction of the solar wind with the backstreaming ions. Because foreshock cavitons are an inherent part of the shock dissipation process, the formation of SHFAs is also an inherent part of the dissipation process leading to a highly non-uniform plasma in the quasi-parallel magnetosheath including large scale density and magnetic field cavities.

  16. Spontaneous hot flow anomalies at quasi-parallel shocks: 2. Hybrid simulations

    NASA Astrophysics Data System (ADS)

    Omidi, N.; Zhang, H.; Sibeck, D.; Turner, D.

    2013-01-01

    Abstract<p label="1">Motivated by recent THEMIS observations, this paper uses 2.5-D electromagnetic hybrid <span class="hlt">simulations</span> to investigate the formation of Spontaneous Hot Flow Anomalies (SHFAs) upstream of quasi-<span class="hlt">parallel</span> bow shocks during steady solar wind conditions and in the absence of discontinuities. The results show the formation of a large number of structures along and upstream of the quasi-<span class="hlt">parallel</span> bow shock. Their outer edges exhibit density and magnetic field enhancements, while their cores exhibit drops in density, magnetic field, solar wind velocity, and enhancements in ion temperature. Using virtual spacecraft in the <span class="hlt">simulation</span>, we show that the signatures of these structures in the time series data are very similar to those of SHFAs seen in THEMIS data and conclude that they correspond to SHFAs. Examination of the <span class="hlt">simulation</span> data shows that SHFAs form as the result of foreshock cavitons interacting with the bow shock. Foreshock cavitons in turn form due to the nonlinear evolution of ULF waves generated by the interaction of the solar wind with the backstreaming ions. 
Because foreshock cavitons are an inherent part of the shock dissipation process, the formation of SHFAs is also an inherent part of the dissipation process leading to a highly nonuniform plasma in the quasi-<span class="hlt">parallel</span> magnetosheath including large-scale density and magnetic field cavities.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/24329381','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/24329381"><span id="translatedtitle">Efficient <span class="hlt">parallelization</span> of short-range molecular dynamics <span class="hlt">simulations</span> on many-core systems.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Meyer, R</p> <p>2013-11-01</p> <p>This article introduces a highly <span class="hlt">parallel</span> algorithm for molecular dynamics <span class="hlt">simulations</span> with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high <span class="hlt">parallel</span> speedups for strongly inhomogeneous systems like nanodevices or nanostructured materials. In the proposed scheme the calculation of the forces and the generation of neighbor lists are divided into small tasks. The tasks are then executed by a thread pool according to a dependent task schedule. This schedule is constructed in such a way that a particle is never accessed by two threads at the same time. Benchmark <span class="hlt">simulations</span> on a typical 12-core machine show that the described algorithm achieves excellent <span class="hlt">parallel</span> efficiencies above 80% for different kinds of systems and all numbers of cores. For inhomogeneous systems the speedups are strongly superior to those obtained with spatial decomposition. Further benchmarks were performed on an Intel Xeon Phi coprocessor. 
These <span class="hlt">simulations</span> demonstrate that the algorithm scales well to large numbers of cores. PMID:24329381</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2013PhRvE..88e3309M','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2013PhRvE..88e3309M"><span id="translatedtitle">Efficient <span class="hlt">parallelization</span> of short-range molecular dynamics <span class="hlt">simulations</span> on many-core systems</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Meyer, R.</p> <p>2013-11-01</p> <p>This article introduces a highly <span class="hlt">parallel</span> algorithm for molecular dynamics <span class="hlt">simulations</span> with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high <span class="hlt">parallel</span> speedups for strongly inhomogeneous systems like nanodevices or nanostructured materials. In the proposed scheme the calculation of the forces and the generation of neighbor lists are divided into small tasks. The tasks are then executed by a thread pool according to a dependent task schedule. This schedule is constructed in such a way that a particle is never accessed by two threads at the same time. Benchmark <span class="hlt">simulations</span> on a typical 12-core machine show that the described algorithm achieves excellent <span class="hlt">parallel</span> efficiencies above 80% for different kinds of systems and all numbers of cores. For inhomogeneous systems the speedups are strongly superior to those obtained with spatial decomposition. Further benchmarks were performed on an Intel Xeon Phi coprocessor. 
These <span class="hlt">simulations</span> demonstrate that the algorithm scales well to large numbers of cores.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/20040200740','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/20040200740"><span id="translatedtitle">Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow <span class="hlt">Simulations</span> Using Massively <span class="hlt">Parallel</span> Computers</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Morgan, Philip E.</p> <p>2004-01-01</p> <p>This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow <span class="hlt">Simulations</span> Using Massively <span class="hlt">Parallel</span> Computers" and "Develop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications." The discussion of Scalable High Performance Computing reports on three objectives: validate, assess scalability, and apply two <span class="hlt">parallel</span> flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order <span class="hlt">parallel</span> solver for Direct Numerical <span class="hlt">Simulations</span> (DNS) and Large Eddy <span class="hlt">Simulation</span> (LES) problems; and investigate and develop a high-order Reynolds-averaged Navier-Stokes turbulence model. 
The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhance an electromagnetics code (CHARGE) so that it can effectively model antenna problems; apply lessons learned in the high-order/spectral solution of swirling 3D jets to the electromagnetics project; transition a high-order fluids code, FDL3DI, to solve Maxwell's equations using compact differencing; develop and demonstrate improved radiation-absorbing boundary conditions for high-order CEM; and extend the high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/1999APS..DPP.JP165L','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/1999APS..DPP.JP165L"><span id="translatedtitle"><span class="hlt">Parallel</span> PIC <span class="hlt">Simulations</span> of Ultra-High Intensity Laser Plasma Interactions.</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Lasinski, B. F.; Still, C. H.; Langdon, A. B.; Wilks, S. C.; Hatchett, S. P.; Hinkel, D. E.</p> <p>1999-11-01</p> <p>We extend our previous <span class="hlt">simulations</span> of high-intensity short-pulse laser plasma interactions [B. F. Lasinski, A. B. Langdon, S. P. Hatchett, M. H. Key, and M. Tabak, Phys. Plasmas 6, 2041 (1999); S. C. Wilks and W. L. Kruer, IEEE Journal of Quantum Electronics 11, 1954 (1997)] to 3D and to much larger systems in 2D using our new, modern, 3D, electromagnetic, fully relativistic, massively <span class="hlt">parallel</span> PIC code. Our <span class="hlt">simulation</span> parameters are guided by the recent Petawatt experiments at Livermore. We study the generation of hot electrons and energetic ions and the associated complex phenomena. 
Laser light filamentation and the formation of high static magnetic fields are described.</p> </li> </ol> </div><!-- col-sm-12 --> </div><!-- row --> </div><!-- page_17 --> <div id="page_18" class="hiddenDiv"> <div class="row"> <div class="col-sm-12"> <ol class="result-class" start="341"> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/22218447','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/22218447"><span id="translatedtitle"><span class="hlt">Parallel</span> implementation of three-dimensional molecular dynamic <span class="hlt">simulation</span> for laser-cluster interaction</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Holkundkar, Amol R.</p> <p>2013-11-15</p> <p>The objective of this article is to report the <span class="hlt">parallel</span> implementation of the 3D molecular dynamic <span class="hlt">simulation</span> code for 
laser-cluster interactions. The code has been benchmarked by comparing the <span class="hlt">simulation</span> results with some of the experiments reported in the literature. Scaling laws for the computational time are established by varying the number of processor cores and the number of macroparticles used. The capabilities of the code are highlighted by implementing various diagnostic tools. An executable version of the code is available from the author for studying the dynamics of laser-cluster interactions.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2015GMD.....8..473H','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2015GMD.....8..473H"><span id="translatedtitle">A generic <span class="hlt">simulation</span> cell method for developing extensible, efficient and readable <span class="hlt">parallel</span> computational models</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Honkonen, I.</p> <p>2015-03-01</p> <p>I present a method for developing extensible and modular computational models without sacrificing serial or <span class="hlt">parallel</span> performance or source code readability. By using a generic <span class="hlt">simulation</span> cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring modification of existing code. This is an advantage for the development and testing of, e.g., geoscientific software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. 
An implementation of the generic <span class="hlt">simulation</span> cell method presented here, the generic <span class="hlt">simulation</span> cell class (gensimcell), also includes support for <span class="hlt">parallel</span> programming by allowing model developers to select which <span class="hlt">simulation</span> variables of, e.g., a domain-decomposed model to transfer between processes via a Message Passing Interface (MPI) library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic <span class="hlt">simulation</span> cell class requires a C++ compiler that supports a version of the language standardized in 2011 (C++11). The code is available at <a href="https://github.com/nasailja/gensimcell" target="_blank">https://github.com/nasailja/gensimcell</a> for everyone to use, study, modify and redistribute; those who do are kindly requested to acknowledge and cite this work.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/974641','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/974641"><span id="translatedtitle">μπ: A Scalable and Transparent System for <span class="hlt">Simulating</span> MPI Programs</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Perumalla, Kalyan S</p> <p>2010-01-01</p> <p>μπ is a scalable, transparent system for experimenting with the execution of <span class="hlt">parallel</span> programs on <span class="hlt">simulated</span> computing platforms. The level of <span class="hlt">simulated</span> detail can be varied for application behavior as well as for machine characteristics. 
Unique features of μπ are repeatability of execution, scalability to millions of <span class="hlt">simulated</span> (virtual) MPI ranks, scalability to hundreds of thousands of host (real) MPI ranks, portability of the system to a variety of host supercomputing platforms, and the ability to experiment with scientific applications whose source code is available. The set of source-code interfaces supported by μπ is being expanded to support a wider set of applications, and MPI-based scientific computing benchmarks are being ported. In proof-of-concept experiments, μπ has been successfully exercised to spawn and sustain very large-scale executions of an MPI test program given in source code form. Low slowdowns are observed, due to its use of a purely <span class="hlt">discrete</span> <span class="hlt">event</span> style of execution, and due to the scalability and efficiency of the underlying <span class="hlt">parallel</span> <span class="hlt">discrete</span> <span class="hlt">event</span> <span class="hlt">simulation</span> engine, μsik. 
In the largest runs, μπ has been executed on up to 216,000 cores of a Cray XT5 supercomputer, successfully <span class="hlt">simulating</span> over 27 million virtual MPI ranks, each virtual rank containing its own thread context, and all ranks fully synchronized by virtual time.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2003JSMEC..46..263A','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2003JSMEC..46..263A"><span id="translatedtitle">Application of <span class="hlt">Discrete</span> <span class="hlt">Event</span> Control to the Insertion Task of Electric Line Using 6-Link Electro-Hydraulic Manipulators with Dual Arm</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Ahn, Kyoungkwan; Yokota, Shinichi</p> <p></p> <p>An uninterrupted power supply has become indispensable during the maintenance of active electric power lines as a result of today's highly information-oriented society and the increasing demand for electric utilities. The maintenance task carries the risk of electric shock and the danger of falling from a high place. It is therefore necessary to realize an autonomous robot system using electro-hydraulic manipulators, because hydraulic manipulators have the advantage of electric insulation. However, autonomous assembly tasks are relatively difficult to realize, particularly when manipulating flexible objects such as electric lines. In this report, a <span class="hlt">discrete</span> <span class="hlt">event</span> control system is introduced for the automatic assembly of electric lines into sleeves, a typical task in the maintenance of active electric power lines. In the implementation of the <span class="hlt">discrete</span> <span class="hlt">event</span> control system, an LVQNN (learning vector quantization neural network) is applied to the task of inserting electric lines into sleeves. 
To apply the proposed control system to an unknown environment, virtual learning data for the LVQNN were generated by fuzzy inference. Experimental results with two types of electric lines and sleeves confirm that the proposed <span class="hlt">discrete</span> <span class="hlt">event</span> control and neural network learning algorithm are very effective for inserting electric lines into sleeves, a typical task in the maintenance of active electric power lines.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2012AGUFMOS34A..07R','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2012AGUFMOS34A..07R"><span id="translatedtitle">Field-Scale, Massively <span class="hlt">Parallel</span> <span class="hlt">Simulation</span> of Production from Oceanic Gas Hydrate Deposits</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Reagan, M. T.; Moridis, G. J.; Freeman, C. M.; Pan, L.; Boyle, K. L.; Johnson, J. N.; Husebo, J. A.</p> <p>2012-12-01</p> <p>The quantity of hydrocarbon gases trapped in natural hydrate accumulations is enormous, leading to significant interest in the evaluation of their potential as an energy source. It has been shown that large volumes of gas can be readily produced at high rates for long times from some types of methane hydrate accumulations by means of depressurization-induced dissociation, using conventional technologies with horizontal or vertical well configurations. However, these systems are currently assessed using simplified or reduced-scale 3D or even 2D production <span class="hlt">simulations</span>. In this study, we use the massively <span class="hlt">parallel</span> TOUGH+HYDRATE code (pT+H) to assess the production potential of a large, deep-ocean hydrate reservoir and develop strategies for effective production. 
The <span class="hlt">simulations</span> model a full 3D system of over 24 km² in extent, examining the productivity of vertical and horizontal wells and of single or multiple wells, and exploring variations in reservoir properties. Systems of up to 2.5M gridblocks, running on thousands of supercomputing nodes, are required to <span class="hlt">simulate</span> such large systems at the highest level of detail. The <span class="hlt">simulations</span> reveal the challenges inherent in producing from deep, relatively cold systems with extensive water-bearing channels and connectivity to large aquifers, including the difficulty of achieving depressurization, the challenges of high water removal rates, and the complexity of production design. Also highlighted are new frontiers in large-scale reservoir <span class="hlt">simulation</span> of coupled flow, transport, thermodynamics, and phase behavior, including the construction of large meshes, the use of <span class="hlt">parallel</span> numerical solvers and MPI, and large-scale, <span class="hlt">parallel</span> 3D visualization of results.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/22230824','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/22230824"><span id="translatedtitle">Massively <span class="hlt">parallel</span> Monte Carlo for many-particle <span class="hlt">simulations</span> on GPUs</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Anderson, Joshua A.; Jankowski, Eric; Grubb, Thomas L.; Engel, Michael; Glotzer, Sharon C.</p> <p>2013-12-01</p> <p>Current trends in <span class="hlt">parallel</span> processors call for the design of efficient massively <span class="hlt">parallel</span> algorithms for scientific computing. 
<span class="hlt">Parallel</span> algorithms for Monte Carlo <span class="hlt">simulations</span> of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively <span class="hlt">parallel</span> method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2001JChPh.114.9772S','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2001JChPh.114.9772S"><span id="translatedtitle">A novel <span class="hlt">parallel</span>-rotation algorithm for atomistic Monte Carlo <span class="hlt">simulation</span> of dense polymer systems</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Santos, S.; Suter, U. 
W.; Müller, M.; Nievergelt, J.</p> <p>2001-06-01</p> <p>We develop and test a new elementary Monte Carlo move for use in the off-lattice <span class="hlt">simulation</span> of polymer systems. This novel <span class="hlt">Parallel</span>-Rotation algorithm (ParRot) permits moving very efficiently torsion angles that are deeply inside long chains in melts. The <span class="hlt">parallel</span>-rotation move is extremely simple and is also demonstrated to be computationally efficient and appropriate for Monte Carlo <span class="hlt">simulation</span>. The ParRot move does not affect the orientation of those parts of the chain outside the moving unit. The move consists of a concerted rotation around four adjacent skeletal bonds. No assumption is made concerning the backbone geometry other than that bond lengths and bond angles are held constant during the elementary move. Properly weighted sampling techniques are needed for ensuring detailed balance because the new move involves a correlated change in four degrees of freedom along the chain backbone. The ParRot move is supplemented with the classical Metropolis Monte Carlo, the Continuum-Configurational-Bias, and Reptation techniques in an isothermal-isobaric Monte Carlo <span class="hlt">simulation</span> of melts of short and long chains. Comparisons are made with the capabilities of other Monte Carlo techniques to move the torsion angles in the middle of the chains. 
We demonstrate that ParRot constitutes a highly promising Monte Carlo move for the treatment of long polymer chains in the off-lattice <span class="hlt">simulation</span> of realistic models of dense polymer systems.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/19880007021','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/19880007021"><span id="translatedtitle">LISP based <span class="hlt">simulation</span> generators for modeling complex space processes</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Tseng, Fan T.; Schroer, Bernard J.; Dwan, Wen-Shing</p> <p>1987-01-01</p> <p>The development of a <span class="hlt">simulation</span> assistant for modeling <span class="hlt">discrete</span> <span class="hlt">event</span> processes is presented. Included are an overview of the system, a description of the <span class="hlt">simulation</span> generators, and a sample process generated using the <span class="hlt">simulation</span> assistant.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2016SPIE.9805E..0NS&link_type=ABSTRACT','NASAADS'); return false;" href="http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2016SPIE.9805E..0NS&link_type=ABSTRACT"><span id="translatedtitle">Modeling of fatigue crack induced nonlinear ultrasonics using a highly <span class="hlt">parallelized</span> explicit local interaction <span class="hlt">simulation</span> approach</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Shen, Yanfeng; Cesnik, Carlos E. S.</p> <p>2016-04-01</p> <p>This paper presents a <span class="hlt">parallelized</span> modeling technique for the efficient <span class="hlt">simulation</span> of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. 
The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction <span class="hlt">Simulation</span> Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables highly <span class="hlt">parallelized</span> computing on powerful graphics cards. Both the explicit contact formulation and the <span class="hlt">parallel</span> feature facilitate LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulation based on the penalty method is introduced, and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation, and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. 
Nonlinear mode conversion of guided waves at fatigue cracks is also studied.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2015CoPhC.194...18N','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2015CoPhC.194...18N"><span id="translatedtitle">Computational performance of a smoothed particle hydrodynamics <span class="hlt">simulation</span> for shared-memory <span class="hlt">parallel</span> computing</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide</p> <p>2015-09-01</p> <p>The computational performance of a smoothed particle hydrodynamics (SPH) <span class="hlt">simulation</span> is investigated for three types of current shared-memory <span class="hlt">parallel</span> computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several <span class="hlt">parallel</span> implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH <span class="hlt">simulation</span> on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. 
The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs <span class="hlt">parallelized</span> by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/23163385','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/23163385"><span id="translatedtitle"><span class="hlt">Simulations</span> of structural and dynamic anisotropy in nano-confined water between <span class="hlt">parallel</span> graphite plates.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Mosaddeghi, Hamid; Alavi, Saman; Kowsari, M H; Najafi, Bijan</p> <p>2012-11-14</p> <p>We use molecular dynamics <span class="hlt">simulations</span> to study the structure, dynamics, and transport properties of nano-confined water between <span class="hlt">parallel</span> graphite plates with separation distances (H) from 7 to 20 Å at different water densities with an emphasis on anisotropies generated by confinement. The behavior of the confined water phase is compared to non-confined bulk water under similar pressure and temperature conditions. Our <span class="hlt">simulations</span> show anisotropic structure and dynamics of the confined water phase in directions <span class="hlt">parallel</span> and perpendicular to the graphite plate. The magnitude of these anisotropies depends on the slit width H. Confined water shows "solid-like" structure and slow dynamics for the water layers near the plates. The mean square displacements (MSDs) and velocity autocorrelation functions (VACFs) for directions <span class="hlt">parallel</span> and perpendicular to the graphite plates are calculated. 
By increasing the confinement distance from H = 7 Å to H = 20 Å, the MSD increases and the behavior of the VACF indicates that the confined water changes from solid-like to liquid-like dynamics. If the initial density of the water phase is set up using geometric criteria (i.e., distance between the graphite plates), large pressures (on the order of ~10 katm) and large pressure anisotropies are established within the water. By decreasing the density of the water between the confined plates to about 0.9 g cm<sup>-3</sup>, bubble formation and restructuring of the water layers are observed. PMID:23163385</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/974630','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/974630"><span id="translatedtitle"><span class="hlt">Parallel</span> Agent-Based <span class="hlt">Simulations</span> on Clusters of GPUs and Multi-Core Processors</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Aaby, Brandon G; Perumalla, Kalyan S; Seal, Sudip K</p> <p>2010-01-01</p> <p>An effective latency-hiding mechanism is presented in the <span class="hlt">parallelization</span> of agent-based model <span class="hlt">simulations</span> (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art <span class="hlt">parallel</span> computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct <span class="hlt">parallel</span> platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. 
Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering as much as over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular <span class="hlt">simulator</span> in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2016PASJ...68...54I&link_type=ABSTRACT','NASAADS'); return false;" href="http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2016PASJ...68...54I&link_type=ABSTRACT"><span id="translatedtitle">Implementation and performance of FDPS: a framework for developing <span class="hlt">parallel</span> particle <span class="hlt">simulation</span> codes</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro</p> <p>2016-08-01</p> <p>We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle <span class="hlt">Simulators</span>). FDPS is an application-development framework which helps researchers to develop <span class="hlt">simulation</span> programs using particle methods for large-scale distributed-memory <span class="hlt">parallel</span> supercomputers. 
A particle-based <span class="hlt">simulation</span> program for distributed-memory <span class="hlt">parallel</span> computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory <span class="hlt">parallel</span> computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient <span class="hlt">parallel</span> execution of particle-based <span class="hlt">simulations</span> as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N<sup>2</sup>) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale <span class="hlt">parallel</span> supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10<sup>7</sup>) to 300 ms (N = 10<sup>9</sup>). 
These are currently limited by the time for the calculation of the domain decomposition and communication</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/895092','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/895092"><span id="translatedtitle">Current Trends in Numerical <span class="hlt">Simulation</span> for <span class="hlt">Parallel</span> Engineering Environments New Directions and Work-in-Progress</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Trinitis, C; Schulz, M</p> <p>2006-06-29</p> <p>In today's world, the use of <span class="hlt">parallel</span> programming and architectures is essential for <span class="hlt">simulating</span> practical problems in engineering and related disciplines. Remarkable progress in CPU architecture, system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges for both system architects and software developers. These trends are <span class="hlt">paralleled</span> by progress in <span class="hlt">parallel</span> algorithms, <span class="hlt">simulation</span> techniques, and software integration from multiple disciplines. ParSim brings together researchers from both application disciplines and computer science and aims at fostering closer cooperation between these fields. Since its successful introduction in 2002, ParSim has established itself as an integral part of the EuroPVM/MPI conference series. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a short turn-around time. This offers a unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in <span class="hlt">parallel</span> computation, serves as an ideal surrounding for ParSim. 
This combination enables the participants to present and discuss their work within the scope of both the session and the host conference. This year, eleven papers from authors in nine countries were submitted to ParSim, and we selected five of them. They cover a wide range of different application fields including gas flow <span class="hlt">simulations</span>, thermo-mechanical processes in nuclear waste storage, and cosmological <span class="hlt">simulations</span>. At the same time, the selected contributions also address the computer science side of their codes and discuss different <span class="hlt">parallelization</span> strategies, programming models and languages, as well as the use of nonblocking collective operations in MPI. We are confident that this provides an attractive program and that ParSim will be an informal setting for lively discussions and for fostering new</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2016PASJ..tmp...65I','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2016PASJ..tmp...65I"><span id="translatedtitle">Implementation and performance of FDPS: a framework for developing <span class="hlt">parallel</span> particle <span class="hlt">simulation</span> codes</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro</p> <p>2016-06-01</p> <p>We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle <span class="hlt">Simulators</span>). FDPS is an application-development framework which helps researchers to develop <span class="hlt">simulation</span> programs using particle methods for large-scale distributed-memory <span class="hlt">parallel</span> supercomputers. 
A particle-based <span class="hlt">simulation</span> program for distributed-memory <span class="hlt">parallel</span> computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory <span class="hlt">parallel</span> computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient <span class="hlt">parallel</span> execution of particle-based <span class="hlt">simulations</span> as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N<sup>2</sup>) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale <span class="hlt">parallel</span> supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10<sup>7</sup>) to 300 ms (N = 10<sup>9</sup>). 
These are currently limited by the time for the calculation of the domain decomposition and communication</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2016RScI...87g6101S&link_type=ABSTRACT','NASAADS'); return false;" href="http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2016RScI...87g6101S&link_type=ABSTRACT"><span id="translatedtitle">Note: Application of a novel 2(3HUS+S) <span class="hlt">parallel</span> manipulator for <span class="hlt">simulation</span> of hip joint motion</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Shan, X. L.; Cheng, G.; Liu, X. Z.</p> <p>2016-07-01</p> <p>In the paper, a novel 2(3HUS+S) <span class="hlt">parallel</span> manipulator, which has two moving platforms, is proposed. The <span class="hlt">parallel</span> manipulator is adopted to <span class="hlt">simulate</span> hip joint motion and can conduct an experiment for two hip joints simultaneously. Motion experiments are conducted in the paper, and the recommended hip joint motion curves from ISO14242 and actual hip joint motions during jogging and walking are selected as the <span class="hlt">simulated</span> motions. 
The experimental results indicate that the 2(3HUS+S) <span class="hlt">parallel</span> manipulator can realize the <span class="hlt">simulation</span> of many kinds of hip joint motions without changing the structure size.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/27475608','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/27475608"><span id="translatedtitle">Note: Application of a novel 2(3HUS+S) <span class="hlt">parallel</span> manipulator for <span class="hlt">simulation</span> of hip joint motion.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Shan, X L; Cheng, G; Liu, X Z</p> <p>2016-07-01</p> <p>In the paper, a novel 2(3HUS+S) <span class="hlt">parallel</span> manipulator, which has two moving platforms, is proposed. The <span class="hlt">parallel</span> manipulator is adopted to <span class="hlt">simulate</span> hip joint motion and can conduct an experiment for two hip joints simultaneously. Motion experiments are conducted in the paper, and the recommended hip joint motion curves from ISO14242 and actual hip joint motions during jogging and walking are selected as the <span class="hlt">simulated</span> motions. The experimental results indicate that the 2(3HUS+S) <span class="hlt">parallel</span> manipulator can realize the <span class="hlt">simulation</span> of many kinds of hip joint motions without changing the structure size. 
PMID:27475608</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=3605599','PMC'); return false;" href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=3605599"><span id="translatedtitle">GROMACS 4.5: a high-throughput and highly <span class="hlt">parallel</span> open source molecular <span class="hlt">simulation</span> toolkit</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pmc">PubMed Central</a></p> <p>Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik</p> <p>2013-01-01</p> <p>Motivation: Molecular <span class="hlt">simulation</span> has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated <span class="hlt">simulation</span> of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new <span class="hlt">simulation</span> algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. 
GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient <span class="hlt">parallelization</span> even on low-end systems, including windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art <span class="hlt">parallelization</span>, this provides extremely high performance and cost efficiency for high-throughput as well as massively <span class="hlt">parallel</span> <span class="hlt">simulations</span>. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/936704','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/936704"><span id="translatedtitle">De Novo Ultrascale Atomistic <span class="hlt">Simulations</span> On High-End <span class="hlt">Parallel</span> Supercomputers</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Nakano, A; Kalia, R K; Nomura, K; Sharma, A; Vashishta, P; Shimojo, F; van Duin, A; Goddard, III, W A; Biswas, R; Srivastava, D; Yang, L H</p> <p>2006-09-04</p> <p>We present a de novo hierarchical <span class="hlt">simulation</span> framework for first-principles based predictive <span class="hlt">simulations</span> of materials and their validation on high-end <span class="hlt">parallel</span> supercomputers and geographically distributed clusters. 
In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) <span class="hlt">simulations</span> explore a wide solution space to discover microscopic mechanisms that govern macroscopic material properties, into which highly accurate quantum mechanical (QM) <span class="hlt">simulations</span> are embedded to validate the discovered mechanisms and quantify the uncertainty of the solution. The framework includes an embedded divide-and-conquer (EDC) algorithmic framework for the design of linear-scaling <span class="hlt">simulation</span> algorithms with minimal bandwidth complexity and tight error control. The EDC framework also enables adaptive hierarchical <span class="hlt">simulation</span> with automated model transitioning assisted by graph-based event tracking. A tunable hierarchical cellular decomposition <span class="hlt">parallelization</span> framework then maps the O(N) EDC algorithms onto Petaflops computers, while achieving performance tunability through a hierarchy of parameterized cell data/computation structures, as well as its implementation using hybrid Grid remote procedure call + message passing + threads programming. High-end computing platforms such as IBM BlueGene/L, SGI Altix 3000 and the NSF TeraGrid provide excellent test grounds for the framework. On these platforms, we have achieved unprecedented scales of quantum-mechanically accurate and well validated, chemically reactive atomistic <span class="hlt">simulations</span>--1.06 billion-atom fast reactive force-field MD and 11.8 million-atom (1.04 trillion grid points) quantum-mechanical MD in the framework of the EDC density functional theory on adaptive multigrids--in addition to 134 billion-atom non-reactive space-time multiresolution MD, with the <span class="hlt">parallel</span> efficiency as high as 0.998 on 65,536 dual-processor BlueGene/L nodes. 
We have also achieved an automated execution of hierarchical QM</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/21499769','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/21499769"><span id="translatedtitle">Billion-atom synchronous <span class="hlt">parallel</span> kinetic Monte Carlo <span class="hlt">simulations</span> of critical 3D Ising systems</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Martinez, E.; Monasterio, P.R.; Marian, J.</p> <p>2011-02-20</p> <p>An extension of the synchronous <span class="hlt">parallel</span> kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the <span class="hlt">parallel</span> efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. 
We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin <span class="hlt">simulations</span>.</p> </li> </ol> <div class="pull-right"> <ul class="pagination"> <li><a href="#" onclick='return showDiv("page_1");'>«</a></li> <li><a href="#" onclick='return showDiv("page_16");'>16</a></li> <li><a href="#" onclick='return showDiv("page_17");'>17</a></li> <li class="active"><span>18</span></li> <li><a href="#" onclick='return showDiv("page_19");'>19</a></li> <li><a href="#" onclick='return showDiv("page_20");'>20</a></li> <li><a href="#" onclick='return showDiv("page_25");'>»</a></li> </ul> </div> </div><!-- col-sm-12 --> </div><!-- row --> </div><!-- page_18 --> <div id="page_19" class="hiddenDiv"> <div class="row"> <div class="col-sm-12"> <div class="pull-right"> <ul class="pagination"> <li><a href="#" onclick='return showDiv("page_1");'>«</a></li> <li><a href="#" onclick='return showDiv("page_17");'>17</a></li> <li><a href="#" onclick='return showDiv("page_18");'>18</a></li> <li class="active"><span>19</span></li> <li><a href="#" onclick='return showDiv("page_20");'>20</a></li> <li><a href="#" onclick='return showDiv("page_21");'>21</a></li> <li><a href="#" onclick='return showDiv("page_25");'>»</a></li> </ul> </div> </div> </div> <div class="row"> <div class="col-sm-12"> <ol class="result-class" start="361"> <li> <p><a target="_blank" onclick="trackOutboundLink('http://ntrs.nasa.gov/search.jsp?R=19910070181&hterms=Geomagnetic+pulsations&qs=Ntx%3Dmode%2Bmatchall%26Ntk%3DAll%26N%3D0%26No%3D30%26Ntt%3DGeomagnetic%2Bpulsations','NASA-TRS'); return false;" href="http://ntrs.nasa.gov/search.jsp?R=19910070181&hterms=Geomagnetic+pulsations&qs=Ntx%3Dmode%2Bmatchall%26Ntk%3DAll%26N%3D0%26No%3D30%26Ntt%3DGeomagnetic%2Bpulsations"><span id="translatedtitle">Steepening of <span class="hlt">parallel</span> propagating hydromagnetic waves into magnetic pulsations - A 
<span class="hlt">simulation</span> study</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Akimoto, K.; Winske, D.; Onsager, T. G.; Thomsen, M. F.; Gary, S. P.</p> <p>1991-01-01</p> <p>The steepening mechanism of <span class="hlt">parallel</span> propagating low-frequency MHD-like waves observed upstream of the earth's quasi-<span class="hlt">parallel</span> bow shock has been investigated by means of electromagnetic hybrid <span class="hlt">simulations</span>. It is shown that an ion beam through the resonant electromagnetic ion/ion instability excites large-amplitude waves, which consequently pitch angle scatter, decelerate, and eventually magnetically trap beam ions in regions where the wave amplitudes are largest. As a result, the beam ions become bunched in both space and gyrophase. As these higher-density, nongyrotropic beam segments are formed, the hydromagnetic waves rapidly steepen, resulting in magnetic pulsations, with properties generally in agreement with observations. This steepening process operates on the scale of the linear growth time of the resonant ion/ion instability. Many of the pulsations generated by this mechanism are left-hand polarized in the spacecraft frame.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2015GMDD....8.2369H','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2015GMDD....8.2369H"><span id="translatedtitle">A <span class="hlt">parallelization</span> scheme to <span class="hlt">simulate</span> reactive transport in the subsurface environment with OGS#IPhreeqc</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>He, W.; Beyer, C.; Fleckenstein, J. 
H.; Jang, E.; Kolditz, O.; Naumov, D.; Kalbacher, T.</p> <p>2015-03-01</p> <p>This technical paper presents an efficient and performance-oriented method to model reactive mass transport processes in environmental and geotechnical subsurface systems. The open source scientific software packages OpenGeoSys and IPhreeqc have been coupled, to combine their individual strengths and features to <span class="hlt">simulate</span> thermo-hydro-mechanical-chemical coupled processes in porous and fractured media with simultaneous consideration of aqueous geochemical reactions. Furthermore, a flexible <span class="hlt">parallelization</span> scheme using MPI (Message Passing Interface) grouping techniques has been implemented, which allows an optimized allocation of computer resources for the node-wise calculation of chemical reactions on the one hand, and the underlying processes such as for groundwater flow or solute transport on the other hand. The coupling interface and <span class="hlt">parallelization</span> scheme have been tested and verified in terms of precision and performance.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2011JCoPh.230.1359M','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2011JCoPh.230.1359M"><span id="translatedtitle">Billion-atom synchronous <span class="hlt">parallel</span> kinetic Monte Carlo <span class="hlt">simulations</span> of critical 3D Ising systems</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Martínez, E.; Monasterio, P. R.; Marian, J.</p> <p>2011-02-01</p> <p>An extension of the synchronous <span class="hlt">parallel</span> kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. 
The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the <span class="hlt">parallel</span> efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin <span class="hlt">simulations</span>.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2016JSP...162..701U','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2016JSP...162..701U"><span id="translatedtitle"><span class="hlt">Parallel</span> Tempering Monte Carlo <span class="hlt">Simulations</span> of Spherical Fixed-Connectivity Model for Polymerized Membranes</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Usui, Satoshi; Koibuchi, Hiroshi</p> <p>2016-02-01</p> <p>We study the first order phase transition of the fixed-connectivity triangulated surface model using the <span class="hlt">Parallel</span> Tempering Monte Carlo (PTMC) technique on relatively large lattices. From the PTMC results, we find that the transition is considerably stronger than the reported ones predicted by the conventional Metropolis MC (MMC) technique and the flat histogram MC technique. We also confirm that the results of the PTMC on relatively smaller lattices are in good agreement with those known results. 
This implies that the PTMC is successfully used to <span class="hlt">simulate</span> the first order phase transitions. The <span class="hlt">parallel</span> computation in the PTMC is implemented by OpenMP, where the speed of the PTMC on multi-core CPUs is considerably faster than that on the single-core CPUs.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/658737','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/658737"><span id="translatedtitle"><span class="hlt">Parallel</span> traffic flow <span class="hlt">simulation</span> of freeway networks: Phase 2. Final report 1994--1995</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Chronopoulos, A.</p> <p>1997-07-01</p> <p>Explicit and implicit numerical methods for solving simple macroscopic traffic flow continuum models have been studied and efficiently implemented in traffic <span class="hlt">simulation</span> codes in the past. The authors have already studied and implemented explicit methods for solving the high-order flow conservation traffic model. Implicit methods allow much larger time step size than explicit methods, for the same accuracy. However, at each time step a nonlinear system must be solved. They use the Newton method coupled with a linear iterative method (Orthomin). They accelerate the convergence of Orthomin with <span class="hlt">parallel</span> incomplete LU factorization preconditionings. 
The authors implemented this implicit method on a 16-processor nCUBE2 <span class="hlt">parallel</span> computer and obtained significant execution time speedup.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2008JCoPh.227.6249C','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2008JCoPh.227.6249C"><span id="translatedtitle">Implementation of unsteady sampling procedures for the <span class="hlt">parallel</span> direct <span class="hlt">simulation</span> Monte Carlo method</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Cave, H. M.; Tseng, K.-C.; Wu, J.-S.; Jermy, M. C.; Huang, J.-C.; Krumdieck, S. P.</p> <p>2008-06-01</p> <p>An unsteady sampling routine for a general <span class="hlt">parallel</span> direct <span class="hlt">simulation</span> Monte Carlo method called PDSC is introduced, allowing the <span class="hlt">simulation</span> of time-dependent flow problems in the near continuum range. A post-processing procedure called DSMC rapid ensemble averaging method (DREAM) is developed to improve the statistical scatter in the results while minimising both memory and <span class="hlt">simulation</span> time. This method builds an ensemble average of repeated runs over a small number of sampling intervals prior to the sampling point of interest by restarting the flow using either a Maxwellian distribution based on macroscopic properties for near equilibrium flows (DREAM-I) or output instantaneous particle data obtained by the original unsteady sampling of PDSC for strongly non-equilibrium flows (DREAM-II). The method is validated by <span class="hlt">simulating</span> shock tube flow and the development of simple Couette flow. 
Unsteady PDSC is found to accurately predict the flow field in both cases with significantly reduced run-times over single processor code and DREAM greatly reduces the statistical scatter in the results while maintaining accurate particle velocity distributions. <span class="hlt">Simulations</span> are then conducted of two applications involving the interaction of shocks over wedges. The results of these <span class="hlt">simulations</span> are compared to experimental data and <span class="hlt">simulations</span> from the literature where these are available. In general, it was found that 10 ensembled runs of DREAM processing could reduce the statistical uncertainty in the raw PDSC data by 2.5-3.3 times, based on the limited number of cases in the present study.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2016IAUS..312..260W','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2016IAUS..312..260W"><span id="translatedtitle">Acceleration of hybrid MPI <span class="hlt">parallel</span> NBODY6++ for large N-body globular cluster <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Wang, Long; Spurzem, Rainer; Aarseth, Sverre; Nitadori, Keigo; Berczik, Peter; Kouwenhoven, M. B. N.; Naab, Thorsten</p> <p>2016-02-01</p> <p>Previous research on globular clusters (GCs) dynamics is mostly based on semi-analytic, Fokker-Planck, Monte-Carlo methods and on direct N-body (NB) <span class="hlt">simulations</span>. These works have great advantages but also limits since GCs are massive and compact and close encounters and binaries play very important roles in their dynamics. The former three methods make approximations and assumptions, while expensive computing time and number of stars limit the latter method. 
The current largest direct NB <span class="hlt">simulation</span> has ~ 500k stars (Heggie 2014). Here, we accelerate the direct NB code NBODY6++ (which extends NBODY6 to supercomputers by using MPI) with new <span class="hlt">parallel</span> computing technologies (GPU, OpenMP + SSE/AVX). Our aim is to handle large N (up to 106) direct NB <span class="hlt">simulations</span> to obtain better understanding of the dynamical evolution of GCs.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/919137','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/919137"><span id="translatedtitle">Xyce <span class="hlt">parallel</span> electronic <span class="hlt">simulator</span> design : mathematical formulation, version 2.0.</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Hoekstra, Robert John; Waters, Lon J.; Hutchinson, Scott Alan; Keiter, Eric Richard; Russo, Thomas V.</p> <p>2004-06-01</p> <p>This document is intended to contain a detailed description of the mathematical formulation of Xyce, a massively <span class="hlt">parallel</span> SPICE-style circuit <span class="hlt">simulator</span> developed at Sandia National Laboratories. The target audience of this document are people in the role of 'service provider'. An example of such a person would be a linear solver expert who is spending a small fraction of his time developing solver algorithms for Xyce. Such a person probably is not an expert in circuit <span class="hlt">simulation</span>, and would benefit from an description of the equations solved by Xyce. In this document, modified nodal analysis (MNA) is described in detail, with a number of examples. 
Issues that are unique to circuit <span class="hlt">simulation</span>, such as voltage limiting, are also described in detail.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/920260','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/920260"><span id="translatedtitle"><span class="hlt">Parallel</span> Beam Dynamics <span class="hlt">Simulation</span> Tools for Future Light SourceLinac Modeling</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Qiang, Ji; Pogorelov, Ilya v.; Ryne, Robert D.</p> <p>2007-06-25</p> <p>Large-scale modeling on <span class="hlt">parallel</span> computers is playing an increasingly important role in the design of future light sources. Such modeling provides a means to accurately and efficiently explore issues such as limits to beam brightness, emittance preservation, the growth of instabilities, etc. Recently the IMPACT codes suite was enhanced to be applicable to future light source design. <span class="hlt">Simulations</span> with IMPACT-Z were performed using up to one billion <span class="hlt">simulation</span> particles for the main linac of a future light source to study the microbunching instability. Combined with the time domain code IMPACT-T, it is now possible to perform large-scale start-to-end linac <span class="hlt">simulations</span> for future light sources, including the injector, main linac, chicanes, and transfer lines. In this paper we provide an overview of the IMPACT code suite, its key capabilities, and recent enhancements pertinent to accelerator modeling for future linac-based light sources.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2003CoPhC.155..159A','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2003CoPhC.155..159A"><span id="translatedtitle">FLY. 
A <span class="hlt">parallel</span> tree N-body code for cosmological <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Antonuccio-Delogu, V.; Becciani, U.; Ferro, D.</p> <p>2003-10-01</p> <p>FLY is a <span class="hlt">parallel</span> treecode which makes heavy use of the one-sided communication paradigm to handle the management of the tree structure. In its public version the code implements the equations for cosmological evolution, and can be run for different cosmological models. This reference guide describes the actual implementation of the algorithms of the public version of FLY, and suggests how to modify them to implement other types of equations (for instance, the Newtonian ones). Program summary Title of program: FLY Catalogue identifier: ADSC Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADSC Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Computer for which the program is designed and others on which it has been tested: Cray T3E, Sgi Origin 3000, IBM SP Operating systems or monitors under which the program has been tested: Unicos 2.0.5.40, Irix 6.5.14, Aix 4.3.3 Programming language used: Fortran 90, C Memory required to execute with typical data: about 100 Mwords with 2 million-particles Number of bits in a word: 32 Number of processors used: <span class="hlt">parallel</span> program. 
The user can select the number of processors >=1 Has the code been vectorized or <span class="hlt">parallelized</span>?: <span class="hlt">parallelized</span> Number of bytes in distributed program, including test data, etc.: 4615604 Distribution format: tar gzip file Keywords: <span class="hlt">Parallel</span> tree N-body code for cosmological <span class="hlt">simulations</span> Nature of physical problem: FLY is a <span class="hlt">parallel</span> collisionless N-body code for the calculation of the gravitational force. Method of solution: It is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986). Restrictions on the complexity of the program: The program uses the leapfrog integrator schema, but could be changed by the user. Typical running time: 50 seconds for each time-step, running a 2-million-particles <span class="hlt">simulation</span> on an Sgi Origin 3800 system with 8 processors having 512 Mbytes RAM for each processor. Unusual features of the program: FLY</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=4824128','PMC'); return false;" href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=4824128"><span id="translatedtitle">BioFVM: an efficient, <span class="hlt">parallelized</span> diffusive transport solver for 3-D biological <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pmc">PubMed Central</a></p> <p>Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul</p> <p>2016-01-01</p> <p>Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. 
Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can <span class="hlt">simulate</span> release and uptake of many substrates by cell and bulk sources, diffusion and decay in large 3D domains. It has been <span class="hlt">parallelized</span> with OpenMP, allowing efficient <span class="hlt">simulations</span> on desktop workstations or single supercomputer nodes. The code is stable even for large time steps, with linear computational cost scaling. Solutions are first-order accurate in time and second-order accurate in space. The code can be run by itself or as part of a larger <span class="hlt">simulator</span>. Availability and implementation: BioFVM is written in C++ with <span class="hlt">parallelization</span> in OpenMP. It is maintained and available for download at http://BioFVM.MathCancer.org and http://BioFVM.sf.net under the Apache License (v2.0). Contact: paul.macklin@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26656933</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2011AGUFMNG51D1672H','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2011AGUFMNG51D1672H"><span id="translatedtitle">Using Speculative Execution to Reduce Communication in a <span class="hlt">Parallel</span> Large Scale Earthquake <span class="hlt">Simulation</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Heien, E. M.; Yikilmaz, M. B.; Sachs, M. K.; Rundle, J. B.; Turcotte, D. L.; Kellogg, L. H.</p> <p>2011-12-01</p> <p>Earthquake <span class="hlt">simulations</span> on <span class="hlt">parallel</span> systems can be communication intensive due to local events (rupture waves) which have global effects (stress transfer). 
These events require global communication to transmit the effects of increased stress to model elements on other computing nodes. We describe a method of using speculative execution in a large-scale <span class="hlt">parallel</span> computation to decrease communication and improve <span class="hlt">simulation</span> speed. This method exploits the tendency of earthquake ruptures to remain physically localized even though their effects on stress extend over long ranges. In this method we assume the stress transfer caused by a rupture remains localized and avoid global communication until the rupture has a high probability of passing to another node. We then calculate the stress state of the system to ensure that the rupture in fact remained localized, proceeding if the assumption was correct or rolling back the calculation otherwise. Using this method we are able to reduce communication frequency by 78%, in turn decreasing communication time by up to 66% and improving <span class="hlt">simulation</span> speed by up to 45%.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/23600445','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/23600445"><span id="translatedtitle">Accelerating groundwater flow <span class="hlt">simulation</span> in MODFLOW using JASMIN-based <span class="hlt">parallel</span> computing.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Cheng, Tangpei; Mo, Zeyao; Shao, Jingli</p> <p>2014-01-01</p> <p>To accelerate the groundwater flow <span class="hlt">simulation</span> process, this paper reports our work on developing an efficient <span class="hlt">parallel</span> <span class="hlt">simulator</span> through rebuilding the well-known software MODFLOW on JASMIN (J Adaptive Structured Meshes applications Infrastructure). 
The rebuilding process is achieved by designing patch-based data structures and <span class="hlt">parallel</span> algorithms as well as adding slight modifications to the compute flow and subroutines in MODFLOW. Both the memory requirements and computing effort are distributed among all processors, and to reduce communication cost, data transfers are batched and conveniently handled by adding ghost nodes to each patch. To further improve performance, constant-head/inactive cells are tagged and neglected during the linear solving process and an efficient load balancing strategy is presented. The accuracy and efficiency are demonstrated through modeling three scenarios. The first application is a field flow problem located at Yanming Lake in China to help design a reasonable quantity of groundwater exploitation. Desirable numerical accuracy and significant performance enhancement are obtained. Typically, the tagged program with the load balancing strategy running on 40 cores is six times faster than the fastest MICCG-based MODFLOW program. The second test simulates flow in a highly heterogeneous aquifer. The AMG-based JASMIN program running on 40 cores is nine times faster than the GMG-based MODFLOW program. The third test is a simplified transient flow problem on the order of tens of millions of cells to examine the scalability. Compared to 32 cores, <span class="hlt">parallel</span> efficiencies of 77% and 68% are obtained on 512 and 1024 cores, respectively, which indicates impressive scalability. 
PMID:23600445</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://ntrs.nasa.gov/search.jsp?R=20020010587&hterms=Weeratunga&qs=N%3D0%26Ntk%3DAll%26Ntx%3Dmode%2Bmatchall%26Ntt%3DWeeratunga','NASA-TRS'); return false;" href="http://ntrs.nasa.gov/search.jsp?R=20020010587&hterms=Weeratunga&qs=N%3D0%26Ntk%3DAll%26Ntx%3Dmode%2Bmatchall%26Ntt%3DWeeratunga"><span id="translatedtitle"><span class="hlt">Simulation</span> of Unsteady Combustion in a Ramjet Engine Using a Highly <span class="hlt">Parallel</span> Computer</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Menon, Suresh; Weeratunga, Sisira; Cooper, D. M. (Technical Monitor)</p> <p>1994-01-01</p> <p>Combustion instability in ramjets is a complex phenomenon that involves nonlinear interaction between acoustic waves, vortex motion and unsteady heat release in the combustor. To numerically <span class="hlt">simulate</span> this 3-D, transient phenomenon, enormous computer resources (time, memory and disk storage) are required. Although current-generation vector supercomputers are capable of providing adequate resources for <span class="hlt">simulations</span> of this nature, their high cost and limited availability make such machines less than satisfactory for routine use. The primary focus of this study is to assess the feasibility of using highly <span class="hlt">parallel</span> computer systems as a cost-effective alternative for conducting such unsteady flow <span class="hlt">simulations</span>. Towards this end, a large-eddy <span class="hlt">simulation</span> model for combustion instability was implemented on the Intel iPSC/860 and a careful study was conducted to determine the benefits and the problems associated with the use of such machines for transient <span class="hlt">simulations</span>. 
Details of this study along with the results obtained from the unsteady combustion <span class="hlt">simulations</span> carried out on the iPSC/860 are discussed in this paper.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/19900007132','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/19900007132"><span id="translatedtitle">Stochastic <span class="hlt">simulation</span> of charged particle transport on the massively <span class="hlt">parallel</span> processor</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Earl, James A.</p> <p>1988-01-01</p> <p>Computations of cosmic-ray transport based upon finite-difference methods are afflicted by instabilities, inaccuracies, and artifacts. To avoid these problems, researchers developed a Monte Carlo formulation which is closely related not only to the finite-difference formulation, but also to the underlying physics of transport phenomena. Implementations of this approach are currently running on the Massively <span class="hlt">Parallel</span> Processor at Goddard Space Flight Center, whose enormous computing power overcomes the poor statistical accuracy that usually limits the use of stochastic methods. These <span class="hlt">simulations</span> have progressed to a stage where they provide a useful and realistic picture of solar energetic particle propagation in interplanetary space.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2001APS..DPPKP1112L','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2001APS..DPPKP1112L"><span id="translatedtitle"><span class="hlt">Parallel</span> PIC <span class="hlt">Simulations</span> of Short-Pulse High Intensity Laser Plasma Interactions.</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Lasinski, B. 
F.; Still, C. H.; Langdon, A. B.</p> <p>2001-10-01</p> <p>We extend our previous <span class="hlt">simulations</span> of high intensity short pulse laser plasma interactions [B. F. Lasinski, A. B. Langdon, S. P. Hatchett, M. H. Key, and M. Tabak, Phys. Plasmas 6, 2041 (1999); S. C. Wilks and W. L. Kruer, IEEE Journal of Quantum Electronics 11, 1954 (1997)] to 3D and to much larger systems in 2D using our new, modern, 3D, electromagnetic, fully relativistic, massively <span class="hlt">parallel</span> PIC code. We study the generation of hot electrons and energetic ions and the associated complex phenomena. Laser light filamentation and the formation of high static magnetic fields are described.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/1028177','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/1028177"><span id="translatedtitle">A <span class="hlt">parallel</span> multigrid preconditioner for the <span class="hlt">simulation</span> of large fracture networks</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Sampath, Rahul S; Barai, Pallab; Nukala, Phani K</p> <p>2010-01-01</p> <p>Computational modeling of a fracture in disordered materials using discrete lattice models requires the solution of a linear system of equations every time a new lattice bond is broken. Solving these linear systems of equations successively is the most expensive part of fracture <span class="hlt">simulations</span> using large three-dimensional networks. In this paper, we present a <span class="hlt">parallel</span> multigrid preconditioned conjugate gradient algorithm to solve these linear systems. 
Numerical experiments demonstrate that this algorithm performs significantly better than the algorithms previously used to solve this problem.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/1035294','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/1035294"><span id="translatedtitle">Understanding Performance of <span class="hlt">Parallel</span> Scientific <span class="hlt">Simulation</span> Codes using Open|SpeedShop</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Ghosh, K K</p> <p>2011-11-07</p> <p>Conclusions of this presentation are: (1) Open|SpeedShop (OSS) is convenient to use for large, <span class="hlt">parallel</span>, scientific <span class="hlt">simulation</span> codes; (2) large codes benefit from uninstrumented execution; (3) many experiments can be run in a short time, though multiple runs may be needed, e.g. usertime for caller-callee and hwcsamp for HW counters; (4) a decent idea of the code's performance is easily obtained; (5) statistical sampling calls for a decent number of samples; and (6) HWC data is very useful for micro-analysis but can be tricky to analyze.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/5574659','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/5574659"><span id="translatedtitle">Forced-convection boiling tests performed in <span class="hlt">parallel</span> <span class="hlt">simulated</span> LMR fuel assemblies</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Rose, S.D.; Carbajo, J.J.; Levin, A.E.; Lloyd, D.B.; Montgomery, B.H.; Wantland, J.L.</p> <p>1985-04-21</p> <p>Forced-convection tests have been carried out using <span class="hlt">parallel</span> <span class="hlt">simulated</span> Liquid Metal Reactor fuel assemblies in an engineering-scale sodium loop, the 
Thermal-Hydraulic Out-of-Reactor Safety facility. The tests, performed under single- and two-phase conditions, have shown that for low forced-convection flow there is significant flow augmentation by thermal convection, an important phenomenon under degraded shutdown heat removal conditions in an LMR. The power and flows required for boiling and dryout to occur are much higher than decay heat levels. The experimental evidence supports analytical results that heat removal from an LMR is possible with a degraded shutdown heat removal system.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/427933','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/427933"><span id="translatedtitle">A three-phase series-<span class="hlt">parallel</span> resonant converter -- analysis, design, <span class="hlt">simulation</span> and experimental results</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Bhat, A.K.S.; Zheng, L.</p> <p>1995-12-31</p> <p>A three-phase dc-to-dc series-<span class="hlt">parallel</span> resonant converter is proposed and its operating modes for a 180° wide gating pulse scheme are explained. A detailed analysis of the converter using a constant current model and a Fourier series approach is presented. Based on the analysis, design curves are obtained and a design example of a 1 kW converter is given. SPICE <span class="hlt">simulation</span> results for the designed converter and experimental results for a 500 W converter are presented to verify the performance of the proposed converter for varying load conditions. 
The converter operates in lagging PF mode for the entire load range and requires a narrow variation in switching frequency.</p> </li> </ol> </div><!-- col-sm-12 --> </div><!-- row --> </div><!-- page_19 --> <div id="page_20" class="hiddenDiv"> <div class="row"> <div class="col-sm-12"> <ol class="result-class" start="381"> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2000PhDT.......208B','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2000PhDT.......208B"><span id="translatedtitle"><span class="hlt">Parallel</span> direct numerical <span class="hlt">simulation</span> of wake vortex detection using monostatic and bistatic radio acoustic sounding systems</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Boluriaan Esfahaani, Said</p> <p></p> <p>A <span class="hlt">parallel</span> two-dimensional code is developed 
in this thesis to numerically <span class="hlt">simulate</span> wake vortex detection using a Radio Acoustic Sounding System (RASS). The Maxwell equations for media with non-uniform permittivity and the linearized Euler equations for media with non-uniform mean flow are the main framework for the <span class="hlt">simulations</span>. The code is written in Fortran 90 with the Message Passing Interface (MPI) for <span class="hlt">parallel</span> implementation. The main difficulty encountered with a time accurate <span class="hlt">simulation</span> of a RASS is the number of samples required to resolve the Doppler shift in the scattered electromagnetic signal. Even for a 1D <span class="hlt">simulation</span> with a typical scatterer size, the CPU time required to run the code is far beyond currently available computer resources. Two solutions that overcome this problem are described. In the first the actual electromagnetic wave propagation speed is replaced with a much lower value. This allows an explicit, time accurate numerical scheme to be used. In the second the governing differential equations are recast in order to remove the carrier frequency and solve only for the frequency shift using an implicit scheme with large time steps. The numerical stability characteristics of the resulting discretized equation with complex coefficients are examined. A number of cases for both the monostatic and bistatic configurations are considered. First, a uniform mean flow is considered and the RASS <span class="hlt">simulation</span> is performed for two different types of incident acoustic field, namely a short single frequency acoustic pulse and a continuous broadband acoustic source. Both the explicit and implicit schemes are examined and the mean flow velocity is determined from the spectrum of the backscattered electromagnetic signal with very good accuracy. 
Second, the Taylor and Oseen vortex models are considered and their velocity field along the incident electromagnetic beam is retrieved. The Abel transform is then applied to the velocity profiles determined by both</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=4257577','PMC'); return false;" href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=4257577"><span id="translatedtitle">Evaluating the performance of <span class="hlt">parallel</span> subsurface <span class="hlt">simulators</span>: An illustrative example with PFLOTRAN</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pmc">PubMed Central</a></p> <p>Hammond, G E; Lichtner, P C; Mills, R T</p> <p>2014-01-01</p> <p>To better inform the subsurface scientist on the expected performance of <span class="hlt">parallel</span> <span class="hlt">simulators</span>, this work investigates the performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's <span class="hlt">parallel</span> layout and code design, PFLOTRAN's <span class="hlt">parallel</span> performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. 
Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted. PMID:25506097</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2005SPIE.6019..523L','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2005SPIE.6019..523L"><span id="translatedtitle"><span class="hlt">Simulation</span> of optical devices using <span class="hlt">parallel</span> finite-difference time-domain method</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Li, Kang; Kong, Fanmin; Mei, Liangmo; Liu, Xin</p> <p>2005-11-01</p> <p>This paper presents a new <span class="hlt">parallel</span> finite-difference time-domain (FDTD) numerical method in a low-cost network environment to simulate optical waveguide characteristics. A PC-motherboard-based cluster is used, as it is relatively low-cost, reliable and has high computing performance. Four clusters are networked by fast Ethernet technology. Owing to the simple nature of the FDTD algorithm, a native Ethernet packet communication mechanism is used to reduce the overhead of the communication between the adjacent clusters. To validate the method, a microcavity ring resonator based on semiconductor waveguides is chosen as an instance of FDTD <span class="hlt">parallel</span> computation. The speed-up rate under different division densities is calculated. From the result we can conclude that when the decomposition size reaches a certain point, a good <span class="hlt">parallel</span> computing speed-up is maintained. This <span class="hlt">simulation</span> shows that by overlapping computation with communication and controlling the decomposition size, the overhead of communicating the shared data can be overcome. The result indicates that the implementation can achieve significant speed-up for the FDTD algorithm. 
This will enable us to tackle larger real-world electromagnetic problems with low-cost PC clusters.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/22089677','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/22089677"><span id="translatedtitle">A <span class="hlt">PARALLEL</span> MONTE CARLO CODE FOR <span class="hlt">SIMULATING</span> COLLISIONAL N-BODY SYSTEMS</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A.</p> <p>2013-02-15</p> <p>We present a new <span class="hlt">parallel</span> code for computing the dynamical evolution of collisional N-body systems with up to N ≈ 10<sup>7</sup> particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a <span class="hlt">parallel</span> random number generation scheme as well as a <span class="hlt">parallel</span> sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10<sup>5</sup> to 10<sup>7</sup>. 
We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within ≲ 0.04% throughout all <span class="hlt">simulations</span>. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10<sup>5</sup>, 128 for N = 10<sup>6</sup> and 256 for N = 10<sup>7</sup>. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the <span class="hlt">parallel</span> sorting algorithm. The resulting maximum speedups we achieve are approximately 60×, 100×, and 220×, respectively.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2015A%26C....12..109H','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2015A%26C....12..109H"><span id="translatedtitle">L-PICOLA: A <span class="hlt">parallel</span> code for fast dark matter <span class="hlt">simulation</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Howlett, C.; Manera, M.; Percival, W. J.</p> <p>2015-09-01</p> <p>Robust measurements based on current large-scale structure surveys require precise knowledge of statistical and systematic errors. This can be obtained from large numbers of realistic mock galaxy catalogues that mimic the observed distribution of galaxies within the survey volume. To this end we present a fast, distributed-memory, planar-<span class="hlt">parallel</span> code, L-PICOLA, which can be used to generate and evolve a set of initial conditions into a dark matter field much faster than a full non-linear N-Body <span class="hlt">simulation</span>. 
Additionally, L-PICOLA has the ability to include primordial non-Gaussianity in the <span class="hlt">simulation</span> and <span class="hlt">simulate</span> the past lightcone at run-time, with optional replication of the <span class="hlt">simulation</span> volume. Through comparisons to fully non-linear N-Body <span class="hlt">simulations</span> we find that our code can reproduce the z = 0 power spectrum and reduced bispectrum of dark matter to within 2% and 5% respectively on all scales of interest to measurements of Baryon Acoustic Oscillations and Redshift Space Distortions, but 3 orders of magnitude faster. The accuracy, speed and scalability of this code, alongside the additional features we have implemented, make it extremely useful for both current and next generation large-scale structure surveys. L-PICOLA is publicly available at https://cullanhowlett.github.io/l-picola.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/24416069','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/24416069"><span id="translatedtitle">Large-scale modeling of epileptic seizures: scaling properties of two <span class="hlt">parallel</span> neuronal network <span class="hlt">simulation</span> algorithms.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Pesce, Lorenzo L; Lee, Hyong C; Hereld, Mark; Visser, Sid; Stevens, Rick L; Wildeman, Albert; van Drongelen, Wim</p> <p>2013-01-01</p> <p>Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale <span class="hlt">simulations</span> of detailed neuronal networks to guide us. 
Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale <span class="hlt">simulations</span>. We have determined the detailed behavior of two such <span class="hlt">simulators</span> on <span class="hlt">parallel</span> computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our <span class="hlt">simulations</span> required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, <span class="hlt">simulations</span> of epileptic seizures on networks with millions of cells should be feasible on current supercomputers. PMID:24416069</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/pages/biblio/1227736-large-scale-modeling-epileptic-seizures-scaling-properties-two-parallel-neuronal-network-simulation-algorithms','SCIGOV-DOEP'); return false;" href="http://www.osti.gov/pages/biblio/1227736-large-scale-modeling-epileptic-seizures-scaling-properties-two-parallel-neuronal-network-simulation-algorithms"><span id="translatedtitle">Large-Scale Modeling of Epileptic Seizures: Scaling Properties of Two <span class="hlt">Parallel</span> Neuronal Network <span class="hlt">Simulation</span> Algorithms</span></a></p> <p><a target="_blank" href="http://www.osti.gov/pages">DOE PAGESBeta</a></p> <p>Pesce, Lorenzo L.; Lee, Hyong C.; Hereld, Mark; Visser, Sid; Stevens, Rick L.; Wildeman, Albert; van Drongelen, Wim</p> <p>2013-01-01</p> <p>Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current 
epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale <span class="hlt">simulations</span> of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale <span class="hlt">simulations</span>. We have determined the detailed behavior of two such <span class="hlt">simulators</span> on <span class="hlt">parallel</span> computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our <span class="hlt">simulations</span> required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. 
Therefore, <span class="hlt">simulations</span> of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/18334421','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/18334421"><span id="translatedtitle">Gait <span class="hlt">simulation</span> via a 6-DOF <span class="hlt">parallel</span> robot with iterative learning control.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Aubin, Patrick M; Cowley, Matthew S; Ledoux, William R</p> <p>2008-03-01</p> <p>We have developed a robotic gait <span class="hlt">simulator</span> (RGS) by leveraging a 6-degree-of-freedom <span class="hlt">parallel</span> robot, with the goal of overcoming three significant challenges of gait <span class="hlt">simulation</span>: 1) operating at near physiologically correct velocities; 2) inputting full scale ground reaction forces; and 3) <span class="hlt">simulating</span> motion in all three planes (sagittal, coronal and transverse). The robot will eventually be employed with cadaveric specimens, but as a means of exploring the capability of the system, we have first used it with a prosthetic foot. Gait data were recorded from one transtibial amputee using a motion analysis system and force plate. Using the same prosthetic foot as the subject, the RGS accurately reproduced the recorded kinematics and kinetics and the appropriate vertical ground reaction force was realized with a proportional iterative learning controller. After six gait iterations the controller reduced the root mean square (RMS) error between the <span class="hlt">simulated</span> and in situ vertical ground reaction force to 35 N during a 1.5 s <span class="hlt">simulation</span> of the stance phase of gait with a prosthetic foot. 
This paper addresses the design, methodology and validation of the novel RGS. PMID:18334421</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=2698777','PMC'); return false;" href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=2698777"><span id="translatedtitle">PCSIM: A <span class="hlt">Parallel</span> <span class="hlt">Simulation</span> Environment for Neural Circuits Fully Integrated with Python</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pmc">PubMed Central</a></p> <p>Pecevski, Dejan; Natschläger, Thomas; Schuch, Klaus</p> <p>2008-01-01</p> <p>The <span class="hlt">Parallel</span> Circuit <span class="hlt">SIMulator</span> (PCSIM) is a software package for <span class="hlt">simulation</span> of neural circuits. It is primarily designed for distributed <span class="hlt">simulation</span> of large scale networks of spiking point neurons. Although its computational core is written in C++, PCSIM's primary interface is implemented in the Python programming language, which is a powerful programming environment and allows the user to easily integrate the neural circuit <span class="hlt">simulator</span> with data analysis and visualization tools to manage the full neural modeling life cycle. The main focus of this paper is to describe PCSIM's full integration into Python and the benefits thereof. In particular we will investigate how the automatically generated bidirectional interface and PCSIM's object-oriented modular framework enable the user to adopt a hybrid modeling approach: using and extending PCSIM's functionality either employing pure Python or C++ and thus combining the advantages of both worlds. Furthermore, we describe several supplementary PCSIM packages written in pure Python and tailored towards setting up and analyzing neural <span class="hlt">simulations</span>. 
PMID:19543450</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/1093069','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/1093069"><span id="translatedtitle"><span class="hlt">Parallel</span> adaptive fluid-structure interaction <span class="hlt">simulation</span> of explosions impacting on building structures</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Deiterding, Ralf; Wood, Stephen L</p> <p>2013-01-01</p> <p>We pursue a level set approach to couple an Eulerian shock-capturing fluid solver with space-time refinement to an explicit solid dynamics solver for large deformations and fracture. The coupling algorithms considering recursively finer fluid time steps as well as overlapping solver updates are discussed in detail. Our ideas are implemented in the AMROC adaptive fluid solver framework and are used for effective fluid-structure coupling to the general purpose solid dynamics code DYNA3D. 
Besides <span class="hlt">simulations</span> verifying the coupled fluid-structure solver and assessing its <span class="hlt">parallel</span> scalability, the detailed structural analysis of a reinforced concrete column under blast loading and the <span class="hlt">simulation</span> of a prototypical blast explosion in a realistic multistory building are presented.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/20140009920','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/20140009920"><span id="translatedtitle"><span class="hlt">Simulation</span>/Emulation Techniques: Compressing Schedules With <span class="hlt">Parallel</span> (HW/SW) Development</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Mangieri, Mark L.; Hoang, June</p> <p>2014-01-01</p> <p>NASA has always been in the business of balancing new technologies and techniques to achieve human space travel objectives. NASA's Kedalion engineering analysis lab has been validating and using many contemporary avionics HW/SW development and integration techniques, which represent new paradigms to NASA's heritage culture. Kedalion has validated many of the Orion HW/SW engineering techniques borrowed from the adjacent commercial aircraft avionics solution space, inserting new techniques and skills into the Multi-Purpose Crew Vehicle (MPCV) Orion program. Using contemporary agile techniques, Commercial-off-the-shelf (COTS) products, early rapid prototyping, in-house expertise and tools, and extensive use of <span class="hlt">simulators</span> and emulators, NASA has achieved cost effective paradigms that are currently serving the Orion program effectively. 
Elements of long lead custom hardware on the Orion program have necessitated early use of <span class="hlt">simulators</span> and emulators in advance of deliverable hardware to achieve <span class="hlt">parallel</span> design and development on a compressed schedule.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2015SPIE.9424E..0JB','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2015SPIE.9424E..0JB"><span id="translatedtitle"><span class="hlt">Simulating</span> massively <span class="hlt">parallel</span> electron beam inspection for sub-20 nm defects</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Bunday, Benjamin D.; Mukhtar, Maseeh; Quoi, Kathy; Thiel, Brad; Malloy, Matt</p> <p>2015-03-01</p> <p>SEMATECH has initiated a program to develop massively-<span class="hlt">parallel</span> electron beam defect inspection (MPEBI). Here we use JMONSEL <span class="hlt">simulations</span> to generate expected imaging responses of chosen test cases of patterns and defects with ability to vary parameters for beam energy, spot size, pixel size, and/or defect material and form factor. The patterns are representative of the design rules for an aggressively-scaled FinFET-type design. With these <span class="hlt">simulated</span> images and resulting shot noise, a signal-to-noise framework is developed, which relates to defect detection probabilities. 
Additionally, with this infrastructure the effect of detection chain noise and frequency dependent system response can be assessed, allowing for targeting of best recipe parameters for MPEBI validation experiments, ultimately leading to insights into how such parameters will impact MPEBI tool design, including necessary doses for defect detection and estimations of scanning speeds for achieving high throughput for HVM.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/1096496','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/1096496"><span id="translatedtitle">Xyce <span class="hlt">parallel</span> electronic <span class="hlt">simulator</span> users' guide, version 6.0.</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David G.</p> <p>2013-08-01</p> <p>This manual describes the use of the Xyce <span class="hlt">Parallel</span> Electronic <span class="hlt">Simulator</span>. Xyce has been designed as a SPICE-compatible, high-performance analog circuit <span class="hlt">simulator</span>, and has been written to support the <span class="hlt">simulation</span> needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale <span class="hlt">parallel</span> computing platforms (up to thousands of processors). This includes support for most popular <span class="hlt">parallel</span> and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. 
This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a <span class="hlt">parallel</span> code in the most general sense of the phrase - a message passing <span class="hlt">parallel</span> implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory <span class="hlt">parallel</span> platforms. Attention has been paid to the specific nature of circuit-<span class="hlt">simulation</span> problems to ensure that optimal <span class="hlt">parallel</span> efficiency is achieved as the number of processors grows.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/1194329','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/1194329"><span id="translatedtitle">A Many-Task <span class="hlt">Parallel</span> Approach for Multiscale <span class="hlt">Simulations</span> of Subsurface Flow and Reactive Transport</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Scheibe, Timothy D.; Yang, Xiaofan; Schuchardt, Karen L.; Agarwal, Khushbu; Chase, Jared M.; Palmer, Bruce J.; Tartakovsky, Alexandre M.</p> <p>2014-12-16</p> <p>Continuum-scale models have long been used to study subsurface flow, transport, and reactions but lack the ability to resolve processes that are governed by pore-scale mixing. Recently, pore-scale models, which explicitly resolve individual pores and soil grains, have been developed to more accurately model pore-scale phenomena, particularly reaction processes that are controlled by local mixing. 
However, pore-scale models are prohibitively expensive for modeling application-scale domains. This motivates the use of a hybrid multiscale approach in which continuum- and pore-scale codes are coupled either hierarchically or concurrently within an overall <span class="hlt">simulation</span> domain (time and space). This approach is naturally suited to an adaptive, loosely-coupled many-task methodology with three potential levels of concurrency. Each individual code (pore- and continuum-scale) can be implemented in <span class="hlt">parallel</span>; multiple semi-independent instances of the pore-scale code are required at each time step providing a second level of concurrency; and Monte Carlo <span class="hlt">simulations</span> of the overall system to represent uncertainty in material property distributions provide a third level of concurrency. We have developed a hybrid multiscale model of a mixing-controlled reaction in a porous medium wherein the reaction occurs only over a limited portion of the domain. Loose, minimally-invasive coupling of pre-existing <span class="hlt">parallel</span> continuum- and pore-scale codes has been accomplished by an adaptive script-based workflow implemented in the Swift workflow system. We describe here the methods used to create the model system, adaptively control multiple coupled instances of pore- and continuum-scale <span class="hlt">simulations</span>, and maximize the scalability of the overall system. 
We present results of numerical experiments conducted on NERSC supercomputing systems; our results demonstrate that loose many-task coupling provides a scalable solution for multiscale subsurface <span class="hlt">simulations</span> with minimal overhead.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=4755232','PMC'); return false;" href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=4755232"><span id="translatedtitle">SDA 7: A modular and <span class="hlt">parallel</span> implementation of the <span class="hlt">simulation</span> of diffusional association software</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pmc">PubMed Central</a></p> <p>Martinez, Michael; Romanowska, Julia; Kokh, Daria B.; Ozboyaci, Musa; Yu, Xiaofeng; Öztürk, Mehmet Ali; Richter, Stefan</p> <p>2015-01-01</p> <p>The <span class="hlt">simulation</span> of diffusional association (SDA) Brownian dynamics software package has been widely used in the study of biomacromolecular association. Initially developed to calculate bimolecular protein–protein association rate constants, it has since been extended to study electron transfer rates, to predict the structures of biomacromolecular complexes, to investigate the adsorption of proteins to inorganic surfaces, and to <span class="hlt">simulate</span> the dynamics of large systems containing many biomacromolecular solutes, allowing the study of concentration‐dependent effects. These extensions have led to a number of divergent versions of the software. In this article, we report the development of the latest version of the software (SDA 7). This release was developed to consolidate the existing codes into a single framework, while improving the <span class="hlt">parallelization</span> of the code to better exploit modern multicore shared memory computer architectures. 
It is built using a modular object‐oriented programming scheme, to allow for easy maintenance and extension of the software, and includes new features, such as adding flexible solute representations. We discuss a number of application examples, which describe some of the methods available in the release, and provide benchmarking data to demonstrate the <span class="hlt">parallel</span> performance. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc. PMID:26123630</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2016APS..MAR.T1021D','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2016APS..MAR.T1021D"><span id="translatedtitle">Optimized <span class="hlt">simulations</span> of Olami-Feder-Christensen systems using <span class="hlt">parallel</span> algorithms</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Dominguez, Rachele; Necaise, Rance; Montag, Eric</p> <p></p> <p>The sequential nature of the Olami-Feder-Christensen (OFC) model for earthquake <span class="hlt">simulations</span> limits the benefits of <span class="hlt">parallel</span> computing approaches because of the frequent communication required between processors. We developed a <span class="hlt">parallel</span> version of the OFC algorithm for multi-core processors. Our data, even for relatively small system sizes and low numbers of processors, indicate that increasing the number of processors provides significantly faster <span class="hlt">simulations</span>, producing more efficient results than previous attempts that used network-based Beowulf clusters. Our algorithm optimizes performance by exploiting the multi-core processor architecture, minimizing communication time in contrast to the networked Beowulf-cluster approaches. 
Our multi-core algorithm is the basis for a new algorithm using GPUs that will drastically increase the number of processors available. Previous studies incorporating realistic structural features of faults into OFC models have revealed spatial and temporal patterns observed in real earthquake systems. The computational advances presented here will allow for studying interacting networks of faults, rather than individual faults, further enhancing our understanding of the relationship between the earth's structure and the triggering process. Support for this project comes from the Chenery Research Fund, the Rashkind Family Endowment, the Walter Williams Craigie Teaching Endowment, and the Schapiro Undergraduate Research Fellowship.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/26123630','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/26123630"><span id="translatedtitle">SDA 7: A modular and <span class="hlt">parallel</span> implementation of the <span class="hlt">simulation</span> of diffusional association software.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Martinez, Michael; Bruce, Neil J; Romanowska, Julia; Kokh, Daria B; Ozboyaci, Musa; Yu, Xiaofeng; Öztürk, Mehmet Ali; Richter, Stefan; Wade, Rebecca C</p> <p>2015-08-01</p> <p>The <span class="hlt">simulation</span> of diffusional association (SDA) Brownian dynamics software package has been widely used in the study of biomacromolecular association. 
Initially developed to calculate bimolecular protein-protein association rate constants, it has since been extended to study electron transfer rates, to predict the structures of biomacromolecular complexes, to investigate the adsorption of proteins to inorganic surfaces, and to <span class="hlt">simulate</span> the dynamics of large systems containing many biomacromolecular solutes, allowing the study of concentration-dependent effects. These extensions have led to a number of divergent versions of the software. In this article, we report the development of the latest version of the software (SDA 7). This release was developed to consolidate the existing codes into a single framework, while improving the <span class="hlt">parallelization</span> of the code to better exploit modern multicore shared memory computer architectures. It is built using a modular object-oriented programming scheme, to allow for easy maintenance and extension of the software, and includes new features, such as adding flexible solute representations. We discuss a number of application examples, which describe some of the methods available in the release, and provide benchmarking data to demonstrate the <span class="hlt">parallel</span> performance. 
PMID:26123630</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/16851481','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/16851481"><span id="translatedtitle">On the efficiency of exchange in <span class="hlt">parallel</span> tempering monte carlo <span class="hlt">simulations</span>.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Predescu, Cristian; Predescu, Mihaela; Ciobanu, Cristian V</p> <p>2005-03-10</p> <p>We introduce the concept of effective fraction, defined as the expected probability that a configuration from the lowest index replica successfully reaches the highest index replica during a replica exchange Monte Carlo <span class="hlt">simulation</span>. We then argue that the effective fraction represents an adequate measure of the quality of the sampling technique, as far as swapping is concerned. Under the hypothesis that the correlation between successive exchanges is negligible, we propose a technique for the computation of the effective fraction, a technique that relies solely on the values of the acceptance probabilities obtained at the end of the <span class="hlt">simulation</span>. The effective fraction is then utilized for the study of the efficiency of a popular swapping scheme in the context of <span class="hlt">parallel</span> tempering in the canonical ensemble. For large dimensional oscillators, we show that the swapping probability that minimizes the computational effort is 38.74%. By studying the <span class="hlt">parallel</span> tempering swapping efficiency for a 13-atom Lennard-Jones cluster, we argue that the value of 38.74% remains roughly the optimal probability for most systems with continuous distributions that are likely to be encountered in practice. 
PMID:16851481</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2014ChPhB..23b8903W','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2014ChPhB..23b8903W"><span id="translatedtitle">MDSLB: A new static load balancing method for <span class="hlt">parallel</span> molecular dynamics <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Wu, Yun-Long; Xu, Xin-Hai; Yang, Xue-Jun; Zou, Shun; Ren, Xiao-Guang</p> <p>2014-02-01</p> <p>Large-scale <span class="hlt">parallelization</span> of molecular dynamics <span class="hlt">simulations</span> is facing challenges which seriously affect the <span class="hlt">simulation</span> efficiency, among which the load imbalance problem is the most critical. In this paper, we propose a new molecular dynamics static load balancing method (MDSLB). By analyzing the characteristics of the short-range force of molecular dynamics programs running in <span class="hlt">parallel</span>, we divide the short-range force into three kinds of force models, and then package the computations of each force model into many tiny computational units called “cell loads”, which provide the basic data structures for our load balancing method. In MDSLB, the spatial region is separated into sub-regions called “local domains”, and the cell loads of each local domain are allocated to every processor in turn. Compared with the dynamic load balancing method, MDSLB can guarantee load balance by executing the algorithm only once at program startup without migrating the loads dynamically. We implement MDSLB in the OpenFOAM software and test it on the TianHe-1A supercomputer with 16 to 512 processors. 
Experimental results show that MDSLB can reduce run time by 34%-64% for load-imbalanced cases.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2016CoPhC.204..107B','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2016CoPhC.204..107B"><span id="translatedtitle">A scalable <span class="hlt">parallel</span> Stokesian Dynamics method for the <span class="hlt">simulation</span> of colloidal suspensions</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Bülow, F.; Hamberger, P.; Nirschl, H.; Dörfler, W.</p> <p>2016-07-01</p> <p>We have developed a new method for the efficient numerical <span class="hlt">simulation</span> of colloidal suspensions. This method is designed and especially well-suited for <span class="hlt">parallel</span> code execution, but it can also be applied to single-core programs. It combines the Stokesian Dynamics method with a variant of the widely used Barnes-Hut algorithm in order to reduce computational costs. This combination and the inherent <span class="hlt">parallelization</span> of the method make <span class="hlt">simulations</span> of large numbers of particles within days possible. The level of accuracy can be determined by the user and is limited by the truncation of the multipole expansion used. Compared to the original Stokesian Dynamics method the complexity can be reduced from O(N<sup>2</sup>) to linear complexity for dilute suspensions of strongly clustered particles, N being the number of particles. 
In case of non-clustered particles in a dense suspension, the complexity depends on the particle configuration and is between O(N) and O(P n<sub>p,max</sub><sup>2</sup>), where P is the number of processes used and n<sub>p,max</sub> = ⌈N/P⌉.</p> </li> </ol> <div class="pull-right"> <ul class="pagination"> <li><a href="#" onclick='return showDiv("page_1");'>«</a></li> <li><a href="#" onclick='return showDiv("page_18");'>18</a></li> <li><a href="#" onclick='return showDiv("page_19");'>19</a></li> <li class="active"><span>20</span></li> <li><a href="#" onclick='return showDiv("page_21");'>21</a></li> <li><a href="#" onclick='return showDiv("page_22");'>22</a></li> <li><a href="#" onclick='return showDiv("page_25");'>»</a></li> </ul> </div> </div><!-- col-sm-12 --> </div><!-- row --> </div><!-- page_20 --> <div id="page_21" class="hiddenDiv"> <div class="row"> <div class="col-sm-12"> <div class="pull-right"> <ul class="pagination"> <li><a href="#" onclick='return showDiv("page_1");'>«</a></li> <li><a href="#" onclick='return showDiv("page_19");'>19</a></li> <li><a href="#" onclick='return showDiv("page_20");'>20</a></li> <li class="active"><span>21</span></li> <li><a href="#" onclick='return showDiv("page_22");'>22</a></li> <li><a href="#" onclick='return showDiv("page_23");'>23</a></li> <li><a href="#" onclick='return showDiv("page_25");'>»</a></li> </ul> </div> </div> </div> <div class="row"> <div class="col-sm-12"> <ol class="result-class" start="401"> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/24732497','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/24732497"><span id="translatedtitle">pWeb: A High-Performance, <span class="hlt">Parallel</span>-Computing Framework for Web-Browser-Based Medical <span class="hlt">Simulation</span>.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Halic, Tansel; Ahn, Woojin; De, Suvranu</p> <p>2014-01-01</p> 
<p>This work presents a pWeb - a new language and compiler for <span class="hlt">parallelization</span> of client-side compute intensive web applications such as surgical <span class="hlt">simulations</span>. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. Low performance of the web browser, however, remains the bottleneck of computationally intensive applications including visualization of complex scenes, real time physical <span class="hlt">simulations</span> and image processing compared to native ones. The new proposed language is built upon web workers for multithreaded programming in HTML5. The language provides fundamental functionalities of <span class="hlt">parallel</span> programming languages as well as the fork/join <span class="hlt">parallel</span> model which is not supported by web workers. The language compiler automatically generates an equivalent <span class="hlt">parallel</span> script that complies with the HTML5 standard. A case study on realistic rendering for surgical <span class="hlt">simulations</span> demonstrates enhanced performance with a compact set of instructions. PMID:24732497</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/1185588','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/1185588"><span id="translatedtitle">Improving the Performance of the Extreme-scale <span class="hlt">Simulator</span></span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Engelmann, Christian; Naughton III, Thomas J</p> <p>2014-01-01</p> <p>Investigating the performance of <span class="hlt">parallel</span> applications at scale on future high-performance computing (HPC) architectures and the performance impact of different architecture choices is an important component of HPC hardware/software co-design. 
The Extreme-scale <span class="hlt">Simulator</span> (xSim) is a <span class="hlt">simulation</span>-based toolkit for investigating the performance of <span class="hlt">parallel</span> applications at scale. xSim scales to millions of <span class="hlt">simulated</span> Message Passing Interface (MPI) processes. The overhead introduced by a <span class="hlt">simulation</span> tool is an important performance and productivity aspect. This paper documents two improvements to xSim: (1) a new deadlock resolution protocol to reduce the <span class="hlt">parallel</span> <span class="hlt">discrete</span> <span class="hlt">event</span> <span class="hlt">simulation</span> management overhead and (2) a new <span class="hlt">simulated</span> MPI message matching algorithm to reduce the oversubscription management overhead. The results clearly show a significant performance improvement, such as by reducing the <span class="hlt">simulation</span> overhead for running the NAS <span class="hlt">Parallel</span> Benchmark suite inside the <span class="hlt">simulator</span> from 1,020% to 238% for the conjugate gradient (CG) benchmark and from 102% to 0% for the embarrassingly <span class="hlt">parallel</span> (EP) benchmark, as well as from 37,511% to 13,808% for CG and from 3,332% to 204% for EP with accurate process failure <span class="hlt">simulation</span>.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2016CG.....89..174K','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2016CG.....89..174K"><span id="translatedtitle"><span class="hlt">Parallel</span> <span class="hlt">simulation</span> of particle transport in an advection field applied to volcanic explosive eruptions</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Künzli, Pierre; Tsunematsu, Kae; Albuquerque, Paul; Falcone, Jean-Luc; Chopard, Bastien; Bonadonna, 
Costanza</p> <p>2016-04-01</p> <p>Volcanic ash transport and dispersal models typically describe particle motion via a turbulent velocity field. Particles are advected inside this field from the moment they leave the vent of the volcano until they deposit on the ground. Several techniques exist to <span class="hlt">simulate</span> particles in an advection field such as finite difference Eulerian, Lagrangian-puff or pure Lagrangian techniques. In this paper, we present a new flexible <span class="hlt">simulation</span> tool called TETRAS (TEphra TRAnsport <span class="hlt">Simulator</span>) based on a hybrid Eulerian-Lagrangian model. This scheme offers the advantages of being numerically stable with no numerical diffusion and easily parallelizable. It also allows us to output particle atmospheric concentration or ground mass load at any given time. The model is validated using the advection-diffusion analytical equation. We also obtained a good agreement with field observations of the tephra deposit associated with the 2450 BP Pululagua (Ecuador) and the 1996 Ruapehu (New Zealand) eruptions. As this kind of model can lead to computationally intensive <span class="hlt">simulations</span>, a <span class="hlt">parallelization</span> on a distributed memory architecture was developed. 
A related performance model, taking into account load imbalance, is proposed and its accuracy tested.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2016CoPhC.200..324N','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2016CoPhC.200..324N"><span id="translatedtitle">MaMiCo: Software design for <span class="hlt">parallel</span> molecular-continuum flow <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Neumann, Philipp; Flohr, Hanno; Arora, Rahul; Jarmatz, Piet; Tchipev, Nikola; Bungartz, Hans-Joachim</p> <p>2016-03-01</p> <p>The macro-micro-coupling tool (MaMiCo) was developed to ease the development of and modularize molecular-continuum <span class="hlt">simulations</span>, retaining sequential and <span class="hlt">parallel</span> performance. We demonstrate the functionality and performance of MaMiCo by coupling the spatially adaptive Lattice Boltzmann framework waLBerla with four molecular dynamics (MD) codes: the light-weight Lennard-Jones-based implementation SimpleMD, the node-level optimized software ls1 mardyn, and the community codes ESPResSo and LAMMPS. We detail interface implementations to connect each solver with MaMiCo. The coupling for each waLBerla-MD setup is validated in three-dimensional channel flow <span class="hlt">simulations</span> which are solved by means of a state-based coupling method. We provide sequential and strong scaling measurements for the four molecular-continuum <span class="hlt">simulations</span>. The overhead of MaMiCo is found to come at 10%-20% of the total (MD) runtime. 
The measurements further show that scalability of the hybrid <span class="hlt">simulations</span> is reached on up to 500 Intel SandyBridge, and more than 1000 AMD Bulldozer compute cores.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2012JASMS..23.1609S','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2012JASMS..23.1609S"><span id="translatedtitle">Application of <span class="hlt">Parallel</span> Hybrid Algorithm in Massively <span class="hlt">Parallel</span> GPGPU—The Improved Effective and Efficient Method for Calculating Coulombic Interactions in <span class="hlt">Simulations</span> of Many Ions with SIMION</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Saito, Kenichiro; Koizumi, Eiko; Koizumi, Hideya</p> <p>2012-09-01</p> <p>In our previous study, we introduced a new hybrid approach to effectively approximate the total force on each ion during a trajectory calculation in mass spectrometry device <span class="hlt">simulations</span>, and the algorithm worked successfully with SIMION. We took one step further and applied the method in massively <span class="hlt">parallel</span> general-purpose computing with GPU (GPGPU) to test its performance in <span class="hlt">simulations</span> with thousands to over a million ions. We took extra care to minimize the barrier synchronization and data transfer between the host (CPU) and the device (GPU) memory, and took full advantage of the latency hiding. <span class="hlt">Parallel</span> codes were written in CUDA C++ and implemented to SIMION via the user-defined Lua program. In this study, we tested the <span class="hlt">parallel</span> hybrid algorithm with a couple of basic models and analyzed the performance by comparing it to that of the original, fully-explicit method written in serial code. 
The Coulomb explosion <span class="hlt">simulation</span> with 128,000 ions was completed in 309 s, over 700 times faster than the 63 h taken by the original explicit method in which we evaluated two-body Coulomb interactions explicitly on one ion with each of all the other ions. The <span class="hlt">simulation</span> of 1,024,000 ions was completed in 2650 s. In another example, we applied the hybrid method on a <span class="hlt">simulation</span> of ions in a simple quadrupole ion storage model with 100,000 ions, and it only took less than 10 d. Based on our estimate, the same <span class="hlt">simulation</span> is expected to take 5-7 y by the explicit method in serial code.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/920870','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/920870"><span id="translatedtitle">6th International Special Session on Current Trends in Numerical <span class="hlt">Simulation</span> for <span class="hlt">Parallel</span> Engineering Environments</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Schulz, M; Trinitis, C</p> <p>2007-07-09</p> <p>In today's world, the use of <span class="hlt">parallel</span> programming and architectures is essential for <span class="hlt">simulating</span> practical problems in engineering and related disciplines. Remarkable progress in CPU architecture (multi- and many-core, SMT, transactional memory, virtualization support, etc.), system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges for both system architects and software developers. These trends are <span class="hlt">paralleled</span> by progress in <span class="hlt">parallel</span> algorithms, <span class="hlt">simulation</span> techniques, and software integration from multiple disciplines. 
In its 6th year ParSim continues to build a bridge between computer science and the application disciplines and to help with fostering cooperations between the different fields. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a shorter turn-around time. This offers the unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in <span class="hlt">parallel</span> computation, serves as an ideal surrounding for ParSim. This combination enables the participants to present and discuss their work within the scope of both the session and the host conference. This year, ten papers with authors in ten countries were submitted to ParSim, and after a quick turn-around, yet thorough review process we decided to accept three of them for publication and presentation during the ParSim session. These three papers show the use of <span class="hlt">simulation</span> in a range of different application fields including earthquake and turbulence <span class="hlt">simulation</span>. At the same time, they also address computer science aspects and discuss different <span class="hlt">parallelization</span> strategies, programming models and environments, as well as scalability. 
We are confident that this provides an attractive program and that ParSim will yet again be an informal setting for lively discussions and for fostering new</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2011AGUFMIN11D..03W','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2011AGUFMIN11D..03W"><span id="translatedtitle">An evaluation of <span class="hlt">parallelization</span> strategies for low-frequency electromagnetic induction <span class="hlt">simulators</span> using staggered grid discretizations</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Weiss, C. J.; Schultz, A.</p> <p>2011-12-01</p> <p>The high computational cost of the forward solution for modeling low-frequency electromagnetic induction phenomena is one of the primary impediments against broad-scale adoption by the geoscience community of exploration techniques, such as magnetotellurics and geomagnetic depth sounding, that rely on fast and cheap forward solutions to make tractable the inverse problem. As geophysical observables, electromagnetic fields are direct indicators of Earth's electrical conductivity - a physical property independent of (but in some cases correlative with) seismic wavespeed. Electrical conductivity is known to be a function of Earth's physiochemical state and temperature, and to be especially sensitive to the presence of fluids, melts and volatiles. Hence, electromagnetic methods offer a critical and independent constraint on our understanding of Earth's interior processes. 
Existing methods for <span class="hlt">parallelization</span> of time-harmonic electromagnetic <span class="hlt">simulators</span>, as applied to geophysics, have relied heavily on a combination of strategies: coarse-grained decompositions of the model domain; and/or a high-order functional decomposition across spectral components, which in turn can be domain-decomposed themselves. Hence, in terms of scaling, both approaches are ultimately limited by the growing communication cost as the granularity of the forward problem increases. In this presentation we examine alternate <span class="hlt">parallelization</span> strategies based on OpenMP shared-memory <span class="hlt">parallelization</span> and CUDA-based GPU <span class="hlt">parallelization</span>. As a test case, we use two different numerical <span class="hlt">simulation</span> packages, each based on a staggered Cartesian grid: FDM3D (Weiss, 2006), which solves the curl-curl equation directly in terms of the scattered electric field (available under the LGPL at www.openem.org); and APHID, the A-Phi Decomposition based on mixed vector and scalar potentials, in which the curl-curl operator is replaced operationally by the vector Laplacian. We describe progress made in modifying the code to</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/6818542','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/6818542"><span id="translatedtitle">Forced-to-natural convection transition tests in <span class="hlt">parallel</span> <span class="hlt">simulated</span> liquid metal reactor fuel assemblies</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Levin, A.E.; Montgomery, B.H.
</p> <p>1990-01-01</p> <p>The Thermal-Hydraulic Out of Reactor Safety (THORS) Program at Oak Ridge National Laboratory (ORNL) had as its objective the testing of <span class="hlt">simulated</span>, electrically heated liquid metal reactor (LMR) fuel assemblies in an engineering-scale, sodium loop. Between 1971 and 1985, the THORS Program operated 11 <span class="hlt">simulated</span> fuel bundles under a wide range of normal and off-normal conditions. The last test series in the Program, THORS-SHRS Assembly 1, employed two <span class="hlt">parallel</span>, 19-pin, full-length, <span class="hlt">simulated</span> fuel assemblies of a design consistent with the large LMR (Large Scale Prototype Breeder -- LSPB) under development at that time. These bundles were installed in the THORS Facility, allowing single- and <span class="hlt">parallel</span>-bundle testing in thermal-hydraulic conditions up to and including sodium boiling and dryout. As the name SHRS (Shutdown Heat Removal System) implies, a major objective of the program was testing under conditions expected during low-power reactor operation, including low-flow forced convection, natural convection, and forced-to-natural convection transition at various powers. The THORS-SHRS Assembly 1 experimental program was divided up into four phases. Phase 1 included preliminary and shakedown tests, including the collection of baseline steady-state thermal-hydraulic data. Phase 2 comprised natural convection testing. Forced convection testing was conducted in Phase 3. The final phase of testing included forced-to-natural convection transition tests. Phases 1, 2, and 3 have been discussed in previous papers. The fourth phase is described in this paper. 
3 refs., 2 figs.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://files.eric.ed.gov/fulltext/ED210037.pdf','ERIC'); return false;" href="http://files.eric.ed.gov/fulltext/ED210037.pdf"><span id="translatedtitle">Multiple-Instruction, Multiple-Data Path Computers: <span class="hlt">Parallel</span> Processing Impact on Flight <span class="hlt">Simulation</span> Software. Final Report.</span></a></p> <p><a target="_blank" href="http://www.eric.ed.gov/ERICWebPortal/search/extended.jsp?_pageLabel=advanced">ERIC Educational Resources Information Center</a></p> <p>Lord, Robert E.; And Others</p> <p></p> <p>The purpose of this study was to evaluate the <span class="hlt">parallel</span> processing impact of multiple-instruction multiple-data path (MIMD) computers on flight <span class="hlt">simulation</span> software. Basic mathematical functions and arithmetic expressions from typical flight <span class="hlt">simulation</span> software were selected and run on an MIMD computer to evaluate the improvement in execution time…</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://ntrs.nasa.gov/search.jsp?R=19910032684&hterms=elements+hardware&qs=Ntx%3Dmode%2Bmatchall%26Ntk%3DAll%26N%3D0%26No%3D60%26Ntt%3Delements%2Bhardware','NASA-TRS'); return false;" href="http://ntrs.nasa.gov/search.jsp?R=19910032684&hterms=elements+hardware&qs=Ntx%3Dmode%2Bmatchall%26Ntk%3DAll%26N%3D0%26No%3D60%26Ntt%3Delements%2Bhardware"><span id="translatedtitle">A comparison of real-time blade-element and rotor-map helicopter <span class="hlt">simulations</span> using <span class="hlt">parallel</span> processing</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Corliss, Lloyd; Du Val, Ronald W.; Gillman, Herbert, III; Huynh, Loc C.</p> <p>1990-01-01</p> <p>In recent efforts by NASA, the Army, and Advanced Rotorcraft Technology, Inc. 
(ART), the application of <span class="hlt">parallel</span> processing techniques to real-time <span class="hlt">simulation</span> has been studied. Traditionally, real-time helicopter <span class="hlt">simulations</span> have omitted the modeling of high-frequency phenomena in order to achieve real-time operation on affordable computers. <span class="hlt">Parallel</span> processing technology can now provide the means for significantly improving the fidelity of real-time <span class="hlt">simulation</span>, and one specific area for improvement is the modeling of rotor dynamics. This paper focuses on the results of a piloted <span class="hlt">simulation</span> in which a traditional rotor-map mathematical model was compared with a more sophisticated blade-element mathematical model that had been implemented using <span class="hlt">parallel</span> processing hardware and software technology.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2013AGUFMIN23A1416G&link_type=ABSTRACT','NASAADS'); return false;" href="http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2013AGUFMIN23A1416G&link_type=ABSTRACT"><span id="translatedtitle">Accelerating Dust Storm <span class="hlt">Simulation</span> by Balancing Task Allocation in <span class="hlt">Parallel</span> Computing Environment</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Gui, Z.; Yang, C.; XIA, J.; Huang, Q.; YU, M.</p> <p>2013-12-01</p> <p>Dust storms have serious negative impacts on the environment, human health, and assets. Continuing global climate change has increased the frequency and intensity of dust storms in the past decades. 
To better understand and predict the distribution, intensity and structure of dust storms, a series of dust storm models have been developed, such as the Dust Regional Atmospheric Model (DREAM), the NMM meteorological module (NMM-dust) and the Chinese Unified Atmospheric Chemistry Environment for Dust (CUACE/Dust). The development and application of these models have contributed significantly to both scientific research and our daily life. However, dust storm <span class="hlt">simulation</span> is a data- and computing-intensive process. Normally, a <span class="hlt">simulation</span> for a single dust storm event may take several hours or days to run. This seriously impacts the timeliness of prediction and potential applications. To speed up the process, high performance computing is widely adopted. By partitioning a large study area into small subdomains according to their geographic location and executing them on different computing nodes in a <span class="hlt">parallel</span> fashion, the computing performance can be significantly improved. Since spatiotemporal correlations exist in the geophysical process of dust storm <span class="hlt">simulation</span>, each subdomain allocated to a node needs to communicate with other geographically adjacent subdomains to exchange data. Inappropriate allocations may introduce imbalanced task loads and unnecessary communications among computing nodes. Therefore, the task allocation method is a key factor that may impact the feasibility of the <span class="hlt">paralleling</span>. The allocation algorithm needs to carefully leverage the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire system. This presentation introduces two algorithms for such allocation and compares them with an evenly distributed allocation method. 
Specifically, 1) In order to get optimized solutions, a</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/966572','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/966572"><span id="translatedtitle">8th International Special Session on Current Trends in Numerical <span class="hlt">Simulation</span> for <span class="hlt">Parallel</span> Engineering Environments</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Trinitis, C; Bader, M; Schulz, M</p> <p>2009-06-09</p> <p>In today's world, the use of <span class="hlt">parallel</span> programming and architectures is essential for <span class="hlt">simulating</span> practical problems in engineering and related disciplines. Significant progress in CPU architecture (multi- and many-core CPUs, SMT, transactional memory, virtualization support, shared caches etc.) system scalability, and interconnect technology, continues to provide new opportunities, as well as new challenges for both system architects and software developers. These trends are <span class="hlt">paralleled</span> by progress in algorithms, <span class="hlt">simulation</span> techniques, and software integration from multiple disciplines. In its 8th year, ParSim continues to build a bridge between application disciplines and computer science and to help fostering closer cooperations between these fields. Since its successful introduction in 2002, ParSim has established itself as an integral part of the EuroPVM/MPI conference series. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a short turn-around time. We believe that this offers a unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. 
The EuroPVM/MPI conference series, as one of the prime events in <span class="hlt">parallel</span> computation, serves as an ideal surrounding for ParSim. This combination enables participants to present and discuss their work within the scope of both the session and the host conference. This year, five papers from authors in five countries were submitted to ParSim, and we selected three of them. They cover a range of different application fields including mechanical engineering, material science, and structural engineering <span class="hlt">simulations</span>. We are confident that this resulted in an attractive special session and that this will be an informal setting for lively discussions as well as for fostering new collaborations. Several people contributed to this event. Thanks go to Jack Dongarra, the EuroPVM/MPI general chair, and to Jan Westerholm, Juha</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/1996gmu..rept.....W','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/1996gmu..rept.....W"><span id="translatedtitle">Development of a Massively <span class="hlt">Parallel</span> Particle-Mesh Algorithm for <span class="hlt">Simulations</span> of Galaxy Dynamics and Plasmas</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Wallin, John</p> <p>1996-01-01</p> <p>Particle-mesh calculations treat forces and potentials as field quantities which are represented approximately on a mesh. A system of particles is mapped onto this mesh as a density distribution of mass or charge. The Fourier transform is used to convolve this distribution with the Green's function of the potential, and a finite difference scheme is used to calculate the forces acting on the particles. The computation time scales as N<sub>g</sub> log N<sub>g</sub>, where N<sub>g</sub> is the size of the computational grid. 
In contrast, the particle-particle method's computing time relies on direct summation, so the time for each calculation is given by N<sub>p</sub><sup>2</sup>, where N<sub>p</sub> is the number of particles. The particle-mesh method is best suited for <span class="hlt">simulations</span> with a fixed minimum resolution and for collisionless systems, while hierarchical tree codes have proven to be superior for collisional systems where two-body interactions are important. Particle mesh methods still dominate in plasma physics where collisionless systems are modeled. The CM-200 Connection Machine produced by Thinking Machines Corp. is a data <span class="hlt">parallel</span> system. On this system, the front-end computer controls the timing and execution of the <span class="hlt">parallel</span> processing units. The programming paradigm is Single-Instruction, Multiple Data (SIMD). The processors on the CM-200 are connected in an N-dimensional hypercube; the largest number of links a message will ever have to make is N. As in all <span class="hlt">parallel</span> computing, the efficiency of an algorithm is primarily determined by the fraction of the time spent communicating compared to that spent computing. Because of the topology of the processors, nearest neighbor communication is more efficient than general communication.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/974699','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/974699"><span id="translatedtitle">Massively <span class="hlt">parallel</span> <span class="hlt">simulation</span> with DOE's ASCI supercomputers : an overview of the Los Alamos Crestone project</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Weaver, R. P.; Gittings, M. L.</p> <p>2004-01-01</p> <p>The Los Alamos Crestone Project is part of the Department of Energy's (DOE) Accelerated Strategic Computing Initiative, or ASCI Program. 
The main goal of this software development project is to investigate the use of continuous adaptive mesh refinement (CAMR) techniques for application to problems of interest to the Laboratory. There are many code development efforts in the Crestone Project, both unclassified and classified codes. In this overview I will discuss the unclassified SAGE and the RAGE codes. The SAGE (SAIC adaptive grid Eulerian) code is a one-, two-, and three-dimensional multimaterial Eulerian massively <span class="hlt">parallel</span> hydrodynamics code for use in solving a variety of high-deformation flow problems. The RAGE CAMR code is built from the SAGE code by adding various radiation packages, improved setup utilities and graphics packages and is used for problems in which radiation transport of energy is important. The goal of these massively-<span class="hlt">parallel</span> versions of the codes is to run extremely large problems in a reasonable amount of calendar time. Our target is scalable performance to approximately 10,000 processors on a 1 billion CAMR computational cell problem that requires hundreds of variables per cell, multiple physics packages (e.g. radiation and hydrodynamics), and implicit matrix solves for each cycle. A general description of the RAGE code has been published in [1], [2], [3] and [4]. Currently, the largest <span class="hlt">simulations</span> we do are three-dimensional, using around 500 million computation cells and running for literally months of calendar time using approximately 2000 processors. Current ASCI platforms range from several 3-teraOPS supercomputers to one 12-teraOPS machine at Lawrence Livermore National Laboratory, the White machine, and one 20-teraOPS machine installed at Los Alamos, the Q machine. Each machine is a system comprised of many component parts that must perform in unity for the successful run of these <span class="hlt">simulations</span>. 
Key features of any massively <span class="hlt">parallel</span> system</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2014PhDT........13Z','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2014PhDT........13Z"><span id="translatedtitle">Scalable <span class="hlt">parallel</span> programming for high performance seismic <span class="hlt">simulation</span> on petascale heterogeneous supercomputers</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Zhou, Jun</p> <p></p> <p>The 1994 Northridge earthquake in Los Angeles, California, killed 57 people, injured over 8,700 and caused an estimated $20 billion in damage. Petascale <span class="hlt">simulations</span> are needed in California and elsewhere to provide society with a better understanding of the rupture and wave dynamics of the largest earthquakes at shaking frequencies required to engineer safe structures. As heterogeneous supercomputing infrastructures become more common, numerical developments in earthquake system research are particularly challenged by the dependence on the accelerator elements to enable "the Big One" <span class="hlt">simulations</span> with higher frequency and finer resolution. Reducing time to solution and power consumption are two primary focus areas today for the enabling technology of fault rupture dynamics and seismic wave propagation in realistic 3D models of the crust's heterogeneous structure. This dissertation presents scalable <span class="hlt">parallel</span> programming techniques for high performance seismic <span class="hlt">simulation</span> running on petascale heterogeneous supercomputers. 
A real-world earthquake <span class="hlt">simulation</span> code, AWP-ODC, one of the most advanced earthquake codes to date, was chosen as the base code in this research, and the testbed is based on Titan at Oak Ridge National Laboratory, the world's largest heterogeneous supercomputer. The research work is primarily related to architecture study, computation performance tuning and software system scalability. An earthquake <span class="hlt">simulation</span> workflow has also been developed to support the efficient production sets of <span class="hlt">simulations</span>. The highlights of the technical development are an aggressive performance optimization focusing on data locality and a notable data communication model that hides the data communication latency. This development results in optimal computation efficiency and throughput for the 13-point stencil code on heterogeneous systems, which can be extended to general high-order stencil codes. Started from scratch, the hybrid CPU/GPU version of AWP</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2013PhDT.......119R','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2013PhDT.......119R"><span id="translatedtitle"><span class="hlt">Parallel</span> Algorithms for Monte Carlo Particle Transport <span class="hlt">Simulation</span> on Exascale Computing Architectures</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Romano, Paul Kollath</p> <p></p> <p>Monte Carlo particle transport methods are being considered as a viable option for high-fidelity <span class="hlt">simulation</span> of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. 
In this thesis, algorithms are proposed both to ameliorate the degradation in <span class="hlt">parallel</span> efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O(√N) whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear <span class="hlt">parallel</span> scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes; in doing so, it prevents all network communication for tallies until the very end of the <span class="hlt">simulation</span>. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed <span class="hlt">simulations</span>. The analysis demonstrated that load imbalances in domain decomposed <span class="hlt">simulations</span> arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency. 
The model predictions were verified with</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2006CoPhC.175..440B','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2006CoPhC.175..440B"><span id="translatedtitle">A package of Linux scripts for the <span class="hlt">parallelization</span> of Monte Carlo <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Badal, Andreu; Sempau, Josep</p> <p>2006-09-01</p> <p>Despite the fact that fast computers are nowadays available at low cost, there are many situations where obtaining a reasonably low statistical uncertainty in a Monte Carlo (MC) <span class="hlt">simulation</span> involves a prohibitively large amount of time. This limitation can be overcome by having recourse to <span class="hlt">parallel</span> computing. Most tools designed to facilitate this approach require modification of the source code and the installation of additional software, which may be inconvenient for some users. We present a set of tools, named clonEasy, that implement a <span class="hlt">parallelization</span> scheme of a MC <span class="hlt">simulation</span> that is free from these drawbacks. In clonEasy, which is designed to run under Linux, a set of "clone" CPUs is governed by a "master" computer by taking advantage of the capabilities of the Secure Shell (ssh) protocol. Any Linux computer on the Internet that can be ssh-accessed by the user can be used as a clone. A key ingredient for the <span class="hlt">parallel</span> calculation to be reliable is the availability of an independent string of random numbers for each CPU. Many generators—such as RANLUX, RANECU or the Mersenne Twister—can readily produce these strings by initializing them appropriately and, hence, they are suitable to be used with clonEasy. 
This work was primarily motivated by the need to find a straightforward way to <span class="hlt">parallelize</span> PENELOPE, a code for MC <span class="hlt">simulation</span> of radiation transport that (in its current 2005 version) employs the generator RANECU, which uses a combination of two multiplicative linear congruential generators (MLCGs). Thus, this paper is focused on this class of generators and, in particular, we briefly present an extension of RANECU that increases its period up to ~5×10 and we introduce seedsMLCG, a tool that provides the information necessary to initialize disjoint sequences of an MLCG to feed different CPUs. This program, in combination with clonEasy, allows PENELOPE to be run in <span class="hlt">parallel</span> easily, without requiring specific libraries or significant alterations of the</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2016ApJ...823....7H','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2016ApJ...823....7H"><span id="translatedtitle">Ion Dynamics at a Rippled Quasi-<span class="hlt">parallel</span> Shock: 2D Hybrid <span class="hlt">Simulations</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Hao, Yufei; Lu, Quanming; Gao, Xinliang; Wang, Shui</p> <p>2016-05-01</p> <p>In this paper, two-dimensional hybrid <span class="hlt">simulations</span> are performed to investigate ion dynamics at a rippled quasi-<span class="hlt">parallel</span> shock. The results show that the ripples around the shock front are inherent structures of a quasi-<span class="hlt">parallel</span> shock, and the re-formation of the shock is not synchronous along the surface of the shock front. By following the trajectories of the upstream ions, we find that these ions behave differently when they interact with the shock front at different positions along the shock surface. 
The upstream particles are transmitted more easily through the upper part of a ripple, and the corresponding bulk velocity downstream is larger, where a high-speed jet is formed. In the lower part of the ripple, the upstream particles tend to be reflected by the shock. Ions reflected by the shock may suffer multiple-stage acceleration when moving along the shock surface or when trapped between the upstream waves and the shock front. Finally, these ions may escape further upstream or move downstream; therefore, superthermal ions can be found both upstream and downstream.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2003APS..DPPFP1114S','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2003APS..DPPFP1114S"><span id="translatedtitle">MPI <span class="hlt">parallelization</span> of Vlasov codes for the <span class="hlt">simulation</span> of nonlinear laser-plasma interactions</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Savchenko, V.; Won, K.; Afeyan, B.; Decyk, V.; Albrecht-Marc, M.; Ghizzo, A.; Bertrand, P.</p> <p>2003-10-01</p> <p>The <span class="hlt">simulation</span> of optical mixing driven KEEN waves [1] and electron plasma waves [1] in laser-produced plasmas requires nonlinear kinetic models and massive <span class="hlt">parallelization</span>. We use Message Passing Interface (MPI) libraries and Appleseed [2] to solve the Vlasov Poisson system of equations on an 8 node dual processor MAC G4 cluster. We use the semi-Lagrangian time splitting method [3]. It requires only row-column exchanges in the global data redistribution, minimizing the total number of communications between processors. Recurrent communication patterns for 2D FFTs involve global transposition. In the Vlasov-Maxwell case, we use splitting into two 1D spatial advections and a 2D momentum advection [4]. 
Discretized momentum advection equations have a double loop structure with the outer index being assigned to different processors. We adhere to a code structure with separate routines for calculations and data management for <span class="hlt">parallel</span> computations. [1] B. Afeyan et al., IFSA 2003 Conference Proceedings, Monterey, CA [2] V. K. Decyk, Computers in Physics, 7, 418 (1993) [3] Sonnendrucker et al., JCP 149, 201 (1998) [4] Begue et al., JCP 151, 458 (1999)</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/19880008905','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/19880008905"><span id="translatedtitle">Experiences with serial and <span class="hlt">parallel</span> algorithms for channel routing using <span class="hlt">simulated</span> annealing</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Brouwer, Randall Jay</p> <p>1988-01-01</p> <p>Two algorithms for channel routing using <span class="hlt">simulated</span> annealing are presented. <span class="hlt">Simulated</span> annealing is an optimization methodology which allows the solution process to back up out of local minima that may be encountered by inappropriate selections. By properly controlling the annealing process, it is very likely that the optimal solution to an NP-complete problem such as channel routing may be found. The algorithm presented proposes very relaxed restrictions on the types of allowable transformations, including overlapping nets. By freeing that restriction and controlling overlap situations with an appropriate cost function, the algorithm becomes very flexible and can be applied to many extensions of channel routing. The selection of the transformation utilizes a number of heuristics, still retaining the pseudorandom nature of <span class="hlt">simulated</span> annealing. 
The algorithm was implemented as a serial program for a workstation, and a <span class="hlt">parallel</span> program designed for a hypercube computer. The details of the serial implementation are presented, including many of the heuristics used and some of the resulting solutions.</p> </li> </ol> </div><!-- col-sm-12 --> </div><!-- row --> </div><!-- page_21 --> <div id="page_22" class="hiddenDiv"> <div class="row"> <div class="col-sm-12"> <ol class="result-class" start="421"> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2010PhPl...17g3107W','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2010PhPl...17g3107W"><span id="translatedtitle">Three-dimensional <span class="hlt">parallel</span> UNIPIC-3D code for <span class="hlt">simulations</span> of high-power microwave devices</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System 
(ADS)</a></p> <p>Wang, Jianguo; Chen, Zaigao; Wang, Yue; Zhang, Dianhui; Liu, Chunliang; Li, Yongdong; Wang, Hongguang; Qiao, Hailiang; Fu, Meiyan; Yuan, Yuan</p> <p>2010-07-01</p> <p>This paper introduces a self-developed, three-dimensional <span class="hlt">parallel</span> fully electromagnetic particle <span class="hlt">simulation</span> code UNIPIC-3D. In this code, the electromagnetic fields are updated using the second-order, finite-difference time-domain method, and the particles are moved using the relativistic Newton-Lorentz force equation. The electromagnetic field and particles are coupled through the current term in Maxwell's equations. Two numerical examples are used to verify the algorithms adopted in this code; the numerical results agree well with theoretical ones. This code can be used to <span class="hlt">simulate</span> high-power microwave (HPM) devices, such as the relativistic backward wave oscillator, coaxial vircator, and magnetically insulated line oscillator. UNIPIC-3D is written in the object-oriented C++ language and can be run on a variety of platforms including WINDOWS, LINUX, and UNIX. Users can use the graphical user interface to create the complex geometric structures of the <span class="hlt">simulated</span> HPM devices, which can be automatically meshed by the UNIPIC-3D code. This code has a powerful postprocessor which can display the electric field, magnetic field, current, voltage, power, spectrum, momentum of particles, etc. 
For the sake of comparison, the results computed by using the two-and-a-half-dimensional UNIPIC code are also provided for the same parameters of HPM devices; the numerical results computed from these two codes agree well with each other.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/20160006398','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/20160006398"><span id="translatedtitle"><span class="hlt">Parallel</span> Adaptive High-Order CFD <span class="hlt">Simulations</span> Characterizing SOFIA Cavity Acoustics</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak</p> <p>2016-01-01</p> <p>This paper presents large-scale MPI-<span class="hlt">parallel</span> computational fluid dynamics <span class="hlt">simulations</span> for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These <span class="hlt">simulations</span> focus on how the unsteady flow field inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta, and spatially fifth-order accurate WENO-5Z scheme was used to perform implicit large eddy <span class="hlt">simulations</span>. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh refinement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion computational cells show excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregular numerical cost associated with blocks containing boundaries. 
Limits to scaling beyond 32k cores are identified, and targeted code optimizations are discussed.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/1050409','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/1050409"><span id="translatedtitle">Mechanisms for the convergence of time-<span class="hlt">parallelized</span>, parareal turbulent plasma <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Reynolds-Barredo, J.; Newman, David E; Sanchez, R.; Samaddar, D.; Berry, Lee A; Elwasif, Wael R</p> <p>2012-01-01</p> <p>Parareal is a recent algorithm able to <span class="hlt">parallelize</span> the time dimension in spite of its sequential nature. It has been applied to several linear and nonlinear problems and, very recently, to a <span class="hlt">simulation</span> of fully-developed, two-dimensional drift wave turbulence. The mere fact that parareal works in such a turbulent regime is in itself somewhat unexpected, due to the characteristic sensitivity of turbulence to any change in initial conditions. This fundamental property of any turbulent system should render the iterative correction procedure characteristic of the parareal method inoperative, but this seems not to be the case. In addition, the choices that must be made to implement parareal (division of the temporal domain, selection of the coarse solver and so on) are currently made using trial-and-error approaches. Here, we identify the mechanisms responsible for the convergence of parareal in these <span class="hlt">simulations</span> of drift wave turbulence. We also investigate which conditions these mechanisms impose on any successful parareal implementation. 
The results reported here should be useful to guide future implementations of parareal within the much wider context of fully-developed fluid and plasma turbulent <span class="hlt">simulations</span>.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2014APS..MARM27008W','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2014APS..MARM27008W"><span id="translatedtitle">Large-scale massively <span class="hlt">parallel</span> atomistic <span class="hlt">simulations</span> of short pulse laser interaction with metals</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Wu, Chengping; Zhigilei, Leonid; Computational Materials Group Team</p> <p>2014-03-01</p> <p>Taking advantage of petascale supercomputing architectures, large-scale massively <span class="hlt">parallel</span> atomistic <span class="hlt">simulations</span> (10<sup>8</sup>-10<sup>9</sup> atoms) are performed to study the microscopic mechanisms of short pulse laser interaction with metals. The results of the <span class="hlt">simulations</span> reveal a complex picture of highly non-equilibrium processes responsible for material modification and/or ejection. At low laser fluences below the ablation threshold, fast melting and resolidification occur under conditions of extreme heating and cooling rates resulting in surface microstructure modification. At higher laser fluences in the spallation regime, the material is ejected by the relaxation of laser-induced stresses and proceeds through the nucleation, growth and percolation of multiple voids in the sub-surface region of the irradiated target. At a fluence of ~2.5 times the spallation threshold, the top part of the target reaches the conditions for an explosive decomposition into vapor and small droplets, marking the transition to the phase explosion regime of laser ablation. 
The dynamics of plume formation and the characteristics of the ablation plume are obtained from the <span class="hlt">simulations</span> and compared with the results of time-resolved plume imaging experiments. Financial support for this work was provided by NSF (DMR-0907247 and CMMI-1301298) and AFOSR (FA9550-10-1-0541). Computational support was provided by the OLCF (MAT048) and XSEDE (TG-DMR110090).</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2015APS..DFD.E9006P','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2015APS..DFD.E9006P"><span id="translatedtitle">A 3D MPI-<span class="hlt">Parallel</span> GPU-accelerated framework for <span class="hlt">simulating</span> ocean wave energy converters</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Pathak, Ashish; Raessi, Mehdi</p> <p>2015-11-01</p> <p>We present an MPI-<span class="hlt">parallel</span> GPU-accelerated computational framework for studying the interaction between ocean waves and wave energy converters (WECs). The computational framework captures the viscous effects, nonlinear fluid-structure interaction (FSI), and breaking of waves around the structure, which cannot be captured in many potential flow solvers commonly used for WEC <span class="hlt">simulations</span>. The full Navier-Stokes equations are solved using the two-step projection method, which is accelerated by porting the pressure Poisson equation to GPUs. The FSI is captured using the numerically stable fictitious domain method. A novel three-phase interface reconstruction algorithm is used to resolve three phases in a VOF-PLIC context. A consistent mass and momentum transport approach enables <span class="hlt">simulations</span> at high density ratios. The accuracy of the overall framework is demonstrated via an array of test cases. 
Numerical <span class="hlt">simulations</span> of the interaction between ocean waves and WECs are presented. Funding from the National Science Foundation CBET-1236462 grant is gratefully acknowledged.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/22043417','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/22043417"><span id="translatedtitle">The role of the electron convection term for the <span class="hlt">parallel</span> electric field and electron acceleration in MHD <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Matsuda, K.; Terada, N.; Katoh, Y.; Misawa, H.</p> <p>2011-08-15</p> <p>There has been a great concern about the origin of the <span class="hlt">parallel</span> electric field in the frame of fluid equations in the auroral acceleration region. This paper proposes a new method to <span class="hlt">simulate</span> magnetohydrodynamic (MHD) equations that include the electron convection term and shows its efficiency with <span class="hlt">simulation</span> results in one dimension. We apply a third-order semi-discrete central scheme to investigate the characteristics of the electron convection term including its nonlinearity. At a steady state discontinuity, the sum of the ion and electron convection terms balances with the ion pressure gradient. We find that the electron convection term works like the gradient of the negative pressure and reduces the ion sound speed or amplifies the sound mode when <span class="hlt">parallel</span> current flows. 
The electron convection term enables us to describe a situation in which a <span class="hlt">parallel</span> electric field and <span class="hlt">parallel</span> electron acceleration coexist, which is impossible for ideal or resistive MHD.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2016CoPhC.204...74Z&link_type=ABSTRACT','NASAADS'); return false;" href="http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2016CoPhC.204...74Z&link_type=ABSTRACT"><span id="translatedtitle"><span class="hlt">Parallel</span> two-level domain decomposition based Jacobi-Davidson algorithms for pyramidal quantum dot <span class="hlt">simulation</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Zhao, Tao; Hwang, Feng-Nan; Cai, Xiao-Chuan</p> <p>2016-07-01</p> <p>We consider a quintic polynomial eigenvalue problem arising from the finite volume discretization of a quantum dot <span class="hlt">simulation</span> problem. The problem is solved by the Jacobi-Davidson (JD) algorithm. Our focus is on how to achieve the quadratic convergence of JD in a way that is not only efficient but also scalable when the number of processor cores is large. For this purpose, we develop a projected two-level Schwarz preconditioned JD algorithm that exploits multilevel domain decomposition techniques. The pyramidal quantum dot calculation is carefully studied to illustrate the efficiency of the proposed method. 
Numerical experiments confirm that the proposed method has good scalability for problems with hundreds of millions of unknowns on a <span class="hlt">parallel</span> computer with more than 10,000 processor cores.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/372178','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/372178"><span id="translatedtitle">A three-phase series-<span class="hlt">parallel</span> resonant converter -- analysis, design, <span class="hlt">simulation</span>, and experimental results</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Bhat, A.K.S.; Zheng, R.L.</p> <p>1996-07-01</p> <p>A three-phase dc-to-dc series-<span class="hlt">parallel</span> resonant converter is proposed and its operating modes for a 180° wide gating pulse scheme are explained. A detailed analysis of the converter using a constant current model and the Fourier series approach is presented. Based on the analysis, design curves are obtained and a design example of a 1-kW converter is given. SPICE <span class="hlt">simulation</span> results for the designed converter and experimental results for a 500-W converter are presented to verify the performance of the proposed converter for varying load conditions. 
The converter operates in lagging power factor (PF) mode for the entire load range and requires only a narrow variation in switching frequency to adequately regulate the output power.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2014ChPhL..31k5201W','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2014ChPhL..31k5201W"><span id="translatedtitle"><span class="hlt">Simulation</span> of the Quasi-Monoenergetic Protons Generation by <span class="hlt">Parallel</span> Laser Pulses Interaction with Foils</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Wang, Wei-Quan; Yin, Yan; Zou, De-Bin; Yu, Tong-Pu; Yang, Xiao-Hu; Xu, Han; Yu, Ming-Yang; Ma, Yan-Yun; Zhuo, Hong-Bin; Shao, Fu-Qiu</p> <p>2014-11-01</p> <p>A new scheme of radiation pressure acceleration for generating high-quality protons by using two overlapping-<span class="hlt">parallel</span> laser pulses is proposed. Particle-in-cell <span class="hlt">simulation</span> shows that the overlapping of two pulses with identical Gaussian profiles in space and trapezoidal profiles in the time domain can result in a composite light pulse with a spatial profile suitable for stable acceleration of protons to high energies. At ~2.46 × 10<sup>21</sup> W/cm<sup>2</sup> intensity of the combination light pulse, a quasi-monoenergetic proton beam with peak energy ~200 MeV/nucleon, energy spread <15%, and divergence angle <4° is obtained, which is appropriate for tumor therapy. 
The proton beam quality can be controlled by adjusting the incidence points of two laser pulses.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2014snam.conf04304F','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2014snam.conf04304F"><span id="translatedtitle">Hybrid <span class="hlt">parallel</span> strategy for the <span class="hlt">simulation</span> of fast transient accidental situations at reactor scale</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Faucher, V.; Galon, P.; Beccantini, A.; Crouzet, F.; Debaud, F.; Gautier, T.</p> <p>2014-06-01</p> <p>This contribution is dedicated to the latest methodological developments implemented in the fast transient dynamics software EUROPLEXUS (EPX) to <span class="hlt">simulate</span> the mechanical response of fully coupled fluid-structure systems to accidental situations to be considered at reactor scale, among which the Loss of Coolant Accident, the Core Disruptive Accident and the Hydrogen Explosion. Time integration is explicit and the search for reference solutions within the safety framework prevents any simplification and approximations in the coupled algorithm: for instance, all kinematic constraints are dealt with using Lagrange Multipliers, yielding a complex flow chart when non-permanent constraints such as unilateral contact or immersed fluid-structure boundaries are considered. 
The <span class="hlt">parallel</span> acceleration of the solution process is then achieved through a hybrid approach, based on a weighted domain decomposition for distributed memory computing and the use of the KAAPI library for self-balanced shared memory processing inside subdomains.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/19051924','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/19051924"><span id="translatedtitle">Non-equilibrium molecular dynamics <span class="hlt">simulation</span> of nanojet injection with adaptive-spatial decomposition <span class="hlt">parallel</span> algorithm.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Shin, Hyun-Ho; Yoon, Woong-Sup</p> <p>2008-07-01</p> <p>An Adaptive-Spatial Decomposition <span class="hlt">parallel</span> algorithm was developed to increase computation efficiency for molecular dynamics <span class="hlt">simulations</span> of nano-fluids. Injection of a liquid argon jet with a scale of 17.6 molecular diameters was investigated. A solid annular platinum injector was also solved simultaneously with the liquid injectant by adopting a solid modeling technique which incorporates phantom atoms. The viscous heat was naturally discharged through the solids so the liquid boiling problem was avoided with no separate use of temperature controlling methods. Parametric investigations of injection speed, wall temperature, and injector length were made. A sudden pressure drop at the orifice exit causes flash boiling of the liquid departing the nozzle exit with strong evaporation on the surface of the liquids, while rendering a slender jet. The elevation of the injection speed and the wall temperature causes an activation of the surface evaporation concurrent with reduction in the jet breakup length and the drop size. 
PMID:19051924</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/1165004','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/1165004"><span id="translatedtitle">Acceleration of the matrix multiplication of Radiance three phase daylighting <span class="hlt">simulations</span> with <span class="hlt">parallel</span> computing on heterogeneous hardware of personal computer</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor S.</p> <p>2013-05-23</p> <p>Building designers are increasingly relying on complex fenestration systems to reduce energy consumed for lighting and HVAC in low energy buildings. Radiance, a lighting <span class="hlt">simulation</span> program, has been used to conduct daylighting <span class="hlt">simulations</span> for complex fenestration systems. Depending on the configurations, the <span class="hlt">simulation</span> can take hours or even days using a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight <span class="hlt">simulation</span> by conducting <span class="hlt">parallel</span> computing on heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in <span class="hlt">parallel</span> using OpenCL. The speed of the new approach was evaluated using various daylighting <span class="hlt">simulation</span> cases on a multicore central processing unit and a graphics processing unit. 
Based on the measurements and analysis of the time usage for the Radiance daylighting <span class="hlt">simulation</span>, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=3306636','PMC'); return false;" href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=3306636"><span id="translatedtitle">Macro-scale phenomena of arterial coupled cells: a massively <span class="hlt">parallel</span> <span class="hlt">simulation</span></span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pmc">PubMed Central</a></p> <p>Shaikh, Mohsin Ahmed; Wall, David J. N.; David, Tim</p> <p>2012-01-01</p> <p>Impaired mass transfer characteristics of blood-borne vasoactive species such as adenosine triphosphate in regions such as an arterial bifurcation have been hypothesized as a prospective mechanism in the aetiology of atherosclerotic lesions. Arterial endothelial cells (ECs) and smooth muscle cells (SMCs) respond differentially to altered local haemodynamics and produce coordinated macro-scale responses via intercellular communication. Using a computationally designed arterial segment comprising large populations of mathematically modelled coupled ECs and SMCs, we investigate their response to spatial gradients of blood-borne agonist concentrations and the effect of micro-scale-driven perturbation on the macro-scale. Altering homocellular (between same cell type) and heterocellular (between different cell types) intercellular coupling, we <span class="hlt">simulated</span> four cases of normal and pathological arterial segments experiencing an identical gradient in the concentration of the agonist. 
Results show that the heterocellular calcium (Ca2+) coupling between ECs and SMCs is important in eliciting a rapid response when the vessel segment is stimulated by the agonist gradient. In the absence of heterocellular coupling, homocellular Ca2+ coupling between SMCs is necessary for propagation of Ca2+ waves from downstream to upstream cells axially. Desynchronized intracellular Ca2+ oscillations in coupled SMCs are mandatory for this propagation. Upon decoupling the heterocellular membrane potential, the arterial segment loses the inhibitory effect of ECs on the Ca2+ dynamics of the underlying SMCs. The full system comprises hundreds of thousands of coupled nonlinear ordinary differential equations <span class="hlt">simulated</span> on the massively <span class="hlt">parallel</span> Blue Gene architecture. The use of massively <span class="hlt">parallel</span> computational architectures shows the capability of this approach to address macro-scale phenomena driven by elementary micro-scale components of the system. PMID:21920960</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/10108404','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/10108404"><span id="translatedtitle">Implementation of a <span class="hlt">parallel</span> algorithm for thermo-chemical nonequilibrium flow <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Wong, C.C.; Blottner, F.G.; Payne, J.L.; Soetrisno, M.</p> <p>1995-01-01</p> <p>Massively <span class="hlt">parallel</span> (MP) computing is considered to be the future direction of high performance computing. When engineers apply this new MP computing technology to solve large-scale problems, one major interest is the maximum problem size that an MP computer can handle. To determine the maximum size, it is important to address the code scalability issue. 
Scalability implies whether the code can provide an increase in performance proportional to an increase in problem size. If the size of the problem increases, by utilizing more computer nodes, the ideal elapsed time to <span class="hlt">simulate</span> a problem should not increase much. Hence one important task in the development of the MP computing technology is to ensure scalability. A scalable code is an efficient code. In order to obtain good scaled performance, it is necessary to first have the code optimized for a single node performance before proceeding to a large-scale <span class="hlt">simulation</span> with a large number of computer nodes. This paper will discuss the implementation of a massively <span class="hlt">parallel</span> computing strategy and the process of optimization to improve the scaled performance. Specifically, we will look at domain decomposition, resource management in the code, communication overhead, and problem mapping. By incorporating these improvements and adopting an efficient MP computing strategy, an efficiency of about 85% and 96%, respectively, has been achieved using 64 nodes on MP computers for both perfect gas and chemically reactive gas problems. 
A comparison of the performance between MP computers and a vectorized computer, such as Cray-YMP, will also be presented.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2008SPIE.6924E..0YT','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2008SPIE.6924E..0YT"><span id="translatedtitle">Massively-<span class="hlt">parallel</span> FDTD <span class="hlt">simulations</span> to address mask electromagnetic effects in hyper-NA immersion lithography</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Tirapu Azpiroz, Jaione; Burr, Geoffrey W.; Rosenbluth, Alan E.; Hibbs, Michael</p> <p>2008-03-01</p> <p>In the Hyper-NA immersion lithography regime, the electromagnetic response of the reticle is known to deviate in a complicated manner from the idealized Thin-Mask-like behavior. Already, this is driving certain RET choices, such as the use of polarized illumination and the customization of reticle film stacks. Unfortunately, full 3-D electromagnetic mask <span class="hlt">simulations</span> are computationally intensive. And while OPC-compatible mask electromagnetic field (EMF) models can offer a reasonable tradeoff between speed and accuracy for full-chip OPC applications, full understanding of these complex physical effects demands higher accuracy. Our paper describes recent advances in leveraging High Performance Computing as a critical step towards lithographic modeling of the full manufacturing process. In this paper, highly accurate full 3-D electromagnetic <span class="hlt">simulations</span> of very large mask layouts are conducted in <span class="hlt">parallel</span> with reasonable turnaround time, using a Blue Gene/L supercomputer and a Finite-Difference Time-Domain (FDTD) code developed internally within IBM. 
A 3-D <span class="hlt">simulation</span> of a large 2-D layout spanning 5μm×5μm at the wafer plane (and thus 20μm×20μm×0.5μm at the mask) results in a <span class="hlt">simulation</span> with roughly 12.5GB of memory (grid size of 10nm at the mask, single-precision computation, about 30 bytes/grid point). FDTD is flexible and easily parallelizable to enable full <span class="hlt">simulations</span> of such large layouts in approximately an hour using one BlueGene/L "midplane" containing 512 dual-processor nodes with 256MB of memory per processor. Our scaling studies on BlueGene/L demonstrate that <span class="hlt">simulations</span> up to 100μm × 100μm at the mask can be computed in a few hours. Finally, we will show that the use of a subcell technique permits accurate <span class="hlt">simulation</span> of features smaller than the grid discretization, thus improving on the tradeoff between computational complexity and <span class="hlt">simulation</span> accuracy. We demonstrate the correlation of the real and quadrature components that comprise the</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/1009919','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/1009919"><span id="translatedtitle">On Deciding between Conservative and Optimistic Approaches on Massively <span class="hlt">Parallel</span> Platforms</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Carothers, Prof. Christopher D.; Perumalla, Kalyan S</p> <p>2010-01-01</p> <p>Over 5000 publications on <span class="hlt">parallel</span> <span class="hlt">discrete</span> <span class="hlt">event</span> <span class="hlt">simulation</span> (PDES) have appeared in the literature to date. Nevertheless, few articles have focused on empirical studies of PDES performance on large supercomputer-based systems. 
This gap is bridged here, by undertaking a parameterized performance study on thousands of processor cores of a Blue Gene supercomputing system. In contrast to theoretical insights from analytical studies, our study is based on actual implementation in software, incurring the actual messaging and computational overheads for both conservative and optimistic synchronization approaches of PDES. Complex and counter-intuitive effects are uncovered and analyzed, with different event timestamp distributions and available levels of concurrency in the synthetic benchmark models. The results are intended to provide guidance to the PDES community in terms of how the synchronization protocols behave at high processor core counts using state-of-the-art supercomputing systems.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2010JGRB..11512101H','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2010JGRB..11512101H"><span id="translatedtitle">A <span class="hlt">parallel</span> 3-D staggered grid pseudospectral time domain method for ground-penetrating radar wave <span class="hlt">simulation</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Huang, Qinghua; Li, Zhanhui; Wang, Yanbin</p> <p>2010-12-01</p> <p>We presented a <span class="hlt">parallel</span> 3-D staggered grid pseudospectral time domain (PSTD) method for <span class="hlt">simulating</span> ground-penetrating radar (GPR) wave propagation. We took the staggered grid method to weaken the global effect in PSTD and developed a modified fast Fourier transform (FFT) spatial derivative operator to eliminate the wraparound effect due to the implicit periodical boundary condition in FFT operator. 
After the above improvements, we achieved the <span class="hlt">parallel</span> PSTD computation based on an overlap domain decomposition method without any absorbing condition for each subdomain, which can significantly reduce the required grids in each overlap subdomain compared with other proposed algorithms. We tested our <span class="hlt">parallel</span> technique for some numerical models and obtained consistent results with the analytical ones and/or those of the nonparallel PSTD method. The above numerical tests showed that our <span class="hlt">parallel</span> PSTD algorithm is effective in <span class="hlt">simulating</span> 3-D GPR wave propagation, with merits of saving computation time, as well as more flexibility in dealing with complicated models without losing the accuracy. The application of our <span class="hlt">parallel</span> PSTD method in applied geophysics and paleoseismology based on GPR data confirmed the efficiency of our algorithm and its potential applications in various subdisciplines of solid earth geophysics. This study would also provide a useful <span class="hlt">parallel</span> PSTD approach to the <span class="hlt">simulation</span> of other geophysical problems on distributed memory PC clusters.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2016IJCFD..30...79Z','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2016IJCFD..30...79Z"><span id="translatedtitle">Implementation and efficiency analysis of <span class="hlt">parallel</span> computation using OpenACC: a case study using flow field <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Zhang, Shanghong; Yuan, Rui; Wu, Yu; Yi, Yujun</p> <p>2016-01-01</p> <p>The Open Accelerator (OpenACC) application programming interface is a relatively new <span class="hlt">parallel</span> computing standard. 
In this paper, particle-based flow field <span class="hlt">simulations</span> are examined as a case study of OpenACC <span class="hlt">parallel</span> computation. The <span class="hlt">parallel</span> conversion process of the OpenACC standard is explained, and further, the performance of the flow field <span class="hlt">parallel</span> model is analysed using different directive configurations and grid schemes. With careful implementation and optimisation of the data transportation in the <span class="hlt">parallel</span> algorithm, a speedup factor of 18.26× is possible. In contrast, a speedup factor of just 11.77× was achieved with the conventional Open Multi-Processing (OpenMP) <span class="hlt">parallel</span> mode on a 20-kernel computer. These results demonstrate that optimised feature settings greatly influence the degree of speedup, and models involving larger numbers of calculations exhibit greater efficiency and higher speedup factors. In addition, the OpenACC <span class="hlt">parallel</span> mode is found to have good portability, making it easy to implement <span class="hlt">parallel</span> computation from the original serial model.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2016CoPhC.200...57J','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2016CoPhC.200...57J"><span id="translatedtitle"><span class="hlt">Parallel</span> implementation of 3D FFT with volumetric decomposition schemes for efficient molecular dynamics <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Jung, Jaewoon; Kobayashi, Chigusa; Imamura, Toshiyuki; Sugita, Yuji</p> <p>2016-03-01</p> <p>Three-dimensional Fast Fourier Transform (3D FFT) plays an important role in a wide variety of computer <span class="hlt">simulations</span> and data analyses, including molecular dynamics (MD) <span 
class="hlt">simulations</span>. In this study, we develop hybrid (MPI+OpenMP) <span class="hlt">parallelization</span> schemes of 3D FFT based on two new volumetric decompositions, mainly for the particle mesh Ewald (PME) calculation in MD <span class="hlt">simulations</span>. In one scheme, (1d_Alltoall), five all-to-all communications in one dimension are carried out, and in the other, (2d_Alltoall), one two-dimensional all-to-all communication is combined with two all-to-all communications in one dimension. 2d_Alltoall is similar to the conventional volumetric decomposition scheme. We performed benchmark tests of 3D FFT for the systems with different grid sizes using a large number of processors on the K computer in RIKEN AICS. The two schemes show comparable performances, and are better than existing 3D FFTs. The performances of 1d_Alltoall and 2d_Alltoall depend on the supercomputer network system and number of processors in each dimension. There is enough leeway for users to optimize performance for their conditions. In the PME method, short-range real-space interactions as well as long-range reciprocal-space interactions are calculated. Our volumetric decomposition schemes are particularly useful when used in conjunction with the recently developed midpoint cell method for short-range interactions, due to the same decompositions of real and reciprocal spaces. 
The 1d_Alltoall scheme of 3D FFT takes 4.7 ms to <span class="hlt">simulate</span> one MD cycle for a virus system containing more than 1 million atoms using 32,768 cores on the K computer.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/22253805','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/22253805"><span id="translatedtitle"><span class="hlt">Parallel</span> kinetic Monte Carlo <span class="hlt">simulation</span> framework incorporating accurate models of adsorbate lateral interactions</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Nielsen, Jens; D’Avezac, Mayeul; Hetherington, James; Stamatakis, Michail</p> <p>2013-12-14</p> <p>Ab initio kinetic Monte Carlo (KMC) <span class="hlt">simulations</span> have been successfully applied for over two decades to elucidate the underlying physico-chemical phenomena on the surfaces of heterogeneous catalysts. These <span class="hlt">simulations</span> necessitate detailed knowledge of the kinetics of elementary reactions constituting the reaction mechanism, and the energetics of the species participating in the chemistry. The information about the energetics is encoded in the formation energies of gas and surface-bound species, and the lateral interactions between adsorbates on the catalytic surface, which can be modeled at different levels of detail. The majority of previous works accounted for only pairwise-additive first nearest-neighbor interactions. More recently, cluster-expansion Hamiltonians incorporating long-range interactions and many-body terms have been used for detailed estimations of catalytic rate [C. Wu, D. J. Schmidt, C. Wolverton, and W. F. Schneider, J. Catal. 286, 88 (2012)]. 
In view of the increasing interest in accurate predictions of catalytic performance, there is a need for general-purpose KMC approaches incorporating detailed cluster expansion models for the adlayer energetics. We have addressed this need by building on the previously introduced graph-theoretical KMC framework, and we have developed Zacros, a FORTRAN2003 KMC package for <span class="hlt">simulating</span> catalytic chemistries. To tackle the high computational cost in the presence of long-range interactions we introduce <span class="hlt">parallelization</span> with OpenMP. We further benchmark our framework by <span class="hlt">simulating</span> a KMC analogue of the NO oxidation system established by Schneider and co-workers [J. Catal. 286, 88 (2012)]. We show that taking into account only first nearest-neighbor interactions may lead to large errors in the prediction of the catalytic rate, whereas for accurate estimates thereof, one needs to include long-range terms in the cluster expansion.</p> </li> </ol> <div class="pull-right"> <ul class="pagination"> <li><a href="#" onclick='return showDiv("page_1");'>«</a></li> <li><a href="#" onclick='return showDiv("page_20");'>20</a></li> <li><a href="#" onclick='return showDiv("page_21");'>21</a></li> <li class="active"><span>22</span></li> <li><a href="#" onclick='return showDiv("page_23");'>23</a></li> <li><a href="#" onclick='return showDiv("page_24");'>24</a></li> <li><a href="#" onclick='return showDiv("page_25");'>»</a></li> </ul> </div> </div><!-- col-sm-12 --> </div><!-- row --> </div><!-- page_22 --> <div id="page_23" class="hiddenDiv"> <div class="row"> <div class="col-sm-12"> <div class="pull-right"> <ul class="pagination"> <li><a href="#" onclick='return showDiv("page_1");'>«</a></li> <li><a href="#" onclick='return showDiv("page_21");'>21</a></li> <li><a href="#" onclick='return showDiv("page_22");'>22</a></li> <li class="active"><span>23</span></li> <li><a href="#" onclick='return 
showDiv("page_24");'>24</a></li> <li><a href="#" onclick='return showDiv("page_25");'>25</a></li> <li><a href="#" onclick='return showDiv("page_25");'>»</a></li> </ul> </div> </div> </div> <div class="row"> <div class="col-sm-12"> <ol class="result-class" start="441"> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/929325','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/929325"><span id="translatedtitle">Progress on H5Part: A Portable High Performance <span class="hlt">Parallel</span> Data Interface for Electromagnetics <span class="hlt">Simulations</span></span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Adelmann, Andreas; Gsell, Achim; Oswald, Benedikt; Schietinger, Thomas; Bethel, Wes; Shalf, John; Siegerist, Cristina; Stockinger, Kurt</p> <p>2007-06-22</p> <p>Significant problems facing all experimental and computational sciences arise from growing data size and complexity. Common to all these problems is the need to perform efficient data I/O on diverse computer architectures. In our scientific application, the largest parallel particle <span class="hlt">simulations</span> generate vast quantities of six-dimensional data. Such a <span class="hlt">simulation</span> run produces data for an aggregate data size up to several TB per run. Motivated by the need to address data I/O and access challenges, we have implemented H5Part, an open source data I/O API that simplifies the use of the Hierarchical Data Format v5 library (HDF5). HDF5 is an industry standard for high performance, cross-platform data storage and retrieval that runs on all contemporary architectures from large <span class="hlt">parallel</span> supercomputers to laptops. 
H5Part, which is oriented to the needs of the particle physics and cosmology communities, provides support for parallel storage and retrieval of particles, structured and in the future unstructured meshes. In this paper, we describe recent work focusing on I/O support for particles and structured meshes and provide data showing performance on modern supercomputer architectures like the IBM POWER 5.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2016NewA...43...49B','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2016NewA...43...49B"><span id="translatedtitle">Radiation hydrodynamics using characteristics on adaptive decomposed domains for massively <span class="hlt">parallel</span> star formation <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Buntemeyer, Lars; Banerjee, Robi; Peters, Thomas; Klassen, Mikhail; Pudritz, Ralph E.</p> <p>2016-02-01</p> <p>We present an algorithm for solving the radiative transfer problem on massively <span class="hlt">parallel</span> computers using adaptive mesh refinement and domain decomposition. The solver is based on the method of characteristics which requires an adaptive raytracer that integrates the equation of radiative transfer. The radiation field is split into local and global components which are handled separately to overcome the non-locality problem. The solver is implemented in the framework of the magneto-hydrodynamics code FLASH and is coupled by an operator splitting step. The goal is the study of radiation in the context of star formation <span class="hlt">simulations</span> with a focus on early disc formation and evolution. This requires a proper treatment of radiation physics that covers both the optically thin as well as the optically thick regimes and the transition region in particular. 
We successfully show the accuracy and feasibility of our method in a series of standard radiative transfer problems and two 3D collapse <span class="hlt">simulations</span> resembling the early stages of protostar and disc formation.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2013ApJ...776...46E','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2013ApJ...776...46E"><span id="translatedtitle">Monte Carlo <span class="hlt">Simulations</span> of Nonlinear Particle Acceleration in <span class="hlt">Parallel</span> Trans-relativistic Shocks</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Ellison, Donald C.; Warren, Donald C.; Bykov, Andrei M.</p> <p>2013-10-01</p> <p>We present results from a Monte Carlo <span class="hlt">simulation</span> of a <span class="hlt">parallel</span> collisionless shock undergoing particle acceleration. Our <span class="hlt">simulation</span>, which contains parameterized scattering and a particular thermal leakage injection model, calculates the feedback between accelerated particles ahead of the shock, which influence the shock precursor and "smooth" the shock, and thermal particle injection. We show that there is a transition between nonrelativistic shocks, where the acceleration efficiency can be extremely high and the nonlinear compression ratio can be substantially greater than the Rankine-Hugoniot value, and fully relativistic shocks, where diffusive shock acceleration is less efficient and the compression ratio remains at the Rankine-Hugoniot value. This transition occurs in the trans-relativistic regime and, for the particular parameters we use, occurs around a shock Lorentz factor γ0 = 1.5. We also find that nonlinear shock smoothing dramatically reduces the acceleration efficiency presumed to occur with large-angle scattering in ultra-relativistic shocks. 
Our ability to seamlessly treat the transition from ultra-relativistic to trans-relativistic to nonrelativistic shocks may be important for evolving relativistic systems, such as gamma-ray bursts and Type Ibc supernovae. We expect a substantial evolution of shock accelerated spectra during this transition from soft early on to much harder when the blast-wave shock becomes nonrelativistic.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2006PhDT........83C','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2006PhDT........83C"><span id="translatedtitle">Giant impacts during planet formation: <span class="hlt">Parallel</span> tree code <span class="hlt">simulations</span> using smooth particle hydrodynamics</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Cohen, Randi L.</p> <p></p> <p>There is both theoretical and observational evidence that giant planets collided with objects ≥ Mearth during their evolution. These impacts may play a key role in giant planet formation. This paper describes impacts of a ~Earth-mass object onto a suite of proto-giant-planets, as <span class="hlt">simulated</span> using an SPH <span class="hlt">parallel</span> tree code. We run 6 <span class="hlt">simulations</span>, varying the impact angle and evolutionary stage of the proto-Jupiter. We find that it is possible for an impactor to free some mass from the core of the proto-planet it impacts through direct collision, as well as to make physical contact with the core yet escape partially, or even completely, intact. None of the 6 cases we consider produced a solid disk or resulted in a net decrease in the core mass of the proto-planet (since the mass decrease due to disruption was outweighed by the increase due to the addition of the impactor's mass to the core). 
However, we suggest parameters which may have these effects, and thus decrease core mass and formation time in protoplanetary models and/or create satellite systems. We find that giant impacts can remove significant envelope mass from forming giant planets, leaving only 2 MEarth of gas, similar to Uranus and Neptune. They can also create compositional inhomogeneities in planetary cores, which creates differences in planetary thermal emission characteristics.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2000DPS....32.6532C','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2000DPS....32.6532C"><span id="translatedtitle">Giant Impacts During Planet Formation: <span class="hlt">Parallel</span> Tree Code <span class="hlt">Simulations</span> Using Smooth Particle Hydrodynamics</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Cohen, R.; Bodenheimer, P.; Asphaug, E.</p> <p>2000-12-01</p> <p>There is both theoretical and observational evidence that giant planets collided with objects with mass >= Mearth during their evolution. These impacts may help shorten planetary formation timescales by changing the opacity of the planetary atmosphere to allow quicker cooling. They may also redistribute heavy metals within giant planets, affect the core/envelope mass ratio, and help determine the ratio of emitted to absorbed energy within giant planets. Thus, the researchers propose to <span class="hlt">simulate</span> the impact of a ~ Earth-mass object onto a proto-giant-planet with SPH. Results of the SPH collision models will be input into a steady-state planetary evolution code and the effect of impacts on formation timescales, core/envelope mass ratios, density profiles, and thermal emissions of giant planets will be quantified. 
The collision will be modelled using a modified version of an SPH routine which <span class="hlt">simulates</span> the collision of two polytropes. The Saumon-Chabrier and Tillotson equations of state will replace the polytropic equation of state. The <span class="hlt">parallel</span> tree algorithm of Olson & Packer will be used for the domain decomposition and neighbor search necessary to calculate pressure and self-gravity efficiently. This work is funded by the NASA Graduate Student Researchers Program.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2013NIMPA.732..233R','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2013NIMPA.732..233R"><span id="translatedtitle">A study of Gd-based <span class="hlt">parallel</span> plate avalanche counter for thermal neutrons by MC <span class="hlt">simulation</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Rhee, J. T.; Kim, H. G.; Ahmad, Farzana; Jeon, Y. J.; Jamil, M.</p> <p>2013-12-01</p> <p>In this work, we demonstrate the feasibility and characteristics of a single-gap <span class="hlt">parallel</span> plate avalanche counter (PPAC) as a low energy neutron detector, based on Gd-converter coating. Upon falling on the Gd-converter surface, the incident low energy neutrons produce internal conversion electrons which are evaluated and detected. For estimating the performance of the Gd-based PPAC, a <span class="hlt">simulation</span> study has been performed using GEANT4 Monte Carlo (MC) code. The detector response as a function of incident neutron energies in the range of 25-100 meV has been evaluated with two different physics lists. Using the QGSP_BIC_HP physics list and assuming 5 μm converter thickness, 11.8%, 18.48%, and 30.28% detection efficiencies have been achieved for the forward-, the backward-, and the total response of the converter-based PPAC. 
On the other hand, considering the same converter thickness and detector configuration, with the QGSP_BERT_HP physics list efficiencies of 12.19%, 18.62%, and 30.81%, respectively, were obtained. These <span class="hlt">simulation</span> results are briefly discussed.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=3699968','PMC'); return false;" href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=3699968"><span id="translatedtitle">A <span class="hlt">parallel</span> overset-curvilinear-immersed boundary framework for <span class="hlt">simulating</span> complex 3D incompressible flows</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pmc">PubMed Central</a></p> <p>Borazjani, Iman; Ge, Liang; Le, Trung; Sotiropoulos, Fotis</p> <p>2013-01-01</p> <p>We develop an overset-curvilinear immersed boundary (overset-CURVIB) method in a general non-inertial frame of reference to <span class="hlt">simulate</span> a wide range of challenging biological flow problems. The method incorporates overset-curvilinear grids to efficiently handle multi-connected geometries and increase the resolution locally near immersed boundaries. Complex bodies undergoing arbitrarily large deformations may be embedded within the overset-curvilinear background grid and treated as sharp interfaces using the curvilinear immersed boundary (CURVIB) method (Ge and Sotiropoulos, Journal of Computational Physics, 2007). The incompressible flow equations are formulated in a general non-inertial frame of reference to enhance the overall versatility and efficiency of the numerical approach. 
Efficient search algorithms to identify areas requiring blanking, donor cells, and interpolation coefficients for constructing the boundary conditions at grid interfaces of the overset grid are developed and implemented using efficient <span class="hlt">parallel</span> computing communication strategies to transfer information among sub-domains. The governing equations are discretized using a second-order accurate finite-volume approach and integrated in time via an efficient fractional-step method. Various strategies for ensuring globally conservative interpolation at grid interfaces suitable for incompressible flow fractional step methods are implemented and evaluated. The method is verified and validated against experimental data, and its capabilities are demonstrated by <span class="hlt">simulating</span> the flow past multiple aquatic swimmers and the systolic flow in an anatomic left ventricle with a mechanical heart valve implanted in the aortic position. PMID:23833331</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/20120014386','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/20120014386"><span id="translatedtitle"><span class="hlt">Simulated</span> Wake Characteristics Data for Closely Spaced <span class="hlt">Parallel</span> Runway Operations Analysis</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Guerreiro, Nelson M.; Neitzke, Kurt W.</p> <p>2012-01-01</p> <p>A <span class="hlt">simulation</span> experiment was performed to generate and compile wake characteristics data relevant to the evaluation and feasibility analysis of closely spaced <span class="hlt">parallel</span> runway (CSPR) operational concepts. 
While the experiment in this work is not tailored to any particular operational concept, the generated data applies to the broader class of CSPR concepts, where a trailing aircraft on a CSPR approach is required to stay ahead of the wake vortices generated by a lead aircraft on an adjacent CSPR. Data for wake age, circulation strength, and wake altitude change, at various lateral offset distances from the wake-generating lead aircraft approach path were compiled for a set of nine aircraft spanning the full range of FAA and ICAO wake classifications. A total of 54 scenarios were <span class="hlt">simulated</span> to generate data related to key parameters that determine wake behavior. Of particular interest are wake age characteristics that can be used to evaluate both time- and distance-based in-trail separation concepts for all aircraft wake-class combinations. A simple first-order difference model was developed to enable the computation of wake parameter estimates for aircraft models having weight, wingspan and speed characteristics similar to those of the nine aircraft modeled in this work.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/22270730','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/22270730"><span id="translatedtitle">MONTE CARLO <span class="hlt">SIMULATIONS</span> OF NONLINEAR PARTICLE ACCELERATION IN <span class="hlt">PARALLEL</span> TRANS-RELATIVISTIC SHOCKS</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Ellison, Donald C.; Warren, Donald C.; Bykov, Andrei M.</p> <p>2013-10-10</p> <p>We present results from a Monte Carlo <span class="hlt">simulation</span> of a <span class="hlt">parallel</span> collisionless shock undergoing particle acceleration. 
Our <span class="hlt">simulation</span>, which contains parameterized scattering and a particular thermal leakage injection model, calculates the feedback between accelerated particles ahead of the shock, which influence the shock precursor and 'smooth' the shock, and thermal particle injection. We show that there is a transition between nonrelativistic shocks, where the acceleration efficiency can be extremely high and the nonlinear compression ratio can be substantially greater than the Rankine-Hugoniot value, and fully relativistic shocks, where diffusive shock acceleration is less efficient and the compression ratio remains at the Rankine-Hugoniot value. This transition occurs in the trans-relativistic regime and, for the particular parameters we use, occurs around a shock Lorentz factor γ<sub>0</sub> = 1.5. We also find that nonlinear shock smoothing dramatically reduces the acceleration efficiency presumed to occur with large-angle scattering in ultra-relativistic shocks. Our ability to seamlessly treat the transition from ultra-relativistic to trans-relativistic to nonrelativistic shocks may be important for evolving relativistic systems, such as gamma-ray bursts and Type Ibc supernovae. 
We expect a substantial evolution of shock accelerated spectra during this transition from soft early on to much harder when the blast-wave shock becomes nonrelativistic.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/385558','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/385558"><span id="translatedtitle"><span class="hlt">Parallel</span> contact detection algorithm for transient solid dynamics <span class="hlt">simulations</span> using PRONTO3D</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Attaway, S.W.; Hendrickson, B.A.; Plimpton, S.J.</p> <p>1996-09-01</p> <p>An efficient, scalable, <span class="hlt">parallel</span> algorithm for treating material surface contacts in solid mechanics finite element programs has been implemented in a modular way for MIMD <span class="hlt">parallel</span> computers. The serial contact detection algorithm that was developed previously for the transient dynamics finite element code PRONTO3D has been extended for use in <span class="hlt">parallel</span> computation by devising a dynamic (adaptive) processor load balancing scheme.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/28186','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/28186"><span id="translatedtitle">Automated integration of genomic physical mapping data via <span class="hlt">parallel</span> <span class="hlt">simulated</span> annealing</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Slezak, T.</p> <p>1994-06-01</p> <p>The Human Genome Center at the Lawrence Livermore National Laboratory (LLNL) is nearing closure on a high-resolution physical map of human chromosome 19. 
We have built automated tools to assemble 15,000 fingerprinted cosmid clones into 800 contigs with minimal spanning paths identified. These islands are being ordered, oriented, and spanned by a variety of other techniques including: Fluorescence In Situ Hybridization (FISH) at 3 levels of resolution, ECO restriction fragment mapping across all contigs, and a multitude of different hybridization and PCR techniques to link cosmid, YAC, AC, PAC, and P1 clones. The FISH data provide us with partial order and distance data as well as orientation. We made the observation that map builders need a much rougher presentation of data than do map readers; the former wish to see raw data since these can expose errors or interesting biology. We further noted that by ignoring our length and distance data we could simplify our problem into one that could be readily attacked with optimization techniques. The data integration problem could then be seen as an M x N ordering of our N cosmid clones which "intersect" M larger objects by defining "intersection" to mean either contig/map membership or hybridization results. Clearly, the goal of making an integrated map is now to rearrange the N cosmid clone "columns" such that the number of gaps on the object "rows" is minimized. Our FISH partially-ordered cosmid clones provide us with a set of constraints that cannot be violated by the rearrangement process. We solved the optimization problem via <span class="hlt">simulated</span> annealing performed on a network of 40+ Unix machines in <span class="hlt">parallel</span>, using a server/client model built on explicit socket calls. For current maps we can create a map in about 4 hours on the <span class="hlt">parallel</span> net versus 4+ days on a single workstation. 
Our biologists are now using this software on a daily basis to guide their efforts toward final closure.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2003PhDT.......106F','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2003PhDT.......106F"><span id="translatedtitle">Numerical investigation of <span class="hlt">parallel</span> airfoil-vortex interaction using large eddy <span class="hlt">simulation</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Felten, Frederic N.</p> <p></p> <p>Helicopter Blade-Vortex Interaction (BVI) occurs under certain conditions of powered descent or during extreme maneuvering. The vibration and acoustic problems associated with the interaction of rotor tip vortices and the following blades are major aerodynamic concerns for the helicopter community. Researchers have performed numerous experimental and computational studies over the last two decades in order to gain a better understanding of the physical mechanisms involved in BVI. The most severe interaction, in terms of generated noise, happens when the vortex filament is <span class="hlt">parallel</span> to the blade, thus affecting a great portion of it. The majority of the previous numerical studies of <span class="hlt">parallel</span> BVI fall within a potential flow framework, therefore excluding all viscous phenomena. Some Navier-Stokes approaches using dissipative numerical methods in conjunction with RANS-type turbulence models have also been attempted, but with limited success. In this work, the situation is improved by increasing the fidelity of both the numerical method and the turbulence model. A kinetic-energy conserving finite-volume scheme using a collocated-mesh arrangement, specially designed for <span class="hlt">simulation</span> of turbulence in complex geometries, was implemented. 
For the turbulence model, a cost-effective zonal hybrid RANS/LES technique is used. A RANS zone covers the boundary layers on the airfoil and the wake region behind, while the remainder of the flow field, including the region occupied by the vortex, makes up the dynamic LES zone. The concentrated tip vortex is not attenuated as it is convected downstream and over a NACA 0012 airfoil. The lift, drag, moment and friction coefficients induced by the passage of the vortex are monitored in time and compared with experimental data.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/19930002341','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/19930002341"><span id="translatedtitle">Direct numerical <span class="hlt">simulation</span> of instabilities in <span class="hlt">parallel</span> flow with spherical roughness elements</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Deanna, R. G.</p> <p>1992-01-01</p> <p>Results from a direct numerical <span class="hlt">simulation</span> of laminar flow over a flat surface with spherical roughness elements using a spectral-element method are given. The numerical <span class="hlt">simulation</span> approximates roughness as a cellular pattern of identical spheres protruding from a smooth wall. Periodic boundary conditions on the domain's horizontal faces <span class="hlt">simulate</span> an infinite array of roughness elements extending in the streamwise and spanwise directions, which implies the <span class="hlt">parallel</span>-flow assumption, and results in a closed domain. A body force, designed to yield the horizontal Blasius velocity in the absence of roughness, sustains the flow. 
Instabilities above a critical Reynolds number reveal negligible oscillations in the recirculation regions behind each sphere and in the free stream, high-amplitude oscillations in the layer directly above the spheres, and a mean profile with an inflection point near the sphere's crest. The inflection point yields an unstable layer above the roughness (where U''(y) is less than 0) and a stable region within the roughness (where U''(y) is greater than 0). Evidently, the instability begins when the low-momentum or wake region behind an element, being the region most affected by disturbances (purely numerical in this case), goes unstable and moves. In compressible flow with periodic boundaries, this motion sends disturbances to all regions of the domain. In the unstable layer just above the inflection point, the disturbances grow while being carried downstream with a propagation speed equal to the local mean velocity; they do not grow amid the low energy region near the roughness patch. The most amplified disturbance eventually arrives at the next roughness element downstream, perturbing its wake and inducing a global response at a frequency governed by the streamwise spacing between spheres and the mean velocity of the most amplified layer.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/1995PhDT.......219M','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/1995PhDT.......219M"><span id="translatedtitle">Three-Dimensional <span class="hlt">Parallel</span> Lattice Boltzmann Hydrodynamic <span class="hlt">Simulations</span> of Turbulent Flows in Interstellar Dark Clouds</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Muders, Dirk</p> <p>1995-08-01</p> <p>Exploring the clumpy and filamentary structure of interstellar molecular clouds is one of the key problems of modern astrophysics. 
So far, we have little knowledge of the physical processes that cause the structure, but turbulence is suspected to be essential. In this thesis I study turbulent flows and how they contribute to the structure of interstellar dark clouds. To this end, three-dimensional numerical hydrodynamic <span class="hlt">simulations</span> are needed since the detailed turbulent spatial and velocity structure cannot be analytically calculated. I employ the "Lattice Boltzmann Method", a recently developed numerical method which solves the Boltzmann equation in a discretized phase space. Mesoscopic particle packets move with fixed velocities on a Cartesian lattice and at each time step they exchange mass according to given rules. Because of its mainly local operations the method is well suited for application on <span class="hlt">parallel</span> or clustered computers. As part of my thesis I have developed a <span class="hlt">parallelized</span> "Lattice Boltzmann Method" hydrodynamics code. I have improved the numerical stability for Reynolds numbers of up to 10<sup>4.5</sup> and Mach numbers of up to 0.9 and I have extended the method to include a second miscible fluid phase. The code has been used on the three currently most powerful workstations at the "Max-Planck-Institut für Radioastronomie" in Bonn and on the massively <span class="hlt">parallel</span> mainframe CM-5 at the "Gesellschaft für Mathematik und Datenverarbeitung" in St. Augustin. The <span class="hlt">simulations</span> consist of collimated shear flows and the motion of molecular clumps through an ambient medium. The dependence of the emerging structure on Reynolds and Mach numbers is studied. The main results are (1) that distinct clumps and filaments appear only at the transition between laminar and fully turbulent flow at Reynolds numbers between 500 and 5000 and (2) that subsonic viscous shear flows are capable of producing the dark cloud velocity structure. 
The unexpectedly low Reynolds numbers can</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/1131524','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/1131524"><span id="translatedtitle">Supporting the Development of Resilient Message Passing Applications using <span class="hlt">Simulation</span></span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Naughton, III, Thomas J; Engelmann, Christian; Vallee, Geoffroy R; Boehm, Swen</p> <p>2014-01-01</p> <p>An emerging aspect of high-performance computing (HPC) hardware/software co-design is investigating performance under failure. The work in this paper extends the Extreme-scale <span class="hlt">Simulator</span> (xSim), which was designed for evaluating the performance of message passing interface (MPI) applications on future HPC architectures, with fault-tolerant MPI extensions proposed by the MPI Fault Tolerance Working Group. xSim permits running MPI applications with millions of concurrent MPI ranks, while observing application performance in a <span class="hlt">simulated</span> extreme-scale system using a lightweight <span class="hlt">parallel</span> <span class="hlt">discrete</span> <span class="hlt">event</span> <span class="hlt">simulation</span>. The newly added features offer user-level failure mitigation (ULFM) extensions at the <span class="hlt">simulated</span> MPI layer to support algorithm-based fault tolerance (ABFT). The presented solution permits investigating performance under failure and failure handling of ABFT solutions. 
The newly enhanced xSim is the very first performance tool that supports ULFM and ABFT.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/5289523','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/5289523"><span id="translatedtitle">Particle <span class="hlt">simulation</span> on radio frequency stabilization of flute modes in a tandem mirror. I. <span class="hlt">Parallel</span> antenna</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Kadoya, Y.; Abe, H.</p> <p>1988-04-01</p> <p>A two- and one-half-dimensional electromagnetic particle code (PS2M) (H. Abe and S. Nakajima, J. Phys. Soc. Jpn. 53, xxx (1987)) is used to study how an electric field applied <span class="hlt">parallel</span> to the magnetic field affects the radio frequency stabilization of flute modes in a tandem mirror plasma. The <span class="hlt">parallel</span> electric field E/sub <span class="hlt">parallel</span>/ perturbs the electron velocity v/sub <span class="hlt">parallel</span>/ <span class="hlt">parallel</span> to the magnetic field and also induces a perpendicular magnetic field perturbation B/sub perpendicular/. The unstable growth of the flute mode in the absence of such a radio frequency electric field is first studied as a basis for comparison. The ponderomotive force originating from the time-averaged product &lt;v/sub <span class="hlt">parallel</span>/B/sub perpendicular/&gt; is then shown to stabilize the flute modes. 
The stabilizing wave power threshold, the frequency dependency, and the dependence on ∇|E/sub <span class="hlt">parallel</span>/| all agree with the theoretical predictions.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/20040111318','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/20040111318"><span id="translatedtitle">Scalability of <span class="hlt">Parallel</span> Spatial Direct Numerical <span class="hlt">Simulations</span> on Intel Hypercube and IBM SP1 and SP2</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad</p> <p>1995-01-01</p> <p>The implementation and performance of a <span class="hlt">parallel</span> spatial direct numerical <span class="hlt">simulation</span> (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 <span class="hlt">parallel</span> computers is documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that the PSDNS approach can effectively be <span class="hlt">parallelized</span> on a distributed-memory <span class="hlt">parallel</span> machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs to match the actual costs relative to changes in the number of grid points. By increasing the number of processors, slower than linear speedups are achieved with optimized (machine-dependent library) routines. This slower than linear speedup results because the computational cost is dominated by the FFT routine, which yields less than ideal speedups. 
By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 Mflops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" <span class="hlt">simulation</span> that consists of 1.7 million grid points. One time step of this <span class="hlt">simulation</span> is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same <span class="hlt">simulation</span>, 32 nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32 node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this <span class="hlt">simulation</span>, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: Spatial direct numerical <span class="hlt">simulations</span>; incompressible viscous flows; spectral methods; finite differences; <span class="hlt">parallel</span> computing.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/20020060457','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/20020060457"><span id="translatedtitle">A Three Dimensional <span class="hlt">Parallel</span> Time Accurate Turbopump <span class="hlt">Simulation</span> Procedure Using Overset Grid Systems</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Kiris, Cetin; Chan, William; Kwak, Dochan</p> <p>2001-01-01</p> <p>The objective of the current effort is to provide a computational framework for design and analysis of the entire fuel supply system of a liquid rocket engine, including high-fidelity unsteady turbopump flow analysis. This capability is needed to support the design of pump sub-systems for advanced space transportation vehicles that are likely to involve liquid propulsion systems. 
To date, computational tools for design/analysis of turbopump flows are based on relatively lower fidelity methods. An unsteady, three-dimensional viscous flow analysis tool involving stationary and rotational components for the entire turbopump assembly has not been available for real-world engineering applications. The present effort provides developers with information such as transient flow phenomena at start up, and non-uniform inflows, and will eventually impact on system vibration and structures. In the proposed paper, the progress toward the capability of complete <span class="hlt">simulation</span> of the turbo-pump for a liquid rocket engine is reported. The Space Shuttle Main Engine (SSME) turbo-pump is used as a test case for evaluation of the hybrid MPI/Open-MP and MLP versions of the INS3D code. CAD to solution auto-scripting capability is being developed for turbopump applications. The relative motion of the grid systems for the rotor-stator interaction was obtained using overset grid techniques. Unsteady computations for the SSME turbo-pump, which contains 114 zones with 34.5 million grid points, are carried out on Origin 3000 systems at NASA Ames Research Center. 
Results from these time-accurate <span class="hlt">simulations</span> with moving boundary capability will be presented along with the performance of <span class="hlt">parallel</span> versions of the code.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/20020073408','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/20020073408"><span id="translatedtitle">A Three-Dimensional <span class="hlt">Parallel</span> Time-Accurate Turbopump <span class="hlt">Simulation</span> Procedure Using Overset Grid System</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Kiris, Cetin; Chan, William; Kwak, Dochan</p> <p>2002-01-01</p> <p>The objective of the current effort is to provide a computational framework for design and analysis of the entire fuel supply system of a liquid rocket engine, including high-fidelity unsteady turbopump flow analysis. This capability is needed to support the design of pump sub-systems for advanced space transportation vehicles that are likely to involve liquid propulsion systems. To date, computational tools for design/analysis of turbopump flows are based on relatively lower fidelity methods. An unsteady, three-dimensional viscous flow analysis tool involving stationary and rotational components for the entire turbopump assembly has not been available for real-world engineering applications. The present effort provides developers with information such as transient flow phenomena at start up, and nonuniform inflows, and will eventually impact on system vibration and structures. In the proposed paper, the progress toward the capability of complete <span class="hlt">simulation</span> of the turbo-pump for a liquid rocket engine is reported. The Space Shuttle Main Engine (SSME) turbo-pump is used as a test case for evaluation of the hybrid MPI/Open-MP and MLP versions of the INS3D code. 
CAD to solution auto-scripting capability is being developed for turbopump applications. The relative motion of the grid systems for the rotor-stator interaction was obtained using overset grid techniques. Unsteady computations for the SSME turbo-pump, which contains 114 zones with 34.5 million grid points, are carried out on Origin 3000 systems at NASA Ames Research Center. Results from these time-accurate <span class="hlt">simulations</span> with moving boundary capability are presented along with the performance of <span class="hlt">parallel</span> versions of the code.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/988956','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/988956"><span id="translatedtitle">Mesoscale <span class="hlt">Simulations</span> of Particulate Flows with <span class="hlt">Parallel</span> Distributed Lagrange Multiplier Technique</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Kanarska, Y</p> <p>2010-03-24</p> <p>Fluid particulate flows are common phenomena in nature and industry. Modeling of such flows at micro and macro levels, as well as establishing relationships between these approaches, is needed to understand properties of the particulate matter. We propose a computational technique based on the direct numerical <span class="hlt">simulation</span> of the particulate flows. The numerical method is based on the distributed Lagrange multiplier technique following the ideas of Glowinski et al. (1999). Each particle is explicitly resolved on an Eulerian grid as a separate domain, using solid volume fractions. The fluid equations are solved through the entire computational domain; however, Lagrange multiplier constraints are applied inside the particle domain such that the fluid within any volume associated with a solid particle moves as an incompressible rigid body. 
Mutual forces for the fluid-particle interactions are internal to the system. Particles interact with the fluid via fluid dynamic equations, resulting in implicit fluid-rigid-body coupling relations that produce realistic fluid flow around the particles (i.e., no-slip boundary conditions). The particle-particle interactions are implemented using explicit force-displacement interactions for frictional inelastic particles similar to the DEM method of Cundall et al. (1979) with some modifications using a volume of an overlapping region as an input to the contact forces. The method is flexible enough to handle arbitrary particle shapes and size distributions. A <span class="hlt">parallel</span> implementation of the method is based on the SAMRAI (Structured Adaptive Mesh Refinement Application Infrastructure) library, which allows handling of large numbers of rigid particles and enables local grid refinement. Accuracy and convergence of the presented method have been tested against known solutions for a falling sphere as well as by examining fluid flows through stationary particle beds (periodic and cubic packing). 
To evaluate code performance and validate particle</p> </li> </ol> <div class="pull-right"> <ul class="pagination"> <li><a href="#" onclick='return showDiv("page_1");'>«</a></li> <li><a href="#" onclick='return showDiv("page_21");'>21</a></li> <li><a href="#" onclick='return showDiv("page_22");'>22</a></li> <li class="active"><span>23</span></li> <li><a href="#" onclick='return showDiv("page_24");'>24</a></li> <li><a href="#" onclick='return showDiv("page_25");'>25</a></li> <li><a href="#" onclick='return showDiv("page_25");'>»</a></li> </ul> </div> </div><!-- col-sm-12 --> </div><!-- row --> </div><!-- page_23 --> <div id="page_24" class="hiddenDiv"> <div class="row"> <div class="col-sm-12"> <div class="pull-right"> <ul class="pagination"> <li><a href="#" onclick='return showDiv("page_1");'>«</a></li> <li><a href="#" onclick='return showDiv("page_21");'>21</a></li> <li><a href="#" onclick='return showDiv("page_22");'>22</a></li> <li><a href="#" onclick='return showDiv("page_23");'>23</a></li> <li class="active"><span>24</span></li> <li><a href="#" onclick='return showDiv("page_25");'>25</a></li> <li><a href="#" onclick='return showDiv("page_25");'>»</a></li> </ul> </div> </div> </div> <div class="row"> <div class="col-sm-12"> <ol class="result-class" start="461"> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/957425','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/957425"><span id="translatedtitle"><span class="hlt">Parallel</span> Higher-order Finite Element Method for Accurate Field Computations in Wakefield and PIC <span class="hlt">Simulations</span></span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Limborg, C.; Ng, C.; Prudencio, E.; Schussman, G.; Uplenchwar, R.; Ko, K.; /SLAC</p> <p>2009-06-19</p> <p>Over the past years, SLAC's Advanced Computations Department (ACD), under SciDAC 
sponsorship, has developed a suite of 3D (2D) <span class="hlt">parallel</span> higher-order finite element (FE) codes, T3P (T2P) and Pic3P (Pic2P), aimed at accurate, large-scale <span class="hlt">simulation</span> of wakefields and particle-field interactions in radio-frequency (RF) cavities of complex shape. The codes are built on the FE infrastructure that supports SLAC's frequency domain codes, Omega3P and S3P, to utilize conformal tetrahedral (triangular) meshes, higher-order basis functions and quadratic geometry approximation. For time integration, they adopt an unconditionally stable implicit scheme. Pic3P (Pic2P) extends T3P (T2P) to treat charged-particle dynamics self-consistently using the PIC (particle-in-cell) approach, the first such implementation on a conformal, unstructured grid using Whitney basis functions. Examples from applications to the International Linear Collider (ILC), Positron Electron Project-II (PEP-II), Linac Coherent Light Source (LCLS) and other accelerators will be presented to compare the accuracy and computational efficiency of these codes versus their counterparts using structured grids.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2012AGUFM.H13G1442S','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2012AGUFM.H13G1442S"><span id="translatedtitle"><span class="hlt">Simulation</span> of hydraulic fracture networks in three dimensions utilizing massively <span class="hlt">parallel</span> computing platforms</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Settgast, R. R.; Johnson, S.; Fu, P.; Walsh, S. D.; Ryerson, F. J.; Antoun, T.</p> <p>2012-12-01</p> <p>Hydraulic fracturing has been an enabling technology for commercially stimulating fracture networks for over half a century. 
It has become one of the most widespread technologies for engineering subsurface fracture systems. Despite the ubiquity of this technique in the field, understanding and prediction of the hydraulically induced propagation of the fracture network in realistic, heterogeneous reservoirs has been limited. A number of developments in multiscale modeling in recent years have allowed researchers in related fields to tackle the modeling of complex fracture propagation as well as the mechanics of heterogeneous materials. These developments, combined with advances in quantifying solution uncertainties, provide possibilities for the geologic modeling community to capture both the fracturing behavior and longer-term permeability evolution of rock masses under hydraulic loading across both dynamic and viscosity-dominated regimes. Here we will demonstrate the first phase of this effort through illustrations of fully three-dimensional, tightly coupled hydromechanical <span class="hlt">simulations</span> of hydraulically induced fracture network propagation run on massively <span class="hlt">parallel</span> computing scales, and discuss preliminary results regarding the mechanisms by which fracture interactions and the accompanying changes to the stress field can lead to deleterious or beneficial changes to the fracture network.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/876729','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/876729"><span id="translatedtitle">Performance Evaluation of Lattice-Boltzmann Magnetohydrodynamics <span class="hlt">Simulations</span> on Modern <span class="hlt">Parallel</span> Vector Systems</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Carter, Jonathan; Oliker, Leonid</p> <p>2006-01-09</p> <p>The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end computing 
(HEC) platforms, primarily because of their generality, scalability, and cost effectiveness. However, the growing gap between sustained and peak performance for full-scale scientific applications on such platforms has become a major concern in high performance computing. The latest generation of custom-built <span class="hlt">parallel</span> vector systems has the potential to address this concern for numerical algorithms with sufficient regularity in their computational structure. In this work, we explore two- and three-dimensional implementations of a lattice-Boltzmann magnetohydrodynamics (MHD) physics application on some of today's most powerful supercomputing platforms. Results compare performance between the vector-based Cray X1, Earth <span class="hlt">Simulator</span>, and newly-released NEC SX-8, with the commodity-based superscalar platforms of the IBM Power3, Intel Itanium2, and AMD Opteron. Overall results show that the SX-8 attains unprecedented aggregate performance across our evaluated applications.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/5256735','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/5256735"><span id="translatedtitle">Monte Carlo <span class="hlt">simulation</span> of photoelectron energization in <span class="hlt">parallel</span> electric fields: Electroglow on Uranus</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Singhal, R.P.; Bhardwaj, A.</p> <p>1991-09-01</p> <p>A Monte Carlo <span class="hlt">simulation</span> of photoelectron energization and energy degradation in H<sub>2</sub> gas in the presence of <span class="hlt">parallel</span> electric fields has been carried out. Numerical yield spectra, which contain information about the electron energy degradation process and can be used to calculate the yield for any inelastic event, are obtained. 
The variation of yield spectra with incident electron energy, electric field, pitch angle, and cutoff limit has been studied. The yield function is employed to determine the photoelectron fluxes. H<sub>2</sub> Lyman and Werner band excitation rates and integrated column intensity are computed for three different electric field profiles taking various low-energy cutoff limits. It is found that an electric field profile with peak value of 4 mV/m at neutral number density of 3×10<sup>10</sup> cm<sup>−3</sup> produces enhanced volume emission rates of H<sub>2</sub> bands (λ &lt; 1100 Å) explaining about 20% of the observed electroglow emission on Uranus. The effect of solar zenith angle and solar cycle variation on peak excitation rate is discussed.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2015EGUGA..17.6111S&link_type=ABSTRACT','NASAADS'); return false;" href="http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2015EGUGA..17.6111S&link_type=ABSTRACT"><span id="translatedtitle">A heterogeneous and <span class="hlt">parallel</span> computing framework for high-resolution hydrodynamic <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Smith, Luke; Liang, Qiuhua</p> <p>2015-04-01</p> <p>Shock-capturing hydrodynamic models are now widely applied in the context of flood risk assessment and forecasting, accurately capturing the behaviour of surface water over ground and within rivers. Such models are generally explicit in their numerical basis, and can be computationally expensive; this has prohibited full use of high-resolution topographic data for complex urban environments, now easily obtainable through airborne altimetric surveys (LiDAR). 
As processor clock speed advances have stagnated in recent years, further computational performance gains are largely dependent on the use of <span class="hlt">parallel</span> processing. Heterogeneous computing architectures (e.g. graphics processing units or compute accelerator cards) provide a cost-effective means of achieving high throughput in cases where the same calculation is performed with a large input dataset. In recent years this technique has been applied successfully for flood risk mapping, such as within the national surface water flood risk assessment for the United Kingdom. We present a flexible software framework for hydrodynamic <span class="hlt">simulations</span> across multiple processors of different architectures, within multiple computer systems, enabled using OpenCL and Message Passing Interface (MPI) libraries. A finite-volume Godunov-type scheme is implemented using the HLLC approach to solving the Riemann problem, with optional extension to second-order accuracy in space and time using the MUSCL-Hancock approach. The framework is successfully applied on personal computers and a small cluster to provide considerable improvements in performance. The most significant performance gains were achieved across two servers, each containing four NVIDIA GPUs, with a mix of K20, M2075 and C2050 devices. Advantages are found with respect to decreased parametric sensitivity, and thus in reducing uncertainty, for a major fluvial flood within a large catchment during 2005 in Carlisle, England. 
<span class="hlt">Simulations</span> for the three-day event could be performed</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2013JChPh.139g4114B&link_type=ABSTRACT','NASAADS'); return false;" href="http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2013JChPh.139g4114B&link_type=ABSTRACT"><span id="translatedtitle">Extending molecular <span class="hlt">simulation</span> time scales: <span class="hlt">Parallel</span> in time integrations for high-level quantum chemistry and complex force representations</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.</p> <p>2013-08-01</p> <p><span class="hlt">Parallel</span> in time <span class="hlt">simulation</span> algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t<sub>i</sub> (trajectory positions and velocities x<sub>i</sub> = (r<sub>i</sub>, v<sub>i</sub>)) to time t<sub>i+1</sub> (x<sub>i+1</sub>) by x<sub>i+1</sub> = f<sub>i</sub>(x<sub>i</sub>), the dynamics problem spanning an interval from t<sub>0</sub>…t<sub>M</sub> can be transformed into a root finding problem, F(X) = [x<sub>i</sub> − f(x<sub>i−1</sub>)]<sub>i=1,M</sub> = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are <span class="hlt">parallelized</span> by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed <span class="hlt">parallel</span> in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. 
We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD <span class="hlt">simulations</span>, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The <span class="hlt">parallel</span> in time algorithms developed are tested by applying them to MD and AIMD <span class="hlt">simulations</span> of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD <span class="hlt">simulation</span> using Stillinger-Weber potentials, and a HCl + 4H<sub>2</sub>O AIMD <span class="hlt">simulation</span> at the MP2 level. The maximum speedup (serial execution time/<span class="hlt">parallel</span> execution time) obtained by <span class="hlt">parallelizing</span> the Stillinger-Weber MD <span class="hlt">simulation</span> was nearly 3.0. For the AIMD MP2 <span class="hlt">simulations</span>, the algorithms achieved speedups of up to 14.3. 
The <span class="hlt">parallel</span> in time algorithms can be implemented in a distributed computing</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/22303583','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/22303583"><span id="translatedtitle">Extending molecular <span class="hlt">simulation</span> time scales: <span class="hlt">Parallel</span> in time integrations for high-level quantum chemistry and complex force representations</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.</p> <p>2013-08-21</p> <p><span class="hlt">Parallel</span> in time <span class="hlt">simulation</span> algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t<sub>i</sub> (trajectory positions and velocities x<sub>i</sub> = (r<sub>i</sub>, v<sub>i</sub>)) to time t<sub>i+1</sub> (x<sub>i+1</sub>) by x<sub>i+1</sub> = f<sub>i</sub>(x<sub>i</sub>), the dynamics problem spanning an interval from t<sub>0</sub>…t<sub>M</sub> can be transformed into a root finding problem, F(X) = [x<sub>i</sub> − f(x<sub>i−1</sub>)]<sub>i=1,M</sub> = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are <span class="hlt">parallelized</span> by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed <span class="hlt">parallel</span> in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. 
We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD <span class="hlt">simulations</span>, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The <span class="hlt">parallel</span> in time algorithms developed are tested by applying them to MD and AIMD <span class="hlt">simulations</span> of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD <span class="hlt">simulation</span> using Stillinger-Weber potentials, and a HCl + 4H<sub>2</sub>O AIMD <span class="hlt">simulation</span> at the MP2 level. The maximum speedup ((serial execution time)/(<span class="hlt">parallel</span> execution time)) obtained by <span class="hlt">parallelizing</span> the Stillinger-Weber MD <span class="hlt">simulation</span> was nearly 3.0. For the AIMD MP2 <span class="hlt">simulations</span>, the algorithms achieved speedups of up</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/19910022757','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/19910022757"><span id="translatedtitle">Comparisons of elastic and rigid blade-element rotor models using <span class="hlt">parallel</span> processing technology for piloted <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Hill, Gary; Duval, Ronald W.; Green, John A.; Huynh, Loc C.</p> <p>1991-01-01</p> <p>A piloted comparison of rigid and aeroelastic blade-element rotor models was conducted at the Crew Station Research and Development Facility (CSRDF) at Ames Research Center. 
A <span class="hlt">simulation</span> development and analysis tool, FLIGHTLAB, was used to implement these models in real time using <span class="hlt">parallel</span> processing technology. Pilot comments and quantitative analysis performed both on-line and off-line confirmed that elastic degrees of freedom significantly affect perceived handling qualities. Trim comparisons show improved correlation with flight test data when elastic modes are modeled. The results demonstrate the efficiency with which the mathematical modeling sophistication of existing <span class="hlt">simulation</span> facilities can be upgraded using <span class="hlt">parallel</span> processing, and the importance of these upgrades to <span class="hlt">simulation</span> fidelity.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/19900013690','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/19900013690"><span id="translatedtitle"><span class="hlt">Parallel</span> processing of real-time dynamic systems <span class="hlt">simulation</span> on OSCAR (Optimally SCheduled Advanced multiprocessoR)</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke</p> <p>1989-01-01</p> <p><span class="hlt">Parallel</span> processing of real-time dynamic systems <span class="hlt">simulation</span> on a multiprocessor system named OSCAR is presented. In the <span class="hlt">simulation</span> of dynamic systems, generally, the same calculations are repeated every time step. However, the Do-all and Do-across techniques cannot be applied to <span class="hlt">parallel</span> processing of the <span class="hlt">simulation</span>, since there exist data dependencies from the end of an iteration to the beginning of the next iteration and, furthermore, data input and data output are required every sampling time period. 
Therefore, <span class="hlt">parallelism</span> inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near fine grain tasks, each of which consists of one or more floating point operations, are generated to extract the <span class="hlt">parallelism</span> from the calculation and assigned to processors by using optimal static scheduling at compile time in order to reduce the large run-time overhead caused by the use of near fine grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR), which has been developed to extract advantageous features of static scheduling algorithms to the maximum extent.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/22230809','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/22230809"><span id="translatedtitle">Obtaining identical results with double precision global accuracy on different numbers of processors in <span class="hlt">parallel</span> particle Monte Carlo <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Cleveland, Mathew A.; Brunner, Thomas A.; Gentile, Nicholas A.; Keasler, Jeffrey A.</p> <p>2013-10-15</p> <p>We describe and compare different approaches for achieving numerical reproducibility in photon Monte Carlo <span class="hlt">simulations</span>. Reproducibility is desirable for code verification, testing, and debugging. <span class="hlt">Parallelism</span> creates a unique problem for achieving reproducibility in Monte Carlo <span class="hlt">simulations</span> because it changes the order in which values are summed. This is a numerical problem because double precision arithmetic is not associative. 
<span class="hlt">Parallel</span> Monte Carlo, both domain replicated and decomposed <span class="hlt">simulations</span>, will run their particles in a different order during different runs of the same <span class="hlt">simulation</span> because of the non-reproducibility of communication between processors. In addition, runs of the same <span class="hlt">simulation</span> using different domain decompositions will also result in particles being <span class="hlt">simulated</span> in a different order. In [1], a way of eliminating non-associative accumulations using integer tallies was described. This approach successfully achieves reproducibility at the cost of lost accuracy by rounding double precision numbers to fewer significant digits. This integer approach, and other extended and reduced precision reproducibility techniques, are described and compared in this work. Increased precision alone is not enough to ensure reproducibility of photon Monte Carlo <span class="hlt">simulations</span>. Non-arbitrary precision approaches require a varying degree of rounding to achieve reproducibility. 
For the problems investigated in this work, double precision global accuracy was achievable by using 100 bits of precision or greater on all unordered sums, which were subsequently rounded to double precision at the end of every time-step.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/1091975','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/1091975"><span id="translatedtitle">Extending molecular <span class="hlt">simulation</span> time scales: <span class="hlt">Parallel</span> in time integrations for high-level quantum chemistry and complex force representations</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.</p> <p>2013-08-21</p> <p><span class="hlt">Parallel</span> in time <span class="hlt">simulation</span> algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t<sub>i</sub> (trajectory positions and velocities x<sub>i</sub> = (r<sub>i</sub>, v<sub>i</sub>)) to time t<sub>i+1</sub> (x<sub>i+1</sub>) by x<sub>i+1</sub> = f<sub>i</sub>(x<sub>i</sub>), the dynamics problem spanning an interval from t<sub>0</sub>…t<sub>M</sub> can be transformed into a root finding problem, F(X) = [x<sub>i</sub> − f(x<sub>i−1</sub>)]<sub>i=1,M</sub> = 0, for the trajectory variables. The root finding problem is solved using a variety of optimization techniques, including quasi-Newton and preconditioned quasi-Newton optimization schemes that are all unconditionally convergent. The algorithms are <span class="hlt">parallelized</span> by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed <span class="hlt">parallel</span> in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. 
We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD <span class="hlt">simulations</span> such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The <span class="hlt">parallel</span> in time algorithms developed are tested by applying them to MD and AIMD <span class="hlt">simulations</span> of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD <span class="hlt">simulation</span> using Stillinger-Weber potentials, and a HCl + 4H<sub>2</sub>O AIMD <span class="hlt">simulation</span> at the MP2 level. The maximum speedup obtained by <span class="hlt">parallelizing</span> the Stillinger-Weber MD <span class="hlt">simulation</span> was nearly 3.0. For the AIMD MP2 <span class="hlt">simulations</span> the algorithms achieved speedups of up to 14.3. The <span class="hlt">parallel</span> in time algorithms can be implemented in a distributed computing environment using very slow TCP/IP networks. 
Scripts</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/23968079','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/23968079"><span id="translatedtitle">Extending molecular <span class="hlt">simulation</span> time scales: <span class="hlt">Parallel</span> in time integrations for high-level quantum chemistry and complex force representations.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Bylaska, Eric J; Weare, Jonathan Q; Weare, John H</p> <p>2013-08-21</p> <p><span class="hlt">Parallel</span> in time <span class="hlt">simulation</span> algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t<sub>i</sub> (trajectory positions and velocities x<sub>i</sub> = (r<sub>i</sub>, v<sub>i</sub>)) to time t<sub>i+1</sub> (x<sub>i+1</sub>) by x<sub>i+1</sub> = f<sub>i</sub>(x<sub>i</sub>), the dynamics problem spanning an interval from t<sub>0</sub>…t<sub>M</sub> can be transformed into a root finding problem, F(X) = [x<sub>i</sub> − f(x<sub>i−1</sub>)]<sub>i=1,M</sub> = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are <span class="hlt">parallelized</span> by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed <span class="hlt">parallel</span> in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. 
However, for MD and AIMD <span class="hlt">simulations</span>, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The <span class="hlt">parallel</span> in time algorithms developed are tested by applying them to MD and AIMD <span class="hlt">simulations</span> of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD <span class="hlt">simulation</span> using Stillinger-Weber potentials, and a HCl + 4H<sub>2</sub>O AIMD <span class="hlt">simulation</span> at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by <span class="hlt">parallelizing</span> the Stillinger-Weber MD <span class="hlt">simulation</span> was nearly 3.0. For the AIMD MP2 <span class="hlt">simulations</span>, the algorithms achieved speedups of up to 14.3. The <span class="hlt">parallel</span> in time algorithms can be implemented in a</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/585029','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/585029"><span id="translatedtitle">Infrastructure for distributed enterprise <span class="hlt">simulation</span></span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Johnson, M.M.; Yoshimura, A.S.; Goldsby, M.E.</p> <p>1998-01-01</p> <p>Traditional <span class="hlt">discrete-event</span> <span class="hlt">simulations</span> employ an inherently sequential algorithm and are run on a single computer. However, the demands of many real-world problems exceed the capabilities of sequential <span class="hlt">simulation</span> systems. 
Often the capacity of a computer's primary memory limits the size of the models that can be handled, and in some cases <span class="hlt">parallel</span> execution on multiple processors could significantly reduce the <span class="hlt">simulation</span> time. This paper describes the development of an Infrastructure for Distributed Enterprise <span class="hlt">Simulation</span> (IDES), a large-scale portable <span class="hlt">parallel</span> <span class="hlt">simulation</span> framework developed to support Sandia National Laboratories' mission in stockpile stewardship. IDES is based on the Breathing-Time-Buckets synchronization protocol, and maps a message-based model of distributed computing onto an object-oriented programming model. IDES is portable across heterogeneous computing architectures, including single-processor systems, networks of workstations and multi-processor computers with shared or distributed memory. The system provides a simple and sufficient application programming interface that can be used by scientists to quickly model large-scale, complex enterprise systems. In the background and without involving the user, IDES is capable of making dynamic use of idle processing power available throughout the enterprise network. 16 refs., 14 figs.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2010EGUGA..12.7428P','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2010EGUGA..12.7428P"><span id="translatedtitle">Efficient <span class="hlt">parallel</span> seismic <span class="hlt">simulations</span> including topography and 3-D material heterogeneities on locally refined composite grids</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Petersson, Anders; Rodgers, Arthur</p> <p>2010-05-01</p> <p> conserving, coupling procedure for the elastic wave equation at grid refinement interfaces. 
When used together with our single grid finite difference scheme, it results in a method which is provably stable, without artificial dissipation, for arbitrary heterogeneous isotropic elastic materials. The new coupling procedure is based on satisfying the summation-by-parts principle across refinement interfaces. From a practical standpoint, an important advantage of the proposed method is the absence of tunable numerical parameters, which are seldom appreciated by application experts. In WPP, the composite grid discretization is combined with a curvilinear grid approach that enables accurate modeling of free surfaces on realistic (non-planar) topography. The overall method satisfies the summation-by-parts principle and is stable under a CFL time step restriction. A feature of great practical importance is that WPP automatically generates the composite grid based on the user-provided topography and the depths of the grid refinement interfaces. The WPP code has been verified extensively, for example using the method of manufactured solutions, by solving Lamb's problem, by solving various layer over half-space problems and comparing to semi-analytic (FK) results, and by <span class="hlt">simulating</span> scenario earthquakes where results from other seismic <span class="hlt">simulation</span> codes are available. WPP has also been validated against seismographic recordings of moderate earthquakes. WPP performs well on large <span class="hlt">parallel</span> computers and has been run on up to 32,768 processors using about 26 billion grid points (78 billion DOF) and 41,000 time steps. 
WPP is an open source code that is available under the GNU General Public License.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2014EGUGA..16.2584S','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2014EGUGA..16.2584S"><span id="translatedtitle"><span class="hlt">Parallel</span> Processing of Numerical Tsunami <span class="hlt">Simulations</span> on a High Performance Cluster based on the GDAL Library</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Schroeder, Matthias; Jankowski, Cedric; Hammitzsch, Martin; Wächter, Joachim</p> <p>2014-05-01</p> <p>Thousands of numerical tsunami <span class="hlt">simulations</span> allow the computation of inundation and run-up along the coast for vulnerable areas over time. A so-called Matching Scenario Database (MSDB) [1] contains this large number of <span class="hlt">simulations</span> in text file format. In order to visualize these wave propagations, the scenarios have to be reprocessed automatically. In the TRIDEC project, funded by the Seventh Framework Programme of the European Union, a Virtual Scenario Database (VSDB) and a Matching Scenario Database (MSDB) were established, amongst others, by the working group of the University of Bologna (UniBo) [1]. One part of TRIDEC was the development of a new generation of Decision Support System (DSS) for Tsunami Early Warning Systems (TEWS) [2]. A working group of the GFZ German Research Centre for Geosciences was responsible for developing the Command and Control User Interface (CCUI) as the central software application that supports operator activities, incident management and message dissemination. For integration and visualization in the CCUI, the numerical tsunami <span class="hlt">simulations</span> from the MSDB must be converted into the shapefile format. 
The use of shapefiles enables a much easier integration into standard Geographic Information Systems (GIS). Because the CCUI is itself based on two widely used open source products (the GeoTools library and uDig), the integration of shapefiles is provided by these libraries a priori. In this case, for an example area around the Western Iberian margin, several thousand tsunami variations were processed. Due to the mass of data, only a program-controlled process was conceivable. In order to optimize the computing effort and operating time, an existing GFZ High Performance Computing Cluster (HPC) was chosen. Thus, geospatial software capable of <span class="hlt">parallel</span> processing was sought. The FOSS tool Geospatial Data Abstraction Library (GDAL/OGR) was used to match the coordinates with the wave heights and generates the</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=4696414','PMC'); return false;" href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=4696414"><span id="translatedtitle">GENESIS: a hybrid-<span class="hlt">parallel</span> and multi-scale molecular dynamics <span class="hlt">simulator</span> with enhanced sampling algorithms for biomolecular and cellular <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pmc">PubMed Central</a></p> <p>Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji</p> <p>2015-01-01</p> <p>GENESIS (Generalized-Ensemble <span class="hlt">Simulation</span> System) is a new software package for molecular dynamics (MD) <span class="hlt">simulations</span> of macromolecules. It has two MD <span class="hlt">simulators</span>, called ATDYN and SPDYN. 
ATDYN is <span class="hlt">parallelized</span> based on an atomic decomposition algorithm for the <span class="hlt">simulations</span> of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly <span class="hlt">parallelized</span> based on a domain decomposition scheme, allowing large-scale MD <span class="hlt">simulations</span> on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both <span class="hlt">simulators</span> to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly <span class="hlt">parallel</span> performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of <span class="hlt">parallel</span> input/output files, also contribute to the performance. We show the REMD <span class="hlt">simulation</span> results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310–323. 
doi: 10.1002/wcms.1220 PMID:26753008</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2009MPLB...23..325T','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2009MPLB...23..325T"><span id="translatedtitle">Numerical <span class="hlt">Simulation</span> of Unsteady Flow Field around Helicopter in Forward Flight Using a <span class="hlt">Parallel</span> Dynamic Overset Unstructured Grids Method</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Tian, Shuling; Wu, Yizhao; Xia, Jian</p> <p></p> <p>A <span class="hlt">parallel</span> Navier-Stokes solver based on a dynamic overset unstructured grids method is presented to <span class="hlt">simulate</span> the unsteady turbulent flow field around a helicopter in forward flight. The grid method combines the advantages of unstructured grids and Chimera grids and is suitable for dealing with multiple bodies in relative motion. The unsteady Navier-Stokes equations are solved on overset unstructured grids by an explicit dual time-stepping, finite volume method. A preconditioning method applied to the inner iteration of the dual time-stepping is used to speed up the convergence of the numerical <span class="hlt">simulation</span>. The Spalart-Allmaras one-equation turbulence model is used to evaluate the turbulent viscosity. <span class="hlt">Parallel</span> computation is based on the dynamic domain decomposition method in the overset unstructured grids system at each physical time step. A generic helicopter Robin with a four-blade rotor in forward flight is considered to validate the method presented in this paper. 
Numerical <span class="hlt">simulation</span> results show that the <span class="hlt">parallel</span> dynamic overset unstructured grids method is very efficient for the <span class="hlt">simulation</span> of the helicopter flow field and that the results are reliable.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/1115367','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/1115367"><span id="translatedtitle">SCORPIO: A Scalable Two-Phase <span class="hlt">Parallel</span> I/O Library With Application To A Large Scale Subsurface <span class="hlt">Simulator</span></span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Sreepathi, Sarat; Sripathi, Vamsi; Mills, Richard T; Hammond, Glenn; Mahinthakumar, Kumar</p> <p>2013-01-01</p> <p>Inefficient <span class="hlt">parallel</span> I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that <span class="hlt">parallel</span> I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on <span class="hlt">parallel</span> file systems (like Lustre) with a single point of resource contention. Our previous optimization efforts for a massively <span class="hlt">parallel</span> multi-phase and multi-component subsurface <span class="hlt">simulator</span> (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to the rest of the group. 
This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose <span class="hlt">parallel</span> I/O library, SCORPIO (SCalable block-ORiented <span class="hlt">Parallel</span> I/O), that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sits atop existing <span class="hlt">parallel</span> I/O libraries (such as HDF5), and implements optimized I/O access patterns that can scale to larger numbers of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/20090007630','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/20090007630"><span id="translatedtitle">A Framework for <span class="hlt">Parallel</span> Unstructured Grid Generation for Complex Aerodynamic <span class="hlt">Simulations</span></span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Zagaris, George; Pirzadeh, Shahyar Z.; Chrisochoides, Nikos</p> <p>2009-01-01</p> <p>A framework for <span class="hlt">parallel</span> unstructured grid generation targeting both shared memory multi-processors and distributed memory architectures is presented. The two fundamental building blocks of the framework are: (1) the Advancing-Partition (AP) method used for domain decomposition and (2) the Advancing Front (AF) method used for mesh generation. Starting from the surface mesh of the computational domain, the AP method is applied recursively to generate a set of sub-domains. 
Next, the sub-domains are meshed in <span class="hlt">parallel</span> using the AF method. The recursive nature of domain decomposition naturally maps to a divide-and-conquer algorithm which exhibits inherent <span class="hlt">parallelism</span>. For the <span class="hlt">parallel</span> implementation, the Master/Worker pattern is employed to dynamically balance the varying workloads of each task on the set of available CPUs. Performance results for this approach are presented and discussed in detail, along with future work and improvements.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2011IJTIA.131.1212N','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2011IJTIA.131.1212N"><span id="translatedtitle"><span class="hlt">Parallel</span> Computing of Magnetic Field Analysis for Rotating Machines Driven by Voltage Source on the Earth <span class="hlt">Simulator</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Nakano, Tomohito; Kawase, Yoshihiro; Yamaguchi, Tadashi; Shibayama, Yoshiyasu; Nakamura, Masanori; Nishikawa, Noriaki; Uehara, Hitoshi</p> <p></p> <p>A <span class="hlt">parallel</span> computing method for rotating machines excited by a voltage source with the three-dimensional finite element method is developed. In this method, the matrix equations, which contain the voltage equations, are divided into multiple subdomains and the matrix-vector products for the voltage equations in each subdomain are calculated efficiently. 
The validity and the usefulness of the method are verified through the computation of an IPM motor with the off-centered rotor on the Earth <span class="hlt">Simulator</span>.</p> </li> </ol> </div><!-- col-sm-12 --> </div><!-- row --> </div><!-- page_24 --> <div id="page_25" class="hiddenDiv"> <div class="row"> <div class="col-sm-12"> <ol class="result-class" start="481"> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/27045833','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/27045833"><span id="translatedtitle">Process <span class="hlt">Simulation</span> of Complex Biological Pathways in Physical Reactive Space and Reformulated for Massively <span class="hlt">Parallel</span> Computing Platforms.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Ganesan, Narayan; Li, Jie; Sharma, Vishakha; Jiang, Hanyu; Compagnoni, 
Adriana</p> <p>2016-01-01</p> <p>Biological systems encompass complexity that far surpasses many artificial systems. Modeling and <span class="hlt">simulation</span> of large and complex biochemical pathways is a computationally intensive challenge. Traditional tools, such as ordinary differential equations, partial differential equations, stochastic master equations, and Gillespie type methods, are all limited either by their modeling fidelity or computational efficiency or both. In this work, we present a scalable computational framework based on modeling biochemical reactions in explicit 3D space, that is suitable for studying the behavior of large and complex biological pathways. The framework is designed to exploit <span class="hlt">parallelism</span> and scalability offered by commodity massively <span class="hlt">parallel</span> processors such as the graphics processing units (GPUs) and other <span class="hlt">parallel</span> computing platforms. The reaction modeling in 3D space is aimed at enhancing the realism of the model compared to traditional modeling tools and framework. We introduce the <span class="hlt">Parallel</span> Select algorithm that is key to breaking the sequential bottleneck limiting the performance of most other tools designed to study biochemical interactions. The algorithm is designed to be computationally tractable, handle hundreds of interacting chemical species and millions of independent agents by considering all-particle interactions within the system. We also present an implementation of the framework on the popular graphics processing units and apply it to the <span class="hlt">simulation</span> study of JAK-STAT Signal Transduction Pathway. The computational framework will offer a deeper insight into various biological processes within the cell and help us observe key events as they unfold in space and time. 
This will advance the current state-of-the-art in <span class="hlt">simulation</span> study of large scale biological systems and also enable the realistic <span class="hlt">simulation</span> study of macro-biological cultures, where inter-cellular interactions are prevalent. PMID:27045833</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=3963881','PMC'); return false;" href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=3963881"><span id="translatedtitle">cuTauLeaping: A GPU-Powered Tau-Leaping Stochastic <span class="hlt">Simulator</span> for Massive <span class="hlt">Parallel</span> Analyses of Biological Systems</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pmc">PubMed Central</a></p> <p>Besozzi, Daniela; Pescini, Dario; Mauri, Giancarlo</p> <p>2014-01-01</p> <p>Tau-leaping is a stochastic <span class="hlt">simulation</span> algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of <span class="hlt">simulations</span>, leading to high computational costs. Since each <span class="hlt">simulation</span> can be executed independently of the others, a massive <span class="hlt">parallelization</span> of tau-leaping can lead to substantial reductions of the overall running time. The emerging field of General Purpose Graphic Processing Units (GPGPU) provides power-efficient high-performance computing at a relatively low cost. 
In this work we introduce cuTauLeaping, a stochastic <span class="hlt">simulator</span> of biological systems that makes use of GPGPU computing to execute multiple <span class="hlt">parallel</span> tau-leaping <span class="hlt">simulations</span>, by fully exploiting Nvidia's Fermi GPU architecture. We show how a considerable computational speedup is achieved on GPU by partitioning the execution of tau-leaping into multiple separated phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms the CPU-based tau-leaping implementation when the number of <span class="hlt">parallel</span> <span class="hlt">simulations</span> increases, with a break-even directly depending on the size of the biological system and on the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a bidimensional parameter sweep analysis to study the oscillatory regimes in the Ras/cAMP/PKA pathway in S. cerevisiae. PMID:24663957</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/1226878','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/1226878"><span id="translatedtitle">High Fidelity <span class="hlt">Simulations</span> of Large-Scale Wireless Networks</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Onunkwo, Uzoma; Benz, Zachary</p> <p>2015-11-01</p> <p>The worldwide proliferation of wireless connected devices continues to accelerate. There are tens of billions of wireless links across the planet, with an additional explosion of new wireless usage anticipated as the Internet of Things develops. 
Wireless technologies do not only provide convenience for mobile applications, but are also extremely cost-effective to deploy. Thus, this trend towards wireless connectivity will only continue and Sandia must develop the necessary <span class="hlt">simulation</span> technology to proactively analyze the associated emerging vulnerabilities. Wireless networks are marked by mobility and proximity-based connectivity. The de facto standard for exploratory studies of wireless networks is <span class="hlt">discrete</span> <span class="hlt">event</span> <span class="hlt">simulations</span> (DES). However, the <span class="hlt">simulation</span> of large-scale wireless networks is extremely difficult due to prohibitively large turnaround time. A path forward is to expedite <span class="hlt">simulations</span> with <span class="hlt">parallel</span> <span class="hlt">discrete</span> <span class="hlt">event</span> <span class="hlt">simulation</span> (PDES) techniques. The mobility and distance-based connectivity associated with wireless <span class="hlt">simulations</span>, however, typically doom PDES and fail to scale (e.g., OPNET and ns-3 <span class="hlt">simulators</span>). We propose a PDES-based tool aimed at reducing the communication overhead between processors. The proposed solution will use light-weight processes to dynamically distribute computation workload while mitigating communication overhead associated with synchronizations. This work is vital to the analytics and validation capabilities of <span class="hlt">simulation</span> and emulation at Sandia. We have years of experience in Sandia’s <span class="hlt">simulation</span> and emulation projects (e.g., MINIMEGA and FIREWHEEL). 
Sandia’s current highly-regarded capabilities in large-scale emulations have focused on wired networks, where two assumptions prevent scalable wireless studies: (a) the connections between objects are mostly static and (b) the nodes have fixed locations.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/86949','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/86949"><span id="translatedtitle">Large-eddy <span class="hlt">simulation</span> of the Rayleigh-Taylor instability on a massively <span class="hlt">parallel</span> computer</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Amala, P.A.K.</p> <p>1995-03-01</p> <p>A computational model for the solution of the three-dimensional Navier-Stokes equations is developed. This model includes a turbulence model: a modified Smagorinsky eddy-viscosity with a stochastic backscatter extension. The resultant equations are solved using finite difference techniques: the second-order explicit Lax-Wendroff schemes. This computational model is implemented on a massively <span class="hlt">parallel</span> computer. Programming models on massively <span class="hlt">parallel</span> computers are next studied. It is desired to determine the best programming model for the developed computational model. To this end, three different codes are tested on a current massively <span class="hlt">parallel</span> computer: the CM-5 at Los Alamos. Each code uses a different programming model: one is a data <span class="hlt">parallel</span> code; the other two are message passing codes. Timing studies are done to determine which method is the fastest. The data <span class="hlt">parallel</span> approach turns out to be the fastest method on the CM-5 by at least an order of magnitude. The resultant code is then used to study a current problem of interest to the computational fluid dynamics community. 
This is the Rayleigh-Taylor instability. The Lax-Wendroff methods handle shocks and sharp interfaces poorly. To this end, the Rayleigh-Taylor linear analysis is modified to include a smoothed interface. The linear growth rate problem is then investigated. Finally, the problem of the randomly perturbed interface is examined. Stochastic backscatter breaks the symmetry of the stationary unstable interface and generates a mixing layer growing at the experimentally observed rate. 115 refs., 51 figs., 19 tabs.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/26575558','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/26575558"><span id="translatedtitle">Molecular <span class="hlt">simulation</span> workflows as <span class="hlt">parallel</span> algorithms: the execution engine of Copernicus, a distributed high-performance computing platform.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik</p> <p>2015-06-01</p> <p>Computational chemistry and other <span class="hlt">simulation</span> fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers, particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which <span class="hlt">simulation</span> applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple <span class="hlt">simulations</span> into a single observation. 
They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of <span class="hlt">simulations</span> and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on a conceptual level, after which the execution is maximally <span class="hlt">parallel</span>. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled <span class="hlt">simulations</span> using either distributed or <span class="hlt">parallel</span> resources with Copernicus. PMID:26575558</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/19940010166','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/19940010166"><span id="translatedtitle">Real-time dynamic <span class="hlt">simulation</span> of the Cassini spacecraft using DARTS. Part 2: <span class="hlt">Parallel</span>/vectorized real-time implementation</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Fijany, A.; Roberts, J. A.; Jain, A.; Man, G. K.</p> <p>1993-01-01</p> <p>Part 1 of this paper presented the requirements for the real-time <span class="hlt">simulation</span> of the Cassini spacecraft along with some discussion of the DARTS algorithm. 
Here, in Part 2, we discuss the development and implementation of a <span class="hlt">parallel</span>/vectorized DARTS algorithm and architecture for real-time <span class="hlt">simulation</span>. Development of the fast algorithms and architecture for real-time hardware-in-the-loop <span class="hlt">simulation</span> of spacecraft dynamics is motivated by the fact that it represents a hard real-time problem, in the sense that the correctness of the <span class="hlt">simulation</span> depends on both the numerical accuracy and the exact timing of the computation. For a given model fidelity, the computation should be completed within a predefined time period. Further reduction in computation time allows increasing the fidelity of the model (i.e., inclusion of more flexible modes) and the integration routine.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2014SPIE.9145E..2PY','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2014SPIE.9145E..2PY"><span id="translatedtitle">Modeling and <span class="hlt">simulation</span> of a 6-DOF <span class="hlt">parallel</span> platform for telescope secondary mirror</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Yue, Zhongyu; Ye, Yu; Gu, Bozhong</p> <p>2014-07-01</p> <p>The 6-DOF <span class="hlt">parallel</span> platform in this paper is a kind of Stewart platform. It can be used as a supporting structure for a telescope secondary mirror. In order to adapt to the special dynamic environment of the telescope secondary mirror and to be installed in an extremely narrow space, a unique <span class="hlt">parallel</span> platform is designed. The PSS Stewart platform and the SPS Stewart platform are analyzed and compared. Then the PSS Stewart platform is chosen for detailed design. The virtual prototyping model of the <span class="hlt">parallel</span> platform is built. 
The model is used for the analysis and calculation of multi-body dynamics. With the help of ANSYS, the finite element model of the platform is built and then the analysis is performed. Based on the above analysis, the experimental prototype of the platform is built.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/1090857','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/1090857"><span id="translatedtitle">Coupled models and <span class="hlt">parallel</span> <span class="hlt">simulations</span> for three-dimensional full-Stokes ice sheet modeling</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Zhang, Huai; Ju, Lili; Gunzburger, Max; Ringler, Todd; Price, Stephen</p> <p>2011-01-01</p> <p>A three-dimensional full-Stokes computational model is considered for determining the dynamics, temperature, and thickness of ice sheets. The governing thermomechanical equations consist of the three-dimensional full-Stokes system with nonlinear rheology for the momentum, an advective-diffusion energy equation for temperature evolution, and a mass conservation equation for ice thickness changes. Here, we discuss the variable resolution meshes, the finite element discretizations, and the <span class="hlt">parallel</span> algorithms employed by the model components. The solvers are integrated through a well-designed coupler for the exchange of parametric data between components. The discretization utilizes high-quality, variable-resolution centroidal Voronoi Delaunay triangulation meshing and existing <span class="hlt">parallel</span> solvers. 
We demonstrate the gridding technology, discretization schemes, and the efficiency and scalability of the <span class="hlt">parallel</span> solvers through computational experiments using both simplified geometries arising from benchmark test problems and a realistic Greenland ice sheet geometry.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=4334526','PMC'); return false;" href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=4334526"><span id="translatedtitle">Neurite, a Finite Difference Large Scale <span class="hlt">Parallel</span> Program for the <span class="hlt">Simulation</span> of Electrical Signal Propagation in Neurites under Mechanical Loading</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pmc">PubMed Central</a></p> <p>García-Grajales, Julián A.; Rucabado, Gabriel; García-Dopico, Antonio; Peña, José-María; Jérusalem, Antoine</p> <p>2015-01-01</p> <p>With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference <span class="hlt">parallel</span> program for <span class="hlt">simulating</span> electrical signal propagation along neurites under mechanical loading. 
Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to <span class="hlt">simulate</span> the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The <span class="hlt">simulation</span> of the coupled mechanical and electrophysiological behaviors requires computationally expensive calculations that increase in complexity as the network of the <span class="hlt">simulated</span> cells grows. The solvers implemented in Neurite (explicit and implicit) were therefore <span class="hlt">parallelized</span> using graphics processing units in order to reduce the burden of the <span class="hlt">simulation</span> costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the <span class="hlt">parallel</span> implementation of Neurite, along with three different application examples: a long myelinated axon, a segmented</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2012JPhCS.385a2009S','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2012JPhCS.385a2009S"><span id="translatedtitle">Convergence order vs. <span class="hlt">parallelism</span> in the numerical <span class="hlt">simulation</span> of the bidomain equations</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Sharomi, Oluwaseun; Spiteri, Raymond J.</p> <p>2012-10-01</p> <p>The propagation of electrical activity in the human heart can be modelled mathematically by the bidomain equations. 
The bidomain equations represent a multi-scale reaction-diffusion model that consists of a set of ordinary differential equations governing the dynamics at the cellular level coupled with a set of partial differential equations governing the dynamics at the tissue level. Significant computation is generally required to generate clinically useful data from the bidomain equations. Contemporary developments in computer architecture, in particular multi- and many-core computers and graphics processing units, have made such computations feasible. However, the zeal to take advantage of <span class="hlt">parallel</span> architectures has typically caused another important aspect of numerical methods for the solution of differential equations to be overlooked, namely the convergence order. It is well known that higher-order methods are generally more efficient than lower-order ones when solutions are smooth and relatively high accuracy is desired. In these situations, serial implementations of high-order methods may remain surprisingly competitive with <span class="hlt">parallel</span> implementations of low-order methods. In this paper, we examine the effect of order on the numerical solution of the bidomain equations in <span class="hlt">parallel</span>. We find that high-order methods, in particular high-order time-integration methods with relatively better stability properties, tend to outperform their low-order counterparts, even when the latter are run in <span class="hlt">parallel</span>. 
In other words, increasing integration order often trumps increasing available computational resources, especially when relatively high accuracy is desired.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=2905722','PMC'); return false;" href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=2905722"><span id="translatedtitle">Qualitative <span class="hlt">Simulation</span> of Photon Transport in Free Space Based on Monte Carlo Method and Its <span class="hlt">Parallel</span> Implementation</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pmc">PubMed Central</a></p> <p>Chen, Xueli; Gao, Xinbo; Qu, Xiaochao; Chen, Duofang; Ma, Bin; Wang, Lin; Peng, Kuan; Liang, Jimin; Tian, Jie</p> <p>2010-01-01</p> <p>During the past decade, the Monte Carlo method has been widely applied in optical imaging to <span class="hlt">simulate</span> photon transport inside tissues. However, this method has not yet been effectively extended to the <span class="hlt">simulation</span> of free-space photon transport. In this paper, a uniform framework for noncontact optical imaging is proposed based on the Monte Carlo method, which covers the <span class="hlt">simulation</span> of photon transport both in tissues and in free space. Specifically, the simplification theory of lens systems is used to model the camera lens of the optical imaging system, and the Monte Carlo method is employed to describe the energy transformation from the tissue surface to the CCD camera. Also, the focusing effect of the camera lens is considered to establish the correspondence between points on the tissue surface and on the CCD camera. Furthermore, a <span class="hlt">parallel</span> version of the framework is realized, making the <span class="hlt">simulation</span> much more convenient and effective. 
The feasibility of the uniform framework and the effectiveness of the <span class="hlt">parallel</span> version are demonstrated with a cylindrical phantom based on real experimental results. PMID:20689705</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/21277303','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/21277303"><span id="translatedtitle">Influence of the <span class="hlt">parallel</span> nonlinearity on zonal flows and heat transport in global gyrokinetic particle-in-cell <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Jolliet, S.; McMillan, B. F.; Vernay, T.; Villard, L.; Hatzky, R.; Bottino, A.; Angelino, P.</p> <p>2009-07-15</p> <p>In this paper, the influence of the <span class="hlt">parallel</span> nonlinearity on zonal flows and heat transport in global particle-in-cell ion-temperature-gradient <span class="hlt">simulations</span> is studied. Although this term is in theory orders of magnitude smaller than the others, several authors [L. Villard, P. Angelino, A. Bottino et al., Plasma Phys. Contr. Fusion 46, B51 (2004); L. Villard, S. J. Allfrey, A. Bottino et al., Nucl. Fusion 44, 172 (2004); J. C. Kniep, J. N. G. Leboeuf, and V. C. Decyck, Comput. Phys. Commun. 164, 98 (2004); J. Candy, R. E. Waltz, S. E. Parker et al., Phys. Plasmas 13, 074501 (2006)] found different results on its role. The study is performed using the global gyrokinetic particle-in-cell codes TORB (theta-pinch) [R. Hatzky, T. M. Tran, A. Koenies et al., Phys. Plasmas 9, 898 (2002)] and ORB5 (tokamak geometry) [S. Jolliet, A. Bottino, P. Angelino et al., Comput. Phys. Commun. 177, 409 (2007)]. 
In particular, it is demonstrated that the <span class="hlt">parallel</span> nonlinearity, while important for energy conservation, affects the zonal electric field only if the <span class="hlt">simulation</span> is noise dominated. When a proper convergence is reached, the influence of <span class="hlt">parallel</span> nonlinearity on the zonal electric field, if any, is shown to be small for both the cases of decaying and driven turbulence.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2015JCoPh.298..161W','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2015JCoPh.298..161W"><span id="translatedtitle"><span class="hlt">Parallel</span> adaptive mesh refinement method based on WENO finite difference scheme for the <span class="hlt">simulation</span> of multi-dimensional detonation</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Wang, Cheng; Dong, XinZhuang; Shu, Chi-Wang</p> <p>2015-10-01</p> <p>For numerical <span class="hlt">simulation</span> of detonation, the computational cost of using uniform meshes is large due to the vast separation in both time and space scales. Adaptive mesh refinement (AMR) is advantageous for problems with vastly different scales. This paper proposes an AMR method with high-order accuracy for numerical investigation of multi-dimensional detonation. A well-designed AMR method based on the finite difference weighted essentially non-oscillatory (WENO) scheme, named AMR&WENO, is proposed. A new cell-based data structure is used to organize the adaptive meshes. The new data structure makes it possible for cells to communicate with each other quickly and easily. In order to develop an AMR method with high-order accuracy, high-order prolongations in both space and time are utilized in the data prolongation procedure. 
Based on the message passing interface (MPI) platform, we have developed a workload balancing <span class="hlt">parallel</span> AMR&WENO code using the Hilbert space-filling curve algorithm. Our numerical experiments with detonation <span class="hlt">simulations</span> indicate that the AMR&WENO is accurate and has a high resolution. Moreover, we evaluate and compare the performance of the uniform mesh WENO scheme and the <span class="hlt">parallel</span> AMR&WENO method. The comparison results provide us further insight into the high performance of the <span class="hlt">parallel</span> AMR&WENO method.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/26511211','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/26511211"><span id="translatedtitle">In-Series Versus In-<span class="hlt">Parallel</span> Mechanical Circulatory Support for the Right Heart: A <span class="hlt">Simulation</span> Study.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Hsu, Po-Lin; McIntyre, Madeleine; Boehning, Fiete; Dang, Weiguo; Parker, Jack; Autschbach, Rüdiger; Schmitz-Rode, Thomas; Steinseifer, Ulrich</p> <p>2016-06-01</p> <p>Right heart failure (RHF) is a serious health issue with increasing incidence and high mortality. Right ventricular assist devices (RVADs) have been used to support the end-stage failing right ventricle (RV). Current RVADs operate in <span class="hlt">parallel</span> with the native RV, which alters the blood flow pattern and increases RV afterload, effects associated with high tension in cardiac muscles and long-term valve complications. We are developing an in-series RVAD for better RV unloading. This article presents a mathematical model to compare the effects of RV unloading and hemodynamic restoration on an overloaded or failing RV. 
The model was used to <span class="hlt">simulate</span> both in-series (sRVAD) and in-<span class="hlt">parallel</span> (pRVAD) (right atrium-pulmonary artery cannulation) support for severe RHF. The results demonstrated that sRVAD more effectively unloads the RV and restores the balance between RV oxygen supply and demand in RHF patients. In comparison to <span class="hlt">simulated</span> pRVAD and published clinical and in silico studies, the sRVAD was able to provide comparable restoration of key hemodynamic parameters and demonstrated superior afterload and volume reduction. This study concluded that in-series support was able to produce effective afterload reduction and preserve the valve functionality and native blood flow pattern, eliminating complications associated with in-<span class="hlt">parallel</span> support. PMID:26511211</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=4164817','PMC'); return false;" href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=4164817"><span id="translatedtitle">Analysis and <span class="hlt">Simulation</span> of the Dynamic Spectrum Allocation Based on <span class="hlt">Parallel</span> Immune Optimization in Cognitive Wireless Networks</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pmc">PubMed Central</a></p> <p>Huixin, Wu; Duo, Mo; He, Li</p> <p>2014-01-01</p> <p>Spectrum allocation is one of the key issues in improving spectrum efficiency and has become a hot topic in research on cognitive wireless networks. This paper discusses the real-time performance and efficiency of dynamic spectrum allocation and presents a new spectrum allocation algorithm based on the master-slave <span class="hlt">parallel</span> immune optimization model. The algorithm uses a new encoding scheme for the antibody, designed to meet the demands of convergence rate and population diversity. 
To improve computational efficiency, the antibody affinity in the population is calculated on multiple computing nodes at the same time. <span class="hlt">Simulation</span> results show that the algorithm reduces the total spectrum allocation time and can achieve higher network profits. Compared with traditional serial algorithms, the algorithm proposed in this paper has a better speedup ratio and <span class="hlt">parallel</span> efficiency. PMID:25254255</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/20110010844','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/20110010844"><span id="translatedtitle">A Computer <span class="hlt">Simulation</span> of the System-Wide Effects of <span class="hlt">Parallel</span>-Offset Route Maneuvers</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Lauderdale, Todd A.; Santiago, Confesor; Pankok, Carl</p> <p>2010-01-01</p> <p>Most aircraft managed by air-traffic controllers in the National Airspace System are capable of flying <span class="hlt">parallel</span>-offset routes. This paper presents the results of two related studies on the effects of increased use of offset routes as a conflict resolution maneuver. The first study analyzes offset routes in the context of all standard resolution types which air-traffic controllers currently use. This study shows that by utilizing <span class="hlt">parallel</span>-offset route maneuvers, significant system-wide savings in delay due to conflict resolution of up to 30% are possible. It also shows that most offset resolutions replace horizontal-vectoring resolutions. 
The second study builds on the results of the first and directly compares offset resolutions and standard horizontal-vectoring maneuvers to determine that in-trail conflicts are often more efficiently resolved by offset maneuvers.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2010JCoPh.229.5123R&link_type=ABSTRACT','NASAADS'); return false;" href="http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2010JCoPh.229.5123R&link_type=ABSTRACT"><span id="translatedtitle"><span class="hlt">Parallel</span> finite element <span class="hlt">simulations</span> of incompressible viscous fluid flow by domain decomposition with Lagrange multipliers</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Rivera, Christian A.; Heniche, Mourad; Glowinski, Roland; Tanguy, Philippe A.</p> <p>2010-07-01</p> <p>A <span class="hlt">parallel</span> approach for solving three-dimensional viscous incompressible fluid flow problems using discontinuous pressure finite elements and a Lagrange multiplier technique is presented. The strategy is based on non-overlapping domain decomposition methods, and Lagrange multipliers are used to enforce continuity at the boundaries between subdomains. The novelty of the work is the coupled approach for solving the velocity-pressure-Lagrange multiplier algebraic system of the discrete Navier-Stokes equations by a distributed-memory <span class="hlt">parallel</span> ILU(0) preconditioned Krylov method. A penalty function on the interface constraint equations is introduced to avoid the failure of the ILU factorization algorithm. To ensure portability of the code, a message-passing, distributed-memory model based on MPI is employed. The method has been tested on different benchmark cases such as the lid-driven cavity and pipe flow with unstructured tetrahedral grids. 
It is found that the partition algorithm and the order of the physical variables are central to <span class="hlt">parallelization</span> performance. A speed-up in the range of 5-13 is obtained with 16 processors. Finally, the algorithm is tested on an industrial case using up to 128 processors. Compared with results reported in the literature, the speed-ups obtained on distributed- and shared-memory computers are found to be very competitive.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/1996PhDT.......100C','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/1996PhDT.......100C"><span id="translatedtitle">Effects of rotation on turbulent convection: Direct numerical <span class="hlt">simulation</span> using <span class="hlt">parallel</span> processors</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Chan, Daniel Chiu-Leung</p> <p></p> <p>A new <span class="hlt">parallel</span> implicit adaptive mesh refinement (AMR) algorithm is developed for the prediction of unsteady behaviour of laminar flames. The scheme is applied to the solution of the system of partial-differential equations governing time-dependent, two- and three-dimensional, compressible laminar flows for reactive thermally perfect gaseous mixtures. A high-resolution finite-volume spatial discretization procedure is used to solve the conservation form of these equations on body-fitted multi-block hexahedral meshes. A local preconditioning technique is used to remove numerical stiffness and maintain solution accuracy for low-Mach-number, nearly incompressible flows. A flexible block-based octree data structure has been developed and is used to facilitate automatic solution-directed mesh adaptation according to physics-based refinement criteria. The data structure also enables an efficient and scalable <span class="hlt">parallel</span> implementation via domain decomposition. 
The <span class="hlt">parallel</span> implicit formulation makes use of a dual-time-stepping like approach with an implicit second-order backward discretization of the physical time, in which a Jacobian-free inexact Newton method with a preconditioned generalized minimal residual (GMRES) algorithm is used to solve the system of nonlinear algebraic equations arising from the temporal and spatial discretization procedures. An additive Schwarz global preconditioner is used in conjunction with block incomplete LU type local preconditioners for each sub-domain. The Schwarz preconditioning and block-based data structure readily allow efficient and scalable <span class="hlt">parallel</span> implementations of the implicit AMR approach on distributed-memory multi-processor architectures. The scheme was applied to solutions of steady and unsteady laminar diffusion and premixed methane-air combustion and was found to accurately predict key flame characteristics. For a premixed flame under terrestrial gravity, the scheme accurately predicted the frequency of the natural</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/11015901','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/11015901"><span id="translatedtitle">Dislocation emission at the Silicon/Silicon nitride interface: A million atom molecular dynamics <span class="hlt">simulation</span> on <span class="hlt">parallel</span> computers</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Bachlechner; Omeltchenko; Nakano; Kalia; Vashishta; Ebbsjo; Madhukar</p> <p>2000-01-10</p> <p>Mechanical behavior of the Si(111)/Si(3)N4(0001) interface is studied using million atom molecular dynamics <span class="hlt">simulations</span>. At a critical value of applied strain <span class="hlt">parallel</span> to the interface, a crack forms on the silicon nitride surface and moves toward the interface. 
The crack does not propagate into the silicon substrate; instead, dislocations are emitted when the crack reaches the interface. The dislocation loop propagates in the (1̄1̄1) plane of the silicon substrate with a speed of 500 (±100) m/s. The time evolution of the dislocation emission and the nature of the defects are studied. PMID:11015901</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/936676','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/936676"><span id="translatedtitle"><span class="hlt">Simulation</span> of Large <span class="hlt">Parallel</span> Plasma Flows in the Tokamak SOL Driven by Cross-Field Transport Asymmetries</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Pigarov, A Y; Krasheninnikov, S I; LaBombard, B; Rognlien, T D</p> <p>2006-06-06</p> <p>Large-Mach-number <span class="hlt">parallel</span> plasma flows in the single-null SOL of different tokamaks are <span class="hlt">simulated</span> with the multi-fluid transport code UEDGE. The key role of poloidal asymmetry of cross-field plasma transport as the driving mechanism for such flows is discussed. The impact of ballooning-like diffusive and convective transport and plasma flows on divertor detachment, material migration, impurity flows, and erosion/deposition profiles is studied. 
The results on well-balanced double null plasma modeling that are indicative of strong asymmetry of cross-field transport are presented.</p> </li> </ol> </div><!-- col-sm-12 --> </div><!-- row --> </div><!-- page_25 --> </div><!-- container --> </body> </html>